Wednesday, October 22, 2014

Programming by Voice: Staying Productive without Harming Yourself

http://www.extrahop.com/post/blog/programming-by-voice-staying-productive-without-harming-yourself

One of the reasons I love working at ExtraHop is the lack of meetings and the abundance of uninterrupted development time. However, I quickly found after starting that I was unaccustomed to coding for such long periods. A few weeks in, I began to develop discomfort in my wrists and forearms. I had had intermittent trouble with this in the past, but limiting my computer usage at home in the evenings had always been enough to resolve it. This time, however, was different.
As a very recent college graduate, I was concerned that my daily work activities could be causing permanent injury. I started looking into ergonomic keyboards and mice, hoping to find a cure-all solution. As you might have guessed, I did not find a magical solution, and my situation worsened with each passing week.
While the discomfort was frustrating, I was much more concerned that the injury was preventing me from being able to quickly and easily create and communicate at work and at home.

An Introduction to a Solution

After I had tried and abandoned several other solutions, a coworker of mine at ExtraHop showed me a PyCon talk by Tavis Rudd, a developer who programs by voice. At first, I was skeptical that this approach would be reliable and productive. However, after watching the video, I was convinced that voice input was a compelling option for programmers. Rudd suffered from a similar injury, and he had gone through all of the same investigations that I had, finally determining that a fancy keyboard wasn’t enough to fix it.
That night, I scoured the Internet for people who programmed by voice, looking for tips or tutorials. They were few and far between, and many people claimed that it was impossible. Not easily deterred, I started to piece together a toolkit that would allow me to program by voice on a Linux machine.

Configuration: The Hard Part

It was immediately clear that Dragon NaturallySpeaking was the only option for dictation software. It was miles ahead of other products in voice recognition, but it only ran on Windows or Mac. Unfortunately, I was never able to run Dragon NaturallySpeaking in Wine, so I had to settle for running it in a Windows VM and proxying the commands to the Linux host.
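To make the proxying idea concrete, here is a minimal sketch of what such a bridge could look like. It is not the code from my repository: the message format and the function names `encode_command` and `to_xdotool_argv` are made up for illustration, and it assumes the Linux host replays input with `xdotool`. The Windows side turns each recognized action into one line of JSON, and the Linux side turns that line into a command it could run.

```python
import json

# Hypothetical wire format: one JSON object per line, e.g.
#   {"kind": "key", "payload": "ctrl+c"}
#   {"kind": "text", "payload": "hello world"}
VALID_KINDS = ("key", "text")


def encode_command(kind, payload):
    """Guest (Windows VM) side: serialize one action as a JSON line."""
    if kind not in VALID_KINDS:
        raise ValueError("unknown command kind: %r" % (kind,))
    return json.dumps({"kind": kind, "payload": payload}) + "\n"


def to_xdotool_argv(line):
    """Host (Linux) side: turn a JSON line into an xdotool argument list.

    The list is only built here, not executed. A real host process
    could pass it to subprocess.run().
    """
    message = json.loads(line)
    kind, payload = message["kind"], message["payload"]
    if kind == "key":
        # A key chord such as "ctrl+c" or "alt+Tab".
        return ["xdotool", "key", payload]
    if kind == "text":
        # "--" keeps dictated text that starts with "-" from being
        # parsed as an option.
        return ["xdotool", "type", "--", payload]
    raise ValueError("unknown command kind: %r" % (kind,))
```

In a real setup the JSON lines would travel over a socket between the VM and the host. Keeping the encoding and decoding as plain functions makes them easy to test without a microphone or a VM.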
I will leave out some of the configuration steps that I went through in this post. You can find detailed instructions on how to get everything up and running on my GitHub repo.
If you are following along with the instructions, you should now be able to send dictation and the example command to your Linux host, but that will not get you very far with programming. I ended up spending most of the next two weeks writing grammars. The majority of the process was:
  1. Attempt to perform a task (programming, switching windows, etc).
  2. Write a command that would let me do this by voice.
  3. Test that command and add related commands.
  4. Repeat.
The process was slow going, but I am hopeful that the repository I linked will help you avoid starting from scratch. Even after using this setup for about a month, I am still tweaking my commands a couple of times a day. Tavis Rudd claims to have over 2000 custom commands, which means that I must still have a long way to go.
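If you have never seen a voice grammar, the sketch below shows the general shape of one. It is a toy written in plain Python with regular expressions, not the grammar format used by the dictation tooling or by my repository, and the phrases, the `RULES` table, and the `interpret` function are all invented for this example. Each rule pairs a spoken pattern with a function that returns the keys to press, using `xdotool`-style key names as an assumption.

```python
import re

# Spoken number words this toy grammar understands.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
_COUNT = "|".join(NUMBER_WORDS)


def _arrow(match):
    """'down three' -> press Down three times; the count defaults to one."""
    key = match.group("dir").capitalize()
    count = NUMBER_WORDS[match.group("n")] if match.group("n") else 1
    return [key] * count


def _next_window(match):
    return ["alt+Tab"]


def _save_file(match):
    # Leave insert mode and run :w, assuming the editor is Vim.
    return ["Escape", "colon", "w", "Return"]


# Each rule: (pattern for the whole utterance, action building a key list).
RULES = [
    (re.compile(r"^(?P<dir>up|down|left|right)(?: (?P<n>%s))?$" % _COUNT), _arrow),
    (re.compile(r"^next window$"), _next_window),
    (re.compile(r"^save file$"), _save_file),
]


def interpret(utterance):
    """Return the key list for a recognized command, or None if no rule matches."""
    text = utterance.strip().lower()
    for pattern, action in RULES:
        match = pattern.match(text)
        if match:
            return action(match)
    return None
```

With these rules, `interpret("down three")` yields three `Down` presses, while an unknown phrase returns `None` and could fall through to plain dictation. Growing a grammar is mostly a matter of adding rows to a table like `RULES` each time you hit a task you cannot yet do by voice.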

The Results

As Rudd explained in his talk, the microphone is a critical link in this setup. A good microphone that hears only you will make a big difference in both the accuracy and the speed of recognition. I really like the Yeti from Blue that I am using, but I can generally only use it when the office is mostly quiet.
With the commands I have created so far, I can switch between windows, navigate the web (with the help of Vimium), switch between workspaces, and, most importantly, I can program in Python and Go with decent speed. It is not quite as fast as programming with a keyboard, but it is surprisingly efficient once you learn the commands.
The grammars I have shared in the GitHub repository mentioned above are specific to my workflow. I recommend that you use them as a starting point, while keeping in mind that the computer may recognize words differently for you than it does for me. These grammars are also specific to the languages I use most often, so please don’t hesitate to write ones for your favorite languages. Finally, look for my .vimrc file in my dotfiles repository to find the custom shortcuts that the voice commands trigger.
Coding by voice is not perfect, but it has reached a point where it is a practical option. Don’t continue suffering through wrist and arm discomfort when there is an alternative. Feel free to send me a pull request and we can continue making voice programming better for everyone.
