For a long time, touchless voice control was considered the holy grail of mobile devices. We knew it would make the experience on our phones irresistible for users who are driving in the car, cooking in the kitchen, glued to the TV watching that last episode of Breaking Bad, or who have any other reason for not wanting to pick up the phone.
- Everyone is different, especially in the way they speak. We set out to build a deeply personal, super useful phone experience for our users, and we quickly realized that for a device to recognize a voice command, we had to account for every language our phones support (English, Spanish, Portuguese, French, German, and Italian) and the many dialects and accents within each one. For every language, we gathered 100 to 200 people of various genders and regional accents to help us build a reference dictionary. This dictionary would provide the foundation for the technology. Then we tested the software against audio clips from TV shows. The last step was ruthless trial and error until we had refined the product to work for everyone. It turns out that to make something truly personal, you do have to make it work for everyone!
- Toys can yield insight. Touchless Control was the first hands-free voice control ever offered on a mobile phone. So during the development phase we looked for products in other industries that used similar voice command technology. We found two, both in the toy industry: a blue, one-eyed alien alarm clock that could listen to you speak, and a little Christmas ornament sold on TV that would automatically play holiday songs when you called out to it. A good toy provides endless joy.
- It’s all in the trigger. The trigger, the “OK Google Now” phrase that activates Touchless Control, is a cornerstone of getting always-on software to work. It had to be very distinctive, so it wouldn’t be confused with common phrases used in everyday conversation, and it had to be easy for the device to pick up. In linguistic terms, this meant the trigger had to be at least four syllables (ideally at least five) and include unique sounds. We worked with several professional linguists and tested dozens of potential triggers. Some random words like “magic” and “genie” performed really well. But ultimately “OK Google Now” worked the best and made sense for our Android-based software.
- Some women tend to say their w’s with an h. We didn’t even hear it ourselves at first, really. But when we looked into the audio data, sure enough, we learned that a significant number of English-speaking women tend to begin their w sound with an h. Quite elegant.