Susan Kuchinskas explores how automakers and tech vendors are making voice commands as intuitive as speaking
In 2010, Harris' Autotechcast found that 35 percent of consumers would be likely to adopt voice-activated systems in their vehicles.
With rollouts or announcements from more than a dozen OEMs, voice recognition is moving from gee-whiz to de rigueur. (For TU’s previous take on this technology, see Telematics and speech recognition: Finally ready for prime time?)
Voice-recognition technology is making waves in the news thanks to Apple's release of the iPhone with Siri, an "intelligent personal assistant" that responds to natural-language voice commands. Theoretically, a Siri-equipped phone might be all you need to find nearby restaurants, check traffic and navigate to your destination.
Of course, Android phones also allow drivers to speak their destinations, while Google voice search allows for keyword search on the phone. But the iPhone release is certainly more bad news for makers of PNDs as well as embedded systems. As smartphones get smarter, paying extra for onboard systems may seem less enticing to consumers.
Nevertheless, the auto industry continues to push forward on embedded voice recognition, whether in-car connectivity is built in or brought in. Automakers and their suppliers say that the car brings special challenges that may not be addressed at the smartphone level.
One issue is all the background noise. Today's automotive voice-recognition systems address this in part by limiting the commands to a simple grammar.
Next-generation systems under development are moving toward natural-language commands.
Just talk naturally
"Today, I can say, ‘Phone, Javier, on mobile,’" explains Brian Radloff, director of solutions architects, automotive, for Nuance Communications.
"Going forward, we are starting to introduce more open grammar and allowing more free-form input." For example, you'd be able to say something like, ‘I'd like to get Javier on his mobile.’
Natural speech interfaces are much more involved than just recognizing a specific word from the car control grammar.
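The gap between the two styles of command can be illustrated with a minimal Python sketch. It is not Nuance's implementation: the grammar pattern, the contact list and the function names are all hypothetical, and the "free-form" half is plain keyword spotting that assumes the driver wants a phone call, which is far simpler than real natural-language understanding.

```python
import re

# Hypothetical fixed grammar: "<action>, <contact>, on <line>".
FIXED_GRAMMAR = re.compile(
    r"^(?P<action>phone|call),?\s+(?P<contact>\w+),?\s+on\s+(?P<line>mobile|home|work)$",
    re.IGNORECASE,
)

# Hypothetical phone book entries and line types used for keyword spotting.
CONTACTS = {"javier", "edwin"}
LINES = {"mobile", "home", "work"}


def parse_fixed(utterance):
    """Return the command slots, or None if the utterance is outside the grammar."""
    match = FIXED_GRAMMAR.match(utterance.strip())
    return match.groupdict() if match else None


def parse_free_form(utterance):
    """Toy stand-in for free-form input: find known slot values anywhere.

    Assumes the intent is always a phone call; a real system would have to
    work out the intent as well as the slots.
    """
    words = re.findall(r"[a-z']+", utterance.lower())
    contact = next((w for w in words if w in CONTACTS), None)
    line = next((w for w in words if w in LINES), None)
    if contact is None:
        return None
    return {"action": "phone", "contact": contact, "line": line}
```

The fixed grammar accepts "Phone, Javier, on mobile" but rejects "I'd like to get Javier on his mobile", while the keyword version pulls the contact and line out of either phrasing.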
Says Radloff, "I'd like to be able to get a novice user in the car and be able to listen to them, interpret what they want and execute it. Within the next few years, we will begin to see this."
He adds that the technology already exists in Nuance's lab, but the company is still refining and testing it to make sure it works for users.
"We have algorithms, and a basic voice-recognition system that is trained across dialects and user groups, plus some algorithms that help it adapt to your specific voice," Radloff says.
The current system, introduced on MyFord Touch, can distinguish between drivers.
The system needs two or three utterances in order to identify a known speaker or revert to the baseline and begin adapting to a new voice.
Nuance also powers voice recognition for Toyota, BMW, Hyundai, Volkswagen and others.
The task is more difficult because of today's multicultural world, according to Radloff. For example, there are strong cultural differences in the way people say commands.
"In the past, when you were developing a grammar-based system that understood specific words and commands, you could localize very quickly for each market," he says.
"When you start with natural speech commands, there are a lot of cultural characteristics that make the application much more difficult to deploy."
Another issue is drivers who switch languages. For example, a German driver may also speak French and English. As she travels from Germany to France, she switches to French when speaking addresses into the navigation system.
The task is harder still in countries that use mixed alphabets. "People use SYNC most to call names in their phone book," says Brigitt Richardson, a Ford voice control engineer.
"It's common for people in China to have both Chinese and Latin characters in their phone books."
While their voice recognition software is set to Chinese, they may have European or American friends or colleagues and listen to music in a mix of languages, she points out.
You can't just tell the driver, ‘I'm sorry, you're in China, so you can't call someone named Edwin.’
To solve this problem, Ford is looking at sophisticated algorithms to create a grammar, or database of recognized words, that uses phonetics to accept both Latin and Chinese characters.
This approach is an example of the way Ford engineers work closely with Nuance engineers to obtain a differentiated voice recognition system, instead of simply licensing the software or creating a spec and letting the vendor deliver on it.
"In the past, you'd work with a navigation provider and say, 'We want voice recognition capabilities,' and maybe you would have to negotiate or they would make all the decisions," Richardson says.
“We took more ownership in designing how the user interface works. It worked wonderfully.”
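The idea of a phonetic grammar that accepts both Latin and Chinese characters can be sketched roughly as follows. This is an illustration, not Ford's or Nuance's algorithm: the pronunciation table, the phoneme spellings and the function names are invented for the example, and a production system would derive pronunciations from a lexicon or a grapheme-to-phoneme model rather than a hand-written dictionary.

```python
# Hypothetical pronunciation table: each phone book entry, in either script,
# maps to a rough phonetic key. The phoneme spellings are illustrative only.
PRONUNCIATIONS = {
    "Edwin": "eh d w ih n",
    "李伟": "l i w ei",
    "Li Wei": "l i w ei",
    "王芳": "w ang f ang",
}


def build_grammar(phone_book):
    """Index phone book entries by phonetic key instead of by spelling."""
    grammar = {}
    for name in phone_book:
        key = PRONUNCIATIONS.get(name)
        if key is not None:
            grammar.setdefault(key, []).append(name)
    return grammar


def lookup(grammar, heard_phonemes):
    """Return every entry whose pronunciation matches what was heard."""
    return grammar.get(heard_phonemes, [])


grammar = build_grammar(["Edwin", "李伟", "Li Wei"])
```

Because entries are matched on sound rather than script, a request for "Edwin" resolves even when the system language is Chinese, and a name stored in both scripts resolves to both entries.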
Going to the cloud
Ford SYNC includes Microsoft Tellme’s voice-activated Traffic, Directions and Information (TDI) system for off-board services and Nuance's voice recognition for its embedded system.
When a Ford driver pushes a SYNC button, she's talking to Nuance's voice recognition engine. The embedded system then makes a phone call to Microsoft Tellme. When the driver is accessing all the information Tellme provides, she's talking to Tellme's voice recognition in the cloud.
"Everybody seems to be focusing on off-board," notes Richardson. But for cloud-based applications, there's always a tradeoff between latency and quality, she says. "Can they provide a quality response in a sufficient amount of time? You can get a better result when you give it more time."
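The tradeoff Richardson describes can be pictured as choosing the best recognizer that still answers within the time the driver is willing to wait. The Python sketch below is only an illustration under stated assumptions: the latency and quality figures are made up, not measurements from SYNC, Nuance or Tellme, and `pick_recognizer` is a hypothetical helper.

```python
# Assumed figures for illustration only; not measurements from any real system.
RECOGNIZERS = {
    "embedded": {"latency_ms": 300, "quality": 0.80},
    "cloud": {"latency_ms": 2000, "quality": 0.95},
}


def pick_recognizer(budget_ms, network_available=True):
    """Pick the highest-quality recognizer that answers within budget_ms.

    Returns None when nothing can respond in time.
    """
    usable = {
        name: spec
        for name, spec in RECOGNIZERS.items()
        if spec["latency_ms"] <= budget_ms
        and (name != "cloud" or network_available)
    }
    if not usable:
        return None
    return max(usable, key=lambda name: usable[name]["quality"])
```

With these assumed numbers, a half-second budget falls back to the embedded engine, while a five-second budget goes to the cloud as long as a connection is available.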
The next big step forward will be using voice recognition to control off-board or connected applications, Radloff says. That means doing things like updating your Facebook status or controlling your Pandora stations by voice, as well as searching for a point of interest and adding it to your maps database.
When voice recognition is computed on remote servers, there can be a significant lag that makes it unsuitable for safety and security applications. But that's not a problem for most infotainment.
For example, Rovi provides a database of metadata and rich media for music, television, movies and celebrities, along with discovery tools. Rovi Cloud Services lets developers build apps that can access the metadata by connecting to remote servers.
"You can say, 'King of Pop,' and the service will play Michael Jackson, or you could say, 'Who influenced Janet Jackson?'" says Woody Deguchi, vice president of sales for Rovi.
“As people get used to voice recognition, our database has a huge benefit.”
For this kind of voice control, the technical challenges have more or less been resolved, Radloff says.
The main challenges are from a business development perspective. "You have the big OEMs, you have the mobile carriers, and you have the application providers," Radloff notes.
“That is quite an ecosystem and, because of the large carriers and the OEMs, it's quite complex.” These deals have been driven by the automakers, according to Radloff, because they want to be very careful about the user interface and what's brought into the car.
Susan Kuchinskas is a regular contributor to TU.