jeffrey-funk
By: Ma Jie (A0129447X), Niu Rui (A0040287J), Nguyen Gia Huy (A0045581E), Liu Lili (A0132407R), Tan Gee Kwang (A0147159X)
Speech Recognition: Ready to Take Off?
Overview
Performance of SR
• Siri
• Other applications
SR improvement
• Underlying technology
Emerging Application
• Avionics
• Field Automation
In 2013, an Intelligent Voice survey showed that only 15% of respondents said they had used Siri in iOS 7. Nearly half believed Apple had “oversold Siri’s voice recognition capabilities.”
At WWDC 2015, Apple’s software engineering vice president claimed that Siri gets 1 billion requests a week.
Performance of Siri
• Does basic math faster
• Finds facts two times faster
• Sets alarms four times faster than you
• Tweets more than two times faster than you
• Converts measurements
Siri Usage Rate Detail and Customer Satisfaction
Source: http://www.imore.com/siri-months-community-report-card
[Chart: "Do you use Siri on your iOS device?" Response options: Yes, and I like it; Yes, but it could be better; Yes, and I'm neutral; No: tried it and didn't like it; No: I didn't even try because I have no desire; Other. Values as extracted: 15%, 36%, 10%, 20%, 12%, 7%]
Source: http://www.besttechie.com/2013/03/07/do-people-still-use-siri/
Performance of Siri
Apple claims that in iOS 9, Siri will be up to 40 percent faster and 40 percent more accurate.
What has held it back?
1. There is a learning curve.
2. It is far from perfect.
3. The use cases are limited.
4. Lack of integration with third-party apps.
Speech Recognition Market
Source: Matt M., Joshua S., and David H. 2014. Dynamic Commercialization Strategies for Disruptive Technologies: Evidence from the Speech Recognition Industry
Over the past 50 years, technological breakthroughs have enabled SR to become a reality.
Coupled with advances in CPU power and enhanced software algorithms, SR achieved steep improvement and commercial feasibility after the 1990s.
Current Applications of SR
Applications in various industries
Call Centers
Medical Industries
Education
Automotive
Home Automation
Students with disabilities used an SR-powered Hosted Transcription System (HTS) to convert digitized audio and video into accessible multimedia transcripts.
In 2011, 52% of Canadian disability service providers interviewed reported using speech-to-text supports.
Quality: strengthened by lowering WER
Cost problems:
– Scalability to meet temporal demands
– Fixed cost for infrastructure
SR in Education – Liberated Learning Project (LLR)
Source: http://www.transcribeyourclass.ca/financial.html
IHS Automotive: About 25% of U.S. motorists use speech recognition in their cars daily and 53% use it at least once a week; by 2020, 68 million vehicles worldwide will have voice controls, an increase of 84% from 37 million in 2014.
SR in Automotive
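The 84% figure can be checked against the two vehicle counts quoted above. This is only a quick arithmetic sketch; the function name `percent_increase` is introduced here for illustration and is not from the source.

```python
def percent_increase(old: float, new: float) -> float:
    """Return the percentage increase from old to new."""
    return (new - old) / old * 100.0


# Figures quoted on the slide, in millions of vehicles with voice controls.
vehicles_2014 = 37.0
vehicles_2020 = 68.0

growth = percent_increase(vehicles_2014, vehicles_2020)
print(f"Increase from 2014 to 2020: {growth:.0f}%")  # prints 84%
```

The two counts and the stated growth rate agree after rounding.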
Most SR systems on today’s market support about 50 to 60 voice commands.
Commonly used features: making calls, playing music, temperature control, navigation.
More features available: reminders, sending emails, searching for nearby restaurants/shops/petrol stations, real-time traffic conditions, connecting to other SR control systems (e.g. home automation)…
SR in Automotive
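A fixed command set of this size can be pictured as a lookup from recognized text to an action. The sketch below is an illustration only: the command phrases, the action names, and the `match_command` helper are hypothetical, and fuzzy matching with Python's standard `difflib` stands in for whatever matching a production in-car system actually uses.

```python
import difflib
from typing import Optional

# Hypothetical command set; the slide reports about 50 to 60 in real systems.
COMMANDS = {
    "call home": "phone.call",
    "play music": "media.play",
    "set temperature": "climate.set",
    "navigate home": "nav.route",
}


def match_command(transcript: str, cutoff: float = 0.6) -> Optional[str]:
    """Map a recognized transcript to the closest known command, if any."""
    # Normalize case and whitespace before comparing.
    normalized = " ".join(transcript.lower().split())
    candidates = difflib.get_close_matches(
        normalized, list(COMMANDS), n=1, cutoff=cutoff
    )
    return COMMANDS[candidates[0]] if candidates else None


print(match_command("Play  Music"))  # exact match after normalization
print(match_command("play musik"))   # near miss still resolves
print(match_command("xyzzy"))        # no sufficiently close command
```

Raising `cutoff` makes the matcher stricter, which trades fewer wrong actions for more rejected commands.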
Nuance – Dragon Drive Platform
– Cloud-based voice and content solutions
– Integrated with in-vehicle cloud-based search capabilities from Telenav, a leader in location-based services (Source: Telenav, Nov 3, 2015)
– Attractive features: reads out the daily update when the driver enters the car; connects the home to the car through LG HomeChat software
SR in Automotive
Video: https://www.youtube.com/watch?v=laxXWUxXcWs
Problems encountered with ASR in cars:
– Doesn’t recognize/misinterprets verbal commands (63 percent)
– Doesn’t recognize/misinterprets names/words (44 percent)
– Doesn’t recognize/misinterprets numbers (31 percent)
– Wind noise
– Language accents
– Imperfect speech recognition software might prove to be a distraction
SR in Automotive
SR in Home Automation
Smart home
– Lighting control (Vocca)
– TV (Apple TV)
– Personal Assistant (Echo, Homey)
SR in Home Automation – Apple TV
The Apple TV uses Siri search as the glue that holds all those individual apps together. Voice commands (also found on Roku, Android TV and Amazon Fire TV) are easier than entering names on a virtual keyboard. And despite some rough edges, Siri is more helpful than the rest.
Siri’s advantage is more advanced queries:
– Six degrees of Kevin Bacon
– Filter TV episodes by actors
– Rewind
Siri’s limitations:
– Pronunciation of difficult names
– TV show recognition by genres
Source: http://www.wsj.com/articles/apple-tv-review-a-giant-iphone-for-your-living-room-1446080460
The TV of the future needs to be as powerful and easy to use as an iPhone, and this Apple TV is the first box—and the first Apple TV—to achieve that.
Amazon Echo – limited launch on November 6, 2014; wide release on June 23, 2015
Can answer general questions, reorder the items you buy frequently from Amazon, and play music
SR in Home Automation
Source: http://www.amazon.com/Amazon-SK705DI-Echo/dp/B00X4WHP5E/ref=sr_1_1?ie=UTF8&qid=1446173814&sr=8-1&keywords=amazon+echo
Source: http://www.cnet.com/products/amazon-echo-review/
Apple's HomeKit
– A framework for communicating with and controlling connected accessories in a user’s home, announced at Apple WWDC 2014.
SR in Home Automation
HomeKit-certified devices
– ecobee3: Use sensors and a thermostat to keep tabs on your home’s temp.
– Elgato: A variety of Elgato’s Eve sensors will give you all kinds of information about what’s going on inside your home. (Door & Window, Energy, Weather, Room)
– iHome: Connect ordinary devices into the smart plug, and you can start controlling them with your phone.
– Insteon: The company’s hub can control all its products, including lights and locks, even from outside your home.
– Lutron: Control your lights and shades with its bridges and kits.
– iDevices: Plug anything into the company’s indoor or outdoor switch to make the device smart, and control your climate with the thermostat.
– Schlage: You’ll be able to ask Siri to lock and unlock your door.
– August: The smart lock company announced a doorbell camera and keypad to its lineup, but it’s just the new lock that works with Siri for now.
Coming: Plugs, Thermostats (Honeywell Lyric), Lighting (Philips), Alarm System (Honeywell Lynx Security System)
Partnerships: Chamberlain MyQ Garage, Cree, Friday Smart Lock, GE (color-changing LEDs), Haier (smart air-conditioner), Incipio, Kwikset, Netatmo, Osram Sylvania, Philips Hue, SkyBell, Withings (baby monitors)
Source: http://www.digitaltrends.com/home/a-list-of-apple-homekit-compatible-devices/
Total price: US$2000
SR in Home Automation
Source: http://publications.lib.chalmers.se/records/fulltext/203117/203117.pdf
Most commonly used features
Other features that users would like
There is a user base for SR (doctors, drivers, smartphone users…)
But the fact is that most customers have tried SR only a few times, or use only basic commands when they have to (driving, busy hands, etc.)
Why?
– SR doesn’t recognize complicated commands, which limits the available features
– SR reacts very slowly
– It takes time to train
– Interaction with SR is not natural; words must be clear and without emotion
– A bad first impression leaves no interest in trying again, even though SR is improving
Summary of Challenges in SR
Customers don’t think that using SR is necessary in their daily life!
Dimension: Speed
Requirement: Process the algorithms
Component: Processor
Underlying Technology of Speech Recognition
Source: http://web.sfc.keio.ac.jp/~rdv/keio/sfc/teaching/architecture/architecture-2008/lec07-cache.html
Dimension: Accuracy
– Requirement: Quality of signal received. Achievements: background noise elimination, channel effect elimination
– Requirement: Acoustic scoring. Achievements: deep learning, acoustic database
– Requirement: Language matching. Achievements: modelling, language database
Underlying Technology of Speech Recognition
Component: Microphone
Component: Memory
• Speech recognition needs support from a database, which can be local or in the cloud.
• Memory performance lags far behind processor performance, so the bottleneck of an SR system is memory speed (or network speed when the database is in the cloud).
Source: http://web.sfc.keio.ac.jp/~rdv/keio/sfc/teaching/architecture/architecture-2008/lec07-cache.html
Component: Algorithms
Noise Elimination Algorithm Performance
• Noise has two main effects on the speech representation: distortion in the representation space and a loss of information.
• Studies show that noise compensation methods help improve accuracy at different SNR (signal-to-noise ratio) levels and distances.
Source: Angel de la T. et al. Speech Recognition Under Noise Conditions: Compensation Methods
Source: Pedro J. Moreno, 1996, Speech Recognition in Noisy Environments
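For readers unfamiliar with SNR, the sketch below computes it in decibels from raw samples using the standard definition, 10·log10(signal power / noise power). The sample values and the helper names `mean_power` and `snr_db` are made up for illustration; the cited studies work with real recordings and their own measurement setups.

```python
import math
from typing import Sequence


def mean_power(samples: Sequence[float]) -> float:
    """Average power of a sampled waveform (mean of squared amplitudes)."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(signal: Sequence[float], noise: Sequence[float]) -> float:
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))


# Toy waveforms: the noise amplitude is one tenth of the speech amplitude.
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.1, -0.1, 0.1, -0.1]

print(f"SNR: {snr_db(speech, noise):.1f} dB")  # about 20 dB
```

A tenfold amplitude ratio is a hundredfold power ratio, which is why the toy example lands at roughly 20 dB.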
Speakers may have different accents, dialects, or pronunciations, and speak in different styles, at different rates, and in different emotional states.
Deep learning, introduced in 2006, attempts to learn multiple levels of representation of increasing complexity/abstraction.
A new architecture, the deep belief network (DBN)-HMM, was developed in 2012.
Deep Learning
The idea dates from the 1970s, but progress was very slow because of computational and data limitations
Deep learning - one step closer to artificial intelligence
Deep Learning
More data, faster hardware
Word error rate (WER) for SR technology in automotive has been reduced to below 1%
Accuracy of SR
Source: http://whatsnext.nuance.com/in-the-labs/deep-learning-in-connected-cars/
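WER is conventionally computed as the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below follows that standard definition; the example sentences and the function name `word_error_rate` are illustrative and not taken from the Nuance source.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing in output
            insertion = curr[j - 1] + 1   # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


reference = "set the temperature to twenty degrees"
hypothesis = "set temperature to twenty degree"
# One deletion ("the") and one substitution ("degrees" -> "degree"): 2 / 6.
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")
```

Because insertions count as errors, WER can exceed 100% when the output is much longer than the reference.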
Overall WER improvement for SR
Accuracy of SR
Source: http://whatsnext.nuance.com/in-the-labs/what-is-deep-machine-learning/
Accuracy of SR
According to Baidu, its error rates using GPUs were 6.56% in a clean environment and 19.06% in noisy environments
Apple claims that Siri in iOS 9 has only a 5% word error rate
Siri in iOS 9 asks you to teach it your voice whenever you switch to a new language
Source: NVIDIA GTC: The Race To Perfect Voice Recognition Using GPUs
TARGET: < 0.1% or even 0%
How will SR improve further?
Customers don’t think that using SR is necessary in their daily life!
BUT IF SR is faster and smarter at understanding commands, with more features available,
Customers might start thinking: Why not try SR?
For example: the ability to recognize multilingual content, direct links to third-party apps, allowing multiple users to interact at the same time…
So, when will SR like Siri be widely used by customers?
2020 to 2025
– Improvements in deep learning (Apple acquired VocalIQ in Oct 2015) for more intelligent algorithms
– Improvements in big data, with multiple channels to enhance the databases used in modeling for higher accuracy
– Improvements in mobile networks, giving faster response for a better customer experience
– With the diffusion of smart devices and apps, new customers will have more chances to adopt SR before old habits form
– Potential new standard for the human-machine interface
– Cost will be reduced further as core components improve
How will SR improve further?
Speech Recognition: Future Market Trend
Voice will be the most important area for growth in mobile user interfaces.
Tractica forecasts that SR will reach $5.1 billion by 2024, at a CAGR of 40%.
Strongest market: the consumer-facing market, for mobile device authentication and control of wearable devices.
Speech Recognition: Future Market Trend
SR market in automotive
– The report "Global Automotive Voice Recognition Market 2014-2018" forecasts the automotive voice recognition sector to grow at a 10.59% CAGR to 2018
Market for home automation
– Annual growth rate can reach 67% over the next 5 years
– Revenue of $61 billion with a 52% compound annual growth rate; the value is forecast to reach $490 billion in 2019
Speech Recognition: Future Market Trend
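The home automation figures can be sanity-checked with the compound growth formula, value × (1 + rate)^years. The sketch assumes that $61 billion is the starting revenue and that the 52% rate compounds for five years; the slide does not state the base year, so both are assumptions made here for illustration, as is the helper name `compound`.

```python
def compound(value: float, annual_rate: float, years: int) -> float:
    """Project a value forward at a constant compound annual growth rate."""
    return value * (1.0 + annual_rate) ** years


# Assumed inputs: $61 billion starting revenue, 52% CAGR, five years.
projected = compound(61.0, 0.52, 5)
print(f"Projected revenue: ${projected:.0f} billion")  # about $495 billion
```

Under those assumptions the projection comes out close to the quoted $490 billion, so the two home automation revenue figures are at least mutually consistent.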
SR in Avionics - Head-in and Head-out in cockpit
Multi-function displays with menu structures many tiers deep
The pilot needs one hand on the collective and the other on the joystick
SR in Avionics
Speech recognition reduces workload and frees the pilot’s hands.
With more head-up time, the pilot can focus on flying the aircraft and responding to the outside environment.
Noise elimination and integration with onboard system
Source: http://www.speech.sri.com/press/airforce-print-news-oct15-2007.pdf
Source: http://www.gizmag.com/go/7484/
Navigation Functions
• Entering waypoints and inputting FMS data
• Reduce confusion
Communication Functions
• Change frequencies of channel by voice control
• Query system by “asking”
Checklist
• Task list
• Avionics monitor
Safety and security are roadblocks for SR adoption in avionics
Entry level functions with low safety concerns
SR in Avionics
SR Deployment in Avionics
Timeline: 2000 Typhoon; 2007 Gazelle; 2008 F-35 & F-22; 2014 Sferion Assistance System; 2015 Pro Line Fusion flight deck
Milestones noted on the timeline: direct input voice system; speaker-independent system; start in civil avionics
It is not a technology problem, but more of an acceptance problem.
Air transport will accept SR after a product actually comes out and proves its value
SR Commercialization in Avionics
"We've hit our sweet spot finally and its gotten to the point where its getting very, very close to being product ready in terms of being mature enough to get out there."
- Geoff Shapiro from Rockwell Collins
Source: http://www.aviationtoday.com/av/topstories/Rockwell-Collins-Rapidly-Advancing-Cockpit-Voice-Recognition-Technology_83515.html#.Vjm710b0wTY
SR in Field Automation
Equipment inspection in the field using portable devices with embedded speech recognition systems
Enter data faster and reduce the cost
Source: https://www.earthworksaction.org/issues/detail/oil_and_gas_noise#.Vh_SNN-qpBc
Source: http://www.ehjournal.net/content/14/1/18
SR in Field Automation
The noise level is very high, so noise elimination is more challenging
A robot designed for dedicated functions can only receive pre-defined instructions
Low requirements for noise elimination, processing, and memory
SR in Personal Robot for Family
Artificial Intelligence – Key technology for future improvement of SR
We should “talk” rather than type
Artificial Intelligence should be deployable in any complex environment, with the capacity to understand instructions
High requirements for noise elimination, processing, and memory
SR in the future – everywhere in your life
Driving in the car, shopping in the mall, eating in the canteen
Q&A