
AWS re:Invent 2016: Deep Learning in Alexa (MAC202)


Page 1: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Nikko Strom, Sr. Principal Scientist

Arpit Gupta, Scientist

November 30, 2016

Deep Learning in Alexa (MAC202)

Page 2: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Outline

• History of Deep Learning

• Deep Learning in Alexa

• The Alexa Skills Kit

Page 3: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

History of Deep Learning

Timeline (1986–2016): intense academic activity, the “neural winter,” then the “GPU era.”

1986: Hinton, Rumelhart, and Williams invent backpropagation training.

2014: Amazon Echo launches!

Page 4: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Multilayer perceptron

An input vector x flows through the “input layer,” two “hidden layers,” and the “output layer” to produce the output y:

h1 = sigmoid(A1x + b1)

h2 = sigmoid(A2h1 + b2)

y = sigmoid(Aoh2 + bo)
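The forward pass above can be sketched directly in NumPy. The layer sizes below (4 inputs, two hidden layers of 8 units, 3 outputs) are illustrative assumptions, not values from the talk:

```python
import numpy as np

def sigmoid(z):
    # Elementwise logistic function; squashes values into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Illustrative sizes only: 4 inputs, two hidden layers of 8 units, 3 outputs
A1, b1 = rng.standard_normal((8, 4)), np.zeros(8)
A2, b2 = rng.standard_normal((8, 8)), np.zeros(8)
Ao, bo = rng.standard_normal((3, 8)), np.zeros(3)

x = rng.standard_normal(4)      # input vector
h1 = sigmoid(A1 @ x + b1)       # first hidden layer
h2 = sigmoid(A2 @ h1 + b2)      # second hidden layer
y = sigmoid(Ao @ h2 + bo)       # output layer
```

In practice the weights A1, A2, Ao and biases b1, b2, bo are learned by backpropagation rather than drawn at random.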

Page 5: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Deep Learning milestones

From 1986 to 2016, with the “neural winter” in between:

• 1986: Hinton, Rumelhart, and Williams publish backpropagation training.

• 1997: Hochreiter and Schmidhuber invent the LSTM for recurrent networks with long memory.

• 1998: LeCun, Bottou, Bengio, and Haffner publish CNNs for computer vision.

• Salakhutdinov and Hinton discover a method to train very deep neural networks.

• 2009: Mohamed, Dahl, and Hinton beat a well-known speech recognition benchmark (TIMIT).

• 2011: Microsoft and Google demonstrate breakthrough results on large-vocabulary speech recognition.

• 2012: Krizhevsky, Sutskever, and Hinton win the ImageNet object recognition challenge.

• 2016: AlphaGo beats a Go world champion.

Page 6: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Deep Learning in Speech Recognition

Milestones before the “neural winter”:

• ’89: Waibel, Hanazawa, Hinton, Shikano, and Lang publish the time-delay neural network (TDNN).

• ’91: Robinson demonstrates RNNs for ASR and gets the best result on TIMIT so far.

• ’92: Bourlard, Morgan, Wooters, and Renals introduce context-dependent MLP models.

• ’96: Strom combines time-delay NNs and RNNs (RTDNN), and introduces speaker vectors for speaker adaptation.

And after it: in 2009 Mohamed, Dahl, and Hinton beat the well-known TIMIT benchmark; in 2011 Microsoft and Google demonstrate breakthrough results on large-vocabulary speech recognition.

Page 7: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Impact of data corpus size

16 years = 16 × 365 × 24 = 140,160 hours; the corpus shown is ≈14,016 hours of speech (one tenth of that).

Page 8: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Impact of data corpus size (timeline 1986–2016)

Page 9: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Impact of compute capacity

Peak compute along the timeline (1986–2016):

• Cray X-MP/48 (1986): 1 GFLOPS

• Sun Ultra 60 (1998): 1 GFLOPS

• ASCI Red: 1 TFLOPS

• 8800 GTX (2007): 350 GFLOPS

• cg1.4xlarge: 1 TFLOPS

• Roadrunner: 1 PFLOPS

• p2.16xlarge (2016): 23 TFLOPS double precision (70 TFLOPS single)

• Sunway TaihuLight: 100 PFLOPS
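The scale of the jump is easier to appreciate as ratios; a quick check on the figures above:

```python
# Unit multipliers
GFLOPS = 1e9
TFLOPS = 1e12
PFLOPS = 1e15

cray_1986 = 1 * GFLOPS      # Cray X-MP/48, 1986
p2_2016 = 23 * TFLOPS       # p2.16xlarge, 2016 (double precision)
taihu = 100 * PFLOPS        # Sunway TaihuLight

# One 2016 EC2 instance vs. a 1986 supercomputer
print(p2_2016 / cray_1986)  # 23000.0
# The 2016 supercomputer high end vs. the 1986 one
print(taihu / cray_1986)    # 100000000.0
```

A single GPU instance in 2016 delivers roughly 23,000 times the peak FLOPS of the fastest 1986 supercomputer.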

Page 10: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Impact of compute infrastructure

Timeline (1986–2016): the reign of EM, then distributed SGD (ca. 2012 onward; Strom 2015, Dean et al.)

• During the “neural winter,” EM became a dominant distributed computing paradigm for machine learning (ML)

• ML algorithms that use the EM algorithm benefited greatly

• Distributed SGD broke deep learning out of the single box

Page 11: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Conclusion – how we got here

We are in a period of massive Deep Learning adoption because:

• Theory and algorithm design in the 80s and 90s

• Orders of magnitude more data available

• Orders of magnitude more computational capacity

• A few algorithmic inventions enabled deep networks

• The rise of distributed SGD training

Page 12: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Deep Learning in Alexa

Page 13: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Large-scale distributed training

Up to 80 EC2 g2.2xlarge GPU instances working in sync to train a model, on thousands of hours of speech training data stored in Amazon S3.

Page 14: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Large-scale distributed training

• GPUs compute model updates fast – think updates per second

• All nodes must communicate updates to the model to all other nodes

• A model update is hundreds of MB
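That bandwidth problem motivates compressing updates before exchanging them. Below is a minimal sketch in the spirit of the threshold-quantization idea from Strom (2015), cited on the next slide: only gradient elements whose accumulated magnitude exceeds a threshold travel over the network, and the remainder is carried locally to the next step. The threshold value and the residual bookkeeping here are illustrative assumptions, not the production implementation:

```python
import numpy as np

def compress(grad, residual, tau=0.5):
    """Select only elements whose accumulated magnitude exceeds tau.

    Returns (indices, values) to transmit, plus the updated residual
    that carries the untransmitted remainder to the next step.
    """
    acc = grad + residual                # add leftover from previous steps
    idx = np.nonzero(np.abs(acc) >= tau)[0]
    vals = np.sign(acc[idx]) * tau       # quantize transmitted values to +/- tau
    new_residual = acc.copy()
    new_residual[idx] -= vals            # keep the untransmitted part locally
    return idx, vals, new_residual

grad = np.array([0.9, 0.1, -0.6, 0.3])
idx, vals, res = compress(grad, np.zeros(4))
# Only indices 0 and 2 are sent; the rest accumulates in the residual.
```

Instead of hundreds of MB per update, each worker sends only a sparse list of (index, sign) pairs.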

Page 15: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

DNN training speed: frames per second (up to ~600,000) versus number of GPU workers (up to 80).

Strom, Nikko. "Scalable Distributed DNN Training using Commodity GPU Cloud Computing." INTERSPEECH. Vol. 7. 2015.

Page 16: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Speech Recognition

Page 17: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Speech recognition

Sound → Signal processing → Feature vectors [4.7, 2.3, -1.4, …] → Acoustic model → Phonetic probabilities [0.1, 0.1, 0.4, …] → Decoder (inference) → Words “increase to 70 degrees” → Post processing → Text “Increase to 70⁰”
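The pipeline above can be sketched as a chain of functions. Every body below is a stand-in returning the slide’s example values; the real signal processing, acoustic model, and decoder are of course far more involved:

```python
# Hypothetical stand-ins for each stage of the recognition pipeline.

def signal_processing(audio):
    # Real systems compute e.g. log-mel filterbank features per ~10 ms frame
    return [[4.7, 2.3, -1.4]]          # feature vectors

def acoustic_model(features):
    # A DNN maps each feature vector to a distribution over phonetic units
    return [[0.1, 0.1, 0.4, 0.4]]      # phonetic probabilities

def decoder(phone_probs):
    # Searches for the most likely word sequence given the probabilities
    return "increase to 70 degrees"

def post_processing(words):
    # Formats the raw word string for display
    return "Increase to 70°"

def recognize(audio):
    return post_processing(decoder(acoustic_model(signal_processing(audio))))
```

Composing the stages this way mirrors the diagram: each arrow is a function call passing its output to the next stage.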

Page 18: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Transfer learning from English to German

Hidden layer 1 → Hidden layer 2 → … → Last hidden layer → Output layer

The hidden layers are shared; only the output layer differs between the two phoneme inventories (æ aɪ ɑ ɜ ʊ … e vs. æ aɪ ɑ ɜ u: … œ).
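A sketch of the idea: reuse the trained hidden layers and replace only the output layer for the new language’s phoneme set. The layer sizes and phoneme counts below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# English model: shared hidden layers plus an English-phoneme output layer.
# Sizes (three 64-unit layers, ~40 phonemes) are illustrative only.
english_model = {
    "hidden": [rng.standard_normal((64, 64)) for _ in range(3)],
    "output": rng.standard_normal((40, 64)),
}

def transfer_to_german(model, n_german_phonemes=45):
    """Reuse the hidden layers; re-initialize only the output layer."""
    return {
        "hidden": model["hidden"],   # transferred as-is, then fine-tuned
        "output": rng.standard_normal((n_german_phonemes, 64)) * 0.01,
    }

german_model = transfer_to_german(english_model)
```

The shared hidden layers already encode general acoustic structure, so the German model needs far less German training data than training from scratch would.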

Page 19: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Natural Language

Understanding

Page 20: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Intent and entities

“play two steps behind by def leppard”

Intent: PlayMusic

Entities: Song (“two steps behind”), Artist (“def leppard”)

Two problems:

1. Words are symbols – not vectors of numbers

2. Requests are of different lengths

Page 21: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)
Page 22: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Recurrent Neural Networks

A recurrent network reads “play two steps behind by def leppard” word by word and outputs the intent: PlayMusic.
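A toy version of this addresses both problems from the previous slide: an embedding table turns word symbols into vectors, and the recurrence folds a request of any length into one fixed-size state. All sizes, weights, and the three intent classes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Problem 1: map word symbols to vectors via an embedding table
vocab = {"play": 0, "two": 1, "steps": 2, "behind": 3,
         "by": 4, "def": 5, "leppard": 6}
E = rng.standard_normal((len(vocab), 16))       # one 16-dim vector per word

# Problem 2: a recurrent network folds any-length input into one state
Wh = rng.standard_normal((16, 16)) * 0.1        # state-to-state weights
Wx = rng.standard_normal((16, 16)) * 0.1        # input-to-state weights
Wo = rng.standard_normal((3, 16))               # 3 illustrative intent classes

def classify(words):
    h = np.zeros(16)
    for w in words:                              # one recurrence step per word
        h = np.tanh(Wh @ h + Wx @ E[vocab[w]])
    logits = Wo @ h
    return np.exp(logits) / np.exp(logits).sum() # softmax over intents

p = classify("play two steps behind by def leppard".split())
```

The output p always has one probability per intent, no matter how many words the request contains.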

Page 23: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Speech

synthesis

Page 24: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Speech synthesis

Text → Text normalization → Grapheme-to-phoneme conversion → Waveform generation → Speech

“She has 20$ in her pocket.”

→ “she has twenty dollars in her pocket”

→ ˈ ʃ i ˈ h æ z ˈ t w ɛ n . t i ˈ d ɑ . ɫ ə ɹ z ˈ ɪ n ˈ h ɝ ɹ ˈ p ɑ . k ə t
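The first stage can be illustrated with a toy normalizer that handles just the one pattern in the example; a production text-normalization system covers numbers, dates, abbreviations, units, and much more:

```python
import re

# Illustrative lookup only; real systems spell out arbitrary numbers
NUMBER_WORDS = {"20": "twenty"}

def normalize(text):
    # Rewrite "<digits>$" as "<number word> dollars"
    text = re.sub(
        r"(\d+)\$",
        lambda m: NUMBER_WORDS.get(m.group(1), m.group(1)) + " dollars",
        text,
    )
    # Lowercase and drop the trailing period, matching the slide's output
    return text.lower().rstrip(".")
```

normalize("She has 20$ in her pocket.") reproduces the slide’s normalized form, ready for grapheme-to-phoneme conversion.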

Page 25: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Concatenative synthesis

Input phonemes → Di-phone unit selection (backed by a di-phone segment database) → Speech

ˈ ʃ i ˈ h æ z ˈ t w ɛ n . t i ˈ d ɑ . ɫ ə ɹ z ˈ ɪ n ˈ h ɝ ɹ ˈ p ɑ . k ə t

Page 26: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Prosody for natural sounding reading

A bi-directional recurrent network predicts pitch, duration, and intensity targets for each segment from:

• Phonetic features

• Linguistic features

• Semantic word vectors

Page 27: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Long-form example

“Over a lunch of diet cokes and lobster salad one balmy fall day in Boston, Joseph Martin, the genial, white-haired, former dean of Harvard medical school, told me how many hours of pain education Harvard med students get during four years of medical school.”

[Before / After audio samples]

Page 28: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

The Alexa Skills Kit

Page 29: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

The Alexa Skills Kit

“Alexa!”

Customers ↔ Alexa ↔ Developers

Page 30: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Growth of Published Skills

Count of published skills, March–September 2016 (axis 0–4,000), showing rapid growth.

Page 31: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Alexa Skills: Examples

Business: Uber, Dominos, Fidelity, Capital One, Home Advisor, 1-800 Flowers

Info: Washington Post, Campbell’s Kitchen, Boston Children’s Hospital, Stocks, Bitcoin Price, History Buff, Savvy Consumer

Fitness: Fitbit, 7-Minute Workout

Automation: Nest, Garageio, Alarm.com, Scout Alarm

Misc: Quick Events, Phone Finder, Cat Facts, Famous Quotes

Games: Jeopardy!, Minesweeper, Word Master, Blackjack, Math Puzzles, Guess Number, Spelling Bee

Page 32: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

ASK for Developers

“Alexa!”

Customers ↔ Alexa ↔ Developers

Page 33: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

ASK for Developers

• Define a Voice User Interface

• Provide a finite number of sample utterances

• ASK automatically builds and deploys machine learning models

Page 34: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Developer Input

Page 35: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Model Build Workflow

The developer creates or edits a skill on the Developer Portal website. The Skill Model Builder reads skill.json, writes the skill definition to the Data Store, and builds and uploads the skill models to the runtime Cloud Store.

Page 36: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Model Building

From the developer input we build two models: finite-state transducers (FSTs) for exact matches, and machine learning models (an ML entity recognizer and an ML intent recognizer) for fuzzy matches.
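The two-model split can be sketched as an exact matcher tried first, with a fuzzy fallback for everything else. The word-overlap scoring below is a naive stand-in for the real ML recognizers, and the intent name and sample utterances are taken from the next slide’s example:

```python
# Exact matches, analogous to the FST path
EXACT = {
    "get me a car": "GetCar",
}

# Developer-provided sample utterances, analogous to ML training data
SAMPLES = [
    ("get a car to <Destination>", "GetCar"),
]

def understand(utterance):
    if utterance in EXACT:                   # exact match: fast, unambiguous
        return EXACT[utterance]
    # Fuzzy fallback stand-in: pick the sample sharing the most words.
    # A real system uses trained entity and intent recognizers here.
    words = set(utterance.split())
    best = max(SAMPLES, key=lambda s: len(words & set(s[0].split())))
    return best[1]
```

An utterance the developer wrote verbatim takes the exact path; a paraphrase like “hey uhm i need a car to starbucks” falls through to the fuzzy path.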

Page 37: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

ASK Machine Learning

TRAIN: developers provide a finite number of sample utterances, e.g. “get a car to <Destination>” and “get me a car”, which train the ASK machine learning model.

MATCH: at runtime, customers produce an infinite number of possible utterances, e.g. “hey uhm i need a car to starbucks”, which the model matches to the right intent and slots.

Page 38: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

ASK Machine Learning (contd.)

• Neural Networks (NNs)

• Transfer learning: use knowledge learned from large related training data. Example: we’ve seen slots like <Destination> before (“get a car to <Destination>”, “get me a car”), so there is no need to learn from scratch.

Page 39: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

How to Write Great Skills

Slots• Catalogs: Provide as many values as possible.

Add representative values of different lengths where

appropriate

• Use built-in slots where possible

(e.g., cities, states, first names)

• Do not use too many slots in one utterance

(rather ask for missing slots in a dialog)

• Use context around each slot

Page 40: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

How to Write Great Skills: Intents

• Split heterogeneous intents

• Use built-in intents where possible

• Provide as many carrier phrases as possible

• Use a thesaurus or paraphrasing tools; ask your friends or Mechanical Turk workers for utterances

Page 41: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Conclusions

• ASK connects developers to customers

• Developers constantly extend Alexa’s capabilities

• We constantly get more data and improve the experience via machine learning

• This makes Alexa more intelligent and powerful, bridging the gap between human and machine

Page 42: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Thank you!

Page 43: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Remember to complete your evaluations!

Page 44: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Related Sessions

Page 45: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)
Page 46: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Images used

Glove vectors. Produced internally.

Page 47: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Images used

Macaw. Public domain. https://pixabay.com/en/macaw-bird-beak-parrot-650638/

VW. Free for editorial use. http://media.vw.com/images/category/11/

Page 48: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Images used

ASCI Red. Public domain. https://commons.wikimedia.org/wiki/File:Asci_red_-_tflop4m.jpeg

8800 GTX. Permission by email by Tri Hyunth at Nvidia.

Page 49: AWS re:Invent 2016: Deep Learning in Alexa (MAC202)

Images used

https://commons.wikimedia.org/wiki/File:President_Ronald_Reagan_addresses_Congress_in_1981.jpg

https://commons.wikimedia.org/wiki/File:President_George_W._Bush_(8003096992).jpg

https://commons.wikimedia.org/wiki/File:President_Obama_interview_January_27,_2009.jpg

https://commons.wikimedia.org/wiki/File:US_Navy_020828-N-1058W-025_Former_U.S._President_George_H._W._Bush_congratulates_Sailor_aboard_USS_Harry_S._Truman_(CVN_75).jpg

https://commons.wikimedia.org/wiki/File:President_Clinton_speaks_on_tax_cut_deal.jpg