Nokia @ Skunkworks
Nokia Research Center, Cambridge, US and Nairobi, Kenya

Jonathan Ledlie, Billy Odero, Brian Omwenga, Einat Minkov, Imre Kiss, Joseph Polifroni, Jay Chen, Pauline Githinji, Mokeira Masita-Mwangi, Lucy Macharia, Jussi Impio
February 2010

© 2010 Nokia 2010-Feb-9 / JTL

Outline
Kenya Projects
• Tafsiri (aka Crowd Translator)
• Tangaza (“announce”)
• SMS Find
  ‣ Next talk: Jay Chen

Other Projects
• Indoor Localization/Positioning (w/MIT)
• Internet Overlay Routing (w/Imperial College London)

Other Nokia Projects

Interrupt! (please)

Tafsiri (1/2)
Problem: have speech-enabled service in English
• e.g. on-device or server-side mobile money transfer like MPESA
• How do you translate (“localise”) it for e.g. Luo, or even Swahili
  ‣ Can’t purchase :-(

Our Approach: Tafsiri (aka Crowd Translator)
• Cheaply create speech recognizer for local, low-corpus languages
• Like Nathan Eagle’s txteagle, Amazon’s Mechanical Turk
• Programmer standpoint
  ‣ Like a “localisation” file

Tafsiri (2/2)
Like Mechanical Turk: pay users for validated work (i.e. speech contributions)

[Figure: Mobile Operator and Worker; timeline of one worker session from 0 min to 30 min, a sequence of prompts including repeats (A and A’, D and D’), with similarity scores 0.86, 0.93, ...]

1. User flashes CX
  • Gets callback
2. Selects his native language
3. Mimics voice prompts
  • “new transaction”
4. Automatic verification
5. Payment

Users are not translating; Tafsiri translates (“localises”) speech service

Tangaza (1/2)
Tangaza (“announce” in Swahili)
• Send voice messages to friends, family, and groups
  ‣ e.g., Nairobi taxi drivers, tomato farmers in Uganda
• “Twitter” (social net, status updates) for users with basic phones

Problem
• Create a useable group speech and text messaging service
• Be nice if everyone would just upgrade to my S40 application
• Instead make it reasonable on all GSM phones

How it works
• Text messages for creating and manipulating groups
  ‣ “create group slot”
  ‣ create #skunkworks 1 or join #skunkworks 1
• Call in to create spoken messages
  ‣ Skunkworks spoken messages now linked with slot 1
  ‣ Also listen and forward spoken messages
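
The text commands above could be parsed along these lines. This is a minimal sketch, not Tangaza's actual implementation: the `parse_command` function, the `GroupCommand` type, and the exact grammar (verb, `#group`, slot number) are assumptions inferred from the two example messages on the slide.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical grammar inferred from the slide's examples:
#   create #skunkworks 1
#   join #skunkworks 1
_COMMAND_RE = re.compile(r"^\s*(create|join)\s+#(\w+)\s+(\d+)\s*$", re.IGNORECASE)


@dataclass(frozen=True)
class GroupCommand:
    action: str  # "create" or "join"
    group: str   # group name without the leading '#'
    slot: int    # slot number the group is linked to


def parse_command(sms_text: str) -> Optional[GroupCommand]:
    """Parse one SMS body; return None if it is not a recognized command."""
    match = _COMMAND_RE.match(sms_text)
    if match is None:
        return None
    action, group, slot = match.groups()
    # Normalize case so "JOIN #Skunkworks 1" and "join #skunkworks 1" agree.
    return GroupCommand(action.lower(), group.lower(), int(slot))
```

Returning `None` for anything that does not match keeps the decision about how to reply to malformed messages outside the parser.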

Tangaza (2/2)
Pilots
• Strathmore
  ‣ They use Facebook
  ‣ Since November
• Huruma
  ‣ They don’t
  ‣ Much more Ksh conscious
  ‣ Fewer snazzy phones
  ‣ Starting this week
  ‣ Nokia subsidizing, doesn’t scale

Eventual Goals...?
• Link with on-phone premium version, free minutes
• 98% without new device could use Tangaza vanilla

Indoor Localization / Positioning (1/2)
Main idea: Build mapping from RF to location

Two methods:
• Both take ambient RF sources
  ‣ wi-fi, cell towers
  ‣ use GPS hints if available

Method #1
• Survey: DB of guesses of RF source x,y(,z)
• Use: Triangulate observed RF sources
• Works outdoors, more scalable than #2
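
One simple way to realise the "Use" step of Method #1 is a signal-weighted centroid over the surveyed source positions. This is a sketch under assumptions, not the authors' algorithm: the slide says only "triangulate", and the `SURVEY_DB` contents, the RSSI readings in dBm, and the weighting rule are all illustrative.

```python
from typing import Dict, Tuple

# Hypothetical survey database: RF source ID -> estimated (x, y) in metres.
# The IDs echo the slide's figure; the coordinates are made up.
SURVEY_DB: Dict[str, Tuple[float, float]] = {
    "0xa3b": (0.0, 0.0),
    "0x6d2": (10.0, 0.0),
    "0xbc4": (0.0, 10.0),
    "0x5fe": (10.0, 10.0),
}


def estimate_position(observed_rssi_dbm: Dict[str, float]) -> Tuple[float, float]:
    """Weighted centroid of known sources; stronger signals get more weight."""
    total_w = x_acc = y_acc = 0.0
    for source_id, rssi in observed_rssi_dbm.items():
        if source_id not in SURVEY_DB:
            continue  # source has not been surveyed, so it cannot help
        weight = 10 ** (rssi / 10.0)  # dBm -> linear power in mW
        x, y = SURVEY_DB[source_id]
        x_acc += weight * x
        y_acc += weight * y
        total_w += weight
    if total_w == 0.0:
        raise ValueError("no observed source is in the survey database")
    return x_acc / total_w, y_acc / total_w
```

With equal readings from all four sources the estimate falls at the centre of the square they form; a much stronger reading from one source pulls the estimate toward it.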

Method #2
• Scrap method #1 b/c poor accuracy indoors due to multipath
• Survey: map RF signatures (aka fingerprints) to spaces
• Use: lookup closest signature
  ‣ We assume the granularity of a space is a medium-sized room
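
The "lookup closest signature" step of Method #2 can be sketched as a nearest-neighbour search over stored fingerprints. The space names, RSSI values, Euclidean distance, and the -100 dBm placeholder for unheard sources are all assumptions for illustration; the slide does not specify a distance measure.

```python
import math
from typing import Dict

Fingerprint = Dict[str, float]  # RF source ID -> RSSI in dBm

MISSING_DBM = -100.0  # assumed reading for a source not heard in a scan

# Hypothetical survey: one fingerprint per room-sized space.
FINGERPRINTS: Dict[str, Fingerprint] = {
    "room-101": {"0xa3b": -45.0, "0x6d2": -70.0},
    "room-102": {"0x6d2": -50.0, "0xbc4": -65.0},
    "lobby": {"0xbc4": -55.0, "0x5fe": -48.0},
}


def distance(a: Fingerprint, b: Fingerprint) -> float:
    """Euclidean distance in signal space over the union of source IDs."""
    ids = set(a) | set(b)
    return math.sqrt(sum(
        (a.get(i, MISSING_DBM) - b.get(i, MISSING_DBM)) ** 2 for i in ids
    ))


def locate(scan: Fingerprint) -> str:
    """Return the surveyed space whose fingerprint is closest to the scan."""
    return min(FINGERPRINTS, key=lambda space: distance(scan, FINGERPRINTS[space]))
```

Because the answer is a named space rather than a coordinate, the result matches the room-level granularity the slide assumes.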

[Figure: RF sources labelled 0xa3b, 0x6d2, 0xbc4, 0x5fe]

Indoor Localization/Positioning (2/2)
Problem with both methods
• Expert surveying is costly and must be repeated

Our Approach: Users become Surveyors
• Organic collection and update

Challenges
• Erroneous data
• Malicious data
• Knowing what you don’t know
• Motivation?
• Applications?

Other Nokia Projects
Nokia Money
• Currently piloting in a state in India
• Client like MPESA; outlets are Nokia shops; real bank
• Interesting w.r.t. operator, bank, regulatory/gov

“Discover Maemo” in Kenya
• High-end mobile programming: N900
• ~July; two days; part of MIT AITI @ local univ
• Details from King’ori in May

$1m App Competition
• Requirement: helps people living on less than $5 a day
• Deadline: April 18
• callingallinnovators.com

The End

Thanks!
Jonathan Ledlie

jonathan.ledlie@nokia.com

Extra Slides

Our Approach
Subscripts: g = gold standard recording; 1, 2 = users; ′ = repeat of a prompt in the same session

(a) Make Canonical Recordings
  English   Swahili   Gold Std. Utterance
  car       gari      “gari”
  boat      mashua    “mashua”
  plane     ndege     “ndege”
  ...       ...       “...”

(b) Gather User Input
  Prompt     User1 Utterance
  gari_g     gari_1
  ndege_g    ndege_1
  mashua_g   mashua_1
  ..._g      ..._1
  gari_g     gari_1′
  ..._g      ..._1
  mashua_g   mashua_1′

(c) Verify Input
  Intra-session Agreement?
  gari_1 ≈ gari_1′
  mashua_1 ≈ mashua_1′

(d) Expand Corpus
  Word   Utterance
  car    gari_g
  car    gari_1      (added)
  car    gari_1′     (added)
  car    gari_2      (added)
  ...    ...
  boat   mashua_g
  boat   mashua_1    (added)
  ...    ...

How is a speech recognizer built?

Type 1: PHONEME
1. Expert creates dictionary
   bangers   b-ae-N-s@r-z
   batter    b-ae-t-s@r
   ...       ...
2. Collect corpus from native speakers
3. Build phoneme matcher
   b (0.8), bH (0.2)
   ae (0.98), ai (0.02)
Output: bangers (98%), batter (1%), ...
- Expensive (>$10m/language)
+ Grammar expandable
+ Memory: |phonemes|

Type 2: PHRASE
1. Determine target phrases (no expert)
   “new transaction”, “cancel transaction”, “cancel”
2. Collect corpus from native speakers
3. Build phrase matcher
   “cancel transaction” (.99), “cancel” (.01)
Output: cancel transaction (99%), cancel (1%)
+ Cheap ($10k/language)
- Corpus not expandable without more collection
~ Memory: |vocabulary|
Good enough for C&C on devices w/small vocab
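
A Type 2 (phrase) matcher can be sketched as nearest-template classification over the collected corpus. This is an illustration under assumptions, not the recognizer described in the talk: utterances are represented as made-up fixed-length feature vectors, and the `match_phrase` function, the Euclidean distance, and the softmax scoring with a `sharpness` parameter are hypothetical choices.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical corpus: phrase -> feature vectors of collected utterances.
# Real features would be derived from audio; these numbers are made up.
CORPUS: Dict[str, List[Sequence[float]]] = {
    "new transaction": [(0.9, 0.1, 0.2), (0.8, 0.2, 0.1)],
    "cancel transaction": [(0.1, 0.9, 0.8), (0.2, 0.8, 0.9)],
    "cancel": [(0.1, 0.9, 0.1), (0.2, 0.8, 0.2)],
}


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_phrase(utterance: Sequence[float], sharpness: float = 10.0) -> Dict[str, float]:
    """Score every phrase in the vocabulary; the scores sum to 1."""
    # Distance from the input to the closest stored utterance of each phrase.
    nearest = {p: min(_dist(utterance, u) for u in utts) for p, utts in CORPUS.items()}
    # Softmax over negative distances turns them into relative scores.
    weights = {p: math.exp(-sharpness * d) for p, d in nearest.items()}
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}
```

The sketch stores templates per phrase, so its memory grows with the vocabulary, and a phrase with no collected utterances cannot be recognised; both properties mirror the trade-offs listed for Type 2.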

Automatic Verification
Goal:
• Discard low quality work
• Tolerate noise to improve trust

Previous work (Turk, txteagle, Sarmenta)
• Give k users same task
  ‣ Slow payment

Our approach: Intra-session Agreement
• Make a small fraction of user’s queries redundant
• Measure acoustical similarity between each pair
• Examine distribution of similarity scores
  ‣ Like same user saying same word? Accept
  ‣ Else: Reject

[Figure: timeline of one session from 0 min to 30 min, a sequence of prompts including repeats (A and A’, D and D’)]

D = s(a,a′), s(d,d′), ..., s(k,k′)
s(x,y) ⇒ acoustical similarity

Can be augmented with other methods
• vs. Gold Standard, vs. Corpus
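
The intra-session agreement check can be sketched as follows. This is a minimal illustration under assumptions, not the authors' verifier: utterances are stand-in feature vectors, cosine similarity stands in for the acoustical similarity s(x, y), and the median-versus-threshold rule with a 0.8 default is a hypothetical way to "examine the distribution" of scores.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple

Features = Sequence[float]  # stand-in for an utterance's acoustic features


def similarity(x: Features, y: Features) -> float:
    """s(x, y): cosine similarity, an assumed stand-in for acoustical similarity."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def session_scores(pairs: List[Tuple[Features, Features]]) -> List[float]:
    """D = s(a, a'), s(d, d'), ...: one score per redundant prompt pair."""
    return [similarity(first, repeat) for first, repeat in pairs]


def accept_session(pairs: List[Tuple[Features, Features]], threshold: float = 0.8) -> bool:
    """Accept if the scores look like one speaker repeating the same words.

    The median rule and the default threshold are illustrative only.
    """
    scores = session_scores(pairs)
    return bool(scores) and median(scores) >= threshold
```

Using the median rather than the minimum means a single noisy pair does not sink an otherwise consistent session, which is one way to tolerate noise while still discarding low quality work.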

© 2009 Nokia 2009-08-11 / JTL

NRC/Cambridge Projects
Three server-side services; prototyping in East Africa (Audio/SMS-based)
• Our focus: User-generated content

Tangaza (“announce” in Swahili)
• Send voice messages to friends, family, and groups
  ‣ e.g., Nairobi taxi drivers, tomato farmers in Uganda
• “Twitter” (social net, status updates) for emerging markets

Crowd Translator
• Apply Mechanical Turk model to generate input for speech recognizer
  ‣ Micropayments in exchange for small tasks
• Improve device/service localization through speech in many more local languages

Mosoko (“mobile marketplace” in Swahili)
• Post and query advertisements for jobs, apartments, and goods
• “Craigslist” for the Next Billion
