Nokia @ SkunkworksNokia Research Center, Cambridge, US and Nairobi, Kenya
Jonathan Ledlie, Billy Odero, Brian Omwenga, Einat Minkov, Imre Kiss, Joseph Polifroni, Jay Chen, Pauline Githinji, Mokeira Masita-Mwangi, Lucy Macharia, Jussi ImpioFebruary 2010
1
© 2010 Nokia 2010-Feb-9 / JTL
Outline
2
© 2010 Nokia 2010-Feb-9 / JTL
OutlineKenya Projects• Tafsiri (aka Crowd Translator) • Tangaza (“announce”)• SMS Find‣ Next talk: Jay Chen
2
© 2010 Nokia 2010-Feb-9 / JTL
OutlineKenya Projects• Tafsiri (aka Crowd Translator) • Tangaza (“announce”)• SMS Find‣ Next talk: Jay Chen
Other Projects• Indoor Localization/Positioning (w/MIT)• Internet Overlay Routing (w/Imperial College London)
2
© 2010 Nokia 2010-Feb-9 / JTL
OutlineKenya Projects• Tafsiri (aka Crowd Translator) • Tangaza (“announce”)• SMS Find‣ Next talk: Jay Chen
Other Projects• Indoor Localization/Positioning (w/MIT)• Internet Overlay Routing (w/Imperial College London)
Other Nokia Projects
2
© 2010 Nokia 2010-Feb-9 / JTL
OutlineKenya Projects• Tafsiri (aka Crowd Translator) • Tangaza (“announce”)• SMS Find‣ Next talk: Jay Chen
Other Projects• Indoor Localization/Positioning (w/MIT)• Internet Overlay Routing (w/Imperial College London)
Other Nokia Projects
Interrupt!(please)
2
© 2010 Nokia 2010-Feb-9 / JTL
Tafsiri (1/2)
3
© 2010 Nokia 2010-Feb-9 / JTL
Tafsiri (1/2)Problem: have speech-enabled service in English• e.g. on-device or server-side mobile money transfer like MPESA• How do you translate (“localise”) it for e.g. Luo, or even Swahili
‣ Can’t purchase :-(
3
© 2010 Nokia 2010-Feb-9 / JTL
Tafsiri (1/2)Problem: have speech-enabled service in English• e.g. on-device or server-side mobile money transfer like MPESA• How do you translate (“localise”) it for e.g. Luo, or even Swahili
‣ Can’t purchase :-(
Our Approach: Tafsiri (aka Crowd Translator)• Cheaply create speech recognizer for local, low-corpus languages• Like Nathan Eagle’s txteagle, Amazon’s Mechanical Turk• Programmer standpoint‣ Like a “localisation” file
3
© 2010 Nokia 2010-Feb-9 / JTL
Tafsiri (2/2)Like Mechanical Turk: pay users for validated work (i.e. speech contributions)
MobileOperator
Worker
A B
30 min0 min
C D D’ ZA’E
0.86, 0.93, ..., ....
Users are not translating; Tafsiri translates (“localises”) speech service
4
© 2010 Nokia 2010-Feb-9 / JTL
Tafsiri (2/2)
1. User flashes CX• Gets callback
Like Mechanical Turk: pay users for validated work (i.e. speech contributions)
MobileOperator
Worker
A B
30 min0 min
C D D’ ZA’E
0.86, 0.93, ..., ....
Users are not translating; Tafsiri translates (“localises”) speech service
4
© 2010 Nokia 2010-Feb-9 / JTL
Tafsiri (2/2)
1. User flashes CX• Gets callback
Like Mechanical Turk: pay users for validated work (i.e. speech contributions)
MobileOperator
Worker
A B
30 min0 min
C D D’ ZA’E
0.86, 0.93, ..., ....
2. Selects his native language
Users are not translating; Tafsiri translates (“localises”) speech service
4
© 2010 Nokia 2010-Feb-9 / JTL
Tafsiri (2/2)
1. User flashes CX• Gets callback
Like Mechanical Turk: pay users for validated work (i.e. speech contributions)
MobileOperator
Worker
A B
30 min0 min
C D D’ ZA’E
0.86, 0.93, ..., ....3. Mimics voice prompts• “new transaction”
2. Selects his native language
Users are not translating; Tafsiri translates (“localises”) speech service
4
© 2010 Nokia 2010-Feb-9 / JTL
Tafsiri (2/2)
1. User flashes CX• Gets callback
Like Mechanical Turk: pay users for validated work (i.e. speech contributions)
MobileOperator
Worker
A B
30 min0 min
C D D’ ZA’E
0.86, 0.93, ..., ....3. Mimics voice prompts• “new transaction”
4. Automatic verification
2. Selects his native language
Users are not translating; Tafsiri translates (“localises”) speech service
4
© 2010 Nokia 2010-Feb-9 / JTL
Tafsiri (2/2)
1. User flashes CX• Gets callback
Like Mechanical Turk: pay users for validated work (i.e. speech contributions)
MobileOperator
Worker
A B
30 min0 min
C D D’ ZA’E
0.86, 0.93, ..., ....3. Mimics voice prompts• “new transaction”
4. Automatic verification
2. Selects his native language
5. Payment
Users are not translating; Tafsiri translates (“localises”) speech service
4
© 2010 Nokia 2010-Feb-9 / JTL
OutlineKenya Projects• Tafsiri• Tangaza• SMS Find‣ Next talk: Jay Chen
Other Projects• Indoor Localization/Positioning (w/MIT)• Internet Overlay Routing (w/Imperial College London)
Other Nokia Projects
5
© 2010 Nokia 2010-Feb-9 / JTL
Tangaza (1/2)
6
© 2010 Nokia 2010-Feb-9 / JTL
Tangaza (1/2)Tangaza (“announce” in Swahili)• Send voice messages to friends, family, and groups‣ e.g., Nairobi taxi drivers, tomato farmers in Uganda
• “Twitter” (social net, status updates) for users with basic phones
6
© 2010 Nokia 2010-Feb-9 / JTL
Tangaza (1/2)Tangaza (“announce” in Swahili)• Send voice messages to friends, family, and groups‣ e.g., Nairobi taxi drivers, tomato farmers in Uganda
• “Twitter” (social net, status updates) for users with basic phones
Problem• Create a useable group speech and text messaging service• Be nice if everyone would just upgrade to my S40 application• Instead make it reasonable on all GSM phones
6
© 2010 Nokia 2010-Feb-9 / JTL
Tangaza (1/2)Tangaza (“announce” in Swahili)• Send voice messages to friends, family, and groups‣ e.g., Nairobi taxi drivers, tomato farmers in Uganda
• “Twitter” (social net, status updates) for users with basic phones
Problem• Create a useable group speech and text messaging service• Be nice if everyone would just upgrade to my S40 application• Instead make it reasonable on all GSM phones
How it works• Text messages for creating and manipulating groups‣ “create group slot”‣ create #skunkworks 1 or join #skunkworks 1
• Call in to create spoken messages‣ Skunkworks spoken messages now linked with slot 1‣ Also listen and forward spoken messages
6
© 2010 Nokia 2010-Feb-9 / JTL
Tangaza (2/2)
7
© 2010 Nokia 2010-Feb-9 / JTL
Tangaza (2/2)Pilots• Strathmore‣ They use Facebook‣ Since November
• Huruma‣ They don’t‣ Much more Ksh conscious‣ Fewer snazzy phones‣ Starting this week‣ Nokia subsidizing, doesn’t scale
7
© 2010 Nokia 2010-Feb-9 / JTL
Tangaza (2/2)Pilots• Strathmore‣ They use Facebook‣ Since November
• Huruma‣ They don’t‣ Much more Ksh conscious‣ Fewer snazzy phones‣ Starting this week‣ Nokia subsidizing, doesn’t scale
Eventual Goals...?• Link with on-phone premium version, free minutes• 98% without new device could use Tangaza vanilla
7
© 2010 Nokia 2010-Feb-9 / JTL
OutlineKenya Projects• Tafsiri• Tangaza• SMS Find‣ Next talk: Jay Chen
Other Projects• Indoor Localization/Positioning (w/MIT)• Internet Overlay Routing (w/Imperial College London)
Other Nokia Projects
8
© 2010 Nokia 2010-Feb-9 / JTL
Indoor Localization / Positioning (1/2)
0xa3b
0x6d2
0xbc4
0x5fe
0xa3b
0x6d20xbc40x5fe
9
© 2010 Nokia 2010-Feb-9 / JTL
Indoor Localization / Positioning (1/2)Main idea: Build mapping from RF to location
0xa3b
0x6d2
0xbc4
0x5fe
0xa3b
0x6d20xbc40x5fe
9
© 2010 Nokia 2010-Feb-9 / JTL
Indoor Localization / Positioning (1/2)Main idea: Build mapping from RF to location
Two methods:• Both take ambient RF sources‣ wi-fi, cell towers‣ use GPS hints if available 0xa3b
0x6d2
0xbc4
0x5fe
0xa3b
0x6d20xbc40x5fe
9
© 2010 Nokia 2010-Feb-9 / JTL
Indoor Localization / Positioning (1/2)Main idea: Build mapping from RF to location
Two methods:• Both take ambient RF sources‣ wi-fi, cell towers‣ use GPS hints if available
Method #1• Survey: DB of guesses of RF source x,y(,z)• Use: Triangulate observed RF sources• Works outdoors, more scalable than #2
0xa3b
0x6d2
0xbc4
0x5fe
0xa3b
0x6d20xbc40x5fe
9
© 2010 Nokia 2010-Feb-9 / JTL
Indoor Localization / Positioning (1/2)Main idea: Build mapping from RF to location
Two methods:• Both take ambient RF sources‣ wi-fi, cell towers‣ use GPS hints if available
Method #1• Survey: DB of guesses of RF source x,y(,z)• Use: Triangulate observed RF sources• Works outdoors, more scalable than #2
Method #2• Scrap method #1 b/c poor accuracy indoors due to multipath• Survey: map RF signatures (aka fingerprints) to spaces• Use: lookup closest signature‣ We assume the granularity of a space is a medium-sized room
0xa3b
0x6d2
0xbc4
0x5fe
0xa3b
0x6d20xbc40x5fe
9
© 2010 Nokia 2010-Feb-9 / JTL
Indoor Localization/Positioning (2/2)
10
© 2010 Nokia 2010-Feb-9 / JTL
Indoor Localization/Positioning (2/2)Problem with both methods• Expert surveyor is costly and must be repeated
10
© 2010 Nokia 2010-Feb-9 / JTL
Indoor Localization/Positioning (2/2)Problem with both methods• Expert surveyor is costly and must be repeated
Our Approach: Users become Surveyors• Organic collection and update
10
© 2010 Nokia 2010-Feb-9 / JTL
Indoor Localization/Positioning (2/2)Problem with both methods• Expert surveyor is costly and must be repeated
Our Approach: Users become Surveyors• Organic collection and update
Challenges• Erroneous data• Malicious data• Knowing what you don’t know• Motivation?• Applications?
10
© 2010 Nokia 2010-Feb-9 / JTL
OutlineKenya Projects• Tangaza (“announce”)• Tafsiri (aka Crowd Translator)• SMS Find‣ Next talk: Jay Chen
Other Projects• Indoor Localization/Positioning (w/MIT)• Internet Overlay Routing (w/Imperial College London)
Other Nokia Projects
11
© 2010 Nokia 2010-Feb-9 / JTL
Other Nokia Projects
12
© 2010 Nokia 2010-Feb-9 / JTL
Other Nokia ProjectsNokia Money• Currently piloting in a state in India• Client like MPESA; outlets are Nokia shops; real bank• Interesting w.r.t. operator, bank, regulatory/gov
12
© 2010 Nokia 2010-Feb-9 / JTL
Other Nokia ProjectsNokia Money• Currently piloting in a state in India• Client like MPESA; outlets are Nokia shops; real bank• Interesting w.r.t. operator, bank, regulatory/gov
“Discover Maemo” in Kenya• High-end mobile programming: N900• ~July; two days; part of MIT AITI @ local univ• Details from King’ori in May
12
© 2010 Nokia 2010-Feb-9 / JTL
Other Nokia ProjectsNokia Money• Currently piloting in a state in India• Client like MPESA; outlets are Nokia shops; real bank• Interesting w.r.t. operator, bank, regulatory/gov
“Discover Maemo” in Kenya• High-end mobile programming: N900• ~July; two days; part of MIT AITI @ local univ• Details from King’ori in May
$1m App Competition• Requirement: helps people living on less than $5 a day• Deadline: April 18• callingallinnovators.com
12
© 2010 Nokia 2010-Feb-9 / JTL
The End
Thanks!Jonathan Ledlie
13
© 2010 Nokia 2010-Feb-9 / JTL
Extra Slides
14
© 2010 Nokia 2010-Feb-9 / JTL
Our Approach(a) Make Canonical Recordings (c) Verify Input
(b) Gather User Input (d) Expand Corpus
English Swahili Gold Std.Utterance
carboatplane
...
garimashuandege
...
“gari”“mashua”“ndege”
“...”
15
© 2010 Nokia 2010-Feb-9 / JTL
Our Approach(a) Make Canonical Recordings (c) Verify Input
(b) Gather User Input (d) Expand Corpus
English Swahili Gold Std.Utterance
carboatplane
...
garimashuandege
...
“gari”“mashua”“ndege”
“...”
Prompt User1 Utterance
garig
ndegeg
mashuag
...ggarig
...gmashuag
gari1
ndege1
mashua1
...1gari1′
...1mashua1′
15
© 2010 Nokia 2010-Feb-9 / JTL
Our Approach(a) Make Canonical Recordings (c) Verify Input
(b) Gather User Input (d) Expand Corpus
English Swahili Gold Std.Utterance
carboatplane
...
garimashuandege
...
“gari”“mashua”“ndege”
“...”
Prompt User1 Utterance
garig
ndegeg
mashuag
...ggarig
...gmashuag
gari1
ndege1
mashua1
...1gari1′
...1mashua1′
Intra-session Agreement?
gari1 ⋲ gari1′mashua1 ⋲ mashua1′
15
© 2010 Nokia 2010-Feb-9 / JTL
Our Approach(a) Make Canonical Recordings (c) Verify Input
(b) Gather User Input (d) Expand Corpus
English Swahili Gold Std.Utterance
carboatplane
...
garimashuandege
...
“gari”“mashua”“ndege”
“...”
Prompt User1 Utterance
garig
ndegeg
mashuag
...ggarig
...gmashuag
gari1
ndege1
mashua1
...1gari1′
...1mashua1′
Intra-session Agreement?
gari1 ⋲ gari1′mashua1 ⋲ mashua1′
Word Utterance
carcarcarcar...
boatboat
...
garig
gari1
gari1′gari2
...mashuag
mashua1
...
Added
15
© 2010 Nokia 2010-Feb-9 / JTL
How is a speech recognizer built?
1. Expert creates dictionary
bangers b-ae-N-s@r-z batter b-ae-t-s@r ... ...
2. Collect corpus from native speakers
3. Build phoneme matcher
Output: bangers (98%) batter ( 1%) ...
1. Determine target phrases (no expert)2. Collect corpus from native speakers
“new transaction” “cancel transaction” “cancel”
3. Build phrase matcher
Output: cancel transaction (99%) cancel (1%)
Type 1: PHONEME Type 2: PHRASE
b (0.8)
bH (0.2)
ae (0.98)
ai (0.02)
“cancel transaction” (.99)
“cancel” (.01)
- Expensive (>$10m/language)+ Grammar expandable+ Memory: |phonemes|
+ Cheap ($10k/language)- Corpus not expandable without more collection~ Memory: |vocabulary| Good enough for C&C on devices w/small vocab
16
© 2010 Nokia 2010-Feb-9 / JTL
How is a speech recognizer built?
1. Expert creates dictionary
bangers b-ae-N-s@r-z batter b-ae-t-s@r ... ...
2. Collect corpus from native speakers
3. Build phoneme matcher
Output: bangers (98%) batter ( 1%) ...
1. Determine target phrases (no expert)2. Collect corpus from native speakers
“new transaction” “cancel transaction” “cancel”
3. Build phrase matcher
Output: cancel transaction (99%) cancel (1%)
Type 1: PHONEME Type 2: PHRASE
b (0.8)
bH (0.2)
ae (0.98)
ai (0.02)
“cancel transaction” (.99)
“cancel” (.01)
- Expensive (>$10m/language)+ Grammar expandable+ Memory: |phonemes|
+ Cheap ($10k/language)- Corpus not expandable without more collection~ Memory: |vocabulary| Good enough for C&C on devices w/small vocab
16
© 2010 Nokia 2010-Feb-9 / JTL
Automatic VerificationGoal:• Discard low quality work• Tolerate noise to improve trust
Previous work (Turk, txteagle, Sarmenta)
• Give k users same task‣ Slow payment
17
© 2010 Nokia 2010-Feb-9 / JTL
Automatic VerificationGoal:• Discard low quality work• Tolerate noise to improve trust
Previous work (Turk, txteagle, Sarmenta)
• Give k users same task‣ Slow payment
Our approach: Intra-session Agreement• Make a small fraction of user’s queries redundant• Measure acoustical similarity between each pair• Examine distribution of similarity scores
‣ Like same user saying same word? Accept‣ Else: Reject
A B
30 min0 min
C D D’ ZA’E
D = s(a,a′), s(d,d′), ..., s(k,k′)
s(x,y)⇒acoustical similarity
17
© 2010 Nokia 2010-Feb-9 / JTL
Automatic VerificationGoal:• Discard low quality work• Tolerate noise to improve trust
Previous work (Turk, txteagle, Sarmenta)
• Give k users same task‣ Slow payment
Our approach: Intra-session Agreement• Make a small fraction of user’s queries redundant• Measure acoustical similarity between each pair• Examine distribution of similarity scores
‣ Like same user saying same word? Accept‣ Else: Reject
A B
30 min0 min
C D D’ ZA’E
D = s(a,a′), s(d,d′), ..., s(k,k′)
s(x,y)⇒acoustical similarityCan be augmented with other methods• vs. Gold Standard, vs. Corpus
17
© 2010 Nokia 2010-Feb-9 / JTL© 2009 Nokia 2009-08-11 / JTL
NRC/Cambridge ProjectsThree server-side services; prototyping in East Africa (Audio/SMS-based)
• Our focus: User-generated content
Tangaza (“announce” in Swahili)
• Send voice messages to friends, family, and groups‣ e.g., Nairobi taxi drivers, tomato farmers in Uganda
• “Twitter” (social net, status updates) for emerging markets
Crowd Translator• Apply mechanical turk model to generate input for speech recognizer
‣ Micropayments in exchange for small tasks• Improve device/service localization through speech in many more local languages
Mosoko (“mobile marketplace” in Swahili)
• Post and query advertisements for jobs, apartments, and goods• “Craigslist” for the Next Billion
18