
Automatic Language Identification

Page 1: Automatic Language Identification

야옹 miyav miao mjau miaou mnau meong ニャン meow miauw mjá mjav meamh ?

Page 2: Automatic Language Identification

Automatic Language Identification (LiD)

Alexander  Hosford  

@awhosford  

Page 3: Automatic Language Identification

The  LiD  Problem  

•  The  identification  of  a  given  spoken  language  from  a  speech  signal  

•  94% of the global population speaks only 6% of the world's languages

Page 4: Automatic Language Identification

Real  World  Situations  

•  Automation  of  LiD  is  desirable  

•  Offers many benefits to international service industries

•  Hotels,  Airports,  Global  Call  Centres  

Page 5: Automatic Language Identification

Language  Differences    

•  Languages contain information that makes one discernible from another:

– Phonemes  

– Prosody  

– Phonotactics  

– Syntax  

Page 6: Automatic Language Identification

Human  Abilities  

•  Bias  towards  native  language  arises  in  infancy  

•  Prosodic features are some of the first cues to be recognised

•  Humans can make a reasonable estimate of the language heard within 3-4 seconds of audition

•  Even  unfamiliar  languages  may  be  plausibly  judged  

 

Page 7: Automatic Language Identification

Previous  Attempts  

•  Attempts as early as 1974 – USAF work, therefore classified.

•  Methodological investigations as early as 1977 (House & Neuburg)

•  Studies for the most part center on phonotactic constraints and phoneme modeling

•  Raw  acoustic  waveforms  have  been  visited  (Kwasny  1993)  

Page 8: Automatic Language Identification

A  Simpler  Approach?  

•  Phonotactic  approaches  require  expert  linguistic  knowledge  

•  Phoneme and phonotactic modeling is time consuming

•  Given the speed of human LiD abilities, discrimination is most likely based on acoustic features

Page 9: Automatic Language Identification

System  Overview  

[Diagram labels: Pre-processed Speech Utterance; MFCC Vectors; Pitch Contour; Onset Detection; nPVI; ARFF File Generation; SuperCollider SystemCmd; WEKA Tools; SuperCollider]

Page 10: Automatic Language Identification

Feature  Extraction  

•  Spectral  Information  –  MFCC  Vectors  

•  Pitch  contour  information  

•  Handled  by  SCMIR  Library  (Collins,  2010)  

•  Speech Rhythm – Normalised Pairwise Variability Index (nPVI)

Feature Type        Implemented Measure
Spectral Content    MFCC Vector uGen
Pitch Contour       Tartini uGen
Speech Rhythm       nPVI function
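The nPVI measure named above can be sketched in a few lines. This is the standard published definition of the index (mean normalised difference between successive interval durations, scaled by 100), not the SuperCollider function used in the talk, which the slides do not show. Taking the intervals as gaps between detected onsets is an assumption based on the "Onset Detection" box in the system overview; the names `npvi` and `intervals_from_onsets` are illustrative.

```python
def intervals_from_onsets(onsets):
    """Turn a sorted list of onset times (seconds) into inter-onset durations."""
    return [later - earlier for earlier, later in zip(onsets, onsets[1:])]


def npvi(durations):
    """Normalised Pairwise Variability Index of successive durations.

    Each adjacent pair contributes |d1 - d2| divided by the pair's mean,
    and the average of those terms is scaled by 100.
    """
    if len(durations) < 2:
        raise ValueError("nPVI needs at least two intervals")
    total = 0.0
    for d1, d2 in zip(durations, durations[1:]):
        total += abs(d1 - d2) / ((d1 + d2) / 2.0)
    return 100.0 * total / (len(durations) - 1)


# Hypothetical onset times, for illustration only.
onsets = [0.0, 0.5, 1.5]
rhythm_score = npvi(intervals_from_onsets(onsets))
```

Perfectly regular intervals give 0, and larger values indicate more contrast between neighbouring intervals.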

Page 11: Automatic Language Identification

Classification  

•  Handled  by  the  WEKA  toolkit  

•  Built-in Multilayer Perceptron

•  Called from the command line through a SuperCollider system command
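A rough sketch of the hand-off to WEKA, written in Python rather than SuperCollider. It writes feature vectors in WEKA's ARFF text format and builds a command line for WEKA's `weka.classifiers.functions.MultilayerPerceptron` class. The relation name, attribute names, jar path, file name and the use of 10-fold cross-validation (`-x 10`) are all assumptions for illustration; the slides do not state them.

```python
def to_arff(relation, feature_names, rows, class_labels):
    """Render (features, label) rows as ARFF text with a nominal class attribute."""
    lines = ["@relation " + relation, ""]
    for name in feature_names:
        lines.append("@attribute " + name + " numeric")
    lines.append("@attribute language {" + ",".join(class_labels) + "}")
    lines.append("")
    lines.append("@data")
    for features, label in rows:
        values = ",".join(repr(float(v)) for v in features)
        lines.append(values + "," + label)
    return "\n".join(lines) + "\n"


def weka_mlp_command(weka_jar, arff_path, folds=10):
    """Build (but do not run) a WEKA command line for the multilayer perceptron."""
    return [
        "java", "-cp", weka_jar,
        "weka.classifiers.functions.MultilayerPerceptron",
        "-t", arff_path,      # training file
        "-x", str(folds),     # number of cross-validation folds (assumed)
    ]


# Hypothetical two-feature example with made-up values.
arff_text = to_arff(
    "lid_features",
    ["mfcc0", "mfcc1"],
    [([1.5, -0.25], "english"), ([0.75, 2.0], "german")],
    ["english", "german"],
)
command = weka_mlp_command("weka.jar", "lid_features.arff")
```

The command list could be passed to `subprocess.run` once the ARFF text has been written to disk, which plays the same role as the SuperCollider system command in the talk.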

Page 12: Automatic Language Identification

Comparisons  Made  

•  66  language  pairs  from  12  languages  

•  A  comparison  within  language  families  

•  A  comparison  of  all  12  languages  

•  Averaged  &  segmented  data  
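The figure of 66 pairs follows from choosing 2 languages out of 12. A short check, using placeholder labels because the slides do not list the 12 languages:

```python
from itertools import combinations
from math import comb

# Placeholder labels; the actual languages are not named on the slide.
languages = ["lang%02d" % i for i in range(1, 13)]

# Every unordered pair of distinct languages.
pairs = list(combinations(languages, 2))
pair_count = len(pairs)
```

Here `pair_count` equals `comb(12, 2)`, that is 12 × 11 / 2.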

Page 13: Automatic Language Identification

Results  Within  Families  

*Data  averaged  across  files  

[Chart: results within language families. Vertical axis from 20.00 to 100.00. Feature sets: 4 MFCC; 13 MFCC Tartini; 41 MFCC Tartini; All Features. Series: Germanic, Romance, Slavic, SinoAltaic, Mean.]

Page 14: Automatic Language Identification

Results  Within  Families  

*Data  in  5  second  segments    

[Chart: results within language families. Vertical axis from 20.00 to 100.00. Feature sets: 4 MFCC; 13 MFCC Tartini; 41 MFCC Tartini. Series: Germanic, Romance, Slavic, SinoAltaic, Mean.]
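The two data preparations compared on these results slides, one vector averaged across a whole file versus one vector per 5 second segment, might look roughly like the sketch below. It assumes each file is a list of per-frame feature vectors and that a segment's vector is the mean of its frames; the slides do not give the frame rate or say how trailing partial segments were handled, so `frames_per_segment` is a free parameter and leftover frames are dropped here.

```python
def mean_vector(frames):
    """Element-wise mean of a non-empty list of equal-length feature vectors."""
    count = len(frames)
    return [sum(column) / count for column in zip(*frames)]


def segment_means(frames, frames_per_segment):
    """Mean vector for each full, non-overlapping segment of frames.

    Frames left over after the last full segment are dropped (an assumption).
    """
    last_start = len(frames) - frames_per_segment
    return [
        mean_vector(frames[start:start + frames_per_segment])
        for start in range(0, last_start + 1, frames_per_segment)
    ]


# Made-up two-dimensional frames, for illustration only.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
file_average = mean_vector(frames)          # one training instance per file
per_segment = segment_means(frames, 2)      # one instance per segment
```

Segmenting gives the classifier more instances per recording, at the cost of each instance summarising less speech.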

Page 15: Automatic Language Identification

Results  From  All  Languages  

[Chart: results from all languages. Vertical axis from 10.00 to 100.00. Feature sets: 4 MFCC; 4 MFCC Tartini; 4 MFCC Tartini nPVI; 13 MFCC; 13 MFCC Tartini; 13 MFCC Tartini nPVI; 41 MFCC; 41 MFCC Tartini; 41 MFCC Tartini nPVI. Series: Averaged (with mean), 5s Segments (with mean).]

Page 16: Automatic Language Identification

Extensions  

•  nPVI  function  to  use  vowel  onsets  

•  Phonemic  Segmentation  

•  A  Larger  dataset  

•  Better  Efficiency  

•  Real-time operation

Page 17: Automatic Language Identification

Robustness  

•  Real world signals are very different from processed ‘clean’ data.

•  ‘Ideal’  LiD  systems  –  independent  &  robust  

•  A  need  to  analyse  only  the  part  of  the  signal  that  matters.  

Page 18: Automatic Language Identification

CASA  

•  A computational modeling of human ‘Auditory Scene Analysis’ (Bregman 1990)

•  Separation of signal into component parts and reconstitution into meaningful ‘streams’

Page 19: Automatic Language Identification

Using  Noisy  Signals  

I’m  open  to  suggestions!  

Page 20: Automatic Language Identification

[email protected]  07814  692  549  @awhosford