Representation, classification and information fusion for ...1Supervised by Professor Shrikanth (Shri) S. Narayanan 1/5. Identifying or verifying various human states from human centered

Representation, classification andinformation fusion for robust and efficient

multimodal human state recognition

Ming Li1

Signal Analysis and Interpretation Lab

2013 Research Festival

1Supervised by Professor Shrikanth (Shri) S. Narayanan1/5

Identifying or verifying various human states fromhuman centered multimodal signals

WHO? HOW?

WHAT?

WHERE? WHEN?

Each signal source (e.g. speech) offers a part ofthe story about each of the "state" of interest

2/5


WHO? HOW?WHAT?

Speaker verificationSpeaker diarization

Audio-visual joint biometrics ECG biometrics

Age/gender identification Biometrics in speech

production system...

Spoken language identification Computational acoustics scene analysis

Audio watermarkingMultimodal physical activity recognition

High-level descriptions of real-life physical activities

...

Emotion recognition Intoxication detection

Personality recognitionPathology recognition

Behavior signal processing

ECG QT interval detectionHealth monitoring

...

A variety of technology applications in the state of the art

each with different levels of process

2/5


WHO? HOW?WHAT?








...





...

Human "state" recognition: a broad range of associated information:

biometric identitylanguage identityphysical activityhealth stateemotional statecognitive functioning...

2/5


WHO? HOW?WHAT?








...





...

CharacteristicsI states are relatively stableI fit a general supervised learning classification framework

2/5

Representation, classification and information fusion

Multimodal signals

Representation ClassificationInformation

fusionResults

Feature extraction

Signal processing

statistical modeling

I time varying property =⇒ short time frame level features

I generative model for data description =⇒ features (supervectors) inmodel parameters’ space for classification

3/5

Representation, classification and information fusion

Multimodal signals

Representation ClassificationInformation

fusionResults

Feature extraction

Signal processing

statistical modeling

The focus of this proposal:Robustness: improve the performanceEfficiency: reduce computational coston top of the state of the art systems

3/5

Working on top of the state of the art systems

I A very active and competitive research areaI NIST Speaker Recognition Evaluation (SRE) 1997-2012 (13)I NIST Language Recognition Evaluation (LRE) 1996-2011 (5)I DARPA RATS LRE evaluation 2011-2014I Interspeech challenges 2009-2012

(emotion, paralinguistic, intoxication, personality, pathology) (5)

I Standard databases and tasks, Engineering skills required

I Multimodal signals

4/5






4/5






4/5






4/5






! !

Morphological Variation in the Adult Vocal TractAdam Lammert, Michael Proctor & Shrikanth Narayanan

University of Southern California

!"#$%&#$$'() *+,-'(.,)/%01%203)4'(+%5#6,10(+,#%7%2!8&

90(:4060;,<#6%=#(,#),0+%,+%)4'%!"36)%=0<#6%>(#<)

No two vocal tracts are exactly alike:

• Proportions

• Structural shapes

• Structural orientations

4/5

Thank you for your attention.

I We welcome your questions, suggestions and comments!I Personal website (http://www-scf.usc.edu/ mingli/)I SAIL lab website (http://sail.usc.edu/)

5/5

Documents

Representation, classification and information fusion for ...1Supervised by Professor Shrikanth (Shri) S. Narayanan 1/5. Identifying or verifying various human states from human centered