13
Representation, classification and information fusion for robust and efficient multimodal human state recognition Ming Li 1 Signal Analysis and Interpretation Lab 2013 Research Festival 1 Supervised by Professor Shrikanth (Shri) S. Narayanan 1/5

Representation, classification and information fusion for ...1Supervised by Professor Shrikanth (Shri) S. Narayanan 1/5. Identifying or verifying various human states from human centered

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Representation, classification and information fusion for ...1Supervised by Professor Shrikanth (Shri) S. Narayanan 1/5. Identifying or verifying various human states from human centered

Representation, classification andinformation fusion for robust and efficient

multimodal human state recognition

Ming Li1

Signal Analysis and Interpretation Lab

2013 Research Festival

1Supervised by Professor Shrikanth (Shri) S. Narayanan1/5

Page 2: Representation, classification and information fusion for ...1Supervised by Professor Shrikanth (Shri) S. Narayanan 1/5. Identifying or verifying various human states from human centered

Identifying or verifying various human states fromhuman centered multimodal signals

WHO? HOW?

WHAT?

WHERE? WHEN?

Each signal source (e.g. speech) offers a part ofthe story about each of the "state" of interest

2/5

Page 3: Representation, classification and information fusion for ...1Supervised by Professor Shrikanth (Shri) S. Narayanan 1/5. Identifying or verifying various human states from human centered

Identifying or verifying various human states fromhuman centered multimodal signals

WHO? HOW?WHAT?

Speaker verificationSpeaker diarization

Audio-visual joint biometrics ECG biometrics

Age/gender identification Biometrics in speech

production system...

Spoken language identification Computational acoustics scene analysis

Audio watermarkingMultimodal physical activity recognition

High-level descriptions of real-life physical activities

...

Emotion recognition Intoxication detection

Personality recognitionPathology recognition

Behavior signal processing

ECG QT interval detectionHealth monitoring

...

A variety of technology applications in the state of the art

each with different levels of process

2/5

Page 4: Representation, classification and information fusion for ...1Supervised by Professor Shrikanth (Shri) S. Narayanan 1/5. Identifying or verifying various human states from human centered

Identifying or verifying various human states fromhuman centered multimodal signals

WHO? HOW?WHAT?

Speaker verificationSpeaker diarization

Audio-visual joint biometrics ECG biometrics

Age/gender identification Biometrics in speech

production system...

Spoken language identification Computational acoustics scene analysis

Audio watermarkingMultimodal physical activity recognition

High-level descriptions of real-life physical activities

...

Emotion recognition Intoxication detection

Personality recognitionPathology recognition

Behavior signal processing

ECG QT interval detectionHealth monitoring

...

Human "state" recognition: a broad range of associated information:

biometric identitylanguage identityphysical activityhealth stateemotional statecognitive functioning...

2/5

Page 5: Representation, classification and information fusion for ...1Supervised by Professor Shrikanth (Shri) S. Narayanan 1/5. Identifying or verifying various human states from human centered

Identifying or verifying various human states fromhuman centered multimodal signals

WHO? HOW?WHAT?

Speaker verificationSpeaker diarization

Audio-visual joint biometrics ECG biometrics

Age/gender identification Biometrics in speech

production system...

Spoken language identification Computational acoustics scene analysis

Audio watermarkingMultimodal physical activity recognition

High-level descriptions of real-life physical activities

...

Emotion recognition Intoxication detection

Personality recognitionPathology recognition

Behavior signal processing

ECG QT interval detectionHealth monitoring

...

CharacteristicsI states are relatively stableI fit a general supervised learning classification framework

2/5

Page 6: Representation, classification and information fusion for ...1Supervised by Professor Shrikanth (Shri) S. Narayanan 1/5. Identifying or verifying various human states from human centered

Representation, classification and information fusion

Multimodal signals

Representation ClassificationInformation

fusionResults

Feature extraction

Signal processing

statistical modeling

I time varying property =⇒ short time frame level features

I generative model for data description =⇒ features (supervectors) inmodel parameters’ space for classification

3/5

Page 7: Representation, classification and information fusion for ...1Supervised by Professor Shrikanth (Shri) S. Narayanan 1/5. Identifying or verifying various human states from human centered

Representation, classification and information fusion

Multimodal signals

Representation ClassificationInformation

fusionResults

Feature extraction

Signal processing

statistical modeling

The focus of this proposal:Robustness: improve the performanceEfficiency: reduce computational coston top of the state of the art systems

3/5

Page 8: Representation, classification and information fusion for ...1Supervised by Professor Shrikanth (Shri) S. Narayanan 1/5. Identifying or verifying various human states from human centered

Working on top of the state of the art systems

I A very active and competitive research areaI NIST Speaker Recognition Evaluation (SRE) 1997-2012 (13)I NIST Language Recognition Evaluation (LRE) 1996-2011 (5)I DARPA RATS LRE evaluation 2011-2014I Interspeech challenges 2009-2012

(emotion, paralinguistic, intoxication, personality, pathology) (5)

I Standard databases and tasks, Engineering skills required

I Multimodal signals

4/5

Page 9: Representation, classification and information fusion for ...1Supervised by Professor Shrikanth (Shri) S. Narayanan 1/5. Identifying or verifying various human states from human centered

Working on top of the state of the art systems

I A very active and competitive research areaI NIST Speaker Recognition Evaluation (SRE) 1997-2012 (13)I NIST Language Recognition Evaluation (LRE) 1996-2011 (5)I DARPA RATS LRE evaluation 2011-2014I Interspeech challenges 2009-2012

(emotion, paralinguistic, intoxication, personality, pathology) (5)

I Standard databases and tasks, Engineering skills required

I Multimodal signals

4/5

Page 10: Representation, classification and information fusion for ...1Supervised by Professor Shrikanth (Shri) S. Narayanan 1/5. Identifying or verifying various human states from human centered

Working on top of the state of the art systems

I A very active and competitive research areaI NIST Speaker Recognition Evaluation (SRE) 1997-2012 (13)I NIST Language Recognition Evaluation (LRE) 1996-2011 (5)I DARPA RATS LRE evaluation 2011-2014I Interspeech challenges 2009-2012

(emotion, paralinguistic, intoxication, personality, pathology) (5)

I Standard databases and tasks, Engineering skills required

I Multimodal signals

4/5

Page 11: Representation, classification and information fusion for ...1Supervised by Professor Shrikanth (Shri) S. Narayanan 1/5. Identifying or verifying various human states from human centered

Working on top of the state of the art systems

I A very active and competitive research areaI NIST Speaker Recognition Evaluation (SRE) 1997-2012 (13)I NIST Language Recognition Evaluation (LRE) 1996-2011 (5)I DARPA RATS LRE evaluation 2011-2014I Interspeech challenges 2009-2012

(emotion, paralinguistic, intoxication, personality, pathology) (5)

I Standard databases and tasks, Engineering skills required

I Multimodal signals

4/5

Page 12: Representation, classification and information fusion for ...1Supervised by Professor Shrikanth (Shri) S. Narayanan 1/5. Identifying or verifying various human states from human centered

Working on top of the state of the art systems

I A very active and competitive research areaI NIST Speaker Recognition Evaluation (SRE) 1997-2012 (13)I NIST Language Recognition Evaluation (LRE) 1996-2011 (5)I DARPA RATS LRE evaluation 2011-2014I Interspeech challenges 2009-2012

(emotion, paralinguistic, intoxication, personality, pathology) (5)

I Standard databases and tasks, Engineering skills required

I Multimodal signals

! !

Morphological Variation in the Adult Vocal TractAdam Lammert, Michael Proctor & Shrikanth Narayanan

University of Southern California

!"#$%&#$$'() *+,-'(.,)/%01%203)4'(+%5#6,10(+,#%7%2!8&

90(:4060;,<#6%=#(,#),0+%,+%)4'%!"36)%=0<#6%>(#<)

No two vocal tracts are exactly alike:

• Proportions

• Structural shapes

• Structural orientations

4/5

Page 13: Representation, classification and information fusion for ...1Supervised by Professor Shrikanth (Shri) S. Narayanan 1/5. Identifying or verifying various human states from human centered

Thank you for your attention.

I We welcome your questions, suggestions and comments!I Personal website (http://www-scf.usc.edu/ mingli/)I SAIL lab website (http://sail.usc.edu/)

5/5