
Research activities at AUTH related to dialogue detection

Ioannis Pitas, Constantine Kotropoulos, Nikos Nikolaidis

WP6 e-team: Audiovisual Understanding

AIIA Lab, Department of Informatics, Aristotle University of Thessaloniki

Outline

• Introduction
• Dialogue detection concept: cross-correlation of indicator functions
• Speaker turn detection based on speech and visual cues (mouth activity)
• Frontal face detection; facial feature detection (e.g. mouth)
• One-two speaker detection
• Speaker clustering based on speech and visual cues
• Fingerprinting


Indicator functions and their cross-correlation (1)

A dialogue between two persons from the movie “Secret Window” [Dialogue 1].

[Figure: indicator functions $I_A(n)$ and $I_B(n)$ plotted against $n$, and their cross-correlation $c_{AB}(d)$ plotted against $d$.]


Indicator functions and their cross-correlation (2)

A scene without a dialogue between two persons.

[Figure: indicator functions $I_A(n)$ and $I_B(n)$ plotted against $n$, and their cross-correlation $c_{AB}(d)$ plotted against $d$.]
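The cross-correlation of two indicator functions can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that $I_A(n)$ and $I_B(n)$ are binary sequences (1 when the person is active in frame $n$, 0 otherwise) and that $c_{AB}(d) = \frac{1}{N}\sum_n I_A(n+d)\, I_B(n)$. The slides do not spell out the exact definition or normalisation, and the function name `cross_correlation` is hypothetical.

```python
import numpy as np

def cross_correlation(ind_a, ind_b, max_lag):
    """Return lags d and c_AB(d) = (1/N) * sum_n I_A(n + d) * I_B(n).

    The definition and the 1/N normalisation are assumptions made
    for this sketch; they are not taken from the slides.
    """
    a = np.asarray(ind_a, dtype=float)
    b = np.asarray(ind_b, dtype=float)
    n = len(a)
    lags = np.arange(-max_lag, max_lag + 1)
    c = np.empty(len(lags))
    for k, d in enumerate(lags):
        if d >= 0:
            # Overlap of I_A shifted left by d with I_B.
            c[k] = np.dot(a[d:], b[:n - d])
        else:
            # Negative lag: overlap of I_A with I_B shifted left by |d|.
            c[k] = np.dot(a[:n + d], b[-d:])
    return lags, c / n

# Toy dialogue: A and B alternate every 10 frames and never overlap.
ind_a = np.tile(np.r_[np.ones(10), np.zeros(10)], 2)
ind_b = 1.0 - ind_a
lags, c = cross_correlation(ind_a, ind_b, max_lag=15)

# Toy non-dialogue: only A is active, so the cross-correlation vanishes.
_, c_mono = cross_correlation(np.ones(40), np.zeros(40), max_lag=15)
```

In the toy dialogue the cross-correlation is zero at $d = 0$ and peaks at a lag equal to the turn length, whereas in the toy monologue it is identically zero. A detector could threshold such a peak, but the decision rule is left open here.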


Speaker Turn Detection

Audio segmentation aims at finding acoustic events within an audio stream. Speaker turn detection is a special case of speaker segmentation.

Important step in pre-processing of speech in order to implement audio indexing or speaker tracking.

Usually, no prior knowledge about speakers is assumed.

[Figure: audio stream with segments labelled Speaker 1 and Speaker 2.]


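Model-based segmentation (DISTBIC)

Contrast the hypothesis of no speaker turn,

$$Z \sim \mathcal{N}(\mu_Z, \Sigma_Z), \quad N_Z \text{ vectors in } Z,$$

against the hypothesis of a speaker turn,

$$X \sim \mathcal{N}(\mu_X, \Sigma_X), \quad N_X \text{ vectors in } X, \qquad Y \sim \mathcal{N}(\mu_Y, \Sigma_Y), \quad N_Y \text{ vectors in } Y,$$

where $N_Z = N_X + N_Y$.

$$\Delta ML(i) = \frac{N_Z}{2}\log|\Sigma_Z| - \frac{N_X}{2}\log|\Sigma_X| - \frac{N_Y}{2}\log|\Sigma_Y|$$

BIC criterion:

$$\Delta BIC(i) = -\Delta ML(i) + \lambda P < 0 \;\Rightarrow\; \text{speaker turn}$$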
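A minimal sketch of the BIC test at a candidate turn point $i$ is given below, assuming full-covariance Gaussian models estimated by maximum likelihood. The slide only names the penalty $P$; the form $P = \frac{1}{2}\left(p + \frac{p(p+1)}{2}\right)\log N_Z$ for $p$-dimensional feature vectors is the usual BIC penalty and is an assumption here, as are the default $\lambda = 1$ and the function name `delta_bic`.

```python
import numpy as np

def delta_bic(z, i, lam=1.0):
    """BIC difference for splitting the window z (N_Z x p) at frame i.

    A negative value favours the speaker-turn hypothesis.
    The penalty form and lam=1.0 are assumptions of this sketch.
    """
    z = np.asarray(z, dtype=float)
    n_z, p = z.shape
    x, y = z[:i], z[i:]

    def half_n_logdet(v):
        # (N / 2) * log|Sigma| with the maximum-likelihood covariance.
        cov = np.atleast_2d(np.cov(v, rowvar=False, bias=True))
        _, logdet = np.linalg.slogdet(cov)
        return 0.5 * len(v) * logdet

    delta_ml = half_n_logdet(z) - half_n_logdet(x) - half_n_logdet(y)
    penalty = 0.5 * (p + 0.5 * p * (p + 1)) * np.log(n_z)
    return -delta_ml + lam * penalty

# Synthetic example: two "speakers" with clearly different means.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(200, 2))
y = rng.normal(5.0, 1.0, size=(200, 2))
bic_turn = delta_bic(np.vstack([x, y]), 200)

# Same distribution on both sides of the candidate point: no turn expected.
bic_same = delta_bic(rng.normal(0.0, 1.0, size=(400, 2)), 200)
```

With real speech the rows of `z` would be acoustic feature vectors, and both segments need enough frames for the covariance estimates to be non-singular.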


Frontal face images at quartet and octet resolution

[Figure panels: Original Image, Quartet Image, Octet Image.]


Face detection based on corners

The figures show the 3 possible feature point set configurations, having 100 feature points each. They differ in the minimum distance allowed between the feature points. In general, small inter-feature-point distances yield a feature point concentration and poor face detection. The minimum allowed distance is a parameter of the training procedure.
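The role of the minimum-distance parameter can be illustrated with a greedy selection over a per-pixel score map. This is only a sketch: the slides do not describe how the feature points are actually chosen, so the greedy strategy, the score map and the function name `select_feature_points` are all assumptions.

```python
import numpy as np

def select_feature_points(scores, num_points, min_dist):
    """Greedily pick the highest-scoring pixels that are at least
    min_dist apart (Euclidean distance, in pixels).

    Hypothetical selection rule, not the training procedure of the slides.
    """
    scores = np.asarray(scores, dtype=float)
    # Visit pixels from the highest score to the lowest.
    order = np.argsort(scores, axis=None)[::-1]
    rows, cols = np.unravel_index(order, scores.shape)
    chosen = []
    for r, c in zip(rows, cols):
        far_enough = all(
            (r - cr) ** 2 + (c - cc) ** 2 >= min_dist ** 2
            for cr, cc in chosen
        )
        if far_enough:
            chosen.append((int(r), int(c)))
            if len(chosen) == num_points:
                break
    return chosen

# Random score map standing in for a corner-strength image.
rng = np.random.default_rng(0)
score_map = rng.random((64, 64))
points = select_feature_points(score_map, num_points=100, min_dist=3)
```

A larger `min_dist` spreads the points over the image, while `min_dist=1` simply keeps the 100 strongest responses, which may cluster in one region.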


Face detection Receiver Operating Characteristic (ROC) curves

• For the SVM-based face detection, the best results were obtained with the sigmoidal kernel. Best equal error rate: 4.5%.

• The maximum likelihood detection commits few false alarms. For FAR in [5.2%, 5.67%] the FRR drops quickly from 6.1% to 0.7%.
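For reference, the quantities on a ROC curve can be computed from detector scores as sketched below. The convention that a window is accepted as a face when its score is at least the threshold, and the names `far_frr` and `equal_error_rate`, are assumptions of this sketch. The scores are made up and unrelated to the results reported above.

```python
import numpy as np

def far_frr(face_scores, nonface_scores, threshold):
    """False acceptance rate and false rejection rate at one threshold."""
    face = np.asarray(face_scores, dtype=float)
    nonface = np.asarray(nonface_scores, dtype=float)
    far = np.mean(nonface >= threshold)  # non-faces accepted as faces
    frr = np.mean(face < threshold)      # faces rejected
    return far, frr

def equal_error_rate(face_scores, nonface_scores):
    """Approximate EER: the operating point where |FAR - FRR| is smallest."""
    thresholds = np.unique(np.concatenate([face_scores, nonface_scores]))
    best = None
    for t in thresholds:
        far, frr = far_frr(face_scores, nonface_scores, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]

# Made-up scores for illustration only.
face_scores = np.array([0.9, 0.8, 0.7, 0.6])
nonface_scores = np.array([0.1, 0.2, 0.3, 0.65])
eer, threshold = equal_error_rate(face_scores, nonface_scores)
```

Sweeping the threshold and plotting FRR against FAR gives the ROC curve. With few samples the two rates rarely coincide exactly, so the sketch reports their average at the closest point.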


One/Two Speaker Detection

Two-speaker detection (NIST 2002): Best EER 16.2 %

Kajarekar, Adami, Hermansky, 2003

One-speaker detection (NIST 2002): Best EER 7.1 %


Frontal face authentication


Fingerprinting