Voice source characterisation Gerrit Bloothooft UiL-OTS Utrecht University

Voice source characterisation

Gerrit Bloothooft

UiL-OTS Utrecht University

Emasters School Leuven 2002 Voice Source Characterization 2

Voice research

To describe and model the properties of the vocal sound source from the viewpoints of:
– Physiology
– Acoustics
– Perception

Importance of the voice

• Speech synthesis
  – Towards natural-sounding synthesis
• Speech recognition
  – Using source properties in recognition
• Speaker recognition/identification
  – Voice source characteristics are essential
• Diagnosis
  – Pathologies, voice classifications

Voice possibilities

Limited use of voice in speech
• Range of the fundamental frequency
• Vocal intensity range
• Spectral variation

Focus in this presentation

How do acoustic voice source characteristics vary as a function of F0 and vocal intensity?

Voice profile measurement

Thirties: intensity range as a function of pitch
– manual measurement

Eighties: automatic computation of F0 and intensity
– computer measurement
– visual feedback
– additional parameters

Measurement unit

• One decibel
• One semitone
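
As a rough illustration of these units, one (F0, intensity) sample can be mapped to a voice profile cell that is one semitone wide and one decibel high. This is only a sketch under stated assumptions: the reference frequency of 55 Hz and the function name `profile_cell` are hypothetical choices, not taken from the measurement software described here.

```python
import math

REF_HZ = 55.0  # assumed reference frequency; any fixed reference would do


def profile_cell(f0_hz, level_db):
    """Map one (F0, intensity) sample to a 1-semitone x 1-dB cell index."""
    if f0_hz <= 0:
        raise ValueError("F0 must be positive")
    # Distance from the reference frequency in semitones (12 per octave).
    semitones = 12.0 * math.log2(f0_hz / REF_HZ)
    # Round to the nearest whole semitone and the nearest whole decibel.
    return (math.floor(semitones + 0.5), math.floor(level_db + 0.5))


# 110 Hz is one octave (12 semitones) above the assumed reference.
print(profile_cell(110.0, 70.2))
```

Working on a semitone scale rather than in Hz keeps the cells equally wide in musical terms across the whole pitch range.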

Measurement procedure

• Subject in front of computer screen
• Microphone on headset (30 cm)
• Just phonate, sing, and see the result immediately
• Best results with recording protocol
• Feedback stimulates extreme phonations

[Figure: Voice profile / density. Axes: fundamental frequency (Hz) and vocal intensity (dB SPL); scale: sample density.]

[Figure: Voice profile / speech area. Axes: fundamental frequency (Hz) and vocal intensity (dB SPL); scale: sample density.]

Acoustic voice quality parameters

• Jitter
  – Stability of periodicity
  – Asymmetry in vocal folds
• Crest factor
  – Max amplitude divided by average energy
  – Relates to spectral slope
• Many more …
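
Both parameters can be sketched in a few lines. The slides do not give exact formulas, so the definitions below are assumptions: the crest factor is taken as peak amplitude over RMS amplitude, expressed in dB, and jitter as the mean absolute difference between consecutive periods relative to the mean period (often called local jitter). The function names are hypothetical.

```python
import math


def crest_factor_db(samples):
    """Peak amplitude over RMS amplitude, expressed in dB (assumed definition)."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(peak / rms)


def local_jitter_percent(periods):
    """Mean absolute difference of consecutive periods, relative to the mean period."""
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return 100.0 * mean_diff / mean_period


# One cycle of a sine wave: the crest factor comes out close to 3 dB.
sine = [math.sin(2 * math.pi * n / 1000) for n in range(1000)]
print(round(crest_factor_db(sine), 2))

# Perfectly regular periods (in ms) give zero jitter.
print(local_jitter_percent([10.0, 10.0, 10.0, 10.0]))
```

A signal with sharper pulses and richer harmonics has a larger peak relative to its RMS level, which is one way to see why the crest factor relates to spectral slope.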

[Figure: Crest factor. Axes: fundamental frequency (Hz) and vocal intensity (dB SPL); scale: crest factor.]

[Figure: Jitter. Axes: fundamental frequency (Hz) and vocal intensity (dB SPL); scale from regular to irregular.]

Real time presentation

Screen presentation
• One data point per F0-I cell

Advanced data storage [new]
• Full audio signal
• Full distribution of data per F0-I cell
• Data for screen presentation
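
The difference between the two storage levels can be sketched with a small container that keeps every parameter value per F0-I cell and derives a single display value from it. The class name `ProfileStore` and the use of the median as the displayed value are assumptions for illustration; the slides do not say how the screen value is chosen, and storage of the audio signal itself is left out.

```python
from collections import defaultdict
from statistics import median


class ProfileStore:
    """Keeps every parameter value per F0-I cell, plus one summary for display."""

    def __init__(self):
        # Maps a (semitone, dB) cell to the list of all values observed in it.
        self._values = defaultdict(list)

    def add(self, cell, value):
        self._values[cell].append(value)

    def distribution(self, cell):
        """Full list of values stored for one cell (empty if never visited)."""
        return list(self._values.get(cell, []))

    def display_value(self, cell):
        """Single value for the screen; the median is an assumed choice."""
        return median(self._values[cell])


store = ProfileStore()
for crest in (5.0, 6.5, 5.5):
    store.add((12, 70), crest)
print(store.distribution((12, 70)), store.display_value((12, 70)))
```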

Advantages

• Reusability of recordings

• Statistical analysis per F0-I cell

• Study of time-varying behavior

[Figure: Crest factor. Axes: fundamental frequency (Hz) and vocal intensity (dB SPL); scale: crest factor.]

Median smoothing of crest factor

[Figure: Crest factor, median smoothed. Axes: fundamental frequency (Hz) and vocal intensity (dB SPL); scale: crest factor.]
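
One way to picture median smoothing over a voice profile is to replace each filled cell by the median of the filled cells around it. The sketch below assumes a 3 x 3 neighbourhood and a dictionary keyed by (F0 index, intensity index); neither the window size nor the data layout comes from the slides.

```python
from statistics import median


def median_smooth(grid, radius=1):
    """Replace each filled cell by the median over its filled neighbours (itself included)."""
    smoothed = {}
    for (f, i) in grid:
        window = [
            grid[(f + df, i + di)]
            for df in range(-radius, radius + 1)
            for di in range(-radius, radius + 1)
            if (f + df, i + di) in grid  # skip cells that were never filled
        ]
        smoothed[(f, i)] = median(window)
    return smoothed


# A 3 x 3 patch of crest factors (dB) with one outlier in the centre.
patch = {(f, i): 5.0 for f in range(3) for i in range(3)}
patch[(1, 1)] = 12.0
print(median_smooth(patch)[(1, 1)])
```

Unlike a mean filter, the median removes isolated outliers without blurring sharp transitions between regions of the profile.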

Vocal Registers

Different movement patterns of the vocal folds

• Pulse register (creaky voice)
• Modal register
• Falsetto register

Pulse register

• Less than 50 Hz
• Irregular
• Long closed period

[Figure: Pulse register. Axes: fundamental frequency (Hz) and vocal intensity (dB SPL).]

Modal register

• “Normal” use of voice
• Active role of M. Vocalis
• Vocal folds thick and completely vibrating
• Wide range in F0 and intensity
• Flat spectrum

[Figure: Modal register. Axes: fundamental frequency (Hz) and vocal intensity (dB SPL).]

Falsetto register

• Higher pitches
• M. Vocalis passive, vocal ligaments tensed by M. Cricothyroideus
• Edge vibration of vocal folds
• Sound poor in higher harmonics (in untrained subjects)

[Figure: Falsetto register. Axes: fundamental frequency (Hz) and vocal intensity (dB SPL).]

[Figure: Register overlap. Axes: fundamental frequency (Hz) and vocal intensity (dB SPL).]

Chest and head voice

Refer to secondary vibratory sensations in the body

• Chest voice: loud modal register
• Head voice:
  – males: higher, softer modal register in overlap area with falsetto register
  – women: falsetto register

[Figure: Chest voice and head voice. Axes: fundamental frequency (Hz) and vocal intensity (dB SPL); labelled areas: chest, head.]

Registers and voice profiles

With a description using

• Iso-crest factor lines
• Iso-jitter lines

[Figure: Iso-crest factor lines at 4 dB and 6 dB. Axes: fundamental frequency (Hz) and vocal intensity (dB SPL); scale: crest factor.]

[Figure: Iso-jitter lines at 3 %. Axes: fundamental frequency (Hz) and vocal intensity (dB SPL); scale: jitter (%).]

New representation

• Areas defined by iso-parameter lines
  – crest factor < 4 dB
  – crest factor > 4 dB, < 6 dB
  – crest factor > 6 dB
  – jitter < 3 %
  – [relative rise time < 6 %]
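
The thresholds above can be applied cell by cell. The sketch below is illustrative only: the function names, the band labels, and the handling of values that fall exactly on 4 dB or 6 dB are assumptions, and the bracketed relative rise time criterion is left out.

```python
def crest_band(crest_db):
    """Assign a crest factor (dB) to one of the three bands bounded by 4 dB and 6 dB."""
    if crest_db < 4.0:
        return "below 4 dB"
    if crest_db < 6.0:  # a value of exactly 4 dB is put in the middle band (assumption)
        return "4 to 6 dB"
    return "above 6 dB"


def describe_cell(crest_db, jitter_pct, jitter_limit=3.0):
    """Combine the crest factor band with the 3 % jitter criterion for one cell."""
    return {
        "crest_band": crest_band(crest_db),
        "jitter_below_limit": jitter_pct < jitter_limit,
    }


print(describe_cell(3.2, 1.0))
```

Cells that share the same description form the areas whose borders are the iso-parameter lines.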

Areas in the phonetogram

[Figure: Areas in the phonetogram. Axes: fundamental frequency (Hz) and vocal intensity (dB SPL); labelled areas: jitter > 3 % (unstable), RRT < 6 % (pressed-like), crest factor < 4 dB (sine-like).]

Vocal registers in the phonetogram

[Figure: Vocal registers in the phonetogram. Axes: fundamental frequency (Hz) and vocal intensity (dB SPL); marked lines: falsetto upper boundary, modal lower boundary, chest voice boundary.]

Comparison of voice profiles

Characterisation of

• Voice pathologies
• Voice classifications

Reuse stored voice profiles of subjects with known voice history

Important features

• Contour has limited value
  – but most research goes in that direction (norm profiles)

• Distribution of acoustical parameters across the voice profile tells much more

We need

• Unit for comparison
  – Voice profile unit defined by a small range of F0 and vocal intensity
• Distributions of acoustic voice parameters per unit
  – Probability density function per parameter
• Model
  – Hidden Markov Model
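
A probability density function per parameter and per unit can be approximated in several ways. The slides do not say which estimator is used, so the normalised histogram below is only one simple option; the bin width of 1 dB, the function name, and the example values are assumptions.

```python
from collections import Counter
import math


def histogram_pdf(values, bin_width=1.0):
    """Normalised histogram: maps each bin's lower edge to the fraction of values in it."""
    counts = Counter(math.floor(v / bin_width) * bin_width for v in values)
    total = len(values)
    return {edge: n / total for edge, n in sorted(counts.items())}


# Hypothetical crest factors (dB) observed in one voice profile unit.
unit_values = [4.2, 4.8, 5.1, 5.3, 5.9, 6.4]
pdf = histogram_pdf(unit_values)
print(pdf)
```

In a Hidden Markov Model, a density of this kind would play the role of the observation distribution attached to a state.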

Unit model

[Diagram: states labelled IN and OUT]

Two unconnected states per phonetogram unit

• vocal registers
• start and end of phonation

Correspondences

Speech                 Voice profile
• phoneme model        F0/I unit model
• not labeled          labeled by F0 and I
• spectral envelope    acoustic voice parameters
• language model       unrestricted transitions

“forced alignment recognition”

Crest factor distributions

[Figure: four histogram panels: training subject 1, test subject 1, training subject 2, test subject 2. Horizontal axis ticks from 4 to 15; vertical axis from 0 to 500.]
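
To make the idea of comparing training and test distributions concrete, the sketch below scores how much two sets of crest factor values overlap. Histogram intersection is a swapped-in, simple measure chosen for illustration, not necessarily the comparison used in this work, and all values are made up rather than read from the panels.

```python
from collections import Counter
import math


def histogram(values, bin_width=1.0):
    """Fraction of values falling in each bin of the given width."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    return {b: n / len(values) for b, n in counts.items()}


def overlap(values_a, values_b, bin_width=1.0):
    """Histogram intersection: near 1 for matching distributions, 0 for disjoint ones."""
    pa = histogram(values_a, bin_width)
    pb = histogram(values_b, bin_width)
    return sum(min(pa.get(b, 0.0), pb.get(b, 0.0)) for b in set(pa) | set(pb))


# Made-up crest factors (dB), not the data shown in the figure.
train_1 = [5.1, 5.4, 6.2, 6.8, 7.1]
test_1 = [5.3, 5.9, 6.1, 6.5, 7.4]
train_2 = [9.2, 9.8, 10.1, 10.7, 11.3]
print(overlap(train_1, test_1), overlap(train_2, test_1))
```

A test recording would then be attributed to the subject whose training distribution it overlaps most, cell by cell.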

 

[Figure: Most distinctive states. Axes: fundamental frequency (Hz) and vocal intensity (dB SPL); scale: distinctiveness.]

Conclusions

• Voice profiles can enhance our understanding of vocal behaviour in a visually attractive way

• Current data storage opens a series of important research topics

• Market opportunities for “light” versions