rick-mckinnon
Phonetics
The Creation of Speech Sounds
Anatomy and phonetics
• Speech sounds are created by air pushed out of the lungs; air vibrates as it passes through the vocal tract
• Different positions of the vocal folds, the tongue, the lips, and other articulators in the mouth modify the airflow, producing different speech sounds
Periodic vibrations
• When a tuning fork vibrates, it creates successive waves of compression and rarefaction among air molecules.
Periodic vibrations
• We can plot these changes in density as a sine wave, where the horizontal axis represents time and the vertical axis represents the density of air molecules.
Sine wave (figure; axes: amplitude against time)
Phase
• Two waves can have the same frequency but different phase. Combining two waves increases the amplitude if they are in phase and decreases it if they are out of phase.
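The effect of phase on the combined amplitude can be checked numerically. The sketch below is only an illustration: the 100 Hz frequency, the unit amplitudes, and the function names are assumptions chosen for the example, not values from the slides.

```python
import math

def sine(t, freq_hz, amplitude=1.0, phase_rad=0.0):
    """Value of a sine wave at time t (in seconds)."""
    return amplitude * math.sin(2 * math.pi * freq_hz * t + phase_rad)

def peak_of_sum(freq_hz, phase_rad, n_samples=1000):
    """Peak absolute value, over one period, of the sum of two
    unit-amplitude waves of the same frequency, where the second
    wave is shifted by phase_rad."""
    period = 1.0 / freq_hz
    times = [i * period / n_samples for i in range(n_samples)]
    return max(
        abs(sine(t, freq_hz) + sine(t, freq_hz, phase_rad=phase_rad))
        for t in times
    )

# Assumed example frequency of 100 Hz.
in_phase = peak_of_sum(100.0, 0.0)          # waves reinforce each other
out_of_phase = peak_of_sum(100.0, math.pi)  # waves cancel each other
print(f"in phase: {in_phase:.3f}, out of phase: {out_of_phase:.3f}")
```

With a phase shift of zero the peak doubles; with a shift of half a cycle (π radians) the two waves cancel almost exactly.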
Amplitude
• Two waves can have the same phase, but different amplitudes.
Complex waves
• The combination of sine waves can result in a complex wave with virtually any shape.
Complex waves
• Adding higher harmonics gives this wave a sawtooth shape.
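As a rough illustration of how harmonics build up a sawtooth, the sketch below sums the standard Fourier series of a sawtooth wave (each harmonic n weighted by 1/n, with alternating sign) and compares the partial sum with an ideal ramp. The 100 Hz fundamental and the function names are assumptions made for the example.

```python
import math

def sawtooth_partial_sum(t, f0_hz, n_harmonics):
    """Sum of the first n_harmonics terms of the sawtooth Fourier series."""
    total = 0.0
    for n in range(1, n_harmonics + 1):
        sign = 1.0 if n % 2 == 1 else -1.0  # signs alternate +, -, +, ...
        total += sign * math.sin(2 * math.pi * n * f0_hz * t) / n
    return (2.0 / math.pi) * total

def ideal_sawtooth(t, f0_hz):
    """Ramp from -1 to 1 across each period, crossing zero at t = 0."""
    cycles = t * f0_hz
    return 2.0 * (cycles - math.floor(cycles + 0.5))

F0_HZ = 100.0            # assumed fundamental frequency
t = 0.25 / F0_HZ         # a quarter of the way through one period

def error(n_harmonics):
    """Distance between the partial sum and the ideal ramp at time t."""
    return abs(sawtooth_partial_sum(t, F0_HZ, n_harmonics) - ideal_sawtooth(t, F0_HZ))

for n in (1, 5, 50):
    print(f"{n:2d} harmonics: error {error(n):.4f}")
```

The error at the sampled point shrinks as harmonics are added. Near the jump of the sawtooth the partial sums keep overshooting (the Gibbs phenomenon), which is why the comparison is made away from the discontinuity.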
Fourier analysis
Power spectrum of a sine wave (figure; axes: amplitude against frequency)
Power spectrum
• This graph shows the fundamental frequency, and the harmonic frequencies (whole number multiples of F0) associated with that fundamental.
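Fourier analysis can be sketched with a plain discrete Fourier transform. The example below builds a complex wave from a fundamental and two harmonics and recovers their frequencies and amplitudes. The signal (F0 = 100 Hz with harmonics at 200 and 300 Hz), the sampling rate, and the function names are assumptions made for the illustration; a real analysis would normally use an FFT library rather than this slow direct sum.

```python
import cmath
import math

def dft_amplitudes(samples):
    """Amplitude of each frequency bin below the Nyquist frequency,
    computed with a direct (slow) discrete Fourier transform."""
    n = len(samples)
    amplitudes = []
    for k in range(n // 2):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        amplitudes.append(2.0 * abs(acc) / n)
    return amplitudes

SAMPLE_RATE_HZ = 1000   # assumed sampling rate
N_SAMPLES = 100         # gives a frequency resolution of 10 Hz
F0_HZ = 100.0           # assumed fundamental frequency

# Complex wave: fundamental plus two weaker harmonics (whole-number multiples of F0).
signal = [
    1.00 * math.sin(2 * math.pi * 1 * F0_HZ * i / SAMPLE_RATE_HZ)
    + 0.50 * math.sin(2 * math.pi * 2 * F0_HZ * i / SAMPLE_RATE_HZ)
    + 0.25 * math.sin(2 * math.pi * 3 * F0_HZ * i / SAMPLE_RATE_HZ)
    for i in range(N_SAMPLES)
]

amplitudes = dft_amplitudes(signal)
# Keep only the bins that carry noticeable energy.
peaks = [
    k * SAMPLE_RATE_HZ / N_SAMPLES
    for k, a in enumerate(amplitudes)
    if a > 0.1
]
print(peaks)
```

The only bins with energy are the fundamental and its whole-number multiples, which is the pattern the power spectrum on the slide displays.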
Aperiodic vibration (noise)
• Noise is characterized by random movement of air molecules. There is no pattern to the vibrations, hence there is energy at all frequencies of sound.
Speech waveform
Source-filter model
Sagittal section of the vocal tract (Techmer 1880).
Lungs
Trachea
Vocal Folds (within the Larynx)
Pharynx
Nasal Cavity
text ©J.J. Ohala, September 2001
Source-filter model
• The lungs provide the power.
• The larynx is a valve.
• Air forced through the larynx causes it to vibrate (Bernoulli’s principle).
• The vibrations resonate in the upper cavities.
Anatomy: the larynx
• The larynx is a highly complex structure that houses the vocal folds. It evolved out of the cartilaginous rings that structure the trachea.
Anatomy: the larynx
• The larynx has two levels of protection for preventing objects from passing into the trachea.
o Epiglottis
o Vocal folds
Anatomy: the larynx
• A complex set of muscles controls the movement of the vocal folds (which are themselves muscles).
Anatomy: the vocal folds
• Vibration of the vocal folds.
o Bernoulli’s principle.
Vocal fold sequence
Glottal waveform
The filter (vocal tract)
• The vocal tract can cause an increase in amplitude of some harmonics.
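The filtering idea can be sketched as a multiplication: each harmonic of the source is scaled by the gain of the vocal tract at that frequency. Everything numeric below is an assumption made for illustration: a 100 Hz fundamental, source harmonics that weaken as 1/n, resonances at 500, 1500, and 2500 Hz (textbook approximations for a neutral, schwa-like vocal tract), and a simple bell-shaped gain curve that is hypothetical rather than a measured filter function.

```python
F0_HZ = 100.0                                 # assumed fundamental frequency
SCHWA_FORMANTS_HZ = (500.0, 1500.0, 2500.0)   # assumed resonance frequencies

def source_amplitude(harmonic_number):
    """Assumed glottal source: higher harmonics are progressively weaker."""
    return 1.0 / harmonic_number

def filter_gain(freq_hz, formants_hz=SCHWA_FORMANTS_HZ,
                peak=10.0, bandwidth_hz=50.0):
    """Hypothetical filter function: a flat gain of 1 plus a
    bell-shaped boost centred on each resonance frequency."""
    gain = 1.0
    for formant in formants_hz:
        gain += peak / (1.0 + ((freq_hz - formant) / bandwidth_hz) ** 2)
    return gain

def output_spectrum(n_harmonics=30):
    """Amplitude of each harmonic after passing through the filter."""
    spectrum = {}
    for n in range(1, n_harmonics + 1):
        freq = n * F0_HZ
        spectrum[freq] = source_amplitude(n) * filter_gain(freq)
    return spectrum

spectrum = output_spectrum()
for freq in (400.0, 500.0, 600.0):
    print(f"{freq:6.0f} Hz: output amplitude {spectrum[freq]:.2f}")
```

At the source, the 500 Hz harmonic is weaker than the 400 Hz one, but because it sits on a resonance it comes out of the filter stronger than both of its neighbours.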
Orangutan vocal tract
• Compare the vocal tract of the Orangutan. Not much volume for resonance to occur in.
Rhesus monkey v-t
• The tongue takes up most of the space in the v-t of non-human primate species.
Homo sapiens v-t
• Notice the “L” shape of the v-t. Humans choke to death far more frequently than other species.
Resonance cavity (vocal tract)
• Position of the tongue creates sub-spaces for resonance of the vocal tone.
The vocal tract
Filter Function for Schwa Vowel
Power Spectrum of Glottal Tone
Output of Vocal Tract Filter Function
Vocal Tract Filter
Filter Function for Specific Vowels
Vowel formants for English
Vowel   F1 (Hz)   F2 (Hz)
/i/     290       2500
/æ/     690       1650
/a/     710       1200
/u/     310       900
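Since each vowel occupies its own region of F1/F2 space, a measured pair of formants can be matched to the closest vowel in the table. The sketch below uses the four values listed above; the straight-line distance in Hz and the function name are simplifying assumptions, and real vowel classification would need speaker normalization and more vowels.

```python
import math

# F1 and F2 values (Hz) taken from the table above.
FORMANTS_HZ = {
    "/i/": (290, 2500),
    "/æ/": (690, 1650),
    "/a/": (710, 1200),
    "/u/": (310, 900),
}

def nearest_vowel(f1_hz, f2_hz):
    """Return the vowel whose (F1, F2) pair is closest to the measurement.
    Uses plain Euclidean distance in Hz, which is a simplification."""
    return min(
        FORMANTS_HZ,
        key=lambda vowel: math.hypot(
            f1_hz - FORMANTS_HZ[vowel][0],
            f2_hz - FORMANTS_HZ[vowel][1],
        ),
    )

# Hypothetical measurements, chosen only to exercise the function.
print(nearest_vowel(300, 2400))
print(nearest_vowel(700, 1250))
```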
American Vowels
Other Vowel Systems
Consonants
• Compare the musculature of non-human primates with respect to the articulation of speech sounds.
• Humans have far more specific control over the facial muscles that control the lips and jaw.
The modularity of Speech
• Two aspects of modularity
-- language vs. general cognitive system
-- linguistic subsystems
• The importance of modularity of speech
• Two pieces of supporting evidence for the modularity of speech
-- problem of invariance
-- categorical perception
Two pieces of supporting evidence for speech as a modular system
1). Problem of invariance
• The relationship between acoustic stimulus and perceptual experience is complex in the case of speech. The fact that there is no one-to-one correspondence between acoustic cues and perceptual events has been termed the lack of invariance.
Why is the question of modularity important?
It is related to the question of how the brain is organized for language, and to language development and disorders.
If speech is a modular system, it should have a specialized neurological representation. That representation would not be based on general cognitive functioning (working memory, episodic memory, and so on) but would be specific to language.
It would be the basis for the perception of language in young infants and, if damaged, the reason that certain individuals suffer quite specific breakdowns in language functioning.
The phoneme /t/ and its allophones
• [t] as in stop: acoustic pattern 1
• [tʰ] as in top: acoustic pattern 2
• [ɾ] as in little: acoustic pattern 3
• [ʔ] as in kitten: acoustic pattern 4
• …: acoustic pattern 5
Perception of /t/
Why does the problem of invariance support the hypothesis that speech is a modular system?
• When hearing the physical sound [d], how does a listener quickly solve the problem of its real identity (e.g., /t/ or /d/)?
This is more complex than ordinary auditory perception, which suggests that speech is a special mode of perception.
Two pieces of supporting evidence for speech as a modular system
• 2). Categorical perception of initial consonants, based on VOT. (Are there any differences between language and other cognitive functions such as vision?)
To comprehend speech, we must impose an absolute (or categorical) identification on the incoming speech signal rather than simply a relative determination of the various physical characteristics of the signal.
Auditory cues such as frequency and intensity will play a role, but ultimately the result of speech perception is the identification of a stimulus as belonging to one or another category of speech sound.
VOT—voice onset time
• In the case of oral stops, the airflow is blocked completely, causing pressure to build up. The obstruction in the mouth is then suddenly opened; the released airflow produces a sudden impulse in pressure causing an audible sound.
• VOT is measured relative to the stop release burst.
VOT – voice onset time
• On a speech spectrogram it is possible to identify the difference between the voiced sound [ba] and the voiceless sound [pa] as due to the time between when the sound is released at the lips and when the vocal cords begin vibrating.
• With voiced sounds, the vibration occurs immediately; however, with voiceless sounds it occurs after a short delay. This lag, the voice onset time, is an important cue in the perception of the voicing feature.
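Categorical perception based on VOT can be sketched as a two-step computation: measure the lag between the release burst and the onset of voicing, then assign a category depending on which side of a boundary the lag falls. The 25 ms boundary, the /b/ and /p/ labels for the [ba]/[pa] contrast, and the function names are assumptions made for the illustration; the slides do not give a boundary value, and actual boundaries vary by language and place of articulation.

```python
BOUNDARY_MS = 25.0  # assumed category boundary, for illustration only

def voice_onset_time_ms(burst_time_ms, voicing_onset_ms):
    """VOT: time from the stop release burst to the start of
    vocal fold vibration."""
    return voicing_onset_ms - burst_time_ms

def perceived_category(vot_ms, boundary_ms=BOUNDARY_MS):
    """Map a continuous VOT value onto one of two categories."""
    return "/b/" if vot_ms < boundary_ms else "/p/"

# A continuum of VOT values in equal 10 ms steps (hypothetical stimuli).
continuum_ms = list(range(0, 70, 10))
labels = [perceived_category(vot) for vot in continuum_ms]
for vot, label in zip(continuum_ms, labels):
    print(f"VOT {vot:2d} ms -> {label}")
```

Although the stimuli change in equal physical steps, the labels change only once, at the boundary. This is the absolute (categorical) identification described above, as opposed to a relative judgment of the physical signal.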