Speech Coding Basics
A Tutorial
Mahdi Amiri
Supervisor: Dr. H. R. Rabiee
April 2009
Sharif University of Technology
Page 2 of 30 Speech Coding Basics
Speech Coding: A road map
PCM, DPCM, ADPCM, LPC, CELP
Pulse-code Modulation (PCM): Basics
Digital representation of an analog signal: sampling and quantization
Parameters:
– Sampling rate (samples per second)
– Quantization levels (bits per sample)
Pulse-code Modulation (PCM): Why Call it PCM?
[Figure: 4-bit PCM]
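
The 4-bit PCM idea can be sketched in a few lines of Python: each sample is mapped to one of 2^4 = 16 integer codes, and the decoder maps each code back to an amplitude. This is only an illustration. The function names `pcm_encode` and `pcm_decode`, the [-1, 1] amplitude range, and the 8-samples-per-cycle sine test signal are assumptions, not taken from the slides.

```python
import math

def pcm_encode(signal, bits):
    """Map amplitudes in [-1, 1] to integer codes 0 .. 2**bits - 1."""
    levels = 2 ** bits
    codes = []
    for x in signal:
        x = max(-1.0, min(1.0, x))  # clip to the assumed input range
        codes.append(int((x + 1.0) / 2.0 * (levels - 1) + 0.5))  # round to nearest level
    return codes

def pcm_decode(codes, bits):
    """Map integer codes back to amplitudes in [-1, 1]."""
    levels = 2 ** bits
    return [c / (levels - 1) * 2.0 - 1.0 for c in codes]

# One cycle of a sine wave, sampled 8 times and coded with 4 bits per sample.
sine = [math.sin(2 * math.pi * n / 8) for n in range(8)]
codes = pcm_encode(sine, 4)
print(codes)
print([round(v, 3) for v in pcm_decode(codes, 4)])
```

The decoded values differ from the input by at most half a quantization step, which is the quantization noise discussed later in the slides.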
Pulse-code Modulation (PCM): Bit per Second (bit/s)
How to choose a proper…
– Sampling rate
• 8 kHz?
– Quantization level
• 8 bit/sample?
Bit rate for 8000 Hz, 8-bit PCM
– 8000 samples/s × 8 bit/sample = 64 kbit/s
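
The bit-rate arithmetic is just sampling rate times bits per sample. A minimal sketch follows; the helper name `pcm_bit_rate` and its `channels` parameter are assumptions added for illustration.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed PCM bit rate in bit/s."""
    return sample_rate_hz * bits_per_sample * channels

print(pcm_bit_rate(8000, 8))                # 64000 bit/s, i.e. 64 kbit/s telephone PCM
print(pcm_bit_rate(44100, 16, channels=2))  # 1411200 bit/s for 16-bit stereo at the CD rate
```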
Pulse-code Modulation (PCM): Sampling Rate
Human hearing frequency range
– 20 Hz to 20 kHz
– Play with “HearTest” to test your hearing
– Most people will find that their hearing is most sensitive around 1-4 kHz and that it is less sensitive at high and low frequencies.
Pulse-code Modulation (PCM): Hearing Range
[Figure: hearing range]
Pulse-code Modulation (PCM): Sampling Rate
Human vocal range
– Normal: 80 Hz to 1100 Hz
– Charles Kellogg (14 kHz) (not verified)
– Guinness Book of Records
• Female: Georgia Brown (eight octaves, 25,087 Hz)
• Male: Tim Storms (six octaves)
Pulse-code Modulation (PCM): Common Sampling Rates
– 8,000 Hz: telephone, adequate for human speech
– 11,025 Hz
– 22,050 Hz: radio
– 32,000 Hz: miniDV digital video camcorder, DAT (LP mode)
– 44,100 Hz: audio CD, also most commonly used with MPEG-1 audio (VCD, SVCD, MP3)
– 48,000 Hz: digital sound used for miniDV, digital TV, DVD, DAT, films and professional audio
– 96,000 or 192,000 Hz: DVD-Audio, some LPCM DVD tracks, BD-ROM (Blu-ray Disc) audio tracks, and HD-DVD (High-Definition DVD) audio tracks
– 2.8224 MHz: SACD, 1-bit sigma-delta modulation process known as Direct Stream Digital, co-developed by Sony and Philips
Pulse-code Modulation (PCM): Quantization Levels
Want to prevent human ear fatigue by minimizing quantization noise.
Signal-to-noise ratio: SNR = 6.02B dB for B bits per sample, i.e. approximately 6 dB per bit.
– 16-bit => 96 dB
– Above 36 dB is required
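
The 6.02B dB rule can be checked numerically. The sketch below quantizes a full-scale, uniformly distributed test signal with a uniform quantizer and compares the measured SNR with 6.02B. The test signal, the mid-rise quantizer, and the names `quantize` and `measured_snr_db` are assumptions chosen for illustration; other signal shapes (for example a full-scale sine) shift the result by a constant offset.

```python
import math
import random

def quantize(x, bits):
    """Mid-rise uniform quantizer over [-1, 1) with 2**bits levels."""
    step = 2.0 / (2 ** bits)
    return (math.floor(x / step) + 0.5) * step

def measured_snr_db(bits, n=100_000, seed=0):
    """Estimate the SNR in dB from n random full-scale samples."""
    rng = random.Random(seed)
    signal_power = 0.0
    noise_power = 0.0
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        e = x - quantize(x, bits)  # quantization error
        signal_power += x * x
        noise_power += e * e
    return 10.0 * math.log10(signal_power / noise_power)

for b in (4, 8, 16):
    print(b, "bits: rule", round(6.02 * b, 2), "dB, measured", round(measured_snr_db(b), 2), "dB")
```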
Pulse-code Modulation (PCM): Good to Know
– The average person cannot tell the difference between audio compressed at a bitrate above 192 kbit/s and the original CD/WAV.
– Even if your headphones seal really well around your ears, they will probably only give you about 20 to 25 dB insulation from the external sound.
Pulse-code Modulation (PCM): Images
[Figure]
Pulse-code Modulation (PCM): μ-law, A-law
Nonuniform quantizers: difficult to make, expensive.
Solution: companding (compress, uniform quantization, expand)
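
The companding chain (compress, quantize uniformly, expand) can be sketched with the continuous μ-law curve. This is an illustration under assumptions: μ = 255 is the value used in G.711 telephony, but real G.711 codecs use a piecewise-linear approximation rather than the formula below, and the function names are hypothetical.

```python
import math

MU = 255.0  # assumed mu value (the one used in telephony)

def mu_compress(x, mu=MU):
    """Compress x in [-1, 1]: sgn(x) * ln(1 + mu*|x|) / ln(1 + mu)."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_expand(y, mu=MU):
    """Inverse of mu_compress: sgn(y) * ((1 + mu)**|y| - 1) / mu."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def uniform_quantize(x, bits):
    """Mid-rise uniform quantizer over [-1, 1], clamped to 2**bits levels."""
    step = 2.0 / (2 ** bits)
    half = 2 ** (bits - 1)
    idx = max(-half, min(half - 1, math.floor(x / step)))
    return (idx + 0.5) * step

def compand_quantize(x, bits=8):
    """Compress, quantize uniformly, then expand."""
    return mu_expand(uniform_quantize(mu_compress(x), bits))

quiet = 0.01  # a low-amplitude sample, typical of soft speech
print("uniform error:  ", abs(uniform_quantize(quiet, 8) - quiet))
print("companded error:", abs(compand_quantize(quiet, 8) - quiet))
```

For the quiet sample the companded error comes out smaller than the plain uniform error, which is the reason for companding: quantization steps are effectively finer at low amplitudes.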
Pulse-code Modulation (PCM): μ-law, A-law
[Figure: μ-law, A-law]
Pulse-code Modulation (PCM): μ-law, A-law
– μ-law: North America and Japan
– A-law: Europe
Differential PCM (DPCM): Idea
Differential PCM (DPCM): Basic Scheme
General Predictive Coding: predictor $\sum_i a_i x_{n-i}$
Delta Modulation (DM): the predictor reduces to the unit delay $z^{-1}$
Problem?
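
Delta modulation can be sketched as a 1-bit DPCM whose prediction is simply the previous reconstructed sample. The code below is an assumption-laden illustration (fixed step size, zero initial state, hypothetical names `dm_encode` and `dm_decode`). It also shows one commonly cited problem of a fixed step, slope overload, which may be what the "Problem?" prompt refers to.

```python
def dm_encode(samples, step):
    """Emit 1 if the input is at or above the running approximation, else 0."""
    bits = []
    approx = 0.0  # prediction = previous reconstructed value
    for x in samples:
        bit = 1 if x >= approx else 0
        approx += step if bit else -step
        bits.append(bit)
    return bits

def dm_decode(bits, step):
    """Rebuild the staircase approximation from the bit stream."""
    out = []
    approx = 0.0
    for bit in bits:
        approx += step if bit else -step
        out.append(approx)
    return out

slow = [0.05 * n for n in range(20)]  # slope below the step size
fast = [0.5 * n for n in range(20)]   # slope above the step size
for name, sig in (("slow", slow), ("fast", fast)):
    rec = dm_decode(dm_encode(sig, 0.1), 0.1)
    print(name, "max error:", round(max(abs(a - b) for a, b in zip(sig, rec)), 3))
```

The slow ramp is tracked to within one step, while the fast ramp runs away from the staircase because the decoder can move by only one step per sample.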
Differential PCM (DPCM): Better Structure
[Figure: better structure]
Adaptive DPCM (ADPCM): Idea
Problem?
Adaptive DPCM (ADPCM): Size of Quantization Step
ADM: $\Delta[n] = M\,\Delta[n-1]$
$M = P > 1$ if $c[n] = c[n-1]$
$M = Q < 1$ if $c[n] \neq c[n-1]$
$P = 2,\ Q = \tfrac{1}{2}$
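
The step-size rule can be sketched as an adaptive delta modulator: the step is multiplied by P when two consecutive output bits agree and by Q when they differ. The values P = 2 and Q = 1/2 follow the slide. The initial step `step0`, the clamps `min_step` and `max_step`, and all function names are assumptions added to keep the sketch well behaved.

```python
def _adapt(step, bit, prev_bit, P, Q, min_step, max_step):
    """Step update: multiply by P if the bit repeats, by Q if it flips."""
    if prev_bit is None:  # first sample: keep the initial step
        return step
    m = P if bit == prev_bit else Q
    return min(max_step, max(min_step, m * step))

def adm_encode(samples, step0=0.1, P=2.0, Q=0.5, min_step=1e-3, max_step=10.0):
    approx, step, prev = 0.0, step0, None
    bits = []
    for x in samples:
        bit = 1 if x >= approx else 0
        step = _adapt(step, bit, prev, P, Q, min_step, max_step)
        approx += step if bit else -step
        bits.append(bit)
        prev = bit
    return bits

def adm_decode(bits, step0=0.1, P=2.0, Q=0.5, min_step=1e-3, max_step=10.0):
    # The decoder repeats the same adaptation, since it depends only on the bits.
    approx, step, prev = 0.0, step0, None
    out = []
    for bit in bits:
        step = _adapt(step, bit, prev, P, Q, min_step, max_step)
        approx += step if bit else -step
        out.append(approx)
        prev = bit
    return out

ramp = [0.5 * n for n in range(20)]  # steep ramp that a fixed 0.1 step cannot follow
rec = adm_decode(adm_encode(ramp))
print("max error:", round(max(abs(a - b) for a, b in zip(ramp, rec)), 3))
```

Because the step grows while the bits keep repeating, the reconstruction catches up with the steep ramp instead of falling steadily behind.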
Speech Compression Concepts: Spectrogram, STFT
[Figure: 3D surface spectrogram of a part from a music piece.]
Speech Compression Concepts: Spectrogram
[Figure: spectrogram of a male voice saying ‘nineteenth century’.]
Speech Compression Concepts: Spectrogram, Demonstration
[Figure panels: Bat Echolocation Call; Flute by Jean Pierre Rampal; Singing Voice; Face!]
Speech Compression Concepts: Formant
[Figure: formant]
Linear Predictive Coding (LPC): Modeling
Linear Predictive Coding (LPC): Modeling (Hiss or Buzz)
Buzzer → Filter
Speech = Formants + Residue
Predictor for each frame:
$\hat{x}[n] = \sum_{i=1}^{P} a_i\, x[n-i]$
Chunks: 30 to 50 frames per second
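
One common way to obtain the predictor coefficients a_i for a frame is the autocorrelation method solved with the Levinson-Durbin recursion. The slides do not say which method is intended, so treat this as a sketch under that assumption. The names `autocorr`, `lpc`, and `residue`, and the synthetic second-order test frame, are hypothetical, and silent frames (zero energy) are not handled.

```python
def autocorr(x, max_lag):
    """r[k] = sum over n of x[n] * x[n-k], for k = 0 .. max_lag."""
    n = len(x)
    return [sum(x[t] * x[t - k] for t in range(k, n)) for k in range(max_lag + 1)]

def lpc(x, order):
    """Return [a_1, ..., a_P] for the predictor x_hat[n] = sum_i a_i * x[n-i]."""
    r = autocorr(x, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]               # prediction error energy
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err        # reflection coefficient
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]

def residue(x, a):
    """Prediction error e[n] = x[n] - sum_i a_i * x[n-i] (missing past samples count as 0)."""
    out = []
    for n in range(len(x)):
        pred = sum(a[i] * x[n - 1 - i] for i in range(len(a)) if n - 1 - i >= 0)
        out.append(x[n] - pred)
    return out

# Synthetic frame: impulse response of x[n] = 1.3*x[n-1] - 0.81*x[n-2] + impulse.
frame = [1.0, 1.3]
for _ in range(398):
    frame.append(1.3 * frame[-1] - 0.81 * frame[-2])

coeffs = lpc(frame, 2)
print([round(c, 6) for c in coeffs])
print("largest residue after the first sample:", max(abs(e) for e in residue(frame, coeffs)[1:]))
```

On this synthetic frame the recovered coefficients match the ones used to generate it and the residue is essentially a single pulse, which mirrors the "Speech = Formants + Residue" split: the filter coefficients carry the spectral shape and the residue carries what the predictor cannot explain.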
Linear Predictive Coding (LPC): Modeling (Hiss or Buzz)
[Figure: hiss or buzz model]
Code Excited Linear Prediction (CELP)
Problem of LPC
– Where there is both Hiss and Buzz
Solution
– Encode residue
Method
– Vector Quantization (Codebook)
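
The codebook idea can be sketched as a nearest-neighbour search: the encoder sends only the index of the codebook entry closest to the residue vector, and the decoder looks that entry up. This is a deliberate simplification. Real CELP coders search the codebook by analysis-by-synthesis with gain terms and perceptual weighting, and the tiny codebook and the name `nearest_codeword` here are made up for illustration.

```python
def nearest_codeword(vector, codebook):
    """Return the index of the codebook entry with the smallest squared error."""
    def sq_err(entry):
        return sum((v - w) ** 2 for v, w in zip(vector, entry))
    return min(range(len(codebook)), key=lambda i: sq_err(codebook[i]))

# Hypothetical 3-entry codebook of length-4 excitation vectors.
codebook = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, -1.0, 1.0, -1.0],
    [1.0, 1.0, 0.0, 0.0],
]
residue_vec = [0.9, -1.1, 0.8, -0.7]
index = nearest_codeword(residue_vec, codebook)
print(index, codebook[index])  # the index is what gets transmitted
```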
Comparison: Sample Speech
A lathe is a big tool. Grab every dish of sugar.
Comparison: Demonstration
Original, ADPCM, LPC, CELP
Speech Coding Basics: A Tutorial
Thank You
FIND OUT MORE AT...
1. http://ce.sharif.edu/~m_amiri/
2. http://www.aictct.ir/dml/