AUDIO
Muhammad Aminul Akbar
WHAT IS SOUND?
Sound is a (pressure) wave which is created by a vibrating object.
HOW IS SOUND PRODUCED?
A vibrating object sets the particles of the surrounding
medium (typically air) into vibrational motion
Molecules in air are disturbed, one bumping against another
An area of high pressure moves through the air in a wave
Thus a wave representing the changing air pressure can be used to
represent sound
• The sound wave is referred
to as a longitudinal wave.
• The result of longitudinal
waves is the creation of
compressions and
rarefactions within the air.
THE SOUND WAVE
WAVELENGTH, AMPLITUDE, FREQUENCY OF A WAVE
Amplitude: the maximum displacement of a moving particle in a medium from its equilibrium position
The frequency f of a wave is measured as the number of complete back-and-forth vibrations of a particle of the medium per unit of time.
1 Hertz = 1 vibration/second
Depending on the medium, sound travels at some speed c, which defines the wavelength λ:
λ = c/f
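As a quick numeric check of λ = c/f (assuming a speed of sound in air of about 343 m/s, a value not given in the text):

```python
SPEED_OF_SOUND_AIR = 343.0  # m/s at room temperature (assumed value)

def wavelength(frequency_hz, speed=SPEED_OF_SOUND_AIR):
    """Return the wavelength in metres for a frequency in hertz."""
    return speed / frequency_hz

print(wavelength(440))    # concert A: about 0.78 m
print(wavelength(20000))  # upper limit of hearing: about 0.017 m
```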
MEASURING THE INTENSITY OF SOUND
Normally, sound intensity is measured as a ratio relative to some standard intensity I0. We define the relative sound intensity level as:
SL(dB) = 10 log10( I / I0 )
I is the intensity of the sound expressed in watts per square meter (W/m^2), and I0 is the reference intensity, defined to be 10^-12 W/m^2. This value of I0 is the threshold (minimum sound intensity) of hearing at 1 kHz for a young person under the best circumstances.
EXAMPLES:
If we measured a sound intensity to be 100 times greater than the threshold reference, what would be the sound level expressed in dB?
SL(dB) = 10 log10( (100 x 10^-12) / 10^-12 ) = 10 log10(100) = 20 dB
EXAMPLE 2
The threshold of pain is about 120 dB. How many times greater in intensity (in W/m^2) is this?
Since 120 = 10 log10( I / I0 ), we get I = I0 x 10^12 = 10^-12 x 10^12 = 1 W/m^2, i.e. 10^12 times the threshold intensity.
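Both worked examples can be checked with a short script (the helper names are illustrative):

```python
import math

I0 = 1e-12  # reference intensity (threshold of hearing), W/m^2

def sound_level_db(intensity):
    """Relative sound intensity level: 10 * log10(I / I0)."""
    return 10 * math.log10(intensity / I0)

def intensity_from_db(level_db):
    """Invert the dB formula to recover intensity in W/m^2."""
    return I0 * 10 ** (level_db / 10)

print(sound_level_db(100 * I0))  # Example 1: 20.0 dB
print(intensity_from_db(120))    # Example 2: 1.0 W/m^2
```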
DIGITIZATION OF SOUND
Microphones and video cameras produce analog signals (continuous-valued voltages), as illustrated in the figure below.
To get audio or video into a computer, we have to digitize it (convert it into a stream of numbers); that is, we need analog-to-digital conversion.
SAMPLING AND QUANTIZATION
1. Sampling - divide the horizontal axis (the time dimension) into discrete pieces
2. Quantization - divide the vertical axis (signal strength) into discrete pieces
SAMPLING
The rate at which sampling is performed is called the sampling frequency
Frequencies over 22.05 kHz are filtered out before sampling is done.
The sampling rate must be at least twice the highest frequency component of the sound (Nyquist Theorem).
The human voice can reach approximately 4 kHz. For audio, typical sampling rates range from 8 kHz (8,000 samples per second) to 48 kHz.
How many samples to take?
• 11.025 kHz -- speech (telephone: 8 kHz)
• 22.05 kHz -- low-grade audio (WWW audio, AM radio)
• 44.1 kHz -- CD quality
QUANTIZATION
Sampling in the amplitude or voltage dimension is called quantization.
Typical uniform quantization rates are 8-bit and 16-bit
8-bit quantization divides the vertical axis into 256 levels, and 16-bit divides it into 65,536 levels.
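As an illustrative sketch (the helper name `quantize` is not from the original), uniform quantization maps a sample in [-1.0, 1.0] onto one of 2^bits integer levels:

```python
def quantize(sample, bits):
    """Uniformly quantize a sample in [-1.0, 1.0] to 2**bits levels,
    returning the integer level index."""
    levels = 2 ** bits
    sample = max(-1.0, min(1.0, sample))       # clamp to valid range
    return int((sample + 1.0) / 2.0 * (levels - 1) + 0.5)

print(2 ** 8, 2 ** 16)   # 256 65536 (levels for 8-bit and 16-bit)
print(quantize(0.0, 8))  # 128 (a mid-scale sample)
```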
NYQUIST'S SAMPLING THEOREM
Suppose we are sampling a sine wave, as in the figure below. How often do we need to sample it to figure out its frequency?
If we sample once per cycle, the signal can appear to be a constant.
If we sample at 1.5 times per cycle, the signal can appear to be a lower-frequency sine wave.
Now if we sample at twice the signal frequency (the Nyquist rate), we start to make some progress: at these sample points we get a sawtooth wave that begins to crudely approximate a sine wave.
Nyquist rate -- for lossless digitization, the sampling rate should be at least twice the maximum frequency in the signal. In practice, sampling many times faster is even better.
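The aliasing behaviour described above can be sketched numerically: an 8 kHz sine sampled at 10 kHz (below its 16 kHz Nyquist rate) produces exactly the same samples as a 2 kHz tone. The helper `sample_sine` is illustrative:

```python
import math

def sample_sine(freq_hz, sample_rate_hz, n_samples):
    """Sample sin(2*pi*f*t) at the given rate."""
    return [math.sin(2 * math.pi * freq_hz * k / sample_rate_hz)
            for k in range(n_samples)]

# An 8 kHz tone sampled at 10 kHz is indistinguishable from a 2 kHz
# alias: 8 kHz folds around half the sampling rate to 10 - 8 = 2 kHz.
undersampled = sample_sine(8000, 10000, 20)
alias = sample_sine(-2000, 10000, 20)
assert all(abs(a - b) < 1e-9 for a, b in zip(undersampled, alias))
```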
The Nyquist sampling theorem guarantees that the sampled data contains all the information of the original signal.
Voice data (speech) is limited to below 4000 Hz.
This requires 8,000 samples per second (2 x 4000 Hz, the Nyquist rate).
Telephone systems can digitize voice with 128 or 256 levels.
These levels are called quantization levels.
With 128 levels, each sample takes 7 bits (2 ^ 7 = 128).
With 256 levels, each sample takes 8 bits (2 ^ 8 = 256).
8,000 samples/sec x 7 bits/sample = 56 kbps for a single voice channel.
8,000 samples/sec x 8 bits/sample = 64 kbps for a single voice channel.
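The channel bit-rate arithmetic above can be wrapped in a small sketch (the function names are illustrative):

```python
import math

def bits_per_sample(levels):
    """Bits needed to encode the given number of quantization levels."""
    return int(math.log2(levels))

def channel_bitrate(sample_rate_hz, levels):
    """Bit rate (bits/s) of a single digitized voice channel."""
    return sample_rate_hz * bits_per_sample(levels)

print(channel_bitrate(8000, 128))  # 56000 bps (7 bits/sample)
print(channel_bitrate(8000, 256))  # 64000 bps (8 bits/sample)
```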
QUANTIZATION INTERVAL
If Vmax is the maximum positive and negative signal amplitude and n is the number of binary bits used, then the magnitude of the quantization interval, q, is defined as follows:
q = 2 Vmax / 2 ^ n
For example, with 8 bits and values ranging from -1000 to +1000: q = 2 x 1000 / 2 ^ 8 = 2000 / 256 ≈ 7.8.
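A minimal sketch of the interval formula (the function name is illustrative):

```python
def quantization_interval(v_max, n_bits):
    """q = 2 * Vmax / 2**n: the width of one uniform quantization step."""
    return 2 * v_max / 2 ** n_bits

print(quantization_interval(1000, 8))  # 2000 / 256 = 7.8125
```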
SIGNAL-TO-QUANTIZATION-NOISE RATIO (SQNR)
For digital signals, we must take into account that only quantized values are stored: with 8 bits, for example, we effectively force all continuous voltage values into only 256 different values.
Inevitably, this introduces a roundoff error. Although it is not really “noise,” it is called quantization noise (or quantization error).
The actual signal may differ from the code word by up to plus or minus q/2, where q is the size of the quantization interval.
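The q/2 error bound can be checked numerically with a simple rounding quantizer (an illustrative sketch; a real codec would also clamp out-of-range samples):

```python
def quantize_uniform(x, v_max, n_bits):
    """Round x to the nearest level of a uniform quantizer over [-Vmax, Vmax]."""
    q = 2 * v_max / 2 ** n_bits
    return round(x / q) * q

v_max, bits = 1.0, 8
q = 2 * v_max / 2 ** bits
for x in [0.123, -0.456, 0.789, 0.999]:
    err = abs(x - quantize_uniform(x, v_max, bits))
    assert err <= q / 2 + 1e-12  # error never exceeds half an interval
```

A related rule of thumb, consistent with this bound, is that each extra bit of resolution adds roughly 6 dB of signal-to-quantization-noise ratio.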
[Figure: quantization intervals and the resulting quantization error]
LINEAR VS. NON-LINEAR QUANTIZATION
In linear quantization, each code word represents a quantization interval of equal length.
In non-linear quantization, you use more digits to represent samples at some levels, and fewer for samples at other levels.
For sound, it is more important to have a finer-grained representation (i.e., more bits) for low amplitude signals than for high because low amplitude signals are more sensitive to noise. Thus, non-linear quantization is used.
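One widely used non-linear scheme is mu-law companding (the mu = 255 variant used in North American telephony): samples are compressed through a logarithmic curve before uniform quantization, so low-amplitude samples get finer effective resolution. A minimal sketch:

```python
import math

MU = 255  # mu-law parameter used in North American telephony

def mu_law_compress(x):
    """Compress a sample in [-1, 1]; small amplitudes are expanded."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Invert the compression."""
    return math.copysign((math.exp(abs(y) * math.log1p(MU)) - 1) / MU, y)

# A weak 1%-of-full-scale sample is stretched to about 23% of the code
# range, so uniform quantization afterwards keeps more of its detail.
print(round(mu_law_compress(0.01), 3))  # 0.228
```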
[Figure: linear vs. non-linear quantization of a strong and a weak signal over 16 quantizing levels (0-15), showing the original signal, its PAM pulses, the quantized PCM pulses with quantization error, and the resulting PCM output 011 100 011 011 001 100]
PULSE CODE MODULATION (PCM)
PCM uses non-linear quantization encoding: the amplitude spacing between levels is not uniform.
There are more quantization steps at low amplitudes.
This reduces overall signal distortion.
Teknik Elektro, Universitas Brawijaya
DELTA MODULATION (DM)
In Delta Modulation, the analog signal is tracked.
The analog input is approximated by a staircase function that moves up or down one level (δ) at each sample interval.
A 1 bit represents a rise in the signal's voltage level, and a 0 bit represents a fall. The output of DM is therefore a single bit per sample.
Delta modulation is also used in various data compression techniques,
e.g. interframe coding techniques for video.
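The staircase tracking described above can be sketched in a few lines (the step size is an arbitrary illustrative value):

```python
def delta_modulate(samples, step=0.1):
    """One-bit-per-sample delta modulation: emit 1 when the staircase
    approximation must step up, 0 when it must step down."""
    bits, approx = [], 0.0
    for s in samples:
        if s >= approx:
            bits.append(1)
            approx += step
        else:
            bits.append(0)
            approx -= step
    return bits

print(delta_modulate([0.05, 0.2, 0.3, 0.25, 0.1]))  # [1, 1, 1, 0, 0]
```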
[Figure: the staircase function tracking the analog input, and the corresponding Delta Modulation output bits]
HOW ABOUT MEMORY SPACE?
AUDIO: a sequence of microphone readings on several channels.
Readings (samples) are normally taken at 11,000, 22,050, or 44,100 samples per second and may be 8-, 12-, or 16-bit values.
Q: How much memory is needed to store a 5-minute audio recording using 2 channels and 16 bits per sample?
KB = 1,024 bytes; MB = 1,048,576 bytes; GB = 1,073,741,824 bytes
If Fs = 11,000 Hz:
Nsamples = 5 (min) x 60 (sec/min) x 11,000 (samples/sec) = 3,300,000 samples
Nbits = 16 (bits/sample) x 3,300,000 samples = 52,800,000 bits = 6,600,000 bytes ≈ 6.29 MB
Nbits stereo (2 channels) = 6.29 MB x 2 ≈ 12.6 MB
If Fs = 44,100 Hz:
Nsamples = 5 (min) x 60 (sec/min) x 44,100 (samples/sec) = 13,230,000 samples
Nbits = 16 (bits/sample) x 13,230,000 samples = 211,680,000 bits = 26,460,000 bytes ≈ 25.23 MB
Nbits stereo = 25.23 MB x 2 ≈ 50.47 MB, about one song on an audio CD
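The memory arithmetic above generalizes to a one-line formula; a sketch (function name illustrative):

```python
def audio_size_bytes(seconds, sample_rate, bits_per_sample, channels):
    """Raw (uncompressed) PCM recording size in bytes."""
    return seconds * sample_rate * bits_per_sample * channels // 8

MB = 1_048_576  # bytes

# 5 minutes (300 s) of 16-bit stereo:
print(audio_size_bytes(300, 11000, 16, 2) / MB)  # ≈ 12.6 MB
print(audio_size_bytes(300, 44100, 16, 2) / MB)  # ≈ 50.5 MB, one CD track
```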
RAW DIGITAL AUDIO
The higher the sampling frequency Fs, the better the audio quality, and the more bits are required!
The more bits per sample, the better the audio quality, and the more bits are required, and vice versa.
Digital audio quality scales linearly with memory requirements!
AUDIO FILE FORMAT
Uncompressed Audio Formats
1. PCM
2. WAV
Lossy Compressed Audio Formats
1. MP3
2. AAC
Lossless Compressed Audio Formats
1. FLAC
2. ALAC
PCM
PCM stands for Pulse-Code Modulation, a digital representation of raw analog audio signals.
There is no compression involved. The digital recording is a close-to-exact representation of the analog sound.
PCM is the most common audio format used in CDs and DVDs
WAV
WAV stands for Waveform Audio File Format (also called Audio for Windows at some point but not anymore). It’s a standard that was developed by Microsoft and IBM back in 1991.
A lot of people assume that all WAV files are uncompressed audio files, but that’s not exactly true. WAV is actually just a Windows container for audio formats. This means that a WAV file can contain compressed audio, but it’s rarely used for that.
Most WAV files contain uncompressed audio in PCM format. The WAV file is just a wrapper for the PCM encoding, making it more suitable for use on Windows systems. However, Mac systems can usually open WAV files without any issues.
MP3
MP3 stands for MPEG-1 Audio Layer 3. It was released back in 1993 and quickly exploded in popularity, eventually becoming the most popular audio format in the world for music files.
The main goal of MP3 is to cut out the sound data that lies beyond the hearing range of most people, reduce the quality of sounds that are harder to hear, and then compress the remaining audio data as efficiently as possible.
AAC
AAC stands for Advanced Audio Coding. It was developed in 1997 as the successor to MP3.
The compression algorithm used by AAC is much more advanced and technical than MP3, so when you compare a particular recording in MP3 and AAC formats at the same bitrate, the AAC one will generally have better sound quality.
AAC is the standard audio compression method used by YouTube, Android, iOS, iTunes, later Nintendo portables, and later PlayStations.
FLAC
FLAC stands for Free Lossless Audio Codec.
FLAC can compress an original source file by up to 60% without losing a single bit of data.
FLAC is an open-source, royalty-free format rather than a proprietary one.
FLAC is supported by most major programs and devices and is the main alternative to MP3 for CD audio. With it, you basically get the full quality of raw uncompressed audio at about half the file size.
ALAC
ALAC stands for Apple Lossless Audio Codec.
It was developed and launched in 2004 as a proprietary format but eventually became open source and royalty-free in 2011.
iTunes and iOS both provide native support for ALAC and no support at all for FLAC.
MIDI
MIDI, which dates from the early 1980s, is an acronym that stands for Musical Instrument Digital Interface.
MIDI is a protocol that enables computers, synthesizers, keyboards, and other musical devices to communicate with each other.
Components of a MIDI system: synthesizer, sequencer, track, channel, timbre, pitch, voice, path.
ROLE OF MIDI
MIDI makes music notes (among other capabilities), so it is useful for inventing, editing, and exchanging musical ideas that can be encapsulated as notes.
One strong capability of MIDI-based musical communication is that a single MIDI instrument can control other MIDI instruments in a master-slave relationship: the slave instruments play the same music, in part, as the master instrument, allowing interesting musical arrangements.
MIDI is aimed at music, which can then be altered as the “user”wishes.
THAT’S ALL