Advanced spectral processing
Jordi Janer, Music Technology Group, Universitat Pompeu Fabra, Barcelona (jordi.janer@upf.edu)
CDSIM – UPF May 2014 http://mtg.upf.edu/
Simple Periodic Waves (sine waves)
[Figure: sine wave; x-axis: time (s), 0 to 0.02; amplitude from −0.99 to 0.99]
• Characterized by:
  – period: T
  – amplitude: A
  – phase: φ
• Fundamental frequency, in cycles per second (Hz): $F_0 = 1/T$
• $y(t) = A \sin(2\pi F_0 t + \varphi)$, so $y(0) = A \sin(\varphi)$
(Many slides are based on materials by Dan Jurafsky)
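The sine-wave model above can be sampled directly. A minimal NumPy sketch; the function name `sine_wave`, the sampling rate `fs` and the example values are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def sine_wave(A, F0, phi, duration, fs):
    """Sample y(t) = A*sin(2*pi*F0*t + phi) at fs samples per second for `duration` seconds."""
    t = np.arange(int(round(duration * fs))) / fs   # sample times in seconds
    return t, A * np.sin(2 * np.pi * F0 * t + phi)

# Illustrative values: A = 0.99, F0 = 100 Hz (so T = 1/F0 = 0.01 s), phi = 0, 20 ms at 44.1 kHz
t, y = sine_wave(A=0.99, F0=100.0, phi=0.0, duration=0.02, fs=44100)
print(len(y), y[0])  # 882 samples; y[0] = A*sin(phi) = 0.0
```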
Simple periodic waves
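• Frequency: 5 cycles in 0.5 seconds = 10 cycles/second = 10 Hz
• Amplitude: 1
• Phase: at time 0 s, $y(0) = A \sin(2\pi \cdot 10 \cdot 0 + \varphi) = \sin(\varphi) = 0 \Rightarrow \varphi = \pi k,\ k \in \mathbb{Z}$; taking $k = 0$ gives $\varphi = 0$
• Equation: $y(t) = A \sin(20\pi t) = \sin(20\pi t)$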
(more) Basic facts about sound waves
$f = c/\lambda$
• where c = speed of sound and λ = wavelength, in meters
• c ≈ 34,500 cm/s (≈ 345 m/s) at about 21 °C at sea level
• Example: with λ = 10 m, the frequency is f = 345/10 = 34.5 Hz
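The relation $f = c/\lambda$ as a one-line helper, assuming c ≈ 345 m/s as in the example; the function name is hypothetical:

```python
def frequency_from_wavelength(wavelength_m, c=345.0):
    """f = c / lambda, with c the speed of sound in m/s (about 345 m/s in air near 21 degrees C)."""
    return c / wavelength_m

print(frequency_from_wavelength(10.0))  # 34.5 (Hz), matching the example above
```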
Speech sound waves
• A little piece of the waveform of a vowel
• Y axis:
  – Amplitude = amount of air pressure at that time point
  – Positive is compression
  – Zero is normal air pressure
  – Negative is rarefaction (expansion)
• X axis: time
Fundamental frequency
• The fundamental frequency (F0) is the lowest frequency of a periodic (voiced) waveform produced by a given instrument (our vocal folds behave like a “complicated” instrument)
• It is also called the first harmonic; its integer multiples are called the second, third, etc. harmonics
[Figure: the fundamental frequency (first harmonic) and the 2nd to 7th harmonics]
Fundamental frequency
In speech, consider for example the waveform of a vowel:
• The fundamental frequency can be computed as the number of repetitions per second of the wave:
  – The vowel above has 10 repetitions in 0.03875 s → F0 = 10/0.03875 ≈ 258 Hz
• This is the rate at which the vocal folds vibrate (the voicing rate)
• Each peak corresponds to an opening of the vocal folds
Pitch
• Pitch is defined as the perceived fundamental frequency of a sound
• F0 and pitch are different concepts:
  – F0 is a physically measurable frequency
  – Pitch is a perceived frequency
• The relationship between pitch and F0 is not linear:
  – Human pitch perception is most accurate between 100 Hz and 1000 Hz
  – Roughly linear in this range: with F0₁ = 200 Hz, if Pitch₂ = Pitch₁/2 then F0₂ ≈ 100 Hz
  – Logarithmic above 1000 Hz: with F0₁ = 5 kHz, if Pitch₂ = Pitch₁/2 then F0₂ < 2 kHz
• Still, the literature often treats F0 and pitch as the same thing
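The slide does not name a pitch scale. One common model of this non-linearity is the mel scale; the sketch below assumes the O'Shaughnessy formula $m = 2595 \log_{10}(1 + f/700)$ (other mel variants exist) and reproduces both halving examples qualitatively. The function names are hypothetical:

```python
import math

def hz_to_mel(f_hz):
    """Mel value of a frequency in Hz (O'Shaughnessy formula, one of several mel variants)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def half_pitch_frequency(f0_hz):
    """Frequency whose mel value is half that of f0_hz, i.e. 'half the pitch' under this model."""
    return mel_to_hz(hz_to_mel(f0_hz) / 2.0)

print(half_pitch_frequency(200.0))   # ~93.7 Hz: close to 200/2 (roughly linear region)
print(half_pitch_frequency(5000.0))  # ~1.3 kHz: well below 2 kHz (logarithmic region)
```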
F0 tracking
• F0 can be computed with several techniques and with tools such as Praat
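One of the simplest of these techniques is autocorrelation: the lag at which a voiced frame best matches a shifted copy of itself is one period T, and F0 = 1/T. A minimal sketch (not Praat's algorithm), assuming a clean, strongly periodic frame; the 50–500 Hz search range and the synthetic 258 Hz test frame are arbitrary choices:

```python
import numpy as np

def estimate_f0_autocorr(frame, fs, fmin=50.0, fmax=500.0):
    """Estimate F0 (Hz) of a periodic frame from the highest autocorrelation value
    between lags fs/fmax and fs/fmin (in samples)."""
    x = frame - np.mean(frame)                          # remove DC offset
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]   # ac[k] for lags k >= 0
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(ac) - 1)
    best_lag = lag_min + np.argmax(ac[lag_min:lag_max + 1])
    return fs / best_lag

fs = 16000
t = np.arange(int(0.04 * fs)) / fs
# Synthetic "vowel-like" frame: 258 Hz fundamental plus two weaker harmonics
frame = (np.sin(2 * np.pi * 258 * t) + 0.5 * np.sin(2 * np.pi * 516 * t)
         + 0.25 * np.sin(2 * np.pi * 774 * t))
print(estimate_f0_autocorr(frame, fs))  # about 258 Hz (16000 / 62)
```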
Frequency analysis
• Waves have different frequencies
[Figure: two sine waves, 100 Hz and 1000 Hz; x-axis: time (s), 0 to 0.02]
Frequency analysis
• Complex waves: adding a 100 Hz and a 1000 Hz wave together
[Figure: the summed wave; x-axis: time (s), 0 to 0.05]
Spectrum
[Figure: spectrum; x-axis: frequency (Hz), with components at 100 and 1000 Hz; y-axis: amplitude]
Frequency components (100 and 1000 Hz) on the x-axis
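The two-component spectrum can be reproduced numerically. A sketch assuming an 8 kHz sampling rate and a 1 s signal, so that FFT bins fall on whole hertz (bin k sits at k·fs/N Hz):

```python
import numpy as np

fs = 8000                                        # sampling rate in Hz (illustrative)
t = np.arange(fs) / fs                           # 1 s of samples, so N = fs
x = np.sin(2 * np.pi * 100 * t) + np.sin(2 * np.pi * 1000 * t)

magnitude = np.abs(np.fft.rfft(x))               # magnitude spectrum, bins 0 .. N/2
freqs = np.arange(len(magnitude)) * fs / len(x)  # bin k sits at k * fs / N Hz (1 Hz steps here)

# The two largest magnitudes are exactly at the two component frequencies
peaks = sorted(freqs[np.argsort(magnitude)[-2:]].tolist())
print(peaks)  # [100.0, 1000.0]
```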
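Fourier transform analysis
• Fourier analysis: any wave can be represented as the (possibly infinite) sum of sine waves of different frequencies, each with its own amplitude and phase
• For continuous signals: $X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j 2\pi f t}\, dt$
• For discrete signals: $X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}, \quad k = 0, \dots, N-1$
When N is finite (and relatively short), the result is called the short-term spectrum; computing it on successive frames gives the short-time Fourier transform (STFT)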
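The discrete formula is the DFT, $X[k] = \sum_{n=0}^{N-1} x[n] e^{-j2\pi kn/N}$. Evaluating it directly (O(N²) operations) and comparing with NumPy's FFT shows that the FFT is simply a fast algorithm for the same sum; `dft` is an illustrative name:

```python
import numpy as np

def dft(x):
    """Direct O(N^2) evaluation of X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N)."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)                      # column of frequency indices
    return np.exp(-2j * np.pi * k * n / N) @ x

x = np.random.default_rng(0).standard_normal(64)
print(np.allclose(dft(x), np.fft.fft(x)))     # True: the FFT computes the same sum faster
```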
Spectrum example
• Spectrum of one instant in an actual sound wave: many components across the frequency range
• The spectrum separates the wave into its individual frequency components
[Figure: spectrum; x-axis: frequency (Hz), 0 to 5000; y-axis: magnitude (dB), 0 to 40]
Formants
• Formants are defined as the spectral peaks of the sound's spectral envelope
• Formants are independent of F0, since they are defined on the envelope of the spectrum
• They are created by the resonances of the vocal tract as the sound passes through it
Example
What about helium voice? … http://www.phys.unsw.edu.au/jw/speechmodel.html
Spectrogram
• Time-frequency representation
• Short-time windowing
• Fast Fourier Transform (FFT)
• Available tools:
  – Sonic Visualiser (for music analysis)
  – Praat (for speech analysis)
• Other resources:
  – Live spectrogram: http://labrosa.ee.columbia.edu/expo/
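Short-time windowing plus one FFT per frame is all a basic spectrogram needs. A minimal sketch, assuming a Hann window, a 1024-sample frame and a 256-sample hop (tools such as Sonic Visualiser and Praat let you choose these); the test signal is a linear chirp and the function name is hypothetical:

```python
import numpy as np

def stft_magnitude(x, n_fft=1024, hop=256):
    """Magnitude spectrogram: Hann-windowed frames of n_fft samples, hop samples apart.
    Returns an array of shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop              # frames that fit entirely in x
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T        # frequency bins x time frames

fs = 16000
t = np.arange(fs) / fs
chirp = np.sin(2 * np.pi * (200 * t + 1900 * t ** 2))  # instantaneous freq 200 -> 4000 Hz over 1 s
S = stft_magnitude(chirp)
print(S.shape)  # (513, 59)
```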
Window size
• Understanding time-frequency resolution:
  – Long windows: good frequency resolution
  – Short windows: good temporal resolution
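The trade-off can be quantified: an N-sample window at sampling rate fs gives FFT bins fs/N Hz apart and spans N/fs seconds. A small helper (hypothetical name) comparing a short and a long window at 44.1 kHz:

```python
def tf_resolution(n_window, fs):
    """Return (frequency bin spacing in Hz, window duration in s) for an n_window-sample frame."""
    return fs / n_window, n_window / fs

for n in (256, 4096):
    df, dt = tf_resolution(n, 44100)
    print(f"N={n}: {df:.1f} Hz per bin, {1000 * dt:.1f} ms per frame")
# N=256: 172.3 Hz per bin, 5.8 ms per frame
# N=4096: 10.8 Hz per bin, 92.9 ms per frame
```

Two near tones separated by much less than fs/N Hz cannot be told apart, so resolving them calls for a longer window, at the cost of blurring events in time.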
Observing test signals
• Two near tones
• Noise burst
• Chirp
• Pure tones
• Harmonic richness (square/sawtooth)
• Low tone
Sonic Visualiser: http://mtg.upf.edu/~jjaner/teaching/CDSIM2014/Test_various_signals.wav
Applications of spectral processing
technologies for the synthesis of sound and music
technologies for the analysis of sound and music
technologies for the transformation of sound and music
Transforming signals
• Approaches for spectral transformations:
  – SMS: http://mtg.upf.edu/sms
  – Phase vocoder
• Basic transformations:
  – Pitch transposition
  – Harmonic/noise decomposition
  – Time-stretching
(Matlab, internal MTG software)
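The MTG tools above are internal and not shown. As a generic illustration only, here is a textbook phase-vocoder time-stretch in NumPy: magnitudes are read at fractional frame positions, and each bin's phase is advanced by its measured instantaneous frequency so the partials stay coherent. The frame size, hop, Hann window and function names are assumptions:

```python
import numpy as np

def stft(x, n_fft, hop, window):
    n_frames = 1 + (len(x) - n_fft) // hop
    return np.array([np.fft.rfft(window * x[i * hop:i * hop + n_fft]) for i in range(n_frames)])

def istft(frames, n_fft, hop, window):
    """Weighted overlap-add; divides by the summed squared window where it is not tiny."""
    out = np.zeros(n_fft + hop * (len(frames) - 1))
    norm = np.zeros_like(out)
    for i, spec in enumerate(frames):
        out[i * hop:i * hop + n_fft] += window * np.fft.irfft(spec, n_fft)
        norm[i * hop:i * hop + n_fft] += window ** 2
    nonzero = norm > 1e-3
    out[nonzero] /= norm[nonzero]
    return out

def time_stretch(x, factor, n_fft=2048, hop=512):
    """Phase-vocoder time-stretch: output is about `factor` times longer, pitch unchanged."""
    window = np.hanning(n_fft)
    X = stft(x, n_fft, hop, window)
    # expected phase advance per hop at each bin's centre frequency
    omega = 2 * np.pi * hop * np.arange(n_fft // 2 + 1) / n_fft
    steps = np.arange(0, len(X) - 1, 1.0 / factor)      # fractional analysis frame positions
    phase = np.angle(X[0])
    out_frames = []
    for s in steps:
        i = int(s)
        frac = s - i
        mag = (1 - frac) * np.abs(X[i]) + frac * np.abs(X[i + 1])  # interpolated magnitude
        out_frames.append(mag * np.exp(1j * phase))
        # measured deviation from the expected advance, wrapped to [-pi, pi]
        dphi = np.angle(X[i + 1]) - np.angle(X[i]) - omega
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
        phase = phase + omega + dphi                    # advance by the true per-hop phase
    return istft(np.array(out_frames), n_fft, hop, window)
```

Pitch transposition is commonly obtained from the same machinery by stretching by a factor and then resampling by that factor, which restores the original duration.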
Transforming signals
• Basic transformations:
  – Original
  – Pitch transposition
  – Harmonic/noise decomposition
  – Time-stretching (50x)
Transformation
• Time scaling
  – Detection of transients
  – Repetition/removal of spectral frames
  – Demo: Fast Remixing
    • Original; fast time-varying remix
• Swing detection
  – Tempo detection at the 8th-note level
  – Change of the swing factor
  – Demo: video
Synthesis
• Sample-based (violin)
  – Gesture modelling to provide a more realistic synthesis
• Voice-driven synthesis
  – Voice analysis is used to control the synthesis of a violin sound
The objective
• Music is distributed as mixdowns* in various formats
• Users want to further manipulate music signals in multiple application contexts (karaoke, soloing, remixing, etc.)
* from multitrack originals
The problem
• Music signals are complex
• Variety of music styles and instrumentations
• Modern production techniques go beyond a linear combination of recorded acoustic sources
  – (effects, digital synths, etc.)
Existing generic source separation (SS) approaches:
• Spectral subtraction
  – Intuitive
  – Well-studied (industrial interest)
  – Good for speech/stationary noise reduction
  – Less appropriate for music signals
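In its most basic form, spectral subtraction removes one fixed estimate of the noise magnitude from every STFT frame. A minimal sketch, assuming the noise estimate comes from a noise-only segment and using a small spectral floor (an illustrative choice) to avoid negative magnitudes; the result would be recombined with the noisy phase before the inverse STFT:

```python
import numpy as np

def spectral_subtraction(noisy_mag, noise_mag, floor=0.01):
    """Basic magnitude spectral subtraction.
    noisy_mag: |STFT| of the noisy signal, shape (bins, frames)
    noise_mag: estimated noise magnitude per bin, shape (bins,)
    floor:     negative differences are clamped to floor * noisy_mag"""
    cleaned = noisy_mag - noise_mag[:, None]     # same noise estimate removed from every frame
    return np.maximum(cleaned, floor * noisy_mag)

noisy = np.array([[1.0, 2.0], [0.5, 0.1]])
noise = np.array([0.5, 0.2])
out = spectral_subtraction(noisy, noise)
print(out)  # values 0.5, 1.5, 0.3 and 0.001 (the last one clamped to the floor)
```

Because a single estimate is subtracted from every frame, the method suits stationary noise; an accompaniment that changes from frame to frame does not fit this assumption.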
Background I
Background II
Existing music-specific approaches I:
• Pan-frequency masks
  o Assume non-overlapping signals in time-frequency bins
  o Stereo signals are required
  o Amplitude ratio between the L and R FFT bins
  o 2D user interface
• Examples
  o Good for simple excerpts
  o Bad for complex mixes
* Loses brightness; vocals are less reduced due to reverb; the flute is also removed, …
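A pan-frequency mask keeps the STFT bins whose L/R amplitude ratio and frequency fall inside a user-selected region. A minimal sketch; the pan index |R|/(|L|+|R|) (0 = left, 0.5 = centre, 1 = right) and the function names are assumptions, not the exact measure used in the slides:

```python
import numpy as np

def pan_index(XL, XR, eps=1e-12):
    """Per-bin panning estimate from the L/R magnitudes: 0 = hard left, 0.5 = centre, 1 = hard right."""
    aL, aR = np.abs(XL), np.abs(XR)
    return aR / (aL + aR + eps)

def pan_frequency_mask(XL, XR, freqs, pan_range, freq_range):
    """Binary mask of the bins whose pan index and frequency lie in the given ranges
    (the rectangle a user would draw in a 2D pan-frequency interface).
    XL, XR: STFTs of the left/right channels, shape (bins, frames); freqs: bin frequencies (Hz)."""
    p = pan_index(XL, XR)
    in_pan = (p >= pan_range[0]) & (p <= pan_range[1])
    in_freq = (freqs[:, None] >= freq_range[0]) & (freqs[:, None] <= freq_range[1])
    return (in_pan & in_freq).astype(float)
```

Multiplying both channel STFTs by the mask (or by 1 minus the mask) and inverting gives the selected (or removed) part. Every bin goes entirely to one side, which is why two centre-panned sources sharing bins, such as a voice and a flute, are removed together.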
Existing music-specific approaches II:
• Non-negative Matrix Factorization (NMF)
  – Magnitude spectrogram V (non-negative)
  – Decomposition as a matrix product
  – W (spectral bases) and H (gain activations over time)
  – Each spectrum frame is explained as a linear combination of R basis vectors
  – Minimization problem that finds W and H: $\min_{W, H \ge 0} D(V, WH)$
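The slide leaves the divergence D unspecified. A minimal sketch assuming the Euclidean distance $\|V - WH\|^2$ and the classic Lee–Seung multiplicative updates, which keep W and H non-negative; the function name and defaults are illustrative:

```python
import numpy as np

def nmf(V, R, n_iter=200, seed=0, eps=1e-12):
    """Factorize a non-negative V (bins x frames) as V ~ W @ H with R components.
    W (bins x R): spectral bases; H (R x frames): gain activations over time.
    Multiplicative updates for the Euclidean cost ||V - WH||^2."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], R)) + eps
    H = rng.random((R, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Each update multiplies by a ratio of non-negative terms, so no projection step is needed to keep the factors non-negative.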
Background III
[Figure: V ≈ W · H]
• Non-negative Matrix Factorization I: 3 spectral basis vectors in W
NMF details
[Figure: 1 overlapping note; H: activation gains]
• Non-negative Matrix Factorization I: 3 spectral basis vectors in W
NMF details
[Figure: 2 overlapping notes; H: activation gains]
NMF challenges
• Predominant instrument separation
  – (pitch/timbre analysis)
• Completeness of instrument removal
  – (attack/sustain, residual/breathing noise, unvoiced consonants, …)
• Percussive instrument separation
  – (transient detection, wideband spectrum)
• Polyphonic instrument separation
  – (blind and score-informed)
• “Music print” decomposition:
  – song containing a region without the target (e.g. vocals)
  – basis model learnt from the user-selected “music print”
[Figure: music print (without vocals); region with vocals]
Vocals/Background separation
• “Music print” decomposition:
  – Demos:
[Diagram: background excerpt → basis decomposition $W \cdot H$ → $W_{bgd}$; input → basis decomposition $[W_{bgd}, W_{other}] \cdot [H_{bgd}, H_{other}]$]
Wiener filtering: $(W_{bgd} H_{bgd}) / (W H)$
Output mute; original mute
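A sketch of the “music print” idea as semi-supervised NMF, reading the Wiener filtering step as the soft mask $W_{bgd}H_{bgd}/(WH)$: the background bases (assumed already learnt on the user-selected excerpt) stay fixed, while the target bases and all activations are learnt on the mixture. Euclidean multiplicative updates and all names are assumptions, not the MTG implementation:

```python
import numpy as np

def separate_with_known_basis(V, W_bgd, R_other, n_iter=200, seed=0, eps=1e-12):
    """V: mixture magnitude spectrogram (bins x frames); W_bgd: fixed background bases.
    Learns W_other (R_other target bases) and all activations on V.
    Returns soft masks (mute, solo) that sum to 1 in every time-frequency bin."""
    rng = np.random.default_rng(seed)
    K = W_bgd.shape[1]
    W = np.hstack([W_bgd, rng.random((V.shape[0], R_other)) + eps])
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W_new = W * (V @ H.T) / (W @ H @ H.T + eps)
        W[:, K:] = W_new[:, K:]                   # only the target bases are updated
    V_bgd = W[:, :K] @ H[:K]                      # background part of the model
    V_hat = W @ H + eps                           # full model W @ H
    mute = V_bgd / V_hat                          # keeps the background (target muted)
    return mute, 1.0 - mute                       # solo mask keeps W_other @ H_other
```

Multiplying the complex mixture STFT by either mask and inverting would give the mute or solo output.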
Vocals/Background separation
• “Music print” decomposition:
  – Demos:
[Diagram: background excerpt → basis decomposition $W \cdot H$ → $W_{bgd}$; input → basis decomposition $[W_{bgd}, W_{other}] \cdot [H_{bgd}, H_{other}]$]
Wiener filtering: $(W_{other} H_{other}) / (W H)$
Output solo; original solo
Vocals/Background separation
• “Music print” decomposition is not always possible:
  – the accompaniment (music print) changes throughout the song
  – the target is always present in some sections
Vocals/Background separation
• Solution → predominant pitch detection
  – e.g. MELODIA (J. Salamon, MTG)
• Separation → binary mask from pitch information
  – Simplest approach
  – Time-frequency mask: 1s at harmonic positions, 0s elsewhere
  – Can be combined with a pan-frequency mask
• Demos
  – Voice is properly removed/attenuated
  – Bass guitar is “comb-filtered”, and horns are attenuated
  – Soloing produces more artifacts
Original; mute; solo
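The binary mask can be built directly from a frame-wise F0 track (which a melody extractor such as MELODIA would provide; that step is not shown). A minimal sketch with hypothetical parameters for the number of harmonics and the bin tolerance around each one:

```python
import numpy as np

def harmonic_binary_mask(f0_track, n_fft, fs, n_harmonics=20, width_bins=1):
    """Binary time-frequency mask: 1 in the bins around k*F0 (k = 1..n_harmonics)
    of each frame, 0 elsewhere. f0_track holds one F0 in Hz per frame (0 = unvoiced)."""
    n_bins = n_fft // 2 + 1
    mask = np.zeros((n_bins, len(f0_track)))
    for j, f0 in enumerate(f0_track):
        if f0 <= 0:
            continue                                   # unvoiced frame: nothing selected
        for k in range(1, n_harmonics + 1):
            b = int(round(k * f0 * n_fft / fs))        # bin closest to the k-th harmonic
            if b >= n_bins:
                break
            mask[max(b - width_bins, 0):b + width_bins + 1, j] = 1.0
    return mask
```

Applying `mask` to the mixture STFT gives the solo version and `1 - mask` the mute version. Any other instrument's energy in the selected bins goes with the voice, which is consistent with the comb-filtered bass in the demos.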
Vocals/Background separation
Advanced separation approaches
Special treatment for vocals: source/filter models
• Breathiness residual (noise added on the formant envelope)
Demos: solo version without residual; solo version with residual; original
Vocals/Background removal
Advanced separation approaches
Special treatment for vocals
• Breathiness residual (noise added on the formant envelope)
• Unvoiced fricative modelling: /s/, /f/, /sh/, …
  – supervised basis from solo phoneme recordings
  o Demos: solo version with /s/ missing; solo version with /s/ present; original
Spectrogram of the fricative recording used to train the spectral basis.
Vocals/Background removal
Piano decomposition/retouch
• Using instrument-specific NMF dictionaries
  – Piano model of 88 notes (the W matrix is pre-learned)
• Retouch use case:
  – Amateur recording with errors
  – The user can select and correct individual notes after decomposition/separation
Original (played with errors); separated notes; corrected remix; original (reference)
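The slides use a W learnt from piano recordings. As a hypothetical stand-in, the sketch below builds one synthetic harmonic template per key (equal temperament, A4 = 440 Hz, keys A0 to C8) and then estimates only the activations H with W held fixed (Euclidean multiplicative updates). Template shape, decay and names are assumptions:

```python
import numpy as np

def piano_key_frequencies():
    """Fundamentals of the 88 piano keys, A0 (MIDI 21) to C8 (MIDI 108), with A4 = 440 Hz."""
    midi = np.arange(21, 109)
    return 440.0 * 2.0 ** ((midi - 69) / 12.0)

def harmonic_templates(f0s, n_fft, fs, n_harmonics=10, decay=0.7):
    """Synthetic stand-in for a pre-learned W: one column per note with energy at its
    harmonic bins, amplitudes decaying by `decay` per harmonic, columns summing to 1."""
    n_bins = n_fft // 2 + 1
    W = np.zeros((n_bins, len(f0s)))
    for j, f0 in enumerate(f0s):
        for k in range(1, n_harmonics + 1):
            b = int(round(k * f0 * n_fft / fs))
            if b < n_bins:
                W[b, j] += decay ** (k - 1)
    return W / np.maximum(W.sum(axis=0, keepdims=True), 1e-12)

def activations_fixed_W(V, W, n_iter=200, eps=1e-12):
    """NMF with W fixed: only the note activations H are estimated."""
    H = np.ones((W.shape[1], V.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

f0s = piano_key_frequencies()
print(f0s[0], f0s[48])  # 27.5 440.0
```

Once H is available, one way to retouch a wrong note is to edit its row of H (move or zero the activation) before resynthesis.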
• Multiple sources in an orchestral recording
• Score data is used to initialize the activation matrix H
Score-informed separation
• Video demo:
  – Isolated instruments: violin, cello, oboe, bassoon, flute
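Initializing H from the score works well with multiplicative updates because an entry that starts at zero stays zero, so each instrument can only be active where the score allows. A minimal sketch; the note-list format, the frame hop in seconds and the tolerance margin (for small alignment errors) are assumptions:

```python
import numpy as np

def score_informed_init(notes, n_components, n_frames, hop_s, margin_frames=2):
    """Initial H from score data. notes: list of (component_index, onset_s, offset_s).
    H is 1 where the score says a component may sound (plus a margin), 0 elsewhere."""
    H = np.zeros((n_components, n_frames))
    for comp, onset, offset in notes:
        start = max(int(onset / hop_s) - margin_frames, 0)
        stop = min(int(np.ceil(offset / hop_s)) + margin_frames, n_frames)
        H[comp, start:stop] = 1.0
    return H

def update_H(V, W, H, eps=1e-12):
    """One multiplicative (Euclidean) NMF update of H; zero entries remain zero."""
    return H * (W.T @ V) / (W.T @ W @ H + eps)
```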
Other potential applications
• Singer replacement
  – Original; vocals mute; Vocaloid Clara; Vocaloid Clara mix
• Drums enhancement
  – Original; drums +6 dB; drums −6 dB
• Step-remixer for drums
  – user-supervised transients (onset times and instrument)
  – Original; all drums; single instrument
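Once the drums are separated, “+6 dB” or “−6 dB” is just an amplitude gain on the drum stem before re-summing. A sketch assuming the drum and non-drum stems add back up to the mix (e.g. obtained with complementary masks); the function names are hypothetical:

```python
import numpy as np

def db_to_gain(db):
    """Amplitude factor g for a level change of db decibels: 20*log10(g) = db."""
    return 10.0 ** (db / 20.0)

def remix_drums(drums, rest, drum_db):
    """Recombine separated stems with the drums raised or lowered by drum_db."""
    return rest + db_to_gain(drum_db) * np.asarray(drums)

print(db_to_gain(6.0), db_to_gain(-6.0))  # ~1.995 and ~0.501: roughly double and half amplitude
```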
Other potential applications (piano)
• Mono-to-stereo upmixing
• Input
  – Mozart K331 recording (RWC dataset)
• Output
  – Upmixing from mono
  – left/right hands are panned in stereo
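The slides do not state the panning law. A common choice is constant-power (sin/cos) panning; this sketch assumes the two hands are already available as separate mono signals, and the spread value and function names are illustrative:

```python
import numpy as np

def pan_constant_power(mono, position):
    """Pan a mono signal; position in [-1, 1] (-1 = hard left, 0 = centre, 1 = hard right).
    Sin/cos law: L^2 + R^2 equals the mono power at every position."""
    theta = (position + 1) * np.pi / 4          # maps [-1, 1] to [0, pi/2]
    return np.cos(theta) * mono, np.sin(theta) * mono

def upmix_hands(left_hand, right_hand, spread=0.6):
    """Stereo upmix of two separated piano parts: left hand panned left, right hand right."""
    lL, lR = pan_constant_power(left_hand, -spread)
    rL, rR = pan_constant_power(right_hand, spread)
    return np.stack([lL + rL, lR + rR])         # shape (2, n_samples)
```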
Other potential applications (piano)
• Automatic accompaniment
• Input
  – Mozart K331 recording (RWC dataset)
• Output
  – automatic object detection
  – string ensemble resynthesis
Synth solo (Kontakt); mixture