Part 2: Music Processing with MPEG-7 Low Level Audio Descriptors
Dr. Michael Casey
Centre for Computational Creativity
Department of Computing, City University, London
MPEG-7 Software Tools
• ISO/IEC 15938-6 (Reference Software, C++)
  » http://www.lis.ei.tum.de/research/bv/topics/mmdb/e_mpeg7.html
• Audio-only Reference Software (Matlab)
  » http://ccc.soi.city.ac.uk/mpeg7 (City University mirror)
Containment Hierarchy for Audio Descriptors
[Figure: Audio Descriptions, showing the Header, Segments and Descriptor levels]
Audio LLD DataTypes
[Figure: relationships among AudioDType, AudioDSType, AudioSegmentType, AudioLLDScalarType, AudioLLDVectorType, ScalableSeriesType, SeriesOfScalarType and SeriesOfVectorType]
Some Useful Descriptors for Music Processing
• AudioSpectrumEnvelopeD
• AudioSpectrumBasisD
• AudioSpectrumProjectionD
• SoundModelDS
• SoundModelStatePathD
• SoundModelStateHistogramD
Other Useful Descriptors for Music Processing
• AudioSpectrumFlatnessD
• AudioHarmonicityD
• AudioSpectrumCentroidD
AudioSpectrumEnvelopeD
• Log-frequency-scale spectral power coefficients
• Total power preserved across logarithmic bands
[Figure: frequency axis marked at 62.5 Hz, 1 kHz and 16000 Hz; 1 below-band coefficient, 8 within-band coefficients and 1 above-band coefficient, together preserving total power]
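The band layout above can be illustrated by folding one frame's power spectrum into log-spaced bands. This is a simplified sketch, not the reference extractor.

It makes these assumptions:
• a 16 kHz sample rate and a 512-point FFT;
• an octave resolution of 1/2, since four octaves from 62.5 Hz to 1 kHz at 1/2 octave give the 8 within-band coefficients;
• each FFT bin is assigned wholly to one band, whereas the standard's extractor may share the power of bins that straddle a band edge.

```matlab
% Simplified log-band folding for one frame (assumed parameters, see above).
fs = 16000;                                % assumed sample rate (Hz)
N = 512;                                   % assumed FFT length
loEdge = 62.5; hiEdge = 1000; r = 1/2;     % gives the 1 + 8 + 1 layout in the figure
nBands = round(log2(hiEdge / loEdge) / r); % 4 octaves / (1/2 octave) = 8 bands
edges = loEdge * 2 .^ ((0:nBands) * r);    % band edges from 62.5 Hz to 1 kHz

x = randn(N, 1);                           % stand-in for one windowed frame
Xf = fft(x);
Pw = abs(Xf(1:N/2 + 1)) .^ 2;              % one-sided power spectrum
f = (0:N/2)' * fs / N;                     % bin centre frequencies (Hz)

ase = zeros(nBands + 2, 1);
ase(1) = sum(Pw(f < loEdge));              % below-band coefficient
for b = 1:nBands                           % within-band coefficients
    ase(b + 1) = sum(Pw(f >= edges(b) & f < edges(b + 1)));
end
ase(end) = sum(Pw(f >= hiEdge));           % above-band coefficient (up to fs/2)

fprintf('%d coefficients\n', numel(ase));
```

Every bin lands in exactly one coefficient, so the ten values sum to the frame's total power. That is the power-preservation property the slide highlights.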
AudioSpectrumEnvelopeD
[AudioSpectrumEnvelope, attributegrp, map, XMLFile] = AudioSpectrumEnvelopeType(audioFile,hopSize,attributegrp,writeXML,XMLFile,map)
This function determines an AudioSpectrumEnvelope and also returns the map from linear to log bands.
% EXAMPLE 1: AudioSpectrumEnvelopeD extraction
ag.octaveResolution = '1/4';
ag.loEdge = 62.5;
ag.hiEdge = 8000;
hopSize = 'PT10N1000F';   % 10 ms hop
fname = 'e:\Beatles\1\000100.wav';
[ASE, ag] = AudioSpectrumEnvelopeD(fname, hopSize, ag, 1, 'ase.xml');
[Figure: AudioSpectrumEnvelopeD examples]
[Figure: AudioSpectrumBasisD obtained by SVD / ICA basis rotation, leading to AudioSpectrumProjectionD]
AudioSpectrumBasisD
AudioSpectrumBasisType: independent components of a spectrum matrix
[V,env]=AudioSpectrumBasis(X, k, DDL_FLAG)
Inputs:
  X - spectrum data matrix (t x n; t = time points, n = spectral channels)
  k - number of components to extract
  DDL_FLAG - 1 = write XML output [default 0]
Outputs:
  V - n x k matrix of basis functions
  env - L2-norm envelope of log spectrogram data (required for MPEG-7)
% EXAMPLE 2: AudioSpectrumBasisD extraction
[ASB, env] = AudioSpectrumBasisD(ASE, 10, 'asb.xml');
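Here is a minimal sketch of the SVD route to a spectrum basis. It assumes the steps commonly used for MPEG-7 basis extraction:
• convert the envelope to dB;
• divide each frame by its L2 norm (this norm is the env output);
• keep the first k right singular vectors.

The sizes and the random input are placeholders, and the optional ICA rotation is not shown.

```matlab
% SVD-based spectrum basis (ICA rotation omitted; inputs are placeholders).
t = 200; n = 34; k = 10;              % assumed sizes: frames, bands, components
ASE = rand(t, n) + eps;               % stand-in AudioSpectrumEnvelopeD (power)

Xdb = 10 * log10(ASE);                % log (dB) spectrum
env = sqrt(sum(Xdb .^ 2, 2));         % L2-norm envelope, one value per frame
Xn = Xdb ./ repmat(env, 1, n);        % unit-norm spectral frames

[~, ~, V] = svd(Xn, 'econ');          % right singular vectors, strongest first
V = V(:, 1:k);                        % n x k basis: keep the first k components

fprintf('basis is %d x %d\n', size(V, 1), size(V, 2));   % basis is 34 x 10
```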
[Figure: AudioSpectrumBasisD, including its block form]
AudioSpectrumProjectionD
[P,maxenv] = AudioSpectrumProjectionD(X, V, XML)
Inputs:
  X - t x n matrix containing AudioSpectrumEnvelopeD values (t = time points, n = frequency bins)
  V - n x k matrix containing AudioSpectrumBasisD values (n = frequency bins, k = basis functions)
  XML - XML file name for DDL output [optional]
Output:
  P - t x (1 + k) matrix; each row contains one L2-norm envelope coefficient followed by k spectral projection coefficients
% EXAMPLE 3: AudioSpectrumProjectionD extraction
[ASP, maxEnv] = AudioSpectrumProjectionD(ASE, ASB, 'asp.xml');
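The projection itself is a matrix product. The steps in this sketch are assumed rather than taken from the reference code:
• each dB frame is normalized by its L2-norm envelope;
• the normalized frame is projected onto the k basis vectors;
• the envelope is prepended, giving the t x (1 + k) layout described above.

A random orthonormal basis stands in for AudioSpectrumBasisD. The last line rebuilds an approximate spectrum from P by the outer product with the basis.

```matlab
% Sketch of AudioSpectrumProjectionD-style features (assumed procedure).
t = 200; n = 34; k = 10;                 % assumed sizes: frames, bands, components
ASE = rand(t, n) + eps;                  % stand-in AudioSpectrumEnvelopeD (power)
V = orth(randn(n, k));                   % stand-in n x k orthonormal basis

Xdb = 10 * log10(ASE);                   % dB spectrum
env = sqrt(sum(Xdb .^ 2, 2));            % L2-norm envelope per frame
Xn = Xdb ./ repmat(env, 1, n);           % unit-norm frames
P = [env, Xn * V];                       % t x (1 + k): envelope, then k projections

% Rank-k outer-product reconstruction of the dB spectrum from P and V.
Xhat = repmat(P(:, 1), 1, n) .* (P(:, 2:end) * V');
```

With fewer columns in V the reconstruction keeps only the strongest spectral structure. This is what the 1-, 4- and 10-component reconstructions on the following slide compare.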
[Figure: Independent Spectrum Basis Features. Panels show the time function and spectral feature of each component, a high-channel spectrogram, basis reduction to 1, 4 and 10 components, and outer-product spectrum reconstruction from an individual basis component and from 4 and 10 components]
Music Unmixing
• Linear basis projection using SVD and ICA
• Spectrum subspace separation
• Fast computation of subspace ICA
• Full-rate filterbank masking
• Blocked ICA functions
• Subspace reconstruction $Y_j = X V_j V_j^{+}$
• Cluster subspaces to identify "tracks"
• Sum masked filterbank output to create audio
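The subspace reconstruction step can be illustrated with a small sketch. Here V_j holds the basis columns assigned to track j, and pinv(V_j) plays the role of the pseudo-inverse V_j^+.

This shows the algebra only, not the reference implementation:
• the grouping of components into tracks is hard-coded, standing in for the clustering step;
• the input is random;
• the masks are applied to frame-rate spectrogram magnitudes rather than to a full-rate filterbank.

```matlab
% Subspace reconstruction and soft masking (illustrative; see assumptions above).
t = 300; n = 64; k = 6;
X = abs(randn(t, n));                      % stand-in magnitude spectrogram (t x n)
[~, ~, V] = svd(X, 'econ');
V = V(:, 1:k);                             % k-component orthonormal basis
groups = {1:2, 3:4, 5:6};                  % hypothetical assignment of components to 3 tracks

nT = numel(groups);
Y = cell(1, nT);
for j = 1:nT
    Vj = V(:, groups{j});
    Y{j} = X * Vj * pinv(Vj);              % Y_j = X V_j V_j^+
end

% For orthonormal V the track reconstructions sum to the rank-k reconstruction.
Ysum = zeros(t, n);
for j = 1:nT
    Ysum = Ysum + Y{j};
end
err = norm(Ysum - X * (V * V'), 'fro');

% Soft masks from the non-negative part of each reconstruction, applied to X.
mag = cellfun(@(y) max(y, 0), Y, 'UniformOutput', false);
total = eps;
for j = 1:nT
    total = total + mag{j};
end
tracks = cellfun(@(m) (m ./ total) .* X, mag, 'UniformOutput', false);
```

Masking the mixture, rather than using the reconstructions directly, keeps each track's energy within what is actually present in the mixture.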
Music Unmixing Example 1
[Figure: drum mixture spectrogram (dB)]
Music Unmixing Example 2 (Pink Floyd: stereo -> 9 subspace tracks)
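SoundModelDS
Sound Model DS and related descriptors
[Figure: AudioSpectrumEnvelopeD, AudioSpectrumBasisD and AudioSpectrumProjectionD feeding a ContinuousHiddenMarkovModelDS with states 1 to 4 and transition probabilities T(i,j); the resulting SoundModelStatePathD is an example state sequence 1 3 3 2 2 3 4 4 4 4 ...]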
SoundModelDS - Bayesian inference of HMM parameters from training data
Y = SoundModelDS(TrainingDataListFile, nS, nB [,OPTIONAL ARGUMENTS...])
INPUTS:
  TrainingDataListFile - filename of training data list: WAV file names (one per line)
  nS - number of states in hidden Markov model [10]
  nB - number of basis components to extract [10]
The following variables are optional, and are specified as 'parameter', value pairs on the command line.
  'hopSize'             'PT10N1000F'  - AudioSpectrumEnvelopeD hop size
  'loEdge'              62.5          - AudioSpectrumEnvelopeD low edge (Hz)
  'hiEdge'              16000         - AudioSpectrumEnvelopeD high edge (Hz)
  'octaveResolution'    '1/8'         - AudioSpectrumEnvelopeD resolution
  'sequenceHopSize'     ''            - HMM data window hop [whole file]
  'sequenceFrameLength' ''            - HMM data window length [whole file]
  'outputFile'          ''            - filename for model output [stem+mp7.xml]
  'soundName'           ''            - model identifier name
OUTPUTS:
  outputFile.dat = MATLAB struct Y.{T,S,M,C,X,maxenv,V,p}
    T - state transition matrix
    S - initial state probability vector
    M - stacked means matrix (one vector per row)
    C - stacked inverse covariances
    V - AudioSpectrumBasis vectors
    maxenv - scaling parameter for model decoding
    p - training cycle likelihoods
  outputFile.mp7 = XML file containing MPEG-7 SoundModel description scheme
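To show how the T, S, M and C fields fit together, the sketch below scores a feature sequence against a toy model with the scaled forward algorithm. It makes these assumptions:
• C is stored as a d x d x nS array of inverse covariance matrices; the layout of the stacked matrices in the reference output may differ;
• the model values and observations are invented.

Training itself is not shown.

```matlab
% Forward-algorithm log-likelihood for a Gaussian HMM (toy values, assumed C layout).
nS = 3; d = 2; t = 50;
T = [0.8 0.1 0.1; 0.1 0.8 0.1; 0.1 0.1 0.8];  % state transition matrix (rows sum to 1)
S = [1; 0; 0];                                 % initial state probabilities
M = [0 0; 3 0; 0 3];                           % one mean vector per row
C = repmat(eye(d), [1 1 nS]);                  % inverse covariances, assumed d x d x nS
Xobs = randn(t, d);                            % stand-in projected feature sequence

% Gaussian log-densities using the inverse covariance directly.
logB = zeros(t, nS);
for s = 1:nS
    D = Xobs - repmat(M(s, :), t, 1);
    Ci = C(:, :, s);
    logB(:, s) = 0.5 * log(det(Ci)) - 0.5 * d * log(2 * pi) ...
                 - 0.5 * sum((D * Ci) .* D, 2);
end

% Scaled forward recursion: alpha is renormalised every frame and the
% log scale factors accumulate into the sequence log-likelihood.
loglike = 0;
alpha = S';
for i = 1:t
    if i > 1
        alpha = alpha * T;                     % propagate through transitions
    end
    m = max(logB(i, :));
    alpha = alpha .* exp(logB(i, :) - m);      % weight by observation density
    c = sum(alpha);
    loglike = loglike + log(c) + m;
    alpha = alpha / c;
end
```

Storing inverse covariances means each density evaluation needs only a matrix product and a determinant, with no inversion at decoding time.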
Process Small Chunks = Local Dynamics Model
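The sequenceHopSize and sequenceFrameLength options suggest how this works. Instead of training on a whole file, the feature sequence is cut into shorter, possibly overlapping windows, so the HMM captures local dynamics.

Below is a sketch of that windowing. The window length and hop are hypothetical values given in frames; the reference options may use a different unit or duration format.

```matlab
% Split a feature sequence into overlapping chunks for local HMM training.
nFrames = 1500;                      % e.g. 15 s of 10 ms frames
seqLen = 500;                        % hypothetical chunk length (frames)
seqHop = 250;                        % hypothetical chunk hop (frames)
F = randn(nFrames, 10);              % stand-in AudioSpectrumProjectionD features

starts = 1:seqHop:(nFrames - seqLen + 1);
seqs = cell(1, numel(starts));
for c = 1:numel(starts)
    seqs{c} = F(starts(c):starts(c) + seqLen - 1, :);   % one training sequence
end
```

Each cell of seqs would then be passed to training as a separate observation sequence.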
SoundModelStatePathD
A simplified representation of spectral dynamics
[Figure: state path]
SoundModelStatePathD
[Path, loglike] = SoundModelStatePathD(soundfilename, arg2 [,OPTIONAL ARGS])
Compute HMM state path and log likelihood of sequence data
Inputs:
  soundfilename - filename of input sound (.wav or .au)
  arg2 - SoundModelDS structure or filename of binary SoundModelDS instance (.mat)
The following variables are optional, and are specified as 'parameter', value pairs on the command line.
  'hopSize'             'PT10N1000F'
  'loEdge'              62.5
  'hiEdge'              16000
  'octaveResolution'    '1/8'
  'sequenceHopSize'     ''
  'sequenceFrameLength' ''
% EXAMPLE 5: SoundModelStatePathD extraction
[Path, ll] = SoundModelStatePathD(fname, Y, 'octaveResolution', '1/4', 'hiEdge', 8000);
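State-path decoding of this kind is usually done with the Viterbi algorithm. The sketch below is an assumption about the method, not the reference code:
• the per-frame state log-likelihoods are random placeholders; in practice they would come from the model's Gaussian densities;
• the score it returns is the log probability of the single best path, which may differ from the loglike the reference function reports.

```matlab
% Viterbi decoding of the most likely state path (toy model, placeholder scores).
nS = 3; t = 60;
T = [0.9 0.05 0.05; 0.05 0.9 0.05; 0.05 0.05 0.9];  % toy transition matrix
S = [1/3; 1/3; 1/3];                              % toy initial probabilities
logB = log(rand(t, nS));                          % stand-in per-frame state log-likelihoods

logT = log(T);
delta = log(S') + logB(1, :);                     % best log score ending in each state
psi = zeros(t, nS);                               % back-pointers
for i = 2:t
    % entry (p, q): score of being in p at i-1 and moving to q
    [best, arg] = max(repmat(delta', 1, nS) + logT, [], 1);
    delta = best + logB(i, :);
    psi(i, :) = arg;
end

Path = zeros(t, 1);
[loglike, last] = max(delta);                     % score of the best complete path
Path(t) = last;
for i = t-1:-1:1
    Path(i) = psi(i + 1, Path(i + 1));            % trace back through the pointers
end
```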
[Figure: SoundModelStatePathD for BEATLES: A Hard Day's Night; state index against time in seconds (0.01 s frames)]
SoundModelStateHistogramD
H = SoundModelStateHistogramD(Path, Nstates, [segSkip], [segLen])
Extract normalized segmental state-path histograms
Inputs:
  Path - SoundModelStatePathD output
  Nstates - number of states in SoundModel
  [segSkip] - hop size in samples
  [segLen] - histogram length in samples
Outputs:
  H - t x n matrix containing segmented state occupancy histograms (t = time points, n = states)
% EXAMPLE 6: SoundModelStateHistogramD extraction
H = SoundModelStateHistogramD(Path, 10, 100, 1000);
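Here is a minimal sketch of the histogram step. It makes these assumptions:
• segSkip and segLen count state-path samples, one per 0.01 s frame, so the example's 100 and 1000 correspond to a 1 s hop and a 10 s window;
• each histogram is scaled to unit L2 norm, consistent with the unit-norm histograms used for the S-matrix.

The random path is a placeholder.

```matlab
% Segmental state-occupancy histograms with unit L2 norm (assumed normalisation).
Nstates = 10; segSkip = 100; segLen = 1000;
Path = randi(Nstates, 15000, 1);             % stand-in state path (150 s of 10 ms frames)

starts = 1:segSkip:(numel(Path) - segLen + 1);
H = zeros(numel(starts), Nstates);
for i = 1:numel(starts)
    seg = Path(starts(i):starts(i) + segLen - 1);
    h = accumarray(seg(:), 1, [Nstates 1])';  % occupancy count for each state
    H(i, :) = h / norm(h);                    % scale row to unit L2 norm
end
```

The unit-norm scaling is what makes H * H' a matrix of cosine similarities in the next step.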
[Figure: SoundModelStateHistogramD; state index against time in seconds (0.01 s frames)]
S-Matrix
• Similarity function
• Segmented histograms are unit norm
• Outer product computes similarity matrix
>> size(H)
ans =
   137    10
>> S = H * H';          % similarity matrix
>> imagesc(S);
>> D = real(acos(S));   % dissimilarity matrix; real() drops imaginary parts where rounding puts S above 1
[Figure: S-matrix]
Sound Replacement and Audio Mosaics
• Find segments similar to a target segment
• Similarity scores computed between histograms
• Cluster with k-means or pair-wise clustering
• Replace with similar (but different) material
• Segmentation boundaries (beat alignment)
• EXAMPLES
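Here is a sketch of the first step: finding a replacement candidate for one target segment. It works as follows:
• histogram rows are unit norm, as on the S-matrix slide;
• segments whose windows overlap the target are excluded, since with a 1 s hop and 10 s window, neighbours within 9 hops share most of their frames;
• the least dissimilar remaining segment is chosen.

The histograms, target index and exclusion width are placeholders.

```matlab
% Pick the most similar non-overlapping segment for a target (placeholder data).
Nstates = 10; nSeg = 137;
H = rand(nSeg, Nstates);
H = H ./ repmat(sqrt(sum(H .^ 2, 2)), 1, Nstates);  % unit-norm rows
S = H * H';                                         % cosine similarity between segments
D = acos(min(max(S, -1), 1));                       % clamp rounding error before acos

target = 40;                                        % hypothetical target segment
exclude = 10;                                       % segLen / segSkip: overlapping neighbours
cand = setdiff(1:nSeg, max(1, target - exclude + 1):min(nSeg, target + exclude - 1));
[~, j] = min(D(target, cand));
replacement = cand(j);                              % most similar non-overlapping segment
```

Clamping S to [-1, 1] before acos has the same effect as the real(acos(S)) used on the S-matrix slide: entries pushed slightly above 1 by rounding map to a dissimilarity of 0.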
Acknowledgements
• International Organization for Standardization
• ISO/IEC JTC 1 SC29 WG11 (MPEG)
• Mitsubishi Electric Research Labs
• Massachusetts Institute of Technology
• Music Mind Machine Group (formerly Machine Listening Group)
• Paris Smaragdis, Youngmoo Kim, Brian Whitman
• Iroro Orife, John Hershey, Alex Westner, Kevin Wilson
• City University
• Department of Computing
• Centre for Computational Creativity