
Part 2: Music Processing with MPEG-7 Low Level Audio Descriptors

Dr. Michael Casey
Centre for Computational Creativity
Department of Computing
City University, London

MPEG-7 Software Tools

• ISO/IEC 15938-6 (Reference Software, C++)
» http://www.lis.ei.tum.de/research/bv/topics/mmdb/e_mpeg7.html

• Audio Only Reference Software (Matlab)
» http://ccc.soi.city.ac.uk/mpeg7 (City University Mirror)

Audio Descriptions

[Figures: audio description header; audio description segments; audio descriptor]

Containment Hierarchy for Audio Descriptors

[Figure: containment hierarchy of AudioDSType, AudioSegmentType, AudioDType, AudioLLDScalarType, AudioLLDVectorType, SeriesOfScalarType, SeriesOfVectorType and ScalableSeriesType]

Audio LLD DataTypes

[Figure: audio LLD data types]

Some Useful Descriptors for Music Processing

• AudioSpectrumEnvelopeD

• AudioSpectrumBasisD

• AudioSpectrumProjectionD

• SoundModelDS

• SoundModelStatePathD

• SoundModelStateHistogramD

Other Useful Descriptors for Music Processing

• AudioSpectrumFlatnessD

• AudioHarmonicityD

• AudioSpectrumCentroidD

AudioSpectrumEnvelopeD

• Log-frequency-scale spectral power coefficients
• Total power preserved across logarithmic bands

[Figure: band layout from 62.5 Hz to 16000 Hz (1 kHz marked): 1 below-band coefficient, 8 within-band coefficients, 1 above-band coefficient; total power is preserved]
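The band layout above can be checked with a short calculation. This Python sketch is an illustration, not the reference software: it assumes band edges at lo_edge * 2^(i * r) for an octave resolution r, plus one extra coefficient below loEdge and one above hiEdge. The function names are hypothetical.

```python
import math
from fractions import Fraction

def ase_band_edges(lo_edge, hi_edge, octave_resolution):
    """Log-spaced band edges in Hz from lo_edge to hi_edge.

    octave_resolution is the band width in octaves, e.g. "1" or "1/4".
    Assumes hi_edge / lo_edge spans a whole number of bands.
    """
    r = float(Fraction(octave_resolution))
    n_bands = round(math.log2(hi_edge / lo_edge) / r)
    return [lo_edge * 2 ** (i * r) for i in range(n_bands + 1)]

def ase_num_coefficients(lo_edge, hi_edge, octave_resolution):
    """Within-band coefficients plus one below-band and one above-band."""
    edges = ase_band_edges(lo_edge, hi_edge, octave_resolution)
    return (len(edges) - 1) + 2

print(ase_num_coefficients(62.5, 16000, "1"))    # 10
print(ase_num_coefficients(62.5, 8000, "1/4"))   # 30
```

Under these assumptions, 1-octave bands from 62.5 Hz to 16000 Hz give the 10 coefficients in the figure, and the quarter-octave, 62.5 to 8000 Hz settings used in Example 1 give 30.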


[AudioSpectrumEnvelope, attributegrp, map, XMLFile] = AudioSpectrumEnvelopeType(audioFile,hopSize,attributegrp,writeXML,XMLFile,map)

This function computes an AudioSpectrumEnvelope and also returns the map from linear to log bands.
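The linear-to-log map can be illustrated as a bin-to-band summation. This is a simplified sketch, assuming FFT bin k is centred at k * bin_hz and that each bin is assigned whole to the band containing its centre; it does not attempt to reproduce how the reference software treats bins that straddle a band edge. Because every bin lands in exactly one coefficient, total power is preserved. linear_to_log_bands is a hypothetical name.

```python
import bisect

def linear_to_log_bands(power, bin_hz, edges):
    """Sum linear-frequency power bins into log bands.

    power:  power values, one per FFT bin (bin k centred at k * bin_hz)
    edges:  ascending band edges in Hz, from loEdge to hiEdge
    Returns [below-band, band_1, ..., band_B, above-band].
    """
    out = [0.0] * (len(edges) + 1)
    for k, p in enumerate(power):
        f = k * bin_hz
        if f < edges[0]:
            out[0] += p                      # below-band coefficient
        elif f >= edges[-1]:
            out[-1] += p                     # above-band coefficient
        else:
            # band b satisfies edges[b] <= f < edges[b + 1]
            b = bisect.bisect_right(edges, f) - 1
            out[b + 1] += p
    return out

edges = [62.5, 125.0, 250.0, 500.0, 1000.0]
power = [1.0] * 40                  # flat spectrum, bins every 31.25 Hz
bands = linear_to_log_bands(power, 31.25, edges)
print(bands)                        # [2.0, 2.0, 4.0, 8.0, 16.0, 8.0]
print(sum(bands) == sum(power))     # True
```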

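% EXAMPLE 1: AudioSpectrumEnvelopeD extraction
ag.octaveResolution = '1/4';   % quarter-octave bands
ag.loEdge = 62.5;              % low band edge (Hz)
ag.hiEdge = 8000;              % high band edge (Hz)
hopSize = 'PT10N1000F';        % 10/1000 s = 10 ms hop
fname = 'e:\Beatles\1\000100.wav';

[ASE, ag] = AudioSpectrumEnvelopeD(fname, hopSize, ag, 1, 'ase.xml');   % 1 = write XML to ase.xml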
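The hopSize string uses the MPEG-7 duration notation. As a hedged sketch that handles only the time part (hours H, minutes M, seconds S, a count of fractions N, and fractions per second F), the parser below shows that 'PT10N1000F' means 10/1000 s, a 10 ms hop. parse_hop_seconds is a hypothetical helper, not part of the reference software.

```python
import re
from fractions import Fraction

# Time-only duration: PT[hH][mM][sS][nN][fF]
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?(?:(\d+)N)?(?:(\d+)F)?$")

def parse_hop_seconds(text):
    """Return a duration such as 'PT10N1000F' as an exact Fraction of seconds."""
    m = _DURATION.match(text)
    if not m:
        raise ValueError("unsupported duration: " + text)
    h, mi, s, n, f = (int(g) if g else 0 for g in m.groups())
    seconds = Fraction(3600 * h + 60 * mi + s)
    if n:
        if not f:
            raise ValueError("N given without F: " + text)
        seconds += Fraction(n, f)            # n fractions of 1/f second
    return seconds

hop = parse_hop_seconds("PT10N1000F")
print(hop)                  # 1/100
print(int(hop * 44100))     # 441 samples at 44.1 kHz
```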


. . .


AudioSpectrumBasisD

[Figure: SVD / ICA basis rotation; AudioSpectrumBasisD and AudioSpectrumProjectionD]

AudioSpectrumBasisD

AudioSpectrumBasisType: independent components of a spectrum matrix

[V,env]=AudioSpectrumBasis(X, k, DDL_FLAG)

Inputs:
X - spectrum data matrix (t x n; t = time points, n = spectral channels)
k - number of components to extract
DDL_FLAG - 1 = write XML output [default 0]

Outputs:
V - n x k matrix of basis functions
env - L2-norm envelope of the log spectrogram data (required for MPEG-7)

% EXAMPLE 2: AudioSpectrumBasisD extraction
[ASB, env] = AudioSpectrumBasisD(ASE, 10, 'asb.xml');   % 10 basis functions
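The steps behind a spectrum basis can be sketched as: convert each frame to dB, divide it by its L2 norm (the env output), and keep the first k right singular vectors of the normalized matrix as the n x k basis V. This Python sketch assumes that pipeline, uses power iteration instead of a library SVD so it needs only the standard library, and omits the optional ICA rotation. All function names are hypothetical.

```python
import math
import random

def to_db(row, floor=1e-12):
    """Convert one frame of power values to decibels (floor avoids log(0))."""
    return [10.0 * math.log10(max(p, floor)) for p in row]

def normalize_rows(X):
    """L2-normalize each frame; returns (normalized frames, envelope)."""
    env = [math.sqrt(sum(v * v for v in row)) for row in X]
    Xn = [[v / e if e > 0 else 0.0 for v in row] for row, e in zip(X, env)]
    return Xn, env

def spectrum_basis(X, k, iters=500, seed=0):
    """Top-k right singular vectors of X (t x n), returned as an n x k matrix.

    Power iteration on X^T X, re-orthogonalizing against vectors already found.
    """
    n = len(X[0])
    C = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    found = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(n)) for i in range(n)]
            for b in found:                          # Gram-Schmidt step
                d = sum(wi * bi for wi, bi in zip(w, b))
                w = [wi - d * bi for wi, bi in zip(w, b)]
            norm = math.sqrt(sum(wi * wi for wi in w))
            if norm == 0.0:
                break                                # X has rank < k
            v = [wi / norm for wi in w]
        found.append(v)
    return [[found[c][i] for c in range(k)] for i in range(n)]

power_frames = [[1.0, 0.01, 0.5], [2.0, 0.02, 1.0], [0.1, 1.0, 0.1]]
Xn, env = normalize_rows([to_db(f) for f in power_frames])
V = spectrum_basis(Xn, 2)      # 3 x 2 basis matrix
```

Singular vectors are only defined up to sign, so two implementations can return bases that differ by column signs.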

AudioSpectrumBasisD

[Figures: AudioSpectrumBasisD (two slides); AudioSpectrumBasisD: block form]

AudioSpectrumProjectionD

[Figure: AudioSpectrumBasisD, SVD / ICA basis rotation]



[P,maxenv] = AudioSpectrumProjectionD(X, V, XML)

Inputs:
X - t x n matrix containing AudioSpectrumEnvelopeD values (t = time points, n = frequency bins)
V - n x k matrix containing AudioSpectrumBasisD values (n = frequency bins, k = basis functions)
XML - XML output file name [optional]

Output:
P - t x (1 + k) matrix; each row contains one L2-norm envelope coefficient followed by k spectral projection coefficients.

% EXAMPLE 3: AudioSpectrumProjectionD extraction
[ASP, maxEnv] = AudioSpectrumProjectionD(ASE, ASB, 'asp.xml');
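Each row of P can be sketched as the frame's L2-norm envelope followed by the normalized frame projected onto each basis function. This is an illustrative sketch, assuming X has already been converted to dB; it does not compute the maxenv output, and spectrum_projection is a hypothetical name.

```python
import math

def spectrum_projection(X, V):
    """Return a t x (1 + k) list: [envelope, c_1, ..., c_k] per frame.

    X: t x n frames (assumed already in dB); V: n x k basis matrix.
    c_j is the normalized frame x / envelope projected onto column j of V.
    """
    n, k = len(V), len(V[0])
    P = []
    for x in X:
        env = math.sqrt(sum(v * v for v in x))
        xn = [v / env for v in x] if env > 0 else [0.0] * n
        coeffs = [sum(xn[i] * V[i][j] for i in range(n)) for j in range(k)]
        P.append([env] + coeffs)
    return P

V = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]      # 3 bins, 2 basis functions
X = [[3.0, 4.0, 0.0], [0.0, 0.0, 2.0]]
for row in spectrum_projection(X, V):
    print(row)
# [5.0, 0.6, 0.8]
# [2.0, 0.0, 0.0]
```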


Independent Spectrum Basis Features

[Figure: high-channel spectrogram and its basis reduction to 1, 4 and 10 components, each component shown as a spectral feature and a time function; outer-product spectrum reconstruction from an individual basis component; 4-component and 10-component reconstructions]

Music Unmixing

• Linear basis projection using SVD and ICA
• Spectrum subspace separation
• Fast computation of subspace ICA
• Full-rate filterbank masking

• Blocked ICA functions
• Subspace reconstruction: $Y_j = X V_j V_j^{+}$, where $V_j$ holds the basis functions of subspace $j$ and $V_j^{+}$ is its pseudo-inverse
• Cluster subspaces to identify “tracks”
• Sum masked filterbank output to create audio
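The subspace reconstruction can be sketched for the simple case where the basis columns are orthonormal, as SVD basis functions are, so that V_j^+ = V_j^T; an ICA-rotated basis would need a true pseudo-inverse. The helpers are hypothetical, the filterbank masking step is not shown, and the example only checks that the reconstructions from a complete orthonormal basis sum back to X.

```python
import math

def matmul(A, B):
    """Plain list-of-rows matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def subspace_reconstruction(X, V, cols):
    """Y_j = X V_j V_j^+ using the basis columns `cols` of V.

    Assumes orthonormal columns, so the pseudo-inverse V_j^+ is V_j^T.
    """
    Vj = [[row[c] for c in cols] for row in V]
    return matmul(matmul(X, Vj), transpose(Vj))

s = math.sqrt(0.5)
V = [[s, -s], [s, s]]                         # orthonormal 2 x 2 basis
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]      # t x n spectrum frames
Y0 = subspace_reconstruction(X, V, [0])
Y1 = subspace_reconstruction(X, V, [1])
total = [[a + b for a, b in zip(r0, r1)] for r0, r1 in zip(Y0, Y1)]
print(total)   # equal to X up to rounding
```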

Music Unmixing Example 1

[Figure: drum mixture (dB)]

Music Unmixing Example 2 (Pink Floyd: stereo -> 9 subspace tracks)

SoundModelDS

Sound Model DS and related descriptors

[Figure: ContinuousHiddenMarkovModelDS (states 1 to 4, transitions T(i,j)), AudioSpectrumBasisD, observation x, and SoundModelStatePathD (example state sequence 1 3 3 2 2 3 4 4 4 4 ...)]

SoundModelDS - Bayesian inference of HMM parameters from training data

Y = SoundModelDS(TrainingDataListFile, nS, nB [,OPTIONAL ARGUMENTS...])

INPUTS:
TrainingDataListFile - name of a text file listing the training WAV files (one per line)
nS - number of states in the hidden Markov model [10]
nB - number of basis components to extract [10]

The following variables are optional and are specified as 'parameter', value pairs on the command line:

'hopSize' 'PT10N1000F' - AudioSpectrumEnvelopeD hop size
'loEdge' 62.5 - AudioSpectrumEnvelopeD low edge (Hz)
'hiEdge' 16000 - AudioSpectrumEnvelopeD high edge (Hz)
'octaveResolution' '1/8' - AudioSpectrumEnvelopeD resolution
'sequenceHopSize' '' - HMM data window hop [whole file]
'sequenceFrameLength' '' - HMM data window length [whole file]
'outputFile' '' - filename for model output [stem+mp7.xml]
'soundName' '' - model identifier name

OUTPUTS:

outputFile.dat = matlab struct Y.{T,S,M,C,X,maxenv,V,p}

T - state transition matrix
S - initial state probability vector
M - stacked means matrix (one vector per row)
C - stacked inverse covariances
V - AudioSpectrumBasis vectors
maxenv - scaling parameter for model decoding
p - training cycle likelihoods

outputFile.mp7 = XML file containing MPEG-7 SoundModel description scheme
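Decoding with such a model needs a per-state score for every frame. As a hedged sketch, assuming one Gaussian per state with mean M[s] and full inverse covariance C[s] (matching the stacked M and C outputs listed above), the state log-density can be computed as below, with a Cholesky factor supplying log det C. The function names are hypothetical.

```python
import math

def cholesky(A):
    """Lower-triangular L with A = L L^T; A must be positive definite."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

def state_log_likelihood(x, mean, inv_cov):
    """log N(x; mean, inv(inv_cov)) for one HMM state."""
    n = len(x)
    d = [xi - mi for xi, mi in zip(x, mean)]
    # Mahalanobis term d^T C d, where C is the inverse covariance
    maha = sum(d[i] * inv_cov[i][j] * d[j] for i in range(n) for j in range(n))
    L = cholesky(inv_cov)
    log_det_inv = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return 0.5 * log_det_inv - 0.5 * maha - 0.5 * n * math.log(2 * math.pi)

score = state_log_likelihood([1.0, 0.0], [0.0, 0.0], [[4.0, 0.0], [0.0, 1.0]])
```

Scoring a frame against every state gives one row of the emission matrix used for decoding.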


Process Small Chunks = Local Dynamics Model

[Figures: SoundModelDS (two slides)]

SoundModelStatePathD: a simplified representation of spectral dynamics

[Figure: state path]

SoundModelStatePathD

[Path, loglike] = SoundModelStatePathD(soundfilename, arg2 [, OPTIONAL ARGS])

Compute HMM state path and log likelihood of sequence data

Inputs:
soundfilename - filename of input sound (.wav or .au)
arg2 - SoundModelDS structure, or filename of a binary SoundModelDS instance (.mat)

The following variables are optional and are specified as 'parameter', value pairs on the command line:

'hopSize' 'PT10N1000F'
'loEdge' 62.5
'hiEdge' 16000
'octaveResolution' '1/8'
'sequenceHopSize' ''
'sequenceFrameLength' ''

% EXAMPLE 5: SoundModelStatePathD extraction
[Path, ll] = SoundModelStatePathD(fname, Y, 'octaveResolution', '1/4', 'hiEdge', 8000);
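The state path can be sketched with the standard Viterbi algorithm, given log initial probabilities (from S), log transitions (from T) and per-frame state log likelihoods. This is an illustration, not the reference implementation: states are 0-based here, and the returned value is the log probability of the best path, which may differ from how SoundModelStatePathD defines loglike.

```python
import math

def viterbi(log_S, log_T, log_B):
    """Most likely HMM state path.

    log_S[i]:    log initial probability of state i
    log_T[i][j]: log transition probability from state i to state j
    log_B[t][j]: log likelihood of frame t under state j
    Returns (path, log probability of that path); states are 0-based.
    """
    n_states = len(log_S)
    delta = [log_S[j] + log_B[0][j] for j in range(n_states)]
    back = []                                  # best predecessor per frame/state
    for t in range(1, len(log_B)):
        ptr, new = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: delta[i] + log_T[i][j])
            ptr.append(best_i)
            new.append(delta[best_i] + log_T[best_i][j] + log_B[t][j])
        back.append(ptr)
        delta = new
    last = max(range(n_states), key=lambda j: delta[j])
    path = [last]
    for ptr in reversed(back):                 # trace predecessors backwards
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]

T = [[0.9, 0.1], [0.1, 0.9]]
S = [0.5, 0.5]
B = [[0.9, 0.1]] * 3 + [[0.1, 0.9]] * 2        # per-frame state likelihoods
path, ll = viterbi([math.log(p) for p in S],
                   [[math.log(p) for p in row] for row in T],
                   [[math.log(p) for p in row] for row in B])
print(path)   # [0, 0, 0, 1, 1]
```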



[Figure: SoundModelStatePathD for BEATLES: A Hard Day’s Night; state index against time (0.01 s frames, seconds)]

SoundModelStateHistogramD

SoundModelStateHistogramD(Path, Nstates, [segSkip], [segLen])

Extract normalized segmental state-path histograms

Inputs:
Path - SoundModelStatePathD output
Nstates - number of states in the SoundModel
[segSkip] - hop size in samples
[segLen] - histogram length in samples

Outputs:
H - t x n matrix containing segmented state occupancy histograms (t = time points, n = states)

% EXAMPLE 6: SoundModelStateHistogramD extraction
H = SoundModelStateHistogramD(Path, 10, 100, 1000);
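The segmental histograms can be sketched as sliding-window state counts, each scaled to unit L2 norm. The sketch assumes 0-based state indices, measures segSkip and segLen in path frames, and drops any final partial window; how the reference software handles the end of the path is not known here. state_histograms is a hypothetical name.

```python
import math

def state_histograms(path, n_states, seg_skip, seg_len):
    """Unit-norm state-occupancy histograms over sliding windows of a path.

    path:     sequence of 0-based state indices, one per frame
    seg_skip: hop between window starts (frames)
    seg_len:  window length (frames), assumed >= 1
    Returns one histogram row per complete window.
    """
    H = []
    for start in range(0, len(path) - seg_len + 1, seg_skip):
        counts = [0] * n_states
        for s in path[start:start + seg_len]:
            counts[s] += 1
        norm = math.sqrt(sum(c * c for c in counts))   # > 0 for a non-empty window
        H.append([c / norm for c in counts])
    return H

path = [0, 0, 1, 1, 2, 2, 0, 0, 1, 1]
H = state_histograms(path, 3, 2, 4)    # windows start at frames 0, 2, 4, 6
```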

SoundModelStateHistogramD

[Figure: state index against time (0.01 s frames, seconds)]

S-Matrix

• Similarity function
• Segmented histograms are unit norm
• Outer product computes similarity matrix

>> size(H)

ans =

   137    10

>> S = H * H';           % similarity matrix
>> imagesc(S);
>> D = real(acos(S));    % dissimilarity matrix; real() drops tiny imaginary parts when rounding pushes S above 1
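For readers working outside MATLAB, the same matrices can be sketched in Python. Because the histogram rows have unit norm, each entry of S = H H' is a cosine similarity; clamping to [-1, 1] before acos plays the role of real() in the MATLAB code by absorbing rounding errors that push S slightly above 1. similarity_matrices is a hypothetical name.

```python
import math

def similarity_matrices(H):
    """S = H H^T for unit-norm rows, and D = acos(S) as a dissimilarity."""
    S = [[sum(a * b for a, b in zip(hi, hj)) for hj in H] for hi in H]
    # clamp: rounding can push S slightly outside [-1, 1]
    D = [[math.acos(max(-1.0, min(1.0, s))) for s in row] for row in S]
    return S, D

r = math.sqrt(0.5)
H = [[1.0, 0.0], [0.0, 1.0], [r, r]]
S, D = similarity_matrices(H)
```

Here D[0][1] is pi/2 (orthogonal histograms) and D[0][2] is pi/4.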

[Figure: S-matrix]

Sound Replacement and Audio Mosaics

• Find segments similar to a target segment
• Similarity scores are computed between histograms
• Cluster with k-means or pair-wise clustering

• Replace with similar (but different) material
• Segmentation boundaries (beat alignment)

• EXAMPLES
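Finding material similar to a target segment can be sketched as a nearest-neighbour lookup in one row of S. The exclude_radius parameter is an assumption added here: when segLen is larger than segSkip, neighbouring windows overlap and would otherwise be returned as trivially similar. Clustering with k-means or pair-wise clustering is not shown, and most_similar_segment is a hypothetical name.

```python
def most_similar_segment(S, target, exclude_radius=0):
    """Index of the segment most similar to `target`.

    Skips the target itself and any segment within exclude_radius positions
    of it; returns None if no candidate remains.
    """
    best, best_score = None, float("-inf")
    for j, score in enumerate(S[target]):
        if abs(j - target) <= exclude_radius:
            continue
        if score > best_score:
            best, best_score = j, score
    return best

S = [[1.0, 0.9, 0.2, 0.8],
     [0.9, 1.0, 0.3, 0.1],
     [0.2, 0.3, 1.0, 0.4],
     [0.8, 0.1, 0.4, 1.0]]
print(most_similar_segment(S, 0))                     # 1
print(most_similar_segment(S, 0, exclude_radius=1))   # 3
```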

Acknowledgements

• International Organization for Standardization (ISO)
• ISO/IEC JTC 1 SC29 WG11 (MPEG)
• Mitsubishi Electric Research Labs
• Massachusetts Institute of Technology
• Music Mind Machine Group (formerly Machine Listening Group)
• Paris Smaragdis, Youngmoo Kim, Brian Whitman
• Iroro Orife, John Hershey, Alex Westner, Kevin Wilson
• City University
• Department of Computing
• Centre for Computational Creativity
