Audio Thumbnailing of Popular Music Using Chroma-Based Representations

Matt Williamson
Chris Scharf

Implementation based on: IEEE Transactions on Multimedia, Vol. 7, No. 1, February 2005
Mark A. Bartsch, Member, IEEE, and Gregory H. Wakefield, Member, IEEE

Introduction

• Multimedia content is growing rapidly

• Efficient method of browsing is necessary

• Indexing and retrieval methods are media-dependent

Primary goal

• Minimize audition time for a given type of media

Current methods

• Images
  – Downsampling
    • Produces a smaller version of the image (thumbnail)
    • Reduces cost of delivery and display

Current methods

• Audio: speech
  – Symbolic representation
    • Produces a transcript of the audio

What about music?

• Adapt an existing method:
  – Downsampling (time compression)
    • Results in highly distorted, unintelligible audio

What about music?

• Adapt an existing method (cont’d):
  – Symbolic representation (score transcription)
    • Extremely difficult
    • Results in essentially meaningless information
    • Does not convey other important elements:
      – Vocal style
      – Instruments used
      – Processing effects used

Essential problem:

Adapting existing methods cannot reduce the audition time for music while still effectively conveying the “gist” of the song

Possible Solution:

Audio thumbnailing via chroma-based analysis

Audio thumbnailing

• Produces a short clip of the selection to represent the “gist” of the song

Chroma-based analysis

• Based on the extraction of chroma features from the audio

• Chroma Feature Extraction Algorithm:
  – Frame Segmentation
  – Feature Calculation
  – Correlation Calculation
  – Correlation Filtering
  – Thumbnail Selection

Chroma Feature Extraction

• Extract frequencies from audio file
• Calculate chroma values from frequencies:

    $c = \log_2 f - \lfloor \log_2 f \rfloor$

• Categorize chroma values into pitch classes
  – 12 pitch classes: A, A#/Bb, B, C, C#/Db, …, G#/Ab
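The mapping from a frequency to its chroma value and pitch class can be sketched in Python as follows. This is an illustration only, not the presenters' code: the function names `chroma` and `pitch_class`, the 440 Hz reference for A, and the choice to assign each frequency to its nearest equal-tempered semitone are all assumptions that the slides do not specify.

```python
import math

# Pitch-class labels, starting from A as on the slide.
PITCH_CLASSES = ["A", "A#/Bb", "B", "C", "C#/Db", "D",
                 "D#/Eb", "E", "F", "F#/Gb", "G", "G#/Ab"]


def chroma(freq_hz):
    """Chroma c = log2(f) - floor(log2(f)), a value in [0, 1)."""
    log_f = math.log2(freq_hz)
    return log_f - math.floor(log_f)


def pitch_class(freq_hz, ref_a_hz=440.0):
    """Index 0..11 of the nearest pitch class (0 = A).

    Assumption: class centres sit on equal-tempered semitones
    measured from a 440 Hz A; the slides give no reference pitch.
    """
    # Chroma distance from A, wrapped into [0, 1).
    offset = (chroma(freq_hz) - chroma(ref_a_hz)) % 1.0
    return int(round(12.0 * offset)) % 12
```

Because chroma discards the octave, `pitch_class(220.0)`, `pitch_class(440.0)` and `pitch_class(880.0)` all land in the same class.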

Frame Segmentation

• Authors’ Implementation:
  – Frame length determined via beat tracking algorithm
  – Range: 0.25 s to 0.56 s
• Our Implementation:
  – Fixed frame length, the average of that range: 0.41 s
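A fixed-length segmentation of this kind might look like the sketch below. The midpoint of 0.25 s and 0.56 s is 0.405 s, which rounds to the 0.41 s used here. The function name `segment_frames`, the use of non-overlapping frames, and dropping a trailing partial frame are assumptions made for illustration.

```python
def segment_frames(samples, sample_rate, frame_seconds=0.41):
    """Split a sample sequence into consecutive, non-overlapping frames.

    Assumptions: frames do not overlap, and a trailing partial
    frame is discarded.
    """
    frame_len = int(round(frame_seconds * sample_rate))
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len]
            for i in range(n_frames)]
```

For example, at a 44100 Hz sample rate each frame would hold 18081 samples.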

Feature Calculation

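• Calculate a 12-element chroma feature vector, $v_t$, for each frame:
  – Apply FFT to each frame:

    $v_{t,k} = \sum_{n \in S_k} \frac{F_t(n)}{N_k}, \quad k \in \{0, \ldots, 11\}$

    (here $F_t(n)$ is the FFT value of frame $t$ at bin $n$, $S_k$ is the set of bins assigned to pitch class $k$, and $N_k$ is the number of bins in $S_k$)
  – Constraints:
    • Minimum frequency: 20 Hz
      – Lower limit of human hearing
    • Maximum frequency: 2000 Hz
      – Higher frequencies affect the perception of chroma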
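The per-frame feature calculation can be sketched as below: each spectral bin between 20 Hz and 2000 Hz is assigned to a pitch class, and the bin magnitudes are averaged within each class. Several details here are assumptions rather than anything stated on the slides: the function name `chroma_vector`, the use of spectral magnitude, the 440 Hz reference, and the nearest-semitone bin assignment. A naive DFT stands in for the FFT so the sketch needs only the standard library; it gives the same values, only more slowly.

```python
import cmath
import math


def chroma_vector(frame, sample_rate, f_min=20.0, f_max=2000.0,
                  ref_a_hz=440.0):
    """Return 12 values: mean DFT magnitude per pitch class (0 = A)."""
    n = len(frame)
    sums = [0.0] * 12     # running sum of magnitudes per class
    counts = [0] * 12     # number of bins per class (N_k)
    for b in range(1, n // 2 + 1):
        freq = b * sample_rate / n
        if freq < f_min or freq > f_max:
            continue
        # Naive DFT at bin b; an FFT computes the same coefficient.
        coeff = sum(x * cmath.exp(-2j * math.pi * b * i / n)
                    for i, x in enumerate(frame))
        # Assumption: nearest equal-tempered semitone relative to A.
        k = int(round(12.0 * math.log2(freq / ref_a_hz))) % 12
        sums[k] += abs(coeff)
        counts[k] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]
```

Feeding in a pure 440 Hz tone should put essentially all of the energy into class 0 (A).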

Correlation Calculation

• Calculate similarity matrix C
  – Each element is equal to the correlation between two feature vectors:

    $C_{i,j} = v_i^T v_j$

  – High correlation along diagonals in the matrix indicates repetitions within the song
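A minimal sketch of the similarity matrix is given below, with each entry the inner product of two frame feature vectors. Scaling every vector to unit length first, so that entries fall in a comparable range, is an assumption added here; the slide shows only the inner product. The names `normalize` and `similarity_matrix` are likewise hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length (all-zero vectors are left as is)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def similarity_matrix(features):
    """C[i][j] = inner product of (unit-length) feature vectors i and j."""
    unit = [normalize(v) for v in features]
    return [[sum(a * b for a, b in zip(vi, vj)) for vj in unit]
            for vi in unit]
```

The result is symmetric, and a repeated passage shows up as a stripe of high values parallel to the main diagonal.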

Correlation Filtering

• Calculate the filtered time-lag matrix T:
  – Exposes similarity between extended segments that are separated by constant lag
  – Filtering is performed along the diagonals of C
    • Uses a symmetric rectangular windowing function $w(k)$ (a uniform moving average filter):

      $T_{i,j} = \sum_k C_{i+k,\,j+k}\, w(k)$

  – T is then “rotated” so that the diagonals are oriented vertically
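The filtering and "rotation" steps can be combined in one pass, as in the sketch below: a moving average is taken along each diagonal of C, and the result is stored by (time, lag) so that each diagonal becomes a column. The function name `filtered_time_lag`, the `half_width` parameter, considering only non-negative lags, and averaging over only the terms that fall inside C at the edges are assumptions made for illustration.

```python
def filtered_time_lag(C, half_width):
    """Moving-average C along its diagonals, indexed by (time, lag).

    T[i][lag] is the mean of C[i+k][i+lag+k] for k in
    -half_width..half_width, skipping terms outside the matrix.
    Entries with i + lag beyond the matrix are left at 0.0.
    """
    n = len(C)
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for lag in range(n - i):
            total, count = 0.0, 0
            for k in range(-half_width, half_width + 1):
                a, b = i + k, i + lag + k
                if 0 <= a < n and 0 <= b < n:
                    total += C[a][b]
                    count += 1
            T[i][lag] = total / count  # k = 0 is always valid
    return T
```

Isolated high values in C are averaged down, while a run of high values along one diagonal survives as a strong entry in the corresponding lag column.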

Thumbnail Selection

• Select maximum value in T
  – The location of this value indicates:
    • Occurrence of the segment (the y-coordinate)
    • Lag time (the x-coordinate)
  – Constraints:
    • Minimum lag time = 1/10 of song length
    • Maximum start time = 3/4 of song length
      – To reduce susceptibility to “fading repeat”
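The constrained search for the maximum can be sketched as follows, assuming T is stored as a square list of lists indexed by (start frame, lag in frames) and that song length is measured in frames. The function name `select_thumbnail`, the parameter names, and the rounding of the two limits are assumptions.

```python
import math


def select_thumbnail(T, min_lag_frac=0.1, max_start_frac=0.75):
    """Return (score, start_frame, lag_frames) for the largest allowed
    entry of T, or None if no entry satisfies the constraints."""
    n = len(T)
    min_lag = math.ceil(min_lag_frac * n)    # lag >= 1/10 of the song
    max_start = int(max_start_frac * n)      # start <= 3/4 of the song
    best = None
    for i in range(min(max_start + 1, n)):
        for lag in range(min_lag, n - i):
            if best is None or T[i][lag] > best[0]:
                best = (T[i][lag], i, lag)
    return best
```

Multiplying the returned start frame by the frame duration converts it to a start time in seconds.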

Results

• Jimmy Buffett – “Math Sucks”
  – System: [64, 89]
• Lifehouse – “You and Me”
  – System: [38, 63]
• Gavin DeGraw – “I Don’t Want To Be”
  – System: [95, 120]
• Super Mario Brothers Theme
  – System: [18, 43]

Conclusion

• Successfully extracted time segments which closely match the chorus of the song

• Feature Calculation issue:
  – Authors’ implementation unclear

Possible Uses

• Audio domain:
  – Improved search capability
    • Searching for similar songs
  – Audio fingerprinting
• Other domains:
  – Detection of irregular heartbeats

Suggested Improvements and Alternatives

• Image-based analysis on the waveform

• Tested alternatives
  – Mean squared error (MSE) on signal frequencies
    • Chroma-based analysis proved more accurate