Sound Applications Advanced Multimedia Tamara Berg


Sound Applications

Advanced Multimedia
Tamara Berg


Reminder

• HW2 due March 13, 11:59pm
• Questions?


Howard Leung


Audio Indexing and Retrieval

• Features for representing audio:
– Metadata
– Low level features
– High level audio features

• Example usage cases:
– Audio classification
– Music retrieval


Content Based Music Retrieval

Extract music descriptions from a database of music documents.

Extract music description from a query music document.

Compute similarity between query and database descriptions.

Retrieve similar music documents to query.

Casey et al IEEE 2008
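
The four steps above can be sketched in a few lines. This is a minimal illustration, not the system described by Casey et al.: the `describe` function is a hypothetical toy description (average level and average sample-to-sample change), and cosine similarity is just one possible choice for comparing descriptions.

```python
import math

def describe(samples):
    """Toy music description (an assumption for illustration):
    (mean absolute amplitude, mean absolute change between samples)."""
    level = sum(abs(s) for s in samples) / len(samples)
    change = sum(abs(b - a) for a, b in zip(samples, samples[1:])) / (len(samples) - 1)
    return (level, change)

def cosine_similarity(u, v):
    """Cosine of the angle between two description vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve(query, database, top_k=2):
    # Step 1: extract descriptions from the database documents.
    descriptions = {name: describe(samples) for name, samples in database.items()}
    # Step 2: extract a description from the query document.
    query_description = describe(query)
    # Step 3: compute similarity between query and database descriptions.
    scored = [(cosine_similarity(query_description, d), name)
              for name, d in descriptions.items()]
    # Step 4: retrieve the most similar documents.
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]

def tone(cycles, n=200):
    """Synthetic test document: a sine wave with the given number of cycles."""
    return [math.sin(2 * math.pi * cycles * t / n) for t in range(n)]

database = {"slow": tone(2), "fast": tone(40)}
ranking = retrieve(tone(3), database)
```

In practice the description would be built from the low level and high level features discussed later, and a real system would store the database descriptions once rather than recomputing them per query.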


MIR tasks

H: high level specificity – match specific instances of audio content.

M: mid-level specificity – match high level audio features like melody, but do not match audio content.

L: low specificity – match global (statistical) properties of the query

Different usage cases require different descriptions and matching schema.

Casey et al IEEE 2008


Metadata

• Most common method of accessing music
• Can be rich and expressive
• When catalogues become very large, it is difficult to maintain consistent metadata

Useful for low specificity queries

Casey et al IEEE 2008


Metadata

• Pandora.com – Uses metadata to estimate artist similarity and track similarity, and creates personalized radio stations. Experts entered metadata of musical-cultural properties (20-30 minutes per track of an expert’s time – 50 person-years for 1 million tracks).

• Crowd-sourced metadata repositories (Gracenote, MusicBrainz). Factual metadata (artist, album, year, title, duration). Cultural metadata (mood, emotion, genre, style).

• Automatic metadata methods – generate descriptions from community metadata automatically. Language analysis to associate noun and verb phrases with musical features (Whitman & Rifkin).

Casey et al IEEE 2008


Content features

• Low level or high level
• Want features to be robust to certain changes in the audio signal
– Noise
– Volume
– Sampling

• High level features will be more robust to changes, low level features will be less robust.

• Low level features will be easy to compute, high level difficult


Low level audio features

• Low level measurements of audio signal that contain information about a musical work.

• Can be computed periodically (10-1000 ms intervals) or beat synchronous.

Casey et al IEEE 2008

In text analysis we had words; here we have to come up with our own set of features to compute from the audio signal!
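
Computing a feature periodically means cutting the signal into short frames and measuring each frame. The sketch below assumes a fixed frame length and hop size in milliseconds, and uses short-time energy as a stand-in feature; the function names and the choice of feature are illustrative assumptions, not taken from the slides.

```python
def frame_signal(samples, sample_rate, frame_ms, hop_ms):
    """Split a signal into frames of frame_ms milliseconds, one every hop_ms."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop_len)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame (example low level feature)."""
    return sum(s * s for s in frame) / len(frame)

SAMPLE_RATE = 8000  # assumed sampling rate in Hz
# One second of audio: half a second of silence, then a full-scale alternating signal.
signal = [0.0] * 4000 + [1.0 if i % 2 == 0 else -1.0 for i in range(4000)]

frames = frame_signal(signal, SAMPLE_RATE, frame_ms=100, hop_ms=100)
energies = [short_time_energy(f) for f in frames]
```

Each entry of `energies` is one feature value per 100 ms interval. A beat-synchronous variant would place the frame boundaries at detected beat times instead of at a fixed hop.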


Example Low-Level Audio Features

Howard Leung


Zero-crossing rate: the average number of times the signal crosses the zero amplitude value.


Indicator function: 1 if true, 0 otherwise.
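
A minimal sketch of the zero-crossing computation, assuming one common convention: count a crossing whenever two consecutive samples differ in sign (with zero treated as non-negative), then divide by the number of consecutive pairs. Other texts normalize differently, so treat the exact scaling as an assumption.

```python
def zero_crossing_rate(samples):
    """Fraction of consecutive sample pairs whose signs differ."""
    if len(samples) < 2:
        return 0.0
    # The comparison plays the role of the indicator function: 1 if true, 0 otherwise.
    crossings = sum(1 for prev, cur in zip(samples, samples[1:])
                    if (prev >= 0) != (cur >= 0))
    return crossings / (len(samples) - 1)

alternating = zero_crossing_rate([1, -1, 1, -1, 1])   # sign flips on every step
rising = zero_crossing_rate([1, 2, 3, 4])             # never crosses zero
```

Noisy or high frequency signals tend to give a high rate, while low frequency signals give a low one, which is why this cheap time domain measure is useful as a feature.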


Howard Leung


Frequency Domain Reminder

Signals can be decomposed into a weighted sum of sinusoids.

How much of each sinusoid is present describes the frequency spectrum of the signal.

Li & Drew


Frequency domain features

• How do we get to the frequency domain?

Time → Frequency


DFT

Discrete Fourier Transform (DFT) of the audio: converts to a frequency representation.

• DFT analysis occurs in terms of a number of equally spaced ‘bins’.
• Each bin represents a particular frequency range.
• DFT analysis gives the amount of energy in the audio signal that is present within the frequency range for each bin.

Inverse Discrete Fourier Transform (IDFT): converts from the frequency representation back to the audio signal.
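
To make the bins concrete, here is a small sketch using a naive O(N^2) DFT written with the standard library only. The sampling rate, signal, and variable names are assumptions chosen so the numbers are easy to follow; real code would normally use an FFT library instead.

```python
import cmath
import math

def dft(samples):
    """Naive discrete Fourier transform; returns one complex value per bin."""
    n = len(samples)
    return [sum(s * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, s in enumerate(samples))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse DFT; converts the bins back to a time domain signal."""
    n = len(spectrum)
    return [sum(c * cmath.exp(2j * cmath.pi * k * t / n)
                for k, c in enumerate(spectrum)) / n
            for t in range(n)]

SAMPLE_RATE = 64  # samples per second (assumed for the example)
signal = [math.sin(2 * math.pi * 5 * t / SAMPLE_RATE) for t in range(64)]  # 5 Hz tone

spectrum = dft(signal)
bin_width_hz = SAMPLE_RATE / len(signal)           # spacing between bins in Hz
# Energy per bin, for the bins from 0 Hz up to half the sampling rate.
energy = [abs(c) ** 2 for c in spectrum[: len(signal) // 2 + 1]]
peak_bin = max(range(len(energy)), key=lambda k: energy[k])
peak_hz = peak_bin * bin_width_hz
restored = [c.real for c in idft(spectrum)]        # IDFT recovers the signal
```

With 64 samples at 64 samples per second the bins are 1 Hz apart, so the 5 Hz tone puts its energy in bin 5, and applying the IDFT to the unmodified spectrum returns the original samples up to rounding error.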


Howard Leung


Filtering

Removes frequency components from some part of the spectrum.

• Low pass filter – removes high frequency components from input and leaves only low in the output signal.

• High pass filter – removes low frequency components from input and leaves only high in the output signal.

• Band pass filter – keeps only a band of frequencies and removes the components below and above that band.


How could you do this using the FT and IFT?

Compute FT spectrum of input.

Zero out the part of the frequency spectrum that you want to filter out.

Compute the IFT of this modified spectrum -> output will be input with some frequency components removed.

f = input → FT(f) → .* 1/0 mask (zero out some freq components) → IFT → o = frequency limited output

What kind of filter is this?
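
As one possible instance of the FT, mask, IFT recipe, the sketch below builds a low pass filter: the mask is 1 for bins at or below a cutoff frequency and 0 above it. The naive transform, the function names, and the test signal are assumptions made for illustration. Zeroing bins like this is an idealized filter; practical filter designs usually avoid such abrupt cutoffs.

```python
import cmath
import math

def _transform(values, sign):
    """Shared naive O(N^2) kernel for the forward (-1) and inverse (+1) transform."""
    n = len(values)
    return [sum(v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                for t, v in enumerate(values))
            for k in range(n)]

def ft(samples):
    return _transform(samples, -1)

def ift(spectrum):
    n = len(spectrum)
    return [c / n for c in _transform(spectrum, +1)]

def low_pass(samples, sample_rate, cutoff_hz):
    n = len(samples)
    spectrum = ft(samples)                       # FT(f)
    mask = []
    for k in range(n):
        # Bins k and n - k describe the same frequency for a real signal,
        # so both must be kept or zeroed together.
        freq = min(k, n - k) * sample_rate / n
        mask.append(1.0 if freq <= cutoff_hz else 0.0)
    limited = [c * m for c, m in zip(spectrum, mask)]   # element-wise multiply (.*)
    return [c.real for c in ift(limited)]        # IFT -> frequency limited output

SAMPLE_RATE = 64
low = [math.sin(2 * math.pi * 3 * t / SAMPLE_RATE) for t in range(64)]    # 3 Hz
high = [math.sin(2 * math.pi * 20 * t / SAMPLE_RATE) for t in range(64)]  # 20 Hz
f = [a + b for a, b in zip(low, high)]
o = low_pass(f, SAMPLE_RATE, cutoff_hz=10)
```

Here the output `o` is the 3 Hz component alone, because the 20 Hz component falls in the zeroed part of the spectrum. Flipping the mask (0 below the cutoff, 1 above) would give a high pass filter instead.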


Howard Leung


Pitch-Class Profile (PCP)

• Represent the energy due to each pitch class
• Integrates the energy in all octaves into a single band
• There are 12 equally spaced pitch classes in western tonal music. So, typically 12 bands in the PCP.

How might we calculate this using the DFT?
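
One way to answer the question, sketched under stated assumptions: compute the DFT energy per bin, convert each bin's centre frequency to the nearest equal-tempered note (assuming A4 = 440 Hz), and add the bin's energy to that note's pitch class. The helper names, the 55 Hz lower limit, and the test signal are illustrative choices, not part of the slides.

```python
import cmath
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def dft_energy(samples):
    """Energy in bins 0..N/2 of a naive O(N^2) DFT."""
    n = len(samples)
    energy = []
    for k in range(n // 2 + 1):
        c = sum(s * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, s in enumerate(samples))
        energy.append(abs(c) ** 2)
    return energy

def pitch_class_profile(samples, sample_rate, min_hz=55.0):
    """Fold DFT energy from all octaves into 12 pitch class bands."""
    n = len(samples)
    pcp = [0.0] * 12
    for k, e in enumerate(dft_energy(samples)):
        freq = k * sample_rate / n
        if freq < min_hz:          # skip DC and very low bins
            continue
        # Note number on the MIDI scale, where 69 is A4 = 440 Hz (assumed tuning).
        note = 69 + 12 * math.log2(freq / 440.0)
        pcp[round(note) % 12] += e
    return pcp

SAMPLE_RATE = 2048
# Two notes an octave apart: A3 (220 Hz) and A4 (440 Hz).
signal = [math.sin(2 * math.pi * 220 * t / SAMPLE_RATE) +
          math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(1024)]

pcp = pitch_class_profile(signal, SAMPLE_RATE)
strongest = PITCH_CLASSES[max(range(12), key=lambda i: pcp[i])]
```

Both tones land in the same band, which is the point of the PCP: octave information is discarded and only the pitch class remains.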


High level music features

High level intuitive information about a piece of music (melody, harmony etc).

“It is melody that enables us to distinguish one work from another. It is melody that human beings are innately able to reproduce by singing, humming, and whistling. It is melody that makes music memorable: we are likely to recall a tune long after we have forgotten its text.”

-Selfridge-Field

Intuitive features, but hard to extract and ongoing areas of research.

Casey et al IEEE 2008


Melody & Bass Estimation

• Melody and bass lines are represented as a continuous temporal trajectory of the fundamental frequency, F0 (a series of musical notes).

• PreFEst (Goto 1999)
– Estimate the F0 trajectory in mid-high freq range of input -> melody.
– Estimate the F0 trajectory in low freq range -> bass.

Casey et al IEEE 2008


Chord Recognition

Recognize chord progressions based on:
– Estimated PCPs
– Statistics of transitions between PCPs

Casey et al IEEE 2008
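
The slide names two ingredients: estimated PCPs and transition statistics. The sketch below covers only the first, using plain template matching against major and minor triads. This is a deliberately simplified stand-in, not the method surveyed by Casey et al., and the template set and function names are assumptions.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chord_templates():
    """Binary 12-band templates for the 24 major and minor triads."""
    templates = {}
    for root in range(12):
        for suffix, intervals in (("", (0, 4, 7)), ("m", (0, 3, 7))):
            template = [0.0] * 12
            for interval in intervals:
                template[(root + interval) % 12] = 1.0
            templates[NOTE_NAMES[root] + suffix] = template
    return templates

def recognize_chord(pcp):
    """Return the chord whose template has the largest overlap with the PCP."""
    def score(template):
        return sum(p * t for p, t in zip(pcp, template))
    templates = chord_templates()
    return max(templates, key=lambda name: score(templates[name]))

# A hypothetical PCP with most of its energy on C, E and G.
pcp = [0.0] * 12
pcp[0], pcp[4], pcp[7] = 1.0, 0.8, 0.9
chord = recognize_chord(pcp)
```

A fuller system would label a whole sequence of PCP frames and use the transition statistics to prefer likely chord progressions over frame-by-frame decisions.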


Music as vector of features

• Once again we represent (music) documents as a vector of numbers
– Each entry (or set of entries) in this vector is a different feature

• To retrieve music documents given a query we can:
– Find exact matches
– Find nearest match
– Find nearby matches
– Train a classifier to recognize a given category (genre, style etc).
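
The first three retrieval options differ only in how strictly the query vector must agree with a stored vector. A minimal sketch, assuming Euclidean distance as the comparison and a made-up two-feature database:

```python
import math

def exact_matches(query, database):
    """Documents whose feature vector equals the query exactly."""
    return [name for name, vector in database.items() if vector == query]

def nearest_match(query, database):
    """The single document with the smallest Euclidean distance to the query."""
    return min(database, key=lambda name: math.dist(query, database[name]))

def nearby_matches(query, database, radius):
    """All documents within the given distance of the query."""
    return sorted(name for name, vector in database.items()
                  if math.dist(query, vector) <= radius)

# Hypothetical feature vectors for three music documents.
database = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (5.0, 5.0)}
query = (0.9, 0.1)

exact = exact_matches(query, database)
nearest = nearest_match(query, database)
nearby = nearby_matches(query, database, radius=1.0)
```

Exact matching suits fingerprint-style lookups, while nearest and nearby matching tolerate variation in the query. The classifier option is not shown here.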


Audio Similarity

We have a description of a music document based on some set of features. Now, how do we compare two descriptions?

Casey et al IEEE 2008


Usage examples


Howard Leung


Query by humming

• Requires robustness to variation because matches will not be exact
• Extract melody from dataset of songs
• Extract melody from hum
• Match by comparing similarities of melodies (nearby matches)
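
A rough sketch of the matching step, assuming the melodies have already been extracted as sequences of MIDI note numbers. Comparing the intervals between notes makes the match independent of the key the user hums in, and edit distance tolerates a few wrong notes. Both choices are assumptions made for illustration, not the method of any particular system.

```python
def intervals(notes):
    """Pitch differences between consecutive notes (transposition invariant)."""
    return [b - a for a, b in zip(notes, notes[1:])]

def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]

def match_hum(hum_notes, song_melodies):
    """Return the song whose melody is closest to the hummed melody."""
    query = intervals(hum_notes)
    return min(song_melodies,
               key=lambda name: edit_distance(query, intervals(song_melodies[name])))

song_melodies = {
    "twinkle": [60, 60, 67, 67, 69, 69, 67],   # C C G G A A G
    "ode_to_joy": [64, 64, 65, 67, 67, 65, 64],  # E E F G G F E
}
# The first tune hummed a whole tone higher, with one note slightly off.
hum = [62, 62, 69, 69, 71, 70, 69]
best = match_hum(hum, song_melodies)
```

The hum is not an exact copy of any stored melody, yet the closest melody is still found, which is the "nearby matches" behaviour the slide asks for.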


Copyright monitoring

• Compute fingerprints from database examples
• Compute fingerprint from query example
• Find exact matches
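
The three steps can be sketched with a toy fingerprint: record whether the energy rises or falls from each frame to the next, then hash the resulting bit string so exact matches become a dictionary lookup. The fingerprint design, frame length, and track data are hypothetical; production fingerprints are far more robust than this.

```python
import hashlib

def fingerprint(samples, frame_len=2):
    """Toy fingerprint: hash of the rise/fall pattern of frame energies."""
    energies = [sum(s * s for s in samples[i:i + frame_len])
                for i in range(0, len(samples) - frame_len + 1, frame_len)]
    bits = "".join("1" if later > earlier else "0"
                   for earlier, later in zip(energies, energies[1:]))
    return hashlib.sha1(bits.encode("ascii")).hexdigest()

# Hypothetical database of known recordings.
database = {
    "track_1": [1.0, 2.0, 5.0, 4.0, 1.0, 0.0, 3.0, 3.0],
    "track_2": [4.0, 4.0, 1.0, 1.0, 2.0, 2.0, 0.0, 1.0],
}

# Step 1: compute fingerprints from the database examples.
index = {fingerprint(samples): name for name, samples in database.items()}

# Step 2: compute the fingerprint of the query (track_1 played at half volume).
query = [0.5 * s for s in database["track_1"]]

# Step 3: find an exact match by lookup.
match = index.get(fingerprint(query))
```

Because only the rise/fall pattern is kept, a change in volume leaves the fingerprint unchanged, so the quieter copy still matches exactly. A query with a different pattern returns `None`.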


Best performing systems on MIREX 2007

Casey et al IEEE 2008


Music Browsing

Musicream – UI for discovering and managing musical pieces.

User can select a disc and listen to it. By dragging a disc in the flow, the user can easily pick out other similar pieces (attach similar discs). This interaction allows a user to unexpectedly come across various pieces similar to other pieces the user likes.


Casey et al IEEE 2008


Music Browsing

Musicrainbow – UI for discovering unknown artists.

Artists are mapped on a circular rainbow where colors represent different styles of music. Similar artists are mapped near each other.

User rotates rainbow by turning a knob.


Casey et al IEEE 2008


Howard Leung