Wednesday, January 21st, 2008, Comp. Sci. Colloquium

The Problem with Music: Modeling Distance Distributions of Large Music Collections

Prof. Michael Casey
Program in Digital Musics
Dartmouth College, Hanover, NH


a.k.a. The Problem with Multimedia:
Music, Music Videos, Videos, Images

Scalable Similarity
• 8M tracks in commercial collection
• 6B images on WWW
• Require scalable nearest-neighbor methods
• Increase scale, decrease search complexity

Example: Hattogate

Example: Remixing / Sampling in Yahoo! Music
[Figure: Original Track, Remix 1, Remix 2, Remix 3]

Example: 3B Images in Flickr

Specificity
• Partial document (sub-track) retrieval
• Alternate versions: remix, cover, live, album
• Task is mid-high specificity

Machine Listening

Feature Extraction

[Figure: audio source divided into overlapping frames (frame 1, frame 2, frame 3); time axis at 10 ms, 20 ms, 30 ms, 40 ms]
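The overlapping framing in the figure can be sketched in a few lines of Python. This is a minimal illustration, not the talk's feature extractor: the 20 ms frame length and 10 ms hop (50% overlap) are assumptions chosen to match the tick marks, and `frame_signal` is a hypothetical helper. Each frame would then be mapped to an m-dimensional feature vector (12d to 20d in the talk); that step is not shown.

```python
def frame_signal(samples, sample_rate, frame_ms=20.0, hop_ms=10.0):
    """Split a 1-D signal into overlapping frames (hypothetical helper).

    frame_ms and hop_ms are illustrative defaults; a hop shorter than the
    frame length makes consecutive frames overlap.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = int(round(sample_rate * hop_ms / 1000.0))
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame and hop lengths must be positive")
    frames = []
    # Keep only frames that fit entirely inside the signal.
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames


# 40 ms of a dummy signal at 8 kHz = 320 samples
signal = list(range(320))
frames = frame_signal(signal, sample_rate=8000)
print(len(frames), len(frames[0]))  # 3 160
```

With a 10 ms hop, frame 2 starts halfway through frame 1, which is the overlap drawn in the figure.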

Audio Shingles

A shingle is defined as the concatenation of l frames of m-dimensional features:

$$\mathbf{x}_i = \left[\mathbf{f}_i^\top, \mathbf{f}_{i+1}^\top, \ldots, \mathbf{f}_{i+l-1}^\top\right]^\top \in \mathbb{R}^{l m}$$

• Shingles provide contextual information about features
• Originally used for Internet search engines:
  – A. Z. Broder, S. C. Glassman, M. S. Manasse, and G. Zweig, “Syntactic Clustering of the Web,” Computer Networks 29(8-13): 1157-1166 (1997)
• Related to N-grams: overlapping sequences of features
• Applied to the audio domain by Casey and Slaney:
  – M. Casey and M. Slaney, “The Importance of Sequences in Musical Similarity,” in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2006
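A minimal Python sketch of shingle construction, assuming a hop of one frame between consecutive shingles (the slides do not state the hop) and including the unit-length normalization the talk applies to shingles. `make_shingles` is a hypothetical name.

```python
import math

def make_shingles(features, l):
    """Stack l consecutive m-dimensional feature frames into shingles.

    features: list of frames, each a list of m floats.
    Returns one (l*m)-dimensional unit vector per start frame (hop = 1 frame).
    """
    shingles = []
    for i in range(len(features) - l + 1):
        # Concatenate frames i .. i+l-1 into one long vector.
        vec = [v for frame in features[i:i + l] for v in frame]
        norm = math.sqrt(sum(v * v for v in vec))
        # Normalize to unit length (an all-zero shingle is left unchanged).
        shingles.append([v / norm for v in vec] if norm > 0 else vec)
    return shingles


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]  # 4 frames, m = 2
sh = make_shingles(features, l=3)
print(len(sh), len(sh[0]))  # 2 6
```

Consecutive shingles share l - 1 frames, which is what lets them carry temporal context in the way N-grams do for text.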

Audio Shingle Similarity

• $\mathbf{x}_i^{(Q)}$: a query shingle drawn from a query track $Q$
• $\{\mathcal{Y}^{(n)}\}$: database of audio tracks indexed by $n$
• $\mathbf{y}_j^{(n)}$: a database shingle from track $n$

Shingles are normalized to unit vectors, therefore:

$$\left\|\mathbf{x}_i^{(Q)} - \mathbf{y}_j^{(n)}\right\|^2 = 2 - 2\,\mathbf{x}_i^{(Q)} \cdot \mathbf{y}_j^{(n)}$$

for shingles with M = l·m dimensions, with m = 12 or 20 and l = 30 or 40.
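For unit vectors, ||x - y||² = 2 - 2 x·y, so squared distances can be computed from dot products alone. This is one reason unit normalization is convenient. The quick numerical check below uses random Gaussian shingles with M = 12 × 30; the data are synthetic and the helper names are hypothetical.

```python
import math
import random

def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def sq_dist(a, b):
    # Direct squared Euclidean distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sq_dist_unit(a, b):
    # Shortcut that is valid only when a and b both have unit length.
    return 2.0 - 2.0 * sum(x * y for x, y in zip(a, b))

random.seed(0)
M = 12 * 30  # M = l * m with m = 12, l = 30
q = unit([random.gauss(0, 1) for _ in range(M)])
y = unit([random.gauss(0, 1) for _ in range(M)])
print(abs(sq_dist(q, y) - sq_dist_unit(q, y)) < 1e-9)  # True
```

Squared distances between unit vectors lie in [0, 4]. Independent random Gaussian directions are nearly orthogonal in high dimensions, so their squared distance lands close to 2.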

AudioDB: Shingle Nearest Neighbor Search

Whole-track similarity
• Often want to know which tracks are similar
• Similarity depends on specificity of task:
  – Distortion / filtering / re-encoding (high)
  – Remix with new audio material (mid)
  – Cover song: same song, different artist (mid)

Whole-track resemblance: radius-bounded search

Compute the number of shingle collisions between two tracks, i.e. shingle pairs that fall within a radius bound of each other.
• Requires a threshold for considering shingles to be related
• Need a way to estimate relatedness (threshold) for the data set
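A brute-force Python sketch of radius-bounded whole-track matching. It assumes that a "collision" is any (query shingle, track shingle) pair whose squared distance is at most `x_thresh`. Counting pairs is our reading of the slide; counting query shingles with at least one match would be an equally plausible variant. All names are hypothetical.

```python
def track_resemblance(query_shingles, track_shingles, x_thresh):
    """Count shingle pairs whose squared distance is at most x_thresh.

    Both inputs are lists of unit-length shingles, so the squared distance
    is computed as 2 - 2 * dot product. Brute force: O(len(query) * len(track)).
    """
    count = 0
    for q in query_shingles:
        for y in track_shingles:
            d2 = 2.0 - 2.0 * sum(a * b for a, b in zip(q, y))
            if d2 <= x_thresh:
                count += 1
    return count

def rank_tracks(query_shingles, database, x_thresh):
    """Return (track_id, collisions) pairs sorted by descending collision count."""
    scores = {tid: track_resemblance(query_shingles, sh, x_thresh)
              for tid, sh in database.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


query = [[1.0, 0.0], [0.0, 1.0]]
database = {
    "near_copy": [[0.8, 0.6], [0.6, 0.8]],   # each shingle close to one query shingle
    "unrelated": [[-1.0, 0.0], [0.0, -1.0]],
}
print(rank_tracks(query, database, x_thresh=0.5))
# [('near_copy', 2), ('unrelated', 0)]
```

This nested loop is exactly the exhaustive pairwise search whose cost the scale figures below make infeasible, and it is the step LSH approximates. The result also hinges on `x_thresh`, which is why the talk estimates the threshold from data.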

Scale
• Mazurkas: 10,000 tracks, 10–100 ms features
  – 3 s clips (30–300 frames per vector)
  – 12d–20d features (360–600d vectors)
• Yahoo! Music: 6M tracks
  – 1000 vectors per track
  – (6M × 1k)² search for near neighbours
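To put the Yahoo! Music figures in perspective, an exhaustive pairwise search needs the following number of distance evaluations (a simple count that ignores symmetry and self-matches):

$$\left(6\times10^{6}\ \text{tracks}\times10^{3}\ \tfrac{\text{vectors}}{\text{track}}\right)^2=\left(6\times10^{9}\right)^2=3.6\times10^{19}$$

Even a single query track of $10^{3}$ vectors, searched exhaustively against all $6\times10^{9}$ database vectors, costs $6\times10^{12}$ distance computations. That count excludes the M-dimensional arithmetic inside each computation.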

LSH

Approximate Near Neighbor Matching

Approximate near neighbors
• In many applications we need only near neighbors
• We can exploit this by allowing a degree of approximation in retrieval

Space partitioning

Curse of dimensionality

[Figure: distance distributions, Pr(dist) vs. dist., for d = 4, d = 8, d = 1024]
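The figure's point is that pairwise distances concentrate as the dimension grows. A quick Python simulation reproduces the effect. Uniform random points in the unit cube are an assumption made for illustration, since the slide does not say what data produced its curves, and `distance_spread` is a hypothetical helper.

```python
import math
import random
import statistics

def distance_spread(d, n_pairs=500, seed=0):
    """Relative spread (std / mean) of Euclidean distances between pairs of
    points drawn uniformly from the unit cube [0, 1]^d."""
    rng = random.Random(seed)
    dists = []
    for _ in range(n_pairs):
        p = [rng.random() for _ in range(d)]
        q = [rng.random() for _ in range(d)]
        dists.append(math.dist(p, q))
    return statistics.stdev(dists) / statistics.mean(dists)

for d in (4, 8, 1024):
    print(d, round(distance_spread(d), 3))
```

As the ratio shrinks, the nearest and farthest points from a query become almost equidistant. Partitioning space into cells then prunes little, because a query's near neighbors can lie just across many cell borders.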

Border effects in high d

ε-NN: approximate near neighbors

Setting the range

Hashing

Types of hashes
• String: put Bash vs. Bush in different bins
• Locality sensitive: close matches in same bin
  – High-dimensional and probabilistic
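The two kinds of hash can be contrasted in Python. SHA-1 stands in for a "string" hash. Random-hyperplane (sign-projection) hashing stands in for a locality-sensitive hash; that family is our choice for illustration, since the slide does not name one. Dimensions, plane count, and helper names are hypothetical.

```python
import hashlib
import random

def string_digest(s):
    """Conventional hash: a one-letter change gives an unrelated digest."""
    return hashlib.sha1(s.encode("utf-8")).hexdigest()

def hyperplane_bits(v, planes):
    """Locality-sensitive hash (random-hyperplane / sign projection):
    one bit per plane, 1 when v lies on the plane's positive side."""
    return tuple(int(sum(a * b for a, b in zip(v, p)) >= 0) for p in planes)

rng = random.Random(1)
dim, n_planes = 50, 8
planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]

v = [rng.gauss(0, 1) for _ in range(dim)]
v_close = [x + 1e-6 * rng.gauss(0, 1) for x in v]  # tiny perturbation of v
v_opposite = [-x for x in v]                       # opposite direction

print(string_digest("Bash")[:12], string_digest("Bush")[:12])
print(hyperplane_bits(v, planes) == hyperplane_bits(v_close, planes))     # nearby: almost always equal
print(hyperplane_bits(v, planes) == hyperplane_bits(v_opposite, planes))  # opposite: every bit flips
```

For this family, two vectors at angle θ agree on each bit with probability 1 - θ/π. Close vectors therefore tend to share a bucket, while "Bash" and "Bush" land in unrelated bins under SHA-1.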

Nearest neighbor implementations
• Pair-wise distance computation
  – 1,000,000,000,000 comparisons in 2M song database
• Hash bucket collisions
  – 1,000,000,000 hash projections

Exact matching via hashing

Audio fingerprinting: Shazam, etc.
• Make the feature robust
• Use exact matching on integer hash
• Find a sequence of hashes to identify a specific recording or image
• Drawback: only exact matches possible
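A toy Python sketch of exact matching on integer hashes. It is loosely modeled on fingerprinting systems in general and is not Shazam's algorithm. The integer hashes are given directly rather than computed from audio, and the names and data are hypothetical. Each query hash looks up exact matches, and a true match is recognized because many hits agree on the same time offset.

```python
from collections import Counter, defaultdict

def build_index(recordings):
    """Map each integer hash to the (recording id, position) pairs where it occurs.

    recordings: dict of recording id -> list of integer hashes (one per frame).
    """
    index = defaultdict(list)
    for rec_id, hashes in recordings.items():
        for pos, h in enumerate(hashes):
            index[h].append((rec_id, pos))
    return index

def identify(query_hashes, index):
    """Vote for (recording, time offset) pairs using exact hash hits only."""
    votes = Counter()
    for qpos, h in enumerate(query_hashes):
        for rec_id, pos in index.get(h, []):
            votes[(rec_id, pos - qpos)] += 1
    if not votes:
        return None
    (rec_id, offset), count = votes.most_common(1)[0]
    return rec_id, offset, count


recordings = {"take_a": [17, 4, 99, 23, 58, 61], "take_b": [5, 8, 13, 21, 34, 55]}
index = build_index(recordings)
print(identify([99, 23, 58], index))   # ('take_a', 2, 3)
print(identify([99, 24, 58], index))   # one hash altered: ('take_a', 2, 2)
print(identify([100, 24, 59], index))  # every hash altered: None
```

The last query shows the drawback listed above: when distortion changes every hash, exact lookup finds nothing, however similar the underlying audio is.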

Locality-Sensitive Hashing (Indyk-Motwani ’98)

• Hash functions are locality-sensitive if, for a random hash function h and any pair of points p, q:
  – Pr[h(p) = h(q)] is “high” if p is “close” to q
  – Pr[h(p) = h(q)] is “low” if p is “far” from q

Locality Sensitive Hashing

Random Projections
• Random projections estimate distance
• Multiple projections improve the estimate

• h’s are locality-sensitive:

$$\Pr[h(p)=h(q)] = \left(1-\frac{D(p,q)}{d}\right)^{k}$$

• We can vary the probability by changing k

[Figure: Pr vs. distance for k = 1 and k = 2]
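The formula (1 - D/d)^k matches the bit-sampling family for binary vectors of length d under Hamming distance D, which is Indyk and Motwani's original construction: h reads k randomly chosen bits. We assume that reading here, since the slide does not name the space. The sketch below draws coordinates with replacement, so the k bit comparisons are independent and the formula holds exactly. Its empirical collision rate should land close to the prediction. Names are hypothetical.

```python
import random

def hamming(p, q):
    return sum(a != b for a, b in zip(p, q))

def make_bit_sampling_hash(d, k, rng):
    """h(p) = the bits of p at k coordinates drawn uniformly (with replacement)."""
    coords = [rng.randrange(d) for _ in range(k)]
    return lambda p: tuple(p[i] for i in coords)

def collision_rate(p, q, k, trials=20000, seed=0):
    """Fraction of randomly drawn hashes h for which h(p) == h(q)."""
    rng = random.Random(seed)
    d = len(p)
    hits = 0
    for _ in range(trials):
        h = make_bit_sampling_hash(d, k, rng)
        hits += h(p) == h(q)
    return hits / trials

d = 64
p = [0] * d
q = [1] * 16 + [0] * (d - 16)   # Hamming distance D = 16
for k in (1, 2):
    predicted = (1 - hamming(p, q) / d) ** k
    print(k, predicted, round(collision_rate(p, q, k), 3))
```

Raising k multiplies in another factor below one, so collision probability falls much faster for distant pairs than for close ones. That is the steeper k = 2 curve in the figure.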

LSH Random Projections: 3d to 2d
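A concrete instance of random-projection LSH for Euclidean distance is the p-stable scheme of Datar et al. (2004). Each of k Gaussian projections is quantized into bins of width w, and the k bin indices form a bucket key. We do not know which family the talk's 3d-to-2d figure or audioDB uses, so treat this as a stand-in. The class name, w, k, and the sample points are all hypothetical.

```python
import math
import random

class PStableHash:
    """One LSH bucket key for Euclidean distance (p-stable scheme):
    h_i(v) = floor((a_i . v + b_i) / w), with a_i ~ N(0, I) and b_i ~ U[0, w).
    k such functions are concatenated into the key."""

    def __init__(self, dim, k, w, seed=0):
        rng = random.Random(seed)
        self.w = w
        self.a = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(k)]
        self.b = [rng.uniform(0, w) for _ in range(k)]

    def key(self, v):
        return tuple(
            math.floor((sum(x * y for x, y in zip(ai, v)) + bi) / self.w)
            for ai, bi in zip(self.a, self.b)
        )

# Project 3-d points down to k = 2 quantized coordinates, as in the figure.
h = PStableHash(dim=3, k=2, w=4.0, seed=42)
points = {"p": [1.0, 2.0, 3.0], "p_near": [1.01, 2.0, 2.99], "p_far": [-20.0, 15.0, 9.0]}
buckets = {}
for name, v in points.items():
    buckets.setdefault(h.key(v), []).append(name)
print(buckets)
```

Larger w or smaller k makes collisions more likely for every pair. In practice several independent tables are queried so that a near neighbor missed by one table is caught by another.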

Statistical approaches to modeling distance distributions

Distribution of minimum distances

Database: 1.4 million shingles. The left bump is the histogram of minimum distances from each of 1000 randomly selected query shingles to this database. The right bump is a small sampling (1/98,000,000) of the full histogram of all distances.

Radius-bounded retrieval performance: cover song (opus task)

• Performance depends critically on xthresh, the collision threshold
• Want to estimate xthresh automatically from unlabelled data

Order Statistics
• Minimum-value distribution is analytic
• Estimate the distribution parameters
• Substitute into the minimum-value distribution
• Define a threshold in terms of FP rate
• This gives an estimate of xthresh
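The order-statistics step can be written generically. For N i.i.d. background distances with CDF F, P(min ≤ x) = 1 - (1 - F(x))^N. Setting that probability equal to the target false positive rate and solving gives the threshold. The exponential distribution below is used only because its minimum has a closed form to check against; it is not the talk's background model. Function names are hypothetical.

```python
import math

def min_cdf(cdf, n):
    """CDF of the minimum of n i.i.d. draws: P(min <= x) = 1 - (1 - F(x))**n."""
    return lambda x: 1.0 - (1.0 - cdf(x)) ** n

def threshold_for_fp_rate(inv_cdf, n, fp_rate):
    """Distance x at which P(min of n unrelated distances <= x) equals fp_rate.

    Solves 1 - (1 - F(x))**n = fp_rate, i.e. F(x) = 1 - (1 - fp_rate)**(1/n);
    expm1/log1p keep the tiny target probability accurate for large n."""
    p = -math.expm1(math.log1p(-fp_rate) / n)
    return inv_cdf(p)

# Check against a closed form: the minimum of n Exponential(rate) draws
# is Exponential(n * rate).
rate, n, fp = 2.0, 1_000_000, 0.01
cdf = lambda x: 1.0 - math.exp(-rate * x)
inv_cdf = lambda p: -math.log1p(-p) / rate
x_t = threshold_for_fp_rate(inv_cdf, n, fp)
closed = -math.log1p(-fp) / (n * rate)
print(x_t, closed)              # the two values agree
print(min_cdf(cdf, n)(x_t))     # approximately 0.01
```

The target F(x) is tiny (about fp_rate / N), far out in the tail of the background distribution. An empirical histogram cannot resolve that region, which is why an analytic model of the background is needed.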

Estimating xthresh from unlabelled data
• Use theoretical statistics
• Null hypothesis:
  – H₀: shingles are drawn from unrelated tracks
• Assume elements i.i.d., normally distributed
  – M-dimensional shingles, d effective degrees of freedom

Squared distance distribution for H₀

ML for background distribution
• Likelihood for N data points (squared distances)
• d = effective degrees of freedom
• M = shingle dimensionality

Background distribution parameters
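The slides fit the background model by maximum likelihood. The Python sketch below swaps in the simpler method of moments and assumes the model is a chi-squared distribution with d degrees of freedom scaled to mean μ. That is our reading of the H₀ slide, because its exact parameterization did not survive. Under this model the variance is 2μ²/d, so d = 2μ²/variance. The synthetic check draws from the model itself, and the function name and numbers are hypothetical.

```python
import random
import statistics

def estimate_background(sq_dists):
    """Method-of-moments fit of x = (mu / d) * chi2_d
    (mean mu, variance 2 * mu**2 / d). Returns (mu, d_eff)."""
    mu = statistics.fmean(sq_dists)
    var = statistics.variance(sq_dists)
    return mu, 2.0 * mu * mu / var

# Synthetic check: sample from the assumed model with known d and mu.
rng = random.Random(0)
d_true, mu_true = 40, 2.0
samples = [(mu_true / d_true) * sum(rng.gauss(0, 1) ** 2 for _ in range(d_true))
           for _ in range(5000)]
mu_hat, d_hat = estimate_background(samples)
print(round(mu_hat, 3), round(d_hat, 1))   # close to 2.0 and 40
```

Under this model, d_eff would typically come out smaller than M when the shingle elements are correlated, which is what "effective degrees of freedom" suggests.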

Minimum value over N samples

Minimum value distribution of unrelated shingles

Estimate of xthresh
• Chosen to meet a target false positive rate
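The pieces can be combined in Python to turn a fitted background model into xthresh. This sketch assumes that H₀ squared distances follow a chi-squared distribution with d effective degrees of freedom, scaled to mean μ. It evaluates the CDF with a hand-rolled incomplete-gamma series instead of a statistics library. N is taken to be the number of database shingles searched per query, here 1.4 million to match the histogram slide. μ and d are made-up values, and all function names are hypothetical.

```python
import math

def gamma_lower_reg(a, x, tol=1e-15, max_iter=100_000):
    """Regularized lower incomplete gamma P(a, x) from its power series.
    Accurate in the lower tail used here; not a general-purpose routine."""
    if x <= 0:
        return 0.0
    term = 1.0 / a          # n = 0 term of sum x**n / (a (a+1) ... (a+n))
    total = term
    for n in range(1, max_iter):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    return math.exp(a * math.log(x) - x - math.lgamma(a)) * total

def scaled_chi2_cdf(x, mu, d):
    """CDF of (mu / d) * chi2_d: chi-squared with d dof rescaled to mean mu."""
    return gamma_lower_reg(d / 2.0, x * d / (2.0 * mu))

def estimate_xthresh(mu, d, n, fp_rate, iters=200):
    """Squared-distance threshold below which the minimum of n unrelated
    (H0) squared distances falls with probability fp_rate."""
    # Order statistics: 1 - (1 - F(x))**n = fp_rate  =>  F(x) = target.
    target = -math.expm1(math.log1p(-fp_rate) / n)
    lo, hi = 0.0, mu
    while scaled_chi2_cdf(hi, mu, d) < target:   # widen until the root is bracketed
        hi *= 2.0
    for _ in range(iters):                       # bisection on the monotone CDF
        mid = 0.5 * (lo + hi)
        if scaled_chi2_cdf(mid, mu, d) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative inputs only: 1% false positive rate, as in the experiment below.
x_thresh = estimate_xthresh(mu=2.0, d=40, n=1_400_000, fp_rate=0.01)
print(x_thresh)
```

If SciPy is available, `scipy.stats.chi2.ppf(target, d, scale=mu / d)` computes the same quantile directly. The bisection is kept here so that the sketch runs on the standard library alone.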

Unlabelled data experiment
• Unlabelled data set known to contain:
  – Cover songs (same work, different performer)
  – Near-duplicate recordings (misattribution, encoding)
• Estimate background distance distribution
• Estimate minimum value distribution
• Set xthresh so FP rate is ≤ 1%
• Whole-track retrieval based on shingle collisions

Misattributions
• Joyce Hatto: 100% of known misattributions in first rank

Sergio Fiorentino

Eleven out of twenty-six Mazurkas performances on another Concert Artists/Fidelio disc, issued under the name of Sergio Fiorentino, are in fact copies of recordings by other artists. This is the first time that such practices have been found in the Concert Artists/Fidelio recordings issued other than under the name of Joyce Hatto, and it prompts speculation as to how much more misattributed material remains to be found in the Concert Artists/Fidelio catalogue.

Cover song retrieval

Scaling
• Locality sensitive hashing
  – Trades exact NN search for approximate NN at much lower time complexity
  – 3 to 4 orders of magnitude speed-up
  – No noticeable degradation in performance at the optimal radius threshold

Remix retrieval via LSH

Open source: Google “audioDB”
• Management of tracks, sequences, salience
• Automatic indexing parameters
• OMRAS2, Yahoo!, AWAL, CHARM, more…
• Web-services interface (SOAP / JSON)
• Implementation of LSH for large N ~ 1B
• 1–10 ms whole-track retrieval from 1B vectors

AudioDB: Shingle Nearest Neighbor Search

Current deployment
• Large commercial collections
  – AWAL ~ 100,000 tracks
  – Yahoo! 2M+ tracks, related song classifier
  – Flickr 1B+ images

AudioDB: open-source, international consortium of developers

Google: “audioDB”

Conclusions
• Radius-bounded retrieval model for tracks
  – Shingles preserve temporal information, high d
  – Implements mid-to-high specificity search
• Optimal radius threshold from order statistics
  – Null hypothesis: shingles are drawn from unrelated tracks
• LSH requires a radius bound, which is estimated automatically
• Scales to 1B+ shingles using LSH

Thanks
• Malcolm Slaney, Yahoo! Research Inc.
• Christophe Rhodes, Goldsmiths, U. of London
• Michela Magas, Goldsmiths, U. of London
• Funding: EPSRC: EP/E02274X/1