
Query-by-Singing/Humming: An Overview

J.-S. Roger Jang (張智星)

Multimedia Information Retrieval Lab

CS Dept., Tsing Hua Univ., Taiwan

http://mirlab.org/jang


Outline

Introduction
Methods for QBSH: pitch tracking, database, comparison
Demos and commercial applications
Conclusions


Categories of Music Information Retrieval (MIR)

Metadata-based: e.g., song title, singer, tags, lyricist, composer. Query input: text or speech.

Content-based: e.g., melody, chords, note onsets, moods… Query input:
Symbolic: notes, chords, text
Acoustic: singing/humming, whistling, tapping


Acoustic Inputs for MIR

Query by humming (usually with syllables such as “ta” or “da”)
Query by singing
Query by whistling
Query by tapping (at the onsets of notes)
Query by the user’s speech input (for metadata)
Query by example: recordings of the original music made with mobile phones
Beatboxing


Introduction to QBSH

QBSH: Query by Singing/Humming
Input: singing or humming from a microphone
Output: a ranked list of songs retrieved from the song database

Progression: first paper around 1994; extensive studies since 2001; state of the art: the QBSH tasks at ISMIR/MIREX


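Workflow of QBSH

Preprocessing: collect single-track ground-truth melodies (usually MIDI files) and convert them into an intermediate format suitable for comparison.

Real-time processing: convert the user’s audio input into a pitch vector; optionally convert the pitch vector into notes; compare it with the ground-truth melodies; list the ranking.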


Flowchart of QBSH

[Figure: flowchart of QBSH. Off-line processing: MIDI files, melody track extraction, frame-based representation. On-line processing: microphone input, pitch tracking, pitch vector smoothing, filtering, similarity comparison, query results (ranked song list).]


Pitch Tracking for QBSH

Two categories of pitch tracking algorithms:
Time domain: ACF (autocorrelation function), AMDF (average magnitude difference function), SIFT (simple inverse filtering tracking)
Frequency domain: harmonic product spectrum method, cepstrum method


Frame Blocking for Pitch Tracking

Frame size = 256 points; overlap = 84 points; frame rate = 11025/(256 - 84) ≈ 64 pitch values/sec (sample rate = 11025 Hz)

[Figure: a waveform (about 2500 samples) and a zoomed-in view (about 300 samples) showing one frame and the overlap between consecutive frames]
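The frame blocking above can be sketched in a few lines of Python. This is a minimal sketch, assuming the signal is a plain list of samples at 11025 Hz; `frame_blocking` and `frame_rate` are hypothetical helper names, and the incomplete tail of the signal is simply dropped.

```python
def frame_blocking(signal, frame_size=256, overlap=84):
    """Split a 1-D signal into overlapping frames; the incomplete tail is dropped."""
    hop = frame_size - overlap  # distance between the starts of consecutive frames
    return [signal[start:start + frame_size]
            for start in range(0, len(signal) - frame_size + 1, hop)]


def frame_rate(sample_rate=11025, frame_size=256, overlap=84):
    """Frames (hence pitch values) per second."""
    return sample_rate / (frame_size - overlap)


print(round(frame_rate(), 1))                  # 64.1
frames = frame_blocking(list(range(1000)))     # dummy "signal" of 1000 samples
print(len(frames), frames[1][0])               # 5 172
```

Each frame later yields one pitch value, so the frame rate is also the number of points per second in the pitch vector.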


ACF: Auto-correlation Function

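Frame s(i) and shifted frame s(i+τ), shown for τ = 30: acf(30) is the inner product of the overlapping part. The pitch period corresponds to the lag of the maximum of acf(τ) away from τ = 0.

$$acf(\tau) = \sum_{i=0}^{n-1-\tau} s(i)\,s(i+\tau)$$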


Pitch Tracking via ACF

Specs: sample rate = 11025 Hz; frame size = 32 ms; overlap = 0; frame rate = 31.25 frames/sec

Playback: soo.wav, sooPitch.wav
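A minimal ACF pitch tracker for one frame, following the acf(τ) definition and the specs above. This is a sketch, not the toolbox implementation: the search range of lags is derived from the 82 Hz to 1047 Hz pitch range given later in these slides, the test signal is a synthetic 220.5 Hz sine, and the function names are hypothetical.

```python
import math


def acf(frame, tau):
    """acf(tau) = sum_{i=0}^{n-1-tau} s(i) * s(i + tau)."""
    return sum(frame[i] * frame[i + tau] for i in range(len(frame) - tau))


def acf_pitch(frame, sample_rate, fmin=82.0, fmax=1047.0):
    """Pitch (Hz) from the lag that maximizes acf(tau) within the allowed range."""
    min_lag = int(sample_rate / fmax)   # shortest period considered (highest pitch)
    max_lag = int(sample_rate / fmin)   # longest period considered (lowest pitch)
    best_lag = max(range(min_lag, max_lag + 1), key=lambda tau: acf(frame, tau))
    return sample_rate / best_lag


sample_rate = 11025
frame_size = int(0.032 * sample_rate)           # 32 ms -> 352 samples
frame = [math.sin(2 * math.pi * i / 50) for i in range(frame_size)]  # period = 50 samples
print(acf_pitch(frame, sample_rate))            # 220.5
```

Because the sum has fewer terms as τ grows, the peak at one period (lag 50) beats the peak at two periods (lag 100) for this steady tone.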


AMDF: Average Magnitude Difference Function

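Frame s(i) and shifted frame s(i+τ), shown for τ = 30: amdf(30) is the sum of absolute differences over the overlapping part. The pitch period corresponds to the lag of the minimum of amdf(τ) away from τ = 0.

$$amdf(\tau) = \sum_{i=0}^{n-1-\tau} \left| s(i) - s(i+\tau) \right|$$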
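The AMDF counterpart only changes the per-lag score and picks the minimum instead of the maximum. Again a sketch under the same assumptions (hypothetical names, lag range from 82 Hz to 1047 Hz); the test tone decays slightly so that one period (lag 50) gives a strictly smaller AMDF than two periods (lag 100), which a perfectly periodic tone would not guarantee.

```python
import math


def amdf(frame, tau):
    """amdf(tau) = sum_{i=0}^{n-1-tau} |s(i) - s(i + tau)|."""
    return sum(abs(frame[i] - frame[i + tau]) for i in range(len(frame) - tau))


def amdf_pitch(frame, sample_rate, fmin=82.0, fmax=1047.0):
    """Pitch (Hz) from the lag that minimizes amdf(tau) within the allowed range."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag = min(range(min_lag, max_lag + 1), key=lambda tau: amdf(frame, tau))
    return sample_rate / best_lag


sample_rate = 11025
# Slowly decaying 220.5 Hz tone (period = 50 samples), 352 samples = 32 ms
frame = [math.exp(-i / 500) * math.sin(2 * math.pi * i / 50) for i in range(352)]
print(amdf_pitch(frame, sample_rate))           # 220.5
```

AMDF needs only subtractions and absolute values, which made it attractive for low-cost hardware; the price is that its valleys at multiples of the period can be nearly tied.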


UPDUDP (1/4)

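UPDUDP: Unbroken Pitch Determination Using Dynamic Programming (DP). Goal: take pitch smoothness into consideration.

$$\mathrm{cost}(\mathbf{p}, \theta, m) = \sum_{i=1}^{n} amdf_i(p_i) + \theta \sum_{i=1}^{n-1} \left| p_i - p_{i+1} \right|^{m}$$

where p = [p_1, …, p_n] is a given path in the AMDF matrix, n is the number of frames, θ is the transition penalty, and m is the exponent of the transition difference.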


UPDUDP (2/4)

Optimum-value function D(i, j): the minimum cost starting from frame 1 to position (i, j)

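Recurrence formula, where θ is the transition penalty and m the exponent of the transition difference:

$$D(i, j) = amdf_i(j) + \min_{k \in [8, 160]} \left\{ D(i-1, k) + \theta \left| k - j \right|^{m} \right\}, \quad i = 2, \ldots, n,\ j \in [8, 160]$$

Initial conditions: $D(1, j) = amdf_1(j)$, $j \in [8, 160]$
Optimum cost: $\min_{j \in [8, 160]} D(n, j)$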
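The recurrence above can be sketched as follows. This is a minimal sketch, assuming the AMDF values have already been computed into a matrix whose columns correspond to a list of candidate lags (the slides use lags 8 to 160; the toy example uses only three); the function name and the toy numbers are hypothetical. Back-pointers are kept so the optimal path can be recovered, at O(n·K²) cost for n frames and K candidate lags.

```python
def updudp(amdf_matrix, lags, theta, m):
    """Minimize sum_i amdf_i(p_i) + theta * sum_i |p_i - p_{i+1}|**m by DP.

    amdf_matrix[i][k] is the AMDF of frame i at candidate lag lags[k].
    Returns (optimal lag path, optimal cost).
    """
    n, K = len(amdf_matrix), len(lags)
    D = [list(amdf_matrix[0])]            # initial condition: D(1, j) = amdf_1(j)
    back = [[0] * K]
    for i in range(1, n):
        row, ptr = [], []
        for j in range(K):
            # D(i-1, k) + theta * |k - j|**m for every possible previous lag k
            cand = [D[i - 1][k] + theta * abs(lags[k] - lags[j]) ** m for k in range(K)]
            best_k = min(range(K), key=cand.__getitem__)
            row.append(amdf_matrix[i][j] + cand[best_k])
            ptr.append(best_k)
        D.append(row)
        back.append(ptr)
    j = min(range(K), key=D[n - 1].__getitem__)   # optimum cost: min_j D(n, j)
    cost, path = D[n - 1][j], [j]
    for i in range(n - 1, 0, -1):                   # follow back-pointers to frame 1
        j = back[i][j]
        path.append(j)
    return [lags[k] for k in reversed(path)], cost


lags = [40, 50, 100]
amdf_matrix = [[5.0, 1.0, 6.0],
               [5.0, 2.0, 1.5],    # frame 2: spurious minimum at lag 100
               [5.0, 1.0, 6.0]]
print(updudp(amdf_matrix, lags, theta=0.0, m=1))    # ([50, 100, 50], 3.5)
print(updudp(amdf_matrix, lags, theta=0.01, m=1))   # ([50, 50, 50], 4.0)
```

With θ = 0 the DP reduces to picking each frame's AMDF minimum, which keeps the octave jump; a small positive θ removes it.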


UPDUDP (3/4)

A typical example of UPDUDP using AMDF


UPDUDP (4/4)

Insensitivity to θ (the transition penalty)

[Figure: waveform of an utterance with the syllables xi, lu, chan, sheng, chang (top) and the resulting pitch curves in semitones vs. time in seconds (bottom) for θ = 0, 2000, 4000, 6000, 8000, 10000, 12000, 14000, 16000, 18000, 20000]


Frequency to Semitone Conversion

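Semitone: a musical scale based on A440 (A4 = 440 Hz, which maps to semitone 69)

Reasonable pitch range: E2 to C6, i.e., 82 Hz to 1047 Hz (semitones 40 to 84)

$$\mathrm{semitone} = 69 + 12 \log_2\left(\frac{freq}{440}\right)$$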
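The conversion above and its inverse, as a short Python sketch (the function names are hypothetical):

```python
import math


def freq_to_semitone(freq_hz):
    """semitone = 69 + 12 * log2(freq / 440); A4 = 440 Hz maps to 69."""
    return 69 + 12 * math.log2(freq_hz / 440.0)


def semitone_to_freq(semitone):
    """Inverse mapping: freq = 440 * 2 ** ((semitone - 69) / 12)."""
    return 440.0 * 2 ** ((semitone - 69) / 12)


# The "reasonable pitch range" E2 to C6 maps to roughly semitones 40 to 84.
for f in (82.0, 440.0, 1047.0):
    print(f, round(freq_to_semitone(f), 2))
```

Because the scale is logarithmic, one semitone always corresponds to a frequency ratio of 2^(1/12), so pitch vectors in semitones can be compared by simple differences, and a key change becomes a constant offset.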


Vectors after Pitch Tracking

[Figure panels: with rests; without rests]


Typical Result of Pitch Tracking

Pitch tracking via autocorrelation for a recording of 茉莉花 (Jasmine)


Comparison of Pitch Vectors

Yellow line: target pitch vector


Demo of Pitch Tracking

Real-time display of ACF for pitch tracking: toolbox/sap/goPtByAcf.mdl

Real-time pitch tracking of microphone input: toolbox/sap/goPtByAcf2.mdl

Pitch scaling: pitchShiftDemo/project1.exe, pitchShift-multirate/multirate.m


Comparison Methods of QBSH

Categories of approaches to QBSH:
Histogram/statistics-based
Note vs. note: edit distance
Frame vs. note: HMM
Frame vs. frame: linear scaling, DTW, recursive alignment


Range Comparison

Concept: reject a candidate song c if its pitch range does not match that of the query q, i.e., if range(q) > range(c)

Characteristics: extremely fast; not effective on its own; good for initial filtering
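A sketch of range comparison as an initial filter. It assumes that "does not match" means the query's pitch range (max minus min, in semitones) exceeds the candidate's by more than a tolerance, that rests are encoded as 0, and that the candidate dictionary and the `tolerance` parameter are illustrative; none of these details are specified on the slide.

```python
def pitch_range(pitch_vector):
    """Pitch range in semitones, ignoring rests (encoded here as 0)."""
    voiced = [p for p in pitch_vector if p > 0]
    return max(voiced) - min(voiced) if voiced else 0


def range_filter(query, candidates, tolerance=1.0):
    """Keep only candidates whose range can accommodate the query's range."""
    q_range = pitch_range(query)
    return [name for name, pv in candidates.items()
            if q_range <= pitch_range(pv) + tolerance]


candidates = {"song_a": [55, 60, 65, 70], "song_b": [60, 61, 62]}
print(range_filter([60, 62, 64, 0, 67], candidates))   # ['song_a']
```

The test is a single subtraction per song, which is why it is only useful as a cheap first pass before the more discriminative comparisons.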


Linear Scaling (LS)

Concept: scale the query linearly in time to match the candidates

Example:


Linear Scaling (II)

Strengths: handles key transposition in one shot; efficient and effective; indexing methods available

Weakness: cannot deal with non-uniform tempo variations

Typical mapping path


Linear Scaling (III)

Distance functions for LS: normalized L1-norm, normalized L2-norm

Rest handling: extend the previous non-zero note

Alignment example
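A minimal linear-scaling sketch for the "match beginning" case. Assumptions not stated on the slides: the query is stretched by nearest-neighbor resampling to each of a few hypothetical scale factors, the normalized L1-norm is taken as the mean absolute difference, and key transposition is handled in one shot by subtracting each vector's mean; rests (0) are filled by extending the previous non-zero note, as the slide suggests.

```python
def fill_rests(pv):
    """Replace rests (0) by the previous non-zero pitch; leading rests stay 0."""
    out, last = [], 0
    for p in pv:
        last = p if p > 0 else last
        out.append(last)
    return out


def resample(pv, length):
    """Nearest-neighbor resampling of a pitch vector to a new length."""
    return [pv[i * len(pv) // length] for i in range(length)]


def linear_scaling(query, reference, scales=(0.5, 0.75, 1.0, 1.25, 1.5, 2.0)):
    """Return (distance, best scale) of LS matching from the beginning of reference."""
    q = fill_rests(query)
    r_full = fill_rests(reference)
    best = (float("inf"), None)
    for s in scales:
        length = round(len(q) * s)
        if length < 1 or length > len(r_full):
            continue                      # scaled query would not fit in the reference
        stretched = resample(q, length)
        ref = r_full[:length]
        q_mean = sum(stretched) / length  # subtracting means removes key differences
        r_mean = sum(ref) / length
        dist = sum(abs((a - q_mean) - (b - r_mean)) for a, b in zip(stretched, ref)) / length
        if dist < best[0]:
            best = (dist, s)
    return best


reference = [60] * 8 + [62] * 8 + [64] * 8 + [65] * 8
query = [63] * 4 + [65] * 4 + [67] * 4 + [68] * 4   # twice as fast, 3 semitones higher
print(linear_scaling(query, reference))              # (0.0, 2.0)
```

Because the whole query is stretched uniformly, one pass per scale factor suffices, which is why LS is cheap but cannot follow non-uniform tempo changes.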


Dynamic Time Warping (DTW)

Goal: allow comparison with high tolerance to tempo variations

Characteristics: robust to irregular tempo variations; key transposition handled by trial and error; computationally expensive; does not satisfy the triangle inequality; some indexing algorithms do exist

#1 method for Task 2 in QBSH/MIREX 2006


Dynamic Time Warping: Type 1

t: input pitch vector (8 sec, 128 points); r: reference pitch vector; local paths: 27-45-63 degrees

DTW recurrence:

$$D(i,j) = |t(i) - r(j)| + \min\left\{ D(i-2, j-1),\ D(i-1, j-1),\ D(i-1, j-2) \right\}$$


Dynamic Time Warping: Type 2

t: input pitch vector (8 sec, 128 points); r: reference pitch vector; local paths: 0-45-90 degrees

DTW recurrence:

$$D(i,j) = |t(i) - r(j)| + \min\left\{ D(i-1, j),\ D(i-1, j-1),\ D(i, j-1) \right\}$$


Local Path Constraints

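Type 1: 27-45-63 degree local paths; D(i, j) is reached from D(i-2, j-1), D(i-1, j-1), or D(i-1, j-2):

$$D(i,j) = |t(i) - r(j)| + \min\left\{ D(i-2, j-1),\ D(i-1, j-1),\ D(i-1, j-2) \right\}$$

Type 2: 0-45-90 degree local paths; D(i, j) is reached from D(i-1, j), D(i-1, j-1), or D(i, j-1):

$$D(i,j) = |t(i) - r(j)| + \min\left\{ D(i-1, j),\ D(i-1, j-1),\ D(i, j-1) \right\}$$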
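Both local path constraints can be written as one DP with different predecessor sets. A minimal sketch, assuming both ends of the path are anchored (the first and last points of t and r are aligned) and that unreachable cells are marked with infinity; `dtw` is a hypothetical name.

```python
INF = float("inf")


def dtw(t, r, path_type=2):
    """DTW distance between query t and reference r with local cost |t(i) - r(j)|.

    path_type=1: predecessors (i-2, j-1), (i-1, j-1), (i-1, j-2)  (27-45-63 degrees)
    path_type=2: predecessors (i-1, j), (i-1, j-1), (i, j-1)      (0-45-90 degrees)
    """
    steps = [(2, 1), (1, 1), (1, 2)] if path_type == 1 else [(1, 0), (1, 1), (0, 1)]
    m, n = len(t), len(r)
    D = [[INF] * n for _ in range(m)]
    D[0][0] = abs(t[0] - r[0])
    for i in range(m):
        for j in range(n):
            if i == 0 and j == 0:
                continue
            prev = min((D[i - di][j - dj] for di, dj in steps
                        if i - di >= 0 and j - dj >= 0), default=INF)
            D[i][j] = abs(t[i] - r[j]) + prev
    return D[m - 1][n - 1]


t = [60, 61, 62, 63, 64]
r = [60, 64]
print(dtw(t, r, path_type=2))   # 4
print(dtw(t, r, path_type=1))   # inf: t is more than twice as long as r
```

Type 1 thus enforces a tempo ratio between 1/2 and 2 by construction, while type 2 allows arbitrary stretching in either direction.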


DTW Paths of “Match Beginning”

We assume that the tempo of the user’s acoustic input is between 1/2 and 2 times that of the intended song.

The right end of the path is free to move. Typical DTW table size: 128 × 180


DTW Paths of “Match Anywhere”

Both ends of the path are free to move.

Typical DTW table size: 128 × 2880
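A sketch of the two path configurations, using the type-1 local paths so that the 1/2 to 2 tempo assumption holds: for "match beginning" the path must start at the first reference point, for "match anywhere" it may start at any reference point, and in both cases the right end is free (the best entry of the last row is taken). The function name and toy data are hypothetical.

```python
INF = float("inf")


def dtw_free_end(t, r, anywhere=False):
    """Type-1 DTW with a free right end. Returns (distance, end index in r)."""
    m, n = len(t), len(r)
    steps = [(2, 1), (1, 1), (1, 2)]
    D = [[INF] * n for _ in range(m)]
    for j in range(n if anywhere else 1):   # allowed starting points in r
        D[0][j] = abs(t[0] - r[j])
    for i in range(1, m):
        for j in range(n):
            prev = min((D[i - di][j - dj] for di, dj in steps
                        if i - di >= 0 and j - dj >= 0), default=INF)
            D[i][j] = abs(t[i] - r[j]) + prev
    end = min(range(n), key=lambda j: D[m - 1][j])   # free right end
    return D[m - 1][end], end


r = [55, 57, 60, 62, 64, 65, 67]
t = [62, 64, 65]                              # a fragment from the middle of r
print(dtw_free_end(t, r, anywhere=False))     # (12, 4)
print(dtw_free_end(t, r, anywhere=True))      # (0, 5)
```

The "match anywhere" table must span the whole song rather than just the opening, which is why its typical size grows from 128 × 180 to 128 × 2880.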


DTW Path of “Match Beginning”


DTW Path of “Match Anywhere”


DTW Path of “Match Anywhere”


Key Transposition

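Goal: allow users to sing or hum in a key different from that of the song

Method 1: mean shift and heuristic modification, requiring 5 DTW computations per song:
Shift the query so that its mean matches the song’s mean (offset t), then compute DTW at offsets t - 2, t, and t + 2.
Take the best of these, t', and compute DTW at t' - 1 and t' + 1.
The minimum over the 5 computations is the distance to the song.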
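A sketch of Method 1 under the reading given above: shift by the difference of means, try t - 2, t, t + 2, then refine around the best offset t' with t' - 1 and t' + 1, for 5 DTW computations in total. A plain type-2 DTW with both ends anchored stands in for whichever DTW variant is actually used; all names and the toy melody are hypothetical.

```python
INF = float("inf")


def dtw2(t, r):
    """Type-2 DTW (0-45-90 degree local paths), both ends anchored."""
    m, n = len(t), len(r)
    D = [[INF] * (n + 1) for _ in range(m + 1)]   # padded table: row/col 0 are borders
    D[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = abs(t[i - 1] - r[j - 1]) + min(D[i - 1][j], D[i - 1][j - 1], D[i][j - 1])
    return D[m][n]


def transposed_dtw(query, reference):
    """Return (best distance, best key offset, number of DTW computations)."""
    scores = {}                                        # key offset -> DTW distance

    def score(offset):
        scores[offset] = dtw2([p + offset for p in query], reference)

    t = sum(reference) / len(reference) - sum(query) / len(query)   # mean shift
    for offset in (t - 2, t, t + 2):                   # coarse search around t
        score(offset)
    t_prime = min(scores, key=scores.get)
    for offset in (t_prime - 1, t_prime + 1):          # refine around t'
        score(offset)
    best = min(scores, key=scores.get)
    return scores[best], best, len(scores)


reference = [60, 62, 64, 65, 67]
query = [65, 65, 65, 65, 67, 69, 70, 72]   # 5 semitones higher, first note held longer
print(transposed_dtw(query, reference))
```

In this toy case the held first note biases the mean, so the mean shift alone lands about 1.35 semitones off; the heuristic search moves the offset to within 0.35 semitones of the true transposition.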


Type-3 DTW: Frame-to-Note Alignment

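DP-based method for filling the table, aligning a frame-level pitch vector t(i) with a sequence of notes r(j):

Recurrence formula:

$$D(i,j) = |t(i) - r(j)| + \min\left\{ D(i-1, j-1),\ D(i-1, j) \right\}$$

Local constraint: D(i, j) is reached from D(i-1, j-1) or D(i-1, j).

[Figure: frame-level pitch vector aligned with notes]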
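A sketch of type-3 DTW, assuming the query is a frame-level pitch vector, the reference is a list of note pitches (durations ignored), and both ends are anchored; each frame is assigned to exactly one note and the notes are visited in order. The function name and data are hypothetical.

```python
INF = float("inf")


def frame_to_note_dtw(frames, notes):
    """D(i, j) = |t(i) - r(j)| + min(D(i-1, j-1), D(i-1, j)); frames i, notes j."""
    m, n = len(frames), len(notes)
    D = [[INF] * n for _ in range(m)]
    D[0][0] = abs(frames[0] - notes[0])
    for i in range(1, m):
        for j in range(n):
            prev = D[i - 1][j]                      # stay on the same note
            if j > 0:
                prev = min(prev, D[i - 1][j - 1])   # move on to the next note
            D[i][j] = abs(frames[i] - notes[j]) + prev
    return D[m - 1][n - 1]


frames = [60, 60, 61, 62, 62, 64]   # frame-level pitch vector (semitones)
notes = [60, 62, 64]                # note pitches only; durations are not used
print(frame_to_note_dtw(frames, notes))   # 1
```

The table is only as wide as the number of notes rather than the number of reference frames, which is where the efficiency gain comes from.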


Type-3 DTW

Characteristics:
Frame-based query input vs. note-based music database
Note durations are not used
More efficient, but less effective
Heuristics for key transposition

Mapping path


RA (Recursive Alignment)

Characteristics: combines the characteristics of LS and DTW

#1 method for Task 1 in QBSH/MIREX 2006

A typical mapping path


Modified Edit Distance

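Note segmentation

Modified edit distance:

$$d_{i,j} = \min \begin{cases} d_{i-1,j} + w(a_i, \emptyset) & \text{(deletion)} \\ d_{i,j-1} + w(\emptyset, b_j) & \text{(insertion)} \\ d_{i-1,j-1} + w(a_i, b_j) & \text{(replacement)} \\ d_{i-k,j-1} + w(\{a_{i-k+1}, \ldots, a_i\}, b_j),\ k \ge 2 & \text{(consolidation)} \\ d_{i-1,j-k} + w(a_i, \{b_{j-k+1}, \ldots, b_j\}),\ k \ge 2 & \text{(fragmentation)} \end{cases}$$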
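A sketch of the modified edit distance above, with notes represented as (pitch, duration) pairs. The weight function w is not given on the slide, so the weights here are hypothetical placeholders: replacement costs the pitch difference plus the duration difference, insertion and deletion cost the note's duration, and consolidation/fragmentation compare one note against a group by duration-weighted pitch difference plus total-duration mismatch. `max_k` bounds the group size.

```python
def w_replace(a, b):
    """Hypothetical replacement weight: pitch difference plus duration difference."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def w_indel(x):
    """Hypothetical insertion/deletion weight: the note's duration."""
    return x[1]


def w_group(single, group):
    """Hypothetical weight of one note vs. several consecutive notes."""
    total = sum(g[1] for g in group)
    pitch_cost = sum(abs(single[0] - g[0]) * g[1] for g in group) / total
    return pitch_cost + abs(single[1] - total)


def modified_edit_distance(a, b, max_k=3):
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + w_indel(a[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + w_indel(b[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            ai, bj = a[i - 1], b[j - 1]
            best = min(d[i - 1][j] + w_indel(ai),             # deletion
                       d[i][j - 1] + w_indel(bj),             # insertion
                       d[i - 1][j - 1] + w_replace(ai, bj))   # replacement
            for k in range(2, max_k + 1):
                if i - k >= 0:   # consolidation: a_{i-k+1..i} vs. b_j
                    best = min(best, d[i - k][j - 1] + w_group(bj, a[i - k:i]))
                if j - k >= 0:   # fragmentation: a_i vs. b_{j-k+1..j}
                    best = min(best, d[i - 1][j - k] + w_group(ai, b[j - k:j]))
            d[i][j] = best
    return d[m][n]


long_note = [(60, 2)]               # one note held for 2 beats
split_note = [(60, 1), (60, 1)]     # the same pitch sung as two 1-beat notes
print(modified_edit_distance(long_note, split_note))            # 0.0
print(modified_edit_distance(long_note, split_note, max_k=1))   # 2.0
```

Fragmentation and consolidation let a long note match a run of repeated shorter notes (or vice versa) at no cost, a common discrepancy after note segmentation that plain edit distance would penalize.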


Challenges in QBSH Systems

Song database preparation: MIDI files, singing clips, or audio music

Reliable pitch tracking for acoustic input: input from mobile devices or from noisy karaoke bars

Efficient and effective retrieval: a karaoke machine holds ~10,000 songs; an Internet music search engine, ~500,000,000 songs


Goal and Approach

Goal: retrieve songs effectively within a given response time, e.g., about 5 seconds

Our strategy: multi-stage progressive filtering; indexing for different comparison methods; repeating-pattern identification
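A sketch of multi-stage progressive filtering: cheap distances prune the candidate list first, and more expensive ones rank the survivors. The two distance functions, the `keep` sizes, and the toy database are hypothetical; in a real system the later stages would be LS or DTW.

```python
def prefix_range_gap(q, c):
    """Cheap stage: difference between the pitch ranges of q and c's prefix."""
    c = c[:len(q)]
    return abs((max(q) - min(q)) - (max(c) - min(c)))


def centered_l1(q, c):
    """Costlier stage: mean absolute difference after removing each mean (key-invariant)."""
    c = c[:len(q)]
    q_mean, c_mean = sum(q) / len(q), sum(c) / len(c)
    return sum(abs((a - q_mean) - (b - c_mean)) for a, b in zip(q, c)) / len(q)


def progressive_filter(query, database, stages):
    """database: name -> pitch vector; stages: [(distance_fn, number_to_keep), ...]."""
    survivors = list(database)
    for distance, keep in stages:
        survivors = sorted(survivors, key=lambda name: distance(query, database[name]))[:keep]
    return survivors


database = {"a": [60, 62, 64, 65, 67], "b": [60, 60, 60, 61, 61],
            "c": [50, 70, 50, 70, 50], "d": [62, 64, 66, 67, 69]}
query = [65, 67, 69, 70]          # start of "a" (and "d") in a different key
print(progressive_filter(query, database, [(prefix_range_gap, 3), (centered_l1, 2)]))
# ['a', 'd']
```

The overall cost is dominated by the expensive stage applied to the few survivors, which is how the response-time goal can be met on a large database.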


Demo: MIRACLE

MIRACLE: Music Information Retrieval Acoustically via CLuster Engines

Demo page of MIR Lab: http://mirlab.org/new/mir_products.asp

MIRACLE demo: http://cuda.mirlab.org


Internet Music Search Engine

Client-server distributed computing
Cloud computing via clustered PCs and GPUs

[Figure: clients (PC, PDA, cellular phone) send a request (pitch vector) to the master server, which works with the clustered slave servers and returns a response (search result)]
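A single-process sketch of the master/slave search: the master sends the query pitch vector to every slave, each slave scores its own shard of the database and returns its local top-k, and the master merges them. Threads stand in for networked slave servers here, and the shard layout, `top_k`, and the distance function are hypothetical.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def centered_l1(q, c):
    """Key-invariant mean absolute difference against the prefix of c."""
    c = c[:len(q)]
    q_mean, c_mean = sum(q) / len(q), sum(c) / len(c)
    return sum(abs((a - q_mean) - (b - c_mean)) for a, b in zip(q, c)) / len(q)


def slave_search(shard, query, top_k):
    """Runs on one slave: local top-k (distance, song) pairs from its shard."""
    return heapq.nsmallest(top_k, ((centered_l1(query, pv), name) for name, pv in shard.items()))


def master_search(shards, query, top_k=2):
    """Runs on the master: fan the request out to all slaves, then merge the replies."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        replies = list(pool.map(lambda shard: slave_search(shard, query, top_k), shards))
    merged = heapq.nsmallest(top_k, (item for reply in replies for item in reply))
    return [name for _, name in merged]


shards = [{"a": [60, 62, 64, 65], "b": [60, 60, 60, 61]},
          {"c": [50, 70, 50, 70], "d": [62, 64, 66, 67]}]
print(master_search(shards, [65, 67, 69, 70]))   # ['a', 'd']
```

Merging local top-k lists is exact: any song in the global top-k must also be in its own shard's top-k, so the master never needs the full score lists.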


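Challenge 1: Collecting the Music Database

Music files collected from the Internet: MIDI files

For accuracy, the track containing the main melody must be identified manually; automatic identification is correct about 85% of the time.

MIDI file formats are complex and inconsistent. MIDI main melodies are not clean (they contain introductions, overlapping notes, variations, etc.).

MP3 files: for pop music, extracting the pitch of the vocals is extremely difficult; according to the ISMIR 2011 competition results, the best pitch accuracy was 84%.

Symphonic music: there may be no main melody at all.

Manual annotation: to support text search, information such as singer, lyrics, and genre must be added.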


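Challenge 2: Speeding Up Comparison

Factors affecting comparison speed (with representative values): length of the humming input, 8 seconds (128 pitch points); database size, about 13,000 songs; comparison method, LS+DTW; CPU, Pentium 2 GHz (relatively unaffected by memory size); comparison position:

From the beginning: about 2 seconds
From the middle:

• At the start of the refrain
• At the start of each note: about 45 seconds
• Anywhere: about 60 seconds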


Response Time of MIRACLE

8-sec recording of “小毛驢”, compared from the beginning: LS: 0.4 sec; DTW: 3.5 sec; LS+DTW: 0.6 sec

8-sec recording of the refrain of “夢醒時分”, compared from anywhere: LS: 40 sec; DTW: IIS time-out; LS+DTW: 45 sec; NBDTW: IIS time-out


Could It Be More Efficient?

Algorithms: indexing of LS/DTW; progressive filtering

New platforms: GPU (66 times faster for QBSH!); grid/clustered computing; multi-core platforms


Commercial Applications

www.midomo.com
www.soundhound.com
www.shazam.com


Conclusions

QBSH: a fun and interesting way to retrieve music; can be extended to singing scoring; commercial applications are maturing

Challenges: How to deal with massive music databases? How to extract the melody from audio music?