
[IEEE 2011 Fourth International Conference on the Applications of Digital Information and Web Technologies (ICADIWT 2011) - Stevens Point, WI, USA (2011.08.4-2011.08.6)] Fourth International


Open Issues on Query by Humming

Nauman Ali Khan*, Mubashar Mushtaq†

*† Dept. of Computer Science, Quaid-I-Azam University, Islamabad

Email: *[email protected]@qau.edu.pk

Abstract-Songs are considered a major source of entertainment for people of all age groups. With the widespread growth of the World Wide Web, various resources are available and accessible. The high availability of audio music content brings significant problems, the retrieval of relevant songs being the foremost. Searching audio files on the basis of their content is the most effective way, especially when the supporting information (metadata) of the file is missing or incomplete.

In this paper, we discuss the well-known content based searching technique, Query by Humming (QbH), along with the other existing techniques in the domain. We highlight certain open issues and key challenges that need to be addressed by the research community for the advancement of the domain. The discussion is supported by surveys conducted to study the importance of these issues for relevant song retrieval.

Index Terms-Query by humming

I. INTRODUCTION

We have witnessed exponential growth in Internet usage during the last decade. More and more multimedia content is available that is of interest to different people. Searching for the most relevant information in the huge repository of the Internet has always been a tedious job for end users. The situation becomes worse when no explicit metadata is available. In this regard, achieving 100% relevancy while searching is considered a great challenge in the field of information retrieval. Many researchers in both academia and industry are working to provide efficient searching mechanisms, and many searching techniques have been developed and adopted.

Users always want the most relevant file at the top of the search results. We need efficient techniques to make the searching process more precise and accurate. High relevancy can be achieved in two ways: either the user explicitly provides as much information as possible while querying, or as much information as possible is gathered implicitly from the user [1]. The latter enhances the querying experience of the end user by not requiring him to provide more and more information. It also helps the searching mechanism when the user does not know clear metadata in advance.

Due to the rapid growth of multimedia content, music and song repositories have also grown massively. Different techniques are used to organize and manage these repositories. Special issues arise when searching over different kinds of multimedia content. In this paper, our key focus is song searching. In general, songs are categorized by genre, although searching for a specific song becomes easy if all the information about the song is known.

Generally, two sets of techniques are commonly used for audio retrieval:

• Metadata Searching
• Content Based Searching

978-1-4244-9825-3/11/$26.00 ©2011 IEEE

A. Metadata Searching

Metadata based searching is a vast domain within Information Retrieval. In the field of audio searching, separate techniques and algorithms have been adopted for different categories. Metadata based searching can be further classified into two categories: Browsing and Keyword Search.

1) Browsing: The browsing technique is mostly used when there is a high probability that a user wants to explore songs, or when the search is undirected, i.e. the user is not sure what he wants to search for. For example, YouTube [2] has a large repository containing millions of videos. In such a case a user generally browses videos, and the videos are categorized on the basis of different criteria such as upload date, number of views, relevancy and other attributes associated with the video. Similar techniques are applied to audio searching in services like Napster [3].

Many different techniques have been adopted for browsing. One of them is faceted search, which is widely used by iTunes [4]. When a user is interested in exploring songs with respect to album, title, year of release, artist etc., facet based browsing procedures are applied.

2) Keyword Search: Keyword search is a very commonly used technique in various domains of searching. A user is required to enter the most relevant keyword associated with the content, and the search is performed on the basis of this keyword. Many researchers have proposed different keyword based searching techniques. The most commonly known ones include Indexing, the Vector Space Model, the Boolean Model, the Cosine Measure and String Matching functions [5].

Keyword based searching is also very common in the field of audio searching. Here the retrieval mechanism relies entirely on an exact keyword query. A keyword search engine uses a string similarity function that returns the percentage of matching. The more exactly the query keyword matches the related song, the better the results. The same holds when more information is given during querying, such as year of release, genre and title; this sort of query is handled in a keyword search engine by the Advanced Search option.

An example of the Advanced Search option is when "a user is interested in finding a song of 'Michael Jackson' and also wants to set the genre as 'POP'." Advanced search provides the facility to give both parameters separately. This sort of searching falls in the category of metadata based searching. Content based searching techniques are also applied to song retrieval; they have some special issues of their own, which are described in the coming sections.
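The two ideas above, a string similarity function that returns a matching percentage and an advanced search over separate metadata fields, can be sketched as follows. This is only an illustration: the function names, the record layout and the tiny catalogue are assumptions, and Python's standard `difflib.SequenceMatcher` stands in for whatever similarity function a real search engine uses.

```python
from difflib import SequenceMatcher


def match_percentage(query, title):
    """Return a 0-100 similarity score between a query and a song title."""
    ratio = SequenceMatcher(None, query.lower(), title.lower()).ratio()
    return 100.0 * ratio


def advanced_search(songs, artist=None, genre=None):
    """Filter song records on metadata fields that are supplied separately."""
    results = []
    for song in songs:
        if artist and song["artist"].lower() != artist.lower():
            continue
        if genre and song["genre"].lower() != genre.lower():
            continue
        results.append(song)
    return results


# Hypothetical catalogue, used only for illustration.
songs = [
    {"title": "Song A", "artist": "Michael Jackson", "genre": "Pop"},
    {"title": "Song B", "artist": "Michael Jackson", "genre": "Rock"},
    {"title": "Song C", "artist": "Another Artist", "genre": "Pop"},
]
hits = advanced_search(songs, artist="Michael Jackson", genre="POP")
```

Both parameters are matched independently, which mirrors the 'Michael Jackson' plus 'POP' example in the text.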

B. Content Based Searching Technique

The content based searching technique helps us search inside the contents of files. The contents of a file play a vital role in describing what is actually inside it. Content based search was introduced in order to extract the exact meaning of the file: by simply looking at the metadata, one cannot fully judge what the file actually describes. For song retrieval the same technique was applied with a variation.


[Figure 1: tree diagram of QbH techniques with four branches (String Matching, Tree Based Searching, Dynamic Time Warping, Hidden Markov Model (HMM)), each listing the authors who adopted it.]

Fig. 1. Query By Humming Techniques adopted by Authors

Query by humming [6] is one of the well known content based searching techniques and is commonly used for song and audio file retrieval. The motivation behind query by humming is the case where a user does not know the lyrics of a song and wants to retrieve it by humming. The user's humming acts as the query for the search. Different alignment algorithms have also been used to resolve issues in query by humming.

In our study, we carried out an extensive literature survey of the field of query by humming. We describe the commonly used query by humming techniques and elaborate on their details and applications. At the end, we highlight some of the demanding challenges and issues related to query by humming that remain to be resolved.

The rest of the paper is structured as follows: Section II covers the background of Content Based Searching and Query by Humming. Section III describes the open issues and challenges related to Query by Humming. Lastly, Section IV concludes the paper with future directions.

II. LITERATURE REVIEW

Content based retrieval of multimedia is more effective when the metadata or related information is scarce or not good enough to explain the contents. For example, a song is commonly searched for by its title. However, it may happen that someone does not know the exact title of the song but remembers a small portion of the song that is not part of the title. In this special case content based searching gives better results [7] than metadata based searching. Browsing and searching were considered the two main aspects of content searching [7]: searching means looking for a specific item in the multimedia collection, while browsing means exploring in an undirected way.

In the category of content based searching for song retrieval, query by humming is one of the ways of searching for a song. In this technique, features are first extracted from the melody of the music, and then the similarity is calculated. Generally the system is viewed as a four phase procedure. The first phase is the extraction phase, in which features of the music tune are extracted from the whole audio file. Features are the values of the samples taken at equal intervals, for example pitch or amplitude values. The second phase is the filtration phase, in which noise is filtered from the extracted features; noise filtration is the removal of those sample values that fall below the noise threshold. The third phase is the storing phase, in which the features of the music tunes are stored in the database. The fourth phase is the matching phase, in which the stored features are compared with the features of the hummed voice. Each song has its own melody and tune, which differentiates it from other songs. The comparison of features is commonly carried out by matching the pitch of each file.
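The four phases described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation of any surveyed system: the function names are hypothetical, the features are plain pitch values in Hz, the database is an in-memory dictionary, and matching uses a simple mean absolute pitch difference.

```python
def extract_features(samples, step):
    """Phase 1: keep one sample value per fixed interval (e.g. pitch in Hz)."""
    return samples[::step]


def filter_noise(features, noise_threshold):
    """Phase 2: drop values that fall below the noise threshold."""
    return [value for value in features if value >= noise_threshold]


def store(database, song_id, features):
    """Phase 3: store the features of a tune in the (in-memory) database."""
    database[song_id] = features


def distance(a, b):
    """Mean absolute pitch difference over the overlapping part of a and b."""
    n = min(len(a), len(b))
    if n == 0:
        return float("inf")
    return sum(abs(x - y) for x, y in zip(a, b)) / n


def match(database, query_features):
    """Phase 4: rank stored songs, closest pitch sequence first."""
    return sorted(database, key=lambda sid: distance(database[sid], query_features))


# Hypothetical data: every second sample is low-level noise.
db = {}
raw = [220, 5, 220, 5, 247, 5, 262, 5]
store(db, "song1", filter_noise(extract_features(raw, 2), 50))
store(db, "song2", [330, 349, 392, 440])
ranking = match(db, [221, 219, 246, 260])
```

In this toy run the hummed query is closest to `song1`, so it is ranked first.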

Humming a song is easier than remembering its exact lyrics [8]. This technique was introduced with the natural human behavior of remembering a particular song in mind. Whenever a person goes to a music shop and searches for a particular song, the shopkeeper asks questions in the following order.

• What is the song title?
• What is the album name?
• Do you remember the Artist name?
• Can you sing a song?
• Can you whistle or hum the song?

This means that the human mind first performs a metadata based search; if that is not possible, a content based search is performed in the form of humming, which is considered the last option for finding a song. This scenario was the motivation behind the proposed query by humming technique. Query by humming is further subdivided into two generic categories: the melodic representation method and the melodic transcription method. In the melodic representation method, each sample value is gathered and then normalized, while in the melodic transcription method the melody of the song is written in traditional music notation, e.g. "G D7 G G D7 G C G D7 G C G" represents the rhyme "Humpty Dumpty sat on a wall". A large portion of the work has been done on melodic representation. We have organized the related work on query by humming by technique.

In general, the field of Query by Humming can be divided into four main categories with respect to the technique used. Figure 1 gives a detailed overview of the field, showing the techniques applied to it and which author used which technique.

A. String Matching

String matching is the earliest technique commonly used in the field of query by humming. In this technique the melodic representation is first extracted from the song, then converted to an audio signal, and finally assembled into its corresponding contour representation. Many QbH systems adopted the same conversion technique. For pitch tracking, an autocorrelation procedure was introduced by Ghias et al. [6]. Three symbols (U, D and S) were used to depict the three different levels of the pitch contour: U for Up, D for Down and S for Same, because in any time series graph there are only three possibilities: the graph can grow, shrink or remain the same. McNab et al. [9] added the extra feature of duration to the three level contour representation. The duration feature increases relevancy by providing extra detail about the contents of an audio file. By duration McNab et al. [9] mean how long any of the three symbols (U, D and S) remains the same. Liu, T. et al. proposed an innovative way to compare the hummed query with the song database directly, without converting it to the corresponding MIDI file. The Harmonic Product Spectrum (HPS) method was used for extracting the pitch feature values; it accurately extracts the feature values in the form of (U, D and S). Similarities between the hummed query and the songs already in the database were evaluated by string matching techniques. The relevant issues and factors were also pointed out in this research, along with the finding that results are better for long queries and templates [10].
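The contour idea above can be illustrated with a short sketch. It assumes the pitch sequence has already been extracted (for example as MIDI note numbers), uses plain Levenshtein edit distance as one possible string matching function, and reads the duration feature as the run length of each symbol; the function names and the `tolerance` parameter are illustrative assumptions, not taken from the cited systems.

```python
from itertools import groupby


def contour(pitches, tolerance=0.0):
    """Map consecutive pitch changes to U (up), D (down) or S (same)."""
    symbols = []
    for prev, cur in zip(pitches, pitches[1:]):
        if cur - prev > tolerance:
            symbols.append("U")
        elif prev - cur > tolerance:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)


def with_durations(contour_string):
    """Attach to each symbol the number of steps it stays the same."""
    return [(symbol, len(list(run))) for symbol, run in groupby(contour_string)]


def edit_distance(a, b):
    """Levenshtein distance between two contour strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1]


song = contour([60, 62, 62, 59, 64])  # "USDU"
hum = contour([57, 59, 60, 62, 65])   # "UUUU"
print(song, hum, edit_distance(song, hum))
```

A smaller edit distance between the hummed contour and a stored contour indicates a closer melodic match.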

B. Tree Based Search Technique

Roland et al. and Blackburn et al. further improved McNab's work by introducing Tree Based Searching. They enhanced the efficiency of the search and also refined the relevancy of the results. Roland et al. [11] and Blackburn et al. implemented the open hypermedia model for multimedia documents. Their system stores the contour representation in a tree structure and gathers results by applying a brute force mechanism; the authors also provide details of the tool design and algorithms.
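One way to picture contour strings stored in a tree is a trie over the symbols U, D and S. The sketch below is an assumption made for illustration only and is not the design of Roland et al. or Blackburn et al.: the class name, the `max_depth` limit and the choice to index every suffix (so that a hummed fragment can match anywhere in a song) are all hypothetical.

```python
class ContourTrie:
    """Trie node over contour symbols; each node records the songs passing through it."""

    def __init__(self):
        self.children = {}
        self.song_ids = set()

    def insert(self, contour, song_id, max_depth=8):
        # Index every suffix up to max_depth symbols so fragments match mid-song.
        for start in range(len(contour)):
            node = self
            for symbol in contour[start:start + max_depth]:
                node = node.children.setdefault(symbol, ContourTrie())
                node.song_ids.add(song_id)

    def lookup(self, fragment):
        """Return the ids of songs whose contour contains the fragment."""
        node = self
        for symbol in fragment:
            node = node.children.get(symbol)
            if node is None:
                return set()
        return set(node.song_ids)


trie = ContourTrie()
trie.insert("UUDSU", "a")
trie.insert("DDSUU", "b")
print(trie.lookup("DSU"))  # both songs contain this fragment
```

Fragments longer than `max_depth` are not found by this sketch; a real system would have to choose that limit, or another indexing scheme, deliberately.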

C. Dynamic Time Warping Technique

Dynamic time warping (DTW) is a well known technique used in different domains. It is applied to two time series, and its distinguishing feature is that it ignores speed. For example, if we apply it to human body movement, it disregards whether the person is running or walking slowly; the algorithm just compares the movement. In audio comparison, the DTW algorithm ignores the speed of the song and just matches the pattern.

Jang et al. introduced a query-by-singing system called CBMR (Content Based Music Retrieval), which takes a human's audio as the input query and provides content based retrieval from a song database. Jang et al. were the first to adopt the dynamic time warping algorithm in the field of query by humming. The system works by converting the human voice into a pitch vector. In the preprocessing stage, each song is read and stored in different indexed files. Then, two levels of comparison are made between the pitch vector and each song stored in the database to find the similarity between them. Dynamic programming concepts are applied in the comparison procedure. The authors also claimed that their results were adequate for typical people with average singing ability.

Kosdugi Hu et al. and Zu used different similarity measures in the evaluation phase to find the similarity between the user's hummed sound and each song in the database. Based on those similarities, songs were ranked and presented to the users. They focused on improving retrieval so that the system provides only the accurate songs to the user instead of a long list of similar songs. They interpreted the beats of the music, and feature vectors were made by converting the user's hummed voice into relative intervals. Indices are used for the retrieval process, which shows a significant result with a 20% improvement. In this system, the retrieval time is one second over a large database, e.g. one consisting of 10,069 songs. To attain this improvement, different factors are considered, such as partial tone feature vectors, retrieval with more than one search key and the results of music analysis of the songs. Graphics and user interfaces are also discussed. Moreover, noise deletion methods were also adopted in this paper.

Y. and Kankanhalli MS et al. applied a dynamic programming algorithm to implement a query by humming system. Point sequences were used to represent both the music melody and the hummed query [12]. The term melody skeleton was introduced for the extreme points in the time series graph, and the major task of alignment was performed on the melody skeleton. The main reason for adopting the dynamic programming algorithm was to skip useless points. Since only extreme points were selected, the algorithm performed more efficiently. Performance was tested through different experiments. Nam, G.P. et al. used G.729 [13] to extract features from the hummed query. Dynamic time warping algorithms were used for matching. For the evaluation of their results, different statistical measures were used, such as median and average filtering, shifting, and minimum and maximum scaling [13].
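The speed-ignoring property of dynamic time warping described in this section can be seen in a minimal textbook implementation. The sketch below is the classic dynamic programming formulation with an absolute-difference local cost; it is an assumed illustration and not the specific variant used by Jang et al. or the other cited authors.

```python
def dtw_distance(a, b):
    """Classic DTW cost between two pitch sequences (absolute-difference cost)."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


melody = [60, 62, 64, 62]
slow_hum = [60, 60, 62, 62, 64, 64, 62, 62]  # same tune at half speed
off_hum = [60, 61, 64, 62]                   # one note a semitone off

print(dtw_distance(melody, slow_hum))  # 0.0: tempo difference is absorbed
print(dtw_distance(melody, off_hum))   # 1.0: the pitch error is not
```

The half-speed hum aligns with zero cost because each melody note may be matched to several query samples, while a wrong note still contributes to the distance.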

D. Hidden Markov Model (HMM)

A Hidden Markov Model (HMM) is basically a statistical model that checks all possible options when comparing. Hsuan-Huei Shih et al. adopted the HMM statistical approach for query by humming [14]. The proposed system contains two modules. First, two HMM models were designed to segment the notes into signed signals, and these HMM models were trained with Mel-Frequency Cepstral Coefficient (MFCC) features. MFCC features are used in voice recognition. Both the "regular note" and the "rest" were represented by MFCC features. During evaluation it was assumed that the signal only includes sounds. Second, the tone of every note was measured by a Gaussian Mixture Model (GMM) pitch model. The GMM is one of the most stable statistical measures used for clustering. A set of 8 people was selected for the creation of the data set, and about 80% correct results were obtained [14]. Unsupervised learning plays a vital role in this architecture in achieving better results.

Lu, L. and Seide, F. used query by humming for mobile ring tones [15]. The proposed approach first presents a front end that is robust to phone recording, taking into account the distortion due to wireless communication and the GSM codec. A statistical approach was adopted by applying probabilistic modeling, and a systematic matching procedure was designed to prune, align, decode and rescore. A dataset of 3000 songs was selected for the experimental setup, and accuracies of 83% and 85% were reported for the proposed technique on this dataset. Linear searching mostly performs badly when applied to a very large dataset. The authors suggest that if machine learning techniques with appropriate training are applied to the proposed technique, the required results may be achieved with better efficiency [16] [17].

[Figure 2: tree diagram of music genres and sub-genres; legible labels include Album-Oriented Rock, Math Rock, Surf Rock, Piano Rock, Electronica, Electrodash and Metalcore.]

Fig. 2. Classification of Music Genres

Issues related to the automatic conversion of an audio hummed query to its respective symbolic notes were explained in [18] for a query by humming based search system. Hummed notes were extracted with the help of speech recognizers. Another task of the speech recognizer was to purify the pitch and energy level. For evaluation, precision, recall and F-measure were considered, and 200 human recorded hummed voices were compared. For a 75 ms time period an F-measure of 0.84 was observed. The system achieved a total accuracy of 76%. The output results were further utilized for the performance analysis. Here, accuracy means that the exact audio file is listed at the top of the retrieved results. A total of 83% retrievals were seen for the same data sets when the system was applied to manual transcriptions. The removal of segmentation errors was considered the key factor for the improvement of results.
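For readers unfamiliar with the evaluation measures named in the paragraph on [18], precision, recall and F-measure can be computed as in the sketch below. The counts in the example are made up for illustration and are not the data behind the reported F-measure of 0.84; the function name is also an assumption.

```python
def precision_recall_f(true_positive, false_positive, false_negative):
    """Return (precision, recall, F-measure) from raw counts."""
    predicted = true_positive + false_positive
    actual = true_positive + false_negative
    precision = true_positive / predicted if predicted else 0.0
    recall = true_positive / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts: 8 notes detected correctly, 2 spurious, 2 missed.
p, r, f = precision_recall_f(8, 2, 2)
print(p, r, f)
```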

In the field of Music Information Retrieval (MIR), Query by Humming is known to be a better searching technique, especially when applied to huge song repositories. Query by Humming is an emerging field of research. Four main techniques are described in this paper, i.e. String Matching, Tree Based Searching, Dynamic Time Warping and Hidden Markov Model. We explained the experimental setups and data sets for each technique.

E. Music Genre

In the field of music, different genres act as classes. A genre differentiates songs on the basis of tune or melody. Each genre has its own pattern and melody, e.g. Folk, Pop, Rock, Country etc. Commonly the genre is selected while recording the song, mostly by the singer himself. However, automatic genre selection techniques have also been developed [19].

Figure 2 shows a bird's eye view of genres. Genres are divided into nine categories: Blues, Rock, Punk, Folk, Pop, Hip-Hop, Metal, Electro and Country. These nine categories have totally different patterns [19].


III. ISSUES IN QUERY BY HUMMING

The diverse features of Query by Humming systems open many challenging issues. Even though many solutions exist for Query by Humming, it is still an active research area. Various problems and challenges are still pending. In this section, we describe some issues that have not received much focus and need significant attention.

We conducted small surveys with many variations and found some special issues regarding query formulation. We justify each problem with its corresponding accuracy results. For the evaluation we chose different users from different regions of Pakistan with different mother tongues, ages, genders and qualifications.

A. Different language people

The diverse growth of multimedia items and the release of songs in different languages cause special issues for song retrieval. Among the vast group of computer users, song listeners form a dominant population. These listeners belong to different countries with different languages, and the users of each language have separate dialects.

If a user retrieves a song in his mother tongue, the accuracy of the results will be higher than for a user of a different language and dialect.

For our experiment we took groups of users speaking two well-known regional languages of Pakistan, i.e. Pashto and Punjabi. Figure 3 shows a comparison of these two groups of users. Both sets of users were first asked to sing a Pashto song. It is shown that the Pashto speaking users hum more accurately than the Punjabi users, which is why they achieve higher accuracy in their results.

[Figure 3: bar chart "Pashto Song Accuracy"; y-axis: accuracy (0 to 100); x-axis: Punjabi and Pashto users of different ages (PH1/PU1: 11 to 15 yr, PH2/PU2: 16 to 20 yr, PH3/PU3: 21 to 25 yr, PH4/PU4: 26 to 30 yr); legend: Pashto User, Punjabi User.]

Fig. 3. Accuracy comparison of users humming for Pashto Language Song

The same is the case when both sets of users are asked to hum a Punjabi language song. Figure 4 shows higher accuracy for the Punjabi users than for the Pashto users.

[Figure 4: bar chart "Punjabi Song Accuracy"; y-axis: accuracy (10 to 90); x-axis: Punjabi and Pashto users of different ages (PH1/PU1: 11 to 15 yr, PH2/PU2: 16 to 20 yr, PH3/PU3: 21 to 25 yr, PH4/PU4: 26 to 30 yr); legend: Pashto User, Punjabi User.]

Fig. 4. Accuracy comparison of users humming for Punjabi Language Song

We conclude that if two users with different mother tongues and dialects are asked to hum or sing the same song, they will produce different contour representations. Some query reformulation mechanism needs to be devised so that the user query can be converted between dialects before the search is performed.

B. High note Vs Low note Songs

In general, songs are recorded in different notes [20]; in the field of music, a high note simply means high intensity of amplitude. It is difficult for any music listener to produce the same high note hum in order to retrieve a high note song. We organized a small survey to justify the importance of this problem. We took a set of three professional singers and gave them the same two songs in different notes to listen to and then hum. One of the songs was in a high note and the other in a low note. The professional singers first recorded hums for both songs in a low note and then in a high note. Figure 5 presents the low note song accuracy for both queries; it can be seen that the low note query gives more accurate results than the high note query.

Likewise Figure 6 represents the High Note Song accuracy for both query and high note query results are more accurate than to low note.

The loudness of a song should therefore not be neglected during search, for example when the same song has been recorded by two singers at two different amplitudes, i.e. in a low and a high note. Some reformulation needs to be applied to the search query so that the user's exact need can be met.
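One way such a loudness parameter could be attached to a query is sketched below. This is only an illustration under stated assumptions: the function names `rms_dbfs` and `loudness_label`, the -20 dB threshold and the synthetic test tones are hypothetical and are not part of the survey reported here.

```python
import math

def rms_dbfs(samples):
    """RMS level in dB relative to full scale, for float samples in [-1, 1]."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # pure silence has no finite level
    # 10*log10(mean square) equals 20*log10(RMS amplitude).
    return 10.0 * math.log10(mean_square)

def loudness_label(samples, threshold_db=-20.0):
    """Label a query 'high' or 'low'; the threshold is an assumed value."""
    return "high" if rms_dbfs(samples) >= threshold_db else "low"

# Synthetic stand-ins for a soft and a loud hum of the same 220 Hz tone
# (8 kHz sampling rate, 0.1 s); real queries would come from a microphone.
quiet = [0.02 * math.sin(2 * math.pi * 220 * n / 8000) for n in range(800)]
loud = [0.8 * math.sin(2 * math.pi * 220 * n / 8000) for n in range(800)]

print(loudness_label(quiet), loudness_label(loud))
```

The resulting label could then be appended to the search query as one additional parameter, alongside the melodic content.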


[Figure 5: bar chart titled "Low Note Song Accuracy". Horizontal axis: low and high note queries (LNQ1, HNQ1, LNQ2, HNQ2, LNQ3, HNQ3). Legend: LNQ = low note query, HNQ = high note query.]

Fig. 5. Accuracy comparison of users humming a low note song

[Figure 6: bar chart titled "High Note Song Accuracy". Horizontal axis: low and high note queries (LNQ1, HNQ1, LNQ2, HNQ2, LNQ3, HNQ3). Legend: LNQ = low note query, HNQ = high note query.]

Fig. 6. Accuracy comparison of users humming a high note song

C. Fast tempo Vs Slow tempo voice

In music, famous songs are often re-recorded as a second version, usually called a remix. The lyrics typically remain the same, but the tempo or the composition of the song is changed. By tempo we mean the speed at which a song or humming query is recorded. The most common change in a remix is an increase in tempo. A listener who wants to find the remix of a particular song faces a problem, because he cannot hum or sing at the tempo of the remix. Query reformulation techniques are needed so that such user queries return accurate results. One possible solution is to let the user explicitly change the speed of the recorded query in the search interface and listen to the result before searching. Explicit adjustment of the tempo can thus help query reformulation.
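The explicit tempo adjustment described above can be illustrated with a minimal sketch. It assumes that the hummed query has already been transcribed into a list of (pitch, duration) pairs; this representation, the function names and the factor of 1.25 are assumptions made for illustration only.

```python
def change_tempo(notes, factor):
    """Return notes with durations divided by factor; factor > 1 plays faster.

    `notes` is a list of (pitch, duration_in_seconds) pairs. Pitches are
    left untouched, so only the speed of the query changes.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return [(pitch, duration / factor) for pitch, duration in notes]

def total_duration(notes):
    """Total length of the query in seconds."""
    return sum(duration for _, duration in notes)

# Hypothetical transcribed query: MIDI note numbers with durations in seconds.
query = [(60, 0.5), (62, 0.5), (64, 1.0), (62, 0.5)]

# The user speeds the query up by 25% to approximate a faster remix.
remix_query = change_tempo(query, 1.25)
print(total_duration(query), total_duration(remix_query))
```

Because only the durations are scaled, the melodic contour of the query is preserved while its speed moves closer to that of the remix.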

D. Variant Pitches

Songs are recorded by male and female singers, separately as well as together. Female voices generally have a higher pitch than male voices. In our evaluation, when a song sung by a female singer was given to a female user to retrieve through humming or singing, the results were much more accurate than for a male user. This illustrates the problem a user faces when retrieving a song sung by the opposite gender. Figure 6 represents the accuracy for female voice songs for both male and female user queries; the female user queries give more accurate results than the male queries. Likewise, we evaluated male voice songs and observed that male users retrieve songs sung by male singers more accurately. Figure 5 shows the accuracy for male voice songs for both male and female queries.

From these accuracy results we conclude that some functionality is needed so that a song can easily be converted to either gender's pitch range, given some threshold value. The interface should also let the end user take part in the conversion: e.g. a male user who records an audio query and wants to adjust its pitch himself needs interface controls that let him do so before the query is sent to the search engine.
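A pitch control of this kind could, for instance, transpose the extracted pitch track of the query by a user-chosen number of semitones. The sketch below assumes the query is available as a list of fundamental frequencies in Hz; the function name `transpose_hz` and the shift of 12 semitones (one octave) are illustrative assumptions, not values measured in our evaluation.

```python
def transpose_hz(frequencies, semitones):
    """Shift every frequency by the given number of equal-tempered semitones.

    Twelve semitones correspond to one octave, i.e. a doubling of frequency.
    A negative value shifts the query downwards.
    """
    ratio = 2.0 ** (semitones / 12.0)
    return [f * ratio for f in frequencies]

# Hypothetical pitch track of a low-voiced query, in Hz.
low_query = [110.0, 123.47, 130.81, 123.47]

# The user raises the query by one octave before sending it to the search engine.
raised_query = transpose_hz(low_query, 12)
print(raised_query)
```

Since every frequency is multiplied by the same ratio, the intervals between successive notes are unchanged and only the register of the query moves.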


E. Professional Singer Vs Novice user

Professional singers can hum more accurately, and their results are better than those of ordinary non-professional users [21]. Since an ordinary user is not a professional singer, it is not easy for him to sing or hum a song precisely. The contour representation obtained from a professional singer is more refined than that of an ordinary user's hummed query. This limitation of ordinary users should be overcome. Rules need to be developed to extract the meaningful content from queries sung by novice users; that is, ordinary user queries need additional processing.
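As one example of such a rule, small pitch errors in a novice query can be absorbed when the contour is built. The sketch below uses the up/down/same (U/D/S) contour alphabet of Ghias et al. [6] and adds a tolerance below which a pitch change is treated as "same". The tolerance of half a semitone and the example pitch values are assumptions chosen for illustration.

```python
def contour(pitches, tolerance=0.5):
    """Convert a pitch sequence (in semitones) into a U/D/S contour string.

    A step larger than `tolerance` is Up, smaller than -`tolerance` is Down,
    and anything in between counts as Same. The default tolerance is an
    assumed value, not one derived from the survey.
    """
    symbols = []
    for prev, cur in zip(pitches, pitches[1:]):
        step = cur - prev
        if step > tolerance:
            symbols.append("U")
        elif step < -tolerance:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)

# Reference melody and a hypothetical, slightly off-pitch novice rendition.
reference = [60, 60, 62, 64, 62]
novice = [59.8, 60.1, 61.7, 64.2, 62.3]

print(contour(reference))              # contour of the reference melody
print(contour(novice))                 # novice contour with tolerance
print(contour(novice, tolerance=0.0))  # novice contour without tolerance
```

With the tolerance, the novice query yields the same contour as the reference; without it, the slight drift on the repeated first note is registered as a spurious upward step.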

F. Issues related to noise and quality of sound

The recording phase of Query by Humming has received little attention, yet it can have a great impact on the results. Recording a song or hummed query in a noisy rather than a noise-free environment can make a large difference to the results of a query by humming system. Recording quality also plays a vital role: a higher recording bit rate means that more content information about the song is captured. Distinct normalization phases are required to tackle these problems.

G. Inter Genre Searching

In the music world, songs are recorded in different genres. Genres are categories of songs, e.g. folk, rock, pop, etc. Each genre has its own features and tone. Songs of a genre are easily sung by singers who are experts in that genre, unlike an ordinary end user who has little knowledge of song genres. To observe this variation, we conducted a survey and found that it is difficult for a non-professional singer to sing or hum as accurately as a professional singer.

[Figure 7: two bar charts titled "Folk Genre Songs". Upper panel: "Accuracy of Ordinary End User", horizontal axis: ordinary end users OE1-OE5. Lower panel: "Accuracy of Professional Singers", horizontal axis: professional singers PS1-PS5.]

Fig. 7. Accuracy comparison of users humming a folk genre song

A total of 10 users took part. Five were professional singers and five were non-professional singers (ordinary end users). All users were given a folk genre song to listen to and hum. Figure 7 shows the accuracy results for the folk genre song. The five non-professional singers obtained less than 65% accuracy. These results motivate the problem of inter-genre searching. The survey indicates that when a non-professional singer searches through humming for a song of a different genre, the accuracy of the results is poor, whereas professional singers searching for the same songs through humming obtain better results. Rules are therefore needed for each genre so that a user's hummed query can be accurately reformulated for its corresponding genre before the search is performed.

IV. CONCLUSION

Searching for songs with the content based Query by Humming technique can become more accurate if the search query is reformulated. By reformulation we mean that parameters associated with the requested song are appended to the search query, so that the accuracy of the search results can be improved to a great extent. In this paper, we have discussed different song retrieval/searching techniques and identified some open, challenging issues that need to be addressed to obtain more refined results. We evaluated different measures and concluded that they should not be neglected when searching for songs, especially with the Query by Humming (QbH) technique. We supported our discussion with an experimental evaluation (surveys) and pointed out some new research issues for query reformulation in the field of Query by Humming. The proposed reformulation can further improve the retrieval of the most relevant songs.

REFERENCES

[1] N.A. Khan, M.A. Khan, and M. Mushtaq. Hybrid query by humming and metadata search system (HQMS). page 9, 2010.

[2] YouTube. URL: http://www.youtube.com.

[3] LLC Napster. Napster. URL: http://www.napster.com.

[4] S. Sheet. iTunes 2, Apple Computer, Inc., Oct. 31, 2001.

[5] G. Salton, A. Wong, and C.S. Yang. A vector space model for automatic indexing. Communications of the ACM, 18(11):613-620, 1975.

[6] A. Ghias, J. Logan, D. Chamberlin, and B.C. Smith. Query by humming: musical information retrieval in an audio database. pages 231-236, 1995.

[7] M.A. Casey, R. Veltkamp, M. Goto, M. Leman, C. Rhodes, and M. Slaney. Content-based music information retrieval: current directions and future challenges. Proceedings of the IEEE, 96(4):668-696, 2008.

[8] J. Song, S.Y. Bae, and K. Yoon. Query by humming: matching humming query to polyphonic audio. 1:329-332, 2002.

[9] R.J. McNab, L.A. Smith, I.H. Witten, C.L. Henderson, and S.J. Cunningham. Towards the digital music library: Tune retrieval from acoustic input. pages 11-18, 1996.

[10] T. Liu, X. Huang, L. Yang, and P. Zhang. Query by Humming: Comparing Voices to Voices. pages 1-4, 2009.

[11] S. Blackburn and D. DeRoure. A tool for content based navigation of music. pages 361-368, 1998.

[12] Y. Zhu and MS Kankanhalli. A robust music retrieval method for query-by-humming. pages 89-93, 2003.

[13] G.P. Nam, K.R. Park, S.P. Lee, E.C. Lee, M.Y. Kim, and K. Kim. Intelligent Query by Humming System. pages 1-6.

[14] H.H. Shih, S.S. Narayanan, and C.C.J. Kuo. Multidimensional humming transcription using a statistical approach for query by humming systems. pages 385-388, 2003.

[15] L. Lu and F. Seide. Mobile ringtone search through query by humming. pages 2157-2160, 2008.

[16] Mohammad R. K. Nastaran J. Qos-aware selection of web service composition based on harmony search algorithm. Journal of Digital Information Management, 8(3):160-166, 2010.

[17] Muhammet Semra, Ayegl. Classification with the neural network application of basic hearing losses determined by audiometric measuring. Journal of Networking Technology, 1(2):63-68, 2010.

[18] E. Unal, S. Narayanan, E. Chew, P.G. Georgiou, and N. Dahlin. A dictionary based approach for robust and syllable-independent audio input transcription for query by humming systems. pages 37-44, 2006.

[19] T. Li, M. Ogihara, and Q. Li. A comparative study on content-based music genre classification. pages 282-289, 2003.

[20] S.E. Hefling. Rhythmic alteration in seventeenth- and eighteenth-century music: Notes inégales and overdotting. 1993.

[21] B. Pardo and W.P. Birmingham. Query by Humming: How good can it get? The MIR/MDL Evaluation Project White Paper Collection Edition #3, 1001:107, 2003.
