HitPredict: Using Spotify Data to Predict Billboard Hits

egeorgie/HitPredict/ICML2020.pdf · 2020

Elena Georgieva (1), Marcella Suta (2), Nicholas Burton (2)

(1) Stanford University, Center for Computer Research in Music and Acoustics (CCRMA)
(2) Stanford University, Department of Civil and Environmental Engineering

Overview

§ Inspiration
§ Spotify Audio Features
§ Dataset Selection
§ Machine Learning Approaches
§ Next Steps

We approach the “Hit Song Science” problem, aiming to predict which songs will become Billboard hits.

Inspiration

Spotify Audio Features - NY Times

The New York Times used Spotify’s API to gather information on songs:

1. Loudness: “Volume of the song”
2. Energy: “How fast and noisy the song sounds”
3. Danceability: “Strength and regularity of the beat”
4. Acousticness: “Likelihood that the song uses acoustic instruments”
5. Valence: “How cheerful the song sounds”

Spotify Audio Features - Others

The New York Times chose to omit several available features from the Spotify API:

1. Speechiness: “How much spoken words are in a track”
2. Instrumentalness: “Detects whether a track contains no vocals”
3. Liveness: “Detects whether the track was performed live”
4. Tempo: “The beats per minute of a track”
5. Duration: “Duration of the track in minutes”
6. Mode (Major/Minor)
7. Key or Tonality
8. Time Signature


Using Spotify Audio Features to Study the Evolution of Pop Music

Elena Georgieva and Blair Kaneshiro

Center for Computer Research in Music and Acoustics (CCRMA), Stanford University

WiMIR 1st Annual Workshop | Paris, France 2018

Contact: egeorgie@stanford.edu

Abstract

Popular music is a symbol of culture, and is often looked at as a symbol of a time period or a generation. While there is much research on the evolution of pop music, most such research is anecdotal rather than scientific in nature [3]. We investigate the top 5 songs on the Billboard Hot 100 in the first week of September of the years 2018, 2008, 1998, and 1988 [1]. Nine audio features were taken from Spotify’s API [2]. Initial observations show that, on average, popular songs are becoming more danceable, louder, and shorter in length. Notably, tracks are showing more variety after a heavy similarity across all features in 2008.

Data

Discussion

Each of the first 6 features is scaled to [0, 1], where 0.5 is the average amount of that quality across all tracks on Spotify. Looking at these sonic footprints, the five top tracks are rather varied in 1988, 1998, and 2018, but shockingly similar in 2008. The ‘Valence’ and ‘Energy’ of top tracks seem to be decreasing as time progresses, but the ‘Danceability’ is increasing. Values of ‘Liveness,’ ‘Acousticness,’ and ‘Speechiness’ are low across all tracks. The last 3 features are quantitative: tempo (BPM), song length, and average loudness (dB). Song length of top tracks has been steadily decreasing over the years, while loudness increased until 2008 but calmed down somewhat in 2018. Song tempo values varied very little in 2008, when tracks seemed to be most similar.

Further Research

In the future, we will look at the initial observations in more depth, and investigate potential causes and repercussions of the observed trends. Furthermore, we will look at top songs in 5-year increments, dating back to the 1960s. This research has the potential to predict trends in future popular music, as well as which styles and recorded tracks will be commercially successful.

References

1. Billboard. (2018). Billboard Hot 100 Chart. Retrieved from: https://www.billboard.com/charts/hot-100

2. Chinoy, S. and Ma, J. (2018). Why Songs of the Summer Sound the Same. Nytimes.com. Retrieved from: https://www.nytimes.com/interactive/2018/08/09/opinion/do-songs-of-the-summer-sound-the-same.html

3. Mauch, M., MacCallum, R. M., Levy, M., and Leroi, A. M. (2015). The Evolution of Popular Music: USA 1960–2010. R. Soc. open sci.

[Figure: per-year plots of six features (Dance, Energy, Speech, Acoustic, Liveness, Valence), each on a 0 to 1 scale, for the top five songs of each year.]

Top Songs: September 2018
"In My Feelings" Drake; "I Like It" Cardi B.; "Girls Like You" Maroon5; "Fefe" 6ix9ine; "Better Now" Post Malone

Top Songs: September 2008
"Disturbia" Rihanna; "Crush" David Archuleta; "Forever" Chris Brown; "I Kissed A Girl" Katy Perry; "Viva La Vida" Coldplay

Top Songs: September 1998
"The Boy is Mine" Brandy & Monica; "My Way" Usher; "The First Night" Monica; "Crush" Jennifer Paige; "Never Ever" All Saints

Top Songs: September 1988
"Monkey" George Michael; "I Don't Wanna Go On With You Like That" Elton John; "I Don't Wanna Live Without Your Love" Chicago; "Sweet Child O' Mine" Guns N' Roses; "Simply Irresistible" Robert Palmer

[Figure: bar charts of Song Tempo (BPM), Song Length (Minutes), and Loudness (dB) for the years 2018, 2008, 1998, and 1988.]


Hit Song Science

Industry
§ The Echo Nest
§ ChartMetric
§ Next Big Sound

Academia
§ International Society for Music Information Retrieval (ISMIR) Conference


Step 1 – Data Collection

Billboard Hits
§ All unique songs featured on “Billboard Hot 100”
§ 1990-2018
§ Billboard API Library
§ Dataset:
› Artist name, song title, other misc. features

Step 1 – Data Collection

Non-Hit Songs
§ Million Song Dataset (LabROSA, Columbia University)
§ 1990-2018

Step 1 – Data Collection

All Together
§ Remove overlapping songs
§ Balance datasets

~4000 songs, labeled 1 (Hit) or 0 (Non-Hit)
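The assembly step above could look roughly like the sketch below. It is not the authors' code. It assumes each song is an `(artist, title)` tuple, matches overlapping songs by lowercased artist and title, and balances the classes by downsampling the larger one. The function name `build_dataset` and all of these choices are illustrative assumptions.

```python
import random


def build_dataset(hits, non_hits, seed=0):
    """Label, de-duplicate, and balance two lists of (artist, title) tuples.

    Hypothetical helper: the matching rule and the downsampling strategy
    are assumptions, not details taken from the slides.
    """
    def key(song):
        artist, title = song
        return (artist.strip().lower(), title.strip().lower())

    # Collapse duplicate entries so each hit appears once.
    unique_hits = list({key(s): s for s in hits}.values())
    hit_keys = {key(s) for s in unique_hits}

    # Remove overlapping songs: a non-hit must not appear on the hit list.
    unique_non_hits = list(
        {key(s): s for s in non_hits if key(s) not in hit_keys}.values()
    )

    # Balance the datasets by sampling the same number from each class.
    n = min(len(unique_hits), len(unique_non_hits))
    rng = random.Random(seed)
    labeled = [(s, 1) for s in rng.sample(unique_hits, n)]
    labeled += [(s, 0) for s in rng.sample(unique_non_hits, n)]
    rng.shuffle(labeled)
    return labeled
```

Downsampling is only one way to balance the classes. It keeps the sketch short, at the cost of discarding songs from the larger set.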

Step 2 – Feature Collection

Audio Features
§ Spotify Web API
§ Chose 9 audio features:
› Danceability, Energy, Speechiness, Acousticness, Instrumentalness, Liveness, Valence, Loudness, and Tempo
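As a sketch of how the nine features might be turned into model input, the snippet below converts one audio-features record (a dictionary keyed by lowercase feature name, as in the Spotify Web API's audio-features response) into a fixed-order numeric vector. Fetching the record from the API is left out. The name `to_feature_vector` is hypothetical, and the feature order is an assumption that simply follows the slide.

```python
# The nine features named on the slide, in a fixed (assumed) order.
FEATURES = [
    "danceability", "energy", "speechiness", "acousticness",
    "instrumentalness", "liveness", "valence", "loudness", "tempo",
]


def to_feature_vector(audio_features):
    """Return the nine selected features as a list of floats.

    `audio_features` is one record of audio features; any extra keys
    (for example key, mode, or duration) are ignored.
    """
    missing = [name for name in FEATURES if name not in audio_features]
    if missing:
        raise KeyError(f"missing audio features: {missing}")
    return [float(audio_features[name]) for name in FEATURES]
```

Note that loudness (dB) and tempo (BPM) are on very different scales from the [0, 1] features, so some rescaling before training is a common choice. The slides do not say whether this was done.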

Step 2 – Feature Collection

An Additional Feature: “Artist Score”
§ Whether or not the artist has a previous Billboard Hit
§ Back to 1987
§ Using Billboard API Library
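One way to read the “Artist Score” is as a binary flag: 1 if the artist already had a Billboard hit before the song in question, 0 otherwise. The sketch below makes that assumption. It also assumes the chart history has already been collected into a dictionary mapping each artist to the dates of their chart entries. `artist_score` and `chart_history` are hypothetical names.

```python
from datetime import date


def artist_score(artist, song_date, chart_history):
    """Return 1 if `artist` has a chart entry strictly before `song_date`.

    `chart_history` maps artist name -> list of dates on which the artist
    appeared on the chart (assumed to go back to 1987, as on the slide).
    """
    earlier_entries = chart_history.get(artist, [])
    return int(any(entry < song_date for entry in earlier_entries))
```

Comparing strictly earlier dates keeps a song's own chart appearance from counting as a "previous" hit.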


Step 3 – Classification

Figure. A plot of songs’ danceability vs. energy vs. loudness (dB). Black circles represent Billboard hits and red marks represent non-hits.

Step 3 – Classification

Supervised Learning
§ Logistic Regression (LR)
§ Gaussian Discriminant Analysis (GDA)
§ Neural Network (NN)
› 1 hidden layer of 6 units
› Sigmoid activation function
› L2 regularization to avoid over-fitting
§ Training/Testing: 75/25 split
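To make the simplest of these models concrete, here is a minimal logistic-regression sketch with a 75/25 split, written with the standard library only. It is not the authors' implementation. The learning rate, epoch count, batch gradient descent, and the 0.5 decision threshold are all assumptions.

```python
import math
import random


def train_test_split(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle indices and hold out `test_fraction` of the data for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(rows) * test_fraction))

    def pick(idx):
        return [rows[i] for i in idx], [labels[i] for i in idx]

    return pick(indices[n_test:]), pick(indices[:n_test])


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic_regression(rows, labels, lr=0.1, epochs=500):
    """Batch gradient descent on the mean log-loss (no regularization)."""
    n_features = len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def predict(weights, bias, x):
    """Predict 1 (hit) when the modeled probability is at least 0.5."""
    return int(sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5)
```

In practice a library implementation would be the more usual choice. The hand-written version is used here only to keep the sketch self-contained.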

Some Results

§ LR and GDA yielded accuracies of 75.9% and 73.7%, respectively, on the test data. Accuracy on the training data was similar, indicating no overfitting.

Neural Network

• The NN gives similar accuracy to LR, but interestingly generates significantly higher precision. This shows the robustness of the NN prediction.

• Accuracy peaks at ~19,000 epochs.
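Accuracy and precision can diverge, which is why two models with similar accuracy may differ in precision. The sketch below computes both from true and predicted labels, using the standard definitions: accuracy = (TP + TN) / N and precision = TP / (TP + FP). The function name is illustrative, and no numbers from the talk are reproduced.

```python
def accuracy_and_precision(y_true, y_pred):
    """Return (accuracy, precision) for binary labels, with 1 = hit."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    accuracy = (tp + tn) / len(y_true)
    # Precision is undefined when nothing is predicted positive; report 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, precision
```

A classifier that predicts "hit" rarely but correctly can match the accuracy of a more liberal classifier while having much higher precision.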

Error Analysis

Ablative Analysis
• Ablative analysis was used: the features at the end of the list decreased the accuracy of predictions and were removed.
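A generic ablation loop is sketched below, under the assumption that the analysis removed one feature at a time and compared accuracy against the full feature set. The slides do not give the exact procedure. `evaluate` is a hypothetical callback that trains and scores a model on a given list of feature names.

```python
def ablative_analysis(features, evaluate):
    """Measure the accuracy change caused by removing each feature in turn.

    Returns (baseline, report), where report holds tuples of
    (feature, accuracy_without_feature, change_vs_baseline),
    sorted so that features whose removal helps the most come last.
    """
    baseline = evaluate(features)
    report = []
    for name in features:
        reduced = [f for f in features if f != name]
        score = evaluate(reduced)
        report.append((name, score, score - baseline))
    report.sort(key=lambda row: row[2])
    return baseline, report


def features_to_drop(report):
    """Features whose removal increased accuracy."""
    return [name for name, _, change in report if change > 0]
```

Sorting by the change in accuracy puts harmful features at the end of the report, which mirrors the slide's remark about features at the end of the list.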

Time
• We divided the data into subsets of five-year periods and split each subset into training and validation sets (80/20).
• In most cases, the accuracy on both the training and validation sets improved, implying that the features of pop music are somewhat unique to the time period of the song’s release.

Figure. Accuracy on the validation set for specific time periods (1990-2018, 1990-1994, 1995-1999, 2000-2004, 2005-2009, 2010-2014, 2015-2018), for Logistic Regression and Neural Network, on a 50 to 100 accuracy axis. Accuracy improves for individual time periods, indicating that hit songs have features unique to their time period.
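The per-period experiment can be sketched as follows. The period boundaries follow the labels in the figure, with the last period truncated at 2018. The assumptions are that each song is represented as `(year, features, label)`, that each period is shuffled before the 80/20 cut, and the function names themselves.

```python
import random
from collections import defaultdict


def period_label(year, start=1990, width=5, last_year=2018):
    """Map a release year to its five-year period, e.g. 1997 -> '1995-1999'."""
    lo = start + ((year - start) // width) * width
    hi = min(lo + width - 1, last_year)
    return f"{lo}-{hi}"


def split_by_period(songs, train_fraction=0.8, seed=0):
    """Group songs by period, then split each group into train/validation.

    `songs` is a list of (year, features, label) tuples.
    Returns {period: (train_songs, validation_songs)}.
    """
    buckets = defaultdict(list)
    for song in songs:
        buckets[period_label(song[0])].append(song)

    rng = random.Random(seed)
    splits = {}
    for period, items in sorted(buckets.items()):
        items = list(items)
        rng.shuffle(items)
        cut = int(len(items) * train_fraction)
        splits[period] = (items[:cut], items[cut:])
    return splits
```

A separate model would then be trained and validated on each period's split, and its accuracy compared with the model trained on the full 1990-2018 range.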

Conclusion & Future Work

• Why do songs in a given time period hold trends?
• Social culture? Commercial influences?

• “External factors”, difficult to quantify but may be very important in predicting a song’s Billboard success.

• Do we… want this?

Do we… want this?

NY Times: “Why Songs of the Summer Sound the Same”

HitPredict

Thanks to my Collaborators Marcella Suta and Nicholas Burton! Thanks to Blair Kaneshiro

References

Bertin-Mahieux, T., Ellis, D. P. W., Whitman, B., and Lamere, P. The Million Song Dataset. In Proceedings of International Society for Music Information Retrieval, 2011.

Chinoy, S. and Ma, J. Why songs of the summer sound the same. New York Times, 2018.

Dhanaraj, R. and Logan, B. Automatic prediction of hit songs. In Proceedings of International Society for Music Information Retrieval, 2005.

Guo, A. Python API for Billboard data. github.com. Retrieved from: https://pypi.org/project/billboard.py/.

Mauch, M., MacCallum, R. M., Levy, M., and Leroi, A. M. The evolution of popular music: USA 1960-2010. In Royal Society Open Science, 2015.

Ni, Y. and Santos-Rodriguez, R. Hit song science once again a science. In International Workshop on Machine Learning and Music, 2011.

Pachet, F. and Roy, P. Hit song science is not yet a science. In Proceedings of International Society for Music Information Retrieval, 2008.

Singhi, A. and Brown, D. G. Hit song detection using lyric features alone. In Proceedings of International Society for Music Information Retrieval, 2014.

Yang, L.-C., Chou, S.-Y., Liu, J.-Y., Yang, Y.-H., and Chen, Y.-A. Revisiting the problem of audio-based hit song prediction using convolutional neural networks. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.

Zangerle, E., Pichl, M., Hupfauf, B., and Specht, G. Can microblogs predict music charts? An analysis of the relationship between #nowplaying tweets and music charts. In Proceedings of International Society for Music Information Retrieval, 2016.

egeorgie@ccrma.stanford.edu

THANK YOU!