HitPredict: Using Spotify Data to Predict Billboard Hits

Elena Georgieva¹, Marcella Suta², Nicholas Burton²

¹ Stanford University Center for Computer Research in Music and Acoustics (CCRMA)
² Stanford University Department of Civil and Environmental Engineering

HitPredict: Using Spotify Data to Predict Billboard Hitsegeorgie/HitPredict/ICML2020.pdf · 2020. 7. 18. · Billboard Hits §All unique songs featured on “Billboard Hot 100”

Overview

§ Inspiration
§ Spotify Audio Features
§ Dataset Selection
§ Machine Learning Approaches
§ Next Steps

We approach the “Hit Song Science” problem, aiming to predict which songs will become Billboard Hits.

Inspiration

Spotify Audio Features - NY Times

The New York Times used Spotify’s API to gather information on songs:

1. Loudness: “Volume of the song”
2. Energy: “How fast and noisy the song sounds”
3. Danceability: “Strength and regularity of the beat”
4. Acousticness: “Likelihood that the song uses acoustic instruments”
5. Valence: “How cheerful the song sounds”

Spotify Audio Features - Others

The New York Times chose to omit several available features from the Spotify API:

1. Speechiness: “How much spoken words are in a track”
2. Instrumentalness: “Detects whether a track contains no vocals”
3. Liveness: “Detects whether the track was performed live”
4. Tempo: “The beats per minute of a track”
5. Duration: “Duration of the track in minutes”
6. Mode (Major/Minor)
7. Key or Tonality
8. Time Signature

Top Songs

Using Spotify Audio Features to Study the Evolution of Pop Music

Elena Georgieva and Blair Kaneshiro
Center for Computer Research in Music and Acoustics (CCRMA), Stanford University

WiMIR 1st Annual Workshop | Paris, France 2018

Abstract

Popular music is a symbol of culture, and is often looked at as a symbol of a time period or a generation. While there is much research on the evolution of pop music, most such research is anecdotal rather than scientific in nature [3]. We investigate the top 5 songs on the Billboard Hot 100 in the first week of September of the years 2018, 2008, 1998, and 1988 [1]. Nine audio features were taken from Spotify’s API [2]. Initial observations show that, on average, popular songs are becoming more danceable, louder, and shorter in length. Notably, tracks are showing more variety after a heavy similarity across all features in 2008.

Data

Discussion

Each of the first 6 features is scaled to [0, 1], where 0.5 is the average amount of that quality across all tracks on Spotify. Looking at these sonic footprints, the five top tracks are rather varied in 1988, 1998, and 2018, but shockingly similar in 2008. The ‘Valence’ and ‘Energy’ of top tracks seem to be decreasing as time progresses, but the ‘Danceability’ is increasing. Values of ‘Liveness,’ ‘Acousticness,’ and ‘Speechiness’ are low across all tracks. The last 3 features are quantitative: tempo (BPM), song length, and average loudness (dB). Song length of top tracks has been steadily decreasing over the years, while loudness increased until 2008 and then dropped somewhat in 2018. Song tempo values varied very little in 2008, when tracks seemed to be most similar.

Further Research

In the future, we will look at the initial observations in more depth, and investigate potential causes and repercussions of the observed trends. Furthermore, we will look at top songs in 5-year increments, dating back to the 1960s. This research has the potential to predict trends in future popular music, as well as which styles and recorded tracks will be commercially successful.

References

1. Billboard. (2018). Billboard Hot 100 Chart. Retrieved from: https://www.billboard.com/charts/hot-100

2. Chinoy, S. and Ma, J. (2018). Why Songs of the Summer Sound the Same. Nytimes.com. Retrieved from: https://www.nytimes.com/interactive/2018/08/09/opinion/do-songs-of-the-summer-sound-the-same.html

3. Mauch, M., MacCallum, R. M., Levy, M., and Leroi, A. M. (2015). The Evolution of Popular Music: USA 1960–2010. R. Soc. open sci.

Figure (plots not recovered). Four panels titled "Top Songs: September 2018", "Top Songs: September 2008", "Top Songs: September 1998", and "Top Songs: September 1988" each show Dance, Energy, Speech, Acoustic, Liveness, and Valence on a 0 to 1 scale for that year's top five songs. Three further panels show Song Tempo (BPM), Song Length (Minutes), and Loudness (dB) for the years 2018, 2008, 1998, and 1988.

Top Songs: September 2018: "In My Feelings" Drake; "I Like It" Cardi B.; "Girls Like You" Maroon 5; "Fefe" 6ix9ine; "Better Now" Post Malone

Top Songs: September 2008: "Disturbia" Rihanna; "Crush" David Archuleta; "Forever" Chris Brown; "I Kissed A Girl" Katy Perry; "Viva La Vida" Coldplay

Top Songs: September 1998: "The Boy is Mine" Brandy & Monica; "My Way" Usher; "The First Night" Monica; "Crush" Jennifer Paige; "Never Ever" All Saints

Top Songs: September 1988: "Monkey" George Michael; "I Don't Wanna Go On With You Like That" Elton John; "I Don't Wanna Live Without Your Love" Chicago; "Sweet Child O' Mine" Guns N' Roses; "Simply Irresistible" Robert Palmer

Hit Song Science

Industry
§ The Echo Nest
§ ChartMetric
§ Next Big Sound

Academia
§ International Society for Music Information Retrieval (ISMIR) Conference

Step 1 – Data Collection

Billboard Hits
§ All unique songs featured on “Billboard Hot 100”
§ 1990-2018
§ Billboard API Library
§ Dataset:
    › Artist name, song title, other misc. features
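As a rough illustration of the "all unique songs" step, the sketch below deduplicates chart entries by title and artist. The `Song` type, `unique_songs`, and `fetch_hot_100` are hypothetical names introduced here, not the authors' code. The fetch helper assumes the third-party `billboard.py` package listed in the references and is not called in the demonstration, so the example runs offline.

```python
from typing import Iterable, NamedTuple


class Song(NamedTuple):
    title: str
    artist: str


def unique_songs(entries: Iterable[Song]) -> list[Song]:
    """Deduplicate chart entries, ignoring case and surrounding whitespace."""
    seen: set[tuple[str, str]] = set()
    result: list[Song] = []
    for song in entries:
        key = (song.title.strip().lower(), song.artist.strip().lower())
        if key not in seen:
            seen.add(key)
            result.append(song)
    return result


def fetch_hot_100(date: str) -> list[Song]:
    """Fetch one weekly Hot 100 chart (assumes `pip install billboard.py` and network access)."""
    import billboard  # imported lazily so the rest of the sketch runs offline

    chart = billboard.ChartData("hot-100", date=date)
    return [Song(entry.title, entry.artist) for entry in chart.entries]


# Offline demonstration with two hand-written "weeks" that share one song.
week_1 = [Song("In My Feelings", "Drake"), Song("I Like It", "Cardi B.")]
week_2 = [Song("In My Feelings", "Drake"), Song("Better Now", "Post Malone")]
hits = unique_songs(week_1 + week_2)
print(len(hits), "unique songs")
```

In a full collection run, one would presumably loop `fetch_hot_100` over weekly dates from 1990 to 2018 and pass the concatenated entries to `unique_songs`.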

Step 1 – Data Collection

Non-Hit Songs
§ Million Song Dataset (LabROSA, Columbia University)
§ 1990-2018

Step 1 – Data Collection

All Together
§ Remove overlapping songs
§ Balance datasets

~4000 songs, labeled 1 (Hit) or 0 (Non-Hit)
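The combination step can be sketched as follows. The slides do not say how the datasets were balanced, so random downsampling of the larger class is an assumption made here, and `build_dataset` is a hypothetical helper. Songs are represented as `(title, artist)` tuples purely for illustration.

```python
import random


def build_dataset(hits, candidates, seed=0):
    """Return shuffled (song, label) pairs with equal numbers of hits (1) and non-hits (0)."""
    hit_keys = set(hits)
    # Remove overlap: a candidate that also appears on the hit list is not a non-hit.
    non_hits = [song for song in candidates if song not in hit_keys]
    # Balance (assumed strategy): downsample both classes to the size of the smaller one.
    n = min(len(hits), len(non_hits))
    rng = random.Random(seed)
    balanced_hits = rng.sample(list(hits), n)
    balanced_non_hits = rng.sample(non_hits, n)
    dataset = [(s, 1) for s in balanced_hits] + [(s, 0) for s in balanced_non_hits]
    rng.shuffle(dataset)
    return dataset


# Toy inputs (made up): one candidate overlaps with the hit list.
toy_hits = [("a", "x"), ("b", "y"), ("c", "z")]
toy_candidates = [("a", "x"), ("d", "w"), ("e", "v")]
dataset = build_dataset(toy_hits, toy_candidates)
print(dataset)
```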

Step 2 – Feature Collection

Audio Features
§ Spotify Web API
§ Chose 9 audio features:
    › Danceability, Energy, Speechiness, Acousticness, Instrumentalness, Liveness, Valence, Loudness, and Tempo
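To make the feature selection concrete, the sketch below turns one audio-features record into a fixed-order vector of the nine chosen features. The key names follow the audio-features object of the Spotify Web API. The sample values are made up, and `to_feature_vector` is a hypothetical helper. No API call is made here.

```python
# The nine features named on the slide, using the Spotify audio-features key names.
FEATURES = [
    "danceability", "energy", "speechiness", "acousticness",
    "instrumentalness", "liveness", "valence", "loudness", "tempo",
]


def to_feature_vector(audio_features: dict) -> list[float]:
    """Select the nine chosen features in a fixed order, ignoring all other keys."""
    return [float(audio_features[name]) for name in FEATURES]


# Made-up record shaped like an audio-features response (values are illustrative only).
sample = {
    "danceability": 0.83, "energy": 0.63, "key": 1, "loudness": -5.9, "mode": 1,
    "speechiness": 0.12, "acousticness": 0.06, "instrumentalness": 0.0,
    "liveness": 0.4, "valence": 0.35, "tempo": 91.0, "duration_ms": 217925,
    "time_signature": 4,
}
vector = to_feature_vector(sample)
print(vector)
```

Note that loudness (dB) and tempo (BPM) are on very different scales from the [0, 1] features, so some form of normalization would likely be needed before training. The slides do not say whether this was done.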

Step 2 – Feature Collection

An Additional Feature: “Artist Score”
§ Whether or not the artist has a previous Billboard Hit
§ Back to 1987
§ Using Billboard API Library
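One possible reading of the artist score is a binary flag that is 1 when the artist charted strictly before the song in question. The sketch below implements that reading. The function names, the exact-name matching of artists, and the chart history are all assumptions for illustration.

```python
def earliest_hit_dates(hit_history):
    """Map each artist (case-insensitive) to the date of their earliest charted song."""
    earliest = {}
    for artist, chart_date in hit_history:
        key = artist.strip().lower()
        if key not in earliest or chart_date < earliest[key]:
            earliest[key] = chart_date
    return earliest


def artist_score(artist, song_date, earliest):
    """Return 1 if the artist charted strictly before song_date, else 0."""
    first_hit = earliest.get(artist.strip().lower())
    return 1 if first_hit is not None and first_hit < song_date else 0


# Hypothetical chart history as (artist, ISO date) pairs; ISO dates sort correctly as strings.
history = [
    ("Artist A", "1997-09-06"),
    ("Artist A", "1998-09-05"),
    ("Artist B", "2005-06-11"),
]
earliest = earliest_hit_dates(history)
print(artist_score("Artist A", "1998-09-05", earliest))  # an earlier hit exists
print(artist_score("Artist B", "2005-06-11", earliest))  # this is their first hit
print(artist_score("Artist C", "2008-09-06", earliest))  # never charted
```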

Step 3 – Classification

Figure. A plot of songs’ danceability vs. energy vs. loudness (dB). Black circles represent Billboard hits and red marks represent non-hits.

Step 3 – Classification

Supervised Learning
§ Logistic Regression (LR)
§ Gaussian Discriminant Analysis (GDA)
§ Neural Network (NN)
    › 1 hidden layer of 6 units
    › Sigmoid activation function
    › L2 regularization to avoid over-fitting
§ Training/Testing: 75/25 split
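The network described above can be sketched as a forward pass plus an L2-regularized loss. This is a minimal pure-Python illustration, not the authors' implementation. The input size of 10 (nine audio features plus the artist score), the sigmoid output unit, the cross-entropy loss, the weight initialization, and the penalty strength `l2` are all assumptions. Training by gradient descent is omitted.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_params(n_inputs, n_hidden=6, seed=0):
    """Small random weights for one hidden layer of n_hidden units and one output unit."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(params, x):
    """Return the predicted probability that x is a hit."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(params["W1"], params["b1"])
    ]
    return sigmoid(sum(w * h for w, h in zip(params["W2"], hidden)) + params["b2"])


def regularized_loss(params, rows, labels, l2=0.01):
    """Mean cross-entropy plus an L2 penalty on the weights (biases are not penalized)."""
    eps = 1e-12
    cross_entropy = 0.0
    for x, y in zip(rows, labels):
        p = forward(params, x)
        cross_entropy -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    cross_entropy /= len(rows)
    penalty = sum(w * w for row in params["W1"] for w in row)
    penalty += sum(w * w for w in params["W2"])
    return cross_entropy + l2 * penalty


params = init_params(n_inputs=10)  # assumed: 9 audio features + artist score
toy_rows = [[0.5] * 10, [0.1] * 10]  # made-up inputs
toy_labels = [1, 0]
print(forward(params, toy_rows[0]))
print(regularized_loss(params, toy_rows, toy_labels))
```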

Some Results

§ LR and GDA yielded accuracies of 75.9% and 73.7%, respectively, on the testing data. Accuracy on the training data was similar, indicating no overfitting.
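The overfitting check described here amounts to comparing accuracy on the training and testing portions of a 75/25 split. Below is a minimal pure-Python sketch using logistic regression trained by batch gradient descent. The data is synthetic and unrelated to the HitPredict dataset, and the learning rate, epoch count, and helper names are assumptions. GDA is not shown.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def split(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle indices and hold out test_fraction of the data for testing."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = len(idx) - round(len(idx) * test_fraction)
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


def fit(rows, labels, lr=1.0, epochs=1000):
    """Batch gradient descent on the mean logistic loss."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def accuracy(weights, bias, rows, labels):
    """Fraction of rows whose thresholded prediction matches the label."""
    correct = 0
    for x, y in zip(rows, labels):
        predicted = 1 if sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5 else 0
        correct += predicted == y
    return correct / len(rows)


# Synthetic, linearly separable toy data (NOT the HitPredict dataset).
rng = random.Random(1)
rows = [[rng.random(), rng.random()] for _ in range(200)]
labels = [1 if x[0] + x[1] > 1.0 else 0 for x in rows]
X_train, y_train, X_test, y_test = split(rows, labels)
weights, bias = fit(X_train, y_train)
train_acc = accuracy(weights, bias, X_train, y_train)
test_acc = accuracy(weights, bias, X_test, y_test)
print(f"train accuracy {train_acc:.2f}, test accuracy {test_acc:.2f}")
```

A training accuracy far above the testing accuracy would point to overfitting. Similar values, as reported on the slide, suggest the model generalizes about as well as it fits.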

Neural Network

• The NN gives similar accuracy to LR but, interestingly, significantly higher precision, which suggests that its hit predictions are more robust.

• Accuracy peaked at ~19,000 training epochs.
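Two classifiers can share the same accuracy while differing in precision, which is the situation the first bullet describes. The sketch below computes both metrics from labels and predictions. The prediction vectors are made up to illustrate the point and are not results from the study.

```python
def accuracy_and_precision(y_true, y_pred):
    """Accuracy = correct / total; precision = true positives / predicted positives."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    predicted_pos = true_pos + false_pos
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    return accuracy, precision


# Made-up labels and predictions: both classifiers get 6 of 8 songs right.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
pred_a = [1, 1, 1, 1, 0, 0, 1, 1]  # finds every hit but also flags two non-hits
pred_b = [1, 1, 0, 0, 0, 0, 0, 0]  # misses two hits but never flags a non-hit
result_a = accuracy_and_precision(y_true, pred_a)
result_b = accuracy_and_precision(y_true, pred_b)
print(result_a, result_b)
```

Higher precision means fewer non-hits are labeled as hits, at the possible cost of missing some real hits.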

Error Analysis

Ablative Analysis
• Ablative analysis was used; the features at the end of the list decreased the accuracy of predictions and were removed.
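A generic form of ablative analysis is a greedy loop that drops a feature whenever the score improves without it. The sketch below assumes a caller-supplied `evaluate(features)` function that returns validation accuracy. The `toy_evaluate` scorer and its per-feature contributions are entirely made up, and the slides do not state which features were actually removed.

```python
def ablate(features, evaluate):
    """Greedily drop features whose removal raises the score returned by evaluate."""
    kept = list(features)
    best = evaluate(kept)
    improved = True
    while improved and len(kept) > 1:
        improved = False
        for name in list(kept):
            trial = [f for f in kept if f != name]
            score = evaluate(trial)
            if score > best:
                # Removing this feature helps, so drop it and rescan the rest.
                kept, best, improved = trial, score, True
                break
    return kept, best


# Hypothetical scorer: each feature adds or subtracts a fixed amount of accuracy.
CONTRIBUTION = {"danceability": 0.10, "energy": 0.08, "tempo": -0.02, "liveness": -0.01}


def toy_evaluate(features):
    return 0.5 + sum(CONTRIBUTION[f] for f in features)


kept, best = ablate(list(CONTRIBUTION), toy_evaluate)
print(kept, round(best, 2))
```

In practice `evaluate` would retrain the classifier on the given feature subset and return its accuracy on held-out data.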

Time
• We divided the data into subsets of five-year periods and split each subset into training and validation sets (80/20).
• In most cases, the accuracy on both the training and validation sets improved, implying that the features of pop music are somewhat unique to the time period of the song’s release.
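The per-period experiment can be sketched as bucketing songs by release year and splitting each bucket 80/20. The period boundaries below follow the 1990 to 2018 range used for the dataset, with a shorter final period. The record layout `(year, features, label)`, the helper names, and the toy data are assumptions.

```python
import random
from collections import defaultdict


def period_label(year, start=1990, width=5, end=2018):
    """Map a year to its five-year period, e.g. 1997 -> '1995-1999'."""
    low = start + ((year - start) // width) * width
    high = min(low + width - 1, end)
    return f"{low}-{high}"


def split_by_period(songs, validation_fraction=0.2, seed=0):
    """Group (year, features, label) records by period, then split each group."""
    buckets = defaultdict(list)
    for song in songs:
        buckets[period_label(song[0])].append(song)
    rng = random.Random(seed)
    splits = {}
    for period, items in sorted(buckets.items()):
        rng.shuffle(items)
        cut = len(items) - round(len(items) * validation_fraction)
        splits[period] = (items[:cut], items[cut:])  # (training, validation)
    return splits


# Toy records: five placeholder songs per year with empty feature lists.
songs = [(year, [], year % 2) for year in range(1990, 2019) for _ in range(5)]
splits = split_by_period(songs)
for period, (train, validation) in splits.items():
    print(period, len(train), len(validation))
```

A separate classifier would then be trained and validated within each period, and its accuracy compared with a model trained on the full 1990-2018 range.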

Figure. Accuracy on the validation set for specific time periods. Accuracy improves for individual time periods, indicating that hit songs have features unique to their time period. (Plot not recovered: accuracy axis from 50 to 100; periods 1990-2018, 1990-1994, 1995-1999, 2000-2004, 2005-2009, 2010-2014, and 2015-2018; series Logistic Regression and Neural Network.)

Conclusion & Future Work

• Why do songs in a given time period hold trends?
    › Social culture? Commercial influences?

• “External factors” are difficult to quantify but may be very important in predicting a song’s Billboard success.

• Do we… want this?

Do we… want this?

NY Times: “Why Songs of the Summer Sound the Same”

HitPredict

Thanks to my collaborators Marcella Suta and Nicholas Burton! Thanks to Blair Kaneshiro.

References

Bertin-Mahieux, T., Ellis, D. P. W., Whitman, B., and Lamere, P. The Million Song Dataset. In Proceedings of International Society for Music Information Retrieval, 2011.

Chinoy, S. and Ma, J. Why songs of the summer sound the same. New York Times, 2018.

Dhanaraj, R. and Logan, B. Automatic prediction of hit songs. In Proceedings of International Society for Music Information Retrieval, 2005.

Guo, A. Python API for Billboard data. github.com. Retrieved from: https://pypi.org/project/billboard.py/.

Mauch, M., MacCallum, R. M., Levy, M., and Leroi, A. M. The evolution of popular music: USA 1960-2010. In Royal Society Open Science, 2015.

Ni, Y. and Santos-Rodriguez, R. Hit song science once again a science. In International Workshop on Machine Learning and Music, 2011.

Pachet, F. and Roy, P. Hit song science is not yet a science. In Proceedings of International Society for Music Information Retrieval, 2008.

Singhi, A. and Brown, D. G. Hit song detection using lyric features alone. In Proceedings of International Society for Music Information Retrieval, 2014.

Yang, L.-C., Chou, S.-Y., Liu, J.-Y., Yang, Y.-H., and Chen, Y.-A. Revisiting the problem of audio-based hit song prediction using convolutional neural networks. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.

Zangerle, E., Pichl, M., Hupfauf, B., and Specht, G. Can microblogs predict music charts? An analysis of the relationship between #nowplaying tweets and music charts. In Proceedings of International Society for Music Information Retrieval, 2016.

HitPredict: Using Spotify Data to Predict Billboard Hits

egeorgie@ccrma.stanford.edu

THANK YOU!