Last November, I (finally) popped the big question and proposed! Since then, my fiance and I have been diligently planning our wedding. While we have most of the big-ticket items checked off (venue, catering, photography, etc.), one area that still needs work is the wedding playlist. We’ve started putting together a playlist on Spotify, but it feels like it’s come to a bit of a standstill. Currently, there’s a mix of zesty bops and tame songs on the playlist (we need to accommodate both our college friends and our grandparents!), but Spotify’s track recommender only wants to suggest tamer songs right now. Our goal is to have a full dance floor the entire night. To achieve this, we can use spotifyr and the new tidyclust package to pull in the current playlist, cluster the songs based on their features, and find new songs based on the bop cluster.
If you’d like to follow along, I’d recommend installing the development versions of parsnip and workflows, as some of the functionality that interacts with tidyclust isn’t yet on CRAN.
Pulling in the playlist
spotifyr is an R interface to Spotify’s web API and gives access to a host of track features (you can follow this tutorial to get it set up). I’ll use the functions get_user_playlists() and get_playlist_tracks() to pull in songs that are currently on our wedding playlist (appropriately named “Ding dong”).
Code
library(dplyr)
library(spotifyr)

# get the songs that are currently on the wedding playlist
ding_dong <-
  get_user_playlists("12130039175") %>%
  filter(name == "Ding dong") %>%
  pull(id) %>%
  get_playlist_tracks() %>%
  as_tibble() %>%
  select(track.id, track.name, track.popularity) %>%
  # swap the "." in each column name for "_"
  rename_with(~stringr::str_replace(.x, "\\.", "_"))

# preview the first ten songs
ding_dong %>%
  slice_head(n = 10) %>%
  knitr::kable()

| track_id | track_name | track_popularity |
|---|---|---|
| 5jkFvD4UJrmdoezzT1FRoP | Rasputin | 61 |
| 1D066zixBwqFYqBhKgdPzp | Fergalicious | 66 |
| 12jjuxN1gxlm29cqL5M6MW | I Got You | 62 |
| 2grjqo0Frpf2okIBiifQKs | September | 78 |
| 2RlgNHKcydI9sayD2Df2xp | Mr. Blue Sky | 76 |
| 6x4tKaOzfNJpEJHySoiJcs | Mambo No. 5 (a Little Bit of…) | 72 |
| 3n3Ppam7vgaVa1iaRUc9Lp | Mr. Brightside | 62 |
| 7Cp69rNBwU0gaFT8zxExlE | Ymca | 45 |
| 3Gf5nttwcX9aaSQXRWidEZ | Ride Wit Me | 72 |
| 3wMUvT6eIw2L5cZFG1yH9j | Country Grammar (Hot Shit) | 65 |

Spotify estimates quite a few features for each song in their catalog: speechiness (the presence of words on a track), acousticness (whether or not a song includes acoustic instruments), liveness (estimates whether or not the track is live or studio-recorded), etc. We can use get_track_audio_features() to get the features for each song based on its track_id.
Code
# pull in track features of songs on the playlist
track_features <-
  ding_dong %>%
  pull(track_id) %>%
  get_track_audio_features()

# join together
ding_dong <-
  ding_dong %>%
  left_join(track_features, by = c("track_id" = "id"))

In my case, I’m interested in the energy and valence (positivity) of each song, so I’ll select these variables to use in the cluster analysis.
Currently, the playlist covers a wide spectrum of songs. For new songs, I’m really just interested in ones similar to those in the top right corner of the chart below, with high energy and valence.
Code
library(ggplot2)

# how are valence/energy related?
obj <-
  ding_dong %>%
  ggplot(aes(x = valence,
             y = energy,
             tooltip = track_name)) +
  ggiraph::geom_point_interactive(size = 3.5, alpha = 0.5) +
  scale_x_continuous(labels = scales::label_percent(accuracy = 1)) +
  scale_y_continuous(labels = scales::label_percent(accuracy = 1)) +
  labs(title = "The current wedding playlist",
       subtitle = "Hover over each point to see the song's name!")

# render the plot as an interactive widget with styled tooltips
ggiraph::girafe(
  ggobj = obj,
  options = list(
    ggiraph::opts_tooltip(opacity = 0.8,
                          css = "background-color:gray;color:white;padding:2px;border-radius:2px;font-family:Roboto Slab;"),
    ggiraph::opts_hover(css = "fill:#1279BF;stroke:#1279BF;cursor:pointer;")
  )
)

Broadly, the songs on the current playlist fall into three generic categories: high energy and valence, low energy, or low valence (songs with both low energy and low valence will fall into one of the “low” categories). Rather than manually assign categories, we can use tidyclust to cluster the songs into three groups using the k-means algorithm.
There’s some great documentation on the tidyclust site, but to get started, we’ll categorize the songs on the current playlist by “fitting” a k-means model (using the stats engine under the hood).
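Here's a minimal sketch of what that fit can look like. To keep it self-contained, it uses a made-up toy_playlist tibble (randomly generated, not the real ding_dong data), and the names toy_clusters and toy_assigned are placeholders of my own. It also assumes that dplyr, parsnip, and tidyclust are installed.

```r
library(dplyr)
library(parsnip)    # set_engine()
library(tidyclust)  # k_means() plus cluster-aware fit() and augment()

set.seed(2022)

# hypothetical stand-in for the playlist: three loose groups of ten songs
toy_playlist <- tibble(
  track_name = paste("song", 1:30),
  valence = c(runif(10, 0.7, 1.0), runif(10, 0.6, 0.9), runif(10, 0.1, 0.4)),
  energy  = c(runif(10, 0.7, 1.0), runif(10, 0.2, 0.5), runif(10, 0.6, 0.9))
)

# specify a 3-cluster k-means model backed by stats::kmeans(), then fit it
toy_clusters <-
  k_means(num_clusters = 3) %>%
  set_engine("stats") %>%
  fit(~ valence + energy, data = toy_playlist)

# attach each song's cluster label as a .pred_cluster column
toy_assigned <- augment(toy_clusters, toy_playlist)

# how many songs landed in each cluster?
toy_assigned %>%
  count(.pred_cluster)
```

Swapping toy_playlist for ding_dong should give an object that plays the role of ding_dong_clusters. Since k-means starts from randomly chosen centers, setting a seed first keeps the result reproducible between runs.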
As expected, the majority of songs in the current playlist fall into the bop cluster. Let’s explore this cluster in more detail with a custom metric, vibe, which is simply the sum of a song’s valence and energy.
Code
# assign to clusters
ding_dong_vibes <-
  ding_dong_clusters %>%
  augment(ding_dong) %>%
  select(track_name, valence, energy, .pred_cluster) %>%
  mutate(vibe = valence + energy)

# what are songs with the biggest vibe?
ding_dong_vibes %>%
  arrange(desc(vibe)) %>%
  slice_head(n = 10) %>%
  knitr::kable()

| track_name | valence | energy | .pred_cluster | vibe |
|---|---|---|---|---|
| Hey Ya! | 0.965 | 0.974 | Cluster_1 | 1.939 |
| Rasputin | 0.966 | 0.893 | Cluster_1 | 1.859 |
| September | 0.979 | 0.832 | Cluster_1 | 1.811 |
| She Bangs - English Version | 0.858 | 0.950 | Cluster_1 | 1.808 |
| Take on Me | 0.876 | 0.902 | Cluster_1 | 1.778 |
| The Legend of Chavo Guerrero | 0.913 | 0.858 | Cluster_1 | 1.771 |
| Can’t Hold Us (feat. Ray Dalton) | 0.847 | 0.922 | Cluster_1 | 1.769 |
| Toxic | 0.924 | 0.838 | Cluster_1 | 1.762 |
| Timber (feat. Ke$ha) | 0.788 | 0.963 | Cluster_1 | 1.751 |
| Shake It Off | 0.942 | 0.800 | Cluster_1 | 1.742 |

As expected, when arranging by vibe, the top songs are all part of the first cluster. And they are, indeed, a vibe.
Compare that with the second cluster, whose songs are generally lower energy (I’d personally disagree with Spotify ranking Mr. Blue Sky and Single Ladies as “low energy,” but most others make sense).
Now that I have the songs in the current playlist sorted by cluster, let’s pull in some new songs and assign them to the appropriate cluster!
Adding new songs
To go searching for new songs, we’ll start by casting a wide net, then narrow the search with some of the get_*() functions from spotifyr. I’ll start by using get_categories() to explore the categories available in Spotify.
I don’t really want to play country music or R&B during the wedding, so I’ll filter to a few categories before using get_category_playlists() to pull in the featured playlists available in each category.
| id | name | description |
|---|---|---|
|  |  | YONAKA along with the biggest Rock songs you need to hear today! |
| 37i9dQZF1DX4dyzvuaRJ0n | mint | The world’s biggest dance hits. Cover: Young Marco |
| 37i9dQZF1DX1lVhptIYRda | Hot Country | Today’s top country hits of the week, worldwide! Cover: Elle King |
| 37i9dQZF1DX10zKzsJ2jva | Viva Latino | Today’s top Latin hits, elevando nuestra música. Cover: KAROL G & Romeo Santos |
| 37i9dQZF1DX4SBhb3fqCJd | Are & Be | The pulse of R&B music today. Cover: Ella Mai |
| 37i9dQZEVXbLRQDuF5jeBp | Top 50 - USA | Your daily update of the most played tracks right now - USA. |
| 37i9dQZEVXbMDoHDwVN2tF | Top 50 - Global | Your daily update of the most played tracks right now - Global. |
| 37i9dQZEVXbLiRSasKsNU9 | Viral 50 - Global | Your daily update of the most viral tracks right now - Global. |

There are a lot of playlists in playlists, so I’ve gone through and selected a few that I’m interested in exploring further.
Code
selected_playlists <-
  c("Today's Top Hits",
    "mint",
    "Top 50 - US",
    "Top 50 - Global",
    "Viral 50 - US",
    "Viral 50 - Global",
    "New Music Friday",
    "Most Necessary",
    "Internet People",
    "Gold School",
    "Hot Hits USA",
    "Pop Rising",
    "teen beats",
    "big on the internet",
    "Party Hits",
    "Mega Hit Mix",
    "Pumped Pop",
    "Hit Rewind",
    "The Ultimate Hit Mix",
    "00s Rock Anthems",
    "Summer Hits",
    "Barack Obama's Summer 2022 Playlist",
    "Summer Hits of the 10s",
    "Family Road Trip")

With this shorter list of playlists, I can pull in all the songs that appear on each with get_playlist_tracks(). Some songs may appear on multiple playlists, so we’ll only look at unique songs by track_id. I’ve already pulled in features for songs currently on the playlist, so we can filter those out as well. Finally, get_track_audio_features() limits queries to a maximum of 100 songs, so we’ll select the top 100 most popular songs within the sample.
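The filtering steps described above can be sketched with plain dplyr. This is only an illustration on invented data: the tibbles toy_current and toy_candidates, the track ids in them, and the max_tracks variable are all hypothetical stand-ins for the real playlist and the tracks pulled from Spotify.

```r
library(dplyr)

# hypothetical songs already on the wedding playlist
toy_current <- tibble(track_id = c("a", "b"))

# hypothetical candidates pulled from several playlists
# ("c" appears twice because it sits on two playlists)
toy_candidates <- tibble(
  track_id         = c("a", "c", "c", "d", "e"),
  track_name       = c("song a", "song c", "song c", "song d", "song e"),
  track_popularity = c(90, 80, 80, 60, 70)
)

# the audio features endpoint takes at most 100 tracks per query
max_tracks <- 100

new_songs <-
  toy_candidates %>%
  # keep one row per song
  distinct(track_id, .keep_all = TRUE) %>%
  # drop songs that are already on the playlist
  anti_join(toy_current, by = "track_id") %>%
  # keep the most popular songs, up to the query limit
  slice_max(track_popularity, n = max_tracks, with_ties = FALSE)

new_songs
```

Setting with_ties = FALSE matters here: by default slice_max() keeps every row tied for the last spot, which could push the result past the 100-song limit.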
Nice! It looks like the new songs cover a far broader range than the original playlist, but we can look at just the songs in the first cluster with the biggest vibe.
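As a rough sketch of that last step, a fitted tidyclust model can label songs it has never seen via augment(), after which it's just a filter and a sort. Everything here is illustrative: toy_playlist and toy_new_songs are made-up data, and picking the bop cluster from the centroids (rather than hard-coding Cluster_1) is my own assumption, since which label the bop cluster gets can vary from fit to fit.

```r
library(dplyr)
library(parsnip)    # set_engine()
library(tidyclust)  # k_means(), extract_centroids(), augment()

set.seed(2022)

# hypothetical stand-in for the current playlist
toy_playlist <- tibble(
  valence = c(runif(10, 0.7, 1.0), runif(10, 0.6, 0.9), runif(10, 0.1, 0.4)),
  energy  = c(runif(10, 0.7, 1.0), runif(10, 0.2, 0.5), runif(10, 0.6, 0.9))
)

toy_clusters <-
  k_means(num_clusters = 3) %>%
  set_engine("stats") %>%
  fit(~ valence + energy, data = toy_playlist)

# treat the cluster whose centroid has the highest valence + energy as the bops
bop_cluster <-
  toy_clusters %>%
  extract_centroids() %>%
  slice_max(valence + energy, n = 1, with_ties = FALSE) %>%
  pull(.cluster) %>%
  as.character()

# hypothetical new songs to score
toy_new_songs <- tibble(
  track_name = c("new bop", "new slow jam", "new moody banger"),
  valence    = c(0.90, 0.80, 0.20),
  energy     = c(0.95, 0.30, 0.80)
)

# label the new songs, then keep only the bop cluster, biggest vibe first
new_bops <-
  toy_clusters %>%
  augment(toy_new_songs) %>%
  mutate(vibe = valence + energy) %>%
  filter(.pred_cluster == bop_cluster) %>%
  arrange(desc(vibe))

new_bops
```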
Now for the true vibe check — do these songs belong on the playlist?
Oh hell yeah!
This analysis was originally done on Aug. 20th, 2022. Spotify’s featured playlists and tracks change on a regular basis and also depend on the time and on unique user data. Since this post was re-rendered when migrating from blogdown to Quarto in February 2023, the songs mentioned in the text likely differ from the songs pulled in from Spotify’s API.
Citation
BibTeX citation:
@online{rieke2022,
author = {Rieke, Mark},
title = {Finding New Wedding Bops with \{Tidyclust\} and \{Spotifyr\}},
date = {2022-08-20},
url = {https://www.thedatadiary.net/posts/2022-08-20-finding-new-wedding-bops-with-tidyclust-and-spotifyr/},
langid = {en}
}