Finding new wedding bops with {tidyclust} and {spotifyr}

rstats
tidymodels
Author

Mark Rieke

Published

August 20, 2022

Code
sysfonts::font_add_google("Roboto Slab")
showtext::showtext_auto()

ggplot2::theme_set(
  ggplot2::theme_minimal(base_family = "Roboto Slab", 
                         base_size = 14) +
    ggplot2::theme(plot.title.position = "plot",
                   plot.background = ggplot2::element_rect(fill = "white", color = "white"),
                   plot.title = ggtext::element_markdown(),
                   plot.subtitle = ggtext::element_markdown())
)

Last November, I (finally) popped the big question and proposed! Since then, my fiance and I have been diligently planning our wedding. While we have most of the big-ticket items checked off (venue, catering, photography, etc.), one area we still have more work to do is on the wedding playlist. We’ve started putting together a playlist on spotify, but it feels like it’s come to a bit of a stand-still. Currently, there’s a mix of zesty bops and tame songs on the playlist (we need to accommodate both our college friends and our grandparents!), but spotify’s track recommender only wants to suggest tamer songs right now. Our goal is to have a full dance floor the entire night — to achieve this, we can use spotifyr and the new tidyclust package to pull in the current playlist, cluster the songs based on their features, and find new songs based on the bop cluster.

Code
library(tidymodels)
library(tidyclust)
library(spotifyr)

If you’d like to follow along, I’d recommend installing the development versions of parsnip and workflows, as some of the functionality that interacts with tidyclust isn’t yet on CRAN.

Pulling in the playlist

spotifyr is an R interface to spotify’s web API and gives access to a host of track features (you can follow this tutorial to get it setup). I’ll use the functions get_user_playlists() and get_playlist_tracks() to pull in songs that are currently on our wedding playlist (appropriately named “Ding dong”).

Code
# get the songs that are currently on the wedding playlist
ding_dong <- 
  get_user_playlists("12130039175") %>%
  filter(name == "Ding dong") %>%
  pull(id) %>%
  get_playlist_tracks() %>% 
  as_tibble() %>%
  select(track.id, track.name, track.popularity) %>%
  rename_with(~stringr::str_replace(.x, "\\.", "_"))

ding_dong %>%
  slice_head(n = 10) %>%
  knitr::kable()
track_id track_name track_popularity
5jkFvD4UJrmdoezzT1FRoP Rasputin 61
1D066zixBwqFYqBhKgdPzp Fergalicious 66
12jjuxN1gxlm29cqL5M6MW I Got You 62
2grjqo0Frpf2okIBiifQKs September 78
2RlgNHKcydI9sayD2Df2xp Mr. Blue Sky 76
6x4tKaOzfNJpEJHySoiJcs Mambo No. 5 (a Little Bit of…) 72
3n3Ppam7vgaVa1iaRUc9Lp Mr. Brightside 62
7Cp69rNBwU0gaFT8zxExlE Ymca 45
3Gf5nttwcX9aaSQXRWidEZ Ride Wit Me 72
3wMUvT6eIw2L5cZFG1yH9j Country Grammar (Hot Shit) 65

Spotify estimates quite a few features for each song in their catalog: speechiness (the presence of words on a track), acousticness (whether or not a song includes acoustic instruments), liveness (estimates whether or not the track is live or studio-recorded), etc. We can use get_track_audio_features() to get the features for each song based on its track_id.

Code
# pull in track features of songs on the playlist
track_features <- 
  ding_dong %>%
  pull(track_id) %>%
  get_track_audio_features()

# join together
ding_dong <- 
  ding_dong %>%
  left_join(track_features, by = c("track_id" = "id"))

In my case, I’m interested in the energy and valence (positivity) of each song, so I’ll select these variables to use in the cluster analysis.

Code
ding_dong %>%
  select(track_name, valence, energy) %>%
  slice_head(n = 10) %>%
  knitr::kable()
track_name valence energy
Rasputin 0.966 0.893
Fergalicious 0.829 0.583
I Got You 0.544 0.399
September 0.979 0.832
Mr. Blue Sky 0.478 0.338
Mambo No. 5 (a Little Bit of…) 0.892 0.807
Mr. Brightside 0.240 0.918
Ymca 0.671 0.951
Ride Wit Me 0.722 0.700
Country Grammar (Hot Shit) 0.565 0.664

Clustering with tidyclust

Currently, the playlist covers a wide spectrum of songs. For new songs on the playlist, I’m really just interested in songs similar to others in the top right corner of the below chart with high energy and valence.

Code
# how are valence/energy related?
obj <- 
  ding_dong %>%
  ggplot(aes(x = valence,
             y = energy,
             tooltip = track_name)) + 
  ggiraph::geom_point_interactive(size = 3.5, alpha = 0.5) +
  scale_x_continuous(labels = scales::label_percent(accuracy = 1)) +
  scale_y_continuous(labels = scales::label_percent(accuracy = 1)) +
  labs(title = "The current wedding playlist",
       subtitle = "Hover over each point to see the song's name!")

ggiraph::girafe(
  ggobj = obj,
  options = list(
    ggiraph::opts_tooltip(opacity = 0.8,
                          css = "background-color:gray;color:white;padding:2px;border-radius:2px;font-family:Roboto Slab;"),
    ggiraph::opts_hover(css = "fill:#1279BF;stroke:#1279BF;cursor:pointer;")
  )
)

Broadly, there are three generic categories that the songs on the current playlist fall into: high energy and valence, low energy, or low valence (songs with low energy and valence will fall into one of the “low” categories). Rather than manually assign categories, we can use tidyclust to cluster the songs into three groups using the kmeans algorithm.

There’s some great documentation on the tidyclust site, but to get started, we’ll categorize the songs on the current playlist by “fitting” a kmeans model (using the stats engine under the hood).

Code
# create a clustering obj
set.seed(918)
ding_dong_clusters <- 
  k_means(num_clusters = 3) %>%
  fit(~ valence + energy,
      data = ding_dong) 
Code
pal <- MetBrewer::MetPalettes$Egypt[[1]]

obj <- 
  ding_dong_clusters %>%
  augment(ding_dong) %>%
  ggplot(aes(x = valence,
             y = energy,
             color = .pred_cluster,
             tooltip = track_name)) +
  ggiraph::geom_point_interactive(size = 3.5, alpha = 0.75) +
  scale_x_continuous(labels = scales::label_percent(accuracy = 1)) +
  scale_y_continuous(labels = scales::label_percent(accuarcy = 1)) +
  theme(legend.position = "none") +
  labs(title = "Clusters in the current playlist",
       subtitle = glue::glue("Clustered into",
                             "**{riekelib::color_text('zesty bops',pal[1])}**,",
                             "**{riekelib::color_text('angsty bangers', pal[3])}**,",
                             "and",
                             "**{riekelib::color_text('mellow jams', pal[2])}**",
                             .sep = " ")) +
  MetBrewer::scale_color_met_d("Egypt")

ggiraph::girafe(
  ggobj = obj,
  options = list(
    ggiraph::opts_tooltip(opacity = 0.8,
                          use_fill = TRUE,
                          css = "color:white;padding:2px;border-radius:2px;font-family:Roboto Slab;"),
    ggiraph::opts_hover(css = "fill:#1279BF;stroke:#1279BF;cursor:pointer;")
  )
)

As expected, the majority of songs in the current playlist fall into the bop cluster. Let’s explore this cluster using in more detail with the custom metric vibe.

Code
# assign to clusters
ding_dong_vibes <- 
  ding_dong_clusters %>%
  augment(ding_dong) %>%
  select(track_name,
         valence, 
         energy, 
         .pred_cluster) %>%
  mutate(vibe = valence + energy)

# what are songs with the biggest vibe?
ding_dong_vibes %>%
  arrange(desc(vibe)) %>%
  slice_head(n = 10) %>%
  knitr::kable()
track_name valence energy .pred_cluster vibe
Hey Ya! 0.965 0.974 Cluster_1 1.939
Rasputin 0.966 0.893 Cluster_1 1.859
September 0.979 0.832 Cluster_1 1.811
She Bangs - English Version 0.858 0.950 Cluster_1 1.808
Take on Me 0.876 0.902 Cluster_1 1.778
The Legend of Chavo Guerrero 0.913 0.858 Cluster_1 1.771
Can’t Hold Us (feat. Ray Dalton) 0.847 0.922 Cluster_1 1.769
Toxic 0.924 0.838 Cluster_1 1.762
Timber (feat. Ke$ha) 0.788 0.963 Cluster_1 1.751
Shake It Off 0.942 0.800 Cluster_1 1.742

As expected, when arranging by vibe, the top songs are all a part of the first cluster. And they are, indeed, a vibe:

Compare that with the second cluster, which are generally lower energy (I’d personally disagree with spotify ranking Mr. Blue Sky and Single Ladies as “low energy,” but most others make sense).

Code
ding_dong_vibes %>%
  filter(.pred_cluster == "Cluster_2") %>%
  arrange(vibe) %>%
  slice_head(n = 10) %>%
  knitr::kable()
track_name valence energy .pred_cluster vibe
Mr. Blue Sky 0.478 0.338 Cluster_2 0.816
Single Ladies (Put a Ring on It) 0.272 0.584 Cluster_2 0.856
Low (feat. T-Pain) 0.304 0.609 Cluster_2 0.913
I Got You 0.544 0.399 Cluster_2 0.943
Wake Up in the Sky 0.367 0.578 Cluster_2 0.945
Summer, Highland Falls - Live at the Bayou, Washington, D.C. - July 1980 0.452 0.544 Cluster_2 0.996
Gimme! Gimme! Gimme! (A Man After Midnight) 0.536 0.491 Cluster_2 1.027
Wagon Wheel 0.634 0.403 Cluster_2 1.037
Hung Up 0.405 0.647 Cluster_2 1.052
Take Me Out 0.527 0.663 Cluster_2 1.190

Finally, the third cluster mostly contains songs with low valence but relatively high energy.

Code
ding_dong_vibes %>%
  filter(.pred_cluster == "Cluster_3") %>%
  arrange(vibe) %>%
  slice_head(n = 10) %>%
  knitr::kable()
track_name valence energy .pred_cluster vibe
Clarity 0.176 0.781 Cluster_3 0.957
Love Story 0.296 0.741 Cluster_3 1.037
Titanium (feat. Sia) 0.301 0.787 Cluster_3 1.088
Mr. Brightside 0.240 0.918 Cluster_3 1.158
All Night (feat. Knox Fortune) 0.392 0.777 Cluster_3 1.169
Turn to Stone 0.458 0.720 Cluster_3 1.178
Body 0.469 0.732 Cluster_3 1.201
Forever 0.445 0.819 Cluster_3 1.264
Shout, Pts. 1 & 2 0.416 0.866 Cluster_3 1.282
The Spins 0.550 0.766 Cluster_3 1.316

Now that I have the songs in the current playlist sorted by cluster, let’s pull in some new songs and assign them to the appropriate cluster!

Adding new songs

To go searching for new songs, we’ll start by casting a wide net then narrow the search with some of the get_*() functions from spotifyr. I’ll start by using get_categories() to explore the categories available in spotify.

Code
get_categories() %>%
  as_tibble() %>%
  select(id, name) %>%
  slice_head(n = 10) %>%
  knitr::kable()
id name
toplists Top Lists
0JQ5DAqbMKFQ00XGBls6ym Hip-Hop
0JQ5DAqbMKFEC4WFtoNRpw Pop
0JQ5DAqbMKFKLfwjuJMoNC Country
0JQ5DAqbMKFxXaXKP7zcDp Latin
0JQ5DAqbMKFDXXwE9BDJAr Rock
0JQ5DAqbMKFLVaM30PMBm4 Summer
0JQ5DAqbMKFAXlCG6QvYQ4 Workout
0JQ5DAqbMKFEZPnFQSFB1T R&B
0JQ5DAqbMKFHOzuVTgTizF Dance/Electronic

I don’t really want to play country music or R&B during the wedding, so I’ll filter to a few categories before using get_category_playlists() to pull in the featured playlists available in each category.

Code
# pull in playlist ids
playlists <- 
  get_categories() %>%
  as_tibble() %>%
  filter(id %in% c("toplists", "hiphop", "pop", "rock", "summer")) %>%
  pull(id) %>%
  map_dfr(get_category_playlists) %>%
  as_tibble() %>%
  select(id, name, description) %>%
  distinct(id, .keep_all = TRUE)

playlists %>%
  slice_head(n = 10) %>%
  knitr::kable()
id name description
37i9dQZF1DXcBWIGoYBM5M Today’s Top Hits The Kid LAROI is on top of the Hottest 50!
37i9dQZF1DX0XUsuxWHRQd RapCaviar New music from GloRilla and NLE Choppa.
37i9dQZF1DXcF6B6QPhFDv Rock This YONAKA along with the biggest Rock songs you need to hear today!
37i9dQZF1DX4dyzvuaRJ0n mint The world’s biggest dance hits. Cover: Young Marco
37i9dQZF1DX1lVhptIYRda Hot Country Today’s top country hits of the week, worldwide! Cover: Elle King
37i9dQZF1DX10zKzsJ2jva Viva Latino Today’s top Latin hits, elevando nuestra música. Cover: KAROL G & Romeo Santos
37i9dQZF1DX4SBhb3fqCJd Are & Be The pulse of R&B music today. Cover: Ella Mai
37i9dQZEVXbLRQDuF5jeBp Top 50 - USA Your daily update of the most played tracks right now - USA.
37i9dQZEVXbMDoHDwVN2tF Top 50 - Global Your daily update of the most played tracks right now - Global.
37i9dQZEVXbLiRSasKsNU9 Viral 50 - Global Your daily update of the most viral tracks right now - Global.

There’s a lot of playlists in playlists, so I’ve gone through and selected a few that I’m interested in exploring further.

Code
selected_playlists <-
  c("Today's Top Hits",
    "mint",
    "Top 50 - US",
    "Top 50 - Global",
    "Viral 50 - US",
    "Viral 50 - Global",
    "New Music Friday",
    "Most Necessary",
    "Internet People",
    "Gold School",
    "Hot Hits USA",
    "Pop Rising",
    "teen beats",
    "big on the internet",
    "Party Hits",
    "Mega Hit Mix",
    "Pumped Pop",
    "Hit Rewind",
    "The Ultimate Hit Mix",
    "00s Rock Anthems",
    "Summer Hits",
    "Barack Obama's Summer 2022 Playlist",
    "Summer Hits of the 10s",
    "Family Road Trip")

With this shorter list of playlists, I can pull in the all the songs that appear on each with get_playlist_tracks(). Some songs may appear on multiple playlists, so we’ll only look at unique songs by track_id. I’ve already pulled in features for songs currently on the playlist, so we can filter those out as well. Finally, get_track_audio_features() limits queries to a maximum of 100 songs, so we’ll select the top 100 most popular songs within the sample.

Code
new_songs <- 
  playlists %>%
  filter(name %in% selected_playlists) %>%
  pull(id) %>%
  map_dfr(get_playlist_tracks) %>%
  as_tibble()

new_songs <- 
  new_songs %>%
  select(track.id,
         track.name,
         track.popularity) %>%
  rename_with(~stringr::str_replace(.x, "\\.", "_")) %>%
  distinct(track_id, .keep_all = TRUE) %>%
  arrange(desc(track_popularity)) %>%
  filter(!track_id %in% ding_dong$track_id) %>%
  slice_head(n = 100)

new_songs %>%
  slice_head(n = 10) %>%
  knitr::kable()
track_id track_name track_popularity
0yLdNVWF3Srea0uzk55zFn Flowers 100
4nrPB8O7Y7wsOCJdgXkthe Shakira: Bzrp Music Sessions, Vol. 53 96
2dHHgzDwk4BJdRwy9uXhTO Creepin’ (with The Weeknd & 21 Savage) 94
3nqQXoyQOWXiESFLlDF1hG Unholy (feat. Kim Petras) 93
4uUG5RXrOk84mYEfFvj3cK I’m Good (Blue) 93
5ww2BF9slyYgNOk37BlC4u La Bachata 93
0V3wPSX9ygBnCm8psDIegu Anti-Hero 92
1Qrg8KqiBpW07V7PNxwwwL Kill Bill 91
0WtM2NBVQNNJLh6scP13H8 Calm Down (with Selena Gomez) 91
78Sw5GDo6AlGwTwanjXbGh Here With Me 91

Now let’s assign these 100 news songs to the clusters we found earlier based on their valence and energy!

Code
new_song_features <- 
  new_songs %>%
  pull(track_id) %>%
  get_track_audio_features()

new_songs <- 
  new_songs %>%
  left_join(new_song_features, by = c("track_id" = "id"))

new_songs_clustered <- 
  ding_dong_clusters %>%
  augment(new_songs) %>%
  select(track_name,
         valence,
         energy,
         .pred_cluster) %>%
  mutate(vibe = valence + energy)

obj <- 
  new_songs_clustered %>%
  ggplot(aes(x = valence,
             y = energy,
             color = .pred_cluster,
             tooltip = track_name)) +
  ggiraph::geom_point_interactive(size = 3.5, alpha = 0.75) +
  scale_x_continuous(labels = scales::label_percent(accuracy = 1)) +
  scale_y_continuous(labels = scales::label_percent(accuarcy = 1)) + 
  theme(legend.position = "none") +
  labs(title = "New songs, same old clusters",
       subtitle = glue::glue("Clustered into",
                             "**{riekelib::color_text('zesty bops',pal[1])}**,",
                             "**{riekelib::color_text('angsty bangers', pal[3])}**,",
                             "and",
                             "**{riekelib::color_text('mellow jams', pal[2])}**",
                             .sep = " ")) +
  MetBrewer::scale_color_met_d("Egypt")

ggiraph::girafe(
  ggobj = obj,
  options = list(
    ggiraph::opts_tooltip(opacity = 0.8,
                          use_fill = TRUE,
                          css = "color:white;padding:2px;border-radius:2px;font-family:Roboto Slab;"),
    ggiraph::opts_hover(css = "fill:#1279BF;stroke:#1279BF;cursor:pointer;")
  )
)

Nice! It looks like the new songs are far more broad than the original playlist, but we can look at just the songs in the first cluster with the biggest vibe.

Code
new_songs_clustered %>%
  filter(.pred_cluster == "Cluster_1") %>%
  arrange(desc(vibe)) %>%
  slice_head(n = 10) %>%
  knitr::kable()
track_name valence energy .pred_cluster vibe
In The Yuma (feat. Aatig) 0.940 0.891 Cluster_1 1.831
MERCHO 0.962 0.790 Cluster_1 1.752
Zona De Perigo 0.970 0.767 Cluster_1 1.737
I’m Not Here To Make Friends 0.838 0.895 Cluster_1 1.733
PRC 0.893 0.826 Cluster_1 1.719
Boy’s a liar Pt. 2 0.857 0.809 Cluster_1 1.666
Late Night Talking 0.901 0.728 Cluster_1 1.629
I Ain’t Worried 0.825 0.797 Cluster_1 1.622
All By Myself 0.773 0.848 Cluster_1 1.621
Calm Down (with Selena Gomez) 0.802 0.806 Cluster_1 1.608

Now for the true vibe check — do these songs belong on the playlist?

Oh hell yeah!

This analysis was originally done on Aug. 20th, 2022 — Spotify’s featured playlists and tracks change on on a regular basis and also are time dependent on unique user data. When remapping from blogdown to Quarto in February 2023, it’s likely that the songs mentioned in the text differ from the songs pulled in from Spotify’s API.

Citation

BibTeX citation:
@online{rieke2022,
  author = {Rieke, Mark},
  title = {Finding New Wedding Bops with \{Tidyclust\} and \{Spotifyr\}},
  date = {2022-08-20},
  url = {https://www.thedatadiary.net/posts/2022-08-20-finding-new-wedding-bops-with-tidyclust-and-spotifyr/},
  langid = {en}
}
For attribution, please cite this work as:
Rieke, Mark. 2022. “Finding New Wedding Bops with {Tidyclust} and {Spotifyr}.” August 20, 2022. https://www.thedatadiary.net/posts/2022-08-20-finding-new-wedding-bops-with-tidyclust-and-spotifyr/.

Stay in touch

Support my work with a coffee

Share