Personal Spotify Listening Habits

Updated Nov 16, 2022

Project Duration: Nov 2022 - Present

Introduction

After using public government data in my first personal project, I wanted to see how data generated by my own habits could be turned into a project. Thus, I decided to analyze my Spotify listening history to compare my current music taste to my past taste, and use those findings to consider what my future music taste might be.

I first requested my personal streaming history for the past year (November 2021 - November 2022) from Spotify, but because it would take at least a few days to arrive, I needed a different method in the meantime. I turned to the site https://www.statsforspotify.com/, which, once I logged in with my Spotify account, showed me my top 50 tracks, artists, and genres for the past 4 weeks, the past 6 months, and all time. In the top songs list, one of three colored symbols appeared next to each song name, indicating how its rank had changed since the last time I used the website. A green up arrow indicates the song was ranked higher than before, whereas a red down arrow indicates it was ranked lower. A blue circle means the song's ranking hadn't changed since my last visit.

Snippet of Top Songs in Past Month from statsforspotify.com

I manually input the top 50 songs and their corresponding artist, language, and genre into a Google spreadsheet for analysis.

For context and comparison, I created a Gantt chart in Google Sheets to map out the different music phases I could remember from the past few years.

Gantt Chart of Music Phases

I have been listening to Chinese music for the past 2+ years and to English songs (some sad) for a little over a month. In the past, I listened to show tunes and original soundtracks for almost 3 years, K-pop for about 3 years, and lo-fi music for almost 3 years. I also listened to K-pop and lo-fi during my show tunes phase, and to Chinese music during my main K-pop phase.

The Data

When I received my streaming history, it was given to me as a JSON file, with the attributes of date and time listened, song name, artist, and the duration of the song. To work with this data, I found a snippet of Python code that converts JSON data into CSV format, which I edited to work with my data and paths:

# import statements

import csv

import json

# streaming history (raw strings keep the backslashes in the paths literal)

json_data = r'projects\spotify_listening\data\StreamingHistory0.json'

csv_conv = r'projects\spotify_listening\data\one_year.csv'

# load the JSON records; the file is closed automatically

with open(json_data, encoding='utf8') as infile:

    data = json.load(infile)

# write a header row, then one CSV row per record

with open(csv_conv, 'w', encoding='utf8', newline='') as outfile:

    new_csv = csv.writer(outfile)

    new_csv.writerow(data[0].keys())

    for r in data:

        new_csv.writerow(r.values())

I uploaded the resulting CSV into Google Sheets and converted the date-and-time attribute, named endTime, into a listening order, which I named Order_Listened, so that instead of a date and time, the rows were numbered starting from 1. This CSV was then imported into my personal BigQuery project, named myportfolio-110818, in the dataset I named spotify_trends. I named the table official_1_year, as it held my official streaming statistics from the past year.
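
The endTime conversion was done by hand in Google Sheets. As a rough illustration only, the same idea could be sketched in Python. The function name add_order_listened, the sample rows, the Title and Artist keys, and the timestamp format are all assumptions made for this example and are not part of the original workflow.

```python
from datetime import datetime


def add_order_listened(rows, time_key="endTime", time_format="%Y-%m-%d %H:%M"):
    """Sort rows by timestamp and replace the timestamp with a 1-based order.

    The timestamp format is an assumption; adjust it to match the real export.
    """
    ordered = sorted(
        rows, key=lambda row: datetime.strptime(row[time_key], time_format)
    )
    numbered = []
    for position, row in enumerate(ordered, start=1):
        # Keep every column except the timestamp, and put the order first.
        rest = {key: value for key, value in row.items() if key != time_key}
        numbered.append({"Order_Listened": position, **rest})
    return numbered


# Made-up sample rows, deliberately out of order.
sample = [
    {"endTime": "2022-01-02 09:15", "Title": "Song B", "Artist": "Artist Y"},
    {"endTime": "2021-11-20 18:30", "Title": "Song A", "Artist": "Artist X"},
    {"endTime": "2022-03-05 21:00", "Title": "Song A", "Artist": "Artist X"},
]

numbered_rows = add_order_listened(sample)
for row in numbered_rows:
    print(row)
```

Sorting before numbering means the order reflects when each track was played, even if the rows arrive shuffled.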

After running a few queries, I realized that the data was somehow out of order: the rows were not returned in order of my new attribute Order_Listened. Fortunately, all it took was a simple query to list the songs in ascending order by Order_Listened.

Queries and Graphs

For my first query, I wanted to count the total times I listened to each song, ordered from most to least, and limited to the top 50, similar to my statsforspotify statistics. This was possible with my streaming data because it kept every play of a song, including repeats at different dates and times. However, I had also listened to podcasts in the past year, and there were times I had replayed a podcast episode to find where I had left off. To keep these replays from distorting the song order and counts, I had to exclude the podcasts. Thus, my first query was as follows:

SELECT

Title,

Artist,

COUNT(Title) AS Times_Listened

FROM `myportfolio-110818.spotify_trends.official_1_year`

WHERE Artist != 'Rotten Mango'

GROUP BY Title, Artist

ORDER BY Times_Listened DESC

LIMIT 50;
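
For readers who want to check the logic of this query outside BigQuery, here is a minimal Python sketch of the same grouping and counting. The function name top_songs and the sample rows are made up for illustration. Only the column names Title and Artist and the excluded podcast name come from the query itself.

```python
from collections import Counter


def top_songs(rows, excluded_artists=("Rotten Mango",), limit=50):
    """Count plays per (Title, Artist), skipping excluded artists."""
    counts = Counter(
        (row["Title"], row["Artist"])
        for row in rows
        if row["Artist"] not in excluded_artists
    )
    # most_common sorts by count, highest first, like ORDER BY ... DESC.
    return [
        {"Title": title, "Artist": artist, "Times_Listened": times}
        for (title, artist), times in counts.most_common(limit)
    ]


# Made-up plays: one song played twice, one played once, a podcast replayed.
sample = [
    {"Title": "Song A", "Artist": "Artist X"},
    {"Title": "Song B", "Artist": "Artist Y"},
    {"Title": "Song A", "Artist": "Artist X"},
    {"Title": "Episode 1", "Artist": "Rotten Mango"},
    {"Title": "Episode 1", "Artist": "Rotten Mango"},
    {"Title": "Episode 1", "Artist": "Rotten Mango"},
]

top = top_songs(sample, limit=2)
for entry in top:
    print(entry)
```

Without the exclusion, the replayed podcast episode would have ranked first in this sample, which is the distortion the WHERE clause is meant to avoid.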

The resulting songs were exported as a CSV, to which I added two new columns: Genre and Language. Limiting the songs to 50 made these values much easier to add, whereas before I would have had to fill out over 500 rows. My personal knowledge of each song helped me fill out the Language column, whereas I relied on looking up the songs to accurately identify their specific genre, if any. Once this was done, I uploaded the CSV to Tableau to visualize each individual song in one interactive graph.

I also uploaded the CSV into BigQuery as a table named official_1_year_genre_language, and grouped the number of songs by language and genre:

SELECT

Language,

COUNT(Times_Listened) AS Songs_Per_Language

FROM `myportfolio-110818.spotify_trends.official_1_year_genre_language`

GROUP BY Language

ORDER BY Songs_Per_Language DESC;

SELECT

Genre,

COUNT(Times_Listened) AS Songs_Per_Genre

FROM `myportfolio-110818.spotify_trends.official_1_year_genre_language`

GROUP BY Genre

ORDER BY Songs_Per_Genre DESC;

My final set of queries summed the total number of listens for each language and genre:

SELECT

Language,

SUM(Times_Listened) AS Total_Language_Listened

FROM `myportfolio-110818.spotify_trends.official_1_year_genre_language`

GROUP BY Language

ORDER BY Total_Language_Listened DESC;

SELECT

Genre,

SUM(Times_Listened) AS Total_Genre_Listened

FROM `myportfolio-110818.spotify_trends.official_1_year_genre_language`

GROUP BY Genre

ORDER BY Total_Genre_Listened DESC;
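
The two sets of queries answer different questions: counting rows gives the number of songs in a group, while summing Times_Listened gives the number of plays. The following Python sketch shows the difference on a few made-up rows. The function name summarize and all sample values are assumptions for illustration and are not my real data.

```python
from collections import defaultdict


def summarize(rows, key):
    """Return (songs per group, listens per group) for the given column."""
    songs = defaultdict(int)
    listens = defaultdict(int)
    for row in rows:
        songs[row[key]] += 1  # like COUNT(Times_Listened)
        listens[row[key]] += row["Times_Listened"]  # like SUM(Times_Listened)
    return dict(songs), dict(listens)


# Made-up rows in the shape of the official_1_year_genre_language table.
top_rows = [
    {"Title": "Song A", "Language": "English", "Genre": "Pop", "Times_Listened": 40},
    {"Title": "Song B", "Language": "Chinese", "Genre": "Pop", "Times_Listened": 25},
    {"Title": "Song C", "Language": "Chinese", "Genre": "Ballad", "Times_Listened": 20},
]

songs_per_language, listens_per_language = summarize(top_rows, "Language")
print(songs_per_language)
print(listens_per_language)
```

In this sample the single English song has more plays than either Chinese song, yet Chinese still leads in both songs and total listens, which is why both views are worth charting.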

Once again, the results of these queries were exported as CSV files, and I used that data to create charts in Google Sheets. Because I focused only on my top 50 most-listened-to songs, I was able to use pie charts to show each group's percentage of the total. This was easy to do with my language-focused data, as the query had cut the number of rows down from 50 to just 3.

Songs Per Language

Listens Per Language

However, when I attempted to do the same with the genre-focused data, there were so many different genres that the labels for the smaller pie slices were not visible. Although opening the Google spreadsheet and hovering over each pie slice would have shown this data, that was not an option, as these graphs were to be displayed as static images. As a solution, I removed the labels and instead used a colored legend that included each genre's percentage out of 50. To find the percentages, I created another column named Percentage, divided the number of songs per genre by 50, and then added the resulting values to the entries in the Genre column.

Songs Per Genre

Listens Per Genre
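
The percentage step for the genre legend was done in Google Sheets. As a small sketch of the same arithmetic, the Python below divides each genre's song count by the 50 songs in the list and appends the result to the genre name. The function name legend_labels and the sample counts are hypothetical.

```python
def legend_labels(songs_per_genre, total_songs=50):
    """Build legend text such as 'Pop (40.0%)' from per-genre song counts."""
    labels = {}
    for genre, count in songs_per_genre.items():
        # Multiply before dividing to avoid small floating-point errors.
        percentage = count * 100 / total_songs
        labels[genre] = f"{genre} ({percentage:.1f}%)"
    return labels


# Made-up counts that add up to 50 songs.
sample_counts = {"Pop": 20, "Ballad": 15, "N/A": 15}

labels = legend_labels(sample_counts)
for label in labels.values():
    print(label)
```

With exactly 50 songs, each song is worth 2% of the chart, so the percentages always come out as whole numbers.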

Analysis

To clear up any confusion regarding the N/A label: lo-fi or instrumental-type music was labeled with N/A as its language, whereas songs that did not have a specific genre were labeled with N/A as their genre.

Based on the charts above, it is clear that Chinese music comes out on top, whether in total songs, total listens, or variety of genres. In the CSV containing my listens per song, all of my top English songs are in my top 10. This matches my current music phase, as I have recently been listening mostly to Chinese songs and some sad English songs on repeat. Additionally, there were no K-pop, lo-fi, or show tunes in the top 50, which also fits, as those phases did not occur in the past year.

However, my listens per song also show that my current top three songs are all in English, which is unexpected given that my Chinese music phase has lasted far longer than my phase of listening to sad English songs.

Other observations include the fact that the song Fly Me Away was released only in June and is already my most-listened-to song of the past year, whereas Count The Days was released barely a week before this analysis and already has the second-most listens. Altogether, though, I have more total listens to songs in Chinese than to songs in English or in any other language or genre from the past year.

Conclusion

Based on this analysis, it is more than likely that I will continue listening to Chinese music, as I have listened to a variety of its songs and genres in the past year. Also, because Fly Me Away has had the most listens since its release, it is likely to remain a favorite of mine. While I may listen to songs in other languages or genres, based on the data, I can infer that I won't be listening to those kinds of songs as often as Chinese music.

As this was a small project, I plan to expand by using my full streaming history and working on an algorithm to try to predict what songs I might listen to in the future and how that compares to my music phase.

Click here to view the data and scripts.