Personal Spotify Listening Habits
Updated Nov 16, 2022
Project Duration: Nov 2022 - Present
Introduction
After using public government data in my first personal project, I wanted to see how data generated by my own habits could be turned into a project. So I decided to analyze my Spotify listening history to compare my current music taste to my past taste, and to use those findings to consider what my future music taste might be.
I first requested my personal streaming history for the past year (November 2021 - November 2022) from Spotify. Because it would take at least a few days to arrive, I had to use a different method in the meantime. I turned to https://www.statsforspotify.com/. After I logged into my Spotify account, the site showed my top 50 tracks, artists, and genres over the past 4 weeks, the past 6 months, and all time. In the top songs list, a colored symbol next to each song showed how its ranking had changed since the last time I used the site:
- A green up arrow meant the song was ranked higher than before.
- A red down arrow meant it was ranked lower.
- A blue circle meant its ranking hadn't changed.
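The marker logic described above can be expressed as a small comparison of two rank numbers. This is only a hypothetical sketch of how the site's symbols could be derived, not the site's actual code. The function name `rank_marker` is made up. It assumes the song appears in both snapshots and that a lower rank number means a higher position.

```python
# Hypothetical sketch: derive a statsforspotify-style marker from two ranks.
# Assumes rank 1 is the top spot, so a smaller number means "ranked higher".
def rank_marker(previous_rank: int, current_rank: int) -> str:
    """Return the marker for a song that appears in both ranking snapshots."""
    if current_rank < previous_rank:
        return "green up arrow"   # moved up the chart
    if current_rank > previous_rank:
        return "red down arrow"   # moved down the chart
    return "blue circle"          # same position as last time


print(rank_marker(5, 2))
print(rank_marker(2, 5))
print(rank_marker(3, 3))
```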

Snippet of Top Songs in Past Month from statsforspotify.com
I manually input the top 50 songs and their corresponding artist, language, and genre into a Google spreadsheet for analysis.
For context and comparison, I created a Gantt chart in Google Sheets to map out the different music phases I could remember from the past few years.

Gantt Chart of Music Phases
Currently, I have been listening to Chinese music for the past 2+ years and to English songs (some of them sad) for a little over a month. In the past, I listened to show tunes and original soundtracks for almost 3 years, K-pop for about 3 years, and lo-fi for almost 3 years. These phases overlapped: I also listened to K-pop and lo-fi during my show tunes phase, and to Chinese music during my main K-pop phase.
The Data
My streaming history arrived as a JSON file. Each record held the date and time a song was played, the song name, the artist, and how long the song was played. To work with this data at all, I found a snippet of Python code that converts JSON data into CSV format, and I edited it to work with my data and paths:
# import statements
import csv
import json

# streaming history (raw strings keep the Windows backslashes literal)
json_data = r'projects\spotify_listening\data\StreamingHistory0.json'
csv_conv = r'projects\spotify_listening\data\one_year.csv'

# read the list of play records from the JSON file
with open(json_data, encoding='utf8') as json_file:
    data = json.load(json_file)

# write one CSV row per play, using the first record's keys as the header
with open(csv_conv, 'w', encoding='utf8', newline='') as csv_file:
    new_csv = csv.writer(csv_file)
    new_csv.writerow(data[0].keys())
    for r in data:
        new_csv.writerow(r.values())
I uploaded the resulting CSV into Google Sheets. There I converted the date-and-time attribute, endTime, into a listening order I named Order_Listened, so that each play was numbered sequentially starting from 1 instead of carrying a date and time. I then imported this CSV into my personal BigQuery project, myportfolio-110818, in a dataset I named spotify_trends. I named the table official_1_year, since it held my official streaming statistics from the past year.
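The endTime-to-Order_Listened conversion was done by hand in Google Sheets. The same idea can be sketched in Python: sort the plays chronologically, then number them from 1. This is a sketch under assumptions. It assumes endTime strings look like "2022-11-15 23:59" (which sort chronologically as plain text) and uses placeholder records with endTime, artistName, and trackName keys. The helper `add_order_listened` is hypothetical.

```python
# Hypothetical helper: replace endTime with a 1-based Order_Listened number.
# Assumes endTime strings look like "YYYY-MM-DD HH:MM", so string order = time order.
def add_order_listened(rows):
    # sorted() is stable: plays logged in the same minute keep their file order
    ordered = sorted(rows, key=lambda row: row["endTime"])
    numbered = []
    for position, row in enumerate(ordered, start=1):
        new_row = {key: value for key, value in row.items() if key != "endTime"}
        new_row["Order_Listened"] = position
        numbered.append(new_row)
    return numbered


# Placeholder records, not real listening data.
sample = [
    {"endTime": "2022-06-02 10:15", "artistName": "Artist B", "trackName": "Song B"},
    {"endTime": "2021-11-20 08:00", "artistName": "Artist A", "trackName": "Song A"},
    {"endTime": "2021-11-20 08:00", "artistName": "Artist C", "trackName": "Song C"},
]
for row in add_order_listened(sample):
    print(row["Order_Listened"], row["trackName"])
```

Because Python's sort is stable, two plays with the same minute stay in the order they appear in the export.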
After running a few queries, I realized that the rows were coming back out of order: the songs were not listed according to my new Order_Listened attribute. This is expected, since a SQL table has no guaranteed row order unless the query specifies one. Fortunately, all it took was a simple query with ORDER BY Order_Listened to list the songs in ascending order.
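The fix can be reproduced outside BigQuery with Python's built-in sqlite3 module and an in-memory table. This is a minimal sketch, not the actual BigQuery query. The table mirrors the official_1_year column names used later in this post, and the rows are placeholders inserted deliberately out of order.

```python
import sqlite3

# In-memory SQLite table standing in for the BigQuery table (placeholder rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE official_1_year (Order_Listened INTEGER, Title TEXT, Artist TEXT)")
conn.executemany(
    "INSERT INTO official_1_year VALUES (?, ?, ?)",
    [(3, "Song C", "Artist C"), (1, "Song A", "Artist A"), (2, "Song B", "Artist B")],
)

# Only an explicit ORDER BY guarantees the listening order.
rows = conn.execute(
    "SELECT Order_Listened, Title, Artist FROM official_1_year ORDER BY Order_Listened ASC"
).fetchall()
for row in rows:
    print(row)
conn.close()
```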
Queries and Graphs
For my first query, I wanted to count how many times I listened to each song, ordered from most to least and limited to the top 50, similar to my statsforspotify statistics. My streaming data made this possible because it records every play separately, so a song replayed at different dates and times appears once per play. However, I had also listened to podcasts in the past year, and I sometimes replayed a podcast episode to find where I had left off. To keep these replays from skewing the song counts and order, I excluded the podcast by filtering out its artist name, Rotten Mango. My first query was as follows:
SELECT
Title,
Artist,
COUNT(Title) AS Times_Listened
FROM `myportfolio-110818.spotify_trends.official_1_year`
WHERE Artist != 'Rotten Mango'
GROUP BY Title, Artist
ORDER BY Times_Listened DESC
LIMIT 50;
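To see why the podcast filter matters, the same query can be run with sqlite3 on a few placeholder plays. This is a sketch on made-up data; the BigQuery backtick table path is dropped because SQLite does not use it. Without the WHERE clause, the replayed podcast episode (3 plays) would top the list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE official_1_year (Order_Listened INTEGER, Title TEXT, Artist TEXT)")
# Placeholder plays: the podcast episode was replayed several times.
plays = [
    (1, "Song A", "Artist A"),
    (2, "Episode 1", "Rotten Mango"),
    (3, "Song B", "Artist B"),
    (4, "Song A", "Artist A"),
    (5, "Episode 1", "Rotten Mango"),
    (6, "Episode 1", "Rotten Mango"),
]
conn.executemany("INSERT INTO official_1_year VALUES (?, ?, ?)", plays)

top = conn.execute(
    """
    SELECT Title, Artist, COUNT(Title) AS Times_Listened
    FROM official_1_year
    WHERE Artist != 'Rotten Mango'  -- drop the podcast replays
    GROUP BY Title, Artist
    ORDER BY Times_Listened DESC
    LIMIT 50
    """
).fetchall()
print(top)  # [('Song A', 'Artist A', 2), ('Song B', 'Artist B', 1)]
conn.close()
```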
I exported the resulting songs as a CSV and added two new columns: Genre and Language. Limiting the list to 50 songs made these values much easier to fill in; otherwise I would have had to fill out over 500 rows. My personal knowledge of each song let me fill out the Language column, whereas I looked up each song to accurately identify its specific genre, if any. Once this was done, I uploaded the CSV to Tableau to visualize every song in one interactive graph.
I also uploaded the CSV into BigQuery as a table named official_1_year_genre_language. I then grouped the songs by language and by genre, counting both the number of songs and the total number of listens in each group:
SELECT
Language,
COUNT(Times_Listened) AS Songs_Per_Language
FROM `myportfolio-110818.spotify_trends.official_1_year_genre_language`
GROUP BY Language
ORDER BY Songs_Per_Language DESC;
SELECT
Genre,
COUNT(Times_Listened) AS Songs_Per_Genre
FROM `myportfolio-110818.spotify_trends.official_1_year_genre_language`
GROUP BY Genre
ORDER BY Songs_Per_Genre DESC;
SELECT
Language,
SUM(Times_Listened) AS Total_Language_Listened
FROM `myportfolio-110818.spotify_trends.official_1_year_genre_language`
GROUP BY Language
ORDER BY Total_Language_Listened DESC;
SELECT
Genre,
SUM(Times_Listened) AS Total_Genre_Listened
FROM `myportfolio-110818.spotify_trends.official_1_year_genre_language`
GROUP BY Genre
ORDER BY Total_Genre_Listened DESC;
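The two pairs of queries differ only in COUNT versus SUM. COUNT(Times_Listened) counts how many top-50 songs fall in each group, while SUM(Times_Listened) adds up their plays. Here is a small Python sketch of the same two aggregations, using placeholder rows rather than the real table.

```python
from collections import defaultdict

# Placeholder rows standing in for official_1_year_genre_language.
top_songs = [
    {"Title": "Song A", "Language": "Chinese", "Times_Listened": 40},
    {"Title": "Song B", "Language": "Chinese", "Times_Listened": 25},
    {"Title": "Song C", "Language": "English", "Times_Listened": 60},
]

songs_per_language = defaultdict(int)       # mirrors COUNT(Times_Listened)
total_language_listened = defaultdict(int)  # mirrors SUM(Times_Listened)
for song in top_songs:
    songs_per_language[song["Language"]] += 1
    total_language_listened[song["Language"]] += song["Times_Listened"]

print(dict(songs_per_language))       # {'Chinese': 2, 'English': 1}
print(dict(total_language_listened))  # {'Chinese': 65, 'English': 60}
```

A single heavily replayed song can make a group large in listens while it stays small in song count. That is why the post charts both measures.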
Once again, I exported the results of these queries as CSV files and used them to create charts in Google Sheets. Because I focused only on my top 50 most-listened songs, I could use pie charts to show each group's percentage of the total. This was easy with the language data, as the queries had cut the number of rows from 50 to just 3.

Songs Per Language

Listens Per Language
However, when I tried the same with the genre data, there were so many genres that the labels for the smaller pie slices were not visible. Hovering over each slice in the Google spreadsheet would have shown these values, but that was not an option because the graphs were to be displayed as static images. As a solution, I removed the slice labels and used a colored legend that included each genre's percentage out of 50 songs instead. To find the percentages, I created another column named Percentage, divided the number of songs per genre by 50, and appended the resulting values to the genre names in the Genre column.
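The Percentage column and the legend labels can be sketched in a few lines of Python. This sketch uses placeholder genre counts that sum to 50 (only "N/A" comes from this post). With exactly 50 songs, each song is worth exactly 2%, so the percentages are always whole numbers.

```python
# Placeholder counts for a top-50 list; genre names other than "N/A" are made up.
songs_per_genre = {"Genre A": 20, "Genre B": 15, "N/A": 10, "Genre C": 5}
TOTAL_SONGS = 50
assert sum(songs_per_genre.values()) == TOTAL_SONGS  # sanity check on the input

legend_labels = []
for genre, count in songs_per_genre.items():
    percentage = count * 100 / TOTAL_SONGS  # the "Percentage" column
    legend_labels.append(f"{genre} ({percentage:.0f}%)")

print(legend_labels)  # ['Genre A (40%)', 'Genre B (30%)', 'N/A (20%)', 'Genre C (10%)']
```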

Songs Per Genre

Listens Per Genre
Analysis
To clear up any confusion about the N/A label: lo-fi and other instrumental music was labeled N/A for language, whereas songs without a specific genre were labeled N/A for genre.
Based on the charts above, Chinese music clearly came out on top, whether measured by number of songs, number of listens, or variety of genres. The CSV of listens per song also shows that all of my top English songs were in my top 10. This matches my current music phase, as I have recently been listening mostly to Chinese songs and some sad English songs on repeat. Additionally, there were no K-pop, lo-fi, or show tunes in the top 50. That also matches, since those phases did not occur in the past year.
However, the listens per song also show that my top three songs are all in English. This doesn't quite match, since my Chinese music phase has lasted much longer than my sad English songs phase.
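Both observations can be checked mechanically from a ranked list of languages. This sketch uses a placeholder ranking (rank 1 = most listens), not the real top-50 data. It tests whether every English song ranks in the top 10 and whether the top three are all English.

```python
# Placeholder ranking by listens (index 0 = rank 1); not the real top-50 list.
ranked_languages = [
    "English", "English", "English", "Chinese", "Chinese",
    "English", "Chinese", "Chinese", "Chinese", "Chinese",
    "Chinese", "Chinese", "N/A",
]

english_ranks = [
    rank for rank, language in enumerate(ranked_languages, start=1) if language == "English"
]
all_english_in_top_10 = all(rank <= 10 for rank in english_ranks)
top_three_all_english = all(language == "English" for language in ranked_languages[:3])

print(english_ranks, all_english_in_top_10, top_three_all_english)
```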
Other observations: Fly Me Away was only released in June yet is my most-listened song of the past year, whereas Count The Days was released barely a week before this analysis and already has the second-most listens. Altogether, though, I have more total listens to Chinese songs than to songs in English or any other language from the past year.
Conclusion
Based on this analysis, I will very likely continue listening to Chinese music, as I have listened to a wide variety of Chinese songs and genres in the past year. Also, since Fly Me Away has gathered the most listens since its release in June, it is likely to remain a favorite of mine. While I may listen to songs in other languages or genres, the data suggests I won't listen to them as often as Chinese music.
As this was a small project, I plan to expand it by using my full streaming history and developing an algorithm to predict which songs I might listen to in the future and how those predictions compare to my music phases.
Click here to view the data and scripts.