To celebrate the 1,000th episode of A State of Trance, the radio show invited listeners to vote for their all-time favorite trance tracks, and the resulting list was broadcast as ASOT 1000.
In this post we'll analyze the Top 1,000: which artists, BPMs, and years are most represented? And more!
As with previous posts here, we'll be pulling data from Spotify and graphing the results. While there is an official "ASOT Top 1000" playlist on Spotify, I'm opting instead to use the "ASOT TOP 1000 Countdown Extended" playlist compiled by Reddit user turbodevin. As Devin writes:
I used a filler track (4 seconds) for the missing song, to keep the song numbers corresponding to the ranking. When an extended version was not available, a shorter version is used. When a remix is not available, the regular version is used when available.
MISSING
531 || Sean Callery - The Longest Day (Armin van Buuren Remix)
REMIX NOT AVAILABLE
414 || Faithless - Insomnia (Andrew Rayel Remix)
520 || Safri Duo - Played A Live (The Bongo Song) [NWYR & Willem de Roo Remix]
530 || Kensington - Sorry (Armin van Buuren Remix)
635 || Ilse de Lange - The Great Escape (Armin van Buuren Remix)
661 || Zedd feat. Foxes - Clarity (Andrew Rayel Remix)
While the playlist may not be complete, I'd still consider it the most complete playlist available on Spotify. Using extended mixes instead of the official playlist's radio mixes is certainly preferable, at least.
Remember, all data here is pulled directly from Spotify's API without any modification on my end. See the post on Methodology for details on what data we can pull from Spotify, and how. Notably, Spotify's AudioFeaturesObject describes tempo as "overall estimated tempo of a track in beats per minute (BPM)". The key word is estimated. I've done little to account for any inconsistencies and nothing to address them!
Spotify's "Get a Playlist's Items" endpoint returns at most 100 tracks per request. So let's make 10 API calls for 100 tracks each, incrementing offset each time, and save the results.
"""
User: https://open.spotify.com/user/113444659
Playlist: ASOT TOP 1000 Countdown Extended
Playlist link: https://open.spotify.com/playlist/5DCcjCLMlPjTwKLCcYyzIj
Playlist ID: 5DCcjCLMlPjTwKLCcYyzIj
"""
top_1000_playlist = '5DCcjCLMlPjTwKLCcYyzIj'
top_1000_tracks = []
# Get full details of the tracks and episodes of a playlist, 100 at a time
# https://spotipy.readthedocs.io/en/2.16.1/#spotipy.client.Spotify.playlist_items
# (assumes `sp` is the authenticated spotipy.Spotify client set up in the Methodology post)
for offset in range(0, 1000, 100):
    top_1000_tracks.extend(sp.playlist_tracks(top_1000_playlist, offset=offset)['items'])
print(len(top_1000_tracks))
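The calls above assume the playlist holds exactly 1,000 items. An alternative is to follow the `next` link that each page of results carries, using spotipy's `Spotify.next()` helper. The sketch below is written against a duck-typed `client` so it can run offline. `fetch_all_items` and `FakeClient` are hypothetical names used only for illustration, and the fake pages store an offset where the real API stores a URL.

```python
def fetch_all_items(client, playlist_id):
    """Collect every item of a playlist by following each page's `next` link.

    `client` is assumed to behave like spotipy.Spotify: playlist_tracks()
    returns the first page, and next(page) returns the following page or None.
    """
    items = []
    page = client.playlist_tracks(playlist_id)
    while page:
        items.extend(page['items'])
        page = client.next(page)  # None once page['next'] is empty
    return items


class FakeClient:
    """Hypothetical stand-in that serves `total` dummy items in pages of `limit`."""

    def __init__(self, total=250, limit=100):
        self.total, self.limit = total, limit

    def _page(self, offset):
        end = min(offset + self.limit, self.total)
        return {
            'items': [{'index': i} for i in range(offset, end)],
            'next': end if end < self.total else None,  # the real API uses a URL here
        }

    def playlist_tracks(self, playlist_id, offset=0):
        return self._page(offset)

    def next(self, page):
        return self._page(page['next']) if page['next'] is not None else None


items = fetch_all_items(FakeClient(), '5DCcjCLMlPjTwKLCcYyzIj')
print(len(items))  # 250
```

With the real client, `fetch_all_items(sp, top_1000_playlist)` should return every item without hard-coding the playlist length.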
What's number 1?
print(top_1000_tracks[999]['track']['artists'][0]['name'], '-', top_1000_tracks[999]['track']['name'])
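The playlist runs as a countdown, so list index `i` holds rank `1000 - i`. That is why `[999]` is #1, and why later cells print `1000 - position`. The two hypothetical helpers below capture that mapping. They also build the "Artist & Artist" string that later cells assemble by hand, assuming the item shape that `playlist_tracks` returns (the sample item is made up).

```python
def rank_of(index, total=1000):
    """Countdown rank for a 0-based playlist index (index 0 is #1000)."""
    return total - index


def artist_string(item):
    """Join every credited artist on a playlist item with ' & '."""
    return " & ".join(artist['name'] for artist in item['track']['artists'])


# Synthetic item shaped like a Spotify playlist entry (hypothetical data)
item = {'track': {'name': 'Example Track',
                  'artists': [{'name': 'Example Artist'}, {'name': 'Another Artist'}]}}
print(rank_of(999), '.', artist_string(item), '-', item['track']['name'])
# prints: 1 . Example Artist & Another Artist - Example Track
```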
Let's begin by looking at the artists who made the top 1000 - how many unique artists were featured?
unique_artists = set()
for track in top_1000_tracks:
    for artist in track['track']['artists']:
        unique_artists.add(artist['name'])
print(len(unique_artists))
Which artists were featured the most?
from collections import defaultdict
artist_counter = defaultdict(int)
for track in top_1000_tracks:
    for artist in track['track']['artists']:
        artist_counter[artist['name']] += 1
top_artists = sorted(artist_counter.items(), key=lambda k_v: k_v[1], reverse=True)
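`collections.Counter` can do the tallying and the sorting in one step. `most_common(25)` should return the same (artist, count) pairs as `top_artists[:25]`, because both keep tied artists in first-encountered order. The sketch below uses hypothetical sample data and a hypothetical helper name.

```python
from collections import Counter


def count_artist_credits(tracks):
    """Count how many playlist items credit each artist (any credit position)."""
    return Counter(
        artist['name']
        for item in tracks
        for artist in item['track']['artists']
    )


# Synthetic playlist items (hypothetical data)
tracks = [
    {'track': {'artists': [{'name': 'Artist A'}]}},
    {'track': {'artists': [{'name': 'Artist A'}, {'name': 'Artist B'}]}},
    {'track': {'artists': [{'name': 'Artist C'}]}},
]
print(count_artist_credits(tracks).most_common(2))
# prints: [('Artist A', 2), ('Artist B', 1)]
```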
Alright, let's see the top 25 in a graph.
import altair as alt
import pandas as pd
source = pd.DataFrame.from_dict(top_artists[:25])
bars = alt.Chart(source).mark_bar().encode(
x=alt.X('1:Q', title='Plays'),
y=alt.Y('0:N', sort='-x', title='Artist')
).properties(
title="ASOT Top 1000 - Most-played artists",
width=600
)
text = bars.mark_text(
align='left',
baseline='middle',
dx=3 # Nudges text to right so it doesn't appear on top of the bar
).encode(
text='1:Q'
)
bars + text
No surprise as to who's #1, but the sheer number of their tracks is impressive. Armin van Buuren is credited on over 10% of the ASOT Top 1000, more than twice as many tracks as the second-most-featured artist!
Which artists were featured exactly once, with what track, at what position?
# Find all artists with one play, then find that track in the top 1000
for artist in top_artists:
    if artist[1] == 1:
        for position, track in enumerate(top_1000_tracks):
            if track['track']['artists'][0]['name'] == artist[0]:
                print(1000 - position, '.', track['track']['artists'][0]['name'], '-', track['track']['name'])
Quite a few! Each line shows only the first credited artist, and a track appears only when that first artist is featured exactly once. For example, "120. Darren Tate & Jono Grant – Shine (Let The Light Shine In)" is listed here under Darren Tate alone. Jono Grant gets no entry of his own because he also appears in "562. Jono Grant vs Mike Koglin – Circuits".
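The cell above only compares the first credited artist. As a result, a one-track artist credited second (or later) never shows up on their own line. The sketch below checks every credit instead, assuming the same playlist item shape. The helper name and sample data are hypothetical.

```python
from collections import Counter


def single_appearances(tracks, total=1000):
    """Return (rank, artist, title) for every artist credited on exactly one item.

    Unlike the cell above, this checks every credited artist, not just the first.
    """
    counts = Counter(a['name'] for item in tracks for a in item['track']['artists'])
    results = []
    for index, item in enumerate(tracks):
        for artist in item['track']['artists']:
            if counts[artist['name']] == 1:
                results.append((total - index, artist['name'], item['track']['name']))
    return sorted(results)  # lowest rank number (#1) first


# Synthetic two-item countdown (hypothetical data)
tracks = [
    {'track': {'name': 'Track X', 'artists': [{'name': 'Artist A'}, {'name': 'Artist B'}]}},
    {'track': {'name': 'Track Y', 'artists': [{'name': 'Artist B'}, {'name': 'Artist C'}]}},
]
print(single_appearances(tracks, total=2))
# prints: [(1, 'Artist C', 'Track Y'), (2, 'Artist A', 'Track X')]
```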
Let's look at some track-specific numbers now.
In which years were the tracks produced?
annual_total = defaultdict(int)
for track in top_1000_tracks:
    annual_total[track['track']['album']['release_date'][:4]] += 1
top_years = sorted(annual_total.items(), key=lambda k_v: k_v[1])
print(top_years)
In a graph:
source = pd.DataFrame.from_dict(top_years)
bars = alt.Chart(source).mark_bar().encode(
x=alt.X('1:Q', title='Plays'),
y=alt.Y('0:N', sort='-x', title='Year')
).properties(
title="ASOT Top 1000 - Most-represented years",
width=600
)
text = bars.mark_text(
align='left',
baseline='middle',
dx=3 # Nudges text to right so it doesn't appear on top of the bar
).encode(
text='1:Q'
)
bars + text
Might be better to see it sorted by year:
source = pd.DataFrame.from_dict(top_years)
bars = alt.Chart(source).mark_bar().encode(
x=alt.X('1:Q', title='Plays'),
y=alt.Y('0:N', title='Year')
).properties(
title="ASOT Top 1000 - Yearly representation",
width=600
)
text = bars.mark_text(
align='left',
baseline='middle',
dx=3 # Nudges text to right so it doesn't appear on top of the bar
).encode(
text='1:Q'
)
bars + text
What are the oldest tracks in the list, i.e. those released before 2000? Sorted by position.
for position, track in enumerate(top_1000_tracks):
    if int(track['track']['album']['release_date'][:4]) < 2000:
        track_artist = track['track']['artists'][0]['name']
        for artist in track['track']['artists'][1:]:
            track_artist += " & " + artist['name']
        print(1000 - position, '.', track_artist, '-', track['track']['name'], '- released', track['track']['album']['release_date'])
A lot of tracks released in 2020 made the list, so what are the most recent? Here are the ones released in the last four months of 2020 (September through December).
for position, track in enumerate(top_1000_tracks):
    # Keep tracks whose release month (YYYY-MM) falls in September to December 2020
    if track['track']['album']['release_date'][:7] in ('2020-09', '2020-10', '2020-11', '2020-12'):
        track_artist = track['track']['artists'][0]['name']
        for artist in track['track']['artists'][1:]:
            track_artist += " & " + artist['name']
        print(1000 - position, '.', track_artist, '-', track['track']['name'], '- released', track['track']['album']['release_date'])
In which years were the tracks by the five most-featured artists released?
artist_avb_counter = defaultdict(int) # Tracks crediting Armin van Buuren
artist_ab_counter = defaultdict(int) # Tracks crediting Above & Beyond
artist_af_counter = defaultdict(int) # Tracks crediting Aly & Fila
artist_fc_counter = defaultdict(int) # Tracks crediting Ferry Corsten
artist_ar_counter = defaultdict(int) # Tracks crediting Andrew Rayel
for track in top_1000_tracks:
    for artist in track['track']['artists']:
        if artist['name'] == "Armin van Buuren":
            artist_avb_counter[track['track']['album']['release_date'][:4]] += 1
        elif artist['name'] == "Above & Beyond":
            artist_ab_counter[track['track']['album']['release_date'][:4]] += 1
        elif artist['name'] == "Aly & Fila":
            artist_af_counter[track['track']['album']['release_date'][:4]] += 1
        elif artist['name'] == "Ferry Corsten":
            artist_fc_counter[track['track']['album']['release_date'][:4]] += 1
        elif artist['name'] == "Andrew Rayel":
            artist_ar_counter[track['track']['album']['release_date'][:4]] += 1
# Sort by year and print the results
sorted_avb_years = sorted(artist_avb_counter.items(), key=lambda k_v: k_v[0])
sorted_ab_years = sorted(artist_ab_counter.items(), key=lambda k_v: k_v[0])
sorted_af_years = sorted(artist_af_counter.items(), key=lambda k_v: k_v[0])
sorted_fc_years = sorted(artist_fc_counter.items(), key=lambda k_v: k_v[0])
sorted_ar_years = sorted(artist_ar_counter.items(), key=lambda k_v: k_v[0])
print("Armin van Buuren:")
print(sorted_avb_years)
print("Above & Beyond:")
print(sorted_ab_years)
print("Aly & Fila:")
print(sorted_af_years)
print("Ferry Corsten:")
print(sorted_fc_years)
print("Andrew Rayel:")
print(sorted_ar_years)
This would look nice in a stacked bar chart, but I couldn't get the data arranged properly to create the chart.
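A stacked chart usually wants the data in "long" form: one row per (artist, year) pair. Five separate dictionaries of counts don't fit that shape. The sketch below does the reshaping with the standard library, assuming the same playlist item shape. The helper name and sample data are hypothetical.

```python
from collections import Counter


def yearly_counts_long(tracks, artists):
    """Long-form rows of {'artist', 'year', 'tracks'} for the given artists.

    Each row is one bar segment: a stacked chart needs one row per
    (artist, year) pair rather than one dict of counts per artist.
    """
    wanted = set(artists)
    counts = Counter()
    for item in tracks:
        year = item['track']['album']['release_date'][:4]
        for artist in item['track']['artists']:
            if artist['name'] in wanted:
                counts[(artist['name'], year)] += 1
    return [
        {'artist': artist, 'year': year, 'tracks': n}
        for (artist, year), n in sorted(counts.items())
    ]


# Synthetic playlist items (hypothetical data)
tracks = [
    {'track': {'artists': [{'name': 'Artist A'}], 'album': {'release_date': '2004-05-10'}}},
    {'track': {'artists': [{'name': 'Artist A'}, {'name': 'Artist B'}], 'album': {'release_date': '2004'}}},
    {'track': {'artists': [{'name': 'Artist B'}], 'album': {'release_date': '2019-11'}}},
    {'track': {'artists': [{'name': 'Artist C'}], 'album': {'release_date': '2019-01-01'}}},
]
rows = yearly_counts_long(tracks, ['Artist A', 'Artist B'])
print(rows)
```

With pandas and Altair as in the earlier cells, something like `alt.Chart(pd.DataFrame(rows)).mark_bar().encode(x='year:O', y='tracks:Q', color='artist:N')` should then stack each artist's segment within a year's bar. Vega-Lite stacks bar marks by default when a color channel splits them.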
What's the average BPM of tracks in the top 1,000?
total_bpm = 0
for track in top_1000_tracks:
    total_bpm += sp.audio_features(track['track']['uri'])[0]['tempo']
print(total_bpm/1000)
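The loop above makes one `audio_features` request per track, 1,000 in total, and later cells repeat those requests. spotipy's `audio_features()` accepts a list of up to 100 track IDs or URIs. It returns one entry per track, or `None` when Spotify has no features for it. The sketch below batches the requests and keeps `None` for missing entries so they can be skipped when averaging. `fetch_tempos`, `chunks`, and `FakeClient` are hypothetical names, and the fake client lets the code run offline.

```python
def chunks(seq, size=100):
    """Yield consecutive slices of at most `size` elements."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def fetch_tempos(client, uris, batch_size=100):
    """Fetch tempos with one audio_features call per batch of up to 100 URIs.

    `client` is assumed to behave like spotipy.Spotify.audio_features, which
    returns one entry per URI and None where no features are available.
    """
    tempos = []
    for batch in chunks(uris, batch_size):
        for features in client.audio_features(batch):
            tempos.append(features['tempo'] if features else None)
    return tempos


class FakeClient:
    """Hypothetical stand-in: every track is 138 BPM except one with no features."""

    def __init__(self):
        self.calls = 0

    def audio_features(self, batch):
        self.calls += 1
        return [None if uri == 'spotify:track:missing' else {'tempo': 138.0} for uri in batch]


uris = [f'spotify:track:{i}' for i in range(249)] + ['spotify:track:missing']
client = FakeClient()
tempos = fetch_tempos(client, uris)
known = [t for t in tempos if t is not None]
print(client.calls, len(tempos), sum(known) / len(known))  # 3 250 138.0
```

With the real client, `fetch_tempos(sp, [item['track']['uri'] for item in top_1000_tracks])` would need 10 requests instead of 1,000.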
Maybe that's not so useful. How does track BPM vary throughout the Top 1,000? The chart runs from #1,000 on the left down to #1 on the right.
import numpy as np
bpm = []
for track in top_1000_tracks:
    tempo = sp.audio_features(track['track']['uri'])[0]['tempo']
    if tempo < 100 or tempo > 150:  # "outliers", details below
        bpm.append(138)  # replace outliers with 138 BPM so they don't stretch the chart
    else:
        bpm.append(tempo)  # reuse the tempo fetched above instead of a second API call
x = np.arange(len(top_1000_tracks))
source = pd.DataFrame({
'track': x,
'bpm': np.array(bpm)
})
source['138'] = 138
base = alt.Chart(source).mark_line().encode(
alt.X('track'),
alt.Y('bpm', scale=alt.Scale(domain=(100, 150))),
).properties(
title="ASOT Top 1000 - BPM of track"
)
rule = alt.Chart(source).mark_rule(color='red').encode(
y='138'
)
base + rule
That's not the best way to visualize it, so how about a semi-interactive scatter plot? Mouse over a point for its track index and BPM, and zoom with the mouse wheel. I couldn't figure out how to get track titles and artists into the tooltips.
detail = (
alt.Chart(source)
.mark_point()
.encode(
x=alt.X(
"track:Q",  # the track index is a number, not a timestamp
),
y=alt.Y(
"bpm:Q",
scale=alt.Scale(domain=(100, 150)),
),
color="bpm",
tooltip=['bpm', 'track']
)
.properties(width=600, height=400, title="ASOT Top 1000 -- BPM of track")
).interactive()
detail
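Altair tooltips can only show fields that exist in the chart's data. Getting titles and artists into them therefore means adding those columns to the DataFrame alongside the index and BPM. The sketch below builds such rows, assuming the same playlist item shape and a list of tempos in playlist order. The helper name and sample data are hypothetical.

```python
def tooltip_rows(tracks, tempos, total=1000):
    """One row per track with the fields a tooltip could display."""
    rows = []
    for index, (item, tempo) in enumerate(zip(tracks, tempos)):
        rows.append({
            'track': index,                # same x value as the chart above
            'position': total - index,     # countdown rank
            'bpm': tempo,
            'artist': " & ".join(a['name'] for a in item['track']['artists']),
            'title': item['track']['name'],
        })
    return rows


# Synthetic two-item countdown (hypothetical data)
tracks = [
    {'track': {'name': 'Track X', 'artists': [{'name': 'Artist A'}, {'name': 'Artist B'}]}},
    {'track': {'name': 'Track Y', 'artists': [{'name': 'Artist C'}]}},
]
rows = tooltip_rows(tracks, [138.0, 140.0], total=2)
print(rows[1])
```

`pd.DataFrame(rows)` could then take the place of `source`, with `tooltip=['position', 'artist', 'title', 'bpm']` in the encoding.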
There are a few "outliers" that throw off the graph, so let's look at the tracks in the Top 1,000 with the lowest and highest BPMs.
for position, track in enumerate(top_1000_tracks):
    tempo = sp.audio_features(track['track']['uri'])[0]['tempo']
    if tempo < 125 or tempo > 141:  # "outliers"
        track_artist = track['track']['artists'][0]['name']
        for artist in track['track']['artists'][1:]:
            track_artist += " & " + artist['name']
        print(1000 - position, '.', track_artist, '-', track['track']['name'], '-', tempo, 'BPM')
But this isn't entirely right, is it? Beatport lists Popcorn at 138 BPM. Again, I've done nothing to address these inconsistencies.
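One common reason an automatic tempo estimate disagrees with a listing like Beatport's is an "octave error": the estimate comes out at half or double the true tempo. Whether that explains any particular track here is unverified. The sketch below folds each estimate into a one-octave window, assuming a window of 100 to 200 BPM as a reasonable range for trance. It cannot fix estimates that are off by any other ratio. `fold_tempo` is a hypothetical helper.

```python
def fold_tempo(tempo, low=100.0):
    """Fold a tempo estimate into the one-octave window [low, 2 * low).

    Doubling or halving corrects half-time/double-time estimates only;
    estimates that are off by other ratios pass through unchanged in kind.
    """
    if tempo <= 0:
        raise ValueError("tempo must be positive")
    while tempo < low:
        tempo *= 2
    while tempo >= 2 * low:
        tempo /= 2
    return tempo


print(fold_tempo(69.0), fold_tempo(276.0), fold_tempo(138.0))  # 138.0 138.0 138.0
```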
I've barely scratched the surface with these basic analyses, so I'll leave you with a CSV file of the tracks and their Spotify "audio features" so you can run the numbers yourself.
import csv
# Audio feature fields written after position, artist, track, and year
feature_columns = ['danceability', 'energy', 'key', 'loudness', 'speechiness', 'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo', 'id', 'uri', 'duration_ms', 'time_signature']
with open('../data/top-1000.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
    writer.writerow(['position', 'artist', 'track', 'year'] + feature_columns)
    for position, track in enumerate(top_1000_tracks):
        # Join all credited artists with " & "
        track_artist = " & ".join(artist['name'] for artist in track['track']['artists'])
        audio_features = sp.audio_features(track['track']['uri'])[0]
        writer.writerow([1000 - position, track_artist, track['track']['name'],
                         track['track']['album']['release_date'][:4]]
                        + [audio_features[key] for key in feature_columns])
The resulting file is available at https://github.com/ScottBrenner/asot-jupyter/blob/master/csv/top-1000.csv. Let me know what you make with it!
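If you do pick up the CSV, note that it was written with `quotechar='|'`. Fields containing commas are therefore wrapped in pipes, and a reader needs the same setting. The sketch below uses only the standard library. The per-year tempo average is just an example analysis, and the helper names and sample rows are hypothetical.

```python
import csv
import io
import statistics


def load_rows(csvfile):
    """Read the exported CSV; quotechar must match the '|' used when writing."""
    return list(csv.DictReader(csvfile, delimiter=',', quotechar='|'))


def mean_tempo_by_year(rows):
    """Average the `tempo` column per `year` (values arrive as strings)."""
    by_year = {}
    for row in rows:
        by_year.setdefault(row['year'], []).append(float(row['tempo']))
    return {year: statistics.mean(values) for year, values in sorted(by_year.items())}


# Tiny in-memory sample with a subset of the real columns (hypothetical values)
sample = io.StringIO(
    "position,artist,track,year,tempo\n"
    "2,|Artist A, Artist B|,Track X,2004,137.0\n"
    "1,Artist C,Track Y,2004,139.0\n"
)
rows = load_rows(sample)
print(mean_tempo_by_year(rows))  # {'2004': 138.0}
```

For the real file, open it with `open('top-1000.csv', newline='')` and pass the file object to `load_rows`.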