The Tortured Analysts Department
A word analysis of Taylor Swift’s discography.
Summary
This project aims to quantify and visualize the lyrics of Taylor Swift's discography to analyze the change in word choices over the years. This is not a critique of Swift’s music, nor is it a look at her best-performing songs and chart-topping hits because unless you are fresh out the slammer, those analyses would be nothing new for either of us.
This project uses the Genius API to download the artist’s discography. The lyrics are analyzed in Python using pandas and NLTK and visualized using WordClouds, Matplotlib, and Plotly. A word-frequency analysis shows how often certain words are used, and a sentiment analysis examines the emotions in the songs.
Links
Introduction
Perhaps you are a total Swiftie who thinks she is an absolute superstar, and you live for collecting every record and connecting every line in her songs to every date in her life. Perhaps you just tolerate it when she's shown during the football game and think the billionaire only sings about her ex-lovers and champagne problems. Or perhaps you are more like me, somewhere in the middle: slightly tired of the lore surrounding Taylor Swift’s career and personal life, but still a fan of her songs. Not a worshipper, just a listener.
Nevertheless, Taylor Swift sells out tours around the world that cause earthquakes as fans sing along, even though tickets cost her devoted Swifties over $1,000 a seat. So she must be a mastermind to some degree.
This project will examine the lyricism of Taylor Swift by first downloading her discography from the Genius API. The words of the self-proclaimed “tortured poet” will be analyzed and visualized to discover which advice of hers she has followed: to “Change” or to “Never Grow Up” (Fearless, 2008; Speak Now, 2010).
Rules for Song Collection
To simplify the song collection from the Genius API while still collecting as much data as possible, only certain songs from certain albums are included. The following rules are followed to maintain consistency:
Full studio albums are included.
If a deluxe edition exists, that is included instead of the standard edition.
If multiple deluxe editions exist, as with Midnights (2022), the edition with the highest number of different songs is chosen.
If the album was re-released as a Taylor’s Version* edition:
The original version is used; only vault songs from Taylor's Version are included.
This will avoid any lyric changes such as that in the chorus of “Better Than Revenge” on Speak Now:
“She's an actress, woah / She's better known for the things that she does / On the mattress, woah” (Speak Now, 2010)
“She's an actress, woah / He was a moth to the flame / She was holding the matches, woah” (Speak Now (Taylor’s Version), 2023)
This is done to maintain the integrity of the lyrics as change is analyzed. We are not interested in Taylor’s songs as she sees them as a 30-something in the 2020s; we are interested in Taylor’s songs as she originally wrote and released them.
The following are NOT included:
Songs that would cause duplicate lyrics:
remixes, acoustic, piano, live, or original demo versions
Taylor’s Version*
Only vault songs are included.
EPs, such as:
The Taylor Swift Holiday Collection (2007)
More Lover Chapter (2023) including “All of the Girls You Loved Before”
Songs originally released as singles or on soundtracks, including:
“Safe & Sound” featuring The Civil Wars from The Hunger Games: Songs from District 12 and Beyond (2012)
“Carolina” from the soundtrack for Where the Crawdads Sing (2022)
“Ronan” (2011)
Anything written or spoken by the artist, such as:
poems, voice memos, prologues, forewords, messages from artists, etc.
Songs on which the artist is featured and songs the artist wrote for other artists.
Covers and unreleased songs.
Data Collection and Preprocessing
The data, the lyrics of every Taylor Swift song that meets the collection rules, are collected from the Genius API and downloaded by album into directories. As the lyrics are collected, two functions are called: the first decides whether to include a song, and the second cleans the title of any song that passes the first. The inclusion function skips songs that do not meet the song collection rules; this is where a “Taylor’s Version” song must also have “From the Vault” in its title to be included. The cleaning function removes certain punctuation (“?”, “!”, and quotation marks) to make downloading easier, along with unnecessary words in the title, such as “bonus track”, “feat.”, or “Ft.”, and anything that follows them.
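The two functions described above might look roughly like the following sketch. The function names, the marker list, and the choice to look for version markers only inside parentheses or brackets are assumptions made for illustration; the project's actual functions are not shown in this write-up.

```python
import re

# Assumed markers for alternate versions that would duplicate lyrics.
ALT_VERSION_MARKERS = ("remix", "acoustic", "piano", "live", "demo")


def should_include(title: str) -> bool:
    """Return True if a title passes the (sketched) song collection rules."""
    lowered = title.lower().replace("\u2019", "'")  # normalize curly apostrophes
    # A "Taylor's Version" track is kept only if it is also a vault track.
    if "taylor's version" in lowered and "from the vault" not in lowered:
        return False
    # Look for version markers only inside (...) or [...] tags, so that a
    # title such as "Long Live" is not mistaken for a live recording.
    tags = " ".join(re.findall(r"[(\[]([^)\]]*)[)\]]", lowered))
    return not any(re.search(rf"\b{m}\b", tags) for m in ALT_VERSION_MARKERS)


def clean_title(title: str) -> str:
    """Strip featured-artist credits, bonus-track tags, and some punctuation."""
    # Cut the title at "bonus track", "feat.", or "ft." and drop what follows.
    head = re.split(
        r"\s*[(\[]?\s*\b(?:bonus track|feat\.|ft\.)",
        title,
        maxsplit=1,
        flags=re.IGNORECASE,
    )[0]
    # Remove "?", "!", and straight or curly double quotation marks.
    return re.sub(r'[?!"\u201c\u201d]', "", head).strip()
```

Under these assumptions, a vault track passes the first function, a plain “Taylor’s Version” re-recording does not, and a title like “ME! (feat. Brendon Urie)” is cleaned to “ME”.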
Once all the songs and lyrics are downloaded, the text is cleaned: the lyrics are made lowercase, punctuation is removed, and dashes (“-”) are replaced with spaces. The directory paths and the dataframe are created, and the dataframe is exported as a CSV file.
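A minimal sketch of this text-cleaning step is given here, using only the standard library. The function name is hypothetical, and collapsing repeated whitespace at the end is an added assumption rather than a documented step.

```python
import string


def clean_lyrics(text: str) -> str:
    """Lowercase lyrics, turn dashes into spaces, and strip punctuation."""
    text = text.lower()
    # Replace dashes first so "eighty-nine" becomes "eighty nine", not "eightynine".
    text = text.replace("-", " ")
    # string.punctuation covers ASCII punctuation only; typographic (curly)
    # quotes are not in it and would need to be added separately if desired.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Collapse any repeated whitespace left behind (an assumption of this sketch).
    return " ".join(text.split())
```

With this sketch, the line “I'll be eighty-seven, you'll be eighty-nine” becomes “ill be eighty seven youll be eighty nine”, which is consistent with “ill” showing up as a word later in the analysis.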
The text is further cleaned as a pandas dataframe in two rounds. The results of the most frequent words after both rounds of cleaning are shown in the Word Analysis section.
First Round of Cleaning
The ‘Lyrics’ column is tokenized, then lemmatized, and the new columns are created. Stopwords and words shorter than two letters are removed; otherwise, “oh” would be the most frequently used word.
As discussed later in the Word Analysis section, “na” of the slang words “wanna” and “gonna” is the sixth most common word. The other half of the contraction, “wan”, also appears in the WordCloud. The words “don’t” and “you’re” are among the most frequent words. Further cleaning is needed to remove all contractions.
Second Round of Cleaning
Contractions, including “wanna” and “gonna”, are removed along with any non-ASCII characters to better understand the words used by Swift. A new ‘clean_lyrics’ column is created. SpellChecker is used to check for any misspelled words; fortunately, there are none.
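One way to sketch this second round is shown next. The contraction list here is a short, hypothetical stand-in (the project's full list is not given), and dropping apostrophes before matching is an assumption of this example.

```python
import re

# Hypothetical, deliberately short list of contractions to drop.
CONTRACTIONS = {"wanna", "gonna", "dont", "youre", "im", "ill"}


def remove_contractions(text: str) -> str:
    """Drop listed contractions and any non-ASCII characters from cleaned lyrics."""
    # Remove straight and curly apostrophes so "don’t" is matched as "dont".
    text = re.sub(r"['\u2019]", "", text)
    # Discard non-ASCII characters; accented letters are dropped, not transliterated.
    text = text.encode("ascii", errors="ignore").decode("ascii")
    return " ".join(word for word in text.split() if word not in CONTRACTIONS)
```

Note that discarding non-ASCII characters in this way turns a word like “rosé” into “ros”, so accented keywords may be better searched for in the original ‘Lyrics’ column.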
As with the first round of cleaning, the ‘clean_lyrics’ column is tokenized, then lemmatized, and the new columns are created. Stopwords and words shorter than two letters are removed.
A pandas series of the lemmatized lyrics is created. This clean dataframe is used for the analysis.
Word Analysis
To find the most frequent words used in Swift’s songs, the words in the series of the lemmatized lyrics are counted in a dataframe with two columns, ‘Word’ and ‘Count’. A bar chart and a WordCloud of the most frequently used words are created. This is done for both rounds of the cleaning process and the results are compared.
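The counting step can be sketched with the standard library as follows. The function name is hypothetical, and the input is assumed to be one list of lemmatized words per song; the resulting rows could be passed to a pandas DataFrame constructor to obtain the two-column ‘Word’ and ‘Count’ table.

```python
from collections import Counter
from typing import Dict, Iterable, List, Union


def word_counts(lemmatized_songs: Iterable[List[str]]) -> List[Dict[str, Union[str, int]]]:
    """Count every word across all songs and return rows sorted by frequency."""
    counts = Counter(word for song in lemmatized_songs for word in song)
    # most_common() sorts by count, highest first.
    return [{"Word": word, "Count": count} for word, count in counts.most_common()]
```

For example, two toy songs `["like", "know", "like"]` and `["know", "like", "never"]` produce rows headed by “like” with a count of 3.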
First Cleaning Round Results
After the first round of cleaning, the top three most used words in Swift’s songs are “like”, “know”, and “don’t”. Swift must fill her songs with a lot of “oh”s, as “ooh” is the fifteenth most common word, and, had words shorter than two letters not been filtered out, “oh” would have been the most frequently used word. This is shown in the bar graph on the left below.
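Interestingly, “na”, as in “wanna” or “gonna”, is the sixth most common word, so these words must have been split at some point in the cleaning process. NLTK's Treebank-style word tokenizer is known to split “wanna” into “wan” and “na”, which would explain the split if that tokenizer was used here. The first part of one of the words, “wan”, appears in the WordCloud (bottom left of the visualizations below).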
The bar chart and WordCloud will be revisited after the second cleaning round.
Please note the glitch: the WordClouds do not show “like”.
The visualizations from the first round of cleaning (above left) and the second round (above right).
Second Cleaning Round Results
After the second round of cleaning, “like” and “know” are still the two most frequent words in the text, but “never” has replaced “don’t” as the third most common word. Contractions have been successfully removed. The results are shown in the visualizations on the right above.
Word Frequency in Tableau
The clean lyrics are brought into Tableau for further exploration. First, a new dataframe is created containing the year of the original “era”, the “era” named after the original standard album title, and the albums included in the “era”. This new dataframe is merged with the first containing the lyrics. In Tableau, the number of lyrics per “era” is now explored and a dashboard is created, shown below and linked here.
specific words
Certain motifs are common across Taylor Swift’s discography, even if the way she uses them has changed. Here is a closer look at the language surrounding three prevailing themes in Swift’s songs and the number of times she uses specific words in her albums. Later, the appearances of song and album titles throughout the lyrics will be examined.
Process
The same process is used to search for the language of all three themes and album titles. A list of words for which to search is created. The number of times each word appears in the lyrics of each song is counted. A new dataframe is created with the ‘Album’, ‘Song Name’, ‘Lyrics’, and each word in the given list as the columns. The dataframe is reduced by combining like words in the list (“pray”, “prayer”, “praying”, “prayin”), and unnecessary columns are removed.
This is not the most efficient way of searching for words: when “pray” is used to search, any word containing those four consecutive letters is also counted, including “prayer” and irrelevant words like “hairspray”. To avoid irrelevant words and, out of curiosity, to find how many times each form of a word is used, a space is added to either end of every keyword: “ pray ”, not “pray”, is used to search.
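The difference between a plain substring search and the space-padded search can be sketched as follows. The function names are hypothetical, and the regular-expression version is offered as an alternative, not as the method used in this project.

```python
import re


def count_padded(lyrics: str, keyword: str) -> int:
    """Count a keyword using the space-padding idea described above."""
    # Pad the lyrics too, so a keyword at the very start or end can match.
    return f" {lyrics} ".count(f" {keyword} ")


def count_whole_word(lyrics: str, keyword: str) -> int:
    """Count a keyword as a whole word using regex word boundaries."""
    return len(re.findall(rf"\b{re.escape(keyword)}\b", lyrics))
```

One caveat, if the padded search is done with `str.count`: matches cannot overlap, so in “pray pray” the space between the two words is used up by the first match and only one occurrence is counted. On the toy string “hairspray and prayer pray pray”, a plain search for “pray” finds 4, the padded search finds 1, and the word-boundary search finds 2.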
Two bar graphs showing the number of times each word in the lists appears in every album are created using functions. The first is a static chart color-coded by keyword created using matplotlib. The second is an interactive graph created with Plotly including hover data that shows the word, album, and the number of times the word appears in the album. Users may also select which words they want to see on the chart. A Tableau dashboard was created to organize the results by “Era.”
Here is a closer look at the number of times Swift swears, references drugs and alcohol, and references religion on each of her studio albums.
swear words
Even a casual listener will notice Swift’s songs have become more adult. She has come a long way from using “damn” once in her debut album to dropping the “F-bomb” 18 times in one song on her latest album (“Cold as You”, Taylor Swift, 2006; “Down Bad”, The Tortured Poets Department, 2024).
Keywords
Mild and more severe language is found in Swift’s songs. Language describing the act of cursing at someone is excluded; to see examples of this, see the Appendix. The following keywords are used to search through Swift’s discography.
"hell", "bitch", "bitches", "bitchin", "asshole", "shit", "shitty", "shitstorm", "damn", "damned", "goddamn", "pissed", "fuck", "fuckin", "fucking", "fucked", "sexy", "whore", "slut", "dickhead", "godforsaken”
After combining like words, the reduced list becomes:
“hell", "bitch", "asshole", "shit", "damn", "pissed", "fuck", "sexy", "whore", "slut", "dickhead", "godforsaken”
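Combining like words amounts to summing the counts of each keyword's variants under one label. The sketch below uses a subset of the swear-word groups; the grouping is inferred from the two lists above (for instance, “goddamn” is assumed to fold into “damn”), and the counts in the usage note are made up for illustration, not results from the lyrics.

```python
from typing import Dict, List

# Subset of groups inferred from the full and reduced keyword lists.
SWEAR_GROUPS: Dict[str, List[str]] = {
    "hell": ["hell"],
    "bitch": ["bitch", "bitches", "bitchin"],
    "shit": ["shit", "shitty", "shitstorm"],
    "damn": ["damn", "damned", "goddamn"],
    "fuck": ["fuck", "fuckin", "fucking", "fucked"],
}


def reduce_counts(counts: Dict[str, int], groups: Dict[str, List[str]]) -> Dict[str, int]:
    """Sum per-keyword counts into one total per reduced label."""
    return {
        label: sum(counts.get(variant, 0) for variant in variants)
        for label, variants in groups.items()
    }
```

Given illustrative counts such as `{"damn": 1, "goddamn": 2, "fuck": 3, "fuckin": 1}`, the reduced totals are 3 for “damn”, 4 for “fuck”, and 0 for labels with no matches.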
As expected, more recent albums, specifically Midnights (2022) and The Tortured Poets Department (2024), include more curse words. Speak Now (2010), which Swift famously wrote by herself, contains no swears.
drug references
Swift often uses drug references to drive home her point: being in love or falling out of love is like being on a drug. Similarly, she uses alcohol to paint pictures of drinking wine at dinner or getting drunk at a club.
Keywords
The list of keywords for references to drugs and alcohol excludes vague or common language, since those words often refer to something more innocent in Swift’s songs. For the full list of excluded words with examples, please see the Appendix. The one exception is “doin lines” from “Vigilante Shit” on Midnights (2022), since the term is used once and its meaning is obvious.
The following keywords are used to search in Swift’s song lyrics:
"alcohol", "alcoholic", "bar", "drink", "drinkin", "drinking", "drunk", "sober", "beer", "beers", "wine", "rosé", "merlot", "champagne", "dom pérignon", "liquor", "whiskey", "old fashioned", "patrón", "island breeze", "drug", "drugs", "weed", "smoke", "smoking", "smokin", "smoked", "stoned", "overdose", "narcotics", "heroin", "dopamine", "pills", "doin lines"
To create the reduced list:
The words “merlot” and “rosé” appear only once and twice, respectively, so these specific types of wine are added to the “wine” column. However, “champagne” appears many times, so the sparkling wine is kept as its own column, and the vintage champagne “Dom Pérignon” is added to the “champagne” column.
Similarly, types of liquor are combined in the “liquor” column, and cocktails are combined into a new column, “mixed drinks”.
The reduced list thus becomes:
“alcohol", "bar", "drink", "sober", "beer", "wine", "champagne", "liquor", "mixed drinks", "drug", "weed", "smoke", "stoned", "overdose", "narcotics", "heroin", "dopamine", "pills", "doin lines”
There are many drug and alcohol references across Swift’s discography, especially in her later songs, but Taylor Swift (2006), Fearless (2008), including the “vault tracks” released in 2021, and Red (2012) include no references. The albums Lover (2019), folklore (2020), evermore (2020), and Midnights (2022) have over a dozen references each. Reputation (2017) and, more recently, The Tortured Poets Department (2024) have the most drug- or alcohol-related words, with over 30 each.
religious words
Many years ago Taylor Swift strummed her guitar and sang with a country twang, “And when I got home, ‘fore I said, ‘Amen’ / Askin’ God if he could play it again” (“Our Song”, Taylor Swift, 2006). Christian themes are prevalent throughout Swift’s songs. So, from “Holy Ground” to “Guilty as Sin?”, exactly how often does Swift use religious language to convey her message (Red, 2012; The Tortured Poets Department, 2024)?
Keywords
To explore religious themes in Swift’s songs, only words related to Christianity are used to search through the lyrics. Only literal words are included as well; allusions or vague references are excluded. For the full list of these words and examples, please see the Appendix.
The keywords used to search through all of the lyrics:
"amen", "christian", "church", "faith", "faithless", "god", "gods", "lord", "devil", "devils", "angel", "angels", "demons", "saint", "saintly", "jesus", "holy", "pray", "praying", "prayer", “prayers”, "altar", "sin", “sins”, "guilty", "guilt", "hell", "heaven", "heavenly", "halo", "preacher", "christmas", "methodist", "jehovahs witness", "hallelujah", "forgive", "forgiveness", "forgiven", "unforgiven", "priest", "confess", "confessions", "religion", "religions", "religious", "epiphany", "grace", "sacred", "worship", "worshipping", "soul", "spirit", "prophecy", "miracle", "bless", "crucify", "temple", "eve", "exorcise"
To create the reduced list,
“Jehovah's Witness” and “Methodist” are combined with “Christian”, since they are both sects of the religion and each is used only once.
After reduction, the list becomes:
“Amen", "Christian", "church", "faith", "God", "Lord", "devil/demon", "angel", "saint", "Jesus", "holy", "pray", "altar", "sin", "guilt", "Hell", "Heaven", "halo", "preacher", "Christmas", "hallelujah", "forgive", "priest", "confess", "religion", "epiphany", "grace", "sacred", "worship", "soul", "spirit", "prophecy", "miracle", "bless", "crucify", "temple", "Eve", "exorcise"
Since all lyrics were made lowercase during the cleaning process, this analysis does not distinguish case. The proper nouns “God”, “Heaven”, and “Hell” are therefore in the same categories as the common nouns “god” (as in “Karma is a god” from “Karma”, Midnights, 2022), “heaven”, and “hell”. For simplicity, the category titles are capitalized.
Swift frequently uses religion in her songs, especially in Lover (2019) and The Tortured Poets Department (2024).
Specific Words: Results
Swift’s most recent albums, Lover (2019), Midnights (2022), and The Tortured Poets Department (2024), reference religion much more than her earlier works; however, they ironically also include the most drug and alcohol mentions and curse words. Perhaps Taylor is only a believer in religious metaphors these days. This is confirmed in the more neatly organized Tableau dashboard, linked here and shown below.
Swift already “wish[ed] [she]'d never grown up” fourteen years ago, but now her songs are a testament to her natural change and growth as a person and as a songwriter (“Never Grow Up”, Speak Now, 2010).
Album and Song Title Appearances
Fans of Taylor Swift love drawing connections between her songs and finding “Easter Eggs”. This part of the word analysis explores the connections between songs by looking for album and song title appearances in each of her songs.
Album Titles in Songs
Swift is known for dropping hints at future albums in her songs. For example, on her 2017 album, Reputation, she sang “You and me forevermore” in the final track, “New Year’s Day”. Three years later, she released evermore (2020), which includes a title track of the same name. How often Swift references her albums in her lyrics is counted and visualized. The appearance of song names in other songs is analyzed in the next section.
Process
The same process is followed as for the word lists of the three specific themes. An interactive bar chart is created with hover data that shows the name of the song in which the album title appears, the title that appears, the album, and the number of appearances.
Keywords
When creating the list of keywords to search for album titles in the lyrics, some leeway is allowed:
for Red, "redhead" is accepted.
for 1989, "nineteen" and "eighty-nine", without the hyphen, are accepted.
for Lover, the plural is accepted.
for Midnights, the singular is accepted.
for The Tortured Poets Department, "poet", "poets", "poem", and "poems", are accepted.
The full list of keywords is therefore:
“taylor swift", "fearless", "speak now", "red", "redhead", "nineteen", "eighty nine", "reputation", "reputations", "lover", "lovers", "folklore", "evermore", "midnight", "midnights", "the tortured poets department", "poet", "poets", "poem", "poems"
The list is then reduced to the names of the eleven studio albums.
As expected, album titles appear most in songs on their own album: the word “red” appears most on the album Red (2012), especially in the title track, “Red”. There are many times when album titles appear on other albums, however. For example, on her debut album, Swift released “Mary’s Song (Oh My My My)” with the lyrics, “I'll be eighty-seven, you'll be eighty-nine / I'll still look at you like the stars that shine” (Taylor Swift, 2006). In 2014, eight years later, Swift released 1989. Was the lyric in “Mary’s Song (Oh My My My)” an intentional hint or just a coincidence?
Song Titles in Other Songs
Similar to the search for album titles in songs, this section looks for song title appearances in the lyrics. Since there are over 200 songs in the dataframe, manually creating a list of keywords is inefficient. So this process requires cleaning the song names and creating a list from the column of the dataframe containing the clean song names to use to search the lyrics.
Process
To find the occurrences of song titles in lyrics, a different approach is taken from the one previously discussed. First, a function is created to clean the song names. All punctuation and title tags (“Taylor’s Version”, “From the Vault”, and “10 Minute Version”) are removed. Titles that are stylized or abbreviated are replaced with searchable equivalents:
“imgonnagetyouback” is corrected to “im gonna get you back” (The Tortured Poets Department, 2024).
“loml” is expanded to “love of my life” (The Tortured Poets Department, 2024).
“Mary’s Song (Oh My My My)” is reduced to “oh my my my” since “Mary’s Song” is not used in any lyrics, but “oh my my my” is (Taylor Swift, 2006).
“22” is replaced with “twenty two” (Red, 2012).
The cleaning function is applied, non-ASCII characters are removed, and the song titles are made lowercase as a new column of the clean song titles is created. A list of the clean song names is created, and extra spaces are removed. The song “Me!” (Lover, 2019), now simply “me”, is removed from the list to avoid skewing the data since many lyrics, and even words themselves, contain the word or the two consecutive letters. Next, the function to search through the ‘Lyrics’ column is applied to find all the song titles in all the lyrics.
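The title-cleaning steps above can be sketched in one function. The names are hypothetical, the steps are folded together in a slightly different order than described, and replacing hyphens with spaces (to mirror the lyric cleaning) is an assumption of this example.

```python
import re
from typing import Iterable, List

TITLE_TAGS = ("taylors version", "from the vault", "10 minute version")
REPLACEMENTS = {
    "imgonnagetyouback": "im gonna get you back",
    "loml": "love of my life",
    "marys song oh my my my": "oh my my my",
    "22": "twenty two",
}


def clean_song_title(title: str) -> str:
    """Lowercase a title, strip punctuation and tags, and apply replacements."""
    title = title.lower().replace("-", " ")
    # Keep only ASCII letters, digits, and whitespace; this removes
    # punctuation and non-ASCII characters in one pass.
    title = re.sub(r"[^a-z0-9\s]", "", title)
    for tag in TITLE_TAGS:
        title = title.replace(tag, "")
    title = " ".join(title.split())  # remove extra spaces
    return REPLACEMENTS.get(title, title)


def build_title_list(titles: Iterable[str]) -> List[str]:
    """Clean every title and leave out "me", which would match too broadly."""
    return [t for t in map(clean_song_title, titles) if t != "me"]
```

Under these assumptions, “All Too Well (10 Minute Version) (Taylor’s Version) [From the Vault]” is reduced to “all too well”, and “ME!” is cleaned to “me” and then left out of the search list.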
An interactive bar graph is created with hover data showing the name of the song in which the title appears, the song title that appears, the album, and the number of appearances.
A large variety of song titles appear in Swift’s songs. Most song name appearances are the title of the song in which it appears. However, many songs feature other song titles.
Unfortunately, this analysis does not understand the meaning of words, and it matches titles as strings of letters. The name of Swift’s 2010 hit, “Mean” from Speak Now, appears in several songs, such as “Afterglow” from Lover (2019). In 2010, Swift used the word to mean unkind and malicious. In “Afterglow”, the program instead counts the “mean” inside “meant” in “But it's not what I meant” three times.
In the same vein, the model does not count the root words of song names, such as those in “Haunted” and “Never Grow Up”, both from Speak Now (2010):
“Still sitting in a corner I haunt” (“right where you left me”, evermore (deluxe version), 2020)
“I never grew up, it's getting so old” (“The Archer”, Lover, 2019)
Are these lyrics references to Taylor’s songs from ten years earlier? Swifties would argue yes. The program would argue no.
Album and Song Title Appearances: Results
Swift is known for leaving “Easter Eggs” in her songs and social media, little clues for her fans to follow, so they can tie together the invisible string from one song to the next. In this section, the clues were quantified as interactive bar graphs were created to visualize the number of occurrences of album and song titles in the lyrics. As expected, the album and song name appear most in that album or song. Many albums and songs are referenced across Swift’s discography, but whether she does this on purpose or for the “tortured poet[ry]”, is up for debate (The Tortured Poets Department, 2024).
Sentiment Analysis
A sentiment analysis is done on the text of Swift’s lyrics to discern the emotional tone of her words. The lyrics are first tokenized and lemmatized, and the list of strings of lemmatized lyrics is converted to a single string. The SentimentIntensityAnalyzer from NLTK is used to find the number and frequency of positive, negative, and neutral words. A pie chart is created showing the percentages of positive, negative, and neutral words.
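The word-level classification can be sketched as follows. To keep the example runnable without NLTK data, the scorer is passed in as a function and a toy lexicon stands in for VADER; the ±0.05 cut-offs follow a commonly cited VADER convention and are an assumption here, not necessarily the thresholds used in this project.

```python
from typing import Callable, Dict, Iterable


def sentiment_shares(
    words: Iterable[str],
    score: Callable[[str], float],
    threshold: float = 0.05,
) -> Dict[str, float]:
    """Return the share of positive, negative, and neutral words."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for word in words:
        value = score(word)
        if value >= threshold:
            counts["positive"] += 1
        elif value <= -threshold:
            counts["negative"] += 1
        else:
            counts["neutral"] += 1
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in counts}
    return {label: n / total for label, n in counts.items()}


# Toy lexicon for illustration only. With NLTK installed and its
# "vader_lexicon" resource downloaded, a real scorer could instead be:
#   sia = SentimentIntensityAnalyzer()
#   score = lambda w: sia.polarity_scores(w)["compound"]
TOY_LEXICON = {"love": 0.6, "hate": -0.6}


def toy_score(word: str) -> float:
    return TOY_LEXICON.get(word, 0.0)
```

With the toy lexicon, the words “love”, “hate”, “love”, and “car” yield shares of 0.5 positive, 0.25 negative, and 0.25 neutral, which are the kind of values a pie chart would display.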
A higher percentage of negative songs would be expected since Swift is notorious for writing break-up songs.
A new dataframe is created with the top 250 words by count in each category: all words, positive words, negative words, and neutral words. WordClouds are constructed for each word type, positive, negative, and neutral.
Please note the glitch: the WordCloud does not show “like”.
Next, NRCLex is called to find the frequency of specific emotions: fear, negative, sadness, surprise, anticipation, joy, positive, trust, anger, and disgust. The number of words per emotion is found and a pie chart is created.
Bar charts are assembled showing the top 10 words for each emotion.
Many words appear in more than one category, such as “kiss”, “good”, “leave”, and “ill”.
WordClouds are generated, with a different colormap for each emotion, for the top words in each category.
Sentiment Analysis: Results
Sentiment analysis saves time, as it is much faster than a human at analyzing over 250 songs. It also yields a consistent result, whereas two people may have different opinions on the sentiment of certain lyrics. Unfortunately, sarcasm and jokes are not detected by the model. For example, Swift’s song “Mr. Perfectly Fine” from Fearless (Taylor’s Version) (2021) mocks the attitude of an ex after a breakup. Even in the title, the words seem positive on their own, but in the context of the song they are far less so. This is not recognized by the model. Additionally, many of the most commonly used words, such as “good”, are assigned to more than one emotion.
Of course, this sentiment analysis could be expanded to find the attitudes and emotions of every album and song. Red (2012) is a break-up album, so is it full of sad language? Perhaps we will find out.
Conclusion
In the first part of the project, the album title, song name, and lyrics of all of Taylor Swift’s songs that meet the rules laid out in Rules for Song Collection were downloaded from the Genius API. The lyrics text was thoroughly cleaned, and the words were analyzed and visualized with static and interactive bar charts. The language of three common themes (curse words, drugs and alcohol, and religion) in Swift’s music was examined closely. The appearances of album and song titles across Swift’s discography were visualized.
A sentiment analysis was done on the lemmatized lyrics: the words were organized into three categories, positive, negative, and neutral. NRCLex was used to organize the words into ten emotional types: fear, negative, sadness, surprise, anticipation, joy, positive, trust, anger, and disgust. These sentiments were visualized in pie charts, bar graphs, and WordClouds.
Whether or not these are the words and sentiments of a “tortured poet” is up to you, dear reader.
In Part Two, the data will be used to predict the lyrics of a new Swift-like song using machine learning.