The Tortured Analysts Department:
The Anthology
A prediction analysis of Taylor Swift’s discography.
Summary
Like its namesake album, this is a double project. The second half of the project aims to train a machine-learning model with Swift’s entire discography to predict a Taylor-esque song.
The rest of Part Two is coming soon, but I am a one-person department, so it may take longer than a fortnight!
Follow along with my progress on GitHub here or at the links below.
Links
Introduction
I still remember when Taylor Swift was a small artist and only a few of us knew her songs. A couple of friends and I danced to “Our Song” in the elementary school talent show (Taylor Swift, 2006). It was also the only song I ever mastered on the guitar. Then Fearless (2008) was released, and “You Belong With Me” and “Love Story” swept across the fifth-grade class. When Speak Now (2010) came out, “Dear John” was just a song with great writing and what sounded like great vocals to my untrained middle-school ears, not yet a weaponized anthem against John Mayer. I have grown up with Taylor Swift for the majority of my life. Now, I will fulfill my childhood dream of being the next Taylor Swift the only way I know how: by coding.
To predict the lyrics of a Swift song, an LSTM model is built. Long short-term memory (LSTM) is a type of recurrent neural network (RNN). RNNs remember previous information and use it to process the current input; however, standard RNNs suffer from the vanishing gradient problem, so they cannot remember long-term dependencies. LSTMs are designed to avoid this issue.
A word-level approach is taken, as opposed to a character-level one. Each word is treated as a unique unit, and the model attempts to predict the next word. This helps keep the output comprehensible, since the model is unlikely to generate made-up words. On the other hand, it requires a lot of memory to store an entire vocabulary of words. My little laptop must stay strong.
Data Collection and Preprocessing
The data, lyrics from every Taylor Swift song, is collected from the Genius API. The process is discussed in Part One. The ‘Lyrics’ column is joined into a single string.
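A minimal sketch of that step, assuming the lyrics scraped in Part One are stored in a pandas DataFrame with a ‘Lyrics’ column (the file name and variable names here are placeholders, not the actual project code):

```python
import pandas as pd

# Assumed: one row per song, raw lyrics in a 'Lyrics' column (from Part One).
songs_df = pd.read_csv("taylor_swift_lyrics.csv")  # hypothetical file name

# Join every song's lyrics into one long string for building the vocabulary.
all_lyrics = " ".join(songs_df["Lyrics"].astype(str))
```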
Preparing the Data
Every word in the lyrics string is identified and separated through tokenization, and the vocabulary is built from all the unique words in the lyrics. Input sequences are then created from the text data, the ‘Lyrics’ column, with a for loop that cycles through each song. The uncleaned text is used so the output resembles a genuine song, or comes as close to one as possible. Each song’s words are converted into their integer codes according to the vocabulary, and a nested for loop creates n-gram sequences from those codes: sequences of increasing length are built for each song by adding one word at a time.
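Roughly, with the Keras Tokenizer (the variable names continue from the sketch above and are my own, not necessarily the project’s):

```python
from tensorflow.keras.preprocessing.text import Tokenizer

# Fit the tokenizer on the full lyrics string so every unique word gets an
# integer code; +1 reserves index 0 for padding.
tokenizer = Tokenizer()
tokenizer.fit_on_texts([all_lyrics])
total_words = len(tokenizer.word_index) + 1

# Build n-gram sequences song by song: each sequence adds one more word,
# so every prefix of a song becomes a training example.
input_sequences = []
for song in songs_df["Lyrics"].astype(str):
    token_list = tokenizer.texts_to_sequences([song])[0]
    for i in range(2, len(token_list) + 1):
        input_sequences.append(token_list[:i])
```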
The sequences are shaped to fit the LSTM network by padding, which makes them all the same length, since LSTM networks work with fixed-length inputs.
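For example, with Keras’s pad_sequences, padding zeros at the front so every sequence matches the longest one:

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Pad every n-gram sequence with leading zeros up to the longest length,
# so the LSTM always sees fixed-length inputs.
max_sequence_len = max(len(seq) for seq in input_sequences)
input_sequences = np.array(
    pad_sequences(input_sequences, maxlen=max_sequence_len, padding="pre")
)
```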
The sequences are divided into predictors and labels: the predictors include every token except the last, which becomes the label. The label integers are converted into one-hot encoded format, transforming each integer into a vector of zeros with a single one at that integer’s position, so the labels match the model’s output. The data is finally split into 75% training and 25% test data.
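A sketch of that split, using to_categorical for the one-hot labels and scikit-learn for the 75/25 split (the random seed is my own addition):

```python
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical

# Predictors are every token except the last; the last token is the label.
predictors, labels = input_sequences[:, :-1], input_sequences[:, -1]

# One-hot encode the labels: a vector of zeros with a one at the word's index.
labels = to_categorical(labels, num_classes=total_words)

# 75% of the sequences for training, 25% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    predictors, labels, test_size=0.25, random_state=42
)
```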
First Model
Training the Model
The first model is created with the following layers:
Input or Embedding Layer: transforms each input token into a dense vector of a fixed size, here 100 dimensions
LSTM Layer: 150 LSTM units work to learn the sequence and context of the words
Dropout Layer: randomly skips some neurons during training, making the model less sensitive to the weights of individual neurons and thus helping to avoid overfitting; the dropout rate is set to 0.1
Output or Dense Layer: has as many neurons as there are words in the vocabulary; lets the model choose the next word
Since this is a multi-class classification problem, the loss function is categorical_crossentropy. Early stopping is implemented to avoid over-training by halting the training process if the model stops improving. The accuracy and loss are graphed as the model trains.
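Put together, the architecture described above might look like this in Keras; the optimizer, patience, epochs, and batch size are not specified in the post, so the values below are placeholders:

```python
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import LSTM, Dense, Dropout, Embedding
from tensorflow.keras.models import Sequential

# Embedding -> LSTM -> Dropout -> Dense, as listed above.
model = Sequential([
    Embedding(total_words, 100, input_length=max_sequence_len - 1),
    LSTM(150),
    Dropout(0.1),
    Dense(total_words, activation="softmax"),
])

model.compile(loss="categorical_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])

# Stop training once the model stops improving on the held-out data.
early_stop = EarlyStopping(monitor="val_loss", patience=5,
                           restore_best_weights=True)

history = model.fit(X_train, y_train,
                    validation_data=(X_test, y_test),
                    epochs=100, batch_size=64,
                    callbacks=[early_stop])
```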
Predicting the Next Word
A function is defined to predict the next word; it takes two arguments, the model and the seed text. The seed text is the initial text given to the model, which it uses to guess the next word, and here the words “I am” serve as the seed text. In a for loop, the seed text is prepared for the model by tokenizing and padding.
The model assigns a probability to every word in the vocabulary, and the word with the highest probability is chosen as the next word. The predicted word is appended to the seed text, and the updated seed text goes back into the model for the next prediction. The process repeats until the desired number of words, for example 150, has been generated.
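The generation loop could look something like this; the function name is mine, and it reuses the tokenizer and max_sequence_len from the preprocessing sketches:

```python
def predict_lyrics(model, seed_text, next_words=150):
    """Greedily generate next_words words after seed_text."""
    for _ in range(next_words):
        # Prepare the current seed text exactly like the training data.
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        token_list = pad_sequences([token_list],
                                   maxlen=max_sequence_len - 1,
                                   padding="pre")
        # Pick the word with the highest predicted probability.
        predicted_index = int(np.argmax(model.predict(token_list, verbose=0)))
        seed_text += " " + tokenizer.index_word.get(predicted_index, "")
    return seed_text

print(predict_lyrics(model, "I am"))
```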
The model generates the following:
I am in the heat of my head in a café i regret you all the time i know why you know that you know that you know that you know that you know that you know that you know that you know is it gets me to you i know places i know why now i know that i hate you now i hate you like i hate you i know you better know you better know you better know you better know you better know you better know you better know you better know you better know you better know you better know you better know you better know you better know you better know you better know you better know you better know you better know you better know you better know you better know you better know you better know you better know you better
The model is far from creating a top hit, or even something that makes sense. However, many of the phrases in the output are actual lyrics from Swift’s songs:
“the heat” in “Florida!!!” (The Tortured Poets Department, 2024)
“in a café” in “Begin Again” (Red, 2012)
“I regret you all the time” in “Would’ve, Could’ve, Should’ve” (Midnights (3am Edition), 2022)
“I know why” appears multiple times, for example, in “The Best Day” (Fearless, 2008)
“it gets me” in “Peter” (The Tortured Poets Department, 2024)
“I know places” in “I Know Places” (1989, 2014)
“I hate you now” in “I Wish You Would” (1989, 2014)
“I know you better” in “You Belong With Me” (Fearless, 2008)
“know you better know you better know you better” in “Everything Has Changed ft. Ed Sheeran” (Red, 2012)
The repetition of “know you better”, even though that phrase does repeat heavily in Swift’s song “Everything Has Changed ft. Ed Sheeran” (Red, 2012), indicates a lack of creativity in the model. The model always chooses the single most likely next word based on what it has learned and does not consider variety. It is successfully learning from the data, but it is generating senseless predictions.
The next model will attempt to address this.
Second Model
Results coming soon, but I have just one small laptop with not much memory! Your patience is appreciated.
Follow along with my progress on GitHub here.