Python-Powered Twitter Sentiment Analysis of the KSI vs. Jake Paul Boxing Match

Motivation

This article analyzes public opinion about the feud between two YouTubers, KSI and Jake Paul. The two made videos that portrayed each other negatively, and their argument went viral. The public was divided, with some supporting KSI and others supporting Jake. To understand what people really thought about the feud, and whether their opinions changed over time, I scraped data from Twitter and performed a sentiment analysis. In this article, I will walk you through the steps I took: scraping and storing Twitter data, classifying Tweets as positive, negative, or neutral using a simple algorithm, and creating charts with Plotly and Matplotlib to identify trends in sentiment.

Content

Step 1 - Data Collection

We start by scraping Tweets from the period when the feud between KSI and Jake took place, which was in 2020. I scraped 170K Tweets containing the keywords "Jake Paul" and "KSI" from January 2020.
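
The scraping tool itself is not shown here, so the following is only a minimal sketch of the filtering and storing part. It assumes each scraped Tweet is a dict with "date" and "text" keys, and that a Tweet must mention both keywords to be kept. The function names, the dict layout, and the CSV format are illustrative assumptions, not the original pipeline.

```python
import csv

KEYWORDS = ("jake paul", "ksi")  # the two search terms named in the article


def mentions_all_keywords(text, keywords=KEYWORDS):
    """Return True if every keyword appears in the text, ignoring case."""
    lowered = text.lower()
    return all(keyword in lowered for keyword in keywords)


def save_tweets(tweets, path):
    """Write the matching Tweets to a CSV file and return how many were kept.

    `tweets` is assumed to be an iterable of dicts with "date" and "text" keys.
    """
    kept = [tweet for tweet in tweets if mentions_all_keywords(tweet["text"])]
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=["date", "text"], extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(kept)
    return len(kept)
```

If Tweets mentioning either keyword should be kept instead, replacing `all` with `any` in `mentions_all_keywords` is enough.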

Example of a Tweet:

Step 2 - Sentiment Analysis

The Tweet shown above expresses negative sentiment. Let's see whether the model picks up on this negativity and returns a negative prediction.

The SentimentIntensityAnalyzer (SID) class accepts a string as input and returns four scores: positive, negative, neutral, and compound. The compound score sums the sentiment of the individual words in the string and normalizes the result to a range from -1 to 1. If the compound score is close to 1, the input string can be classified as having positive sentiment. If it is close to -1, the input string can be classified as having negative sentiment. Let's now use the SID to analyze the sentiment of the above sentence.
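
As a rough sketch of how such a call can look, the example below assumes the analyzer is the VADER implementation bundled with NLTK, which needs a one-time download of its lexicon. The helper names `build_analyzer` and `score_text` are made up for this illustration.

```python
def build_analyzer():
    """Create a VADER analyzer through NLTK (assumed package for this sketch)."""
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    # The lexicon is downloaded once and cached locally by NLTK.
    nltk.download("vader_lexicon", quiet=True)
    return SentimentIntensityAnalyzer()


def score_text(text, analyzer):
    """Return the four scores as a (neg, neu, pos, compound) tuple."""
    scores = analyzer.polarity_scores(text)
    return scores["neg"], scores["neu"], scores["pos"], scores["compound"]
```

A typical call would be `sid = build_analyzer()` followed by `score_text(tweet_text, sid)`, where `tweet_text` is the Tweet to analyze.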

The output indicates that the sentiment of the input string is negative, with a score of -0.25. This is correct. Now, let's create a function that predicts the sentiment of each Tweet in the dataframe and stores the result in a separate column called "sentiment". Before we do that, we need to clean the Tweets in the dataframe: remove the "@" symbol from mentions, strip out HTTP links, and remove the hashtag symbol "#".
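
The three cleaning steps can be sketched with a regular expression and two string replacements. This is one possible implementation, not necessarily the one used for the original analysis, and the whitespace cleanup at the end is an extra step added here.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")  # matches http and https links


def clean_tweet(text):
    """Remove links, "@" symbols, and "#" symbols from a Tweet."""
    text = URL_PATTERN.sub("", text)                # drop HTTP links
    text = text.replace("@", "").replace("#", "")   # keep the word, drop the symbol
    return " ".join(text.split())                   # collapse leftover whitespace
```

With a pandas dataframe, the function could be applied to the Tweet column with something like `df["text"].apply(clean_tweet)`, where the column name "text" is an assumption.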

Now that the Tweets have been cleaned, we can perform the sentiment analysis. We need to convert the compound scores into three categories: 'positive', 'negative', and 'neutral'.
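
A minimal sketch of this conversion is shown below. The cutoff of 0.05 is the convention commonly suggested for VADER compound scores and is an assumption here, since the exact thresholds used in the analysis are not stated.

```python
def label_sentiment(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to 'positive', 'negative', or 'neutral'."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"
```

With this cutoff, the example score of -0.25 is labeled 'negative'.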

Step 3 - Visualisation

Now that we have classified the Tweets as positive or negative, we can examine changes in sentiment over time. To do this, we need to group the positive and negative sentiment and count them by date. Then, we can visualize sentiment by date using Plotly.
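
The grouping and plotting described above can be sketched as follows. The counting part uses only the standard library and assumes the data is available as (date, sentiment) pairs with ISO-formatted date strings. The plotting part needs Plotly installed, and the colors, title, and axis labels are illustrative choices.

```python
from collections import Counter


def count_by_date(rows):
    """Count positive and negative Tweets per date.

    `rows` is an iterable of (date, sentiment) pairs; neutral rows are skipped.
    Returns {"positive": Counter, "negative": Counter}, each keyed by date.
    """
    counts = {"positive": Counter(), "negative": Counter()}
    for date, sentiment in rows:
        if sentiment in counts:
            counts[sentiment][date] += 1
    return counts


def plot_counts(counts):
    """Build a Plotly line chart with one line per sentiment."""
    import plotly.graph_objects as go

    dates = sorted(set(counts["positive"]) | set(counts["negative"]))
    fig = go.Figure()
    for sentiment, color in (("negative", "red"), ("positive", "green")):
        fig.add_trace(
            go.Scatter(
                x=dates,
                y=[counts[sentiment][d] for d in dates],  # missing dates count as 0
                mode="lines",
                name=sentiment,
                line=dict(color=color),
            )
        )
    fig.update_layout(
        title="Tweets per day by sentiment",
        xaxis_title="Date",
        yaxis_title="Number of Tweets",
    )
    return fig
```

Calling `plot_counts(count_by_date(rows)).show()` opens the chart in a browser or notebook.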

During July and August 2022, both positive and negative sentiment increased noticeably. On the graph, the red line (negative sentiment) and the green line (positive sentiment) both show a spike. This period coincides with Jake Paul's announcement challenging KSI to a boxing match. Focusing on this time frame lets us examine the sentiment breakdown of the feud in greater detail.

On August 28, 2022, both negative and positive sentiment rose sharply, as shown by the spike on the graph. On that day, the number of Tweets was ten times higher than usual, which can be attributed to the boxing match that took place on this date. The match between the spike and a real event is reassuring, although it reflects Tweet volume rather than proving that the model is accurate. The same holds for August 8, the date of the fight announcement.
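
One way to put a number on "higher than usual" is to compare a day's Tweet count with the median daily count. The sketch below takes the median as the definition of a usual day, which is an assumption of this example and not necessarily how the figure above was obtained.

```python
from statistics import median


def volume_ratio(daily_totals, day):
    """Return how many times larger one day's Tweet count is than the median day.

    `daily_totals` maps each date to its total number of Tweets.
    """
    typical = median(daily_totals.values())
    return daily_totals[day] / typical
```

The median is used rather than the mean so that the spike itself does not inflate the baseline it is compared against.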

Now, let's examine some word clouds to see what people are saying about the event. First, let's look at the most commonly used positive words.
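
A word cloud is essentially a picture of word frequencies, so a simple sketch is to count words and hand the counts to a rendering library. The stopword list below is a short illustrative one, and the rendering function assumes the third-party `wordcloud` package is installed.

```python
import re
from collections import Counter

# Illustrative stopwords: common filler words plus the names of the two fighters.
STOPWORDS = {
    "the", "a", "an", "and", "is", "to", "of", "in", "on", "for", "vs",
    "jake", "paul", "ksi",
}


def word_frequencies(texts, stopwords=STOPWORDS):
    """Count how often each non-stopword appears across a list of Tweets."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in stopwords and len(word) > 1:
                counts[word] += 1
    return counts


def build_word_cloud(frequencies):
    """Render the frequencies as a word cloud image object."""
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=400, background_color="white")
    return cloud.generate_from_frequencies(frequencies)
```

Passing only the Tweets labeled positive (or only the negative ones) to `word_frequencies` gives one cloud per sentiment.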

The positive words about the feud and fight between Jake and KSI include 'biggest', 'idol', 'interesting', and 'hope'. Nice words to describe this fight.

The negative comments seem to be more aggressive and focus on the result of the fight: 'washed', 'ko', 'beat'.

Conclusion

In conclusion, this Python sentiment analysis project, based on scraping Tweets about the KSI vs. Jake Paul boxing match, was a success. By analyzing the sentiment of Tweets about the event, we were able to gain insight into the public's reactions and opinions on the fight. The results showed a spike in both positive and negative sentiment around the time of the match, which is expected given the high level of public interest in the event.