8 Project Description

Due: Week 13 (16 October), Friday 11:59 pm

A well-known public figure is investigating their public image and has approached your team to identify what the public associates with their name. They want the following four pieces of analysis to be performed.

8.1 Analysis of Twitter language about the Public Figure (4 marks: 1+1+1+1)

The public figure wants you to examine the language used in tweets about them. They would like a general idea of what people are talking about.

Use the rtweet package to download tweets.

1. Use the search_tweets function from the rtweet library to search for 2000 tweets about the person you selected. Save the result as "tweets".
2. Pre-process your data and construct a document-term matrix of the tweets using TF-IDF weights.
3. Construct a word cloud of the words in your document-term matrix.
4. Comment on your findings.

8.2 Clustering the tweets (11 marks: 2+2+1+2+3+1)

The public figure wants you to identify the topics in the tweets and wants to be aware of at least two topics. Since we do not want to present every tweet to them, we must determine whether there is a set of common themes across the tweets.

Using the pre-processed document-term matrix you generated in Section 8.1, complete the following:

5. Find the most appropriate number of clusters for the tweets using the elbow method with cosine distance.
6. Cluster the tweets using k-means clustering.
7. Identify the number of tweets in each cluster. Which cluster is the largest?
8. Visualize your clustering in a 2-dimensional vector space to present it. Show each cluster in a different colour.
9. Create dendrograms of the words in the two most populated clusters only. Build two separate dendrograms, one per cluster, using complete linkage clustering. You do not need to visualize all words in your dendrograms; set appropriate boundaries to improve your visualization. Make sure your visualizations are readable!
10. Comment on your findings.

8.3 Who to follow (7 marks: 1+1+1+3+1)

The public figure wants to understand how they can increase their number of followers. We believe that being active on Twitter is the best way to achieve this, but we are unsure whether this is true. To examine this, we want to test whether there is a relationship between the number of followers a user has and the number of tweets they have posted. To do this:

11. Find the top 100 most-retweeted tweets in your tweets.
12. Identify the users who posted these tweets.
13. Get the follower count and the statuses count (number of tweets posted) of these Twitter handles.
14. Apply the appropriate statistical test to test the relationship between follower count and statuses count.
15. Comment on your result.

8.4 Building Networks (6 marks: 0.5+3+2+0.5)

In this section, we want to create an overview of the public figure's Twitter network. To do this (you may use the twitteR package in this section):

16. Find the 10 most popular friends of the chosen Twitter handle.
17. Obtain a 1.5-degree egocentric graph centred at the chosen Twitter handle and plot the graph. The egocentric graph should contain the 10 most popular friends of the chosen Twitter handle (eleven vertices in total).
18. Compute the popularity of each vertex in your graph using the PageRank method. List the 3 most popular people in your graph according to their PageRank scores.
19. Comment on your result.
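Item 2 of Section 8.1 leaves the pre-processing and weighting details open. Below is a minimal Python sketch of one common approach, given only to illustrate the computation (the project itself is set up for R, where the tm package's DocumentTermMatrix with weightTfIdf is a usual route). It lower-cases each tweet, strips URLs, @mentions and non-letters, drops stop words, and weights each term by (count / tweet length) x log(N / document frequency). The toy tweets, the stop-word list and the helper names `preprocess` and `tfidf_dtm` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy tweets standing in for the output of search_tweets.
TWEETS = [
    "Great speech today! https://t.co/abc #election",
    "The speech was great, very inspiring",
    "Election results are coming in tonight",
]

# A tiny illustrative stop-word list (an assumption; real work uses a fuller list).
STOP_WORDS = {"the", "was", "are", "in", "very", "today"}

def preprocess(text):
    """Lower-case, drop URLs, @mentions and non-letters, then remove stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # strip links
    text = re.sub(r"@\w+", " ", text)          # strip @mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters only ('#', digits, punctuation go)
    return [w for w in text.split() if w not in STOP_WORDS]

def tfidf_dtm(docs):
    """Return (vocab, matrix) where matrix[i][j] is the TF-IDF weight of vocab[j] in doc i.

    tf = count / document length, idf = log(N / df): one common variant among several.
    """
    tokens = [preprocess(d) for d in docs]
    vocab = sorted({w for toks in tokens for w in toks})
    n_docs = len(docs)
    df = Counter(w for toks in tokens for w in set(toks))  # documents containing each term
    matrix = []
    for toks in tokens:
        counts = Counter(toks)
        length = len(toks) or 1  # avoid dividing by zero for empty tweets
        matrix.append([counts[w] / length * math.log(n_docs / df[w]) for w in vocab])
    return vocab, matrix

vocab, dtm = tfidf_dtm(TWEETS)
```

With this weighting a term that appears in every tweet gets weight 0, because log(N / N) = 0; such terms carry no information for telling tweets apart.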

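Items 5 to 7 of Section 8.2 combine the elbow method, k-means and cosine distance. Standard k-means minimises squared Euclidean distance; one common way to bring in cosine distance is to L2-normalise every row first, because for unit vectors the squared Euclidean distance equals 2(1 - cosine similarity). The Python sketch below does that and then assigns each row to the centroid with the smallest cosine distance, a simple variant in the spirit of spherical k-means, not necessarily the method expected in R. The toy rows and the names `kmeans_cosine` and `elbow_curve` are hypothetical.

```python
import math
import random
from collections import Counter

def normalize(v):
    """Scale a vector to unit length; all-zero rows are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def cosine_distance(a, b):
    """1 - cosine similarity; a zero vector is treated as maximally distant (1.0)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - sum(x * y for x, y in zip(a, b)) / (na * nb)

def kmeans_cosine(rows, k, max_iter=100, seed=0):
    """k-means on L2-normalised rows using cosine distance for assignment.

    Returns (labels, wss), where wss is the total cosine distance of the rows
    to their assigned centroids.
    """
    points = [normalize(r) for r in rows]
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: cosine_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    wss = sum(cosine_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, wss

def elbow_curve(rows, max_k, seed=0):
    """WSS for k = 1..max_k; plot it and look for the 'elbow' where gains flatten."""
    return {k: kmeans_cosine(rows, k, seed=seed)[1] for k in range(1, max_k + 1)}

# Toy rows standing in for TF-IDF vectors, with two obvious "topics".
ROWS = [[1, 0.1], [1, 0.2], [0.9, 0], [0.1, 1], [0, 1], [0.2, 0.9]]
curve = elbow_curve(ROWS, 4)
labels, _ = kmeans_cosine(ROWS, 2)
sizes = Counter(labels)  # item 7: number of tweets per cluster
```

Picking the elbow is a judgement made from the plotted curve; the sketch only produces the numbers to plot. Because k-means depends on its starting centroids, running it with several seeds and keeping the lowest WSS is a common safeguard.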

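Items 11 to 14 of Section 8.3 ask for the most-retweeted tweets, their authors, and a test of the follower/statuses relationship. Follower counts are usually heavily skewed, so a rank-based measure such as Spearman's correlation is one reasonable choice (an assumption, not the only valid test; in R, cor.test(x, y, method = "spearman") also reports a p-value). The Python sketch below computes only the coefficient. The toy records, the field names (chosen to mirror rtweet's retweet_count, followers_count and statuses_count columns, but treated here as assumptions) and all helper names are hypothetical.

```python
import math

def top_n_by_retweets(tweets, n=100):
    """Item 11: the n most-retweeted tweets (ties keep their input order)."""
    return sorted(tweets, key=lambda t: t["retweet_count"], reverse=True)[:n]

def unique_users(tweets):
    """Item 12: distinct screen names, in order of first appearance."""
    seen = []
    for t in tweets:
        if t["screen_name"] not in seen:
            seen.append(t["screen_name"])
    return seen

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation: the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (sx * sy)

# Hypothetical toy data.
TWEETS = [
    {"screen_name": "a", "retweet_count": 50},
    {"screen_name": "b", "retweet_count": 900},
    {"screen_name": "c", "retweet_count": 300},
    {"screen_name": "b", "retweet_count": 120},
    {"screen_name": "d", "retweet_count": 400},
]
USERS = {  # item 13: screen_name -> (followers_count, statuses_count)
    "a": (1_200, 3_400),
    "b": (85_000, 41_000),
    "c": (9_000, 15_000),
    "d": (20_000, 8_000),
}
top = top_n_by_retweets(TWEETS, n=3)
handles = unique_users(top)
followers = [USERS[h][0] for h in handles]
statuses = [USERS[h][1] for h in handles]
rho = spearman_rho(followers, statuses)  # item 14: strength/direction of the relationship
```

The same user can author several top tweets, so deduplicating handles before pairing counts avoids giving one account extra weight in the test.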

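Item 18 asks for PageRank scores on the egocentric graph. As an illustration of the method, here is a minimal Python sketch of PageRank by power iteration with damping factor 0.85 (the default in igraph's page_rank function in R, which is a common way to do this step); rank held by vertices with no outgoing edges is spread evenly over all vertices. The edge list is hypothetical: "me" stands for the chosen handle, A to C for its friends, and the remaining edges for follow ties among the friends, which is what makes the graph 1.5-degree rather than a plain star.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank by power iteration on a directed graph given as (source, target) pairs.

    Returns a dict vertex -> score; the scores sum to 1.
    """
    nodes = sorted({v for edge in edges for v in edge})
    n = len(nodes)
    out = {v: [] for v in nodes}
    for s, t in edges:
        out[s].append(t)
    rank = {v: 1.0 / n for v in nodes}  # start from the uniform distribution
    for _ in range(max_iter):
        # Mass sitting on vertices without out-links is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            if out[v]:
                share = damping * rank[v] / len(out[v])
                for t in out[v]:
                    new[t] += share
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank

# Hypothetical 1.5-degree ego graph: ego follows A, B, C; friends follow each other.
EDGES = [("me", "A"), ("me", "B"), ("me", "C"), ("A", "B"), ("C", "B"), ("B", "A")]
scores = pagerank(EDGES)
top3 = sorted(scores, key=scores.get, reverse=True)[:3]
```

In this toy graph B ranks first because three vertices point to it, while the ego itself ranks last since nobody in the graph follows it.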