The #WorldBot 2014 - The #Good, the #Bad and the #Ugly
Tomasso Elli, Carlo de Gaetano, Christoph Lutz, Myrthe Bil, Iris Beerepot, Claudio Coletta
This research was conducted during the Digital Methods Initiative's 2014 Summer School as part of the project "#WorldBot 2014 - The #Good, the #Bad and the #Ugly".
It centered on the representation of bots on Twitter during the 2014 World Cup in Brazil.
Our initial research question was: Do different national teams at the World Cup attract Twitter bots in different ways? However, as we describe in the Findings section below, deeper analysis of the data showed that distinguishing different types of bots and developing a meaningful typology was the more realistic and promising approach. We therefore reframed our research questions as: Can we categorize the bots surrounding the World Cup 2014? Can we identify behavioral patterns in the accounts we identify as bots?
We worked with the 1% sample of Twitter collected between 15 June and 23 June 2014 using the TCAT tool (Borra & Rieder 2014; Gerlitz & Rieder 2013). We ran a query on a total of 18 World Cup games and extracted all tweets and (individual) user statistics for the hashtags in the 1% dataset that occurred at least 100 times. That way, we were able to extract around 50,000 tweets.
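The frequency threshold described above can be illustrated with a minimal sketch. It is not the TCAT implementation: the function name `frequent_hashtags`, the regular expression, and the choice to count a hashtag once per tweet and case-insensitively are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: a hashtag is '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")

def frequent_hashtags(tweets, min_count=100):
    """Return {hashtag: tweet count} for hashtags used in at least min_count tweets.

    Hypothetical helper: hashtags are lowercased and counted once per tweet.
    """
    counts = Counter()
    for text in tweets:
        # A set ensures a hashtag repeated within one tweet is counted once.
        counts.update({tag.lower() for tag in HASHTAG_RE.findall(text)})
    return {tag: n for tag, n in counts.items() if n >= min_count}

# Toy data with a lowered threshold so the effect is visible.
sample = ["#FRASUI what a goal", "Go #GERGHA #frasui", "no tags here"]
print(frequent_hashtags(sample, min_count=2))  # {'frasui': 2}
```

With the real dataset, `min_count` would stay at 100 to match the threshold used in the study.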
First, we queried only match-related hashtags, such as #FRASUI (France vs. Switzerland) or #GERGHA (Germany vs. Ghana), and excluded single country codes or #hashflags (e.g., #CRC for Costa Rica, #JPN for Japan, #MEX for Mexico or #ITA for Italy). We then ran an extra round of queries that included these #hashflags. This yielded an additional 50,000 tweets, so that our total (merged) dataset consisted of more than 100,000 tweets.
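Merging the two query rounds could look like the following sketch. It assumes each tweet is a dictionary with a unique `id` field and that a tweet matching both a match hashtag and a #hashflag should be kept only once. The field name and the `merge_tweets` helper are hypothetical and do not describe how TCAT merges query results.

```python
def merge_tweets(*datasets):
    """Merge tweet collections, keeping the first copy of each tweet id."""
    merged = {}
    for dataset in datasets:
        for tweet in dataset:
            # setdefault leaves an already stored tweet untouched.
            merged.setdefault(tweet["id"], tweet)
    return list(merged.values())

# Toy example: tweet 2 appears in both rounds and is kept once.
match_round = [{"id": 1, "text": "#FRASUI"}, {"id": 2, "text": "#GERGHA #GER"}]
hashflag_round = [{"id": 2, "text": "#GERGHA #GER"}, {"id": 3, "text": "#ITA"}]
merged = merge_tweets(match_round, hashflag_round)
print(len(merged))  # 3
```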
The study mainly relied on the analysis modules provided by the TCAT tool, including statistics modules such as user and hashtag frequencies, which we used to identify the main tendencies in the dataset. Given our interest in the (potentially multiple) nature of bots at the World Cup, we also used network visualizations of user and hashtag co-occurrences, exported from TCAT and processed with Gephi.
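TCAT produces these network exports itself. The sketch below only illustrates what a user-hashtag co-occurrence network encodes: one weighted edge per (user, hashtag) pair, written as a CSV edge list with `Source`, `Target` and `Weight` columns, a layout Gephi can import. The function names and the `(user, text)` input format are assumptions, not TCAT's export format.

```python
import csv
import io
import re
from collections import Counter

# Assumption: a hashtag is '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")

def user_hashtag_edges(tweets):
    """Count how often each user used each hashtag (bipartite edges)."""
    edges = Counter()
    for user, text in tweets:
        for tag in HASHTAG_RE.findall(text):
            edges[(user, "#" + tag.lower())] += 1
    return edges

def edges_to_csv(edges):
    """Serialize edges as a Source,Target,Weight edge list."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (user, tag), weight in sorted(edges.items()):
        writer.writerow([user, tag, weight])
    return buf.getvalue()

# Toy input: (user, tweet text) pairs.
toy = [("alice", "#GERGHA #ger"), ("bob", "#gergha"), ("alice", "#GERGHA again")]
print(edges_to_csv(user_hashtag_edges(toy)))
```

In such a network, accounts that repeatedly attach themselves to many unrelated hashtags stand out by their degree and edge weights, which is one way candidate bots can be spotted visually.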
World Cup tweeting is not heavily botted
World Cup bots are very diverse; it's not all #good and #bad
There is a large #ugly category in the middle that's hard to categorize and make sense of
Boundaries between #ugly and #good are blurry and more difficult to draw than between #good and #bad
E. Borra and B. Rieder (2014) "Programmed method: developing a toolset for capturing and analyzing tweets", Aslib Journal of Information Management, Vol. 66, Iss. 3, pp. 262-278.
C. Gerlitz and B. Rieder (2013) "Mining One Percent of Twitter: Collections, Baselines, Sampling", M/C Journal, Vol. 16, No. 2.