The #WorldBot 2014 - The #Good, the #Bad and the #Ugly

Team Members

Tomasso Elli, Carlo de Gaetano, Christoph Lutz, Myrthe Bil, Iris Beerepot, Claudio Coletta


This research was conducted during the Digital Methods Initiative’s 2014 Summer School as part of the project “#WorldBot 2014 - The #Good, the #Bad and the #Ugly”.

It centered on the representation of bots on Twitter during the 2014 World Cup in Brazil.

Research Questions

Our initial research question was: Do different national teams at the World Cup attract Twitter bots in different ways? However, as we describe in the “Findings” section below, deeper analysis of the data showed that distinguishing different types of bots and developing a meaningful typology was the more realistic and promising approach. We therefore reframed our research questions as: Can we categorize the bots surrounding the World Cup 2014? Can we identify behavioral patterns in the accounts we identify as bots?


We worked with the 1% sample of Twitter collected between 15 June and 23 June 2014 using the TCAT tool (Borra & Rieder 2014; Gerlitz & Rieder 2013). We ran a query on a total of 18 World Cup games and extracted all tweets and (individual) user statistics for the hashtags in the 1% dataset that occurred at least 100 times. That way, we were able to extract around 50000 tweets.
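The frequency threshold described above can be illustrated with a minimal Python sketch. It is not TCAT's implementation: the function name `frequent_hashtags`, the hashtag pattern and the choice to ignore letter case are assumptions made for illustration only.

```python
import re
from collections import Counter

# Assumed hashtag pattern: '#' followed by letters, digits or underscores.
HASHTAG_RE = re.compile(r"#(\w+)")


def frequent_hashtags(tweets, min_count=100):
    """Return {hashtag: count} for hashtags that occur at least min_count times.

    `tweets` is an iterable of tweet texts. Every occurrence is counted,
    and hashtags are upper-cased so that #gergha and #GERGHA are merged.
    """
    counts = Counter()
    for text in tweets:
        counts.update(tag.upper() for tag in HASHTAG_RE.findall(text))
    return {tag: n for tag, n in counts.items() if n >= min_count}
```

With a threshold of 100, a hashtag seen 99 times in the sample would be dropped, while one seen 100 times or more would be kept.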

First, we queried only match-related hashtags, such as #FRASUI (France vs. Switzerland) or #GERGHA (Germany vs. Ghana), and excluded single country codes or #hashflags (e.g., #CRC for Costa Rica, #JPN for Japan, #MEX for Mexico or #ITA for Italy). We then did an extra round of queries including these #hashflags. This yielded an additional 50000 tweets, so that our total (merged) dataset consisted of more than 100000 tweets.
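The distinction between match-related hashtags and #hashflags can be sketched as follows. The rule used here (a match hashtag is two different three-letter country codes joined together) is inferred from the examples above and is an assumption, as are the names `COUNTRY_CODES` and `classify_hashtag`. The code set contains only the codes mentioned in the text; a full analysis would need all 32 teams.

```python
# Country codes that appear in the examples above (illustrative, not complete).
COUNTRY_CODES = {"FRA", "SUI", "GER", "GHA", "CRC", "JPN", "MEX", "ITA"}


def classify_hashtag(tag, codes=COUNTRY_CODES):
    """Label a hashtag as 'hashflag', 'match' or 'other'.

    'hashflag': a single country code, e.g. #CRC.
    'match':    two different country codes joined together, e.g. #FRASUI.
    'other':    anything else.
    """
    name = tag.lstrip("#").upper()
    if name in codes:
        return "hashflag"
    first, second = name[:3], name[3:]
    if len(name) == 6 and first in codes and second in codes and first != second:
        return "match"
    return "other"
```

Under these assumptions, the first round of queries corresponds to the hashtags labeled 'match', and the extra round adds those labeled 'hashflag'.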

The study mainly relied on the analysis modules provided by the TCAT tool, including statistical modules such as user and hashtag frequencies, which we used to identify the main tendencies in the dataset. Given our interest in the (potentially multiple) nature of bots at the World Cup, we also used network visualizations of user and hashtag co-occurrences, exported from TCAT and processed with Gephi.
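To illustrate what a hashtag co-occurrence network contains, here is a minimal Python sketch that builds weighted edges from tweet texts and serializes them as a CSV edge list with Source, Target and Weight columns, which Gephi's spreadsheet import can read. It is not the TCAT export itself: the function names, the hashtag pattern and the case-insensitive matching are assumptions, and only hashtag-to-hashtag co-occurrence is covered, not user networks.

```python
import csv
import io
import re
from collections import Counter
from itertools import combinations

# Assumed hashtag pattern: '#' followed by letters, digits or underscores.
HASHTAG_RE = re.compile(r"#(\w+)")


def cooccurrence_edges(tweets):
    """Count how often each pair of hashtags appears in the same tweet.

    Returns a Counter keyed by (hashtag_a, hashtag_b) with hashtag_a < hashtag_b.
    A pair is counted at most once per tweet.
    """
    edges = Counter()
    for text in tweets:
        # De-duplicate and sort so each undirected pair has one canonical key.
        tags = sorted({tag.upper() for tag in HASHTAG_RE.findall(text)})
        edges.update(combinations(tags, 2))
    return edges


def edge_list_csv(edges):
    """Serialize the edges as CSV text with Source, Target and Weight columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()
```

The edge weight is the number of tweets in which both hashtags appear, so frequently paired hashtags form the strongest ties in the visualization.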


Findings

The #Good

The #Bad

The #Ugly


  • World Cup tweeting is not heavily botted

  • World Cup bots are very diverse; it’s not all #good and #bad

  • There is a large #ugly category in the middle that is hard to categorize and make sense of

  • The boundaries between #ugly and #good are blurry and more difficult to draw than those between #good and #bad


E. Borra and B. Rieder (2014) "Programmed method: developing a toolset for capturing and analyzing tweets", Aslib Journal of Information Management, Vol. 66, No. 3, pp. 262-278.

C. Gerlitz and B. Rieder (2013) "Mining One Percent of Twitter: Collections, Baselines, Sampling", M/C Journal, Vol. 16, No. 2.
Topic revision: r1 - 07 Jul 2014, chrislutz