Digital Source Criticism

Team Members

Catherine Somzé, Liu Yang, Tine Schjærff Sørensen, Joe Shaw, Max Grömping, Agata Ludzis-Todorov, Simone Bernardi Pirini


Source criticism is the process by which the authenticity, representativeness, and relevance of an information source are evaluated for a given purpose. One criterion for establishing the reliability of a source is the closeness of a witness to an event in terms of location and time. In this research we focus on the geolocation of tweets and its relation to the events reported in them. The working assumption is that the closer a Twitter user is to the events reported, the more accurate the report (tweet) is likely to be. Geolocated data (known latitude and longitude) was taken as the most accurate information about the actual location of Twitter users, as opposed to the location users indicate themselves.

Research Question & Aim

What effects or differences can be detected in the production and flow of accounts surrounding an event within different degrees of geographical closeness? The aim of this project is to explore differences between local and global accounts of several distinct events that have been reported on Twitter.


For this research, four sets of tweets were created from the Twitter data set corresponding to the query ‘drone/drones’ in the Digital Methods Initiative TCAT archive. The overall query ‘drone/drones’ was chosen for two reasons. Drones, which are unmanned aerial vehicles (UAVs) or ‘Remotely Piloted Aircraft’, have both military and civil uses, so they are likely to become the object of public debate and collective fantasy. They may also have been spotted by civilians (their performance may have been witnessed).

The events related to drone appearance were chosen for their different degrees of political relevance and factuality:

1) Drone accident with Australian triathlete (6 April 2014)

2) Drone strike in Pakistan (3 July 2013)

3) Amazon fake/prank (30 November 2013)

4) Korea - from serious to fun factor (6 May 2014)

Of the four data sets, only the one on the Amazon prank (3) contained a significant amount of geolocated data (known latitude and longitude). A larger share of each data set contained the location chosen by the Twitter users themselves, but these entries mixed country and city names and sometimes mentioned several cities. For this reason (and for lack of time to ‘clean’ the location field in all four data sets so that only the city remained), the choice was made to split each data set into ‘local’ (within the country where the event took place) and ‘global’ locations. This was done by manually going through the location column of each data set and marking a tweet as ‘local’ if the location was in the country of the event.
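The local/global split was done by hand in this project. As a rough illustration only, the same decision could be approximated with a small script. The place list and the function name `classify_location` are assumptions made for this sketch and are not part of the original workflow.

```python
from typing import Iterable

# Hypothetical list of place names for one event (here: Pakistan).
# In the project this judgement was made manually, row by row.
PAKISTAN_PLACES = {"pakistan", "islamabad", "karachi", "lahore", "waziristan"}


def classify_location(location: str, local_places: Iterable[str]) -> str:
    """Return 'local' if a known place name occurs in the free-text location.

    Anything else, including an empty location, is labelled 'global';
    that default is an assumption of this sketch.
    """
    text = location.casefold()
    return "local" if any(place in text for place in local_places) else "global"


if __name__ == "__main__":
    for loc in ["Lahore, Pakistan", "London / Karachi", "Amsterdam"]:
        print(loc, "->", classify_location(loc, PAKISTAN_PLACES))
```

Entries that mention several cities (such as "London / Karachi") are marked local by this substring rule, which is one reason a manual check remains useful.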

Several stages of exploratory research were then conducted with the 6 data sets (3 events, each with local and global data). These included looking at the distribution of keywords in the user locale (to check the accuracy of the sample) and an analysis of the text of each tweet, which showed a greater degree of description in the local accounts. Linked URLs and URL domains were also aggregated over time to show which sources were popular with which audience (global or local). This revealed a few sources of interest that appeared in both audiences. One was a Pakistani news source, which was cited with some time lag between local and global audiences (in an order that does not suggest time zones as a cause). Another showed more attention to the initial reports of a North Korean drone strike than to the later revelation that the ‘drone’ was, in fact, a toilet door. However, this latter anomaly appears to have been caused by two related spam bots, associated with North and South Korea, of which the North Korean bot did not report the hoax aspect. Global and local accounts containing the Pakistani source were then qualitatively assessed for differences in account.

Data extraction
- exclude parrot AND "the drones" AND "@thedrones" AND "@ardrone" AND "DIYdrones"


1) Drone accident with Australian triathlete (6 April 2014) -> lack of local accounts

Search terms: australian OR athlete OR triathlon OR Raija OR Ogden OR Geraldton OR Hospital OR Warren OR Abrams OR Simon OR Teakle OR New Era Film


2) Drone strike in Pakistan (3 July 2013) -> few geotagged tweets (50-100), but we can see the ‘local space’ of the actual event

Search terms: Waziristan OR وزیرستان OR Pakistan OR پاکستان OR Miranshah ميرمشاه‎ OR Miran Shah OR Mirali OR Mir Ali OR Mir Ali Tehsil OR Datta Khel OR Dande Darpa OR Darpa Khel Sarai OR Haqqani OR strike


3) South Korea's 'crashed drone' turns out to be a toilet door!

Search terms: Korea OR Cheonggye OR Cheonggyesan OR Gwacheon 한국 OR 과천시 OR 무인 비행기 OR 무인 항공기 OR 청계산


4) Amazon fake/prank (30 November 2013) -> event discarded due to global resonance and lack of spatial patterns

Keywords: amazon AND parrot
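The extraction step above combines a global exclusion list with per-event search terms joined by OR. A minimal sketch of that logic follows, assuming simple case-insensitive substring matching; how TCAT itself matches terms is not specified here, and the helper name `matches` and the shortened Korea term list are illustrative only.

```python
# Exclusion terms taken from the "Data extraction" line above.
EXCLUDE = ["parrot", "the drones", "@thedrones", "@ardrone", "diydrones"]

# Subset of the Korea search terms, shortened for the example.
KOREA_TERMS = ["korea", "cheonggye", "gwacheon", "청계산"]


def matches(text, include_any, exclude_any=EXCLUDE):
    """True if the tweet contains at least one search term and no excluded term.

    Substring matching is an assumption of this sketch.
    """
    lowered = text.casefold()
    if any(term in lowered for term in exclude_any):
        return False
    return any(term in lowered for term in include_any)


if __name__ == "__main__":
    sample = [
        "Crashed drone in Gwacheon was a toilet door",
        "Flying my Parrot AR drone in Korea",
        "Drone spotted over Berlin",
    ]
    for tweet in sample:
        print(matches(tweet, KOREA_TERMS), tweet)
```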


Temporal pattern of volume of tweets/retweets

The charts show the volume of tweets per hour (absolute numbers; red = local, blue = global).
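As a hedged illustration of how such hourly counts could be produced, the sketch below buckets tweets by hour and by audience. The input shape (ISO timestamp plus a 'local'/'global' label) and the function name `hourly_volume` are assumptions; the timestamps in the example are invented for demonstration and are not project data.

```python
from collections import Counter
from datetime import datetime


def hourly_volume(tweets):
    """Count tweets per (group, hour) bucket.

    `tweets` is an iterable of (iso_timestamp, group) pairs, where group
    is 'local' or 'global'. Both field names are hypothetical.
    """
    counts = Counter()
    for created_at, group in tweets:
        # Truncate the timestamp to the start of its hour.
        hour = datetime.fromisoformat(created_at).replace(
            minute=0, second=0, microsecond=0
        )
        counts[(group, hour)] += 1
    return counts


if __name__ == "__main__":
    demo = [
        ("2013-07-03T08:15:00", "local"),
        ("2013-07-03T08:45:00", "local"),
        ("2013-07-03T08:50:00", "global"),
        ("2013-07-03T09:05:00", "global"),
    ]
    for (group, hour), n in sorted(hourly_volume(demo).items()):
        print(group, hour.isoformat(), n)
```

Because the counts are absolute, a small local audience can be hard to see next to a large global one; dividing each bucket by the group's total would give a relative view.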



The Pakistan event had 2 peaks (first: the news breaks; second: Al Qaeda operatives killed).

The Korea event also had 2 peaks (first: discovery of the ‘drone’; second: ‘it is a toilet door’).

The Australia event had almost no local resonance and no clear pattern of news diffusion (although, since the numbers are absolute, local resonance is hard to see).

To identify sources of interest and see how they propagate in the local and global audiences, the top linked URLs were extracted (for Pakistan and Korea only). For the Pakistan event the top source was "Express Tribune" and for the Korea event it was "EIN News". It is worth noting that the "Pakistan Tribune" information enters the global arena before entering the local arena.

Figures: Pakistan_tribune.png, Korea_EIN_news_global.png
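The comparison of when a source first appears in each audience can be sketched as follows. This is an illustration under assumptions, not the procedure actually used: the record layout, the function names `first_seen_by_group` and `lag`, and the example.org URLs are all placeholders.

```python
from datetime import datetime
from urllib.parse import urlparse


def first_seen_by_group(records):
    """Return {domain: {group: earliest datetime}}.

    `records` is an iterable of (iso_timestamp, group, url) tuples;
    this layout is hypothetical.
    """
    first = {}
    for created_at, group, url in records:
        domain = urlparse(url).netloc.lower()
        if domain.startswith("www."):
            domain = domain[4:]
        when = datetime.fromisoformat(created_at)
        groups = first.setdefault(domain, {})
        if group not in groups or when < groups[group]:
            groups[group] = when
    return first


def lag(first, domain):
    """Local minus global first appearance; positive means global came first."""
    seen = first[domain]
    return seen["local"] - seen["global"]


if __name__ == "__main__":
    demo = [
        ("2013-07-03T06:00:00", "global", "http://www.example.org/story"),
        ("2013-07-03T07:30:00", "local", "http://example.org/story"),
        ("2013-07-03T09:00:00", "global", "http://example.org/update"),
    ]
    first = first_seen_by_group(demo)
    print(lag(first, "example.org"))
```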

Content analysis

For every event, an analysis of content (the text contained in the tweets) was prepared, divided into local and global discourse. All word clouds are attached.
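Word clouds are built from word frequencies. A minimal sketch of the counting step, run once per local or global set, might look like this; the stopword list, the tokenisation rule, and the function name `word_frequencies` are assumptions for illustration, and the tool actually used to draw the clouds is not stated in this report.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "in", "of", "to", "and", "is", "rt"}


def word_frequencies(texts, stopwords=STOPWORDS):
    """Count words across tweet texts, ignoring URLs and stopwords."""
    counts = Counter()
    for text in texts:
        # Remove links so URL fragments do not pollute the counts.
        text = re.sub(r"https?://\S+", " ", text)
        # Runs of letters only (Unicode-aware), lowercased.
        for token in re.findall(r"[^\W\d_]+", text.casefold()):
            if token not in stopwords:
                counts[token] += 1
    return counts


if __name__ == "__main__":
    local = [
        "Drone strike in Waziristan",
        "Another drone over Waziristan http://example.org/x",
    ]
    print(word_frequencies(local).most_common(3))
```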

For the Pakistan event, an additional analysis was carried out on tweets without URLs (local and global). The objective was to capture more subjective, spontaneous and emotional expressions.

For example, Waziristan and other more specific place names in Pakistan appear in the local word cloud.

Sample tweet: user @FahadNaeem3

Someone from Waziristan told me even it's a thunder storm the children n families run in open field thinking it might b a drone attack


In the global non-URL tweets for Pakistan, the name Obama appears, for example.

Sample tweet: user Mngxitama

„Bush ordered 50 drones strikes in 8 years. Obama has ordered 375 in four and a half" - pro Adekeye Adebajo. Xolela Mangcu needs education.
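
Selecting the tweets without URLs, as done for the Pakistan event, can be sketched with a simple pattern test. The regular expression and the function name `without_urls` are assumptions of this example; shortened links in other forms would need extra handling.

```python
import re

# Matches http:// and https:// links; an assumption about what counts as a URL.
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)


def without_urls(tweets):
    """Keep only tweets whose text contains no link."""
    return [text for text in tweets if not URL_PATTERN.search(text)]


if __name__ == "__main__":
    demo = [
        "Drone strike reported http://example.org/news",
        "Children run into open fields fearing a drone attack",
    ]
    print(without_urls(demo))
```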



Findings

1) The majority of tweets about any given event are not tweeted from the location of the event.

2) Local accounts of events have more specificity.

3) Despite (2), the local account of an event can be strongly led by the global account.

Attachments (file, size, date, uploaded by, comment)
Australia_global_text.png, 52 K, 28 Jun 2014 - 19:12, LiuYang
Australia_local_text.jpg, 98 K, 28 Jun 2014 - 19:14, LiuYang
Korea_global_text.jpg, 70 K, 28 Jun 2014 - 19:15, LiuYang
Korea_local_text.jpg, 104 K, 28 Jun 2014 - 19:16, LiuYang
Pakistan_global_text.jpg, 62 K, 28 Jun 2014 - 19:26, LiuYang
Pakistan_local_text.jpg, 76 K, 28 Jun 2014 - 19:28, LiuYang
(file name missing), 3 MB, 28 Jun 2014 - 19:41, LiuYang, comment: tribune_URL_flow_Pakistan
Topic revision: r7 - 30 Jun 2014, Agata