Rasha Abdulla, Claudio Coletta, Carolin Gerlitz, Stefania Guerra, Christoph Lutz, Bernhard Rieder, Antonin Segault, Steven Talbot, Rebekah Tromble
This research was conducted during the Digital Methods Initiative's 2014 Summer School as part of the project "1% Percent of Twitter pt. II - The return of the Geo", coordinated by Carolin Gerlitz and Bernhard Rieder.
We began with a general interest in the way the current crisis in Iraq is being represented and discussed in multiple languages on Twitter. On 9 June 2014 an organization commonly called the Islamic State in Iraq and Syria, or ISIS, captured Mosul, Iraq's second-largest city. Since then, ISIS and various other groups have been battling for control of different parts of the country.
ISIS itself has been quite active on Twitter. The organization created an application called Dawn (or the Dawn of Glad Tidings) that allows it to send tweets from the accounts of the application's users [1][2]. This application was removed from the Android app store on 19 June 2014 [3].
Our initial research question was: Do those tweeting in different languages frame the Iraqi crisis in different ways? However, as we describe in the Findings section below, once we began to analyse the data in more depth, we discovered that ISIS's own use of Twitter created significant problems for interpreting the data. This ultimately shifted our focus to analysing ISIS's effect on the broader Twitter discourse.
We worked with the 1% sample of Twitter collected between 15 June and 23 June 2014 using the TCAT tool (Borra & Rieder 2014; Gerlitz & Rieder 2013). However, for practical reasons we conducted our analysis on just one day in the 1% dataset, 19 June 2014. We chose this day because there had been significant international news coverage of events related to a major Iraqi oilfield, control over which rapidly changed hands several times, and we therefore expected to find a relatively high volume of tweets in multiple languages on that date.
We ran a query on two basic terms--Iraq and ISIS--but in the multiple languages that were represented by the members of our group (English, Arabic, Italian, German, and French) and using multiple variations of the name of ISIS itself. (It is also called ISIL [4], for example.) Devising the query posed a particular challenge for several reasons. First was the difficulty of managing Arabic transliterations into Latin script. We ultimately devised a rather long list of Latin options for one of the Arabic names for ISIS, Daash, but our query is almost certainly not exhaustive.
The second challenge was related to the fact that multiple names are used for ISIS in Arabic. A bit of research on the organization revealed that Daish is actually a derogatory term used by those who oppose ISIS, while the organization and its supporters tend to refer to it alternatively as al-Dawlah or al-Dawlah al-Islamiyah (the State or the Islamic State). Unfortunately, these last two monikers are such common words in Arabic that their inclusion would return an overwhelming number of irrelevant tweets.
Finally, we also faced issues concerning the limitations of queries run in TCAT. Quotation marks cannot be used in queries to request exact matches, and simply querying isis (without quotation marks) returned every tweet containing common terms such as crisis and hashtags such as #thisisme. We ultimately decided to use brackets with a space placed before isis ([ isis]). This ensures that isis is not preceded by additional letters, but, unfortunately, it also filters out all tweets that begin with isis. The full text of the query we ran is as follows:
#isis OR [ isis] OR #isil OR isil OR #eiil OR [ eiil] OR داعش OR Da'ash OR Daesh OR Da2sh OR Da2ish OR Da'ish OR Daish OR Da3esh OR Da3sh OR Iraq OR Irak
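The trade-off of the leading space can be illustrated with a small sketch. It assumes, purely for illustration, that a query term behaves like a case-insensitive substring match on the tweet text; this is a simplification and not a description of how TCAT actually evaluates queries. The function name `matches` and the sample tweets are hypothetical.

```python
def matches(term: str, tweet: str) -> bool:
    """Case-insensitive substring match (an assumed simplification of a query term)."""
    return term.lower() in tweet.lower()


# Hypothetical tweets chosen to show each case discussed above.
tweets = [
    "The crisis deepens in Baghdad",     # "isis" hidden inside "crisis"
    "#thisisme on holiday",              # "isis" hidden inside a hashtag
    "Reports that ISIS captured Mosul",  # "isis" preceded by a space
    "ISIS advances north",               # tweet begins with "isis"
]

# Bare term: matches everything, including the irrelevant tweets.
bare = [matches("isis", t) for t in tweets]

# Term with a leading space: drops the false positives,
# but also drops the tweet that starts with the term.
spaced = [matches(" isis", t) for t in tweets]

print(bare)
print(spaced)
```

Under this assumption the bare term matches all four tweets, while the space-prefixed term matches only the third one, which mirrors both the benefit and the cost described above.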
The study relied mainly on the analysis modules provided by the TCAT tool, including statistics modules such as user and hashtag frequencies, which we used to identify the main tendencies in the dataset. Given our interest in the (potentially multiple) framing(s) of the Iraq crisis, we then focused on the network of hashtag co-occurrences, exported from TCAT and processed with Gephi.
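For readers unfamiliar with co-hashtag networks, the idea can be sketched as follows: two hashtags are linked whenever they appear in the same tweet, and the link weight counts how often that happens. This is a minimal sketch of the general technique, not TCAT's export code; the function name `cohashtag_edges`, the hashtag pattern, and the sample tweets are assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Assumed hashtag pattern; \w also matches Arabic letters in Python 3.
HASHTAG = re.compile(r"#(\w+)")


def cohashtag_edges(tweets):
    """Count how often each pair of hashtags appears in the same tweet."""
    edges = Counter()
    for text in tweets:
        # De-duplicate and lowercase so #Iraq and #iraq count as one node.
        tags = sorted({tag.lower() for tag in HASHTAG.findall(text)})
        for a, b in combinations(tags, 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical sample tweets.
sample = [
    "Clashes reported #iraq #mosul",
    "Update #Iraq #Mosul #news",
    "No hashtags here",
]
edges = cohashtag_edges(sample)
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

The resulting weighted pairs form an edge list, which is the kind of structure a network tool can lay out and cluster.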
TCAT returned a set of 10,404 tweets, with 9,653 unique users and 3,795 tweets with hashtags. The average number of tweets per user was 1.08, with a maximum of 18. Figure 1 provides an overview of these and other basic descriptive statistics.
The most frequent hashtags (those appearing at least 20 times in the dataset) are almost exclusively English- and Arabic-language hashtags. (Only one of the top hashtags, #Irak, was in a language other than English or Arabic.) Figure 2 displays the frequency of these hashtags. The content of these hashtags can be grouped into five logical categories: location, actors, news, statements, and miscellaneous. (See Figure 3.)
These groupings correspond relatively neatly to clusters identified in the co-hashtag analysis produced by TCAT. Figure 4 shows the co-hashtag network, visualized with Gephi. It reveals the hashtag #Iraq as the central node. Closest to this node we find a dense, highly connected cluster of location names (especially countries and cities), primarily in Arabic. At the periphery, we find another well-connected cluster, the news cluster. It includes hashtags such as #Euronews, #sms, and #Breaking.
Interestingly, the hashtag for ISIS is in between different clusters and somewhat isolated. However, it groups with some other closely related terms: #Isil, #No2isis, etc.
At first glance it appeared as if the co-occurrence network might point to different framings of the Iraqi crisis based on the languages in which users were tweeting. As noted, the largest, densest, and most central cluster was formed primarily by Arabic location name hashtags, while most of the English-language hashtags were located in different clusters or were isolates. Initially we thought this might indicate that Arabic tweets were focusing on the Iraqi crisis as a regional event, while those tweeting in English developed more diffuse framings that were less concerned with the regional implications of the crisis.
However, as we began to dig deeper into the data--exploring several clusters, specific hashtags and users--we began to discover that the use of many of these hashtags was not as it seemed. Instead, as described in the following sections, we quickly realized that the "Twitterverse" around the Iraqi crisis has been significantly impacted--in fact, we can probably say manipulated--by ISIS itself.
The news cluster
In the co-hashtag network visualisation, we identified a small, highly connected cluster of news-related hashtags (#world, #breaking, #lemonde, #news). When browsing the tweets using these hashtags, we noticed that some of them used unusual combinations, such as associating #lemonde and #fox, which belong to two different media groups. We also discovered that every URL in these tweets pointed to an article on the same website, mojahedin.org. We therefore suspected that part of the news cluster was produced by these deceptive tweets. Consider the following example tweets:
#Iran Tire workers in #Tehran staging protest rally http://t.co/o5FNHYvG8s #iraq #LONDON #Belgium #FOX #Euronews #sydney #Syria #world
Al Jazeera: Iraq-#Syria border passages controlled by armed men http://t.co/FqovJnTJ1V #oman #columbus #BreakingNews #Columbia #FOX
Turkey evacuates consulate in Basra Iraq http://t.co/RNscZ00HRr #News #Breaking #usa #AlJazeera #FOX #sydney #sms #politics #Euronews
We then processed the hashtag-user network to analyse the user activity around this cluster. As shown in Figure 5, it revealed that just five highly active users (including the three most active users in our dataset) were responsible for almost all of these tweets. These users were all sharing numerous links to the mojahedin.org website and using a lot of hashtags. Their profiles identify them as Iranians who oppose the current Iranian and Iraqi governments and support ISIS and the Syrian revolution.
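The kind of check behind this observation can be sketched in a few lines: count the tweets each user contributed to a hashtag cluster and measure the share produced by the most active accounts. This is an illustrative sketch only; the function name `top_user_share`, the input layout of (user, hashtags) pairs, and the sample data are hypothetical and do not reproduce our dataset or TCAT's hashtag-user export.

```python
from collections import Counter


def top_user_share(tweets, cluster_tags, n=5):
    """Share of a cluster's tweets posted by its n most active users.

    tweets: iterable of (user, list_of_hashtags) pairs (assumed layout).
    cluster_tags: set of lowercase hashtags defining the cluster.
    """
    counts = Counter(
        user
        for user, tags in tweets
        if cluster_tags & {tag.lower() for tag in tags}
    )
    total = sum(counts.values())
    top = sum(count for _, count in counts.most_common(n))
    return top / total if total else 0.0


# Hypothetical data: u1 posts two cluster tweets, u2 and u4 one each.
sample = [
    ("u1", ["FOX", "world"]),
    ("u1", ["Euronews"]),
    ("u2", ["news"]),
    ("u3", ["iraq"]),      # not part of the news cluster
    ("u4", ["breaking"]),
]
news_cluster = {"fox", "euronews", "news", "breaking", "world"}
share = top_user_share(sample, news_cluster, n=2)
print(share)
```

A share close to 1 for a very small n would indicate, as in our case, that a cluster is driven by a handful of accounts rather than by broad usage.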
The hashtag #No2ISIS occurred 59 times according to the TCAT hashtag frequency query, but manual inspection showed that only 57 results contained #No2ISIS in the text of the tweet; the other two came from users who included #No2ISIS in their profile descriptions. Of the 57 tweets with #No2ISIS, 6 (10.5%) used the hashtag to promote pro-ISIS content. These users employed pro-ISIS language in their tweets, identifying ISIS as "The State," and one user even posted a video mourning a child whose death was attributed to a NATO bombing. The majority of the content was a recirculation of a single picture advocating co-operation between Sunni and Shia people.
Iraq Liberated (in Arabic)
A total of 229 tweets in Arabic used the Arabic words for liberated, mostly centered around the hashtag Iraq_is_liberated (in Arabic). They were posted by 227 distinct users. Links were used in 59 (25.8%) of those tweets. Many of the tweets were repeated, sometimes over 20 times per tweet, each time by a different user, which clearly indicates bot activity. Also interesting is the fact that the repeated tweets included a time code in the text of the tweet itself, so the identical tweet frequency analysis did not catch them and only caught their retweets. In short, little original content appears to have been produced, and all of it was pro-ISIS.
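One way to surface such near-duplicates is to strip the time code before comparing tweets. The sketch below assumes the time code looks like HH:MM or HH:MM:SS; the actual format in the tweets is not documented here, so the pattern `TIME_CODE`, the function names, and the sample tweets are all hypothetical.

```python
import re

# Assumed time-code format (HH:MM or HH:MM:SS); adjust to the real data.
TIME_CODE = re.compile(r"\b\d{1,2}:\d{2}(?::\d{2})?\b")


def normalise(text):
    """Remove time codes, collapse whitespace, and lowercase the text."""
    stripped = TIME_CODE.sub("", text)
    return " ".join(stripped.split()).lower()


def duplicate_groups(tweets, min_users=2):
    """Group (user, text) pairs by normalised text; keep texts shared by several users."""
    users_by_text = {}
    for user, text in tweets:
        users_by_text.setdefault(normalise(text), set()).add(user)
    return {
        text: users
        for text, users in users_by_text.items()
        if len(users) >= min_users
    }


# Hypothetical tweets that differ only in their embedded time code.
sample = [
    ("user_a", "Iraq is liberated 14:02"),
    ("user_b", "Iraq is liberated 14:07"),
    ("user_c", "Iraq is liberated 15:31"),
    ("user_d", "Something unrelated"),
]
groups = duplicate_groups(sample)
print(groups)
```

An exact-match comparison would treat the first three tweets as distinct, whereas the normalised comparison groups them as one message posted by three different accounts.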
Of the 227 accounts that posted the tweets, 92 had under 100 followers; 85 had between 100 and 1,000 followers; 49 had between 1,001 and 5,300 followers; and only one had over 24,000 followers. That one popular account is now suspended. In terms of language, just over half (51.5%) were registered as English accounts, and 45% were registered as Arabic.
#Iraqi_revolution (in Arabic)
Another pro-ISIS hashtag was Iraqi_revolution (in Arabic), which produced 99 tweets by 93 distinct users. The same pattern of user information was detected. Interestingly, however, some of the tweets were retweeted from the account @IRAQIRevolution, which has over 50,000 followers, even though that user does not seem to have any original tweets in the selection (meaning that the original tweets did not show up in the document). We are not sure whether this is because the original user did not use the hashtag Iraqi_revolution, which would then have been added later by the retweeter, or because the TCAT tool might have some issues with tweets that mix Arabic characters, Latin characters, and special symbols (such as . or a single quote).
Iraq_news01 was the most mentioned user in our query, with a total of 830 mentions. The user was identified as a pro-ISIS actor, as indicated by their use of "The State" and their posting of pro-ISIS content. Iraq_news01 had 68.3 thousand followers and claimed to provide "coverage around the clock or the liberation of Iraq ... Information Office of the Sunni resistance". Interestingly, a sample of 100 of the 830 Twitter user profiles showed that 43% of those who mentioned Iraq_news01 had been suspended by Twitter. This may indicate that these users tended to install and use the ISIS Dawn app; however, more analysis is needed.
Hashtags do not automatically convey meaning, but they guide us in identifying some trends.
(To be improved)
E. Borra and B. Rieder (2014) "Programmed method: developing a toolset for capturing and analyzing tweets", Aslib Journal of Information Management, Vol. 66, No. 3, pp. 262-278.
C. Gerlitz and B. Rieder (2013) "Mining One Percent of Twitter: Collections, Baselines, Sampling", M/C Journal, Vol. 16, No. 2.
- 27 Jun 2014