MemoryMining (06 Jul 2012, ElenaMorenkovaPerrier)

Team Members:

Albrecht Hofheinz

Ali Honari

David Moats

Elena Morenkova

Ea Ryberg

Anton Sokolov

Wojtek Walczak


The role that social media (Facebook, Twitter) played during the Arab Spring of 2011 is still attracting the attention of scholars (see the Arab Social Media Report of the Dubai School of Government and also "The Revolutions were Tweeted").

However, the framing of these events differed substantially across countries as well as across internet spheres (news, blogs etc.). We chose to apply Digital Methods to the less "social" but potentially far more influential media of news sources and blogs.

Our project started with questions about memory and conflict but quickly turned to more practical questions of time. How can we mine past events when the internet is stuck in a "perpetual present"? How good a "memory" do our tools have? We posed three questions at different scales of analysis to test the limits of Digital Methods for historical studies.

We originally hoped to make use of the CorText platform, which effectively handles multiple languages, including Arabic, but due to conflicts between the DMI Text Ripper and the upload interface we were forced to work on English sites (with the exception of Global Voices, which translates important blogs from throughout the world into English and other languages).

Research Questions

1. Regions: Who popularised the terms "Facebook revolution" and "Twitter revolution" respectively, Western or Arab news sources?

2. Spheres: How does the language of Egyptian news sources compare to the language of prominent Egyptian blogs?

3. Platforms: How do the specificities of platforms format knowledge about the uprising?

Each question was addressed using a set of targeted sites compiled with the help of expert Albrecht Hofheinz, drawing on previous research. These fell into four groups: Western news, pan-Arabic news, Egyptian news sites and important Egyptian blogs.



We also came to focus on the Qatari news service Al Jazeera and the international political blog aggregator Global Voices.

1. [Western / Arab News Sources] Facebook / Twitter Revolution

It is a common assertion that the revolution in Egypt and elsewhere in the Arab region was a "Facebook revolution" or a "Twitter revolution". But was this how it was seen in the region, or was this a Western spin on the uprisings?


Western media:

To build our corpus of sites discussing these terms we first established a list of Western news sources and ran it through the DMI Google scraper.

This was mostly a question of query design - which went through many iterations. We started with

("arab spring" "egypt*" ~revolution "facebook") OR ("arab spring""egypt*" "facebook revolution")
("arab spring" "egypt*" ~revolution "twitter") OR ("arab spring""egypt*" "twitter revolution")

"egypt*" ~revolution "facebook"
"egypt*" "facebook revolution"
("egypt*" ~revolution "twitter") OR ("egypt*" "twitter revolution")

but got too many irrelevant results for "~revolution" and therefore decided to query the Western media only for:

1st query:
("egypt*" "facebook revolution")
("tunisia*" "facebook revolution")

2nd query:
("egypt*" "twitter revolution")
("tunisia*" "twitter revolution")

Arab media:

For Arab media we did not include the countries in the query, on the basis of the media already being on the 'inside':

("facebook revolution")
("twitter revolution")

We then created several queries, slicing the time period from Dec. 5, 2010 to March 12, 2011 by week.
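The page does not say how the weekly slices were generated, so the following is only a minimal sketch of one way to do it. The function name `weekly_slices` is hypothetical; the start and end dates are the ones given above.

```python
from datetime import date, timedelta


def weekly_slices(start, end):
    """Return (first_day, last_day) pairs covering start..end in 7-day steps."""
    slices = []
    current = start
    while current <= end:
        # The last slice is clipped to the end date if the span is not a
        # whole number of weeks.
        last = min(current + timedelta(days=6), end)
        slices.append((current, last))
        current = last + timedelta(days=1)
    return slices


slices = weekly_slices(date(2010, 12, 5), date(2011, 3, 12))
for first, last in slices:
    print(first.isoformat(), "to", last.isoformat())
```

Each pair can then be used as the date range of one query.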

Using Excel we parsed the number of unique articles from each domain.
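The counting was done in Excel. For readers who prefer a script, the same step could be sketched in Python as below. The sample URLs are invented, and treating two URLs as the same article when host and path match (ignoring "www.", trailing slashes and query strings) is an assumption that would be wrong for sites that identify articles by query string.

```python
from collections import defaultdict
from urllib.parse import urlparse


def unique_articles_per_domain(urls):
    """Count distinct article paths per domain in a list of result URLs."""
    seen = defaultdict(set)
    for url in urls:
        parsed = urlparse(url.strip())
        domain = parsed.netloc.lower()
        if domain.startswith("www."):
            domain = domain[4:]
        # Assumption: host plus path identifies an article.
        seen[domain].add(parsed.path.rstrip("/"))
    return {domain: len(paths) for domain, paths in seen.items()}


# Hypothetical scraper output, for illustration only.
sample_urls = [
    "http://www.example-news.com/world/story-1",
    "http://example-news.com/world/story-1/",
    "http://www.example-news.com/world/story-2",
    "http://another-example.org/2011/02/post",
]
counts = unique_articles_per_domain(sample_urls)
print(counts)
```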


Mentions of either "Twitter revolution" or "Facebook revolution" in Western media preceded and dwarfed the ones in the Arab media.

Mentions peaked, as expected, in the weeks when the Tunisian and Egyptian presidents were forced to leave office.

The Huffington Post pushed the idea of a social media revolution to a much larger extent than other outlets, and held on to the concept more 'stubbornly'.


A problem with the Google (scraper?) queries: Google occasionally misreads date formats on certain websites. For example, it read 1/4/2011 as 4 January when the page meant 1 April, thus misrepresenting the first occurrence of "Twitter Revolution" in Middle Eastern media.
(We chose to ignore this one example, discovered manually, in the results sheet; lack of time prevented further cleaning.)
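The ambiguity is easy to reproduce. The sketch below parses the same string under both conventions with Python's standard `datetime` module. The helper `is_ambiguous` is a hypothetical check that could flag such dates for manual review; it is not something the project used.

```python
from datetime import datetime

raw = "1/4/2011"

# Month-first reading (common on US sites): 4 January 2011.
as_month_first = datetime.strptime(raw, "%m/%d/%Y").date()
# Day-first reading (common elsewhere): 1 April 2011.
as_day_first = datetime.strptime(raw, "%d/%m/%Y").date()

print(as_month_first, as_day_first)


def is_ambiguous(slash_date):
    """True when both readings are valid calendar dates and they differ."""
    first, second, _year = slash_date.split("/")
    return int(first) <= 12 and int(second) <= 12 and int(first) != int(second)
```

Any date flagged this way needs the site's own convention to be checked before it is placed on a timeline.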

2. Revolution Blogs and Al Jazeera

Another common question in the literature is whether the Egyptian blogs or mainstream news sites drove discussion about the revolution (see mainly KHANFAR Wadah (2001), "Al Jazeera and the Arab Spring").

The initial ambition was to compare the top 10 Egyptian revolutionary blogs with prominent Arab news sites, but because the Googlescraper was overloaded and time was short we limited our corpus to the blogs on the one hand and Al Jazeera news on the other.


Al Jazeera:

For this we used a simpler query, this time applied to Al Jazeera news.

( "egypt*" ~revolution)

Top blogs:

For the top 10 Egyptian revolution blogs we used the Googlescraper with the query "revolution".

Time slicing: we compared the framing within two key periods of the #arabspring: January 2011 (the period before the revolution) and November 2011 (the first free parliamentary elections, and public unrest in Egypt).

We then used the DMI Text Ripper to extract article text from the list of URLs found via Googlescraper.

This gave us enough time slices to compare the framing of the revolution within a sample of top Egyptian blogs and Al Jazeera.


[stacked area graph]

The findings are only very slight. The blogs are more varied, perhaps because of their publishing schedules.

[Wordij comparison]

(January 2011 and November 2011 for blogs; January 2011 and November 2011 for Al Jazeera)


The Text Ripper naturally extracts certain artefacts, such as HTML tags or advertisements, from websites when it rips. This makes comparisons between sites somewhat difficult, as the Text Ripper's method may leave (incidental) platform-specific words in the text.
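To show where such artefacts come from, here is a minimal text extractor built on Python's standard `html.parser`. It is not the Text Ripper's method, which this page does not describe; it is only a stand-in. It drops tags and the contents of `script` and `style` elements, but navigation menus and advertisement text would still come through, which is the kind of platform-specific noise mentioned above.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of script and style."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Invented page, for illustration only.
sample = (
    "<html><head><style>p {color: red}</style></head><body>"
    "<p>The <b>revolution</b> continues.</p>"
    "<script>var ad = 1;</script></body></html>"
)
print(visible_text(sample))
```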

3. Profile: Al Jazeera and Global Voices

Having encountered some definite limits, we decided to circumvent Google by focusing on two individual sites. These were Al Jazeera and Global Voices, chosen because they are widely read sites whose markup would be easy to mine. Global Voices translates Arabic articles into English, which was one way of addressing the language problem. Wojtek Walczak wrote individual scripts to extract not only URLs but also metadata such as author, date created and date modified. The time stamps in particular would potentially give us the ability to study linear time rather than simply slices.


Al Jazeera:

We evaluated the resonance of the term "revolution" across 16 Middle Eastern countries over a six-month time span: one month before and five months after the "Arab Spring" occurred. We queried for the keyword "revolution" together with <country name> for each of the 16 countries in a set of pages with Googlescraper.

Queries used for Googlescraper to scrape Al Jazeera news across the countries:

("revolution" "Algeria*")

("revolution" "Bahrain*")

("revolution" "Egypt*")

("revolution" "Iran*")

("revolution" "Iraq*")

("Revolution" "Israel*")

("Revolution" "Jordan*")

("Revolution" "Kuwait*")

("Revolution" "Lebanon*")

("Revolution" "Libya*")

("Revolution" "Morocco*")

("Revolution" "Palestine*")

("Revolution" "Saudi Arabia*")

("Revolution" "Syria*")

("Revolution" "Tunisia*")

("Revolution" "UAE")

("revolution" "United Arab Emirates")

("Revolution" "Yemen*")


Time span: December 2010-May 2011
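The list of queries above follows one pattern, so it can be generated rather than typed. The sketch below is one way to do that. The names `COUNTRY_TERMS` and `build_query` are hypothetical, and the keyword is lower-cased throughout, whereas the list above mixes "revolution" and "Revolution".

```python
# (term, use_wildcard) pairs as they appear in the query list above.
# "UAE" and "United Arab Emirates" are two spellings of one country and
# are queried without a wildcard.
COUNTRY_TERMS = [
    ("Algeria", True), ("Bahrain", True), ("Egypt", True), ("Iran", True),
    ("Iraq", True), ("Israel", True), ("Jordan", True), ("Kuwait", True),
    ("Lebanon", True), ("Libya", True), ("Morocco", True),
    ("Palestine", True), ("Saudi Arabia", True), ("Syria", True),
    ("Tunisia", True), ("UAE", False), ("United Arab Emirates", False),
    ("Yemen", True),
]


def build_query(keyword, term, wildcard):
    """Format one query string in the style used on this page."""
    suffix = "*" if wildcard else ""
    return f'("{keyword}" "{term}{suffix}")'


queries = [build_query("revolution", term, wc) for term, wc in COUNTRY_TERMS]
for query in queries:
    print(query)
```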

Global Voices:

For Global Voices we downloaded 360 articles categorized under Egypt's "Country archive". Of the downloaded articles, the oldest was posted in October 2009 and the newest in June 2012. The URL structure of Global Voices is quite clear: the Egypt archive has its own URL, and the second and later pages of results are reached through URLs that end in a page number. By changing the number at the end of the URL one can access the following pages of results. The articles are sorted by date, so the larger the number, the older the articles returned. In this case the articles from pages 1 to 48 were downloaded. To download the articles we used the GNU Wget package and a simple Python script that incremented the number in the URL and ran Wget for every URL. After the files were downloaded, another Python script extracted the data about the articles.
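The original download script is not reproduced on this page. A minimal sketch of the approach described above could look as follows. The URL template is a placeholder, not the real Global Voices address, and the output file names are invented; `-q` and `-O` are standard GNU Wget options.

```python
import subprocess

# Placeholder only: substitute the real archive URL pattern.
ARCHIVE_URL_TEMPLATE = "https://example.org/egypt/page/{page}/"


def page_urls(template, first_page, last_page):
    """Build one archive URL per results page."""
    return [template.format(page=n) for n in range(first_page, last_page + 1)]


def wget_command(url, output_path):
    """Return the Wget invocation that saves url to output_path."""
    return ["wget", "-q", "-O", output_path, url]


def download_all(template, first_page, last_page):
    """Run Wget once per page (needs wget installed and network access)."""
    urls = page_urls(template, first_page, last_page)
    for number, url in enumerate(urls, start=first_page):
        output_path = f"egypt_page_{number:02d}.html"
        subprocess.run(wget_command(url, output_path), check=True)


urls = page_urls(ARCHIVE_URL_TEMPLATE, 1, 48)
print(len(urls), "pages, from", urls[0], "to", urls[-1])
```

Calling `download_all(ARCHIVE_URL_TEMPLATE, 1, 48)` would fetch the 48 pages; it is left uncalled here so that the sketch runs without network access.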

The regular expression for extracting the metadata about a single article from the HTML file was:

'<h3>[\w\W]*?<a href="([\W\w]*?)" title="([\W\w]*?)" rel="bookmark">[\W\w]*?<span class=\'credit-text\'>[\w\W]*?<span>Written by <a href=\'([\W\w]*)\' title=\'[\W\w]*?\'>([\W\w]*?)</a></span></span>[\W\w]*?<span class=\'primary-category\'><a href="[\W\w]*?" title="[\w\W]*?">([\W\w]*?)</a>, <a href="[\W\w]*?" title="[\W\w]*?">([\W\w]*?)</a></span>[\W\w]*?<p class=\'excerpt-text\'>([\W\w]*?)</p>'

The parenthesised groups capture, in order: (1) the URL of the article, (2) the title of the article, (3) the URL of the author's profile, (4) the author's name, (5) the primary category, (6) the secondary category, (7) the excerpt. This set of variables was extended by a date for every post, extracted from the article's URL.
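As an illustration of how the seven fields and the date can be pulled out together, here is a sketch that applies the expression to an invented HTML fragment. Two things are assumptions rather than the project's code: the third group is written as a lazy match (`*?`) so that it cannot run across several articles on one page, and the date is assumed to appear in the URL as `/YYYY/MM/DD/`.

```python
import re
from datetime import date

ARTICLE_RE = re.compile(
    r'<h3>[\w\W]*?<a href="([\W\w]*?)" title="([\W\w]*?)" rel="bookmark">'
    r"[\W\w]*?<span class='credit-text'>[\w\W]*?<span>Written by "
    r"<a href='([\W\w]*?)' title='[\W\w]*?'>([\W\w]*?)</a></span></span>"
    r"[\W\w]*?<span class='primary-category'>"
    r'<a href="[\W\w]*?" title="[\w\W]*?">([\W\w]*?)</a>, '
    r'<a href="[\W\w]*?" title="[\W\w]*?">([\W\w]*?)</a></span>'
    r"[\W\w]*?<p class='excerpt-text'>([\W\w]*?)</p>"
)
# Assumed URL layout: .../YYYY/MM/DD/slug/
DATE_RE = re.compile(r"/(\d{4})/(\d{2})/(\d{2})/")
FIELDS = ("url", "title", "author_url", "author",
          "primary_category", "secondary_category", "excerpt")


def parse_articles(html):
    """Return one dict per article, with a 'date' taken from the URL."""
    articles = []
    for groups in ARTICLE_RE.findall(html):
        record = dict(zip(FIELDS, groups))
        found = DATE_RE.search(record["url"])
        record["date"] = date(*map(int, found.groups())) if found else None
        articles.append(record)
    return articles


# Invented markup that follows the structure the expression expects.
ENTRY = """<h3><a href="http://example.org/{d}/{slug}/" title="{t}" rel="bookmark">{t}</a></h3>
<span class='credit-text'><span>Written by <a href='http://example.org/author/{a}/' title='{n}'>{n}</a></span></span>
<span class='primary-category'><a href="http://example.org/egypt/" title="Egypt">Egypt</a>, <a href="http://example.org/politics/" title="Politics">Politics</a></span>
<p class='excerpt-text'>{e}</p>
"""
SAMPLE = (
    ENTRY.format(d="2011/01/25", slug="first", t="First post", a="jane",
                 n="Jane Doe", e="An excerpt.")
    + ENTRY.format(d="2010/12/05", slug="second", t="Second post", a="john",
                   n="John Roe", e="Another excerpt.")
)
articles = parse_articles(SAMPLE)
for article in articles:
    print(article["date"], article["author"], article["title"])
```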


Fig. 1

Description: frequencies of the most popular categories associated with Egypt-oriented blog posts on GlobalVoices (October 2009 - June 2012).

Fig. 2

Description: Network of Global Voices authors and words from excerpts for October, November and December of 2010.

The above network shows a few groups of authors who were covering Egypt-oriented news on Global Voices prior to the Arab Spring. The three core authors mostly covered the internal affairs of Egypt and Egypt's relations with Western countries. A group of authors located on the left side of the network mostly covered the relations between Middle Eastern countries and Egypt. The authors located on the right side of the network, on the other hand, mostly covered the relations between Egypt and the other African countries.
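The page does not say which tool produced the author-word network. As a rough sketch of the underlying data structure, the edges of such a two-mode network can be counted as below. The posts, the tiny stopword list and the function name `author_word_edges` are all invented for illustration.

```python
import re
from collections import Counter

# Deliberately tiny stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "of", "and", "in", "to", "on", "with"}


def author_word_edges(posts):
    """Count, per (author, word) pair, how many excerpts link the two."""
    edges = Counter()
    for author, excerpt in posts:
        words = set(re.findall(r"[a-z]+", excerpt.lower())) - STOPWORDS
        for word in words:
            edges[(author, word)] += 1
    return edges


# Invented (author, excerpt) pairs.
posts = [
    ("Author A", "Egypt and the elections"),
    ("Author A", "Elections in Egypt"),
    ("Author B", "Egypt and Sudan"),
]
edges = author_word_edges(posts)
for (author, word), weight in sorted(edges.items()):
    print(author, word, weight)
```

The resulting weighted edge list can be loaded into a network visualisation tool, where authors who share vocabulary end up close together.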

Fig. 3

Description: Network of Global Voices authors and words from excerpts for January and February of 2011.

While it was quite easy to identify order in the pre-Arab-Spring graph, the level of chaos in the during-Arab-Spring period is noticeably higher. On the Global Voices site, the Arab Spring in its two crucial months (January and February 2011) was covered predominantly by one person, Amira Al Hussaini. In the one-year period from October 2010 to September 2011 Amira Al Hussaini posted 64 articles, 44 of them in January and February 2011. The second most active author, Tarek Amr, posted 33 articles in the same one-year period, 10 of them in January and February 2011.
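Counts of this kind follow directly from the author and date fields of the extracted metadata. A minimal sketch, using invented records and a hypothetical function name, is:

```python
from collections import Counter
from datetime import date


def posts_per_author(posts, start, end):
    """Count posts per author with a posting date inside start..end."""
    return Counter(
        author for author, posted in posts if start <= posted <= end
    )


# Invented (author, posting date) records.
posts = [
    ("Author A", date(2010, 11, 3)),
    ("Author A", date(2011, 1, 26)),
    ("Author A", date(2011, 2, 11)),
    ("Author B", date(2011, 2, 1)),
    ("Author B", date(2011, 6, 15)),
]
peak = posts_per_author(posts, date(2011, 1, 1), date(2011, 2, 28))
year = posts_per_author(posts, date(2010, 10, 1), date(2011, 9, 30))
print(peak.most_common())
print(year.most_common())
```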

Fig. 4

Description: the graphs show how the term "revolution" appeared for the specified countries in December 2010 and how its usage evolved over the following months.

It is often claimed that when the media spotlight focuses on one uprising, it does so at the expense of others. The analysis of a set of Al Jazeera pages (Fig. 4) demonstrated that in this case the focus on the Egyptian uprising amplified the attention paid to uprisings in other regions.

Conclusion and research perspectives

It became clear that working across multiple languages and studying the past are extremely difficult tasks for internet research in general, because of the specificity of the internet (it is impossible to go back in time).

Strangely, the most effective measures were the simplest ones.

In terms of platform-specific research, Global Voices warrants further analysis because of the way it is formatted and because it is the leading aggregator of blogs from all around the world (translating extracts of original texts from non-Western social media); "citizen media" is their own term.


The Al Jazeera Effect (2008)

Twitter, Facebook and YouTube's role in Arab Spring (2012)

Social Media's role in Arab Spring still unclear (2011)

Arab Spring was really social media revolution (2011)

Study confirms social media's revolutionary role in Arab Spring

A. Hofheinz, "The Arab Spring| Nextopia? Beyond Revolution 2.0" (2011)

-- EaRybergDue - 05 Jul 2012
Topic revision: r11 - 06 Jul 2012, ElenaMorenkovaPerrier