Data-Driven User Journalism: The Case of the Afghan War Diary
Camilo Cristancho, Catalina Iorga, Matteo Cernison
Is there an alternative account of the Afghan War Diary 2004 - 2010 documents released by WikiLeaks, a "multi-jurisdictional public service designed to protect whistleblowers, journalists and activists who have sensitive materials to communicate to the public" (1)?
In other words, is the data hosted by WikiLeaks used in ways other than those of the mainstream media represented by WikiLeaks' official partners?
Collect all inlinks to Afghan War Diary 2004 - 2010
- Observe which is the common root of all document URLs, namely 'http://wardiary.wikileaks.org/afg/event'
- Query Google by using the Google Scraper to obtain the first 1000 results which contain this common root as a textual component.
- Submit the 95 obtained webpages (treated here as the top 100) to the Link Ripper in order to extract all outlinks to specific Afghan War Diary 2004 - 2010 document pages.
- Insert the Link Ripper output into the Harvester in order to alphabetize the obtained URLs and remove textual descriptions.
- Manually clean the output by searching again for the common root 'http://wardiary.wikileaks.org/afg/event' in an Excel file, and produce a separate list of Afghan War Diary 2004 - 2010 document URLs.
- Analyze the resulting list of 179 non-unique URLs, select all document pages that receive at least two links (following the Issue Crawler logic), and save them to a file: the 17 'most mentioned' warlogs.
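
The last two steps (isolating document URLs by their common root and keeping those linked at least twice) were done manually in Excel in this project. As an illustration only, they could be sketched in Python as follows. The function name, the sample URLs and the choice to count every occurrence of a link (including repeats from the same page) are assumptions made for this sketch, not part of the original workflow.

```python
from collections import Counter

# Common root of all document URLs, as given in the steps above.
ROOT = "http://wardiary.wikileaks.org/afg/event"


def most_mentioned(outlinks, min_links=2):
    """Return {document_url: link_count} for documents linked at least min_links times.

    `outlinks` is a flat list of URLs, one entry per link found on the
    harvested pages (so the list is non-unique by design).
    """
    # Keep only the outlinks that point at a war diary document page.
    doc_urls = [url.strip() for url in outlinks if url.strip().startswith(ROOT)]
    # Count how many times each document page is linked.
    counts = Counter(doc_urls)
    # Apply the "at least two links" threshold.
    return {url: n for url, n in counts.items() if n >= min_links}


# Hypothetical sample input: these paths are made up for illustration
# and do not correspond to real documents.
sample_outlinks = [
    ROOT + "/example-a.html",
    ROOT + "/example-b.html",
    "http://example.org/some-blog-post",
    ROOT + "/example-a.html",
]

selected = most_mentioned(sample_outlinks)
for url, n in sorted(selected.items()):
    print(n, url)
```

Applied to the real list of 179 non-unique URLs, a filter of this kind would be expected to reproduce the manual selection, provided links are counted the same way as in the Excel file.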
As shown by the graphs in the attached presentation, content syndication was based on local interest. For example, UK political blogger James Barlow referred to British-related entries, not necessarily commenting on them but rather listing a collection of links. Thus, the level of engagement with the actual data is very low, given the entries' extremely technical language.
The Afghan War Diary documents were usually not directly referenced; blog entries and news stories relied heavily on the reports and databases put together by WikiLeaks' official media partners, namely Der Spiegel, The Guardian and The New York Times.
Issues and Limitations
The highly technical language of the war diaries (military terms and codes) made them difficult to analyze individually, meaning that the envisioned content-based search did not occur, especially given the limited resources and time span of this particular project.
Based on such a reserved linking practice, the future of data-driven user journalism looks bleak. The Afghan War Diary 2004 - 2010 was a unique opportunity to deal with first-hand military information and to criticize crucial matters like the violation of human rights and unjust killings. If these documents are indeed discussed independently of linking or of major media outlets, then this analysis is happening underground, and it needs to surface for a true alternative account to emerge. The only beacon of hope in such a dark landscape, where only 17 documents are linked at least twice, is a blogger, Peak of Elephants, who astutely observes that most civilian shootings happened because of rebounds (2). That is one user on the entire Web who comments on the documents and simultaneously links to them.
Content analysis is expected to be useful for following syndication practices and, through them, identifying non-hyperlinked networks. In other words, careful examination of how documents are discussed without being linked to could shed new light on the distribution and circulation of these highly controversial pieces of classified information. Special emphasis should be placed on the reusability of content in order to avoid problems such as the undecipherable technicality of the original Afghan War Diary.
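
A first, rough step towards such an examination could be to separate pages that link to the documents from pages that only mention them. The sketch below is a minimal illustration under stated assumptions: the function name, the category labels and the list of mention phrases are hypothetical, and a real content analysis would need a far richer vocabulary than two keyword patterns.

```python
import re

# Common root of all document URLs, as used in the inlink collection.
ROOT = "http://wardiary.wikileaks.org/afg/event"

# Hypothetical phrases taken as a sign that a page discusses the documents.
MENTION = re.compile(r"afghan war diary|war ?logs?", re.IGNORECASE)


def classify(page_text, page_links):
    """Label a page by how it relates to the Afghan War Diary documents."""
    links_to_docs = any(link.startswith(ROOT) for link in page_links)
    mentions_docs = bool(MENTION.search(page_text))
    if links_to_docs:
        return "linked"
    if mentions_docs:
        return "mentioned, not linked"
    return "unrelated"


# Hypothetical pages, made up for illustration.
print(classify("The war logs describe a checkpoint incident.", ["http://example.org/"]))
print(classify("See the original entry.", [ROOT + "/example-a.html"]))
```

Pages in the "mentioned, not linked" group would be the candidates for the non-hyperlinked networks described above.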