A genealogy of problematic information on Twitter and YouTube content moderation practices (draft version)

Team Members

Emillie de Keulenaar, Ivan Kisjes, Dorus Kok, Niels Willemsen, Gaurika Chaturvedi, Margherita Di Cicco, Liza Osakue, Josine Haas, Thomas de Boer, Matías Valderrama, Kandi Aryani Suwito, NienEn Liu, Son Nguyen, Adam Kouki, Alexander Teggin and Emilie Schwantzer


1. Introduction

Following myriad controversies, including harassment campaigns and the dissemination of conspiratorial narratives (Jeong, 2019; Venturini et al., 2018), inter-ethnic violence and genocide (Mozur, 2018) and political extremism (Ganesh, 2018), social media platforms and cloud hosts have had to devise measures to address user-generated content ranging from “bad” to “ugly”. YouTube and Twitter, in particular, have developed content moderation techniques that prevent the circulation of “problematic information” (Jack, 2017): deletion or “deplatforming” (Rogers, 2020), automatic flagging of content as misleading or false (Gorwa et al., 2020), and demotion or “shadow-banning” (Myers West, 2018) of content in search and recommendation results (Goldman, 2021). As emerging literature on platform governance demonstrates (e.g., Gorwa, 2019; Douek, 2020), content moderation targets a wide range of content, from “misleading information” or “borderline content” to Holocaust denial, genocidal language, cross-cultural racism, and apologia for National Socialism (Roth & Pickles, 2020; YouTube, 2019).

These measures reflect a profound shift in platforms’ content moderation philosophies after 2015, catalyzed by a different conception of the ugliness of user-generated content. Until then, Twitter and YouTube had largely left the flagging and reporting of problematic content to users’ discretion (Crawford and Gillespie, 2016). Partly in response to ISIS’s use of social media, the proliferation of racist speech on and off the Web (Conway, 2020), and regulatory pressure from states and civil society (Gorwa, 2019), content moderation policies eventually shifted towards a platform-centered, top-down model of content regulation. Broad concerns about “spamming”, “inauthentic behavior” or general “hate speech” were gradually broken down into more specific forms of public and personal harm, and the responsibility to moderate content through flagging or reporting slowly shifted from users to the platforms themselves.

The purpose of this project is to understand how ideas of “problematic information” have changed in content moderation policies, practices (techniques) and user debates. In other words, it aims to produce a conceptual and technical “genealogy” of problematic information: tracing how definitions of problematic information have shifted in content moderation policies and user debates, and how they have been operationalised into content moderation techniques (deplatforming, demotion, flagging, etc.).

We hypothesise that platforms’ growing concern with the ugliness of user behavior led them to doubt users’ capacity for self-regulation. This reflects a profound change in platforms’ content moderation philosophy: the perceived link between “online” user-generated content and the “offline” violence of events like Charlottesville and the Capitol Hill riots encouraged them to embrace top-down forms of moderation as a last resort for keeping their offline responsibilities from expanding further.

3. Research Questions

  1. How have ideas about hateful and other “problematic” speech evolved within Twitter’s and YouTube’s content moderation policies?

  2. How have ideas of hate, incitement to violence and the potential “ugliness” of user-generated content and behavior affected YouTube’s and Twitter’s content moderation techniques over time?

  3. How did users’ own conceptions of problematic speech correspond to these policies and the measures they put forth?

4. Methodology and initial datasets

Overall objective

Using a combination of digital methods and content moderation analysis, we examine how Twitter and YouTube have conceived of harm in relation to hate speech, misleading information and incitement to violence, and how these conceptions affected their content moderation practices between 2014 and 2021.

Step 1: Historicising content moderation policies

We used the Wayback Machine to trace the history of Twitter’s and YouTube’s content moderation policies, recording changes per month. We compiled the findings following the format of this spreadsheet, focusing on:

  1. What counts as problematic, and how problematic content is defined;

  2. Examples of problematic content;

  3. The techniques used to deal with problematic content;

  4. Other relevant information listed in the spreadsheet.
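The monthly tracing in Step 1 can be scripted against the Wayback Machine’s CDX server. The sketch below is a minimal illustration, not the project’s actual pipeline: it builds a CDX query that collapses captures on the first six digits of the timestamp (year and month), so that at most one successful capture per month is returned, and turns the JSON response into replay URLs. The function names are hypothetical, and which policy page URLs to query is left to the researcher.

```python
import json
from typing import List, Tuple
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def monthly_snapshot_query(page_url: str, start_year: int, end_year: int) -> str:
    """Build a CDX query listing at most one successful capture per month."""
    params = {
        "url": page_url,
        "output": "json",
        "from": str(start_year),
        "to": str(end_year),
        "fl": "timestamp,original",
        "filter": "statuscode:200",
        # Collapse on the first 6 timestamp digits (YYYYMM): one capture per month.
        "collapse": "timestamp:6",
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_rows(raw_json: str) -> List[Tuple[str, str]]:
    """Turn a CDX JSON response into (YYYY-MM, replay URL) pairs."""
    rows = json.loads(raw_json)
    if not rows:
        return []
    header, *captures = rows  # the first row holds the field names
    ts_idx = header.index("timestamp")
    url_idx = header.index("original")
    snapshots = []
    for row in captures:
        ts, original = row[ts_idx], row[url_idx]
        month = f"{ts[:4]}-{ts[4:6]}"
        # The "id_" modifier requests the archived page without the replay toolbar.
        snapshots.append((month, f"https://web.archive.org/web/{ts}id_/{original}"))
    return snapshots
```

Fetching each replay URL (for instance with `urllib.request.urlopen`) then yields one HTML version of the policy page per month, which can be compared and annotated in the spreadsheet.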

We also quantified the number of rules that appeared over time.
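Counting rules requires deciding what counts as a “rule” in the page markup. The following sketch assumes, purely for illustration, that each rule is rendered as an HTML list item and each policy section as an `<h2>` or `<h3>` heading; real policy pages will likely need a page-specific selector. `count_rules` and `rules_over_time` are hypothetical helpers built on Python’s standard `html.parser`.

```python
from collections import Counter
from html.parser import HTMLParser
from typing import Dict, List, Tuple


class RuleCounter(HTMLParser):
    """Counts list items (assumed rules) and h2/h3 headings (assumed sections)."""

    def __init__(self) -> None:
        super().__init__()
        self.counts: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.counts["rules"] += 1
        elif tag in ("h2", "h3"):
            self.counts["sections"] += 1


def count_rules(html: str) -> Dict[str, int]:
    """Return the number of assumed rules and sections in one page snapshot."""
    parser = RuleCounter()
    parser.feed(html)
    parser.close()
    return {"rules": parser.counts["rules"], "sections": parser.counts["sections"]}


def rules_over_time(snapshots: Dict[str, str]) -> List[Tuple[str, int]]:
    """Map {YYYY-MM: html} to a chronologically sorted list of (month, rule count)."""
    return [(month, count_rules(html)["rules"]) for month, html in sorted(snapshots.items())]
```

Because months are stored as `YYYY-MM` strings, sorting them lexically also sorts them chronologically.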

Step 2: Historicising content moderation techniques

We then cross-checked how changes in content moderation policies affected problematic information in selected datasets.

  1. For deplatforming, we check the availability of content over time;

  2. For demotion, we check the ranking and engagement of contents over time;

  3. For flagging, we check the moderation prompts (labels, etc) over time;

  4. For user reporting, we check affordances like reporting and flagging in YouTube’s and Twitter’s user interfaces over time, using the Wayback Machine.
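The checks listed above amount to re-observing the same items at successive dates and comparing their states. A minimal sketch, assuming a hypothetical `Observation` record per check (a date, an availability status and, where retrieved, a search or recommendation rank): `first_moderation` dates the first time an item stops being plainly available (deplatforming, age-gating), and `rank_shift` measures how far it moved in the results, a possible though not conclusive sign of demotion. The status labels are illustrative and not taken from any platform API.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Observation:
    """One check of a single tweet or video on a given date (hypothetical schema)."""

    checked_on: date
    status: str  # illustrative labels, e.g. "available", "removed", "suspended", "age_restricted"
    rank: Optional[int] = None  # position in search or recommendation results, if retrieved


def first_moderation(history: List[Observation]) -> Optional[Observation]:
    """Return the earliest observation in which the item is no longer plainly available."""
    for obs in sorted(history, key=lambda o: o.checked_on):
        if obs.status != "available":
            return obs
    return None


def rank_shift(history: List[Observation]) -> Optional[int]:
    """Rank difference between the last and first ranked observations.

    A positive value means the item moved further down the results.
    """
    ranked = [o for o in sorted(history, key=lambda o: o.checked_on) if o.rank is not None]
    if len(ranked) < 2:
        return None
    return ranked[-1].rank - ranked[0].rank
```

Dating the first change of status makes it possible to line up each item’s moderation with the policy changes traced in Step 1.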

Step 3: Documenting methods for studying the Web “after deplatforming”

Which techniques can we use and combine to archive moderated content and moderation practices? In past projects (here and here), we relied on scraping moderation prompts and on retrieving metadata such as search rankings and other traces of so-called “algorithmic moderation” (e.g., demotion) from platform APIs. This paper seeks to extend this documentation; possible outputs include methodological “recipes” for researchers to use and collaborate on.
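One such recipe is detecting moderation prompts in scraped or archived page text. The sketch below matches a small set of regular expressions against the page text. The phrasings in `PROMPT_PATTERNS` are illustrative assumptions loosely modelled on notices Twitter and YouTube have displayed; their exact wording has changed over time, so they should be checked against archived pages before use.

```python
import re
from typing import List

# Illustrative phrasings only (assumption): the wording of platform notices has
# changed over time and must be verified against archived pages.
PROMPT_PATTERNS = {
    "suspended_account": r"account (?:has been )?suspended|from a suspended account",
    "rule_violation": r"violated the twitter rules|violating youtube's community guidelines",
    "hate_speech_removal": r"policy on hate speech",
    "account_terminated": r"account associated with this video has been terminated",
    "age_gate": r"confirm your age",
    "misleading_label": r"\bdisputed\b|\bmisleading\b",
}


def classify_prompts(page_text: str) -> List[str]:
    """Return the label of every prompt pattern found in the page text."""
    # Lower-case and normalise curly apostrophes so "YouTube’s" matches "youtube's".
    text = page_text.lower().replace("\u2019", "'")
    return [label for label, pattern in PROMPT_PATTERNS.items() if re.search(pattern, text)]
```

Keeping the patterns in one dictionary makes it easy to add platform- or period-specific phrasings as they are found in the archive.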

5. Findings

  • Finding 1: Until 2018, all of Twitter’s and YouTube’s content moderation policies are assembled into one main page (“terms of service” or “community guidelines”). From 2018 onwards, the platforms begin to outline increasingly specific conditions for users (not) posting hate speech, misinformation and incitement to violence. These developments are likely due to accusations of impassivity from news media with regard to the Rohingya genocide in 2016–2017 and the Charlottesville riots of 2017, and to growing fears of harm and violence in relation to COVID-19 and the 2020 U.S. elections. Slides 5-6.

  • Finding 2: Zooming into the specific genealogy of hate speech, incitement and misinformation on YouTube and Twitter, we see that all three types of problematic information stem from a common denominator: spam, or any type of instrumentalised misuse of Twitter and YouTube as a product. “Spam” then evolves into systematic “harassment” within anti-hate speech policies. Impersonation (using false identities for extortion, etc.) partly evolves into coordinated inauthentic behavior (CIB) deployed for specific motives, e.g. elections (political bots, etc.). Slides 7-9.

  • Finding 3: As content moderation policies become increasingly context-specific, so do content moderation techniques. There are at least three main phases in the evolution of techniques for moderating hate speech, incitement and misinformation. (1) Twitter and YouTube stipulate very broad punishments for harassment and spam, such as “user suspension” or “content deletion”, while recommending that users report or flag these behaviors themselves. (2) In light of the U.S. fake news debacle of 2016, the Rohingya massacre, Charlottesville and other high-profile cases, platforms become far more proactive in moderating problematic content, introducing sweeping deplatforming measures. (3) With COVID-19 and the U.S. elections, Twitter and YouTube introduce more sensible content moderation techniques, such as the “strikes” system (1, 2, 3 strikes > permanent suspension) and opportunities for users to return to the platform after a temporary suspension (“a chance of redemption”). Slides 10-28.
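The difference between the blanket punishments of phases (1) and (2) and the graduated techniques of phase (3) can be made concrete with a toy model of a strikes ledger. This is a hypothetical sketch of the “1, 2, 3 strikes > permanent suspension” scheme, not either platform’s actual rules: penalty durations, strike limits and expiry periods differ by platform and have changed over time.

```python
from dataclasses import dataclass


@dataclass
class StrikeLedger:
    """Hypothetical strikes ledger: each strike below the limit brings a temporary
    suspension; reaching the limit brings permanent suspension."""

    limit: int = 3
    strikes: int = 0
    permanently_suspended: bool = False

    def add_strike(self) -> str:
        """Record a violation and return the resulting (illustrative) penalty."""
        if self.permanently_suspended:
            return "permanent suspension"
        self.strikes += 1
        if self.strikes >= self.limit:
            self.permanently_suspended = True
            return "permanent suspension"
        return f"temporary suspension (strike {self.strikes} of {self.limit})"

    def expire_strike(self) -> None:
        """Let one strike lapse, unless the account is already permanently suspended."""
        if not self.permanently_suspended and self.strikes > 0:
            self.strikes -= 1
```

Letting strikes lapse while permanent suspension stays final is one simple way to encode the opportunity to come back after a temporary suspension.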

How was content related to hate speech, incitement to violence and misinformation moderated?

  • Finding 4: Incitement to violence is moderated in very different ways depending on the context. While most tweets related to #StopTheSteal are temporarily or permanently suspended for glorification of violence or for infringing upon the election integrity policy, most content linked to the (second) Nagorno-Karabakh war is still up or private. On YouTube, videos depicting massacres, threats and punitive measures between Azerbaijani and Armenian users can still be accessed after viewers confirm their age. Slides 30-32.

  • Finding 5: YouTube’s content moderation policy against hate speech evolved in relation to the types of hate speech posted on the platform. Though it always sanctioned explicit slurs, the platform takes notice of more covert hateful language used by communities invested in scientific racism, and adapts its policy accordingly. Slide 33.

  • Finding 6: With the introduction of the “strikes” system, we see that a minority of users have been allowed back onto Twitter and YouTube. Slide 36.
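A figure such as the share of users allowed back can be estimated from account status timelines collected over time. The following is a minimal sketch under assumed inputs: each account maps to a chronologically ordered list of hypothetical status labels (`"active"`, `"suspended"`), and an account counts as reinstated if it is observed active again after its first suspension.

```python
from typing import Dict, List


def reinstatement_share(timelines: Dict[str, List[str]]) -> float:
    """Share of ever-suspended accounts later observed as active again.

    `timelines` maps an account handle to its chronologically ordered statuses.
    """
    suspended = reinstated = 0
    for statuses in timelines.values():
        if "suspended" not in statuses:
            continue  # never suspended: outside the denominator
        suspended += 1
        first = statuses.index("suspended")
        if "active" in statuses[first + 1:]:
            reinstated += 1
    return reinstated / suspended if suspended else 0.0
```

Restricting the denominator to ever-suspended accounts keeps the measure focused on redemption rather than on overall account survival.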

6. Discussion

  1. Twitter’s and YouTube’s moderation seeks to model itself on “context”: it tries to capture problematic situations, often “after the fact”. Slide 41.
  2. As the platforms are confronted with increasingly complex situations (elections, COVID-19), their moderation techniques try to become more “sensible”: removal is replaced by suspension, strikes, and sometimes a chance of redemption. Slide 41.

8. References

Conway, M. (2020) ‘Routing the Extreme Right’, The RUSI Journal, 165(1), pp. 108–113. doi:10.1080/03071847.2020.1727157.

De Keulenaar, E. et al. (2021) ‘A free market in extreme speech: Scientific racism and bloodsports on YouTube’, Digital Scholarship in the Humanities. doi:10.1093/llc/fqab076.

De Keulenaar, E., Burton, A.G. and Kisjes, I. (2021) ‘Deplatforming, demotion and folk theories of Big Tech persecution’, Fronteiras - estudos midiáticos, 23(2), pp. 118–139. doi:10.4013/fem.2021.232.09.

Gorwa, R. (2019) ‘The platform governance triangle: conceptualising the informal regulation of online content’, Internet Policy Review, 8(2), pp. 1–22. doi:10.14763/2019.2.1407.

Jardine, E. (2019) ‘Online content moderation and the Dark Web: Policy responses to radicalizing hate speech and malicious content on the Darknet’, First Monday, 24(12). doi:10.5210/fm.v24i12.10266.

Myers West, S. (2018) ‘Censored, suspended, shadowbanned: User interpretations of content moderation on social media platforms’, New Media & Society, 20(11), pp. 4366–4383. doi:10.1177/1461444818773059.

Pohjonen, M. and Udupa, S. (2017) ‘Extreme Speech Online: An Anthropological Critique of Hate Speech Debates’, International Journal of Communication, 11, p. 19. Available at: https://ijoc.org/index.php/ijoc/article/view/5843 (Accessed: 16 February 2021).

Rogers, R. (2020) ‘Deplatforming: Following extreme Internet celebrities to Telegram and alternative social media’, European Journal of Communication, p. 0267323120922066. doi:10.1177/0267323120922066.

Shen, Q. and Rose, C. (2019) ‘The Discourse of Online Content Moderation: Investigating Polarized User Responses to Changes in Reddit’s Quarantine Policy’, in Proceedings of the Third Workshop on Abusive Language Online. Florence, Italy: Association for Computational Linguistics, pp. 58–69. doi:10.18653/v1/W19-3507.

Siapera, E. and Viejo-Otero, P. (2021) ‘Governing Hate: Facebook and Digital Racism’, Television & New Media, 22(2), pp. 112–130. doi:10.1177/1527476420982232.

Tushnet, R. (2019) ‘The Constant Trash Collector: Platforms and the Paradoxes of Content Moderation’, Jotwell: The Journal of Things We Like (Lots), 2019, p. 1. Available at: https://heinonline.org/HOL/Page?handle=hein.journals/jotwell2019&id=336&div=&collection=.

Vaccaro, K., Sandvig, C. and Karahalios, K. (2020) ‘“At the End of the Day Facebook Does What It Wants”: How Users Experience Contesting Algorithmic Content Moderation’, Proceedings of the ACM on Human-Computer Interaction, 4(CSCW2), pp. 167:1–167:22. doi:10.1145/3415238.

Van Dijck, J., de Winkel, T. and Schäfer, M.T. (2021) ‘Deplatformization and the governance of the platform ecosystem’, New Media & Society, p. 14614448211045662. doi:10.1177/14614448211045662.

Venturini, T. et al. (2018) ‘A Field Guide to “Fake News” and Other Information Disorders’, SSRN [Preprint]. Available at: http://fakenews.publicdatalab.org/ (Accessed: 25 June 2018).
Topic revision: r1 - 27 Feb 2022, EmilieDeKeulenaar
This site is powered by Foswiki. Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.