An exploration of named entities and epistemic keywords in conspiratorial Instagram posts (2011-2020)

Facilitator: Tom Willaert

Group members: Dania Elahi, Zuzanna Kędzia, Cristina Pita, Sofia Rastelli, Tommy Shane, Elisa Serafinelli, Marc Tuters, Fabio Votta

Winterschool presentation slides: DMI2021-Instagram.pdf

Summary and Key Findings

In this DMI winter school project, we sought to learn more about the contents and rhetorical features of online conspiratorial discourse by analyzing the texts accompanying over 600,000 conspiratorial Instagram posts published between 2011 and 2020. We were particularly interested in mapping which persons and organizations were most frequently mentioned in these texts, and in gaining a sense of how conspiratorial rhetoric on Instagram forms around a set of so-called ‘epistemic keywords’: traces of epistemic activity in online spaces related to knowing, discovering, sense-making, theorizing, evidencing, doubting or persuading (Shane, 2021). To this end, we used the NLP technique of Named Entity Recognition to extract references to persons and organizations from the texts, and we explored the usage of the epistemic keywords by visualizing their contexts as word trees. Our analysis of the named entities suggests that the conspiracy posts most frequently antagonize political and industrial leaders, converging on figures including ‘Bill Gates’ and ‘George Soros’. Furthermore, the prominence of religious references stood out, suggesting a particularly close connection between religious and conspiratorial discourse. Our exploration of epistemic keywords provided a large sample of traces indicative of further intertwined discursive and rhetorical strategies, including references to embodied knowledge (‘trust your gut/intuition/instincts/body’), religion, Bible verses, and QAnon phrases (e.g. ‘Trust the plan’). These preliminary findings suggest that the study of online conspiracy theories can gain a lot from looking closely at Instagram texts, and that these analyses can complement earlier work on hashtags and images (Votta, 2020). Finally, a diachronic analysis of named entity co-occurrences and additional epistemic markers, keywords and phrases holds the promise of providing insights into the dynamics through which disparate conspiracy theories converge into more coherent narratives.

Background and research questions

One of the main questions confronting the analysis of online conspiracy theories is understanding the mechanisms through which a disparate, antagonistic ‘multiverse of hate’ is reduced to a limited set of more coherent narratives (Maxmen & Ball, 2020). Our work aims to contribute to a theory of what could be called this ‘conspiracy singularity’ by conducting a data-driven investigation of conspiratorial textual traces scraped from social media. We thereby build on the outcomes of the 2020 DMI Summer School project on Instagram’s COVID-19 conspiracy Tribes, which explored how several popular conspiracy theories involving China, 5G, Bill Gates, QAnon, flat earth, and the deep state propagated across social media platforms by analyzing, among others, hashtag co-occurrences. Our project in particular focused on the analysis of the texts of conspiratorial Instagram posts, with the aim of answering a double research question: 1) Which persons and organizations are mentioned in these texts?, and 2) Which epistemic keywords figure in the texts and in which contexts are these used? This research focus is motivated by the idea that named entities and epistemic keywords provide insights into who these conspiracy posts are about, and how these (antagonized) persons and organizations are presented. In turn, mapping the structural relations between textual features can provide an understanding of the dynamics of convergence that mark these online conspiracies.


We base our exploration on a dataset of ca. 600,000 Instagram posts (2011-2020) that contain at least one of 82 hashtags identified as related to conspiracies, or that were posted from one of 66 known conspiratorial accounts that frequently use those hashtags (for details, see Votta, 2021). Most of the posts appeared in 2020, which means many of them are related to recent conspiratorial thought in the context of the COVID-19 pandemic. This dataset was previously analyzed in the aforementioned Instagram’s COVID-19 conspiracy Tribes project. For our provisional purposes, we decided to focus on mapping the range and diversity of entities and epistemic keywords mentioned in the aggregated dataset, rather than to investigate cross-temporal dynamics (also see ‘Conclusion and future work’).


We approached the Instagram dataset through a quali-quantitative cycle in which we combined computational text analysis methods with manual data annotation and close reading.

Named entity recognition

In order to probe the dataset for names of persons and organizations, we used a technique from natural language processing called Named Entity Recognition (NER). We parsed the body (text) of the Instagram posts using the Python library spaCy and preserved those entities labelled as persons (‘PERSON’) or organizations (‘ORG’) by the algorithm. The initial results were automatically cleaned in order to remove @-mentions (usernames) and hashtags, thus preserving only those entities that were actually discussed in the main text of the post. The full lists of retrieved persons and organizations were then shared among the researchers via Google Sheets, where the results were manually cleaned and annotated. This iterative annotation process involved looping over the data in order to classify the retrieved persons and organizations by their relevant societal domains, such as Politics, Business, Government and administration, etc. Throughout the process, we kept track of the counts of each of the retrieved terms.
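The extraction and automatic cleaning steps described above could be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the model name `en_core_web_sm`, the helper names, and the rule of dropping any entity that starts with `@` or `#` are all assumptions made for the example.

```python
from collections import Counter

KEEP_LABELS = {"PERSON", "ORG"}  # the two entity types retained in the report


def extract_entities(texts, model_name="en_core_web_sm"):
    """Yield (entity_text, label) pairs for PERSON and ORG entities.

    Assumes spaCy and the named model are installed; the model name is an
    illustrative choice, not one stated in the report.
    """
    import spacy  # imported lazily so the helpers below run without spaCy

    nlp = spacy.load(model_name)
    for doc in nlp.pipe(texts):
        for ent in doc.ents:
            if ent.label_ in KEEP_LABELS:
                yield ent.text, ent.label_


def is_mention_or_hashtag(entity_text):
    """Return True for usernames (@...) and hashtags (#...)."""
    return entity_text.strip().startswith(("@", "#"))


def count_entities(pairs):
    """Count (entity, label) pairs, skipping @-mentions and hashtags."""
    counts = Counter()
    for text, label in pairs:
        cleaned = " ".join(text.split())  # collapse stray whitespace
        if cleaned and not is_mention_or_hashtag(cleaned):
            counts[(cleaned, label)] += 1
    return counts
```

Calling `count_entities(extract_entities(post_texts))` on a list of post texts would then give a frequency table that can be exported for manual cleaning and annotation; classifying entities by societal domain remains a manual step.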

Epistemic keywords

Following Shane (2021), epistemic keywords are queryable traces of epistemic activity in online spaces related to knowing, discovering, sense-making, theorizing, evidencing, doubting or persuading. As a heuristic for identifying conspiratorial discourse, they can complement the use of more explicit search terms such as named entities: they are, for instance, less likely to disappear from view when explicit terms are banned or shadow-banned, and they allow for the identification of less explicit rumoring and conspiracy theory activity.

Figure: Instagram post example

Starting from an initial list of such epistemic keywords, we examined their prominence in the corpus of texts through regular expressions and a frequency analysis. We then explored the usage of ten highly frequent terms by means of Jason Davies’s word trees tool. This included instances of the keywords ‘apparently’, ‘truth’, ‘research’, ‘fact’, ‘fake’, ‘trust’, ‘lies’, ‘questions’, ‘awakening’, and ‘evidence’. Our examination of these keywords served the double purpose of probing some of the epistemic linguistic patterns, phrases and rhetorical devices that characterize the discourse in the data, and of snowballing new words and phrases to add to the epistemic keywords toolbox.
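A keyword frequency count of the kind described above might look like the following sketch. The ten keywords are taken from the text; the whole-word, case-insensitive matching and the helper names are assumptions for illustration, and the `right_contexts` function only approximates the branches a word tree would display rather than reproducing Jason Davies’s tool.

```python
import re
from collections import Counter

# The ten keywords named in the report.
KEYWORDS = [
    "apparently", "truth", "research", "fact", "fake",
    "trust", "lies", "questions", "awakening", "evidence",
]

# Assumption: whole-word, case-insensitive matching.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def keyword_frequencies(texts):
    """Count keyword occurrences across all post texts."""
    counts = Counter()
    for text in texts:
        counts.update(m.group(1).lower() for m in PATTERN.finditer(text))
    return counts


def right_contexts(texts, keyword, width=3):
    """Count the word sequences (up to `width` words) that follow a keyword."""
    contexts = Counter()
    for text in texts:
        tokens = re.findall(r"[\w']+", text.lower())
        for i, token in enumerate(tokens):
            if token == keyword:
                contexts[" ".join(tokens[i + 1 : i + 1 + width])] += 1
    return contexts
```

Whole-word matching means that a plural such as ‘facts’ is not counted under ‘fact’; whether inflected forms should be merged is a design choice that this sketch leaves open.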



Persons and organizations

Based on the retrieved named entities, we found that many of the persons figuring in our dataset pertain to the political domain, whereby former American president Donald Trump is the most mentioned actor in the whirlpool of antagonistic conspiratorial comments. The same holds for the organizations, where administrations such as NASA, the UN, the Senate and the Pentagon are most prominent. It thus appears that within the already alarming scenario of the pandemic, political conspiratorial paranoia has been a popular response to the efforts of governments and administrations. Our analysis of epistemic keywords (cf. infra) further foregrounded a large set of narratives that cast doubt on corona vaccinations and vaccination strategies. According to Hellinger (2018), such political conspiracies are symptomatic of an ongoing decline of democratic values, with lockdown restrictions threatening perceived individual agency, and administrative inconsistencies, incongruous information and opaque policies undermining trust in political representatives. In addition to politics, it is clear from the data that references to religion play a significant role in conspiratorial narratives. In fact, Jesus, God, Lucifer and other biblical actors are among the most frequently referenced figures. Philosopher Karl Popper saw in the unfolding of conspiratorial theories a ‘secularisation of religious superstition’, where the faith in divine gods was associated with more modern gods, such as democratic politics and neoliberal capitalism. The posts under analysis indeed reveal this process, whereby allusions to spiritual and religious prophecies are operationalized to support wide-ranging claims that mix and combine conspiratorial strands. This is exemplified by the prominent ID2020 conjecture, which asserts that the biblical ‘Mark of the Beast’ corresponds to a chip that the business magnate Bill Gates (a common antagonist in our dataset, along with George Soros) was allegedly scheming to implant along with COVID-19 vaccinations (Thomas and Zhang, 2020).

Overview of organizations mentioned in conspiracy-related Instagram posts (2011-2020)


Overview of persons mentioned in conspiracy-related Instagram posts (2011-2020)


Epistemic keywords in context

Our ‘word tree’ explorations of epistemic keywords and their contexts primarily revealed the efficacy of this approach as a method for the analysis of conspiratorial texts. It allowed us to detect highly frequent expressions such as ‘truth is stranger than fiction’, ‘trust your gut/intuition/instincts’, ‘do your own research’, ‘look at the facts and the truth’ or ‘the great awakening’, which hint at overarching conspiratorial logics that extend across theories. Hyperbolic talk is abundantly present in this domain, and even a preliminary look at the epistemic keywords revealed a range of dialectical and contradictory lines of reasoning. Indeed, phrases such as ‘do your own research’ are used alongside ‘trust your gut’, and the phrase ‘look at the facts and statistics’ appears next to its counterpart ‘fake science, fake tests, fake numbers, fake statistics’.


Conclusion and future work

This research project has demonstrated that the texts of conspiracy-related Instagram posts can be an interesting source for studying the contents and rhetorical features of conspiracy theories. We used the heuristic of named entities as a method of exploring the contents of the posts, revealing groups of antagonized persons and organizations. Similarly, we searched for a set of core epistemic keywords as a means of finding different traces of epistemic activity in the texts. Combined, both methods provided an initial understanding of how the diverse theories and messages that mark the Instagram conspiracy landscape tie together. This is a promising first step towards a more detailed account of the dynamics through which disparate conspiracy theories and antagonisms converge into coherent narratives.


Hellinger, D. C. (2018). Conspiracies and conspiracy theories in the age of Trump. Springer.

Maxmen, A. & Ball, P. (2020). The epic battle against coronavirus misinformation and conspiracy theories. Nature. May 27, 2020. (Retrieved January 30, 2021).

Shane, T. (2021). Epistemic keywords (forthcoming)

Thomas, E., & Zhang, A. (2020). ID2020, Bill Gates and the Mark of the Beast: How Covid-19 Catalyses Existing Online Conspiracy Movements. Australian Strategic Policy Institute. (Retrieved January 30, 2021)

Votta, F. (2020). Corona Conspiracyland. The Infodemic on Instagram. (Retrieved January 30, 2021).

Votta, F. (2021). Instagram Conspiracy Dataset. (Retrieved January 30, 2021).

Topic revision: r1 - 30 Jan 2021, TomWillaert