The primary outcome of our research is an approach to analysing the renegotiation and contestation of public figures, or /ourguy/, through a five-step protocol. Our second aim was to showcase the protocol’s potential through a case study of political subreddits, examining which public figures their communities discuss and how they discuss them. The protocol supports both cross-subreddit and single-subreddit analysis and can easily be replicated for the study of other subreddits. To compile it, we repurposed subreddit metrics into a toolkit for characterizing web communities. First, suitable subreddits are selected against a set of criteria. Next, the most frequently mentioned names in those subreddits are extracted using natural language processing. Third, the relevant public figures are selected from among these names. Fourth, the subreddits are examined with a twofold method that combines contrast analysis and network mapping. Finally, word trees are generated and analysed to understand in more depth how the negotiation surrounding these public figures unfolds. This protocol offers a useful framework for future subreddit analysis and shows that meaningful social research can still be conducted in the post-API era.
Our study combines a cross-subreddit and a single-subreddit analysis of /ourguy/ and has two main aims: first, to develop a protocol to answer our research questions, and second, to characterize the politicising capacity of subreddits. Our research questions offer a way to characterize web communities through their negotiation of, and relation to, particular figures. This characterization shows whether a given public figure belongs to a specific subreddit’s ingroup or outgroup. A cross-subreddit analysis protocol thus lets us see what particular communities talk about and how they relate to particular personalities.
Following Rogers, our cross-subreddit analysis seeks to repurpose medium methods for cultural and social research, an approach that begins with a sensitivity to a particular user culture or subculture (Rogers 2014).
Access to data is the foundation of any research: researchers need a constant and stable flow of information, and access to as much of it as possible. Major social media platforms opened up new perspectives for digital and social research when they provided access to their platforms through Application Programming Interfaces (APIs). Gathering and storing data through APIs (Perriam, Birkbak and Freeman 1) makes it possible to combine qualitative and quantitative research methods (Venturini and Rogers 532). However, in the wake of recent controversies, including the Cambridge Analytica scandal, major platforms such as Facebook and Twitter restricted access to their APIs, limiting social media researchers’ ability to investigate social phenomena (Bruns 1544).
In light of the current state of social media research, authors have called for computational and digital methods suited to the post-API environment (see Bruns; Freelon; Perriam, Birkbak and Freeman; Venturini and Rogers). Such methods include web scraping, the repurposing of medium methods, and digital ethnography. The new paradigm also has positive aspects. One of them is that researchers are encouraged not to depend on mainstream platforms, but to investigate other sources and other ways of collecting and examining records (Venturini and Rogers 532).
With 330 million monthly users, Reddit is one of the world’s largest news aggregators and social media platforms. As a community of communities, the platform readily accommodates a multitude of cultures (Massanari 313). Subreddits form clearly distinct communities around particular topics, each within its own sheltered space. Each has its own sharply defined rules, with the ability to “mirror the diversity of content all over the web, and build sub-communities around it” (Singer 519). Because these communities are so finely subdivided around specific interests, the ways in which users discuss public figures may also differ from one subreddit to another.
Reddit is home not only to niche topics but to almost every topic, from music, films and health to daily news and political discussion. For many internet users, Reddit functions as an alternative news outlet alongside the gatekeeper media (Weninger). The Reddit community is self-referential, focusing on and reinforcing its own user-generated images and textual content over external sources (Singer). The website therefore plays an important role in shaping public political opinion. For example, the subreddit dedicated to Donald Trump, r/The_Donald, set the tone of the 2016 US election for a young and newly politicized generation. In doing so, the subreddit left behind the mainstream media, which desperately tried to catch up with its subcultural, in-joke style. This style is used especially by the two emergent anti-establishment waves of the right and the left. The hierarchical models of business and culture are increasingly being replaced by citizen journalism, user-generated content and the hive mind (Nagle 8).
Even though not every board on Reddit represents a subculture, let alone a political one as r/The_Donald did, using Reddit as a source of political information instead of traditional news outlets suggests that users are looking for alternative opinions or for sub-communities that share their political interests and leanings. Nithyanand et al. give three further reasons for analysing Reddit as a platform for political discussion: it is pseudonymous, with a strongly democratized discussion; it allows for lengthy and complex discussions; and all of its content can be scraped (2).
Since the 2016 US election campaign, political discussions have become substantially more offensive and have increasingly unfolded on “social and democratized media platforms” (Nithyanand et al. 12). This shift is correlated with growing distrust of mainstream media, as users move further towards fringe and alternative news sources. Given that “politicians have a strong incentive to use emotionally charged communication” (Ryan 1149), a growing number of people experience anger. This anger motivates them both to participate in political discussions online and to engage in incivility, since it plays a role in political action (Nithyanand et al. 1).
The growing polarization between the left and right wings in the United States, as opposed to moderate ambivalence (Abramowitz and Saunders), together with the increasingly offensive and uncivil discourse that accompanied the election of Donald Trump (Southern Poverty Law Center), can also be observed on Reddit. After the 2016 election, the rise in offensive and extreme content was more pronounced in conservative subreddits than in Democratic ones (Nithyanand et al.).
Drawing on Mouffe’s theory of agonism, which can be succinctly described as “an existential struggle between the polis (us) and its adversaries (them)” (Tuters and Hagen 5), we can infer that political subreddits operate in a similar manner: in creating and negotiating a collective identity, they form ingroups and outgroups.
How does the practice of reflecting on public figures work as a means of collective political identification within pseudonymous and anonymous Internet subcultures?
Which public figures elicit the most discussion? How can we use computational and digital methods to determine their “contestation” within a given Internet subculture? Are they loved or hated? Do they form an “us” or a “them”?
To answer our research questions, we developed a five-step protocol for cross-subreddit and single-subreddit analysis, which is detailed in this section. To this end, we repurposed subreddit metrics to create a toolkit for characterizing web communities. An important feature of this protocol is that it can easily be replicated on any subreddit.
The first step consists of creating a set of subreddits that serve the purpose of the research. We started by manually browsing the subreddits suggested in the logbook for potential leads regarding people of interest or vernacular markers. Because of the nature of this activity and the relative heterogeneity of the subreddits, we did not find a specific set of textual markers or people that were either mentioned across the subreddits or of particular interest to a specific one. Instead, we decided to analyse four subreddits representative of political web communities. To choose the most appropriate ones, we drew up the following selection criteria:
The subreddit must have over 50,000 subscribers. We set this threshold because many smaller, more niche subreddits exist, which would not offer sufficient data or would not necessarily be representative for our research.
The subreddit should not have the production or sharing of political memes as its main (implicit or explicit) focus. This criterion distinguishes, for example, two subreddits relating to communism: while r/FULLCOMMUNISM has a more memetic character, r/communism is used for more nuanced discussion of the same ideology.
The subreddits should not overlap ideologically and should cover as much of the political spectrum as possible. This posed a limitation on our research: we could not include far-right subreddits, because Reddit usually bans such subreddits within a relatively short time.
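The three criteria can be expressed as a simple decision rule. The sketch below is illustrative only: the `Candidate` record and its fields are hypothetical, the memetic-focus and ideology judgements are assumed to have been annotated by hand beforehand (in the study these were manual judgements), and keeping the larger of two ideologically overlapping subreddits is our own tie-breaking assumption.

```python
from dataclasses import dataclass

# Hypothetical record: one candidate subreddit with hand-annotated fields.
@dataclass
class Candidate:
    name: str
    subscribers: int
    meme_focused: bool  # manual judgement: is meme production/sharing the main focus?
    ideology: str       # manual label used to avoid ideological overlap

MIN_SUBSCRIBERS = 50_000  # "over 50,000", so exactly 50,000 is excluded

def select_subreddits(candidates):
    """Apply the size, non-memetic and no-overlap criteria in turn."""
    selected = []
    seen_ideologies = set()
    # Visit larger subreddits first so that, when two share an ideology,
    # the bigger one is kept (tie-breaking assumption, not from the study).
    for c in sorted(candidates, key=lambda c: c.subscribers, reverse=True):
        if c.subscribers <= MIN_SUBSCRIBERS:
            continue  # criterion 1: too small
        if c.meme_focused:
            continue  # criterion 2: mainly memetic
        if c.ideology in seen_ideologies:
            continue  # criterion 3: overlaps a subreddit already chosen
        seen_ideologies.add(c.ideology)
        selected.append(c.name)
    return selected
```

Because two of the three criteria rest on human judgement, a rule like this only makes the decision explicit and repeatable; it does not replace the manual reading of each board.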
With the above-mentioned criteria in mind, we arrived at the following subreddits: r/socialism, r/anarchism, r/neoliberal and r/conservative. We looked at posts published in 2019, both to focus on recent events and because of time constraints: this kind of analysis is resource-intensive, and a longer period would not have allowed us to complete the research in a timely manner.
After selecting a set of subreddits and a timeframe, we divided the research question into two parts: who is discussed in those spaces, and how they are talked about. For the first part, we used natural language processing to extract a list of names from the post titles in each subreddit, applying spaCy’s named entity recognition (en_core_web_sm model) to data retrieved through the Pushshift API. The output of this analysis was a list of all named entities, which we then processed further to answer the second part of our research question: the way in which these people are discussed on the subreddits.
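Once a recognizer has found the names, the titles still have to be tallied to produce a ranked list. The sketch below shows only that tallying step, under stated assumptions: the recognizer is passed in as a function so the example runs without any NLP library installed, the function name is hypothetical, and counting each name at most once per title (rather than every raw mention) is our own choice.

```python
from collections import Counter
from typing import Callable, Iterable, List

def count_person_mentions(titles: Iterable[str],
                          extract_persons: Callable[[str], List[str]]) -> Counter:
    """Count, for each person name, how many titles mention it.

    `extract_persons` is a placeholder for the NER step: it receives one
    title and returns the PERSON entities found in it.
    """
    counts = Counter()
    for title in titles:
        # Collapse internal whitespace and deduplicate within a title,
        # so a name repeated in one title is counted once.
        names = {" ".join(name.split()) for name in extract_persons(title)}
        counts.update(names)
    return counts
```

With spaCy installed, `nlp = spacy.load("en_core_web_sm")` together with `lambda t: [e.text for e in nlp(t).ents if e.label_ == "PERSON"]` could serve as `extract_persons`; `counts.most_common()` then yields the ranked list of names that the manual filtering starts from.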
After extracting the lists for each subreddit, we filtered them manually to compile a list of the personalities discussed in each subreddit. We then compared these lists across the selected subreddits, identifying personalities that are either subreddit-specific or discussed across all four subreddits.
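The comparison step amounts to set operations on the filtered name lists. A minimal sketch, assuming each subreddit’s list has already been reduced to a set of names; the function name and the example labels in the test are hypothetical.

```python
from typing import Dict, Set, Tuple

def compare_figures(figures_by_subreddit: Dict[str, Set[str]]
                    ) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Split figures into those shared by every subreddit and those unique to one."""
    all_sets = list(figures_by_subreddit.values())
    # Figures that appear in every subreddit's list.
    shared = set.intersection(*all_sets) if all_sets else set()
    specific = {}
    for sub, figures in figures_by_subreddit.items():
        # Union of the names found in all *other* subreddits.
        others = set().union(*(f for s, f in figures_by_subreddit.items() if s != sub))
        specific[sub] = figures - others
    return shared, specific
```

Figures that fall in neither group (discussed on some boards but not all) are where wing-specific choices, such as a figure shared by two subreddits of the same political wing, would be made by hand.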
These lists had to be filtered manually for the comparison to be meaningful. The following criteria were used for this selection:
The people discussed must be living, contemporary public figures. This criterion allows a fair comparison across the board: it prevents us from comparing, for example, Karl Marx, one of the most popular figures on the left-wing subreddits, with Donald Trump, who is heavily discussed on all subreddits.
We selected two types of public figures: politicians and non-politicians. Using the lists described above, we first identified politicians popular across all the subreddits. We then selected wing-specific public figures who are not politicians per se but hold strong political positions. This ensured a cross-subreddit overview while leaving room for in-depth single-subreddit analysis.
The subreddit-specific personalities we analysed were Ben Shapiro for r/neoliberal and r/conservative, and Noam Chomsky for r/anarchism and r/socialism, while the cross-subreddit figures were Joe Biden and Bernie Sanders. Donald Trump appeared most often in all of our subreddits in 2019. Nevertheless, we did not include him in our list: as president of the United States, he is already broadly discussed, politicised and controversial, which leaves little room for web communities to reappropriate him further.
We queried 4CAT for the period from 1 January 2019 to 1 January 2020 to obtain a recent overview of the discussion of these personalities. At this stage, rather than looking at post titles, we used the body of the posts in the subreddits to examine exactly how these personalities are discussed. The 4CAT queries were as follows:
[bernie] on r/socialism, r/anarchism, r/neoliberal and r/conservative
[biden] on r/socialism, r/anarchism, r/neoliberal and r/conservative
[shapiro] on r/neoliberal and r/conservative
[chomsky] on r/socialism and r/anarchism
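4CAT’s own query engine is not reproduced here. As a hedged, local stand-in, the sketch applies the same query plan to posts that have already been collected, assuming each post is a dict whose field names (`subreddit`, `created_utc`, `body`) loosely follow Pushshift’s. Treating the end date as exclusive and matching whole words case-insensitively are assumptions, not documented 4CAT behaviour.

```python
import re
from datetime import datetime, timezone

# 2019 as a UTC window; the end point is treated as exclusive (assumption).
START = datetime(2019, 1, 1, tzinfo=timezone.utc).timestamp()
END = datetime(2020, 1, 1, tzinfo=timezone.utc).timestamp()

# The query plan from the list above: keyword -> subreddits to search.
QUERIES = {
    "bernie": ["socialism", "anarchism", "neoliberal", "conservative"],
    "biden": ["socialism", "anarchism", "neoliberal", "conservative"],
    "shapiro": ["neoliberal", "conservative"],
    "chomsky": ["socialism", "anarchism"],
}

def run_query(posts, keyword, subreddits, start=START, end=END):
    """Return posts from the given subreddits and window whose body contains keyword."""
    # \b...\b matches the keyword as a whole word, case-insensitively,
    # so "Bernie's" matches "bernie" but "Berniebros" does not.
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    return [p for p in posts
            if p["subreddit"] in subreddits
            and start <= p["created_utc"] < end
            and pattern.search(p["body"])]
```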
Working with this list, we proceeded to analyse the negotiation taking place within the subreddits. First, we used Cortext to extract from the 4CAT data the adjectives used when discussing the figures we focused on. With these lists of adjectives, we conducted two analyses: contrast analysis and network mapping.
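Cortext performed the adjective extraction in the study, and its pipeline is not reproduced here. As a hedged illustration of the same idea, the sketch assumes the texts have already been part-of-speech tagged with Universal POS tags (spaCy, for instance, exposes these as `token.pos_`) and simply keeps the tokens tagged `ADJ`. The function name and the `(token, tag)` input format are assumptions.

```python
from collections import Counter

def extract_adjectives(tagged_texts):
    """Keep adjectives from texts given as lists of (token, UPOS tag) pairs.

    Returns the lowercased adjectives of each text (needed later for
    co-occurrence) and their overall frequencies (needed for contrast).
    """
    per_text = [[word.lower() for word, tag in tokens if tag == "ADJ"]
                for tokens in tagged_texts]
    totals = Counter(adj for adjectives in per_text for adj in adjectives)
    return per_text, totals
```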
The first map, produced by the contrast analysis, shows how the adjectives used about a specific person in two subreddits compare with each other. It indicates the top adjectives used per subreddit, their frequency, and the adjectives the two subreddits share. Because this map can be applied to only two subreddits at a time, it serves the purpose of cross-subreddit analysis; this is, however, also the main limitation of the method, as it does not allow more than two boards to be compared simultaneously, which might yield different results.
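Cortext’s contrast computation is not reproduced here. A simplified stand-in, swapped in for illustration, ranks each adjective by the smoothed log-ratio of its relative frequency in the two subreddits and lists the adjectives used on both sides. The function name, the `top_n` and `smoothing` parameters and the scoring choice are all assumptions.

```python
import math
from collections import Counter

def contrast(counts_a: Counter, counts_b: Counter, top_n=10, smoothing=1.0):
    """Compare adjective frequencies for one figure in two subreddits.

    Returns (leaning_a, leaning_b, shared): the top_n adjectives most
    characteristic of each side, each as (adjective, count_a, count_b,
    log2 ratio), plus the sorted adjectives used on both sides.
    """
    vocab = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    rows = []
    for adj in vocab:
        # Additive smoothing (add-one with the default smoothing=1.0) keeps
        # the ratio finite when an adjective is absent from one subreddit.
        p_a = (counts_a[adj] + smoothing) / (total_a + smoothing * len(vocab))
        p_b = (counts_b[adj] + smoothing) / (total_b + smoothing * len(vocab))
        rows.append((adj, counts_a[adj], counts_b[adj], math.log2(p_a / p_b)))
    # Sort by score, breaking ties alphabetically for a stable result.
    leaning_a = sorted(rows, key=lambda r: (-r[3], r[0]))[:top_n]
    leaning_b = sorted(rows, key=lambda r: (r[3], r[0]))[:top_n]
    shared = sorted(a for a in vocab if counts_a[a] > 0 and counts_b[a] > 0)
    return leaning_a, leaning_b, shared
```

Like the contrast map itself, this comparison is inherently pairwise: comparing more than two boards would need a different score.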
The second analysis maps clusters of adjectives in order to depict how each personality is talked about within a single subreddit. The network consists of the co-occurrences of adjectives in the text bodies of posts and comments. This analysis helps us visualize the vocabulary used for each public figure and observe the heterogeneity (or lack thereof) of the discussion. Because it focuses on one board at a time, this mapping is useful for single-subreddit analysis.
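The co-occurrence network can be sketched as a weighted edge list: two adjectives are linked each time they appear in the same text. Cortext’s own clustering is not reproduced; grouping adjectives into connected components is a deliberately simple stand-in, and both function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def cooccurrence_edges(adjectives_per_text, min_weight=1):
    """Weight each adjective pair by the number of texts in which both occur."""
    edges = Counter()
    for adjectives in adjectives_per_text:
        # Deduplicate and sort so each unordered pair is counted once per
        # text under a single canonical key (a, b) with a < b.
        for a, b in combinations(sorted(set(adjectives)), 2):
            edges[(a, b)] += 1
    return {pair: w for pair, w in edges.items() if w >= min_weight}

def clusters(edges):
    """Group adjectives into connected components of the co-occurrence graph."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, groups = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(group)
    return groups
```

Raising `min_weight` drops rare pairings, so a larger, more fragmented set of clusters suggests a more heterogeneous vocabulary around a figure.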
In the fifth step, we created word trees using the query [name is] on 4CAT. We then exported the results to Jason Davies’ Word Tree tool to see how comments refer directly to the personalities we selected. This analysis has a twofold application: it allows us to identify how a specific personality is directly referred to across all subreddits, and to examine how a single public figure is discussed on a specific board. These word trees thus serve as a means of close reading the overall sentiment regarding our figures of interest, and they reveal the particularities of each board or political orientation.
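Jason Davies’ Word Tree tool builds its tree interactively in the browser; the sketch below only illustrates the underlying data structure, a prefix tree of the words that follow a root phrase such as `bernie is`, with a count on each branch. The tokenization rule, the `max_words` cut-off and the function names are assumptions for illustration.

```python
import re

def continuations(texts, root, max_words=4):
    """Yield up to max_words lowercased tokens following each occurrence of root."""
    root_tokens = root.lower().split()
    k = len(root_tokens)
    for text in texts:
        # Simple tokenizer: runs of word characters and apostrophes.
        tokens = re.findall(r"[\w']+", text.lower())
        for i in range(len(tokens) - k + 1):
            if tokens[i:i + k] == root_tokens:
                tail = tokens[i + k:i + k + max_words]
                if tail:
                    yield tail

def build_word_tree(texts, root, max_words=4):
    """Merge continuations into a nested {word: {"count", "children"}} tree."""
    tree = {}
    for tail in continuations(texts, root, max_words):
        node = tree
        for word in tail:
            child = node.setdefault(word, {"count": 0, "children": {}})
            child["count"] += 1  # how many continuations pass through this branch
            node = child["children"]
    return tree
```

Branches with high counts correspond to the thick stems a word tree draws, which is what makes recurring framings of a figure visible at a glance.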
This section presents a case study showcasing the use of the developed protocol. Following the steps described above, the findings help us characterise web communities. The section starts by introducing some general tendencies and is then split into two subsections: cross-subreddit analysis and single-subreddit analysis.
The first finding concerns the difference in the types of public figures discussed along the political spectrum. As fig. 1 shows, right-wing communities seem to discuss contemporary political personalities involved in newsworthy events (Donald Trump, Joe Biden) more often, whereas left-wing communities seem more interested in discussing historical figures (Karl Marx, Vladimir Lenin).
Figure 2: Short version of the most mentioned public figures on r/socialism (full version: https://imgur.com/a/Bcqi84l)
The public figures studied in this subsection are Bernie Sanders and Joe Biden. We first looked at the adjectives used when discussing the two politicians. The contrast map in fig. 3 shows how the two political wings discussed Sanders in 2019. The list on the right shows the most used adjectives, giving an overview of the vocabulary conveyed in each subreddit. Sanders is associated with offensive or negative characteristics on r/neoliberal, whereas on r/socialism he is discussed in more neutral terms related to political positions. As the map shows, few adjectives are used in both subreddits, and those that are tend to be used infrequently.
Figure 3: Contrast map for Bernie Sanders on r/neoliberal and r/socialism in 2019
To develop a more contextual understanding of the discussion about Sanders, we used word trees from r/conservative and r/anarchism. As the figures show, perceptions of Sanders’s candidacy diverge sharply between the two subreddits: users on r/conservative consider him an old communist, whereas users on r/anarchism depict him as the only opponent to Trump who could represent their interests.
Figure 4: Word tree from r/conservative using [bernie is] query in 2019
Figure 5: Word tree from r/anarchism using [bernie is] query in 2019
Taken together, the contrast map and the word trees from all the chosen subreddits give a clear picture of whether a public figure belongs to the in-group of a specific board and of a political wing, or not. These figures thus help us understand the terms used to negotiate which community a specific person belongs to.
For the single-subreddit analysis, we looked at Noam Chomsky on r/anarchism and Ben Shapiro on r/conservative. These public figures are not politicians, but their political positions align with the left and right wing respectively, so they are representative of the popular personalities discussed on the chosen boards.
The first finding comes from a network map for Chomsky, which shows clusters of adjectives grouped by co-occurrence. Co-occurrence here means that particular adjectives are used in the same contexts when discussing a particular figure.
Figure 6: Network map of adjectives used regarding Noam Chomsky on r/anarchism in 2019
Again, the word trees provide a more contextual understanding of the adjectives’ meaning. On r/anarchism, opinion about Chomsky is mixed, as fig. 7 shows. The core item of negotiation is his political orientation and how well it aligns with the ideology of the subreddit. The discussion thus becomes a contest over whether he is /ourguy/ or not.
Figure 7: Word tree from r/anarchism using [chomsky is] query in 2019
These two steps of the single-subreddit analysis demonstrate an ongoing, heterogeneous negotiation about a public figure who is not fully representative of the community’s interests, yet remains a significant milestone in its ideology.
A similar analysis of Ben Shapiro sheds light on how he is negotiated on r/conservative. From the outset, the network map reveals that the vocabulary used about Shapiro is more heterogeneous. This is partly driven by his controversial standing within right-wing subreddits. Consequently, the discussion is also more divided and incongruous.
Figure 8: Network map of adjectives used when discussing Ben Shapiro on r/conservative in 2019
The word tree in fig. 9 exhibits the same heterogeneity, with a disjointed general opinion of the public figure, ranging from calling him a white supremacist to conspiratorial slurs about his religious affiliation. The negotiation process shows a lack of consensus on whether Shapiro belongs to the in-group or the out-group.
Figure 9: Word tree from r/conservative using [shapiro is] query in 2019
In sum, the single-subreddit analysis uncovers a web community’s style of negotiating who is representative of its ideas and beliefs, and gives an overview of the level of harmony within the community.
The first aim of our research was to create a working protocol that can be reused to characterize web communities by identifying the key people discussed in a particular subreddit and examining the ways in which they are talked about. Building on our main research question, we argue that reflecting on public figures inside pseudonymous web communities reveals a group dynamic that is representative of subreddits and correlated with political orientation. This group dynamic serves as a measure of how congruent the collective identity is. In our case study, the collective identity is built through negotiations over the status of several political figures who either do or do not represent the core values of the group.
The second aim was to apply our protocol in a cross-subreddit and single-subreddit analysis of political web communities. Using both analyses adds depth to the answer to our research question: together they offer an overview of these communities as well as the possibility of examining specific communities in depth, uncovering more nuanced details such as the vocabulary used and debate strategies.
One point of difference is that right-wing subreddits tend to be more polarized, making use of incivility and offensive speech when discussing public figures. Left-wing subreddits, on the other hand, tend to approach the discussion from a more theoretical perspective, discussing either historical figures or how the political orientation of a specific figure aligns with the values of the group.
Moreover, another factor seems to shape contestation and negotiation within web communities. Although we expected to find some political figures held in high regard, we found that exclusion is a more common practice than inclusion. Across all the studied subreddits, communities tend to form their identity by berating particular figures, using exclusion as a debate strategy to narrow down the list of personalities who represent their interests.
Another notable point is the differing levels of toxicity across subreddits. Although r/neoliberal was the smallest subreddit in our sample, it generated the most data (especially about Bernie Sanders and Joe Biden) and, as shown above, is among the most offensive in its word usage and judgements, without the group reaching agreement on the public figures discussed. Although differing opinions are also found on the left wing, the negotiation there takes place in a more refined setting, with less name-calling and a tendency to reach a conclusive agreement rather than engaging in miscellaneous discussions to no avail.
We have shown that Reddit is a good source for research in the post-API era, as it offers access to a considerable amount of valuable data for digital and social research. Its large and diverse selection of political communities further makes Reddit an attractive platform for research.
Our study shows how polarization unfolds across a sample of subreddits spanning the political spectrum when public figures are discussed. The key findings in this regard are that right-wing subreddits tend to be more aggressive and mainly discuss contemporary public figures, while left-wing subreddits are much more interested in historical and ideological figures, in how political beliefs change over time, and in how this phenomenon can be theorized and debated.
This protocol can thus be used for comparative studies of web communities hosted on Reddit, providing insights into the topics discussed, the subcultural vocabulary used, collective identity practices and meme usage. More generally, it can support both social and computational research and can be customized for a range of societal and academic questions.
Abramowitz, Alan I., and Kyle L. Saunders. “Is Polarization a Myth?” The Journal of Politics, vol. 70, no. 2, Apr. 2008, pp. 542–55. DOI.org (Crossref), doi:10.1017/S0022381608080493.
Bruns, Axel. “After the ‘APIcalypse’: Social Media Platforms and Their Fight against Critical Scholarly Research.” Information, Communication & Society, vol. 22, no. 11, Sept. 2019, pp. 1544–66. DOI.org (Crossref), doi:10.1080/1369118X.2019.1637447.
Freelon, Deen. “Computational Research in the Post-API Age.” Political Communication, vol. 35, no. 4, Oct. 2018, pp. 665–68. DOI.org (Crossref), doi:10.1080/10584609.2018.1477506.
Massanari, Adrienne. “#Gamergate and The Fappening: How Reddit’s Algorithm, Governance, and Culture Support Toxic Technocultures.” New Media & Society, vol. 19, no. 3, Mar. 2017, pp. 329–46. DOI.org (Crossref), doi:10.1177/1461444815608807.
Nagle, Angela. Kill All Normies: The Online Culture Wars from Tumblr and 4chan to the Alt-Right and Trump. Zero Books, 2017.
Nithyanand, Rishab, et al. “Online Political Discourse in the Trump Era.” ArXiv:1711.05303 [Cs], Nov. 2017. arXiv.org, http://arxiv.org/abs/1711.05303.
Perriam, Jessamy, et al. “Digital Methods in a Post-API Environment.” International Journal of Social Research Methodology, Oct. 2019, pp. 1–14. DOI.org (Crossref), doi:10.1080/13645579.2019.1682840.
Rogers, Richard. Doing Digital Methods. SAGE, 2019.
Ryan, Timothy J. “What Makes Us Click? Demonstrating Incentives for Angry Discourse with Digital-Age Field Experiments.” The Journal of Politics, vol. 74, no. 4, Oct. 2012, pp. 1138–52. DOI.org (Crossref), doi:10.1017/S0022381612000540.
Singer, Philipp, et al. “Evolution of Reddit: From the Front Page of the Internet to a Self-referential Community?” Proceedings of the 23rd International Conference on World Wide Web - WWW '14 Companion, 2014, pp. 517–22, doi:10.1145/2567948.2576943.
Southern Poverty Law Center. The Trump Effect: The Impact of The 2016 Presidential Election on Our Nation’s Schools. 2016.
Tuters, Marc, and Sal Hagen. “(((They))) Rule: Memetic Antagonism and Nebulous Othering on 4chan.” New Media & Society, Nov. 2019, p. 146144481988874. DOI.org (Crossref), doi:10.1177/1461444819888746.
Venturini, Tommaso, and Richard Rogers. “‘API-Based Research’ or How Can Digital Sociology and Journalism Studies Learn from the Facebook and Cambridge Analytica Data Breach.” Digital Journalism, vol. 7, no. 4, Apr. 2019, pp. 532–40. DOI.org (Crossref), doi:10.1080/21670811.2019.1591927.
Weninger, Tim, et al. “An Exploration of Discussion Threads in Social News Sites.” Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining - ASONAM '13, 2013, pp. 579–83, doi:10.1145/2492517.2492646.
-- SalHagen - 30 Jan 2020