Streams of the Deep Web: Rebel Media, YouTube, and the algorithmic shaping of media ecosystems
- Anthony Burton
- Elena Aversa
- Alessandra Facchin
- Ivana Emily Škoro
- Henri Mütschele
- Shenglang Qing
- Myrthe Reuver
Summary of Key Findings
The Rebel, a popular YouTube channel that presents itself as a traditional news broadcaster, is treated by YouTube's algorithms as a source of information equally viable to mainstream Canadian news properties such as the CBC, CTV, and various newspapers when a user relies on the de facto method of discovering new videos and information on the platform: the search function. While the search query in and of itself masquerades as politically neutral, the particular configuration of YouTube's content (short videos designed to be consumed as entertainment objects just as much as knowledge objects) means that the videos served for a general query can powerfully shape a hypothetical naive user's conception of a topic, especially since YouTube's search algorithm, as our research shows, does not by default discriminate between ideological or epistemological belief systems. Yet the value-neutrality of the search query quickly becomes insidious when combined with the second key affordance of YouTube's algorithmic structure, the recommendation system. Our research shows that once a user picks a video from a particular source, they are very quickly brought into a mediasphere dominated by channels and content that share the political leaning of the video they first watched.
But what does this mean for the Rebel? Our research shows that the Rebel's channel plays host to a series of discourses traditionally associated with the contemporary political movement that has been christened the "alt-right", a collection of neofascist-adjacent ideologies that are variously anti-feminist, white supremacist, anti-immigrant, or otherwise reactionary. When we trace the path of this hypothetical naive YouTube user, one who may just want to know something more about Justin Trudeau's response to the Crown Inquiry into Missing and Murdered Indigenous Women (our example used below), a single click can make all the difference to the political shape of the mediasphere into which they are algorithmically drawn. This raises important questions about the perceived "neutrality" of algorithms, the interpellative power of the way that they shape consumption habits, and the demarcation of digital political spheres within contemporary platform capitalism.
Digital media platforms ranging from YouTube to Facebook and Twitter to Instagram have come under critical social scrutiny for their role in facilitating and even amplifying the spread of hateful content, unfounded conspiracy theories, and political disinformation into the public sphere (Karl 2019; Matamoros-Fernández 2017). This amplification, combined with the content that is being spread, has resulted in an “alternative” news media ecosystem that provides a platform to voices traditionally excluded from mainstream media, particularly contemporary right-wing movements and their new manifestations, whose content is built on the aforementioned toxic content, false “news”, and conspiracy theories. Underlying much of this discourse is another web of anonymous, anarchic internet communities that De Zeeuw and Tuters refer to as “the deep vernacular web” (2019): internet subcultures found on websites and forums such as 4chan.org that largely see themselves in opposition to the mainstream discourses and cultures that take place on the more visible, onymous parts of the internet. Much of the problematic and extremist content making its way to social media and news platforms originates in this space. However, there is a lack of critical scientific study of how extremist content makes its way onto mainstream platforms, how this content is picked up by commentators on those platforms, and the effect that this has on contemporary political discourse and debate.
Canada, more so than any other Western nation, finds itself the breeding ground for alt-right YouTube-style newscasts, primarily through TheRebel.Media. This makes it a fruitful site of study for these broader effects. The upcoming federal election in Canada, scheduled for October 21, provides us as researchers with a critical social event through which to study the political rhetoric and ideological effects created through these YouTube channels, especially as research shows that political content peaks during election years (Arthurs, Drakopolou & Gandini 2018).
2. Initial Data Sets
A list of all people who have hosted videos on Rebel Media was created from the Internet Archive’s cached copies of Rebel Media’s masthead page. Using the advanced search page, these copies were pulled in HTML form, and a bash script was then used to strip the HTML formatting and remove duplicates.
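The original cleanup was done with a bash script that is not reproduced here. As a rough illustration of the same idea, the following Python sketch strips markup from a set of archived pages and keeps each distinct text fragment once. The class and function names are hypothetical, and the case-insensitive comparison is an assumption rather than a documented detail of the project's script.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text nodes, ignoring script and style content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_unique_lines(html_pages):
    """Return text fragments from all pages, first occurrence only."""
    seen = set()
    unique = []
    for page in html_pages:
        parser = TextExtractor()
        parser.feed(page)
        parser.close()
        for chunk in parser.chunks:
            key = chunk.casefold()  # assumption: duplicates differ only in case
            if key not in seen:
                seen.add(key)
                unique.append(chunk)
    return unique
```

Feeding it several cached snapshots of the same page yields one de-duplicated list, which would still need manual review to separate host names from other page text.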
An instance of the Digital Methods Initiative’s Twitter Capture and Analysis Toolkit is currently set up to capture over 150 search queries on Twitter related to Canadian politics, with a special focus on the Rebel and the political issues that it discusses.
A dataset of significant alt-right YouTube personalities, either located in Canada or with a distinctly Canadian orientation in what they talk about, has been compiled since early March. Numbering 90 actors, this data set was collected through qualitative observation of YouTube videos and alt-right discourses on social media, as well as coverage in major news outlets. The collected data was then cross-referenced with quantitative measurement of first-level relationships between these actors to ensure that all major actors in the network were accounted for.
Preliminary network analysis, based on a seed analysis with 2 levels of depth, was undertaken on this collection of accounts. The analysis measured related channel networks, related video networks, and subscriber networks. What we found in this data is a series of siloed commentary locations with little overlap between them. The larger overall network comprises smaller, tangentially linked networks organized by nationality: shooting off from an American discourse sphere in the middle are a Dutch discourse sphere, a French-Canadian sphere, and an English-Canadian sphere. The latter is tied more closely to the American sphere in the middle, with Ontario-based actors towards the middle and Western Canadian actors reaching out to the far end.
3. Research Questions
Our questions were guided by the desire to investigate how these platforms, especially YouTube, have vested economic and social interests in keeping their users within these online spaces and in bringing these deep web vernaculars from their subcultural web platforms onto their own mainstream, capitalized spaces. We simultaneously hoped to investigate the particularities of the Canadian political and historical context of this spread of alt-right media through this contemporary, platform-mediated lens. Our research questions are, therefore, split into three categories: the deep vernacular web, YouTube, and Canadian alt-right politics.
How does YouTube locate the Rebel within preexisting news and topical spheres?
- What are the relationships or disjuncts between the ideological orientations of the Rebel and the spheres within which it is placed by YouTube’s various relational algorithms?
- What relationship do these discourses have to the contemporary “fake news” ecosystem?
How does YouTube act as a mainstreaming filter for the ‘deep vernacular web’?
- What are the affordances that YouTube, as a digital media platform, provides to users who both intentionally and unintentionally spread and consume these discourses?
- How do these affordances intersect with historical, mainstreamed news infrastructures?
In order to work towards an answer to these questions, we chose to split the Rebel as a media property into two analytical frames: the content of the videos themselves, and the network of accounts and videos that is created through YouTube’s relational affordances. This categorical distinction allows us to understand how the Rebel locates itself within a particular political topology (based on that which it can control, i.e. the content of the videos) alongside how YouTube locates it within its own political topologies (based on algorithmic categorization and content serving).
The content frame is examined through the corpus of text data. This includes a frequency analysis of key words and topics in all string fields outlined above. The network frame is analyzed through two kinds of relational data: an “influencer network” (expanding on Lewis’s 2018 qualitative analysis of the same issue) of the personalities that feature on the Rebel and on other influential alt-right media properties, and the algorithmic network created by YouTube, which interpellates viewers through features such as autoplay and related video modules that entrain users to keep watching beyond the video in question.
We began with an attempt to determine what, exactly, the Rebel talks about in its videos. We first pulled the metadata, including identifying information such as video ids and textual information such as titles and descriptions, from every one of the Rebel’s 11,439 YouTube videos using Bernhard Rieder’s YouTube Data Tools. Using the video ids, a Python script was created to run Ricardo Garcia Gonzalez’s youtube-dl iteratively over this list of videos and pull all 500MB of text that YouTube generates for closed captions (essentially, auto-generated transcripts).
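The project's script is not reproduced here; the following is a minimal sketch of how such a loop might look. It builds a youtube-dl command per video id using the tool's `--skip-download`, `--write-auto-sub`, `--sub-lang` and `--output` options, and records which ids fail. The function names, the output directory, and the choice of English captions are assumptions made for illustration.

```python
import subprocess


def caption_command(video_id, out_dir="captions", lang="en"):
    """Build a youtube-dl call that saves auto-generated captions only."""
    url = f"https://www.youtube.com/watch?v={video_id}"
    return [
        "youtube-dl",
        "--skip-download",      # do not fetch the video itself
        "--write-auto-sub",     # request the auto-generated caption track
        "--sub-lang", lang,     # assumption: English captions are wanted
        "--output", f"{out_dir}/%(id)s.%(ext)s",
        url,
    ]


def fetch_captions(video_ids, runner=subprocess.run):
    """Run the command for each id; return the ids whose download failed."""
    failed = []
    for video_id in video_ids:
        result = runner(caption_command(video_id), check=False)
        if result.returncode != 0:
            failed.append(video_id)
    return failed
```

Passing the runner as a parameter keeps the loop testable without network access; in practice one would call `fetch_captions(ids)` with the ids taken from the metadata export and retry the returned failures.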
After stemming the texts and removing stopwords, we decided to build a topic model using Latent Dirichlet Allocation (LDA). LDA topic modelling can be classified as unsupervised machine learning, meaning that the algorithm finds the underlying (topic) structure within this corpus of texts without being given labelled examples.
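To make the preprocessing step concrete, here is a deliberately simplified sketch of stopword removal and stemming. The stopword list is a tiny illustrative subset and the suffix-stripping function is a crude stand-in for a real stemmer such as Porter's; neither is the project's actual configuration.

```python
import re

# Assumption: a tiny illustrative stopword list, not the one used in the project.
STOPWORDS = frozenset(
    {"the", "a", "an", "and", "of", "to", "in", "is", "are", "on", "for", "that", "this", "it"}
)
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(token):
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize on letters, drop stopwords, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(token) for token in tokens if token not in STOPWORDS]
```

For example, `preprocess("The taxes are rising on pipelines")` reduces the sentence to the stems `tax`, `ris` and `pipelin`, which shows both the benefit (inflected forms collapse together) and the roughness of naive suffix stripping.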
In order to train the most reliable version of the topic model, we needed to find the structure most similar to the actual underlying structure in the dataset, and the settings that gave us topics with minimal overlap. We achieved this by importing the LDA algorithm from the scikit-learn package in Python and using a ‘grid search’ to find the optimal parameters.
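The grid search described above can be sketched with scikit-learn's `LatentDirichletAllocation` and `GridSearchCV`. This is an illustration under assumptions, not the project's code: the candidate values, the online learning method, the iteration count, and the two-fold cross-validation are all chosen here for brevity. With no explicit scorer, `GridSearchCV` ranks settings by the estimator's own `score` method, which for LDA is an approximate log-likelihood.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV


def fit_lda_grid(docs, topic_counts=(5, 10, 15), decays=(0.7, 0.9), seed=0):
    """Grid-search LDA settings; return the fitted search and the vectorizer."""
    vectorizer = CountVectorizer()
    counts = vectorizer.fit_transform(docs)  # document-term count matrix

    lda = LatentDirichletAllocation(
        learning_method="online",  # learning_decay only affects online updates
        max_iter=5,                # assumption: kept small for illustration
        random_state=seed,
    )
    param_grid = {
        "n_components": list(topic_counts),  # number of topics (K)
        "learning_decay": list(decays),
    }
    # Each candidate is scored by LDA's approximate log-likelihood on held-out folds.
    search = GridSearchCV(lda, param_grid, cv=2)
    search.fit(counts)
    return search, vectorizer
```

After fitting, `search.best_params_` holds the selected number of topics and learning decay, and `search.best_estimator_` is the model refit on the full corpus.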
In order to explore this topic model and use it for further investigation, we created visualizations, found below under “Findings.” This was done with the pyLDAvis Python package.
We passed the optimized topic model to a program from this package, inspired by the same pyLDAvis tutorial, with t-SNE dimensionality reduction. This means we went from a model of 10 dimensions (one for each of the 10 topic clusters) to 2 dimensions, which allows visualization on a 2D plot. A second visualization of the same topic model was created in order to see the change in topic discussion over time, by constructing a Bokeh plot with a dot for each video.
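The reduction from 10 topic dimensions to 2 can be sketched with scikit-learn's `TSNE`. The sketch assumes a document-topic matrix with one row per video transcript and one column per topic; the function name, perplexity value and random initialization are illustrative choices, not the settings used for the published plots.

```python
import numpy as np
from sklearn.manifold import TSNE


def project_topics(doc_topic, perplexity=5.0, seed=0):
    """Map each document's topic mixture to 2D and find its dominant topic."""
    doc_topic = np.asarray(doc_topic, dtype=float)
    dominant = doc_topic.argmax(axis=1)  # most probable topic per document
    coords = TSNE(
        n_components=2,
        perplexity=perplexity,  # must be smaller than the number of documents
        init="random",
        random_state=seed,
    ).fit_transform(doc_topic)
    return coords, dominant
```

The returned coordinates give one point per video, and the dominant-topic index can be used to colour the points, which is roughly the information a per-video scatter plot of this kind needs.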
In order to examine the networks within which YouTube places the Rebel’s content, we qualitatively analyzed our topic model and chose three networks to investigate: those involving videos about Canadian politics broadly speaking, those involving discussions of the leaders of Canadian political parties, and those involving discussions of climate change and Canada’s carbon tax, an issue that we find the Rebel discussing frequently. This was done in order to illustrate the connections that YouTube makes through its search algorithms, and the ways that the search algorithm itself defines particular categories of video. Again using Rieder’s data tools, we used the Video List Module, which provides a network of relations among videos based on the YouTube API’s “relatedToVideoId” parameter. We pulled 5 iterations of searches for seven key words with a crawl depth of one, split into the three categories outlined above: “canadian election” and “canadian politics” for the first category; “scheer”, “trudeau”, and “bernier” for the second category; and “climate change” and “carbon tax” for the third category. What this means is that by entering search queries as our starting point, we were able to determine the network of videos that YouTube serves to the user as an answer to their query: both the primary results and the videos that a user may be served after consuming their first search result.
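The crawl logic behind such a network can be sketched independently of any particular API. In the sketch below, `fetch_related` is a hypothetical callable standing in for whatever returns the related videos of a given id (in the project this role was played by the YouTube Data Tools); the interpretation of "depth" as the number of expansion rounds from the search results is also an assumption.

```python
def build_video_network(seed_ids, fetch_related, depth=1):
    """Expand seed videos through related-video links.

    seed_ids: video ids returned by a search query.
    fetch_related: hypothetical callable, video id -> iterable of related ids.
    depth: number of expansion rounds starting from the seeds.
    Returns (set of node ids, sorted list of directed edges).
    """
    edges = set()
    frontier = list(dict.fromkeys(seed_ids))  # de-duplicate, keep order
    visited = set(frontier)
    for _ in range(depth):
        next_frontier = []
        for video_id in frontier:
            for related_id in fetch_related(video_id):
                edges.add((video_id, related_id))
                if related_id not in visited:
                    visited.add(related_id)
                    next_frontier.append(related_id)
        frontier = next_frontier
    return visited, sorted(edges)
```

The resulting edge list can be written to a CSV file and opened in a network tool such as Gephi, with channel metadata attached to the nodes for colouring.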
One problem that we encountered using Rieder’s YouTube Data Tools was the contrast between the information that the YouTube API presents and the iteratively customized recommendations that YouTube serves to users based on their previous activity. This gap undermines any conclusions about the ideologically interpellating power of YouTube’s recommendation algorithms drawn from API data alone, given that a query to the API does not incorporate the context of a user’s viewing history. To investigate this question, we created a research persona to examine how YouTube tailors content to each individual user based on their activity on the platform. We were specifically interested in analysing the ratio of right-wing to left-wing content in the recommended videos, and whether the ratio would change while browsing predominantly right-wing content. We created a spreadsheet delineating our search history, noting the political leanings of the selected and recommended videos (i.e. whether they were hosted on a left-leaning, right-leaning or politically neutral channel, as indicated by the website Media Bias Fact Check).
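The ratio described above can be computed from such a spreadsheet with a few lines of code. The sketch below assumes each step of the walkthrough has been recorded as a list of leaning labels (`"left"`, `"right"` or `"neutral"`) for the recommended videos; the label names and function names are illustrative, not the project's actual coding scheme.

```python
from collections import Counter

LEANINGS = ("left", "right", "neutral")  # assumption: three coding categories


def leaning_shares(recommendations):
    """Share of each leaning among one step's recommended videos."""
    counts = Counter(recommendations)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: counts[label] / total for label in LEANINGS}


def right_share_by_step(steps):
    """Track how the right-leaning share changes across the walkthrough."""
    return [leaning_shares(recs).get("right", 0.0) for recs in steps]
```

A rising sequence from `right_share_by_step` would indicate that recommendations drift rightward as the persona keeps selecting right-leaning videos; a flat one would indicate no such drift.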
A persona is a “fictional, yet realistic, description of a typical or target user of the product.”
Our persona is an uneducated white male from Canada with an ironic sense of humour and a penchant for visiting 4chan and subreddits geared toward right-wing politics. He is mostly interested in gaming videos, from which he has garnered anti-feminist inclinations.
We also used the YouTube Data Tools to investigate the ideological inclinations of Rebel users by comparing the related video networks of the “most likes” (ML) and “most dislikes” (MD) videos on the Rebel Media channel. We selected the top 10 “most likes” and “most dislikes” videos on the channel, entered them as seeds into the YouTube Data Tools, and launched the “video network” function (1 iteration, crawl depth 1). Once we had obtained the video network data, we loaded the files into Gephi to produce two networks, one for the ML videos and one for the MD videos. These networks represent the videos that YouTube’s algorithms recommend on the watch pages of the ML and MD videos.
1. Much of our network exploration was based on our topic models, which we explored through the following visualizations. Descriptions of each are found below.
Figure 1.1: Link to first topic visualization
This interactive 2D plot allowed us to explore the words most connected to each topic and the relationships between topics. The size of each cluster indicates how frequently its topic is mentioned. We can also analyze the similarity between certain topics. The principal components show, for instance, that the “economy” topic and the “environment” topic are closely related, as they are close together. We also see that some of the most salient words across all video transcripts are “oil”, “trump”, and “Canada”.
Additionally, we see a topic focused on gender and academic institutions, with related words such as “woman”, “university” and “child”. We also see that a closely related topic is one on Canadian politics in general, with common words like “Trudeau” and “party”. The words within these topic clusters indicate particular ways that the Rebel frames the issues in discussion. For example, in cluster 7, environmental issues are discussed alongside or within context with words such as “oil” and “industry.”
Link to second topic visualization
In our second topic model, we once again see that the 10 dimensions of topic relation were reduced to 2 dimensions with t-SNE. This visualization shows us the individual video transcripts plotted with their most common topic, with a “time” filter allowing us to slide through time.
2. The following visualizations contain our three search-query based channel networks, grouped according to the topics demarcated above. The nodes represent videos by particular channels that are returned from a search query, and are coloured based on their geographic location and proximity to the political process. We see in the first set of queries, “canadian election” and “canadian politics”, that the Rebel is located squarely within the network of mainstream, established Canadian media properties, like the Canadian Broadcasting Corporation (CBC), Canadian Television (CTV), and print media like the Globe & Mail and the National Post.
Figure 2.1: The video network served by YouTube to the query “canadian election”.
Figure 2.2: The video network served by YouTube to the query “canadian politics”.
Our topic modelling, alongside our information from the TCAT database, painted climate change (and, in particular, Canada’s proposed carbon tax) as an important political issue, so we chose terms surrounding this as our query. These visualizations show us that if you search for climate change within Canadian video networks, the Rebel is almost nonexistent. Yet searches that explicitly focused on the carbon tax brought the Rebel into context as an important discussant. This reflects our topic model’s illustration that the Rebel focuses on climate change from an economic perspective, and that the phrase “climate change” is not found among the most salient terms. It also illustrates that YouTube’s search algorithms, to an indeterminate degree, understand the topical content of their networks.
Figure 2.3: The video network served by YouTube to the query “climate change canada”.
Figure 2.4: The video network served by YouTube to the query “carbon tax canada”.
Our third set of network visualizations shows the Rebel’s level of involvement in queries for current Prime Minister Justin Trudeau, his Conservative Party rival Andrew Scheer, and Maxime Bernier of the People’s Party of Canada, who is (to borrow alt-right vernacular) “Our Guy,” if the number of interviews with the candidate that the Rebel publishes is any indication.
Figure 2.5: The video network served by YouTube to the query “scheer”.
Figure 2.6: The video network served by YouTube to the query “trudeau”.
Figure 2.7: The video network served by YouTube to the query “bernier”.
3. Our research persona, on the other hand, only partially confirmed our hypothesis. As illustrated by the visualization of the persona’s path (Figure 3.1), there was no clearly demarcated pattern to the political leanings of the recommended videos: on some news issues, we were recommended videos from a similar number of accounts across the political spectrum, while other videos that fell closer to the categories of “vlogging” or “entertainment” kept our recommended videos within the right-wing network. It is important to note, as well, that many of the videos categorized as coming from “left-wing” channels, such as HBO’s Real Time with Bill Maher, feature prominent voices from the alt-right, such as Milo Yiannopoulos or Gavin McInnes.
Figure 3.1: A visualization of the path that our research persona took from the query “trudeau genocide”. Left-wing videos are documented in blue, while right-wing videos are documented in red. Information about the political orientation of each video was taken from www.mediabiasfactcheck.com. To view the full-size implementation, please click here.
Complementing this persona-centred recommendation walkthrough is the recommended videos marquee on the homepage, which is the first set of videos that one sees when they log in to YouTube. On a clean browser, YouTube offered a range of topics tied to Fortnite, ice skating, visual arts, beauty tips, drama, and other benign, apolitical content. Yet after only a day of following the right-wing video recommendation path, the feed was swamped with predominantly right-wing content, as can be seen in the figures below.
Figure 3.2: A screenshot of the YouTube homepage as presented to the research persona before embarking on his walkthrough.
Figure 3.3: A screenshot of the YouTube homepage after a day of “following the persona”.
Our topic model shows us the ways in which the Rebel positions itself, both temporally and topically, within the broader online political sphere. In the first topic model, the proximities of the topic clusters allow us to infer the ways in which the Rebel brings these topics into conversation with each other, and thus how it associates certain issues. It is telling that the topic most related to the environment topic is one concerned with the economy, with words like “tax” and “dollar”. The most frequently mentioned topic, with the largest cluster, is one concerning terrorist attacks and police, though the “gender and academia” topic is also common. We also see that the Rebel does not focus exclusively on Canadian content. Instead, we see a distinct proximity between its usage of keywords about American politics in cluster 3 (such as “Trump”, “Clinton”, “Iran”, and “Republican”) and its discussions about Canadian politics. In contrast, the topics that are distinctly Canadian (“Refugees in Canada”, “Alberta Oil”, and automotive issues; clusters 6, 7 and 10, respectively) are located furthest from the American politics cluster. It is notable, as well, that cluster 3 is tied for the second-largest in size, with 14.2% of all tokens (keywords) falling under this cluster. We can conclude here that the Rebel is not distinctly Canadian in content, but instead brings in American political issues as part of its own ideological framework.
Our second topic model illustrates that videos concerning (Muslim) terrorist attacks mostly came early in the Rebel’s video timeline, around 2015. The year 2016 saw the rise of videos about fracking, oil, and the economy in the “environment” and “economy” topics, while 2018 and onward brings an increase in videos about gender and academia. We hypothesize that these temporal shifts in topic mimic the shift in broader media discourses from the threat of ISIS to the rise in prominence of culture wars based around issues such as North Carolina’s bill barring transgender people from using their chosen bathroom, with the caveat that the Rebel’s location in these broader discourses cannot be determined to be causal or consequential without further analysis.
Whether it be causal or consequential, the Rebel seems to gain a large audience and relevance from YouTube’s particular platform affordances. YouTube’s search query tool places the Rebel squarely in the centre of the news networks of highly relevant Canadian news topics, such as climate issues, the Canadian election, and information about specific Canadian politicians. Meanwhile, if one approaches not from a naive, politically agnostic perspective but from the perspective of, say, our research persona, YouTube keeps them within a sphere that understands the Rebel in a different light, one based on political topics and orientation. With the cookies built up from our research persona’s viewing habits, we see recommendations sorted not according to the naive query logic of the channel networks created from the YouTube Data Tools, but instead video offerings that iteratively become, for lack of a better term, increasingly right-wing. It is not until the platform reaches a conclusion about the user (one impossible to quantify without access to the algorithm itself) that YouTube serves ideological content to that user upon first visit, as seen in our comparison of the homepages before and after our research persona’s time following the recommendations.
Our topic modelling shows us indicators of the particular political orientation put forth by the Rebel, one that, based on qualitative observation, is much farther to the right of the spectrum than the Canadian news media sphere within which YouTube places the Rebel (a well-established claim). This, in combination with the observed political spheres created by YouTube’s related videos algorithm and explored in our research persona walkthrough, illustrates the way that YouTube “configures” the user as a political agent according to an accelerated concept of what it means to pursue political information: our persona, for example, is based on a single day’s worth of data from the walkthrough. Yet if we take YouTube’s homepage recommendations as an indicator of how the platform algorithmically conceives of the individual and their political orientation, the follower of right-wing recommendation paths is one who, upon the start of a new YouTube-watching session (for lack of a better term), will essentially only be interested in videos that perpetuate the particular ideological frame they have been exposed to thus far.
But how are these ideological frames constructed for the user? If a user goes to YouTube to understand Canadian politics, our research above illustrates the Canadian political frame that the platform creates for this knowledge-seeking user. What is more important than the fact that the Rebel is on the right side of the spectrum of this frame is the fact that the Rebel is placed within this frame at all. The inclusion of a news media site that has explicit ties to (as in, very recently employed) a leading figurehead of the Canadian white nationalist movement certainly constitutes a shift in the political frameworks deemed equally viable as longstanding mainstream media properties.
This project raises many broad questions about YouTube’s algorithm, the role that particular user syntaxes play within it, and the ways that YouTube understands and facilitates discursive participation within political media spheres. We also cannot draw any normative claims about the Rebel’s utilization of YouTube, that is, whether their adoption of newscast syntaxes came because of or reinforced the algorithmic prioritization of their content on YouTube that has facilitated their success. But what this research points to is a particular path wherein the axis of YouTube’s algorithm, Ezra Levant, and the deeper alt-right sides of the platform can make a Rebel out of you or me.
Rieder, Bernhard. ‘YouTube Data Tools’. Software. YouTube Data Tools, 2015. https://tools.digitalmethods.net/netvizz/youtube/index.php
It is important to note that the term “topic” has no specific theoretical definition: it simply means, in this context, “words that often co-occur (close) together”. It should not be interpreted as meaning “importance” or “interest”, as these are far more qualitative and nuanced values that cannot be simply quantified by term co-occurrence. However, in practice, the model does an excellent job of finding and visualizing frequently used, related words in a large set of text documents.
A grid search tests multiple models and picks the best one based on its lowest complexity and highest log likelihood. The parameters that Myrthe Reuver, our designer of the topic model, was interested in were the number of topics (K) and the learning rate (learning decay). The grid search model was described in a tutorial on machinelearningplus. The optimal parameters for this dataset turned out to be K=10 topics, with 0.9 learning decay.
A port of the R data science language’s LDAvis package. It can be found at https://github.com/bmabey/pyLDAvis
We are indebted to this Kaggle tutorial that provides a frame for visualizing LDA topic modelling with Bokeh and T-SNE, found at https://www.kaggle.com/yohanb/lda-visualized-using-t-sne-and-bokeh
Found at http://mediabiasfactcheck.com, it is a crowdfunded platform that performs qualitative analysis of various new media websites. We assumed that this qualitative approach, and the accessible, vernacular language of the website’s author, would make it an appropriate source for our research persona.
Harley, Aurora. ‘Personas Make Users Memorable for Product Team Members’. Nielsen Norman Group, 16 February 2015. https://www.nngroup.com/articles/persona/
Saliency here means the weighted frequency, or the most common words across videos.
See, for example, Robertson, Gary D. and Emery P. Dalesio, “In Late Night Move, NC Guv Signs Anti-LBGT Bill Into Law.” TPM, 24 March 2016. Accessed 21 July 2019. https://talkingpointsmemo.com/news/north-carolina-bill-blocks-anti-discrimination-measures-transgender-bathrooms
See, for example, the “News Websites” section at http://www.thecanadaguide.com/basics/news-and-media/, which is a website that pitches itself as a “primer” to Canadian culture for newcomers.
This concept of “configuring” or “scripting” users comes from Woolgar’s investigations into usability trials, a set of responsive instructions given to a product’s user that consciously or unconsciously delimit their engagement with the product. See Woolgar, Steve. ‘Configuring the User: The Case of Usability Trials’. The Sociological Review 38 (May 1990): 58–99. https://doi.org/10/gffmws
If we’re so inclined, a shift in the Overton window.
It used to be the case that Canadian media kept its moral transgressions to defrauding investors