Agathe Bourdarias 13855158
Jasmin Shahbazi 12553093
Kiana Saidah 13427989
Marsha Batubara 14218488
Tomi Fischer 100517150
Open Source Intelligence (OSINT) used to be a niche practice on the internet, but it has recently attracted considerable interest from journalists, human rights organisations and institutions. Fiorella (2022) explains that the practice appeals because it allows anyone to pursue an investigation: “if you’ve an internet connection, free time, and a stubborn commitment to getting the facts right, then you too, can be an open source researcher”. The OSINT research community builds bridges through online forums; the most prominent place for exchanging practices, information, and findings is Twitter, as it allows for a form of anonymity (Fiorella, 2022). As we witness the democratisation of OSINT practices, questions arise about the power of such research, of data, and of those actively pursuing OSINT goals.
This research aims to map the OSINT landscape to understand the dynamics currently shaping the practice. Mapping allows for visual representations of forces and social dynamics. In their methods for issue mapping, Rogers et al. (2015) draw on Latour’s work, noting that “there is no such thing as fixed groups a priori, but instead group-like formations in becoming that are in continual development, that are often arrangements that change and whose boundaries need to be defined over and again” (p. 16). The digital methods set out by Rogers et al. (2015) allow us to paint a portrait of the OSINT issue space by defining and analysing its relations. Using Twitter as a starting point, this experimental research presents the case of the #OSINT issue space, which has been highly consulted and interacted with in light of the ongoing war in Ukraine. OSINT researchers are driven by this more immaterial side of the conflict to unravel a truth, based on evidence from open source information.
Digital methods have been in use since the early 2000s to analyse issue spaces. As Rogers et al. (2015) explain, ‘issue mapping’ “takes as its object of study current affairs and offers a series of techniques to describe, deploy, and visualise the actors, objects, and substance of a social issue” (p. 9). In other words, it applies a set of analyses to web data to uncover patterns and connections within online discussions, from the actors involved to the topics themselves. The topics investigated are “especially scientific and technical topics (e.g. climate change, genetically modified food, and energy policy) involving both laypeople and clearly delineated experts” (Burgess & Matamoros-Fernández, 2016, p. 81; Venturini, 2012). Using digital methods for issue mapping therefore provides a focused methodology for highlighting issue publics through their online engagement (Marres, 2015).
Our initial dataset is composed of 990,055 tweets scraped from Twitter using 4CAT. These tweets were posted between January 1, 2020 and November 1, 2022 and contained the term “#OSINT” in the body. Initial investigation revealed the dataset to be riddled with bot accounts that post thousands of tweets with no engagement (e.g. the account @DataAbyssAI shown in Figure 1), adding no meaningful discussion to the topic. As engagement is an essential element of an issue space (Rogers et al., 2015), we decided to filter by engagement. Twitter has 3 forms of engagement that show up in our data: replying, retweeting, and liking. Replying requires more thought and commitment than the other two, so we decided to try likes and/or retweets as a filter. We initially filtered by retweets and found that tweets with thousands of retweets were receiving no likes. This behaviour seemed suspicious, so we settled on keeping only tweets that had at least 1 like.
Fig. 1: Pinned tweet by Data Abyss.
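The engagement filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Tweet` class and its field names (`like_count`, `retweet_count`) are hypothetical and may not match the column names of a 4CAT export, and the sample tweets are invented.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    # Hypothetical field names; a real 4CAT export may label these differently.
    author: str
    body: str
    like_count: int
    retweet_count: int


def filter_by_likes(tweets, min_likes=1):
    """Keep only tweets that received at least `min_likes` likes."""
    return [t for t in tweets if t.like_count >= min_likes]


# Invented sample data: a heavily retweeted tweet with no likes is dropped.
sample = [
    Tweet("bot_account", "#OSINT feed item", like_count=0, retweet_count=3000),
    Tweet("researcher", "#OSINT geolocation thread", like_count=12, retweet_count=4),
    Tweet("hobbyist", "New #OSINT tool", like_count=1, retweet_count=0),
]

kept = filter_by_likes(sample)
print([t.author for t in kept])
```

Raising `min_likes` tightens the filter without changing the rest of the pipeline.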
We used 4CAT to process our filtered dataset, tokenizing the body of each tweet so that we could run Natural Language Processing queries and visualise our findings. We ran multiple iterations of tokenization, removing terms that we deemed too broad or associated with spam (e.g. emojis and non-Latin alphabets). After curating our data, we were able to pick out trends in the topics discussed as we followed the issue space across the entire period covered by our data. The #OSINT issue space focused mainly on cybersecurity and OSINT tools up until the invasion of Ukraine, when the war became the top issue discussed within the space. We decided to filter our data once more, focusing on the period between December 2021 and July 2022, when the most significant changes in the landscape occurred. We created a timeline of events for the months following Putin’s announcement of the invasion of Ukraine. This timeline helped us connect and contextualise the increase in engagement and the change in issues discussed online with events happening on the ground. After applying the context of the war, we investigated this dataset further and looked at the most popular tweets and users to better map the #OSINT landscape.
Fig. 2: Data filtering and visualisation protocol.
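As a rough illustration of the tokenization and curation step, the sketch below lowercases a tweet, strips URLs, keeps only runs of basic Latin letters and digits (so emojis and non-Latin scripts drop out), and removes an exclusion list. It does not reproduce 4CAT's tokenizer, and `EXCLUDED_TERMS` is a hypothetical list, since the terms actually removed are not listed here.

```python
import re

# Hypothetical exclusion list; the terms actually removed are not given in the text.
EXCLUDED_TERMS = {"osint", "rt"}

URL_PATTERN = re.compile(r"https?://\S+")
# Runs of basic Latin letters and digits; emojis and non-Latin scripts never match.
TOKEN_PATTERN = re.compile(r"[a-z0-9]+")


def tokenize(body):
    """Lowercase a tweet, drop URLs, and return the remaining Latin-script tokens."""
    text = URL_PATTERN.sub(" ", body.lower())
    return [tok for tok in TOKEN_PATTERN.findall(text) if tok not in EXCLUDED_TERMS]


print(tokenize("New #OSINT thread on Україна 🇺🇦 #Ukraine https://t.co/abc"))
```

One limitation of this simple pattern is that accented Latin letters split a word, so a more careful version would normalise them first.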
To visualise our findings, we generated three graphics using 4CAT and other software. Figure 2, above, shows the steps we took to filter our data before creating the visualisations. Using 4CAT, we generated a histogram (Figure 3) showing the number of tweets per month in our dataset of liked tweets regarding #OSINT. This was a useful starting point, as it showed us clearly where to focus our mapping of the issue space.
Fig. 3: Histogram of #OSINT tweets from January 2020 to October 2022. Highlighted is an increase in #OSINT discussion corresponding to Putin’s announcement of the invasion of Ukraine.
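The monthly counts behind a histogram of this kind can be computed as in the sketch below. It assumes each tweet's timestamp is available as a Python `datetime`; the sample timestamps are invented and the text rendering only stands in for the chart that 4CAT produces.

```python
from collections import Counter
from datetime import datetime


def tweets_per_month(timestamps):
    """Count tweets per calendar month, keyed as 'YYYY-MM' in chronological order."""
    counts = Counter(ts.strftime("%Y-%m") for ts in timestamps)
    return dict(sorted(counts.items()))


# Invented timestamps for illustration only.
sample = [
    datetime(2022, 1, 15),
    datetime(2022, 2, 24),
    datetime(2022, 2, 25),
    datetime(2022, 3, 1),
]

monthly = tweets_per_month(sample)
for month, n in monthly.items():
    # Crude text histogram: one '#' per tweet.
    print(f"{month} {'#' * n} {n}")
```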
To create our next visualisation, a rankflow diagram (Figure 4), we used a combination of 4CAT and RankFlow, an online tool by Bernhard Rieder (Rieder et al., 2018). After noticing the change in the number of tweets in the histogram, we decided to focus our research on the period between December 2021 and July 2022. First, we found the 10 most important terms for each month in this period using the term frequency-inverse document frequency (tf-idf) function in 4CAT. We then counted how many times each term appeared in the separate tokenized text files. Finally, we entered this data into the RankFlow tool to create our rankflow diagram. This visualisation shows how the discussion around #OSINT evolved as events surrounding the war against Ukraine unfolded. We can see the exact month in which the majority of the conversation shifts from cybersecurity and OSINT tools to the war against Ukraine. To add more detail, we created a timeline of events and placed it above the rankflow diagram to show what was happening outside the Twitter space to influence discussions surrounding #OSINT.
Fig. 4: Rankflow Timeline of the evolution of the #OSINT issue space landscape from December 2021 to July 2022.
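The term-ranking steps described above (top terms per month by tf-idf, then raw counts of those terms) can be sketched as follows. This treats each month as one document and uses the plain textbook formula, tf × log(N / df). It is not 4CAT's implementation, which may use a smoothed or normalised variant; the function names and sample tokens are hypothetical.

```python
import math
from collections import Counter


def top_terms_per_month(tokens_by_month, k=10):
    """Rank each month's terms by tf-idf, treating every month as one document."""
    n_docs = len(tokens_by_month)
    doc_freq = Counter()
    for tokens in tokens_by_month.values():
        doc_freq.update(set(tokens))  # count each term once per month

    result = {}
    for month, tokens in tokens_by_month.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        result[month] = ranked[:k]
    return result


def count_terms(tokens_by_month, top_terms):
    """Raw monthly counts of the selected terms, e.g. as input for a rank flow chart."""
    return {
        month: {term: tokens_by_month[month].count(term) for term in terms}
        for month, terms in top_terms.items()
    }


# Invented tokens for illustration only.
tokens_by_month = {
    "2022-01": ["cybersecurity", "infosec", "tools", "cybersecurity"],
    "2022-02": ["ukraine", "russia", "cybersecurity", "ukraine"],
    "2022-03": ["ukraine", "russia", "infosec", "bucha"],
}

top = top_terms_per_month(tokens_by_month, k=2)
print(top)
print(count_terms(tokens_by_month, top))
```

With this unsmoothed formula, a term that occurs in every month scores zero, because log(N / N) = 0. That is one reason a production implementation may weight terms differently.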
Our final visualisation (Figure 5) used a combination of 4CAT and a free online Word Cloud Generator. We combined the tokenized text files for our chosen time period, counted the number of times each of the top monthly terms appeared, then entered the counts into the online generator. This visualisation gives a quick overview of the issues discussed from December 2021 to July 2022.
Fig. 5: Word Cloud representation of the biggest topics in the #OSINT issue space from December 2021 to July 2022.
Discussions around #OSINT on Twitter have grown tremendously since Putin’s announcement of the invasion of Ukraine in February 2022. As seen in the histogram (Figure 3), there is a clear rise in the number of tweets around this time, from 4580 tweets in January 2022 to 7110 in February 2022 (a month-over-month increase of 55%). While the volume of tweets drops after February, the number of tweets featuring #OSINT averages about 5,696 a month, which is still higher than the average before Putin’s invasion announcement.
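The month-over-month figure quoted above follows from the two monthly counts. A minimal check, where `percent_change` is a helper name introduced only for this illustration:

```python
def percent_change(previous, current):
    """Relative change from `previous` to `current`, in percent."""
    return (current - previous) / previous * 100


# Tweet counts for January and February 2022, as reported in the text.
change = percent_change(4580, 7110)
print(f"{change:.1f}%")
```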
Focusing on the 8-month period from December 2021 to July 2022, the rankflow diagram (Figure 4) demonstrates a change in the main issues discussed in the #OSINT Twitter space before and after the announcement of the invasion of Ukraine. Topics surrounding ‘cybersecurity’ and ‘infosec’ lead the #OSINT conversation before the announcement. After Putin’s announcement, the discourse shifts significantly, as ‘Ukraine’ and ‘Russia’ become the focus of discussion. This is not to say that discussion around ‘cybersecurity’ and ‘infosec’ halted. Rather, the number of occurrences per month of these topics was stable, with an average of 537 for ‘cybersecurity’ and 383 for ‘infosec’ throughout the 8-month period. By contrast, ‘Ukraine’ and ‘Russia’ undergo a sudden growth in monthly occurrences, with ‘Ukraine’ rising from 87 in December 2021 to 2200 in February 2022 and ‘Russia’ rising from 132 to 2060.
Unlike ‘Ukraine’, ‘Russia’ was already one of the more discussed topics before Putin’s invasion announcement. Before the announcement, Russia was discussed in the #OSINT community in reference to OSINT tools used to decipher general misinformation, such as identifying Russian-led misinformation websites or bugs (Figure 6). After the announcement, however, the most engaged-with tweets in the #OSINT landscape were OSINT investigations (such as tracing and verifying claims) against Russia in the context of the war (Figure 7). Although ‘Russia’ featured both before and after the invasion, the way it was discussed in the #OSINT community shifted. The war against Ukraine prompted users to perform OSINT investigations with pro-Ukraine motivations, rather than investigating more general disinformation and misinformation claims.
Fig. 6: Example tweet before Putin’s invasion announcement (H I Sutton, 2021).
Fig. 7: Example tweet after Putin’s invasion announcement (Forrest Rogers, 2022).
Another notable observation concerns how terms come to rank among the most discussed topics. We found that a significant portion of the tweets came from individuals posting repetitively. Notably, in April and June of 2022, 4 of the 10 top terms were each driven by the tweets of a single user. For instance, one of the most important terms returned by the tf-idf function in 4CAT was ‘russianoil’, a specific topic discussed by one individual user, RuOilTracker. In our Rankflow Timeline we note 8 important terms in total that appear in our data because of passionate individuals in the space, such as ‘butcherofbucha’ and ‘ukrainerussiawar’. When inspecting the accounts behind these tweets, we did not classify them as bots, but rather as “passionate users” who use the platform in their own way. They are still involved in the conversation surrounding #OSINT; however, their discussions may carry more weight in the data than they do in the issue space as a whole.
In mapping the issue space from December 2021 to July 2022, we found 32 terms deemed most important by the tf-idf function in 4CAT, displayed in Figure 5. Notably, 16 of the words (in blue) relate to the war against Ukraine, 9 relate to more technical discussions of information security and OSINT tools, and 4 belong to the realm of aviation. Taken as a whole, the most engaged-with discussions in the #OSINT landscape were therefore tweets posted in the context of the war against Ukraine.
Our research found that the #OSINT issue space changed substantially following Putin’s announcement of the invasion of Ukraine. Before the announcement, tweets mostly contained the terms ‘cybersecurity’ and ‘infosec’, and discussions centred on the general use of OSINT tools. Afterwards, tweets mostly contained the terms ‘ukraine’ and ‘russia’, and discussions centred on using OSINT to share and correct information about the war against Ukraine (Figure 4). We also found that the number of tweets containing #OSINT grew from a monthly average of 3071 between January 1, 2020 and January 31, 2022 to 6051 between February 1, 2022 and October 31, 2022.
For future research we would want to explore the parameters of engagement. We chose 1 like as the bar for engagement; given more time, we might investigate whether raising the bar to 5 or 6 likes would strengthen the data and remove some of the noise from passionate individuals who tweet the same terms over and over as hashtags. The individual users who each account for their own recurring topic (such as ‘russianoil’ or ‘butcherofbucha’) tend to muddy the data with specific hashtags and hashtag groups that are not necessarily representative of the entire #OSINT landscape. A large majority of these tweets had only one like. Given that an issue space is defined by engagement (Rogers et al., 2015), it is very important that the parameters of engagement are clear and precise.
The focus of our research was on mapping the issue space for #OSINT. Given more time, however, we might have collaborated more closely with the group handling actors in the issue space, as getting a clearer look at who is tweeting is important as well. We could then better define who is discussing the issues and thus eliminate topics that surface in the data but may not be widely discussed.
Finally, in further research into the OSINT space, we would include tweets that contain the word OSINT without the hashtag, or identify related terms and include tweets containing them, to paint a bigger picture of the space. Using only the term #OSINT as the initial data filter limits our data to those who use hashtags and leaves out those who might discuss OSINT issues without such strict boundaries on their expression.
Burgess, J. & Matamoros-Fernández, A. (2016). Mapping sociocultural controversies across digital media platforms: one week of #gamergate on Twitter, YouTube, and Tumblr, Communication Research and Practice, 2(1), 79-96, doi:10.1080/22041451.2016.1155338
Fiorella, G. (2022, August 31). First Steps to Getting Started in Open Source Research. Bellingcat. https://www.bellingcat.com/resources/2021/11/09/first-steps-to-getting-started-in-open-source-research/
Marres, N. (2015). Why map issues? On controversy analysis as a digital method. Science, Technology & Human Values, 40, 655–686. doi:10.1177/0162243915574602
Rieder, B., Matamoros-Fernández, A., & Coromina, Ò. (2018). From ranking algorithms to ‘ranking cultures’: Investigating the modulation of visibility in YouTube search results. Convergence: The International Journal of Research into New Media Technologies, 24(1), 50-68. https://doi.org/10.1177/1354856517736982
Rogers, R., Sánchez-Querubín, N., & Kil, A. (2015). Issue Mapping for an Ageing Europe. Amsterdam University Press. http://www.jstor.org/stable/j.ctt155j2dk
Venturini, T. (2012). Building on faults: How to represent controversies with digital methods. Public Understanding of Science, 21(7), 796–812. doi:10.1177/0963662510387558
H I Sutton [@CovertShores]. (2021, December 3). *Attacking Open Source Intelligence: The Fake AIS data epidemic explained* new #OSINT video -> https://youtube.com/watch?v=dnuKpd0TNKc Examples, and possible ways. Also two major variations. #Russia, #China discussed. Https://t.co/Q2KUxgW0oO [Tweet]. Twitter. https://twitter.com/CovertShores/status/1466747998969516038
Forrest Rogers [@Forrest_Rogers]. (2022, February 22). Putin convened an unscheduled meeting with his Security Council in Moscow on Monday. The meeting was broadcast at 5 pm. But what time was it really held? Let’s look at some participants’ watches. Sergei Shoigu & Sergei Lavrov prep at 11:45. #OSINT #UkraineRussia #Russia #Ukraine https://t.co/YlnLodkjdq [Tweet]. Twitter. https://twitter.com/Forrest_Rogers/status/1496254107660738568
|MappingOSINTIssues Poster.pdf||manage||32 MB||13 Jan 2023 - 15:22||JasminShahbazi||[Poster] IssueMappingOSINT|
|mov||MappingOSINTIssues.mov||manage||74 MB||13 Jan 2023 - 14:47||JasminShahbazi||[Video] IssueMappingOSINT|
|jpg||mappingOSINT2.jpg||manage||23 MB||13 Jan 2023 - 14:54||JasminShahbazi||MappingOSINTIssuesPoster|