According to Google Images: Visual epistemologies of climate change and biodiversity loss
Warren Pearce, Elena Pilipets (facilitators), Maud Borie, Laura Bruschi, Ariel Chen, Daniele Dell’Orto, Matthew Hanchard, Alessandro Quets, Zijing Xu
The Google Images search engine is probably the most important online gatekeeper of visual culture worldwide, with over two billion searches conducted on the platform every day (Fishkin, 2018). Despite its importance, there are few analyses of how the platform determines the visibility of images within its search rankings. For example, are ‘authority’ metrics used in a similar way to its textual web search results (Birkbak & Carlsen, 2016)? Or do computer vision techniques offer new criteria for promoting images and the websites that host them? To address three core questions, the project builds a dataset of country-specific search rankings from Google Images for a range of search queries. We are interested in the image content, the linked web pages and how search results for images and text differ. We also make use of a historic dataset of Google Images search results from 2019.
2. Research Questions
How does Google Images see and show climate change and biodiversity loss?
- To what extent does Google Images homogenise or diversify visions of climate change and biodiversity loss?
- How does the 2022 vision of climate change compare with 2017?
- What do these results tell us about Google Images’ ranking regime?
3. Methodology and initial datasets
The research followed seven protocols, as below:
(1) Data was collected using anonymous Google Chrome/Mozilla Firefox accounts (to minimise personalisation effects). We searched both Google Images and Google Search with ‘biodiversity loss’ and ‘climate change’ as key terms for six countries: Mexico, Netherlands, Nigeria, China, Brazil and Australia. The top 50 images for each country were selected and compiled into a dataset. Countries were selected to represent a range of geographical and economic situations (at least one per continent) and types of engagement with climate change and biodiversity loss. We searched our key terms in English and in the official national language of each country, e.g. for Google.nl we searched for ‘climate change’ and ‘klimaatverandering’. Comparing the top 50 results for each search (see Dataset 1) provided an overview of the most prominent visual rhetorics and images in circulation;
(2) We used a Python script (see Script 1) to scrape the top fifty results, producing a .csv file (Dataset 2) with key characteristics of each image, i.e. the rank assigned by Google, the image URL, webpage location and domain, as well as the image alt text, size and file extension type;
(3) To examine the most similar/mobile images across datasets we used Google Cloud Vision (see Instructions 1) and Memespector (see Instructions 2). The returned .csv file (see Dataset 3) gathered detail on how the different computer vision APIs categorise images. This included the labels used by Google Cloud Vision (e.g. tarantula; cliff; patas; fiddler_crab; mountain_bike) and the web entities applied through Google’s web detection module to the fifty highest ranked images within each country (relying on the detection of sites with matching images and on the web references that the Google Knowledge Graph (see Singhal, 2012) assigned to images based on site-specific image captions). This provided us with an understanding of how highly ranked images for ‘biodiversity loss’ and ‘climate change’ compare between country-specific queries (see Research Question 1);
(4) We manually coded the dataset to identify the most commonly recurring images within each country (Dataset 4). This required an analytical process, informed by semiotics, of differentiating between identical and similar images (augmented/amended versions of an image). This provided an understanding of the types of images promoted by Google for particular search terms (see Research Question 2). ImageSorter was used to organise the images by country, and for the entire dataset, so as to gain an insight into the ‘style spaces’ (Manovich, 2011) of (i) the issues (aggregated) and (ii) each country;
(5) We merged the country-specific datasets into a master spreadsheet (in .csv format) for each search term, i.e. biodiversity loss (Dataset 5) and climate change (Dataset 6). We then used the master spreadsheets to identify the ten most commonly recurring images, using Clarifai’s general concept labels to sort the images by similarity (sorting the column from A to Z). We also accessed the Google Vision API’s web detection feature through Memespector to identify the websites containing images that fully match the top three ‘perfectly repeating’ stock images per search query;
(6) One of our project aims (see Research Question 3) was to compare the results of Google Search and Google Images. We used a VPN (CyberGhost VPN) and gathered our data using anonymous Mozilla Firefox accounts. The two queries were searched in both English and the country’s native language (if not English), replicating Protocol 1. We scraped Google Search using the Search Engine Scraper (DMI, 2010) and extracted the domains of the first 10 ranked websites. We compared the results of the queries in each country using the Triangulate tool (DMI, 2008);
(7) We conducted a social semiotic analysis to deconstruct the visual grammar of the five most frequently recurring images in each of the climate change and biodiversity loss datasets (10 images in total). This enabled us to address Research Question 2. To do so, we mainly drew on the analytical tools outlined by Kress & van Leeuwen (1996). This allowed us to show more detailed representational characteristics and patterns of the images that digital methods cannot identify. Social semiotics also enabled us to reveal the subtle ideological messages embedded in this Google Images collection.
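The merging and recurrence steps in Protocols 2 and 5 can be sketched in a few lines of Python. This is an illustration only, not a reproduction of Script 1 or of Datasets 2, 5 and 6: the column names (`rank`, `image_url`, `page_url`) and the sample rows are assumptions, and matching on identical image URLs is a simplification, since the project identified recurring images through visual similarity and manual coding.

```python
from collections import defaultdict
from urllib.parse import urlparse


def domain_of(url):
    """Return the host part of a URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def merge_country_rows(rows_by_country):
    """Flatten per-country result lists into one master list with country and domain columns."""
    master = []
    for country, rows in rows_by_country.items():
        for row in rows:
            master.append({**row, "country": country, "domain": domain_of(row["page_url"])})
    return master


def recurring_images(master, min_countries=2):
    """Map each image URL seen in at least `min_countries` countries to those countries."""
    seen = defaultdict(set)
    for row in master:
        seen[row["image_url"]].add(row["country"])
    return {url: sorted(c) for url, c in seen.items() if len(c) >= min_countries}


# Hypothetical sample rows (not project data).
sample = {
    "Mexico": [
        {"rank": 1, "image_url": "https://img.example/forest.jpg", "page_url": "https://www.example.org/a"},
        {"rank": 2, "image_url": "https://img.example/graph.png", "page_url": "https://news.example.com/b"},
    ],
    "Nigeria": [
        {"rank": 1, "image_url": "https://img.example/forest.jpg", "page_url": "https://example.net/c"},
    ],
}
master = merge_country_rows(sample)
print(recurring_images(master))
```

In practice the same image is often re-hosted under different URLs, which is why URL matching alone would undercount recurrence and why visual similarity tools were needed.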
As outputs, alongside the seven datasets and this Wiki, we also produced two posters. The design of the posters required several decisions on how to reduce the data and what to include or exclude.
Data and protocols were collated in a shared Google Drive folder. A range of tools was used for data collection, in particular a Python script and DownThemAll, and for data analysis: Clarifai, Cloud Vision, Memespector, ImageSorter, RAWGraphs and Gephi. Data was centralised in .csv format using Google spreadsheets. We circulated around the images, displaying them in different ways: according to the initial Google Images ranking (as on the image wall), by most frequent labels and web entities, and by colour.
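As a rough illustration of arranging images by most frequent labels, the sketch below counts how many images carry each label. The `labels` column name, the semicolon delimiter and the sample rows are assumptions made for this example; the actual Memespector output may be structured differently.

```python
from collections import Counter


def label_counts(rows, column="labels", sep=";"):
    """Count how many images carry each label in a delimited label column."""
    counts = Counter()
    for row in rows:
        # A set ensures a label repeated within one image is counted once.
        labels = {part.strip().lower() for part in row.get(column, "").split(sep) if part.strip()}
        counts.update(labels)
    return counts


# Hypothetical rows (not project data).
rows = [
    {"country": "Australia", "labels": "Forest; Tree; Soil"},
    {"country": "Australia", "labels": "Tree; Sky"},
    {"country": "China", "labels": "Text; Diagram; Tree"},
]
print(label_counts(rows).most_common(3))
```

Filtering `rows` by country before counting gives a per-country view of the same measure.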
The Google Cloud Vision API’s web detection feature draws and updates its information in relation to the authority of the landing pages an image is attached to. By default, Google limits the results to ten pages containing full matching images (this can be changed with the "maxResults" setting). Where the total number of full matching images exceeds ten, it is unclear how Google chooses which to include in the Google Vision results. One possibility, based on observation of Google Vision results in this and previous projects, is that the number of clicks on images and previous user queries (known as query logs) that match image captions can influence the prioritisation of both web pages and web entities in the results. Query log information may also be used to associate locations with websites and web pages (Slawski, 2012).
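To make the "maxResults" setting concrete, the sketch below builds a web detection request body and reads the pages with full matching images out of a response. It reflects our reading of the public Cloud Vision REST interface and should be checked against the current documentation; the project itself accessed the API through Memespector, and the image URL and the sample response here are hand-written placeholders. No network call is made and no API key is included.

```python
import json

# Endpoint of the Cloud Vision REST API (authentication is not shown here).
ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate"


def web_detection_request(image_uri, max_results=10):
    """Build the JSON body for a web detection request on one remote image."""
    return {
        "requests": [
            {
                "image": {"source": {"imageUri": image_uri}},
                "features": [{"type": "WEB_DETECTION", "maxResults": max_results}],
            }
        ]
    }


def full_match_pages(response):
    """List the URLs of pages reported as hosting a full match of the image."""
    pages = []
    for result in response.get("responses", []):
        web = result.get("webDetection", {})
        for page in web.get("pagesWithMatchingImages", []):
            if page.get("fullMatchingImages"):
                pages.append(page["url"])
    return pages


# Raise the limit from 10 to 25 for a placeholder image URL.
body = web_detection_request("https://img.example/forest.jpg", max_results=25)
print(json.dumps(body, indent=2))

# Hand-written stand-in for an API response (not real output).
fake_response = {
    "responses": [
        {
            "webDetection": {
                "pagesWithMatchingImages": [
                    {"url": "https://example.org/a", "fullMatchingImages": [{"url": "https://img.example/forest.jpg"}]},
                    {"url": "https://example.com/b", "partialMatchingImages": [{"url": "https://img.example/crop.jpg"}]},
                ]
            }
        }
    ]
}
print(full_match_pages(fake_response))
```

Raising the limit returns more pages but does not reveal how the returned pages are ordered or selected, which is the open question raised above.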
Through our analyses of the different datasets we found a set of similarities and differences between the prevalent images in each country for each search term. Below, we set out our analysis of [biodiversity loss] and [climate change] separately, before bringing them together in the Discussion and Conclusion sections.
4.1. Biodiversity Loss
4.1.1. Biodiversity loss: Differences between countries
We retrieved 50 images for each of the six countries (n = 300 images) with [biodiversity loss] as the search term. Some images are widely represented, in particular images of damaged forests as well as a significant number of images showing a lone animal in front of a damaged ecosystem. The analysis of each dataset, however, suggests that some differences exist between countries (see Figure 1). For Australia, Mexico and Nigeria, damaged forests are present in roughly 50% of the top 50 images (Table 1, Figure 2). In the Netherlands, the share of forest-related images is low, with a greater number of scientific graphs and charts. The China results are even more dominated by images containing text and graphs, with barely any representation of ecosystems and none at all of damaged forests.
Figure 1: Style spaces of biodiversity loss across the six countries. (The top 50 images for each country/keyword pair were downloaded via DownThemAll and sorted by visual similarity using ImageSorter.)
Table 1: Presence of damaged forest in image
The dominance of forests as the main visual proxy for ‘biodiversity loss’ leaves some key aspects of global biodiversity policy absent from the search results (e.g. oceans, wetlands, meadows, polar regions). Overall, no single image dominates the dataset, but several “families of images” (genres) do. We can broadly characterise these as:
Images of degraded forests;
Images containing scientific materials (graphs);
Cartoonesque depictions (e.g. educational material); and
Generic ‘global’ images bearing some similarity with the ones found in the climate change dataset (see below).
The most repeated images across countries are images of degraded forests used as a proxy for biodiversity loss (Figure 2) and scientific diagrams depicting the issue (Figure 3).
Figure 2: Example image of a degraded forest used as a proxy for biodiversity loss (Source: Watts, 2018)
Figure 3: Example image of a widely circulated scientific diagram illustrating biodiversity loss (Source: Leclère et al., 2020)
4.1.2. Biodiversity loss: Comparing Google’s search ranking criteria
For the six countries there is little to no overlap (max: 3, min: 0) between the top search results associated with [biodiversity loss] on Google Images and those on Google Search. This suggests that the ranking logic of Google Images is very different from that of Google Search. A broader range of websites is associated with Google Images. We used the Triangulate tool to compare the search results (for an example, see Figure 4).
Figure 4: Comparison of search results for [biodiversity loss] in Mexico (green represents unique host pages)
Table 2 shows the number of unique websites found in the top 10 results when querying [biodiversity loss] on Google Images and Google Search. The Netherlands is the country that shares the most websites between the two engines: these are Wageningen University & Research, the European Parliament and Compassion in World Farming, a worldwide NGO which aims to end factory farming practices, one of the main causes of biodiversity loss. Australia shares the Encyclopaedia Britannica, which is also shared in Nigeria alongside the website of a multinational electric utility company. Finally, China only shares the International Atomic Energy Agency’s website.
Table 2: Number of unique websites found in the top 10 of [biodiversity loss] per country
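The comparison behind Table 2 amounts to a set comparison of host names in the two top 10 lists. The sketch below illustrates that logic under stated assumptions: it is not the Triangulate tool, the function names are ours, and the URLs are placeholders rather than project data.

```python
from urllib.parse import urlparse


def host_of(url):
    """Return the host part of a URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def compare_top_hosts(image_pages, search_pages, top_n=10):
    """Split the hosts of two ranked URL lists into shared and engine-specific sets."""
    images = {host_of(url) for url in image_pages[:top_n]}
    search = {host_of(url) for url in search_pages[:top_n]}
    return {
        "shared": sorted(images & search),
        "images_only": sorted(images - search),
        "search_only": sorted(search - images),
    }


# Placeholder result lists (not project data).
image_pages = ["https://www.example.org/report", "https://blog.example.net/post", "https://example.edu/paper"]
search_pages = ["https://example.org/overview", "https://news.example.com/story"]

result = compare_top_hosts(image_pages, search_pages)
print(result["shared"], len(result["images_only"]), len(result["search_only"]))
```

Comparing at host level treats different pages on the same website as a match; comparing full URLs would report even less overlap.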
4.2. Climate Change
4.2.1. Climate change: Similarities and differences between countries
Figure 5: Top ranked images on Google Images for [climate change]
With the search term [climate change], we retrieved 50 images per country and 300 images in total. Overall, the image results for climate change are very similar across the six countries, with images of natural landscapes, trees, the Earth, oceans and ice, and some charts and data visualisations. The Earth appears cartoonised and/or augmented through manipulated photographic representation in different contexts, for example: (1) an anthropomorphic Earth wearing thick clothing and a mask, hot and sweaty; (2) fake images showing a human hand holding the Earth; and (3) a photographic Earth floating on the sea. The ocean and icebergs usually appear together, sometimes accompanied by a lone polar bear standing on the melting ice. Most images contained trees, and were often split in half, with lush green trees and blue skies on one side contrasting with withered trees and dry land on the other. People appeared in very few images, but when they did there was a similarity in the visual rhetoric employed across different countries. Here, most images presented a solitary human facing away from the viewer and looking across an endless horizon showing a parched, drought-ridden landscape.
Further analyses of the different datasets revealed disparities between countries. For Brazil and Mexico, images containing text and emissions of greenhouse gases were presented most frequently. Meanwhile, a quarter of the images in the dataset for China contain polar bears and penguins, and the dataset also indicates that climate change is regarded as a politically charged issue there, with international cooperation emphasised (see Figure 6).