Things Internet Researchers Should Know About Google

This page lists useful tips for doing research with Google search.

Good query design

See our lecture on query design.

  • Consider what it takes to turn search into research.
  • Look into Google’s search operators.
  • Use quotes around every word that needs to be included literally, as Google normally
    • makes automatic spelling corrections
    • personalizes search by using information such as sites visited before
    • includes synonyms of search terms (matching “car” when you search [automotive])
    • finds results that match similar terms to those in the query (finding results related to “floral delivery” when searching [flower shops])
    • searches for words with the same stem, e.g. “running” when [run] was submitted
    • makes some of the terms optional, like “circa” in [the scarecrow circa 1963]
  • Some queries might result in overly fresh results, or as Google puts it: “Search results, like warm cookies right out of the oven or cool refreshing fruit on a hot summer’s day, are best when they’re fresh”
  • Take into account transliterations: “osama bin laden” vs “osama ben laden” vs the Arabic spelling. Use e.g. Wikipedia’s articles in other languages.
  • Discrete and underspecified search terms often work well
  • When noting down queries in a research report one can use brackets. E.g. we queried [HIV] in google.co.uk and later refined the query to [“AIDS”] so that no synonyms are included.
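
The quoting and bracket conventions above can be sketched in a few lines of Python. This is a minimal illustration rather than a DMI tool: the function names `quote_terms` and `report_notation` are hypothetical, and the sketch quotes every word separately, so a phrase that must stay together would still need to be quoted by hand.

```python
def quote_terms(query: str) -> str:
    """Wrap every whitespace-separated word in double quotes.

    Assumption: each word should be matched literally on its own;
    existing quote characters are stripped first to avoid doubling.
    """
    terms = [term.strip('"') for term in query.split()]
    return " ".join(f'"{term}"' for term in terms if term)


def report_notation(query: str) -> str:
    """Render a query in square brackets for use in a research report."""
    return f"[{query}]"


# Example: prepare a literal query and note it down for the report.
literal_query = quote_terms("flower shops")
print(report_notation(literal_query))  # prints ["flower" "shops"]
```

Keeping the two steps separate means the same query string can be submitted to the engine and recorded in the report without retyping it.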

Disentangling the researcher from Google

When using the Google scraper or Lippmannian Device with our Firefox toolbar, the researcher needs to take a few steps to ensure that day-to-day activities do not interfere with the research.

Google and the local

See our video tutorials on analyzing engine results and localizing web sources.

  • The default local (as provided by the engine and disentangled from the researcher) is what is important.
  • Google automatically ‘localizes’ its users (based on IP, toolbar or sidebar) and provides targeted results for the user’s location. One cannot turn off localization, but to obtain results for another country one can visit a different local Google domain instead.
  • How Google decides what ‘nationality’ a site has: its geotargeting factors include the ccTLD, the geotargeting setting for gTLDs (in Webmaster Tools), the server location and other signals (such as addresses and phone numbers). At the bottom of this list is a list of local domain Googles.
  • Cross-language information retrieval updates: “For queries in languages where limited web content is available (Afrikaans, Malay, Slovak, Swahili, Hindi, Norwegian, Serbian, Catalan, Maltese, Macedonian, Albanian, Slovenian, Welsh, Icelandic), we will now translate relevant English web pages and display the translated titles directly below the English titles in the search results. This feature was available previously in Korean, but only at the bottom of the page. Clicking on the translated titles will take you to pages translated from English into the query language.”
  • Google Transparency Report lists the number and type of requests for content removal per state: http://www.google.com/transparencyreport/
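
Choosing a local Google domain amounts to changing the host name in the search URL. The sketch below shows one way to build such a URL in Python. It is illustrative only: the function name `local_search_url` is hypothetical, the `q` and `hl` parameters are taken from the example URL in the observations table further down, and the use of `https` is an assumption. The sketch only constructs the URL and sends no request.

```python
from urllib.parse import urlencode


def local_search_url(query: str, domain: str = "www.google.com",
                     language: str = "en") -> str:
    """Build a search URL for a given local Google domain.

    Assumptions: `q` carries the query and `hl` the interface language,
    as in the example URL listed on this page.
    """
    params = {"q": query, "hl": language}
    return f"https://{domain}/search?{urlencode(params)}"


# Example: the query [HIV] on the UK domain.
print(local_search_url("HIV", domain="www.google.co.uk"))
# prints https://www.google.co.uk/search?q=HIV&hl=en
```

Passing the domain as an argument makes it easy to repeat the same query across several local domains and compare the results.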

Harvesting and triangulating Google results

The symbiosis of Google and Wikipedia

  • Wilkinson and Huberman (2007) find evidence of a direct correlation between the visibility of an article (measured in terms of its Google PageRank) and the number of edits received by that article. See Wilkinson, D.M., and B.A. Huberman, 2007. “Assessing the Value of Cooperation in Wikipedia,” First Monday.
  • PageRank has been shown to correlate strongly with the number of times a Wikipedia page is viewed. See Spoerri, A., 2007. “What is popular on Wikipedia and why?,” First Monday.

Algorithm changes

This section lists Google algorithm changes that have resulted in new, or changing, modes of research that were not possible before the change, together with the type of each change. The list runs from the first named and confirmed update, Boston (2003), until June 2015. The timeline is by no means exhaustive: Google changes its algorithm 500-600 times per year. While most of these changes are minor, others are ‘major’ in that they have the biggest impact on (re-)search. The selection is drawn from the work of SEO consultancy Moz, which keeps track of these major algorithm changes by tracking changes in results for a set of queries with its ‘Rank Tracker’ tool, through community submissions and through updates reported by Google. Table adapted from Weltevrede, Esther (2016). Repurposing digital methods. The research affordances of platforms and engines. Ph.D. Dissertation, Amsterdam, NL: University of Amsterdam (p. 120).

| Year | Update name | Update type | Key Google algorithm change |
| 2003 | Boston | Anti-manipulation / Quality | More emphasis on quality back-links |
| 2003 | Cassandra | Anti-manipulation / Quality | Cracking down on link-quality issues, such as co-linking from domains, hidden text & hidden links |
| 2003 | Dominic | Anti-manipulation / Quality | Improving on counting and reporting backlinks |
| 2003 | Esmeralda | Infrastructure | Improvements on the index infrastructure |
| 2003 | Fritz | Infrastructure | Improvements on the index infrastructure |
| 2003 | Supplemental Index | Anti-manipulation / Quality | Update splitting off results of lesser quality into the "supplemental index" |
| 2003 | Florida | Anti-manipulation / Quality | Crack-down on low-value late 90s SEO tactics, like keyword stuffing |
| 2004 | Austin | Anti-manipulation / Quality | Crack-down on SEO tactics, including deceptive on-page tactics such as invisible text and META-tag stuffing |
| 2004 | Brandy | Semantic / Query | Latent Semantic Indexing (LSI), anchor text relevance, synonyms and keywords, introduction of the idea of link "neighbourhoods" |
| 2005 | Allegra | Anti-manipulation / Quality | Crack-down on suspicious-looking links |
| 2005 | Bourbon | Anti-manipulation / Quality | Improvements in how duplicate content and non-canonical (www vs. non-www) URLs were treated |
| 2005 | Personalized Search | Personalization / Social | Results take users' search histories into account |
| 2005 | Jagger | Anti-manipulation / Quality | Crack-down on low-quality links, including reciprocal links, link farms, and paid links |
| 2005 | Google Local/Maps | Local | Maps data is integrated into the Local Business Center |
| 2005 | Big Daddy | Infrastructure | Infrastructure update changing the way URL canonicalization, redirects and other technical issues are handled |
| 2006 | Supplemental Update | Anti-manipulation / Quality | Change to the supplemental index and how filtered pages were treated |
| 2007 | Universal Search | Universal | Integrating traditional search results with News, Video, Images, Local, and other verticals |
| 2007 | Buffy | Semantic / Query | Update to single-word search results and other small changes |
| 2008 | Dewey | Universal | Unspecified update to the index, reportedly pushing Google's own internal properties, including Google Books |
| 2008 | Google Suggest | Semantic / Query | Update displaying suggested searches in a dropdown below the search box and later powering Instant |
| 2009 | Vince | Trust | Big brands get a boost in the results |
| 2009 | Real-time Search | Real-time / Freshness | Twitter feeds, Google News, newly indexed content and other sources were integrated into a real-time feed on some SERPs |
| 2010 | Google Places | Local | "Places", originally only a part of Google Maps, was now integrated more closely with local search results |
| 2010 | May Day | Anti-manipulation / Quality | Crack-down on low-quality pages ranking highly for long-tail searches |
| 2010 | Caffeine | Real-time / Freshness | Launch of new web indexing infrastructure resulting in a 50% fresher index |
| 2010 | Brand Update | Trust | Same domains are allowed to appear multiple times on a SERP |
| 2010 | Google Instant | Semantic / Query | Displaying search results as a query was being typed |
| 2010 | Social Signals | Personalization / Social | Social signals are included in determining ranking, including data from Twitter and Facebook |
| 2010 | Negative Reviews | Trust | Update to ranking based on negative reviews |
| 2011 | Panda | Anti-manipulation / Quality | Crack-down on thin content, content farms, sites with high ad-to-content ratios, and a number of other quality issues |
| 2011 | Freshness Update | Real-time / Freshness | Update primarily affecting time-sensitive results, signaling a much stronger focus on recent content |
| 2012 | Search + Your World | Personalization / Social | Update pushing Google+ social data and user profiles into SERPs |
| 2012 | Venice | Local | More localized organic results and tighter integration of local search data |
| 2012 | Penguin | Anti-manipulation / Quality | Crack-down on spam factors, including keyword stuffing and link schemes |
| 2012 | Knowledge Graph | Semantic / Query | Rolling out a SERP-integrated display providing supplemental information about certain people, places, and things |
| 2012 | Exact-Match Domain (EMD) Update | Anti-manipulation / Quality | Crack-down on low-quality websites that have search terms in their domain names |
| 2012 | DMCA Penalty ("Pirate") | Anti-piracy | Crack-down on software and digital media piracy |
| 2013 | In-depth Articles | Universal | New type of result, dedicated to more evergreen, long-form content |
| 2013 | Hummingbird | Semantic / Query | Core algorithm update that powers changes to semantic search and the Knowledge Graph |
| 2014 | Payday Loan | Anti-manipulation / Quality | Crack-down on spammy queries |
| 2014 | Pigeon | Local | Altering local results and modifying how location cues are handled, creating closer ties between the local and core algorithm(s) |
| 2014 | HTTPS/SSL Update | Trust | Giving preference to secure sites |
| 2014 | Authorship Removed | Trust | Authorship bylines disappearing from all SERPs |
| 2014 | "In The News" Box | Universal | Change to News-box results expanding news links to a much larger set of potential sites |
| 2014 | Pirate 2.0 | Anti-piracy | Crack-down on software and digital media piracy |
| 2015 | Mobile Update / "Mobilegeddon" | Mobile | Mobile friendliness becomes a stronger ranking factor for mobile searches |
| 2015 | The Quality Update | Anti-manipulation / Quality | Core algorithm change impacting "quality signals" |

Other useful observations

| Period of observance | Observation | Reference / example | Google service |
| 2002 - 2012 | The maximum number of results served by Google is 1000. In this example query Google indicates it has indexed about 226,000,000 results, while one cannot click beyond result 874 | http://www.google.com/search?q=Things+one+should+know+about+google&hl=en&client=firefox-a&rls=org.mozilla:en-US:official&hs=6bK&num=100&start=900&sa=N | all |
| 2008 - 2012 | Google Trends is based on 'successful queries.' | How does Google Trends for Websites work? When you enter the address of a website into the search box, Trends for Websites shows you a graph reflecting the number of daily unique visitors (the number of people who visit a website) to that website. http://www.google.com/intl/en/trends/websites/help/index.html#1 | trends |
| 2004 - 2012 | Screen scraping Google might get you blocked. | DMI Google scraping experience | all |
| 2004 - 2012 | The US version of Google (google.com) returns the most "international" results | You can also go to http://google.com/ncr (No country redirect) | google web search |
| 2004 - 2012 | Cheat sheet / search operators | http://www.google.com/help/cheatsheet.html | google web search |
| 2004 - 2012 | Cheat sheet / search operators | http://jwebnet.net/advancedgooglesearch.html | google web search |
| 2004 - 2012 | Cheat sheet / search operators | http://www.internettutorials.net/boolean.asp | google web search |
| 2012 | Cheat sheet / URL parameters | http://code.google.com/intl/en/apis/customsearch/docs/xml_results.html | |
| - 2012 | Different results are returned when one is logged in | | all |
| 2002 - 2012 | The maximum number of results returned by Google per results page is 100 | Add &num=100 to the URI | all |
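
Taken together, the two limits noted above (at most 100 results per page via `num`, and roughly 1000 results served in total) imply that a query can be paged through in at most ten requests. The following Python sketch lists the corresponding URLs. It is illustrative only: the function name `result_page_urls` is hypothetical, the `num` and `start` parameters are taken from the example URL in the table, the limits reflect the observation periods given there and may no longer hold, and the sketch sends no requests (recall that scraping may get one blocked).

```python
from urllib.parse import urlencode

MAX_SERVED = 1000  # approximate ceiling on results served, as observed above
PAGE_SIZE = 100    # largest page size observed for the num parameter


def result_page_urls(query: str, domain: str = "www.google.com") -> list[str]:
    """List the URLs needed to page through all served results of a query.

    Assumption: `start` is the zero-based offset of the first result on
    a page, as in the example URL (`num=100&start=900`).
    """
    urls = []
    for start in range(0, MAX_SERVED, PAGE_SIZE):
        params = {"q": query, "num": PAGE_SIZE, "start": start}
        urls.append(f"https://{domain}/search?{urlencode(params)}")
    return urls


# Example: ten pages of 100 results each for the query [HIV].
pages = result_page_urls("HIV")
print(len(pages))   # prints 10
print(pages[-1])    # prints https://www.google.com/search?q=HIV&num=100&start=900
```

In practice the last pages may come back short or empty, as in the example above where results stopped at 874.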
Attachment: Updates_timeline_.pdf (Google Algorithm 'Change Types'), 98 K, 06 Jan 2016, ErikBorra
Topic revision: r17 - 08 Jan 2016, ErikBorra