Wikipedia TOC Scraper

Scrape the table of contents (TOC) of each revision of a Wikipedia page, then explore the results by moving a slider through the chronologically ordered TOCs.


Enter a page title and language in the input field. The scraper retrieves a list of all revisions of that page in that language, taking into account page title normalization and possible redirects. By default, the TOCs of at most 500 revisions are retrieved; this limit can be set higher or lower, and specifying 0 retrieves all revisions. Not every revision may be retrievable: deleted revisions, for example, are usually hidden from public view on Wikipedia.

Scraping pages with many revisions may take a while. Wikipedia has to parse each individual revision to render its TOC, and this takes longer for larger pages. The scraped TOCs are stored on the DMI servers as cached versions, so re-running a scrape for the same page and language is much faster. If the scraper aborts with an error, refresh the tool page and start again with the same input. The scraper will go through the previously cached versions and then continue retrieving the remaining revisions.
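The retrieval and caching behaviour described above can be sketched roughly as follows. This is not the tool's actual source, only a minimal illustration that assumes the public MediaWiki API (`action=query` with `prop=revisions` to list revisions, and `action=parse` with `oldid` and `prop=sections` to get a revision's TOC). The function names, the in-memory `cache` dictionary, the User-Agent string, and the choice to list revisions oldest first are all assumptions made for the example.

```python
import json
import urllib.parse
import urllib.request


def http_fetch(lang, params):
    """GET the MediaWiki API of the given language edition and decode the JSON reply."""
    query = urllib.parse.urlencode({**params, "format": "json"})
    url = f"https://{lang}.wikipedia.org/w/api.php?{query}"
    # Hypothetical User-Agent; a descriptive one is good API etiquette.
    req = urllib.request.Request(url, headers={"User-Agent": "toc-scraper-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def list_revisions(title, lang, max_revisions=500, fetch=http_fetch):
    """Return revision ids, oldest first (an assumption); 0 means no limit."""
    params = {
        "action": "query", "prop": "revisions", "titles": title,
        "rvprop": "ids|timestamp", "rvdir": "newer", "rvlimit": "max",
        "redirects": 1,  # follow redirects; titles are normalized by the API
    }
    revids = []
    while True:
        data = fetch(lang, params)
        for page in data.get("query", {}).get("pages", {}).values():
            revids.extend(r["revid"] for r in page.get("revisions", []))
        if max_revisions and len(revids) >= max_revisions:
            return revids[:max_revisions]
        if "continue" not in data:
            return revids
        # Merge the continuation tokens into the next request.
        params = {**params, **data["continue"]}


def scrape_tocs(title, lang, cache, max_revisions=500, fetch=http_fetch):
    """Fill cache[(lang, revid)] with (level, heading) pairs, skipping cached revisions."""
    for revid in list_revisions(title, lang, max_revisions, fetch):
        key = (lang, revid)
        if key in cache:
            continue  # already scraped in an earlier run
        data = fetch(lang, {"action": "parse", "oldid": revid, "prop": "sections"})
        if "parse" not in data:
            continue  # e.g. a deleted or hidden revision
        cache[key] = [(s["toclevel"], s["line"]) for s in data["parse"]["sections"]]
    return cache
```

Because already cached revisions are skipped, calling `scrape_tocs` again with the same title, language, and cache only requests the revisions that are still missing, which mirrors the restart behaviour after an error. A persistent store would replace the dictionary in practice.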

Topic revision: r3 - 17 Dec 2015, UnknownUser