Enter text in the harvester to extract the URLs it contains.

Tip: On a website, view the page source, then copy and paste the source code into the harvester to extract its URLs (or embedded links).

Tip: To harvest the results of a Google query, open the query in Firefox, select the results you want to extract links from, right-click the selection and click 'View Selection Source'. Then paste that source into the harvester. To extract only the URLs from the results, choose the settings 'only return uniques' and 'Exclude URLs from Google and Youtube'. To extract only the hosts from the results, choose those two settings as well as 'only return hosts'. Note that Google's search results also include links to a site's categories and the like. If you only want the links to the search results themselves, it is better to use the Google Scraper, leaving the top URL box empty.

This tool only recognizes hyperlinks that start with http://, https:// or www. To extract the hyperlinks (href) from a set of URLs, you might also try the Link Ripper Tool.
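
For readers who want to reproduce this behaviour offline, the following is a minimal Python sketch of the kind of extraction described above. It is not the harvester's actual code: the regular expression, the function names and the way Google and YouTube hosts are detected are all assumptions made for illustration. Only the three option names mirror the settings mentioned on this page.

```python
import re
from urllib.parse import urlsplit

# Assumed pattern: anything starting with http://, https:// or www.,
# running up to the next whitespace, quote or angle bracket.
URL_PATTERN = re.compile(r"""(?:https?://|www\.)[^\s"'<>]+""", re.IGNORECASE)


def host_of(url):
    """Return the lower-cased host of a URL, or '' if it cannot be parsed."""
    if not url.lower().startswith(("http://", "https://")):
        url = "http://" + url  # bare "www." links carry no scheme
    try:
        return (urlsplit(url).hostname or "").lower()
    except ValueError:
        return ""


def is_google_or_youtube(host):
    """Hypothetical check: any dot-separated label equals 'google' or 'youtube'."""
    return any(label in ("google", "youtube") for label in host.split("."))


def harvest(text, only_return_uniques=True,
            exclude_google_and_youtube=False, only_return_hosts=False):
    """Extract URLs from text, loosely mimicking the settings described above."""
    results = []
    for url in URL_PATTERN.findall(text):
        host = host_of(url)
        if exclude_google_and_youtube and is_google_or_youtube(host):
            continue
        results.append(host if only_return_hosts else url)
    if only_return_uniques:
        # dict.fromkeys drops duplicates while keeping first-seen order
        results = list(dict.fromkeys(results))
    return results
```

Calling harvest(page_source, exclude_google_and_youtube=True, only_return_hosts=True) on pasted page source would return one entry per distinct host. Relative links such as href="/about" are skipped in this sketch, because they do not start with one of the three recognized prefixes.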
Topic revision: r8 - 24 Oct 2013, ErikBorra