Extract URLs from text, source code or search engine results. Produces a clean list of URLs.
Instructions
Paste text into the Harvester to extract the URLs it contains.
Tip: To harvest the links on a web page, view the page source, then copy and paste the source code into the Harvester to extract the URLs (including embedded links).
Tip: To harvest the results of a Google query, view the source of the results page, then copy and paste the source code into the Harvester. To extract only the URLs of the results, select the settings 'only return uniques' and 'Exclude URLs from Google and Youtube'. To extract only the hosts of the results, select those two settings as well as 'only return hosts'.
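For readers who want to reproduce this kind of extraction offline, the following is a minimal Python sketch of what the settings above might do. It is not the Harvester's actual implementation: the function name harvest, its parameters (unique, exclude_hosts, hosts_only) and the simple regular expression are all assumptions made for illustration, and the pattern only catches absolute http and https URLs.

```python
import re
from urllib.parse import urlsplit

# Assumption: a deliberately simple pattern for absolute http(s) URLs.
# It stops at whitespace, quotes and angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def harvest(text, unique=False, exclude_hosts=(), hosts_only=False):
    """Return the URLs (or hosts) found in text, in order of appearance.

    All parameter names are hypothetical stand-ins for the settings
    'only return uniques', 'Exclude URLs from ...' and 'only return hosts'.
    """
    results = []
    seen = set()
    for match in URL_PATTERN.findall(text):
        # Drop punctuation that commonly trails a URL in running text.
        url = match.rstrip(".,;:)")
        try:
            host = (urlsplit(url).hostname or "").lower()
        except ValueError:
            # Skip strings that look like URLs but cannot be parsed.
            continue
        # Exclude a host and all of its subdomains.
        if any(host == h or host.endswith("." + h) for h in exclude_hosts):
            continue
        item = host if hosts_only else url
        if unique and item in seen:
            continue
        seen.add(item)
        results.append(item)
    return results


if __name__ == "__main__":
    sample = (
        'See <a href="http://example.org/a">a</a>, http://example.org/a '
        "and https://www.google.com/search?q=x plus https://blog.example.net/post."
    )
    print(harvest(sample, unique=True, exclude_hosts=("google.com", "youtube.com")))
    print(harvest(sample, unique=True, exclude_hosts=("google.com", "youtube.com"),
                  hosts_only=True))
```

Excluding by host rather than by substring keeps URLs that merely mention "google" in their path, which seems closer to the intent of the setting.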
Sample project
Project: Extract URLs from the Daily Kos blogroll
Go to dailykos.com
View the page source (in Firefox, choose View > Page Source or press Ctrl+U)
In the page source, find the section of markup that contains the blogroll links
Copy that section and paste it into the Harvester, which outputs a list of URLs ready for further analysis, e.g. using the Issuecrawler
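The last step of the sample project can be sketched in Python as well. This sketch uses the standard library's html.parser to collect the href values of anchor tags from a pasted fragment of page source. The class name LinkCollector, the function links_from_fragment and the blogroll fragment are invented for illustration; the fragment uses placeholder addresses, not the actual Daily Kos blogroll, and the Harvester itself may work differently.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect absolute http(s) links from <a href="..."> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Keep only absolute links; relative links point back to the site itself.
            if name == "href" and value and value.startswith(("http://", "https://")):
                self.links.append(value)


def links_from_fragment(html_fragment):
    """Return the absolute links found in a fragment of page source."""
    collector = LinkCollector()
    collector.feed(html_fragment)
    collector.close()
    return collector.links


if __name__ == "__main__":
    # Hypothetical blogroll markup with placeholder URLs.
    blogroll = """
    <ul>
      <li><a href="https://blog-one.example/">Blog One</a></li>
      <li><a href="/about">About this site</a></li>
      <li><a href="http://blog-two.example/feed">Blog Two</a></li>
    </ul>
    """
    for url in links_from_fragment(blogroll):
        print(url)
```

Writing the resulting list to a text file, one URL per line, gives a starting list that can be fed into a further analysis step.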