Extract URLs from text, source code or search engine results. Produces a clean list of URLs.
Instructions
Paste or type text into the Harvester to extract the URLs it contains.
Tip: On a website, view the page source. Copy and paste the source code into the Harvester to extract its URLs, including embedded links.
Tip: Copy and paste the results of a Google query into the Harvester to extract only the URLs. Choose the "unique hosts" setting to remove duplicates.
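For readers who want to reproduce this kind of extraction outside the tool, the following is a minimal Python sketch. It is not the Harvester's own code: the regular expression, the function names `extract_urls` and `unique_hosts`, and the reading of "unique hosts" as "keep the first URL seen for each host" are all assumptions made for illustration.

```python
import re
from urllib.parse import urlparse

# Assumed pattern: an http(s) URL runs until whitespace, a quote or an angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def extract_urls(text):
    """Return the URLs found in text, in order of appearance, without exact duplicates."""
    seen = set()
    urls = []
    for match in URL_PATTERN.findall(text):
        # Drop punctuation that usually belongs to the sentence, not the URL.
        url = match.rstrip(".,;:!?)")
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def unique_hosts(urls):
    """Keep only the first URL for each host (one possible reading of 'unique hosts')."""
    seen = set()
    result = []
    for url in urls:
        host = urlparse(url).netloc.lower()
        if host and host not in seen:
            seen.add(host)
            result.append(url)
    return result


if __name__ == "__main__":
    pasted = (
        "See http://example.com/a, and http://example.com/b. "
        'Also <a href="https://other.org/x">x</a>'
    )
    for url in unique_hosts(extract_urls(pasted)):
        print(url)
```

A simple pattern like this will miss URLs written without a scheme (for example `www.example.com`) and may keep stray trailing characters in unusual cases, so the output is worth a quick visual check before further analysis.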
Sample project
Project: Extract URLs from the Daily Kos blogroll
Go to dailykos.com
View the page source (in Firefox, choose View > Page Source or press Ctrl+U)
In the page source, find the section of code that contains the blogroll
Copy that section and paste it into the Harvester, which outputs a list of URLs ready for further analysis, e.g. with the Issuecrawler
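The same steps can be approximated in a script. The sketch below uses Python's standard `html.parser` module to collect absolute link targets from a pasted fragment of page source. The class name `LinkCollector`, the helper `links_from_html`, and the sample fragment with its `.example` addresses are hypothetical; they stand in for the copied blogroll markup and do not reflect the actual Daily Kos page or how the Harvester works internally.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect absolute http(s) targets from <a href="..."> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip relative links and anchors; keep only full URLs.
            if name == "href" and value and value.startswith(("http://", "https://")):
                self.links.append(value)


def links_from_html(fragment):
    """Return the absolute link targets found in an HTML fragment, in order."""
    collector = LinkCollector()
    collector.feed(fragment)
    collector.close()
    return collector.links


# Hypothetical fragment standing in for the copied blogroll markup.
sample = """
<ul>
  <li><a href="http://blog-one.example/">Blog One</a></li>
  <li><a href="http://blog-two.example/archive">Blog Two</a></li>
  <li><a href="/about">About (relative link, skipped)</a></li>
</ul>
"""

if __name__ == "__main__":
    print("\n".join(links_from_html(sample)))
```

Parsing the markup rather than pattern-matching raw text avoids picking up URLs that appear in scripts or comments, at the cost of ignoring links that are not inside anchor tags.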