Facebook Scraper

To get data from Facebook.com, a scraper has been built, which can currently be found at: http://marijn.digitalmethods.net/scrapers/facebook.php. The tool takes two input values: max (the maximum number of pages to retrieve) and the search query (which is run against all instances in Facebook.com). The scraper logs in to a newly created Facebook.com account and scrapes the links on the front page of each instance found, plus one iteration into that instance's 'links' section.

The Facebook search returns results for people, pages, groups, events and applications. Please note that applications generally do not return any links and therefore will not appear in the results. It is not possible to access people's walls or discussion pages unless you are their 'friend'.

The Facebook scraper outputs a list of URLs per returned category (e.g. groups, events), taken both from the front page and from one iteration into the specific 'links' section (returning at most 10 links). The URLs are harvested using the harvestURL tool (http://tools.issuecrawler.net/beta/harvestUrls/), and the same tool is used to gather the domains of these URLs so that the number of URLs per domain can be counted.
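The per-domain count described above can be illustrated with a minimal Python sketch. It is not the harvestURL tool's own code; the function name count_urls_per_domain and the choice to strip a leading "www." are assumptions made for this example.

```python
from collections import Counter
from urllib.parse import urlparse


def count_urls_per_domain(urls):
    """Return a Counter mapping each host name to the number of URLs on it.

    Hypothetical helper for illustration only. Strings without a
    recognisable host are skipped.
    """
    counts = Counter()
    for url in urls:
        host = urlparse(url).hostname  # lower-cased host, or None if absent
        if host is None:
            continue
        # Assumption: treat "www.example.org" and "example.org" as one domain.
        if host.startswith("www."):
            host = host[len("www."):]
        counts[host] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "http://www.example.org/a",
        "http://example.org/b",
        "https://news.example.com/x",
    ]
    # List domains from most to least frequent.
    for domain, n in count_urls_per_domain(sample).most_common():
        print(domain, n)
```

Note that this sketch counts by full host name, so subdomains such as news.example.com are counted separately from example.com.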

As the Facebook scraper returns a list of URLs per queried Facebook category, the 'category URLs' (e.g. http://www.facebook.com/group.php?gid=112272896773) will be included in the results, but these should be ignored in the analysis.
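One way to leave the category URLs out before analysis is sketched below in Python. It assumes that every URL hosted on facebook.com (or a subdomain of it) can be discarded, which is broader than the category URLs alone; the function names are hypothetical and not part of the scraper.

```python
from urllib.parse import urlparse


def is_facebook_url(url):
    """True if the URL's host is facebook.com or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == "facebook.com" or host.endswith(".facebook.com")


def drop_category_urls(urls):
    """Return the URLs that do not point back into Facebook itself.

    Assumption: all facebook.com links are treated as category URLs.
    """
    return [url for url in urls if not is_facebook_url(url)]


if __name__ == "__main__":
    results = [
        "http://www.facebook.com/group.php?gid=112272896773",
        "http://example.org/page",
    ]
    print(drop_category_urls(results))
```

If links to other Facebook pages are themselves of interest, a narrower filter on the path (such as group.php) would be needed instead.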
Topic revision: r1 - 03 Jul 2009, issuecrawler14