Item44: crawl within a set of sites; when all links of each site in the set are found, see if they interlink.

Priority: Urgent
CurrentState: Closed
AppliesTo: crawler
Component:
WaitingFor:

Details

Crawl within a set of sites; when all links of each site in the set are found, see if they interlink.

Basically, this is a snowball crawl WITHIN a set of sites that continues until no more links can be found WITHIN those sites. Then see whether the sites in the set interlink, and draw a cluster map from the result.
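The interlinking step described above can be sketched as follows. This is a minimal Python illustration, not the crawler's actual code: it assumes a "site" is identified by its host name, and the function names (`site_of`, `interlink_map`), the page paths, and `example.org` are hypothetical. Only the two test hosts come from the plan further down.

```python
from urllib.parse import urlsplit


def site_of(url):
    """Return the lower-cased host of a URL; assumed here to identify a 'site'."""
    return (urlsplit(url).hostname or "").lower()


def interlink_map(page_links, sites):
    """Collapse page-level links into site-level links between members of `sites`.

    page_links: iterable of (source_url, target_url) pairs found by the crawl.
    sites: the hosts that make up the set.
    Returns a dict mapping each linking site to the set of sites it links to.
    """
    members = {s.lower() for s in sites}
    edges = {}
    for source, target in page_links:
        src, dst = site_of(source), site_of(target)
        # Keep only links between two different sites that are both in the set.
        if src in members and dst in members and src != dst:
            edges.setdefault(src, set()).add(dst)
    return edges


# Hypothetical sample data for illustration only.
links = [
    ("http://test1.issuecrawler.net/a", "http://test2.issuecrawler.net/"),
    ("http://test1.issuecrawler.net/a", "http://test1.issuecrawler.net/b"),  # site-internal
    ("http://test2.issuecrawler.net/x", "http://example.org/"),  # outside the set
]
result = interlink_map(links, ["test1.issuecrawler.net", "test2.issuecrawler.net"])
```

The resulting dictionary is the edge list a cluster map could be drawn from; site-internal links and links leaving the set are dropped.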

Or: input URLs, find the URLs' outlinks (3 deep), and map interlinkings between the inputted actors only.

Specification

Input URLs, find the URLs' outlinks (3 deep), and map interlinkings between the inputted actors only.
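A rough sketch of this specification, under stated assumptions: "3 deep" is read here as following links up to three hops from each input URL, "inputted actors" are matched by host name, and page fetching is replaced by a hypothetical `fetch_outlinks` callable backed by an in-memory dictionary so the example runs without network access. None of these names come from the crawler itself.

```python
from collections import deque
from urllib.parse import urlsplit


def host(url):
    """Lower-cased host of a URL; assumed to identify an inputted actor."""
    return (urlsplit(url).hostname or "").lower()


def crawl_within(seeds, fetch_outlinks, max_depth=3):
    """Breadth-first crawl that never leaves the hosts of the input URLs.

    Pages up to `max_depth - 1` hops from a seed are fetched, so recorded
    link targets lie at most `max_depth` hops away.
    Returns the list of (source_url, target_url) links whose target is on
    one of the input hosts.
    """
    allowed = {host(s) for s in seeds}
    seen = set(seeds)
    queue = deque((s, 0) for s in seeds)
    found = []
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # deep enough: do not fetch this page
        for target in fetch_outlinks(url):
            if host(target) not in allowed:
                continue  # reject links that leave the input set
            found.append((url, target))
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return found


# Hypothetical in-memory "web" standing in for real page fetches.
WEB = {
    "http://test1.issuecrawler.net/": [
        "http://test1.issuecrawler.net/links",
        "http://example.org/",
    ],
    "http://test1.issuecrawler.net/links": ["http://test2.issuecrawler.net/"],
    "http://test2.issuecrawler.net/": ["http://test2.issuecrawler.net/about"],
}
seeds = ["http://test1.issuecrawler.net/", "http://test2.issuecrawler.net/"]
found = crawl_within(seeds, lambda u: WEB.get(u, []))
```

The queue empties on its own once no unseen pages remain inside the set, which matches the "until there are no more links findable" condition whenever the sites are shallower than the depth limit.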

Notes (gmc)

Basically, create IssueCrawler.shouldVisit() and have it reject any links that are outside the starting points (sp's).
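The idea behind such a check might look like the sketch below. It is written in Python for illustration and is not the crawler's real method: the class name `StartingPointFilter` is hypothetical, and matching on the exact host of each starting point is an assumption (the real check could equally match on domain or path prefix).

```python
from urllib.parse import urlsplit


class StartingPointFilter:
    """Hypothetical stand-in for a shouldVisit() check."""

    def __init__(self, starting_points):
        # Remember the hosts of the starting points once, up front.
        self.allowed_hosts = {self._host(u) for u in starting_points}

    @staticmethod
    def _host(url):
        """Lower-cased host of a URL, or '' if the URL has none."""
        return (urlsplit(url).hostname or "").lower()

    def should_visit(self, url):
        """Accept a link only if it stays on a starting-point host."""
        return self._host(url) in self.allowed_hosts


link_filter = StartingPointFilter(
    ["http://test1.issuecrawler.net/", "http://test2.issuecrawler.net/"]
)
```

With this filter, `link_filter.should_visit("http://test1.issuecrawler.net/page")` is accepted, while a link to any host outside the two starting points is rejected.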

Do one iteration (= 0 iterations in frontend language) and check what the final iteration does.

The Plan:

  1. Run a test on the devel crawler with test1.issuecrawler.net and test2.issuecrawler.net as input
  2. Add the shouldVisit() method
    • Does class IssueCrawler know the starting points?
  3. Run another test on the devel crawler

ItemTemplate
Summary crawl within a set of sites; when all links of each site in the set are found, see if they interlink.
ReportedBy Erik Borra
AppliesTo crawler
Priority Urgent
CurrentState Closed
WaitingFor

Topic revision: r4 - 30 Sep 2009 - 09:54:09 - Erik Borra