Item19: Optimize harvester to remove duplicates

Priority: Enhancement
CurrentState: Closed
AppliesTo: frontend
Component:
WaitingFor: Erik Borra

Details

The harvester currently does not recognize that a URL with a trailing slash and the same URL without one are identical, so both are kept as separate entries. The same happens for www.host and host.

Solution:

  • Strip http:// and the trailing slash, remove www. as well, and compare what is left. Then put www. back if it was there and put http:// in front again.

-- Erik Borra - 28 Feb 2008

Decided to not strip www, did the rest.

-- Erik Borra - 09 Apr 2008
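
The approach discussed above could be sketched roughly as follows. This is only an illustration, not the harvester's actual code: the function names `comparison_key` and `remove_duplicates` and the `strip_www` flag are hypothetical. With `strip_www=False` (the default) it follows the variant that was finally chosen; with `strip_www=True` it follows the original proposal, where www. is ignored for the comparison but kept in the output.

```python
def comparison_key(url, strip_www=False):
    """Return the part of the URL that is compared to detect duplicates."""
    key = url.strip()
    # Strip the scheme; only http:// is mentioned in this item.
    if key.lower().startswith("http://"):
        key = key[len("http://"):]
    # Strip trailing slashes so "host/page/" and "host/page" compare equal.
    key = key.rstrip("/")
    # Original proposal only: ignore a leading www. when comparing.
    if strip_www and key.lower().startswith("www."):
        key = key[len("www."):]
    return key


def remove_duplicates(urls, strip_www=False):
    """Keep the first URL for each comparison key, preserving input order."""
    seen = set()
    unique = []
    for url in urls:
        key = comparison_key(url, strip_www)
        if key in seen:
            continue
        seen.add(key)
        # Rebuild the URL: http:// in front, www. kept if it was there.
        unique.append("http://" + comparison_key(url, strip_www=False))
    return unique
```

For example, `remove_duplicates(["http://example.com/", "example.com"])` keeps a single entry, while `www.example.com` and `example.com` are only merged when `strip_www=True` is passed.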

ItemTemplate
Summary Optimize harvester to remove duplicates
ReportedBy Erik Borra
AppliesTo frontend
Priority Enhancement
CurrentState Closed
WaitingFor Erik Borra
Topic revision: r2 - 09 Apr 2008 - 13:03:42 - Erik Borra