Issue Discovery Tool


Enter URLs to discover the most relevant words and phrases they contain. You may also enter text or an Issue Crawler result file (.xml). Based on Yahoo Term Extraction.
 

Instructions

For each document, whether a page from an Issue Crawler network or text submitted by the user, the Issue Discovery Tool does the following:

  1. Make a phrase list of noun phrases and capitalized sequences (resulting in a list of proper nouns, acronyms, ...)
  2. Add to the phrase list the significant words or phrases extracted from a larger source set of content by the Yahoo Term Extraction Web Service
  3. Adjust the output as follows:
  • Lowercase all phrases in the list (for easy comparison)
  • Remove phrases that have a length less than 3
  • Weight each phrase found in the previous steps as follows: Count the number of times the phrase appears in the document. If the phrase comes from Yahoo, add 1 to that count (this favors Yahoo's presumed robustness). If the phrase does not come from Yahoo but contains multiple terms, add 2 to that count (this assumes a preference for multi-term phrases over single terms when they did not come from Yahoo).
  • Remove phrases that are on the stop word list.
  • Remove phrases that are also part of a longer phrase in the list.
  • Sum the weight of all phrases obtained from all documents into one large list.
  • Rank the list.
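
The adjustment steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's actual code: the function names `weigh_document` and `rank_issues` are hypothetical, "length less than 3" is read as a character count, occurrences are counted as plain substring matches, and "part of a longer phrase" is read as whole-word containment. The two input phrase lists stand in for the output of steps 1 and 2.

```python
from collections import Counter


def weigh_document(text, local_phrases, yahoo_phrases, stop_words):
    """Return {phrase: weight} for one document (hypothetical helper)."""
    text = text.lower()
    # Lowercase all phrases for easy comparison.
    yahoo = {p.lower() for p in yahoo_phrases}
    candidates = {p.lower() for p in local_phrases} | yahoo
    # Assumption: "length" means number of characters.
    candidates = {p for p in candidates if len(p) >= 3}

    weights = {}
    for phrase in candidates:
        weight = text.count(phrase)      # assumption: substring occurrences
        if phrase in yahoo:
            weight += 1                  # bonus for Yahoo-extracted phrases
        elif len(phrase.split()) > 1:
            weight += 2                  # bonus for non-Yahoo multi-term phrases
        weights[phrase] = weight

    # Remove phrases that are on the stop word list.
    weights = {p: w for p, w in weights.items() if p not in stop_words}

    # Remove phrases that also occur, as whole words, inside a longer phrase.
    def inside_longer(p):
        return any(p != q and f" {p} " in f" {q} " for q in weights)

    return {p: w for p, w in weights.items() if not inside_longer(p)}


def rank_issues(documents, stop_words):
    """Sum weights over (text, local_phrases, yahoo_phrases) tuples and rank."""
    totals = Counter()
    for text, local_phrases, yahoo_phrases in documents:
        totals.update(weigh_document(text, local_phrases, yahoo_phrases, stop_words))
    # Highest total first; ties are broken alphabetically for a stable order.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))
```

Note that under this reading a short phrase such as "climate" disappears from a document's list whenever a longer phrase such as "climate change" is present, however often the short phrase occurs.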

The Issue Discovery Tool is not designed to 'give proper weight' to items. It is a heuristic for data exploration rather than an empirical tool.

Other projects using this tool

Personalized Search
Personalized Search Experiment Protocol Purpose Nicholas Negroponte is talking in The Daily Me (1995) about the decline of the shared experience and the decline...

Topic revision: r4 - 01 Dec 2009, ErikBorra