DMI Tool Wish List

Facebook Group: Links Scraper

Fortunately, Facebook groups have a separate page for links. We would like all 664 links from this group:

http://www.facebook.com/posted.php?id=222337082312#!/posted.php?id=222337082312&start=0&hash=64841c5a5cc79ad7b0ed57c3fad46d1b

Thank you ;)

We want to retain all links for analysis. The file will likely include Facebook links, but we want to keep them because the group refers to other Facebook groups internally.
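A minimal sketch of the link-extraction step, assuming the links page has already been saved as HTML (logging in and fetching the page are left out). The class name LinkCollector, the function extract_links and the sample snippet are illustrative, not part of any existing DMI tool. Every href is kept, Facebook-internal ones included, and nothing is deduplicated.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(value)


def extract_links(html):
    """Return all link targets found in an HTML string."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# Hypothetical snippet standing in for a saved copy of the group's links page.
sample = (
    '<a href="http://www.facebook.com/group.php?gid=1">another group</a> '
    '<a href="http://example.org/">external site</a>'
)
links = extract_links(sample)
```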

We are compiling a list of links that we will later want to exclude from the analysis:

http://www.facebook.com/ajax/intl/language_dialog.php
http://www.facebook.com/facebook
http://www.facebook.com/campaign/landing.php
http://developers.facebook.com/
http://www.facebook.com/privacy/explanation.php
http://www.facebook.com/terms.php
http://www.facebook.com/help/
http://www.facebook.com/posted.php
http://www.facebook.com/ajax/share_dialog.php
http://www.facebook.com/ajax/report.php
http://www.facebook.com/?ref=home
http://www.facebook.com/?ref=logo
http://www.facebook.com/?sk=messages&
http://www.facebook.com/editaccount.php?ref=mb&
http://www.facebook.com/jvssk*
http://www.facebook.com/notifications.php
http://www.facebook.com/profile.php?id=*
http://www.facebook.com/reqs.php
http://www.facebook.com/search/
http://www.facebook.com/share_options.php
http://www.facebook.com/share_partners.php
http://www.facebook.com/campaign/
http://www.facebook.com/ads/adboard/
http://www.facebook.com/giftshop.php
http://www.facebook.com/pages/?ref=asf
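The exclusion list above could be applied roughly as follows. This is a sketch under two assumptions that the wish list does not spell out: each entry is treated as a prefix, and "*" means "any characters". Only a few entries are copied in for brevity, and the group.php URL in the sample is hypothetical.

```python
import re

# A few entries from the exclusion list; the full list would go here.
EXCLUDE_PATTERNS = [
    "http://www.facebook.com/ajax/intl/language_dialog.php",
    "http://www.facebook.com/terms.php",
    "http://www.facebook.com/help/",
    "http://www.facebook.com/jvssk*",
    "http://www.facebook.com/profile.php?id=*",
]


def compile_pattern(pattern):
    """Escape the entry literally, then turn '*' back into a wildcard."""
    body = re.escape(pattern).replace(r"\*", ".*")
    # Assumption: entries are prefixes, so the regex is anchored only at the start.
    return re.compile(body)


COMPILED = [compile_pattern(p) for p in EXCLUDE_PATTERNS]


def is_excluded(url):
    return any(rx.match(url) for rx in COMPILED)


def filter_links(urls):
    """Keep only the URLs that match no exclusion entry."""
    return [u for u in urls if not is_excluded(u)]


sample = [
    "http://www.facebook.com/profile.php?id=123",
    "http://www.facebook.com/group.php?gid=1",  # hypothetical group link, kept
    "http://example.org/",
    "http://www.facebook.com/help/?page=1",
]
kept = filter_links(sample)
```

Prefix matching is deliberately broad: an entry such as http://www.facebook.com/facebook would also exclude any longer URL that starts with it, so short entries may need an exact-match rule instead.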

http://www.facebook.com/posted.php?id=222337082312#!/posted.php?id=222337082312&start=0&hash=64841c5a5cc79ad7b0ed57c3fad46d1b&num=100
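If start is an offset and num a page size, as the URL above suggests but the source does not confirm, the 664 links would span seven pages. The sketch below only generates the page URLs; it leaves out the hash parameter and the "#!" fragment form, which a real scraper may well need. The function name page_urls is illustrative.

```python
from urllib.parse import urlencode

BASE = "http://www.facebook.com/posted.php"


def page_urls(group_id, total, per_page=100):
    """Build one URL per page, assuming 'start' is an offset and 'num' a page size."""
    urls = []
    for start in range(0, total, per_page):
        query = urlencode({"id": group_id, "start": start, "num": per_page})
        urls.append(BASE + "?" + query)
    return urls


pages = page_urls("222337082312", 664)
```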

FIXED: Google Image Scraper

http://tools.issuecrawler.net/beta/googleImages/

  • returns 120 images instead of 100
    • The Google image scraper keeps scraping until it has at least 100 images, so it sometimes returns more.
  • hashtags don't work, e.g. #G20 does not work but G20 does. It works in Google Images though.
    • Hashtags now work.
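Both points can be illustrated with a short sketch. The source does not say what the hashtag bug actually was. One common cause is an unencoded "#", which a URL treats as the start of a fragment, so the query never reaches the server. The helper names build_query and cap_results and the parameter name q are assumptions for illustration.

```python
from urllib.parse import urlencode


def build_query(term):
    """Percent-encode the search term so '#' becomes '%23' instead of starting a fragment."""
    return urlencode({"q": term})


def cap_results(results, limit=100):
    """Trim a result list that overshot the requested size."""
    return results[:limit]


encoded = build_query("#G20")
trimmed = cap_results(list(range(120)))
```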

FIXED: Gephi

  • Experimental export to GEXF (Gephi) is broken
  • Gephi requires a .gephi file instead of .gexf
    • The export is not broken, and you do not need a .gephi file. FIRST start a new project in Gephi, then choose Open file and select the .gexf file, and you're done. A .gephi file is a saved project; a .gexf file is raw data.
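To make the "raw data file" point concrete, here is a minimal sketch that writes a small directed graph as GEXF using only the standard library. It assumes the GEXF 1.2 draft namespace; the DMI export may target a different version, and the function name build_gexf and the sample nodes are illustrative.

```python
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"  # assumption: GEXF 1.2 draft


def build_gexf(nodes, edges):
    """nodes: list of (id, label); edges: list of (source_id, target_id)."""
    gexf = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(
        gexf, "graph", {"mode": "static", "defaultedgetype": "directed"}
    )
    nodes_el = ET.SubElement(graph, "nodes")
    for node_id, label in nodes:
        ET.SubElement(nodes_el, "node", {"id": str(node_id), "label": label})
    edges_el = ET.SubElement(graph, "edges")
    for i, (source, target) in enumerate(edges):
        ET.SubElement(
            edges_el,
            "edge",
            {"id": str(i), "source": str(source), "target": str(target)},
        )
    return ET.tostring(gexf, encoding="unicode")


xml_text = build_gexf([(0, "example.org"), (1, "example.com")], [(0, 1)])
```

Saving xml_text to a file with a .gexf extension gives something Gephi can open from within a new project, as described above.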

DONE: Facebook Number of Shares scraper:

http://www.facebook.com/share/

Link to share > Custom URL

DONE: Twitter/Backtype scraper

Enter URL > Get number of tweets

http://www.backtype.com/ looks better for tweets; some (older) ones are not in topsy.com

DONE: Yahoo! Inlinks number of links scraper

Enter URL > Click Inlinks > Select Except from this domain > count the number of links & export as TSV
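The counting and export steps might look like the sketch below, which assumes the inlink URLs have already been collected. "Except from this domain" is approximated by an exact hostname comparison, so subdomains count as external. The function names and sample URLs are illustrative.

```python
import csv
import io
from urllib.parse import urlsplit


def external_inlinks(target_url, inlinks):
    """Drop inlinks whose hostname equals the target's hostname."""
    target_host = urlsplit(target_url).hostname
    return [u for u in inlinks if urlsplit(u).hostname != target_host]


def to_tsv(urls):
    """Serialize the URLs as a one-column TSV with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["inlink"])
    for u in urls:
        writer.writerow([u])
    return buf.getvalue()


inlinks = [
    "http://example.org/a",
    "http://other.example.com/b",
    "http://blog.example.net/c",
]
external = external_inlinks("http://example.org/page", inlinks)
count = len(external)
tsv = to_tsv(external)
```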

DONE: Harvester

Harvester doesn't recognize: http://j.mp/duhGrf
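The source does not say why the short URL was missed. A pattern that expects "www." or a fixed set of top-level domains is one possible cause. The sketch below is a deliberately permissive alternative, not the Harvester's actual code: it accepts anything that starts with http:// or https:// and strips trailing punctuation.

```python
import re

# Permissive pattern: scheme, then everything up to whitespace, a quote or an angle bracket.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def harvest(text):
    """Return all URLs in the text, with trailing sentence punctuation removed."""
    return [m.rstrip(".,;:!?)") for m in URL_RE.findall(text)]


found = harvest("short: http://j.mp/duhGrf, long: http://www.example.org/page.")
```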

All your wishes are belong to DMIDevelopment To Do now


Topic revision: r44 - 12 Aug 2010 - 13:49:23 - Erik Borra