Disqus Comment Scraper


This tool scrapes threads and comments from websites implementing the Disqus commenting system.
 

Instructions

Paste a list of URLs where the Disqus comments can be found into the form and press "Scrape comments". Please be patient, as processing may take a while.

Output format

The only output format currently supported is a tab-separated CSV file, which LibreOffice can open directly. Every row in the CSV is a single comment. The fields give the URL where the comment was found, the thread it belongs to, the message content, author information, and, where present, a list of attached media elements.

Anonymous comments can be recognised by their missing author id; most of their other author fields will be empty as well.
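
As an illustration, the export can be read with Python's standard csv module by setting the delimiter to a tab. This is only a sketch: apart from post_id and post_parent_id, the column names used here (url, thread, message, author_id, author_name) are assumptions and may differ from the real header row, and the sample rows are made up.

```python
import csv
import io

# Made-up sample rows. Only post_id and post_parent_id are column names
# mentioned on this page; all other column names are assumptions.
SAMPLE = (
    "url\tthread\tpost_id\tpost_parent_id\tmessage\tauthor_id\tauthor_name\n"
    "http://example.com/a\tThread A\t1\t\tFirst!\t42\talice\n"
    "http://example.com/a\tThread A\t2\t1\tI agree\t\t\n"
)

def read_comments(handle):
    """Return the rows of a tab-separated export as a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))

def is_anonymous(row, author_id_field="author_id"):
    """Treat a comment as anonymous when its author id field is empty."""
    return not (row.get(author_id_field) or "").strip()

rows = read_comments(io.StringIO(SAMPLE))
anonymous = [row for row in rows if is_anonymous(row)]
print(len(rows), "comments,", len(anonymous), "anonymous")
```

For a real export, replace io.StringIO(SAMPLE) with open(path, newline="", encoding="utf-8") and adjust author_id_field to match the actual header.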

Comments and replies

If the website supports threaded discussions, the hierarchical relations between comments can be found in the post_id and post_parent_id fields of the CSV file. Other tools, such as table2net, can turn these fields into a network graph. In the future this tool may automate that process.
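
The reply structure can also be recovered with a few lines of Python. The sketch below turns rows into (parent, child) pairs and a parent-to-children mapping. It assumes that top-level comments have an empty post_parent_id; the real export may mark them differently, so check a sample first. The function names are illustrative and not part of the tool.

```python
from collections import defaultdict

def reply_edges(rows, id_field="post_id", parent_field="post_parent_id"):
    """Return (parent_id, child_id) pairs for every comment that is a reply.

    Assumption: a top-level comment has an empty parent field.
    """
    edges = []
    for row in rows:
        parent = (row.get(parent_field) or "").strip()
        if parent:
            edges.append((parent, row[id_field]))
    return edges

def children_by_parent(edges):
    """Group child ids under their parent id."""
    tree = defaultdict(list)
    for parent, child in edges:
        tree[parent].append(child)
    return dict(tree)

# Made-up rows: comment 1 has two replies, and comment 2 has one reply.
rows = [
    {"post_id": "1", "post_parent_id": ""},
    {"post_id": "2", "post_parent_id": "1"},
    {"post_id": "3", "post_parent_id": "1"},
    {"post_id": "4", "post_parent_id": "2"},
]
edges = reply_edges(rows)
tree = children_by_parent(edges)
```

The edge list is the shape most graph tools expect as input, while the mapping is convenient for printing a thread as an indented tree.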

Data unavailable?

The tool attempts to detect a unique Disqus shortname, i.e. the forum identifier, for every host in the URL list. Each host name (such as www.timesofmalta.com) is assumed to have a single such identifier.
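
How the tool itself detects the shortname is not documented here. One plausible approach, sketched below, is to search the page HTML for the two forms that commonly appear in Disqus embed snippets: a disqus_shortname variable assignment, or a script URL of the form SHORTNAME.disqus.com/embed.js. The function name and the patterns are assumptions, not the tool's actual code.

```python
import re

# Patterns modelled on common Disqus embed snippets, for example:
#   var disqus_shortname = 'example';
#   s.src = 'https://example.disqus.com/embed.js';
SHORTNAME_PATTERNS = [
    re.compile(r"""disqus_shortname\s*=\s*['"]([A-Za-z0-9_-]+)['"]"""),
    re.compile(r"""//([A-Za-z0-9_-]+)\.disqus\.com/embed\.js"""),
]

def find_shortname(html):
    """Return the first shortname found in the HTML, or None if there is none."""
    for pattern in SHORTNAME_PATTERNS:
        match = pattern.search(html)
        if match:
            return match.group(1)
    return None

page = "<script>var disqus_shortname = 'example';</script>"
print(find_shortname(page))
```

Pages that load the embed code from an external script file would not be matched by this sketch, which is one way a lookup of this kind can come back empty.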

If the tool cannot find the identifier, it cannot gather data, and every URL on that website is marked 'data unavailable' in the CSV output.

If the tool cannot find specific posts or threads at a given URL, that URL is also marked in the CSV with a similar label.
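
The host-level behaviour described above can be pictured as follows. This is a sketch under stated assumptions, not the tool's implementation: URLs are grouped by host, and every URL whose host has no known shortname receives the 'data unavailable' label. The function names and the shortname value in the example are hypothetical.

```python
from urllib.parse import urlparse

UNAVAILABLE = "data unavailable"  # label quoted from the description above

def group_by_host(urls):
    """Group URLs by their lower-cased host name."""
    groups = {}
    for url in urls:
        host = urlparse(url).netloc.lower()
        groups.setdefault(host, []).append(url)
    return groups

def label_urls(urls, shortnames):
    """Map each URL to its host's shortname, or to the unavailable label."""
    status = {}
    for host, host_urls in group_by_host(urls).items():
        shortname = shortnames.get(host)
        for url in host_urls:
            status[url] = shortname if shortname else UNAVAILABLE
    return status

urls = [
    "http://www.timesofmalta.com/articles/1",
    "http://www.timesofmalta.com/articles/2",
    "http://blog.example.org/post",
]
# Hypothetical lookup result: a shortname was found for only one host.
status = label_urls(urls, {"www.timesofmalta.com": "timesofmalta"})
```

Because the shortname is resolved once per host, a single failed lookup affects every URL on that host, which matches the behaviour described above.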

Topic revision: r1 - 17 Apr 2014, ErikBorra