Group 1
Team
Steps
- Source Distances
- query issues and subissues in google, get top 10 results (= ranked sources for each (sub)issue)
- query each (sub)issue in combination with those sources in the blog and news spheres
- compare google rank with frequency of mentions in the news and blog spheres
- Problems:
- it is not possible to search for a link (source) together with a keyword in any of technorati, blogpulse, and news.google.com
- the search engine finds the source only if the full link is written out as text in the article
- in Technorati and Blogpulse, however, it is possible to search for links alone. This changes the research question somewhat: we can no longer look for (sub)issues in combination with a link, but we can see how many times a url was linked to from the blogosphere. Thus we can still compare ranks with google, but not for a specific subissue.
- Findings
- See whether the news and blog spheres rank the sources (by frequency of mentions) in the same way as google. Is google following what the news and blog spheres say?
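The comparison described in the steps above can be sketched roughly as follows. This is a minimal illustration, not the group's actual tool: the function names, the example hosts and the mention counts are all hypothetical, and it assumes the mention counts per source have already been collected (by hand or by a scraper).

```python
def rank_by_frequency(mention_counts):
    """Return {source: rank}, where rank 1 is the most-mentioned source.

    Ties are broken alphabetically so the result is deterministic.
    """
    ordered = sorted(mention_counts, key=lambda s: (-mention_counts[s], s))
    return {source: i + 1 for i, source in enumerate(ordered)}


def compare_with_google(google_top, mention_counts):
    """List (source, google_rank, sphere_rank) for each google result.

    sphere_rank is None when a source has no entry in mention_counts.
    """
    sphere_rank = rank_by_frequency(mention_counts)
    return [(src, g_rank, sphere_rank.get(src))
            for g_rank, src in enumerate(google_top, start=1)]


# Hypothetical data: google's ordering and blog mention counts per source.
google_top = ["a.example", "b.example", "c.example"]
blog_mentions = {"a.example": 5, "b.example": 12, "c.example": 0}

rows = compare_with_google(google_top, blog_mentions)
for row in rows:
    print(row)
```

Each row puts the google rank next to the rank in the other sphere, so disagreements between the two orderings are visible at a glance.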
Received critique
- Q: Taking del.icio.us as the issue discovery tool skews your results
A: We could have taken any issue discovery tool or method; we just want to use our own tools. We actually have two methods: issue discovery and issue ranking (frequency of bookmarks of subissues vs frequency of mentions of subissues)
- Q: Query del.icio.us for issue + subissue, get the top 10 most bookmarked urls, and rank them by number of times bookmarked. Compare that with google's top 10 sources
A: good idea, let's see if we have enough time to do it
Results
Issue Discovery
- Delicious results, top 10 on 1/7/7 after leaving out results that were too generic (such as climate)
- carbon (33)
- energy (29)
- gore (21)
- co2 (18)
- bush (18)
- diy (16)
- emissions (13)
- efficiency (10)
- sustainability (10)
- pollution (8)
- ice (8)
- technorati charts
- note on tool: we want to be able to enter 10 subissues and 1 main issue
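The wish in the note above (one main issue plus a list of subissues as input) could look something like the sketch below. The function name `build_queries` is hypothetical, and the `"issue" + subissue` form simply mirrors the notation used on this page; the exact query syntax each search engine expects is an assumption that would need checking.

```python
def build_queries(main_issue, subissues):
    """Return the main-issue query followed by one query per subissue."""
    quoted = f'"{main_issue}"'
    return [quoted] + [f"{quoted} + {sub}" for sub in subissues]


# Subissues taken from the del.icio.us list above.
subissues = ["carbon", "energy", "gore", "co2", "bush", "diy",
             "emissions", "efficiency", "sustainability", "pollution", "ice"]

queries = build_queries("global warming", subissues)
for q in queries:
    print(q)
```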
Rank of (sub) issues in del.icio.us and technorati
| technorati rank | delicious rank | issue | chart in technorati of last 365 days |
| | | "global warming" | |
| 1 | 3 | "global warming" + gore | |
| 2 | 2 | "global warming" + energy | |
| 3 | 7 | "global warming" + emissions | |
| 4 | 11 | "global warming" + ice | |
| 5 | 1 | "global warming" + carbon | |
| 6 | 5 | "global warming" + bush | |
| 7 | 10 | "global warming" + pollution | |
| 8 | 4 | "global warming" + co2 | |
| 9 | 8 | "global warming" + efficiency | |
| 10 | 9 | "global warming" + sustainability | |
| 11 | 6 | "global warming" + diy | |
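One way to put a number on how far the two rankings in the table agree is Spearman's rank correlation. The sketch below is an illustration only and was not part of the original exercise: it uses the rank pairs from the table and the standard no-ties formula, rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the difference between the two ranks of a subissue.

```python
def spearman_rho(pairs):
    """Spearman rank correlation for (rank_a, rank_b) pairs without ties."""
    n = len(pairs)
    d_squared = sum((a - b) ** 2 for a, b in pairs)
    return 1 - 6 * d_squared / (n * (n * n - 1))


# (technorati rank, delicious rank) per subissue, copied from the table.
ranks = {
    "gore": (1, 3), "energy": (2, 2), "emissions": (3, 7), "ice": (4, 11),
    "carbon": (5, 1), "bush": (6, 5), "pollution": (7, 10), "co2": (8, 4),
    "efficiency": (9, 8), "sustainability": (10, 9), "diy": (11, 6),
}

rho = spearman_rho(list(ranks.values()))
print(round(rho, 3))
```

A value near 1 would mean the two rankings agree closely, a value near 0 little agreement, and a value near -1 that they are roughly reversed.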
Rank of sources in different devices
- google
- results (1/7/7, default settings):
- google news
- query 'issue + subissue + url'
- get nr of results
- results for US news, default settings, scraped on 050707 (1 month of news), all (sub)issues + all sources for those issues.
- preliminary findings:
- Only a small share of the sources is mentioned in the news from the past 3 months.
- Only 2 images were found across all (120) queries
- technorati
- query 'issue + subissue + url'
- get nr of results
- source_distance_exercise_wg1_technorati_results.xls: results of the technorati scrape on 05-07-07, with default settings (but further specified for authority)
This file contains the results of the technorati scrape. Each source was queried in Technorati on its own, and then in combination with the issue and the subissue. The file also gives the percentage of blog posts that reference the source in combination with the issue and subissue. All queries were run for blogs with any authority, a little authority, some authority, and a lot of authority.
- preliminary findings (we haven't had time to look at the results properly yet, but ...)
- wikipedia is popular
- gore is mentioned a lot in combination with his site
- compare google rank with the ranking by number of mentions in the blog and news spheres (todo)
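The percentage reported in the technorati results file can be sketched as below. This is an assumption about how the figure is defined: posts that reference the source together with the issue and subissue, as a share of all posts that reference the source. The function name and all counts are hypothetical and are not taken from the spreadsheet.

```python
def co_mention_percentage(source_hits, combined_hits):
    """Share (in %) of posts citing the source that also match the query.

    Returns None when the source is never cited, to avoid dividing by zero.
    """
    if source_hits == 0:
        return None
    return 100.0 * combined_hits / source_hits


# Hypothetical counts per technorati authority filter:
# (posts citing the source, posts citing it together with issue + subissue)
counts = {
    "any authority": (200, 50),
    "a little authority": (80, 24),
    "some authority": (40, 10),
    "a lot of authority": (0, 0),
}

percentages = {level: co_mention_percentage(src, combined)
               for level, (src, combined) in counts.items()}
for level, pct in percentages.items():
    print(level, pct)
```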
Todo
- look at what kind of sites (gov, movie, ...) the few sources mentioned in the news are. Note that news.google.com does not find hyperlinks, only text, so a source is found only if its address is written out in plain text or appears as anchor text, as in "gore's site is climatecrisis.net"
- look at what kind of sites (gov, person, encyclopedia, ...) the few sources mentioned in the blogs are. Technorati does find hyperlinks, in contrast with news.google.com
"global warming"
"global warming" + carbon
"global warming" + energy
"global warming" + gore
"global warming" + co2
"global warming" + bush
"global warming" + diy
"global warming" + emissions
"global warming" + efficiency
"global warming" + sustainability
"global warming" + pollution
"global warming" + ice
Cross-Spherical Analyzer
Research so far
First, we determined the 10 most important subissues relating to global warming. This was done by scraping Technorati. Next, for each subissue, we looked at which 10 links Google considered most important. We then asked how often these links occur in blog posts and news posts.
It is important to note that Google News does not use hyperlinks, only text. We believe this is why very few matches can be found between Google's top 10 links and the news articles found. (Scraping the blog posts is on the to-do list.) The web address must appear in full in the article, which may also be why so few articles were found.
News links little. This finding has important implications and should be taken into account in a new tool if conclusions are to be drawn about which sphere follows which.
What do the links present in both spheres say about the organizations, and why do they have these links? Good marketing, more scientific articles, etc.? So far the only thing we can say is this: links to hosts are the only returns in the news articles. Deep links are not used and therefore do not appear in the ranking. What we can substantiate is that Al Gore is number 1 on Google Page Rank and appears in 3 articles, so his link is well represented compared with the other links.
To do
Search for global warming in news articles and see what the top 10 sources are.
What is mentioned about it in the blogosphere? Also in both directions.
Cross-Spherical Analyzer
For a given issue, this tool compares the information spheres and shows which sources they have in common.
- Coordinate date input across the different tools, to avoid comparing apples with oranges
- Analyze both directions at the same time
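A minimal sketch of the core comparison such a tool would make, assuming each sphere yields a plain list of source URLs for the issue. The function names and the example URLs are hypothetical. URLs are reduced to their host before comparing, since only host-level links turned up in the news results.

```python
from urllib.parse import urlparse


def host_of(url):
    """Reduce a URL (with or without scheme) to its host, without 'www.'."""
    parsed = urlparse(url if "://" in url else "//" + url)
    host = parsed.hostname or ""
    return host[4:] if host.startswith("www.") else host


def shared_sources(sphere_sources):
    """Return the hosts that occur in every sphere.

    sphere_sources maps a sphere name to an iterable of URLs.
    """
    host_sets = [{host_of(u) for u in urls} for urls in sphere_sources.values()]
    return set.intersection(*host_sets) if host_sets else set()


# Hypothetical sources per sphere for one issue.
spheres = {
    "web": ["http://www.climatecrisis.net/",
            "http://en.wikipedia.org/wiki/Global_warming"],
    "news": ["climatecrisis.net"],
    "blogs": ["http://climatecrisis.net/blog",
              "http://en.wikipedia.org/wiki/Global_warming"],
}

shared = shared_sources(spheres)
print(shared)
```

Working on sets keeps the comparison symmetric, so no sphere is treated as the reference for the others.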
Topic revision: r9 - 31 Aug 2007 - 12:03:20 -
Erik Borra