One Percent of Twitter

---++ Members

Bernhard, Julian, Nikolas, Liliana, Lonneke, Esther, Erik, Carolin.

---++ Introduction

This project compares two ways of building a Twitter corpus: random sampling and thematic collection. Starting from a random sample of 1% of all tweets provided by the Twitter API, we set out to compare such top-down approaches to creating a Twitter corpus with the more common approach of building collections based on hashtags, keywords or sets of users.

Focusing on the random sample, we also seek to explore the state of such a firehose timeline, tracing the status and average practices of Twitter usage.

---++ Research Questions

What are the characteristics of the firehose timeline? How can one work with a random sample, and what are its affordances for research?

   * Who are the most mentioned users in this sample and how can they be categorised?
   * Which are the most mentioned hashtags and how can they be categorised?
   * What is the geographical distribution of tweets?
   * What is the language distribution of users?
   * How do hashtags co-occur?

---++ Methodology

Using the Twitter streaming API, we collected a 1% sample of tweets, starting on 24 January 2013 at 19:28:57 and ending on 25 January 2013 at 20:26:16 (a window of 1 day, 0 hours, 57 minutes and 19 seconds). The sample contains 4,577,401 tweets from 2,849,881 distinct user accounts. 12.2% of the tweets contain a URL and 13.27% contain a hashtag.
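The descriptive figures above can be recomputed from the raw sample along the following lines. This is a minimal sketch, not the project's actual tooling: it assumes each tweet is available as a parsed JSON dictionary with `user.id`, `entities.urls` and `entities.hashtags` fields, and the function name `sample_stats` and the toy tweets are hypothetical.

```python
from datetime import datetime

# Collection window as stated in the text.
start = datetime(2013, 1, 24, 19, 28, 57)
end = datetime(2013, 1, 25, 20, 26, 16)
window = end - start
print(window)  # 1 day, 0:57:19


def sample_stats(tweets):
    """Return basic descriptive statistics for an iterable of tweet dicts.

    Assumption: each tweet has a "user" dict with an "id", and an optional
    "entities" dict holding "urls" and "hashtags" lists.
    """
    total = with_url = with_hashtag = 0
    users = set()
    for tweet in tweets:
        total += 1
        users.add(tweet["user"]["id"])
        entities = tweet.get("entities", {})
        if entities.get("urls"):
            with_url += 1
        if entities.get("hashtags"):
            with_hashtag += 1
    return {
        "tweets": total,
        "users": len(users),
        "pct_url": 100.0 * with_url / total if total else 0.0,
        "pct_hashtag": 100.0 * with_hashtag / total if total else 0.0,
    }


# Hypothetical toy data, only to show the expected input shape.
toy_tweets = [
    {"user": {"id": 1}, "entities": {"urls": [{"url": "http://t.co/x"}], "hashtags": []}},
    {"user": {"id": 1}, "entities": {"urls": [], "hashtags": [{"text": "rt"}]}},
    {"user": {"id": 2}, "entities": {"urls": [], "hashtags": [{"text": "news"}]}},
    {"user": {"id": 3}, "entities": {"urls": [], "hashtags": []}},
]
stats = sample_stats(toy_tweets)
print(stats)
```

On the real sample, the same function would be fed the tweets as they are read from the stored stream, one at a time, so the full set never has to sit in memory.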

After identifying the most used hashtags, we categorised them according to their content or objective. From there, we explored the associational profiles of the following hashtags...
Similarly, we identified the most mentioned users and categorised them according to their Twitter usage.
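The counting steps behind this can be sketched as follows. The code is illustrative and rests on assumptions: tweets are parsed JSON dictionaries with `entities.hashtags[].text` and `entities.user_mentions[].screen_name` fields, a hashtag or mention is counted once per tweet, and hashtags are compared case-insensitively. Treating co-occurrence counts as a stand-in for the "associational profiles" is also an assumption, and the helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def hashtags_of(tweet):
    """Lower-cased, de-duplicated hashtags of one tweet, sorted for stable pairing."""
    tags = tweet.get("entities", {}).get("hashtags", [])
    return sorted({t["text"].lower() for t in tags})


def mentions_of(tweet):
    """Lower-cased, de-duplicated screen names mentioned in one tweet."""
    users = tweet.get("entities", {}).get("user_mentions", [])
    return sorted({u["screen_name"].lower() for u in users})


def count_entities(tweets):
    """Count hashtags, mentioned users and hashtag pairs across all tweets."""
    hashtag_counts = Counter()
    mention_counts = Counter()
    pair_counts = Counter()
    for tweet in tweets:
        tags = hashtags_of(tweet)
        hashtag_counts.update(tags)
        mention_counts.update(mentions_of(tweet))
        # Every unordered pair of hashtags in the same tweet co-occurs once.
        pair_counts.update(combinations(tags, 2))
    return hashtag_counts, mention_counts, pair_counts


# Hypothetical toy data, only to show the expected input shape.
toy_tweets = [
    {"entities": {"hashtags": [{"text": "TeamFollowBack"}, {"text": "RT"}],
                  "user_mentions": [{"screen_name": "alice"}]}},
    {"entities": {"hashtags": [{"text": "teamfollowback"}, {"text": "rt"}],
                  "user_mentions": [{"screen_name": "alice"}, {"screen_name": "bob"}]}},
    {"entities": {"hashtags": [{"text": "news"}], "user_mentions": []}},
]
hashtag_counts, mention_counts, pair_counts = count_entities(toy_tweets)
print(hashtag_counts.most_common(10))
print(mention_counts.most_common(10))
print(pair_counts.most_common(10))
```

The resulting top lists are only the input to the categorisation; assigning each hashtag or user to a category remains a manual, interpretive step.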

Furthermore, we determined the most used languages during the collection period.
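One way to derive such a language distribution is sketched below. It assumes that language is read from the interface language stored in each tweet's `user.lang` field and that every account is counted once, which matches the research question about the language distribution of users. Whether the project counted per user or per tweet is not stated, so this is an assumption, and the function name is hypothetical.

```python
from collections import Counter


def user_language_distribution(tweets):
    """Return the percentage of distinct users per language code.

    Assumption: each tweet carries a "user" dict with "id" and "lang";
    users without a language are grouped under "und" (undetermined).
    """
    lang_by_user = {}
    for tweet in tweets:
        user = tweet["user"]
        # Keep the first language seen for each account.
        lang_by_user.setdefault(user["id"], user.get("lang") or "und")
    counts = Counter(lang_by_user.values())
    total = sum(counts.values())
    return {lang: 100.0 * n / total for lang, n in counts.most_common()}


# Hypothetical toy data, only to show the expected input shape.
toy_tweets = [
    {"user": {"id": 1, "lang": "en"}},
    {"user": {"id": 1, "lang": "en"}},
    {"user": {"id": 2, "lang": "es"}},
    {"user": {"id": 3, "lang": "en"}},
    {"user": {"id": 4, "lang": "ja"}},
]
distribution = user_language_distribution(toy_tweets)
print(distribution)
```

Counting per tweet instead would only require replacing the per-user dictionary with a direct `Counter` over each tweet's language value.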

---++ Findings

The most mentioned users in this sample fall into five categories: (1) user-generated news, (2) celebrities, (3) spam, (4) e-celebrities, and (5) media organisations and user-generated content platforms.

The most mentioned hashtags in this sample fall into five categories: (1) followbacks and retweets, (2) memes, (3) status updates and comments, (4) topics, and (5) celebrities.

Topic revision: 25 Jan 2013, CarolinGerlitz