Paper presented at AoIR 2010, Gothenburg, 22 Oct. 2010
Mapping Australian User-Created Content: Methodological,
Technological and Ethical Challenges
Axel Bruns / Jean Burgess
ARC Centre of Excellence for Creative Industries and Innovation, [email protected] – @snurb_dot_info / [email protected] – @jeanburgess
http://mappingonlinepublics.net – http://cci.edu.au/
Thomas Nicolai / Lars Kirchhoff
Sociomantic Labs, [email protected] / [email protected]
http://sociomantic.com/
Project: New Media and Public Communication
• ARC Discovery (2010-12) – A$410,000
• Axel Bruns (CI), Jean Burgess (SRF) – QUT, Brisbane
• Lars Kirchhoff, Thomas Nicolai (PIs) – Sociomantic Labs, Berlin
• Project blog: http://mappingonlinepublics.net/
Year 1: Research tool development and baseline data
• Social network sources: YouTube, Flickr, Twitter, blogs
• Research tools: network crawler, content scraper, content analysis, network analysis
• Baseline information: data extraction, content creation statistics, patterns in terms and themes, baseline social networking map, interconnections between social network spaces
Year 2: Content creation patterns
• Changes over time: short-term statistics, regular / seasonal patterns
• Cluster profiling: common themes / patterns, lead users
Year 3: Focus on specific events
• Cultural dynamics: rapid spread of new ideas, communication across clusters, thematic discourse analysis, relationship with mainstream media coverage
Methodology – Blogs
• Identification: Known Blogger Population / Blog Directories
• Capture: Post Statistics and Embedded Links; Post Texts
• Analysis: Patterns of Activity over Time; Networks of Interlinkage (short/long term); Thematic Clusters and Keyword Mapping
Analysis – Blogs
• Patterns of Activity over Time: volume over time; comparison across clusters
• Networks of Interlinkage (short/long term): interlinkage between known blogs; outlinks to external sources
• Thematic Clusters and Keyword Mapping: keyword analysis by cluster; keyword co-occurrence maps
[Chart: Posts and Outgoing Links over time, 5 Nov. 2007 to 21 Jan. 2008 (vertical axis: 0 to 2000)]
Blog Network (between known blogs only)
(~8500 blogs / 17 July to 25 Aug. 2010 / All page links / Node size: Indegree)
[Figure: network map with labelled clusters: politics, food, parenting, arts & crafts, design and style]
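As a rough illustration of the "Node size: Indegree" setting, the sketch below counts inbound links per blog from a list of (source, target) link pairs, keeping only links between known blogs. It is a Python sketch under stated assumptions, not the project's own tooling: the `indegree` helper and the example blog names are hypothetical.

```python
from collections import Counter

def indegree(links, known_blogs):
    """Count inbound links per blog, ignoring links that leave the known set."""
    known = set(known_blogs)
    # Start every known blog at zero so unlinked blogs still appear in the result.
    counts = Counter({blog: 0 for blog in known})
    for source, target in links:
        # Keep only links where both ends are known blogs.
        if source in known and target in known:
            counts[target] += 1
    return dict(counts)

# Hypothetical example data (not from the project).
links = [
    ("blog-a.example", "blog-b.example"),
    ("blog-c.example", "blog-b.example"),
    ("blog-b.example", "blog-a.example"),
    ("blog-a.example", "news.example"),  # outlink to an unknown site: ignored
]
known = ["blog-a.example", "blog-b.example", "blog-c.example"]
result = indegree(links, known)
print(result)
```

A blog that many other known blogs link to gets a high count and would be drawn as a larger node.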
Methodology – Twitter
• Identification: Australian Twitter Users (by Location) and Their Networks
• Capture: Tweet Statistics and @Replies; Tweet Texts
• Analysis: Patterns of Activity over Time; Networks of @Replies (short/long term); Keyword / Key Phrase Mapping
Analysis – Twitter
• Patterns of Activity over Time: volume over time; keyword frequencies
• Networks of @Replies (short/long term): conversation vs. follower network; dissemination of RTs vs. @replies
• Keyword / Key Phrase Mapping: keyword analysis over time; keyword co-occurrence maps
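To make the idea of a keyword co-occurrence map concrete, the sketch below counts how often pairs of keywords appear in the same tweet. It is only a minimal Python illustration and does not reproduce what Leximancer or WordStat do internally; the `cooccurrence` helper, the example tweets and the keywords "broadband" and "climate" are hypothetical.

```python
from collections import Counter
from itertools import combinations
import re

def cooccurrence(tweets, keywords):
    """Count how often each pair of keywords appears in the same tweet."""
    keywords = [k.lower() for k in keywords]
    pairs = Counter()
    for text in tweets:
        # Simple tokenisation: lower-case runs of letters, digits and a few symbols.
        words = set(re.findall(r"[a-z0-9_#@']+", text.lower()))
        present = sorted(k for k in keywords if k in words)
        # Each unordered pair of keywords present in this tweet counts once.
        for a, b in combinations(present, 2):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical example tweets (not from the #ausvotes archive).
tweets = [
    "Gillard talks about broadband #ausvotes",
    "Broadband and climate in the debate",
    "Gillard on climate and broadband",
]
pairs = cooccurrence(tweets, ["gillard", "broadband", "climate"])
for pair, count in sorted(pairs.items()):
    print(pair, count)
```

The resulting pair counts can serve as edge weights between keyword nodes in a network visualisation.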
Data Processing – Twitter
• Typical data structure (#ausvotes):
Data Processing – Twitter
• Tools:
• Gawk – Scripting tool for CSV processing (open source)
• Excel – Data aggregation, pivot tables and charts
• Leximancer / WordStat – Keyword extraction, co-occurrence matrices
• Gephi – Network analysis and visualisation (open source)
# Extract @replies for network visualisation
#
# this script takes a CSV archive of tweets, and reworks it into network data for visualisation
#
# expected data format:
# text,to_user_id,from_user,id,from_user_id,iso_language_code,source,profile_image_url,geo_type,
# geo_coordinates_0,geo_coordinates_1,created_at,time
#
# output format:
# from,to,tweet,time,timestamp
#
# the script extracts @replies from tweets, and creates duplicates where multiple @replies are
# present in the same tweet - e.g. the tweet "@one @two hello" from user @user results in
# @user,@one,"@one @two hello" and @user,@two,"@one @two hello"
#
# Released under Creative Commons (BY, NC, SA) by Axel Bruns - [email protected]

# print the CSV header once
BEGIN {
	print "from,to,tweet,time,timestamp"
}

# process only lines that contain at least one @username
/@([A-Za-z0-9_]+)/ {
	a=0
	do {
		# find the next @username in the remainder of the tweet text ($1)
		match(substr($1, a),/@([A-Za-z0-9_]+)?/,atArray)
		# advance the search offset past this username
		a=a+atArray[1, "start"]+atArray[1, "length"]
		# emit one from,to,tweet,time,timestamp row per @username found
		if (atArray[1] != 0) print $3 "," atArray[1] "," $1 "," $12 "," $13
	} while(atArray[1, "start"] != 0)
}
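For readers who prefer Python, the same @reply extraction can be sketched as follows. This is an illustrative equivalent and not the authors' script: it assumes the column names listed in the script header, and the `reply_edges` helper and the one-tweet sample archive are hypothetical.

```python
import csv
import io
import re

MENTION = re.compile(r"@([A-Za-z0-9_]+)")

def reply_edges(rows):
    """Yield one (from, to, tweet, time, timestamp) tuple per @username in a tweet."""
    for row in rows:
        for name in MENTION.findall(row["text"]):
            # created_at maps to "time" and time to "timestamp", as in the Gawk output.
            yield (row["from_user"], name, row["text"], row["created_at"], row["time"])

# Hypothetical one-tweet archive using the column names from the script header.
sample = io.StringIO(
    "text,to_user_id,from_user,id,from_user_id,iso_language_code,source,"
    "profile_image_url,geo_type,geo_coordinates_0,geo_coordinates_1,created_at,time\n"
    '"@one @two hello, world",,user,1,2,en,web,,,,,"Sat, 17 Jul 2010 00:00:00 +0000",1279324800\n'
)
edges = list(reply_edges(csv.DictReader(sample)))
for edge in edges:
    print(edge)
```

Because the `csv` module parses quoted fields, a tweet whose text contains commas stays in one column, which plain comma splitting would not guarantee.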
# filter.awk - Filter list of tweets
#
# this script takes a CSV or other list of tweets, and removes any lines that don't match the search term
# the script preserves the first line, expecting that it contains header information
#
# script expects command-line argument search={searchcriteria} _before_ the input CSV filename
# enclose the search term in quotation marks if it contains any special characters
# (lines are lower-cased before matching, so give the search term in lower case)
#
# e.g.: gawk -F , -f filter.awk search="(julia|gillard)" tweets.csv >filteredtweets.csv
#
# expected data format:
# CSV or simple list of tweets, line-by-line
#
# output format:
# same as above, listing only tweets that match the search term
#
# Released under Creative Commons (BY, NC, SA) by Axel Bruns - [email protected]

# read and print the header line unchanged
BEGIN {
	getline
	print $0
}

# print every remaining line whose lower-cased text matches the search term
tolower($0) ~ search {
	print $0
}
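The filtering step can be sketched in Python as well. This is an illustrative equivalent, not the authors' script: the `filter_lines` helper and the sample lines are hypothetical, and Python's `re` syntax differs in places from Gawk's regular expressions, so complex search terms may need adjusting.

```python
import re

def filter_lines(lines, search):
    """Keep the header line plus every later line matching the search pattern."""
    pattern = re.compile(search)
    lines = iter(lines)
    header = next(lines, None)
    if header is not None:
        yield header
    for line in lines:
        # Lower-case the line before matching, mirroring tolower($0) in filter.awk.
        if pattern.search(line.lower()):
            yield line

# Hypothetical sample lines (not from the #ausvotes archive).
lines = [
    "text,from_user",
    "Julia Gillard speaks,user1",
    "Nothing relevant,user2",
    "GILLARD again,user3",
]
kept = list(filter_lines(lines, "(julia|gillard)"))
for line in kept:
    print(line)
```

As with the Gawk version, the search term should be given in lower case because each line is lower-cased before matching.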
#ausvotes: Overall Activity (17 July – 24 Aug. 2010)
#ausvotes: Discussion Network (17 July to 25 Aug. 2010 / All @replies / Node size: Indegree / Node colours: betweenness centrality)
Keyword Co-Occurrence
#ausvotes: Mentions of the Leaders (cumulative)
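A cumulative mention count of this kind can be derived from a dated tweet list, as in the minimal Python sketch below. It is an illustration under stated assumptions rather than the method used for the chart: the `cumulative_mentions` helper, the ISO-formatted dates and the example tweets are hypothetical, and the search pattern is borrowed from the filter.awk usage example.

```python
from collections import Counter
from itertools import accumulate
import re

def cumulative_mentions(tweets, pattern):
    """Return [(day, running_total)] for tweets matching pattern, in day order."""
    regex = re.compile(pattern)
    per_day = Counter()
    for day, text in tweets:
        if regex.search(text.lower()):
            per_day[day] += 1
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    days = sorted(per_day)
    totals = accumulate(per_day[d] for d in days)
    return list(zip(days, totals))

# Hypothetical (day, text) pairs, not taken from the #ausvotes archive.
tweets = [
    ("2010-07-17", "Gillard calls the election #ausvotes"),
    ("2010-07-17", "Julia on the campaign trail"),
    ("2010-07-18", "No leaders mentioned here"),
    ("2010-07-19", "gillard again"),
]
series = cumulative_mentions(tweets, "(julia|gillard)")
print(series)
```

Days without any matching tweet are omitted from the series; running the function once per leader gives one line per leader for a cumulative chart.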
#ausvotes: Key Themes
Challenges
• Twapperkeeper relies on #hashtags
• Problem if #hashtags are inconsistent/unclear
• Follow-on @replies and retweets may not continue to use #hashtags
• May miss early developments – e.g. #hashtag standardisation
• Need to look at overall user activity / Twitter firehose for more comprehensive picture
• Need to track baseline activity to understand how exceptional acute events are
• Ethical considerations:
• Using only publicly available data (no protected tweets, no firewalled blogs)
• But technical publicness not enough – ‘publicly available’ ≠ ‘meant to be public’
• No easy answers – #hashtags probably indicate intention to be public, but may not
• Need to consider data storage and publication carefully, too
• See more at mappingonlinepublics.net – up next: time-based animations...
• Or find us at @snurb_dot_info and @jeanburgess