Mapping Australian User-Created Content: Methodological, Technological and Ethical Challenges

Axel Bruns / Jean Burgess
ARC Centre of Excellence for Creative Industries and Innovation, Brisbane
[email protected] – @snurb_dot_info
[email protected] – @jeanburgess
http://mappingonlinepublics.net – http://cci.edu.au/

Thomas Nicolai / Lars Kirchhoff
Sociomantic Labs, Berlin
[email protected] / [email protected] – http://sociomantic.com/

Image by campoalto


Paper presented at AoIR 2010, Gothenburg, 22 Oct. 2010


Page 2: Project: New Media and Public Communication

• ARC Discovery (2010-12) – A$410,000

• Axel Bruns (CI), Jean Burgess (SRF) – QUT, Brisbane

• Lars Kirchhoff, Thomas Nicolai (PIs) – Sociomantic Labs, Berlin

• Project blog: http://mappingonlinepublics.net/

Year 1: Research tool development and baseline data

• Social network sources:
· YouTube
· Flickr
· Twitter
· blogs
• Research tools:
· network crawler
· content scraper
· content analysis
· network analysis
• Baseline information:
· data extraction
· content creation statistics
· patterns in terms and themes
· baseline social networking map
· interconnections between social network spaces

Year 2: Content creation patterns

• Changes over time:
· short-term statistics
· regular / seasonal patterns
• Cluster profiling:
· common themes / patterns
· lead users

Year 3: Focus on specific events

• Cultural dynamics:
· rapid spread of new ideas
· communication across clusters
· thematic discourse analysis
· relationship with mainstream media coverage

Page 3: Methodology – Blogs

• Identification:
· Known Blogger Population / Blog Directories
• Capture:
· Post Statistics and Embedded Links
· Post Texts
• Analysis:
· Patterns of Activity over Time
· Networks of Interlinkage (short/long term)
· Thematic Clusters and Keyword Mapping

Page 4: Analysis – Blogs

Patterns of Activity over Time

• Volume over time
• Comparison across clusters

Networks of Interlinkage (short/long term)

• Interlinkage between known blogs
• Outlinks to external sources

Thematic Clusters and Keyword Mapping

• Keyword analysis by cluster
• Keyword co-occurrence maps

[Chart: Posts and Outgoing Links over time, 5.11.2007 to 21.01.2008; vertical axis 0 to 2000]

Page 5: Blog Network (between known blogs only)

(~8500 blogs / 17 July to 25 Aug. 2010 / All page links / Node size: Indegree)

[Network map; labelled clusters: politics, food, parenting, arts & crafts, design and style]
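Node size in this map reflects indegree, i.e. the number of links a blog receives from other known blogs. As a minimal sketch of that measure (in Python rather than the project's own tooling; the blog names and the choice to ignore self-links are assumptions made for illustration only):

```python
from collections import Counter


def indegree(edges):
    """Count incoming links per node from (source, target) pairs."""
    counts = Counter()
    for source, target in edges:
        if source != target:  # assumption: a blog linking to itself is ignored
            counts[target] += 1
    return counts


# Hypothetical blog-to-blog links, for illustration only.
links = [
    ("blog_a", "blog_b"),
    ("blog_c", "blog_b"),
    ("blog_b", "blog_a"),
    ("blog_a", "blog_a"),
]

for blog, score in indegree(links).most_common():
    print(blog, score)
```

Repeated links between the same pair of blogs are counted each time here, which matches an "all page links" reading of the data; counting each pair only once would be an equally defensible choice.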

Page 6: Methodology – Twitter

• Identification:
· Australian Twitter Users (by Location) and Their Networks
• Capture:
· Tweet Statistics and @Replies
· Tweet Texts
• Analysis:
· Patterns of Activity over Time
· Networks of @Replies (short/long term)
· Keyword / Key Phrase Mapping

Page 7: Analysis – Twitter

Patterns of Activity over Time

• Volume over time
• Keyword frequencies

Networks of @Replies (short/long term)

• Conversation vs. follower network
• Dissemination of RTs vs. @replies

Keyword / Key Phrase Mapping

• Keyword analysis over time
• Keyword co-occurrence maps

Page 8: Data Processing – Twitter

• Typical data structure (#ausvotes):

Page 9: Data Processing – Twitter

• Tools:

• Gawk – Scripting tool for CSV processing (open source)

• Excel – Data aggregation, pivot tables and charts

• Leximancer / WordStat – Keyword extraction, co-occurrence matrices

• Gephi – Network analysis and visualisation (open source)

# Extract @replies for network visualisation
#
# this script takes a CSV archive of tweets, and reworks it into network data for visualisation
#
# expected data format:
# text,to_user_id,from_user,id,from_user_id,iso_language_code,source,profile_image_url,geo_type,
# geo_coordinates_0,geo_coordinates_1,created_at,time
#
# output format:
# from,to,tweet,time,timestamp
#
# the script extracts @replies from tweets, and creates duplicates where multiple @replies are
# present in the same tweet - e.g. the tweet "@one @two hello" from user @user results in
# one row from user to one and one row from user to two, each carrying the full tweet text
# (the "to" column holds the username without the leading @)
#
# requires gawk (three-argument match) and a comma field separator, e.g. gawk -F ,
# note: splitting on bare commas assumes that the fields themselves contain no commas
#
# Released under Creative Commons (BY, NC, SA) by Axel Bruns - [email protected]

BEGIN {
    print "from,to,tweet,time,timestamp"
}

# only process lines that contain at least one @username
/@[A-Za-z0-9_]+/ {
    rest = $1
    # find each @username in the tweet text in turn, then keep searching after it
    while (match(rest, /@([A-Za-z0-9_]+)/, atArray)) {
        print $3 "," atArray[1] "," $1 "," $12 "," $13
        rest = substr(rest, RSTART + RLENGTH)
    }
}

# filter.awk - Filter list of tweets
#
# this script takes a CSV or other list of tweets, and removes any lines that don't match the search criteria
# the script preserves the first line, expecting that it contains header information
#
# script expects command-line argument search={searchcriteria} _before_ the input CSV filename
# enclose the search term in quotation marks if it contains any special characters
# lines are lower-cased before matching, so give the search term in lower case
#
# e.g.: gawk -F , -f filter.awk search="(julia|gillard)" tweets.csv >filteredtweets.csv
#
# expected data format:
# CSV or simple list of tweets, line-by-line
#
# output format:
# same as above, listing only matching tweets
#
# Released under Creative Commons (BY, NC, SA) by Axel Bruns - [email protected]

BEGIN {
    # read and print the header line before the filter is applied
    getline
    print $0
}

# print every remaining line that matches the search term
tolower($0) ~ search {
    print $0
}
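The gawk scripts on this slide split each line on bare commas, which is fragile when the tweet text itself contains commas. As a hedged alternative, the @reply extraction could be sketched in Python with the standard csv module, which respects quoted fields. The column names text and from_user come from the expected data format in the script header; the sample archive, the function name reply_edges and the reduced column set are assumptions made for illustration only.

```python
import csv
import io
import re

# Same username pattern as in the gawk script.
MENTION = re.compile(r"@([A-Za-z0-9_]+)")


def reply_edges(csv_file):
    """Yield (from_user, to_user, text) for every @mention in every tweet."""
    for row in csv.DictReader(csv_file):
        for name in MENTION.findall(row["text"]):
            yield row["from_user"], name, row["text"]


# Hypothetical two-tweet archive with only three of the expected columns.
sample = io.StringIO(
    "text,to_user_id,from_user\n"
    '"@one @two hello, world",,user\n'
    '"no mentions here",,other\n'
)

edges = list(reply_edges(sample))
for edge in edges:
    print(edge)
```

As in the gawk version, a tweet that mentions several users produces one edge per mention, and tweets without any @mention produce none.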

Page 10: #ausvotes: Overall Activity (17 July – 24 Aug. 2010)

Page 11: #ausvotes: Discussion Network

(17 July to 25 Aug. 2010 / All @replies / Node size: Indegree / Node colours: betweenness centrality)

Page 12: Keyword Co-Occurrence
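The project lists Leximancer and WordStat for co-occurrence matrices. Purely to illustrate the underlying idea, and not as a stand-in for either tool, a co-occurrence count can be sketched in Python: two keywords co-occur when they appear in the same tweet. The sample tweets, the keyword list and the simple tokenisation are all assumptions.

```python
from collections import Counter
from itertools import combinations
import re


def cooccurrence(texts, keywords):
    """Count how often each pair of keywords appears in the same text."""
    wanted = {k.lower() for k in keywords}
    pairs = Counter()
    for text in texts:
        tokens = set(re.findall(r"[a-z0-9_#@]+", text.lower()))
        present = sorted(wanted & tokens)  # sorted so each pair has one fixed order
        pairs.update(combinations(present, 2))
    return pairs


# Hypothetical tweets, for illustration only.
tweets = [
    "Gillard and Abbott debate the NBN #ausvotes",
    "Abbott on the NBN again #ausvotes",
    "Gillard campaign launch #ausvotes",
]

counts = cooccurrence(tweets, ["gillard", "abbott", "nbn"])
for pair, n in counts.most_common():
    print(pair, n)
```

The resulting pair counts can be read as a weighted edge list, which is one plausible way to feed a co-occurrence map into a network tool such as Gephi.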

Page 13: #ausvotes: Mentions of the Leaders (cumulative)
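A cumulative mention count of this kind can be derived from a dated tweet list by counting matching tweets per day and then taking a running total. The following Python sketch assumes ISO-formatted dates (which sort correctly as strings), simple substring matching and made-up sample tweets; the search words echo the filter.awk example and the function name is hypothetical.

```python
from collections import Counter
from itertools import accumulate


def cumulative_mentions(tweets, words):
    """Return [(day, running_total)] of tweets that mention any of the words."""
    daily = Counter()
    for day, text in tweets:
        lowered = text.lower()
        if any(word in lowered for word in words):
            daily[day] += 1
    days = sorted(daily)
    totals = accumulate(daily[day] for day in days)
    return list(zip(days, totals))


# Hypothetical (day, text) pairs, for illustration only.
sample_tweets = [
    ("2010-07-17", "Gillard speech today"),
    ("2010-07-17", "more on gillard"),
    ("2010-07-18", "nothing relevant"),
    ("2010-07-19", "Julia Gillard on TV"),
]

series = cumulative_mentions(sample_tweets, ["julia", "gillard"])
print(series)
```

Days without any matching tweet are simply absent from the result, and substring matching will also catch longer words that contain a search word, so a production version would need stricter tokenisation.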

Page 14: #ausvotes: Key Themes

Page 15: Challenges

• Twapperkeeper relies on #hashtags

· Problem if #hashtags are inconsistent/unclear

· Follow-on @replies and retweets may not continue to use #hashtags

· May miss early developments – e.g. #hashtag standardisation

• Need to look at overall user activity / Twitter firehose for more comprehensive picture

• Need to track baseline activity to understand how exceptional acute events are

• Ethical considerations:

· Using only publicly available data (no protected tweets, no firewalled blogs)

· But technical publicness not enough – ‘publicly available’ ≠ ‘meant to be public’

· No easy answers – #hashtags probably indicate intention to be public, but may not

· Need to consider data storage and publication carefully, too

• See more at mappingonlinepublics.net – up next: time-based animations...

• Or find us at @snurb_dot_info and @jeanburgess