Network Text Analysis of R Mailing Lists
UseR! Rennes 2009

Angela Bohn, Ingo Feinerer, Kurt Hornik, Patrick Mair, Stefan Theußl

7/10/2009

A mailing list social network

R-help mailing list, Jan 2008 to May 2009:

• Number of authors: 5326
• Number of mails: 41457
• Avg. degree: 4.4
• Diameter: 7

[Figure: the R-help author network. Legend: an edge between Author A and Author B labeled “answered”]
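The slide does not say how these summary statistics were computed. As a rough illustration only (Python rather than the R tools used in the talk), the sketch below treats the reply network as undirected, takes the average degree as the mean number of distinct correspondents per author, and takes the diameter as the longest shortest path between any two connected authors, found by breadth-first search from every node. The edge list is made up.

```python
from collections import deque

def neighbours(edges):
    """Undirected adjacency sets; repeated or reversed pairs collapse into one tie."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def average_degree(edges):
    """Mean number of distinct neighbours per node."""
    adj = neighbours(edges)
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

def diameter(edges):
    """Longest shortest path (counted in edges) over all connected pairs."""
    adj = neighbours(edges)
    longest = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        longest = max(longest, max(dist.values()))
    return longest

# Hypothetical reply pairs (replier, original author); the repeat collapses.
edges = [("B", "A"), ("C", "A"), ("D", "A"), ("E", "D"), ("B", "A")]
print(average_degree(edges))  # 1.6
print(diameter(edges))        # 3
```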

Combine SNA and TM

• Goal: Combine social network analysis (SNA) and text mining (TM) to find out more about who communicates with whom, and on which topics
• Data: Mailing lists R-help and R-devel
• Packages: sna and tm
• Results:
  • “Interest maps” of R users
  • Detection of bottlenecks in communication

Data preparation for social network analysis

• Create a social network from e-mail headers (tm):

  From: dwinsemius at comcast.net (David Winsemius)
  Date: Thu, 30 Apr 2009 18:49:55 -0400
  Subject: [R] Extracting Element from S4 objects
  In-Reply-To: <[email protected]>
  Message-ID: <[email protected]>

  Example thread:
  Author A: “Hello, I have a question.”
  Author B: “This is the answer.”
  Author C: “This, too.”
  Author D: “And this.”
  Author A: “Thank you.”

  [Figure: the network built from this thread, with nodes Author A, Author B, Author C and Author D]

• Find aliases:

  knoblauch at lyon.inserm.fr (knoblauch)
  knoblauch at lyon.inserm.fr (Ken Knoblauch)
  ken.knoblauch at inserm.fr (Kenneth Knoblauch)
  ken.knoblauch at inserm.fr (Ken Knoblauch)

  Levenshtein distance: agrep() (base)
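The two preparation steps can be sketched in Python, as an illustration under stated assumptions rather than the authors' tm-based code. `reply_edges()` links the author of each message to the author of the message named in its In-Reply-To header; `alias_groups()` chains together display names whose Levenshtein distance is small relative to their length. The 0.35 threshold and the sample messages are hypothetical, and the comparison is over whole names, whereas R's `agrep()` searches for approximate matches of a pattern within strings.

```python
import email
import re

def author_name(from_header):
    """Display name from an archive-style header such as
    'dwinsemius at comcast.net (David Winsemius)'."""
    match = re.search(r"\((.+)\)", from_header)
    return match.group(1).strip() if match else from_header.strip()

def reply_edges(raw_messages):
    """(replier, original author) for each message whose In-Reply-To
    header names another message in the collection."""
    messages = [email.message_from_string(raw) for raw in raw_messages]
    author_of = {m["Message-ID"].strip(): author_name(m["From"])
                 for m in messages if m["Message-ID"]}
    edges = []
    for m in messages:
        parent = (m["In-Reply-To"] or "").strip()
        if parent in author_of:
            edges.append((author_name(m["From"]), author_of[parent]))
    return edges

def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,               # delete ca
                               current[j - 1] + 1,            # insert cb
                               previous[j - 1] + (ca != cb)))  # substitute
        previous = current
    return previous[-1]

def alias_groups(names, max_ratio=0.35):
    """Chain together names whose case-insensitive edit distance is at most
    max_ratio times the length of the longer name (hypothetical threshold)."""
    parent = list(range(len(names)))

    def find(i):  # root of i's group (simple union-find)
        while parent[i] != i:
            i = parent[i]
        return i

    keys = [n.lower() for n in names]
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            limit = max_ratio * max(len(keys[i]), len(keys[j]))
            if levenshtein(keys[i], keys[j]) <= limit:
                parent[find(i)] = find(j)
    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())

thread = [
    "From: a at example.org (Author A)\nMessage-ID: <1@example.org>\n\n"
    "Hello, I have a question.\n",
    "From: b at example.org (Author B)\nMessage-ID: <2@example.org>\n"
    "In-Reply-To: <1@example.org>\n\nThis is the answer.\n",
]
print(reply_edges(thread))  # [('Author B', 'Author A')]
print(alias_groups(["knoblauch", "Ken Knoblauch",
                    "Kenneth Knoblauch", "David Winsemius"]))
# [['knoblauch', 'Ken Knoblauch', 'Kenneth Knoblauch'], ['David Winsemius']]
```

Chaining matters here: “knoblauch” and “Kenneth Knoblauch” are too far apart to match directly, but both are close to “Ken Knoblauch”, so all three end up in one group.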

Data preparation for text mining

E-mail subjects:
[R] passing args from the command line
[R] navigating ggplot viewports
[R] how to go to a line in R
...

Term Frequencies (termFreq(), tm):

[Figure: word clouds of e-mail subject terms, one panel for R-help and one for R-devel]
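`termFreq()` in tm builds a term-frequency vector from a text document. A rough Python analogue for subject lines is sketched below; the stopword list is a small hypothetical one, and the regular expression that strips reply prefixes and list tags assumes tags of the form [R] (R-help) and [Rd] (R-devel).

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; "r" is dropped because
# nearly every subject on these lists mentions R.
STOPWORDS = {"a", "an", "and", "for", "from", "how", "in", "of",
             "on", "the", "to", "with", "r"}

# Leading "Re:" prefixes followed by a list tag such as [R] or [Rd].
PREFIX = re.compile(r"^\s*(?:re:\s*)*\[rd?\]\s*", re.IGNORECASE)

def subject_terms(subject):
    """Lower-cased alphanumeric tokens of a subject, minus tag and stopwords."""
    text = PREFIX.sub("", subject).lower()
    return [t for t in re.findall(r"[a-z0-9]+", text) if t not in STOPWORDS]

def term_frequencies(subjects):
    """Total count of each term over all subjects."""
    counts = Counter()
    for subject in subjects:
        counts.update(subject_terms(subject))
    return counts

subjects = [
    "[R] passing args from the command line",
    "[R] navigating ggplot viewports",
    "[R] how to go to a line in R",
    "Re: [R] navigating ggplot viewports",
]
print(term_frequencies(subjects).most_common(3))
```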

Word Networks

“lattice”

[Figure: word network for “lattice”, with authors as nodes]
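The slides do not spell out how a word network is derived. One plausible reading, assumed in this Python sketch, is to keep only the reply ties from threads whose subject contains the word; `term_network()` and the sample data are hypothetical.

```python
import re

def term_network(replies, term):
    """Keep the (replier, original author) pairs whose thread subject
    contains `term` as a whole word, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [(replier, author) for replier, author, subject in replies
            if pattern.search(subject)]

# Hypothetical replies: (replier, original author, subject)
replies = [
    ("Author B", "Author A", "[R] lattice: panel order"),
    ("Author C", "Author A", "Re: [R] lattice: panel order"),
    ("Author B", "Author D", "[R] ggplot2 legend position"),
]
print(term_network(replies, "lattice"))  # [('Author B', 'Author A'), ('Author C', 'Author A')]
print(term_network(replies, "ggplot"))   # [] because "ggplot2" is a different whole word
```

Whole-word matching is a design choice: it keeps “ggplot” and “ggplot2” apart, which may or may not be what the original analysis did.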

Word Networks

“ggplot”

[Figure: word network for “ggplot”, with authors as nodes]

Word Networks

“legend”

[Figure: word network for “legend”, with authors as nodes]

Word Networks

“boxplot”

[Figure: word network for “boxplot”, with authors as nodes]

Centrality Measures

Notion lattice ggplot legend boxplot

Thierry ONKELINX

Jim Lemon

Deepayan Sarkar

Weidong GuFelix Andrews

Sundar Dorai RajBert Gunter

John Kane

Stefan Grosse

Dieter Menne

Saptarshi Guha

Brian Ripleyhadley wickham

Alex Brown

Paul Hiemstra

Thomas Zumbrunn

Henrique DallazuannaRon Bonner

Tom Cohen

Levi Waldron

Patrick Connolly

Michael Kubovy

K Elo

Qian R

Aaron Arvey

Paulo Cardoso

Charilaos Skiadas

RINNER Heinrich

Ola Caster

Andrewjohnclose

Gabor Grothendieck

Douglas Bates

Mark Leeds

Steven McKinney

Karl Ove Hufthammer

Malte Brockmann

John Smith

Ben Bolker

Troels Ring

Gavin Simpson

Michael HopkinsMark DiffordChuck Cleland

baptiste auguie

Judith Flores

Daniel Ott

Surai

stephen sefick

Tom Bonen

John Fox

Richard Cotton

Alex Karner

Ferry

Paul Boutros

Dhruv Sharma

Gary Nelson

Paul Murrell

Marianne Promberger

Rolf Turner

Scott R Waichler

steve

Robbie Heremans

John Poulsen

Greg Snow

Ryan Hafen

Jon LoehrkeDimitri Liakhovitski

remko duursma

Christos Argyropoulos

Andrew Beckerman

Leandro Marino

Iago Mosqueira

Juliet Hannah

Mike Lawrence

Mark Wilkinson

Dan Kortschak

Dimitris Rizopoulos

Naomi Robbins

jimdare

Steve FriedmanColtrey Mather

Thomas Roth

Andrew McFadden

Ranjan Maitra

Gustaf Rydevik

Rebecca Fisher

Daniel Kornhauser

Pieter van der Spek

Tyler Smith

Sebastien Bihorel

rhelp 20 trevva

Einar Arnason

William Deese

Don McKenzie

Samuel B Civ USAF AFMC AFRL RVBXI Cable

David Winsemius

Alex ReynoldsBenjamin Tyner

francesc montane

Bram Kuijper

Wolfram Fischer

Jim Price

audrey

Mark Coletti

Katell HAMON

Richard and Barbara Males

Henning Wildhagen

Kyle Roberts

G Draisma

GOUACHE David

David Chosid

Andreas Krause

Patrick Hausmann

Chris Barker

Brian Desany

Joachim Heidemeier Dr

PALMIER Patrick CETE NP INFRA TRF

Seth W Bigelow

Javier PB

lost river

Mark Heckmann ravi

hadley wickham

Paul Murrell

Domenico Vistocco

P E David ThompsonThierry ONKELINX

ba8

John Kane

Rainer M Krug

Felipe Carrillo

Uwe Ligges

Martin Rittner

Chris Friedl

Tribo Laboy

Pedro de Barros

Sebastian Weber

Ptit Bleu

guillaume chaumet

Brian Ripley

Xavier Chardon

mihalicza peter

Ronggui Huang

Michael Friendly

Carsten Jaeger

Michael Frumin

Ben Bolker

Aric Gregson

Megh Dal

david f

Sorn Norng

baptiste auguie

stephen sefick

Eric

Gabor Grothendieck

Tylere Couture

galneweinhaw

David Hajage

Dave Murray Rust

Juliet Hannah

steve

Josep Maria Campanera Alsina

Wayne F

Harsh

Jason Rupert

Tom Cohen

Avram Aelony

Ian Fiske

Christopher David Desjardins

Ista Zahn

Dieter Menne

Bernd Weiss

haettulegur

Marianne Promberger

Andreas Christoffersen

MUHC Research

Sunil Suchindran

Paul Emberson

Mike Lawrence

Zeljko Vrba

Ivan AlvesTena Sakai

Ian Fellows

Felix Andrews

jiho

Matias Gallego Liberman

simeon duckworth

Bernd Engelmann

Bernd Ebersberger

Jason Law

Ingo Michaelis

Mikhail Spivakov

Thorsten Vogel

Williams Scott

[Figure: R-help author network, nodes labelled with author names.]

Most central persons:

lattice: Deepayan Sarkar, Sundar Dorai Raj, baptiste auguie

ggplot: hadley wickham, Thierry ONKELINX

legend: Duncan Murdoch, hadley wickham, Greg Snow

boxplot: Gabor Grothendieck, hadley wickham
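The slide does not say which centrality measure produced this ranking. The following is a minimal sketch that assumes degree centrality in the reply network, restricted to threads whose subject mentions the topic. The input format `(replier, original_author, subject)` and the function names are hypothetical. The talk itself used the R packages sna and tm, not this Python code.

```python
import re
from collections import Counter


def tokens(subject):
    """Lower-case a subject line and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", subject.lower())


def most_central_by_topic(replies, topic, k=3):
    """Rank authors by degree in a topic-specific reply network.

    replies: iterable of (replier, original_author, subject) tuples
             (hypothetical input format).
    topic:   a single term such as "lattice" or "boxplot".
    Returns the k authors with the most distinct reply partners in
    threads whose subject contains `topic` as a token.
    """
    degree = Counter()
    seen = set()
    for replier, author, subject in replies:
        if topic.lower() not in tokens(subject):
            continue  # thread is about something else
        pair = frozenset((replier, author))
        if len(pair) < 2 or pair in seen:
            continue  # ignore self-replies and repeated author pairs
        seen.add(pair)
        degree[replier] += 1
        degree[author] += 1
    return [name for name, _ in degree.most_common(k)]
```

Degree is only one choice. Betweenness or eigenvector centrality, both available in sna, would rank authors by their position in the network rather than by their number of contacts.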


Results: Interest maps

[Figure: interest map combining frequent terms (e.g. question, data, help, function, plot, package, matrix, error, regression, lattice, ggplot2, glm, lmer, zoo) with 36 authors, among them Gabor Grothendieck, Henrique Dallazuanna, jim holtman, Duncan Murdoch, Peter Dalgaard, Brian Ripley, Greg Snow, hadley wickham and Deepayan Sarkar.]
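The construction of the interest map is not spelled out on the slide. One plausible input is a table of author-to-term links. The sketch below builds one under the assumption that each author is linked to the terms in the subjects of their mails. It tokenizes subjects, drops stop words, and counts terms per author, keeping only the globally most frequent terms. The stop-word list, the `(author, subject)` input format and the function names are illustrative assumptions. The talk used tm for the text-mining step.

```python
import re
from collections import Counter, defaultdict

# Small illustrative stop-word list (an assumption; tm ships its own lists).
STOPWORDS = {"r", "re", "a", "an", "the", "in", "of", "to", "for", "with",
             "and", "on", "how", "from", "is", "by"}


def subject_terms(subject):
    """Lower-case a subject line, tokenize it and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", subject.lower())
            if t not in STOPWORDS]


def interest_map(mails, n_terms=10):
    """Build a sparse author-by-term count table.

    mails: iterable of (author, subject) pairs (hypothetical format).
    Only the n_terms most frequent terms over all mails are kept.
    """
    per_author = defaultdict(Counter)
    overall = Counter()
    for author, subject in mails:
        terms = subject_terms(subject)
        per_author[author].update(terms)
        overall.update(terms)
    top = {t for t, _ in overall.most_common(n_terms)}
    # Keep, for every author, only the counts of globally frequent terms.
    return {a: {t: c for t, c in counts.items() if t in top}
            for a, counts in per_author.items()}
```

The resulting author-term counts could then be drawn as a bipartite graph of authors and terms.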


Results: Communication bottlenecks

[Figure: communication network of the 36 authors from the interest map (Gabor Grothendieck, Henrique Dallazuanna, jim holtman, Duncan Murdoch, Peter Dalgaard, ...).]


Results: Communication bottlenecks

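[Schematic: two communication patterns, one labelled "Good." (nodes A and B) and one labelled "Can be improved." (nodes C and D).]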
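The slides do not state how bottlenecks were identified. One common measure is betweenness centrality. An author who lies on many shortest paths between other authors is a candidate bottleneck, because much communication must pass through them. The following is a minimal Python sketch of Brandes' algorithm for unweighted, undirected graphs, offered as an assumption rather than as the talk's method. In R, the sna package provides a `betweenness()` function for the same measure. The helper names `undirected` and `betweenness` are illustrative.

```python
from collections import deque


def undirected(edges):
    """Build a symmetric adjacency dict from (a, b) pairs, skipping self-loops."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def betweenness(adj):
    """Brandes' algorithm for unweighted, undirected graphs.

    adj maps each node to a set of neighbours. Returns unnormalized
    betweenness scores, counting each unordered pair of endpoints once.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest nodes first.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair was visited from both endpoints.
    return {v: b / 2 for v, b in bc.items()}
```

For example, in a star where one author answers three others who never talk to each other, the central author has betweenness 3 and every leaf has 0. That is the kind of single point of contact a bottleneck analysis would flag.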


Thank you!

Packages:

sna: Carter T. Butts (2008). Social Network Analysis with sna. Journal of Statistical Software 24(6).

tm: I. Feinerer, K. Hornik, and D. Meyer (2008). Text Mining Infrastructure in R. Journal of Statistical Software 25(5).

References:

C. Bird, A. Gourley, P. Devanbu, M. Gertz, and A. Swaminathan (2006). Mining Email Social Networks. In Proceedings of the 2006 International Workshop on Mining Software Repositories. ACM, New York.

Contact:

[email protected], www.angela-bohn.de