
How A Million People Could Save the Planet:

The Next Research Agenda for Collaborative Computing

2012 Brazilian Symposium on Collaborative Systems

David W. McDonald ([email protected])

University of Washington, The Information School

October 18, 2012

These are not <the thing>, but are pointers to <the thing>.

The Shifting Paradigm in Collaborative Computing

• There are a set of interesting problems at the intersection of computing and how people use computing

• The issues at the intersection can change as participation scales up to larger numbers

• Insight gained from studying the intersection can fundamentally change computing as a purposely designed and built artifact

• Theory and methods originating from one single perspective (computational, behavioral, or social) are insufficient to fully interpret the happenings in the intersection

Talk Outline

• Introduction
• The Shifting Paradigm in Collaborative Computing
• Insights from Prior Research
  – Expertise Locating
  – Proactive Displays
  – Lifestyle Behavior Change
• Patterns of Behavioral Observations in Wikipedia
  – Collective Behavioral Observation
  – Machine Learning Experiments
  – Candidate Patterns
  – Validation and Limitations
• Social Computational Systems Research Agenda
  – High-Level Research Challenges
  – Research Openings

• Research Questions
  – How do people find necessary expertise?
  – How can we build systems to support natural expertise locating behavior?
• Methods
  – Qualitative, 9-month ethnographic field study
  – Grounded Theory analysis
  – System building/design (wrote code)
  – Quantitative evaluation of locating and matching heuristics

Expertise Locating

• Findings
  – Locating process – Identification, Selection, Escalation
  – Identification – Work products and byproducts can be used to generate recommendations of individuals with ‘localized’ expertise
  – Selection – Social networks for contextualizing social recommendation are only partially effective
  – Pluggable software architecture (ERArch) to allow extension and addition of Identification and Selection techniques

Expertise Locating

Proactive Displays

• Design Goals/Questions
  – Enhance the feeling of community among conference attendees.
  – Mesh with common social practices at the conference.
  – Manage the privacy concerns of all participants.
• Methods
  – System building/design
  – Field trial (deployment) at academic conference
  – Observation, ad-hoc interviews, post-conference survey

Auto Speaker ID

Ticket 2 Talk

Neighborhood Window

Proactive Displays

• Findings
  – Proactive Display as an Open Region – an area where people of different status are socially allowed to interact (Goffman, Behavior in Public Places, 1963)
  – Shared Interactions – You don’t really “interact” with a “proactive” system
  – Design Implications for Public Displays – Context(s), Content, Control

Auto Speaker ID

Ticket 2 Talk

Neighborhood Window

Lifestyle Behavior Change - UbiFit

• Research Question
  – How can technology help people move from the behaviors that define the lifestyle they have to a new lifestyle they want?
• Methods
  – System building/design
  – Field Trial – 3 weeks
  – Field Experiment – 3 months
  – Interviews, surveys, activity data
  – Analysis

• Presentation of Self (Goffman)

• Cognitive Dissonance Theory (Festinger)

• Transtheoretical Model of Behavior Change (Prochaska et al)

Lifestyle Behavior Change - UbiFit

• Findings
  – Traditional models of validation for inference systems are problematic when deployed in the real world
  – Theories being used for UbiComp fitness/health applications are somewhat problematic (TTM)
  – Awareness of behavior through a personal ambient display can overcome avoidance
  – Fitness behavior patterns are not very regular (exceptions are the rule)

Talk Outline

• Introduction
• The Shifting Paradigm in Collaborative Computing
• Insights from Prior Research
  – Expertise Locating
  – Proactive Displays
  – Lifestyle Behavior Change
• Patterns of Behavioral Observations in Wikipedia
  – Collective Behavioral Observation
  – Machine Learning Experiments
  – Candidate Patterns
  – Validation and Limitations
• Social Computational Systems Research Agenda
  – High-Level Research Challenges
  – Research Openings

Collective Behavioral Observations

• People make behavioral observations
  – Everyday social/behavioral science
  – Motivating example – Driving


• Online Communities
  – Observations are attenuated
  – Leverage the power of the crowd, many people
• Wikipedia Behavioral Observations
  – Barnstars

Barnstars

Barnstars

Barnstar Gallery

Observational Patterns

• Can we identify patterns of user activity through non-specialist observations?

• Possible problems…
  – Pro-social recognition (piling on)
  – Singular activity – popular
  – Singular activity – extraordinary efforts

Generate Train & Test Sets

• Previous work (became Train Set)
  – Mined Nov. 2006 Wikipedia data dump
  – Over 14K unique barnstars, ~4900 recipients
  – Created coding scheme, 7 top-level categories
  – 3 coders, ~2126 barnstars
• Additional Coding (new Test Set)
  – Random selection, cleaning
  – 2 coders, ~478 barnstars

Train & Test Set Distributions

Dimension of Observed Activity          Train Codes  Train %   Test Codes  Test %
Editing Work                                852        27.8        180      29.1
Social and Community Support Action         763        24.9        150      24.2
Border Patrol                               342        11.2         81      13.1
Administrative                              284         9.3         54       8.7
Collaborative Action and Disposition        244         8.0         41       6.6
Meta-Content Work                           128         4.2         23       3.7
Undifferentiated Work                       447        14.6         90      14.5

Classification Experiments

• General Multi-label Classification Approaches
  – Problem Transformation (PT)
  – Algorithm Adaptation (AA)
• Features
  – n-gram, barnstar name, barnstar image name, policy named, policy linked, link to a page, link to a specific edit, …
• What worked reasonably well
  – PT1 – Independent binary classification
  – PT4 – Classifier for every set of applied labels
  – AA – MLkNN, multi-label version of k Nearest Neighbors
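A minimal sketch of the PT1 approach (independent binary classification, often called binary relevance): one binary classifier per label over n-gram features. The texts, labels, and scikit-learn setup here are illustrative assumptions, not the actual barnstar corpus or the study's implementation:

```python
# PT1 / binary relevance sketch: one independent binary classifier per label.
# Toy texts stand in for barnstar award messages; the label names mirror the
# talk's activity categories but the data itself is made up for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

texts = [
    "thanks for tireless copyediting on the article",
    "great work reverting vandalism on recent changes",
    "welcome messages and mentoring of new editors",
    "closing deletion discussions and clearing the admin backlog",
]
labels = [
    {"Editing"},
    {"Border Patrol"},
    {"Social and Community"},
    {"Administrative"},
]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)            # one binary column per label

# n-gram features, roughly analogous to the talk's n-gram feature set
vec = TfidfVectorizer(ngram_range=(1, 2))
X = vec.fit_transform(texts)

# OneVsRestClassifier trains one LogisticRegression per label column
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X, Y)

pred = clf.predict(vec.transform(["reverted vandalism and warned the user"]))
print(mlb.inverse_transform(pred))
```

PT4 would instead treat each distinct *set* of applied labels as a single class, and MLkNN adapts k-nearest-neighbors to predict label sets directly; both swap out the classifier stage while keeping the same feature extraction.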

PT1 – Results (AUC)

Dimension of Activity    Logistic Regression  Naïve Bayes  Random Forest (1k Trees)  KNN (k=10)
Administrative                  0.833            0.949             0.942               0.903
Border Patrol                   0.922            0.941             0.952               0.956
Collaborative Action            0.750            0.722             0.743               0.725
Editing                         0.878            0.875             0.879               0.884
Meta-Content                    0.835            0.842             0.883               0.800
Social and Community            0.802            0.796             0.797               0.805
Undifferentiated Work           0.847            0.848             0.844               0.854
Avg. AUC                        0.838            0.853             0.862               0.847
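As a sanity check, the Avg. AUC row can be recovered by averaging the per-label values in each classifier's column; the computed means agree with the reported figures to within rounding:

```python
# Per-label AUC values from the PT1 results table, one list per classifier,
# ordered: Administrative, Border Patrol, Collaborative Action, Editing,
# Meta-Content, Social and Community, Undifferentiated Work.
auc = {
    "Logistic Regression": [0.833, 0.922, 0.750, 0.878, 0.835, 0.802, 0.847],
    "Naive Bayes":         [0.949, 0.941, 0.722, 0.875, 0.842, 0.796, 0.848],
    "Random Forest":       [0.942, 0.952, 0.743, 0.879, 0.883, 0.797, 0.844],
    "KNN (k=10)":          [0.903, 0.956, 0.725, 0.884, 0.800, 0.805, 0.854],
}
reported = {"Logistic Regression": 0.838, "Naive Bayes": 0.853,
            "Random Forest": 0.862, "KNN (k=10)": 0.847}

for name, vals in auc.items():
    macro = sum(vals) / len(vals)
    # each computed mean matches the reported Avg. AUC to within rounding
    assert abs(macro - reported[name]) <= 0.001
```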

Identifying Candidates

• Select Barnstar Recipients
  – Recipients with 9 or more barnstars
  – 259 candidates, 4327 barnstars
• Applied the Random Forest
  – Label the received barnstars
• Candidate Recipients
  – Predominant observed activity if the same label applied to more than half
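The predominant-activity rule above can be sketched directly. For simplicity this assumes one classifier-applied label per barnstar; the counts are invented for illustration, not drawn from the real candidate set:

```python
from collections import Counter

def predominant_activity(labels):
    """Return the label applied to more than half of a recipient's
    barnstars, or None when no single label predominates."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None

# Hypothetical recipient with 9 labeled barnstars (B = Border Patrol,
# E = Editing, S = Social and Community) -- made-up data.
stars = ["B", "B", "B", "B", "B", "B", "E", "S", "B"]
print(predominant_activity(stars))  # prints "B": 7 of 9 labels, more than half
```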

Candidates

Dimension of Activity    Label   Avg %   Candidates
Editing                    E      67.9       25
Border Patrol              B      73.2       13
Social and Community       S      61.5       54
Administrative             A      66.4       75
Collaborative Actions      C      52.0        1
Meta-Content               M      76.8        4
Undifferentiated Work      U      60.0       10
Total                                       182

Review Candidates & Labels

• Random selection of pattern candidates
  – 39 of the 182 (21.4%), yielding 544 of 4327 barnstars (12.6%)
• Validation
  – Possible duplicates, possible non-barnstars
  – Mislabeled applications

Reviewing Patterns

• Independence of the Observations
  – Seem relatively independent
  – No evidence of barnstars awarded to the same recipient for the exact same event
• Limitations
  – Skew in what the community “values” and in the numbers (a challenge for ML validation – unbalanced data)
  – Link candidates and patterns to the actual edits

• Future work

Working at the Intersection

• Contribution to Computing
  – Naturalistic datasets open interesting problems for ML algorithms
    • Massive datasets probably require application of ML techniques
  – Approaches for handling short text, incremental contributions
  – Unbalanced data

McDonald, D. W., S. Javanmardi and M. Zachry (2011) Finding Patterns in Behavioral Observations by Automatically Labeling Forms of Wikiwork in Barnstars. Proceedings of the 7th International Symposium on Wikis and Open Collaboration (WikiSym'11).

Sajnani, H., S. Javanmardi, D. W. McDonald and C. Lopes. (2011) Multi-Label Classification of Short Text: A Study on Wikipedia Barnstars. Presented at the “Analyzing Microtext” Workshop at the Twenty-Fifth Conference on Artificial Intelligence (AAAI-11).

Talk Outline

• Introduction
• The Shifting Paradigm in Collaborative Computing
• Insights from Prior Research
  – Expertise Locating
  – Proactive Displays
  – Lifestyle Behavior Change
• Patterns of Behavioral Observations in Wikipedia
  – Collective Behavioral Observation
  – Machine Learning Experiments
  – Candidate Patterns
  – Validation and Limitations
• Social Computational Systems Research Agenda
  – High-Level Research Challenges
  – Research Openings

The Shifting Paradigm in Collaborative Computing

• There are a set of interesting problems at the intersection of computing and how people use computing

• The issues at the intersection can change as participation scales up to larger numbers

• Insight gained from studying the intersection can fundamentally change computing as a purposely designed artifact

• Theory and methods originating from one single perspective (computational, behavioral, or social) are insufficient to fully interpret the happenings in the intersection

Social Computational Systems Research Agenda

• Defining SoCS
  – A Social Computational System (SoCS) interleaves machine activity and human activity to solve problems that neither machine nor human can solve alone.
• Properties of SoCS
  – Allow people to do what people do best
  – Allow machines to do what machines do best
  – Solve unique problems that interleave both
  – Could be a 1-with-1 system (one person with one machine)
  – Perhaps scaling of SoCS could solve more difficult problems

SoCS: Research Openings

Collaborative Substrate/Infrastructure

SoCS: Research Openings, Collaborative Substrate

Collaborative Substrate/Infrastructure

• Software Engineering
  – Architectures for effective interleaving
  – Toolkits to support new system development
• Languages
  – Support massive parallelization between people/machine
  – Expressive asynchrony
  – Support task decomposition & recomposition among people/machine
• Data Management
  – Data Provenance
  – Who generated the data? (Person or machine?)
  – How does data (quality) change over time?
• Psychological
  – Motivations, incentives to make contributions
  – Promote high-quality contributions
  – Skill development and individual growth
• Interaction, Social
  – Support for prosocial or congenial interaction
  – Leveraging or minimizing conflict
  – Effective support for meta conversations about the system/tasks
  – Provide meaningful feedback on the work, tasks, contributions

SoCS: Research Openings, Human/Social

• Intelligent Systems
  – Understanding error rates of machines and people
  – Patterns across very large numbers of contributions
  – Patterns in very small contributions
• Data Mining
  – Effective use of user contributions
  – Working to minimize multiple collections

SoCS: Research Openings, Computational

SoCS: Research Openings, Interface

• Usability
  – Simplify making a contribution
  – Identifying tasks or places where contributions are needed
  – Administration tasks
• Visualizations
  – Understand, interpret the who, what, and where of contributions
  – Where are groups of people, clusters of work
  – Where are there gaps

Social Computational Systems Research Agenda

• Three High-Level Challenges for SoCS
  – Methodological Challenge
    Effectively use existing methods to study the intersection and, where those methods fail, develop new methods to address the intersection.
  – Human Trait or Technical Quality (Trait/Quality) Challenge
    Understand the shifting influences of human traits and technical qualities across scales to accommodate shifting levels of participation in SoCS – potentially increasing or decreasing.
  – Design Challenge
    Communicate SoCS design principles so that the broader community of system builders and industry can readily utilize them.

Promising Domains

• Leverage human skills, insight, intuitions
• Leverage the ability of machines to model, calculate, aggregate, visualize


• Social Computational Systems as Applications
  – Cognitive support – memory, understanding, comprehension
  – Social support – facilitate interactions with others, cross-cultural
  – Educational – interleave people and machines for teaching as well as learning
  – Government – grow participation in decision making
  – Work/Labor – enable new forms of work, potentially new economies


• Social Computational Systems for Grand Challenges
  – Global warming
  – Preserve cultural knowledge from extinction
  – Sustainable economic development
  – Health and wellness

Obrigado!

• Questions & Discussion

• Acknowledgements
  – Patterns of Behavioral Observations Study
    • Sara Javanmardi, Hitesh Sajnani, Greg Tsoumakas, Mark Zachry, Crista Lopes
    • NSF IIS-0811210
  – Many other students, collaborators on the prior work