A Task-Centered Framework for Computationally Grounded Science Collaborations

Yolanda Gil (1), Felix Michel (1, 2), Varun Ratnakar (1), Matheus Hauder (2), Christopher Duffy (3), Hilary Dugan (4), and Paul Hanson (4)

(1) Information Sciences Institute, University of Southern California
(2) Department of Software Engineering for Business Information Systems, Technical University of Munich
(3) Department of Civil and Environmental Engineering at Penn State University
(4) Center for Limnology at the University of Wisconsin Madison

11th IEEE International Conference on eScience

Organic Data Science: http://www.organicdatascience.org/

Nr.ApproachIntroductionMotivationEvaluationConclusion

USC INFORMATION SCIENCES INSTITUTE

Motivation

Evolution of the scientific enterprise

Figure: Evolution of the scientific enterprise, from [Barabasi, 2005], extended with the ATLAS Detector Project at the Large Hadron Collider [The ATLAS Collaboration, 2012]. Stages shown: single-authorship, co-authorship, large number of co-authors, the community as author.

  • Single-authorship: Galileo, Newton, Darwin, and Einstein published fundamental work.

  • Co-authorship: Watson and Crick made progress on unscrambling the structure of DNA.

  • Large number of co-authors: International Human Genome Sequencing Consortium.

  • The community as author: ATLAS Detector Project at the Large Hadron Collider at CERN.

Introduction

Taxonomy of Science Communities

Collaboration types with resources and activities [Bos et al. 2007]

Aggregating across distance (loose coupling, often asynchronously):
  • Tools (instruments): Shared Instrument (NEON)
  • Information (data): Community Data System (PDB)
  • Knowledge (new findings): Virtual Learning Community, Virtual Community of Practice (GLEON, VIVO)

Co-creating across distance (requires tighter coupling, often synchronously):
  • Tools (instruments): Infrastructure (CSDMS)
  • Information (data): Open Community Contribution System (Zooniverse)
  • Knowledge (new findings): Distributed Research Center (ENCODE)

  • Shared Instruments: for example, NASA and the University of Hawaii share a telescope physically located in Hawaii. Example: NEON (National Ecological Observatory Network).

  • Community Data Systems are based on a geographically distributed community that creates, modifies, and maintains data sets. Example: PDB (Protein Data Bank).

  • Open Community Contribution Systems focus on contributing work, not on data. Example: Zooniverse.

  • Virtual Communities of Practice are networks of shared interests, advice, and links to resources in a research area. Their members do not work on joint projects. Example: GLEON (Global Lake Ecological Observatory Network).

  • Virtual Learning Communities help educate their participants. Example: VIVO (the VIVO Researcher Network).

  • Distributed Research Centers profit from synergies through the aggregation of resources, talent, and effort. Example: ENCODE (ENCyclopedia of DNA Elements).

  • Community Infrastructure Projects develop a common infrastructure for a certain domain. Example: CSDMS (Community Surface Dynamics Modeling System).

Introduction

Goal: Supporting Distributed Research Activities with Unanticipated Participants Joining Over Time

  • R1: Multi-disciplinary contributions
  • R2: Significant coordination
  • R3: Engaging unanticipated participants

Approach

Computationally Grounded Science Collaboration: Layers

1) Workflow creation activities: implement computational data analysis. Supported by workflow systems.

Figure: A computational workflow built from software components. Each component treats its algorithm as a black box and carries a meta description with inputs, parameters, outputs (arrow labels x1, x2, y1, y2, z, v, a, b), and a textual description ("This component uses the X model to generate ..."; factor: 20; repeat: 16 times; min: 0.5 units; max: 11.5 units). Provenance records of executions in 2011, 2012, 2013, and 2014, each with its input and results, support modeling and analysis.
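The meta description attached to a software component on this slide (inputs, parameters, outputs, and a textual description) could be represented as a small data structure. The following is only a minimal sketch: the class names `Parameter` and `ComponentDescription`, the component name, and the assignment of the arrow labels to inputs and outputs are assumptions made for illustration; only the parameter values (factor, repeat, min, max) come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Parameter:
    """One setting of a component (hypothetical structure)."""
    name: str
    value: float
    unit: str = ""


@dataclass
class ComponentDescription:
    """Meta description of a black-box software component (illustrative only)."""
    name: str
    description: str
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)
    parameters: list[Parameter] = field(default_factory=list)

    def summary(self) -> str:
        """Render the description as a short, human-readable text block."""
        lines = [f"{self.name}: {self.description}"]
        lines.append("inputs: " + ", ".join(self.inputs))
        lines.append("outputs: " + ", ".join(self.outputs))
        for p in self.parameters:
            # rstrip() drops the trailing space when a parameter has no unit.
            lines.append(f"{p.name}: {p.value:g} {p.unit}".rstrip())
        return "\n".join(lines)


# Parameter values are taken from the slide; the component name and the
# split of arrow labels into inputs and outputs are assumptions.
component = ComponentDescription(
    name="example_component",
    description="This component uses the X model.",
    inputs=["z", "v"],
    outputs=["x1", "x2"],
    parameters=[
        Parameter("factor", 20),
        Parameter("repeat", 16, "times"),
        Parameter("min", 0.5, "units"),
        Parameter("max", 11.5, "units"),
    ],
)
print(component.summary())
```

Keeping the description separate from the code it describes is what lets a workflow system treat the component as a black box: collaborators can read and compare descriptions without opening the implementation.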

Approach

Computationally Grounded Science Collaboration: Layers

2) Software development activities: select or develop software. Supported by shared software repositories.

Figure: A software component that wraps the code implementing an algorithm. Its meta description lists inputs, parameters, outputs (arrow labels x1, x2, y1, y2, z, v, a, b), and a textual description ("This component uses the X model to generate ..."; factor: 20; repeat: 16 times; min: 0.5 units; max: 11.5 units).

Approach

Computationally Grounded Science Collaboration: Layers

3) Meta-workflow design activities: select problems, strategies, data, models, methods, etc. Our focus (Organic Data Science).

Figure: A meta-workflow that treats each computational workflow as a black box with data, parameters, models (arrow labels z, v, a, b), and a meta description ("Model X with data source Y indicates").

Computationally Grounded Science Collaboration: Layers

  • Meta-workflow design activities
  • Workflow creation activities
  • Software development activities

Approach

Focus of this work

  • Collaboration that occurs in distributed research activities with unanticipated participants joining over time
  • Meta-workflow design layer: scientists working together to agree on a problem to solve and a strategy to solve it
  • Reducing the coordination effort and lowering the barriers to growing the community

Approach

Computationally grounded collaboration occurs at several levels: high-level meta-workflow design to determine what scientific problem to solve and how, workflow creation to select the data and analytic software to be used, and coding activities to implement the software needed. The focus of this work is the first level, that is, the collaboration that occurs when scientists work together to agree on a problem to solve and a strategy to solve it.

Approach

Social Design Principles

Selected social principles from [Kraut and Resnick 2012] for building successful online communities that can be applied to Organic Data Science.

A. Starting communities
  • A1: Carve a niche of interest, scoped in terms of topics, members, activities, and purpose
  • A2: Relate to competing sites, integrate content
  • A3: Organize content, people, and activities into subspaces once there is enough activity
  • A4: Highlight more active tasks
  • A5: Inactive tasks should have expected active times
  • A6: Create mechanisms to match people to activities

B. Encouraging contributions through motivation
  • B1: Make it easy to see and track needed contributions
  • B2: Ask specific people on tasks of interest to them
  • B3: Simple tasks with challenging goals are easier to comply with
  • B4: Specify deadlines for tasks, while leaving people in control
  • B5: Give frequent feedback specific to the goals

C. Encouraging commitment
  • C1: Cluster members to help them identify with the community
  • C2: Give subgroups a name and a tagline
  • C3: Put subgroups in the context of a larger group
  • C4: Make community goals and purpose explicit
  • C5: Interdependent tasks increase commitment and reduce conflict

D. Attracting and engaging newcomers
  • D1: Members recruiting colleagues is most effective
  • D2: Appoint people responsible for immediate friendly interactions
  • D3: Introducing newcomers to members increases interactions
  • D4: Entry barriers for newcomers help screen for commitment
  • D5: When small, acknowledge each new member

Approach

Best Practices from Polymath and ENCODE

Selected best practices from the Polymath [Nielsen 2012] project and lessons learned from ENCODE [Encode 2004].

E. Best practices from Polymath
  • E1: Permanent URLs for posts and comments, so others can refer to them
  • E2: Appoint a volunteer to summarize periodically
  • E3: Appoint a volunteer to answer questions from newcomers
  • E4: Low barrier of entry: make it VERY easy to comment
  • E5: Advance notice of tasks that are anticipated
  • E6: Keep few tasks active at any given time, helps focus

F. Lessons learned from ENCODE
  • F1: Spine of leadership, including a few leading scientists and 1-2 operational project managers, that resolves complex scientific and social problems and has transparent decision making
  • F2: Written and publicly accessible rules to transfer work between groups, to assign credit when papers are published, to present the work
  • F3: Quality inspection with visibility into intermediate steps
  • F4: Export of data and results, integration with existing standards

Self-Organization through Dynamic Task Decomposition

Approach

eScience

Organic Data S