
Big Data and Tangibles - TEI 13


Slides created for the 2013 Tangible, Embedded and Embodied Interaction (TEI) conference.


Page 1

From Big Data to Insights: Opportunities and Challenges for TEI in Genomics

Orit Shaer, Ali Mazalek, Brygg Ullmer, Miriam K. Konkel

Page 2

Outline

Introduction to genomics/motivation

Design challenges

Case studies

Opportunities for TEI

Going forward

Page 3

Genomics

“While the work is a challenge, making genetics interactive is potentially as transformative as the move from batch processing to time sharing.” Bafna, V., et al. Communications of the ACM, January 2013.

Page 4

Project flow: Genome Sequencing Project

Sequencing Centers

High-throughput Sequencing

Draft Sequence

Finished Sequence

Sequence Archiving

Genome Annotation

DNA Sequence

Protein Prediction

Pathways

Comparative Analysis

Target Selection

Page 5

TEI for Scientists

Schkolne, Ishii, and Schroder 2004

Gillet et al. 2005

Brooks et al. 1990: Project GROPE

Tabard et al. 2011: eLabBench

Page 6

Challenges

Scale

Heterogeneous Data

Diverse Audience

Page 7

Scale

Filesystem at the Broad Institute: 13+ PB

One run of an Illumina HiSeq 2500: 6 billion paired-end sequences (600 gigabases; 120 Gb/day)

1000 Genomes Project: 692 collaborators, 110 institutions, >15 groups in (bi-)weekly conference calls

Blue Waters cluster: >380K CPU cores + >3K GPUs
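The HiSeq 2500 figures on this slide imply an average sequence length and a rough sense of how much genome one run covers. A minimal back-of-envelope sketch in Python follows; the genome size (~3.1 Gb human haploid genome) and the 30x target depth are our own illustrative assumptions, not figures from the talk, and each read of a pair is counted as one sequence.

```python
# Back-of-envelope arithmetic for the HiSeq 2500 figures on the Scale slide.
# Run figures come from the slide; GENOME_SIZE and TARGET_DEPTH are
# illustrative assumptions, not values from the talk.

SEQUENCES_PER_RUN = 6_000_000_000   # "6 billion paired-end sequences" (each read counted once)
BASES_PER_RUN = 600_000_000_000     # "600 gigabases"

GENOME_SIZE = 3_100_000_000         # assumption: ~3.1 Gb human haploid genome
TARGET_DEPTH = 30                   # hypothetical: 30x coverage per genome


def bases_per_sequence(bases: int, sequences: int) -> float:
    """Average length of one sequenced read, in bases."""
    return bases / sequences


def fold_coverage(bases: int, genome_size: int) -> float:
    """How many times the run's output could span one genome end to end."""
    return bases / genome_size


if __name__ == "__main__":
    read_len = bases_per_sequence(BASES_PER_RUN, SEQUENCES_PER_RUN)
    coverage = fold_coverage(BASES_PER_RUN, GENOME_SIZE)
    genomes_at_target = coverage / TARGET_DEPTH
    print(f"average bases per sequence: {read_len:.0f}")
    print(f"fold coverage of one genome: {coverage:.0f}x")
    print(f"genomes at {TARGET_DEPTH}x per run: {genomes_at_target:.1f}")
```

Under these assumptions one run works out to 100-base reads and roughly 190-fold coverage of a single human genome, i.e. about six genomes at 30x; actual yields vary with run mode and quality filtering.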

Page 8

Heterogeneous Data

Page 9

Diverse Audience

Genomic Scientists

Citizen Scientists

General Public

Future Scientists

Page 10

How can TEI systems be designed to

• Empower citizens to make informed health decisions?

• Communicate scientific data to communities?

• Enhance learning of complex concepts?

• Support experts interacting with big data?

Page 11

Challenges

Scale

Heterogeneous Data

Diverse Audience

Page 12

Case Studies

Tabletop Genome Browsing & Primer Design

Tangible-targeted Computational Genomics

Tangibles for Visualizing Systems Biology

Page 13

Locate

Learn

Retrieve

Annotate

Compare

Page 14

Page 15

Human genome: understanding ca. 2012 (pie chart; categories: mobile elements, processed pseudogenes, tandem repeats & low-complexity DNA, "dark matter", protein- & RNA-coding regions; slice values 48.4%, 1.0%, 2.4%, 46.6%, 1.6%)

Composition of other primate genomes is very similar

Tangibles-targeted computational genomics

Page 16

Example projects: rhesus, orangutan, human, marmoset genomes

• Often multi-institution, multi-person efforts

– Articles shown: ~250 and ~100 co-authors

• Often long duration (e.g., 4-6 years before first publication)

• Iterative fusion of computational and “wet bench” analyses

• Some analyses are “big CPU” (e.g., 200 CPU cores for weeks); others are “big RAM” (200+ GB RAM)

Page 17

Tangible Visualization: persistent representations of people, projects, activities…

Interactions, July 2012: Entangling Space, Form, Light, Time, Computational STEAM, and Cultural Artifacts

Page 18

Case Study 3: Systems Biology Modeling

Page 19

Page 20

Page 21

Lessons learned

TEI can facilitate immediate, visible, and easily reversible manipulations

• How to design TEI for open-ended creative inquiries?

Tangible representations can facilitate multi-stage workflows

• Important for execution and tracking of complex analyses

• Need parametrized, annotatable representations of complex large datasets

TEI could facilitate collaboration for distributed and co-located teams

• Large interdisciplinary teams and distributed work are common in this area

• Users can jointly manipulate assumptions and see consequences

Tangible tools can support understanding and discovery

• Provide access to different pieces of the problem (data, reactions)

• Help users form accurate mental models through tangible/embodied manipulation

Page 22

Opportunities for TEI Engagement

Understanding Complex Problems

Visualizing Biological Data

Enabling Large Collaborations

Supporting Diverse Audiences

Managing Varied Timescales

Page 23

Understanding Complex Problems

Page 24

Enabling Large Collaborations

Page 25

Managing Varied Timescales

Powers of 10,000:

• Milliseconds

• Minutes

• Months

• Millennia
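The "powers of 10,000" framing can be checked with a little arithmetic: each step in the list above should be roughly a factor of 10,000. A minimal sketch in Python, assuming an average Gregorian month of 365.25 / 12 days (our assumption; the slide does not define a month):

```python
import math

# Conversion factors between adjacent timescales on the slide.
# Month length is an assumption: the average Gregorian month (365.25 / 12 days).
MS_PER_MINUTE = 60 * 1000
MINUTES_PER_MONTH = (365.25 / 12) * 24 * 60
MONTHS_PER_MILLENNIUM = 12 * 1000

STEPS = [
    ("milliseconds -> minutes", MS_PER_MINUTE),
    ("minutes -> months", MINUTES_PER_MONTH),
    ("months -> millennia", MONTHS_PER_MILLENNIUM),
]


def powers_of_10k(ratio: float) -> float:
    """Express a ratio as an exponent of 10,000 (1.0 means exactly 10,000x)."""
    return math.log(ratio, 10_000)


if __name__ == "__main__":
    for label, ratio in STEPS:
        print(f"{label}: x{ratio:,.0f} = 10,000^{powers_of_10k(ratio):.2f}")
```

Each step lands between roughly 10,000^1.0 and 10,000^1.2 (factors of about 12,000 to 60,000), so the framing holds to within an order of magnitude per step.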

Entangling Space, Form, Light, Time, Computational STEAM, and Cultural Artifacts

Examples

• Many genome projects: 5+ years

• Sequencing Lincoln’s DNA: under active discussion since 1991

• Most of us sequenced within a decade, materially impacting all our descendants?

Page 26

Going forward

• Some aspects have broad synergies with TEI and computational science

– How can we visualize and engage with data, activity, and progress spanning many systems, people, places, and timescales?

– What representational forms and device ecologies are most appropriate for large, abstract data?

– Facilitating engagement with big data in ways that highlight connections between multiple forms of evidence

• Some aspects are specific to genomics

– By 2023, we anticipate that most of us in this room, plus many thousands of species, will have genomes fully or partially sequenced

– Commonalities and distinctions in engagement by scientists, students, street people, senators, senior citizens, solicitors, …

Page 27

THANKS!

Orit Shaer

Ali Mazalek

Brygg Ullmer

Miriam Konkel

Consuelo Valdes (Wellesley College) and Andy Wu (Georgia Tech).

This work has been partially funded by NSF IIS-1017693, DRL-097394084, and CNS-1126739.