Slides created for the Tangible Embedded & Embodied Interaction conference 2013
From Big Data to Insights: Opportunities and Challenges for TEI in Genomics
Orit Shaer, Ali Mazalek, Brygg Ullmer, Miriam K. Konkel
Outline
Introduction to genomics/motivation
Design challenges
Case studies
Opportunities for TEI
Going forward
Genomics
“While the work is a challenge, making genetics interactive is potentially as transformative as the move from batch processing to time sharing.”
Bafna, V., et al. Communications of the ACM, Jan 2013
Project flow:
Genome Sequencing Project
Sequencing Centers
High-throughput Sequencing
Draft Sequence
Finished Sequence
Sequence Archiving
Genome Annotation
DNA Sequence
Protein
Prediction Pathways
Comparative Analysis
Target Selection
TEI for Scientists
Schkolne, Ishii, and Schroder 2004.
Gillet et al. 2005.
Brooks et al. 1990. Project GROPE.
Tabard, A., et al. 2011. eLabBench.
Challenges
Scale
Heterogeneous Data
Diverse Audience
Scale
Filesystem @ Broad Inst.: 13+PB
One run of an Illumina HiSeq 2500:
6 billion paired-end sequences
(600 gigabases, or 120Gb/day)
1000 Genomes Project:
692 collaborators
110 institutions
>15 groups in (bi-)weekly conference calls
Blue Waters cluster:
>380K CPU cores + >3K GPUs
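The HiSeq figures above imply a read length and a run duration that the slide does not spell out. A minimal sketch of that arithmetic follows, assuming "120Gb/day" means gigabases per day and that the 600 gigabases are spread evenly over the 6 billion sequences; the function and constant names are illustrative only.

```python
# Figures quoted on the slide for one Illumina HiSeq 2500 run.
SEQUENCES_PER_RUN = 6_000_000_000   # "6 billion paired-end sequences"
BASES_PER_RUN = 600_000_000_000     # "600 gigabases"
BASES_PER_DAY = 120_000_000_000     # "120Gb/day" (assumed to mean gigabases per day)


def run_summary(sequences, bases, bases_per_day):
    """Derive average bases per sequence and run length in days from the totals."""
    return {
        "bases_per_sequence": bases / sequences,
        "run_days": bases / bases_per_day,
    }


summary = run_summary(SEQUENCES_PER_RUN, BASES_PER_RUN, BASES_PER_DAY)
print(summary)
```

Under these assumptions, the slide's totals work out to an average of 100 bases per sequence and a run of about 5 days.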
Heterogeneous Data
Diverse Audience
Genomic Scientists
Citizen Scientists
General Public
Future Scientists
How can TEI systems be designed to
• Empower citizens to make informed health decisions?
• Communicate scientific data to communities?
• Enhance learning of complex concepts?
• Support experts interacting with big data?
Challenges
Scale
Heterogeneous Data
Diverse Audience
Case Studies
Tabletop Genome Browsing & Primer Design
Tangible-targeted Computational Genomics
Tangibles For Visualizing Systems Biology
Locate
Learn Retrieve
Annotate
Compare
Human genome: understanding ca. 2012
Mobile elements: 48.4%
Processed pseudogenes: 1.0%
Tandem repeats & low-complexity DNA: 2.4%
Dark matter: 46.6%
Protein & RNA coding regions: 1.6%
Composition of other primate genomes is very similar
Tangibles-targeted computational genomics
Example projects: rhesus, orangutan, human, marmoset genomes
• Often multi-institution, multi-person efforts
– Above articles: ~250, 100 co-authors
• Often long duration (e.g., 4-6 years before first publication)
• Iterative fusion of computational and “wet bench” analyses
• Some analyses “big CPU” (e.g., 200 CPU cores for weeks); others, “big RAM” (200+ GB RAM)
Tangible Visualization: persistent representations of people, projects, activities…
Interactions 2012.07: Entangling space, form, light, time, computational STEAM, and cultural artifacts
CS3: Systems Biology Modeling
Lessons learned
TEI can facilitate immediate, visible, and easily reversible manipulations
• How to design TEI for open-ended creative inquiries?
Tangible representations can facilitate multi-stage workflows
• Important for execution and tracking of complex analyses
• Need parametrized, annotatable representations of complex large datasets
TEI could facilitate collaboration for distributed and co-located teams
• Large interdisciplinary teams and distributed work are common in this area
• Users can jointly manipulate assumptions and see consequences
Tangible tools can support understanding and discovery
• Provide access to different pieces of the problem (data, reactions)
• Help users form accurate mental models through tangible/embodied manipulation
Opportunities for TEI Engagement
Understanding Complex Problems
Visualizing Biological Data
Enabling Large Collaborations
Supporting Diverse Audiences
Managing Varied Timescales
Understanding Complex Problems
Enabling Large Collaborations
Managing Varied Timescales
Powers of 10,000:
• Milliseconds
• Minutes
• Months
• Millennia
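As a rough check on the "powers of 10,000" framing, the sketch below computes the ratio between each successive pair of timescales. The 30-day month and 365.25-day year are assumptions made for illustration, and the helper name is hypothetical.

```python
import math

# Each timescale expressed in seconds.
SECONDS = {
    "millisecond": 1e-3,
    "minute": 60.0,
    "month": 30 * 24 * 3600.0,                   # assumes a 30-day month
    "millennium": 1000 * 365.25 * 24 * 3600.0,   # assumes 365.25-day years
}


def step_ratios(scales):
    """Return the ratio between each timescale and the one before it."""
    names = list(scales)
    return {
        f"{a} -> {b}": scales[b] / scales[a]
        for a, b in zip(names, names[1:])
    }


ratios = step_ratios(SECONDS)
for step, ratio in ratios.items():
    print(f"{step}: x{ratio:,.0f} (10^{math.log10(ratio):.2f})")
```

With these assumptions each step is a factor between 10,000 and 100,000 (about 60,000, 43,200, and 12,175), so "powers of 10,000" holds as an order-of-magnitude description rather than an exact ratio.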
Entangling Space, Form, Light, Time, Computational STEAM, and Cultural Artifacts
Examples
• Many genome projects: 5+ years
• Sequencing Lincoln’s DNA: under active discussion since 1991
• Most of us sequenced within the decade? Materially impacting all our descendants
Going forward
• Some aspects with broad TEI and computational science synergies
– How to visualize and engage data, activity, and progress spanning many systems, people, places, and timescales?
– What representational forms and device ecologies are most appropriate for large, abstract data?
– Facilitating engagement with big data in ways that highlight connections between multiple forms of evidence
• Some aspects specific to genomics
– 2023: anticipate most of us in the room, plus many thousands of species, having genomes fully or partially sequenced
– Commonalities, distinctions in engagements by scientists, students, street people, senators, senior citizens, solicitors, …
THANKS!
Orit Shaer
Ali Mazalek
Brygg Ullmer
Miriam Konkel
Consuelo Valdes (Wellesley College) and Andy Wu (Georgia Tech).
This work has been partially funded by NSF IIS-1017693, DRL-097394084, and CNS-1126739.