Tony Di Sera
• Passionate about software, fascinated by molecular biology
• Over 20 years in the software field
• Intel, Human Genome Project, LIMS software startup, Huntsman Cancer Institute at the University of Utah
• Original author of GNomEx, project lead
University of Utah and the Huntsman Cancer Institute
• 3 genomics cores
• Bioinformatics Core
• HCI Research Informatics + sys admin
GNomEx at a Glance
• LIMS: order tracking, workflow, email notification, results delivery
• Data repository: analysis, project center, configurable annotations, private-to-public visibility
Submit Experiment → Workflow → Results Delivery → Automated Billing → Analysis → Visualization
Fast Disk
• For computationally intense processes
• 2x faster than slow disk
• Fast-RPM SCSI
• Files have a short life
Slow Disk
• For large storage
• Infrequent disk hits
• Files have a long life span
• Mountable to GNomEx and the analysis pipeline
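The two storage tiers above amount to a placement rule. The sketch below is a minimal illustration, assuming that "short life" can be expressed as an expected lifetime in days. The `Tier` enum, the `choose_tier` function, and the 30-day cutoff are hypothetical and are not taken from GNomEx.

```python
from enum import Enum


class Tier(Enum):
    FAST = "fast disk"  # fast-RPM SCSI scratch for compute-heavy, short-lived files
    SLOW = "slow disk"  # large repository for long-lived, rarely read files


# Hypothetical cutoff: files expected to live longer than this go to slow disk.
SHORT_LIFE_DAYS = 30


def choose_tier(compute_intensive: bool, expected_life_days: int) -> Tier:
    """Use fast disk only when a compute-heavy job needs the file briefly."""
    if compute_intensive and expected_life_days <= SHORT_LIFE_DAYS:
        return Tier.FAST
    return Tier.SLOW
```

For example, an alignment scratch file needed for a few days would land on fast disk, while an archived FastQ file kept for years would go to the slow-disk repository.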
Transferring BIG Data: FDT (Fast Data Transfer) by Caltech
• Pool of directly mapped buffers on each side, joined by a data transfer socket
• Connection and control management
• Restore multiple files concurrently
• Independent threads per device
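FDT itself is a Java application. The Python sketch below illustrates only the "independent threads per device" idea on a single machine, assuming local files and using `os.stat().st_dev` to identify the device. The function names are hypothetical, and nothing here reproduces FDT's buffer pool or network protocol.

```python
import os
import shutil
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def group_by_device(paths: List[str]) -> Dict[int, List[str]]:
    """Group source files by the storage device (st_dev) they live on."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for path in paths:
        groups[os.stat(path).st_dev].append(path)
    return groups


def copy_device_files(paths: List[str], dest_dir: str) -> List[str]:
    """Copy one device's files one after another, so each disk has a single reader."""
    return [shutil.copy2(path, dest_dir) for path in paths]


def restore_concurrently(paths: List[str], dest_dir: str) -> List[str]:
    """Copy many files concurrently, with one independent worker thread per device."""
    groups = group_by_device(paths)
    with ThreadPoolExecutor(max_workers=max(1, len(groups))) as pool:
        futures = [pool.submit(copy_device_files, files, dest_dir)
                   for files in groups.values()]
        return [dest for future in futures for dest in future.result()]
```

Giving each device its own thread lets several disks stream in parallel without two threads competing for the same spindle.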
Illumina Data Pipeline
Images → image processing → FastQ → FastQ file splitting → experiment folders
GNomEx supplies the barcode tags, experiment info, and run info used for splitting.
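The FastQ splitting step can be sketched as demultiplexing reads by barcode. This is a minimal sketch under stated assumptions: reads carry an Illumina 1.8+ header whose last colon-separated field is the index sequence, barcodes match exactly, and the barcode-to-experiment map stands in for the barcode tags GNomEx provides. All names are hypothetical, and writing each bucket to its experiment folder is omitted.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# One FASTQ record: header, sequence, separator ("+"), quality string.
Record = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield complete 4-line FASTQ records; a truncated trailing record is dropped."""
    rows = [line.rstrip("\n") for line in lines]
    for i in range(0, len(rows) - 3, 4):
        header, seq, plus, qual = rows[i:i + 4]
        yield header, seq, plus, qual


def barcode_of(header: str) -> str:
    """Assumes an Illumina 1.8+ header whose last ':' field is the index sequence."""
    return header.rsplit(":", 1)[-1]


def split_by_barcode(lines: Iterable[str],
                     barcode_to_experiment: Dict[str, str]) -> Dict[str, List[Record]]:
    """Bucket reads by experiment; unknown barcodes go to 'undetermined'."""
    buckets: Dict[str, List[Record]] = defaultdict(list)
    for record in read_fastq(lines):
        experiment = barcode_to_experiment.get(barcode_of(record[0]), "undetermined")
        buckets[experiment].append(record)
    return dict(buckets)
```

A production splitter would typically tolerate a mismatch or two in the barcode; exact matching keeps the sketch short.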
Sequencing Analysis
• Align: Novoalign
• Recalibrate: align around known indels
• SNP/Indel: GATK
• Annotate: ANNOVAR, VAAST, VarScan
• ChIP-Seq: identify likely ChIP peaks
• RNA-Seq: find differentially expressed genes from transcripts
Automated Analysis Pipeline
# run novoalign with default parameters
#e [email protected]
#a A1325
@align -g hg19 -i *.txt.gz

# map, recalibrate and call SNP/INDEL w/ GATK
@snpindel -g hg19 -i A*.txt.gz

# map, recalibrate, call SNP/INDEL, annotate
@annot -g hg19 -i control_A*.gz case_B*.gz -vaast -annovar

• Simplifies running analyses on the cluster
• Fully versioned
• Customizable
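The command-file format above can be read with a small parser. The sketch below infers its rules only from the examples: `#e` gives a notification email, `#a` an analysis number, `@name args` a pipeline command, and any other `#` line is a comment. The step lists in `STEPS` follow the comments in those examples. The class and function names are hypothetical, and the real pipeline may accept more syntax than this.

```python
import shlex
from dataclasses import dataclass, field
from typing import List, Optional

# Steps implied by the comments in the example command files.
STEPS = {
    "align": ["align"],
    "snpindel": ["align", "recalibrate", "call_snp_indel"],
    "annot": ["align", "recalibrate", "call_snp_indel", "annotate"],
}


@dataclass
class Job:
    command: str      # e.g. "align", taken from a line starting with "@"
    args: List[str]   # remaining tokens; globs such as *.txt.gz are left unexpanded


@dataclass
class CommandFile:
    email: Optional[str] = None        # from a "#e" line
    analysis_id: Optional[str] = None  # from a "#a" line
    jobs: List[Job] = field(default_factory=list)


def parse_command_file(text: str) -> CommandFile:
    """Parse '#e', '#a' and '@command' lines; other '#' lines are comments."""
    parsed = CommandFile()
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#e "):
            parsed.email = line[3:].strip()
        elif line.startswith("#a "):
            parsed.analysis_id = line[3:].strip()
        elif line.startswith("@"):
            tokens = shlex.split(line[1:])
            if tokens:
                parsed.jobs.append(Job(tokens[0], tokens[1:]))
    return parsed


def steps_for(job: Job) -> List[str]:
    """Expand a command into its ordered pipeline steps (hypothetical mapping)."""
    return STEPS[job.command]
```

Glob patterns are left unexpanded on purpose, so the scheduler can resolve them on the cluster node where the files actually live.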
Complicated Data
• Configurable annotations: sample annotations
• Workflow: experimental parameters, multiplexing, and protocols
• Data genealogy: links from experiment to analysis
• Topics: tie many experiments and analyses together
The Data Model
• Experiment
• Protocols
• Lanes
• Sample
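The entities above can be sketched as plain data classes. This is an illustration only, assuming an experiment owns its protocols, samples, and lanes, and that multiplexing places several barcoded samples on one lane. The field names and the `multiplex` method are hypothetical, not GNomEx's schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Protocol:
    name: str


@dataclass
class Sample:
    name: str
    barcode: str  # multiplexing tag later used to split reads by sample


@dataclass
class Lane:
    number: int
    samples: List[Sample] = field(default_factory=list)


@dataclass
class Experiment:
    number: str
    protocols: List[Protocol] = field(default_factory=list)
    samples: List[Sample] = field(default_factory=list)
    lanes: List[Lane] = field(default_factory=list)

    def multiplex(self, lane_number: int, samples: List[Sample]) -> Lane:
        """Put several samples on one lane; their barcodes must be distinct."""
        barcodes = [s.barcode for s in samples]
        if len(set(barcodes)) != len(barcodes):
            raise ValueError("duplicate barcode on lane %d" % lane_number)
        lane = Lane(lane_number, list(samples))
        self.lanes.append(lane)
        return lane
```

Rejecting duplicate barcodes at this point matters because reads from two samples sharing a tag could not be separated later.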
The File System
Folder 1
    File 1
    File 2
    File 3
    File 4
Folder 2
    File A
    File B
Folder 4
    File 1
    File 2
Sensitive Data
• Physical: server room access
• OS: server, file permissions, network
• Application: authorization
• Application: authentication
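The software layers above can be sketched as a chain of checks, each of which must pass before a file is served. This is a minimal sketch under assumptions: a user belongs to one or more labs, data is either "public" or restricted to its owning lab (echoing the private-to-public visibility feature), and the OS layer is reduced to `os.access`. The names are hypothetical, and the physical layer cannot be expressed in code.

```python
import os
from dataclasses import dataclass, field
from typing import Set


@dataclass
class User:
    name: str
    authenticated: bool                          # result of the login step
    labs: Set[str] = field(default_factory=set)  # labs the user belongs to


def can_read(user: User, owner_lab: str, visibility: str, path: str) -> bool:
    """Apply the software layers in order; physical server-room access is out of scope."""
    # Application layer: authentication
    if not user.authenticated:
        return False
    # Application layer: authorization (non-public data is limited to the owning lab)
    if visibility != "public" and owner_lab not in user.labs:
        return False
    # OS layer: the server process itself must have read permission on the file
    return os.access(path, os.R_OK)
```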
Challenge #2: The Demand
① More researchers
② More experiments
③ More samples per lane
④ Push for faster results
→ Slower response times
Addressing the Bottlenecks
Bottlenecks: app server, data transfers, file conversions
• More hardware
• Faster authentication
• Apache Tomcat
• Workload balancing
• Efficient database queries
• Offload to batch processing
• Thinner client
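"Offload to batch processing" can be illustrated with a background work queue: the request path only enqueues a job and returns, and a worker thread performs the slow step. This is a single-process sketch assuming a file conversion can be modeled as a function from input to output; the `BatchOffloader` class is hypothetical, and a real deployment would more likely use a separate batch server or cluster queue.

```python
import queue
import threading
from typing import Callable, Dict, Optional, Tuple


class BatchOffloader:
    """Hand slow work (e.g. file conversions) to a background worker so the
    request-handling thread can return immediately."""

    def __init__(self, work: Callable[[str], str]):
        self._work = work
        self._queue: "queue.Queue[Tuple[Optional[str], Optional[str]]]" = queue.Queue()
        self.results: Dict[str, str] = {}
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, job_id: str, payload: str) -> None:
        """Called from the request path: enqueue and return without waiting."""
        self._queue.put((job_id, payload))

    def _run(self) -> None:
        while True:
            job_id, payload = self._queue.get()
            try:
                if job_id is None:  # sentinel: stop the worker
                    return
                self.results[job_id] = self._work(payload)
            finally:
                self._queue.task_done()

    def shutdown(self) -> None:
        """Let already-queued jobs finish (the queue is FIFO), then stop the worker."""
        self._queue.put((None, None))
        self._thread.join()
```

Because the sentinel is enqueued after all earlier jobs, `shutdown` returns only once every submitted job has been processed.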
How many servers are we talking about?
• GNomEx: Tomcat and FDT, database server, file server
• Image processing: data pipeline
• Analysis: analysis server, high-performance clusters
• Storage: fast disk (several volumes); slow disk (the repository)
The biggest bottleneck is... getting features implemented and bugs fixed in GNomEx.
[Chart: GNomEx backlog by year, Year 1 through Year 7; vertical axis 0 to 350]
Different Users, Different Perspectives
• 3 core facilities
• Bioinformatics
• Researchers at your institution
• Outside researchers
• Accounting
Three Kinds of Users
Researcher, Core, Bioinformatics
Stages: Submit Experiment → Workflow → Results Delivery → Automated Billing → Analysis → Visualization
Actions along the way: submit, annotate, preapprove; authorize, register; track; record; data pipeline; review, split, invoice; analysis pipeline: upload, annotate, organize; link, organize, browse; browse; download; pay
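The billing actions above (review, split, invoice) involve one piece of arithmetic worth getting right: dividing a charge across accounts so the parts add up to the total exactly. The sketch assumes that "split" means dividing one charge among several billing accounts by whole-percentage shares. That reading, the function name, and the largest-remainder rounding are assumptions, not GNomEx's documented behavior.

```python
from typing import Dict


def split_charge(total_cents: int, shares: Dict[str, int]) -> Dict[str, int]:
    """Split a charge across accounts by percentage shares (summing to 100),
    using largest-remainder rounding so the parts add up to the total exactly."""
    if sum(shares.values()) != 100:
        raise ValueError("shares must sum to 100")
    # Exact amount per account in hundredths of a cent, kept as integers.
    exact = {acct: total_cents * pct for acct, pct in shares.items()}
    parts = {acct: value // 100 for acct, value in exact.items()}
    leftover = total_cents - sum(parts.values())
    # Hand the remaining cents to the accounts with the largest fractional remainders.
    by_remainder = sorted(shares, key=lambda acct: exact[acct] % 100, reverse=True)
    for acct in by_remainder[:leftover]:
        parts[acct] += 1
    return parts
```

Working in integer cents avoids floating-point drift, and the leftover is always smaller than the number of accounts, so no account receives more than one extra cent.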
We Don’t Always Speak the Same Language
Core facility, bioinformatics, software developers, system admin
JDK, SQL, p-value, FDR, cluster nodes, Hibernate, Eclipse, Ant, case/control, NICs, NFS, REFS, ImageCopy, cluster density, molarity, adapters, 5’ vs 3’, CpG islands, optical error, Linux kernel, interface, inheritance, spike-in
But We Share the Same Goal
Deliver clean, beautiful data to the researcher as quickly as possible.
Agile Manifesto: we value... more than...
• Individuals and interactions more than processes and tools
• Working software more than comprehensive documentation
• Customer collaboration more than contract negotiation
• Responding to change more than following a plan
In Summary
• Data: housing big data requires money and expertise
• Demand: system performance is multi-faceted
• People: work toward a shared understanding; build a team and process that embraces change
Plans
• Reporting
• Mobile app for work lists
• Usability: simplify the interface
• Translational research: abstracting and mining genomic findings
Parting Thoughts
Privileged to work in this field, with bright, interesting, fun, and nice people, in an area exploding with new advancements that will ultimately lead to important scientific discoveries.
http://www.sourceforge.net/projects/gnomex
http://hci-scrum.hci.utah.edu/
[email protected]