Genedata Profiler & iRODS: An Open & Collaborative Enterprise Software Platform for Patient and Compound Profiling

Marc Flesch, Tamas Rujan

© 2015 Genedata. Confidential and Proprietary

Genedata – Corporate Snapshot

Roots: Established in 1997 | Privately owned | Headquartered in Switzerland

Global Reach: ~200 employees | Offices in Europe (Basel, Munich), North America (Boston, San Francisco) & Asia (Tokyo)

Dedicated to Drug Discovery & Biotechnology: Innovative portfolio of enterprise systems increasing productivity of data-rich & complex research processes

Domain Expertise: Experienced Ph.D.-level experts coupled with efficient software engineering processes

Marquee Customer Base: Leading pharmaceutical, biotechnology, and other life science organizations


Customer Base – Pharma

[Map: Genedata offices in San Francisco, Boston, Munich, Basel, and Tokyo]

25 of the top 25 pharma companies, and more …


Supporting the Patient Profiling Process

[Figure: patient cohorts are profiled by NGS; the sequence data feeds patient stratification (responder vs. non-responder) and drug response prediction]


Major Challenges of Patient Profiling Process

• Efficiently managing, processing, and analyzing data
– Huge & complex datasets containing patient-related omics data
– Integrating disease & genomic information from different studies

• Facilitating collaboration within interdisciplinary teams
– Enabling easy data, method & result sharing
– Global distribution of data generators & data consumers

• Working with data from human samples in research environments
– Ensuring privacy of patient information
– Maintaining chain of custody


“Using data from clinical samples is challenging, because we need to take patient privacy very seriously” *Henrik Seidel, Bayer

Problem Statement


Data Privacy Within a Global Organization

[Figure: several sites, each with an Illumina sequencer and an HPC cluster, serving multiple user groups]

… how to work efficiently with distributed data?


At Present…

Common technologies applied include

• UNIX file permissions
• POSIX Access Control Lists (ACLs)
• CIFS shares (Samba)

With the following shortcomings

• UNIX permissions are too simple to model project-centric access patterns
• Paths on UNIX file systems can’t replace data management systems
• Permissions have to be maintained manually, which is extremely cumbersome
• ACLs are hard to manage
• The distributed storage problem stays unresolved
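The first shortcoming can be made concrete with a minimal sketch. A UNIX mode carries exactly three permission classes (owner, one group, everyone else), so a file that two project teams need with different rights cannot be described by mode bits alone. The project names and the `expressible_with_mode_bits` helper are hypothetical and exist only for illustration.

```python
import stat


def describe_mode(mode: int) -> dict:
    """Split a UNIX file mode into its three permission classes."""
    text = stat.filemode(mode)  # for example '-rw-r-----'
    return {"owner": text[1:4], "group": text[4:7], "other": text[7:10]}


def expressible_with_mode_bits(group_grants: dict) -> bool:
    """A file has a single group slot, so at most one group can get its own grant."""
    return len(group_grants) <= 1


# A regular file that is read-write for its owner and read-only for its one group.
classes = describe_mode(stat.S_IFREG | 0o640)

# Hypothetical project-centric requirement: two teams, different rights on one file.
wanted = {"study_a": "r--", "study_b": "rw-"}
fits = expressible_with_mode_bits(wanted)  # False: needs ACLs or a data management layer
```

POSIX ACLs lift the single-group limit, but every extra entry is one more thing to maintain by hand, which is the third and fourth shortcoming in the list.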

Our Solution


Marrying Security with Performance

[Figure: HPC setup with the labels Input Data, Cache Copy, Temp Results, Result Data, and Compute Cluster]
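One way to read the labels in this figure is as a staging pattern: input data is copied to a cache close to the compute cluster, temporary results stay on scratch storage, and only the final result is written back. The sketch below shows that pattern with plain local directories standing in for the managed store. It is an assumption about the data flow, not Genedata Profiler's or iRODS's actual mechanism, and `run_staged` and `count_reads` are hypothetical names.

```python
import shutil
import tempfile
from pathlib import Path


def run_staged(input_file: Path, result_dir: Path, compute) -> Path:
    """Copy the input to scratch, compute there, and keep only the final result."""
    result_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as scratch:
        cache_copy = Path(scratch) / input_file.name
        shutil.copy2(input_file, cache_copy)        # Input Data -> Cache Copy
        temp_result = Path(scratch) / "temp_result.txt"
        compute(cache_copy, temp_result)            # compute step writes Temp Results
        final = result_dir / f"{input_file.stem}.result.txt"
        shutil.copy2(temp_result, final)            # Temp Results -> Result Data
    # Leaving the with-block deletes the cache copy and the temporary results.
    return final


def count_reads(src: Path, dst: Path) -> None:
    """Toy compute step: count FASTQ records (four lines per read)."""
    lines = src.read_text().splitlines()
    dst.write_text(str(len(lines) // 4))
```

The point of the pattern is that unprotected copies live only as long as the job does, while the compute step still reads from fast local storage.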


RNA-Seq Data-Processing Pipeline


and Interaction Points with


Profiler

Chain-of-Custody

[Figure: paired-end FASTQ files rna1_1.fq / rna1_2.fq through rna4_1.fq / rna4_2.fq pass through the steps sequence alignment, RNA quantification, and data export; the users Tina, Alice, Bob, and Joe are attached to files and processing steps]
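A chain of custody records who performed which step on which files, in a form that makes later tampering detectable. The sketch below uses a hash-chained, append-only log, a generic technique chosen here for illustration; it is not a description of how Genedata Profiler or iRODS store this information. The pairing of users to steps is invented, since the figure does not make it recoverable.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry


def _digest(body: dict) -> str:
    """Hash an event body deterministically."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_event(log: list, user: str, step: str, files: list) -> dict:
    """Append an event that commits to the hash of the previous event."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"user": user, "step": step, "files": files, "prev": prev_hash}
    event = {**body, "hash": _digest(body)}
    log.append(event)
    return event


def verify(log: list) -> bool:
    """Recompute every hash and check that each event links to its predecessor."""
    prev_hash = GENESIS
    for event in log:
        body = {key: event[key] for key in ("user", "step", "files", "prev")}
        if event["prev"] != prev_hash or event["hash"] != _digest(body):
            return False
        prev_hash = event["hash"]
    return True


# Illustrative history for one sample; the user-to-step assignment is assumed.
log = []
pair = ["rna1_1.fq", "rna1_2.fq"]
append_event(log, "Tina", "sequence alignment", pair)
append_event(log, "Bob", "RNA quantification", pair)
append_event(log, "Alice", "data export", pair)
```

Changing any recorded user, step, or file list after the fact makes `verify(log)` return `False`, because the stored hash no longer matches the event body.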


Enabling Intuitive Raw Data Management

1. Visualization of clinical sample annotation together with corresponding raw data

2. Flexible search functionalities across the whole database

3. Powerful annotation curation capabilities including bulk editing and annotation information protection
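The second point, searching across all annotations at once, can be sketched as a free-text query matched against every field of every sample record. The records, field names, and `search` function below are hypothetical; the actual search implementation is not described in the slides.

```python
def tokens(record: dict) -> set:
    """Collect lowercase whitespace-separated tokens from every field of a record."""
    found = set()
    for value in record.values():
        items = value if isinstance(value, list) else [value]
        for item in items:
            found.update(str(item).lower().split())
    return found


def search(records: list, query: str) -> list:
    """Return the records that contain every term of the query in any field."""
    terms = query.lower().split()
    return [rec for rec in records if all(term in tokens(rec) for term in terms)]


# Hypothetical sample annotations linked to their raw data files.
samples = [
    {"sample_id": "S1", "tissue": "lung", "response": "responder",
     "raw_files": ["rna1_1.fq", "rna1_2.fq"]},
    {"sample_id": "S2", "tissue": "lung", "response": "non-responder",
     "raw_files": ["rna2_1.fq", "rna2_2.fq"]},
    {"sample_id": "S3", "tissue": "liver", "response": "responder",
     "raw_files": ["rna3_1.fq", "rna3_2.fq"]},
]

lung_responders = search(samples, "lung responder")
```

Matching whole tokens rather than substrings keeps "responder" from also matching "non-responder".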


Marrying Raw Data with Sample Annotation

[Screenshot: sample annotation shown alongside the corresponding raw data]


Providing ‘Google-Like’ Search

[Screenshot: a complex search and its search result]


Sample Annotation Curation

[Screenshot callouts: locked-down attribute; multiple values including units; browse sequence]
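The curation features named in the callouts (locked-down attributes, multiple values with units, bulk editing) can be sketched with a small data model. The `Sample` class, its fields, and `bulk_edit` are hypothetical and only illustrate the behaviour; they do not reflect Genedata Profiler's data model.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    sample_id: str
    # attribute name -> list of (value, unit) pairs, so one attribute can hold several values
    attributes: dict = field(default_factory=dict)
    # names of attributes that are protected against edits
    locked: set = field(default_factory=set)


def bulk_edit(samples: list, name: str, values: list) -> list:
    """Set an attribute on many samples at once; return the ids skipped because of a lock."""
    skipped = []
    for sample in samples:
        if name in sample.locked:
            skipped.append(sample.sample_id)
            continue
        sample.attributes[name] = list(values)
    return skipped


s1 = Sample("S1", {"dose": [(10, "mg")]})
s2 = Sample("S2", {"dose": [(5, "mg")]}, locked={"dose"})

# One bulk edit across both samples; S2 keeps its protected value.
skipped = bulk_edit([s1, s2], "dose", [(20, "mg"), (0.02, "g")])
```

Returning the skipped ids instead of raising lets a curator see which samples were protected without aborting the whole bulk edit.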


Summary

• The smooth integration of Genedata Profiler with iRODS lets scientists preserve their research ecosystem when working with confidential data

• Genedata Profiler’s data processing and management capabilities, together with iRODS’ metadata and security concepts, form a unique combination for establishing the chain of custody when analyzing personalized medicine data
