Scalable, Collaborative, Reproducible, and Extensible analysis of TCGA data in the Cloud


SCALABLE, COLLABORATIVE, REPRODUCIBLE, AND EXTENSIBLE ANALYSIS OF TCGA DATA IN THE CLOUD

Brandi Davis-Dusenbery, PhD

AACR, April 18, 2016

DISCLOSURE & FUNDING

This project has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, Department of Health and Human Services, under Contract No. HHSN261201400008C.

I am an employee of Seven Bridges.

GUIDING PRINCIPLES

Making data available isn’t enough to make it usable

The best science happens in teams

Reproducibility shouldn’t be hard

The impact of TCGA is extended by new data & tools

MAKING DATA AVAILABLE ISN’T ENOUGH TO MAKE IT USABLE

THE CGC ALLOWS YOU TO ACCESS MORE THAN 1 PB OF MULTIDIMENSIONAL -OMICS DATA.

Multiple samples per case: Primary Tumor, Solid Tissue Normal, Blood Derived Normal, Metastatic, …

Multiple analyses per sample: Genomic, Transcriptomic, Proteomic, Epigenomic, …

Open Data, Controlled Data

EXPLORE THE DATASET…

… AND THEN IMMEDIATELY RUN AN ANALYSIS.

THE BEST SCIENCE HAPPENS IN TEAMS

SECURE AND COMPLIANT PROJECT MEMBERSHIP

• Projects serve as isolated workspaces for your data and tools.

• Fine-grained permissions give you control over who can see and use your assets.

• Access to TCGA Controlled Data projects is limited to authorized users.

RICH COMMUNICATION & EFFECTIVE COLLABORATION

Project descriptions, conversations, and real-time notifications keep everyone on the same page.

REPRODUCIBILITY SHOULDN’T BE HARD

The inputs, outputs, and parameters, as well as the precise tool versions (including dependencies!), are always linked and available for reference days or months later.
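As a rough illustration of what linking inputs, parameters, and tool versions can look like, the sketch below stores one run as a small record and derives a fingerprint from it. The `TaskRecord` class, its field names, and the example values are all hypothetical; this is not the CGC's actual schema or API, only one way to picture the idea.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TaskRecord:
    """Hypothetical record of one analysis run; not the CGC's actual schema."""

    tool: str
    tool_version: str
    docker_image: str  # the image tag pins the tool's dependencies
    inputs: dict
    parameters: dict
    outputs: dict = field(default_factory=dict)

    def to_json(self) -> str:
        # Sorted keys give a canonical form: equal records serialize identically.
        return json.dumps(asdict(self), sort_keys=True)

    def fingerprint(self) -> str:
        # SHA-256 of the canonical JSON; any change to the setup changes the hash.
        return hashlib.sha256(self.to_json().encode("utf-8")).hexdigest()


# Illustrative values only.
run = TaskRecord(
    tool="example-aligner",
    tool_version="1.2.3",
    docker_image="example/aligner:1.2.3",
    inputs={"reads": "sample.fastq"},
    parameters={"threads": 4},
)
rerun = TaskRecord(**asdict(run))  # same setup, recreated later
```

Two records with the same tool version, inputs, and parameters share a fingerprint, so a later rerun can be checked against the original with a single string comparison.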

EACH TASK IS REPRODUCIBLE & REMEMBERABLE

• Even the most complex workflows are captured as small runnable text files.

• Easy to share and save.

… AND SELF-CONTAINED

THE IMPACT OF TCGA IS EXTENDED BY NEW DATA & TOOLS

FOUR WAYS TO ADD YOUR OWN DATA

• Graphical uploader

• Command Line uploader

• FTP / HTTP

• API

EASILY ANNOTATE UPLOADED DATA TO MAKE IT EASIER TO FIND LATER

~40 properties in the visual interface; unlimited custom properties via the API.
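To make the annotate-then-find idea concrete, here is a minimal sketch that attaches custom properties to file records and filters on them. It works on plain Python dictionaries and does not call the CGC API; the `annotate` and `find` helpers, the `metadata` key, and the property names are assumptions made for illustration.

```python
def annotate(file_record, **properties):
    """Return a copy of file_record with extra metadata properties merged in."""
    merged = dict(file_record.get("metadata", {}))
    merged.update(properties)
    return {**file_record, "metadata": merged}


def find(files, **criteria):
    """Return the records whose metadata matches every key/value in criteria."""
    return [
        f for f in files
        if all(f.get("metadata", {}).get(k) == v for k, v in criteria.items())
    ]


# Illustrative file names and property values.
files = [
    annotate({"name": "a.bam"}, sample_type="Primary Tumor", batch="pilot"),
    annotate({"name": "b.bam"}, sample_type="Solid Tissue Normal", batch="pilot"),
]
tumor_files = find(files, sample_type="Primary Tumor")
```

Because properties are free-form key/value pairs, a custom property such as `batch` is searchable in the same way as a standard one such as `sample_type`.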

AS THE AMOUNT OF DATA HAS GROWN, SO TOO HAS THE NUMBER OF TOOLS AVAILABLE TO ANALYZE IT

11,160 -omics data analysis tools* (each with many versions)

50+ used in a single TCGA marker paper

*omictools.com

DOCKER + CWL MAKES IT EASY TO PUT THESE TOOLS ON THE CGC … AND OTHER PLACES

DEFINE THE TOOL, INPUTS, OUTPUTS AND PARAMETERS
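A tool description of this kind can be sketched as a small document listing the command, its Docker image, and its inputs and outputs. The example below builds one as a Python dictionary and serializes it to JSON, which CWL accepts alongside YAML. The field names follow the CWL `CommandLineTool` layout as of v1.0, which may differ from the draft in use on the CGC at the time of this talk; the `wc -l` tool, the image tag, and the `to_cwl_json` helper are illustrative assumptions, and the result should be validated with a CWL runner before being relied on.

```python
import json

# Field names follow the CWL CommandLineTool layout; the tool itself is illustrative.
line_count_tool = {
    "cwlVersion": "v1.0",
    "class": "CommandLineTool",
    "baseCommand": ["wc", "-l"],
    # The Docker image carries the tool and its dependencies.
    "hints": {"DockerRequirement": {"dockerPull": "ubuntu:14.04"}},
    "inputs": {
        # Passed as the first positional argument on the command line.
        "input_file": {"type": "File", "inputBinding": {"position": 1}},
    },
    # Standard output is redirected to this file, then collected as an output.
    "stdout": "line_count.txt",
    "outputs": {
        "line_count": {"type": "File", "outputBinding": {"glob": "line_count.txt"}},
    },
}


def to_cwl_json(tool: dict) -> str:
    """Serialize a tool description to a JSON document (hypothetical helper)."""
    return json.dumps(tool, indent=2)


document = to_cwl_json(line_count_tool)
```

The resulting text file is the portable part: the same description can, in principle, be handed to any platform or runner that understands CWL and can pull the Docker image.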

ADD YOUR TOOL TO 100s OF EXISTING TOOLS TO CREATE A WORKFLOW

WWW.CANCERGENOMICSCLOUD.ORG

MORE THAN $1M IN COMPUTE AND STORAGE CREDITS AVAILABLE FOR YOU TO USE

Tiered model allows everyone to access up to $1,600 (~ enough to do whole exome analysis of all pancreatic carcinoma samples)

Request up to $10,000 in credits for large collaborative projects (graduate students and postdocs are particularly encouraged to submit a request)

NEARLY 500 RESEARCHERS ARE USING THE CGC TODAY …

[Chart labels: Early Adopter, Open Release]

… JOIN THEM

Booth 452

Networking event


THANK YOU

