Provided to you by the Canadian Bioinformatics Workshop series www.bioinformatics.ca NCRI Cancer Conference: Cancer data and its analysis practical workshop November 1, 2015

Cancer uk 2015_module1_ouellette_ver02

Page 1: Cancer uk 2015_module1_ouellette_ver02

Provided to you by the Canadian Bioinformatics Workshop series

www.bioinformatics.ca

NCRI Cancer Conference: Cancer data and its analysis practical workshop

November 1, 2015

Page 2: Cancer uk 2015_module1_ouellette_ver02

Module #: Title of Module

Page 3: Cancer uk 2015_module1_ouellette_ver02

bioinformatics.ca

NCRI Workshop 2015

NCRI Workshop 2015 – Module 1

You are free to: copy, share, adapt, or re-mix; photograph, film, or broadcast; blog, live-blog, or post video of this presentation.

Provided that: you attribute the work to its author and respect the rights and licenses associated with its components.

Slide concept by Cameron Neylon, who has waived all copyright and related or neighbouring rights. This slide only: ccZero. Social media icons adapted with permission from originals by Christopher Ross. Original images are available under GPL at: http://www.thisismyurl.com/free-downloads/15-free-speech-bubble-icons-for-popular-websites

Page 4: Cancer uk 2015_module1_ouellette_ver02

Slides are on slideshare.net
• http://www.slideshare.net/bffo/cancer-uk-2015module1ouellettever02

Page 5: Cancer uk 2015_module1_ouellette_ver02

Module 1
Cancer genomic databases

B.F. Francis Ouellette
Associate Director, Informatics and Biocomputing
Ontario Institute for Cancer Research
November 1, 2015

Page 6: Cancer uk 2015_module1_ouellette_ver02

@bffo

[email protected]

Page 7: Cancer uk 2015_module1_ouellette_ver02

Schedule for Module 1: Cancer Genomic Databases

• Introduction to the Canadian Bioinformatics Workshop series.
• The Databases:
  – The Cancer Genome Atlas (TCGA)
  – The International Cancer Genome Consortium (ICGC)
• Data Access: human genomes and security and privacy issues: Open Data vs. Controlled Access data
• Another Database:
  – The Catalogue of Somatic Mutations in Cancer (COSMIC)

Page 8: Cancer uk 2015_module1_ouellette_ver02

http://bioinformatics.ca/

Page 9: Cancer uk 2015_module1_ouellette_ver02

http://bioinformatics.ca/workshops/2015/bioinformatics-cancer-genomics-2015

Page 10: Cancer uk 2015_module1_ouellette_ver02

Workshops planned for 2016: http://bioinformatics.ca/workshops

1. Bioinformatics for Cancer Genomics
2. High-throughput Biology: From Sequence to Networks (2017 - CSHL)
3. Introduction to R
4. Exploratory Analysis of Biological Data using R
5. Informatics for RNA-sequence Analysis
6. Informatics on High Throughput Sequencing Data
7. Pathway and Network Analysis of -omics Data
8. Informatics and Statistics for Metabolomics
9. Analysis of Metagenomic Data
10. How to Work in the Cloud: Computing on Human Genome Data
11. Epigenomic Data Analysis
12. Big Data in Precision Genomics

Page 11: Cancer uk 2015_module1_ouellette_ver02

http://bioinformatics.ca/workshops/2015

Page 12: Cancer uk 2015_module1_ouellette_ver02

E-mail: [email protected]

Web: http://bioinformatics.ca

Workshop announcement mailing list:

http://bioinformatics.ca/mailman/listinfo/announce

Page 13: Cancer uk 2015_module1_ouellette_ver02

Soap-Box time!

• Open Access, Open Data and Open Source are essential for good Science.
• Openness is a responsibility, an obligation, and something that comes with the privilege of doing publicly funded work.

Open Access

Open Source

Open Data

Opencourseware

Page 14: Cancer uk 2015_module1_ouellette_ver02

Page 15: Cancer uk 2015_module1_ouellette_ver02

Cancer therapy is like beating the dog with a stick to get rid of his fleas.

- Anna Deavere Smith, Let me down easy

Page 16: Cancer uk 2015_module1_ouellette_ver02

http://goo.gl/Yhbsj

Page 17: Cancer uk 2015_module1_ouellette_ver02

The revolution in cancer research can be summed up in a single sentence: cancer is, in essence, a genetic disease.

- Bert Vogelstein

Page 18: Cancer uk 2015_module1_ouellette_ver02

Cancer: a Disease of the Genome

Challenge in Treating Cancer:
• Every tumour is different
• Every cancer patient is different

Page 19: Cancer uk 2015_module1_ouellette_ver02

https://en.wikipedia.org/wiki/List_of_databases_for_oncogenomic_research

Page 20: Cancer uk 2015_module1_ouellette_ver02

Page 21: Cancer uk 2015_module1_ouellette_ver02

Papers (PMID)
– TCGA: 24071849 21720365 23000897 22960745 22810696 24476821
– ICGC: 20393554
– COSMIC: 25355519
– Data Access: 22807659

http://www.ncbi.nlm.nih.gov/pubmed/[PMID]
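The URL pattern above can be expanded for each listed paper. The following is a minimal Python sketch; the names `PMIDS`, `PUBMED_BASE` and `pubmed_url` are illustrative choices and not part of the original slides, while the PMIDs and the base URL are copied from this slide.

```python
# PMIDs as listed on the slide, grouped by topic.
PMIDS = {
    "TCGA": ["24071849", "21720365", "23000897", "22960745", "22810696", "24476821"],
    "ICGC": ["20393554"],
    "COSMIC": ["25355519"],
    "Data Access": ["22807659"],
}

# Base URL taken from the slide's pattern: .../pubmed/[PMID]
PUBMED_BASE = "http://www.ncbi.nlm.nih.gov/pubmed/"


def pubmed_url(pmid: str) -> str:
    """Return the PubMed URL for one PMID, following the slide's pattern."""
    if not pmid.isdigit():
        raise ValueError(f"PMID must be numeric, got {pmid!r}")
    return PUBMED_BASE + pmid


if __name__ == "__main__":
    # Print one tab-separated line per paper: topic, then URL.
    for topic, ids in PMIDS.items():
        for pmid in ids:
            print(f"{topic}\t{pubmed_url(pmid)}")
```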

Page 22: Cancer uk 2015_module1_ouellette_ver02

TCGA

The Cancer Genome Atlas is a comprehensive and coordinated effort to accelerate our understanding of the molecular basis of cancer through the application of genome analysis technologies, including large-scale genome sequencing.

Page 23: Cancer uk 2015_module1_ouellette_ver02

About the TCGA

• National Cancer Institute (NCI)
• National Human Genome Research Institute (NHGRI)
• Phased Structure:
  – Three-year pilot in 2006 with an investment of $50 million from each
  – TCGA will collect and characterize more than 20 additional tumour types

Page 24: Cancer uk 2015_module1_ouellette_ver02

Where to start with the TCGA?

Wiki: https://wiki.nci.nih.gov/display/TCGA/About+TCGA

Page 25: Cancer uk 2015_module1_ouellette_ver02

Division of Labour

• Biospecimen Core Resource (BCR)
  – centre where samples are carefully catalogued, processed, quality checked and stored along with participant clinical information
• Genome Sequencing Centre (GSC)
  – uses high-throughput methods to identify changes to DNA sequences that are associated with specific cancer types
• Genome Characterization Centre (GCC)
  – uses high-throughput technologies to analyze genomic changes involved in cancer
• Genome Data Analysis Centre (GDAC)
  – provides novel informatics tools to the research community
  – provides analysis results using TCGA data
• Data Coordinating Centre (DCC)
  – central provider of TCGA data
  – standardizes data formats and validates submitted data

Page 26: Cancer uk 2015_module1_ouellette_ver02

TCGA Data

• Sequence reads from newer sequencing technologies are available at the Cancer Genome Hub: https://cghub.ucsc.edu/
• Higher level sequence data (variation calls and abundance measures) are available at the TCGA Portal: http://cancergenome.nih.gov/
• Also integrated with ICGC data (more on this later)

Page 27: Cancer uk 2015_module1_ouellette_ver02

TCGA data flow

http://goo.gl/b5nojx

Page 28: Cancer uk 2015_module1_ouellette_ver02

Data Coordinating Centre

• Plays a central role:
  – Receiving data from BCR, GSC and GCC sites
  – Providing access to users
  – Performing analysis of data
• Responsibilities:
  – Protecting participant privacy and confidentiality
  – Developing data standards and controlled vocabularies
  – Establishing informatics pipelines for data flow
  – Developing new analytical and visualization technologies to facilitate data analysis, for all audiences

Page 29: Cancer uk 2015_module1_ouellette_ver02

TCGA DCC Data Portal

• Provides a platform to search, download and analyze TCGA data sets
• Two data access tiers: Open and Controlled
• Analytic tools include: Cancer Molecular Analysis and Cancer Genome Workbench (NCBIB), Integrative Genomics Viewer (Broad) and Cancer Genomics Analysis (MSKCC).

Page 30: Cancer uk 2015_module1_ouellette_ver02

TCGA Data Browser
https://tcga-data.nci.nih.gov/tcga/

Query TCGA data online using the TCGA Data Browser

Page 31: Cancer uk 2015_module1_ouellette_ver02

The International Cancer Genome Consortium (ICGC)

• http://www.icgc.org/

• “ICGC was launched to coordinate large-scale cancer genome studies in tumours from 50 different cancer types and/or subtypes that are of clinical and societal importance across the globe”

Page 32: Cancer uk 2015_module1_ouellette_ver02

ICGC Map – February 2015
85 projects launched

Page 33: Cancer uk 2015_module1_ouellette_ver02

ICGC datasets to date: https://dcc.icgc.org/projects/history

Page 34: Cancer uk 2015_module1_ouellette_ver02

Page 35: Cancer uk 2015_module1_ouellette_ver02

Page 36: Cancer uk 2015_module1_ouellette_ver02

Select “Pancreatic cancer – Canada”

Page 37: Cancer uk 2015_module1_ouellette_ver02

… But where is the data?

Page 38: Cancer uk 2015_module1_ouellette_ver02

Page 39: Cancer uk 2015_module1_ouellette_ver02

http://dcc.icgc.org/

Page 40: Cancer uk 2015_module1_ouellette_ver02

[Diagram of data access paths. Labels: DACO, ICGC, dbGaP, EGA, TCGA, ERA, BAM, Open, Germ Line, + EGA id]

Page 41: Cancer uk 2015_module1_ouellette_ver02

ICGC BAM/FASTQ

TCGA BAM/FASTQ

ICGC Open Data (includes TCGA Open Data)

Page 42: Cancer uk 2015_module1_ouellette_ver02

ICGC

TCGA

Page 43: Cancer uk 2015_module1_ouellette_ver02

ICGC

TCGA

Differences between ICGC & TCGA

• Different tumour types
• Different geographic rules
• Many countries vs one jurisdiction
• Different definitions of what is controlled
• Different data access rules

Page 44: Cancer uk 2015_module1_ouellette_ver02

ICGC Controlled Access Datasets

• Detailed Phenotype and Outcome data
• Gene Expression (probe-level data)
• Raw genotype calls
• Gene-sample identifier links
• Genome sequence files
• Germ line variants

ICGC Open Access Datasets

• Cancer Pathology: Histologic type or subtype, Histologic nuclear grade
• Patient/Person: Gender, Age range, Vital status, Survival time, Relapse type, Status at follow-up
• Gene Expression (normalized)
• DNA methylation
• Computed Copy Number and Loss of Heterozygosity
• Somatic variants from Exome or WGS

http://goo.gl/w4mrV
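The two ICGC tiers on this slide can be treated as a simple lookup when deciding whether a request will need controlled-access approval. The Python sketch below is illustrative only: `ICGC_ACCESS_TIERS` and `access_tier` are hypothetical names, and the entries are transcribed from this slide rather than taken from any ICGC API or official vocabulary.

```python
# Data types and tiers transcribed from the slide (keys lower-cased for lookup).
ICGC_ACCESS_TIERS = {
    "detailed phenotype and outcome data": "controlled",
    "gene expression (probe-level data)": "controlled",
    "raw genotype calls": "controlled",
    "gene-sample identifier links": "controlled",
    "genome sequence files": "controlled",
    "germ line variants": "controlled",
    "cancer pathology": "open",
    "patient/person": "open",
    "gene expression (normalized)": "open",
    "dna methylation": "open",
    "computed copy number and loss of heterozygosity": "open",
    "somatic variants from exome or wgs": "open",
}


def access_tier(data_type: str) -> str:
    """Return 'open', 'controlled', or 'unknown' for a data type named on the slide."""
    return ICGC_ACCESS_TIERS.get(data_type.strip().lower(), "unknown")


if __name__ == "__main__":
    for name in ("Germ line variants", "DNA methylation", "Proteomics"):
        print(f"{name}: {access_tier(name)}")
```

Returning "unknown" for anything not listed is a deliberate choice here: a data type missing from the table should prompt a check against the portal's documentation rather than being assumed open.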

Page 45: Cancer uk 2015_module1_ouellette_ver02

TCGA Controlled Access Datasets

• Primary sequence data (BAM and FASTQ files)
• SNP6 array level 1 and level 2 data
• Exon array level 1 and level 2 data
• Somatic variants from whole genome sequencing
• Certain information in MAFs
• A full list of controlled-access data types can be found at: http://goo.gl/K1h7zu

TCGA Open Access Datasets

• De-identified clinical and demographic data
• Gene expression data
• Copy number alterations in regions of the genome
• Epigenetic data
• Summaries of data compiled across individuals
• Anonymized single amplicon DNA sequence data
• Somatic variants from scrubbed exome sequencing

http://goo.gl/A1rMRB

Page 46: Cancer uk 2015_module1_ouellette_ver02

TCGA/ICGC users agreed:

• … to keep all computer systems on which controlled access data reside, or which provide access to such data, up to date with respect to software and security patches.
• … to protect Controlled Access Data against disclosure to unauthorized individuals.
• … to monitor and control which individuals have access to Controlled Access Data.

Page 47: Cancer uk 2015_module1_ouellette_ver02

TCGA/ICGC users agreed:

• … to destroy all copies of controlled access data after controlled access privileges expire.
• … to only use secure transfer protocols: e.g. https and sftp
• … to encrypt Controlled Access data in transfers and storage
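The secure-transfer commitment can be checked mechanically before any download is started. The Python sketch below assumes that only the two protocols named on the slide (https and sftp) count as secure; `SECURE_SCHEMES` and `uses_secure_transfer` are hypothetical names, and a real policy may allow other protocols.

```python
from urllib.parse import urlparse

# Schemes treated as secure in this sketch; the slide gives https and sftp as examples.
SECURE_SCHEMES = {"https", "sftp"}


def uses_secure_transfer(url: str) -> bool:
    """Return True if the URL's scheme is one of the secure schemes above."""
    return urlparse(url).scheme.lower() in SECURE_SCHEMES


if __name__ == "__main__":
    for url in ("https://dcc.icgc.org/", "http://dcc.icgc.org/", "sftp://example.org/sample.bam"):
        status = "ok" if uses_secure_transfer(url) else "REFUSE"
        print(f"{status}\t{url}")
```

This only inspects the URL scheme; it does not cover the separate commitment to encrypt data in storage.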

Page 48: Cancer uk 2015_module1_ouellette_ver02

What does it mean for this file?

simple_somatic_mutation.aggregated.vcf.gz

https://dcc.icgc.org/repository/icgc/release_19/Summary
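One way to get a first look at a file like this is to count records per chromosome. The Python sketch below assumes the file has already been downloaded locally and follows the standard tab-separated VCF layout (header lines starting with `#`, chromosome in the first column); the function name `count_variants_by_chromosome` is a hypothetical helper, not part of any ICGC tool.

```python
import gzip
from collections import Counter


def count_variants_by_chromosome(vcf_gz_path: str) -> Counter:
    """Count data records per chromosome in a gzip-compressed VCF file."""
    counts = Counter()
    # "rt" opens the gzip stream in text mode so lines arrive as str.
    with gzip.open(vcf_gz_path, "rt") as handle:
        for line in handle:
            # Skip blank lines and header/meta lines, which start with '#'.
            if not line.strip() or line.startswith("#"):
                continue
            # The first tab-separated column of a VCF record is CHROM.
            chrom = line.split("\t", 1)[0]
            counts[chrom] += 1
    return counts
```

A call such as `count_variants_by_chromosome("simple_somatic_mutation.aggregated.vcf.gz")` streams the file line by line, so the whole file never has to fit in memory.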

Page 49: Cancer uk 2015_module1_ouellette_ver02

Page 50: Cancer uk 2015_module1_ouellette_ver02

Page 51: Cancer uk 2015_module1_ouellette_ver02

Identify yourself

Fill out a detailed form, which includes:
• Contact and Project Information
• Information Technology details and procedures for keeping data secure
• Data Access Agreement

All of these documents are put into a PDF file that you print and have your institution sign off on your behalf.

Page 52: Cancer uk 2015_module1_ouellette_ver02

http://icgc.org/daco

Page 53: Cancer uk 2015_module1_ouellette_ver02

Page 54: Cancer uk 2015_module1_ouellette_ver02

Page 55: Cancer uk 2015_module1_ouellette_ver02

Page 56: Cancer uk 2015_module1_ouellette_ver02

• Name
• Institution
• Title of Project
• Collaborators
• Research Summary
• Lay Summary
• Ethics
• IT Security
• Cloud Storage
• Agreement
• Appendices

Page 57: Cancer uk 2015_module1_ouellette_ver02

Page 58: Cancer uk 2015_module1_ouellette_ver02

http://goo.gl/2UVLDJ

Page 59: Cancer uk 2015_module1_ouellette_ver02

Page 60: Cancer uk 2015_module1_ouellette_ver02

Page 61: Cancer uk 2015_module1_ouellette_ver02

Page 62: Cancer uk 2015_module1_ouellette_ver02

DACO approved projects

Page 63: Cancer uk 2015_module1_ouellette_ver02

DACO/DCC User Data Access Process

• Users approved through DACO are now automatically granted access to ICGC controlled access datasets available through the ICGC Data Portal and the EBI’s EGA repository

DACO Web Application

DCC User Registry

DCC Data Portal

EBI EGA

application approved by DACO

user accounts activated

Page 64: Cancer uk 2015_module1_ouellette_ver02

Catalogue of Somatic Mutations in Cancer (COSMIC)

• http://cancer.sanger.ac.uk/cancergenome/projects/cosmic/

• COSMIC is designed to store and display somatic mutation information and related details and contains information relating to human cancers.

Page 65: Cancer uk 2015_module1_ouellette_ver02

ICGC BAM/FASTQ

TCGA BAM/FASTQ

ICGC Open Data (includes TCGA Open Data)

COSMIC Open Data

Page 66: Cancer uk 2015_module1_ouellette_ver02

COSMIC

• Somatic Mutations Only
• Diverse sources
  – Literature (Arrays, Next-Gen, PCR...)
  – TCGA
  – ICGC
• Diverse ways to look at data
  – Gene
  – Variation
  – Tumour type
  – Cell line
  – Experiment

Page 67: Cancer uk 2015_module1_ouellette_ver02

FAQ

Page 68: Cancer uk 2015_module1_ouellette_ver02

Looking up your favorite gene

Page 69: Cancer uk 2015_module1_ouellette_ver02

Page 70: Cancer uk 2015_module1_ouellette_ver02

Page 71: Cancer uk 2015_module1_ouellette_ver02

In closing

• Remember all these sites have great amounts of documentation
• The field is changing quickly, and so are the portals.
• New features are planned as we speak, and so you need to use the sites, and keep coming back.
• Don’t be afraid to explore
• Interested in learning more after today? Consider one of the bioinformatics.ca workshops!

Page 72: Cancer uk 2015_module1_ouellette_ver02

Acknowledgements: the CBW gang

Michelle Brazas

Michael Stromberg

Marc Fiume

Michael Brudno