Provided to you by the Canadian Bioinformatics Workshop series
www.bioinformatics.ca
NCRI Cancer Conference: Cancer data and its analysis practical workshop
November 1, 2015
bioinformatics.ca
NCRI Workshop 2015
NCRI Workshop 2015 – Module 1
You are free to:
Copy, share, adapt, or re-mix;
Photograph, film, or broadcast;
Blog, live-blog, or post video of;
this presentation.
Provided that:
You attribute the work to its author and respect the rights and licenses associated with its components.
Slide Concept by Cameron Neylon, who has waived all copyright and related or neighbouring rights. This slide only ccZero. Social Media Icons adapted with permission from originals by Christopher Ross. Original images are available under GPL at: http://www.thisismyurl.com/free-downloads/15-free-speech-bubble-icons-for-popular-websites
Slides are on slideshare.net
• http://www.slideshare.net/bffo/cancer-uk-2015module1ouellettever02
Module 1
Cancer genomic databases
B.F. Francis Ouellette
Associate Director, Informatics and Biocomputing
Ontario Institute for Cancer Research
November 1, 2015
Schedule for Module 1: Cancer Genomic Databases
• Introduction to the Canadian Bioinformatics Workshop series.
• The Databases:
– The Cancer Genome Atlas (TCGA)
– The International Cancer Genome Consortium (ICGC)
• Data Access: human genomes and security and privacy issues: Open Data vs. Controlled Access data
• Another Database:
– The Catalogue of Somatic Mutations in Cancer (COSMIC)
http://bioinformatics.ca/
http://bioinformatics.ca/workshops/2015/bioinformatics-cancer-genomics-2015
Workshops planned for 2016: http://bioinformatics.ca/workshops
1. Bioinformatics for Cancer Genomics
2. High-throughput Biology: From Sequence to Networks (2017 - CSHL)
3. Introduction to R
4. Exploratory Analysis of Biological Data using R
5. Informatics for RNA-sequence Analysis
6. Informatics on High Throughput Sequencing Data
7. Pathway and Network Analysis of -omics Data
8. Informatics and Statistics for Metabolomics
9. Analysis of Metagenomic Data
10. How to Work in the Cloud: Computing on Human Genome Data
11. Epigenomic Data Analysis
12. Big Data in Precision Genomics
http://bioinformatics.ca/workshops/2015
E-mail: [email protected]
Web: http://bioinformatics.ca
Workshop announcement mailing list:
http://bioinformatics.ca/mailman/listinfo/announce
Soap-Box time!
• Open Access, Open Data and Open Source are essential for good Science.
• Openness is a responsibility, an obligation, and something that comes with the privilege of doing publicly funded work.
Open Access
Open Source
Open Data
Opencourseware
Cancer therapy is like beating the dog with a stick to get rid of his fleas.
- Anna Deavere Smith, Let me down easy
http://goo.gl/Yhbsj
The revolution in cancer research can be summed up in a single sentence: cancer is, in essence, a genetic disease.
- Bert Vogelstein
Cancer: a Disease of the Genome
Challenge in Treating Cancer:
• Every tumour is different
• Every cancer patient is different
https://en.wikipedia.org/wiki/List_of_databases_for_oncogenomic_research
Papers (PMID)
– TCGA: 24071849 21720365 23000897 22960745 22810696 24476821
– ICGC: 20393554
– COSMIC: 25355519
– Data Access: 22807659
http://www.ncbi.nlm.nih.gov/pubmed/[PMID]
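The URL template above can be expanded mechanically for each paper. A minimal sketch, using only the PMIDs listed on this slide; the helper name `pubmed_url` and the dictionary layout are illustrative choices, not part of the workshop material.

```python
# PMIDs as listed on the slide, grouped by resource.
PAPERS = {
    "TCGA": ["24071849", "21720365", "23000897", "22960745", "22810696", "24476821"],
    "ICGC": ["20393554"],
    "COSMIC": ["25355519"],
    "Data Access": ["22807659"],
}


def pubmed_url(pmid: str) -> str:
    """Fill the slide's URL template with one PMID (hypothetical helper)."""
    if not pmid.isdigit():
        raise ValueError(f"not a PMID: {pmid!r}")
    return f"http://www.ncbi.nlm.nih.gov/pubmed/{pmid}"


# Print one link per paper, labelled with the resource it describes.
for resource, pmids in PAPERS.items():
    for pmid in pmids:
        print(resource, pubmed_url(pmid))
```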
TCGA
The Cancer Genome Atlas is a comprehensive and coordinated effort to accelerate our understanding of the molecular basis of cancer through the application of genome analysis technologies, including large-scale genome sequencing.
About the TCGA
• National Cancer Institute (NCI)
• National Human Genome Research Institute (NHGRI)
• Phased Structure:
– Three-year pilot in 2006 with an investment of $50 million from each
– TCGA will collect and characterize more than 20 additional tumour types
Where to start with the TCGA?
Wiki: https://wiki.nci.nih.gov/display/TCGA/About+TCGA
Division of Labour
• Biospecimen Core Resource (BCR)
– centre where samples are carefully catalogued, processed, quality-checked and stored along with participant clinical information
• Genome Sequencing Centre (GSC)
– uses high-throughput methods to identify changes to DNA sequences that are associated with specific cancer types
• Genome Characterization Centre (GCC)
– uses high-throughput technologies to analyze genomic changes involved in cancer
• Genome Data Analysis Centre (GDAC)
– provides novel informatics tools to the research community
– provides analysis results using TCGA data
• Data Coordinating Centre (DCC)
– central provider of TCGA data
– standardizes data formats and validates submitted data
TCGA Data
• Sequence reads from newer sequencing technologies are available at the Cancer Genome Hub: https://cghub.ucsc.edu/
• Higher level sequence data (variation calls and abundance measures) are available at the TCGA Portal: http://cancergenome.nih.gov/
• Also integrated with ICGC data (more on this later)
TCGA data flow
http://goo.gl/b5nojx
Data Coordinating Centre
• Plays a central role:
– Receiving data from BCR, GSC and GCC sites
– Providing access to users
– Performing analysis of data
• Responsibilities:
– Protecting participant privacy and confidentiality
– Developing data standards and controlled vocabularies
– Establishing informatics pipelines for data flow
– Developing new analytical and visualization technologies to facilitate data analysis, for all audiences
TCGA DCC Data Portal
• Provides a platform to search, download and analyze TCGA data sets
• Two data access tiers: Open and Controlled
• Analytic tools include: Cancer Molecular Analysis and Cancer Genome Workbench (NCBIB), Integrative Genomics Viewer (Broad) and Cancer Genomics Analysis (MSKCC).
TCGA Data Browser
https://tcga-data.nci.nih.gov/tcga/
Query TCGA data online using the TCGA Data Browser
The International Cancer Genome Consortium (ICGC)
• http://www.icgc.org/
• “ICGC was launched to coordinate large-scale cancer genome studies in tumours from 50 different cancer types and/or subtypes that are of clinical and societal importance across the globe”
ICGC Map – February 2015
85 projects launched
ICGC datasets to date: https://dcc.icgc.org/projects/history
Select “Pancreatic cancer – Canada”
… But where is the data?
http://dcc.icgc.org/
[Diagram of ICGC and TCGA data access. Labels: DACO, ICGC, dbGaP, EGA, ERA, TCGA, BAM, Open, Germ Line, + EGA id]
[Diagram labels: ICGC BAM/FASTQ; TCGA BAM/FASTQ; ICGC Open Data (includes TCGA Open Data)]
Differences between ICGC & TCGA
• Different tumour types
• Different geographic rules
• Many countries vs one jurisdiction
• Different definitions of what is controlled
• Different data access rules
ICGC Controlled Access Datasets
• Detailed Phenotype and Outcome data
• Gene Expression (probe-level data)
• Raw genotype calls
• Gene-sample identifier links
• Genome sequence files
• Germ line variants
ICGC Open Access Datasets
• Cancer Pathology: Histologic type or subtype, Histologic nuclear grade
• Patient/Person: Gender, Age range, Vital status, Survival time, Relapse type, Status at follow-up
• Gene Expression (normalized)
• DNA methylation
• Computed Copy Number and Loss of Heterozygosity
• Somatic variants from Exome or WGS
http://goo.gl/w4mrV
TCGA Controlled Access Datasets
• Primary sequence data (BAM and FASTQ files)
• SNP6 array level 1 and level 2 data
• Exon array level 1 and level 2 data
• Somatic variants from whole genome sequencing
• Certain information in MAFs
• A full list of controlled-access data types can be found at: http://goo.gl/K1h7zu
TCGA Open Access Datasets
• De-identified clinical and demographic data
• Gene expression data
• Copy number alterations in regions of the genome
• Epigenetic data
• Summaries of data compiled across individuals
• Anonymized single amplicon DNA sequence data
• Somatic variants from scrubbed exome sequencing
http://goo.gl/A1rMRB
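The ICGC and TCGA lists above show that the same kind of data can sit in different tiers depending on the consortium. A minimal sketch of that lookup follows; the table is abbreviated from the slides, and the shortened labels, the `ACCESS_TIERS` name and the `access_tier` helper are assumptions made for this illustration, not an official vocabulary or API.

```python
from typing import Optional

# Abbreviated from the slide lists; the keys are shortened labels chosen for this sketch.
ACCESS_TIERS = {
    "ICGC": {
        "open": {
            "gene expression (normalized)",
            "dna methylation",
            "somatic variants (exome)",
            "somatic variants (wgs)",
        },
        "controlled": {
            "gene expression (probe-level)",
            "raw genotype calls",
            "genome sequence files",
            "germ line variants",
        },
    },
    "TCGA": {
        "open": {
            "gene expression",
            "epigenetic data",
            "somatic variants (exome)",
        },
        "controlled": {
            "primary sequence data",
            "somatic variants (wgs)",
        },
    },
}


def access_tier(consortium: str, data_type: str) -> Optional[str]:
    """Return 'open' or 'controlled', or None if the label is not in the table."""
    key = data_type.lower()
    for tier, data_types in ACCESS_TIERS[consortium].items():
        if key in data_types:
            return tier
    return None


# Whole-genome somatic variants are listed as open by ICGC but controlled by TCGA.
print(access_tier("ICGC", "Somatic variants (WGS)"))
print(access_tier("TCGA", "Somatic variants (WGS)"))
```

The authoritative lists are the ones behind the two short links on the slides; this table is only a reading aid.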
TCGA/ICGC users agreed:
• … to keep all computer systems on which controlled access data reside, or which provide access to such data, up to date with respect to software and security patches.
• … to protect Controlled Access Data against disclosure to unauthorized individuals.
• … to monitor and control which individuals have access to Controlled Access Data.
TCGA/ICGC users agreed:
• … to destroy all copies of controlled access data after controlled access privileges expire.
• … to use only secure transfer protocols, e.g. https and sftp
• … to encrypt Controlled Access data in transfers and storage
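One way to enforce the secure-transfer commitment in a download script is to check the URL scheme before any transfer starts. A minimal sketch, assuming an allow-list made of just the two examples named on the slide (the slide says "e.g.", so a site policy may permit others); `SECURE_SCHEMES` and `uses_secure_transfer` are hypothetical names.

```python
from urllib.parse import urlsplit

# Assumption: only the two protocols given as examples on the slide are allowed.
SECURE_SCHEMES = {"https", "sftp"}


def uses_secure_transfer(url: str) -> bool:
    """Return True if the URL's scheme is on the allow-list (hypothetical helper)."""
    return urlsplit(url).scheme.lower() in SECURE_SCHEMES


# The two example.org URLs are placeholders, not real data locations.
for url in (
    "https://dcc.icgc.org/",
    "http://dcc.icgc.org/",
    "sftp://example.org/data.bam",
    "ftp://example.org/data.bam",
):
    print(url, uses_secure_transfer(url))
```

A check like this covers only the transfer protocol; encryption at rest and the other commitments need separate measures.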
What does it mean for this file?
simple_somatic_mutation.aggregated.vcf.gz
https://dcc.icgc.org/repository/icgc/release_19/Summary
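To see what a gzipped VCF such as the one named above contains, it can be read line by line with the Python standard library. The sketch below is an illustration under stated assumptions: the two records are invented placeholders compressed in memory, not rows from the ICGC release, and `read_vcf_records` is a hypothetical helper that keeps only the eight fixed VCF columns.

```python
import gzip
import io

# The eight fixed columns of a VCF data line.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def read_vcf_records(handle):
    """Yield one dict per data line, skipping '#' header lines and blank lines."""
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        yield dict(zip(VCF_COLUMNS, fields[:8]))


# Invented two-record example standing in for the real file.
EXAMPLE = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t100\tEXAMPLE_1\tC\tT\t.\t.\t.\n"
    "2\t200\tEXAMPLE_2\tG\tA\t.\t.\t.\n"
)
compressed = gzip.compress(EXAMPLE.encode("utf-8"))

# gzip.open accepts a file object as well as a path; "rt" decodes to text.
with gzip.open(io.BytesIO(compressed), "rt", encoding="utf-8") as handle:
    records = list(read_vcf_records(handle))

for record in records:
    print(record["CHROM"], record["POS"], record["REF"], record["ALT"])
```

For a downloaded copy, replacing the `io.BytesIO(...)` argument with the local file path should be enough; the generator reads one line at a time, so the full file is not held in memory unless the results are collected into a list as done here.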
Identify yourself
Fill out the detailed form, which includes:
• Contact and Project Information
• Information Technology details and procedures for keeping data secure
• Data Access Agreement
All of these documents are put into a PDF file that you print and have your institution sign on your behalf.
http://icgc.org/daco
• Name
• Institution
• Title of Project
• Collaborators
• Research Summary
• Lay Summary
• Ethics
• IT Security
• Cloud Storage
• Agreement
• Appendices
http://goo.gl/2UVLDJ
DACO approved projects
DACO/DCC User Data Access Process
• Users approved through DACO are now automatically granted access to ICGC controlled access datasets available through the ICGC Data Portal and the EBI’s EGA repository
[Diagram labels: DACO Web Application; DCC User Registry; DCC Data Portal; EBI EGA. Arrow labels: application approved by DACO; user accounts activated]
Catalogue of Somatic Mutations in Cancer (COSMIC)
• http://cancer.sanger.ac.uk/cancergenome/projects/cosmic/
• COSMIC is designed to store and display somatic mutation information and related details and contains information relating to human cancers.
[Diagram labels: ICGC BAM/FASTQ; TCGA BAM/FASTQ; ICGC Open Data (includes TCGA Open Data); COSMIC Open Data]
COSMIC
• Somatic Mutations Only
• Diverse sources
– Literature (Arrays, Next-Gen, PCR...)
– TCGA
– ICGC
• Diverse ways to look at data
– Gene
– Variation
– Tumour type
– Cell line
– Experiment
FAQ
Looking up your favorite gene
In closing
• Remember all these sites have great amounts of documentation
• The field is changing quickly, and so are the portals.
• New features are planned as we speak, and so you need to use the sites, and keep coming back.
• Don’t be afraid to explore
• Interested in learning more after today? Consider one of the bioinformatics.ca workshops!
Acknowledgements: the CBW gang
Michelle Brazas
Michael Stromberg
Marc Fiume
Michael Brudno