Final Project Discovering Somatic Mutations Next ... · Next-Generation Sequencing Technologies Final Project: Discovering Somatic Mutations ... Each student will receive data from

Next-Generation Sequencing Technologies

Final Project: Discovering Somatic Mutations in a Breast Cancer Genome

Nicholas E. Navin, Ph.D. MD Anderson Cancer Center

Dept. Genetics Dept. Bioinformatics

Introduction to Bioinformatics GS011062

SK-BR-3

- Breast cancer cell line isolated in 1970 from a 43-year-old female patient at Memorial Sloan-Kettering
- Used widely in cancer research around the world
- Genomic mutations that cause this cell line to have a cancerous phenotype are largely unknown

Final Project

- We sequenced the exome of SK-BR-3 at 50X coverage depth using paired-end sequencing on the Illumina HiSeq2000
- Each student will receive data from a single chromosome as a BAM file
- Students will detect variants, annotate mutations to identify possible driver mutations, and filter out normal germline SNPs
- Students will write up a research report (3-4 pages) on their findings and submit it by December 14th (no extensions will be given!)

Final Project

All variant detection and annotation should be performed on the course server on the UT Health network:

SSH File Server
    IP address: 139.52.107.173
    login: first letter of first name + last name (ex. Nicholas Navin = nnavin)
    password: same as login

Please log into the server from UNIX using SSH and change your password using the passwd command.

User account names and chromosome assignments are listed here: http://www.navinlab.com/bioinfo/bioinfo/final_project.html
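The login rule above can be sketched as a small shell helper. This is only an illustration: the function name make_login is hypothetical, and lowercasing the result and treating the last word of the name as the surname are assumptions. The account list on the course website remains the authoritative source for login names.

```shell
# Hypothetical helper: derive a course login from a full name using the
# rule "first letter of first name + last name".
# Assumptions: the result is lowercase and the surname is the last word.
make_login() (
  first=${1%% *}                              # text before the first space
  last=${1##* }                               # text after the last space
  initial=$(printf '%s' "$first" | cut -c1)   # first character only
  printf '%s%s\n' "$initial" "$last" | tr '[:upper:]' '[:lower:]'
)

login=$(make_login "Nicholas Navin")

# Print the SSH command rather than running it, so the sketch is safe to try.
echo "ssh ${login}@139.52.107.173"

# After logging in, run `passwd` on the server to change the password.
```

Run as written, this only prints the SSH command for the example name from the slide; paste the printed command into a terminal to connect.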

Files for Final Project

Files are located in this directory on the server: /course/final_project

Copy the BAM and BAI files to your home directory:

    cp /course/final_project/bam/chr20.bam ~
    cp /course/final_project/bam/chr20.bai ~

Copy the BED files for the exon coordinates and cancer genes to your home directory:

    cp /course/final_project/bed/cancer_genes.hg18.bed ~
    cp /course/final_project/bed/hg18_exons.bed ~

* Please perform all variant detection and annotation in your home directory and save all files.
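The copy commands above use chr20 as the example; each student would substitute the assigned chromosome. A minimal sketch of a wrapper that does this and stops if a file is missing follows. The function name copy_project_files and its argument order are hypothetical, and the file naming (chrN.bam, chrN.bai) is assumed to follow the chr20 example.

```shell
# Hypothetical helper: copy one chromosome's BAM/BAI plus both BED files.
# $1 = chromosome name (e.g. chr20)
# $2 = source directory (assumed default: /course/final_project)
# $3 = destination directory (assumed default: your home directory)
copy_project_files() (
  chrom=$1
  src=${2:-/course/final_project}
  dest=${3:-$HOME}
  for f in "bam/${chrom}.bam" "bam/${chrom}.bai" \
           "bed/cancer_genes.hg18.bed" "bed/hg18_exons.bed"; do
    if [ -f "$src/$f" ]; then
      cp "$src/$f" "$dest/" && echo "copied $f"
    else
      echo "missing: $src/$f" >&2
      exit 1    # leaves the subshell, so the function returns status 1
    fi
  done
)

# Usage on the course server (replace chr20 with your assigned chromosome):
#   copy_project_files chr20
```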

Analysis Software

The following software is installed on the server:
- UNIX
- Samtools
- Genome Analysis Toolkit (GATK), using the Unified Genotyper to call variants
- VarScan2 for variant detection
- ANNOVAR (to annotate variants and determine PolyPhen & SIFT scores)
- Bedtools (to intersect BED files)

All applications can be found in the /bioinfo/ folder

Additional Web Tools:
- COSMIC Database http://www.sanger.ac.uk/genetics/CGP/cosmic/
- Integrative Genomics Viewer (IGV) http://www.broadinstitute.org/igv/

Using the Integrative Genomics Viewer (IGV) on Your Local Computer

To use the Integrative Genomics Viewer (IGV):
1) Copy the BAM and BAI files from the server to your local computer with AFP:
   - In the Mac OS X Finder, select Go > Connect to Server
   - Enter afp://139.52.107.173
   - Select your home folder
   - Copy the files to your local computer
2) Launch IGV from the website http://www.broadinstitute.org/software/igv/log-in (register for an account if you do not have one)
3) Select HG18 and load the BAM files from your computer

Research Report

Students will write up their results as a research paper:
- Abstract
- Introduction
- Methods
- Results
- Required Questions*
- Discussion

Length: 3-4 pages, including any figures or tables

The final grade will be based 50% on the research report and 50% on answering the required questions

Required Questions

Required Questions (and recommended tools to answer them):
1) How many reads are in the BAM file? What is the read length? (samtools)
2) How many exons are located on your chromosome? (unix)
3) How many cancer genes are located on your chromosome? (unix)
4) How many variants were detected? (GATK, VarScan)
5) How many variants are heterozygous, and how many are homozygous? (annovar)
6) How many variants are transitions, and how many are transversions? (annovar)
7) How many somatic variants were detected after filtering with dbSNP129? (annovar)
8) How many germline variants were classified as normal SNPs? (annovar)
9) How many somatic variants are intergenic? Intronic? Synonymous? Nonsynonymous? (annovar)
10) Which mutations are likely to disrupt protein function? (annovar, SIFT, PolyPhen)
11) Which mutations are located in cancer genes? (bedtools)
12) Have any nonsynonymous cancer mutations previously been reported in COSMIC? (cosmic)
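The two "unix" questions above (exons and cancer genes on your chromosome) come down to counting BED records whose first column matches the chromosome name. A minimal sketch with awk follows. The function name count_bed_records is hypothetical, and the sample file holds made-up coordinates so the example runs anywhere. On the server you would point it at hg18_exons.bed or cancer_genes.hg18.bed instead.

```shell
# Hypothetical helper: count BED records on one chromosome.
# $1 = chromosome name (e.g. chr20), $2 = BED file
# The exact comparison on column 1 keeps chr2 from also matching chr20.
count_bed_records() {
  awk -v chrom="$1" '$1 == chrom { n++ } END { print n + 0 }' "$2"
}

# Tiny made-up BED file (not real exon coordinates) for illustration only.
sample_bed=$(mktemp)
printf 'chr20\t100\t200\texonA\n'  > "$sample_bed"
printf 'chr20\t300\t400\texonB\n' >> "$sample_bed"
printf 'chr21\t100\t200\texonC\n' >> "$sample_bed"

count_bed_records chr20 "$sample_bed"   # prints 2
rm -f "$sample_bed"
```

Note that this counts records, not unique names; if a gene or exon appears on more than one line, the fourth column may need to be de-duplicated first (for example with cut, sort and uniq). For the read count in the first question, `samtools view -c chr20.bam` reports the number of alignment records in a BAM file.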

Running Software on the Server

Examples for running variant calling and annotation can be found on the course website: http://www.navinlab.com/bioinfo/bioinfo/final_project.html

GATK

    # for variant detection, issue this command in the UNIX terminal
    java -Xmx1G -jar /bioinfo/GATK2/GenomeAnalysisTKLite.jar \
        -T UnifiedGenotyper -glm BOTH \
        -R /bioinfo/genomes/hg18_all.fa \
        -I /user/nnavin/chr22.bam \
        -o /user/nnavin/chr22.vcf \
        -S SILENT -nt 10 -dt BY_SAMPLE -dcov 2500 -l INFO -mbq 20
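Once the GATK command has written a VCF file, a quick sanity check on its contents can be done with awk before moving on to annotation. The sketch below is not the course's recommended method (the required questions name annovar for genotype and transition/transversion counts); it is only a cross-check. The function name summarize_vcf is hypothetical, it assumes a single-sample VCF with GT as the first FORMAT field, and the sample records are made up.

```shell
# Hypothetical helper: summarize a single-sample VCF.
# Reports total records, het/hom genotypes, and transitions/transversions
# among single-base substitutions (indels are counted only in the total).
summarize_vcf() {
  awk -F'\t' '
    /^#/ { next }                          # skip header lines
    {
      total++
      split($10, sample, ":")              # assumes GT is the first FORMAT field
      gt = sample[1]
      if (gt == "0/1" || gt == "0|1" || gt == "1|0") het++
      else if (gt == "1/1" || gt == "1|1") hom++
      ref = toupper($4); alt = toupper($5)
      if (ref ~ /^[ACGT]$/ && alt ~ /^[ACGT]$/) {
        pair = ref alt
        # Transitions are A<->G and C<->T; every other substitution is a transversion.
        if (pair == "AG" || pair == "GA" || pair == "CT" || pair == "TC") ti++
        else tv++
      }
    }
    END { printf "total=%d het=%d hom=%d ti=%d tv=%d\n", total, het, hom, ti, tv }
  ' "$1"
}

# Made-up three-record VCF for illustration only.
sample_vcf=$(mktemp)
{
  printf '##fileformat=VCFv4.1\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE\n'
  printf 'chr22\t100\t.\tA\tG\t50\t.\t.\tGT:DP\t0/1:30\n'
  printf 'chr22\t200\t.\tC\tA\t60\t.\t.\tGT:DP\t1/1:25\n'
  printf 'chr22\t300\t.\tT\tTA\t40\t.\t.\tGT:DP\t0/1:18\n'
} > "$sample_vcf"

summarize_vcf "$sample_vcf"   # prints: total=3 het=2 hom=1 ti=1 tv=1
rm -f "$sample_vcf"
```

On the server, the same function could be pointed at the VCF written by the command above, for example `summarize_vcf ~/chr22.vcf`.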

Getting Help

Please Contact:
- Ken Chen [email protected]
- Nicholas Navin [email protected]
- Yong Wang [email protected]

Please Note: THERE IS ABSOLUTELY NO EXTENSION POSSIBLE FOR THE DECEMBER 14th DEADLINE. THE GRADES ARE DUE FOR THE GSBS REGISTRAR DEADLINE. LATE PROJECTS WILL GET A ZERO!!!!