Bioinformatics-in-a-Box, 04/18/15. Vermont Genetics Network Professional Development Event, Pomeroy Alumni Center, St. Michael’s College, Colchester, VT. Faye D. Schilkey, National Center for Genome Resources


Page 1

Bioinformatics-in-a-Box 04/18/15

Vermont Genetics Network

Professional Development Event

Pomeroy Alumni Center, St. Michael’s College, Colchester, VT

Faye D. Schilkey, National Center for Genome Resources

Page 2

NCGR: National Center for Genome Resources

•  Not-for-profit research organization
•  Formed: 1994 in Santa Fe, NM
•  Expertise: Bioinformatics (21 yrs) and Next Gen Sequencing (8 yrs)
•  Applies bioinformatics, software engineering and next-generation sequencing to solve the -omic challenges of the 21st century
•  Collaborative research and services

Page 3

Faye D. Schilkey, BS Computer Engineering

•  First Career: Software engineering in automotive (robotics) and aerospace (guidance and autopilot systems).
•  Second Career (Big Data):
  –  IT/Software Engineering/Database Development in Genomics/Bioinformatics (> 15 yrs)
  –  Genome Sequencing Center Operations & Services (8 yrs)
  –  Director, NM INBRE Bioinformatics Core (9 yrs)
  –  Director, NM INBRE Sequencing & Bioinformatics Core (8 yrs)
  –  Founding Steering Committee Member of Network of IDeA-funded Core Laboratories NICL (6 yrs)

Page 4

Agenda

•  NCGR
•  NM-INBRE Sequencing & Bioinformatics Core (SBC) and IDeA research advancement
•  Sequencing and bioinformatics technologies
•  Bioinformatics-in-a-Box
•  Collaboration/education avenues
  –  Summer Bioinformatics Intensive Internship
  –  NM Bioinformatics, Science and Technology (NMBIST) conference
  –  Sequencing and bioinformatics project ideas
•  Conclusion and discussion

Page 5

Research at NCGR

•  Focus
•  Human health and nutrition
•  Plant science
•  > 200 publications

AJ Brass Foundation

Page 6

Human Health Research Publications at NCGR

-  Dengue virus infection (Virology 2015)
-  Vibrio cholerae (Genomics Discovery 2014)
-  Guinea Pig (Genome Announc 2013)
-  Eyeless Hedgehog (PLoS One 2012)
-  Carrier Screening (Beyond Batten - Sci Transl Med 2011)
-  Multiple Sclerosis (Twins study – Nature April 29, 2010 cover)
-  Sepsis (J Clin Microbiol. 2010)
-  Korean Genome (Nature 2009)
-  Mesothelioma (Proc Natl Acad Sci 2008)
-  Schizophrenia (PLoS One 2008)

Page 7

Plant/Animal/Fungus/Bacteria Science

•  Medicago truncatula (Barrel clover) HapMap (500 Mb)
  –  Cornell, UVM, JCVI, NSF, UCSC, INRA-Montpellier, ENSAT-Toulouse, Boyce Thompson Inst.
  –  Samuel Roberts Noble Foundation
•  Medicago sativa (Alfalfa) Genome (860 Mb)
  –  Samuel Roberts Noble Foundation
•  Theobroma cacao (Chocolate) Genome (330 Mb)
  –  USDA-ARS & Mars, Inc., Washington State University, JGI, USDA-ARS, IBM, PIPRA, CUGI
•  Glycine max (Soybean) (1 Gbp) and Zea mays (Maize) (2 Gb) Genetic Diversity
  –  Syngenta
•  Sorghum Transcriptome
  –  USDA-ARS
•  Gossypium arboreum (Cotton) Genome (1.7 Gbp)
  –  Texas Tech University & Bayer Crop Sciences
•  Phytophthora capsici (100 Mbp)
  –  Univ. of Tennessee, Ohio State Univ., USDA/NSF
•  Legume Disease Resistance
  –  National Science Foundation, University of California – Davis

Page 8

Plant/Animal/Fungus/Bacteria Science

•  Chickpea & Pigeon Pea Diversity
  –  CIMMYT - Generation Challenge Program, ICRISAT
•  Andean Birds (Hummingbird) Transcriptome (1 Gbp)
  –  UNM, NSF
•  Green Microalga (85 Mbp) and Diatom strain RGd-1 (25 Mbp) Genomes
  –  Center for Biofilm Engineering, Montana State University
•  Staphylococcus aureus strains (3 Mbp)
  –  NMSU, OSU, NIH, NM-INBRE
•  Burkholderia glumae (rice blight) genome (7.3 Mbp)
  –  Louisiana State University
•  Bacteroides xylanisolvens strains (6 Mbp)
  –  USDA-ARS, DARPA, Vital Probes
•  Polaromonas sp. strain CG9_12 (pollutant degradation) Genome (5 Mbp)
  –  Center for Biofilm Engineering, Montana State University
•  Kibdelosporangium sp. MJ126-NF4 (Actinobacteria having natural products: anti-bacteria/viral/cancer) Genome (11 Mbp)
  –  UNM

Page 9

NM-INBRE Sequencing and Bioinformatics Core (SBC) research advancement

[Bar chart: number of projects, pubs, and grants (y-axis, 0 to 20) by year (x-axis, 2008 to 2014). Series: Projects (66), Grants Awarded/Continued (30), Pubs in press (31).]

Serving to date > 160 researchers/postdocs/students

Page 10

NM INBRE SBC Collaborations > 65 projects

[Map figure: project counts per collaborating site. Legend text: INBRE 40; 9 HHMI-SEA Phage; INBRE 17; 2014-2015; 2008-2013.]

Page 11

•  Dr. Charles “Chad” Melancon - “De Novo Genome Sequencing of Novel Bacterial Isolates from Cave Environments.” - UNM

•  Dr. Douglas J. Perkins - “Discovery of Genetic Biomarkers for Severe Malaria” - UNM

•  Dr. Rebecca A. Reiss - “Nanoinformatics: Characterizing Cell Proliferation on Nanostructured Titanium” - NM Institute of Mining & Tech

•  Dr. Travis R. Robbins - “Comparing genomic variation caused by invasion of a novel threat versus geographic separation of populations” - NNMC

•  Dr. Alvaro Romero - “Study of transcriptional changes upon dengue virus infection in the Asian tiger mosquito, Aedes albopictus” - NMSU

•  Dr. Hitoshi Tsujimoto - “Study of transcriptional changes upon dengue virus infection in the Asian tiger mosquito, Aedes albopictus” - NMSU

•  Drs. Ben Wheaton & Rob Miller - “The role of the immune system in spinal cord injury and recovery.” - UNM

•  Dr. Tim Wright - “Genomic Approaches to Detecting Evolutionary Responses in Biological Invaders” - NMSU

2014-2015 pilot awardees

Page 12

•  Dr. Colleen Fordyce - “Cellular pH during carcinogenesis and how pH can be exploited for therapeutic benefit” - UNM

•  Dr. Michael Franklin - “Epigenetics of Pseudomonas aeruginosa during biofilm growth” - Montana State

•  Dr. Kathryn A. Hanley - “Quasispecies Dynamics of West Nile Virus in Avian Reservoir Hosts” - NMSU

•  Dr. Zoe Harrold - “Fire and Ice: metagenomic investigations of a unique sub-glacial ice cave system” - Montana State

•  Dr. Mario Izaquierre-Sierra - “Transposable Element Regulation in Land Plants: Arabidopsis coilin and Cajal bodies, a case study.” - NNMC

•  Dr. Thomas L. Kieft - “Metagenomic Sequencing of U-Contaminated Soils and Sediments” - NMTech

•  Dr. Samuel A. Lee - “Illumina RNA-seq expression analysis of cranberry-derived proanthocyanidins for the prevention of Candida albicans urinary biofilms” – UNM

•  Dr. Nora Perrone-Bizzozero - “Identification of KSRP neuronal RNA targets by RIP-Seq” - UNM

•  Dr. Giancarlo Lopez-Martinez - “The transcriptomics of low-oxygen hormesis and irradiation: What drives the strong organismal performance improvement?” - NMSU

2014-2015 pilot awardees (cont.)

Page 13

Next Gen Sequencing Applications

Digital Transcript Expression

Small RNA Discovery & Expression

ChIP-SEQ

Genome Structural Variation

Mutation Frequencies

DNAse1 HS Sites

Genetic Association

De novo genome Sequencing

DNA Methylation

Metagenomics

Exome Sequencing

Splice Isoform Abundance

Page 14

SBC technologies accelerate IDeA research: Sequencing

Illumina HiSeq2000:
•  RNA, DNA, microRNA, and ChIP seq
•  1x and 2x 50/100bp read lengths, ~300Gb yield/10-day run

PacBio RS II: Single Molecule Real-Time observation of DNA synthesis
•  No PCR bias, faster and accurate P6 polymerase
•  ~8000bp average read lengths
•  > 40kb read lengths
•  > 500Mb per v3 SMRT Cell
•  8-16Gb yield per 16 cell run in 48 hours
•  DNA, De novo assembly, Base modification detection
•  IsoSeq: Determine the transcript landscape of your organism by sequencing full-length transcripts and gene isoforms. No assembly required!

Page 15

Why Sequence mRNA?

1.  Cost Effective: Transcriptome ≈ 2% Genome

2.  Biologically relevant – active in affected cell or tissue

3.  Enables genomic congruence analysis (gene expression, isoform usage and non-synonymous variant information)

4.  Identifies mutations that are not apparent by genome sequencing (epigenetic silencing, RNA editing, allele-specific expression)

Page 16

Drew Sheneman, New Jersey -- The Newark Star Ledger

de novo Assembly

Page 17

Bioinformatics

1) New simple bioinformatics tool for “biologists”

Focused on the most popular Next Gen Sequencing experiments:

•  RNA-Seq (expression analysis)
•  DNA-Seq (mutation detection)
•  microRNA seq analysis

2) Custom bioinformatics for de novo/hybrid assemblies, ChIP, metagenomics, etc.

Page 18

RNA-Seq Analysis What’s involved?

Page 19

QUALITY CHECK TOOLS

Bioinformatician obtains data and downloads, installs, updates and runs…

•  FastQC: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
  –  Evaluate data quality based on several benchmarks (seq quality, GC content)
  –  Easy to read report
  –  Important to verify that the samples have consistent quality
•  BLAST: http://www.ncbi.nlm.nih.gov/books/NBK1762/
  –  Verify species
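To make the two benchmarks named above concrete, here is a minimal sketch of a per-position mean quality and a GC-content calculation on FASTQ text. It is not FastQC's implementation; the function names and the toy reads are hypothetical, and it assumes Phred+33 quality encoding and unwrapped four-line FASTQ records.

```python
import io


def per_position_mean_quality(fastq_handle, offset=33):
    """Mean Phred score at each read position (assumes Phred+33 encoding)."""
    sums, counts = [], []
    for i, line in enumerate(fastq_handle):
        if i % 4 != 3:  # the quality string is the 4th line of each record
            continue
        for pos, ch in enumerate(line.strip()):
            if pos == len(sums):
                sums.append(0)
                counts.append(0)
            sums[pos] += ord(ch) - offset
            counts[pos] += 1
    return [s / c for s, c in zip(sums, counts)]


def gc_fraction(fastq_handle):
    """Fraction of G or C bases across all reads."""
    gc = total = 0
    for i, line in enumerate(fastq_handle):
        if i % 4 != 1:  # the sequence is the 2nd line of each record
            continue
        seq = line.strip().upper()
        gc += seq.count("G") + seq.count("C")
        total += len(seq)
    return gc / total if total else 0.0


# Two toy reads (hypothetical data): 'I' is Q40, '5' is Q20, '!' is Q0.
EXAMPLE = "@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nII5!\n"
print(per_position_mean_quality(io.StringIO(EXAMPLE)))
print(gc_fraction(io.StringIO(EXAMPLE)))
```

A falling mean toward the end of the read is the kind of pattern that would prompt trimming before alignment.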

Page 20

TOOLS TO ALIGN/MAP READS TO GENOME

Bioinformatician downloads, installs, updates and runs…

Popular alignment algorithm
•  Tophat 2.0 http://ccb.jhu.edu/software/tophat/index.shtml
•  Tophat 1.2/1.3

But what genome (and version) are you mapping against?

•  UCSC: ftp://hgdownload.cse.ucsc.edu/goldenPath/
•  NCBI: ftp://ftp.ncbi.nih.gov/genomes/
•  Ensembl: ftp://ftp.ensembl.org/pub/
•  Custom

Page 21

READ QUANTIFICATION TOOLS

Bioinformatician downloads, installs, updates and runs…

•  HTSeq-count: http://www-huber.embl.de/users/anders/HTSeq/doc/count.html
  –  Raw hit count
  –  Transcript or Gene-based results
•  Cufflinks: http://cufflinks.cbcb.umd.edu/
  –  Normalizing, transcript-based quantification
  –  FPKM/RPKM values
  –  Gene-based values are aggregates
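The difference between a raw hit count and an FPKM value can be shown with the standard normalization formula, FPKM = 10^9 × C / (N × L), where C is the number of fragments mapped to a feature, N the total number of mapped fragments, and L the feature length in bp. The sketch below applies only that formula; Cufflinks additionally uses a statistical model to assign fragments to isoforms, which is not reproduced here. The gene names and numbers are hypothetical.

```python
def fpkm(fragment_count, feature_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of feature per Million mapped fragments."""
    if feature_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total mapped fragments must be positive")
    return fragment_count * 1e9 / (feature_length_bp * total_mapped_fragments)


# Hypothetical raw counts: gene -> (fragments mapped, gene length in bp)
raw_counts = {"geneA": (500, 2000), "geneB": (500, 1000)}
total_mapped = 10_000_000

fpkm_table = {
    gene: fpkm(count, length, total_mapped)
    for gene, (count, length) in raw_counts.items()
}
print(fpkm_table)
```

The two toy genes have the same raw count, but the shorter one gets twice the FPKM. The normalization exists to correct for that length effect.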

Page 22

EXPRESSION ANALYSIS TOOLS

Bioinformatician downloads, installs, updates and runs…

•  DESeq: http://bioconductor.org/packages/release/bioc/html/DESeq.html
  –  Requires up-to-date R installation; works with raw-hit-count values
•  edgeR: http://www.bioconductor.org/packages/release/bioc/html/edgeR.html
  –  Requires up-to-date R installation; works with raw-hit-count values
•  Cuffdiff: http://cufflinks.cbcb.umd.edu/
  –  Part of cufflinks, new version also works with CSV files; works with FPKM values

Page 23

COLLECT AND INTEGRATE ANNOTATION

ENSEMBL: http://www.ensembl.org/info/docs/api/index.html

NCBI: http://www.ncbi.nlm.nih.gov/refseq/

GO Interactive: http://amigo.geneontology.org/amigo

KEGG Interactive: http://www.genome.jp/kegg/genes.html

PubMed: http://www.ncbi.nlm.nih.gov/pubmed

Bioinformatician downloads, installs, updates resources and writes scripts to….

Page 24

Bioinformatician writes custom scripts in Perl, AWK and Python to

Find significant genes/ elements

Compare analysis results

Page 25

RNA-Seq experiment and analysis ….. and analysis ….

Experimental design

Sequencing provider

Sequence files

Bioinformatician downloads, installs/updates various tools and performs:

•  Quality Checks
•  Read Mapping to Genome
•  Quantification of reads
•  Expression Analysis (e.g. DESeq)
•  Annotation
•  Significant gene discovery
•  Result comparison

Results

2 months later of hard work by Bioinformatician

“What if?”

Requires analysis to be repeated

Page 26

Enter Bioinformatics-in-a-Box™

Web-based tool for organized data management and analysis of NGS data

Page 27

Bioinformatics-in-a-Box!

A Bioinformatics tool for “Biologists” and Bioinformaticians with large workloads!

•  365x24 results
•  Publish faster
•  Easily execute “what if” questions
•  Support provided every step of the way to ensure success
•  Collaborate
  –  Securely share: results, analysis steps
  –  Work together!
•  Organized analysis artifacts
•  Parameter tracking
•  Computation power/disk is in the cloud or on your hardware

Page 28

Example: Start with an RNA-Seq Data set

•  Six Samples
  –  3 Normal Prostate and 3 Prostate Adenocarcinoma Samples
•  SRA Project
  –  SRP003611
•  Publication
  –  Nacu, S., et al., Deep RNA sequencing analysis of readthrough gene fusions in human prostate adenocarcinoma and reference samples. BMC Med Genomics, 2011. 4: p. 11.

Page 29

Bioinformatics-in-a-Box™: Obtain the Data

RNA Seq Experiment

•  Load your own data or from SRA
•  Combine Technical Replicates

Page 30

Bioinformatics-in-a-Box: Quality Check

•  A click to run FastQC
•  A click to run BLAST & align to NCBI All Genomes Database (nr)

Page 31

Bioinformatics-in-a-Box: Read Quantification

Integrated Tools:
•  Cufflinks: FPKM values
•  HTSeq-count: Hit Count values

Page 32

Bioinformatics-in-a-Box: Expression Analysis

Cuffdiff: takes FPKM
•  Genes, isoforms, TSS

edgeR: takes Hit Count
•  Genes or isoforms

DESeq: takes Hit Count
•  Genes or isoforms

Page 33

Bioinformatics-in-a-Box: Integrated Annotations

•  ENSEMBL
•  NCBI
•  GO
•  KEGG
•  PubMed

Page 34

Bioinformatics-in-a-Box: Integrated Annotations

•  ENSEMBL
•  NCBI
•  GO
•  KEGG
•  PubMed

Page 35

Bioinformatics-in-a-Box: Integrated Annotations

•  ENSEMBL
•  NCBI
•  GO
•  KEGG
•  PubMed

Page 36

Bioinformatics-in-a-Box: Integrated Annotations

•  ENSEMBL
•  NCBI
•  GO
•  KEGG
•  PubMed

Page 37

Bioinformatics-in-a-Box: Integrated Annotations

•  PubMed

Page 38

Bioinformatics-in-a-Box: Find Significant Genes/Elements

Set filter criteria
•  P-value
•  Adjusted p-value
•  Fold change
•  Absolute expression

Save your subset of genes
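Scripted by hand, the four filter criteria above amount to a simple row filter over a differential-expression result table. The sketch below is illustrative only: the record layout, field names, thresholds, and example rows are all assumptions, not the tool's actual data model or defaults.

```python
from dataclasses import dataclass


@dataclass
class DEResult:
    """One row of a differential-expression table (hypothetical layout)."""
    gene: str
    p_value: float
    adj_p_value: float
    log2_fold_change: float
    mean_expression: float


def select_significant(results, max_p=0.05, max_adj_p=0.05,
                       min_abs_log2_fc=1.0, min_expression=10.0):
    """Keep rows that pass all four criteria; thresholds are illustrative."""
    return [
        r for r in results
        if r.p_value <= max_p
        and r.adj_p_value <= max_adj_p
        and abs(r.log2_fold_change) >= min_abs_log2_fc
        and r.mean_expression >= min_expression
    ]


# Hypothetical rows, one failing each kind of criterion.
rows = [
    DEResult("gene1", 0.001, 0.01, 2.5, 150.0),   # passes everything
    DEResult("gene2", 0.001, 0.20, 3.0, 200.0),   # fails adjusted p-value
    DEResult("gene3", 0.001, 0.01, 0.3, 500.0),   # fold change too small
    DEResult("gene4", 0.001, 0.01, -2.0, 2.0),    # expression too low
    DEResult("gene5", 0.002, 0.03, -1.5, 80.0),   # passes (down-regulated)
]
subset = select_significant(rows)
print([r.gene for r in subset])
```

Using the absolute value of the log2 fold change keeps both up- and down-regulated genes in the saved subset.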

Page 39

Bioinformatics-in-a-Box: Compare Results

Single Sample (T1 vs. C1) vs. Biological Replicates (T1,2,3 vs. C1,2,3)

[Venn diagram; region counts: 5369, 1317, 962]

Indicates too many false positives with single-sample comparisons

Conclusion: Biological replicates are preferable

Page 40

Bioinformatics-in-a-Box: Compare DE Results

DESeq vs. edgeR vs. Cuffdiff

Indicates many differences between algorithms!

Conclusion: It is advisable to use multiple algorithms
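A comparison like this boils down to counting how many genes fall into each region of a Venn diagram. The sketch below does that for any number of result sets; the function name and gene lists are hypothetical and do not reproduce the numbers from the slides.

```python
from collections import Counter


def venn_regions(calls):
    """Count genes by the exact combination of methods that reported them.

    `calls` maps a method name to the set of gene IDs it called significant.
    The result maps a frozenset of method names to the number of genes
    reported by exactly those methods.
    """
    membership = {}
    for method, genes in calls.items():
        for gene in genes:
            membership.setdefault(gene, set()).add(method)
    return Counter(frozenset(methods) for methods in membership.values())


# Hypothetical significant-gene lists from three methods.
calls = {
    "DESeq": {"g1", "g2", "g3"},
    "edgeR": {"g2", "g3", "g4"},
    "Cuffdiff": {"g3", "g5"},
}
regions = venn_regions(calls)
for methods, n_genes in sorted(regions.items(), key=lambda kv: sorted(kv[0])):
    print(" & ".join(sorted(methods)), n_genes)
```

Genes in the all-methods region are the most conservative candidates, while the single-method regions show where the algorithms disagree.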

Page 41

Bioinformatics-in-a-Box: Compare Results Limma vs. NGS Algorithms

Conclusion: Limma found genes undetected by NGS tools

Page 42

Bioinformatics-in-a-Box: Compare Results

Limma detects differential genes missed by edgeR & DESeq

Limma vs. NGS Algorithms

Conclusion: Traditional algorithms can be useful for analyzing NGS data

Page 43

DNA-Seq Mutation Analysis

Page 44

DNA-Seq Mutation Analysis: Analysis steps

1.  Obtain and load data
2.  Quality check
3.  Align to genome
  –  Bowtie, Bowtie2, BWA
4.  Check actual coverage (optional)
5.  Mutation detection
  –  GATK, samtools, pindel
6.  Compare results

Page 45

Start with Data set: Human Exome

•  Enrichment: Agilent Sure Select v4
•  Configuration: 2x100; Approximately 100 million reads
•  Theoretical average coverage: ~130x
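Theoretical average coverage is usually estimated as (number of reads × read length) / target size. The slides do not say which target size was used or whether "100 million reads" counts pairs or individual reads, so the sketch below makes two labeled assumptions: reads are counted individually at 100 bp each, and the target size is either a placeholder or back-calculated from the stated ~130x.

```python
def theoretical_coverage(n_reads, read_length_bp, target_size_bp):
    """Average depth if every sequenced base landed uniformly on the target."""
    return n_reads * read_length_bp / target_size_bp


def implied_target_size(n_reads, read_length_bp, coverage):
    """Target size (bp) that would give the stated coverage."""
    return n_reads * read_length_bp / coverage


# Assumption: 100 million individual reads of 100 bp (10 Gb of sequence).
N_READS = 100_000_000
READ_LEN = 100

# Placeholder 50 Mb target, NOT a figure from the slides.
print(theoretical_coverage(N_READS, READ_LEN, 50_000_000))

# Target size in Mb that ~130x would imply under the assumptions above.
print(round(implied_target_size(N_READS, READ_LEN, 130) / 1e6, 1))
```

Measured coverage comes out lower than this figure because some reads are duplicates, fall off target, or fail to map.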

Page 46

Bioinformatics-in-a-Box: Quality Check

•  Note quality drop-off after base 60

Page 47

Bioinformatics-in-a-Box: Align to Genome

Set mapping parameters, including trimming

Set pairing parameters
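Trimming before alignment can be done in two simple ways: clipping every read to a fixed length, or removing low-quality bases from the 3' end. The sketch below shows both. It is a deliberately simple trailing-base trim, not the algorithm used by any particular aligner, and the function names, threshold, and example read are hypothetical. Phred+33 encoding is assumed.

```python
def hard_clip(seq, qual, max_length):
    """Keep only the first `max_length` bases of a read."""
    return seq[:max_length], qual[:max_length]


def trim_3prime(seq, qual, min_quality=20, offset=33):
    """Drop trailing bases whose Phred score is below `min_quality`.

    Stops at the first base (scanning from the 3' end) that meets the
    threshold, so low-quality bases inside the read are kept.
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_quality:
        end -= 1
    return seq[:end], qual[:end]


# Hypothetical read: 'I' is Q40, '#' is Q2.
read_seq, read_qual = "ACGTACGT", "IIIII###"
print(trim_3prime(read_seq, read_qual))
print(hard_clip(read_seq, read_qual, 6))
```

A fixed clip suits a run where every read degrades at about the same cycle, while the quality-based trim adapts to each read.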

Page 48

Bioinformatics-in-a-Box: Check actual coverage

Lower than theoretical, as expected

Page 49

Bioinformatics-in-a-Box: Integrated tools for SNP Detection

•  GATK: https://www.broadinstitute.org/gatk/
•  Samtools: http://samtools.sourceforge.net/
•  FreeBayes: https://github.com/ekg/freebayes

Longer INDELs (> ~10b) and other SV

•  Pindel: http://gmt.genome.wustl.edu/pindel/current/

Page 50

Bioinformatics-in-a-Box™: Mutation detection

Select an algorithm of choice

Set pre-processing options

Page 51

SNP & INDEL Detection by hand

•  Using scripts, integrate annotation
  –  dbSNP, 1000genomes: URL API is slow, recommend local database installation
  –  Classification snpEff: http://snpeff.sourceforge.net/
•  Selection, result comparison
  –  Algorithm-specific filtering
  –  Perl, Python, etc.
  –  Using scripts, filter by location, coverage, quality, type of mutation, codon impact, protein impact, clinical impact
•  Using scripts, compare results
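The kind of hand-written filtering script described above might look like the following sketch, which filters VCF records by quality, depth, and location and labels each as a SNP or an indel. It relies only on the fixed VCF columns and the reserved `DP` INFO key; the thresholds, function names, and example records are hypothetical, and multi-allelic sites are not handled.

```python
def parse_info(info_field):
    """Turn 'DP=30;AF=0.5;DB' into {'DP': '30', 'AF': '0.5', 'DB': True}."""
    parsed = {}
    for item in info_field.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            parsed[key] = value
        elif item and item != ".":
            parsed[item] = True
    return parsed


def filter_variants(vcf_lines, min_qual=30.0, min_depth=10, region=None):
    """Yield (chrom, pos, ref, alt, kind) for records passing all filters.

    `region` is an optional (chrom, start, end) tuple, 1-based inclusive.
    """
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref, alt, qual, _filt, info = line.rstrip("\n").split("\t")[:8]
        pos = int(pos)
        if qual == "." or float(qual) < min_qual:
            continue
        if int(parse_info(info).get("DP", 0)) < min_depth:
            continue
        if region and not (chrom == region[0] and region[1] <= pos <= region[2]):
            continue
        kind = "SNP" if len(ref) == 1 and len(alt) == 1 else "INDEL"
        yield chrom, pos, ref, alt, kind


# Hypothetical records.
VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\t.\tDP=30",
    "chr1\t200\t.\tC\tT\t10\t.\tDP=40",
    "chr1\t300\t.\tG\tGA\t60\t.\tDP=5",
    "chr2\t150\t.\tT\tTCC\t99\t.\tDP=25;AF=0.5",
]
print(list(filter_variants(VCF)))
```

Each variant caller scores quality differently, so thresholds like these would need tuning per algorithm. That matches the "algorithm-specific filtering" item in the list.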

Page 52

Bioinformatics-in-a-Box™: Integrated Annotations

•  Known SNP
•  Location, gene (if appropriate)
•  Codon, amino-acid, protein impact
•  Up-stream/down-stream sequences, quality, coverage, allele frequency

Page 53

Bioinformatics-in-a-Box™: SNP Details

Page 54

Bioinformatics-in-a-Box™: SNP Quality

Mutation Viewer

Page 55

Bioinformatics-in-a-Box™: Insertion

Mutation Viewer

Page 56

Bioinformatics-in-a-Box™: Deletion

Mutation Viewer

Page 57

Bioinformatics-in-a-Box™: Integrated Annotations

•  Known SNP
•  Location, gene (if appropriate)
•  Codon, amino-acid, protein impact
•  Up-stream/down-stream sequences, quality, coverage, allele frequency

Page 58

Bioinformatics-in-a-Box: Selecting SNPs and INDELs

Filter by quality, location, impact, etc.

Save dataset

Page 59

Bioinformatics-in-a-Box: Compare SNP results GATK versus Sam

Different algorithms generate different results

Page 60

Bioinformatics-in-a-Box: Compare SNP results using Samtools & BWA versus Bowtie2

Different ALIGNMENT algorithms generate different results

Page 61

Bioinformatics-in-a-Box™: Compare InDel results using Samtools & BWA vs. Bowtie2

Different ALIGNERS generate different results

Page 62

End of DNA Mutation Detection

Page 63

Bioinformatics-in-a-Box: 365x24

•  Data analysis
  –  Peer reviewed algorithms
  –  RNA-Seq, SNP Detection and Genotyping and miRNA-Seq
  –  What if? Scenarios
•  Data management
  –  Linking all primary data, algorithms, genome references, parameters with results
  –  Breadcrumb trail of what has been done, with what settings and versions (algorithms and references)
•  Secure worldwide collaboration
•  Hands-on support (and documentation if you must…)
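The data-management idea above (linking inputs, algorithm versions, references, and parameters to results, and re-running "what if" variations) can be illustrated with a small provenance record. This is a hypothetical sketch, not how Bioinformatics-in-a-Box stores its breadcrumb trail; the class, field names, and example values are all assumptions.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field, replace


@dataclass(frozen=True)
class AnalysisStep:
    """One recorded step: what ran, on what, against which reference."""
    tool: str
    tool_version: str
    reference: str
    parameters: dict = field(default_factory=dict)
    inputs: tuple = ()

    def fingerprint(self):
        """Short stable ID derived from every recorded setting."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def what_if(step, **changed_parameters):
    """Return a copy of `step` with some parameters changed.

    The original record is left untouched, so both runs stay traceable.
    """
    return replace(step, parameters={**step.parameters, **changed_parameters})


# Placeholder values, not real tool versions or genome builds.
baseline = AnalysisStep(
    tool="aligner",
    tool_version="x.y",
    reference="genome-build-1",
    parameters={"max_mismatches": 2},
    inputs=("sample1.fastq",),
)
variant = what_if(baseline, max_mismatches=3)
print(baseline.fingerprint(), variant.fingerprint())
```

Because the fingerprint covers tool version and reference as well as parameters, two results with the same ID were produced the same way, and any change in settings yields a new ID.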

Page 64

NCGR/NM-INBRE Bioinformatics Internship

The National Center for Genome Resources & NM-INBRE Present

June 15, 2015 - July 31, 2015 (tentative dates)

7-Week Intensive Program
•  June 15 – June 26: 2 weeks of instruction
•  June 29 – July 31: 5 weeks of hands-on projects including a presentation of your work

Deadline to apply: 11:59pm Thursday, April 30, 2015

SPACE IS LIMITED

Targeted towards: Grads and undergrads

PREREQUISITE: The program requires some knowledge of UNIX and includes prerequisite reading and understanding of chapters 4 and 5 of the following: http://my.safaribooksonline.com/book/bioinformatics/1565926641

Page 65

Annual Educational Symposia

New Mexico BioInformatics, Science and Technology (NMBIST) Symposium

“Transcriptional Control”

March 26-27, 2015, Drury Plaza Hotel, Santa Fe, NM

-  Experts in the field
-  Student poster session
-  Student speaking slot competition
-  Highlights: Dr. Klemens Hartel

Page 66

Sequencing and Bioinformatics project ideas

1) Small genome sequencing and de novo assembly. Draft assemblies for genomes up to 100Mb in size.

•  PacBio only sequencing and assembly
•  Illumina only assembly
•  PacBio/Illumina/454 hybrid assembly approaches

2) PacBio sequencing and analysis, projects include

•  IsoSeq pilot
•  Base Modification Detection

3) Illumina genomic sequencing and mutation detection

4) Illumina RNA-seq or miRNA-seq and expression analysis

5) Bioinformatics only

6) Custom

Page 67

Conclusion and discussion

The NM-INBRE SBC has the resources and track record to advance your research:

•  Sequencing: Illumina and PacBio technologies

•  Bioinformatics: Standard pipelines and custom analysis

Work with VGN to impact science!

Please contact us to find out more at [email protected]!

Page 68

Acknowledgments

NCGR/NMINBRE Sequencing and Bioinformatics Core

Science/Bioinformatics Sequencing Lab *Anitha Sundararajan Peter Nagm *Johnny Sena Jennifer Jacobi Joann Mudge Pooja Umale Nico Devitt Thiru Ramaraj IT/Administration Stephanie Guida Forrest Black Connor Cameron Kathy Myers Andrew Farmer *Lisana Chavez Boris Umylny Callum Bell NIH NIGMS (5P20GM103451)

Page 69

Thank you! Faye D. Schilkey

[email protected] of: 505-995-4449 cl: 505-660-4388