CPTR title slide
ReSeqTB data platform
pipeline threshold values
Jamie Posey, PhD
CDC
Pipeline Scheme
Pipeline flowchart
Key steps in the pipeline
• Input data validation & QC
• Species specificity check
• Sequence reads mapping & refinement
• Variant calling
• Functional Annotation & Lineage Analysis
Input data validation & QC
Quality Scores
QUALITY SCORE    ACCURACY (%)
Q10              90
Q20              99
Q30              99.9
Q40              99.99
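The accuracy column follows from the standard Phred definition, Q = -10 · log10(P), where P is the probability that a base call is wrong. The short sketch below reproduces the table; the function names are illustrative and not part of the pipeline.

```python
import math


def phred_to_accuracy(q: float) -> float:
    """Return base-call accuracy in percent for a Phred quality score q."""
    error_prob = 10 ** (-q / 10)  # P = 10^(-Q/10)
    return (1 - error_prob) * 100


def error_prob_to_phred(p: float) -> float:
    """Return the Phred quality score for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        print(f"Q{q}: {phred_to_accuracy(q):.2f}%")
```

A Q20 threshold therefore corresponds to accepting at most 1 expected error per 100 base calls.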
Input data validation & QC
• FASTQ format files
-From next-generation sequencing platforms
-Specifically Illumina sequencing
• FastQValidator Version 1.0.5
-Are sequence reads in FASTQ format or not?
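To illustrate what a format check of this kind looks for, here is a minimal sketch of a FASTQ structure test. It is not FastQValidator's implementation; the function name and the exact rules (four-line records, `@` header, `+` separator, equal sequence and quality lengths, bases limited to A/C/G/T/N) are assumptions chosen for illustration.

```python
def is_valid_fastq(lines):
    """Return True if `lines` form well-formed four-line FASTQ records.

    Illustrative check only; a real validator applies more rules.
    """
    lines = [ln.rstrip("\n") for ln in lines]
    if not lines or len(lines) % 4 != 0:
        return False
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        # Record must start with '@' and have a '+' separator line.
        if not header.startswith("@") or not plus.startswith("+"):
            return False
        # Sequence and quality strings must be non-empty and equal in length.
        if not seq or len(seq) != len(qual):
            return False
        # Assumed alphabet: unambiguous bases plus N.
        if set(seq.upper()) - set("ACGTN"):
            return False
    return True
```

For example, `is_valid_fastq(["@r1", "ACGT", "+", "IIII"])` is accepted, while a record whose quality string is shorter than its sequence is rejected.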
Input data validation & QC
• Prinseq-lite.pl Version 1.0.5
- Trim reads based on quality Threshold
QC Threshold: Q20 Average Read Sequence Quality
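As a rough illustration of the Q20 average-quality threshold, the sketch below computes the mean Phred score of a read from its FASTQ quality string. It assumes Phred+33 encoding (typical of current Illumina output) and is not Prinseq's code; the function names are hypothetical.

```python
def mean_read_quality(quality_string: str, offset: int = 33) -> float:
    """Mean Phred score of a read, assuming Phred+33 ASCII encoding."""
    if not quality_string:
        return 0.0
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)


def passes_mean_quality(quality_string: str, threshold: float = 20.0) -> bool:
    """True if the read's average quality meets the threshold (Q20 here)."""
    return mean_read_quality(quality_string) >= threshold
```

With Phred+33, the character `I` encodes Q40 and `5` encodes Q20, so a read with quality string `5555` sits exactly at the threshold.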
Species Specificity check
• Kraken version 0.10.5
-Is the percentage of reads mapping to Mycobacterium tuberculosis complex (MTBC) above the acceptable threshold?
QC Threshold: Percent of reads mapping to MTBC > 90%
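The species check reduces to a simple percentage comparison. The sketch below assumes the read counts have already been extracted from the classifier output; the function names and the strict "above" comparison are assumptions, not the pipeline's own code.

```python
def percent_mtbc(mtbc_reads: int, total_reads: int) -> float:
    """Percentage of reads classified as M. tuberculosis complex."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * mtbc_reads / total_reads


def passes_species_check(mtbc_reads: int, total_reads: int,
                         threshold: float = 90.0) -> bool:
    """True if the MTBC percentage is above the threshold (assumed strict)."""
    return percent_mtbc(mtbc_reads, total_reads) > threshold
```

A sample with 950,000 of 1,000,000 reads assigned to MTBC (95%) would pass, while one with 850,000 (85%) would fail the check.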
Sequencing reads mapping & refinement
• Reference Genome: H37Rv (NC_000962.3)
• BWA MEM: Version 0.7.12
- Mapping Tool
• QC: Qualimap Version 2.1
- Output: Quality Report, inferring mapping
Sequencing reads mapping & refinement
• Removing duplicate reads
-Picard tools Version 1.134
• Cleaning indels & recalibration
-GATK Version 3.4.0
• Calculation of coverage statistics
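Coverage statistics are typically summarized as mean depth and breadth of coverage. The slides do not say which statistics the pipeline reports, so the sketch below is only an assumption of what such a calculation could look like, given a list of per-base read depths; the function name and `min_depth` parameter are hypothetical.

```python
def coverage_stats(depths, min_depth=1):
    """Return (mean depth, percent of positions with depth >= min_depth)."""
    n = len(depths)
    if n == 0:
        raise ValueError("depths must not be empty")
    mean_depth = sum(depths) / n
    covered = sum(1 for d in depths if d >= min_depth)
    breadth_percent = 100.0 * covered / n
    return mean_depth, breadth_percent


if __name__ == "__main__":
    # Four reference positions with depths 0, 10, 20 and 30.
    print(coverage_stats([0, 10, 20, 30]))
```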
Variant Calling
• Samtools & Bcftools Version 1.2
-QC Threshold: Minimum base call quality Q20
-QC Threshold: Minimum mapping quality Q20
-QC Threshold: Minimum read depth ≥ 10X
-QC Threshold: SNP clusters: 3 SNPs within 10 nucleotide bases
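The four thresholds above can be sketched as a per-site filter plus a cluster check. This is an illustrative sketch only: the `Snp` record, the function names, the choice to drop (rather than flag) clustered SNPs, and the reading of "3 SNPs in 10 bases" as three or more SNPs spanning fewer than 10 positions are all assumptions, not the pipeline's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    pos: int          # 1-based reference position
    base_qual: float  # base call quality (Phred)
    map_qual: float   # mapping quality (Phred)
    depth: int        # read depth at the site


def passes_site_thresholds(snp, min_bq=20, min_mq=20, min_depth=10):
    """Apply the Q20 base quality, Q20 mapping quality and 10X depth cut-offs."""
    return (snp.base_qual >= min_bq
            and snp.map_qual >= min_mq
            and snp.depth >= min_depth)


def clustered_positions(positions, max_snps=3, window=10):
    """Return positions that belong to a run of >= max_snps SNPs within `window` bases."""
    pos = sorted(positions)
    flagged = set()
    for i in range(len(pos) - max_snps + 1):
        j = i + max_snps - 1
        if pos[j] - pos[i] < window:
            flagged.update(pos[i:j + 1])
    return flagged


def filter_snps(snps):
    """Keep SNPs that pass the site thresholds and are not part of a cluster."""
    kept = [s for s in snps if passes_site_thresholds(s)]
    clustered = clustered_positions(s.pos for s in kept)
    return [s for s in kept if s.pos not in clustered]
```

Under these assumptions, SNPs at positions 100, 103 and 108 form a cluster, whereas SNPs at 100, 105 and 110 do not, because they span 11 positions.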
Pipeline flowchart (variant filtering and annotation)
Input: VCF file (Raw)
• Filter VCF file: custom script
Filtered VCF file
• Functional Annotation & Lineage Analysis: SnpEff Ver. 4.1 & custom script
• Mapping to ReSeqTB Database
Output: Annotation Report & Lineage Report
Functional Annotation and Lineage Analysis
• Filtering output VCF file
-Custom loci bed list & vcftools Version 0.1.126
• Initial annotation
-SnpEff Version 4.1
• Reformatting annotation and Lineage analysis
-Custom Script
Annotation Report
Lineage Report
Summary of UVP analysis
Total Isolates Analyzed : 3717
Number passed all checks: 3570
Total failed QC: 147
- Failed Kraken specificity: 67
- Flagged for multiple rrs/rrl mutations : 76
- Mixed infection : 4
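The tallies above are internally consistent, as the following arithmetic check shows. The dictionary keys are illustrative labels; the counts are taken directly from the slide.

```python
total_isolates = 3717
failed_by_reason = {
    "failed_kraken_specificity": 67,
    "multiple_rrs_rrl_mutations": 76,
    "mixed_infection": 4,
}

total_failed = sum(failed_by_reason.values())  # 67 + 76 + 4 = 147
passed = total_isolates - total_failed         # 3717 - 147 = 3570
pass_rate = 100.0 * passed / total_isolates    # percent of isolates passing all checks

print(total_failed, passed, f"{pass_rate:.1f}%")
```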
Distribution of MTBC major lineages in dataset
Phylogenetic representation of Isolates in dataset
Lineages shown: Bovis, East Asian, East African Indian, West African L5, Indo-Oceanic, West African L6, Euro American
Antibiotic resistance profile across major lineages
Summary
• The Unified Variant Pipeline (UVP) is comprehensive and includes additional genomic data analysis steps (species and lineage specificity checks, custom annotations).
• It applies current versions of bioinformatics tools and sets quality thresholds at all steps of the pipeline to ensure confidence in variant calls.
• Validation of the annotation results against a number of other variant calling pipelines, including PhyResSE (Silke et al. 2015), shows agreement across most variant positions.
Thank You!