24
ILLUMINA PROPRIETARY 15066594 Rev. B December 2014 Amplicon DS App v1.1 Introduction 3 Running Amplicon DS 5 Amplicon DS Output 7 Amplicon DS Methods 21 Technical Assistance

BaseSpace App TruSeq Amplicon User Guide · 4 15066594Rev.B Versions ThefollowingmoduleversionsareusedintheAmpliconDSapp:}AmpliconDS(BaseSpaceWorkflow)1.1}Isis(AnalysisSoftware)2.5.40.16}SAMtools0.1.19-isis

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: BaseSpace App TruSeq Amplicon User Guide · 4 15066594Rev.B Versions ThefollowingmoduleversionsareusedintheAmpliconDSapp:}AmpliconDS(BaseSpaceWorkflow)1.1}Isis(AnalysisSoftware)2.5.40.16}SAMtools0.1.19-isis

ILLUMINA PROPRIETARY15066594 Rev. BDecember 2014

AmpliconDSAppv1.1

Introduction 3Running Amplicon DS 5Amplicon DS Output 7Amplicon DS Methods 21Technical Assistance

Page 2: BaseSpace App TruSeq Amplicon User Guide · 4 15066594Rev.B Versions ThefollowingmoduleversionsareusedintheAmpliconDSapp:}AmpliconDS(BaseSpaceWorkflow)1.1}Isis(AnalysisSoftware)2.5.40.16}SAMtools0.1.19-isis

This document and its contents are proprietary to Illumina, Inc. and its affiliates ("Illumina"), and are intended solely for thecontractual use of its customer in connection with the use of the product(s) described herein and for no other purpose. Thisdocument and its contents shall not be used or distributed for any other purpose and/or otherwise communicated, disclosed,or reproduced in any way whatsoever without the prior written consent of Illumina. Illumina does not convey any licenseunder its patent, trademark, copyright, or common-law rights nor similar rights of any third parties by this document.

The instructions in this document must be strictly and explicitly followed by qualified and properly trained personnel in orderto ensure the proper and safe use of the product(s) described herein. All of the contents of this document must be fully readand understood prior to using such product(s).

FAILURE TO COMPLETELY READ AND EXPLICITLY FOLLOW ALL OF THE INSTRUCTIONS CONTAINED HEREINMAY RESULT IN DAMAGE TO THE PRODUCT(S), INJURY TO PERSONS, INCLUDING TO USERS OR OTHERS, ANDDAMAGE TO OTHER PROPERTY.

ILLUMINA DOES NOT ASSUME ANY LIABILITY ARISING OUT OF THE IMPROPER USE OF THE PRODUCT(S)DESCRIBED HEREIN (INCLUDING PARTS THEREOF OR SOFTWARE) OR ANY USE OF SUCH PRODUCT(S) OUTSIDETHE SCOPE OF THE EXPRESS WRITTEN LICENSES OR PERMISSIONS GRANTED BY ILLUMINA IN CONNECTIONWITH CUSTOMER'S ACQUISITION OF SUCH PRODUCT(S).

FOR RESEARCH USE ONLY

© 2014 Illumina, Inc. All rights reserved.

Illumina, 24sure, BaseSpace, BeadArray, BlueFish, BlueFuse, BlueGnome, cBot, CSPro, CytoChip, DesignStudio,Epicentre, GAIIx, Genetic Energy, Genome Analyzer, GenomeStudio, GoldenGate, HiScan, HiSeq, HiSeq X, Infinium,iScan, iSelect, MiSeq, NeoPrep, Nextera, NextBio, NextSeq, Powered by Illumina, SeqMonitor, SureMDA,TruGenome, TruSeq, TruSight, Understand Your Genome, UYG, VeraCode, verifi, VeriSeq, the pumpkin orange color,and the streaming bases design are trademarks of Illumina, Inc. and/or its affiliate(s) in the U.S. and/or other countries. Allother names, logos, and other trademarks are the property of their respective owners.

Page 3: BaseSpace App TruSeq Amplicon User Guide · 4 15066594Rev.B Versions ThefollowingmoduleversionsareusedintheAmpliconDSapp:}AmpliconDS(BaseSpaceWorkflow)1.1}Isis(AnalysisSoftware)2.5.40.16}SAMtools0.1.19-isis

Introduction

Amplicon DS v1.1 User Guide 3

Introduction

The BaseSpace® Amplicon DS app analyzes DNA samples that have been preparedusing a mirrored, dual strand TruSeq Amplicon method, such as TruSight Tumor assaysamples. After alignment, variant calling is performed using the Somatic Variant Caller.See the following pages for more information:} Somatic Variant Caller on page 22Variants are called only for the target regions, which are specified in the manifest files.Statistics reporting accumulates coverage and enrichment-specific statistics for eachtarget as well as overall metrics.The main output files generated by the Amplicon DS app are:} BAM files, containing the reads after alignment plus alignment details.} VCF files, containing the variant calls.} Genome VCF (.genome.vcf) files, describing the calls for all variant and non-variantsites in the genome.

} Annotation file (ANT). This binary file can be loaded into VariantStudio for viewing;see www.illumina.com/clinical/clinical_informatics/illumina-variantstudio.ilmn.

In addition, there are analysis reports and summary.csv files.See Amplicon DS Methods on page 21 and Amplicon DS Output Files on page 14 for moreinformation.

Figure 1 Amplicon DS AppWorkflow

Page 4: BaseSpace App TruSeq Amplicon User Guide · 4 15066594Rev.B Versions ThefollowingmoduleversionsareusedintheAmpliconDSapp:}AmpliconDS(BaseSpaceWorkflow)1.1}Isis(AnalysisSoftware)2.5.40.16}SAMtools0.1.19-isis

4 15066594 Rev. B

VersionsThe following module versions are used in the Amplicon DS app:} Amplicon DS (BaseSpace Workflow) 1.1} Isis (Analysis Software) 2.5.40.16} SAMtools 0.1.19-isis-1.0.3} Somatic Variant Caller 3.5.2.1} IAS (Annotation Service) VEP 72.5

LimitationsBefore running the Amplicon DS app, be aware of the limitations described in this topic.Current Limitations:} This app supports the analysis of TruSight Tumor samples or other samplesprepared using a mirrored, dual strand TruSeq Amplicon assay.

Technical Limitations:} Variants are found only in the regions that are targeted in the manifest, which isloaded automatically for the TruSight Tumor assay.

} Reads must be at least 50 bases in length.} No minimum number of reads is required, but use sufficient data for each sample tosupport appropriate depth of coverage for variant calling.

} App supports running only one manifest per analysis.} Samples need to all be paired-end or all single-end; a warning is thrown if single-end.

} Samples all need to have the same read lengths.

Page 5: BaseSpace App TruSeq Amplicon User Guide · 4 15066594Rev.B Versions ThefollowingmoduleversionsareusedintheAmpliconDSapp:}AmpliconDS(BaseSpaceWorkflow)1.1}Isis(AnalysisSoftware)2.5.40.16}SAMtools0.1.19-isis

Running

Amplicon

DS

Amplicon DS v1.1 User Guide 5

RunningAmpliconDS

1 Click the Apps button.The BaseSpace Apps page, which lists all the available BaseSpace applications,opens.

2 Find Amplicon DS in the list of applications, and then click the Launch button.An End-User License Agreement (EULA) page might open; otherwise, the AmpliconDS Input page opens.

3 If applicable, read the EULA, and then click Accept.The EULA closes and the Amplicon DS Input page opens.

Figure 2 Amplicon DS Input Form

4 Enter the input information for the Amplicon DS app.• Analysis Name: A default name (App name and the current date and time) isprovided. You can leave the default name as-is, or modify it.

• Save Results To: Click Select Projects to open the Select Projects popup, andthen do one of the following:— Select an existing project for which you have Write permissions to store the

app results, and then click Confirm.— Click New to open to open the Create Project popup, and then do the

following:— Enter a name, and optionally, a description for the new project.— Click Create to display the new project at the top of the Select Projects

popup.— Select the new project, and then click Confirm.

NOTEYou can enter search criteria in the Search field to search for a project in the list.

• Variant Caller: Somatic is the only variant caller available, and you cannot clearthis option.

Page 6: BaseSpace App TruSeq Amplicon User Guide · 4 15066594Rev.B Versions ThefollowingmoduleversionsareusedintheAmpliconDSapp:}AmpliconDS(BaseSpaceWorkflow)1.1}Isis(AnalysisSoftware)2.5.40.16}SAMtools0.1.19-isis

6 15066594 Rev. B

NOTESomatic in the only variant caller that is available because the Amplicon DS is designed toanalyze samples that have been prepared with theIllumina TruSight Tumor Librarylibrary prep kit. The Somatic caller is designed to detect low frequency mutations(approximately 5% mutant allele frequencies).

• Annotation: Select the gene and transcript annotation reference database forannotating the called variants.

• Samples: To add each sample for a pooled sample pair, do the following:— Click Select Pairs(s) to open the Select Pairs popup.— If needed, search for the samples you want to analyze.— Select a sample from the search results, or click Select All to select all the

samples, and then drag and drop the samples to the appropriate location(FPA or FPB).

— Click Confirm.

NOTESelect matching pooled sample pairs.

5 Click Continue.The Amplicon DS app is launched and the analysis of the samples is started. TheAnalysis Info page opens. This page shows the following information about theanalysis: the analysis name, the selected application, the analysis start date, theanalysis completion date, the duration of the analysis, the session type, and theanalysis status.

Figure 3 Amplicon DS Analysis Info page

You receive an email when the analysis is complete. You can then open the specificproject to view the analysis results. See Amplicon DS Output on page 7

Page 7: BaseSpace App TruSeq Amplicon User Guide · 4 15066594Rev.B Versions ThefollowingmoduleversionsareusedintheAmpliconDSapp:}AmpliconDS(BaseSpaceWorkflow)1.1}Isis(AnalysisSoftware)2.5.40.16}SAMtools0.1.19-isis

Amplicon

DSOutput

Amplicon DS v1.1 User Guide 7

AmpliconDSOutput

After you launch the Amplicon DS and start the analysis of the samples, you can signout of the application. You receive an email when the analysis is complete. You can thensign back in to the application, and open the specific project to view the analysis results,which include various reports and output files. This chapter describes the Amplicon DSapp output.

NOTEWhen the project analysis is complete, you receive a notification email. You can click thelink in the email to navigate to the specific project, or you can log on to BaseSpace andnavigate to the project as described in Steps 1 and 2.

1 Click the Projects button.The Projects page, which lists all the BaseSpace projects for which you havepermissions, opens.

2 Click the name of the appropriate project.The Analyses page opens. This page lists all the analyses that have been carried outfor the selected project.

3 Click the appropriate analysis.The Project Output page opens. The page shows the Pairwise Analysis report for thematching sample pair that was analyzed last. The left side of the page shows theOutput Navigation bar.

Figure 4 Amplicon DS Output Navigation Bar

4 Click an option on the Output Navigation bar to access the associated project output.See:• Analysis Info: A read-only summary of the project analysis settings. See Analysis

Info on page 13 for descriptions.• Inputs: A read-only summary of the project input settings. See Inputs on page 14for descriptions.

• Output Files: Access to the individual project output files. See Amplicon DSOutput Files on page 14 for descriptions.

• Analysis Reports— Pairwise Analysis reports: Access to the analysis reports for a single sample

pair. One entry for each sample pair that was analyzed, with the entriesdisplayed in reverse chronological order of analysis. See Sample AnalysisReports on page 8 for a description.

Page 8: BaseSpace App TruSeq Amplicon User Guide · 4 15066594Rev.B Versions ThefollowingmoduleversionsareusedintheAmpliconDSapp:}AmpliconDS(BaseSpaceWorkflow)1.1}Isis(AnalysisSoftware)2.5.40.16}SAMtools0.1.19-isis

8 15066594 Rev. B

— Aggregate Summary report: Access to the analysis metrics for the aggregateresults. The Aggregate Summary option is only displayed if multiple samplepairs were analyzed. See Aggregate Summary Report on page 10 fordescriptions.

Sample Analysis Reports

The Pairwise Analysis report provides an overview of the analysis statistics for eachsample pair separated by pools. The information is shown is tables and in plots. ThePairwise Analysis report page also has an option for downloading the Summary Report(Sample Information, Sample Analysis, and Analysis Details) as a PDF.

Amplicon Summary

Statistic Definition

Number of Amplicon Regions The number of amplicon regions that were sequenced.

Total Length of AmpliconRegions

The total length of the sequenced amplicon regions in basepairs.

} Read Level Statistics

Statistic Definition

Total alignedreads

The total number of reads passing filter present in the data set that aligned tothe reference genome.

Percent alignedreads

The percentage of reads passing filter that aligned to the reference genome.

} Base Level Statistics

Statistic Definition

Total AlignedBases

The total number of bases present in the data set that alignedto the reference genome.

PercentAligned Bases

The percentage of bases that aligned to the referencegenome.

Percent Q30 The percentage of bases with a quality score of 30 or higher.

Mismatch Rate The average percentage of mismatches across both reads 1and 2 over all cycles.

Small Variants SummaryThe Small Variants Summary provides metrics about the number of SNVs, deletions,and insertions. Data are first analyzed for each individual pool (FPA and FPB). The dataare then reconciled for the two pools and a consensus call is made for all the calledvariants.

Page 9: BaseSpace App TruSeq Amplicon User Guide · 4 15066594Rev.B Versions ThefollowingmoduleversionsareusedintheAmpliconDSapp:}AmpliconDS(BaseSpaceWorkflow)1.1}Isis(AnalysisSoftware)2.5.40.16}SAMtools0.1.19-isis

Amplicon

DSOutput

Amplicon DS v1.1 User Guide 9

Statistic Definition

TotalPassing

The total number of variants present in the data set that passed thevariant quality filters.

PercentFound indbSNP

100*(Number of SNVs in dbSNP/Number of SNVs). The SNVs that werefound in the dbSNP are annotated accordingly.

Het/HomRatio

Number of heterozygous variants/Number of homozygous variants.

Ts/TvRatio

Transition rate of SNVs that pass the quality filters divided bytransversion rate of SNVs that pass the quality filters. Transitions areinterchanges of purines (A, G) or of pyrimidines (C, T). Transversions areinterchanges between purine and pyrimidine bases (for example, A to T).

Coverage SummaryThe Coverage Summary provides details about the uniformity of coverage and meancoverage separately for each sample pool. Typically, if the two pools were consistentlyprepared, the results are similar between the two pools.

Statistic Definition

AmpliconMeancoverage

The mean coverage across all sequenced amplicons.

UniformityofCoverage

The percentage of amplicon regions with coverage values greater than the lowcoverage threshold, where the low coverage threshold is defined as (0.2 * AmpliconMean coverage).

Coverage by Amplicon Region (Overall) plotThe Coverage by Amplicon Region plot shows the coverage across the entire panel ofamplicons for both pools, with each amplicon shown as individual data point. The Xaxis shows the individual amplicon data points, and the Y axis shows the coverage(Log10). This layout is useful for quickly identifying outliers. The plot has the followingcharacteristics:} Amplicon regions that have coverage values that are greater than the Low CoverageThreshold (0.2 * Amplicon Mean coverage) are shown as blue data points.

} Amplicon regions that have coverage values that are less than the Low CoverageThreshold (0.2 * Amplicon Mean coverage) are shown as red data points.

} The horizontal orange line marks the Moving Average across all coverage values.} The horizontal red line marks the Low Coverage Threshold.

Page 10: BaseSpace App TruSeq Amplicon User Guide · 4 15066594Rev.B Versions ThefollowingmoduleversionsareusedintheAmpliconDSapp:}AmpliconDS(BaseSpaceWorkflow)1.1}Isis(AnalysisSoftware)2.5.40.16}SAMtools0.1.19-isis

10 15066594 Rev. B

Figure 5 Coverage by Amplicon Region (Overall) plot

You can place your cursor on any amplicon data point in the plot to open a tooltip thatdetails the coverage that was achieved for the amplicon. The tooltip also provides otherinformation about the amplicon region, including the gene name, the exome name, thechromosome number, and an amplicon-specific string. You can export the plot to a CSVfile.

Figure 6 Coverage by Amplicon Region (Overall) plot with tooltip

Coverage by Amplicon Region (Pool) plotThe Coverage by Amplicon Region (Pool) plot is identical to the Coverage by AmpliconRegion (Overall) plot with one exception - the information that is shown for anindividual pool rather than for both pools.

Aggregate Summary Report

The Aggregate Summary report is available if more than one sample pair was analyzed.The Aggregate Summary report aggregates all the individual Pairwise Analysis reportsfor the project into a single report. The information is shown is tables and in plots. Forany plot in the report, you can place your cursor on a data point to open a tooltip thatshows the name of the associated sample and the value for the report metric. TheAggregate Summary report page also has an option for downloading the Summary

Page 11: BaseSpace App TruSeq Amplicon User Guide · 4 15066594Rev.B Versions ThefollowingmoduleversionsareusedintheAmpliconDSapp:}AmpliconDS(BaseSpaceWorkflow)1.1}Isis(AnalysisSoftware)2.5.40.16}SAMtools0.1.19-isis

Amplicon

DSOutput

Amplicon DS v1.1 User Guide 11

Report (Manifest Information, Sample Information, and Aggregate Summary Details) as aPDF.

Figure 7 Aggregate Summary report plot with tooltip

Manifest InformationThe manifest is the list of targeted regions that were analyzed.

NOTEThe Manifest Information is a fixed parameter for Amplicon DS.

Amplicon Summary

} Read Level StatisticsThe Percent Aligned Reads is plotted against each sample for each analyzed pair,resulting in a single plot with one data point per sample pair. The following metricsare detailed in a table above the plot:

Statistic Definition

TotalAlignedReads(R1/R2)

The total number of reads passing filter present in the data set that aligned to thereference genome. Numbers are per read.

PercentAlignedReads(R1/R2)

The percentage of reads passing filter that aligned to the reference genome.

OverallPercentAlignedReads

The percentage of reads passing filter that aligned to the reference genome acrossboth reads (R1 and R2). The value is the average of the individual Percent AlignedReads values.

} Base Level StatisticsThe Percent Aligned Bases is plotted against each sample for each analyzed pair,resulting in a single plot with one data point per sample pair. The following metricsare detailed in a table above the plot:

Statistic Definition

TotalAlignedBases(R1/R2)

The total number of bases present in the data set that aligned to the referencegenome. Numbers are per read.

Page 12: BaseSpace App TruSeq Amplicon User Guide · 4 15066594Rev.B Versions ThefollowingmoduleversionsareusedintheAmpliconDSapp:}AmpliconDS(BaseSpaceWorkflow)1.1}Isis(AnalysisSoftware)2.5.40.16}SAMtools0.1.19-isis

12 15066594 Rev. B

Statistic Definition

OverallTotalAlignedBases

The total number of bases present in the data set that aligned to the referencegenome across both reads (R1 and R2). The value is the average of the individualTotal Aligned Bases values.

PercentAlignedBases(R1/R2)

The percentage of bases that aligned to the reference genome.

Numbers are per read.

OverallPercentAlignedBases

The percentage of bases that aligned to the reference genome across both reads(R1 and R2). The value is the average of the individual Percent Aligned Basesvalues.

PercentQ30

The percentage of bases with a quality score of 30 or higher.

MismatchRate

The percentage mismatch to the reference genome averaged over cycle per read(R1 and R2).

Small Variants Summary} SNVsThe total number of SNVs that passed the quality filters is plotted against eachsample for each analyzed pair, resulting in a single plot with one data point persample pair. The following metrics are detailed in a table above the plot:

Statistic Definition

SNVs The total number of variants present in the data set that passed thevariant quality filters.

PercentFound indbSNP

100*(Number of SNVs in dbSNP/Number of SNVs). The SNVs that werefound in the dbSNP are annotated accordingly.

Ts/TvRatio

Transition rate of SNVs that pass the quality filters divided bytransversion rate of SNVs that pass the quality filters. Transitions areinterchanges of purines (A, G) or of pyrimidines (C, T). Transversions areinterchanges between purine and pyrimidine bases (for example, A to T).

SNVHet/HomRatio

Number of heterozygous variants/Number of homozygous variants.

} InsertionsThe total number of insertions that passed the quality filters is plotted against eachsample for each analyzed pair, resulting in a single plot with one data point persample pair. The following metrics are detailed in a table above the plot:

Page 13: BaseSpace App TruSeq Amplicon User Guide · 4 15066594Rev.B Versions ThefollowingmoduleversionsareusedintheAmpliconDSapp:}AmpliconDS(BaseSpaceWorkflow)1.1}Isis(AnalysisSoftware)2.5.40.16}SAMtools0.1.19-isis

Amplicon

DSOutput

Amplicon DS v1.1 User Guide 13

Statistic Definition

Insertions The total number of variants present in the data set that passed thevariant quality filters.

PercentFound indbSNP

100*(Number of SNVs in dbSNP/Number of SNVs). The SNVs thatwere found in the dbSNP are annotated accordingly.

InsertionHet/HomRatio

Number of heterozygous variants/Number of homozygousvariants.

} DeletionsThe total number of deletions that passed the quality filters is plotted against eachsample for each analyzed pair, resulting in a single plot with one data point persample pair. The following metrics are detailed in a table above the plot:

Statistic Definition

Deletions The total number of variants present in the data set that passed thevariant quality filters.

PercentFound indbSNP

100*(Number of SNVs in dbSNP/Number of SNVs). The SNVs thatwere found in the dbSNP are annotated accordingly.

DeletionsHet/HomRatio

Number of heterozygous variants/Number of homozygousvariants.

Coverage SummaryThe Amplicon Mean Coverage Depth is plotted against each sample for each analyzedpair, resulting in a single plot with one data point per sample pair. The followingmetrics are detailed in a table above the plot:

Statistic Definition

AmpliconMeanCoverageDepth

The total number of aligned bases to the targeted region divided by the targetedregion size.

Uniformity ofCoverage

The percentage of amplicon regions with coverage values greater than the lowcoverage threshold, where the low coverage threshold is defined as (0.2* Amplicon Mean coverage).

Analysis Info

This app provides an overview of the analysis on the Analysis Info page.

A brief description of the metrics is below.

Page 14: BaseSpace App TruSeq Amplicon User Guide · 4 15066594Rev.B Versions ThefollowingmoduleversionsareusedintheAmpliconDSapp:}AmpliconDS(BaseSpaceWorkflow)1.1}Isis(AnalysisSoftware)2.5.40.16}SAMtools0.1.19-isis

14 15066594 Rev. B

Row Definition

Name Name of the app session.

Application App that generated this analysis.

Date Started Date and time the app session started.

Date Completed Date and time the app session completed.

Duration Duration of analysis.

Session Type The number of nodes used.

Size Total size of all output files.

Status Status of the app session.

Table 1 Analysis Info

Log Files

Clicking the Log Files link on the Analysis Info page provides access to the app log files.

Inputs

The Inputs page provides an overview of the input samples and settings that werespecified when the Amplicon DS project was set up.

Amplicon DSOutput FilesThe Output Files page provides access to the output files for each sample pair that wasanalyzed, with one top-level folder for each sample pair. If more than one sample pairwas analyzed, then an Aggregate Summary folder is also available.

Figure 8 Output Files Folder Structure

You can click a top-level sample pair folder to openit and view the following:} A folder for each sample that was analyzed for the pair (FPA folder and FPB folder.)} A list of files that detail the consensus data for the sample pair, including a PDFSequencing Report file that provides a detailed summary of the analysis. Thedetailed summary consists of Sample Information, Amplicon Summary, Read LevelStatistics, Base Level Statistics, Small Variants Summary, Coverage Summary andPlots, Analysis Details, and Software Versions.

Page 15: BaseSpace App TruSeq Amplicon User Guide · 4 15066594Rev.B Versions ThefollowingmoduleversionsareusedintheAmpliconDSapp:}AmpliconDS(BaseSpaceWorkflow)1.1}Isis(AnalysisSoftware)2.5.40.16}SAMtools0.1.19-isis

Amplicon

DSOutput

Amplicon DS v1.1 User Guide 15

Figure 9 Opening a top-level sample pair folder

You can click either the FPA or FPB folder to open and view a list of files that detail theraw data for the sample. You can also click any file link to open a popup that shows anonscreen preview of the file with an option to download the file.

Figure 10 Previewing an Output data file

For detailed descriptions about the different types of Output files that are available forsamples and consensus data, see the following:} BAM Files on page 16} VCF Files on page 16} gVCF Files on page 18} ANT File on page 18} Summary.csv File on page 18

Page 16: BaseSpace App TruSeq Amplicon User Guide · 4 15066594Rev.B Versions ThefollowingmoduleversionsareusedintheAmpliconDSapp:}AmpliconDS(BaseSpaceWorkflow)1.1}Isis(AnalysisSoftware)2.5.40.16}SAMtools0.1.19-isis

16 15066594 Rev. B

BAM FilesThe Sequence Alignment/Map (SAM) format is a generic alignment format for storingread alignments against reference sequences, supporting short and long reads (up to 128Mb) produced by different sequencing platforms. SAM is a text format file that is human-readable. The Binary Alignment/Map (BAM) keeps the same information as SAM, but ina compressed, binary format that is only machine readable.If you use an app in BaseSpace that uses BAM files as input, the app locates the filewhen launched. If using BAM files in other tools, download the file to use it in theexternal tool.Go to samtools.sourceforge.net/SAM1.pdf to see the exact SAM specification.

VCF FilesVCF is a text file format that contains information about variants found at specificpositions in a reference genome. The file format consists of meta-information lines, aheader line, and then data lines. Each data line contains information about a singlevariant.If you use an app in BaseSpace that uses VCF files as input, the app locates the file whenlaunched. If using VCF files in other tools, download the file to use it in the external tool.A detailed description of the VCF format is provided in the BaseSpace User Guide.

Amplicon DS VCF EntriesThe VCF files for Amplicon DS can have the following entries in the FILTER, FORMAT,and INFO fields:

Entry Description

LowGQ The genotyping quality (GQ) is below a cutoff.

LowVariantFreq The variant frequency is less than the given threshold.

PB The prevalence of the variant is significantly biased between thetwo forward and reverse probe pools.

R8 For an indel, the number of adjacent repeats (1-base or 2-base) inthe reference is greater than 8.

SB The strand bias is more than the given threshold.

LowDP Applied to sites with depth of coverage that is below a cutoff.

Table 2 VCF FILTER Entries

Entry Description

AD Allelic depths for the ref and alt alleles in the order listed. For indels, this valueincludes only the reads that confidently support each allele (posterior probability0.999 or higher that read contains indicated allele vs all other intersecting indelalleles).

GQ Genotype Quality.

Table 3 VCF FORMAT Entries

Page 17: BaseSpace App TruSeq Amplicon User Guide · 4 15066594Rev.B Versions ThefollowingmoduleversionsareusedintheAmpliconDSapp:}AmpliconDS(BaseSpaceWorkflow)1.1}Isis(AnalysisSoftware)2.5.40.16}SAMtools0.1.19-isis

Amplicon

DSOutput

Amplicon DS v1.1 User Guide 17

Entry Description

GQX Minimum of {Genotype quality assuming variant position,Genotype qualityassuming non-variant position}.

GT Genotype.

NL Noise level, as a Q score.

PB Probe pool bias.

SB Strand bias.

VF Variant frequency in the sample.

Entry Description

AA The inferred allele ancestral to the chimpanzee/human lineage.

AF1000G The allele frequency from all populations of 1000 genomes data.

clinvar Clinical significance from the ClinVar database(www.ncbi.nlm.nih.gov/clinvar/).

cosmic The numeric identifier for the variant in the Catalogue ofSomatic Mutations in Cancer (COSMIC) database(www.cancer.sanger.ac.uk/cancergenome/projects/cosmic/).

CSQR Regulatory consequence as predicted by Variant Effect Predictor(www.ensembl.org/info/docs/tools/vep/index.html) version 72.A comma-separated list for each affected regulatory region(including transcription factor binding sites) is provided using thefollowing delimited format: RegulatoryID|Consequence. Theannotations provided in this field come from the Ensembldatabase of regulatory features even if RefSeq was selected asthe annotation source. Many of the RegulatoryIDs begin withENSR. The consequences are indicated using valid SequenceOntology (SO) terms(www.ensembl.org/info/genome/variation/predicted_data.html#consequences) and typically are either regulatory_region_variant or TF_binding_site_variant.

CSQT Transcript consequence as predicted by Variant Effect Predictor(www.ensembl.org/info/docs/tools/vep/index.html) version 72.Only canonical transcripts are included in the VCF file tomaintain readability. The ANT file contains consequences for allaffected transcripts. This binary file can be loaded intoVariantStudio for viewing. Seewww.illumina.com/clinical/clinical_informatics/illumina-variantstudio.ilmn.

DP The depth (number of base calls aligned to a position and used invariant calling). In regions of high coverage, GATK down-samples the available reads.

EVS Allele frequency, sample count, and coverage taken from theExome Variant Server (EVS). Format:AlleleFreqEVS|EVSCoverage|EVSSamples.

Table 4 VCF INFO Entries

Page 18: BaseSpace App TruSeq Amplicon User Guide · 4 15066594Rev.B Versions ThefollowingmoduleversionsareusedintheAmpliconDSapp:}AmpliconDS(BaseSpaceWorkflow)1.1}Isis(AnalysisSoftware)2.5.40.16}SAMtools0.1.19-isis

18 15066594 Rev. B

Entry Description

EXON A comma-separated list of exon regions read from RefGene.

FC Functional consequence.

GI A comma-separated list of gene IDs read from RefGene

GMAF Global minor allele frequency (GMAF); technically, the frequencyof the second most frequent allele. Format:GlobalMinorAllele|AlleleFreqGlobalMinor.

phastCons Denotes if the variant is an identical or similar sequence thatoccurs between species and maintained between speciesthroughout evolution.

TI A comma-separated list of transcript IDs read from RefGene.

gVCF FilesThis application also produces the Genome Variant Call Format file (gVCF). gVCF wasdeveloped to store sequencing information for both variant and non-variant positions,which is required for human clinical applications. gVCF is a set of conventions appliedto the standard variant call format (VCF) 4.1 as documented by the 1000 GenomesProject. These conventions allow representation of genotype, annotation, and otherinformation across all sites in the genome in a compact format. Typical human whole-genome sequencing results expressed in gVCF with annotation are less than 1 Gbyte, orabout 1/100 the size of the BAM file used for variant calling. If you are performingtargeted sequencing, gVCF is also an appropriate choice to represent and compress theresults.gVCF is a text file format, stored as a gzip compressed file (*.genome.vcf.gz).Compression is further achieved by joining contiguous non-variant regions with similarproperties into single ‘block’ VCF records. To maximize the utility of gVCF, especially forhigh stringency applications, the properties of the compressed blocks are conservative.Block properties like depth and genotype quality reflect the minimum of any site in theblock. The gVCF file can be indexed (creating a *.tbi file) and used with existing VCFtools such as tabix and IGV, making it convenient both for direct interpretation and as astarting point for further analysis.For more information, see sites.google.com/site/gvcftools/home/about-gvcf.

ANT File

The Illumina Annotation Service (IAS) generates a binary ANT annotation file, whichcontains consequences for all affected transcripts. The annotations are more detailed thanthe annotations in the VCF file. This binary file can be loaded into VariantStudio forviewing; see www.illumina.com/clinical/clinical_informatics/illumina-variantstudio.ilmn.

Summary.csv File

The Amplicon DS app produces an overview of statistics for each sample and theaggregate results in a comma-separated values (CSV) format: the *summary.csv. Thesefiles are located in the results folder for each sample and the aggregate results.

Page 19: BaseSpace App TruSeq Amplicon User Guide · 4 15066594Rev.B Versions ThefollowingmoduleversionsareusedintheAmpliconDSapp:}AmpliconDS(BaseSpaceWorkflow)1.1}Isis(AnalysisSoftware)2.5.40.16}SAMtools0.1.19-isis

Amplicon

DSOutput

Amplicon DS v1.1 User Guide 19

Statistic Definition

Sample ID IDs of samples reported on in the file.

SampleName

Names of samples reported on in the file.

Run Folder Run folders for samples reported on in the file.

Manifest The manifest file used for analysis. This file specifies the targeted regions for thealigner and variant caller.

Referencegenome

Reference genome selected.

Number ofampliconregions

The number of amplicon regions that were sequenced.

Total lengthof ampliconregions

The total length of the sequenced bases in the target region.

Total PFreads

The number of reads passing filter for the sample.

Total alignedreads

The total number of reads passing filter present in the data set that aligned to thereference genome.

Numbers are calculated per read, and over both reads.

Percentaligned reads

The percentage of reads passing filter that aligned to the reference genome.

Numbers are calculated per read, and over both reads.

Total PFbases

The number of bases passing filter for the sample.

Total alignedbases

The total number of bases present in the data set that aligned to the referencegenome.

Numbers are calculated per read, and over both reads.

Percentaligned bases

The percentage of bases that aligned to the reference genome.

Numbers are calculated per read, and over both reads.

Percent Q30 The percentage of bases with a quality score of 30 or higher.Numbers are calculated per read.

Mismatchrate

The average percentage of mismatches across both reads 1 and 2 over all cycles.Numbers are calculated per read.

Ampliconmeancoverage

The total number of aligned bases to the targeted region divided by the targetedregion size.

SNVs,Insertions,Deletions

Total number of variants present in the data set that pass the quality filters.

Page 20: BaseSpace App TruSeq Amplicon User Guide · 4 15066594Rev.B Versions ThefollowingmoduleversionsareusedintheAmpliconDSapp:}AmpliconDS(BaseSpaceWorkflow)1.1}Isis(AnalysisSoftware)2.5.40.16}SAMtools0.1.19-isis

20 15066594 Rev. B

Statistic Definition

SNVs,Insertions,Deletions(Percentfound indbSNP)

100*(Number of variants in dbSNP/Number of variants).

SNV Ts/Tvratio

The number of Transition SNVs that pass the quality filters divided by thenumber of Transversion SNVs that pass the quality filters. Transitions areinterchanges of purines (A, G) or of pyrimidines (C, T). Transversions areinterchanges of purine and pyrimidine bases (for example, A to T).

SNVs,Insertions,DeletionsHet/Homratio

Number of heterozygous variants/Number of homozygous variants.

Page 21: BaseSpace App TruSeq Amplicon User Guide · 4 15066594Rev.B Versions ThefollowingmoduleversionsareusedintheAmpliconDSapp:}AmpliconDS(BaseSpaceWorkflow)1.1}Isis(AnalysisSoftware)2.5.40.16}SAMtools0.1.19-isis

Amplicon

DSMethods

Amplicon DS v1.1 User Guide 21

AmpliconDSMethods

The Amplicon DS workflow evaluates short regions of amplified DNA, or amplicons, forvariants. Focused sequencing of amplicons enables high coverage of particular regionsacross many samples. Amplicon DS samples are generated using a mirrored, dualstrand amplicon assay. This chapter describes the methods that are used in theAmplicon DS application.

AlignmentClusters from each sample are aligned against amplicon sequences specified in themanifest file.Each paired-end read is initially evaluated in terms of its alignment to the relevant probesequences for that read. Read 1 is evaluated against the reverse complement of theDownstream Locus-Specific Oligos (DLSO) and Read 2 is evaluated against theUpstream Locus-Specific Oligos (ULSO). If the start of a read matches a probe sequencewith no more than one mismatch, the full length of the read is aligned against theamplicon target for that sequence. This alignment is performed along the length of theamplicon target using a banded Smith-Waterman alignment algorithm.The banded Smith-Waterman algorithm performs local sequence alignments todetermine similar regions between two sequences. Instead of looking at the totalsequence, the Smith-Waterman algorithm compares segments of all possible lengths,given the restriction that the maximum indel size is 25 bp.Any alignments that include more than three indels are filtered from alignment results.Filtered alignments are written in alignment (BAM) files as unaligned and are not usedin variant calling. Indels within the DLSO and ULSO are not observed given the assaychemistry.

Paired-End EvaluationFor paired-end runs, the top-scoring alignment for each read is considered. Reads areflagged as an unresolved pair under the following conditions:} If either read did not align, or the paired reads aligned to different chromosomes.} If two alignments come from different amplicons or different rows in the Targetssection of the manifest.

Bin/SortThe bin/sort step groups reads by sample and chromosome, and then sorts bychromosome position. Results are written to one BAM file per sample.

Variant CallingSNPs and short indels are identified using the somatic variant caller. Developed byIllumina, the somatic variant caller identifies variants present at low frequency in theDNA sample and minimizes false positives.The somatic variant caller identifies SNPs in three steps:} Considers each position in the reference genome separately.} Counts bases at the given position for aligned reads that overlap the position.} Computes a variant score that measures the quality of the call.

Page 22: BaseSpace App TruSeq Amplicon User Guide · 4 15066594Rev.B Versions ThefollowingmoduleversionsareusedintheAmpliconDSapp:}AmpliconDS(BaseSpaceWorkflow)1.1}Isis(AnalysisSoftware)2.5.40.16}SAMtools0.1.19-isis

22 15066594 Rev. B

Variant scores are computed using a Poisson model that excludes variants with avariant quality score below Q20. Additionally, the model only calls variants for basesthat are covered at 300x or greater for a single amplicon.Variants are first called for each pool separately. Then, variants from the two pools arecompared and combined into a single output file.If a variant meets the following criteria, the variant is marked as PASS in the variant file:} Must be present in both pools.} Cumulatively have a depth of 1000 or an average depth of 500x per pool.} Meets all the VCF filter requirements as specified. See Amplicon DS VCF Entries onpage 16.

For more information, see the Amplicon - DS Variant Caller Technical Note on the TruSightTumor Sample Preparation support page.

Somatic Variant Caller

The Somatic Variant Caller is designed for variant calling in tumor samples with nopaired normal. The Somatic Variant Caller is recommended for detection of lowfrequency variants, such as those found in heterogeneous cancer samples.

Variants are flagged as homozygous or heterozygous in the VCF sample column, witheither a 1/1 or a 0/1 respectively. During somatic variant calling, somatic variants areobserved at any frequency. Therefore, het/hom calls are made to indicate the mostreasonable diploid genotype that can be assigned to a variant if it is a non-somatic(germline) variant.

For more information about the Somatic Variant Caller, seeres.illumina.com/documents/products/technotes/technote_somatic_variant_caller.pdf.

Illumina Annotation Service (IAS)

Annotation with IAS populates several values in the VCF file, including dbSNP ID (inthe ID column), and some values in the INFO column. More detailed and extensiveannotations are stored in a binary ANT file. This binary file can be imported intoVariantStudio.

Annotation through IAS is available for alignments against the human reference genome:UCSC build hg19 / Ensembl build GRCh37 / NCBI build37.2.

Page 23: BaseSpace App TruSeq Amplicon User Guide · 4 15066594Rev.B Versions ThefollowingmoduleversionsareusedintheAmpliconDSapp:}AmpliconDS(BaseSpaceWorkflow)1.1}Isis(AnalysisSoftware)2.5.40.16}SAMtools0.1.19-isis

TechnicalAssistance

Amplicon DS v1.1 User Guide

Technical Assistance

For technical assistance, contact Illumina Technical Support.

Website www.illumina.com

Email [email protected]

Table 5 Illumina General Contact Information

Region Contact Number Region Contact NumberNorth America 1.800.809.4566 Italy 800.874909Austria 0800.296575 Netherlands 0800.0223859Belgium 0800.81102 Norway 800.16836Denmark 80882346 Spain 900.812168Finland 0800.918363 Sweden 020790181France 0800.911850 Switzerland 0800.563118Germany 0800.180.8994 United Kingdom 0800.917.0041Ireland 1.800.812949 Other countries +44.1799.534000

Table 6 Illumina Customer Support Telephone Numbers

Safety Data SheetsSafety data sheets (SDSs) are available on the Illumina website atsupport.illumina.com/sds.html.

Product DocumentationProduct documentation in PDF is available for download from the Illumina website. Goto support.illumina.com, select a product, then click Documentation & Literature.

Page 24: BaseSpace App TruSeq Amplicon User Guide · 4 15066594Rev.B Versions ThefollowingmoduleversionsareusedintheAmpliconDSapp:}AmpliconDS(BaseSpaceWorkflow)1.1}Isis(AnalysisSoftware)2.5.40.16}SAMtools0.1.19-isis

IlluminaSanDiego,California92122U.S.A.+1.800.809.ILMN (4566)+1.858.202.4566 (outsideNorth America)[email protected]