Illumin8er: Software for the Illumina GAII. Ian Carr, Joanne Morgan, Phil Chambers, Alex Markham, David Bonthron & Graham Taylor. Leeds Institute of Molecular Medicine, Leeds Teaching Hospitals & Cancer Research UK

Illumin8er: Software for the Illumina GAII

Ian Carr, Joanne Morgan, Phil Chambers, Alex Markham, David Bonthron & Graham Taylor

Leeds Institute of Molecular Medicine, Leeds Teaching Hospitals & Cancer Research UK

Sipping from the hosepipe

The cost of DNA sequencing is plummeting

Current sequence output from an Illumina GAII is over 1 Gigabase per day

Managing the data is the single biggest challenge to bringing the benefits to patients and cost savings to the healthcare budget

The next biggest challenge is optimising the workflow to achieve cost efficiency

What should the software do?

Scan for and report mutations against a defined reference sequence.

Be able to handle bar-code sequence tags

Be easy to use

Report on data quality

Export to a database

Why Illumina?

Cost: 0002p per base

Capacity: 3.5 Gigabase per run

Simplicity: library > cluster station > sequence > data

500,000,000 bases per channel

Software requirements

Runs in MS Windows

User definable reference sequence

Quality scores

Automatic mutation calling: SNPs, indels

Speed

Initial data manipulation

Illumin8er can transform data in prb.txt or seq.txt files into FASTA files

If tagged data are used, the reads for each tag are separated into an individual file.

The prb.txt files can be filtered for low quality data
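The conversion and tag separation described above can be sketched as follows. This is an illustration only, not Illumin8er's own code. It assumes that each seq.txt line holds five tab-separated fields (lane, tile, x, y, sequence) and that the bar-code tag is the leading bases of the read. The function name `seq_lines_to_fasta_by_tag` and the "untagged" bucket are hypothetical.

```python
from collections import defaultdict


def seq_lines_to_fasta_by_tag(lines, tags):
    """Group seq.txt-style reads by leading bar-code tag.

    Assumed line layout (tab-separated): lane, tile, x, y, sequence.
    Returns {tag: FASTA text}; reads with no recognised tag are
    collected under the hypothetical key "untagged".
    """
    records = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 5:
            continue  # skip malformed lines
        lane, tile, x, y, seq = fields
        name = f"{lane}_{tile}_{x}_{y}"
        for tag in tags:
            if seq.startswith(tag):
                # Strip the tag so only the insert sequence is written out.
                records[tag].append(f">{name}\n{seq[len(tag):]}\n")
                break
        else:
            records["untagged"].append(f">{name}\n{seq}\n")
    return {tag: "".join(recs) for tag, recs in records.items()}
```

Each value in the returned dictionary can then be written to its own FASTA file, one per tag.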

Reference files

Reference files are created from a plain text file of the genomic sequence and a cDNA sequence in either a plain text file or a GenBank web page.

If a GenBank page is used, the SNP data in the page are also imported with the cDNA sequence.

The reference file contains the position of the exons and ORF relative to the genomic sequence to aid mutation annotation.

Indexing the reference sequence

Each octamer in the reference sequence is mapped to an array of 65,537 entries: one for each of the 4^8 = 65,536 possible octamers, plus an extra one for unmappable rubbish such as ‘nnnnnnnn’

Some octamers have no positions in the reference while others have several.

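Reference: GCTGGTGAGGGGTGGGGCAGGAGTGCTTGGGTTGTGGTGAAACATTGG

[Figure: octamers from the reference sequence point to their entries in the octamer array, which runs aaaaaaaa, aaaaaaac, aaaaaaag, aaaaaaat, aaaaaaca, aaaaaacc, ... (~65,000 entries) ..., tttttttc, tttttttg, tttttttt, followed by nnnnnnnn]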
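A minimal sketch of such an index is given below. It is not taken from Illumin8er. It assumes that every overlapping octamer of the reference is recorded, that positions are 0-based, and that bases are encoded two bits at a time in the order A, C, G, T. The names `octamer_slot` and `build_index` are hypothetical.

```python
K = 8
UNMAPPED = 4 ** K  # slot 65536: the extra entry for octamers such as 'nnnnnnnn'
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def octamer_slot(octamer):
    """Return the array slot (0..65535) for an octamer, or UNMAPPED."""
    slot = 0
    for base in octamer.upper():
        if base not in CODE:
            return UNMAPPED  # any non-ACGT base sends the octamer to the extra slot
        slot = slot * 4 + CODE[base]
    return slot


def build_index(reference):
    """Return a list of 65,537 position lists, one per possible octamer."""
    index = [[] for _ in range(4 ** K + 1)]
    for pos in range(len(reference) - K + 1):
        index[octamer_slot(reference[pos:pos + K])].append(pos)
    return index
```

With this layout an octamer that is absent from the reference simply has an empty position list, and a repeated octamer has several positions.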

Mapping reads with 3’ mismatches

Read: TGAGGGGTGGGGCAGGAGTGCTTGGGTTGTGGGAAA

Reference: GCTGGTGAGGGGTGGGGCAGGAGTGCTTGGGTTGTGGTGAAACATTGG

Positions where each octamer of the read is found in the reference sequence:

Octamer 1: 606, 2900, 5000

Octamer 2: 614, 8900

Octamer 3: 306, 622, 1400

Octamer 4: found only at positions other than 630

Match up positions where successive octamers increase by 8 bp: 606, 614 (+8 bp), 622 (+8 bp), NA (not +8 bp)

3’ mismatches give a run of three footprints with the last octamer missing. These reads go into array 2 (phase 2).

Mapping reads with 5’ mismatches

Read: GTGAGGGGGGGGCAGGAGTGCTTGGGTTGTGGTGAA

Reference: GCTGGTGAGGGGTGGGGCAGGAGTGCTTGGGTTGTGGTGAAACATTGG

Positions where each octamer of the read is found in the reference sequence:

Octamer 1: found only at positions other than 606

Octamer 2: 614, 8900

Octamer 3: 306, 622, 1400

Octamer 4: 630

Match up positions where successive octamers increase by 8 bp: NA (not +8 bp), 614, 622 (+8 bp), 630 (+8 bp)

5’ mismatches give a run of three footprints with the first octamer missing. These reads go into array 3 (phase 3).

Mapping reads with internal mismatches

Read: TGAGGGGTGGGGCAGAAGTGCTTGGGTTGTGGTGAA

Reference: GCTGGTGAGGGGTGGGGCAGGAGTGCTTGGGTTGTGGTGAAACATTGG

Positions where each octamer of the read is found in the reference sequence:

Octamer 1: 606, 2900, 5000

Octamer 2: found only at positions other than 614

Octamer 3: 306, 622, 1400

Octamer 4: 630

Match up positions where successive octamers increase by 8 bp: 606, octamer 2 out of phase (not +8 bp), 622 (+16 bp from 606), 630 (+8 bp)

Internal mismatches give a run of three footprints with either the second or third octamer out of phase. These reads go into array 4 (phase 4).
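The footprint logic of the three mapping slides can be sketched as below. This is one reading of the slides, not the program's actual code. It assumes that the first four non-overlapping octamers of each read are used and any trailing bases are ignored, that positions are 0-based, and that a read is assigned to the lowest-numbered phase it fits. The names `build_index`, `classify_read` and `PHASE_BY_FOOTPRINT` are hypothetical, and a plain dictionary stands in for the 65,537-entry array.

```python
from collections import defaultdict

K = 8


def build_index(reference):
    """Map every overlapping octamer of the reference to its 0-based positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - K + 1):
        index[reference[pos:pos + K]].append(pos)
    return index


# Which of the four octamers sit exactly 8 bp apart, and the phase that implies.
PHASE_BY_FOOTPRINT = {
    (True, True, True, True): 1,    # perfect match
    (True, True, True, False): 2,   # last octamer missing: 3' change
    (False, True, True, True): 3,   # first octamer missing: 5' change
    (True, False, True, True): 4,   # second octamer out of phase
    (True, True, False, True): 4,   # third octamer out of phase
}


def classify_read(read, index):
    """Return (phase, implied start of read in reference), or None if unplaced."""
    octamers = [read[i:i + K] for i in range(0, 4 * K, K)]
    hits = [set(index.get(o, ())) for o in octamers]
    # Every hit of octamer i implies a candidate read start at hit - 8*i.
    candidates = {pos - K * i for i, h in enumerate(hits) for pos in h}
    best = None
    for start in sorted(candidates):
        footprint = tuple(start + K * i in hits[i] for i in range(4))
        phase = PHASE_BY_FOOTPRINT.get(footprint)
        if phase is not None and (best is None or phase < best[0]):
            best = (phase, start)
    return best
```

Run against the reference fragment shown on the slides, the 5’ example read is placed in phase 3 and the internal-mismatch example read in phase 4, both with an implied start at 0-based position 5 of the fragment.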

What each phase is used for

Phase 1 = perfect matches

Phase 2 = indels and small mutations at end of a read

Phase 3 = indels and small mutations at start of a read

Phase 4 = small mutations in the middle of read

Small changes

These are found by looking at Phase 4 data.

Homozygous mutations are in Phase 4 but not Phase 1 (seen as a hole in the Phase 1 data)

Heterozygous variants are seen in Phase 4 data, with the wild-type (wt) allele seen in Phase 1 data.

WT in Phase 1 data

Mut in Phase 4 data. (The wt allele is present due to sequencing errors elsewhere in the read.)
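The rule on this slide can be written as a small decision function. This is only a sketch: the slides give no depth threshold, so `min_depth` and its default value are hypothetical, as are the function name `call_zygosity` and the returned labels.

```python
def call_zygosity(phase1_depth, phase4_variant_depth, min_depth=5):
    """Classify one position from read depths in Phase 1 and Phase 4 data.

    phase1_depth: perfect-match (wild-type) reads covering the position.
    phase4_variant_depth: Phase 4 reads carrying the variant base.
    min_depth: hypothetical cut-off below which an allele counts as absent.
    """
    if phase4_variant_depth < min_depth:
        return "reference"      # no convincing variant reads
    if phase1_depth < min_depth:
        return "homozygous"     # variant present, hole in Phase 1 data
    return "heterozygous"       # variant in Phase 4, wild type in Phase 1
```

For example, `call_zygosity(0, 80)` gives "homozygous", while `call_zygosity(75, 70)` gives "heterozygous".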

InDels

Phase 2 data gets indels from the end of the read, while Phase 3 data gets them from the start of the read.

In a perfect world Phase 2 and 3 data should mirror each other.

Global view

Data for a PCR product containing two exons; blue = exonic DNA, pink = protein-coding DNA

The red and blue lines show the read depth of forward and reverse reads.

The lower panel shows the reference and deduced sequences around a point on the upper panel, selected by clicking on the panel with the mouse

Data view

Forward and Reverse sequences

Patient sequence

Patient’s other allele sequence

Score for each nucleotide

Reference genomic, cDNA and protein sequence

Read depth

Heterozygous base

Indel interface

Forward and Reverse sequences

Reference sequence

Patient sequences with indel at start and end of read

Consensus sequence of patient reads across indel

Alignment of patient and reference sequence to identify indel

Data export

The program can both export and import the alignment data as a plain text file

Create an updatable library of sequence variants

Export sequence variants as a text file

Create a LOVD import file for the sequence variants

Validation: BRCA1 & BRCA2

Illumin8er detected all the mutations previously identified by dye-terminator Sanger sequencing of the exons of BRCA1 and BRCA2 in 10 individuals. Each nucleotide had a read depth of at least 75 reads (approximately 6.6 × 10^3 sequences per gene). The alignment and mutation annotation took ~50 seconds per gene per person.

Conclusions

Illumin8er is:

Easy to use

Rapid

Runs on a Windows desktop

Uses standard Illumina output files

Reports mutations in a sensitive and specific manner

Next steps

Make freely available by download

http://dna.leeds.ac.uk/illumin8er/

Design compatible LOVD

Large scale validation trial