Whole genome sequencing for outbreak analysis and pathogen typing

Challenges and Opportunities

Alan Tsang, Scientific Officer (Medical)

Microbiology Division, PHLSB

23 Dec 2019

Agenda

• Overview of typing

• WGS-based typing

• Examples

• Challenges and advantages of WGS

Typing

• Allows differentiation of microbes beyond the species and subspecies level

– To relate individual cases to an outbreak of infectious disease

– To establish an association between an outbreak of food poisoning and a specific food vehicle

– To trace the source of contaminants within a manufacturing process

Typing

• Phenotypic

– Characterization of bacteria based on expressed traits

• Serotyping

• Genotyping

– Characterization of bacteria based on genetic content

• Pulsed-field gel electrophoresis (PFGE)

• Multi-locus sequence typing (MLST)

• Variable-number tandem repeat (VNTR) typing

Drawbacks

• Low resolution

– Only rough idea of relationship between isolates

• Labour intensive

– Lots of tedious lab work

• Relatively expensive

– In time and consumables

3 years ago…

Systems Comparison

15 Gb output

• For E. coli, ~5 Mb genome

• 80x coverage depth

• ~0.4 Gb of sequence data per genome

• ~3% of a MiSeq run
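The arithmetic behind these figures can be checked with a short sketch. It assumes decimal units (1 Gb = 1000 Mb); the variable and function names are illustrative and not from the slides. The genomes-per-run figure is only a theoretical upper bound that ignores uneven pooling and data lost to quality filtering.

```python
# Figures taken from the slide; decimal units (1 Gb = 1000 Mb) are assumed.
GENOME_SIZE_MB = 5     # approximate E. coli genome size
COVERAGE_DEPTH = 80    # target depth of coverage
RUN_OUTPUT_GB = 15     # output of one MiSeq run


def bases_needed_gb(genome_mb: float, depth: float) -> float:
    """Sequence data needed for one genome at the given depth, in Gb."""
    return genome_mb * depth / 1000


per_genome_gb = bases_needed_gb(GENOME_SIZE_MB, COVERAGE_DEPTH)
fraction_of_run = per_genome_gb / RUN_OUTPUT_GB
# Theoretical upper bound only: real runs lose capacity to uneven pooling
# and quality filtering.
genomes_per_run = int(RUN_OUTPUT_GB // per_genome_gb)

print(f"{per_genome_gb:.1f} Gb per genome")
print(f"{fraction_of_run:.1%} of a run")
print(f"at most {genomes_per_run} genomes per run")
```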

Whole genome sequencing workflow

Kwong JC et al. Whole genome sequencing in clinical and public health microbiology. Pathology. 2015

Next-Gen Sequencing Library preparation

How does Illumina sequencing work

Better libraries, better runs, better data

Basic genome informatics

• Millions of DNA sequences

– Reads

• Typically 50-300 bp each

• Includes quality information

• File size ~ 1 gigabyte
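Reads with quality information are commonly stored as FASTQ records: four lines per read, with one quality character per base. The sketch below is a minimal reader, assuming the widespread Phred+33 encoding and unwrapped four-line records; the two example reads are made up for illustration, and `parse_fastq` is a hypothetical helper, not part of any pipeline named in the slides.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class Read(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]  # one Phred score per base


def parse_fastq(handle: TextIO) -> Iterator[Read]:
    """Yield reads from a four-line-per-record FASTQ stream (Phred+33 assumed)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield Read(header[1:], sequence, [ord(c) - 33 for c in quality])


# Hypothetical two-read example, not real sequencing data.
example = io.StringIO(
    "@read1\nGGCAGCAG\n+\nIIIIIIII\n"
    "@read2\nTTGCAGGA\n+\nIIII####\n"
)
reads = list(parse_fastq(example))
mean_q = [sum(r.qualities) / len(r.qualities) for r in reads]
for read, q in zip(reads, mean_q):
    print(read.name, len(read.sequence), "bp, mean quality", q)
```

In practice an established parser (for example Biopython's `SeqIO`) would be preferable; the sketch only shows what "includes quality information" means at the file level.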

Two main approaches

• Gene-by-gene comparisons

• Single Nucleotide Polymorphism (SNP) analysis

Gene-by-gene comparisons

• Compare isolates at the gene level

• Multi-Locus Sequence Typing (cgMLST/wgMLST)

• Can be standardized between laboratories

• Databases:

• Ridom SeqSphere+ (Commercial Software)

• BIGSdb

cgMLST database

Genomes and Loci

[Figure: six genomes (Strain 1 to Strain 6), each drawn as a row of loci labelled L1, L2, L3, … up to L2850. Some loci (for example L2, L6, L7 or L8) do not appear in every strain.]

cgMLST

[Figure: the same six strains (Strain 1 to Strain 6), showing only the loci L1, L3, L4, L5, … L2850 for every strain.]

Genomes and Loci

[Figure: loci L1, L3, L4, L5, … L2850 for each strain, with the allele numbers written as a profile next to the strain name.]

Strain 1  1111….1
Strain 2  1111….1
Strain 3  2211….2
Strain 4  3322….3
Strain 5  2111….2
Strain 6  1111….1
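A gene-by-gene comparison reduces to counting the loci at which two allele profiles differ. The sketch below uses only the allele digits visible on the slide, assuming they correspond to L1, L3, L4, L5 and L2850 in that order; the loci hidden behind "…." are not reconstructed, so the distances are illustrative rather than the real cgMLST distances. `allele_distance` is a hypothetical helper name.

```python
from itertools import combinations

# Alleles at the loci visible on the slide (assumed order: L1, L3, L4, L5,
# L2850). Loci elided on the slide are deliberately left out.
LOCI = ["L1", "L3", "L4", "L5", "L2850"]
PROFILES = {
    "Strain 1": [1, 1, 1, 1, 1],
    "Strain 2": [1, 1, 1, 1, 1],
    "Strain 3": [2, 2, 1, 1, 2],
    "Strain 4": [3, 3, 2, 2, 3],
    "Strain 5": [2, 1, 1, 1, 2],
    "Strain 6": [1, 1, 1, 1, 1],
}


def allele_distance(a, b):
    """Number of loci at which two allele profiles carry different alleles."""
    return sum(x != y for x, y in zip(a, b))


# Pairwise distances, keyed by (strain, strain) in the order listed above.
distances = {
    (s, t): allele_distance(PROFILES[s], PROFILES[t])
    for s, t in combinations(PROFILES, 2)
}

for (s, t), d in distances.items():
    print(f"{s} vs {t}: {d} of {len(LOCI)} loci differ")
```

On these visible loci, Strains 1, 2 and 6 are indistinguishable, while Strain 4 differs from them at every locus.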

Single Nucleotide Polymorphism (SNP) analysis

• A SNP is a difference between DNA sequences in the identity of a single nucleotide (an A, T, G, or C)

• This approach provides even higher resolution than cgMLST

• It has the advantage of including intergenic regions

Read mapping

What a SNP looks like

[Figure: reads mapped against the reference, with a SNP (A=>G) marked]

SNP-based typing

Ref       GGCAGCAGTGTCTTGCCCGATTGCAGGATGAGTTACCAGCCACAGAATT
Strain A  GGCAGCAGTGTCATGCCCGATTCCAGGATGAGTTACCAGCCACAGAATT
Strain B  GGCAGCAGTGTCATGCCCGATTCCAGGATGAGTTACCAGCCACAGAATT
Strain C  GGCAGCAGTGTCATGCCCGATTGCAGGATGAGTTACCAGCCACAGAATT
Strain D  GGCAGCAGTGTCATGCCCGATTCCAGGATGAGTTACCAGCCACAGAATT
Strain E  GGCAGCAGTGTCATGCCCGATTCCAGGATGAGTTACCAGCCACAGAATT
Strain F  GCCACCAGAGTCTTACCGGATAGCAGCATGAGATACCTGCCACACAATT

SNP matrix

     A   B   C   D   E
A
B    0
C    1   1
D    0   0   1
E    0   0   1   0
F   12  12  11  12  12

Phylogenetic tree

[Figure: tree with tips A, B, D, E, C and F; scale bar: 1 SNP]

Concatenated SNPs from the SNP matrix are used to construct a phylogenetic tree

SNP-based typing

Ref       GGTTGCTGGTAG
Strain A  GGTAGCTCGTAG
Strain B  GGTAGCTCGTAG
Strain C  GGTAGCTGGTAG
Strain D  GGTAGCTCGTAG
Strain E  GGTAGCTCGTAG
Strain F  CCATAGAGCATC
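The concatenated SNP strings and the pairwise SNP matrix for strains A to F can both be derived from the full alignment. The sketch below keeps only the columns where the sequences disagree and counts mismatches per pair; it assumes the sequences are already aligned and equal in length, as on the slide. The function names are illustrative and not taken from any particular pipeline.

```python
from itertools import combinations

# Aligned sequences copied from the slide.
ALIGNMENT = {
    "Ref": "GGCAGCAGTGTCTTGCCCGATTGCAGGATGAGTTACCAGCCACAGAATT",
    "A": "GGCAGCAGTGTCATGCCCGATTCCAGGATGAGTTACCAGCCACAGAATT",
    "B": "GGCAGCAGTGTCATGCCCGATTCCAGGATGAGTTACCAGCCACAGAATT",
    "C": "GGCAGCAGTGTCATGCCCGATTGCAGGATGAGTTACCAGCCACAGAATT",
    "D": "GGCAGCAGTGTCATGCCCGATTCCAGGATGAGTTACCAGCCACAGAATT",
    "E": "GGCAGCAGTGTCATGCCCGATTCCAGGATGAGTTACCAGCCACAGAATT",
    "F": "GCCACCAGAGTCTTACCGGATAGCAGCATGAGATACCTGCCACACAATT",
}


def variable_sites(alignment):
    """Column indices (0-based) at which the sequences do not all agree."""
    columns = zip(*alignment.values())
    return [i for i, col in enumerate(columns) if len(set(col)) > 1]


def snp_distance(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


sites = variable_sites(ALIGNMENT)
# Concatenated SNPs: one short string per sequence, variable columns only.
snp_alignment = {
    name: "".join(seq[i] for i in sites) for name, seq in ALIGNMENT.items()
}
strains = [name for name in ALIGNMENT if name != "Ref"]
matrix = {
    (s, t): snp_distance(ALIGNMENT[s], ALIGNMENT[t])
    for s, t in combinations(strains, 2)
}

for name, snps in snp_alignment.items():
    print(f"{name:>3} {snps}")
for (s, t), d in matrix.items():
    print(f"{s}-{t}: {d}")
```

Because invariant columns contribute nothing to a mismatch count, the distances are the same whether they are computed on the full alignment or on the concatenated SNPs.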


Example – outbreak investigation

• In 2019, a cluster of Candida auris colonization occurred in a public hospital in Hong Kong and affected 15 patients over a period of approximately one month. This occurrence marked the first ever detection of C. auris in Hong Kong.

• Whole-genome sequencing for the isolates was performed as part of the outbreak investigation.

Major clades of Candida auris

Strains were:

• Very different across clades

• Highly related within clade

SNP numbers will vary…

using SNP calling pipeline A

using SNP calling pipeline B

SNP analysis

• Many academic researchers have developed pipelines for similar analysis, some of which are publicly available

– outputs vary between pipelines

• Many variables affect the number of SNPs measured between isolates

– tools employed

– SNP-calling filters / parameters

– species (nucleotide mutation rates vary between pathogens)

– reference sequence

– number and diversity of isolates analyzed

– time between samples

• Interpret genomic data in parallel with local epidemiological data

• No SNP database or standard nomenclature is available

Schürch AC et al. Clin Microbiol Infect. 2018

Hatherell HA et al. BMC Med. 2016
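To make the effect of SNP-calling filters concrete, the sketch below applies two filter settings to the same set of candidate variant calls. The calls, the thresholds and the `Call` fields are entirely hypothetical and chosen only for illustration; real pipelines use many more criteria (mapping quality, strand bias, masking of repeats and so on).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    position: int        # position on the reference
    depth: int           # reads covering the site
    alt_fraction: float  # fraction of reads supporting the variant base


# Hypothetical candidate variants between two isolates (not real data).
CALLS = [
    Call(1021, 85, 0.99),
    Call(20480, 12, 0.92),
    Call(77310, 60, 0.78),
    Call(150002, 8, 1.00),
    Call(301877, 95, 0.97),
]


def count_snps(calls, min_depth, min_alt_fraction):
    """Count calls that pass both the depth and the allele-fraction filter."""
    return sum(
        c.depth >= min_depth and c.alt_fraction >= min_alt_fraction
        for c in calls
    )


lenient = count_snps(CALLS, min_depth=5, min_alt_fraction=0.75)
strict = count_snps(CALLS, min_depth=20, min_alt_fraction=0.90)
print(f"lenient filters: {lenient} SNPs, strict filters: {strict} SNPs")
```

The same pair of isolates is reported as 5 SNPs apart under one setting and 2 under the other, which is why SNP counts from different pipelines should not be compared directly.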

Example – serovar prediction

• Traditional serology and the Kauffmann-White scheme (KWS) have been the gold standard for Salmonella serotyping

– maintained by the World Health Organization (WHO) Collaborating Centre for Reference and Research on Salmonella, located at the Pasteur Institute in Paris, France

– the current (9th) edition, issued in 2007, comprises antigenic variants that had been validated as of January 1, 2007

• Aim: evaluate the potential use of WGS as a method for the routine serotyping of Salmonella isolates

Salmonella Serotyping Using WGS

Strain  Traditional serotyping  Tool A            Tool B v1    Tool B v2
1       Derby                   Derby             N/A          Derby
2       Bovismorbificans        Bovismorbificans  N/A          Bovismorbificans
3       Wandsworth              Wandsworth        Wandsworth   N/A
4       Typhimurium             I 4,[5],12:i:-    Typhimurium  Typhimurium
5       Chailey                 Breda             Chailey      Chailey
6       Virchow                 Virchow           N/A          Virchow
7       Urbana                  Johannesburg      N/A          Urbana
8       Crewe                   Crewe|Poitiers    N/A          Crewe
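The agreement of each tool with traditional serotyping can be tallied directly from the table. The sketch below makes two assumptions that the slides do not state: "N/A" is treated as no prediction, and only an exact name match counts as agreement, so the ambiguous call "Crewe|Poitiers" is scored as a mismatch. `summarize` is a hypothetical helper.

```python
TRADITIONAL = [
    "Derby", "Bovismorbificans", "Wandsworth", "Typhimurium",
    "Chailey", "Virchow", "Urbana", "Crewe",
]
# None stands for "N/A" in the table (assumed to mean no prediction).
PREDICTIONS = {
    "Tool A": [
        "Derby", "Bovismorbificans", "Wandsworth", "I 4,[5],12:i:-",
        "Breda", "Virchow", "Johannesburg", "Crewe|Poitiers",
    ],
    "Tool B v1": [
        None, None, "Wandsworth", "Typhimurium",
        "Chailey", None, None, None,
    ],
    "Tool B v2": [
        "Derby", "Bovismorbificans", None, "Typhimurium",
        "Chailey", "Virchow", "Urbana", "Crewe",
    ],
}


def summarize(predicted, expected):
    """Count exact matches, calls made and missing calls for one tool."""
    called = [(p, e) for p, e in zip(predicted, expected) if p is not None]
    matches = sum(p == e for p, e in called)
    return {
        "matches": matches,
        "called": len(called),
        "no_call": len(expected) - len(called),
    }


summary = {tool: summarize(p, TRADITIONAL) for tool, p in PREDICTIONS.items()}
for tool, s in summary.items():
    print(f"{tool}: {s['matches']}/{s['called']} called strains match, "
          f"{s['no_call']} without a prediction")
```

Under these scoring assumptions, Tool A agrees on 4 of 8 strains, Tool B v1 on all 3 strains it called, and Tool B v2 on all 7 strains it called.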

New edition of the scheme - 2020

Challenges

• Different pipelines

– different results

• Different versions of same pipeline

– different results

Drawbacks

• Interpretation of WGS data

• A set of standardized tools and guidelines is not defined yet

• Cost?

• Data storage

– WGS generates large amounts of data

– requires both physical space and virtual space

• Internet connection/speed

– The large amounts of data generated by WGS need to be transferred through the Internet to be available and of benefit to the global community

Benefits of WGS

• Performance

– far superior resolution

– provides more information on pathogens

• Ease of sharing

– data can be easily exchanged electronically around the globe

– data can be stored in repositories (e.g. NCBI, EBI)

– the genomic data can be reanalyzed locally at any time

– local pathogens can easily be compared with other sequences in publicly available international databases, allowing the local outbreak to be interpreted in an international context

• Universality

– universal across all pathogens

– no species-specific primers needed

– no species-specific enzymes needed

Thank you for your attention
