Sequence quality: GMI Proficiency Tests for Whole Genome Sequencing of bacteria Research Group of Genomic Epidemiology National Food Institute, Technical University of Denmark EURL-AR Training course 2017 Presented by Pimlapas Leekitcharoenphon (Shinny) (DTU-Food)


24/09/17



GMI (www.globalmicrobialidentifier.org)


Objectives of GMI PT

•  The main objective of the annual proficiency test (PT) is to facilitate the production of reliable laboratory results of consistently good quality within the area of whole genome sequencing (WGS) by
    – Selecting two strains of three species of public health importance
    – Selecting species that range in sequencing difficulty
    – Assessing the sequencing quality based on a set of quality markers, e.g. N50 and number of contigs, but also the ability to identify epidemiological markers such as MLST and resistance genes
    – Identifying underperforming participants

•  To facilitate harmonization and standardization of whole genome sequencing and data analysis by setting tentative, arbitrary quality control thresholds


Objectives of GMI PT

-  To quantify differences among laboratories in order to facilitate the development of reliable laboratory results of consistently good quality within the area of DNA preparation, sequencing, and analysis (e.g. phylogeny).

-  To facilitate harmonization and standardization in whole genome sequencing and data analysis


Structure of GMI PT, wet-lab

Component 1a
Material provided: Bacterial cultures (lyophilized)
•  DNA extraction, purification
•  Library preparation and whole-genome sequencing of six bacterial cultures

Component 1b
Material provided: Purified DNA (pre-prepared, dried)
•  Library preparation and whole-genome sequencing of the same six bacterial cultures

Results
•  Submission of reads (via a portal or FTP site)
•  Survey response
    – Method details
    – MLST (optional)
    – Resistance genes (optional)


Development of GMI PT

2014 – pilot PT
2015 – 'full roll-out'
    Salmonella (2), E. coli (2), S. aureus (2)
2016
    K. pneumoniae (2), L. monocytogenes (2), C. coli (1), C. jejuni (1)


Participation in the 2016 GMI PT

•  46 laboratories in 22 countries provided data for at least one of the PT components
    – Australia (3), Austria, Belgium (2), Canada (2), Denmark (3), Finland, France, Germany (3), Hong Kong, Italy (7), Latvia, Luxembourg, Mexico, the Netherlands (3), Poland, Portugal, Singapore (2), Sweden (2), Switzerland, Taiwan, the United Kingdom (2), the United States (6)


Measured QC parameters

• Number of reads mapped to
    – reference total DNA sequence
    – reference chromosome
    – reference plasmid #1
    – reference plasmid #2
    – reference plasmid #3
    – and unmapped reads
• Proportion of reads mapped to the above
• Depth of coverage of the above
• Size of assembled genome
• Size of assembled genome per total size of DNA sequence (%)
• Total number of contigs
• Number of contigs > 200 bp
• N50
• NG50
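Three of the measured parameters are simple ratios, and a minimal sketch may make them concrete. The class and function names, the field names and the example numbers below are hypothetical. The depth calculation assumes the common approximation of mapped bases divided by reference length, which the slides do not spell out.

```python
from dataclasses import dataclass


@dataclass
class MappingCounts:
    """Read counts for one sample (all field names are hypothetical)."""
    total_reads: int
    mapped_reads: int        # reads mapped to the reference total DNA sequence
    mean_read_length: float  # average read length in bp


def proportion_mapped(counts: MappingCounts) -> float:
    """Percentage of produced reads that map to the reference."""
    return 100.0 * counts.mapped_reads / counts.total_reads


def depth_of_coverage(counts: MappingCounts, reference_length: int) -> float:
    """Approximate average depth: mapped bases divided by reference length.

    Assumption: this approximation is not stated in the slides.
    """
    return counts.mapped_reads * counts.mean_read_length / reference_length


def assembly_size_ratio(assembly_length: int, reference_length: int) -> float:
    """Size of assembled genome per total size of the reference sequence (%)."""
    return 100.0 * assembly_length / reference_length


# Illustrative numbers only, not taken from the PT data.
sample = MappingCounts(total_reads=1_000_000, mapped_reads=980_000,
                       mean_read_length=150.0)
reference_length = 1_700_000

print(proportion_mapped(sample))
print(depth_of_coverage(sample, reference_length))
print(assembly_size_ratio(1_690_000, reference_length))
```

The same functions could be applied per replicon (chromosome, plasmid #1 to #3) by passing the read count and length for that replicon.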


Individual participant reports

Pending for the 2016 trial


QC parameters output


•  Resistance gene partly as expected •  Resistance gene not as expected •  Resistance gene as expected

•  2 times standard deviation •  3 times standard deviation

•  Data from participants with obvious errors will be omitted prior to analysis
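The 2 and 3 standard deviation markers can be illustrated with a short sketch. It assumes the bands are the mean plus or minus 2 and 3 sample standard deviations of the participants' values, which the slides do not state explicitly. The function name, lab labels and values are hypothetical.

```python
from statistics import mean, stdev


def sd_flags(values: dict[str, float]) -> dict[str, str]:
    """Label each participant by its distance from the mean in units of SD."""
    mu = mean(values.values())
    sd = stdev(values.values())  # sample standard deviation
    flags = {}
    for lab, value in values.items():
        distance = abs(value - mu)
        if sd > 0 and distance > 3 * sd:
            flags[lab] = "beyond 3 SD"
        elif sd > 0 and distance > 2 * sd:
            flags[lab] = "beyond 2 SD"
        else:
            flags[lab] = "within 2 SD"
    return flags


# Illustrative values only: eleven labs at 99% mapped reads and one at 60%.
mapped_pct = {lab: 99.0 for lab in "ABCDEFGHIJK"}
mapped_pct["L"] = 60.0

for lab, flag in sd_flags(mapped_pct).items():
    print(lab, flag)
```

Because an extreme value inflates both the mean and the standard deviation, omitting obviously erroneous submissions before computing the bands, as the slide describes, keeps the thresholds meaningful.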


Proportion of reads mapped to reference DNA sequence (%)

•  The proportion of reads produced which map directly to the closed genome of the same strain (=> cannot exceed 100%)

[Figure: proportion of reads mapped (%), Campylobacter; GMI16-001, #114 omitted; outliers marked]

Only in Bact samples
•  Indication of contamination or strain mix-up
•  #83 and #115 missing reads


Size of assembled genome per total size of DNA sequence (%)

•  The proportion of contigs which map directly to the closed genome of the same strain (=> should not exceed 100%)

[Figure: size of assembled genome per total size of DNA sequence (%), Campylobacter; GMI16-001; outliers marked]

Clearly contamination
•  Assembly exceeds the expected size of the reference
•  #83 and #79 of the DNA and #71 of both sample types


Number of contigs
- Fewer is better

N50
[Figure: N50 illustration; labels: total size of contigs, 50% of size, size of contig, N50]


N50

•  Definition: The length for which the collection of all contigs of that length or longer contains at least half of the sum of the lengths of all contigs, and for which the collection of all contigs of that length or shorter also contains at least half of the sum of the lengths of all contigs. An N50 of more than 15,000 bp normally indicates good quality.

[Figure: N50 (bp), Campylobacter; GMI16-001; threshold line at 15,000 bp; outliers marked]

Poor performance – short contigs
•  #79 and #105 for the Bact sample
•  #71, #105, and #110 for the DNA sample
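The N50 definition can be sketched in a few lines. The sketch walks the contigs from longest to shortest and returns the length at which the running total first reaches half of the target size. NG50 is listed among the QC parameters but not defined in the slides; the version below assumes the common convention of using the reference genome size instead of the assembly size as the target. Function names are hypothetical.

```python
def _nx(contig_lengths: list[int], target_total: int) -> int:
    """Length of the contig at which the cumulative size reaches half the target."""
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= target_total:
            return length
    return 0  # the contigs cover less than half of target_total


def n50(contig_lengths: list[int]) -> int:
    """N50: the target is the total size of all contigs in the assembly."""
    return _nx(contig_lengths, sum(contig_lengths))


def ng50(contig_lengths: list[int], reference_length: int) -> int:
    """NG50 (assumed convention): the target is the reference genome size."""
    return _nx(contig_lengths, reference_length)


# Illustrative contig lengths, not taken from the PT data.
contigs = [80, 70, 50, 40, 30, 20]      # total size 290
print(n50(contigs))                      # 70, since 80 + 70 = 150 >= 145
print(ng50(contigs, 400))                # 50, since 80 + 70 + 50 = 200 >= 200
```

Unlike N50, NG50 does not improve when an assembly is simply smaller than the genome it represents, which makes it easier to compare across participants sequencing the same strain.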


Total number of contigs

•  The total number of contigs assembled. Fewer than 1,000 contigs normally indicates good quality.

[Figure: total number of contigs, Campylobacter; GMI16-001; threshold line at 1,000; outliers marked]

Poor performance – large number of contigs
•  #71, #79 and #105 for the Bact sample
•  #71 and #105 for the DNA sample


SNP analysis

Number of SNPs per strain (Campylobacter; GMI16-001)

Strain      Sample type   Number of SNPs
GMI16-001   Culture       3
GMI16-001   DNA           0

•  3 SNPs difference to the ref. (#83)


Overall results – poor performance

• Obvious outliers removed; #114 submitted data of another strain

• #83 Bact, indication of contamination
    – Detected AMR genes not present in the reference genome
    – Proportion of reads mapping to ref. much less than 100%
    – Proportion of size per total size of ref. much higher than 100%
    – 3 SNPs difference to the ref.

• #79 Bact, indication of contamination and poor performance
    – Detected AMR genes not present in the reference genome
    – Proportion of size per total size of ref. much higher than 100%
    – Total no. of contigs higher than 1,000
    – N50 lower than 15,000 bp


Overall results – poor performance, cont.

• #115 Bact, indication of contamination
    – Proportion of reads mapping to ref. much less than 100%

• #71, both sample types, indication of contamination and poor performance
    – Proportion of size per total size of ref. much higher than 100%
    – Total no. of contigs higher than 1,000
    – N50 lower than 15,000 bp for DNA

• #105, both sample types, indication of poor performance
    – Total no. of contigs higher than 1,000
    – N50 lower than 15,000 bp for DNA


Overall results – poor performance, cont.

• #110, both sample types, indication of poor performance
    – N50 lower than 15,000 bp for DNA
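The overall results combine a few threshold checks per submission, which can be sketched as follows. The 15,000 bp and 1,000 contig limits are the tentative thresholds quoted in the slides. The slides only say "much higher than 100%" for the assembly size ratio, so the `size_ratio_limit` parameter and its default are assumptions, as are the function name and the example values.

```python
N50_MIN_BP = 15_000   # tentative threshold from the slides
MAX_CONTIGS = 1_000   # tentative threshold from the slides


def qc_flags(n50_bp: int, n_contigs: int, size_ratio_pct: float,
             size_ratio_limit: float = 100.0) -> list[str]:
    """Return the list of QC issues for one submission (empty if none).

    size_ratio_limit is a hypothetical cut-off; the slides give no exact value.
    """
    flags = []
    if n50_bp < N50_MIN_BP:
        flags.append("N50 lower than 15,000 bp")
    if n_contigs > MAX_CONTIGS:
        flags.append("total no. of contigs higher than 1,000")
    if size_ratio_pct > size_ratio_limit:
        flags.append("assembly size exceeds the reference size")
    return flags


# Illustrative submissions, not taken from the PT data.
print(qc_flags(n50_bp=120_000, n_contigs=45, size_ratio_pct=99.2))
print(qc_flags(n50_bp=8_000, n_contigs=1_450, size_ratio_pct=131.0))
```

A submission with several flags at once, such as an oversized assembly together with many short contigs, matches the pattern the slides describe as contamination combined with poor performance.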


Summary of PT 2016

• The interpretation of the MLST data and the final layout of the QC are pending but scheduled to be finished in May '17

• Individual participant reports to be disseminated before July '17

• PT report 2016 online before July '17 and 2015 report before Sep '17

• Satisfactory results for most labs, except for
    – #71, #79, #83, #114, #115 due to contamination
    – #71, #79, #105, #110 due to poor sequencing performance

• Continuation in 2017 focusing on Salmonella, E. coli and S. aureus


Acknowledgement


Oksana Lukjancenko (DTU Food)

Susanne Klarsmose Pedersen (DTU Food)

Pimlapas Leekitcharoenphon (DTU Food)

Rolf Sommer Kaas (DTU Food)

Inge Marianne Hansen (DTU Food)

Jacob Dyring Jensen (DTU Food)

Frank Aarestrup (DTU Food)

Ole Lund (DTU Systems Biology)

Jose Luis Bellod Cisneros (DTU Systems Biology)

James Pettengill (US FDA)

Division of Microbiology (CFSAN/FDA)

Anthony Underwood (PHE)

Brian Beck (Microbiologics)

Isabel Cuesta de la Plaza (ISCIII)

Angel Zaballos (ISCIII)

Jorge De La Barrera Martinez (ISCIII)

…..and the rest of WG 4 (‘advisory group’)

GMI is supported by:


Thank you for your attention

Pimlapas Leekitcharoenphon (Shinny), PhD

Research Group Genomic Epidemiology

WHO Collaborating Centre for Antimicrobial Resistance in Foodborne Pathogens and Genomics

European Union Reference Laboratory for Antimicrobial Resistance

National Food Institute, Technical University of Denmark

[email protected]