The Comprehensive Antibiotic Resistance Database: A Platform for Antimicrobial Resistance Surveillance
Andrew G. McArthur, Ph.D.
Michael G. DeGroote Institute for Infectious Disease Research
McMaster University, Hamilton, Ontario, Canada
arpcard.mcmaster.ca
Global AMR Crisis
Which resistance genes are present?
Which genes are moving around?
Which pose a threat?
NDM-1 metallo-beta-lactamase
Promise of Molecular Surveillance
• retrospective & survey DNA sequencing
• clinical DNA sequencing
PHAC’s ‘One Health’ Concept
Wright. 2010. Expert Opin. Drug Discov. 5:779-788.
A Wicked Problem
• No single solution – but critical demand for action
• Very complex – requires the combined expertise of many individuals,
groups, and disciplines
• Working in ‘silos’ without collaboration does not solve wicked problems
• Solutions require a very comprehensive and interdisciplinary strategy
• AMR gene surveillance a key component
• genes and pathogens we are tracking
• characterized genes we don’t routinely track; variants of known genes
• emergent threats
Promise of Molecular Surveillance
• DNA sequencing
• comparison to reference sequences
• prediction of resistome
• prediction of phenotype & antibiogram
Comprehensive Antibiotic Resistance Database
• High quality reference data on the molecular basis of AMR – expert
curation.
• Organized by the Antibiotic Resistance Ontology (ARO), a theoretical
framework for organizing antibiotic resistance information.
• Breadth of data – AMR via horizontal gene transfer (HGT) +
comprehensive mutation data for genome-based AMR.
• Advanced analytics – predicts resistome based on both sequence
similarity and mutant detection.
• Discovery – development of methods for detection of new variants and
emergent threats
• Growth – constantly curated resource + building algorithms for
detection of AMR mechanisms not examined by other databases (rRNA
mutations, van clusters, etc.)
Comprehensive Antibiotic Resistance Database
• A controlled vocabulary for the codification of drugs, targets, resistance
genes, and mechanisms.
• Resistance terms are linked together by a set of relationships.
• Allows for computation over a network of curated AMR knowledge.
• An essential step towards the development of standards for data sharing
among research teams.
• Development to date has been a scrum-style accumulation of ontology
terms and metadata
• Now moving to systematic approaches to ARO development: ESKAPE pathogens; formal ontology structures; computer-assisted literature curation
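The idea of computing over a network of linked resistance terms can be illustrated with a small sketch. The triples below are simplified examples written for this illustration, not actual ARO entries, and the function names are hypothetical; only the `confers_resistance_to` relationship name comes from the slides.

```python
from collections import defaultdict

# Illustrative triples only; term names are simplified, not actual ARO entries.
TRIPLES = [
    ("NDM-1", "is_a", "class B (metallo-) beta-lactamase"),
    ("class B (metallo-) beta-lactamase", "is_a", "beta-lactamase"),
    ("beta-lactamase", "confers_resistance_to", "beta-lactams"),
]

def build_graph(triples):
    """Index (subject, relation, object) triples by subject."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph

def drug_classes(graph, term):
    """Walk is_a links upward and collect confers_resistance_to targets."""
    found, stack, seen = set(), [term], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for relation, target in graph.get(node, []):
            if relation == "confers_resistance_to":
                found.add(target)
            elif relation == "is_a":
                stack.append(target)
    return found

GRAPH = build_graph(TRIPLES)
print(drug_classes(GRAPH, "NDM-1"))
```

Because the drug relationship is attached to a parent term, a specific gene inherits it through its `is_a` links, which is one reason a controlled vocabulary with explicit relationships is easier to compute over than free-text annotation.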
Antibiotic Resistance Ontology
Detection Model Ontology
• DNA sequencing
• comparison to reference sequences
• prediction of resistome
• prediction of phenotype & antibiogram
• detection models & parameters
Detection Model Ontology
• protein homolog model
• protein variant model
• rRNA mutation model
• absent protein homolog model – resistance by absence
• gene order model – functional glycopeptide resistance clusters
• multiple gene mutations model – resistance by co-mutation
• regulatory models – efflux upregulation by mutations in promoters
Detection Model Ontology
• protein homolog model: reference sequence; BLASTP cut-off
• protein variant model: reference sequence (often sensitive wildtype); BLASTP cut-off; mutations (SNPs, indels, etc.)
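The parameters listed for the protein homolog and protein variant models can be pictured as simple records. This is a minimal sketch under stated assumptions: the class names, field names, toy sequence, cut-off value, and mutation are all invented for illustration and are not CARD's actual schema, and the check handles substitutions only.

```python
from dataclasses import dataclass, field

@dataclass
class ProteinHomologModel:
    name: str
    reference_sequence: str   # protein sequence of the curated reference
    blastp_cutoff: float      # similarity cut-off for a hit to count

@dataclass
class ProteinVariantModel(ProteinHomologModel):
    # Each mutation is (wild-type residue, 1-based position, resistant residue).
    mutations: list = field(default_factory=list)

def resistance_mutations(model, query):
    """Return curated substitutions present in an ungapped, same-length query."""
    if len(query) != len(model.reference_sequence):
        raise ValueError("sketch handles substitutions only; align sequences first")
    hits = []
    for wild, pos, mutant in model.mutations:
        if model.reference_sequence[pos - 1] == wild and query[pos - 1] == mutant:
            hits.append(f"{wild}{pos}{mutant}")
    return hits

# Hypothetical toy model: sequence, cut-off, and mutation are invented.
toy = ProteinVariantModel("toy target", "MKSLDA", 500.0, [("S", 3, "L")])
print(resistance_mutations(toy, "MKLLDA"))  # query carries S3L
print(resistance_mutations(toy, "MKSLDA"))  # wild-type query
```

The variant model extends the homolog model because it needs the same reference sequence and cut-off plus the mutation list; a similar hit alone is not enough to call resistance when the reference is a sensitive wildtype.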
Detection Model Ontology
• gene order meta-model
• secondary interpretation of resistome predictions
• a step towards prediction of phenotype
functional VanA operon meta-model: vanR vanS vanH vanA vanX vanY vanZ
Figure labels: genome sequence; vanR, vanS, vanH, vanA, vanX, vanY, vanZ, vanC, vanXYc, vanT, vanBc, vanDc, etc.
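A gene order meta-model can be read as a second pass over the resistome prediction. The sketch below assumes that a functional VanA operon means all seven genes occurring contiguously and in the order shown on the slide; that criterion, the function name, and the example hit lists are assumptions for illustration, not CARD's curated rule.

```python
VANA_OPERON = ["vanR", "vanS", "vanH", "vanA", "vanX", "vanY", "vanZ"]

def has_ordered_cluster(hits, cluster=VANA_OPERON):
    """True if `cluster` occurs as a contiguous run, in order, within `hits`.

    `hits` is the list of gene names predicted along one contig, in
    genomic order. Strand and spacing are ignored in this sketch.
    """
    n = len(cluster)
    return any(hits[i:i + n] == cluster for i in range(len(hits) - n + 1))

# Hypothetical resistome predictions for two contigs.
complete = ["geneX", "vanR", "vanS", "vanH", "vanA", "vanX", "vanY", "vanZ"]
partial = ["vanR", "vanS", "vanA", "vanX"]  # vanH missing
print(has_ordered_cluster(complete))
print(has_ordered_cluster(partial))
```

The point of the second pass is that individual van gene hits are not by themselves evidence of a functional cluster.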
RGI
3458 Ontology Terms, 3228 Reference Sequences, 703 SNPs, 1979
Publications, 2263 AMR Detection Models
• McArthur et al. 2013. The Comprehensive Antibiotic Resistance Database. Antimicrobial Agents and Chemotherapy, 57, 3348-3357.
• Cited 158 times since publication
• Data used for genome analysis
• Data used to build AMR models (e.g. ResFams)
• Data used to enrich a database (e.g. IslandViewer, ARG-ANNOT)
• RGI used to predict resistome
• Averaging ~3000 visits per month and ~300 RGI analyses per month
List of antibiotic resistance genes curated in the CARD as of February 2015. Each entry provides citation, molecular sequence, protein structure, mechanism, and Antibiotic Resistance Ontology (ARO) classification details. * There are a large number of antibiotic efflux pump proteins; see the CARD for the complete list. Efflux pumps are cross-referenced to the antibiotic molecules exported.
aminocoumarins
• aminocoumarin resistant DNA topoisomerases: aminocoumarin resistant GyrB, ParE, ParY
aminoglycosides
• aminoglycoside acetyltransferases: AAC(1), AAC(2'), AAC(3), AAC(6')
• aminoglycoside nucleotidyltransferases: ANT(2''), ANT(3''), ANT(4'), ANT(6), ANT(9)
• aminoglycoside phosphotransferases: APH(2''), APH(3''), APH(3'), APH(4), APH(6), APH(7''), APH(9)
• 16S ribosomal RNA methyltransferases: ArmA, RmtA, RmtB, RmtC, Sgm, etc.
β-lactams
• class A β-lactamases: AER, BLA1, CTX-M, KPC, SHV, TEM, etc.
• class B (metallo-) β-lactamases: BlaB, CcrA, IMP, NDM, VIM, etc.
• class C β-lactamases: ACT, AmpC, CMY, LAT, PDC, etc.
• class D β-lactamases: OXA β-lactamases
• methicillin resistant PBP2: mecA, mecC
• mutant porin proteins conferring antibiotic resistance: antibiotic resistant Omp36, OmpF, PIB (por)
• genes modulating β-lactam resistance: bla (blaI, blaR1) and mec (mecI, mecR1) operons
chloramphenicol
• chloramphenicol acetyltransferase (cat)
• chloramphenicol phosphotransferase
• chloramphenicol/florfenicol export proteins: cmlA, floR, MexEF-OprN, AcrAB-TolC, etc.
elfamycins
• elfamycin resistant elongation factor Tu: facT
ethambutol
• ethambutol resistant arabinosyltransferase (EmbB)
fluoroquinolones
• fluoroquinolone acetyltransferase
• fluoroquinolone resistant DNA topoisomerases: fluoroquinolone resistant GyrA, GyrB, ParC
• quinolone resistance proteins (Qnr)
fosfomycin
• fosfomycin phosphotransferases: FomA, FomB, FosC, FosC2
• fosfomycin thiol transferases: FosA, FosB, FosX
• fosfomycin resistant murA
fusidic acid
• fusidic acid esterase fusH
glycopeptides
• bleomycin resistance protein (BRP)
• teicoplanin resistance protein vanJ
• glycopeptide resistance clusters: VanA, VanB, VanC, VanD, etc.
lincosamides
• Cfr 23S ribosomal RNA methyltransferase: clbA, clbB, clbC
• Erm 23S ribosomal RNA methyltransferases: ErmA, ErmB, Erm(31), etc.
• lincosamide nucleotidyltransferase (Lin)
linezolid
• Cfr 23S ribosomal RNA methyltransferase
lipopeptide antibiotics
• polymyxin resistance genes arnA, rosA, rosB
• daptomycin resistance liaFSR
• antibiotic resistant cardiolipin synthetase (cls)
• lipopeptide resistant beta-subunit of RNA polymerase
macrolides
• Cfr 23S ribosomal RNA methyltransferase
• Erm 23S ribosomal RNA methyltransferases: ErmA, ErmB, Erm(31), etc.
• non-Erm 23S ribosomal RNA methyltransferase: rlmA(II), chrB, myrA, tlrB
• macrolide esterases: EreA, EreB
• macrolide glycosyltransferases: GimA, Mgt, Ole
• macrolide phosphotransferases (MPH): MPH(2')-I, MPH(2')-II, MPH-C
mupirocin
• mupirocin resistant isoleucyl-tRNA synthetases: MupA, MupB
peptide antibiotics
• integral membrane protein MprF
• bacitracin resistance genes bacA, bcrC
• non-Erm 23S ribosomal RNA methyltransferase tsnr
• viomycin phosphotransferase
phenicol
• Cfr 23S ribosomal RNA methyltransferase
• chloramphenicol/florfenicol resistance protein (cmlA)
rifampin
• rifampin ADP-ribosyltransferase (Arr)
• rifampin glycosyltransferase
• rifampin monooxygenase
• rifampin phosphotransferase
• rifampin resistance RNA polymerase-binding proteins: DnaA, RbpA
• rifampin resistant beta-subunit of RNA polymerase (RpoB)
streptogramins
• Cfr 23S ribosomal RNA methyltransferase
• Erm 23S ribosomal RNA methyltransferases: ErmA, ErmB, Erm(31), etc.
• streptogramin Vgb lyase
• Vat acetyltransferase
streptothricin
• streptothricin acetyltransferase (sat)
sulfonamides
• sulfonamide resistant dihydropteroate synthases: Sul1, Sul2, Sul3, sulfonamide resistant FolP
tetracyclines
• mutant porin PIB (por) with reduced permeability
• tetracycline inactivation enzyme TetX
• tetracycline resistance ribosomal protection proteins: TetM, TetO, TetQ, Tet32, Tet36, etc.
trimethoprim
• trimethoprim resistant dihydrofolate reductase dfr
tunicamycin
• tunicamycin binding protein tmrB
efflux pumps conferring antibiotic resistance *
• ATP-binding cassette (ABC) antibiotic efflux pumps
• major facilitator superfamily (MFS) antibiotic efflux pumps
• multidrug & toxic compound extrusion (MATE) transporters
• resistance-nodulation-cell division (RND) efflux pumps
• small multidrug resistance (SMR) antibiotic efflux pumps
genes modulating antibiotic efflux
• adeR, acrR, baeSR, mexR, phoPQ, mtrR, etc.
Analyzing the Resistome
• perfect match screening
• novel functional variants
• novel emergent AMR genes
• dedicated resistance genes; often plasmid-borne
• resistance by mutation; predominantly genomic
• intrinsic resistance; regulatory
• whole genome sequencing (WGS); whole AMR genes
• whole community sequencing (metagenomics); partial gene sequencing
Resistance Gene Identifier
• perfect: matches reference sequence (“known known”)
• strict: similarity within model (“known unknown”)
• loose / discovery: similarity outside of model (“unknown unknown”)
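The three categories can be sketched as a small decision function. This is an illustration under assumptions, not the RGI source code: it assumes a similarity score (here called `bitscore`) has already been computed by a BLASTP search, that "within model" means the score meets the model's curated cut-off, and that "perfect" means an exact match to the reference sequence. The sequences and scores are invented.

```python
def classify_hit(query, reference, bitscore, cutoff):
    """Assign a hit to one of the three RGI categories named above.

    Assumptions: `bitscore` comes from a BLASTP search run elsewhere and
    `cutoff` is the curated cut-off stored in the detection model.
    """
    if query == reference:
        return "perfect"          # matches reference sequence
    if bitscore >= cutoff:
        return "strict"           # similarity within model
    return "loose / discovery"    # similarity outside of model

# Invented toy values, for illustration only.
print(classify_hit("MKSLDA", "MKSLDA", 12.0, 10.0))
print(classify_hit("MKSLDV", "MKSLDA", 11.0, 10.0))
print(classify_hit("MKTLEV", "MKSLDA", 4.0, 10.0))
```

Checking for the exact match first keeps the categories mutually exclusive, since an identical sequence would also pass the cut-off test.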
Resistance Gene Identifier
McMaster University Hospital – clinical MDR Klebsiella pneumoniae isolate
Resistance Gene Identifier
• RGI designed for whole genomes or assembly contigs
• BLAST-based homology analysis + SNP mapping
• Thus effective based on CARD’s canonical reference sequences
• Is CARD data effective for metagenomics analysis?
• Active area of RGI development
• CARD’s canonical reference sequences may under-represent nucleotide
sequence diversity for AMR genes ‘in the wild’
• Burrows-Wheeler Transform read mapping to CARD:
• has a false negative rate due to missing sequence diversity in CARD
• needs to incorporate SNP screening for total resistome prediction
Resistance Gene Identifier
Freschi et al. 2015. Frontiers in Microbiology 6:01036.
Resistance Gene Identifier
• RGI has false positives for a Brazilian epidemic strain of Pseudomonas aeruginosa
• Not a failure of the RGI algorithms, but instead a gap in CARD curation
• blaSPM-1 included in the Antibiotic Resistance Ontology but models not yet
added
• Bicyclomycin resistance not yet curated into CARD, including the bcr-1 gene
CARD
human curation
AMR detection model with reference sequence, cut-offs, and parameters
• Reference sequence must be published and have GenBank accession
• Publication must have clear evidence of resistance
• CARD*Shark text mining algorithms to prioritize and triage AMR literature
• Wild*CARD algorithms
• New distribution of CARD reference sequences among pathogens, genomes, plasmids
• Detection of sequence variants that have been published and have GenBank accessions
• Detection of novel sequence variants among pathogens
Downloads
• Ontology OBO, JSON, tab-delimited files
• Antibiotic Resistance Ontology
• Detection Model Ontology
• NCBI Taxon Ontology
• CARD Detection Models
• JSON format
• Mutation catalogs
• Nucleotide & Protein Sequences
• FASTA by model type (i.e. homolog screening versus mutant screening)
• ARO classification and index files
• Stand-alone, command line Resistance Gene Identifier software
• Commercial use?
• With a license or collaborative agreement
Frequently Asked Questions
• What data can be included? Can I add unpublished data?
• Only peer-reviewed, published data that are also associated with a GenBank accession can be included in the CARD. SNPs and other mutations not known from clinical settings are currently excluded.
• How do I find a list of all resistance genes in a particular organism?
• CARD is based on reference gene sequences, so does not fully annotate
genomes.
• For intrinsic resistance genes for which resistance is conferred by specific
mutations, does CARD include all known mutant sequences?
• CARD does not store every mutant sequence; it maintains a complete list of resistance SNPs relative to a reference sequence, which may be either a reported mutant sequence or a wild-type sequence. As such, it is important that SNP mapping be included in the analysis of any gene that requires mutation to confer resistance.
Frequently Asked Questions
• How are Minimum Inhibitory Concentration (MIC) data curated?
• The CARD does not yet curate MIC data directly, but instead records the
resistance profile of resistance genes, e.g. beta-lactamases
confers_resistance_to beta-lactams.
• Can CARD and the RGI accurately predict antibiogram?
• Curation of confers_resistance_to_drug relationships for accurate
prediction of antibiogram is currently inconsistent throughout the CARD
and our RGI software is focussed primarily upon accurate prediction of
resistome, not antibiogram.
• Does the Resistance Gene Identifier (RGI) work for metagenomics data?
• Currently the RGI does not analyze metagenomics data, outside of a simple
BLASTX algorithm (with SNP screening) available as a beta-test feature.
The default RGI behaviour attempts to predict complete open reading
frames (ORFs) from submitted nucleotide data, which will fail for short
metagenomic reads.
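Why ORF prediction fails on short reads can be seen with a deliberately naive ORF finder. This sketch is not the method RGI uses; it scans only the forward strand for an ATG followed by an in-frame stop codon, and the sequences are invented for illustration.

```python
STOPS = {"TAA", "TAG", "TGA"}

def complete_orfs(seq, min_codons=3):
    """Return forward-strand ORFs that have both an ATG start and a stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append(seq[start:i + 3])
                start = None
    return orfs

gene = "CCATGAAAGGTCTGTAACC"   # invented sequence with one complete ORF
read = gene[4:14]              # a short read from inside the gene
print(complete_orfs(gene))
print(complete_orfs(read))
```

The read lies entirely inside the gene, so it contains neither the start nor the stop codon and no complete ORF is reported, even though it is genuine gene sequence. This is the situation a translated search such as BLASTX avoids by not requiring a complete ORF.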
Challenges
• Biocuration of reference data is difficult to fund, but without gold standard
reference data the power of NGS erodes
• AMR genomic data is never static – unlike traditional biocuration, we are not
annotating a static genome – the target is always moving
• Sequencing technology is getting smaller and cheaper – we will be collecting
increasing amounts, even at the level of clinic or farm – we need a far-reaching
data storage and sharing paradigm
• Metagenomics data is difficult and computationally costly – we are currently
only analyzing the easiest subsets of AMR mechanisms; mutation often ignored
• Easy to use translational tools increasingly needed!
What we are not currently collecting
• Minimum inhibitory concentration (MIC) – GenBank is planning MIC collection
• Pathogen and AMR gene prevalence
• Poor curation of data on plasmids and other aspects of the mobilome
• Environmental and clinical metadata
Thanks!
Amos Raphenya, Brian Alcock, Biren Dave, Kara Tsang, Briony Lago
Justin Jia
Arjun Sharma
Pearl Guo
(UWaterloo)
Sachin Doshi Tariq Elsayegh
(Royal College of
Surgeons in Ireland)
Daim Sardar
Thanks!
amr.mcmaster.ca
• Dr. Gerry Wright (McMaster) and the CARD Consortium
• GenEpiO Consortium
• CARD & RGI Beta-testers!
• McMaster University & Cisco Systems Canada, Inc.
• Sightline Innovation, Inc.
• National Microbiology Laboratory, Canada – IRIDA Surveillance Project
• United States Department of Agriculture – Agricultural Research Service
• National Center for Biotechnology Information (NCBI, USA)
• bioMérieux France, Inc.
• Pseudomonas Consortia – Dr. Roger C. Lévesque, Laval University
• IslandViewer Database – Dr. Fiona Brinkman, UBC
• Dr. Laura Piddock (UK) – efflux in the ARO & CARD
• β-lactamase nomenclature and informatics consortium
• Many more…