How Can We Make Genomic Epidemiology a Widespread Reality? William Hsiao, Ph.D. [email protected] @wlhsiao BC Public Health Microbiology and Reference Laboratory BCCDC Grand Round May 26 2015




Outline

• Part 1: What is genomic epidemiology, and why is it important for public health microbiology?

• Part 2: What are the requirements to bring genomic epidemiology to routine public health practice?
– Introducing our project IRIDA as part of the solution


Source: Peter Gleick, Scienceblogs.com


People, Place, Time

Source: Melanie Courtot



Molecular Epidemiology

• Laboratory generated biomarker results can be correlated to epidemiological investigations (People, Place, Time)

• Provides linkage based on common exposure to the same pathogen at the molecular level

• Most tests detect one or a few specific biomarkers, representing only a fraction of the pathogen’s genetic information


Current Methods of Characterizing Foodborne Pathogens in a Public Health Laboratory

• Growth characteristics
• Phenotypic panels
• Agglutination reactions
• Enzyme immunoassays (EIAs)
• PCR
• DNA arrays (hybridization)
• Sanger sequencing of marker genes
• DNA restriction
• Electrophoresis (PFGE, capillary)

Each pathogen is characterized by methods specific to that pathogen, with a separate workflow for each pathogen. Turnaround time (TAT): 5 min to weeks (months)

Source: Rebecca Lindsey


Genomic Epidemiology

Definition: Using whole genome sequencing data from pathogens, together with epidemiological investigations, to track the spread of an infectious disease


Why Genomic Epidemiology

• One technology (DNA sequencing) compatible with many types of pathogens

• Capable of generating tens to thousands of high-quality pathogen genomes within 1-7 days


Sequencing = lots of HQ Data

• Capture the pathogen’s entire genetic makeup
• Unbiased (~97-99+% of the genome captured using common sequencing approaches)
• Significantly more data than traditional methods
• Allow higher resolution and higher sensitivity methods to be applied
• Allow value-added evolutionary & functional study of the pathogens
– Virulence factors
– AMR genes


Sequencing cost continues to drop

[Figure: sequencing cost over time; labeled points: $100M per human genome; $10K per human genome or $10 per bacterial genome]


Variations in genomes = Basis of Comparison

• Mutations
– Point mutations
– Small insertions and deletions (indels)
– Can change functions of a gene

• Recombination, deletion, and duplication
– Rearrange genes, can change expression
– Increase gene copy number
– Delete genes

• Horizontal gene transfer
– Acquiring genetic material from a non-parental organism
– E.g. antibiotic resistance / new toxins


SNP Analysis

• What is a SNP?
– A SNP (single nucleotide polymorphism) is a DNA sequence variation occurring when a single nucleotide differs between two or more genomes

ATCGCGATATCATACGG
ATCGCAATATCATACGG
ATCGCGATATCATACGG
ATCGCGATATCATACGG
ATCGCAATATCATACGG

• A SNP can be created by a point mutation, but can also be created by the insertion or deletion of one nucleotide
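A minimal sketch of how the variable position in the five sequences above could be found, assuming the sequences are already aligned and of equal length. The function name is illustrative only and is not taken from any IRIDA pipeline.

```python
# Illustrative only: the five 17-base sequences from the slide,
# assumed to be pre-aligned (same length, same coordinates).
sequences = [
    "ATCGCGATATCATACGG",
    "ATCGCAATATCATACGG",
    "ATCGCGATATCATACGG",
    "ATCGCGATATCATACGG",
    "ATCGCAATATCATACGG",
]


def find_snp_positions(aligned):
    """Return 1-based positions where the aligned sequences do not all share one base."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    positions = []
    for i in range(length):
        if len({seq[i] for seq in aligned}) > 1:
            positions.append(i + 1)
    return positions


snp_positions = find_snp_positions(sequences)
# Bases observed at each variable position
alleles = {pos: sorted({seq[pos - 1] for seq in sequences}) for pos in snp_positions}
print(snp_positions, alleles)
```

Real SNP calling works from sequencing reads mapped to a reference and has to handle coverage and quality; this sketch only shows the comparison step.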


Why are SNPs useful

• Silent mutations that do not change protein sequences happen quite frequently due to DNA replication errors => High Resolution

• SNPs occur across the whole genome and can be detected from whole genome sequencing => Unbiased markers

• SNPs can also be used to infer phylogeny of organisms
– More shared SNPs = more closely related


SNP Minimal Spanning Tree – colored by Phage Type

Phage types shown: PT8, PT4, PT13a, PT52

The most similar isolates are connected first => clustering them together
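The "most similar isolates are connected first" idea can be sketched with Kruskal's algorithm over pairwise SNP distances. This is an assumption for illustration: the slide does not say which algorithm produced the tree, and the isolate names and SNP profiles below are made up.

```python
from itertools import combinations

# Hypothetical isolates with made-up SNP profiles (one character per SNP site)
profiles = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "ACGTACTA",
    "D": "TTGAACTA",
}


def snp_distance(a, b):
    """Count the sites at which two equal-length profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(profiles):
    """Kruskal's algorithm: take edges from smallest distance up, skipping cycles."""
    edges = sorted(
        (snp_distance(profiles[u], profiles[v]), u, v)
        for u, v in combinations(sorted(profiles), 2)
    )
    parent = {name: name for name in profiles}

    def find(x):
        # Follow parent links to the representative of x's component
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for dist, u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # joining two separate components creates no cycle
            parent[root_u] = root_v
            tree.append((u, v, dist))
    return tree


mst = minimum_spanning_tree(profiles)
print(mst)
```

Isolates A, B, and C end up chained by 1-SNP edges, while D attaches through its closest neighbour at 3 SNPs, which is the clustering effect the slide describes.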


SNP Minimal Spanning Tree – colored by outbreaks


Many phylogenetic trees based on SNPs published to show clustering of outbreak cases

den Bakker et al Emerg Infect Dis. 2014 Aug;20(8)

Non-related cases

Outbreak cases

Allard, M et al. PLoS ONE 8 (1) 2013


Forces Driving Pathogen Genome Evolution

Specialization: “lean and mean”

New function can be derived through:

Gene expression can be turned on and off


Intra-cluster distances overlap with inter-cluster distances

Leekitcharoenphon, et al. 2014. PLoS ONE 9 (2). doi:10.1371/journal.pone.0087991.


Different species have different clustering distances

Leekitcharoenphon, et al. 2014. PLoS ONE 9 (2). doi:10.1371/journal.pone.0087991.


Genomics + Epidemiology

• Genetic distance information alone may not be enough to fully characterize outbreaks

• It needs to be combined with epidemiological investigations

• Known clusters can be used to establish (sub-)species-specific genetic distance criteria

• Genomics can help connect previously unlinked cases and uncover new cases
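A minimal sketch of why distance alone may not be enough, assuming pairwise SNP distances and epidemiologically confirmed cluster labels are available. All sample names, cluster labels, and distances are invented for illustration.

```python
# Hypothetical epidemiologically confirmed cluster membership
cluster_of = {"s1": "outbreak1", "s2": "outbreak1", "s3": "outbreak2", "s4": "outbreak2"}

# Hypothetical pairwise SNP distances
distances = {
    ("s1", "s2"): 3,
    ("s3", "s4"): 12,
    ("s1", "s3"): 10,
    ("s1", "s4"): 40,
    ("s2", "s3"): 35,
    ("s2", "s4"): 38,
}


def split_distances(distances, cluster_of):
    """Separate within-cluster (intra) from between-cluster (inter) distances."""
    intra, inter = [], []
    for (a, b), dist in distances.items():
        if cluster_of[a] == cluster_of[b]:
            intra.append(dist)
        else:
            inter.append(dist)
    return intra, inter


intra, inter = split_distances(distances, cluster_of)
# If the largest intra-cluster distance reaches the smallest inter-cluster
# distance, no single cutoff separates outbreak from non-outbreak pairs.
overlap = max(intra) >= min(inter)
print(intra, inter, overlap)
```

In this made-up data the ranges overlap (12 within a cluster versus 10 between clusters), so any cutoff would misclassify at least one pair, and epidemiological evidence is needed to resolve it.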


Each year, one in eight Canadians (or four million people) get sick with a domestically acquired food-borne illness.

http://www.phac-aspc.gc.ca/efwd-emoha/efbi-emoa-eng.php


Whole Genome Sequencing of Foodborne Pathogens Around the World

• UK Public Health England committed to sequence all the Salmonella isolates submitted to PH Lab

• US FDA and CDC (supported by National Center for Biotechnology Information) created a distributed network of labs to utilize WGS for pathogen identification

https://publichealthmatters.blog.gov.uk/2014/01/20/innovations-in-genomic-sequencing/
http://www.fda.gov/Food/FoodScienceResearch/WholeGenomeSequencingProgramWGS/ucm363134.htm


Genome Canada Bioinformatics Competition: Large-Scale Project

“A Federated Bioinformatics Platform for Public Health Microbial Genomics”

Our Goal

The IRIDA platform (Integrated Rapid Infectious Disease Analysis)

An open source, standards compliant, high quality genomic epidemiology analysis platform based on web-technology to support real-time (food-borne) disease outbreak investigations

www.IRIDA.ca


Partnership among public health agencies and academic institutes to bridge the gaps between advancements in genomic epidemiology and application to real-life and real-time use cases in public health agencies

- Project Team has direct access to state of the art research in academia
- Project Team is directly embedded in user organization


IRIDA Project Phases

• Phase 1: genomics process and analysis pipeline to produce categorical data (MLST and SNPs) suitable for current epidemiological analysis – almost completed

• Phase 2: combine the categorical data with epidemiological data (line list approach to replace current Excel based approach) – in progress

• Phase 3: Develop IRIDA as an exploratory platform for new ways of interpreting genomics data in light of epidemiological and clinical data – in progress; continuous process beyond current project


Interviews with key personnel to identify barriers to implementing genomic epidemiology in public health agencies


GAP 1: PUBLIC HEALTH PERSONNEL LACK TRAINING IN GENOMICS


Microbial genomics has been a valuable research tool

• Help us understand:
– microbial evolution
– pathogenesis
– create novel industrial processes
– create new laboratory tests

• Use historical isolates – not real time
• Use of laboratory strains – no associated rich clinical and epidemiological metadata


Cultural and Practical Differences

Genomics Research Laboratory | Genomics Diagnostic Laboratory
Curiosity driven | Production / Case driven
Exploratory analysis tolerated | Exploratory analysis discouraged
Reproducibility = other labs’ problem | Reproducibility critical
Tweaking protocols desirable | Stability in protocols desirable
Protocols don’t need to be validated | Protocols need to be validated
Novelty justifies the high cost of experiment | Conscious of cost per unit test; tests need to be scalable

How do we bridge the cultural and the practical differences?


Solution 1a: Build a User Friendly, high quality analysis platform to process genomics data

• Carefully designed and engineered software platform is just the starting point…

Platform components (diagram): User Interface, Security, File System, Metadata Storage, Application Logic, REST API, Workflow Execution Manager, Continuous Integration, Documentation


• Easy to use interface hiding the technical details



Solution 1b: Build Portable and Transparent Pipelines

• Use Galaxy as workflow engine – large community support

• Retooled to address usability, security, and other limitations

• Version controlled pipeline templates

• Input files, parameters, and workflow are sent to an IRIDA-specific Galaxy for execution

• Results and provenance information are copied from Galaxy

Workflow (diagram):
1. Input files sent to Galaxy
2. Tools executed on Galaxy workers
3. Results downloaded from Galaxy

Components shown: IRIDA UI/DB, REST API, Galaxy (Assembly Tools, Variant Calling Tools), Shared File System, Workers

Source: Franklin Bristow
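A minimal sketch of what a version-controlled pipeline template and its provenance record might look like. Every class, field, and parameter name here is hypothetical; this is not IRIDA's data model or the Galaxy API, only an illustration of recording which pipeline version ran on which inputs.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class PipelineTemplate:
    """Hypothetical pipeline description: a name, a version, and fixed parameters."""
    name: str
    version: str
    parameters: dict = field(default_factory=dict)


def provenance_record(template, input_files):
    """Record which pipeline version ran on which inputs (identified by SHA-256).

    input_files maps a file name to its content as bytes.
    """
    return {
        "pipeline": asdict(template),
        "inputs": {
            name: hashlib.sha256(data).hexdigest()
            for name, data in sorted(input_files.items())
        },
    }


# Hypothetical template and a tiny in-memory FASTQ record as input
template = PipelineTemplate("snp-calling", "0.1.0", {"min_coverage": 10})
record = provenance_record(template, {"sample1.fastq": b"@read1\nACGT\n+\nIIII\n"})
print(json.dumps(record, indent=2, sort_keys=True))
```

Storing checksums rather than file paths means two runs can be compared even if the files have been moved between systems.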


Solution 1c: Start the training NOW!

• Canada’s National Microbiology Laboratory has hosted genomic workshops for partners and collaborators

• At PHMRL, we have been conducting workshops to train technologists and researchers on some common genomic analysis tools

• IRIDA Project has dedicated funding for hosting workshops in 4Q of 2015 and 2016

• We would like to engage epidemiologists in future training as well


GAP 2: INFORMATION SHARING IS INEFFICIENT AND AD-HOC


Many Players in surveillance and outbreak – ineffective information sharing

Players (diagram): Cases; Physicians; Frontline lab; Local public health dept.; Provincial laboratory; Provincial public health dept.; National laboratory

Flows shown: Information; Bioinformatics and Analytical Capacities

Source: M. Taylor, BCCDC


Many Systems used in Reporting Diseases –require data re-entry and re-coding

Players (diagram): Cases; Physicians; Local laboratory; Local public health dept.; Provincial laboratory; Provincial public health dept.; National laboratory; National Ministry of Health

Reporting channels shown between players: Fax/Electronic; Fax; Phone/Fax; Electronic/Paper; Electronic/Fax/Phone; Mailing of Samples/Fax/Electronic

Source: M. Taylor, BCCDC


Semantic Web

Credit: http://www.cs.rpi.edu/~hendler/

Semantic web is a suitable technology framework to organize and share arbitrary datasets


What’s the web?

• World-Wide-Web (WWW) is a platform where
– Information is distributed (CBC for news, Netflix for movies, etc.)
– Information is heterogeneous (text, video, pictures)
– (Relevant) information is linked by hyperlinks
– Often, information is only human readable
– Often, information is incorrect
– Often, information is not attributed


What’s Semantic web?

• Semantic web inherits many of the (good) attributes of WWW (distributed, open, heterogeneous, and linked)

• It’s designed to be:
– Machine readable, based on a common language of logic
– Linking information can be automated, making data sharing easier
– Easier to describe granular data
– Errors can be detected based on logical reasoning
– Information can be attributed and can be made to persist
– “Smart Web”
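A minimal sketch of the "machine readable, linked, logically checkable" idea, using plain Python tuples as subject-predicate-object statements. The identifiers and predicate names are invented; a real system would use RDF with shared ontology terms rather than this toy representation.

```python
# Hypothetical statements from two separate sources, as (subject, predicate, object)
lab_triples = {
    ("isolate:123", "hasPFGEType", "SSOXAI.0042"),
    ("isolate:123", "isolatedFrom", "case:9"),
}
epi_triples = {
    ("case:9", "reportedSymptom", "Fever"),
    ("case:9", "consumed", "Spinach"),
}

# Assumed rule: an isolate comes from exactly one case
FUNCTIONAL = {"isolatedFrom"}


def objects(triples, subject, predicate):
    """All objects stated for a given subject and predicate, sorted."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def violations(triples):
    """Find subjects with more than one value for a single-valued predicate."""
    found = []
    for s, p in sorted({(s, p) for s, p, _ in triples if p in FUNCTIONAL}):
        values = objects(triples, s, p)
        if len(values) > 1:
            found.append((s, p, values))
    return found


# Linking is automatic because both sources use the same identifier "case:9"
merged = lab_triples | epi_triples
case = objects(merged, "isolate:123", "isolatedFrom")[0]
foods = objects(merged, case, "consumed")

# A conflicting statement is detected by the rule, not by a human reader
conflicting = merged | {("isolate:123", "isolatedFrom", "case:10")}
print(foods, violations(conflicting))
```

The point of the sketch is that merging the lab and epidemiology statements needs no re-entry or re-coding once both sides share identifiers.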


IRIDA uses semantic web technologies to address information management issues

• Solutions:

– 2a: Localized Instance of federated databases

– 2b: Permission Control – authentication /authorization for information sharing

– 2c: User role-based display of information


Solution 2a: Local/Cloud Instances and Data Federation

• Data processing capacity pushed to data generating labs
• Allow data sharing securely for enhanced analysis
• Eventually cultivating a culture of openness of data sharing and collaborative development of tools


Solution 2b: Security

Authorization

• Local authorization per instance.
• Method-level authorization.
• Object-level authorization.
• Allow secure, fine grained and flexible information sharing controlled by data producer
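A minimal sketch of combining method-level and object-level checks. The role names, method names, and project structure are hypothetical and do not describe IRIDA's actual security model.

```python
# Hypothetical method-level rules: which roles may call which operation
METHOD_ROLES = {
    "read_sample": {"technologist", "epidemiologist", "admin"},
    "delete_sample": {"admin"},
}


def is_allowed(user, method, sample):
    """Allow an action only if both the method-level and object-level checks pass."""
    method_ok = user["role"] in METHOD_ROLES.get(method, set())
    # Object-level: the user must belong to the project that owns this sample
    object_ok = sample["project"] in user["projects"]
    return method_ok and object_ok


# Hypothetical user and samples
alice = {"name": "alice", "role": "epidemiologist", "projects": {"salmonella-2015"}}
shared_sample = {"id": "S1", "project": "salmonella-2015"}
private_sample = {"id": "S2", "project": "listeria-internal"}

print(is_allowed(alice, "read_sample", shared_sample))    # may read her project's sample
print(is_allowed(alice, "read_sample", private_sample))   # blocked at the object level
print(is_allowed(alice, "delete_sample", shared_sample))  # blocked at the method level
```

Keeping the two checks separate lets a data producer share one project without widening what any role is allowed to do.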


Solution 2c: Role-based Dynamic Display driven by Ontology

• Ontologies often lack a content management system (CMS)
• An Interface Model Ontology (IFM) can define a CMS for an ontology

Source: Damion Dooley


IFM Interface View Permissions

Detailed View Restricted View

E.g. User role permissions control visibility and editing of content

Source: Damion Dooley


GAP 3: INFORMATION REPRESENTATION IS INCONSISTENT


There are at least 74 different ways to say “female” in the ENA database

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4383942/
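A minimal sketch of mapping free-text values onto one controlled term. The variant spellings below are invented for illustration and are not the actual values found in ENA.

```python
# Hypothetical variant spellings; not the actual list reported for ENA
SEX_SYNONYMS = {
    "female": {"female", "f", "fem", "woman", "girl"},
    "male": {"male", "m", "man", "boy"},
}

# Invert to a lookup table: variant -> controlled term
LOOKUP = {
    variant: term
    for term, variants in SEX_SYNONYMS.items()
    for variant in variants
}


def normalize_sex(raw):
    """Map a free-text value to a controlled term, or None if unrecognized."""
    key = raw.strip().lower().rstrip(".")
    return LOOKUP.get(key)


for value in [" Female ", "F.", "WOMAN", "unknown"]:
    print(repr(value), "->", normalize_sex(value))
```

Unrecognized values return None rather than a guess, so they can be flagged for curation.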


Solution 3a: Use Ontology

• Ontology: a way to describe types of entities and relations between them

• Why use ontology
– Ontology is flexible and expandable
– Lower levels of expressivity (e.g. controlled vocabulary, data dictionary) are heavy handed and show low levels of compliance and adoption
– Free text is used as an alternative, but it is not computing friendly
– Ontology and semantic web technologies may be a solution


The Utility of Ontologies in Food-borne Investigations

Example: Correlate PFGE type SSOXAI.0042 cases between 01 Mar 2015 and 16 Mar 2015 with Spinach / Leafy Greens / Produce / High-Risk Food sources and symptoms of Nausea and Fever

An ontologist organizes how terms are related in a tree so one can search for terms at different levels. This provides great information-resolving power!

High-Risk Food
  Produce: Leafy Greens, Sprouts
  Poultry: Deli Meat, Nuggets
  Seafood: Fish, Shellfish

Source: Emma Griffiths
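A minimal sketch of searching at different levels of the food tree above. The placement of Spinach under Leafy Greens follows the example query; the case records are made up, and a real ontology would carry identifiers and richer relations than a single parent link.

```python
# Child -> parent links taken from the tree on the slide
# (Spinach placed under Leafy Greens, per the example query)
PARENT = {
    "Produce": "High-Risk Food",
    "Poultry": "High-Risk Food",
    "Seafood": "High-Risk Food",
    "Leafy Greens": "Produce",
    "Sprouts": "Produce",
    "Deli Meat": "Poultry",
    "Nuggets": "Poultry",
    "Fish": "Seafood",
    "Shellfish": "Seafood",
    "Spinach": "Leafy Greens",
}


def ancestors(term):
    """Walk up the tree and return every broader term."""
    found = []
    while term in PARENT:
        term = PARENT[term]
        found.append(term)
    return found


def matches(term, query):
    """A recorded term matches a query if it is the query or falls under it."""
    return term == query or query in ancestors(term)


# Hypothetical case records: (case id, reported food exposure)
cases = [("case1", "Spinach"), ("case2", "Sprouts"), ("case3", "Fish")]

produce_cases = [cid for cid, food in cases if matches(food, "Produce")]
high_risk_cases = [cid for cid, food in cases if matches(food, "High-Risk Food")]
print(produce_cases, high_risk_cases)
```

A query for "Produce" finds the spinach and sprouts cases even though neither record contains the word "Produce", which free text or a flat vocabulary would miss.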


Many Domains of Knowledge are needed to describe an outbreak investigation

Build On, Work With:
• OBI
• TypON
• NGSOnto
• NIAID-GSC-BRC core metadata
• MIxS Ontology
• NCBI Biosample etc
• TRANS – Pathogen Transmission
• EPO
• Exposure Ontology
• Infectious Disease Ontology
• CARD, ARO for AMR
• USDA Nutrient DB
• EFSA Comp. Food Consump. DB

Example gaps to be filled: Expand food ontology; expand CARD AMR data with others.


Lab Checklist/Ontology

• Currently finishing a lab/genomics checklist

• Metadata Domains:
– Sample Collection
– Sample Source
– Environmental
– Lab Analytics
– Sequencing Process / QC
– Sequencing Run / QC
– Assembly Process / QC
– Others overlapping with Epi: Demographic / Geographic / etc.

• Starting an epidemiology checklist to be completed this year


GAP 4: GENOMIC DATA INTERPRETATION IS COMPLEX AND TECHNOLOGY IS EVOLVING


Solution 4a: Use of QA/QC in IRIDA

• Software Engineering
– High quality software that meets regulatory guidelines
– Open Source product to ensure “white box” testing
– Ontology driven software development
– Follow proper software development cycle

• Data Quality
– Built-in modules to check for input data quality
– Warnings and feedback during pipeline execution to laboratory technologists
– Use of Ontology to check metadata (non-genomic) data quality

• Analytic Tool Quality
– Utilize validation datasets
– Use of abstract pipeline description – with version control
– Periodic analysis of exceptions and boundary cases to assess tool accuracy
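A minimal sketch of an input data quality check that produces warnings for a technologist, assuming FASTQ input with four-line records and Phred+33 quality encoding. The threshold and function names are hypothetical and not taken from IRIDA.

```python
# Hypothetical cutoff; real pipelines choose thresholds per platform and organism
MIN_MEAN_QUALITY = 30


def mean_read_qualities(fastq_text):
    """Yield (read id, mean Phred score), assuming 4-line records and Phred+33."""
    lines = fastq_text.strip().splitlines()
    for i in range(0, len(lines), 4):
        read_id = lines[i][1:]      # drop the leading '@'
        quality = lines[i + 3]      # fourth line holds the quality string
        scores = [ord(ch) - 33 for ch in quality]
        yield read_id, sum(scores) / len(scores)


def quality_warnings(fastq_text, threshold=MIN_MEAN_QUALITY):
    """Return human-readable warnings for reads below the threshold."""
    return [
        f"{read_id}: mean quality {mean_q:.1f} below {threshold}"
        for read_id, mean_q in mean_read_qualities(fastq_text)
        if mean_q < threshold
    ]


# Two made-up reads: one high quality ('I' = Q40), one very low ('#' = Q2)
example = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n####\n"
messages = quality_warnings(example)
print(messages)
```

Returning messages instead of raising an error fits the "warnings and feedback" bullet: the run can continue while the technologist is told which inputs look suspect.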


Solution 4b: Generation of validation datasets

To Participate, Contact Rene [email protected]

Or Errol Strain [email protected]

http://www.globalmicrobialidentifier.org/Workgroups#work-group-4

NML and BCPHMRL will be participating in the GMI proficiency test to compare our genomic sequencing and analysis protocols with other labs around the world


Solution 4c: Exploratory tools can access certain data via REST API securely

http://pathogenomics.sfu.ca/islandviewer

IslandViewer

Dhillon and Laird et al. 2015, Nucleic Acids Research

http://kiwi.cs.dal.ca/GenGIS

Parks et al. 2013, PLoS One


Availability

• Jun 1 2015: IRIDA 1.0 beta Internal Release
– Release to collaborators for installation and full test

• Jul 1 2015: IRIDA 1.0 beta1
– Announce Beta release, download, documentation available on website – www.irida.ca

• Aug 1 2015: IRIDA 1.0 beta2
– Cloud installer, with documentation
– Additional pipelines as available
– Visualization as available


Acknowledgements

Project Leaders: Fiona Brinkman – SFU; Will Hsiao – PHMRL; Gary Van Domselaar – NML

University of Lisbon: João Carriço

National Microbiology Laboratory (NML): Franklin Bristow, Aaron Petkau, Thomas Matthews, Josh Adam, Adam Olson, Tarah Lynch, Shaun Tyler, Philip Mabon, Philip Au, Celine Nadon, Matthew Stuart-Edwards, Morag Graham, Chrystal Berry, Lorelee Tschetter, Aleisha Reimer

Laboratory for Foodborne Zoonoses (LFZ): Eduardo Taboada, Peter Kruczkiewicz, Chad Laing, Vic Gannon, Matthew Whiteside, Ross Duncan, Steven Mutschall

Simon Fraser University (SFU): Melanie Courtot, Emma Griffiths, Geoff Winsor, Julie Shay, Matthew Laird, Bhav Dhillon, Raymond Lo

BC Public Health Microbiology & Reference Laboratory (PHMRL) and BC Centre for Disease Control (BCCDC): Judy Isaac-Renton, Patrick Tang, Natalie Prystajecky, Jennifer Gardy, Damion Dooley, Linda Hoang, Kim MacDonald, Yin Chang, Eleni Galanis, Marsha Taylor, Cletus D’Souza, Ana Paccagnella

University of Maryland: Lynn Schriml

Canadian Food Inspection Agency (CFIA): Burton Blais, Catherine Carrillo, Dominic Lambert

Dalhousie University: Rob Beiko, Alex Keddy

McMaster University: Andrew McArthur, Daim Sardar

European Nucleotide Archive: Guy Cochrane, Petra ten Hoopen, Clara Amid

European Food Safety Agency: Leibana Criado Ernesto, Vernazza Francesco, Rizzi Valentina


IRIDA Annual General Meeting, Winnipeg, April 8-9, 2015