How Many Databases Do We Need?
• Ideally one but that is not likely:
• Clinical lab vs. shared
• Commercial vs. non-profit
• General vs. locus-specific
• National vs. international
• Somatic vs. germline
• Sequence vs. copy number
• High-throughput use vs. manual use
• Clinical grade vs. research
Likely need to support both separately with interfaces
Likely can integrate
We have chosen to support NCBI’s ClinVar
database as the central site for data collection
dbGaP
ClinVar
dbSNP (<50 bp)
dbVar (>50 bp)
Curated variant calls, phenotypes and interpretation
Variant calls, genotype, phenotype and sequence data
Public Access
Controlled Access
Opt-Out or Consented
CMA, WES, WGS
Non-identifiable datasets
Large variant datasets
Intra-laboratory
Evidence-based review
Practice guidelines
Expert Curation
Single-Source Curation
Uncurated
Multi-Source Curation
Guideline
Inter-laboratory
Marking each variant with its level of curation lets one database mix research and clinical data, curated and uncurated
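One way to picture marking variants by level of curation is a small sketch like the one below. The record fields, the variant identifiers, and in particular the ordering of the levels are assumptions made for illustration; the slide names the levels but does not define a ranking or a schema.

```python
from dataclasses import dataclass
from enum import IntEnum


class CurationLevel(IntEnum):
    # Assumed ordering: a higher value means more review behind the assertion.
    UNCURATED = 0
    SINGLE_SOURCE = 1
    MULTI_SOURCE = 2
    EXPERT = 3
    GUIDELINE = 4


@dataclass(frozen=True)
class VariantRecord:
    variant: str             # hypothetical identifier
    source: str              # "clinical" or "research"
    curation: CurationLevel  # the marker that lets mixed data coexist


def at_least(records, minimum):
    """Keep records whose curation level is at or above `minimum`."""
    return [r for r in records if r.curation >= minimum]


records = [
    VariantRecord("variant-1", "research", CurationLevel.UNCURATED),
    VariantRecord("variant-2", "clinical", CurationLevel.SINGLE_SOURCE),
    VariantRecord("variant-3", "clinical", CurationLevel.EXPERT),
]

# A consumer that only wants well-reviewed assertions filters on the marker.
well_reviewed = at_least(records, CurationLevel.MULTI_SOURCE)
print([r.variant for r in well_reviewed])
```

Because every record carries its own level, research-grade and clinical-grade entries can live in the same store, and each consumer chooses the threshold it is comfortable with.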
A Database Ecosystem
ClinVar at NCBI
Labs A through I
HVP Country Nodes
Domain-Specific Databases: LOVD-based (e.g. InSiGHT), Non-LOVD (e.g. PharmGKB)
Domain-Specific Curation (e.g. ISCA-JIRA)
Commercial Networks
Commercial Curation
Commercial Databases (e.g. HGMD)
Commercial Software Tools
LOVD, PharmGKB, InSiGHT, OMIM
Patient Registries
Sequencing Laboratories Which Have Agreed to Share Data
Ackerman Lab, Mayo Alfred I Dupont Hospital for Children All Children's Hospital St. Petersburg Ambry Laboratories ARUP Athena Diagnostics Baylor Medical Genetic Laboratories Boston Children's Hospital Boston University Children's Hospital of Philadelphia Children's Mercy Hospital, Kansas City Cincinnati Children's Hospital City of Hope Molecular Diagnostic Lab CureCMD Denver Genetic Laboratories Detroit Medical Center Emory University Fullerton Genetics Laboratory GeneDx Cleveland Clinic Greenwood Genetics Harvard-Partners Lab for Molec. Medicine Henry Ford Hospital Huntington Medical Research Institutes
Illumina Clinical Services Lab Indiana University/Purdue University InSiGHT LabCorp / Integrated Genetics / Correlagen Masonic Medical Research Laboratory Mayo Clinic Mt. Sinai School of Medicine Nationwide Children's Hospital Nemours Biomolecular Core, Jefferson Medical Oregon Health Sciences University Providence Sacred Heart Medical Center Quest Diagnostics SickKids Molecular Genetic Laboratory Transgenomics University of Chicago University of Michigan University of Nebraska Medical Center University of Oklahoma University of Penn University of Sydney University of Washington Women and Children's Hospital Wayne State University School of Medicine Yale University
Lab Result
Variant Annotation
Published + in-house data
Segregation studies
Population frequency
Amino acid conservation
Predictions: PolyPhen, SIFT, etc.
Splicing predictions
Variant Classification
Benign, Likely Benign, VUS, Likely Pathogenic, Pathogenic
• Family Testing
• Additional Info
Clinical Data
Custom knowledge
Clinical Report
Variant Assessment and Classification
Courtesy of Birgit Funke
What data do we capture?
Automated Annotations
Evidence-Based Assessment
• Initial analysis derives structured annotations for high-throughput filtering
approaches (e.g. variant prioritization using population frequencies and in silico
tools) and automated classifications (e.g. benign high-frequency variants)
• Manually curated evidence should be added as structured data to enhance
automated analysis (e.g. segregations, results of functional studies, biallelic
observations, phenotype associations, inheritance)
• Need to also curate at the gene level (phenotypes, modes of inheritance, types
of pathogenic mutations, domains of importance, functional pathways)
• Conclusions from manual data review are tracked as text-based descriptions to
enable logic to be followed, reviewed and enhanced over time
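The split between automated annotation and manual evidence described in these bullets could be sketched roughly as follows. The field names, the example variants, and the 5% frequency cutoff are all hypothetical choices made for illustration; the slides do not specify a schema or a threshold.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical cutoff, chosen only for illustration; not taken from the slides.
BENIGN_FREQUENCY_CUTOFF = 0.05


@dataclass
class VariantAnnotation:
    variant: str
    population_frequency: Optional[float] = None   # automated annotation
    in_silico_prediction: Optional[str] = None     # automated annotation
    curated_evidence: List[str] = field(default_factory=list)  # structured manual evidence
    rationale: str = ""                            # free-text record of the reasoning


def automated_classification(annotation):
    """Return "Benign" for high-frequency variants, otherwise None."""
    freq = annotation.population_frequency
    if freq is not None and freq >= BENIGN_FREQUENCY_CUTOFF:
        return "Benign"
    return None


def triage(annotations):
    """Split variants into auto-classified and needs-manual-review lists."""
    auto, manual = [], []
    for a in annotations:
        if automated_classification(a) is not None:
            auto.append(a.variant)
        else:
            manual.append(a.variant)
    return auto, manual


batch = [
    VariantAnnotation("variant-1", population_frequency=0.12),
    VariantAnnotation("variant-2", population_frequency=0.0004,
                      curated_evidence=["segregation observed"]),
    VariantAnnotation("variant-3"),  # no frequency data available
]
print(triage(batch))
```

Keeping `curated_evidence` as structured data and `rationale` as text mirrors the distinction drawn above: the former feeds automated analysis, the latter lets a reviewer follow the logic later.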
Documenting Logic
Documenting Arguments
The Ala26Val variant has been reported in 10 HCM probands of Asian descent and was absent
from 832 race-matched control chromosomes (Konno 2005, Liu 2005, Song 2005, Wang 2009).
However, one of the probands had another pathogenic HCM variant on the same copy of the
gene which segregated with all 8 affected family members (Wang 2009). Although segregation in
3 family members was observed in one other family, an additional 5 individuals had the variant
without disease including three over age 70 (Liu 2005). Our laboratory has observed this variant
in one HCM proband and one DCM proband, neither with a family history of disease, out of over
3500 cases tested (1/215 Asian probands). Across all published and internal studies, this leads to
a cumulative allele frequency of 1% (7/652) in Asian HCM probands or 0.1% (8/7848) across all
probands. This variant has been observed at a frequency of 0.3% (7/2177) in the 1000 Genomes
project with a sub-population frequency of 1.5% (6/388) in the Chinese population. Computational
analyses (biochemical amino acid properties, conservation, AlignGVGD, PolyPhen2, and SIFT)
suggest that the Ala26Val variant is less likely to impact the protein, particularly given the lack of
conservation of the alanine residue in mammals (horse has an aspartic acid) and minimal
biochemical change of the alanine to valine substitution. In summary, although additional data is
necessary to conclusively determine the clinical significance of this variant, based upon the
higher frequency in a race-matched control population (1.5% vs. 1%), the absence of statistically
significant segregation data, the lack of a predictive effect from computational algorithms,
observations in both HCM and DCM which have different mutational mechanisms, and presence
on the background of another pathogenic mutation, this variant is more likely benign.
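The frequency comparison in the Ala26Val interpretation can be reproduced with a few lines of arithmetic. This is only a sketch: the counts are copied from the text as given, the dictionary labels are made up for readability, and the code makes no attempt to decide whether a denominator counts individuals or chromosomes.

```python
from fractions import Fraction

# (observed, total) pairs copied from the interpretation text.
COUNTS = {
    "Asian HCM probands": (7, 652),
    "All probands": (8, 7848),
    "1000 Genomes, overall": (7, 2177),
    "1000 Genomes, Chinese": (6, 388),
}


def percent(observed, total):
    """Frequency expressed as a percentage."""
    return 100.0 * observed / total


def control_exceeds_cases(case, control):
    """True when the control frequency is strictly higher than the case frequency."""
    return Fraction(*control) > Fraction(*case)


for label, (observed, total) in COUNTS.items():
    print(f"{label}: {observed}/{total} = {percent(observed, total):.2f}%")

# The argument for "more likely benign" rests partly on this comparison.
print(control_exceeds_cases(COUNTS["Asian HCM probands"],
                            COUNTS["1000 Genomes, Chinese"]))
```

Storing the raw counts rather than the rounded percentages keeps the comparison exact and lets the figures be recomputed when new cases are added.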
Phenotypic Data Collection
• Challenges: Limited clinical data is collected
during routine clinical testing
• Opportunities:
• Support physician data entry
• Extract data from EHR
• Interface with patients
Patient Registry
Patient Portal
Add phenotype data
Manage contact preferences
Research Portal
Recruit patients for research
Physician Portal
Submit phenotypes
Laboratory Portal
Submit consented cases
Examples of patient registries integrating genotype and phenotype data:
DuchenneConnect, CureCMD, Simons VIP
Patient Registry
The Role of the Curator
• Maintain gene and disease-level data
• Solicit variant and case data from laboratories
• Map lab terminologies to a standard
• Resolve unmappable variants
• Identify inconsistencies in the data (pathogenic
variants at high frequencies)
• Identify conflicting variant interpretations and
resolve where possible
• Queue controversial variants for discussion with
expert consensus group
• Liaise with researchers for variant studies
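Several of the curator tasks listed above (mapping lab terminology to a standard, spotting conflicting interpretations, and flagging pathogenic calls at high frequency) lend themselves to automated checks. The sketch below is illustrative only: the term mapping, the tuple layout of a submission, and the 1% frequency threshold are assumptions, not part of the original slides.

```python
from collections import defaultdict

# Hypothetical mapping from lab-specific wording to a five-tier standard.
STANDARD_TERMS = {
    "benign": "Benign",
    "likely benign": "Likely Benign",
    "vus": "VUS",
    "variant of uncertain significance": "VUS",
    "likely pathogenic": "Likely Pathogenic",
    "pathogenic": "Pathogenic",
}


def normalize(term):
    """Map a lab's term to the standard vocabulary, or None if unmappable."""
    return STANDARD_TERMS.get(term.strip().lower())


def review(submissions, frequencies, high_frequency=0.01):
    """submissions: iterable of (lab, variant, lab_term) tuples."""
    by_variant = defaultdict(set)
    unmapped = []
    for lab, variant, term in submissions:
        standard = normalize(term)
        if standard is None:
            unmapped.append((lab, variant, term))  # left for the curator to resolve
        else:
            by_variant[variant].add(standard)
    conflicts = sorted(v for v, terms in by_variant.items() if len(terms) > 1)
    suspicious = sorted(
        v for v, terms in by_variant.items()
        if "Pathogenic" in terms and frequencies.get(v, 0.0) >= high_frequency
    )
    return {
        "unmapped": unmapped,
        "conflicts": conflicts,
        "high_frequency_pathogenic": suspicious,
    }


submissions = [
    ("Lab A", "variant-1", "Pathogenic"),
    ("Lab B", "variant-1", "likely benign"),
    ("Lab C", "variant-2", "pathogenic"),
    ("Lab D", "variant-3", "deleterious?"),
]
frequencies = {"variant-2": 0.03}
print(review(submissions, frequencies))
```

The checks only build the work queue; resolving a conflict or deciding that a high-frequency pathogenic call is wrong remains a judgment for the curator or the expert consensus group.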
Diseases: Noonan Syndrome, Cardio-Facio-Cutaneous Syndrome, LEOPARD Syndrome, Costello Syndrome
Clinical synopsis: Facial dysmorphology, short stature, cardiac defects, motor delay, bleeding diathesis. Autosomal dominant or de novo inheritance
Clinical utility of testing: Disease management; family planning
Minimum gene set: PTPN11, SOS1, RAF1, KRAS, SHOC2, BRAF, MAP2K1, MAP2K2, HRAS
Existing LSDB: SOS1 LOVD database (39 variants)
Project director: Sherri Bale (GeneDx)
Expert curators: Sherri Bale (GeneDx), Bruce Gelb (Mt. Sinai School of Med), Marco Tartaglia (Italian network for RASopathies), Amy Roberts (Harvard)
Curation staff members: Brad Williams (GeneDx), Lisa Vincent (GeneDx)
Patient advocacy groups: Noonan Syndrome Support Group (Wanda Robinson - see letter of support)
Contributing labs: Baylor, Athena, Greenwood, Boston University, Emory, Children’s Hospital of Boston, U of Oklahoma, Harvard-Partners, ARUP, GeneDx, Nationwide Children’s, Nemours/Alfred I Dupont, Mt Sinai School of Medicine
Cases to contribute: >7400 (includes 10 of the above labs)
Variants to contribute: >7700 variant observations (includes 10 of the above labs)
Phenotyping approaches: Clinical lab data submission (retrospective as well as improved collection with standardized form use)
Acknowledgements
U41 Grant
David Ledbetter
Christa Martin
Joyce Mitchell
Robert Nussbaum
Erin Riggs
Erin Kaminsky
Andy Faucett
Sherri Bale
Madhuri Hegde
Patrick Willems
Elaine Lyon
Soma Das
Matt Ferber
Sandy Aronson
David Miller
Mike Murray
Donna Maglott
Deanna Church
Organizations
NCBI
HVP
HGVS/LOVD
OMIM
GeneReviews
NHGRI
ACMG
CAP
AMP
ASHG
Genetic Alliance
UNIQUE
Patient CrossRoads
Databases
PharmGKB/PGRN
InSiGHT
MSeqDB
Labs that have agreed to support this project
And many others…
ICCG Annual Meeting May 9-10, 2013
Bethesda Marriott, Pooks Hill
Bethesda, MD
www.iscaconsortium.org