COMET International Meeting, Bristol, UK, July 11, 2011
Harmonizing Terminology
Steven Hirschfeld, MD PhD
Captain, U.S. Public Health Service
Associate Director for Clinical Research
Acting Director, National Children's Study
Eunice Kennedy Shriver National Institute of Child Health and Human Development
Slide 3
Disclosures
No financial or governance interests to disclose.
All opinions expressed are those of the author and may not represent the views and policies of the U.S. Federal government or any of its agencies.
Slide 4
Why terminology?
Outcomes in research are based on concepts. For example, man with pneumonia.
Concepts require specific terms that describe them and differentiate them from other concepts. For example, man with respiratory inflammation, influenza, tuberculosis or silicosis.
Terminology is the tool for precision that allows consistency and multiple analyses.
Slide 5
What are current options?
Systematized Nomenclature of Medicine (SNOMED): in use for medical records and research.
International Classification of Diseases (ICD): in use for epidemiology and reimbursement. Several versions are in use concurrently.
Medical Dictionary for Regulatory Activities (MedDRA): in use for therapeutic and diagnostic product development and registration.
Multiple subspecialty and niche terminologies.
Slide 6
What is the dilemma?
The major terminologies do not readily map to one another.
None is robust for child health and development, particularly at the youngest ages.
All are episodic in that they describe a single circumstance and do not relate concepts across a developmental timeline.
Slide 7
AAP: American Academy of Pediatrics
CDC: Centers for Disease Control and Prevention
CDISC: Clinical Data Interchange Standards Consortium
EPA: Environmental Protection Agency
ICH-E11: International Conference on Harmonisation
SNOMED: Systematized Nomenclature of Medicine
Slide 8
What is new and different?
The NICHD terminology system differs from other terminology systems by incorporating into all concepts a dimension of time and position along a developmental scale to relate concepts to one another.
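The idea of attaching a developmental position to every concept can be illustrated with a minimal sketch. The `LifeStage` names, the `Concept` class, and the example terms are all hypothetical and chosen only for illustration; they are not the NICHD stage definitions or terms.

```python
from dataclasses import dataclass
from enum import IntEnum


class LifeStage(IntEnum):
    """Hypothetical ordered developmental scale (not the NICHD stage list)."""
    NEONATE = 1
    INFANT = 2
    CHILD = 3
    ADOLESCENT = 4


@dataclass(frozen=True)
class Concept:
    """A term together with its position on the developmental scale."""
    term: str
    stage: LifeStage


def order_along_timeline(concepts):
    """Relate concepts to one another by sorting them on the developmental scale."""
    return sorted(concepts, key=lambda c: c.stage)


# Illustrative observations only; the terms are assumptions.
observations = [
    Concept("wheezing", LifeStage.CHILD),
    Concept("respiratory distress", LifeStage.NEONATE),
    Concept("asthma", LifeStage.ADOLESCENT),
]
timeline = [c.term for c in order_along_timeline(observations)]
```

Because every concept carries a stage, episodic findings can be placed on one timeline instead of being stored as unrelated events.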
Slide 9
Rationale for terminology initiative
The NICHD has an ongoing effort to establish, through stakeholder consensus, a core library of consistent and harmonized pediatric terms.
Reaching stakeholder consensus on terminology will benefit pediatric clinical researchers in the following ways:
  Provide the infrastructure necessary to compare and aggregate data and information.
  Prevent misinterpretation.
  Improve precision of data sharing.
  Permit more robust meta-analysis.
  Establish consistency with the health care delivery system across the NICHD's clinical research portfolio, across the portfolios of other NIH Institutes/Centers, as well as with the broader research community.
Slide 10
Harmonization Process
The terminology harmonization process involves identifying relevant concepts, identifying terms and definitions to describe the concepts, and graphically depicting the structure of and relationships between the concepts.
Slide 11
What is the framework?
A model developed in the Unified Modeling Language (UML) can be used to map the concepts of interest and can be leveraged by modeling tools to efficiently and extensibly harmonize terminology.
Slide 12
Example of UML Framework
Examination Tool Parts:
  Identification & Demographics
  Physical Examination
  Behavioral & Neurological Examination
  Biochemical/Physiologic/Genetic Examination
  Imaging & Other Findings
http://nichd.nih.gov/clinres/terminology
Slide 13
Terminology Development Process
1. Identify concepts and reference terminology: Determine the terms that will require harmonization. Identify which of the terms are unique concepts. Reference terminology resources to find matching concepts.
2. Develop model: Develop a model for concepts as a terminology that depicts concepts and their attributes and the relationships between concepts. Prepare terminology to be incorporated into a reference terminology, such as the NCI Thesaurus.
3. Annotate model: Use terminology curation tools, such as the NCI's Semantic Integration Workbench (SIW), to annotate the model with the reference terminology.
4. Review concepts with the community: Inform experts in the pediatric community of the terminology development effort and facilitate collaboration. Solicit input and feedback on proposed concepts from the pediatric community and harmonize with the model.
5. Load metadata and generate tools: Load metadata from the annotated model to a metadata repository, such as the NCI's cancer Data Standards Repository (caDSR). Leverage open-source clinical research tools to extract metadata to generate content-specific clinical research tools.
Slide 14
Visual Depiction of Terminology Development Process
1. Trace list of sources
2. Draft tool
3. Structure concepts
4. Develop model
5. Curate Common Data Elements
6. Generate Research Tool
[Figure labels: Sources (American Health Information Community); Draft Examination Tool listing (1) Demographics and (2) Physical Examination; NCI Thesaurus; UML Model; Common Data Elements browser; Final Examination Tool (Final Newborn Examination Tool)]
Slide 15
Core Terminology Library
NICHD terminology files are available for download from an NCI EVS FTP site (http://evs.nci.nih.gov/ftp1/NICHD/) in three formats.
A textual representation of the hierarchy of NICHD terms is provided as well; the terms in this hierarchy, which are restricted to NICHD terms only, do not necessarily have a direct parent-child relationship within the NCI Thesaurus (NCIt).
The Changes file is published monthly and contains all changes that have been made to NICHD content in the current production version of the NCIt when compared to the most recently posted previous file.
Instructions at: http://evs.nci.nih.gov/ftp1/NICHD/About.html
Slide 16
The NICHD Pediatric Terminology Metastructure provides a common information model associated with various child life stages
[Diagram relationship labels: hasLifeStage, resultsIn, occursIn, associatedWith, evidenceOf, affects]
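A metastructure of this kind can be pictured as a set of typed links between concepts. The sketch below is an illustration only: the relationship names come from the slide, but the `Metastructure` class, the concepts, and the way they are connected are assumptions, since the transcript does not preserve the diagram.

```python
from collections import defaultdict

# Relationship names as they appear on the slide; which concepts they
# connect is NOT recoverable from the transcript and is assumed below.
RELATIONSHIPS = {
    "hasLifeStage", "resultsIn", "occursIn",
    "associatedWith", "evidenceOf", "affects",
}


class Metastructure:
    """Hypothetical store of (subject, relationship, object) links."""

    def __init__(self):
        self._edges = defaultdict(list)

    def add(self, subject, relation, obj):
        """Record a link, rejecting relationship names not in the model."""
        if relation not in RELATIONSHIPS:
            raise ValueError(f"unknown relationship: {relation}")
        self._edges[(subject, relation)].append(obj)

    def objects(self, subject, relation):
        """Return everything linked to `subject` by `relation`."""
        return list(self._edges.get((subject, relation), []))


# Illustrative concepts only.
model = Metastructure()
model.add("neonatal jaundice", "hasLifeStage", "neonate")
model.add("elevated bilirubin", "evidenceOf", "neonatal jaundice")
stages = model.objects("neonatal jaundice", "hasLifeStage")
```

Restricting links to a fixed set of relationship names is one simple way to keep every contributing terminology on the same information model.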
Slide 17
Current Activities
Focus on neonatal terminology because:
  Largest gaps in major terminology schemas
  Existence of robust research networks
Multi-step process:
  1. Identify general domains
  2. Align concepts
  3. Map concepts to a common resource
  4. Agree on mapping
  5. Publish map
Slide 18
Advantages to Current Process
Retention of legacy tools
Ability to pool data and perform meta-analyses
Systematic identification of knowledge gaps and opportunities
Path forward for further harmonization and consensus terminology incorporating model and framework
Slide 19
The National Children's Study as a case study
Slide 20
Overview of the National Children's Study (NCS)
The NCS is mandated by the U.S. Congress and implemented by the Eunice Kennedy Shriver National Institute of Child Health and Human Development of the National Institutes of Health with advice and input from other NIH components, the Centers for Disease Control and Prevention and the Environmental Protection Agency.
It is a multi-year research study that will examine the effects of environmental influences on the health and development of more than 100,000 children across the United States, following them from before birth until age 21 years.
The goal of the Study is to improve the health and well-being of children and contribute to understanding the influence of various factors on health and disease.
Slide 21
The NCS is an integrated system of activities
The Study is an integrated system of activities that includes a pilot Study, which began in January 2009 with the goal of determining the feasibility, acceptability and cost of Study activities; a Main Study, scheduled to begin in calendar year 2012, to determine exposure-response relationships; and various substudies and formative research projects to examine specific methodological questions.
The pilot Study, also known as the Vanguard Study, will run for 21 years, enroll about 4000 families, and precede Main Study activities by about 3 years so that every aspect of the Main Study is field tested prior to scale-up and implementation.
Slide 22
NCS Vanguard Study Goals
The Vanguard Study is designed to evaluate:
  Feasibility (technical performance)
  Acceptability (impact on participants, study personnel, and infrastructure)
  Cost (personnel, time, effort, money)
of:
  Study recruitment
  Logistics and operations
  Study visits and study visit assessments
Slide 23
The National Children's Study takes an informatics approach that is flexible to support innovation and accommodate evolving technology
The approach to informatics for the National Children's Study is informed by several trends in informatics, including:
  modular architecture
  use of standardized terminology with curation
  semantic awareness
  scalability
  defined transmission standards
  open architecture and open source platforms with development communities
  vertical and horizontal integration of process
  interoperability
Slide 24
The National Children's Study informatics approach is standards-based
During the Vanguard phase of the NCS, multiple informatics platforms and tools are in the field to determine the performance characteristics of each.
This approach entails the use of NCS specifications with which each potential informatics solution must comply, plus a systematic evaluation scheme to compare performance.
Use of such standards complements an interoperable approach that allows support for common interfaces and data exchange specifications.
Such standards include:
  Data Documentation Initiative (DDI)
  Clinical Data Acquisition Standards Harmonization (CDASH)
  CDISC Operational Data Model (ODM)
  ISO 11179 / 21090
  CRoss-Industry Standard Process for Data Mining (CRISP-DM)
Slide 25
Standards + Modular = Flexibility
The NCS emphasis on interoperable modular architecture means that any component of a data system can accurately and efficiently communicate with other data systems while adhering to international data standards, such as those developed by the Clinical Data Interchange Standards Consortium (www.cdisc.org), and that its components can be reused or adapted for other studies.
Slide 26
NCS Data Life Cycle
From concept to archive, the NCS has a consistent approach to the data life cycle.
A description can be found in the NCS Data Life Cycle Concepts of Operation:
http://www.nationalchildrensstudy.gov/about/overview/Pages/NCS_concept_of_operations_04_28_11.pdf
Slide 27
The NCS incorporates Operational Data Elements
Operational Data Elements are defined as data elements that capture the research process. In some contexts the term paradata is used.
The Operational Data Elements will allow systematic and objective evaluation of how the study is conducted and provide a basis for continuous improvement of efficiency.
The NCS developed a catalog or code list of about 500 Operational Data Elements for various study operations.
The NCS would like to contribute to the establishment of standards for Operational Data Elements.
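As a rough illustration of how operational data elements could support objective evaluation of study conduct, the sketch below summarizes a few made-up process records. The record layout, operation names, and values are hypothetical and are not drawn from the NCS catalog.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical operational records: (operation, minutes spent, completed?)
RECORDS = [
    ("home_visit", 60, True),
    ("home_visit", 90, True),
    ("phone_interview", 20, True),
    ("phone_interview", 10, False),
]


def summarize(records):
    """Return {operation: (mean minutes, completion rate)}."""
    grouped = defaultdict(list)
    for operation, minutes, completed in records:
        grouped[operation].append((minutes, completed))
    return {
        op: (
            mean(m for m, _ in rows),
            sum(1 for _, done in rows if done) / len(rows),
        )
        for op, rows in grouped.items()
    }


summary = summarize(RECORDS)
```

Summaries of this kind, computed from data about the process rather than about participants, are what make comparisons of cost and feasibility across study operations possible.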
Slide 28
Metadata derived from harmonized terminology for the Study provides a layer of semantic interoperability across the data life cycle
The NCS data life cycle follows data from approach through data acquisition to data analysis, maximizing transparency and the understanding of NCS data.
Study data elements are guided by the NICHD Pediatric Terminology framework developed across many sources in the research, healthcare delivery, and standards development spectrum.
Consistent metadata will assure:
  Semantic interoperability and compliance with international data standards
  Syntactic interoperability between NCS information management systems as they exchange data in line with the data plan
Slide 29
Various semantic schemas are harmonized so that data may be accurately exchanged and analyzed among pre-existing systems
A bridging schema, or metastructure, provides a mapping among concepts and codes from individual terminology schemas used by networks or research endeavors.
The metastructure is publicly available, and each source terminology schema is the property of and maintained by its original owner.
As the National Children's Study proceeds, all developmental stages through age 21 years will be covered in a Pediatric Terminology Metastructure and many fields of research will be included.
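The bridging idea can be sketched as a crosswalk in which each metastructure concept records its code in every source schema, so that translation between schemas always goes through the shared concept. The schema names, concept identifiers, and codes below are invented placeholders, not real terminology codes.

```python
# Hypothetical metastructure: each bridging concept lists its code in each
# source schema. All identifiers here are placeholders.
METASTRUCTURE = {
    "META:0001": {"schema_a": "A-123", "schema_b": "B-9.1"},
    "META:0002": {"schema_a": "A-456"},  # no counterpart in schema_b
}


def build_index(metastructure):
    """Index (schema, code) pairs to their bridging concept identifier."""
    index = {}
    for concept_id, codes in metastructure.items():
        for schema, code in codes.items():
            index[(schema, code)] = concept_id
    return index


def translate(code, source, target, metastructure=METASTRUCTURE):
    """Translate a code between schemas via the bridging concept.

    Returns None when the code is unknown or has no counterpart.
    """
    concept_id = build_index(metastructure).get((source, code))
    if concept_id is None:
        return None
    return metastructure[concept_id].get(target)


mapped = translate("A-123", "schema_a", "schema_b")
unmapped = translate("A-456", "schema_a", "schema_b")
```

Mapping through a hub concept means each source schema only needs one mapping to the metastructure rather than a separate mapping to every other schema, and the source schemas themselves stay unchanged.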
Slide 30
A metadata model has emerged that meets the semantic and syntactic requirements of the Study
The Data Documentation Initiative (DDI) is a metadata specification and international standard for describing data from the social, behavioral and economic sciences.
The DDI model is aligned with CDISC SDTM vocabularies and the CDISC BRIDG protocol definition.
The CDISC family of standards (including BRIDG, SDTM and CDASH) includes objects useful in describing health research that are not found in DDI.
[Figure: DDI Combined Life Cycle Model]
Slide 31
The DDI-based metadata repository supports cyclical processes for both pre-analytical datasets and analytical datasets
Pre-analytical datasets can be produced and repurposed for new uses such as support for additional performance metrics or linkage with extant datasets.
Data analysis may uncover recruitment, retention and/or compliance problems, which lead to protocol changes.
Slide 32
The DDI-based metadata repository for the NCS is an end-to-end solution that allows scoping and incremental development
The CRoss-Industry Standard Process for Data Mining (CRISP-DM) can be used to standardize project management and to maximize data transparency and eventual analysis of NCS data.
[Figure, phases: Business Understanding; Data Understanding; Data Preparation; Analysis Preparation; Analysis Execution; Results Evaluation]
[Figure, activities in order:]
  Identify key business objectives
  Identify key constraints & assumptions
  Translate business objectives into metrics or questions
  Identify potential data sources
  Assess suitability of each data source for analysis
  Extract data
  Describe data
  Explore data
  Assess data quality
  Match and merge data
  Clean data
  Reformat data for analysis
  Select analytic algorithms
  Code algorithms
  Validate algorithms
  Validate assumptions
  Execute algorithms
  Capture and interpret results
  Iteratively improve any discrepancies / shortcomings
  Translate results into business metrics or answers to questions
  Present results including detailed documentation of entire process
Slide 33
DDI covers the entire NCS data lifecycle from protocol
definition and sampling strategy through data collection, analysis,
and distribution
Slide 34
In the NCS data lifecycle, forms and questionnaires are first specified
Domain groups define measures to capture study operations and the child development life cycle.
Forms and questionnaires are specified around the measures.
These specifications occur early in the NCS data life cycle and are captured by the NCS end-to-end metadata repository.
[Figure: An NCS Incident Report is captured in the DDI model]
Slide 35
Data elements corresponding to the questions are typed
Form and questionnaire code lists are typically composed without regard to common data elements and standard code lists. Instead they are responsive to context and the exigencies of form and questionnaire design.
This has led us to the approach that corresponds to DDI classifications in which there are categories and codes or, in other words, master code lists and specific code lists.
Slide 36
Master and specific code lists are maintained for internal (Study-specific) and external harmonization
Form- and questionnaire-specific code lists almost always include missing values.
In the NCS, the Incident Report is not restricted to adverse events, but encompasses other classifications.
All of these classifications are captured in the Incident Type code list, which goes with the question "What category best describes the incident (mark one)?"
Slide 37
Specific code lists are the product of master code lists
A code list for a question is compositional.
The composition of question-specific code lists typically includes a subset from a category/master collection of missing values.
Mixing and matching questions across many categories leads to better comparison of answers across forms and across time in a longitudinal study.
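The composition of a question-specific code list from master lists might look like the following sketch. The master incident types, the missing-value codes, and the `compose_code_list` helper are hypothetical and stand in for the actual NCS code lists, which are not reproduced in the slides.

```python
# Hypothetical master (category) lists; codes and labels are placeholders.
MASTER_INCIDENT_TYPES = {
    1: "Adverse event",
    2: "Protocol deviation",
    3: "Privacy breach",
    4: "Other",
}
MASTER_MISSING_VALUES = {
    -1: "Refused",
    -2: "Don't know",
    -4: "Missing in error",
}


def compose_code_list(master, codes, missing_master, missing_codes):
    """Build a question-specific code list as subsets of two master lists."""
    unknown = [c for c in codes if c not in master]
    unknown += [c for c in missing_codes if c not in missing_master]
    if unknown:
        raise KeyError(f"codes not in any master list: {unknown}")
    specific = {c: master[c] for c in codes}
    specific.update({c: missing_master[c] for c in missing_codes})
    return specific


# A specific list for one question: three substantive codes plus one
# missing-value code.
incident_type_list = compose_code_list(
    MASTER_INCIDENT_TYPES, [1, 2, 4], MASTER_MISSING_VALUES, [-1]
)
```

Because every specific list draws its codes and labels from the same masters, an answer coded 2 or -1 means the same thing on every form, which is what allows comparison across forms and over time.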
Slide 38
Metadata tagging provides a path for mapping to external references
Data elements are associated with concepts and, through unique identifiers, with external references.
External references can be made to an ISO 21090 Concept Descriptor and/or an OpenEHR archetype.
In these external references each value of a code list might be linked to a concept.
This is our path to ISO 11179 compliance and, in the case of incident type, to an NCS code list that combines code lists from many vocabularies.
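One way to picture metadata tagging is a data element that carries an external reference for its concept and, optionally, one for each code list value. The classes, system names, and identifiers below are hypothetical; they do not reproduce the ISO 21090 Concept Descriptor or an OpenEHR archetype structure.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ExternalReference:
    """Pointer to a concept in an external vocabulary (illustrative shape)."""
    system: str      # name of the external vocabulary (assumed)
    identifier: str  # unique identifier within that vocabulary (assumed)


@dataclass
class DataElement:
    """A data element tagged with a concept and per-value references."""
    name: str
    concept: ExternalReference
    value_concepts: dict = field(default_factory=dict)  # code -> reference

    def untagged_codes(self, code_list):
        """Return the codes in `code_list` that have no external reference."""
        return [code for code in code_list if code not in self.value_concepts]


# Placeholder identifiers only.
incident_type = DataElement(
    name="incident_type",
    concept=ExternalReference("vocab_x", "X:1000"),
    value_concepts={
        1: ExternalReference("vocab_x", "X:1001"),
        2: ExternalReference("vocab_y", "Y:2002"),
    },
)
gaps = incident_type.untagged_codes([1, 2, 4])
```

Here the values of a single code list point into two different vocabularies, and `untagged_codes` shows which values still lack an external mapping.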
Slide 39
The NCS data life cycle reaches the production of analysis datasets that conform to the CDISC ODM interchange standard
Variables are packaged into logical records.
Physical dataset definitions are constructed that document the various datasets researchers will request and receive.
Slide 40
Ongoing and Future Collaboration
As the National Children's Study evolves over time, it will continue to seek input and partnership from willing collaborators and to adhere to and inform international data standards to the greatest extent possible.
The National Children's Study aims to connect people, data and diverse systems to exchange and use information and to work together as a platform for innovative research and analysis, to ultimately improve the health and well-being of children.
Slide 41
Summary and Plans
The NCS utilizes standards from multiple sources to ensure an open-source, sustainable and interoperable informatics environment.
The NCS is field testing several tools and platforms concurrently in the Vanguard or Pilot phase, in a systematic fashion, to determine performance characteristics.
The integration of multiple standards and models allows exploration of metadata analyses, operational data elements and project management across all study activities.
The NCS will publicly disseminate findings as rapidly as possible and actively seeks collaborators.
Slide 42
For more information on the National Children's Study
Please visit the main website: http://www.nationalchildrensstudy.gov/
Organizations, groups or individuals that are interested in contributing to the effort or learning more are encouraged to contact: Steven Hirschfeld, MD PhD [email protected]