Multidisciplinary Data Operations Approach
Page 1
May 17, 2013
Multidisciplinary Data Operations Approach for
Clinical/Translational Research
Brad H. Pollock, MPH, PhD
Professor and Chairman, Department of Epidemiology and Biostatistics, School of Medicine, University of Texas Health Science Center at San Antonio
Disclosures (Brad Pollock)
• No financial conflicts of interest
• Director, Biomedical Informatics Core, Clinical Translational Science Award (UL1 RR025767)
• Director, Biostatistics, Epidemiology, Research Design (BERD) Core (UL1 RR025767)
• Director, Biostatistics and Informatics Shared Resource, Cancer Therapy & Research Center, University of Texas Health Science Center at San Antonio (P30 CA054174)
• Principal Investigator, Children’s Oncology Group (COG) Community Clinical Oncology Program (CCOP) Research Base (U10 CA095861)
• Clinical Science Course Faculty, Society of Clinical Research Associates (SoCRA)
• Data quality
• Venues for data operations
• Data management
• Tools for data operations
• Data plans
– General considerations
– Data plans
• Local approach in San Antonio
• Other considerations
“May 6, 2010 Flash Crash”
• On May 6, 2010, a mishandled trading order temporarily sent US stocks plummeting:
– The Dow Jones Industrial Average (DJIA) plunged about 1,000 points (about 9%), only to recover those losses within minutes.
– It was the 2nd largest point swing and the biggest one-day point decline, 998.5 points, on an intraday basis in DJIA history.
• A small error strikingly upset an important data-heavy system.
DATA QUALITY
The first critical element in the reproducible research chain
Criteria for Reproducible Research*
Research Component: Requirement
Data: Analytical data set is available.
Methods: Computer code underlying figures, tables, and other principal results is made available in a human-readable form. In addition, the software environment necessary to execute that code is available.
Documentation: Adequate documentation of the computer code, software environment, and analytical data set is available to enable others to repeat the analyses and to conduct other similar ones.
Distribution: Standard methods of distribution are used for others to access the software, data, and documentation.
*from Peng, Dominici, Zeger. Am J Epidemiol 2006;163:783–789
Little emphasis on how we get to this point!
HOW DO YOU GET GOOD DATA FOR CLINICAL TRANSLATIONAL RESEARCH?
You need outstanding research information technology (IT) resources and a multidisciplinary team, all of whom are deeply invested in the research.
Data Operations “Truths”
• Research information technology (IT) is a vital and non-trivial core resource
• Many examples of investigations being derailed by IT deficits
• Quality and cost-efficiencies have been gained by excellent research IT
Data Operations “Truths” (continued)
• Good IT operations are very resource intensive
• Requires knowledgeable and dedicated staff
• Sustained financial support is a prerequisite for long-term ROI
Where do data operations take place to support biomedical research?
Data Operation Venues
• Biostatistics units/cores
• Central university IT
• Hospital IT departments
• Other academic departments – e.g., computer science, biomedical informatics
Who should define, manage, and oversee clinical translational research data operations?
Who Should Coordinate Data Operations?
• Biostatistics has played a central role in managing NIH Data Coordinating Centers
• With some exceptions, computation in biostatistics has been heavily focused on analysis.
Biostatistics Core Functions
• Design studies
– Clarify hypotheses and objectives
– Select study design
– Define data elements/endpoints
– Sample size/power calculations
– Develop analytic plans
• Monitor studies
– Efficacy/futility
– Safety/quality
• Analyze studies
– Statistical analysis
– Writing reports/manuscripts
[Side label on slide: Computation]
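As a small illustration of the "Sample size/power calculations" function listed above, here is a minimal sketch. It assumes a two-arm comparison of means with equal allocation, a two-sided test, and a normal approximation; the function name and its defaults are illustrative and are not taken from the talk.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample comparison of means.

    Assumes equal allocation and a normal approximation; effect_size is the
    difference in means divided by the common standard deviation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for target power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


print(n_per_group(0.5))
```

Methods based on the t distribution give slightly larger numbers, so a result like this is best treated as a planning estimate to be confirmed by the study biostatistician.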
Who Should Coordinate Data Operations? (continued)
• With the advent of the CTSAs, coordination has broadened to include other disciplines, including biomedical informatics.
Biomedical Informatics
Focus areas:
– Ontologies
– Vocabulary/terminology
– Machine learning, human-machine interfaces
– Natural language processing
– Electronic health records
– Data repositories
– Tool development (e.g., CTSA IKFC)
Who should define, manage, and oversee clinical translational research data operations?
It depends…
Answer:
• No-brainer if you:
– Are an NCI cooperative group statistician
– Run an NIH-funded Data Coordinating Center (DCC)
– Direct a structured biostatistics core, e.g., Centers for AIDS Research (CFAR), Alzheimer’s Disease Core Centers, etc.
• Often a requirement of the RFA
Answer:
• Less clear if you:
– Direct a CTSA BERD unit
– Direct a CCSG P30 Biostatistics Core
– Direct an institutional biostatistics support unit with a separate informatics group
– Are at a National Children’s Study center
Data Operation Venues
• Biostatistics units/cores
• Central university IT
• Hospital IT departments
• Other academic departments – e.g., computer science, biomedical informatics
Some of these choices place great faith in others to appropriately manage study data
Biostatistics and Information Management
• A symbiotic relationship:
– Biostatistics helps define the “What,” i.e., endpoints
– Information Management defines the “How,” i.e., how to manage data operations
Biostatistics: What do we collect?
Information Management: How do we manage?
Biostatistics vs. Informatics Perspectives
Medical Informatics
1. Tool development
2. Messy data from integrated data repositories
– EHRs
3. Hypothesis-generating orientation
Biostatistics
1. Develop data models to address specific study hypotheses
2. Cleaner data sources:
– Registries
– Protocol-generated
3. Hypothesis-testing orientation
Optimal Configuration
• Who should be at the table?
– Biostatistics/epidemiology
– Research IT
– Data management
– Regulatory personnel
– Project managers (PMP)
– Biomedical informatics
• Ideally, use a multidisciplinary approach
Data Management
• The development, execution and supervision of plans, policies, programs and practices that control, protect, deliver, and enhance the value of data and information assets*
*Data Management Association, Data Management Body of Knowledge (DAMA-DMBOK), 2008
Who’s Involved in Data Management
– Subjects, participants, patients
– Investigators, clinicians, research staff, clinical staff
– Statisticians, epidemiologists, analytic staff
– Central IT: CIO, ISO, SNO
– Research IT: analysts, programmers, DBAs
End-to-End Process
Data Management within the Research Process
[Diagram labels: Protocol Development; Data Management Process; Final Statistical Analysis; IT Involvement]
Data Management Changing Within the Research Process
[Diagram labels: Protocol Development; Data Management Process; Final Statistical Analysis]
Data management considerations are beginning to influence the science.
Storage and long-term utilization affect the data long after the protocol’s final analysis.
Data Management Responsibilities
• Maintain a functional, flexible, scalable, cost-efficient resource to handle a variety of data (demographic, clinical/laboratory/imaging, bioinformatics, environmental)
• Data quality and compliance with regulatory requirements
– HIPAA
– 21 CFR Part 11
– FISMA
• Planning for:
– Long time horizons (e.g., National Children’s Study)
– Interoperability and federation (e.g., caTissue Suite, caGRID, OpenMDR)
TOOLS FOR DATA OPERATIONS
Initial Planning Process
• What is an investigator to do about the data operations?
– The investigator only wants to be able to easily do their research.
– They don’t want a lot of barriers put in the way.
• Solution: Let’s use Excel…
“Database Management” Software
Microsoft Excel
Excel Characteristics
• Good Points
– Easy to work with
  • Quick start up, low costs
– Potentially can force data types
• Bad Points
– Too easy to work with
  • Doesn’t require you to clearly define your needs
– “Interprets” data
  • Will not allow you to override
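One well-known instance of a spreadsheet "interpreting" data is an identifier such as the gene symbol SEPT2 being stored as a date and displayed as 2-Sep. The sketch below is a hedged illustration of how such damage might be screened for after the fact; the pattern and function name are assumptions made for this example and it will not catch every form of conversion.

```python
import re

# Pattern for values such as "2-Sep" or "1-Mar", the form a spreadsheet may
# display after reading an identifier like "SEPT2" or "MARCH1" as a date.
_DATE_LIKE = re.compile(
    r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$"
)


def suspicious_values(column):
    """Return values in an identifier column that look like converted dates."""
    return [v for v in column if _DATE_LIKE.match(v.strip())]


gene_symbols = ["TP53", "2-Sep", "BRCA1", "1-Mar"]
print(suspicious_values(gene_symbols))
```

A screen like this only detects the problem; the original identifiers still have to be recovered from the source, which is one argument for defining field types before data entry begins.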
“Database Management” Software
REDCap Characteristics
• Good Points
– Easy to set up, not resource intensive
– Requires a data dictionary
– Central server model (security & data integrity)
– Web front-end
• Less than Good Points
– Display interface not very customizable
  • Layout, skip patterns, etc.
– Each application is a separate instance
– Adverse events monitoring difficult
– Not truly relational
– No data curation, electronic data collection only
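The value of requiring a data dictionary can be shown with a minimal sketch in which each field's type and allowed values are declared up front and every record is checked against them. The dictionary layout, field names, and ranges below are hypothetical; this is not REDCap's data dictionary format or API.

```python
# Hypothetical data dictionary: field name -> (type, allowed range or codes).
DATA_DICTIONARY = {
    "age_years": ("int", (0, 120)),
    "sex": ("code", {"F", "M"}),
    "hemoglobin_g_dl": ("float", (0.0, 25.0)),
}


def validate_record(record, dictionary=DATA_DICTIONARY):
    """Return a list of human-readable problems found in one record."""
    problems = []
    for field, (kind, rule) in dictionary.items():
        if field not in record or record[field] == "":
            problems.append(f"{field}: missing")
            continue
        raw = record[field]
        if kind == "code":
            if raw not in rule:
                problems.append(f"{field}: {raw!r} not in {sorted(rule)}")
            continue
        try:
            value = int(raw) if kind == "int" else float(raw)
        except (TypeError, ValueError):
            problems.append(f"{field}: {raw!r} is not a valid {kind}")
            continue
        low, high = rule
        if not low <= value <= high:
            problems.append(f"{field}: {value} outside [{low}, {high}]")
    # Fields not declared in the dictionary are flagged rather than ignored.
    for field in record:
        if field not in dictionary:
            problems.append(f"{field}: not defined in data dictionary")
    return problems


print(validate_record({"age_years": "34", "sex": "F", "hemoglobin_g_dl": "13.2"}))
print(validate_record({"age_years": "340", "sex": "X", "weight": "70"}))
```

Because every field must be declared before any record is accepted, the investigator is pushed to define needs clearly at the start, which is the discipline a spreadsheet does not impose.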
How Are Data Handled?
• Paper forms (CRFs) and keypunch
• Client-server DBMS and networked DBMS
• Web front-end DBMS
– Pediatric Oncology Group replaced paper in 1998
  • Web front-end
  • Oracle back-end
• Clinical Trials Management System (CTMS)
Advancing Technology
Clinical Trials Management Systems (CTMS)
IMPACT® CTMS
• Uses:
– Planning, preparation, monitoring, and reporting of clinical trials
– Administrative/financial capabilities
– Electronic case report forms (eCRFs)
– ± Interoperate with other systems
IDEAS
DATA PLANS FOR CLINICAL TRANSLATIONAL RESEARCH
Researchers
I.T. Staff
Problems Can Start Early Without Statistical Input
Required Data Elements for a Study Plan
• Design studies
– Clarify hypotheses and objectives
– Select study design
– Define data elements/endpoints
– Sample size/power calculations
– Develop analytic plans
• Monitor studies
– Efficacy/futility
– Safety/quality
• Analyze studies
– Statistical analysis
– Writing reports/manuscripts
[Side label on slide: Computation]
Data Plan: Study Design
• Select study design
– Prospective assessment
– Retrospective assessment
– Cross-sectional assessment
• Define data elements/endpoints
– Demographics
– Baseline characteristics (clinical, laboratory, imaging)
– Interventional characteristics
– Outcomes (clinical, laboratory, imaging, PRO)
Data Plan: Monitoring
• Efficacy/futility
– Interim stopping rules
– Group sequential methods
– Bayesian approaches (e.g., adaptive randomization)
• Safety/quality
– Safety stopping rules
– Ongoing remediation of quality problems
• Visualization
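To make "interim stopping rules" and "group sequential methods" concrete, the sketch below evaluates one common choice, the Lan-DeMets alpha spending function of the O'Brien-Fleming type, which shows how little of the overall type I error is spent at early looks. The talk does not say which method was used, so this choice and the function name are assumptions made for illustration.

```python
from statistics import NormalDist


def obf_alpha_spent(info_fraction, alpha=0.05):
    """Cumulative two-sided alpha spent at a given information fraction.

    Lan-DeMets spending function of the O'Brien-Fleming type:
    alpha(t) = 2 - 2 * Phi(z_{alpha/2} / sqrt(t)), for 0 < t <= 1.
    """
    if not 0 < info_fraction <= 1:
        raise ValueError("info_fraction must be in (0, 1]")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 - 2 * NormalDist().cdf(z / info_fraction ** 0.5)


for t in (0.25, 0.5, 0.75, 1.0):
    print(f"t = {t:.2f}  cumulative alpha spent = {obf_alpha_spent(t):.5f}")
```

Turning the spent alpha into actual stopping boundaries requires numerical integration over the joint distribution of the interim test statistics, which is not shown here and is normally done with dedicated software.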
Data Plan: Analysis
• Statistical analysis
– Data access
  • Direct: SQL (e.g., PROC SQL)
  • Export to other format (e.g., StatTransfer 11)
– Writing reports/manuscripts
  • Print
  • Web/Internet
– Data sharing
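Direct SQL access for analysis, listed above with PROC SQL as the example, can be sketched in Python with the standard-library sqlite3 module. The table, column names, and values are invented for this illustration and do not come from any study database described in the talk.

```python
import sqlite3

# In-memory database standing in for a study database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outcomes (subject_id TEXT PRIMARY KEY, arm TEXT, response REAL)"
)
conn.executemany(
    "INSERT INTO outcomes VALUES (?, ?, ?)",
    [("S001", "A", 1.0), ("S002", "A", 3.0), ("S003", "B", 2.0), ("S004", "B", 6.0)],
)


def summarize_by_arm(connection):
    """Return (arm, n, mean response) rows computed directly in SQL."""
    return connection.execute(
        "SELECT arm, COUNT(*), AVG(response) FROM outcomes GROUP BY arm ORDER BY arm"
    ).fetchall()


print(summarize_by_arm(conn))
```

Querying the database directly avoids an intermediate export step, so the analysis always reflects the current, audited copy of the data.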
Human Studies Database Project
Human Studies Database (HSDB) Project
• A CTSA multi-institutional project to federate study design descriptors and results of the human research portfolio over a grid-based architecture.
• Uses:
– Inform the design of new studies
– Facilitate systematic reviews/meta-analyses
– Identify potential collaborators
– Aid in research management
Ontology of Clinical Research (OCRe)
• HSDB developed using the Ontology of Clinical Research (OCRe)
• Focus on:
– Study design (Study Design Classifier), interventions, exposures, and analytic methods of individual-human studies
– Any design type, for any intent, in any clinical domain
– Federation across CTSAs
HOW DID WE ADDRESS THESE CHALLENGES IN SAN ANTONIO?
University of Texas Health Science Center at San Antonio
University of Texas Health Science Center at San Antonio BERD Unit
• Initial sit-down with a faculty biostatistician or epidemiologist
• Follow-up meeting adds:
– Information Services Director or Co-Director (IDEAS)
– Master’s-level public health staff to help co-develop the REDCap application
• We have been doing this for 10+ years.
INFORMATICS DATA EXCHANGE AND ACQUISITION SYSTEM (IDEAS)
University of Texas Health Science Center at San Antonio
Complexity Encapsulation
• Object-based templates
• Common business objects
• Custom object libraries
• Standard interfaces
[Diagram labels: User Interface; Data; Business Rules; Web Programmers; Domain Experts and Informatics Analysts; DBA]
Informatics Data Exchange and Acquisition System
The IDEAS Framework
An interwoven structure of interdependent components
[Diagram labels: Security; Application; Data Collection (Web, Interface, Batch); Database; Pathology & Genetics; Security; Protocols; Patient]
IDEAS: Three Tier MVC Framework
IDEAS Features
• Application Development: Custom meta-data generator
• Optional asynchronous operation
• Interoperability:
– Shibboleth: Federated Single Sign-On Authentication Service
– Patient Study Calendar (PSC)
– Qualtrics
– caTissue Suite / Freezerworks
– Velos*
*future feature
Other Considerations of UT HSC San Antonio Data Operations
• Standard Operating Procedures (SOPs)
• Disaster recovery
• Version control (Surround SCM)
• Audit
• Separation of duties – DBAs, analysts, statisticians
• Electronic sign-offs (Editor, Monitor, PI)
• Honest broker role (PHI-related)
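The honest broker role can be sketched as a function that removes direct identifiers before data reach the analysts and replaces them with a stable study ID. This is an illustrative assumption about how such a step might look, not a description of the San Antonio implementation; the field names and key are hypothetical, and real de-identification has to follow institutional policy and the applicable regulations.

```python
import hashlib
import hmac

# Hypothetical field names; a real list of identifiers would be set by policy.
DIRECT_IDENTIFIERS = {"name", "mrn", "date_of_birth"}


def deidentify(record, secret_key):
    """Return a copy of record without direct identifiers, keyed by a study ID.

    The study ID is a keyed hash (HMAC-SHA256) of the medical record number,
    so the same patient maps to the same ID without the MRN leaving the broker.
    """
    digest = hmac.new(secret_key, record["mrn"].encode("utf-8"), hashlib.sha256)
    released = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    released["study_id"] = digest.hexdigest()[:16]
    return released


key = b"held-only-by-the-honest-broker"  # placeholder, not a real key
patient = {"name": "Jane Doe", "mrn": "123456", "date_of_birth": "1970-01-01", "wbc": 5.4}
print(deidentify(patient, key))
```

Keeping the key with the broker alone supports the separation of duties listed above: analysts and statisticians can link records for the same subject without being able to recover who that subject is.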
Unique Challenge:
Integrating Practice-Based Research Networks (PBRNs) into IDEAS
1. StarNet (family practice) PBRN
2. Psychiatry PBRN
3. Dental PBRN
4. VA PBRN
Integrating Omics Data: Genetics and Biology of Liver Tumorigenesis in Children*
• Bioinformatics and Biostatistics Core (BIBSC)
• Brought together disparate data:
– Pediatric Oncology Group, Children’s Cancer Group, Children’s Oncology Group, the Cooperative Human Tissue Network (CHTN), Baylor pathology reference lab
– Bioinformatics data from a range of high-throughput platforms: Illumina, Affy, NextGen Sequencing, etc.
– Demographic, clinical, and outcome information
*Cancer Prevention Research Institute of Texas – MIRA RP101195
Need for Ongoing Quality Improvement
This topic is important enough that Chris Lindsell and I launched the Data Management and Quality (DMQ) Working Group for the Biostatistics/Epidemiology/Research Design (BERD) Key Function Committee (KFC).
Other Considerations for Data Operations
• Future-proofing database designs for repurposing
• Open source vs. commercial solutions
• Imaging informatics
• mHealth technologies
Summary
• Computation technologies are at the heart of data-driven research
• High quality data are fundamental to reproducible research and study validity
• Good data management → High quality data
• High quality data → Analytic quality
Summary (continued)
• Multidisciplinary team should be involved:
– Biostatistician/epidemiologist
– Research IT
– Data management
– Regulatory personnel
– Biomedical informatics, computer science
Summary (continued)
• Core competencies in data operations should be a requirement for academic clinical translational research training programs.
• Technologies for managing data are changing faster than technologies for analysis.
• Selection of software tools depends on their capability as well as sustainability over the long haul.
“Ultimate Goal” To Advance Clinical Translational Research
• Seamless integration of data operations using complementary innovative technologies.
Promise for the Future
• Clinical data and research data gap will be bridged
– CTMS and EHR interoperability
• Precision medicine will be driven by information infrastructure
• Disparate data sources will be federated
– e.g., I-SPY 2 Breast Cancer Trial
Promise for the Future (continued)
• Data liquidity is the rapid, seamless, secure exchange of useful, standards-based information among authorized individual and institutional senders and recipients.
• Information systems will simultaneously address research, financial, and regulatory needs
Thank you very much