EPI 218 Database Management for Clinical Research Michael A. Kohn, MD, MPP January 10, 2010


EPI 218: Database Management for Clinical Research

Michael A. Kohn, MD, MPP

January 10, 2010


Clinical Research*

• Choose the study design, and define the study population, predictor variables, and outcome variables;

• measure these variables and anticipate problems with measurement;

• analyze the results

In this course, we discuss the “nitty gritty” of collecting, storing, updating, and monitoring the study measurements.

*Private companies that make data management systems for clinical research understand “clinical research” to include only RCTs preparatory to FDA drug or device approval, not observational studies.


Assumptions about Students

• Actively involved in a clinical research study

• Some experience with entering and maintaining data in single-table spreadsheet or statistical software


Outline

• Housekeeping

• Data Tables

– Rows = Records; Columns = Fields

• Normalization of Data Tables

• Queries

• Front End or Interface/On Screen Forms


Housekeeping

Epi 218


• Course website: http://www.epibiostat.ucsf.edu/courses/schedule/data_management.html

• Labs will be in China Basin Landing 6704 with overflow into 6702, 8:45 – 10:15

• http://apps.epi-ucsf.org (For log-on: [email protected])

Citrix Metaframe Presentation Server MS Office Desktop

• “Learn MS Access 2000” video

http://mkanders.com/learn_access_video.htm

Username: ucsfdbclass

Password: access2000

(We can also loan you the video on CD.)


Lab Instructors

Maurice [email protected]

Others to be named later.


Course Objectives

Learn how to develop a multi-table, relational database for a research study. We will be using Microsoft Access*, but we are familiar with other database software.

Learn how to query a database for monitoring and analyzing data in a research study.

Example: Infant Jaundice Study

*SQL-based, widely available, desktop DBMS


Requirements

• Turn in all 4 assignments on time

• Fill out course evaluation.


Assignment 4/Final Project

Due 2/16/2010

Send in a copy of your research study database* with a data management plan.

We prefer a database that you are currently using or will use for a research study.

However, a demonstration or pilot database is acceptable.

*If you are unable to package your database in a file to email, you can send us a link or work out another way to review your database.


Assignment 4/Final Project

Due 2/16/2010

If you are doing secondary analysis of data collected by someone else,

• obtain the data collection forms* used in the original data collection,

• set up a new database that you would use for a follow-up study.

*Often easily obtained by doing a Google search or emailing the author of the original study.


Assignment 4/Final Project

Due 2/16/2010

Start thinking about this now.

Build up your own study database as you work through the labs.

Use extra time in lab to work on your study database.

Set up appointments with course faculty early.


TICR Professional Conduct Statement

Clarifications for this class

• I will maintain the highest standards of academic honesty

• I will neither give nor receive aid in examinations or assignments unless such cooperation is expressly permitted by the instructor

• I will conduct research in an unbiased manner, report results truthfully, and credit ideas developed and work done by others

• I will not use answer keys from prior years

• I will write answers in my own words, and, when collaboration is permitted, acknowledge collaborators when answers are jointly formulated

For Epi 218 – Just don’t turn in somebody else’s work as your own.


Rows = Records = Entities

Columns = Fields = Attributes

Data Tables


DCR Chapter 16 Exercise 2

The PHTSE (Pre-Hospital Treatment of Status Epilepticus) Study was a randomized blinded trial of lorazepam, diazepam, or placebo in the treatment of pre-hospital status epilepticus. The primary endpoint was termination of convulsions by hospital arrival. To enroll patients, paramedics contacted base hospital physicians by radio. The following are base-hospital physician data collection forms for 2 enrolled patients:

Lowenstein DH, Alldredge BK, Allen F, Neuhaus J, Corry M, Gottwald M, et al. The prehospital treatment of status epilepticus (PHTSE) study: design and methodology. Control Clin Trials 2001;22(3):290-309.

Alldredge BK, Gelb AM, Isaacs SM, Corry MD, Allen F, Ulrich S, et al. A comparison of lorazepam, diazepam, and placebo for the treatment of out-of-hospital status epilepticus. N Engl J Med 2001;345(9):631-7.


Display the data from these 2 data collection forms in a 2-row data table.

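| SubjectID | KitNumber | AdminDate | AdminTime | SzStopPreHosp | SzStopPreHospTime | HospArrTime | HospArrSzAct | HospArrGCSV |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 189 | A322 | 3/12/1994 | 17:39 | FALSE | | 17:48 | TRUE | |
| 410 | B536 | 12/1/1998 | 01:35 | TRUE | 01:39 | 01:53 | FALSE | 4 |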


Create a 9-field data dictionary for the data table

| Field Name | Data Type | Description | Validation Rule |
| --- | --- | --- | --- |
| SubjectID | Integer | Unique Subject Identifier | |
| KitNumber | Text(5) | 5-character Investigational Pharmacy Code | |
| AdminDate | Date | Date Study Drug Administered | |
| AdminTime | Time | Time Study Drug Administered | |
| SzStopPreHosp | Yes/No | Did seizure stop during pre-hospital course? | |
| SzStopPreHospTime | Time | Time seizures stopped during pre-hosp course (blank if seizure did not stop) | |
| HospArrTime | Time | Hospital Arrival Time | |
| HospArrSzAct | Yes/No | Was there continued Seizure Activity on Hospital Arrival? | Check against SzStopPreHosp |
| HospArrGCSV | Integer | Verbal GCS on Hospital Arrival (blank if seizure continued) | Between 1 and 5 |
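A data dictionary like this maps almost line for line onto a table definition. The sketch below is one possible translation, using SQLite through Python's standard `sqlite3` module as a stand-in for Access. The table name `PhtseForm`, the ISO date strings, the 0/1 coding of Yes/No fields, and the rejected test row are all assumptions made for illustration. The cross-field rule "Check against SzStopPreHosp" is left out because the dictionary does not spell out what the check should be.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One column per data-dictionary row. Dates and times are stored as text,
# Yes/No fields as 0/1 (assumed codings; SQLite has no Date or Yes/No type).
conn.execute("""
    CREATE TABLE PhtseForm (
        SubjectID         INTEGER PRIMARY KEY,
        KitNumber         TEXT NOT NULL CHECK (length(KitNumber) <= 5),
        AdminDate         TEXT NOT NULL,
        AdminTime         TEXT NOT NULL,
        SzStopPreHosp     INTEGER NOT NULL CHECK (SzStopPreHosp IN (0, 1)),
        SzStopPreHospTime TEXT,
        HospArrTime       TEXT,
        HospArrSzAct      INTEGER CHECK (HospArrSzAct IN (0, 1)),
        HospArrGCSV       INTEGER CHECK (HospArrGCSV BETWEEN 1 AND 5)
    )
""")

INSERT_SQL = "INSERT INTO PhtseForm VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)"


def try_insert(row):
    """Return True if the row is stored, False if a validation rule rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(INSERT_SQL, row)
        return True
    except sqlite3.IntegrityError:
        return False


# The two enrolled patients from the 2-row table (blank cells become None).
ok_189 = try_insert((189, "A322", "1994-03-12", "17:39", 0, None, "17:48", 1, None))
ok_410 = try_insert((410, "B536", "1998-12-01", "01:35", 1, "01:39", "01:53", 0, 4))

# A made-up row with a verbal GCS of 7, outside the allowed 1 to 5 range.
ok_bad_gcs = try_insert((999, "Z999", "1999-01-01", "10:00", 0, None, "10:10", 1, 7))

print(ok_189, ok_410, ok_bad_gcs)
```

A blank (NULL) HospArrGCSV passes the range check, which matches the "blank if seizure continued" note in the dictionary.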


Methods:

Design: Nested double cohort study

Setting: Kaiser

Subjects: Infants with neonatal jaundice and randomly selected non-jaundiced infants

Predictor Variable: Presence or absence of jaundice

Outcome Variable: Neuropsychological score (ranging from 55 to 145) at age 5

Analysis: ?

JIFee: Jaundice and Infant Feeding Study

Newman, T. B., P. Liljestrand, et al. (2006). "Outcomes among newborns with total serum bilirubin levels of 25 mg per deciliter or more." N Engl J Med 354(18): 1889-900.


Infant Jaundice Study Data

1. Approximately 400 children

2. 5 examiners (doctors)

3. Approximately 700 neuropsychological examinations, measuring weight, height, and “NPScore” (IQ)

4. Some children to be examined more than once

5. No examiner to see the same child twice

6. If child died before age 5, store age and circumstances of death


Infant Jaundice Study Table of Subjects

Row = Individual Infant

Columns = ID#, Name, DOB, Sex, Jaundice.

If one set of measurements per subject, put measurements in subject table.

This is a single-table database.

Table of Study Subjects


Demonstration: Creating a Data Table

Label columns and enter rows of data in datasheet view

Where is predictor on data collection form?


Demonstration: Data Dictionary

Table design view:

• field (=column) names

• data types

• definitions

• validation rules

(More on data types, free-text vs. coded responses, later)


Acceptable table showing one set of exam results per participant. (BabyExamForFigure3)


Demonstration

Disallowed values

Duplicate primary keys

This automatic error checking and data validation IS why you need to enter your data into a computer; it is NOT why you need a relational DBMS. Many single-table products (Filemaker Pro, SAS FSP, even Excel) can do error checking and data validation.
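As a rough illustration of the two demonstrations named above, the sketch below tries to store a duplicate primary key and a disallowed value in a single table. It uses SQLite through Python's `sqlite3` module rather than Access. The birth dates, the 'M'/'F' coding for Sex, the 0/1 coding for Jaundice, and SubjectID 2323 are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Baby (
        SubjectID INTEGER PRIMARY KEY,               -- must be unique
        Name      TEXT NOT NULL,
        DOB       TEXT NOT NULL,
        Sex       TEXT CHECK (Sex IN ('M', 'F')),    -- assumed coding
        Jaundice  INTEGER CHECK (Jaundice IN (0, 1)) -- assumed coding
    )
""")
conn.execute("INSERT INTO Baby VALUES (2322, 'Helen', '2005-01-06', 'F', 1)")

errors = []
bad_rows = [
    (2322, "Robert", "2005-03-15", "M", 0),  # duplicate SubjectID
    (2323, "Robert", "2005-03-15", "X", 0),  # Sex outside the allowed codes
]
for row in bad_rows:
    try:
        conn.execute("INSERT INTO Baby VALUES (?, ?, ?, ?, ?)", row)
    except sqlite3.IntegrityError as exc:
        errors.append(str(exc))  # the DBMS explains which rule was broken

stored = conn.execute("SELECT COUNT(*) FROM Baby").fetchone()[0]
print(stored, errors)
```

Both bad rows are refused and only Helen's record remains. None of this needs a second table.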


Demonstration: Same Table in Excel, Stata

• Excel

• Stata

• Etc

Rows = Records = Entities

Columns = Fields = Attributes

Access and Stata have a special row at the top for column headings (=field names); Excel just uses the first row.


Normalization


Table of Study Subjects

Row = Individual Infant

Columns = ID#, Name, DOB, Sex, Jaundice

If some infants have more than one exam, what do you do?

Table of Study Subjects


Undesirable table showing multiple exam results per study participant. (BabyExamForFigure4)


Demo

• Find highest IQ Score

• Find all exams done in April


Common Error

• If you find yourself creating multiple columns for the same measurement, e.g., Date1, Score1, Date2, Score2, Date3, Score3, …

• Or if your table is more than about 30 columns wide,

– It is time to restructure your table.
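One way to restructure such a table is to turn each Date/Score pair into its own row. The sketch below does this in plain Python. The column names `ExamDate` and `Score`, the function name `wide_to_long`, and all data values are made up for illustration. Once the data are one row per exam, "highest score" and "all exams in April" become simple one-line questions.

```python
# Hypothetical wide table: one row per subject, repeated Date/Score columns.
wide_rows = [
    {"SubjectID": 2322, "Date1": "2010-01-15", "Score1": 104,
     "Date2": "2010-04-20", "Score2": 110, "Date3": None, "Score3": None},
    {"SubjectID": 2323, "Date1": "2010-02-03", "Score1": 98,
     "Date2": None, "Score2": None, "Date3": None, "Score3": None},
]


def wide_to_long(rows, max_exams=3):
    """Return one record per exam instead of one record per subject."""
    long_rows = []
    for row in rows:
        for i in range(1, max_exams + 1):
            exam_date = row.get(f"Date{i}")
            score = row.get(f"Score{i}")
            if exam_date is None and score is None:
                continue  # this subject had no i-th exam
            long_rows.append(
                {"SubjectID": row["SubjectID"], "ExamDate": exam_date, "Score": score}
            )
    return long_rows


exams = wide_to_long(wide_rows)

# With one row per exam, each question needs only one column to be searched.
highest = max(exams, key=lambda e: e["Score"])
april_exams = [e for e in exams if e["ExamDate"][5:7] == "04"]

print(len(exams), highest["SubjectID"], april_exams)
```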


Undesirable table with participant-specific data duplicated for each exam. (Note problem with Helen’s DOB.) (ExamBabyForFigure5)


Demo

• Find highest IQ Score

• Find all exams in a particular month

• What is Helen’s birth date?

• What happened to Alejandro, Ryan, Zachary, and Jackson?


If some infants have multiple exams,

“normalize” the records into two tables, one for subjects and one for examinations.

Normalization


Data normalized into two tables: one (“Baby”) with rows comprising subject-specific information; the other (“Exam”) with rows comprising exam-specific information. Note that Helen can only have one birth date. Subjects with no exams, e.g. Alejandro, still appear in the database. “SubjectID” functions as the primary key in the “Baby” table and as the foreign key in the “Exam” table.


Figure 7. Relationships diagram showing the one-to-many relationship between the table of subjects (“Baby”) and the table of measurements (“Exam”).


Demonstration

Inability to create integrity violations with normalized tables.

This IS why you need a multi-table relational DBMS.
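A minimal sketch of this kind of integrity checking, using SQLite through Python's `sqlite3` module in place of Access. The "Baby" and "Exam" tables and the "SubjectID" key follow the normalized design described earlier; `ExamID`, the birth dates, the exam values, and Alejandro's SubjectID are assumptions for illustration. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE Baby (
        SubjectID INTEGER PRIMARY KEY,
        Name      TEXT NOT NULL,
        DOB       TEXT NOT NULL          -- one birth date per subject
    );
    CREATE TABLE Exam (
        ExamID    INTEGER PRIMARY KEY,
        SubjectID INTEGER NOT NULL REFERENCES Baby(SubjectID),  -- foreign key
        ExamDate  TEXT NOT NULL,
        NPScore   INTEGER CHECK (NPScore BETWEEN 55 AND 145)
    );
""")
conn.execute("INSERT INTO Baby VALUES (2322, 'Helen', '2005-01-06')")
conn.execute("INSERT INTO Baby VALUES (2400, 'Alejandro', '2005-02-11')")
conn.execute("INSERT INTO Exam VALUES (1, 2322, '2010-01-30', 104)")
conn.execute("INSERT INTO Exam VALUES (2, 2322, '2010-04-20', 110)")


def rejected(sql):
    """Return True if the DBMS refuses the statement as an integrity violation."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# An exam for a subject who is not in the Baby table.
orphan_rejected = rejected("INSERT INTO Exam VALUES (3, 9999, '2010-02-01', 100)")
# Deleting a subject who still has exams on file.
delete_rejected = rejected("DELETE FROM Baby WHERE SubjectID = 2322")

# Subjects with no exams still appear in the database.
no_exams = [name for (name,) in conn.execute("""
    SELECT b.Name
    FROM Baby AS b
    LEFT JOIN Exam AS e ON e.SubjectID = b.SubjectID
    WHERE e.ExamID IS NULL
""")]

print(orphan_rejected, delete_rejected, no_exams)
```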


Lab Results

Occasionally, the subjects had blood tests.

Robert had a CBC on 1/30/2010.

Helen had a CBC on 1/30/2010, LFTs on 2/28/2010, and a CD-4 count on 3/31/2010.


LabResultQry


Undesirability of Storing Calculated Values

Store raw data, not calculated fields, e.g., store dates and times; calculate intervals.

 

Storing a patient’s birth date allows calculation of his or her exact age on the date of a particular measurement.


Figure 15. Storing calculated fields such as “AgeInMonths” is undesirable. What if the birth date for SubjectID 2322 (Helen) is corrected in the “Baby” table?
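The alternative to a stored "AgeInMonths" is to calculate it from the two stored dates whenever it is needed. The sketch below shows one way to do that in Python. The function name `age_in_months`, the convention of counting completed months, and all dates are assumptions for illustration. When the birth date is corrected, the age follows automatically because nothing calculated was stored.

```python
from datetime import date


def age_in_months(dob, on):
    """Completed months between a birth date and a measurement date."""
    months = (on.year - dob.year) * 12 + (on.month - dob.month)
    if on.day < dob.day:
        months -= 1  # the current month is not yet complete
    return months


exam_date = date(2010, 1, 30)          # hypothetical exam date

dob_as_entered = date(2005, 1, 6)      # hypothetical original entry
age_before = age_in_months(dob_as_entered, exam_date)

dob_corrected = date(2005, 2, 6)       # hypothetical correction
age_after = age_in_months(dob_corrected, exam_date)

print(age_before, age_after)
```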


Queries


Select Queries

Select queries (aka “Views”) organize, sort, filter, and display data.

Queries use Structured Query Language (SQL), but you don’t have to learn it, because graphical query design tools can build the queries for you.

A query can join data from two or more tables, display only selected fields, and filter for records that meet certain criteria.


Demonstration

Age in months and BMI at exam of subjects who were examined in January and February of 2010.

QueryDemo


Select Queries Produce “Table-Like” Results

Note that the result of a select query that joins two tables, displays only certain fields, selects rows based on special criteria, and calculates age and BMI still looks like a table in datasheet view.

But, remember that it is a dynamic “view” of data from the underlying tables.
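For those curious about the SQL behind such a query, the sketch below builds a comparable select query as a view, using SQLite through Python's `sqlite3` module rather than Access. The column names `WeightKg` and `HeightCm`, the units, and every data value are assumptions for illustration. Age is approximated by dividing days by the average month length (30.4375), which is good enough for a sketch but not an exact calendar calculation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Baby (SubjectID INTEGER PRIMARY KEY, Name TEXT, DOB TEXT);
    CREATE TABLE Exam (
        ExamID    INTEGER PRIMARY KEY,
        SubjectID INTEGER REFERENCES Baby(SubjectID),
        ExamDate  TEXT,
        WeightKg  REAL,
        HeightCm  REAL
    );
    INSERT INTO Baby VALUES (2322, 'Helen', '2005-01-06');
    INSERT INTO Baby VALUES (2323, 'Robert', '2005-03-15');
    INSERT INTO Exam VALUES (1, 2322, '2010-01-30', 18.0, 110.0);
    INSERT INTO Exam VALUES (2, 2323, '2010-02-28', 20.0, 100.0);
    INSERT INTO Exam VALUES (3, 2322, '2010-03-31', 18.5, 111.0);

    -- Join two tables, show selected fields, filter rows, calculate values.
    CREATE VIEW QueryDemo AS
    SELECT b.SubjectID,
           b.Name,
           e.ExamDate,
           CAST((julianday(e.ExamDate) - julianday(b.DOB)) / 30.4375 AS INTEGER)
               AS AgeInMonths,
           ROUND(e.WeightKg / ((e.HeightCm / 100.0) * (e.HeightCm / 100.0)), 1)
               AS BMI
    FROM Baby AS b
    JOIN Exam AS e ON e.SubjectID = b.SubjectID
    WHERE e.ExamDate >= '2010-01-01' AND e.ExamDate < '2010-03-01';
""")

before = conn.execute("SELECT * FROM QueryDemo ORDER BY ExamDate").fetchall()

# The view stores no data of its own: a new exam shows up without any rework.
conn.execute("INSERT INTO Exam VALUES (4, 2323, '2010-02-10', 19.8, 100.0)")
after = conn.execute("SELECT * FROM QueryDemo ORDER BY ExamDate").fetchall()

for row in after:
    print(row)
```

The March exam never appears in the result because it fails the date filter, while the exam added later appears as soon as it is stored.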


“Action Queries” Change Data

1) Update Query -- changes the values of specific fields in existing records

2) Append Query -- adds new records (rows) to a table

3) Delete Query -- deletes records from a table
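In SQL terms, these three action queries correspond to UPDATE, INSERT, and DELETE statements. The sketch below runs one of each against a small "Baby" table, using SQLite through Python's `sqlite3` module as a stand-in for Access. All names, IDs, and dates in the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Baby (SubjectID INTEGER PRIMARY KEY, Name TEXT, DOB TEXT)")
conn.executemany(
    "INSERT INTO Baby VALUES (?, ?, ?)",
    [(2322, "Helen", "2005-01-06"), (2323, "Robert", "2005-03-15")],
)

# 1) Update query: change a field in an existing record.
updated = conn.execute(
    "UPDATE Baby SET DOB = ? WHERE SubjectID = ?", ("2005-02-06", 2322)
).rowcount

# 2) Append query: add a new record.
appended = conn.execute(
    "INSERT INTO Baby VALUES (?, ?, ?)", (2400, "Alejandro", "2005-02-11")
).rowcount

# 3) Delete query: remove a record.
deleted = conn.execute("DELETE FROM Baby WHERE SubjectID = ?", (2323,)).rowcount

conn.commit()
remaining = conn.execute("SELECT SubjectID, DOB FROM Baby ORDER BY SubjectID").fetchall()
print(updated, appended, deleted, remaining)
```

Unlike a select query, each of these permanently changes the stored data, so it is worth checking the WHERE clause with a select query first.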


Front End or Interface

On-screen forms


Advantages of On-Screen Forms

• Data keyed directly into the computer data tables without a transcription step

• Include validation checks and provide immediate feedback when a response is out of range

• Incorporate skip logic


Standard Data Entry Conventions

• Several conventions for data entry and display have developed over time.

• Most users of screen forms have come to expect them subconsciously.

• Mutually exclusive, collectively exhaustive choices are displayed as an “option group” consisting of several different “radio buttons”.

• Choices which are not mutually exclusive are displayed as check boxes.

N.B. An “option group” of mutually exclusive choices is a single column or field. A group of N check boxes represents N yes/no fields.
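The field counts in the N.B. can be made concrete with a small sketch in Python. The specialty codes, the check-box labels, and both function names are hypothetical. The point is only the shape of the stored record: one coded field for the option group, and one yes/no field per check box.

```python
# Hypothetical option group: exactly one choice, stored as a single coded field.
SPECIALTY_CODES = {1: "Pediatrics", 2: "Neurology", 3: "Psychology"}

# Hypothetical "check all that apply" items: one yes/no field per box.
RISK_FACTORS = ("Prematurity", "Bruising", "ExclusiveBreastfeeding")


def option_group_field(selected_code):
    """An option group of mutually exclusive choices yields ONE field."""
    if selected_code not in SPECIALTY_CODES:
        raise ValueError(f"{selected_code!r} is not an allowed specialty code")
    return {"Specialty": selected_code}


def check_box_fields(checked):
    """A group of N check boxes yields N yes/no fields."""
    unknown = set(checked) - set(RISK_FACTORS)
    if unknown:
        raise ValueError(f"unknown check boxes: {sorted(unknown)}")
    return {name: (name in checked) for name in RISK_FACTORS}


record = {
    **option_group_field(2),
    **check_box_fields({"Prematurity", "Bruising"}),
}
print(record)
```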


Use check boxes when options are not mutually exclusive. (5 fields)

Use radio buttons when options are mutually exclusive. (1 field)

Computer chart abstraction form showing two common data entry conventions.


Demonstration

Option group for examiner’s medical specialty

MasterRaceAsFieldList, MasterRaceAsOptionGroup, MasterRaceAsAllThatApply


On-screen vs. paper forms

Minimize the extent to which study measurements are recorded on paper forms. Enter data directly into the computer database or move data from paper forms into the computer database as close to the data collection time as possible.

When you define a variable in a computer database, you specify both its format and its domain or range of allowed values. Using these format and domain specifications, computer data entry forms give immediate feedback about improper formats and values that are out of range. The best time to receive this feedback is when the study subject is still on site.
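A rough sketch of what such format and domain checks might look like behind a data entry form, written in Python. The two fields are borrowed from the PHTSE data dictionary (a time in HH:MM format, and a verbal GCS between 1 and 5); the function names and message wording are assumptions for illustration, not how any particular form tool does it.

```python
from datetime import datetime


def check_time(raw):
    """Format check: return None if raw is a valid HH:MM time, else a message."""
    try:
        datetime.strptime(raw, "%H:%M")
        return None
    except ValueError:
        return f"'{raw}' is not a valid time (expected HH:MM)"


def check_gcs_verbal(raw):
    """Format and domain check: an integer between 1 and 5."""
    try:
        value = int(raw)
    except ValueError:
        return f"'{raw}' is not a whole number"
    if not 1 <= value <= 5:
        return f"{value} is out of range (allowed: 1 to 5)"
    return None


# Feedback is produced as each value is keyed in.
for field, checker, raw in [
    ("HospArrTime", check_time, "17:48"),
    ("HospArrTime", check_time, "25:00"),
    ("HospArrGCSV", check_gcs_verbal, "4"),
    ("HospArrGCSV", check_gcs_verbal, "7"),
]:
    problem = checker(raw)
    print(field, raw, "OK" if problem is None else problem)
```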

You can only monitor data for outliers, systematic differences between data collectors or study sites, and study progress (i.e., query the data) once the data are in the computer.

You can always print out a paper copy of the screen form or a report of the exam/interview results once the data are collected.


Outline

• Housekeeping

• Data Tables

– Rows = Records; Columns = Fields

• Normalization of Data Tables

• Queries

• Front End or Interface/On Screen Forms


Don’t Forget

Lab 1 next Tuesday 1/12/2010

View the Learn MS Access 2000 Video

http://mkanders.com/learn_access_video.htm

Username: ucsfdbclass

Password: access2000

(We can also loan you the video on CD.)

Start thinking about your study database and Assignment 4/Final Project.