
Basics of Biostatistics for Health Research
Session 1 – February 7th, 2013

Dr. Scott Patten, Professor of Epidemiology
Department of Community Health Sciences & Department of Psychiatry

[email protected]


Objective 1:

Upon completion, students will be (more) able to ….

• Read,

• Understand,

• Critically interpret,

The statistical portions of articles in the medical literature.


Objective 2:

Given a dataset, students will be able to ….

• Select appropriate statistical procedures for basic analyses

• Implement these analyses using typical statistical software (we will use Stata)


Objective 3:

Upon completion, students will be (more) able to ….

• Define and interpret specialized parameters found in the clinical epidemiology literature, for example:
– Sensitivity
– Specificity
– Predictive values



Topics for Session 1:

• Why do we need statistics?

• Calculating a 95% confidence interval for a proportion


Why Do We Need Statistics?

• We don’t always need statistics.

• However, statistics are the most powerful tools for answering questions in medicine, for example:
– Determining whether treatments work
– Comparing different treatments
– Identifying the causes of diseases




The Power of Statistics

• Where does it come from?
– Fundamentally, from the laws of probability

• A familiar example:
– Flipping one coin versus flipping many coins


Coin Flipping

• First, I’ll flip a coin and you can try to guess what I got.

• Then, I’ll ask each of you to flip a coin and I’ll guess how many heads you get in total.
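The same demonstration can be simulated in Stata. This is only a sketch: the seed and the number of flips (100) are arbitrary choices made for illustration, and `clear` discards any data currently in memory.

```stata
* Simulate coin flips (the seed and the number of flips are assumptions)
clear
set seed 2013
set obs 100
* heads = 1 with probability 0.5, otherwise 0
generate heads = runiform() < 0.5
* Frequency table of heads and tails
tabulate heads
* The mean of a 0/1 variable is the proportion of heads
summarize heads
```

Any single flip is unpredictable, but the proportion of heads across many flips tends to land close to 0.5.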


The Power of Statistics

• A set of observations can allow us to make statements of a sort that we generally cannot make based on a single observation
– E.g. how well does a treatment work?

• Larger and larger sets of observations allow us to make stronger and stronger statements
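One way to see this in Stata is the immediate command `cii`, which computes a confidence interval from counts typed directly on the command line. The counts below are made up for illustration, and the syntax shown assumes Stata 14 or later.

```stata
* Same observed proportion (50%) at two sample sizes; the counts are invented
* Syntax for Stata 14 or later (exact binomial interval by default)
cii proportions 20 10
cii proportions 2000 1000
* In Stata 13 or earlier the equivalent form was: cii 20 10
```

The second interval should come out much narrower than the first, which is what "stronger statements" means in practice.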


Formal Terminology

• The source of the observations is a sample

• The sample is a subset of a population

• The observations are data

• The collection of observations is a dataset


Inference

Generally, “A conclusion reached on the basis of evidence and reasoning.”

Statistical, “Making a statement about a population based on observations from a sample (a dataset).”


Stata’s Graphical Interface


Let’s Do a Study!

• We’ll select a sample of half the class

• Tabulate the frequency of male/female

• Estimate the proportion of women


Select a Sample

• We’ll consider the class of N students as our population

• The first step in obtaining a sample is to have a sampling frame – a list of the population

• Let’s make one in Stata

• For notation, I’ll type Stata commands in red.

• These go into the command window

• To execute a command, press Enter: <Enter>


Command Menus (an alternative to the command window)

I’ll use screen captures and add red numbers if things need to be done in more than one step.


1. Use this drop-down menu to select the new variable

2. Click OK


• Let’s create a sampling frame in Stata.

• In the command window, tell Stata that we want to create a list with N rows:

set obs 30

(instead of 30, we’ll use the number of students in the class)

generate id = _n


• Let’s create a sampling frame in Stata.

• We’ll start by typing into the command window.


• In the command window, tell Stata that we want to create a list with N* rows:

set obs 30

<enter>

generate id = _n

* instead of 30, we’ll use the # in the class
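The two commands can be checked with a short sketch. Here `_n` is Stata's built-in observation number and `_N` is the total number of observations. The leading `clear` is an addition so the sketch can be rerun, and it discards any data in memory.

```stata
* Build a 30-row sampling frame (replace 30 with the class size)
clear
set obs 30
* _n is the number of the current observation, so id runs from 1 to 30
generate id = _n
* _N is the total number of observations
display _N
* Show the first five rows of the frame
list id in 1/5
```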


The data viewer


• Now, let’s sample half of these

sample 50

• Click on the data viewer to see our sample
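A sketch of the sampling step, assuming the sampling frame (variable id) is already in memory. Setting a seed is an addition not shown in the session; it makes the random draw reproducible, and the seed value is arbitrary.

```stata
* Assumes the sampling frame with variable id is in memory
* The seed value is arbitrary; it only makes the draw repeatable
set seed 12345
* Keep a random 50% of the observations and drop the rest
sample 50
* An alternative that draws an exact number of rows: sample 15, count
count
list id
```

Note that `sample` drops the unsampled observations from memory, so the full frame is gone unless it was saved first.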


Data Collection

• From each member of our sample, we’ll record the person’s sex

Male = 0

Female = 1

• Let’s create a variable called “sex” in which to enter our data

generate sex = .


The data viewer

Look at the Dataset!


The data editor

Enter the Data


Highlight a cell (click on it) and start entering data!
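Data can also be entered from the command window instead of the data editor. A sketch, assuming the variables id and sex exist; the id values and the sexes assigned here are hypothetical, since the ids in memory depend on the random sample.

```stata
* Hypothetical entries: the ids present depend on the sample that was drawn
replace sex = 1 if id == 3
replace sex = 0 if id == 7
* Count how many sampled people still have no sex recorded
count if missing(sex)
```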


Closing the Data Editor

Click Exit


Making a Table

• At this point, we could make a table to show the frequency of men and women in our sample.

(Screen capture: menu steps 1 to 4)


1. Use this drop-down menu to select the new variable

2. Click OK


A few things to note….

• Our table doesn’t look so great

• The command that our menus created is executed by Stata (see the “. tab var2” in the output window)

• We can do the same thing by typing the following in the command line:

tab var2


Command Line


Our Table is Still Very Ugly

(not exactly, but something like this)


Renaming a Variable

(Screen capture: menu steps 1 and 2)


The Variables Manager

1. Select “var2” (click)

2. Type “sex” here, under Name


Using the Command Window

• Another way to do it is just to type into the command window

rename var2 sex


Our Table is Still Very Ugly

(not exactly, but something like this)


Creating a Label

(Screen capture: menu steps 1 to 4)


Creating a Label

Click Here


Creating a Label

In Stata, you need to give your label a name.

Our values are 0 and 1.

Our labels are men and women.

Click Here

(Screen capture: steps 1 to 4)


Creating a Label

After adding women, add a second value label for men.

Our labels are men and women

(Screen capture: steps 1 to 4)


Attaching the Label

(Screen capture: menu steps 1 to 4)


Assigning the Label

(Screen capture: steps 1 to 3)


A Good Looking Table
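The menu steps for creating and attaching a label have command-window equivalents. A sketch, assuming the variable has already been renamed to sex and is coded 0 = men, 1 = women; the label name sexlbl is a hypothetical choice.

```stata
* Define a value label (sexlbl is a hypothetical name)
label define sexlbl 0 "men" 1 "women"
* Attach the label to the variable sex
label values sex sexlbl
* The table now shows "men" and "women" instead of 0 and 1
tabulate sex
```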


Saving a Dataset

Click here to save
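Saving can also be done from the command window. A sketch; the file name class_sample.dta is hypothetical, and the file is written to Stata's current working directory unless a full path is given.

```stata
* class_sample.dta is a hypothetical file name
* replace overwrites an existing file of the same name
save "class_sample.dta", replace
* To reload it later: use "class_sample.dta", clear
```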


Let’s do Statistics!

• We need to enter the Statistics menu

(Screen capture: menu steps 1 to 4)


Entering the Command

(Screen capture: steps 1 to 4)


Our Output

• What is the 95% confidence interval?

• What does it mean?

• What kind of statement can be made about the population (our class)?

• Is the statement true?
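The confidence interval can also be requested from the command window. A sketch, assuming the 0/1 variable is named sex; the command syntax differs between Stata releases, so both forms are noted.

```stata
* Exact binomial 95% confidence interval for the proportion with sex == 1
* Syntax for Stata 14 or later (exact is the default method)
ci proportions sex
* In Stata 13 or earlier the equivalent form was: ci sex, binomial
```

Because sex is coded 0/1 with women as 1, the estimated proportion is the proportion of women in the sample.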


Introducing the “do file” editor

(Screen capture: steps 1 to 3)


Executing a “do” file
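A do file is just a text file of Stata commands run in order. A sketch of what one might contain, assuming the class sample was saved earlier; the file names session1.do and class_sample.dta are hypothetical.

```stata
* session1.do (hypothetical file name)
* Load the saved class sample; class_sample.dta is a hypothetical name
use "class_sample.dta", clear
* Frequency table of sex
tabulate sex
* Exact binomial 95% CI; Stata 13 or earlier used: ci sex, binomial
ci proportions sex
```

Besides the do-file editor, the file can be run by typing `do session1.do` in the command window, provided the file is in the current working directory.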



Something more Realistic

• Go to www.ucalgary.ca/~patten

• Scroll to the bottom.

• Right click to download the two files described as being “for PGME Students”

• Save them on your desktop


Open the Datafile


Explore the Datafile

• Click on the data browser in Stata

• Type describe into the command bar

• Open the data documentation file

• Note that sex is not labeled properly and that it is coded differently than in our example


Recode the Sex Variable as 0/1

• Let’s use the command window:

generate female = sex

recode female 1=0 2=1

• Double check you’ve done it right:

tab female sex
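The copy-then-recode steps can also be written as a single command using the `generate()` option of `recode`. A sketch, assuming sex is coded 1 and 2 as the recode above implies (check the data documentation for which value is which); the name female_alt is hypothetical and is used only so it does not clash with the female variable already created.

```stata
* One-step alternative: recode into a new variable, leaving sex unchanged
* female_alt is a hypothetical name chosen to avoid clashing with female
recode sex (1=0) (2=1), generate(female_alt)
* Cross-tabulate to confirm every 1 became 0 and every 2 became 1
tabulate female_alt sex
```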


Your Task…

• Create a good label for this new variable

• Make a good table of the new variable

• Create a 95% exact binomial confidence interval for the proportion of females in Framingham

• Interpret what this 95% confidence interval means

• Create a do file that will do all of these steps automatically


Creating a Log File

(Screen capture: menu steps 1 to 3)
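A log can also be opened and closed from the command window. A sketch; the file name session1.log is hypothetical, and `describe` stands in for whatever commands are to be recorded.

```stata
* session1.log is a hypothetical file name
* text writes a plain-text log; replace overwrites an existing log
log using "session1.log", text replace
* Everything typed between log using and log close is recorded with its output
describe
log close
```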


Additional Tasks

• Create a log file for your calculation of the proportion of women in Framingham, and an associated 95% confidence interval.


Additional Tasks

• Calculate an estimate of the proportion of people in Framingham with greater than high school education (and 95% confidence interval) – generate and save a log file that shows this calculation.