Introduction to Statistical Computing in Clinical Research

Biostatistics 212

Lecture 1

Today...

• Course overview

– Course objectives

– Course details: grading, homework, etc.

– Schedule, lecture overview

• Where does Stata fit in?

• Basic data analysis with Stata

• Stata demos

• Lab

Course Objectives

• Introduce you to using STATA and Excel for:

– Data management

– Basic statistical and epidemiologic analysis

– Turning raw data into presentable tables, figures and other research products

• Prepare you for Fall courses

• Start analyzing your own data

Course details

• Biostats 212

– 1-unit course

– Satisfactory/Unsatisfactory vs. letter grades

– 7 sessions (lecture + lab), starting August 2

Course details

• New this year:

– In-Person + Online versions of the course

– Recorded lectures

– Forum

Course details

• Two “In-Person” Sections:

– Lectures – in person (6702), Tuesday 1:15-2:45

– Labs – in person (6702 + 6704), Tuesday 3:00-4:00

Course details

• One “Online” Section:

– Lectures – Recorded, posted late Tuesday afternoon

– Labs – Online Wed 1:30-3:00

• New this year, online students only

• Led by Jen Cocohoba – comments?

Course details

• Recorded Lectures

– Audio + video of lecturer + video of screen

– Available same day for viewing

– See http://xxxxxxxxxxxxxxx

Course details

• Forum

– Demo

– Post all questions here!

• TA turnaround time

– Before you post, see if it’s already there and answered

– Consider turning ON your alerts around lab time…

Course details

• Course Requirements

– Hand in all six Labs (even if late)

– Satisfactory Final Project

• Not required

– Reading

– Attendance

Course details

• Grading (only relevant for Master’s and ATCR Credit-Bearing students?)

– Letter grades: standard cutoffs

• 90-100% A

• 80-89% B

• 70-79% C

• 60-69% D

• <60% or Course Requirements not met: F

– Satisfactory/Unsatisfactory

• >80% Satisfactory

Course details, cont

Course Director

Mark Pletcher

TAs

Naomi Bardach

Raymond Hsu

Sharon Poisson

Monika Sarkar

Assistant Course Director

Jennifer Cocohoba

Lab Instructors

Jing Cheng

Barbara Grimes

Nancy Hills

Ann Lazar

Overview of lecture topics

• 1- Introduction to STATA

• 2- Do files, log files, and workflow in STATA

• 3- Generating variables and manipulating data with STATA

• 4- Using Excel

• 5- Basic epidemiologic analysis with STATA

• 6- Making tables and figures with STATA

• 7- Advanced Programming Topics

Overview of labs

• Lab 1 – Load a dataset and analyze it

• Lab 2 – Learn how to use do and log files

• Lab 3* – Import data from Excel, generate new variables, manipulate data, and document everything with do and log files

• Lab 4 – Using and creating Excel spreadsheets

• Lab 5* – Epidemiologic analysis using Stata

• Lab 6 – Making a figure with Stata

Last lab session will be dedicated to working on the Final Project

* - Labs 3 and 5 are significantly longer and harder than the others

Overview of labs, cont

• Official In-Person Lab time is 3:00-4:00 on Tuesday, but we will start right after lecture, and you can leave when you are done.

Overview of labs, cont

• Labs are due before lecture the following week. Labs turned in less than 1 week late receive only half credit; after that, no points are awarded. However, ALL labs must be turned in to pass the class (even if no points are awarded).

• Lab 1 is on paper

• Labs 2-6 are electronic files, and should be emailed to your section leader’s course email address: [email protected] (Elizabeth/Raman) or [email protected] (David/Yvette)

Final Project

• Create a table and a figure using your own data, and document the analysis using Stata.

• Due 1 week after the last lab session; 20 points are deducted for each day late.

Course Materials

• Online Syllabus (http://www.epibiostat.ucsf.edu/courses/schedule/biostat212.html)

– Lectures and Labs/Datasets (“just in time”)

– Miscellaneous handouts

– Final Project

Getting started with STATA

Session 1

Types of software packages used in clinical research

• Statistical analysis packages

• Spreadsheets

• Database programs

• Custom applications

– Cost-effectiveness analysis (TreeAge, etc.)

– Survey analysis (SUDAAN, etc.)

Software packages for analyzing data

• STATA

• SAS

• S-PLUS and R

• SPSS

• SUDAAN

• Epi Info

• JMP

• MATLAB

• StatXact

Why use STATA?

• Quick start, user friendly

• Immediate results, response

• You can look at the data

• Menu-driven option

• Good graphics

• Log and do files

• Good manuals, help menu

Why NOT use STATA?

• SAS is used more often?

• SAS does some things STATA does not

• Programming easier with S-PLUS and R?

• R is free

• Complicated data structures and manipulation easier with SAS?

• Epi Info is free and even easier than STATA?

STATA – Basic functionality

• Holds data for you

– Stata holds 1 “flat” file dataset only (.dta file)

• Listens to what you want

– Type a command, press Enter

• Does stuff

– Statistics, data manipulation, etc.

• Shows you the results

– Results window

Demo #1

• Open the program

• Entering vs. loading data

• Look at data

• Run a command

• Orient to windows and buttons

STATA - Windows

• Two basic windows

– Command

– Results

• Optional windows

– Variable list

– Properties

– History of commands

• Other functions

– Data browser/editor

– Variables Manager

– Do file editor

– Viewer (for log, help files, etc)

STATA - Buttons

• The usual – open, save, print

• Log-file open/suspend/close

• Do-file editor

• Browse and Edit

• Break

STATA - Menus

• Almost every command can be accessed via menu

Menu vs. Command line

• Menu advantages

– Look for commands you don’t know about

– See the options for each command

– Complex commands easier – learn syntax

• Command line advantages

– Faster (if you know the command!)

– “Closer” to the program

– Only way to write “do” files

• Document and repeat analyses

Demo #2

• Load a STATA dataset

• Explore the data

• Describe the data

• Answer some simple research questions

– Gender, BMI, blood pressure

STATA commands: Describing your data

• describe [varlist]

– Displays variable names, types, labels

• list [varlist]

– Displays the values of all observations

• codebook [varlist]

– Displays labels and codes for all variables

STATA commands: Descriptive statistics – continuous data

• summarize [varlist] [, detail]

– # obs, mean, SD, range

– “, detail” gets you more detail (median, etc.)

• ci [varlist]

– Mean, standard error of the mean, and confidence intervals

– Actually works for dichotomous variables, too.
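
To make the output of summarize and ci concrete, here is a small Python sketch (not Stata code) of the same arithmetic. It assumes the sample standard deviation with an n - 1 denominator, which is what summarize reports. For simplicity the confidence interval uses a normal approximation; Stata's ci uses the t distribution, so its interval will be somewhat wider in small samples. The function names and the BMI values are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def summarize(values):
    """Rough analogue of Stata's `summarize`: # obs, mean, SD, min, max."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD, n - 1 denominator
        "min": min(values),
        "max": max(values),
    }


def mean_ci(values, level=0.95):
    """Mean, standard error of the mean, and a normal-approximation CI.

    Stata's `ci` uses a t critical value instead of z, so its interval
    is slightly wider, especially for small samples.
    """
    n = len(values)
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    return m, se, (m - z * se, m + z * se)


bmi = [22.1, 27.4, 31.0, 24.8, 29.3, 26.5]  # made-up BMI values
print(summarize(bmi))
print(mean_ci(bmi))
```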

STATA commands: Graphical exploration – continuous data

• histogram varname

– Simple histogram of your variable

• graph box varlist

– Box plot of your variable

• qnorm varname

– Quantile plot of your variable to check normality
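
A quantile (Q-Q) plot such as qnorm pairs each sorted observation with the value a normal distribution would predict at the same rank; points close to a straight line suggest approximate normality. The Python sketch below computes those pairs. It assumes plotting positions i/(n + 1) and a reference normal with the sample's own mean and SD, which is our understanding of how qnorm works; check the Stata manual before relying on the details. The name qq_pairs is hypothetical.

```python
import statistics
from statistics import NormalDist


def qq_pairs(values):
    """Return (expected normal quantile, observed value) pairs for a Q-Q plot."""
    xs = sorted(values)
    n = len(xs)
    # Reference normal with the sample's own mean and SD (assumption, see text)
    ref = NormalDist(statistics.mean(xs), statistics.stdev(xs))
    # Plotting position i/(n + 1) keeps every probability strictly inside (0, 1)
    return [(ref.inv_cdf(i / (n + 1)), x) for i, x in enumerate(xs, start=1)]


for expected, observed in qq_pairs([5.1, 4.8, 6.3, 5.5, 4.9]):
    print(f"{expected:6.2f}  {observed:6.2f}")
```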

STATA commands: Descriptive statistics – categorical data

• tabulate [varname]

– Counts and percentages

– (see also table, which is very different!)
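
One-way tabulate output lists each value with its frequency, percent, and cumulative percent. A minimal Python sketch of the same counting follows; the function name and data are hypothetical, and missing values (which Stata leaves out by default) are not handled.

```python
from collections import Counter


def tabulate(values):
    """Map each value to (frequency, percent, cumulative percent)."""
    counts = Counter(values)
    total = len(values)
    rows, cum = {}, 0.0
    for value in sorted(counts):  # values are listed in sorted order
        pct = 100 * counts[value] / total
        cum += pct
        rows[value] = (counts[value], pct, cum)
    return rows


sex = ["F", "M", "F", "F", "M"]  # made-up data
for value, (freq, pct, cum) in tabulate(sex).items():
    print(f"{value}  {freq:3d}  {pct:6.2f}  {cum:6.2f}")
```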

STATA commands: Analytic statistics – 2 categorical variables

• tabulate [var1] [var2]

– “Cross-tab”

– Descriptive options (after the comma): row (row percentages), col (column percentages)

– Statistics options (after the comma): chi2 (chi-squared test), exact (Fisher’s exact test)
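
The chi2 option reports Pearson's chi-squared test, which compares each observed cell count with the count expected if the two variables were independent (row total × column total / N). The Python sketch below shows that arithmetic along with row percentages; the function names and the 2×2 counts are hypothetical. The p-value formula used here holds only for 1 degree of freedom (a 2×2 table), and Fisher's exact test is not shown.

```python
import math


def row_percents(table):
    """Row percentages, as shown by `tabulate var1 var2, row`."""
    return [[100 * cell / sum(row) for cell in row] for row in table]


def pearson_chi2(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


def p_value_1df(chi2):
    """Upper-tail p-value for 1 df: chi2 = Z**2, so P = erfc(sqrt(chi2 / 2))."""
    return math.erfc(math.sqrt(chi2 / 2))


counts = [[10, 20],   # made-up: exposed, outcome yes / no
          [20, 10]]   # made-up: unexposed, outcome yes / no
chi2, df = pearson_chi2(counts)
print(row_percents(counts), chi2, df, p_value_1df(chi2))
```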

Getting help

• Try to find the command on the pull-down menus

• Help menu

– If you don’t know the command - Search...

– If you know the command - Stata command...

• Try the manuals

– More detail, theoretical underpinnings, etc.

STATA commands: Analytic statistics – 1 categorical, 1 continuous

• bysort catvar: summarize [contvar]

– Mean, SD, and range of the continuous variable within each subgroup

• ttest [contvar], by(catvar)

– t-test

• oneway [contvar] [catvar]

– ANOVA

• table [catvar] [, contents(mean [contvar] …)]

– Table of statistics
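
The Python sketch below illustrates the arithmetic behind bysort, ttest, and oneway: group the continuous variable by category, compute a two-sample t statistic that assumes equal variances (the ttest default; its unequal option relaxes this), and compute a one-way ANOVA F statistic. p-values are omitted because they need the t and F distributions, which Python's standard library does not provide. The function names and data are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def by_group(values, groups):
    """Collect a continuous variable by category, like `bysort catvar:`."""
    out = defaultdict(list)
    for value, group in zip(values, groups):
        out[group].append(value)
    return dict(out)


def pooled_t(x, y):
    """Two-sample t statistic and df, assuming equal variances."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / se, nx + ny - 2


def oneway_f(groups):
    """One-way ANOVA F statistic and (between, within) df for {group: values}."""
    all_values = [v for vals in groups.values() for v in vals]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(v) * (statistics.mean(v) - grand_mean) ** 2
                     for v in groups.values())
    ss_within = sum(sum((x - statistics.mean(v)) ** 2 for x in v)
                    for v in groups.values())
    return (ss_between / (k - 1)) / (ss_within / (n - k)), (k - 1, n - k)


sbp = [118, 131, 125, 140, 122, 135]  # made-up systolic BP values
sex = ["F", "M", "F", "M", "F", "M"]
groups = by_group(sbp, sex)
print({g: statistics.mean(v) for g, v in groups.items()})
print(pooled_t(groups["F"], groups["M"]))
print(oneway_f(groups))
```

For two groups, the ANOVA F equals the square of the equal-variance t statistic, which gives a quick way to check the two functions against each other.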

STATA commands: Analytic statistics – 2 continuous variables

• scatter [var1] [var2]

– Scatterplot of the two variables

• pwcorr [varlist] [, sig]

– Pairwise correlations between variables

– “sig” option gives p-values

• spearman [varlist] [, stats(rho p)]

– Spearman rank correlations; stats(rho p) adds p-values
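
pwcorr reports Pearson correlations, and spearman reports Spearman's rank correlation, which is Pearson's r applied to the ranks of the data (tied values share their average rank). A Python sketch of both follows, with hypothetical function names and made-up data; p-values are not computed.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient r."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


bmi = [22.1, 27.4, 31.0, 24.8, 29.3]  # made-up data
sbp = [118, 131, 140, 122, 129]
print(pearson(bmi, sbp), spearman(bmi, sbp))
```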

In Lab Today…

• Expect some chaos!

– IT will be here to help with wireless, logins, etc.

• Familiarize yourself with Stata

• Load a dataset

• Use Stata commands to analyze data and fill in the blanks

Next week

• Do files, log files, and workflow in Stata

• Find a dataset!

Website addresses

• Course website

– http://www.epibiostat.ucsf.edu/courses/schedule/biostat212.html

• Computing information

– http://www.epibiostat.ucsf.edu/courses/ChinaBasinLocation.html#computing

• Download RDP for Macs (for Stata Server)

– http://www.microsoft.com/mac/remote-desktop-client

• Citrix Web Server

– http://apps.epi-ucsf.org/

• Stata 12 Server

– 65.175.48.75