1 Introduction to Stata SHRS, UQ 7 September 2010 Asad Khan

Introduction to Stata - School of Health and ... · PDF fileStep 1: Open a log file to store results Step 2: Load/enter your data into Stata Step 3: Explore, manipulate and analyse


Introduction to Stata

SHRS, UQ, 7 September 2010

Asad Khan


Overview

• Sample Data

• Stata Interface

• Steps in a Stata Session

• Working Examples

• Dealing with Do-files

• Data Transformation

• Getting Help


Sample Data

Let's start with a dataset that contains responses to a self-administered survey. The survey was conducted among postgraduate students enrolled in a Biostatistics course at the University of Sydney in the late 1990s.

– The questionnaire includes 15 questions with a variety of response options (continuous, categorical)

– A total of 80 students completed the questionnaire

Suppose the data have been recorded in a Stata data file (bioqdata.dta) that we want to use in this workshop.


Stata Interface

• Open the Stata program by selecting Stata from the Program menu

• In a Stata session, a number of windows are available to Stata users:

– Results window – shows the numeric output from the Stata commands entered

– Command window – this is where you enter Stata commands to perform analyses

– Variables window – lists the variables that are available in the current dataset

– Review window – lists the commands that have already been run

• Optional windows: Graph, Data Editor, Do-file Editor


[Figure: the Stata interface, with the Review, Variables, Command, Results and Graph windows labelled; commands are entered in the Command window]


Steps in a Stata Session

Step 1: Open a log file to store results

Step 2: Load/enter your data into Stata

Step 3: Explore, manipulate and analyse

Step 4: Close the log file to save results


Step 1: Open a log file to store results


At the beginning of a Stata session, it is recommended that you open a log file to save your commands and results, by selecting the Log icon on the toolbar.

You can also open a log file by typing (in the Command window):

Stata: log using h:\myfile.log

Now you have a log file (myfile.log) that will record all your commands and the (numeric) output that you see in the Results window.
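A minimal sketch of opening and closing a log, assuming you run it from a do-file and that the current working directory is writable (myfile.log is an illustrative name; the slide uses h:\myfile.log). capture log close avoids an error if no log is open, and the replace option lets you rerun the same commands without Stata refusing to overwrite an existing log.

```stata
* Close any log that is still open; capture ignores the error if none is
capture log close
* Open a plain-text log; replace overwrites a log left from an earlier run
log using myfile.log, replace text
display "This line is recorded in myfile.log"
* Close the log so the file is complete on disk
log close
```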


Step 2: Load/enter your data into Stata


• To open/load your Stata data file (with .dta extension), click the Open icon on the toolbar and browse for "bioqdata.dta"

Alternatively,

• Go to File → Open and browse for the Stata data file "bioqdata.dta"

• You can also load your data file into Stata by typing the command (the clear option replaces any data already in memory, even if unsaved):

Stata: use h:\bioqdata.dta, clear


You can enter, edit or view your data in Stata using the Data Editor (an optional window).

• To open the Data Editor window, click the Data Editor icon on the toolbar

• The Data Editor can also be opened by typing:

Stata: edit

• Either type in your data or copy and paste data from a spreadsheet

• Changes affect only the copy of the data in memory; they are not permanent until you save the dataset (File → Save, or the save command)
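Small datasets can also be typed at the command line with input instead of the Data Editor. A minimal sketch, assuming you are happy to discard the data in memory: the five rows reuse sid, age and sex for the first five students listed in this workshop, and entered_demo.dta is a hypothetical file name.

```stata
* Start from an empty dataset (this discards data and labels in memory)
clear
input byte sid byte age byte sex
1 25 2
2 29 1
3 40 1
4 46 2
5 43 2
end
* Attach value labels matching the coding of sex in bioqdata (1 = Male, 2 = Female)
label define sexlabel 1 "Male" 2 "Female"
label values sex sexlabel
* The data exist only in memory until saved (file name is illustrative)
save entered_demo.dta, replace
```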


[Figure: bioqdata in the Data Editor]


Step 3: Exploration, manipulation and analyses


• To describe the loaded data file (variable names, storage types, display formats and labels):

Stata: describe

• To view a detailed summary of the values of each variable:

Stata: codebook

Output of describe:

Contains data from F:\Workshops\bioqdata.dta
  obs:            80
 vars:            15                          3 Sep 2010 11:48
 size:         2,160 (99.8% of memory free)

              storage  display     value
variable name   type   format      label      variable label

sid             byte   %8.0g
age             byte   %8.0g
sex             byte   %8.0g       sexlabel
height          float  %9.0g
weight          int    %8.0g
pulse1          byte   %8.0g
bthord          byte   %8.0g
marital         byte   %8.0g
children        byte   %8.0g
sleep           float  %9.0g
course          byte   %8.0g
pt_ft           byte   %8.0g
smoke           byte   %8.0g
chol            byte   %8.0g
pulse2          int    %8.0g


• To view a particular variable, e.g. sex:

Stata: codebook sex

sex                                                          (unlabeled)

                  type:  numeric (byte)
                 label:  sexlabel

                 range:  [1,2]                        units:  1
         unique values:  2                        missing .:  0/80

            tabulation:  Freq.   Numeric  Label
                            33         1  Male
                            47         2  Female

• To view observations from the loaded data file:

Stata: list sid age sex in 1/5

     sid   age      sex
  1.   1    25   Female
  2.   2    29     Male
  3.   3    40     Male
  4.   4    46   Female
  5.   5    43   Female


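• To draw a histogram of a variable, e.g. sleep:

Stata: histogram sleep

[Figure: histogram of sleep (hours, 0 to 25) with Density on the y-axis]

• To identify cases with sleep > 15 hrs:

Stata: list sid sleep if sleep>15

      sid   sleep
 46.   46       .
 62.   62      28

Observation 46 is listed because Stata treats a missing value (.) as larger than any number; add & !missing(sleep) to the if condition to exclude missing values.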


• To draw histograms by group using Menu bar, go to

Graph → Histogram → (Main → By → Density plots)


[Figure: histograms of height in cm by sex (panels: Male, Female), y-axis Density, with kernel density and normal density curves overlaid; "Graphs by Sex"]

These graphs can be obtained by typing the command:

Stata: hist height, by(sex) normal kdensity


To create a new variable using the menu bar, go to

Data → Create or change variables → Create new variable

$\text{bmi} = \text{weight (kg)} / [\text{height (m)}]^2$


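• To generate BMI (height is recorded in cm, so divide by 100 to get metres), type the command:

Stata: gen bmi=weight/(height/100)^2

• To recode into the same variable, type the command:

Stata: recode bmi (30/max=4)(25/30=3)(18.5/25=2)(min/18.5=1)

• To recode into a different variable, type the command:

(a) Stata: recode bmi (30/max=4)(25/30=3)(18.5/25=2)(min/18.5=1), gen(bmi_r)

(b) Stata: recode bmi (30/max=4 "Obese")(25/30=3 "Overweight")(18.5/25=2 "Normal weight")(min/18.5=1 "Underweight"), gen(bmi_n)

recode applies the first rule that matches a value, so listing the highest category first puts boundary values such as 25 into the upper category and leaves no gaps for continuous BMI values such as 18.45.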
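An alternative to recode is to build the categories with explicit inequalities, which makes every boundary visible. A sketch under stated assumptions: the heights and weights are made-up test values (not bioqdata), the cut-points follow the four BMI categories above (below 18.5, 18.5 to below 25, 25 to below 30, 30 and over), and bmicatlabel is a hypothetical label name.

```stata
* Made-up test values: height in cm, weight in kg (last row has missing weight)
clear
input float height float weight
170 50
170 60
170 75
170 90
170 .
end
* height is in cm, so divide by 100 to get metres
gen bmi = weight/(height/100)^2
* One category per half-open interval; missing bmi fails every condition
gen byte bmi_cat = 1 if bmi < 18.5
replace bmi_cat = 2 if bmi >= 18.5 & bmi < 25
replace bmi_cat = 3 if bmi >= 25 & bmi < 30
* Missing (.) counts as larger than any number, so exclude it explicitly
replace bmi_cat = 4 if bmi >= 30 & !missing(bmi)
label define bmicatlabel 1 "Underweight" 2 "Normal weight" 3 "Overweight" 4 "Obese"
label values bmi_cat bmicatlabel
tab bmi_cat, missing
```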


• To assign a variable label to pt_ft, type the commands:

Stata: tab pt_ft

label var pt_ft "Type of candidature"

tab pt_ft

• To assign value labels to pt_ft, type the commands:

Stata: tab pt_ft

label define pt_ftlabel 1 "Part-time"

label define pt_ftlabel 2 "Full-time", add

label values pt_ft pt_ftlabel

tab pt_ft
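A self-contained sketch of the same labelling steps, assuming a made-up three-observation pt_ft variable coded 1 and 2. It defines both value labels in a single label define command, which is equivalent to defining the first label and then adding the second.

```stata
* Made-up data: pt_ft coded 1 = part-time, 2 = full-time
clear
input byte pt_ft
1
2
2
end
label variable pt_ft "Type of candidature"
* One label define can hold several value/label pairs
label define pt_ftlabel 1 "Part-time" 2 "Full-time"
* Attach the label set to pt_ft itself
label values pt_ft pt_ftlabel
tab pt_ft
```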


• To examine whether the average height differed between the male and female student populations, go to: Statistics → Summaries, tables and tests → Classical tests of hypotheses → Two-group mean-comparison test


This output can also be obtained by typing the command:

Stata: ttest height, by(sex)

Two-sample t test with equal variances

   Group |   Obs       Mean   Std. Err.   Std. Dev.   [95% Conf. Interval]
    Male |    33   176.3333    1.548867    8.897565    173.1784    179.4883
  Female |    47   163.1383    1.020681    6.997439    161.0838    165.1928
combined |    80   168.5812    1.136368    10.16399    166.3194    170.8431
    diff |         13.19504     1.77895                9.653417    16.73665

    diff = mean(Male) - mean(Female)                     t =   7.4173
Ho: diff = 0                            degrees of freedom =       78

    Ha: diff < 0              Ha: diff != 0              Ha: diff > 0
 Pr(T < t) = 1.0000     Pr(|T| > |t|) = 0.0000      Pr(T > t) = 0.0000
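The standard error and t statistic in this output can be reproduced from the group sizes, means and standard deviations alone with the immediate command ttesti. A sketch, with the summary values copied from the output above (results may differ in the last decimal place because of rounding). The argument order is n1 mean1 sd1 n2 mean2 sd2, and, like ttest, it assumes equal variances unless you add the unequal option.

```stata
* Male group: 33 obs, mean 176.3333, SD 8.897565
* Female group: 47 obs, mean 163.1383, SD 6.997439
ttesti 33 176.3333 8.897565 47 163.1383 6.997439
* ttesti leaves the test statistic and degrees of freedom in r()
display "t = " r(t) ", df = " r(df_t)
```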


• To examine the relationship between weight and height through a scatter plot, type the command:

Stata: scatter weight height || lfit weight height || lowess weight height

[Figure: scatter plot of weight against height (cm), with the fitted regression line (Fitted values) and a lowess smoother (lowess weight height)]


• To obtain correlation between weight and height, type

Stata: corr weight height

• To fit a simple linear regression model with weight as the dependent variable (DV) and height as the independent variable (IV):

Stata: regress weight height

Output of corr:

             |   weight   height
      weight |   1.0000
      height |   0.5264   1.0000

Output of regress: analysis of variance (left) and fit statistics (right)

      Source |         SS   df          MS     Number of obs =      80
             |                                 F(  1,    78) =   29.90
       Model | 7033.36961    1  7033.36961     Prob > F      =  0.0000
    Residual | 18346.5804   78  235.212569     R-squared     =  0.2771
             |                                 Adj R-squared =  0.2679
       Total |   25379.95   79   321.26519     Root MSE      =  15.337

Coefficient estimates

      weight |      Coef.   Std. Err.      t   P>|t|   [95% Conf. Interval]
      height |   .9283337    .1697668   5.47   0.000    .5903541   1.266313
       _cons |  -88.47466    28.67081  -3.09   0.003   -145.5539  -31.39545
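With a single predictor, the regression R-squared equals the square of the Pearson correlation: here 0.5264² ≈ 0.2771. A sketch that checks this, using Stata's bundled auto dataset (weight and length) as a stand-in because bioqdata.dta does not ship with Stata; r_wl is a hypothetical scalar name.

```stata
* Stand-in data shipped with Stata
sysuse auto, clear
* Pearson correlation; r(rho) holds the coefficient
correlate weight length
scalar r_wl = r(rho)
* Simple linear regression of weight on length
regress weight length
* With one predictor, R-squared equals the squared correlation
display "r^2 = " scalar(r_wl)^2 "   R-squared = " e(r2)
```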


• If you prefer to use a dialog box to run your analysis, you can open one with the Stata command db

• To obtain the dialog box for regression, type:

Stata: db regress

• To obtain the dialog box for drawing a histogram, type:

Stata: db histogram


Working Examples

• Frequency tables

Stata: tab1 sex chol pt_ft

• Graphs

– Stem-and-leaf plots

Stata: stem sleep

– Box plots by a variable

Stata: graph box pulse1, by(sex)

• Proportion (& CI)

Stata: proportion pt_ft

• Summary (descriptive) statistics

Stata: sum pulse2, detail


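• To generate a variable

Stata: gen sleep1=sleep if sleep<=15

• To change the value of an existing variable (e.g. correct an implausible entry)

Stata: replace sleep=7 if sleep==28

• To generate a dichotomous (1, 0) variable that stays missing when bmi is missing

Stata: gen obese=(bmi>=30) if !missing(bmi)

• To generate dummy variables for each value of a variable

Stata: tab marital, gen(marital_dmy)

• To keep only some variables in the data file

Stata: keep age sex bmi

• To keep only some observations (e.g. females) as well, run keep if and keep varlist as separate commands

Stata: keep if sex==2

keep age sex bmi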
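Missing values need care when counting and generating indicators: Stata treats . as larger than any number, so a condition such as sleep > 15 is true for a missing value. A minimal sketch on a made-up three-row dataset (n_naive, n_safe and long_sleep are hypothetical names).

```stata
* Made-up data: one ordinary value, one implausible value, one missing
clear
input byte sid float sleep
1 7
2 28
3 .
end
* Missing (.) sorts above every number, so it passes sleep > 15
count if sleep > 15
scalar n_naive = r(N)
* Excluding missing values explicitly gives the intended count
count if sleep > 15 & !missing(sleep)
scalar n_safe = r(N)
* A 0/1 indicator that stays missing when sleep is missing
gen byte long_sleep = (sleep > 15) if !missing(sleep)
list
```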


• To test whether the average sleep differed between the male and female student populations:

Stata: tab sex, sum(sleep)

ttest sleep, by(sex)

• To assess equality of two independent group means using summary data (arguments: n1 mean1 sd1 n2 mean2 sd2):

Stata: ttesti 12 25.34 2.05 15 24.94 2.44

• To investigate whether pulse rate increased after one minute of exercise, using a paired t-test:

Stata: ttest pulse1=pulse2

• To examine whether average age differs between students in different courses, using one-way ANOVA:

Stata: oneway age course

Also try this with a dialog box using the db command:

Stata: db oneway
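A paired t-test is a one-sample t-test on the within-person differences. A sketch with four made-up pairs of pulse rates (not bioqdata; t_paired and d are hypothetical names): the differences pulse1 - pulse2 are -10, -13, -6 and -15, so the mean difference is -11 and both tests should give the same t statistic on 3 degrees of freedom.

```stata
* Made-up pulse rates (beats/min) before and after exercise
clear
input int pulse1 int pulse2
70 80
72 85
68 74
75 90
end
* Paired t-test: Stata reports diff = mean(pulse1 - pulse2)
ttest pulse1 == pulse2
scalar t_paired = r(t)
* Equivalent one-sample t-test on the differences
gen d = pulse1 - pulse2
ttest d == 0
```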


… keep working on your analyses


In a Stata session, you can ask Stata:

• Not to pause or display the -more- message, by typing:

Stata: set more off

• To temporarily stop logging, by typing the command:

Stata: log off

• To resume logging, by typing the command:

Stata: log on

To save time editing the log afterwards, consider suspending it (log off) when the output is not needed, and use log on to resume.


Step 4: Close your log file to save results


• After you are done with your analyses, close the log file by clicking the Log icon on the toolbar and then selecting 'Close log file'

• You can also close your log file by typing the command:

Stata: log close

• To view your log file, click the Log icon and select 'View snapshot of log'

• To exit from Stata, type the command (use exit, clear if the data in memory have unsaved changes):

Stata: exit


• Once you close your log file, you can view it in any text editor or word processor and print it as you would any text file.

• If your log file has a .smcl extension, you can translate it into a text file using the command:

Stata: translate h:\mylog.smcl h:\mylog.txt

• To print a graph, select Print from the File menu in a Graph window

• You can place your graph into a Word document by selecting Copy from the Edit menu in the Graph window and then pasting it into your document.
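A minimal sketch of the SMCL-then-translate workflow, assuming a writable working directory; mylog.smcl and mylog.txt mirror the file names in the translate command above but without the h:\ drive.

```stata
* Close any open log, then start a SMCL-format log
capture log close
log using mylog.smcl, replace smcl
display "Results recorded in SMCL format"
log close
* translate infers SMCL-to-text from the extensions; replace overwrites mylog.txt
translate mylog.smcl mylog.txt, replace
```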


Dealing with Do-files

In a Stata session, you can write and save your Stata commands in a file called a do-file (like a syntax file in SPSS).

A do-file is simply a text file containing a list of Stata commands, which you can run during a Stata session or save for later use.

Any command that can be executed from the command line can be placed (copied and pasted) in a do-file.

A do-file allows you to execute several commands at once, and also makes it easier to identify and correct mistakes.


• To open the Do-file Editor, click the Do-file Editor icon on the toolbar

• A new do-file can be created by typing the command:

Stata: doedit

• To reopen a do-file (say, ana_1.do), type the command:

Stata: doedit h:\ana_1.do

• To run the do-file (ana_1.do), type the command:

Stata: do h:\ana_1.do

• A do-file is typically saved with a .do extension.


[Figure: an example of a do-file in the Do-file Editor]

To run the commands in a do-file, click the Do icon on the Do-file Editor toolbar
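Since the example do-file itself is not reproduced here, the following is a hypothetical stand-in (not the workshop's ana_1.do). So that it runs anywhere, it types in four made-up rows instead of loading h:\bioqdata.dta, and it illustrates two do-file conventions: comments (* at the start of a line, // after a command) and /// to continue a command onto the next line, which works in do-files but not in the Command window.

```stata
* ana_1.do : a hypothetical example do-file; the data are made up
clear
set more off                      // do not pause at -more-

* Small stand-in dataset typed in place of: use h:\bioqdata.dta, clear
input byte sid byte sex float height int weight
1 2 160 55
2 1 178 80
3 1 182 77
4 2 165 60
end
label define sexlabel 1 "Male" 2 "Female"
label values sex sexlabel

gen bmi = weight/(height/100)^2   // BMI in kg/m^2
summarize bmi

* /// continues a long command on the next line (do-files only)
tabulate sex, ///
    summarize(height)
```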


Data Transformation

Stat/Transfer, a file transfer utility software, makes it easy

to move data among the different spreadsheets and

statistical programs by providing a fast and reliable way

to convert data files from one format to another.

We can use Stat/Transfer to convert SPSS or Excel data files (e.g. bioqdata.sav or bioqdata.xls) into a Stata data file with the .dta extension (e.g. bioqdata.dta).

NB: Stat/Transfer is available for Windows, Mac OS X, and Unix.


Open the software by choosing Stat/Transfer from the

Program list


You will need to select the input file type, the file itself,

the output file type and the output file.

Finally, click the "Transfer" button to complete the conversion


Stat/Transfer will show you the progress it's making as it converts the file

Ref: Keown L (2004) Producing efficient data files using Stat/Transfer,

Information and Technical Bulletin 1(1):13-19.


Import Excel Data into Stata

• Launch Excel and open your Excel file (mydata.xls)

• Save your Excel file as an XML Spreadsheet 2003 (mydata.xml)

• Launch Stata and go to:

File → Import → XML Data → Browse for "mydata.xml"

– Tick the "Excel Spreadsheet" box

– Tick the "First row in variable names" box

– Click 'OK'

You can also use a Stata command to import the XML spreadsheet:

Stata: xmluse "H:\mydata.xml", doctype(excel) firstrow

• Save the data as a Stata dataset using the save command (or File → Save As)
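Newer Stata releases can read Excel workbooks directly, without the XML step. A sketch assuming Stata 12 or later (import excel and export excel are not available in Stata 10, the release this workshop is based on); to stay self-contained it first writes a small hypothetical workbook, excel_demo.xlsx, and then reads it back.

```stata
* Made-up data to write out
clear
input byte sid byte age
1 25
2 29
end
* Write a .xlsx workbook with variable names in the first row
export excel using excel_demo.xlsx, firstrow(variables) replace
* Read it back, treating the first row as variable names
import excel using excel_demo.xlsx, firstrow clear
* Save as a Stata dataset (file name is illustrative)
save excel_demo.dta, replace
```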


Getting Help

Within Stata:

• To get help on a particular command (e.g. regress)

help regress

• To obtain all references to a topic (e.g. logistic)

search logistic

• To find relevant commands on a topic, including user-written ones (e.g. anova)

findit anova

Online Stata support @ www.stata.com/support

AU/NZ distributor for Stata & Stat/Transfer

www.survey-design.com.au

– Stata GradPlan arrangements for students

http://www.survey-design.com.au/gradplan.html




Thank you

Comments

Questions