STATA Tips Brinda

Using STATA: Tips for Beginners

Opening STATA
Once the w-stata icon is clicked, four windows appear. These four windows constitute the workspace.
• Command (right bottom): The space where commands are typed; it is usually empty after an operation is performed.
• Results (right top, black screen): The results of any statistical or econometric operation are displayed here.
• Review (left top): All the commands that have been typed appear here and can be clicked to run them again rather than retyping them.
• Variables (left bottom): Once a data file is opened, all the variable names are displayed here.

Increase Memory
By default STATA allots a limited amount of memory, and to work on larger problems the memory size needs to be increased. This is done by typing the following command in the “Command” window:

• set mem 30m

The command is echoed immediately in the “Results” window. STATA commands are case-sensitive, so set must be typed in lower case. The memory can be allotted according to the size of the data and the calculations. If not enough memory has been allocated, STATA will prompt you to increase memory when the estimations are carried out. The set mem command can then be retyped with a larger value. (In Stata 12 and later, memory is managed automatically and this step is not needed.)

Importing Data into STATA
Every statistical package has its own data storage format. Quite often the data are stored in spreadsheets such as Excel, in the format of some other software, or in ASCII (plain text) format. Option (a) or (b) given below can be used to transfer the data. For very large data sets (a) is the only option; (b) is preferred when the data set is small. Option (a) also gives the flexibility to convert data stored in any format to STATA format and vice versa.

(a) Using Stat-transfer

• Click on the Stat-transfer icon and choose the “Input File Type” as EXCEL in the top window.
• Browse for the file name; say it is stored in d:\data\myname\asgn2.xls
• Choose “STATA” as the “Output File Type”.
• Give a new name, or else the same file name will be retained with the extension “dta”; that is, the new file will be stored in the same directory with the name asgn2.dta.
• This new data set is ready for work using STATA.

(b) Using STATA Browser

• Open STATA and click on “data editor” on top.
• Copy the data from the Excel worksheet and paste them into this area.
• Then close this window, click on “save as” and give a file name at the appropriate location.
• The data set will be displayed on the screen and is ready for immediate use.
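
The transfer from Excel can also be sketched with STATA's own import excel command, available in Stata 12 and later. This is an alternative to options (a) and (b), not the method described in them. To keep the sketch self-contained it first writes a small Excel file from the built-in auto data set; the file name asgn2_demo is hypothetical, and the firstrow option assumes that the first spreadsheet row holds the variable names.

```stata
* Create a small Excel file to import (a stand-in for your own asgn2.xls)
sysuse auto, clear
keep make price mpg
export excel using "asgn2_demo.xls", firstrow(variables) replace

* Import the spreadsheet; firstrow reads the variable names from row 1
import excel using "asgn2_demo.xls", firstrow clear

* Save in STATA format so that it can be opened later with the use command
save "asgn2_demo.dta", replace
describe
```

With your own file, only the import excel and save lines are needed, with the path replaced by the location of your spreadsheet.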

Data Browser
After “importing”, the data can be seen by
• using the data browser menu on top
OR
• typing edit in the command window (or browse, which shows the data without allowing changes).

To open an already existing data set:
• Click on file->open-> file name (with the extension .dta)
OR
• Type in the “command” window:
• use "D:\data\myname\------.dta"

Storing results
Before performing any regression or calculation, the output should be directed to a file so that it can be opened as and when necessary. The output file is stored with the extension smcl. Either
• type the command: log using d:\data\myname\res1.smcl
OR
• use File-> log-> begin on the menu and, when prompted, specify the file name “res1”.

After completing the analysis, type
• log close

This stops any more results from being stored in the file res1.smcl.

If the analysis is to be continued and the newer results are to
(a) replace the older results in the same file res1.smcl, then type
• log using d:\data\myname\res1.smcl, replace
(b) be added (appended) to the older set of results, then type
• log using d:\data\myname\res1.smcl, append

To view the results, click on the menu bar
• File->log->view->file name. The output file will be displayed.

The results in this file can be selected, copied and pasted into Excel or Word for further use
OR
the log can be converted into a text file that can be opened directly in any software:
• On the menu click on File->log->translate.
• In the top window, indicate the smcl file that needs to be translated and, below it, the name under which the translated file is to be stored. This new file needs to be stored with a .log extension.
• So one can use the same file name as the original output file and simply change the extension from smcl to log.
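
The logging steps in this section can be sketched as commands. This is a minimal illustration, not a prescribed workflow: the file name res1_demo is hypothetical and is written to the current working directory, and the built-in auto data set stands in for your own data. The translate command is the typed counterpart of File->log->translate.

```stata
* Close any log that is already open, then start a new one
capture log close
log using "res1_demo.smcl", replace

* Anything run from here on is recorded in the log
sysuse auto, clear
summarize price mpg

* Stop recording
log close

* Convert the smcl log to plain text (same name, .log extension)
translate "res1_demo.smcl" "res1_demo.log", replace
```

Using append instead of replace on the log using line would add the new output to the end of an existing log.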

Program file

(a) It is preferable to use a ‘do’ file to carry out the regressions. A do file is a program file in which all the commands are written one after another, and the entire set of operations is performed together by running the program. For this, a separate do file, say try.do, has to be created in the directory where you are working. To do this, click on the DO-file editor on the top menu bar and write down all the commands. Then save the file, giving it a name (say) try.do. Then click on File->Do followed by the required file name (try.do here) and the program starts running. The results of this program will be stored in an output file with the extension smcl, as specified in the do file.

(b) An easier option is to type the required command in the command window. After one command is executed, the next command is typed.
• The commands that have been typed in the command window appear in the Review window. These can be clicked again to repeat a command.
• All the commands that have been typed can be saved by clicking on the review icon and “Save Review contents”. Once the file name is specified at the prompt, the commands are saved as a do file. This do file can be reopened for working again.
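
As an illustration of option (a), the contents of a do file such as try.do might look like the sketch below. The log name try_demo and the use of the built-in auto data set are assumptions made for the example; in practice the use command would point to your own .dta file.

```stata
* try.do : an illustrative do file
* Close any open log and start a new one for this run
capture log close
log using "try_demo.smcl", replace

* Load the data and run the analysis
sysuse auto, clear
describe
regress price mpg weight

* Stop logging; the output is now stored in try_demo.smcl
log close
```

Once saved, the file can be run from the menu as described above or by typing do try.do in the command window.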

Creating new variables and transforming variables

egen creates a constant (for example, the mean of a variable)

The mean income is generated and stored in the ‘variable’ minc
• egen minc= mean(income)

The ‘variable’ interc is a constant with value 1 across all observations. Because egen requires a function on the right-hand side, gen is used here
• gen interc= 1

gen creates new variables that are usually transformations or dummy variables

A new variable dev is generated that is the deviation of an observation’s income from the mean income generated above
• gen dev= income-minc

Similarly for logarithms and any other mathematical function
• gen lnw= log(w)
• gen agesq= age*age
• gen incsq= income^2
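
A small worked example may make the difference between egen and gen concrete. The three observations below are made up for illustration, and lninc is a hypothetical variable name.

```stata
* Three made-up observations
clear
input income age
1000 25
2000 35
3000 45
end

* egen applies a function over all observations: minc is 2000 in every row
egen minc = mean(income)

* gen works observation by observation
gen interc = 1
gen dev    = income - minc
gen lninc  = log(income)
gen agesq  = age^2

list
```

The listing shows minc and interc constant across rows, while dev, lninc and agesq vary with each observation.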

To create a dummy variable there are two options
(a) More useful for just two categories, as in gender (here gen is the name of the gender variable)
• gen dgen= (gen==2)
• gen drur= (urb_rur==2)
(b) For multiple categories
gen dpoorst=1 if wlthindx==1
mvencode dpoorst, mv(0)
gen dpoor=1 if wlthindx==2
mvencode dpoor, mv(0)
gen dmid=1 if wlthindx==3
mvencode dmid, mv(0)
gen drich=1 if wlthindx==4
mvencode drich, mv(0)
gen drichst=1 if wlthindx==5
mvencode drichst, mv(0)
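
For multiple categories, a shorter route is sketched below as an alternative to the gen and mvencode pairs. The data are made up and the stub name dw is hypothetical. Note one difference: tabulate with generate() leaves the dummies missing where wlthindx is missing, whereas the logical expression and mvencode both code such observations as 0.

```stata
* Made-up wealth index with five categories
clear
input wlthindx
1
2
3
4
5
3
end

* Logical expression: 1 where the condition holds, 0 otherwise
gen dpoorst = (wlthindx == 1)

* One dummy per category in a single command: dw1, dw2, ..., dw5
tabulate wlthindx, generate(dw)

list
```

Here dw1 is identical to dpoorst, dw2 corresponds to dpoor, and so on up to dw5.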

Basic Regression commands
Click on help and search for “regress”. Look at the example given at the bottom to see how the variables need to be specified. Then type the command and look at the output. If you have already opened a log file, the results will automatically be stored there; otherwise they will be lost once STATA is closed. The regression command is as given below

• reg hgt age

Here hgt is the dependent variable representing the height of individuals and age is the independent variable representing the age of the individuals under study.

• reg hgt age drur

Here hgt is again the dependent variable and age an independent variable. There is one more independent variable, drur, the dummy variable created earlier that takes the value 1 when urb_rur equals 2 and 0 otherwise.

Use the predict command to store the predicted values

• predict yhat

Obtain the residuals by typing
• predict uhat, resid

Save the data file now so that the variables yhat and uhat are stored in your data set. These new variables can now be used to perform several operations further.
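
The regression and predict steps can be tried on simulated data. In the sketch below the data-generating values (the coefficients, the seed and the sample size) are arbitrary assumptions chosen only so that the example runs on its own; check is a hypothetical variable name.

```stata
* Simulated data (illustration only)
clear
set seed 12345
set obs 100
gen age  = 5 + floor(10*runiform())
gen drur = runiform() < 0.5
gen hgt  = 80 + 6*age - 3*drur + rnormal(0, 5)

* Regression of height on age and the dummy
reg hgt age drur

* Predicted values and residuals
predict yhat
predict uhat, resid

* Fitted values plus residuals reproduce the dependent variable,
* so check should be zero up to rounding
gen check = hgt - yhat - uhat
summarize check
```

Both predict commands refer to the most recent regression, so they should be typed before another model is estimated.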

For instance, if one has to plot yhat against w, then use

• plot yhat w

(In Stata 8 and later, scatter yhat w draws the same plot in a graph window.) Save the graph as it appears on the screen, giving it a file name so that it can be retrieved later.

Regressions with dummy independent variables

xi command
• xi: reg hgt age i.urb_rur

Note that the dummy variable is specified in a different format here: i.urb_rur tells STATA to create the dummies that capture the rural-urban contrast. By default the first category (the smallest value) is taken as the base group for comparison. You can instead choose a base group of your choice using the command:

• char urb_rur[omit] 1
This takes as the base group the one with urb_rur equal to 1.

or the most prevalent group using the command
• char _dta[omit] prevalent
This takes the most prevalent group as the base group; the setting applies to every variable used with xi.
• char urb_rur[omit]
• char _dta[omit]
These restore the default base group for the variable and for the data set respectively.
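
The effect of changing the base group can be seen on simulated data. As before, the data-generating values are arbitrary assumptions made for illustration. xi names the dummies it creates with the prefix _I, so the name of the created variable shows which category was kept and which was omitted.

```stata
* Simulated data (illustration only)
clear
set seed 2024
set obs 200
gen age     = 5 + floor(10*runiform())
gen urb_rur = 1 + (runiform() < 0.7)
gen hgt     = 80 + 6*age - 3*(urb_rur == 2) + rnormal(0, 5)

* Default: category 1 is the base, so xi creates _Iurb_rur_2
xi: reg hgt age i.urb_rur

* Make category 2 the base instead; xi now creates _Iurb_rur_1
char urb_rur[omit] 2
xi: reg hgt age i.urb_rur

* Restore the default base group
char urb_rur[omit]
```

In Stata 11 and later the same regressions can be run without xi by using factor-variable notation: reg hgt age i.urb_rur for the default base, or reg hgt age ib2.urb_rur to make category 2 the base.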
