Stata Learning From Treiman

Doing Analysis Using Stata 10.0
Stata Graphics (continued)
Treiman, UCLA
易黠于善国家中
2009.2.18

Contents

1 • Basic preparation

2 • Graph programming

3 •Lowess

4 •NL

Why do I need it?

Stata, created by StataCorp, is a statistical program used by many businesses and academic institutions around the world. Most of its users work in research, especially in the fields of economics and epidemiology.

Stata's full range of capabilities includes:

• Data management
• Statistical analysis
• Graphics
• Simulations
• Custom programming


Do Everything with -do- Files

capture log close
log using class.log,replace
#delimit;
version 10.0;
set more 1;
clear;
program drop _all;
set mem 100m;
*CLASS.DO (DJT initiated 5/19/99, last revised 2/4/08);
*This do-file creates computations for a paper on literacy in China.;
use d:\china\survey\data\china07.dta;
log close;

Document Your -do- File Exhaustively

• The editorial review process often takes a very long time. If you have not documented your work, you may have a great deal of trouble remembering why you have done what you have done.
• Include comments summarizing the outcome of each set of commands.


2 • Graph programming

Chapter 7, gssy2004
Case 1: curvilinear relationship
reg inc age agesq if good==1

My method:
gen inc1=3088.804*age-27.77605*agesq-15554.28

[Figure: inc1 against age of respondent; x-axis 20 to 100, y-axis 30000 to 70000]

My graph clearly has problems; look at Treiman's graph.

[Figure: Expected Income in 2003 against Age in 2004; x-axis 20 to 64 in steps of 4, y-axis 0 to 50000 in steps of 5000]

*Mark the good data.;
mark good if inc~=. & age>19 & age<65;
*Do the regression and make a predicted value.;
reg inc age agesq if good==1;
*Get the transformed coefficients.;
gen m=_b[_cons]-(_b[age]^2)/(4*_b[agesq]);
gen F=(-_b[age])/(2*_b[agesq]);
l m F in 1;
• m and F are constants, so listing them for the first observation alone is enough.
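The two gen lines for m and F rewrite the fitted quadratic in vertex form. A short derivation, where a, b, and c are shorthand introduced here for _b[_cons], _b[age], and _b[agesq]:

```latex
\begin{aligned}
\widehat{\mathit{inc}} &= a + b\,\mathit{age} + c\,\mathit{age}^{2}
                        = m + c\,(\mathit{age} - F)^{2},\\[4pt]
F &= -\frac{b}{2c}, \qquad m = a - \frac{b^{2}}{4c}.
\end{aligned}
```

Expanding the right-hand side gives c·age² − 2cF·age + (m + cF²); matching terms with the left-hand side yields the two formulas. When c < 0, F is the age at which expected income peaks and m is the expected income at that age. Plugging in the coefficients quoted in the "My method" line (a = -15554.28, b = 3088.804, c = -27.77605) gives F of roughly 55.6 and m of roughly 70,300, assuming those coefficients come from this same regression.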

The strict graphing syntax: not simple

lab var age "Age in 2004"
lab var xinc "Expected Income in 2003"
graph twoway (scatter xinc age, sort connect(l) clwidth(medthick) clpattern(solid) mcolor(black) msymbol(i)), plotregion(style(none)) xlab(20(4)64) ylab(0(5000)50000) saving(ch07fig1.gph,replace)

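• Defining the axes and the plot options
• lab = label; for example, label age as "Age in 2004"
• sort connect(l): sort by x and connect the points with straight lines
• clwidth: width of the connecting line
• clpattern: pattern of the connecting line (solid, dashed, and so on)
• mcolor: marker color
• ms(i): msymbol(i), an invisible marker
• plotregion: style of the plot region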

Remember this format:
gr tw (sc xinc age, sort connect(l) ms(i))
Put a space before sort and a space after tw. I drew my graph with the simplified form graph tw (sc xinc age, sort connect(l)).

How do I get rid of the marker points? Just add ms(i).
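The slides do not show how xinc itself is created. A minimal sketch, assuming it is simply the fitted value from the quadratic regression: the example uses the auto dataset that ships with Stata as a stand-in (mpg for income, weight for age), and the names weight2, xmpg, and xmpg2 are illustrative, not Treiman's.

```stata
* Stand-in data: the auto dataset that ships with Stata (not the survey data)
sysuse auto, clear
generate weight2 = weight^2

* Fit the quadratic and store the fitted values (xb = linear prediction)
regress mpg weight weight2
predict xmpg, xb

* The same fitted values built by hand from the stored coefficients
generate xmpg2 = _b[_cons] + _b[weight]*weight + _b[weight2]*weight2

* Draw the fitted curve as a line only: ms(i) hides the marker symbols
twoway (scatter xmpg weight, sort connect(l) ms(i)), ///
    ytitle("Expected mileage (mpg)") xtitle("Weight (lbs.)")
```

The /// line continuation works in a do-file; typed interactively, the twoway command would go on one line. Replacing the scatter plot with (line xmpg weight, sort) should give the same picture with less typing.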

[Two figures: Expected Income in 2003 against Age in 2004; x-axis 20 to 60, y-axis 20000 to 60000]

Case 2: dichotomous variable, means, mutual controls

reg lninc educ hrs male if good==1;
*Make graphs of the relationship between education and income, by sex, for those who work an average number of hours.;
• Evidently everyone uses the mean to handle the combined effect of several variables like this.
*First, get the expected values evaluated at mean hours.;
*Get the mean hours worked, which I need below.;
sum hrs if good==1;
gen mhrs=r(mean) if good==1;
• This is exactly what we do as well: obtain the relevant mean.
• Writing this program is not entirely smooth, though: only one mean can be returned at a time.
gen xincm=_b[_cons]+_b[educ]*educ+_b[hrs]*mhrs+_b[male] if male==1 & good==1;
gen xincf=_b[_cons]+_b[educ]*educ+_b[hrs]*mhrs if male==0 & good==1;
• Having to write all of this out is a kind of torment: so many _b[] terms.

Combining sum, gen, and the returned result r(mean) to obtain a mean
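A hedged sketch of the sum / r(mean) pattern, run on the auto dataset as a stand-in so it is self-contained. Storing the mean in a scalar instead of a variable is a choice made here, not Treiman's, and the names s_mlen, xpf, and xpd are hypothetical. Because r(mean) is overwritten by the next summarize, each mean has to be saved immediately after it is computed.

```stata
* Stand-in data and model: price on mpg, length, and a foreign dummy
sysuse auto, clear
regress price mpg length foreign

* r(mean) is replaced by the next summarize, so save it right away
quietly summarize length if e(sample)
scalar s_mlen = r(mean)

* Expected price by mpg at mean length, separately for the two groups
generate xpf = _b[_cons] + _b[mpg]*mpg + _b[length]*s_mlen + _b[foreign] if foreign==1
generate xpd = _b[_cons] + _b[mpg]*mpg + _b[length]*s_mlen if foreign==0

* One fitted line per group, markers hidden
twoway (scatter xpf mpg, sort connect(l) ms(i)) ///
       (scatter xpd mpg, sort connect(l) ms(i)), ///
       legend(label(1 "Foreign") label(2 "Domestic"))
```

With several control variables, the same two lines (summarize, then scalar) would be repeated once per variable, which is the "one mean at a time" limitation noted in the slides.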

graph twoway (scatter xincm educ, sort connect(l) clwidth(medthick) clpattern(solid) mcolor(black) msymbol(i))
  (scatter xincf educ, sort connect(l) clwidth(medthick) clpattern(solid) mcolor(black) msymbol(O)), plotregion(style(none))
  legend(label(1 "Males") label(2 "Females") cols(1) ring(0) position(11)) xlab(0(4)20) ylab(8(1)11) xtick(1(1)20) ytick(8(.25)11.5) l1("Expected ln(Income) in 2003") saving(ch07fig2.gph,replace);

[Figure: Expected ln(Income) in 2003 against Years of Schooling, separate lines for Males and Females; x-axis 0 to 20, y-axis 8 to 11]

I hate to clean the syntax

[Figure: Expected Income in 2003 against highest year of school completed, separate lines for Males and Females; x-axis 0 to 20, y-axis 0 to 70000]

The predict command
Case 3: surprised? Squared terms are everywhere.

sysuse auto, clear
generate weight2 = weight^2
regress mpg weight weight2 foreign
webuse newautos, clear
generate weight2 = weight^2
*Obtain out-of-sample prediction using another dataset
predict mpg

• But how would a Stata user know that?

sysuse auto, clear
generate weight2 = weight^2
regress mpg weight weight2 foreign
gen mpgf=_b[_cons]+_b[weight]*weight+_b[weight2]*weight2+_b[foreign]
gen mpgd=_b[_cons]+_b[weight]*weight+_b[weight2]*weight2
gr tw (sc mpgf weight, sort connect(l) ms(i)) (sc mpgd weight, sort connect(l) ms(i))

Plot:
lowess mpg weight

gr tw (sc mpgf weight, sort connect(l) ms(i)) (sc mpgd weight, sort connect(l) ms(i)), legend(label(1 "foreign") label(2 "domestic"))

[Figure: Lowess smoother, Mileage (mpg) against Weight (lbs.); bandwidth = .8, x-axis 2,000 to 5,000, y-axis 10 to 40]

Exploring the data in Stata

The lowess command

Edit the graph

[Figure: foreign and domestic fitted lines against Weight (lbs.); x-axis 2,000 to 5,000, y-axis 10 to 35]

set scheme economist
gr tw (sc mpgf weight, sort connect(l) ms(i)) (sc mpgd weight, sort connect(l) ms(i)), legend(label(1 "foreign") label(2 "domestic")) ytitle("y=mileage") xtitle("x=weight")

[Figure: mpgf and mpgd against Weight (lbs.), default legend labels; x-axis 2,000 to 5,000, y-axis 10 to 35]

[Figure: the same lines with legend labels foreign and domestic, y-axis title "y=mileage", x-axis title "x=weight"]


3 • Lowess

The lowess command: weighted regression

• Lowess is a statistical technique for plotting a smooth curve through a set of data points in a scattergram. lowess carries out a locally weighted regression of yvar on xvar, displays the graph, and optionally saves the smoothed variable.
• A scattergram is a plot of data points with a predictor variable on the x-axis and a criterion variable on the y-axis.
• Lowess stands for locally weighted scatterplot smoothing. Each smoothed value is determined by a linear polynomial fitted to the data within a particular span of values of the predictor variable, giving most weight to the central value of the span, less and less weight to more distant values, and zero weight to values outside the span. The span is then moved along the x-axis and a new smoothed value is computed. The size of the span is set by a tension factor (the bandwidth) that determines the proportion of the data points included in the span.
• Warning: lowess is computationally intensive and may therefore take a long time to run on a slow computer. Lowess calculations on 1,000 observations, for instance, require performing 1,000 regressions.
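The effect of the span can be seen by running lowess with two bandwidths. This is a small sketch on the auto dataset; the variable name mpg_s and the choice of 0.4 as the narrower bandwidth are illustrative assumptions.

```stata
sysuse auto, clear

* Default bandwidth is 0.8: each local regression uses 80% of the data
lowess mpg weight

* A narrower span follows the data more closely; save the smoothed values
* in a new variable and suppress the automatic graph
lowess mpg weight, bwidth(0.4) generate(mpg_s) nograph

* Overlay the saved smooth on the raw scatter
twoway (scatter mpg weight) (line mpg_s weight, sort)
```

Saving the smooth with generate() makes it possible to overlay it on other plots or to compare several bandwidths in one graph.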


Case 1: the relationship between frequency of mobile internet use and personal monthly income

gr tw (sc b2am inc, sort connect(l) ms(i)), ytitle("y= b2am=_b[_cons]+_b[inc]*inc+_b[inc2]*inc2") xtitle("x=inc")

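[Figure: fitted quadratic, y = b2am = _b[_cons]+_b[inc]*inc+_b[inc2]*inc2 against x = inc; x-axis 0 to 15, y-axis 1.8 to 2.6]

[Figure: Lowess smoother, bandwidth = .8. y-axis: b2a, "How often do you go online with your mobile phone? (single choice)", 0 to 8. x-axis: p9, "What is your approximate personal pre-tax monthly income, including all bonuses, wages, and allowances? (single choice)", 0 to 15]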

Case 2: China 07.dta

gr tw lowess newsnow income

lowess newsnow income

[Figure: lowess newsnow income against total monthly income from job; x-axis 0 to 8000, y-axis 1.5 to 3.5]

[Figure: Lowess smoother, how often read newspaper against total monthly income from job; bandwidth = .8, x-axis 0 to 8000, y-axis 1 to 3.5]

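• In fact, the approaches are all much the same.

• It seems everyone works from the mean in this way; there is nothing mysterious about it. What is different is that Treiman's do-file is a great help: it lets you reproduce his work in full. That is a remarkable idea.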

Lawrence C. Hamilton, Statistics with Stata

Use graph combine to compare the effects of different x variables

graph combine fig08_11.gph fig08_12.gph, ycommon cols(2) scale(1.25)
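The fig08_11.gph and fig08_12.gph files come from Hamilton's book and are not built in these slides. As a self-contained sketch of the same idea, two graphs can be held in memory and combined; the auto dataset and the graph names g_weight and g_length are stand-ins chosen here.

```stata
sysuse auto, clear

* Build two graphs in memory; the names g_weight and g_length are illustrative
twoway lowess mpg weight, name(g_weight, replace)
twoway lowess mpg length, name(g_length, replace)

* Side by side with a shared y scale, so the two curves can be compared directly
graph combine g_weight g_length, ycommon cols(2)
```

The ycommon option matters for the comparison: without it each panel picks its own y range, and a steep curve in one panel can look as flat as a shallow one in the other.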


4 • NL

The nl command: I do not like it; it is too rigid.

Case 1

nl exp2: y1 x
predict yhat1
gr tw (sc y1 x) (line yhat1 x, sort), legend(off) ytitle("y1=10*1.03^x+e") xtitle("x")
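The data behind y1 are not included in the slides. A minimal sketch that simulates data matching the axis title y1=10*1.03^x+e and then fits the model: the sample size, the seed, and the noise standard deviation of 5 are assumptions made here, and rnormal() needs a Stata release that provides it (10.1 or later).

```stata
* Simulated stand-in data; sample size, seed, and noise level are assumptions
clear
set obs 100
set seed 12345
generate x = _n
generate y1 = 10*1.03^x + rnormal(0, 5)

* exp2 is nl's built-in two-parameter exponential, y = b1*b2^x
nl exp2: y1 x
predict yhat1

* Raw points plus the fitted curve
twoway (scatter y1 x) (line yhat1 x, sort), legend(off) ///
    ytitle("y1=10*1.03^x+e") xtitle("x")
```

If the fit converges, the two estimates should come out close to the values used in the simulation (10 and 1.03), which is a quick way to check that the model form was specified as intended.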

Case 2