Introduction to Data Analysis with Stata
Sara Godoy. Advanced Group. November 2011
+
Nonparametric Analysis
+
Non-parametric tests: summary

NATURE OF DEPENDENT VARIABLE | ONE-SAMPLE | TWO-SAMPLE, RELATED/MATCHED | TWO-SAMPLE, INDEPENDENT | K-SAMPLE, INDEPENDENT
CATEGORICAL/NOMINAL | Binomial test | McNemar test | Fisher's exact test | Chi-square test
ORDINAL/INTERVAL | Kolmogorov-Smirnov one-sample test | Wilcoxon signed-ranks test | Wilcoxon-Mann-Whitney test | Kruskal-Wallis test
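Most tests in the table map onto a built-in Stata command. The sketch below runs several of them on Stata's bundled auto dataset, used here only as an assumed stand-in for your own data; the variable pairings are illustrative rather than substantively meaningful. McNemar's test and the Wilcoxon signed-ranks test are left out because auto has no matched pairs.

```stata
* Sketch: Stata commands for several tests in the summary table.
* The bundled auto data is an illustrative stand-in (assumption).
sysuse auto, clear

* Binomial test: is the proportion of foreign cars equal to 0.5?
bitest foreign == 0.5

* Chi-square and Fisher's exact test for two categorical variables
tabulate foreign rep78, chi2 exact

* Wilcoxon-Mann-Whitney (rank-sum) test: mpg in domestic vs. foreign cars
ranksum mpg, by(foreign)

* Kolmogorov-Smirnov one-sample test against a normal distribution with
* the sample mean and SD (approximate, since the parameters are estimated)
summarize mpg
ksmirnov mpg = normal((mpg - r(mean)) / r(sd))

* Kruskal-Wallis test: mpg across the five repair-record groups
kwallis mpg, by(rep78)
```

Each command prints its own test statistic and p-value; the by() variable defines the groups for the k-sample and two-sample tests.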
+
Non-parametric correlation
A Spearman correlation is used when one or both of the variables are not assumed to be normally distributed and interval (but are assumed to be ordinal). The values of the variables are converted to ranks, and the ranks are then correlated.
Syntax: spearman [varlist] [if] [in] [, options]
spearman read write

 Number of obs =     200
Spearman's rho =     0.6167

Test of Ho: read and write are independent
    Prob > |t| =     0.0000

The results suggest that the relationship between read and write is positive and statistically significant (rho = 0.6167, p < 0.001).
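To see that Spearman's rho really is "correlation of ranks", the sketch below ranks two variables by hand and runs an ordinary Pearson correlation on the ranks. It assumes Stata's bundled auto dataset (price and mpg) as a stand-in for read and write; egen's rank() gives tied values their average rank, which is how spearman handles ties.

```stata
* Sketch: Spearman's rho equals Pearson's r computed on ranks.
* price and mpg from the bundled auto data stand in for read and write.
sysuse auto, clear
spearman price mpg
local rho_s = r(rho)                 // Spearman's rho as reported by spearman

* Convert each variable to ranks (ties receive the average rank)
egen rank_price = rank(price)
egen rank_mpg   = rank(mpg)
correlate rank_price rank_mpg
local rho_p = r(rho)                 // Pearson correlation of the ranks

display "spearman: " %7.4f `rho_s' "   Pearson on ranks: " %7.4f `rho_p'
```

The two numbers should agree, which is why Spearman's rho is robust to non-normal, skewed variables: only the ordering of the values matters.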
+
P-values meaning
A p-value is a measure of how much evidence we have against the null hypothesis (H0): the smaller the p-value, the stronger the evidence against H0.
- The p-value is the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true.
- One often "rejects the null hypothesis" when the p-value is less than the significance level α, conventionally 0.05: if p < α, reject H0.
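The definition and the decision rule can be made concrete with a small calculation. The sketch below takes a hypothetical t statistic of 2.10 with 30 degrees of freedom (both values are assumptions chosen for illustration), computes the two-sided p-value with Stata's ttail() function, and compares it with α = 0.05.

```stata
* Sketch: from a t statistic to a two-sided p-value and a decision.
* The t value and degrees of freedom are hypothetical.
local t_obs = 2.10      // hypothetical observed t statistic
local df    = 30        // hypothetical degrees of freedom
local alpha = 0.05      // significance level

* P(|T| >= |t_obs|) when H0 is true: twice the upper-tail area
local p = 2*ttail(`df', abs(`t_obs'))
display "two-sided p-value = " %6.4f `p'

if `p' < `alpha' display "p < alpha: reject H0"
else display "p >= alpha: fail to reject H0"
```

Local macros are used instead of scalars so that short names such as p cannot be mistaken for an abbreviated variable name in a loaded dataset.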
+
xi: regress csat expense percent income high college i.region, robust

                                                Number of obs =         50
                                                F(  8,    41) =      69.82
                                                Prob > F      =     0.0000
                                                R-squared     =     0.9111
                                                Root MSE      =     21.492

------------------------------------------------------------------------------
             |               Robust
        csat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     expense |   -.002021   .0035883    -0.56   0.576    -.0092676    .0052256
     percent |  -3.007647   .2358047   -12.75   0.000    -3.483864    -2.53143
      income |  -.1674421   1.196409    -0.14   0.889    -2.583638    2.248754
        high |   1.814731    1.02694     1.77   0.085    -.2592168    3.888679
     college |   4.670564   1.599798     2.92   0.006     1.439705    7.901422
  _Iregion_2 |   69.45333   17.99933     3.86   0.000     33.10295    105.8037
  _Iregion_3 |   25.39701   12.52558     2.03   0.049      .101086    50.69293
  _Iregion_4 |   34.57704    9.44989     3.66   0.001      15.4926    53.66149
       _cons |   808.0206   67.86418    11.91   0.000     670.9661    945.0751
------------------------------------------------------------------------------
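Every column of the coefficient table follows from Coef. and Robust Std. Err.: t = Coef./SE, P>|t| is the two-sided tail area of a t distribution with 41 residual degrees of freedom (50 observations minus 9 estimated coefficients), and the interval is Coef. ± t(0.975, 41) × SE. The sketch below recomputes the college row from the printed numbers as a check on that reading; it is an illustration only, not something you need to run in practice.

```stata
* Sketch: reproduce the college row of the robust regression output
* from its printed coefficient and standard error.
local b   = 4.670564     // Coef. for college
local se  = 1.599798     // Robust Std. Err. for college
local df  = 50 - 9       // N minus number of estimated coefficients = 41

local t     = `b' / `se'
local p     = 2*ttail(`df', abs(`t'))
local tcrit = invttail(`df', 0.025)     // 97.5th percentile of t(41)
local lo    = `b' - `tcrit'*`se'
local hi    = `b' + `tcrit'*`se'

display "t = " %5.2f `t' "   P>|t| = " %5.3f `p'
display "95% CI: [" %9.6f `lo' ", " %9.6f `hi' "]"
```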
NOTE: By default, xi omits the first (lowest) category of region, which then serves as the reference category. To choose a different reference category, type the following before running the regression:
char region[omit] 4
xi: regress csat expense percent income high college i.region, robust
This selects Midwest (4) as the reference category for the dummy variables.
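In Stata 11 and later, factor-variable notation can set the reference (base) category directly, without xi or char. The sketch below assumes the bundled auto dataset as a stand-in, with rep78 playing the role of region: char rep78[omit] 3 followed by xi, and the factor-variable form ib3.rep78, should both make category 3 the reference and give the same coefficients.

```stata
* Sketch: two ways to choose the reference category of a dummy set.
* rep78 in the bundled auto data stands in for region (assumption).
sysuse auto, clear

* Approach from the slide: set the omitted category, then use xi
char rep78[omit] 3
xi: regress price mpg i.rep78, robust
local b4_xi = _b[_Irep78_4]          // rep78 == 4 compared with rep78 == 3

* Factor-variable approach: ib3. makes rep78 == 3 the base category
regress price mpg ib3.rep78, robust
local b4_fv = _b[4.rep78]

display "xi: " %9.3f `b4_xi' "   factor variables: " %9.3f `b4_fv'
```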
+
Regression: correlation matrix
Below is a correlation matrix for all variables in the model. The numbers are Pearson correlation coefficients, which range from -1 to 1. Values closer to 1 in absolute value indicate a stronger linear relationship; a negative value indicates an inverse relationship (roughly, when one variable goes up, the other goes down). The star(0.05) option marks coefficients that are significant at the 5% level, and sig prints the significance level (p-value) below each coefficient.

pwcorr csat expense percent income high college, star(0.05) sig

             |     csat  expense  percent   income     high  college
-------------+------------------------------------------------------
        csat |   1.0000
             |
     expense |  -0.4663*  1.0000
             |   0.0006
             |
     percent |  -0.8758*  0.6509*  1.0000
             |   0.0000   0.0000
             |
      income |  -0.4713*  0.6784*  0.6733*  1.0000
             |   0.0005   0.0000   0.0000
             |
        high |   0.0858   0.3133*  0.1413   0.5099*  1.0000
             |   0.5495   0.0252   0.3226   0.0001
             |
     college |  -0.3729*  0.6400*  0.6091*  0.7234*  0.5319*  1.0000
             |   0.0070   0.0000   0.0000   0.0000   0.0001
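The p-values under each coefficient come from a t test of H0: no correlation, with t = r·sqrt((n - 2)/(1 - r²)) on n - 2 degrees of freedom. The sketch below recomputes that test for one pair of variables from Stata's bundled auto dataset (an assumed stand-in for the csat data) and then runs pwcorr, sig on the same pair for comparison.

```stata
* Sketch: the significance test behind pwcorr, sig, for one pair.
* price and mpg from the bundled auto data are a stand-in (assumption).
sysuse auto, clear
correlate price mpg
local r = r(rho)
local n = r(N)

* t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom;
* (`r')^2 is parenthesized because ^ binds tighter than unary minus
local t = `r' * sqrt((`n' - 2) / (1 - (`r')^2))
local p = 2*ttail(`n' - 2, abs(`t'))
display "r = " %7.4f `r' "   t = " %6.2f `t' "   p = " %6.4f `p'

* Compare with pwcorr's own significance level for the same pair
pwcorr price mpg, sig
```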
+
Regression: graph matrix
The graph matrix command gives a graphical counterpart to the correlation matrix: a scatterplot for every pair of variables. The half option draws only the lower triangle, and maxis() here suppresses the axis labels.
graph matrix csat expense percent income high college, half maxis(ylabel(none) xlabel(none))
+
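Regression: managing all this output
Usually, when we're running regressions, we'll be testing multiple models at a time, which can make the results difficult to compare.
Stata offers several user-friendly options for storing and viewing regression output from multiple models:
- Storing output: eststo / esttab
- Outputting into Excel: outreg2
eststo/esttab (from the estout package) and outreg2 are user-written commands; if they are not installed, type ssc install estout and ssc install outreg2.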
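A minimal outreg2 workflow writes the first model with replace and adds later models as new columns with append. The sketch assumes outreg2 is installed from SSC, uses the bundled auto dataset, and the file name auto_models.xls is hypothetical; ctitle() sets each column's heading.

```stata
* Sketch: exporting two models side by side with outreg2.
* Requires the user-written outreg2 package (ssc install outreg2).
* The auto data and the file name auto_models.xls are illustrative.
sysuse auto, clear
regress price mpg, robust
outreg2 using auto_models.xls, replace ctitle(Model 1)   // start a new file
regress price mpg weight, robust
outreg2 using auto_models.xls, append ctitle(Model 2)    // add a column
```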
+
Regression: eststo/esttab
We can store this information in Stata; just type:
regress csat expense, robust
eststo model1
regress csat expense percent income high college, robust
eststo model2
xi: regress csat expense percent income high college i.region, robust
eststo model3
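eststo can also be used as a prefix, fitting and storing a model in one line, and eststo clear empties the list of stored models before a new set. The sketch below assumes the estout package is installed and uses the bundled auto dataset in place of the csat data.

```stata
* Sketch: storing models with eststo on the bundled auto data.
* Requires the user-written estout package (ssc install estout).
sysuse auto, clear
eststo clear                      // drop any previously stored models
regress price mpg, robust
eststo model1                     // store the active estimates as model1
eststo model2: regress price mpg weight, robust   // fit and store in one step
estimates dir                     // list the stored estimation results
```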
+
Regression: eststo/esttab
Now Stata will hold your output in memory until you ask to recall it:
esttab model1 model2 model3

------------------------------------------------------------
                     (1)             (2)             (3)
                    csat            csat            csat
------------------------------------------------------------
expense          -0.0223***      0.00335        -0.00202
                 (-6.07)          (0.70)         (-0.56)

percent                           -2.618***       -3.008***
                                (-11.44)        (-12.75)

income                             0.106          -0.167
                                  (0.09)         (-0.14)

high                               1.631           1.815
                                  (1.73)          (1.77)

college                            2.031           4.671**
                                  (0.96)          (2.92)

_Iregion_2                                         69.45***
                                                  (3.86)

_Iregion_3                                         25.40*
                                                  (2.03)

_Iregion_4                                         34.58***
                                                  (3.66)

_cons             1060.7***        851.6***        808.0***
                 (43.55)         (14.86)         (11.91)
------------------------------------------------------------
N                     51              51              50
------------------------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
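By default esttab shows t statistics in parentheses, as above. The sketch below shows two common variations: standard errors with R-squared instead, and writing the table to a CSV file that Excel can open. It assumes the estout package is installed; the auto data and the file name auto_table.csv are illustrative.

```stata
* Sketch: common esttab variations, using the bundled auto data.
* Requires the user-written estout package (ssc install estout).
sysuse auto, clear
eststo clear
eststo m1: regress price mpg, robust
eststo m2: regress price mpg weight, robust

* Standard errors instead of t statistics, plus R-squared for each model
esttab m1 m2, se r2

* Write the same table to a CSV file (format taken from the .csv suffix)
esttab m1 m2 using auto_table.csv, se r2 replace
```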