Means, Thresholds and Moderation
Sarah Medland – Boulder 2008
Corrected Version
Thanks to Hongyan Du for pointing out the error on the regression examples
This morning
Fitting a mean and regression with continuous data
Modelling ordinal data
Fitting the regression model with ordinal data
Let's start with the data…
File: Wednesday.dat
Contains 6 of the variables from Dorret's example
ntrid zygMZDZ age1 sekse1 AQ1 age2 sekse2 AQ2
If this was a pedigree data file…
Famid  Ind  Father  Mother  Zyg  Sex  Age    Trait
2      1    0       0       0    1    x      x
2      2    0       0       0    2    x      x
2      3    1       2       MZ   1    18.12  91
2      4    1       2       MZ   1    18.12  95
How can we make this data file?
Assume we have data with 3 variables:
How do we make this data?
SPSS
SORT CASES BY Family Individual .
CASESTOVARS /ID = Family /INDEX = Individual /GROUPBY = VARIABLE .
SAS? R?
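For readers without SPSS, the same long-to-wide reshaping can be sketched in plain Python. This is only an illustration: the column names Family and Individual come from the SPSS syntax above, while the third variable name (Trait), the helper name cases_to_vars and all data values are made up for the example.

```python
from collections import defaultdict


def cases_to_vars(rows, id_key="Family", index_key="Individual"):
    """Reshape one-row-per-person records into one row per family."""
    families = defaultdict(dict)
    # Sort by family, then individual, mirroring SORT CASES BY Family Individual
    for row in sorted(rows, key=lambda r: (r[id_key], r[index_key])):
        for name, value in row.items():
            if name in (id_key, index_key):
                continue
            # Suffix each variable with the individual's index: Trait.1, Trait.2, ...
            families[row[id_key]][f"{name}.{row[index_key]}"] = value
    return [{id_key: fam, **values} for fam, values in families.items()]


# Hypothetical long-format data: one row per individual
long_rows = [
    {"Family": 2, "Individual": 2, "Trait": 95},
    {"Family": 1, "Individual": 1, "Trait": 88},
    {"Family": 2, "Individual": 1, "Trait": 91},
    {"Family": 1, "Individual": 2, "Trait": 90},
]

wide_rows = cases_to_vars(long_rows)
for row in wide_rows:
    print(row)
```

Each output row holds one family, with one column per individual for every remaining variable, which is the layout a twin data file needs.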
Means…
In SPSS, SAS, etc. we calculate the mean
In Mx and other ML programs we estimate the mean
SPSS…
Mx…
Means.mx
SPSS assumes this is a sample (variance divided by n - 1)
Mx assumes this is a population (ML variance divided by n)
Slightly different algebra
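The "slightly different algebra" presumably refers to the variance denominator: a sample variance divides by n - 1, whereas the maximum-likelihood estimate divides by n. A minimal sketch with made-up scores (not taken from Wednesday.dat):

```python
import statistics

# Made-up scores, for illustration only
scores = [4.0, 7.0, 9.0, 12.0]
n = len(scores)

mean = statistics.fmean(scores)           # the same under both approaches
sample_var = statistics.variance(scores)  # divides by n - 1 (sample formula)
ml_var = statistics.pvariance(scores)     # divides by n (ML / population formula)

print(f"mean = {mean}")
print(f"sample variance (n - 1) = {sample_var:.3f}")
print(f"ML variance (n)         = {ml_var:.3f}")
# The two variances differ only by the factor n / (n - 1)
print(f"ML variance * n / (n - 1) = {ml_var * n / (n - 1):.3f}")
```

With large samples the two estimates are nearly identical, so the means and variances from the two programs should agree closely.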
How about regression?
Y = X*B + C
Regression speak: AutismQuotient = Sex*Beta1 + Age*Beta2 + Intercept
BG speak: AutismQuotient = Sex Effect + Age Effect + Grand Mean
SPSS…
regression.mx
SPSS…
Coefficients (a)
                 Unstandardized Coefficients   Standardized Coefficients
Model            B          Std. Error         Beta                        t        Sig.
1  (Constant)    93.564     65.862                                          1.421   .157
   age1            .549      3.623              .011                         .151   .880
   sekse1        -2.608      1.558             -.125                       -1.673   .096
a. Dependent Variable: AQ1
Run regression.mx
What does this mean?
Age Beta = .549
For every 1 unit increase in Age the mean shifts by .549
Grand mean = 93.564
Mean Age = 18.2
So the mean for 20 year olds is predicted to be: 104.544 = 93.564 + 20*.549
Sex effects?
Sex Beta = -2.608
Sex coded Male = 1, Female = 0
Female Mean: 93.564 = 93.564 + 0*-2.608
Male Mean: 90.956 = 93.564 + 1*-2.608
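The predictions above are just the regression equation evaluated at chosen covariate values. A small sketch using the coefficients from the SPSS output (the function name predicted_aq is made up for illustration); note that 93.564 - 2.608 works out to 90.956:

```python
# Coefficients taken from the SPSS output for AQ1
INTERCEPT = 93.564   # the slides call this the grand mean
BETA_AGE = 0.549
BETA_SEX = -2.608    # sex coded Male = 1, Female = 0


def predicted_aq(age, sex):
    """Predicted mean AQ: intercept + age effect + sex effect."""
    return INTERCEPT + BETA_AGE * age + BETA_SEX * sex


# Age effect only, as on the slide: 93.564 + 20 * .549
print(round(predicted_aq(age=20, sex=0), 3))  # 104.544
# Sex effect only, as on the slide (age term left out, i.e. age = 0)
print(round(predicted_aq(age=0, sex=0), 3))   # 93.564
print(round(predicted_aq(age=0, sex=1), 3))   # 90.956
```

In practice both covariates would be filled in together, for example predicted_aq(age=20, sex=1) for a 20 year old male.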
How do we get the p-values?
Set the regression elements to equal 0 and compare the fit with the full model
Do this one element at a time!
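Fixing one element to 0 and refitting presumably leads to a likelihood-ratio test: the difference in -2 log-likelihood between the reduced and the full model is compared against a chi-square distribution with 1 degree of freedom. A sketch of that last step, using hypothetical -2LL values (they are not output from regression.mx) and a made-up function name:

```python
import math


def lrt_p_value(minus2ll_full, minus2ll_reduced):
    """Likelihood-ratio test for one dropped parameter (1 df).

    For 1 df the chi-square upper tail is P(Z^2 > x) = erfc(sqrt(x / 2)).
    """
    chi_sq = max(0.0, minus2ll_reduced - minus2ll_full)
    return chi_sq, math.erfc(math.sqrt(chi_sq / 2.0))


# Hypothetical fit statistics: full model vs. model with one beta fixed to 0
chi_sq, p = lrt_p_value(minus2ll_full=1000.000, minus2ll_reduced=1003.841)
print(f"chi-square(1) = {chi_sq:.3f}, p = {p:.3f}")
```

Dropping one element at a time keeps each test at 1 degree of freedom, so every p-value refers to a single coefficient.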
Why bother with Mx?
Because most stat packages can't handle non-independent data…
  Non-independence reduces the variance
  Biases t and F tests
Because we want complete flexibility in the model specification…
  As you will see later today
Because very few packages can handle ordinal data adequately…
Binary data
File: two_cat.dat
NI=5
Labels Zyg twin1 twin2 Age Sex
Trait: smoking initiation
Never Smoked/Ever Smoked (Recoded from yesterday)
Data is sorted to speed up the analysis
Twin 1 smoking initiation
twin1
                  Frequency  Percent  Valid Percent  Cumulative Percent
Valid    0              822     47.5           53.0                53.0
         1              730     42.2           47.0               100.0
         Total         1552     89.7          100.0
Missing  System         179     10.3
Total                  1731    100.0
Raw data distribution: Mean = .47, SD = .499, Non Smokers = 53%, Threshold = .53
Standard normal distribution: Mean = 0, SD = 1, Non Smokers = 53%, Threshold = .074
Threshold = .074 – Huh what?
How can I work this out?
In Excel: =NORMSINV() applied to the proportion of non-smokers
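Outside Excel, the same inverse-normal lookup is available in Python's standard library. The sketch below uses the twin 1 counts from the frequency table (822 never-smokers out of 1552 valid cases); the variable names are arbitrary.

```python
from statistics import NormalDist

# Counts from the twin 1 frequency table
non_smokers = 822
valid_n = 1552

proportion = non_smokers / valid_n
# z-value with `proportion` of the standard normal distribution below it
threshold = NormalDist().inv_cdf(proportion)

print(f"proportion of non-smokers = {proportion:.3f}")
print(f"threshold (z) = {threshold:.3f}")
```

Using the rounded proportion .53 instead of the exact one gives a z-value of about .075 rather than .074, so small differences in the third decimal are just rounding.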
Why do we rescale the data this way?
Convenience
Variance is always 1
Mean is always 0
We can interpret the area under a curve between two z-values as a probability or percentage
You could use other distributions but you would have to specify the fit function
Threshold.mx
Threshold = .075 – Huh what?
How about age/sex correction?
What does this mean?
Age Beta = .007
For every 1 unit increase in Age the threshold shifts by .007
Threshold is -.1118
38 is +1.38 SD from the mean age
The threshold for 38 year olds is: .1542 = -.1118 + .007*38
22 is -1.38 SD from the mean age
The threshold for 22 year olds is: .0422 = -.1118 + .007*22
Is the age effect significant?
How to interpret this
The threshold moves slightly to the right as age increases
This means younger people were more likely to have tried smoking than older people
But this was not significant
Is the age effect significant?
If Beta = .03
22 year olds: Threshold = .548
38 year olds: Threshold = 1.028
How about the sex effect?
Beta = -.05
Threshold = -.1118
Sex coded Male = 1, Female = 0
So the Male threshold is: -.1618 = -.1118 + 1*-.05
The Female threshold is: -.1118 = -.1118 + 0*-.05
Are males or females more likely to smoke?
Both effects together
38 year old Males: .1042=-.1118 + 1*-.05 + .007*38
38 year old Females: .1542=-.1118 + 0*-.05 + .007*38
22 year old Males: -.0078=-.1118 + 1*-.05 + .007*22
22 year old Females: .0422=-.1118 + 0*-.05 + .007*22
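To see what these thresholds imply, each one can be converted into a probability: with never-smokers below the threshold, the area of the standard normal curve above it is the predicted proportion who have ever smoked. A sketch using the estimates quoted above (the function names are made up for illustration):

```python
from statistics import NormalDist

# Estimates quoted on the slides
BASE_THRESHOLD = -0.1118
BETA_SEX = -0.05     # sex coded Male = 1, Female = 0
BETA_AGE = 0.007


def threshold(age, male):
    """Threshold corrected for sex and age."""
    return BASE_THRESHOLD + BETA_SEX * male + BETA_AGE * age


def prob_ever_smoked(age, male):
    """Area of the standard normal curve above the corrected threshold."""
    return 1.0 - NormalDist().cdf(threshold(age, male))


for age in (38, 22):
    for male, label in ((1, "males"), (0, "females")):
        print(f"{age} year old {label}: threshold = {threshold(age, male):.4f}, "
              f"P(ever smoked) = {prob_ever_smoked(age, male):.3f}")
```

A lower threshold leaves more of the distribution above it, so under these estimates males and younger people have the higher predicted probability of having smoked.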
Mx Threshold Specification: 3+ Cat.
[Figure: standard normal curve, axis from -3 to 3, with thresholds marked at -1 and 1.2; the increment between them is 2.2]
Threshold matrix: T Full 2 2 Free
  Columns: Twin 1, Twin 2
  Row 1: 1st threshold
  Row 2: increment
Mx Threshold Model: Thresholds L*T /
  Row 1 of L*T: 1st threshold
  Row 2 of L*T: 2nd threshold (1st threshold + increment)
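The effect of pre-multiplying by L can be checked numerically. The sketch below assumes L is a lower-triangular matrix of ones and fills T with the values that appear in the figure (first threshold -1, increment 2.2), the same for both twins; that reading of the figure is an assumption.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


# L: lower triangle of ones, accumulates the rows of T
L = [[1, 0],
     [1, 1]]

# T: row 1 = first threshold, row 2 = increment; columns = Twin 1, Twin 2
T = [[-1.0, -1.0],
     [2.2, 2.2]]

thresholds = matmul(L, T)
for label, row in zip(("1st threshold", "2nd threshold"), thresholds):
    print(label, [round(value, 3) for value in row])
```

Estimating an increment rather than the second threshold itself makes it easy to keep the thresholds ordered, for example by bounding the increment to be positive.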
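Adding a regression
L*T + G@(B*D);
maxth = 2, ndef = 2, nsib = 2, nthr = 4

\[
G = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad
D = \begin{pmatrix} \mathrm{sex}_1 & \mathrm{sex}_2 \\ \mathrm{age}_1 & \mathrm{age}_2 \end{pmatrix}, \quad
B = \begin{pmatrix} \beta_{sex} & \beta_{age} \end{pmatrix}
\]

\[
B * D = \begin{pmatrix}
\beta_{sex}\,\mathrm{sex}_1 + \beta_{age}\,\mathrm{age}_1 &
\beta_{sex}\,\mathrm{sex}_2 + \beta_{age}\,\mathrm{age}_2
\end{pmatrix}
\]

\[
G @ (B * D) = \begin{pmatrix}
\beta_{sex}\,\mathrm{sex}_1 + \beta_{age}\,\mathrm{age}_1 &
\beta_{sex}\,\mathrm{sex}_2 + \beta_{age}\,\mathrm{age}_2 \\
\beta_{sex}\,\mathrm{sex}_1 + \beta_{age}\,\mathrm{age}_1 &
\beta_{sex}\,\mathrm{sex}_2 + \beta_{age}\,\mathrm{age}_2
\end{pmatrix}
\]

\[
L * T + G @ (B * D) = \begin{pmatrix}
t_{11} + \beta_{sex}\,\mathrm{sex}_1 + \beta_{age}\,\mathrm{age}_1 &
t_{12} + \beta_{sex}\,\mathrm{sex}_2 + \beta_{age}\,\mathrm{age}_2 \\
(t_{11} + t_{21}) + \beta_{sex}\,\mathrm{sex}_1 + \beta_{age}\,\mathrm{age}_1 &
(t_{12} + t_{22}) + \beta_{sex}\,\mathrm{sex}_2 + \beta_{age}\,\mathrm{age}_2
\end{pmatrix}
\]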
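The same algebra can be traced with numbers. In this sketch @ is taken to be Mx's Kronecker product, B is a 1 x 2 row of betas and D holds sex in row 1 and age in row 2 with one column per twin. All values are hypothetical (the betas simply reuse -.05 and .007 from the binary example), and the helper functions are written out only to keep the example self-contained.

```python
def matmul(a, b):
    """Plain-Python matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def kron(a, b):
    """Kronecker product (the @ operator in the Mx algebra)."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def add(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


L = [[1, 0], [1, 1]]              # accumulates increments into thresholds
T = [[-1.0, -1.0], [2.2, 2.2]]    # hypothetical first threshold and increment
G = [[1], [1]]                    # copies the regression term to every threshold
B = [[-0.05, 0.007]]              # hypothetical beta_sex, beta_age
D = [[1, 0],                      # sex of twin 1, twin 2 (Male = 1)
     [22, 38]]                    # age of twin 1, twin 2

corrected = add(matmul(L, T), kron(G, matmul(B, D)))
for row in corrected:
    print([round(value, 3) for value in row])
```

Both thresholds of a twin shift by the same amount, because G duplicates that twin's regression term down the column.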
Multivariate Threshold Models
Specification in Mx
Thanks to Kate Morley for these slides
#define nsib 2 ! Number of siblings = 2
#define maxth 2 ! Maximum number of thresholds
#define nvar 2 ! Number of variables
#define ndef 1 ! Number of definition variables
#define nthr 4 ! nsib x nvar
#NGROUPS 8

G1: MZ Females
Data NInput=8
Ordinal File=data.dat
Labels
famID zyg covar_a covar_b var1_a var2_a var1_b var2_b
Select if zyg = 1 /
SELECT covar_a covar_b var1_a var2_a var1_b var2_b /
DEFINITION_VARIABLE covar_a covar_b /

BEGIN MATRICES;
X Lower nvar nvar Free ! Genetic paths
Y Lower nvar nvar Free ! Common environmental paths
Z Lower nvar nvar Free ! Unique environmental paths
H Full 1 1
T Full maxth nthr Free ! Thresholds
B Full nvar ndef Free ! Regression betas
L lower maxth maxth ! For converting incremental to cumulative thresholds
G Full maxth 1 ! For duplicating regression betas across thresholds
K Full ndef nsib ! Contains definition variables
END MATRICES;
Threshold model for multivariate, multiple-category data with definition variables:
We will break the algebra into two parts:
1 - Definition variables;
2 - Uncorrected thresholds;
and go through it in detail.
Part 1
Definition variables: Twin 1, Twin 2
Threshold correction, Twin 1, Variable 1
Threshold correction, Twin 1, Variable 2
Threshold correction, Twin 2, Variable 1
Threshold correction, Twin 2, Variable 2
Transpose:
Part 2
Thresholds 1 & 2, Twin 1, Variable 1
Thresholds 1 & 2, Twin 1, Variable 2
Thresholds 1 & 2, Twin 2, Variable 1
Thresholds 1 & 2, Twin 2, Variable 2
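The algebra on these slides did not survive as text, so the following is only one way the two parts could fit together, not the script's actual algebra statement. It assumes B is nvar x ndef, K is ndef x nsib, and that the columns of T are ordered Twin 1 Variable 1, Twin 1 Variable 2, Twin 2 Variable 1, Twin 2 Variable 2. All numbers are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


NVAR, NSIB = 2, 2

# Part 1: definition variables -> one threshold correction per variable and twin
B = [[0.01],                 # hypothetical beta for variable 1
     [0.02]]                 # hypothetical beta for variable 2
K = [[22.0, 38.0]]           # hypothetical definition variable for twin 1, twin 2
corrections = matmul(B, K)   # nvar x nsib: rows = variables, columns = twins

# Lay the corrections out in the assumed column order of T:
# twin 1 var 1, twin 1 var 2, twin 2 var 1, twin 2 var 2
correction_row = [corrections[v][s] for s in range(NSIB) for v in range(NVAR)]

# Part 2: uncorrected thresholds (row 1 = first threshold, row 2 = increment)
L = [[1, 0], [1, 1]]
T = [[-1.0, -0.5, -1.0, -0.5],
     [2.0, 1.5, 2.0, 1.5]]
uncorrected = matmul(L, T)

# Every threshold in a column receives that column's correction
corrected = [[t + c for t, c in zip(row, correction_row)] for row in uncorrected]
for row in corrected:
    print([round(value, 3) for value in row])
```

The reordering step plays the role of the transpose on the slide: it lines the per-twin, per-variable corrections up with the columns of the threshold matrix.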
http://davidmlane.com/hyperstat/z_table.html