Upload
others
View
11
Download
0
Embed Size (px)
Citation preview
12/8/2014
1
Chapter1BasicBiostatisticsJamalludinAbRahmanMDMPHDepartmentofCommunityMedicineKulliyyahofMedicine
Content
12 August, 2014www.jamalrahman.net
2
Basicpremises– variables,levelofmeasurements,probabilitydistribution
Descriptivestatistics Inferentialstatistics,hypothesistesting
12/8/2014
2
12 August, 2014www.jamalrahman.net
3
Weobserve,webelieve.Whatweobservemightnotbethetruth
…andwecan’tobserveall.Wesample.
12 August, 2014www.jamalrahman.net
4
Parameter Statistic
Population Sample
12/8/2014
3
Variable
12 August, 2014www.jamalrahman.net
5
Characteristicofapopulation Cantakedifferentvalues Data=measurementscollected/observed
12 August, 2014www.jamalrahman.net
6Typeofdata(Levelofmeasurement)
Categorical
Nominal Ordinal
Numerical
Discrete Continuous
e.g.Gender,Race e.g.Cancerstaging,SeverityofCXRfor
PTB
e.g.Parity,Gravida e.g.Hb,RBS,cholesterol.
12/8/2014
4
Normaldistribution
12 August, 2014www.jamalrahman.net
7
; ,1
2
Otherdistributions
12/8/2014www.jamalrahman.net
8
Discretevs.continuousprobabilitydistribution
2,F,Weibull,Binomial&Poisson
12/8/2014
5
CharacteristicsofNormaldistribution
12 August, 2014www.jamalrahman.net
9
Smooth,symmetrical(aroundthemean),uni‐modal,bellshapedcurve
Mean=Median=Mode Skewness =0 Kurtosis=0
TestofNormality
12 August, 2014www.jamalrahman.net
10
Anderson–DarlingTest
CorrectedKolmogorov–SmirnovTest(Lilliefors Test)
Cramér–von‐mises Criterion
D'agostino's K‐squaredTest
Jarque–Bera Test
Pearson'sChi‐squareTest
Shapiro–Francia
Shapiro–Wilk Test
12/8/2014
6
UseNormalitytestwithcaution
12 August, 2014www.jamalrahman.net
11
Smallsamplesalmostalwayspassanormalitytest.NormalitytestshavelittlepowertotellwhetherornotasmallsampleofdatacomesfromaGaussiandistribution.
Withlargesamples,minordeviationsfromnormalitymaybeflaggedasstatisticallysignificant,eventhoughsmalldeviationsfromanormaldistributionwon’taffecttheresultsofattestorANOVA.
Statisticalobjectives
12/8/2014www.jamalrahman.net
12
1. Determinepresenceofdifference(orsimilarity)2. Determinedegreeofdifference3. Determinethedirectionofchanges(outcome)4. Predictchanges(outcomes)
12/8/2014
7
12/8/2014www.jamalrahman.net
13
A B C
IsthereanydifferencebetweenA&B?
HowbigisthedifferencebetweenA&B?
Whichoneistaller?AorB?
IsCdifferentfromA&B?
Isthereanypatternnow?
IftherewillbeD,canyoupredicthowtallisD?
Association
12 August, 2014www.jamalrahman.net
14
avalueandwhoseassociatedvaluemaybechanged
DependentIndependent
e.g.Smoking e.g.LungCancer
12/8/2014
8
Diseasemodel
12 August, 2014www.jamalrahman.net
15
Time
Exposure Exposure
Exposure
Outcome
Diseasemodel(example)
12 August, 2014www.jamalrahman.net
16
Time
Smoking MineralDust
Age
LungCancer
12/8/2014
9
Causalrelationship
12 August, 2014www.jamalrahman.net
17
Outcome
Factor1
Factor2
Factor3
Factor4
Factor5
Factor6Factor7
Data
CategoricalFrequency(count)&Percentage
Numerical
Normal Mean(SD)
NotNormal Median(Range/IQR)
Descriptivestatistics
12 August, 2014www.jamalrahman.net
18
12/8/2014
10
HypothesisTesting
12 August, 2014www.jamalrahman.net
19
Involvemorethanonevariables‐ exposure&outcome,‐ predictor&criterion,‐ risk&disease
TrytoprovethatExposurecausestheDiseasee.g.SmokingcausingLungCancer
Example~Ho:NodifferenceofrisktogetLungCancerbetweensmokerandnon‐smoker
LungCancerNoLungCancer
Smoking 20(18.2%) 90(81.8%)
NotSmoking 5(4.5%) 105(95.5%)
2 (df=1)= 10.150, p =0.001, OR = 4.7 (CI95% 1.7 – 13.0)
Becausep<0.05,werejectH0.Thereforethereisadifferentbetweensmoker&nonsmoker
12 August, 2014www.jamalrahman.net
20
12/8/2014
11
StatisticalTest
12 August, 2014www.jamalrahman.net
21
Univariate~Onedependent&oneindependentMultivariate~Multipledependent&multipleindependentvariable
Whattesttouse?
12 August, 2014www.jamalrahman.net
22
Variable1 Variable2 Test
Categorical Categorical Chi‐square
Categorical(2pop) Numerical(Normal) Independent samplet‐test
Categorical(2pop) Numerical(NotNormal) Mann‐WhitneyUtest
Categorical(>2pop) Numerical(Normal) One‐wayANOVA
Categorical(>2pop) Numerical(NotNormal) Kruskal‐Wallis test
Numerical(Normal) Numerical(Normal) Pearson CorrelationCoefficientTest
Numerical(Normal/NotNormal)
Numerical(NotNormal) Spearman CorrelationCoefficientTest
Numerical(Normal) Numerical(Normal) – Paired Pairedt‐test
Numerical(NotNormal) Numerical(NotNormal)–Paired
Friedmantest
12/8/2014
12
Butlifeisnotsimple!
12 August, 2014www.jamalrahman.net
23
LungCancer
Smoking Exposuretomineraldust
Radiation
Outcome
Noofcigarettesmokedperday
RadiationAbsorbeddose(mGy)perday
PM2.5,PM10(g/m3)
Factor Factor
Factor
Themultivariatemodel
12/8/2014www.jamalrahman.net
24
LungCA=Smoking+Radiation+Mineraldust+Others
Regression Residual
AgoodmodeliswhenRegression>Residual
12/8/2014
13
MultivariateAnalysis
12 August, 2014www.jamalrahman.net
25
Hypothesistesting&controlforconfounders– e.g.GeneralLinearModel,LogisticRegression
Modeling– e.g.LinearRegression
Datareduction– e.g.FactorAnalysis,ClusterAnalysis
Writingplanforstatisticalanalysis#1
12 August, 2014www.jamalrahman.net
26
DatawereanalyzedusingthecomplexsamplefunctionofSPSS(version13.0).Samplingerrorswereestimatedusingtheprimarysamplingunitsandstrataprovidedinthedataset.Samplingweightswereusedtoadjustfornonresponsebiasandtheoversamplingofblacks,MexicanAmericans,andtheelderlyinNHANES.Theprevalenceofhypertension,aswellastheawareness,treatment,andcontrolrates,wereageadjustedbydirectstandardizationtotheUS2000standardpopulation.Toanalyzedifferencesovertime,the2003–2004datawerecomparedwiththe1999–2000data.Estimateswithacoefficientofvariation>0.3wereconsideredunreliable.A2‐tailedPvalue<0.05wasconsideredstatisticallysignificant.(Ongetal.2009)
12/8/2014
14
Writingplanforstatisticalanalysis#2
12 August, 2014www.jamalrahman.net
27
Toassesstheeffectoftheselectionprocessonthecharacteristicsofthecases,wecomparedcasesincludedinthefinalanalysistotherestofthecases.Sincecontrolsincludedinthepresentanalysisweredifferentfromtherestofthediabetesfreeparticipantsbydesign,nosimilarcomparisonswereperformedforthatgroup.Tocomparebaselinecharacteristicsofcasesandcontrolsappropriateunivariatestatisticswereused.SimilarbinarylogisticandmultiplelinearregressionmodelswerebuiltwithincidentdiabetesorHbA1casrespectiveoutcomesandadditiveblockentryofadiponectin andpotentialconfounders.ForlinearregressionCRPandtriglycerideswerelogtransformed.SinceHbA1ccouldbemodifiedbydrugtreatment,weranasensitivityanalysisexcludingallparticipantsonantidiabetic medication.Ap‐valueof<0.05wasconsideredsignificant.AnalyseswereperformedwithSPSS14.0forWindows.
Reportinganalysis(example)
12 August, 2014www.jamalrahman.net
28
12/8/2014
15
29
ww
w.j
amal
rah
man
.ne
t12
Aug
ust,
2014
Reportinganalysis(example)
Reportinganalysis(example)
12 August, 2014www.jamalrahman.net
30
12/8/2014
16
Summary1. Identify&definevariables2. Type– independentvs.dependent3. Levelofmeasurements– nominal,ordinalor
continuous4. Checkdistribution– Normalvs.NotNormal5. Decidewhattodo‐ descriptivevs.analytical
Chapter2IntroductiontoSPSSIBMSPSSStatisticsv21forWindows
JamalludinAbRahmanMDMPHDepartmentofCommunityMedicineKulliyyahofMedicine
12/8/2014
17
IBMSPSSStatistics
IBMCorporationSoftwareGroupRoute100Somers,NY10589ProducedintheUnitedStatesofAmericaMay2012
12/8/2014www.jamalrahman.net
33
12/8/2014www.jamalrahman.net
34
SPSSisNOT
StatisticalPackageforSocialSciences
anymore!!!
12/8/2014
18
SPSSLAYOUT
12/8/2014www.jamalrahman.net
35
Layout
12/8/2014www.jamalrahman.net
36Mainmenu
Toolbar
Variables
12/8/2014
19
Dataeditor
12/8/2014www.jamalrahman.net
37
Rows=variablesRows=eachdata
Define&describeyourvariableshere
Enteryourdatahere
Viewer
12/8/2014www.jamalrahman.net
38
Theoutputofanalyseswillbedisplayedhere.Outputisseparatedfrom
data
12/8/2014
20
Syntax
12/8/2014www.jamalrahman.net
39
Wecancompileallthestepsoftheanalyseshere.ExtendtheprogrammingfunctioninSPSS.Abilitytoperformcomplexstepse.g.
“looping”
CREATINGDATASET
12/8/2014www.jamalrahman.net
40
12/8/2014
21
BeforeevenyoustartSPSS!
12/8/2014www.jamalrahman.net
41
Youmustidentify&definerelevantvariables Definemeans
1. Name– preferablyshortsinglename,beginswithalphabet,nospecialcharacter,nospace
2. Typeofdata– e.g.Numeric,Date,String3. Width&DecimalPlaces(ifnumeric)4. Label– descriptionfortheName(willbedisplayedinViewer)5. Values– labelsforvaluee.g.1=Male,2=Female6. Missing– definemissingvaluee.g.999forN/A
Defineyourvariables
12/8/2014www.jamalrahman.net
42
12/8/2014
22
VariableTypes
12/8/2014www.jamalrahman.net
43
VariableType
12/8/2014www.jamalrahman.net
44
Decidethesuitable
variabletype
Fornumeric,determineWidth&Decimal.
Decimal<Width
ForString,nooptionfor
DecimalPlace
NewoptionforNumerics withleadingzeros
12/8/2014
23
ValueLabels
12/8/2014www.jamalrahman.net
45
MissingValue
12/8/2014www.jamalrahman.net
46
ThisquestionisNotApplicable
tomale
e.g.Assign999torepresentN/A
value&thiswon’tbeincludedinany
analysis
12/8/2014
24
Chapter3DescriptiveStatisticsIBMSPSSStatisticsv22forWindows
JamalludinAbRahmanMDMPHDepartmentofCommunityMedicineKulliyyahofMedicine
Datasetfortheexercise
12/8/2014www.jamalrahman.net
48
Filename:bp.sav Hypothetical Studytodescribefactorsrelatedtobloodpressure N=150 13variables
12/8/2014
25
Retrievefileinformation
12/8/2014www.jamalrahman.net
49
12/8/2014www.jamalrahman.net
12/8/2014
26
Exercise#1
12/8/2014www.jamalrahman.net
51
1. Describesocio‐demographiccharacteristicsoftherespondent
2. Describetheexplanatoryvariables1. Smoke2. BMI3. Glucose4. Cholesterol
3. Measureprevalenceofhighbloodpressure(usingSBP140,DBP90)
DESCRIBENUMERICALDATA
12/8/2014www.jamalrahman.net
52
12/8/2014
27
Explore
12/8/2014www.jamalrahman.net
53
Results
12/8/2014www.jamalrahman.net
54
Check for Normality.Is Age data distributed Normally?
12/8/2014
28
Describeage
12/8/2014www.jamalrahman.net
55
Normal Thesubjectsdistributedbetween18‐56yearsoldwiththeaverageof38(SD=8)years.
IfnotNormal Thesubjectsdistributedbetween18‐56yearsoldwiththemedianof38(IQR=10)years
DESCRIBECATEGORICALDATA
12/8/2014www.jamalrahman.net
56
12/8/2014
29
Frequency
12/8/2014www.jamalrahman.net
57
Result
12/8/2014www.jamalrahman.net
58
12/8/2014
30
TRANSFORM
12/8/2014www.jamalrahman.net
59
Compute
12/8/2014www.jamalrahman.net
60
12/8/2014
31
12/8/2014www.jamalrahman.net
61
weight / ((height / 100) ** 2)
Visualbinning
12/8/2014www.jamalrahman.net
62
12/8/2014
32
12/8/2014www.jamalrahman.net
63
Normal < 23Overweight 23 to < 27.5Obese >= 27.5
12/8/2014www.jamalrahman.net
64
12/8/2014
33
Chapter4Bivariable analysesIBMSPSSStatisticsv21forWindows
JamalludinAbRahmanMDMPHDepartmentofCommunityMedicineKulliyyahofMedicine
Todetermineassociationoftwovariables?
12/8/2014www.jamalrahman.net
66
BPAge
12/8/2014
34
Thesteps
12/8/2014www.jamalrahman.net
67
1. Determinewhichisdependant&whichisindependent
2. Determinelevelofmeasurements3. CheckNormalityofthenumericalmeasurement4. Determinethesuitablestatisticaltest
Whatarethetests?
12 August, 2014www.jamalrahman.net
68
Variable1 Variable2 Test
Categorical Categorical Chi‐square
Categorical(2pop) Numerical(Normal) Independent samplet‐test
Categorical(2pop) Numerical(NotNormal) Mann‐WhitneyUtest
Categorical(>2pop) Numerical(Normal) One‐wayANOVA
Categorical(>2pop) Numerical(NotNormal) Kruskal‐Wallis test
Numerical(Normal) Numerical(Normal) Pearson CorrelationCoefficientTest
Numerical(Normal/NotNormal)
Numerical(NotNormal) Spearman CorrelationCoefficientTest
Numerical(Normal) Numerical(Normal) – Paired Pairedt‐test
Numerical(NotNormal) Numerical(NotNormal)–Paired
Friedmantest
12/8/2014
35
Exercise#2
12/8/2014www.jamalrahman.net
69
1. Determineassociationbetweensocio‐demographiccharacteristics&alltheriskfactorswithBP
2. Determineassociationbetweensocio‐demographiccharacteristics&alltheriskfactorswithHCY
Note:Itwouldbegoodifyoucouldconstructdummytable fortheanswersevenbeforetheanalysesstarted
HCYnormalrange
12/8/2014www.jamalrahman.net
70
12/8/2014
36
INDEPENDENTSAMPLEt‐TEST
COMPARING TWOMEANS
12/8/2014www.jamalrahman.net
71
Agevs.BP
12/8/2014www.jamalrahman.net
72
12/8/2014
37
12/8/2014www.jamalrahman.net
73
Results
12/8/2014www.jamalrahman.net
74
Levene’s testcheckequalitybetweenvariances.Ho:Thereisnodifferenceofvariances.SoifPissignificant,werejectHo,andthereforeequalvariancesassumed.
The original t-test (Student’s t-test) assumes equal variances for equal sample sizes. However if the variances are equal, it is robust for different sizes.
Welch's correction
12/8/2014
38
Table– Distributionofagebybloodpressurestatus
12/8/2014www.jamalrahman.net
75
N Mean SD Statistics df P
NormalBP 156 33.9 7.9 t=0.431 299 0.667
HighBP 145 34.4 8.9
CHI‐SQUAREDTEST
DIFFERENCE OF TWOPROPORTIONS
12/8/2014www.jamalrahman.net
76
12/8/2014
39
Gendervs.BP
12/8/2014www.jamalrahman.net
77
12/8/2014www.jamalrahman.net
78
12/8/2014
40
Results
12/8/2014www.jamalrahman.net
79
Describe this table first. What is your impression? 49% women vs. 47% men with high BP
Some books may suggest the use of Continuity Correction at ALL time, but recent simulations showed that CC (or Yate’s correction) is OVERCONSERVATIVE. Hence, use Pearson 2 when < 20% of cells have expected count < 5
When 20% of cells have EC < 5, use Fisher’s Exact Test
This is given because we code the variables using numbers. Can be used to measure P-trend
ONE‐WAYANOVA
COMPARING MORE THANTWOMEANS
12/8/2014www.jamalrahman.net
80
12/8/2014
41
Racevs.HbA1c
12/8/2014www.jamalrahman.net
81
12/8/2014www.jamalrahman.net
82
12/8/2014
42
Results
12/8/2014www.jamalrahman.net
83
Describe these results first. What is your impression? HbA1c between races? 6.4 (SD 2.1) vs. 6.7 (SD 2.2) vs. 6.5 (SD 2.2)
The F test shows that there is no single significant difference between any two groups
Results– BMIstatusvs.HbA1c
12/8/2014www.jamalrahman.net
84
The F test shows that at least there is ONE pair with significant different. Either N vs. OW, N vs. OB or OW vs. OB
We need to run Post-hoc test to determine which of the PAIR is significant.
To decide which Post-hoc test to choose, we have to test for equality of variances i.e. Homogeneity of variances (Levene’s test)
12/8/2014
43
12/8/2014www.jamalrahman.net
85
Results– Posthoc
12/8/2014www.jamalrahman.net
86
The significant difference is only for Normal vs. Obese (P=0.002)
12/8/2014
44
Report
12/8/2014www.jamalrahman.net
87
ThereisasignificantassociationbetweenBMIStatusandHbA1c(F(2,298)=13.129,P<0.001).Post‐hoctestshowedthatObesesubjectshavesignificantlyhigherHbA1ccomparedtoNormalandOverweightsubjects(P=0.001andP<0.001respectively).
MANN‐WHITNEY U
NONPARAMETRIC TESTS
12/8/2014www.jamalrahman.net
88
12/8/2014
45
Gendervs.HCY
12/8/2014www.jamalrahman.net
89
Results
12/8/2014www.jamalrahman.net
90
This ranks table is not to be cited in the research paper. Instead, describe their MEDIAN
12/8/2014
46
KRUSKALLWALLIS
NON‐PARAMETRIC TESTS
12/8/2014www.jamalrahman.net
91
Racevs.HCY
12/8/2014www.jamalrahman.net
92
12/8/2014
47
Results
12/8/2014www.jamalrahman.net
93
CORRELATIONTEST
RELATIONSHIP OF TWONUMERICAL DATA
12/8/2014www.jamalrahman.net
94
12/8/2014
48
Agevs.HbA1c
12/8/2014www.jamalrahman.net
95
Results
12/8/2014www.jamalrahman.net
96
12/8/2014
49
Agevs.HCY
12/8/2014www.jamalrahman.net
97
12/8/2014www.jamalrahman.net
98