122
Basic Biostatistics Prof. Dr. Jamalludin Ab Rahman MD MPH Department of Community Medicine

Basic biostatistics for medical students

Embed Size (px)

Citation preview

Basic BiostatisticsProf. Dr. Jamalludin Ab Rahman MD MPH

Department of Community Medicine

Learning outcomesAt the end of this workshop, you should be able toDescribe about population & sample, causation,

level of measurement, distribution of dataSummarise categorical & numerical dataUse appropriate statistical test for bi-variable

analyses using SPSS

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

2

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

3

We observe, we believe.What we observe might not be the

truth

Population vs. Sample 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

4

Parameter vs. StatisticsParameter

characteristic of the whole population Statistics

characteristic of a sample, presumably measurable.

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

5

Statistics estimate parametersRepresentativeSampling errorDifferent samples yield different estimatesStatistics = Parameter if sampling done properlyHow to prove?

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

6

N = 35Red = 18/35 = 51.4%

N = 6Red = 3/6 = 50%

N = 6Red = 2/6 = 33.3%

N = 6Red = 4/6 = 66.7%

Average % Red = 50% + 33.3% + 66.7%3

= 50%

50% ≈ 51.4%Parameter

Statistics

Statistics Parameter

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

7

Variable & its roleA value and whose associated value may be

changed

DependentIndependent

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

8

CausationRelation of events (cause and effect)But correlation (between two events) does not

(always) imply causationRooster's crow does not cause the sun to riseSwitch does not cause the bulb to light

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

9

Time & causation

Time

Exposure Exposure

Exposure

Outcome

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

10

Time & causation (example)

Time

AgeExposure to

silica

Smoking

Lung Cancer

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

11

Causal Model

Slide 12

OutcomeExposure Exposure Exposure

Exposure

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

13

Type of data(Level of measurement)

Categorical

Nominal Ordinal

Numerical

Discrete Continuous

e.g. Gender, Race e.g. Cancer staging, Severity of CXR for PTB

e.g. Parity, Gravida

e.g. Hb, RBS, cholesterol.

Distribution (shape) of dataApplicable to numerical valueDiscrete or ContinuousDiscrete ~ Binomial, Poisson, Negative Binomial,

Hypergeometry, Multinomial etc.Continuous ~ Normal, t, chi-square, F etc.

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

14

Central limit theorem “Given a distribution with a mean μ and variance

σ², the sampling distribution of the mean approaches a normal distribution with a mean (μ) and a variance σ²/N as N, the sample size, increases” (David M. Lane)

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

15

Normal Distribution 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

16

𝑓𝑓 𝑥𝑥; 𝜇𝜇,𝜎𝜎2 =1

𝜎𝜎 2𝜋𝜋𝑒𝑒−

12(𝑥𝑥−𝜇𝜇𝜎𝜎 )2)

Why Normal?- Because many biological & psychological variables are distributed normally

CharacteristicsBell shaped curveSymmetricalUnimodal

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

17

Test of Normality Anderson–Darling Test Corrected Kolmogorov–Smirnov Test (Lilliefors Test) Cramér–von-Mises Criterion D'agostino's K-squared Test Jarque–Bera Test Pearson's Chi-square Test Shapiro–Francia Shapiro–Wilk Test

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

18

Use Normality test with cautionSmall samples almost always pass a normality

test. Normality tests have little power to tell whether or not a small sample of data comes from a Gaussian distribution.

With large samples, minor deviations from normality may be flagged as statistically significant, even though small deviations from a normal distribution won’t affect the results of a t test or ANOVA.

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

19

Why run statistical test? 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

20

1. Determine presence of difference (or similarity)2. Determine degree of difference3. Determine the direction of changes (trend)

Predict changes (outcomes)

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

21

A B C

Is there any difference between A & B?

How big is the difference between A & B?

Which one is taller? A or B?

Is C different from A & B?

Is there any pattern now?

If there will be D, can you predict how tall is D?

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

22

Statistical analysis

AnalyticalDescriptive

e.g. Describe socio-demographic characteristics -Age, Sex, Race etc.

e.g. Prevalence of hypertension.

e.g. Compare demographic characteristics between two population – Compare age between male & female

e.g. Distribution of gender by hypertension status

e.g. How demographic characteristics (more than one factors) explain hypertension

Univariable Bivariable Multivariable

IV DV IV DV IV DV IV

Descriptive statistics

26 O

ctob

er 2

016

Basic Biostatistics (C) Jamalludin Ab Rahman 2015 23

Descriptive StatisticsExplain one variable at one timeMethod based on type of measureCategorical

Frequency (Percentage)Numerical

Central measures (e.g. mean, median) & Dispersion (e.g. variance, standard deviation, range, min-max, interquartile range)

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

24

Data

CategoricalFrequency (count) &

Percentage

Numerical

Normal Mean (SD)

Not Normal Median (Range/IQR)

How to describe a data 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

25

Analytical statistics

26 O

ctob

er 2

016

Basic Biostatistics (C) Jamalludin Ab Rahman 2015 26

Comparing differenceA

B C

Which of the following shows true difference between two populations?

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

27

3 methods to compare values1. P-value2. Confidence interval3. Effect size

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s

28

Hypothesis testingTruth

Ho True Ho FalseTe

st

Do notreject Ho Correct Type II error

(β)

Reject Ho Type I error (α) Correct

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s

29Please revise the concept on your own

P valueP-value is ‘likely’ or ‘unlikely’ that Ho is true Taking 0.05 as the cut-off point (α), if P ≤ 0.05, it is

then ‘unlikely’ Ho is true, therefore reject Ho

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s

30

One-tailed vs. two-tailed Is there a difference between Hb 14 g% vs. Hb 12

g% in male & female respectively?Ho: HbM – HbF = 0H1: HbM = HbFH2: HbM > HbFH3: HbM < HbM

Note: Should be determined a priori

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

31

One-tailed vs. two-tailed

Two-sided Right-sidedLeft-sided

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

32

Hypothesis testingTruth

Ho True Ho False

Test

Do notreject Ho Correct Type II error

(β)

Reject Ho Type I error (α) Correct

P-value is the probability to make Type I error (based on frequentist inference)

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

33

P & Sample Size 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

34

The truth about P value Measures effectiveness (even by US FDA) < P means statistically significant, NOT clinical

significant But, be careful when interpreting P value P is affected BOTH by effectiveness AND sample size P can be < 0.05 even though the effectiveness is

marginal when sample size is huge Compare Ps between studies only appropriate if the

sampling & sample size is the same

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

35

P < 0.05Why 5%?Cut-off point proposed by Sir Ronald A. Fisher

1925 to reject or not to reject a hypothesis If P < 0.05 = Probability to make Type I error is

less than 5% If P > 0.05, > 5% of the difference occurred by

chance & not due to the TRUE difference

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

36

Hypothesis Testing using bivariable analysis Try to prove that Exposure causes the Disease

e.g. Smoking causing Lung CancerHo: No difference of risk to get Lung Cancer

between smoker and non-smoker

37

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

Lung Cancer No Lung Cancer

Smoking 20 (18.2%) 90 (81.8%)

Not Smoking 5 (4.5%) 105 (95.5%)

The occurrence of lung cancer is significantly higher (18.2%) among smokers compared to non-smokers (4.5%) (χ2 (df=1)= 10.15, P =0.001, OR = 4.7 (CI95% 1.7 – 13.0))

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

38

26 O

ctob

er 2

016

Confidence IntervalRange of plausible valuesNarrow interval high precision

Wide interval poor precisionHow narrow is narrow? And how wide is wide?

Base on your clinical judgment

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

40

26 O

ctob

er 2

016

Interpret single CI Compare with the null value

i.e. can be 0 for % or 1 for risk Compare with practical significance or the clinical

significance/indifference

41

Null

Null

A

B

Source: http://www.childrens-mercy.org/stats/journal/confidence.asp

Null

Null

C

D

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

Comparing multiple CIs 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

42

A B C

1

2

1 12

2

Effect size The measure of effect irrespective of sample sizeCohen (1988) classify ES into Low (<0.3)Medium (0.3-0.7)Large (> 0.7)

Manual calculation or web based calculation

43

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

44

Statistical TestBivariable (univariate) ~ One dependent & one

independentMultivariate ~ Multiple dependent & multiple

independent variable

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

45

What test to use?Variable 1 Variable 2 Test

Categorical Categorical Chi-square

Categorical (2 pop) Numerical (Normal) Independent sample t-test

Categorical (2 pop) Numerical (Not Normal) Mann-Whitney U test

Categorical (> 2 pop) Numerical (Normal) One-way ANOVA

Categorical (> 2 pop) Numerical (Not Normal) Kruskal-Wallis test

Numerical (Normal) Numerical (Normal) Pearson Correlation Coefficient Test

Numerical (Normal/ Not Normal)

Numerical (Not Normal) Spearman Correlation Coefficient Test

Numerical (Normal) Numerical (Normal) –Paired

Paired t-test

Numerical (Not Normal) Numerical (Not Normal) –Paired

Wilcoxon Signed Rank Test

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

46

Bivariable Analyses Compare meansIndependent sample t-test (Unpaired t-test) ~ Two unrelated

meansPaired t-test ~ Two related meansOne-way ANOVA ~ More than 2 means

χ2 Test ~ Between categorical variables Non-parametric tests (Kruskall-Wallis, Man-Whitney U

tests) ~ If data is not normally distributed

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

47

Writing plan for statistical analysis #1

26 October 2016Basic Biostatistics (C) Jamalludin Ab Rahman 2015 48

Data were analyzed using the complex sample function of SPSS (version 13.0). Sampling errors were estimated using the primary sampling units and strata provided in the data set. Sampling weights were used to adjust for nonresponse bias and the oversampling of blacks, Mexican Americans, and the elderly in NHANES. The prevalence of hypertension, as well as the awareness, treatment, and control rates, were age adjusted by direct standardization to the US 2000 standard population.10 To analyze differences over time, the 2003–2004 data were compared with the 1999–2000 data. Estimates with a coefficient of variation >0.3 were considered unreliable. A 2-tailed P value <0.05 was considered statistically significant. (Ong et al. 2009)

Writing plan for statistical analysis #2

26 October 2016Basic Biostatistics (C) Jamalludin Ab Rahman 2015 49

To assess the effect of the selection process on the characteristics of the cases, we compared cases included in the final analysis to the rest of the cases. Since controls included in the present analysis were different from the rest of the diabetes free participants by design, no similar comparisons were performed for that group. To compare baseline characteristics of cases and controls appropriate univariate statistics were used. Similar binary logistic and multiple linear regression models were built with incident diabetes or HbA1c as respective outcomes and additive block entry of adiponectin and potential confounders. For linear regression CRP and triglycerides were log transformed. Since HbA1c could be modified by drug treatment, we ran a sensitivity analysis excluding all participants on antidiabetic medication. A p-value of <0.05 was considered significant. Analyses were performed with SPSS 14.0 for Windows.

Reporting analysis (example) 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

50

51

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

26 O

ctob

er 2

016

Reporting analysis (example)

Reporting analysis (example)

26 October 2016Basic Biostatistics (C) Jamalludin Ab Rahman 2015 52

Summary1. Identify & define variables2. Type – independent vs. dependent3. Level of measurements – nominal, ordinal or

continuous4. Check distribution – Normal vs. Not Normal5. Decide what to do - descriptive vs. analytical

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

53

PracticalDownload data from

http://bit.ly/1RC6Zte

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

54

Chapter 2Introduction to SPSSIBM SPSS Statistics v21 for WindowsJamalludin Ab Rahman MD MPHDepartment of Community MedicineKulliyyahof Medicine

IBM SPSS Statistics

IBM Corporation Software Group Route 100 Somers, NY 10589Produced in the United States of AmericaMay 2012

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

56

SPSS Layout

26 O

ctob

er 2

016

Basic Biostatistics (C) Jamalludin Ab Rahman 2015 57

Layout 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

58

Main menu

Toolbar

Variables

Data editor 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

59

Rows = variablesRows = each data

Define & describe your variables here

Enter your data here

Viewer 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

60

The output of analyses will be displayed here.

Output is separated from

data

Syntax 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

61

We can compile all the steps of the analyses here. Extend the programming

function in SPSS. Ability to perform complex steps e.g.

“looping”

Creating Dataset

26 O

ctob

er 2

016

Basic Biostatistics (C) Jamalludin Ab Rahman 2015 62

Before even you start SPSS! 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

63

You must identify & define relevant variables Define means

1. Name – preferably short single name, begins with alphabet, no special character, no space

2. Type of data – e.g. Numeric, Date, String3. Width & Decimal Places (if numeric)4. Label – description for the Name (will be displayed in Viewer)

5. Values – labels for value e.g. 1=Male, 2=Female6. Missing – define missing value e.g. 999 for N/A

Define your variables 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

64

Variable Types 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

65

Variable Type 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

66

Decide the suitable

variable type

For numeric, determine Width

& Decimal. Decimal < Width

For String, no option for

Decimal Place

New option for Numerics with leading zeros

Value Labels 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

67

Missing Value 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

68

This question is Not Applicable

to male

e.g. Assign 999 to represent N/A value

& this won’t be included in any

analysis

Chapter 3Descriptive StatisticsIBM SPSS Statistics v21 for WindowsJamalludin Ab Rahman MD MPHDepartment of Community MedicineKulliyyahof Medicine

Exercise data 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

70

HypotheticalStudy to describe factors related to HbA1c &

Homocystein (HCY)N=301 13 variables

Retrieve file information 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

71

healthstatus001 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

72

Task 1 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

73

1. Describe socio-demographic characteristics of the respondent (age, gender & race)

2. Describe the explanatory variables1. Exercise2. smoking status3. BMI status &4. BP status

3. Describe HbA1c (taking cut-off for Poor HbA1c ≥ 6.5%) & HCY

Describe numerical data

26 O

ctob

er 2

016

Basic Biostatistics (C) Jamalludin Ab Rahman 2015 74

Explore 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

75

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

76

Results 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

77

Check for Normality.Is Age data distributed Normally?

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

78

Is this Normaldistribution?

Describe age 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

79

Normal The subjects distributed between 23-67 years old

with the average of 34 (SD=8) years.If not Normal The subjects distributed between 23-67 years old

with the median of 33 (IQR=11) years

Describe categorical data

26 O

ctob

er 2

016

Basic Biostatistics (C) Jamalludin Ab Rahman 2015 80

Frequency 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

81

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

82

Results 26 O

ctob

er 2

016

Basic Biostatistics (C) Jamalludin Ab Rahman 2015 83

TRANSFORM

26 O

ctob

er 2

016

Basic Biostatistics (C) Jamalludin Ab Rahman 2015 84

Compute 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

85

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

86

weight / ((height / 100) ** 2)

Visual binning 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

87

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

88

Normal < 23Overweight 23 to < 27.5Obese >= 27.5

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

89

Chapter 4Bivariable analysesIBM SPSS Statistics v21 for WindowsJamalludin Ab Rahman MD MPHDepartment of Community MedicineKulliyyahof Medicine

To check association of two variables? 26

Oct

ober

201

6Ba

sic

Bios

tatis

tics

(C) J

amal

ludi

n Ab

Rah

man

201

5

91

HbA1cAge

The steps 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

92

1. Determine which is dependant & which is independent

2. Determine level of measurements3. Determine Normality of the numerical

measurement4. Determine the suitable statistical test

What are the tests? 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

93

Variable 1 Variable 2 Test

Categorical Categorical Chi-square

Categorical (2 pop) Numerical (Normal) Independent sample t-test

Categorical (2 pop) Numerical (Not Normal) Mann-Whitney U test

Categorical (> 2 pop) Numerical (Normal) One-way ANOVA

Categorical (> 2 pop) Numerical (Not Normal) Kruskal-Wallis test

Numerical (Normal) Numerical (Normal) Pearson Correlation Coefficient Test

Numerical (Normal/ Not Normal)

Numerical (Not Normal) Spearman Correlation Coefficient Test

Numerical (Normal) Numerical (Normal) – Paired Paired t-test

Numerical (Not Normal) Numerical (Not Normal) –Paired

Friedman test

Tasks 2 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

94

1. Determine association between socio-demographic characteristics & all the risk factors with HbA1c

2. Determine association between socio-demographic characteristics & all the risk factors with HCY

Note: It would be good if you could construct dummy table for the answers even before the analyses started

HCY normal range 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

95

Independent sample t-TestComparing Two Means

26 O

ctob

er 2

016

Basic Biostatistics (C) Jamalludin Ab Rahman 2015 96

Age vs. BP 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

97

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

98

Results 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

99

Levene’s test check equality between variances. Ho: There is no difference of variances. So if P is significant, we reject Ho, and therefore equal variances assumed.

The original t-test (Student’s t-test) assumes equal variances for equal sample sizes. However if the variances are equal, it is robust for different sizes.

Welch's correction

Table – Distribution of age by blood pressure status 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

100

N Mean SD Statistics df P

Normal BP 156 33.9 7.9 t=0.431 299 0.667

High BP 145 34.4 8.9

Chi-squared TestDifference of Two Proportions

26 O

ctob

er 2

016

Basic Biostatistics (C) Jamalludin Ab Rahman 2015 101

Gender vs. BP 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

102

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

103

Results 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

104

Describe this table first. What is your impression? 49% women vs. 47% men with high BP

Some books may suggest the use of Continuity Correction at ALL time, but recent simulations showed that CC (or Yate’s correction) is OVERCONSERVATIVE. Hence, use Pearson χ2 when < 20% of cells have expected count < 5

When ≥ 20% of cells have EC < 5, use Fisher’s Exact Test

This is given because we code the variables using numbers. Can be used to measure P-trend

One-way ANOVAComparing More than Two Means

26 O

ctob

er 2

016

Basic Biostatistics (C) Jamalludin Ab Rahman 2015 105

Race vs. HbA1c 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

106

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

107

Results 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

108

Describe these results first. What is your impression? HbA1c between races? 6.4 (SD 2.1) vs. 6.7 (SD 2.2) vs. 6.5 (SD 2.2)

The F test shows that there is no single significant difference between any two groups

Results – BMI status vs. HbA1c 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

109

The F test shows that at least there is ONE pair with significant different. Either N vs. OW, N vs. OB or OW vs. OB

We need to run Post-hoc test to determine which of the PAIR is significant.

To decide which Post-hoc test to choose, we have to test for equality of variances i.e. Homogeneity of variances (Levene’s test)

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

110

Results – Post hoc 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

111

The significant difference is only for Normal vs. Obese (P=0.002)

Report 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

112

There is a significant association between BMI Status and HbA1c (F(2,298)=13.129, P<0.001). Post-hoc test showed that Obese subjects have significantly higher HbA1c compared to Normal and Overweight subjects (P=0.001 and P < 0.001 respectively).

Mann-whitney UNon Parametric Tests

26 O

ctob

er 2

016

Basic Biostatistics (C) Jamalludin Ab Rahman 2015 113

Gender vs. HCY 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

114

Results 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

115

This ranks table is not to be cited in the research paper. Instead, describe their MEDIAN

KRUSKALL WALLISNon-Parametric TESTS

26 O

ctob

er 2

016

Basic Biostatistics (C) Jamalludin Ab Rahman 2015 116

Race vs. HCY 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

117

Results 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

118

Correlation TestRELATIONSHIP OF TWO NUMERICAL DATA

26 O

ctob

er 2

016

Basic Biostatistics (C) Jamalludin Ab Rahman 2015 119

Age vs. HbA1c 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

120

Results 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

121

Age vs. HCY 26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

122

26 O

ctob

er 2

016

Basi

c Bi

osta

tistic

s (C

) Jam

allu

din

Ab R

ahm

an 2

015

123