Biostatistics ii4june

Inferential Statistics

Session-II2009

Dr. Arshad Sabir A.P

Issues in epidemiological research

a. Studies undertaken to assess population characteristics such as age, vaccination status, prevalence of malnutrition, KAP of contraception, etc. (sample based; variations occur normally)

Issue: To what extent are the study findings a true estimate of the reference population?

b. Studies comparing groups to examine associations (cases & controls, exposed & unexposed, efficacy of a drug, etc.)

Issue: Do the differences observed hold true for the total population? (Observed differences may be due to sampling; variations occur normally.)

VARIATIONS: Normal / biological ... Real ... Experimental

ISSUE: We want to be as precise as possible.

Central limit theorem (CLT)
• Suppose we want to know the mean weight of the adult population of Rawalpindi city.
• Multiple random, large (n > 30) samples (say 1000) are taken, and the mean wt. is calculated for each sample.
• We will then have 1000 sample means (X1 ... X1000).
• If all the sample means are presented as a frequency distribution curve, it will follow a normal distribution pattern, known as the “sampling distribution of means”.
• 68% of sample means will fall within X ± 1 SD, 95% within X ± 2 SD, and 99.7% within X ± 3 SD, where X and SD are the mean and SD of this distribution.
• The mean of this distribution is almost equal to the pop. mean (X ≈ µ).
• Its SD is known as the “Standard Error” (SE).

CLT
• Formula for SE: SE = SD / √n
• SE is a measure of the variability that can arise due to sampling (sampling variation).
• SE is based upon the normal distribution, so it follows the rules of the normal distribution curve.
• In practice we take only one sample and use its SD to estimate the standard error (SE = SD / √n).
• So we can be 95% confident that the pop. mean will be within the range of sample mean ± 2SE; the chance of its falling beyond this range is only ≤5%.
• SE is a measure of “chance variation”, or normal variation from sample to population or b/w two samples or groups.
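The sampling distribution of means can be illustrated with a small simulation. This is only a sketch: the population below is simulated (a hypothetical 100,000 adult weights with mean 65 kg and SD 12 kg, figures chosen purely for illustration), and the sample size of 50 and the 1000 draws are assumptions, not data from the lecture.

```python
import random
import statistics

def sample_means(population, n, draws, seed=0):
    """Draw `draws` random samples of size n and return the mean of each."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n)) for _ in range(draws)]

# Hypothetical, simulated population of adult weights in kg (not real data).
pop_rng = random.Random(42)
population = [pop_rng.gauss(65, 12) for _ in range(100_000)]

means = sample_means(population, n=50, draws=1000)

mu = statistics.fmean(population)                          # population mean
observed_se = statistics.stdev(means)                      # SD of the 1000 sample means
predicted_se = statistics.pstdev(population) / 50 ** 0.5   # SD / sqrt(n)

# Share of sample means lying within mu ± 2 SE (expected to be about 95%).
within_2se = sum(abs(m - mu) <= 2 * predicted_se for m in means) / len(means)

print(f"mean of sample means = {statistics.fmean(means):.2f}, mu = {mu:.2f}")
print(f"observed SE = {observed_se:.2f}, SD/sqrt(n) = {predicted_se:.2f}")
print(f"share within 2 SE = {within_2se:.1%}")
```

The SD of the simulated sample means should come out close to SD / √n, and roughly 95% of the sample means should lie within 2 SE of the population mean.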

Confidence Limits and Confidence Interval
• When the pop. mean is assessed on the basis of one sample, the sample has its own mean X and SD (from which SE is derived). Its mean is not exactly equal to µ.
• According to the CLT, SE is a tool to measure the variation that can arise due to sampling.
• With the help of SE (sample SD / √n) we can construct a range of values around the sample mean within which the pop. mean would fall with a certain degree of confidence.
• These limits, worked out on both sides of the sample mean on the basis of the CLT, are called “confidence limits” {CLs}, and the range between them is known as the “confidence interval” {CI}.

Formula for pop. mean (95% CI): µ = X ± 2SE = X ± 2 (SD / √n)

Estimation of population parameter from a sample statistic

• As per the CLT, 95% of sample means will be within the limits µ ± 2SE.

• A 95% confidence interval means that we are 95% confident that the pop. mean (µ) lies within 2SE below or above the sample mean, with a 5% chance that it lies outside this interval (P = 0.05).

• CI is related to the size of the sample (n): the larger the sample, the narrower the CI for a given level of confidence.

Estimation of pop. parameters (say mean) from sample statistics

A large SE gives a wide range of estimate (CI), and a small SE a narrow one. We desire a precise estimate.

SE depends upon 02 factors:

Variability: How dispersed the attribute is in the actual pop. (reflected by σ). If SD is large, the estimate will be wide (this is an inherent property and cannot be changed); if it is small, the estimate will be close to the true value.

Sample size: A small sample (n) is adequate to estimate µ when there is little or no variability, but larger samples are needed to accommodate higher variability in the data.

This relationship of SD to sample size (n) is expressed as

Standard Error: SE = SD / √n

Exercise
• 16 kg is the mean wt. of 3-year-old children obtained from a sample of 11 from a village (SD = 2 kg).
• How good is this estimate? (sampling variation)
• To what extent is this mean representative of the actual pop. mean?
• SE = 2 / √11 = 0.6 kg
• 95% CI = 16 ± 2 x 0.6 = 14.8 kg to 17.2 kg

Role of sample size: If n = 20, SE = 2 / √20 = 0.45 kg
95% CI = 16 ± 2 x 0.45 = 15.1 kg to 16.9 kg
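The exercise can be checked with a few lines of Python. This is a minimal sketch that applies the slide's rule of mean ± 2 SE; the function name `ci95_mean` is illustrative only.

```python
import math

def ci95_mean(mean, sd, n):
    """Return (SE, lower limit, upper limit) using the rule mean ± 2 SE."""
    se = sd / math.sqrt(n)
    return se, mean - 2 * se, mean + 2 * se

# Same mean (16 kg) and SD (2 kg), two different sample sizes.
for n in (11, 20):
    se, lower, upper = ci95_mean(16, 2, n)
    print(f"n={n}: SE={se:.2f} kg, 95% CI={lower:.1f} to {upper:.1f} kg")
```

Running it for both sample sizes shows the interval narrowing as n increases while the mean and SD stay the same.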

Standard error of proportion (SEP)
• Similarly, the normal distribution of sample proportions around the pop. proportion may be expressed arithmetically in terms of the SE of proportion, with confidence limits. [Central limit theorem]
• SEP is also a measure of variation due to sampling.
• 95% of sample proportions will lie within the limits P ± 2 SEP of the population proportion {95% CLs}.
• Samples with proportions larger or smaller than this range will be rare (only 5%), and such values will be taken as statistically significant at the 5% level of significance.

Formula: SEP = √(p x q / n)

95% CI for a proportion (percentage) (categorical variable)

Exercise 2
• In a sample of 120 T.B. pts. drawn from the country, 23.3% (28) had compliance with treatment.
• Does this finding hold true for the whole population?

Standard Error for Proportion/Percentage (SEP):
if p = one of the percentages (23.3%),
q = 100 - p = the other percentage = 100 - 23.3 = 76.7%

SEP = √(p x q / n) = √(23.3 x 76.7 / 120) = 3.9
95% CI = p ± 2 x SEP = 23.3 ± (2 x 3.9)

95% CI = 15.5% to 31.1%
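A short sketch of the same calculation in Python, with p and q in percent as on the slide (the function name is illustrative). Computed without intermediate rounding, SEP comes to about 3.86 and the limits to roughly 15.6% and 31.0%, so small differences from hand-rounded figures are expected.

```python
import math

def ci95_proportion(p_percent, n):
    """Return (SEP, lower limit, upper limit) for a percentage, using p ± 2 SEP."""
    q_percent = 100 - p_percent
    sep = math.sqrt(p_percent * q_percent / n)
    return sep, p_percent - 2 * sep, p_percent + 2 * sep

# 23.3% compliance in a sample of 120 T.B. patients.
sep, lower, upper = ci95_proportion(23.3, 120)
print(f"SEP = {sep:.2f}, 95% CI = {lower:.1f}% to {upper:.1f}%")
```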

Standard error of difference b/w two proportions [SE(p1 - p2)] (02 samples)

Essentials:
1. Samples are large
2. Samples are selected at random

Z = observed difference (p1 - p2) / standard error of difference SE(p1 - p2)

If the observed difference is more than 2 SE, it is a statistically significant or real difference at the 5% level of significance; otherwise it is a “normal” (chance) difference.


Calculation of SE of difference b/w two proportions [SE(p1 - p2)]

SE(p1 - p2) = square root of the sum of the squares of the SEs of the two proportions.

SE(p1 - p2) = √[ (p1 x q1) / n1 + (p2 x q2) / n2 ]

Z = observed difference (p1 - p2) / SE of the difference SE(p1 - p2)   (significant if Z ≥ 2)


SE of difference b/w two proportions: Exercise

Mortality in pyomeningitis was 30% with B. Penicillin and 20% with Ceftriaxone, in a sample of 100 in each case.

SE(p1 - p2) = √[ (30 x 70) / 100 + (20 x 80) / 100 ] = √37 = 6.08

Z = obs. diff. / SE of diff. = (30 - 20) / 6.08 = 10 / 6.08 = 1.64 (critical value is 2)

Z is less than 2 (95% confidence limits). Hence the difference is insignificant at 95% confidence limits, or at the 5% level of significance.
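The exercise can be reproduced with a small Python sketch. Percentages are used for p and q as on the slide; the function name and the cut-off constant are illustrative.

```python
import math

def z_two_proportions(p1, n1, p2, n2):
    """Return (Z, SE of the difference) for two percentages p1 and p2."""
    se_diff = math.sqrt(p1 * (100 - p1) / n1 + p2 * (100 - p2) / n2)
    return (p1 - p2) / se_diff, se_diff

CUT_OFF = 2  # the slide's critical value at the 5% level of significance

# Mortality 30% vs 20%, 100 patients in each group.
z, se_diff = z_two_proportions(30, 100, 20, 100)
verdict = "significant" if abs(z) >= CUT_OFF else "not significant"
print(f"SE of diff = {se_diff:.2f}, Z = {z:.2f} -> {verdict} at 5% LOS")
```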


Uses of SEP
1. To find confidence limits for the population proportion (P) when only the sample proportion (p) is known.

2. To determine whether a sample was drawn from a known population when the population proportion is known: Z = (p - P) / SEP (should be within 2 SEP at 5% LOS).

3. To find the standard error of the difference b/w two proportions (significant or not sig.).

4. To find the size of the sample: n = 4pq / L² (L = margin of error, say 5% of proportion p [0.05])
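The sample size formula n = 4pq / L² can be sketched as below. The worked figures are assumptions for illustration only: p is taken as 23.3% and L as an absolute margin of 5 percentage points, both expressed in the same units as p and q. A relative margin (a fraction of p) would be passed in as L = fraction x p instead.

```python
import math

def sample_size(p_percent, margin):
    """n = 4pq / L^2, rounded up; p, q and the margin L share the same units (percent)."""
    q_percent = 100 - p_percent
    return math.ceil(4 * p_percent * q_percent / margin ** 2)

# Illustrative inputs, not taken from the lecture.
print(sample_size(23.3, 5))  # expected proportion 23.3%, margin 5 points
print(sample_size(50, 5))    # p = 50% gives the largest n for a given margin
```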


Decision making in Health

1. Standard error for mean
2. Standard error of difference b/w two means
3. Student's t-test
4. Standard error for proportion
5. Standard error of the difference b/w two proportions
6. Chi-square test

Testing a statistical Hypothesis.

A “hypothesis” is a statement which is to be tested under the assumption that it is true.

In statistical testing 02 hypotheses are formulated:
1. Null Hypothesis (Ho): there is no difference between the characteristics of the two samples, or both are from the same population. {No-difference hypothesis}

2. Alternate Hypothesis (HA): the sample value is “significantly” different from the pop. value OR from the other sample value. {Hypothesis of significant difference}

Hypothesis testing……

Ho is against the claim of the researcher. The researcher desires to reject Ho, and in doing so may commit an error:

Type-I error or alpha error: rejecting Ho when it was actually true (no significant difference exists), OR

Type-II error or beta error: accepting Ho when it was not true (a significant difference does exist).

Hypothesis testing

Decision based on study results      True situation:            True situation:
                                     difference exists          difference does not exist

Difference exists: H0 rejected       Correct decision           Type-I error {α error}

No difference: H0 accepted           Type-II error {β error}    Correct decision

Tests of significance
• Can a study result be considered a result which indeed exists in the study population from which the sample was drawn?
• Are the differences observed due to chance variation (normal), or are they true differences due to the play of some external factor (significantly different)?
• These tests are mathematical procedures by which the likelihood (probability) of the observed study results (differences) occurring by chance is found.
• POWER OF THE TEST: its ability to detect differences between groups if such differences actually exist.

Tests of significance

When 02 or more groups are compared, the possibilities are:

– There is no difference [null hypothesis accepted]
– There is some difference:
  • Slight difference (normal or by-chance difference)
  • Large (sig.) difference, not explainable by chance, that may be due to the play of some external factor.

The extent to which an observed diff. is “normal”, and beyond which it is not normal (significant), is decided on the basis of certain cut-off values obtained by applying some statistical test or procedure.

Selection of the test depends upon the type of data.

Level of significance
Study results are sample based.
We can never be 100% sure about a study result (many sources of variation).
By convention we accept results if we have 95% confidence in them (diff. exists), or if the chances of having the results by chance (actually no diff.) are less than 5%.

We allow a 5% level of accepting results that might have occurred by chance. This is called the level of significance (LOS) or level of alpha.

Level of significance (α) and P-value

The probability of committing an α-error (getting the results by chance, or wrongly rejecting Ho) is fixed before the start of the experiment (LOS). A max. level is fixed, usually at .01 (1%) or .05 (5%).

The p-value, by contrast, is obtained after completing the experiment. It is derived (from a table) after applying a suitable statistical test to the study results. It is not fixed: it may assume any value more than, less than, or equal to the LOS (5%).

The obtained p-value is compared with the LOS. If it is equal to or less than 0.05 (e.g. .03, .02 or .01), we reject Ho and accept HA; if it is more than the fixed LOS (e.g. 0.06, 0.1, 0.5), we accept Ho.
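The decision rule can be written as a one-line comparison. A minimal sketch, with an illustrative function name and the slide's wording for the two outcomes:

```python
def decide(p_value, los=0.05):
    """Compare the obtained p-value with the LOS fixed before the study."""
    return "reject Ho, accept HA" if p_value <= los else "accept Ho"

# p-values at or below the LOS lead to rejection of Ho; larger ones do not.
for p in (0.01, 0.03, 0.05, 0.06, 0.5):
    print(p, "->", decide(p))
```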

Important tests of significance.

Data                          Information                   n / groups                             Test

(Qualitative) categorical,    Frequencies as percentages    Small (less than 40)                   Fisher exact test
nominal                       or proportions etc.           Large (more than 40)                   Chi-square test

(Quantitative) numeric,       Means                         02 groups                              Student's t-test
interval or ratio scale                                     Multiple gps                           F-test (ANOVA)
data                                                        If linear relationship is suspected    Pearson's correlation coefficient

Chi-square test (x2)
ESSENTIALS: Used to find out whether the observed differences b/w proportions of events in 2 or more groups may be considered statistically sig.

• It was developed by Karl Pearson.
• Non-parametric test: not based on a normal (or any other) distribution of the variable under study.
• Used for qualitative, discrete data in frequencies or proportions (not in percentages).
• Involves calculation of a quantity called chi-square (x2).
• This test is based on measuring the diff. b/w observed frequencies and expected frequencies.


Steps of applying Chi-square test (x2)
An assumption of no difference (null hypothesis) is made, which is then proved or disproved with the x2 test.
• Steps:
– Fix a level of sig. (.05) for the tab. p-value.
– Enter the study data in the table as observed frequencies (O).
– Calculate the expected frequency (E) for each cell: E = (RT x CT) / GT, where RT = row total, CT = column total, GT = grand total.
– Calculate the x2 value of each cell = (O - E)² / E
– Add up the results of all cells: X²cal = ∑ (O - E)² / E
– DF = (C - 1) x (R - 1) (it is 1 in a 2x2 table)
– Compare the X²cal value with the X²tab value in the table at the pre-decided LOS for the given DF. If X²cal is equal to or larger than X²tab, the p-value for these data is smaller than the LOS; we reject H0 and accept HA. Otherwise we accept H0.


Is the use of ANS associated with shorter distance?

Distance from ANS     Used ANS               Not used ANS           Total
Less than 10 km       (O) 51 (E = 44.4)      (O) 29 (E = 35.6)      80
10 km or more         (O) 35 (E = 41.6)      (O) 40 (E = 33.4)      75
Total                 86                     69                     155

E, or expected values, are calculated on the basis of the supposition (H0) of no difference in utilization of ANS in the two groups of women.

X²cal = (51 - 44.4)² / 44.4 + (29 - 35.6)² / 35.6 + (35 - 41.6)² / 41.6 + (40 - 33.4)² / 33.4 = 4.55

X²cal at 1 DF = 4.55, while X²tab at 0.05 LOS at 1 DF is 3.84.

As the cal. value is larger than the tab. value, the p-value in this case is less than 0.05; hence the diff. observed is sig. and H0 is rejected.
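The chi-square steps can be sketched in Python for any R x C table of observed frequencies. The function name is illustrative, and the critical value 3.84 is the table value quoted on the slide for a 2x2 table at 0.05 LOS. With unrounded expected frequencies the statistic comes to about 4.57; hand calculation with E rounded to one decimal gives a slightly smaller figure.

```python
def chi_square(observed):
    """Return (chi-square, DF) for a table of observed frequencies (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # E = RT x CT / GT
            chi2 += (o - e) ** 2 / e                         # (O - E)^2 / E
    df = (len(observed) - 1) * (len(observed[0]) - 1)        # (R - 1) x (C - 1)
    return chi2, df

TABLE_VALUE = 3.84  # chi-square table value at 1 DF and 0.05 LOS

# Rows: distance < 10 km, distance >= 10 km; columns: used ANS, not used ANS.
chi2, df = chi_square([[51, 29], [35, 40]])
print(f"chi-square = {chi2:.2f} at {df} DF")
print("H0 rejected" if chi2 >= TABLE_VALUE else "H0 accepted")
```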

Chi-square (x2) as a test of “Goodness of fit”

• The ratio of male to female births is universally expected to be 1:1 (50% to 50%).
• The observed ratio in a village was M = 52 & F = 48.
• Is the difference normal or significant?

                 Male    Female
Obs. freq.       52      48
Expect. freq.    50      50      (50% : 50%)

X² = (52 - 50)² / 50 + (48 - 50)² / 50 = 4/50 + 4/50 = 8/50 = 0.16


Chi-square (x2) as a test of “Goodness of fit”.

• Degrees of freedom = (no. of classes - 1) = K - 1 = 2 - 1 = 1, OR DF = (R - 1) x (C - 1)

• At 5% LOS the table value of X² is 3.841, while the calculated value of chi-square (X²cal = 0.16) is much lower than it.

Hence the observed difference in births is normal or by chance and not significant.
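The goodness-of-fit calculation is the same sum of (O - E)² / E, with the expected frequencies supplied directly. A minimal sketch (function name illustrative; 3.841 is the table value quoted above):

```python
def goodness_of_fit(observed, expected):
    """Return chi-square = sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

TABLE_VALUE = 3.841  # chi-square table value at 1 DF and 5% LOS

# Observed births M = 52, F = 48 against an expected 50 : 50 split.
x2 = goodness_of_fit([52, 48], [50, 50])
df = 2 - 1  # number of classes minus 1
print(f"chi-square = {x2:.2f} at {df} DF")
print("significant" if x2 >= TABLE_VALUE else "not significant (chance difference)")
```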


Student's t-test
• Numerical data (mean values)
• Normal variable; compares 02 groups
• Random sampling

Steps:
1. Calculate the t-value (from the data).
2. Choose a level of significance (LOS), usually .05, which is the accepted probability of having the difference by chance.
3. Determine DF (= sum of the two sample sizes minus 2).
4. Locate the table t-value corresponding to the LOS at the given degrees of freedom. If the cal. t-value is equal to or larger than the table value of t, the p-value is less than the chosen LOS (indicated at the top of the column), i.e. significant; hence H0 is rejected.

How to calculate t-value
1. Calculate the means of the two groups (x1 and x2).

2. Calculate the difference b/w the means of the two groups (x1 - x2).

3. Calculate the standard deviation of each study group (SD1 & SD2).

4. Calculate the standard error for both groups (SE1 & SE2): SE = SD / √n.

5. Formula for the t-value:

t = (x1 - x2) / √(SD1² / n1 + SD2² / n2)

Exercise

Delivery outcome    n     Mean Ht.    SD
Normal B. wt.       60    156 cm      3.1
LBW                 52    152 cm      2.8

H0: there is no difference in the mean hts. of the two gps.

The diff. may be by chance, but the acceptable LOS is 0.05 (p < .05).

t = (x1 - x2) / √(SD1² / n1 + SD2² / n2) = (156 - 152) / √(3.1² / 60 + 2.8² / 52) = 4 / 0.558 = 7.2

Calculated value of t = 7.2; tab. value of t at DF 110 at .05 LOS is 1.98. The cal. t-value is larger than the tab. value of t; hence the difference is sig. and H0 is rejected.
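The t-value formula can be sketched in Python using the tabulated figures (means 156 cm and 152 cm, a difference of 4 cm; SDs 3.1 and 2.8; n = 60 and 52). The function name is illustrative, and 1.98 is the table value quoted above for DF 110 at .05 LOS. With these inputs the standard error of the difference is about 0.56 and t comes to about 7.2.

```python
import math

def t_value(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, DF) with t = (x1 - x2) / sqrt(SD1^2/n1 + SD2^2/n2), DF = n1 + n2 - 2."""
    se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se_diff, n1 + n2 - 2

TABLE_T = 1.98  # table value of t at DF 110 and .05 LOS

# Maternal height: normal birth weight group vs LBW group.
t, df = t_value(156, 3.1, 60, 152, 2.8, 52)
print(f"t = {t:.2f} at DF {df}")
print("H0 rejected" if abs(t) >= TABLE_T else "H0 accepted")
```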

OSPE Questions

Example 1: In a sample (n = 1000), obesity was found in 20% of men and 30% of women. Does the difference reflect an actual diff. in the total pop., or has it occurred by chance?

Calculate the SE of the diff. b/w the two proportions and test at 5% LOS.

Example 2: The average B.P. of bank cashiers is 170, as compared to 150 for PRO staff. Is the difference normal, or is it real, due to the play of some external factor (stress)?

Calculate the SE of the difference b/w the two means and test at 5% LOS.
