rawalpindi-medical-college
Inferential Statistics
Session-II 2009
Dr. Arshad Sabir A.P
Issues in epidemiological research
a. Studies undertaken to assess population characteristics such as age, vaccination status, prevalence of malnutrition, KAP of contraception etc. (sample based; variations occur normally)
Issue: To what extent are the study findings a true estimate of the reference population?
b. Studies that compare groups to study associations (cases & controls, exposed & unexposed, efficacy of a drug etc.)
Issue: Do the observed differences hold true for the total population? (Observed differences may be due to sampling; variations occur normally.)
VARIATIONS: Normal / biological ... real ... experimental
ISSUE: We want to be as precise as possible.
Central limit theorem (CLT)
• Suppose we want to know the weight of the adult population of Rawalpindi city.
• Multiple (say 1000) random, large (n > 30) samples are taken, and the mean wt. is calculated for each.
• We will have 1000 mean weights (X1 ... X1000).
• If all the sample means are presented as a frequency distribution curve, it will follow a normal distribution pattern, known as the “sampling distribution of means”.
• 68% of sample means will fall within X ± 1SD, 95% within X ± 2SD and 99.7% within X ± 3SD (X and SD being the mean and SD of this distribution).
• The summary values of such a distribution, i.e. its mean and SD, are closely linked to the population values.
• Its mean is almost equal to the pop. mean (X = µ).
• Its SD is known as the “Standard Error” (SE).
CLT
• Formula for SE: SE = SD / √n
• SE is a measure of the variability that can occur due to sampling (sampling variation).
• SE is based upon the “normal distribution”, so it follows the rules of the “normal distribution curve”.
• In practice we take only one sample and use its SD to estimate the standard error (SE = SD / √n).
• So we can be 95% confident that the pop. mean will be within the range of sample mean ± 2SE; the chance of it falling beyond this range is only ≤5%.
• SE is a measure of “chance variation”, i.e. normal variation from sample to population or b/w two samples or groups.
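The CLT statements above can be checked with a small simulation. This is only a sketch: the population of weights is made up (normal, mean 65 kg, SD 12 kg), and the sample size of 50 and the helper name `simulate_sample_means` are illustrative assumptions, not part of the lecture.

```python
import random
import statistics


def simulate_sample_means(population, n, num_samples, seed=0):
    """Draw `num_samples` random samples of size `n` and return their means."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [statistics.fmean(rng.sample(population, n)) for _ in range(num_samples)]


# Hypothetical population of adult weights in kg (made-up numbers).
pop_rng = random.Random(1)
population = [pop_rng.gauss(65, 12) for _ in range(100_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

n = 50  # a "large" sample (> 30)
means = simulate_sample_means(population, n, num_samples=1000)

se_theory = sigma / n ** 0.5           # SE = SD / sqrt(n)
se_observed = statistics.stdev(means)  # SD of the sampling distribution of means
share_within_2se = sum(abs(m - mu) <= 2 * se_theory for m in means) / len(means)

print(f"population mean      = {mu:.2f}")
print(f"mean of sample means = {statistics.fmean(means):.2f}")
print(f"SE (formula)         = {se_theory:.2f}")
print(f"SE (observed)        = {se_observed:.2f}")
print(f"share within 2 SE    = {share_within_2se:.1%}")
```

The mean of the sample means should sit close to the population mean, the observed SD of the means close to SD / √n, and roughly 95% of the sample means within 2 SE of µ.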
Confidence Limits and Confidence Interval
• When the pop. mean is assessed on the basis of one sample, that sample has its own mean X and SD (from which SE is derived). Its mean is not exactly equal to µ.
• According to the CLT, SE is a tool to measure the variation that can occur due to sampling.
• The sample mean (X) is not exactly equal to the pop. mean, but with the help of the sample SD, i.e. through SE, we can construct a range of values around the sample mean within which the pop. mean would fall with a certain degree of confidence.
• These limits, worked out on both sides of the sample mean on the basis of the CLT, are called “confidence limits” {CLs}, and the range between these limits is known as the “confidence interval” {CI}.
Formula for pop. mean (95% CI):
$$\mu_{CI} = X \pm 2\,SE = X \pm 2\left(\frac{SD}{\sqrt{n}}\right)$$
Estimation of population parameter from a sample statistic
• As per the CLT, 95% of sample means will be within the confidence limits µ ± 2SE.
• A 95% confidence interval means that there is a 95% probability that the pop. mean (µ) lies within 2SE below or above the sample mean, and a 5% probability that it lies outside this interval (P = 0.05). We can say that we are 95% confident in making this statement.
• CI is related to the size of the sample (n): the larger the sample, the narrower the CI for a given level of confidence.
Estimation of pop. parameters (say mean) from sample statistics
A large SE gives a wide range of estimate (CI), and vice versa. We desire a precise estimate.
SE depends upon 02 factors:
Variability: how dispersed the attribute is in the actual pop. (reflected by σ). If SD is large, the estimate will be wide (this is an inherent property and cannot be changed); if it is small, the estimate will be close to the true value.
Sample size: a small sample (n) is adequate to estimate µ when there is little or no variability, but larger samples are needed to accommodate higher variability in the data.
This relationship of SD to sample size (n) is expressed as
Standard Error: SE = SD / √n
Exercise
• 16 kg is the mean wt. of 3-year-old children obtained from a sample of 11 from a village (SD = 2 kg).
• How good is this estimate? (sampling variation)
• To what extent is this mean representative of the actual pop. mean?
• SE = 2 / √11 = 0.6 kg
• 95% CI = 16 ± 2 × 0.6 = 14.8 kg to 17.2 kg
Role of sample size: if n = 20, SE = 2 / √20 = 0.45 kg and 95% CI = 16 ± 2 × 0.45 = 15.1 kg to 16.9 kg
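The exercise can be reproduced in a few lines. This is a minimal sketch using the slide's rule of thumb of 2 SE for 95% confidence; the function names `standard_error` and `ci_mean` are illustrative.

```python
import math


def standard_error(sd, n):
    """SE = SD / sqrt(n)."""
    return sd / math.sqrt(n)


def ci_mean(mean, sd, n, z=2.0):
    """Confidence limits for the pop. mean: mean ± z * SE (z = 2 for about 95%)."""
    se = standard_error(sd, n)
    return mean - z * se, mean + z * se


# Exercise values: mean wt. 16 kg, SD 2 kg, with n = 11 and then n = 20.
for n in (11, 20):
    low, high = ci_mean(16, 2, n)
    print(f"n={n}: SE={standard_error(2, n):.2f} kg, 95% CI={low:.1f} to {high:.1f} kg")
```

Raising n from 11 to 20 narrows the interval while the SD stays the same, which is the role of sample size described above.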
Standard error of proportion (SEP)
• Similarly, the normal distribution of sample proportions around the pop. proportion may be expressed arithmetically in terms of the SE of proportion, with confidence limits [central limit theorem].
• SEP is also a measure of variation due to sampling.
• 95% of sample proportions will lie within the limits of the population proportion, P ± 2 SEP {95% CLs}.
• Samples with proportions larger or smaller than this range will be rare (only 5%), and such values will be taken as statistically significant at the 5% level of significance.
Formula:
$$SEP = \sqrt{\frac{p \times q}{n}}$$
04/09/23 Dr. Arshad Sabir
95% CI for a proportion (percentage) (categorical variable)
Exercise 2
• In a sample of 120 T.B. pts. drawn from the country, 23.3% (28) had compliance with treatment.
• Does this finding hold true for the whole population?
Standard Error for Proportion/Percentage (SEP)
If p = one of the percentages (23.3%), then q = 100 − p = the other percentage = 100 − 23.3 = 76.7%
SEP = √(p × q / n) = √(23.3 × 76.7 / 120) = 3.9
95% CI for p = p ± 2 × SEP = 23.3 ± (2 × 3.9)
95% CI for p = 15.5% to 31.1%
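A short sketch of the same calculation, with p and q in percent as on the slide. The function names are illustrative. Because the code does not round SEP before doubling it, the limits differ in the first decimal from a hand calculation that rounds SEP to one decimal.

```python
import math


def sep_percent(p, n):
    """Standard error of a percentage p (0-100) from a sample of size n."""
    q = 100 - p
    return math.sqrt(p * q / n)


def ci_proportion(p, n, z=2.0):
    """Confidence limits p ± z * SEP (z = 2 for about 95%)."""
    sep = sep_percent(p, n)
    return p - z * sep, p + z * sep


# Exercise values: 23.3% compliance in a sample of 120 T.B. patients.
sep = sep_percent(23.3, 120)
low, high = ci_proportion(23.3, 120)
print(f"SEP = {sep:.2f}, 95% CI = {low:.1f}% to {high:.1f}%")
```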
Standard error of difference b/w two proportions [SE(p1 − p2)] (02 samples)
Essentials:
1. Samples are large
2. Samples are selected at random
$$Z = \frac{\text{observed difference } (p_1 - p_2)}{\text{standard error of diff. } SE(p_1 - p_2)}$$
If the observed difference is more than 2 SE, it is statistically significant (a real difference) at the 5% level of significance; otherwise it is a “normal” difference.
Calculation of SE of difference b/w two proportions [SE(p1 − p2)]
SE(p1 − p2) = square root of the sum of the squares of the SEs of the two proportions.
$$SE(p_1 - p_2) = \sqrt{\frac{p_1 \times q_1}{n_1} + \frac{p_2 \times q_2}{n_2}}$$
$$Z = \frac{\text{observed difference } (p_1 - p_2)}{SE(p_1 - p_2)}$$
The difference is significant at 5% LOS if Z ≥ 2.
SE of difference b/w two proportions: Exercise
Mortality in pyomeningitis was 30% with B. Penicillin and 20% with Ceftriaxone, in a sample of 100 in both cases.
$$SE(p_1 - p_2) = \sqrt{\frac{30 \times 70}{100} + \frac{20 \times 80}{100}} = \sqrt{37} = 6.08$$
$$Z = \frac{\text{obs. diff.}}{\text{SE of diff.}} = \frac{30 - 20}{6.08} = \frac{10}{6.08} = 1.64$$
(critical value at 5% LOS is 2)
Z is less than 2 (95% confidence limits). Hence the difference is insignificant at 95% confidence limits, i.e. at the 5% level of significance.
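The exercise above, sketched in code with percentages as on the slide. The function names and the default cut-off argument of 2 are illustrative choices that follow the slide's rule of thumb.

```python
import math


def se_difference(p1, n1, p2, n2):
    """SE of the difference b/w two percentages: sqrt(p1*q1/n1 + p2*q2/n2)."""
    q1, q2 = 100 - p1, 100 - p2
    return math.sqrt(p1 * q1 / n1 + p2 * q2 / n2)


def z_two_proportions(p1, n1, p2, n2):
    """Z = observed difference / SE of the difference."""
    return (p1 - p2) / se_difference(p1, n1, p2, n2)


def is_significant(z, critical=2.0):
    """Significant at 5% LOS when |Z| reaches the cut-off (2 on the slide)."""
    return abs(z) >= critical


# Exercise values: mortality 30% vs 20%, 100 patients in each group.
se = se_difference(30, 100, 20, 100)
z = z_two_proportions(30, 100, 20, 100)
print(f"SE of diff. = {se:.2f}, Z = {z:.2f}, significant: {is_significant(z)}")
```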
Uses of SEP
1. To find confidence limits for the population proportion (P) when only the sample proportion (p) is known.
2. To determine whether or not a sample was drawn from a known population when the population proportion is known: Z = (p − P) / SEP (p should be within 2 SEP of P at 5% LOS).
3. To find the standard error of the difference b/w two proportions (significant or not sig.).
4. To find the size of the sample: n = 4pq / L² (L = margin of error, say 5% of proportion p [0.05]).
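A sketch of the sample size formula in use no. 4. Two assumptions are made here that the slide does not fix: p and L are both given in percent, and L is treated as an absolute margin of error in percentage points. The 5-point margin in the example is hypothetical.

```python
import math


def sample_size(p, margin):
    """n = 4pq / L^2 with p, q and the margin L all in percent; rounded up."""
    q = 100 - p
    return math.ceil(4 * p * q / margin ** 2)


# Hypothetical planning example: expected proportion 23.3%, allowable error 5 points.
print(sample_size(23.3, 5))
# The most conservative case is p = 50%, where p * q is largest.
print(sample_size(50, 5))
```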
Decision making in Health
1. Standard error for mean
2. Standard error of difference b/w two means
3. Student's t-test
4. Standard error for proportion
5. Standard error of the difference b/w two proportions
6. Chi-square test
Testing a statistical hypothesis
A “hypothesis” is a statement that is tested under the assumption that it is true.
In statistical testing 02 hypotheses are formulated:
1. Null hypothesis (H0): there is no difference between the characteristics of two samples, or both are from the same population. {No-difference hypothesis}
2. Alternate hypothesis (HA): the sample value is “significantly” different from the pop. value OR from the other sample value. {Hypothesis of significant difference}
Hypothesis testing
H0 is against the claim of the researcher. The researcher desires to reject H0, and in reaching a decision he may commit an error:
Type-I error or alpha error: rejecting H0 when it was actually true (no significant difference exists), OR
Type-II error or beta error: accepting H0 when it was not true (a significant difference does exist).
Hypothesis testing
Decision based on study results | True situation: difference exists | True situation: difference does not exist
Difference exists: H0 rejected | Correct decision | Type-I error {α error}
Difference does not exist: H0 accepted | Type-II error {β error} | Correct decision
Tests of significance
• Can a study result be considered a result that indeed exists in the study population from which the sample was drawn?
• Are the observed differences due to chance variation (normal), or are they true differences due to the play of some external factor (significantly different)?
• These tests are mathematical procedures by which the likelihood (probability) of the observed study results (differences) occurring by chance is found.
• POWER OF THE TEST: its ability to detect differences between groups if such differences actually exist.
Tests of significance
When 02 or more groups are compared, the possibilities are:
– There is no difference [null hypothesis holds]
– There is some difference:
• Slight difference (normal or by-chance difference)
• Large (sig.) difference, not explainable by chance, that may be due to the play of some external factor.
The extent to which an observed diff. is “normal”, and beyond which it is not normal (significant), is decided on the basis of certain cut-off values obtained by applying some statistical test or procedure.
Selection of tests depends upon type of data.
Level of significance
Study results are sample based.
We can never be 100% sure about a study result (many sources of variation).
By convention we accept results if we have 95% confidence in them (diff. exists), i.e. if the chances of having got the results by chance (actually no diff.) are less than 5%.
We allow a 5% level of accepting results that might have occurred by chance. This is called the level of significance (LOS) or level of alpha.
Level of significance (α) and P-value
The probability of committing an α-error (getting the results by chance, or wrongly rejecting H0) is fixed before the start of the experiment (LOS). A max. level is fixed, usually at .01 (1%) or .05 (5%).
The p-value, in contrast, is obtained after completing the experiment. It is derived (from a table) after applying a suitable statistical test to the study results. It is not fixed; it may be more than, less than or equal to the LOS (5%).
The obtained p-value is compared with the LOS. If it is equal to or less than 0.05 (e.g. .03, .02 or .01), we reject H0 and accept HA; if it is more than the fixed LOS (e.g. 0.06, 0.1, 0.5), we accept H0.
Important tests of significance.
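Data | Information | n / groups | Tests
Qualitative (categorical), nominal | Frequencies as percentages or proportions etc. | Small (less than 40) | Fisher exact test
Qualitative (categorical), nominal | Frequencies as percentages or proportions etc. | Large (more than 40) | Chi-square test
Quantitative (numeric), interval or ratio scale data | Means | 02 groups | Student's t-test
Quantitative (numeric), interval or ratio scale data | Means | Multiple gps | F-test (ANOVA)
Quantitative (numeric), interval or ratio scale data | If linear relationship is suspected | | Pearson's correlation coefficient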
Chi-square test (x2)
ESSENTIALS: Used to find out whether the observed differences b/w proportions of events in 2 or more groups may be considered statistically sig.
• It was developed by Karl Pearson.
• Non-parametric test: not based on a normal (or any other) distribution of the variable under study.
• Used for qualitative, discrete data entered as frequencies (counts), not as percentages.
• Involves calculation of a quantity called chi-square (x2).
• This test is based on measuring the diff. b/w observed frequencies and expected frequencies.
Steps of applying Chi-square test (x2)
An assumption of no difference (null hypothesis) is made, which is then supported or rejected with the x2 test.
Steps:
– Fix a level of sig. (.05) for the tabulated p-value.
– Enter the study data in the table as observed frequencies (O).
– Calculate the expected frequency (E) for each cell: E = (RT × CT) / GT, where RT = row total, CT = column total and GT = grand total.
– Calculate the x2 value of each cell = (O − E)² / E.
– Add up the results of all cells: $\chi^2_{cal} = \sum \frac{(O - E)^2}{E}$
– DF = (C − 1) × (R − 1) (it is 1 in a 2×2 table).
– Compare the $\chi^2_{cal}$ value with the $\chi^2_{tab}$ value at the pre-decided LOS for the given DF. If it is equal to or larger than the table value, the p-value for these data is smaller than the LOS: we reject H0 and accept HA. Otherwise we accept H0.
Is the use of ANS associated with shorter distance?
Distance from ANS | Used ANS | Not used ANS | Total
Less than 10 km | O = 51 (E = 44.4) | O = 29 (E = 35.6) | 80
10 km or more | O = 35 (E = 41.6) | O = 40 (E = 33.4) | 75
Total | 86 | 69 | 155
E (expected) values are calculated on the supposition (H0) of no difference in utilization of ANS between the two groups of women.
$$\chi^2_{cal} = \frac{(51 - 44.4)^2}{44.4} + \frac{(29 - 35.6)^2}{35.6} + \frac{(35 - 41.6)^2}{41.6} + \frac{(40 - 33.4)^2}{33.4} \approx 4.56$$
$\chi^2_{cal}$ at 1 DF ≈ 4.56, while $\chi^2_{tab}$ at 0.05 LOS and 1 DF is 3.84. As the cal. value is larger than the tab. value, the p-value in this case is less than 0.05; hence the diff. observed is sig. and H0 is rejected.
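The steps for the 2×2 table can be sketched as follows. The function names are illustrative, and the critical value 3.84 is the table value for 1 DF at 0.05 LOS quoted in the lecture. Because the code keeps the expected frequencies unrounded, its statistic comes out slightly above a hand calculation that rounds each E to one decimal.

```python
def expected_frequencies(observed):
    """E for each cell = (row total x column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]


def chi_square(observed):
    """Return (chi-square, DF) for an R x C table of observed counts."""
    expected = expected_frequencies(observed)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Rows: < 10 km, >= 10 km; columns: used ANS, did not use ANS.
table = [[51, 29], [35, 40]]
chi2, df = chi_square(table)
critical_1df_005 = 3.84  # table value at 0.05 LOS, 1 DF (from the lecture)
print(f"chi-square = {chi2:.2f}, DF = {df}, reject H0: {chi2 >= critical_1df_005}")
```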
Chi-square (x2) as a test of “Goodness of fit”
• The ratio of male to female births is universally expected to be 1:1 (50% to 50%).
• The observed ratio in a village was M = 52 & F = 48.
• Is the difference normal or significant?
 | Male | Female
Obs. freq. | 52 | 48
Expect. freq. | 50 (50%) | 50 (50%)
$$\chi^2 = \frac{(52 - 50)^2}{50} + \frac{(48 - 50)^2}{50} = \frac{8}{50} = 0.16$$
Chi-square (x2) as a test of “Goodness of fit”.
• Degrees of freedom = (no. of classes − 1) = K − 1 = 2 − 1 = 1 (for a contingency table, DF = (R − 1) × (C − 1)).
• At 5% LOS the table value of X2 is 3.841, while the calculated value of chi-square (X2cal = 0.16) is much lower than it.
Hence the observed difference in births is normal or by chance and not significant.
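The goodness-of-fit calculation as a short sketch. The function name is illustrative, and 3.841 is the table value for 1 DF at 5% LOS quoted above.

```python
def goodness_of_fit(observed, expected):
    """Chi-square = sum((O - E)^2 / E) over the classes; DF = no. of classes - 1."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return chi2, df


# Births in the village: 52 male, 48 female; a 1:1 ratio expects 50 and 50.
chi2, df = goodness_of_fit([52, 48], [50, 50])
critical_1df_005 = 3.841  # table value at 5% LOS, 1 DF (from the lecture)
print(f"chi-square = {chi2:.2f}, DF = {df}, significant: {chi2 >= critical_1df_005}")
```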
Student's t-test
• Numerical data (mean values)
• Normal variable; compares 02 groups
• Random sampling
Steps:
1. Calculate the t-value (from the data).
2. Choose a level of significance (LOS), usually .05, which is the accepted probability of the difference having arisen by chance.
3. Determine DF (= sum of the two sample sizes minus 2).
4. Locate the table t-value corresponding to the LOS at the given degrees of freedom. If the cal. t-value is equal to or larger than the table value of t, the p-value is less than the chosen LOS (indicated at the top of the column), i.e. significant; hence H0 is rejected.
How to calculate t-value
1. Calculate the means of the two groups (x1 and x2).
2. Calculate the difference b/w the means of the two groups (x1 − x2).
3. Calculate the standard deviation of each study group (SD1 & SD2).
4. Calculate the standard error for both groups (SE1 & SE2): SE = SD / √n.
5. Formula for t-value:
$$t = \frac{x_1 - x_2}{\sqrt{\dfrac{SD_1^2}{n_1} + \dfrac{SD_2^2}{n_2}}}$$
Exercise
Delivery outcome | n | Mean Ht. | SD
Normal B wt | 60 | 156 cm | 3.1
LBW | 52 | 152 cm | 2.8
H0: there is no difference in the mean hts. of the two gps.
The diff. may be by chance, but the acceptable LOS is 0.05 (p < .05).
$$t = \frac{x_1 - x_2}{\sqrt{\dfrac{SD_1^2}{n_1} + \dfrac{SD_2^2}{n_2}}} = \frac{156 - 152}{\sqrt{\dfrac{3.1^2}{60} + \dfrac{2.8^2}{52}}} = \frac{4}{0.558} \approx 7.2$$
Calculated value of t ≈ 7.2; table value of t at DF 110 and .05 LOS is 1.98. As the cal. t value is larger than the table value of t, the difference is sig. and H0 is rejected.
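A sketch of the t-value calculation using the figures from the exercise table (means 156 cm and 152 cm, so the difference b/w means is 4 cm). The function names are illustrative, and the table value 1.98 for DF 110 at .05 LOS is the one quoted in the lecture.

```python
import math


def t_value(mean1, sd1, n1, mean2, sd2, n2):
    """t = (x1 - x2) / sqrt(SD1^2/n1 + SD2^2/n2)."""
    se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se_diff


def degrees_of_freedom(n1, n2):
    """DF = sum of the two sample sizes minus 2."""
    return n1 + n2 - 2


# Exercise values: normal B wt (n=60, 156 cm, SD 3.1) vs LBW (n=52, 152 cm, SD 2.8).
t = t_value(156, 3.1, 60, 152, 2.8, 52)
df = degrees_of_freedom(60, 52)
table_t = 1.98  # table value at .05 LOS, DF 110 (from the lecture)
print(f"t = {t:.2f}, DF = {df}, reject H0: {abs(t) >= table_t}")
```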
OSPE Questions
Example 1: In a sample (n = 1000), obesity was found in 20% of men and 30% of women. Does the difference reflect an actual diff. in the total pop., or has it occurred by chance?
Calculate the SE of the diff. b/w the two proportions and test at 5% LOS.
Example 2: The average B.P. of bank cashiers is 170, as compared to 150 for PRO staff. Is the difference normal, or is it real, due to the play of some external factor (stress)?
Calculate the SE of the difference b/w the two means and test at 5% LOS.