INTRODUCTION :
The final outcome of an experiment or research depends to a large
extent on the analysis of comparative values obtained. To the dictum of
Helmholtz that “All science is measurement”, one can also add Sir Henry
Dale’s clause that, “All true measurement is essentially comparative”.
Statistics plays an integral role in collection, presentation, analysis and
interpretation of comparative data. In a broad sense, the term “statistics” has
always been associated with studies related to facts and figures e.g. health
statistics, business statistics etc. In the book “Statistical methods in medical
research”, statistics has been defined as a discipline concerned with the
treatment of numerical data derived from groups of individuals or materials.
DESCRIPTIVE STATISTICS :
Statistics :
Is the science of compiling, classifying and tabulating numerical data
and expressing the results in a mathematical or graphical form.
Or
Statistics is the study of methods and procedures for collecting,
classifying, summarizing and analyzing data and for making scientific
inferences from such data.
- Prof. P.F. Sukhatme
Biostatistics :
Is the branch of statistics applied to biological or medical sciences
(biometry).
Is that branch of statistics concerned with mathematical facts and data
relating to biological events.
Variable :
A general term for any feature of the unit, which is observed or
measured.
Frequency distribution :
Distribution showing the number of observations or frequencies at
different values or within certain range of values of the variable.
Mean :
The arithmetic mean is obtained by dividing the total of all observations by
the number of observations.
$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum x}{n}$
Eg. Calculate the mean of DMFT scores 2.3, 2.0, 2.7, 3.0, 2.0.
$\bar{x} = \frac{2.3 + 2.0 + 2.7 + 3.0 + 2.0}{5} = \frac{12}{5} = 2.4$
Geometric mean (GM) – nth root of the product of the n observations
$GM = \sqrt[n]{(x_1)(x_2)(x_3) \cdots (x_n)} = \operatorname{antilog}\left(\frac{\sum \log x}{n}\right)$
When the variation between the lowest and the highest value is very
high, geometric mean is advised and preferred.
Harmonic mean (HM) – is the reciprocal of the arithmetic mean of the
reciprocals of the observations.
$HM = \frac{1}{\frac{1}{n} \sum \frac{1}{x}}$
Median :
Is the middle value, which divides the observed values into two equal
parts, when the values are arranged in ascending or descending order.
Eg. Calculate the median of DMFT scores 2.3, 2.0, 2.7, 3.0, 2.0
Arranged in ascending order: 2.0, 2.0, 2.3, 2.7, 3.0
Position of the median $= \frac{n + 1}{2} = \frac{5 + 1}{2} = \frac{6}{2} = 3$, i.e. the 3rd value = 2.3
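The measures of central tendency above can be checked on the same five DMFT scores. The following is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the seminar.

```python
import statistics

# DMFT scores from the worked example in the text.
dmft_scores = [2.3, 2.0, 2.7, 3.0, 2.0]

# Arithmetic mean: sum of observations divided by their number.
mean_value = statistics.mean(dmft_scores)

# Geometric mean: nth root of the product of the observations.
geometric_mean_value = statistics.geometric_mean(dmft_scores)

# Harmonic mean: reciprocal of the mean of the reciprocals.
harmonic_mean_value = statistics.harmonic_mean(dmft_scores)

# Median: middle value after sorting (3rd of 5 values here).
median_value = statistics.median(dmft_scores)

print("Mean           :", round(mean_value, 2))
print("Geometric mean :", round(geometric_mean_value, 2))
print("Harmonic mean  :", round(harmonic_mean_value, 2))
print("Median         :", median_value)
```

For positive data that are not all equal, the harmonic mean is smaller than the geometric mean, which in turn is smaller than the arithmetic mean.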
Mode :
Is the value of the variable which occurs most frequently.
Eg. Calculate the mode of DMFT scores 2.3, 2.0, 2.7, 3.0, 2.0
Mode = 2.0
The mode can also be estimated from the empirical relation
Mode = (3 median – 2 mean)
Mode = 3 x 2.3 – 2 x 2.4 = 2.1
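Both ways of arriving at the mode can be reproduced in a few lines. This is a small sketch, again with illustrative variable names, comparing the most frequent value with the empirical estimate (3 median – 2 mean).

```python
import statistics

# DMFT scores from the worked example in the text.
dmft_scores = [2.3, 2.0, 2.7, 3.0, 2.0]

# Observed mode: the value that occurs most often.
observed_mode = statistics.mode(dmft_scores)

# Empirical estimate of the mode from the median and the mean.
empirical_mode = 3 * statistics.median(dmft_scores) - 2 * statistics.mean(dmft_scores)

print("Observed mode :", observed_mode)
print("Empirical mode:", round(empirical_mode, 1))
```

The two figures differ slightly because the empirical relation is only an approximation, and with five observations the agreement cannot be expected to be exact.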
Variance :
Is the appropriate measure of dispersion for interval or ratio level data.
Computes how far each score is from the mean.
o This is done by taking the deviation of each score from the mean,
$(x - \bar{x})$
o Each score will have a deviation from the mean, so to find the
average deviation we have to add all the deviations and divide by
the number of scores (just like calculating the mean)
i.e. $\frac{\sum (x - \bar{x})}{N}$
but $\sum (x - \bar{x}) = 0$
So to eliminate this zero, square the deviations, which eliminates the (-) sign,
i.e. $\frac{\sum (x - \bar{x})^2}{N} = S^2$
- In other words it is the average of the squared deviations.
Standard deviation (Root mean square deviation)
Is defined as the square root of the arithmetic mean of the squared
deviations of the individual values from their arithmetic mean.
For small samples SD $(s) = \sqrt{\frac{\sum (x - \bar{x})^2}{N - 1}}$
For large samples SD $(\sigma) = \sqrt{\frac{\sum (x - \bar{x})^2}{N}}$
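The dispersion measures can be worked through on the five DMFT scores used earlier. The sketch below follows the definitions step by step (deviations, squared deviations, then the N and N - 1 divisors). It also derives the standard error (SD divided by the square root of the sample size) and the coefficient of variation (SD as a percentage of the mean) from the small-sample SD. Variable names are illustrative.

```python
import math
import statistics

# DMFT scores from the worked example in the text.
dmft_scores = [2.3, 2.0, 2.7, 3.0, 2.0]
n = len(dmft_scores)
mean_value = statistics.fmean(dmft_scores)

# Deviation of every score from the mean; these always sum to (about) zero.
deviations = [x - mean_value for x in dmft_scores]
sum_of_deviations = sum(deviations)

# Squaring removes the sign, so the squared deviations do not cancel out.
sum_of_squares = sum(d * d for d in deviations)

variance_n = sum_of_squares / n               # average squared deviation
sd_large = math.sqrt(sum_of_squares / n)      # divisor N (large samples)
sd_small = math.sqrt(sum_of_squares / (n - 1))  # divisor N - 1 (small samples)

# Derived measures, using the small-sample SD because n is only 5.
standard_error = sd_small / math.sqrt(n)
coefficient_of_variation = sd_small / mean_value * 100

print("Variance (N)      :", round(variance_n, 3))
print("SD (N - 1)        :", round(sd_small, 3))
print("SD (N)            :", round(sd_large, 3))
print("Standard error    :", round(standard_error, 3))
print("Coeff. of variation (%):", round(coefficient_of_variation, 1))
```

The hand-computed values agree with `statistics.stdev` (divisor N - 1) and `statistics.pstdev` (divisor N) from the standard library.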
When there is a frequency distribution (f = frequency of each value)
For small samples SD $(s) = \sqrt{\frac{\sum f (x - \bar{x})^2}{N - 1}}$
For large samples SD $(\sigma) = \sqrt{\frac{\sum f (x - \bar{x})^2}{N}}$
Uses of SD
Summarizes the deviations of a large distribution from the mean in one
figure, used as a unit of variation.
Indicates whether the variation from the mean is by chance or real.
Helps in finding the standard error, which determines whether the difference
between the means of two samples is by chance or real.
Helps in finding the suitable size of the sample for valid conclusions.
Standard error :
Standard deviation of mean values
$SE = \frac{\text{Standard deviation}}{\sqrt{\text{Sample size}}} = \frac{SD}{\sqrt{n}}$
Used to compare means with one another
Coefficient of variation is a measure used to compare relative variability
i.e.
Variation of the same character in two or more different series (eg – pulse
rate in young and old persons).
Variation of two different characters in one and the same series (eg – height
and weight in the same individual).
$CV = \frac{\text{Standard deviation}}{\text{Mean}} \times 100$
Normal curve and distribution :
The histogram of a frequency distribution of heights, with a large
number of observations and small class intervals, gives a frequency
curve which is symmetrical in nature, called the Normal curve or Gaussian
curve.
Characteristics of normal curve :
Bell shaped
Symmetrical
Mean, Mode and Median – coincide
Has two inflections – the central part is convex, while at the point of
inflection the curve changes from convexity to concavity.
On preparing frequency distribution with small class intervals of the
data collected, we can observe
1) Some observations are above the mean and others are below the mean
2) If arranged in order, maximum number of frequencies is seen in the
middle around the mean and fewer at the extremes decreasing
smoothly
3) Normally half the observations lie above and half below the mean and
all are symmetrically distributed on each side of mean
A distribution of this nature or shape is called a Normal or Gaussian
distribution.
Arithmetically
Mean ± 1 SD limits include 68.27% of observations
Mean ± 1.96 SD limits include 95% of observations
Mean ± 2 SD limits include 95.45% of observations
Mean ± 2.58 SD limits include 99% of observations
Mean ± 3 SD limits include 99.73% of observations
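These percentages follow from the area under the normal curve and can be verified numerically. The sketch below uses the error function from Python's `math` module; the helper name `coverage_within` is illustrative.

```python
import math

def coverage_within(k):
    """Fraction of a normal distribution lying within mean ± k SD."""
    return math.erf(k / math.sqrt(2))

# Reproduce the limits listed in the text.
for k in (1, 1.96, 2, 2.58, 3):
    print(f"Mean ± {k} SD limits include {coverage_within(k) * 100:.2f}% of observations")
```

The multipliers 1.96 and 2.58 are rounded values, so the computed areas come out very close to, but not exactly, 95% and 99%.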
Relative or standard normal deviate or variate – (Z)
Is the deviation from the mean in a normal distribution or curve, expressed
in units of SD
$Z = \frac{\text{Observation} - \text{Mean}}{SD} = \frac{x - \bar{x}}{SD}$
- In the standard normal curve the mean is taken as zero and the SD as unity, i.e. one
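As a small illustration of the standard normal deviate, the sketch below converts an observation into Z. The figures (mean height 160 cm, SD 5 cm) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def z_score(observation, mean, sd):
    """Deviation of an observation from the mean, in units of SD."""
    if sd <= 0:
        raise ValueError("SD must be positive")
    return (observation - mean) / sd

# Hypothetical example: heights with mean 160 cm and SD 5 cm.
print(z_score(170, 160, 5))  # 2 SD above the mean
print(z_score(160, 160, 5))  # exactly at the mean
```

An observation with Z beyond ±1.96 lies outside the limits that include 95% of observations in a normal distribution.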
Skewness – is the statistic used to measure the asymmetry of a distribution
Coefficient of skewness is 0 for a symmetrical (normal) curve
Types of curve: Positively (right) skewed, Negatively (left) skewed, Bimodal
Kurtosis – is a measure of the height (peakedness) of the distribution curve
Coefficient of kurtosis is 3 for a normal curve
Types of curve: Leptokurtic (high), Platykurtic (flat), Mesokurtic (normal)
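The text does not say which coefficients are meant. The sketch below assumes the moment-based coefficients, which are the ones that equal 0 (skewness) and 3 (kurtosis) for a normal curve. The function names and the sample data are hypothetical.

```python
def central_moment(values, order):
    """Average of (x - mean) raised to the given power."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** order for x in values) / len(values)

def coefficient_of_skewness(values):
    """Third central moment divided by the 1.5th power of the variance."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def coefficient_of_kurtosis(values):
    """Fourth central moment divided by the square of the variance."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

symmetric = [1, 2, 3, 4, 5]        # hypothetical symmetrical data
right_skewed = [1, 1, 1, 2, 10]    # long tail to the right
left_skewed = [1, 9, 10, 10, 10]   # long tail to the left

print(coefficient_of_skewness(symmetric))     # zero: no asymmetry
print(coefficient_of_skewness(right_skewed))  # positive
print(coefficient_of_skewness(left_skewed))   # negative
print(coefficient_of_kurtosis(symmetric))     # below 3: flatter than normal
```

A kurtosis coefficient above 3 indicates a leptokurtic curve and one below 3 a platykurtic curve.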
TESTS OF SIGNIFICANCE :
Population is any finite collection of elements
i.e. – individuals, items, observations etc.,
Sample – is a part or subset of the population
Parameter – is a constant describing a population
Statistic – is a quantity describing a sample, namely a function of
observations
Quantity                   Statistic (sample)    Parameter (population)
                           (Latin letters)       (Greek letters)
Mean                       $\bar{x}$             $\mu$
Standard deviation         $s$                   $\sigma$
Variance                   $s^2$                 $\sigma^2$
Correlation coefficient    $r$                   $\rho$
Number of subjects         $n$                   $N$
HYPOTHESIS TESTING :
Hypothesis
Is an assumption about the status of a phenomenon or is a statement
about the parameters or form of population.
Null hypothesis or hypothesis of no difference – States no difference
between the statistic of a sample and the parameter of the population, or
between the statistics of two samples.
This nullifies the claim that the experiment result is different from or
better than one observed already.
Denoted by H0
Alternate hypothesis – any hypothesis alternate to null hypothesis, which is
to be tested
Denoted by H1
Note : the alternate hypothesis is accepted when null hypothesis is rejected
Type I and type II errors
                 H0 accepted       H1 accepted
H0 is true       No error          Type I error
H1 is true       Type II error     No error
Probability of type I error = α
Probability of type II error = β
When the primary concern of the test is to see whether the null
hypothesis can be rejected, such a test is called a Test of significance.
The probability of committing type I error is called the “P” value.
The p-value is the chance that the presence of difference is concluded
when actually there is none.
Type I error is important and it is fixed in advance at a low level; such an
upper limit of tolerance of the chance of type I error is called the Level of
Significance (α).
Thus α is the maximum tolerable probability of type I error.
DIFFERENCE BETWEEN LEVEL OF SIGNIFICANCE AND P-VALUE
Level of significance (α) :
1) Maximum tolerable chance of type I error
2) α is fixed in advance
P-value :
1) Actual probability of type I error
2) Calculated on the basis of the data, following set procedures
The P-value can be more than or less than α depending on the data.
When the P-value is less than α, the result is statistically significant.
The level of significance is usually fixed at 5% (0.05) or 1% (0.01) or
0.1% (0.001) or 0.5% (0.005)
The maximum acceptable level is 5%
When the P-value is
between 0.05 and 0.01 = statistically significant
less than 0.01 = highly statistically significant
less than 0.001 or 0.005 = very highly significant
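The comparison of a P-value with the level of significance can be written as a short decision rule. In the sketch below the function name is hypothetical, and 0.001 is assumed as the cut-off for "very highly significant", since the text gives 0.001 or 0.005 without choosing between them.

```python
def interpret_p_value(p_value, alpha=0.05):
    """Label a P-value using the bands described in the text.

    alpha is the level of significance fixed in advance (5% by default).
    The 0.001 cut-off is an assumption; the text allows 0.001 or 0.005.
    """
    if not 0 <= p_value <= 1:
        raise ValueError("a probability must lie between 0 and 1")
    if p_value >= alpha:
        return "not statistically significant"
    if p_value < 0.001:
        return "very highly significant"
    if p_value < 0.01:
        return "highly statistically significant"
    return "statistically significant"

for p in (0.20, 0.03, 0.004, 0.0002):
    print(p, "->", interpret_p_value(p))
```

Note that α is chosen before the data are examined, while the P-value is computed from the data afterwards.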
TESTS OF SIGNIFICANCE :
Are mathematical methods by which the probability (P) or relative
frequency of an observed difference occurring by chance is found.
Steps & procedure of test of significance –
1) State null hypothesis
2) State alternate hypothesis
3) Selection of the appropriate test to be utilized and
calculation of test criterion based on type of test.
4) Fixation of level of significance
5) Select the table and compare the calculated value with
the critical value of the table
6) If calculated value is > table value, H0 is rejected
7) If calculated value is < table value, H0 is accepted (not rejected)
8) Draw conclusions
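The steps above can be sketched for the simplest case, a Z test comparing a large-sample mean with a hypothesised population mean. The function name and all figures (sample mean, SD, n) are hypothetical. The critical value 1.96 is the table value for the 5% level, matching the mean ± 1.96 SD limits of the normal curve.

```python
import math

def one_sample_z_test(sample_mean, population_mean, sd, n, critical_value=1.96):
    """Return the calculated Z and whether H0 (no difference) is rejected.

    H0: the sample mean does not differ from the population mean.
    H1: the sample mean differs from the population mean.
    """
    standard_error = sd / math.sqrt(n)                   # SE of the mean
    z = (sample_mean - population_mean) / standard_error  # test criterion
    reject_h0 = abs(z) > critical_value                  # compare with table value
    return z, reject_h0

# Hypothetical study: mean DMFT 2.4 in 64 subjects, SD 0.8, against a
# hypothesised population mean of 2.0.
z, reject = one_sample_z_test(2.4, 2.0, 0.8, 64)
print("Calculated Z:", round(z, 2), "| H0 rejected:", reject)

# Same design with a sample mean of 2.1: the difference is within chance.
z2, reject2 = one_sample_z_test(2.1, 2.0, 0.8, 64)
print("Calculated Z:", round(z2, 2), "| H0 rejected:", reject2)
```

Other tests follow the same pattern: only the formula for the test criterion and the table of critical values change.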
STEPS IN STATISTICAL STUDY :
The chronological steps involved in a statistical study are as
follows :
Selection of sample size :
Often, the primary problem encountered by a student of research is
the number of samples or sample size to be selected.
Criteria for selection of sample size are as below :
A sample size of 25-30 in each group is adequate if there is one variable
or one parameter in the study.
In in vivo studies where fewer samples are available, a slight
decrease in sample size may be acceptable.
Larger sample size will be needed if
o Larger variation is expected
o Rare characteristic is present
o More variables are present
o More precision is required
o More reliability is required
TESTS IN TEST OF SIGNIFICANCE
Parametric (normal distribution & normal curve)
o Quantitative data: Student ‘t’ test (paired, unpaired), Z test (for large
samples), One way ANOVA, Two way ANOVA
o Qualitative data: Z – prop test, χ² test
Non-parametric (do not follow normal distribution)
o Qualitative (quantitative converted to qualitative): Mann Whitney U test,
Wilcoxon rank test, Kruskal Wallis test, Friedman test
Selection of test :
The tests employed to complete a study can be classified as :
A. For comparison of mean (average of observations) of
different samples
B. For comparison of proportion (percentage) of different
samples.
C. Correlation tests
D. Regression tests
A. For comparison of mean (average of observations) of different
samples :
Two types of tests are available, namely parametric and non-parametric.
Parametric tests
o Employed if the distribution of the population from which the samples are
drawn is known (i.e. normally distributed with less variation).
o In the computation of parametric tests the arithmetic processes of addition,
division and multiplication are used.
o Used if adequate sample size is present.
Non parametric tests
o Employed if the distribution is unknown (large variation present).
o Data are changed from measurements or scores to ranks or even to signs.
o Used if adequate sample size is not present.
Each parametric test has a non-parametric equivalent :
i) Independent t-test
Employed to compare the means of two groups using one variable.
Eg. Comparison of bond strength of amalgam and composite.
Non-parametric equivalent : Mann Whitney U test.
ii) Paired t-test
Used for within group comparison at different time intervals.
Eg. Number of microbes in root canal before and after antibiotic therapy.
Non-parametric equivalent : Wilcoxon signed rank test.
iii) ANOVA (Analysis of variance)
Compares the means of any number of groups using one variable.
Eg. Sealing capacity of different endodontic sealers.
Non-parametric equivalent : Kruskal Wallis H-test.
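To show what the two t-tests actually calculate, the sketch below computes the t statistic and degrees of freedom by hand using only the standard library. It assumes the usual pooled-variance form for the independent t-test. The function names and all data are hypothetical and do not come from any study.

```python
import math
import statistics

def independent_t(group_a, group_b):
    """t statistic and degrees of freedom for two independent groups (pooled variance)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    t = (statistics.fmean(group_a) - statistics.fmean(group_b)) / standard_error
    return t, n_a + n_b - 2

def paired_t(before, after):
    """t statistic and degrees of freedom for before/after values on the same subjects."""
    differences = [b - a for b, a in zip(before, after)]
    n = len(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    return statistics.fmean(differences) / standard_error, n - 1

# Hypothetical bond strengths (MPa) of two materials.
t_ind, df_ind = independent_t([20, 22, 24], [10, 12, 14])
print("Independent t:", round(t_ind, 3), "df:", df_ind)

# Hypothetical microbial counts before and after therapy in the same canals.
t_pair, df_pair = paired_t([10, 12, 14], [8, 9, 10])
print("Paired t:", round(t_pair, 3), "df:", df_pair)
```

The calculated t is then compared with the table value for the given degrees of freedom at the chosen level of significance.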
B. For comparison of proportion (percentage) of different samples :
The tests employed are as follows :
i) Chi-square test :
It compares proportions between any number of groups using one
variable. It is used if adequate sample size is available.
Eg. The effect of ampicillin, sulphonamides and tetracycline in a
certain percentage of people.
ii) Fisher’s test :
This test is used in place of the Chi-square test if the sample size is small.
iii) McNemar test :
It compares the proportion of one variable within a group at different
time intervals.
Eg. Proportion of people having sensitivity before and after using
desensitizing paste.
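The Chi-square criterion is the sum of (observed – expected)² / expected over all cells of the table of counts. The sketch below computes it for a hypothetical 2 x 2 table; the function name and the counts are illustrative only.

```python
def chi_square_statistic(table):
    """Chi-square value and degrees of freedom for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)

    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the proportions were the same in every group.
            expected = row_totals[i] * column_totals[j] / grand_total
            chi_square += (observed - expected) ** 2 / expected

    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square, degrees_of_freedom

# Hypothetical counts: rows = two drugs, columns = cured / not cured.
chi_square, df = chi_square_statistic([[10, 20], [20, 10]])
print("Chi-square:", round(chi_square, 3), "df:", df)
```

For 1 degree of freedom the table value at the 5% level is 3.84, so a calculated value above that leads to rejection of H0.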
C. Correlation tests :
They are used to find if two variables co-vary with each other or are independent.
Eg. The susceptibility rate of organisms in root canal following
increase in antibiotic dosage.
D. Regression tests :
They describe the dependence of one variable on another, independent variable.
Eg. Effect of bonding agent on strength of composite.
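Correlation and regression can be illustrated together, since both are built from the same sums of deviations. The sketch below assumes Pearson's correlation coefficient r and a straight-line (least squares) regression. The function name and the dose and response figures are hypothetical.

```python
import math

def pearson_r_and_regression(x_values, y_values):
    """Return (r, slope, intercept) for y regressed on x by least squares."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n

    # Sums of squared deviations and of cross-products of deviations.
    s_xx = sum((x - mean_x) ** 2 for x in x_values)
    s_yy = sum((y - mean_y) ** 2 for y in y_values)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))

    r = s_xy / math.sqrt(s_xx * s_yy)    # strength of co-variation, -1 to +1
    slope = s_xy / s_xx                  # change in y per unit change in x
    intercept = mean_y - slope * mean_x  # value of y when x is zero
    return r, slope, intercept

# Hypothetical data: dose (x) and measured response (y).
dose = [1, 2, 3, 4, 5]
response = [2.1, 3.9, 6.2, 7.8, 10.1]
r, slope, intercept = pearson_r_and_regression(dose, response)
print("r:", round(r, 4), "slope:", round(slope, 2), "intercept:", round(intercept, 2))
```

Correlation only measures how closely the two variables move together, whereas the regression line predicts the dependent variable from the independent one.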
ANALYSIS OF RESULT :
Non-parametric tests like Mann Whitney U-test, Wilcoxon signed rank test,
Kruskal Wallis H-test are less sensitive than parametric tests like
Independent t-test, ANOVA as they use ranks instead of the
original values.
Probability (p) value indicates the level of significance of a test result.
p < 0.001 Highly significant
The probability of the difference between two groups
occurring by chance is less than 1 in 1000.
p < 0.01 Moderately significant
The probability of the difference occurring by chance is
less than 1 in 100.
p < 0.05 Less significant
The probability of the difference occurring by chance is
less than 5 in 100.
p > 0.05 Not significant
The probability of the difference occurring by chance is more
than 5 in 100, which is considered too high.
CONCLUSION :
This seminar attempts to explain the importance of statistics as an
essential protocol for any research program. Statistics is the greatest leveler.
It accounts for the variations that can creep into the results, thereby
providing a sound system for proper interpretation of data. Hence it
would be appropriate to term it as “The Vital Statistics”.
VITAL STATISTICS
Outline
INTRODUCTION
DESCRIPTIVE STATISTICS
TESTS OF SIGNIFICANCE
STEPS IN STATISTICAL STUDY
o SELECTION OF SAMPLE SIZE
o SELECTION OF TEST
o ANALYSIS OF RESULT
CONCLUSION
COLLEGE OF DENTAL SCIENCES
DEPARTMENT OF CONSERVATIVE DENTISTRY AND ENDODONTICS
SEMINAR
ON
VITAL STATISTICS
PRESENTED BY : Dr. Siddheswaran V.