Statistics 1 (Final) / orthodontic courses by Indian dental academy

INTRODUCTION :

The final outcome of an experiment or research depends to a large extent on the analysis of the comparative values obtained. To Helmholtz's dictum that “All science is measurement”, one can add Sir Henry Dale’s clause that “All true measurement is essentially comparative”. Statistics plays an integral role in the collection, presentation, analysis and interpretation of comparative data. In a broad sense, the term “statistics” has always been associated with facts and figures, e.g. health statistics, business statistics, etc. In the book “Statistical Methods in Medical Research”, statistics is defined as a discipline concerned with the treatment of numerical data derived from groups of individuals or materials.

DESCRIPTIVE STATISTICS :

Statistics :

Statistics is the science of compiling, classifying and tabulating numerical data and expressing the results in a mathematical or graphical form.

Or

Statistics is the study of methods and procedures for collecting, classifying, summarizing and analyzing data and for making scientific inferences from such data.

- Prof. P.F. Sukhatme

Biostatistics :

Biostatistics is the branch of statistics applied to the biological or medical sciences (biometry). It is concerned with the mathematical facts and data relating to biological events.

Variable :

A variable is a general term for any feature of the unit that is observed or measured.

Frequency distribution :

A frequency distribution shows the number of observations (frequencies) at different values of the variable, or within certain ranges of values of the variable.

Mean :

The arithmetic mean is obtained by dividing the total of all observations by the number of observations:

$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum x}{n}$

Eg. Calculate the mean of DMFT scores 2.3, 2.0, 2.7, 3.0, 2.0.

$\bar{x} = \frac{2.3 + 2.0 + 2.7 + 3.0 + 2.0}{5} = \frac{12.0}{5} = 2.4$

Geometric mean (GM) is the nth root of the product of the observations:

$GM = \sqrt[n]{x_1 \cdot x_2 \cdot x_3 \cdots x_n} = \operatorname{antilog}\left(\frac{\sum \log x}{n}\right)$

When the variation between the lowest and the highest values is very large, the geometric mean is advised and preferred.

Harmonic mean (HM) is the reciprocal of the arithmetic mean of the reciprocals of the observations:

$HM = \frac{1}{\frac{1}{n}\sum \frac{1}{x}} = \frac{n}{\sum \frac{1}{x}}$

Median :

The median is the middle value, which divides the observed values into two equal parts when the values are arranged in ascending or descending order. For n observations it is the $\left(\frac{n+1}{2}\right)$th value (for an even n, the mean of the two middle values).

Eg. Calculate the median of DMFT scores 2.3, 2.0, 2.7, 3.0, 2.0. Arranged in ascending order:

2.0, 2.0, 2.3, 2.7, 3.0

Median = $\left(\frac{5+1}{2}\right)$th = 3rd value, i.e. 2.3
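The worked examples above can be checked with a short Python sketch. It uses only the standard library, the function names are illustrative rather than taken from any package, and the geometric mean is computed through logarithms to mirror the antilog form of the formula.

```python
import math

# DMFT scores from the worked examples in this section
dmft = [2.3, 2.0, 2.7, 3.0, 2.0]


def arithmetic_mean(xs):
    """Total of all observations divided by the number of observations."""
    return sum(xs) / len(xs)


def geometric_mean(xs):
    """nth root of the product, computed as the antilog of the mean of the logs."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def harmonic_mean(xs):
    """Reciprocal of the arithmetic mean of the reciprocals, i.e. n / sum(1/x)."""
    return len(xs) / sum(1 / x for x in xs)


def median(xs):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]  # the ((n + 1) / 2)th value, counting from 1
    return (s[mid - 1] + s[mid]) / 2


print(round(arithmetic_mean(dmft), 2))  # 2.4
print(median(dmft))                      # 2.3
```

For these scores the three means satisfy HM ≤ GM ≤ AM, which holds for any set of positive values.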

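Mode :

The mode is the value of the variable that occurs most frequently.

Eg. Calculate the mode of DMFT scores 2.3, 2.0, 2.7, 3.0, 2.0

Mode = 2.0 (it occurs twice)

For a moderately skewed distribution, the mode can also be estimated from the empirical relation

Mode = 3 Median - 2 Mean = 3 x 2.3 - 2 x 2.4 = 2.1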
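Both ways of obtaining the mode can be sketched with Python's standard `statistics` module: `statistics.mode` picks the most frequent value, and the empirical relation is applied to the mean and median of the same DMFT scores.

```python
import statistics

# DMFT scores from the worked example
dmft = [2.3, 2.0, 2.7, 3.0, 2.0]

# Mode: the value that occurs most frequently (2.0 appears twice)
mode_value = statistics.mode(dmft)

# Empirical estimate for moderately skewed data: Mode = 3 Median - 2 Mean
mean_value = statistics.mean(dmft)
median_value = statistics.median(dmft)
empirical_mode = 3 * median_value - 2 * mean_value

print(mode_value)                # 2.0
print(round(empirical_mode, 2))  # 2.1
```

The two answers differ slightly because the empirical relation is only an approximation for moderately skewed data.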

Variance :

Variance is the appropriate measure of dispersion for interval- or ratio-level data. It computes how far each score is from the mean.

o This is done by finding the deviation of each score from the mean, $(x - \bar{x})$.

o Each score has a deviation from the mean, so to find the average deviation we would add all the deviations and divide by the number of scores (just as when calculating the mean), i.e. $\frac{\sum (x - \bar{x})}{n}$.

But $\sum (x - \bar{x}) = 0$ always, because the positive and negative deviations cancel out.

To eliminate this zero, the deviations are squared, which removes the negative signs:

$S^2 = \frac{\sum (x - \bar{x})^2}{n}$

In other words, the variance is the average of the squared deviations.
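The variance calculation above can be sketched in Python, together with the SD, standard error and coefficient of variation that this section goes on to define. The function names and the `small_sample` switch are hypothetical: the switch selects the n - 1 divisor (recommended for small samples) or the n divisor used in the variance formula.

```python
import math

# DMFT scores from the earlier worked examples
dmft = [2.3, 2.0, 2.7, 3.0, 2.0]


def variance(xs, small_sample=True):
    """Average of the squared deviations from the mean.

    small_sample=True divides by (n - 1); small_sample=False divides by n.
    """
    n = len(xs)
    mean = sum(xs) / n
    deviations = [x - mean for x in xs]
    # The raw deviations sum to zero (up to rounding), so they are squared
    squared_total = sum(d * d for d in deviations)
    return squared_total / (n - 1 if small_sample else n)


def standard_deviation(xs, small_sample=True):
    """Square root of the variance."""
    return math.sqrt(variance(xs, small_sample))


def standard_error(xs):
    """Standard error of the mean: SD / sqrt(n)."""
    return standard_deviation(xs) / math.sqrt(len(xs))


def coefficient_of_variation(xs):
    """Relative variability: SD / mean x 100, as a percentage."""
    return standard_deviation(xs) / (sum(xs) / len(xs)) * 100


print(round(variance(dmft), 3))                      # 0.195
print(round(variance(dmft, small_sample=False), 3))  # 0.156
print(round(standard_deviation(dmft), 2))            # 0.44
```

With only five scores, the n - 1 divisor is the appropriate choice. The n divisor reproduces the S² formula above.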

Standard deviation (Root mean square deviation)

The standard deviation is defined as the square root of the arithmetic mean of the squared deviations of the individual values from their arithmetic mean.

For small samples: $SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

For large samples: $SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$

When there is a frequency distribution (each value x occurring f times, with $n = \sum f$):

For small samples: $SD = \sqrt{\frac{\sum f (x - \bar{x})^2}{n - 1}}$

For large samples: $SD = \sqrt{\frac{\sum f (x - \bar{x})^2}{n}}$

Uses of SD

o Summarizes the deviations of a large distribution from the mean in one figure, used as a unit of variation.

o Indicates whether the variation from the mean is due to chance or is real.

o Helps in finding the standard error, which determines whether the difference between the means of two samples is due to chance or is real.

o Helps in finding a suitable sample size for valid conclusions.

Standard error :

The standard error is the standard deviation of sample mean values. It is used to compare means with one another.

$SE = \frac{\text{Standard deviation}}{\sqrt{\text{Sample size}}} = \frac{SD}{\sqrt{n}}$

Coefficient of variation (CV) is used to measure relative variability, i.e.

o Variation of the same character in two or more different series (eg. pulse rate in young and old persons).

o Variation of two different characters in one and the same series (eg. height and weight in the same individuals).

$CV = \frac{\text{Standard deviation}}{\text{Mean}} \times 100$

Normal curve and distribution :

When a histogram of a frequency distribution (eg. of heights) is drawn with a large number of observations and small class intervals, it gives a smooth frequency curve that is symmetrical in nature. This is called the normal curve or Gaussian curve.

Characteristics of normal curve :

Bell shaped

Symmetrical

Mean, Mode and Median – coincide

Has two points of inflection: the central part is convex, while at each point of inflection the curve changes from convexity to concavity.

On preparing a frequency distribution with small class intervals from the data collected, we can observe that:

1) Some observations are above the mean and others are below the mean.

2) If arranged in order, the maximum number of frequencies is seen in the middle, around the mean, and fewer at the extremes, decreasing smoothly.

3) Normally, half the observations lie above and half below the mean, and all are symmetrically distributed on each side of the mean.

A distribution of this nature or shape is called a normal or Gaussian distribution.

Arithmetically :

Mean ± 1 SD limits include 68.27% of observations

Mean ± 1.96 SD limits include 95% of observations

Mean ± 2 SD limits include 95.45% of observations

Mean ± 2.58 SD limits include 99% of observations

Mean ± 3 SD limits include 99.73% of observations
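These percentages follow from the standard normal curve. A minimal sketch using `statistics.NormalDist` from the Python standard library (3.8+) computes the area within mean ± k SD. It also includes the relative deviate Z = (observation - mean) / SD described next. The height figures in the last line are hypothetical.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, SD 1
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def coverage(k):
    """Proportion of a normal distribution lying within mean +/- k SD."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


def z_score(x, mean, sd):
    """Relative (standard normal) deviate: (observation - mean) / SD."""
    return (x - mean) / sd


for k in (1, 1.96, 2, 2.58, 3):
    print(f"mean +/- {k} SD: {coverage(k) * 100:.2f}%")

# Hypothetical example: height 170 cm in a group with mean 160 cm and SD 5 cm
print(z_score(170, 160, 5))  # 2.0
```

The printed percentages agree with the list above to rounding (2.58 SD gives about 99.01%).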

Relative or standard normal deviate or variate (Z) :

Z is the deviation of an observation from the mean in a normal distribution or curve, expressed in units of SD:

$Z = \frac{x - \bar{x}}{SD} = \frac{\text{Observation} - \text{Mean}}{SD}$

In the standard normal curve the mean is taken as zero and the SD as unity (one).

Skewness is the statistic that measures asymmetry. For a normal (symmetrical) distribution the coefficient of skewness is 0.

[Figure: positively (right) skewed, negatively (left) skewed and bimodal distributions]

Kurtosis is a measure of the height (peakedness) of the distribution curve. For a normal distribution the coefficient of kurtosis is 3.

[Figure: leptokurtic (high), platykurtic (flat) and mesokurtic (normal) curves]

TESTS OF SIGNIFICANCE :

Population is any finite collection of elements, i.e. individuals, items, observations, etc.

Sample is a part or subset of the population.

Parameter is a constant describing a population.

Statistic is a quantity describing a sample, namely a function of the observations.

Measure                    Statistic (Latin)    Parameter (Greek)
Mean                       x̄                    μ
Standard deviation         s                    σ
Variance                   s²                   σ²
Correlation coefficient    r                    ρ
Number of subjects         n                    N

HYPOTHESIS TESTING :

Hypothesis

A hypothesis is an assumption about the status of a phenomenon, or a statement about the parameters or form of a population.

Null hypothesis, or hypothesis of no difference, states that there is no difference between the statistic of a sample and the parameter of the population, or between the statistics of two samples. It nullifies the claim that the experimental result is different from, or better than, the one already observed. It is denoted by H0.

Alternate hypothesis is any hypothesis alternate to the null hypothesis that is to be tested. It is denoted by H1.

Note : the alternate hypothesis is accepted when the null hypothesis is rejected.

Type I and type II errors

                   H0 accepted      H1 accepted
H0 is true         No error         Type I error
H1 is true         Type II error    No error

Type I error (α) = rejecting H0 when it is true

Type II error (β) = accepting H0 when it is false

When the primary concern of a test is to see whether the null hypothesis can be rejected, the test is called a test of significance.

The P-value is the probability, computed assuming H0 is true, of obtaining a difference at least as large as the one observed. It is therefore the chance of concluding that a difference exists when actually there is none, i.e. of committing a type I error if H0 is rejected on these data.

Type I error is the important one, and it is fixed in advance at a low level. This upper limit of tolerance of the chance of type I error is called the level of significance (α). Thus α is the maximum tolerable probability of type I error.

DIFFERENCE BETWEEN LEVEL OF SIGNIFICANCE AND P-VALUE

Level of significance (α) :

1) Maximum tolerable chance of type I error

2) Fixed in advance

P-value :

1) Actual probability of type I error for the observed data

2) Calculated from the data by the test procedure

The P-value can be more than or less than α depending on the data. When the P-value is less than α, the result is statistically significant.

The level of significance is usually fixed at 5% (0.05), 1% (0.01), 0.5% (0.005) or 0.1% (0.001). The maximum acceptable level is usually 5%.

When the P-value is:

between 0.05 and 0.01 = statistically significant

less than 0.01 = highly statistically significant

lower than 0.001 or 0.005 = very highly significant

TESTS OF SIGNIFICANCE :

Tests of significance are mathematical methods by which the probability (P), or relative frequency, of an observed difference occurring by chance is found.

Steps & procedure of test of significance :

1) State the null hypothesis

2) State the alternate hypothesis

3) Select the appropriate test to be utilized and calculate the test criterion based on the type of test

4) Fix the level of significance

5) Select the table and compare the calculated value with the critical value of the table

6) If the calculated value is > table value, H0 is rejected

7) If the calculated value is < table value, H0 is accepted (not rejected)

8) Draw conclusions
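The steps above can be sketched for one concrete case: a two-tailed Z test comparing the means of two large samples, each summarized by its mean, SD and size. This is an illustrative choice of test, not the only one the steps apply to. The function name and the summary figures are hypothetical. The critical value is taken from the normal curve with `statistics.NormalDist`, since the standard library has no t table.

```python
import math
from statistics import NormalDist


def two_sample_z_test(mean1, sd1, n1, mean2, sd2, n2, alpha=0.05):
    """Two-tailed Z test for the difference between two means (large samples).

    Steps 1-2: H0 says the population means are equal; H1 says they differ.
    Step 3: test criterion Z = (difference of means) / (SE of the difference).
    Step 4: alpha, the level of significance, is fixed in advance.
    Steps 5-7: compare |Z| with the critical (table) value.
    """
    se_difference = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / se_difference
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    reject_h0 = abs(z) > critical
    return z, critical, p_value, reject_h0


# Hypothetical summary data for two large groups
z, critical, p, reject = two_sample_z_test(12.0, 4.0, 100, 10.5, 4.0, 100)
print(f"Z = {z:.2f}, critical = {critical:.2f}, P = {p:.4f}")
print("H0 rejected" if reject else "H0 accepted (not rejected)")  # step 8
```

Here |Z| is about 2.65, which exceeds 1.96, so H0 is rejected at the 5% level.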

STEPS IN STATISTICAL STUDY :

The steps involved in a statistical study, in chronological order, are as follows :

Selection of sample size :

Often, the primary problem encountered by a student of research is the number of samples, or sample size, to be selected.

Criteria for selection of sample size are as below :

A sample size of 25-30 in each group is adequate if there is one variable or one parameter in the study.

In in vivo studies, where fewer samples are available, a slight decrease in sample size may be acceptable.

Larger sample size will be needed if

o Larger variation is expected

o A rare characteristic is present

o More variables are present

o More precision is required

o More reliability is required

[Figure: Tests in tests of significance. Parametric tests (data follow a normal distribution); non-parametric tests (data do not follow a normal distribution). Quantitative data: Student's t test (paired, unpaired), Z test (for large samples), one way ANOVA, two way ANOVA. Qualitative data: Z test for proportions, χ² test. Qualitative (quantitative converted to qualitative): Mann Whitney U test, Wilcoxon rank test, Kruskal Wallis test, Friedman test.]

Selection of test :

The tests employed to complete a study can be classified as :

A. For comparison of mean (average of observations) of different samples

B. For comparison of proportion (percentage) of different samples

C. Correlation tests

D. Regression tests

A. For comparison of mean (average of observations) of different samples :

Two types of tests are available, namely parametric and non-parametric.

Parametric tests :

o Employed if the distribution of the population from which the samples are drawn is known (i.e. normally distributed with less variation).

o In the computation of parametric tests, the arithmetic processes of addition, division and multiplication are used.

o Used if an adequate sample size is present.

Non-parametric tests :

o Employed if the distribution is unknown (large variation present).

o Data are changed from measurements or scores to ranks or even to signs.

o Used if an adequate sample size is not present.
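The change from measurements to ranks can be sketched as follows. The sketch assumes the common convention that tied values share the average of the ranks they occupy. The function name `to_ranks` is hypothetical.

```python
def to_ranks(values):
    """Replace each value by its rank (1 = smallest); ties get the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j correspond to ranks i+1..j+1; share their average
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


# DMFT scores from the earlier examples: the two 2.0 values share ranks 1 and 2
print(to_ranks([2.3, 2.0, 2.7, 3.0, 2.0]))  # [3.0, 1.5, 4.0, 5.0, 1.5]
```

Because only the order of the values survives, information about the size of the differences is lost, which is why rank tests use less of the data than parametric tests.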

Parametric tests and their non-parametric equivalents :

i) Independent t-test : employed to compare the means of two groups using one variable. Eg. comparison of bond strength of amalgam and composite.
Non-parametric equivalent : Mann Whitney U test.

ii) Paired t-test : used for within-group comparison at different time intervals. Eg. number of microbes in a root canal before and after antibiotic therapy.
Non-parametric equivalent : Wilcoxon signed-rank test.

iii) ANOVA (Analysis of variance) : compares the means of any number of groups using one variable. Eg. sealing capacity of different endodontic sealers.
Non-parametric equivalent : Kruskal Wallis H-test.

B. For comparison of proportion (percentage) of different samples :

The tests employed are as follows :

i) Chi-square test : It compares proportions between any number of groups using one variable. It is used if an adequate sample size is available. Eg. the effect of ampicillin, sulphonamides and tetracycline in a certain percentage of people.

ii) Fisher's exact test : Used in place of the chi-square test when the sample size is small.

iii) McNemar test : It compares the proportion of one variable within a group at different time intervals. Eg. proportion of people having sensitivity before and after using a desensitizing paste.
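The chi-square and McNemar statistics can be sketched in a few lines. The sketch assumes the plain Pearson chi-square for an r x c table and the McNemar statistic without continuity correction. The function names and the counts are hypothetical. For 1 degree of freedom, the table value at the 5% level is 3.84.

```python
def chi_square(table):
    """Pearson chi-square statistic for an r x c table of observed counts.

    Returns (chi-square, degrees of freedom = (r - 1) * (c - 1)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if H0 (no difference in proportions) were true
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


def mcnemar(b, c):
    """McNemar chi-square (1 df, no continuity correction).

    b and c are the discordant counts: subjects who changed from
    'yes' to 'no' and from 'no' to 'yes' between the two occasions.
    """
    return (b - c) ** 2 / (b + c)


# Hypothetical 2 x 2 table: rows = two treatments, columns = improved / not improved
stat, df = chi_square([[30, 20], [10, 40]])
print(round(stat, 2), df)  # compare with 3.84, the table value for 1 df at 5%
```

When the expected counts are small, Fisher's exact test is the usual replacement.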

C. Correlation tests :

These are used to find whether two variables co-vary with each other or are independent. Eg. the susceptibility rate of organisms in the root canal following an increase in antibiotic dosage.
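One common correlation measure is Pearson's product-moment coefficient r, which ranges from -1 to +1 and is near 0 when the variables do not co-vary linearly. The sketch below assumes Pearson's r is the coefficient of interest. The function name and the dose and response values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient between paired lists x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical paired measurements on five subjects
dose = [1, 2, 3, 4, 5]
response = [2.1, 3.9, 6.2, 7.8, 10.1]
print(round(pearson_r(dose, response), 3))
```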

D. Regression tests :

These describe the dependence of one variable on another, independent, variable. Eg. effect of bonding agent on strength of composite.
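A simple linear regression of a dependent variable y on an independent variable x can be sketched with ordinary least squares. The sketch assumes a straight-line relationship y = a + b x. The function name and data are hypothetical.

```python
def least_squares_line(x, y):
    """Fit y = a + b * x by ordinary least squares; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = s_xy / s_xx          # regression coefficient (slope) of y on x
    a = mean_y - b * mean_x  # intercept: the line passes through (mean_x, mean_y)
    return a, b


# Hypothetical data: independent variable x, dependent variable y
a, b = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0
```

The slope b estimates how much y changes for each unit change in x. This is the sense in which regression describes dependence, whereas correlation only measures co-variation.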

ANALYSIS OF RESULT :

Non-parametric tests like the Mann Whitney U-test, Wilcoxon signed-rank test and Kruskal Wallis H-test are less sensitive than parametric tests like the independent t-test and ANOVA, as they use ranks instead of the original values.

The probability (P) value indicates the level of significance of a result.

p < 0.001   Highly significant
The probability of the difference between two groups occurring by chance is less than 1 in 1000.

p < 0.01   Moderately significant
The probability of the difference occurring by chance is less than 1 in 100.

p < 0.05   Less significant
The probability of the difference occurring by chance is less than 5 in 100.

p ≥ 0.05   Not significant
The probability of the difference occurring by chance is 5 in 100 or more.
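The verbal scale above can be expressed as a small lookup function. This is a sketch, with the thresholds copied from the list above and a hypothetical function name.

```python
def interpret_p(p):
    """Verbal label for a P-value, following the scale in this section."""
    if p < 0.001:
        return "Highly significant"
    if p < 0.01:
        return "Moderately significant"
    if p < 0.05:
        return "Less significant"
    return "Not significant"


for p in (0.0004, 0.003, 0.04, 0.2):
    print(p, interpret_p(p))
```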

CONCLUSION :

This seminar attempts to explain the importance of statistics as an essential protocol for any research program. Statistics is the greatest leveler: it accounts for the variations that can creep into the results, thereby providing a sound system for the proper interpretation of data. Hence it would be appropriate to term it “The Vital Statistics”.

VITAL STATISTICS

Outline

INTRODUCTION

DESCRIPTIVE STATISTICS

TESTS OF SIGNIFICANCE

STEPS IN STATISTICAL STUDY

o SELECTION OF SAMPLE SIZE

o SELECTION OF TEST

o ANALYSIS OF RESULT

CONCLUSION

COLLEGE OF DENTAL SCIENCES

DEPARTMENT OF CONSERVATIVE DENTISTRY AND ENDODONTICS

SEMINAR

ON

VITAL STATISTICS


PRESENTED BY : Dr. Siddheswaran V.
