Types of data
Nominal = data labelled according to category – with no order
Ordinal = data labelled according to category with an intrinsic order, but without equal differences between consecutive levels, e.g. ASA grade or pain score (mild, moderate, severe)
Parametric = data measured on a scale with an intrinsic order and equal distance between consecutive intervals. A.k.a. continuous
Can be either
o Interval Data = equal differences between numbers without a natural zero, e.g. Celsius scale (zero does not mean zero energy)
o Ratio Data = equal differences between numbers with a natural zero (i.e. zero means complete absence of the thing being measured, e.g. Kelvin scale)
Estimation = using sample data to try to determine population parameters
Central tendency = single value representation of set of data
Mode = most commonly occurring value
Median = middle value in an ordered list of data (where the list contains an even number of observations, the median is the average of the two central observations)
Mean = average value
Parametric data can be represented by mean, median and mode
Non-parametric data can be represented by median and mode
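The three measures of central tendency above can be checked with a short sketch using Python's standard `statistics` module. The scores are made-up illustrative values, not data from these notes.

```python
import statistics

# Hypothetical scores (illustrative only), eight observations
scores = [2, 3, 3, 4, 5, 5, 5, 6]

mode = statistics.mode(scores)      # most commonly occurring value
median = statistics.median(scores)  # even count: average of the two central values
mean = statistics.mean(scores)      # arithmetic average

print(mode, median, mean)  # 5 4.5 4.125
```

With an even number of observations the median (4.5) falls between the two central values, 4 and 5, as described above.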
Measure of Dispersion
Spread of data = distribution
Range = difference between largest and smallest values – limited use
Quartiles = expresses distribution in quarters
Inter-Quartile Range = difference between 1st and 3rd quartile (ignores 1st and last quarters)
Variance = measures spread using all data = calculate the difference between each value and the mean, square the differences, add them up, then divide by the number of values (n - 1 for a sample).
Standard Deviation = the square root of variance – converts variance into appropriate units
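As a rough illustration of the measures of dispersion above, the sketch below computes them with Python's standard `statistics` module. The data are hypothetical, and the quartile positions depend on the interpolation method (here the module's default), so other software may give slightly different quartiles.

```python
import statistics

# Hypothetical measurements (illustrative only); their mean is 5
data = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(data) - min(data)           # largest minus smallest
q1, q2, q3 = statistics.quantiles(data, n=4)  # cut points dividing the data into quarters
iqr = q3 - q1                                 # spread of the middle half of the data

# Variance: squared differences from the mean, summed, divided by n - 1 (sample)
sample_variance = statistics.variance(data)
# Standard deviation: square root of variance, back in the original units
sample_sd = statistics.stdev(data)

print(value_range, iqr, round(sample_variance, 3), round(sample_sd, 3))
```

`statistics.pvariance` and `statistics.pstdev` divide by n instead of n - 1 and apply when the data are the whole population rather than a sample.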
Normal Distribution = bell-shaped curve, symmetrical about central axis (which corresponds to mean, mode and median). Standard normal curve has a mean of 0 and a SD of 1. Area under the curve = 1.
68% of values lie within +/- 1 SD of the mean
95% of values lie within +/- 2 SD of the mean (more precisely, 1.96 SD)
99.7% of values lie within +/- 3 SD of the mean
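The areas under the standard normal curve can be checked numerically. This is a small sketch using `statistics.NormalDist` from the Python standard library; the helper name `fraction_within` is made up for illustration.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, SD 1
std_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k):
    """Fraction of values lying within +/- k SD of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within +/- {k} SD: {fraction_within(k):.1%}")
# within +/- 1 SD: 68.3%
# within +/- 2 SD: 95.4%
# within +/- 3 SD: 99.7%
```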
Standard Error of Mean (SEm) = quantifies uncertainty in estimate of mean
SEm = SD / √n, i.e. standard deviation divided by the square root of the sample size (n)
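A minimal sketch of the standard error calculation follows, assuming the SD is the sample standard deviation. The function names and numbers are illustrative, not from the notes.

```python
import math
import statistics

def sem_from_sd(sd, n):
    """Standard error of the mean = SD / square root of sample size."""
    return sd / math.sqrt(n)

def standard_error_of_mean(values):
    """SEm computed directly from a list of observations."""
    return sem_from_sd(statistics.stdev(values), len(values))

# Quadrupling the sample size halves the standard error
print(sem_from_sd(10, 25))   # 2.0
print(sem_from_sd(10, 100))  # 1.0
```

Because n sits under a square root, precision improves slowly: four times as many subjects are needed to halve the uncertainty in the mean.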
Skew = values are clustered on one side and sparse on the other.
Null hypothesis = no change is seen – i.e. observations are the same
Alternate hypothesis = change is seen
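p-value = probability of obtaining a result at least as extreme as the one observed, by chance alone, if the null hypothesis is true
The lower the p-value, the lower the chance that the observation occurred by chance (i.e. the null hypothesis is unlikely)
p-value of 0.05 = 5% chance of seeing such a result if the null hypothesis is true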
p-value >0.05 = null hypothesis is not accepted as true, but merely not rejected
p-value <0.05 = significant = i.e. null hypothesis is rejected
p-value <0.01 = highly significant
Error
Type 1 error = false positive = seeing a difference where there isn’t one
Type 2 error = false negative = not seeing a difference where there is one
Type 1 = Now you see it
Type 2 = Now you don’t
Experimental design aims to minimise error
Error occurs due to
Random error – due to intrinsic variation in samples (reduced by increasing sample size)
Systematic error – a.k.a. bias (not reduced by increasing sample size)
Bias = systematic error resulting in incorrect estimation of statistical parameters
Selection bias = groups aren’t comparable (reduced by randomisation)
Measurement bias = error occurring in measuring variables (e.g. equipment error or observer bias – reduced by blinding and standardising equipment)
Confounding = association between study factors is distorted due to other variables
Reduce error by
Randomisation (reduces selection bias) = equal chance of being in either group
Blinding (reduces measurement bias)
o Single blinded = subject doesn’t know what group they are in
o Double blinded = subject and observer don’t know what group the subject is in
Adequate sample size reduces error (ideal sample size can be calculated by power analysis)
Power of a study = probability of appropriately rejecting the null hypothesis if it is false (i.e. ability to detect a significant difference if one exists) = 1 - beta
Sample size depends on:
Effect size = difference in effect between treatment and control group (the larger the effect size, the smaller the sample size needed)
Beta-value = probability of a type 2 error, conventionally set at 20% (i.e. a power of 80% is needed)
Alpha value = probability of a type 1 error = the significance threshold the p-value is compared against, conventionally 0.05
Distribution of values = parametric or non-parametric
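To show how these inputs combine, here is a hedged sketch of one common power calculation: the normal-approximation formula for comparing the means of two equal-sized groups with a two-tailed test. It assumes the effect size is standardised (difference in means divided by SD), which is an assumption of this example rather than something stated in the notes. Exact t-based calculations give slightly larger numbers.

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects needed per group (two-tailed, two equal groups).

    effect_size is assumed to be (difference in means) / SD.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # 0.84 for power = 80%
    n = 2 * ((z_alpha + z_beta) / effect_size) ** 2
    return math.ceil(n)

print(sample_size_per_group(1.0))  # 16
print(sample_size_per_group(0.5))  # 63
```

Halving the effect size roughly quadruples the sample size needed, which is why small treatment effects require large trials.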
Assessing distribution of data
= tests of normality, either the Kolmogorov-Smirnov test or Q-Q plots (quantile-quantile)
Assessing significance of data
= calculating p-values
Requires an appropriate test for the type of data being examined
Parametric = applicable to data that is normally distributed
Student’s t-test
o Assesses null hypothesis that mean obtained is same as known population mean
o t = (sample mean - known mean) / SE of sample mean
o When means are the same t = 0
o As sample mean deviates from population mean, the size of t increases and p-value decreases, i.e. probability that the data came from a different population increases
Student’s paired t-test
o Examines paired data (i.e. data from same subject, before and after)
o Interested in differences within individuals, NOT between populations
o t = (mean difference before and after) / SE of difference
ANOVA tests
o = analysis of variance
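The two t formulas above can be sketched in a few lines. The function names and data are hypothetical. Only the t statistic is computed here: converting t to a p-value needs the t distribution with n - 1 degrees of freedom (from tables or a statistics package), which the Python standard library does not provide.

```python
import math
import statistics

def one_sample_t(sample, known_mean):
    """t = (sample mean - known mean) / SE of sample mean."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - known_mean) / se

def paired_t(before, after):
    """t = (mean difference before and after) / SE of the differences."""
    diffs = [a - b for b, a in zip(before, after)]
    # A paired test is a one-sample test on the differences against zero
    return one_sample_t(diffs, 0.0)

# Hypothetical systolic pressures in the same four subjects (illustrative only)
before = [120, 130, 125, 140]
after = [115, 128, 120, 133]
print(round(paired_t(before, after), 2))  # -4.61
```

The paired test reduces to a one-sample test on the within-subject differences, which is why it ignores variation between subjects.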
One tailed tests look to see if there is a difference in one specified direction only (either above or below the null value)
Two tailed tests look to see if there is a difference in either direction (above or below the null value)
Non-normal distribution
Nominal = Chi-squared
Compares observed values (seen in sample) and expected values (calculated by extrapolating known data from a population to the study population)
Calculated by doing the following
o For each observed number subtract the corresponding expected number (O - E)
o Then square that: (O - E)²
o Then divide that by the corresponding expected number: (O - E)² / E
o Repeat this for every cell
o Add all the individual values of (O - E)² / E together = this is the chi-squared statistic for the table
In order to analyse the result you will need
o A pre-determined level of significance – usually 0.05
o The degrees of freedom (df) for the data (= number in the sample minus the number of restrictions)
E.g. if you have 4 numbers with the restriction that they must add up to 50, then the first 3 numbers can be anything, e.g. 5, 10 and 15. The fourth number must therefore be 20 (in order to make 50)
Therefore the degrees of freedom = (4 - 1) = 3
Having calculated these, the Chi-squared value is looked up in a Chi-squared distribution table.
If your calculated Chi-squared corresponds to a p-value of 0.05 or less then the null hypothesis can be rejected.
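The calculation steps above can be sketched as follows. The observed counts reuse the four numbers from the degrees-of-freedom example (5, 10, 15, 20, summing to 50), and the expected counts assume all four categories are equally likely, which is an assumption made for this illustration only. The critical value 7.815 is the standard table entry for df = 3 at the 0.05 level.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (O - E)^2 / E over every cell."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [5, 10, 15, 20]          # four counts constrained to total 50
expected = [12.5, 12.5, 12.5, 12.5] # assumed: equal split across categories

chi2 = chi_squared_statistic(observed, expected)
df = len(observed) - 1              # 4 numbers, 1 restriction (the fixed total)

CRITICAL_VALUE_DF3_P05 = 7.815      # from a chi-squared distribution table
print(chi2, df, chi2 > CRITICAL_VALUE_DF3_P05)  # 10.0 3 True
```

Since 10.0 exceeds 7.815, the corresponding p-value is below 0.05 and the null hypothesis of an equal split would be rejected for these made-up counts.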
Ordinal = Wilcoxon signed-rank test, Mann-Whitney test
How to perform THE MANN-WHITNEY U TEST
1. Call one sample A and the other B.
Sample A = 7; 3; 6; 2; 4; 3; 5; 5
Sample B = 3; 5; 6; 4; 6; 5; 7; 5
2. Combine the samples into one group, and rank in ascending order
A A A B A B A A B B B A B B A B
2 3 3 3 4 4 5 5 5 5 5 6 6 6 7 7
3. Look at each B in turn, count the number of A’s preceding each one. Add up the total to get a U value
U= 3+4+6+6+6+7+7+8 = 47
4. Look at each A in turn, count the number of B’s preceding each one. Add up the total to get a U value
U= 0+0+0+1+2+2+5+7 = 17
5. Use the smaller of the two U values (here 17). Compare this with the probability table for the two sample sizes. The table gives the probability that a difference this large between the two sets of data could have occurred by chance
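The counting in steps 3 and 4 can be reproduced with a short sketch. It takes the ranked A/B sequence from step 2 as given, so tied values are ordered exactly as in the worked example. Formal implementations usually score each tied A-B pair as 0.5 instead, which gives slightly different U values. The function name is made up for illustration.

```python
def u_by_counting(labels, target, other):
    """For each `target` label, count the `other` labels preceding it; sum the counts."""
    total = 0
    seen_other = 0
    for label in labels:
        if label == other:
            seen_other += 1
        elif label == target:
            total += seen_other
    return total

# Ranked sequence from step 2 of the worked example
ranked_labels = "AAABABAABBBABBAB"

u_b = u_by_counting(ranked_labels, "B", "A")  # A's preceding each B
u_a = u_by_counting(ranked_labels, "A", "B")  # B's preceding each A
print(u_b, u_a, min(u_a, u_b))  # 47 17 17
```

A useful check is that the two U values always add up to the product of the two sample sizes, here 8 x 8 = 64.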
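Type of Data and Which test to use
Continuous
o 2 groups, different subjects = unpaired t-test
o Same subjects, before and after intervention = paired t-test
o > 2 groups, different subjects = ANOVA
o Serial measurements = repeated measures ANOVA
Ordinal
o 2 groups, different subjects = Mann-Whitney
o Same subjects, before and after intervention = Wilcoxon signed-rank
o > 2 groups, different subjects = Kruskal-Wallis
o Serial measurements = Friedman
Nominal
o 2 groups, different subjects = Chi-squared
o Same subjects, before and after intervention = McNemar test
o > 2 groups, different subjects = Chi-squared
o Serial measurements = Cochran’s Q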
General Definitions
Sensitivity = probability that the test is positive in a person who has the disease = true positives / (true positives + false negatives)
Specificity = probability that the test is negative in a person who does not have the disease = true negatives / (true negatives + false positives)
Positive predictive value = probability a person has a disease when given a +ve test result
Negative predictive value = probability a person does not have a disease when given a –ve test result
Risk = probability of an event occurring in a group = number of events in the group / total number of subjects in that group
Relative risk = ratio of risk in treatment group to risk in control group = risk in treatment / risk in control
Absolute risk reduction = difference in event rates between treatment and control groups = risk in control group – risk in treatment group
Relative risk reduction = % reduction in events in treatment group compared with control group = 1 – relative risk
Odds = ratio of probability of an event occurring to probability of it not occurring
Odds ratio = ratio of the odds of an event occurring in one group to the odds of it occurring in another group
Number needed to treat (NNT) = number of patients needed to be treated to prevent one adverse outcome; ideally needs to be as low as possible = 1 / absolute risk reduction
Number needed to harm (NNH) = number of patients needed to be treated to cause one adverse event. Low NNH = low therapeutic index
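The definitions above reduce to simple arithmetic on counts. The sketch below uses made-up numbers throughout, and the function names are illustrative rather than from any library.

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Test performance from a 2x2 table of true/false positives and negatives."""
    return {
        "sensitivity": tp / (tp + fn),  # positive test among those with disease
        "specificity": tn / (tn + fp),  # negative test among those without disease
        "ppv": tp / (tp + fp),          # disease present given a +ve result
        "npv": tn / (tn + fn),          # disease absent given a -ve result
    }

def risk_measures(events_treat, n_treat, events_ctrl, n_ctrl):
    """Risk-based measures comparing a treatment group with a control group."""
    risk_treat = events_treat / n_treat
    risk_ctrl = events_ctrl / n_ctrl
    arr = risk_ctrl - risk_treat
    odds_treat = risk_treat / (1 - risk_treat)
    odds_ctrl = risk_ctrl / (1 - risk_ctrl)
    return {
        "relative_risk": risk_treat / risk_ctrl,
        "absolute_risk_reduction": arr,
        "relative_risk_reduction": 1 - risk_treat / risk_ctrl,
        "odds_ratio": odds_treat / odds_ctrl,
        "nnt": 1 / arr,
    }

# Hypothetical screening test: 100 diseased, 900 healthy
test = diagnostic_measures(tp=90, fp=30, fn=10, tn=870)
# Hypothetical trial: 10/100 events on treatment vs 20/100 on control
trial = risk_measures(events_treat=10, n_treat=100, events_ctrl=20, n_ctrl=100)

print(test["sensitivity"], test["ppv"])      # 0.9 0.75
print(trial["relative_risk"], trial["nnt"])  # 0.5 10.0
```

In this made-up trial the relative risk reduction is 50% but the absolute risk reduction is only 10%, so 10 patients need treating to prevent one event.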