
Stats for Primary FRCA



Types of data

Nominal = data labelled according to category – with no order

Ordinal = data labelled according to category with an intrinsic order, but without equal differences between consecutive levels, e.g. ASA or pain (mild, mod, severe)

Parametric = data labelled according to category, with an intrinsic order with equal distance between consecutive intervals. A.k.a. Continuous

Can be either

o Interval Data = equal differences between numbers without a natural zero, e.g. Celsius scale (zero does not mean zero energy)

o Ratio Data = equal differences between numbers with a natural zero (i.e. complete absence of thing being measured, e.g. Kelvin scale)

Estimation = using sample data to try and determine population parameters

Central tendency = single value representation of set of data

Mode = most commonly occurring value

Median = middle value in an ordered list of data (where the list contains an even number of observations, the median is the average of the two central observations)

Mean = average value (sum of all values divided by the number of values)

Parametric data can be represented by mean, median and mode

Non-parametric data can be represented by median and mode
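The three measures of central tendency can be checked with a minimal sketch using Python's standard `statistics` module. The values below are made up purely for illustration.

```python
import statistics

# Hypothetical measurements (illustrative only, not from any study)
values = [2, 3, 3, 4, 5, 5, 5, 6]

mode = statistics.mode(values)      # most commonly occurring value
median = statistics.median(values)  # even count: average of the two central values
mean = statistics.mean(values)      # sum of values / number of values

print(mode, median, mean)  # 5 4.5 4.125
```

With 8 observations the median is the average of the 4th and 5th ordered values (4 and 5).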

Measure of Dispersion

Spread of data = distribution

Range = difference between largest and smallest values – limited use

Quartiles = expresses distribution in quarters

Inter-Quartile Range = difference between 1st and 3rd quartile (ignores 1st and last quarters)
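A minimal sketch of range and inter-quartile range, using made-up values. Note that software packages use slightly different conventions for locating quartiles; this assumes the default ("exclusive") method of Python's `statistics.quantiles`.

```python
import statistics

# Hypothetical ordered data (illustrative only)
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

data_range = max(values) - min(values)          # largest minus smallest
q1, q2, q3 = statistics.quantiles(values, n=4)  # cut points dividing data into quarters
iqr = q3 - q1                                   # spread of the middle half of the data

print(data_range, q1, q2, q3, iqr)  # 10 3.0 6.0 9.0 6.0
```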

Variance = measures spread using all data = calculate the difference between each value and the mean, square these differences, add them up, then divide by the number of values minus one (n – 1) for a sample.


Standard Deviation = the square root of variance – converts variance into appropriate units
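A minimal sketch of the variance and SD calculation, assuming the sample formula (squared differences from the mean, divided by n – 1). The values are made up for illustration.

```python
import math
import statistics

# Hypothetical sample (illustrative only)
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(values) / len(values)                       # 5.0
squared_diffs = [(v - mean) ** 2 for v in values]      # squared difference from the mean
variance = sum(squared_diffs) / (len(values) - 1)      # sample variance, units are squared
sd = math.sqrt(variance)                               # back in the original units

# Cross-check against the standard library
assert math.isclose(variance, statistics.variance(values))
assert math.isclose(sd, statistics.stdev(values))

print(round(variance, 3), round(sd, 3))  # 4.571 2.138
```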

Normal Distribution = bell-shaped curve, symmetrical about central axis (which corresponds to mean, mode and median). Standard normal curve has a mean of 0 and a SD of 1. Area under the curve = 1.

68% of values lie within +/- 1 SD of the mean

95% of values lie within +/- 2 SD of the mean (more precisely 1.96 SD)

99.7% of values lie within +/- 3 SD of the mean
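The areas under the standard normal curve can be checked with a short sketch using `statistics.NormalDist` from the Python standard library. The helper name `area_within` is just an illustrative choice.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, SD 1
standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Proportion of values lying within +/- k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 1.96, 2, 3):
    print(k, round(area_within(k), 4))
# 1 0.6827
# 1.96 0.95
# 2 0.9545
# 3 0.9973
```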

Standard Error of Mean (SEm) = quantifies uncertainty in estimate of mean

SEm = SD / √n

i.e. standard error of mean = SD / square root of sample size (n)
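A minimal sketch of the standard error calculation, with made-up numbers, showing that the sample size must be quadrupled to halve the SEm. The function name is illustrative.

```python
import math

def standard_error_of_mean(sd, n):
    """SEm = SD / square root of sample size."""
    return sd / math.sqrt(n)

# Hypothetical SD of 10 units
print(standard_error_of_mean(10, 25))   # 2.0
print(standard_error_of_mean(10, 100))  # 1.0 (four times the sample, half the SEm)
```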

Skew = values are clustered on one side and sparse on the other.

Null hypothesis = no change is seen – i.e. observations are the same

Alternate hypothesis = change is seen

p-value = probability of obtaining a result at least as extreme as the one observed, by chance alone, if the null hypothesis is true

The lower the p-value, the lower the chance that the observation occurred by chance (i.e. the null hypothesis is unlikely)

p-value of 0.05 = 5% chance

p-value >0.05 = null hypothesis is not accepted as true, but merely not rejected

p-value <0.05 = significant = i.e. null hypothesis is rejected

p-value <0.01 = extremely significant

Error

Type 1 error = false positive = seeing a difference where there isn’t one

Type 2 error = false negative = not seeing a difference where there is one

Type 1 = Now you see it

Type 2 = Now you don’t

Experimental design aims to minimise error

Error occurs due to


Random error – due to intrinsic variation in samples (reduced by increasing sample size)

Systematic error – a.k.a. bias (not reduced by increasing sample size)

Bias = systematic error resulting in incorrect estimation of statistical parameters

Selection bias = groups aren’t comparable (reduced by randomisation)

Measurement bias = error occurring in measuring variables (e.g. equipment error or observer bias – reduced by blinding and standardising equipment)

Confounding = association between study factors is distorted due to other variables

Reduce error by

Randomisation (reduces selection bias) = equal chance of being in either group

Blinding (reduces measurement bias)

o Single blinded = subject doesn’t know what group they are in

o Double blinded = subject and observer don’t know what group the subject is in

Adequate sample size reduces error (ideal sample size can be calculated by power analysis)

Power of a study = probability of appropriately rejecting the null hypothesis if it is false (i.e. ability to detect a significant difference if one exists). Sample size depends on: -

Effect size = difference in effect between treatment and control group (larger the effect size, the smaller the sample size needed)

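Beta-value = probability of a type 2 error, conventionally set at 20% (power = 1 – beta, i.e. power of 80% is needed)

Alpha value = significance threshold for the p-value (probability of a type 1 error), conventionally 0.05

Distribution of values = parametric or non-parametric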
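How these factors interact can be illustrated with a rough sketch. It assumes a commonly used normal-approximation formula for comparing the means of two equal-sized groups, n per group = 2 × (z for alpha + z for beta)² / (effect size)², with effect size expressed in SD units. This formula is not from the notes above, and dedicated power-analysis software will give slightly different answers.

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, beta=0.20):
    """Approximate n per group for a two-sided comparison of two means.

    effect_size is the difference between groups divided by the SD
    (an assumption of this sketch, not a definition from the notes).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(1 - beta)        # about 0.84 for power = 80%
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

print(sample_size_per_group(0.5))  # 63
print(sample_size_per_group(1.0))  # 16 (larger effect size, smaller sample needed)
```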

Assessing distribution of data

= tests of normality – either Kolmogorov-Smirnov test or Q-Q plots (quantile-quantile)

Assessing significance of data

= calculating p-values

Requires an appropriate test for the type of data being examined

Parametric = applicable to data that is normally distributed

Student’s t-test

o Assesses null hypothesis that mean obtained is same as known population mean

o t = (sample mean – known mean) / SE of sample mean


o When means are the same t = 0

o As sample mean deviates from population mean, the size of t increases and p-value decreases = i.e. probability data came from different population increases
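The t statistic for a single sample can be sketched as below, using made-up values. Converting t into a p-value needs the t distribution (tables or a statistics package), which is outside this sketch.

```python
import math
import statistics

def one_sample_t(sample, known_mean):
    """t = (sample mean - known mean) / SE of sample mean."""
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - known_mean) / standard_error

# Hypothetical sample with a mean of 5 (illustrative only)
sample = [2, 4, 4, 4, 5, 5, 7, 9]

print(one_sample_t(sample, 5))            # 0.0 (means are the same)
print(round(one_sample_t(sample, 4), 3))  # 1.323 (t grows as the means move apart)
```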

Student’s paired t-test

o Examines paired data (i.e. data from same subject, before and after)

o Interested in differences between individuals, NOT populations

o t = (mean difference before and after) / SE of difference
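A minimal sketch of the paired calculation, assuming made-up before and after readings from the same five subjects. The test works on each subject's own difference rather than on the two group means.

```python
import math
import statistics

# Hypothetical paired readings from the same 5 subjects (illustrative only)
before = [120, 130, 125, 140, 135]
after = [115, 128, 120, 132, 130]

# One difference per subject
differences = [b - a for b, a in zip(before, after)]  # [5, 2, 5, 8, 5]

mean_difference = statistics.mean(differences)
se_difference = statistics.stdev(differences) / math.sqrt(len(differences))
t = mean_difference / se_difference

print(round(t, 2))  # 5.27
```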

ANOVA tests

o = analysis of variance (compares the means of more than 2 groups)

One tailed tests look to see if there is a difference in one specified direction only (either above or below the null value)

Two tailed tests look to see if there is a difference in either direction (above or below the null value)

Non-normal distribution

Nominal = Chi-squared

Compares observed values (seen in sample) and expected values (calculated by extrapolating known data from a population to the study population)

Calculated by doing the following

o For each observed number subtract the corresponding expected number (O – E)

o Then square that (O – E)²

o Then divide that by the corresponding expected number [(O – E)²/E]

o Repeat this for every cell

o Add all the individual values for [(O – E)²/E] together = this is the chi-squared statistic for the table

In order to analyse the result you will need

o A pre-determined level of significance – usually 0.05

o The degrees of freedom (df) for the data (= number in the sample minus the number of restrictions)

E.g. if you have 4 numbers with the restriction they must add up to 50, then the first 3 numbers can be anything, e.g. 5, 10 and 15. Therefore the fourth number must be 20 (in order to make 50)

Therefore the degrees of freedom = (4 – 1) = 3

Having calculated these, the Chi-squared value is applied to a Chi-squared distribution table.

If your calculated Chi-squared corresponds to a p-value of 0.05 or less then the null hypothesis can be rejected.
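The calculation steps can be sketched as follows, with made-up observed and expected counts for two categories. The critical value of 3.841 (df = 1, significance level 0.05) is taken from standard chi-squared tables.

```python
def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over every cell."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts (illustrative only): 50 subjects, 2 categories
observed = [20, 30]
expected = [25, 25]

statistic = chi_squared(observed, expected)
degrees_of_freedom = len(observed) - 1  # 2 numbers, 1 restriction (they must total 50)
critical_value = 3.841                  # from tables: df = 1, significance level 0.05

print(statistic, degrees_of_freedom)  # 2.0 1
print(statistic >= critical_value)    # False: null hypothesis is not rejected
```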

Ordinal = Wilcoxon signed-rank test, Mann-Whitney test


How to perform THE MANN-WHITNEY U TEST

1. Call one sample A and the other B.

Sample A = 7; 3; 6; 2; 4; 3; 5; 5

Sample B = 3; 5; 6; 4; 6; 5; 7; 5

2. Combine the samples into one group, and rank in ascending order

A A A B A B A A B B B A B B A B
2 3 3 3 4 4 5 5 5 5 5 6 6 6 7 7

3. Look at each B in turn, count the number of A’s preceding each one. Add up the total to get a U value

U= 3+4+6+6+6+7+7+8 = 47

4. Look at each A in turn, count the number of B’s preceding each one. Add up the total to get a U value

U= 0+0+0+1+2+2+5+7 = 17

5. Use the smaller of the two U values. Compare this to the probability table, against the total sample number. The table value gives the probability value – the percentage probability that the difference between the two sets of data could have occurred by chance
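The counting in steps 1 to 5 can be reproduced with a short sketch. It follows the ordering used in the worked example, where tied values are listed with A before B; formal implementations usually count each tie as a half instead, so their U values may differ slightly. The helper name `count_preceding` is illustrative.

```python
sample_a = [7, 3, 6, 2, 4, 3, 5, 5]
sample_b = [3, 5, 6, 4, 6, 5, 7, 5]

# Step 2: combine and rank in ascending order.
# Sorting (value, label) pairs puts "A" before "B" when values tie.
combined = sorted([(v, "A") for v in sample_a] + [(v, "B") for v in sample_b])

def count_preceding(ranked, target, other):
    """For each `target` item, count the `other` items ranked before it; return the total."""
    total = 0
    others_seen = 0
    for _, label in ranked:
        if label == other:
            others_seen += 1
        elif label == target:
            total += others_seen
    return total

u_from_b = count_preceding(combined, "B", "A")  # step 3
u_from_a = count_preceding(combined, "A", "B")  # step 4
u = min(u_from_a, u_from_b)                     # step 5: use the smaller value

print(u_from_b, u_from_a, u)  # 47 17 17
```

As a check, the two U values always add up to the product of the two sample sizes (here 8 × 8 = 64).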

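Type of data and which test to use

Continuous

o 2 groups, different subjects = unpaired t-test

o Same subjects, before and after intervention = paired t-test

o > 2 groups, different subjects = ANOVA

o Serial measurements = repeated measures ANOVA

Ordinal

o 2 groups, different subjects = Mann-Whitney

o Same subjects, before and after intervention = Wilcoxon signed-rank

o > 2 groups, different subjects = Kruskal-Wallis

o Serial measurements = Friedman

Nominal

o 2 groups, different subjects = Chi-squared

o Same subjects, before and after intervention = McNemar test

o > 2 groups, different subjects = Chi-squared

o Serial measurements = Cochran’s Q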

General Definitions

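Sensitivity = probability of a positive test in a person who has the disease (true positive rate) = true positives / (true positives + false negatives)

Specificity = probability of a negative test in a person who does not have the disease (true negative rate) = true negatives / (true negatives + false positives)

Positive predictive value = probability a person has a disease when given a +ve test result

Negative predictive value = probability a person does not have a disease when given a –ve test result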
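These four measures can be worked out from a 2 × 2 table of test result against true disease status. The sketch below uses made-up counts for 1000 hypothetical people.

```python
# Hypothetical counts (illustrative only)
true_positives = 90    # disease present, test +ve
false_negatives = 10   # disease present, test -ve
false_positives = 30   # disease absent, test +ve
true_negatives = 870   # disease absent, test -ve

sensitivity = true_positives / (true_positives + false_negatives)  # of those with disease
specificity = true_negatives / (true_negatives + false_positives)  # of those without disease
ppv = true_positives / (true_positives + false_positives)          # of those testing +ve
npv = true_negatives / (true_negatives + false_negatives)          # of those testing -ve

print(round(sensitivity, 3), round(specificity, 3), round(ppv, 3), round(npv, 3))
# 0.9 0.967 0.75 0.989
```

Sensitivity and specificity are calculated down the disease columns, whereas the predictive values are calculated across the test result rows, so the predictive values change with how common the disease is.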

Risk = ratio of the number of events occurring in a group to the total number of subjects in that group

Relative risk = ratio of risk in treatment group to risk in control group = risk in treatment / risk in control

Absolute risk reduction = difference in event rates between treatment and control groups = risk in control group – risk in treatment group

Relative risk reduction = % reduction in events in treatment group compared with control group = 1 – relative risk

Odds = ratio of probability of an event occurring to probability of it not occurring

Odds ratio = ratio of the odds of an event occurring in one group to the odds of it occurring in another group

Number needed to treat (NNT) = number of patients needed to be treated to prevent one adverse outcome = 1 / absolute risk reduction. Ideally needs to be as low as possible

Number needed to harm (NNH) = number of patients needed to be treated to cause one adverse event. Low NNH = low therapeutic index
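A worked sketch of the risk measures, assuming a made-up trial with 100 patients in each group, 20 events in the control group and 10 in the treatment group.

```python
# Hypothetical trial results (illustrative only)
control_events, control_total = 20, 100
treatment_events, treatment_total = 10, 100

risk_control = control_events / control_total        # 0.2
risk_treatment = treatment_events / treatment_total  # 0.1

relative_risk = risk_treatment / risk_control             # 0.5
absolute_risk_reduction = risk_control - risk_treatment   # 0.1
relative_risk_reduction = 1 - relative_risk               # 0.5, i.e. 50%
number_needed_to_treat = 1 / absolute_risk_reduction      # 10 patients

# Odds = probability of event / probability of no event
odds_control = risk_control / (1 - risk_control)          # 0.25
odds_treatment = risk_treatment / (1 - risk_treatment)    # about 0.111
odds_ratio = odds_treatment / odds_control                # about 0.444

print(relative_risk, round(absolute_risk_reduction, 3), round(number_needed_to_treat, 1))
print(round(odds_ratio, 3))
```

Here 10 patients need to be treated to prevent one adverse outcome. Note that the odds ratio (about 0.44) is not the same as the relative risk (0.5), although the two get closer as events become rarer.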