Biostatistics
Kamil Barański, PhD
Department of Epidemiology
Medical University of Silesia in Katowice
Random variable - a variable whose value changes in an unpredictable, random manner (in the mathematical sense). It can take any value from a certain set of values, and each of these values can be assigned a certain probability.
Randomness - no visible pattern in, e.g., the values of a measured variable.
Population - a set of objects that have at least one thing in common, e.g. a set of "things" that we would like to measure; the set of units (objects / measurements) about which we want to draw conclusions.
Parameter - a characteristic of the studied population. It describes what we would like to estimate. Parameters are denoted by Greek letters (µ, σ).
Definitions
Sample - a subset of the population. A set of measurements or observations we have made that should be representative of our general population.
Random sample - a sample selected, e.g., in such a way that each sample of the same size has the same chance of being selected.
Statistics - numerical characteristics of the sample we selected.
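The difference between a parameter and a statistic can be illustrated with a minimal sketch. The population below is simulated (hypothetical systolic pressure values), and the seed, the population size and the sample size are assumptions chosen only for the example.

```python
import random
import statistics

# Hypothetical population: systolic pressure [mmHg] of 1000 subjects (simulated).
rng = random.Random(42)  # fixed seed only so that the example is repeatable
population = [rng.gauss(120, 15) for _ in range(1000)]

# Parameter: a characteristic of the whole population.
mu = statistics.fmean(population)

# Simple random sample: every subset of 50 subjects is equally likely to be drawn.
sample = rng.sample(population, k=50)

# Statistic: the same characteristic computed from the sample only.
x_bar = statistics.fmean(sample)

print(f"population mean (parameter): {mu:.1f}")
print(f"sample mean (statistic):     {x_bar:.1f}")
```

In practice the parameter is unknown and only the statistic can be computed; here both are shown because the whole population was simulated.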
Two meanings of the term statistics
✓ section of mathematics
✓ numerical characteristics of the sample
Types of variables
✓ nominal → Is "A" different from "B"?
✓ ordinal → Is "A" greater than "B"?
categorical / qualitative
✓ interval → By how many units does "A" differ from "B"?
✓ ratio → How many times is "A" longer than "B"?
numerical / quantitative
Numeric variables can always be transformed into categorical variables, but not vice versa,
e.g. height [cm] → low, medium, high.
Variables can also be divided by function:
✓ dependent variables
✓ independent variables
Types of variables and the mathematical operations they allow
Different statistical methods are appropriate for different types of variables.

scale       counting   ordering   + / –   * / :
nominal        +
ordinal        +          +
interval       +          +         +
ratio          +          +         +       +

Source: lecture by Prof. J. E. Zejda
Codebook, e.g. LEK2 - antibiotic in the first three days:
0 = no, 1 = vancomycin, 2 = ampicillin, ' ' = lack of data
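A codebook like this can be applied outside Statistica as well. The sketch below is only an illustration: the function name `decode_lek2` and the raw column are hypothetical, and a blank cell is assumed to mean lack of data, as in the codebook above.

```python
# Codebook for the variable LEK2 (antibiotic in the first three days),
# copied from the slide; a blank cell means lack of data.
LEK2_LABELS = {0: "no", 1: "vancomycin", 2: "ampicillin"}

def decode_lek2(raw):
    """Return the label for a raw cell value, or None when data are missing."""
    text = str(raw).strip()
    if text == "":
        return None  # blank cell = lack of data
    return LEK2_LABELS[int(text)]  # raises KeyError for codes outside the codebook

raw_column = ["0", "2", " ", "1"]  # hypothetical raw entries
print([decode_lek2(v) for v in raw_column])
```

Keeping the numeric codes in the data table and the labels in a separate codebook keeps the dataset compact and free of spelling variants.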
Dataset
✓ give each subject a unique identification number; do not enter identifying information, e.g. surname
✓ use simple variable names, e.g. systolic pressure = RR systole
✓ use short names, e.g. up to 8 characters per variable name
✓ avoid special characters, e.g. the Polish letters ć, ą, ż
✓ do not enter units of measurement in the cells, e.g. 15%
Statistica – importing database
Statistica – browsing the database contents
Variable's name → right-click → variable specification
or
→ double-click the variable's name
Statistica – specification of variable
TEXT LABEL EDITOR → enter labels (names) for the values of a categorical variable; watch out for the automatic assignment of numerical values.
VALUES, STATISTICS ... → quickly check the values of a variable before starting the analysis.
ALL SPECIFICATIONS → variable specification editor; you can modify the NAME, VARIABLE TYPE and NO DATA CODE if needed.
Statistica – correction of incorrect values
✓ invalid values, once found, must be corrected or removed
✓ select the column / variable to be corrected
✓ tab: DATA → RECODE
✓ replace evidently incorrect values with the missing data code
✓ if the correct value is known, enter it
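The same cleaning rule can be sketched in a few lines of code. The plausibility limits (50 to 250 cm), the function name `clean_height` and the use of `None` as the missing data code are assumptions made for this illustration; they are not Statistica settings.

```python
MISSING = None  # stand-in for the missing data code

def clean_height(value, low=50, high=250):
    """Replace evidently impossible heights [cm] with the missing data code."""
    if value is MISSING or not (low <= value <= high):
        return MISSING
    return value

heights_cm = [172, 1680, 165, -1, 181]  # hypothetical raw column
cleaned = [clean_height(h) for h in heights_cm]
print(cleaned)
```

If the correct value is known (e.g. 1680 was a typing error for 168), it should be entered instead of the missing data code.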
Statistica – creating new variable
Variable's name → right-click
Here we can enter the formula for the value we want to calculate.
Statistica – quantitative variable categorization
✓ create a new variable (a new empty column)
✓ select it and click CODE
✓ define the new codes, e.g. height categories, by conditions on the values in another column, e.g. height values in cm
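Categorizing a quantitative variable can be sketched as follows. The cut-points (160 and 180 cm) and the function name `height_category` are hypothetical; the slides do not specify them.

```python
from bisect import bisect_right

# Hypothetical cut-points [cm]; the slides do not specify them.
CUTS = [160, 180]
LABELS = ["low", "medium", "high"]

def height_category(height_cm):
    """Map a height in cm to an ordinal category using the cut-points above."""
    return LABELS[bisect_right(CUTS, height_cm)]

heights_cm = [152, 160, 175, 180, 193]
print([height_category(h) for h in heights_cm])
```

The transformation loses information: the original height in cm cannot be recovered from the category, which is why a numeric variable can be turned into a categorical one but not vice versa.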
Measures of central tendency:
✓ mean (parametric)
✓ median (non-parametric)
✓ mode (non-parametric)
✓ quartiles and any centiles (non-parametric)
Measures of variation / dispersion:
✓ variance / standard deviation / coefficient of variation (parametric)
✓ range (non-parametric)
✓ interquartile range (non-parametric)
Descriptive statistics
Properties of the mean:
✓ easy to calculate
✓ uses all information contained in the data
✓ characterizes important distributions → especially the Normal distribution for continuous variables, N(µ, σ²)
Limitations:
✓ sensitive to extreme values / outliers
✓ inappropriate as a measure of central tendency for clearly asymmetrical distributions
✓ of limited use for variables expressed on a categorical scale
Mean value in sample and population
Median
After ordering data from smallest to largest:
✓ for an odd number of observations, the median is the middle observation
✓ for an even number of observations, the median is the arithmetic mean of the 2 middle observations
Properties:
✓ not sensitive to outliers
✓ not affected by the shape of the distribution
✓ suitable (also) for the ordinal scale
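The rule for odd and even numbers of observations, and the robustness of the median to outliers, can be checked with a short sketch. The data are hypothetical and the function `median` is written out only to mirror the definition; Python's `statistics.median` gives the same result.

```python
import statistics

def median(values):
    """Median as defined above: sort, then take the middle observation(s)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the middle observation
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: mean of the 2 middle ones

odd = [7, 3, 5]
even = [7, 3, 5, 9]
with_outlier = [3, 5, 7, 9, 500]  # hypothetical data with one extreme value

print(median(odd), median(even))
print(statistics.mean(with_outlier), median(with_outlier))
```

A single extreme value pulls the mean far away from the bulk of the data, while the median stays at the middle observation.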
Mode
✓ most frequent value
✓ multimodality indicates the heterogeneity of the population
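The mode is found by counting how often each value occurs, so it also works for nominal data. A minimal sketch with hypothetical data; the helper name `modes` is an assumption of this example.

```python
from collections import Counter

def modes(values):
    """Return all most frequent values (more than one means multimodal data)."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

blood_groups = ["A", "0", "A", "B", "A", "AB"]  # hypothetical nominal data
print(modes(blood_groups))
print(modes([1, 2, 2, 3, 3]))  # two modes: the data are bimodal
```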
Range R = Xmax – Xmin
Interquartile range IQR = Q3 – Q1
Q1 → lower quartile
Q2 → middle quartile = median
Q3 → upper quartile
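The range and the interquartile range can be computed as in the sketch below. Several conventions exist for computing quartiles; the "inclusive" method of Python's `statistics.quantiles` is an assumption here, and Statistica may use a different convention and return slightly different values. The data are hypothetical.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical measurements

# Range: difference between the largest and the smallest value.
r = max(data) - min(data)

# Quartiles: cut the ordered data into 4 parts (method="inclusive" is an assumption).
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print("range:", r)
print("Q1, Q2, Q3:", q1, q2, q3)
print("IQR:", iqr)
```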
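Dispersion
Variance in sample: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
Variance in population: $\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$
Standard deviation in sample: $s = \sqrt{s^2}$
Standard deviation in population: $\sigma = \sqrt{\sigma^2}$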
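The sample and population variances differ only in the divisor: n − 1 for a sample, N for a population. A minimal sketch with hypothetical data; the function name `variance` and its `population` flag are assumptions of this example.

```python
import math

def variance(values, population=False):
    """Variance with divisor N (population) or n - 1 (sample)."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
    return ss / n if population else ss / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
s2 = variance(data)                       # sample variance
sigma2 = variance(data, population=True)  # population variance

print(s2, math.sqrt(s2))          # sample variance and standard deviation
print(sigma2, math.sqrt(sigma2))  # population variance and standard deviation
```

For the same data the sample variance is always somewhat larger than the population variance, and the difference shrinks as n grows.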
Statistica – description of data
STATISTICS → BASIC STATISTICS → DESCRIPTIVE STATISTICS → OK
Variable selection:
highlight the variables or enter their numbers in the edit field
Selecting a set of variables:
continuous list - highlight with Shift pressed or drag with the mouse
discontinuous list - highlight with Ctrl pressed
all variables - the ALL button
EXPAND / COLLAPSE → shows long variable names
ZOOM → opens the VALUES, STATISTICS ... window
Statistica – Variables
Statistica – ADVANCED tab (WIĘCEJ in the Polish version)
STATISTICS → BASIC STATISTICS → DESCRIPTIVE STATISTICS → OK → ADVANCED tab
Here we can calculate the descriptive statistics of interest for quantitative variables.
Statistica – normal distribution
STATISTICS → BASIC STATISTICS → DESCRIPTIVE STATISTICS → OK →
NORMALITY tab → HISTOGRAMS
Here we can check the normality of the distribution of the quantitative variable of interest.
p > 0.05 = no evidence against normality; the distribution is treated as normal
p
✓ for example, when we want to return from a workbook to a window with a data table
✓ when we have more windows, data tables, and workbooks open
Statistica – switching between windows
Statistica – contingency tables
Statistica – estimation of prevalence
References
✓ Zejda J.E., Kowalska M., Brożek G.: "Biostatistics. Practical methods of data analysis in observational epidemiological studies". ŚUM
✓ Presentation by Prof. dr hab. n. med. Jan E. Zejda. Chair and Department of Epidemiology, WLK ŚUM
✓ Physics presentation by Beata Malec. Chair and Department of Epidemiology, WLK ŚUM