Biostatistics. Kamil Barański, PhD, Department of Epidemiology, Medical University of Silesia in Katowice

Source file: epidemiologia.sum.edu.pl/wp-content/uploads/2020/05/... (title: Biostatistics; author: epidemiologia; created 5/10/2020 7:10:57 PM)

  • Definitions

    Random variable - a variable whose value changes in an unpredictable, random manner (in the mathematical sense). It can take any value from a certain set of values, and each of these values can be assigned a certain probability.

    Randomness - no visible pattern among, e.g., the values of a measured variable.

    Population - a set of objects that have at least one feature in common, e.g. the set of "things" we would like to measure; the set of units (objects / measurements) about which we want to draw conclusions.

    Parameter - a characteristic of the studied population. It describes what we would like to estimate. Parameters are denoted by Greek letters (µ, σ).

  • Definitions

    Sample - a subset of the population: the set of measurements or observations we have made, which should be representative of the population.

    Random sample - a sample selected by a random procedure, e.g. in such a way that every possible sample of the same size has the same chance of being selected.

    Statistics - numerical characteristics of the sample we selected.

    Two meanings of the term statistics:

    ✓ a branch of mathematics

    ✓ numerical characteristics of the sample
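A minimal Python sketch of simple random sampling, assuming the population is available as a list (sampling frame) of hypothetical subject ID numbers. `random.Random.sample` draws without replacement, so every subset of the requested size has the same chance of being selected; the seed only makes the draw reproducible.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement from the sampling frame.

    Every subset of size n has the same chance of being selected.
    """
    rng = random.Random(seed)  # separate generator; the seed makes the draw reproducible
    return rng.sample(frame, n)

# Hypothetical sampling frame: ID numbers of 100 subjects
frame = list(range(1, 101))
sample = simple_random_sample(frame, 10, seed=2020)
print(sorted(sample))
```

Re-running with the same seed returns the same sample, which is useful when the selection has to be documented; omitting the seed gives a fresh random draw each time.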

  • Type of variables

    categorical / qualitative:

    ✓ nominal → Is "A" different from "B"?

    ✓ ordinal → Is "A" greater than "B"?

    numerical / quantitative:

    ✓ interval → By how many units does "A" differ from "B"?

    ✓ ratio → How many times longer is "A" than "B"?

  • Numeric variables can always be transformed into categorical variables, but not vice versa, e.g. height [cm] → low, medium, high.

    Variables can also be divided by function:

    ✓ dependent variables

    ✓ independent variables

  • Types of variables and their importance for the mathematical operations that can be performed

    The choice of statistical method depends on the type of variables.

    scale       counting   ordering   + / –   × / ÷
    nominal     +
    ordinal     +          +
    interval    +          +          +
    ratio       +          +          +       +

  • Dataset

    Codebook, e.g. LEK2 - antibiotic in the first three days:

    0 = no, 1 = vancomycin, 2 = ampicillin, ' ' (blank) = missing data

    Source: lecture by Prof. J. E. Zejda
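The codebook entry above can also be applied programmatically. The sketch below is a hypothetical Python helper (`decode_lek2` is an illustrative name, not part of Statistica) that translates raw LEK2 entries into labels and treats a blank entry as missing data (`None`).

```python
# Codebook for LEK2 (antibiotic in the first three days), as listed above
LEK2_CODES = {0: "no", 1: "vancomycin", 2: "ampicillin"}

def decode_lek2(raw):
    """Return the label for a raw LEK2 entry; a blank entry means missing data (None)."""
    text = str(raw).strip()
    if text == "":
        return None  # ' ' = missing data
    code = int(text)  # raises ValueError for non-numeric entries
    if code not in LEK2_CODES:
        raise ValueError(f"invalid LEK2 code: {raw!r}")
    return LEK2_CODES[code]

raw_column = ["0", "1", " ", "2", "1"]
print([decode_lek2(v) for v in raw_column])
# ['no', 'vancomycin', None, 'ampicillin', 'vancomycin']
```

Rejecting unknown codes instead of passing them through helps catch data-entry errors before the analysis starts.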

  • ✓ give each subject a unique identification number; do not enter identifying information, e.g. the surname

    ✓ use simple variable names, e.g. RR systole for systolic pressure

    ✓ use short names, e.g. up to 8 characters per variable name

    ✓ avoid special (non-ASCII) letters, e.g. the Polish letters ć, ą, ż

    ✓ do not enter units of measurement with the values, e.g. enter 15 rather than 15%
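The rules above can be checked automatically before the data are imported. A minimal sketch, assuming the names and raw entries are plain Python strings; the helper names, the example name "RRsys" and the 8-character constant are illustrative choices, not features of Statistica.

```python
import re

MAX_NAME_LENGTH = 8  # "e.g. up to 8 characters" from the rules above

def name_problems(name):
    """List the rule violations found in a proposed variable name."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append("longer than 8 characters")
    if not name.isascii():
        problems.append("contains non-ASCII letters")
    return problems

def is_bare_number(entry):
    """True if an entry is a plain number, without units or symbols such as '%'."""
    return re.fullmatch(r"-?\d+([.,]\d+)?", entry.strip()) is not None

def ids_unique(ids):
    """True if every subject has a distinct identification number."""
    return len(ids) == len(set(ids))

print(name_problems("RRsys"))      # []
print(name_problems("ciśnienie"))  # ['longer than 8 characters', 'contains non-ASCII letters']
print(is_bare_number("15"), is_bare_number("15%"))          # True False
print(ids_unique([101, 102, 103]), ids_unique([101, 101]))  # True False
```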

  • Statistica – importing database

  • Statistica – browsing the database contents

    Variable's name → right-click → variable specification

    or

    → double-click the variable's name

  • Statistica – specification of variable

    TEXT LABEL EDITOR → here you can enter text labels (names) for the values of a categorical variable; pay attention to the automatic assignment of numerical values.

    VALUES, STATISTICS ... → here you can quickly check the variable's values before starting the analysis.

  • ALL SPECIFICATIONS → variable specification editor, where you can modify the NAME, VARIABLE TYPE and NO DATA CODE (missing data code) if needed.

  • Statistica – correction of incorrect values

    ✓ after finding invalid values, remove them:

    ✓ select the column / variable that we want to correct

    ✓ DATA tab → RECODE

    ✓ replace evidently incorrect values with the missing data code

    ✓ if we know the correct value, enter it instead
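Outside Statistica, the same correction logic can be sketched in Python. This assumes a plausibility range chosen by the analyst (here a hypothetical 100-230 cm for adult height), uses `None` as a stand-in for the missing data code, and takes known correct values from a hypothetical `corrections` dictionary keyed by row index.

```python
MISSING = None  # stand-in for the missing data code

def recode_invalid(values, lower, upper, corrections=None):
    """Replace values outside [lower, upper] with MISSING.

    corrections maps a row index to a verified correct value, which is
    entered instead of the missing data code.
    """
    corrections = corrections or {}
    cleaned = []
    for i, v in enumerate(values):
        if i in corrections:
            cleaned.append(corrections[i])   # correct value is known: enter it
        elif v is None or not (lower <= v <= upper):
            cleaned.append(MISSING)          # evidently incorrect -> missing data
        else:
            cleaned.append(v)                # plausible value is kept
    return cleaned

height_cm = [172, 1680, 165, -5, 181]  # hypothetical entries with typos
print(recode_invalid(height_cm, 100, 230, corrections={1: 168}))
# [172, 168, 165, None, 181]
```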

  • Statistica – creating new variable

    Variable's name → right-click

    Here we can enter the formula we want to calculate.
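As an illustration of a formula-based variable, the sketch below computes body mass index from two hypothetical columns, weight [kg] and height [cm]. BMI is only an example chosen here, not a variable from the original dataset; a missing input gives a missing result.

```python
def bmi(weight_kg, height_cm):
    """Body mass index = weight [kg] / height [m]^2; missing input -> missing output."""
    if weight_kg is None or height_cm is None:
        return None
    height_m = height_cm / 100  # convert cm to m
    return round(weight_kg / height_m ** 2, 1)

weight = [70, 85, None]  # hypothetical values
height = [175, 180, 160]
bmi_col = [bmi(w, h) for w, h in zip(weight, height)]
print(bmi_col)  # [22.9, 26.2, None]
```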

  • Statistica – quantitative variable categorization

    ✓ create a new variable (a new empty column)

    ✓ select it and click CODE

    ✓ define the new codes (e.g. height categories) by conditions on the values in another column (e.g. height in cm)
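The categorization step can be sketched as a simple mapping from a numeric column to codes. The cut-offs (160 and 180 cm) are hypothetical. The example also shows why the reverse transformation is impossible: the label "medium" no longer tells us whether the subject was 160 or 179 cm tall.

```python
def height_category(height_cm, low_cut=160, high_cut=180):
    """Map height [cm] to 'low' / 'medium' / 'high'; the cut-offs are hypothetical."""
    if height_cm is None:
        return None          # missing stays missing
    if height_cm < low_cut:
        return "low"
    if height_cm < high_cut:
        return "medium"
    return "high"

heights = [155, 160, 172, 180, 191, None]
print([height_category(h) for h in heights])
# ['low', 'medium', 'medium', 'high', 'high', None]
```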

  • Descriptive statistics

    Measures of central tendency:

    ✓ arithmetic mean (parametric)

    ✓ median (non-parametric)

    ✓ mode (non-parametric)

    ✓ quartiles and any centiles (non-parametric)

    Measures of variation / dispersion:

    ✓ variance / standard deviation / coefficient of variation (parametric)

    ✓ range (non-parametric)

    ✓ interquartile range (non-parametric)

  • Mean value in sample and population

    Properties:

    ✓ easy to calculate

    ✓ uses all the information contained in the data

    ✓ characterizes important distributions, especially the normal distribution N(µ, σ²) for continuous variables

    Limitations:

    ✓ sensitive to extreme values / outliers

    ✓ inappropriate as a measure of central tendency for clearly asymmetric (skewed) distributions

    ✓ of limited use for variables measured on a categorical scale
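The sensitivity of the mean to outliers can be shown with Python's standard `statistics` module on a small hypothetical sample (length of hospital stay in days): adding a single extreme value shifts the mean strongly, while the median barely moves.

```python
from statistics import mean, median

# Hypothetical length of hospital stay [days]
stay = [3, 4, 4, 5, 6]
stay_with_outlier = stay + [60]  # one extreme value added

print(mean(stay), median(stay))  # 4.4 4
print(round(mean(stay_with_outlier), 2), median(stay_with_outlier))  # 13.67 4.5
```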

  • Median

    After ordering the data from smallest to largest:

    ✓ for an odd number of observations, the median is the middle observation

    ✓ for an even number of observations, the median is the arithmetic mean of the 2 middle observations

    Properties:

    ✓ not sensitive to outliers

    ✓ can be used whatever the shape of the distribution (also for skewed distributions)

    ✓ suitable (also) for the ordinal scale

    Mode

    ✓ the most frequent value

    ✓ multimodality may indicate heterogeneity of the population
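The median rule above (odd vs even number of observations) and the notion of several modes can be checked with a short sketch. `median_by_rule` is a hypothetical helper written to mirror the rule; `statistics.multimode` (Python 3.8+) returns all of the most frequent values, so more than one result signals a multimodal distribution.

```python
from statistics import multimode

def median_by_rule(values):
    """Middle value for odd n; mean of the two middle values for even n."""
    data = sorted(values)  # order from smallest to largest
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2

print(median_by_rule([7, 1, 5]))      # 5
print(median_by_rule([7, 1, 5, 3]))   # 4.0
print(multimode([2, 3, 3, 5, 7, 7]))  # [3, 7] -> two modes (bimodal)
```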

  • Dispersion

    Range: $R = X_{max} - X_{min}$

    Interquartile range: $IQR = Q_3 - Q_1$

    Q1 → lower quartile

    Q2 → middle quartile = median

    Q3 → upper quartile

    Variance in sample: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$

    Variance in population: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$

    Standard deviation in sample: $s = \sqrt{s^2}$

    Standard deviation in population: $\sigma = \sqrt{\sigma^2}$
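The dispersion formulas can be computed directly in Python. A minimal sketch on a small hypothetical sample; the population formulas (divisor N) are only appropriate when the data cover the whole population. Software packages use slightly different rules for quartiles, so Q1 and Q3 from `statistics.quantiles` (default 'exclusive' method) may not match Statistica exactly.

```python
import math
from statistics import quantiles

def dispersion(x):
    """Range, quartiles, IQR, sample/population variance and SD, and CV for a list of numbers."""
    n = len(x)
    xbar = sum(x) / n
    ss = sum((xi - xbar) ** 2 for xi in x)  # sum of squared deviations from the mean
    s2 = ss / (n - 1)                       # sample variance (divisor n - 1)
    sigma2 = ss / n                         # population variance (divisor N)
    q1, q2, q3 = quantiles(x, n=4)          # quartiles, default method='exclusive'
    return {
        "range": max(x) - min(x),
        "Q1": q1, "median": q2, "Q3": q3, "IQR": q3 - q1,
        "s2": s2, "s": math.sqrt(s2),
        "sigma2": sigma2, "sigma": math.sqrt(sigma2),
        "CV %": 100 * math.sqrt(s2) / xbar,  # coefficient of variation
    }

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
for name, value in dispersion(data).items():
    print(f"{name:>7}: {value:.3f}")
```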

  • Statistica – description of data

    STATISTICS → BASIC STATISTICS → DESCRIPTIVE STATISTICS → OK

  • Statistica – Variables

    Variable selection:

    we highlight the variables or enter their numbers in the edit field

    Selecting a set of variables:

    contiguous list - highlight with Shift held down or select by dragging the mouse

    non-contiguous list - highlight with Ctrl held down

    all variables - the ALL button

    EXPAND / COLLECT → shows long variable names

    CLOSE UP → VALUES, STATISTICS window ...

  • Statistica – WIĘCEJ (MORE) tab

    STATISTICS → BASIC STATISTICS → DESCRIPTIVE STATISTICS → OK →

    WIĘCEJ (MORE) tab

    Here we can calculate the descriptive statistics of interest for quantitative variables.

  • Statistica – normal distribution

    STATISTICS → BASIC STATISTICS → DESCRIPTIVE STATISTICS → OK →

    NORMALITY tab → HISTOGRAMS

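    Here we can check whether the distribution of the quantitative variable of interest is normal.

    p > 0.05 → no grounds to reject normality (the distribution can be treated as normal)

    p ≤ 0.05 → the distribution differs significantly from normal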

  • Statistica – switching between windows

    ✓ e.g. when we want to return from a workbook to the window with the data table

    ✓ when several windows, data tables and workbooks are open

  • Statistica – contingency tables

  • Statistica – estimation of prevalence

  • References

    ✓ Zejda J.E., Kowalska M., Brożek G.: "BIOSTATISTICS. Practical methods of data analysis in observational epidemiological studies". ŚUM

    ✓ Presentation by Prof. dr hab. n. med. Jan E. Zejda, Chair and Department of Epidemiology, WLK ŚUM

    ✓ Presentation by physicist Beata Malec, Chair and Department of Epidemiology, WLK ŚUM