Biostat lec01 basicconcepts

  • 8/13/2019 Biostat lec01 basicconcepts

    Biostatistics School of Biotechnology International University

    Slide 1

    Basic Concepts of Statistics

    Dang Quoc Tuan, Ph.D

    School of Biotechnology

    International University

    Lecture 1

    Slide 2

    Statistics

    What is Statistics?

    Why Statistics?

    - Statistic vs. Statistics?

    How to learn Statistics?

    - Common sense vs. mathematical expertise

    - Almost all fields of study benefit from the application of statistical methods

    Slide 3

    Goals of the lecture:

    To introduce fundamental concepts and definitions in Statistics:

    - Statistic vs. Statistics

    - Descriptive and inferential statistics

    - Population vs. sample

    - Parameter vs. statistic

    - Variable, random variable, random number

    - Data, types of data

    - Observation, event, measurement

    - Experiment, treatment, replication

    - Sampling, types of sampling, sampling error

    Slide 4

    Biostatistics

    Statistics is the collection, processing, interpretation

    and presentation of numerical information.

    Biostatistics is the application of statistics to questions

    about living systems.

    Biostatistics is an umbrella term that encompasses statistical

    research in several subject matter areas. These areas include

    pharmacology, medicine, biology, genetics, biotechnology, food

    technology and public health.

    Slide 5

    Biostatistics

    Statistics is critical in analyzing patterns of genomic variation within populations, and in relating this variation to disease states or other phenotypes

    - Genomes differ from the reference copy (single nucleotide polymorphisms, structural variants)

    - Gene mapping by linkage and association methods

    Statistics supports analyses to determine the function of genes/transcripts/proteins

    Slide 6

    Introduction to Statistics

    1. Population and Data

    2. Types of Data

    3. Critical Thinking

    4. Design of Experiments

    Slide 7

    Overview

    A common goal of surveys and other data

    collecting tools is to collect data from a smaller

    part of a larger group so we can learn

    something about the larger group.

    In this section we will look at some of the ways to

    describe data.

    Slide 8

    Overview

    Statistics: two meanings

    - Specific numbers

    - Method of analysis (a field of study, a way of thinking)

    Slide 9

    Overview

    Specific number: a numerical measurement determined by a set of data

    Example:

    - 23% of people polled believed that there are too many polls.

    - Average age of Vietnamese men in 2000 is 68.

    - Price index in December increased by 1% compared to that in November

    Slide 10

    Statistics = method of analysis

    a collection of methods for planning experiments,

    obtaining data, and then organizing, summarizing,

    presenting, analyzing, interpreting, and drawing

    conclusions based on the data.

    = drawing of inferences (generalization) about the

    large groups (population) on the basis of

    observations made on smaller ones (sample)

    Definitions

    Slide 11

    Definitions

    Population: the complete collection of all elements (scores, people, measurements, and so on) to be studied.

    The collection is complete in the sense that it includes all individual items or units which are the subject of investigation.

    Unit = an individual of the population

    Slide 12

    Census: the collection of data from every member of the population

    Sample: a sub-collection of elements drawn from a population

    Sample size: the number of units in the sample (or % of units from the population)

    Definitions

    Slide 13

    Variable

    A characteristic of a population that differs from unit to unit

    Data

    Observations on the variable (such as measurements,

    degrees, orders, properties, outcomes, results) that have been measured and collected

    Definitions

    Slide 14

    Population: All 1st year students in the School

    of Biotechnology

    Unit: a student

    Variable: score in a math final exam

    Observation (data): points of the score (60, 71, 95, etc.)

    Sample: a group of 30 students

    Sample size: 30

    Example:

    Slide 15

    Random sampling and random numbers

    Sample data must be collected in an appropriate way, such as through a process of random selection (each unit in a population must have an equal chance of being drawn)

    If sample data are not collected in an appropriate way, the data may be completely useless: information may not be properly extrapolated to the population

    Slide 16

    Random sampling and random numbers

    Random number

    Select units to be measured by reference to random numbers

    The way to avoid bias

    Random number table (in any statistical book)

    Computer: MINITAB, Excel, SAS, StatGraphic, SPSS, etc.

    Calculator (some versions)

    Slide 17

    Random Sampling

    selection so that each unit has an equal chance of being selected

    In Excel: RANDBETWEEN(a, b)
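As a rough Python counterpart to the Excel formula, the sketch below draws unit numbers with `random.randint(a, b)`, which, like RANDBETWEEN, includes both endpoints. The helper name `draw_unit_numbers`, the population of 200 numbered units, and the seed are assumptions made for illustration. Repeated numbers are discarded so that no unit is selected twice.

```python
import random


def draw_unit_numbers(a, b, n, seed=None):
    """Draw n distinct unit numbers between a and b (both inclusive)."""
    if n > b - a + 1:
        raise ValueError("cannot draw more distinct numbers than units available")
    rng = random.Random(seed)  # seed only makes the illustration repeatable
    chosen = set()
    while len(chosen) < n:
        # randint(a, b) plays the role of RANDBETWEEN(a, b); duplicates are ignored by the set
        chosen.add(rng.randint(a, b))
    return sorted(chosen)


# Hypothetical setting: 200 numbered students, 30 to be measured
picked = draw_unit_numbers(1, 200, 30, seed=1)
print(picked)
```

Discarding duplicates matters because the raw formula can return the same number more than once, which would make the sample smaller than intended.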

    Slide 18

    Table of random numbers

    Slide 19

    Descriptive and inferential statistics

    Descriptive Statistics

    summarize or describe the important

    characteristics of a known set of

    population data

    Inferential Statistics

    use sample data to make inferences (or generalizations) about a population

    Slide 20

    Types of Data

    Processing Data

    Slide 21

    Parameter: a numerical measurement describing some characteristic of a population

    population

    parameter

    Definitions

    Slide 22

    Definitions

    Statistic: a numerical measurement describing some characteristic of a sample

    sample

    statistic

    XS
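The distinction between a parameter and a statistic can be sketched in Python. The population below (exam scores of 200 students) is simulated and purely hypothetical, and the names `mu`, `sigma`, `x_bar` and `s` are illustrative choices rather than notation taken from the slides.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population: exam scores of all 200 first-year students
population = [rng.randint(40, 100) for _ in range(200)]

# Parameters: computed from every unit in the population
mu = statistics.mean(population)
sigma = statistics.pstdev(population)  # population standard deviation

# Statistics: computed from a random sample of 30 units
sample = rng.sample(population, 30)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

print("parameters:", mu, sigma)
print("statistics:", x_bar, s)
```

A different sample would give different values of `x_bar` and `s`, while `mu` and `sigma` stay fixed for the population.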

    Slide 23

    Definitions

    Quantitative data

    Numbers representing counts or measurements

    Example: weights, lengths, ages, pressure, temperature

    Slide 24

    Definitions

    Qualitative (or categorical or attribute) data

    can be separated into different categories that are distinguished by some non-numeric characteristics.

    Example: genders (male/female),

    colors (blue, red, ...)

    marital status

    levels of satisfaction

    Slide 25

    Working with Quantitative Data

    Quantitative data:

    - Measure of quantity

    - Can compare one to others (more or less)

    - Can calculate an average

    Quantitative data can further be divided into discrete and continuous types

    Slide 26

    Discrete

    data result when the number of possible values is either a finite number or a countable number of possible values

    0, 1, 2, 3, . . .

    Example: The number of eggs that hens lay

    Definitions

    Slide 27

    Continuous (numerical) data result from infinitely many possible values that correspond to some continuous scale that covers a range of values without gaps, interruptions, or jumps

    Definitions

    Example: The amount of milk that a cow produces; e.g. 2.343115 gallons per day

    Slide 28

    Levels of Measurement

    Another way to classify data is to use levels of measurement. Four of these levels are discussed in the following slides

    Slide 29

    Definitions

    nominal level of measurement

    characterized by data that consist of names, labels, or categories only. The data cannot be arranged in an ordering scheme (such as low to high)

    Example: - Survey responses: yes, no, undecided

    - Marital status: single, married, divorced, widowed

    Slide 30

    ordinal level of measurement

    involves data that may be arranged in some order, but differences

    between data values either cannot be determined or are meaningless. It

    is used to indicate rank order, but nothing more

    Definitions

    Examples: Course grades A, B, C, D, or F

    Scores given to answers to a question such as "How often do you use a bus service?":

    - Very often: 5

    - Often: 4

    - Occasionally: 3

    - Rarely: 2

    - Never: 1

    It gives a bit more information than nominal, but we still can't calculate the average

    Slide 31

    interval level of measurement

    like the ordinal level, with the additional property that the

    difference between any two data values is meaningful.

    However, there is no natural zero starting point (where

    none of the quantity is present).

    - The interval can be added or subtracted but not divided (the

    ratio makes no sense)

    Date is a very widely used interval scale.

    Example: - Years 1000, 2000, 1776, and 1492

    - Temperature: 5 °C, 10 °C, 20 °C

    - 1st , 5th, 10th day in a month

    Definitions

    Slide 32

    ratio level of measurement

    the interval level modified to include the natural zero

    starting point (where zero indicates that none of the

    quantity is present). For values at this level, differences

    and ratios are meaningful. It incorporates the properties of the interval, ordinal and nominal levels

    Example:

    - Prices of college textbooks ($0 represents no cost)

    - Measurement of mass and length

    Definitions

    Slide 33

    Summary - Levels of Measurement

    Nominal - categories only

    Ordinal - categories with some order

    Interval - differences but no natural starting point

    Ratio - differences and a natural starting point

    Slide 34

    Summary - Levels of Measurement

    Data

    - Qualitative: Nominal, Ordinal

    - Quantitative: Interval, Ratio
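The four levels of measurement differ in which operations give meaningful results. The sketch below illustrates this with small made-up data sets; the variable names and values are assumptions chosen to echo the examples used in the lecture (marital status, bus usage, temperature, mass).

```python
import math
import statistics

# Nominal: categories only, so counting (the mode) is meaningful; ordering is not
marital_status = ["single", "married", "married", "divorced", "married"]
mode_status = statistics.mode(marital_status)

# Ordinal: order is meaningful, so a median rank can be found; an average is not
ORDER = ["Never", "Rarely", "Occasionally", "Often", "Very often"]
answers = ["Often", "Never", "Rarely", "Often", "Very often"]
ranks = sorted(ORDER.index(a) for a in answers)
median_answer = ORDER[ranks[len(ranks) // 2]]  # middle rank of an odd-sized list

# Interval: differences are meaningful, ratios are not (no natural zero)
c_low, c_high = 10.0, 20.0                       # degrees Celsius
k_low, k_high = c_low + 273.15, c_high + 273.15  # same temperatures in kelvin
diff_c, diff_k = c_high - c_low, k_high - k_low      # same difference on both scales
ratio_c, ratio_k = c_high / c_low, k_high / k_low    # ratio depends on the scale

# Ratio: natural zero, so ratios do not depend on the unit
kg_a, kg_b = 2.0, 4.0
ratio_kg = kg_b / kg_a
ratio_g = (kg_b * 1000) / (kg_a * 1000)

print(mode_status, median_answer)
print(diff_c, diff_k, ratio_c, ratio_k)
print(ratio_kg, ratio_g)
```

The Celsius and kelvin comparison shows why "20 °C is twice as hot as 10 °C" is not a valid statement, whereas "4 kg is twice as heavy as 2 kg" is.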

    Slide 35

    Recap

    In the previous sections we have looked at:

    Basic definitions and terms describing data

    Parameters versus statistics

    Types of data (quantitative and qualitative)

    Levels of measurement

    Slide 36

    Critical Thinking

    Slide 37

    Success in Statistics

    Success in the introductory statistics course typically requires more common sense than mathematical expertise

    This section is designed to illustrate how common sense is used when we think critically about data and statistics

    Slide 38

    Limitations of Statistics:

    - Statistics does not prove anything; it only shows the chance that some event occurs

    - Statistics can be misused

    Slide 39

    Abuses (or misuses) of Statistics

    Bad Samples

    self-selected survey (or voluntary response sample)

    one in which the respondents themselves decide whether to be included

    In this case, valid conclusions can be made only about the specific group of people who agree to participate.

    Slide 40

    Abuses of Statistics

    Loaded Questions

    Misleading Graphs

    Bad Samples

    Small Samples

    Slide 41

    Figure. Salaries of People with Bachelor's Degrees and with High School Diplomas: $40,500 (Bachelor's Degree) vs. $24,400 (High School Diploma). Panel (a): vertical axis from $20,000 to $40,000. Panel (b): vertical axis from $0 to $40,000.

    Slide 42

    We should analyze the numerical information given in the graph instead of being misled by its general shape

    Misleading Graphs
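A short calculation shows how much a truncated vertical axis can exaggerate a difference. It uses the two salary values from the figure ($40,500 and $24,400). The axis starting points ($20,000 for one panel, $0 for the other) are read from the figure's tick labels and should be treated as an assumption, as is the helper name `drawn_height_ratio`.

```python
def drawn_height_ratio(value_a, value_b, axis_start):
    """Ratio of the drawn bar heights when the vertical axis starts at axis_start."""
    return (value_a - axis_start) / (value_b - axis_start)


bachelor, high_school = 40_500, 24_400

true_ratio = bachelor / high_school                                   # ratio of the salaries themselves
ratio_truncated = drawn_height_ratio(bachelor, high_school, 20_000)   # axis assumed to start at $20,000
ratio_zero = drawn_height_ratio(bachelor, high_school, 0)             # axis starts at $0

print(round(true_ratio, 2), round(ratio_truncated, 2), round(ratio_zero, 2))
```

With the axis starting at $20,000 the taller bar is drawn about 4.66 times as high as the shorter one, although the salary itself is only about 1.66 times as large. Starting the axis at zero keeps the drawn ratio equal to the true ratio.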

    Slide 43

    Bad Samples

    Small Samples

    Misleading Graphs

    Pictographs

    Distorted Percentages

    Loaded Questions

    Order of Questions

    Refusals

    Correlation & Causality

    Self Interest Study

    Precise Numbers

    Partial Pictures

    Deliberate Distortions

    Misuses of Statistics

    Slide 44

    Design of Experiments

    Slide 49

    Sample Size

    use a sample size that is large enough to see the true nature of any effects and obtain that sample using an appropriate method, such as one based on randomness

    Slide 50

    Definitions

    Random Sample: members of the population are selected in such a way that each individual member has an equal chance of being selected

    Simple Random Sample (of size n): subjects selected in such a way that every possible sample of the same size n has the same chance of being chosen
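The defining property of a simple random sample, that every possible sample of size n is equally likely, can be checked empirically on a tiny population. The sketch below uses Python's `random.sample`; the four-unit population, the number of trials and the seed are assumptions made for illustration.

```python
import random
from collections import Counter
from itertools import combinations

population = ["A", "B", "C", "D"]  # hypothetical units
n = 2
trials = 6000

rng = random.Random(42)
counts = Counter()
for _ in range(trials):
    # random.sample draws n distinct units without replacement
    drawn = tuple(sorted(rng.sample(population, n)))
    counts[drawn] += 1

# All possible samples of size 2 from 4 units: C(4, 2) = 6
all_samples = list(combinations(population, n))
for candidate in all_samples:
    print(candidate, counts[candidate])
```

Each of the six possible samples should appear in roughly one sixth of the trials, which is what "the same chance of being chosen" means in practice.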

    Slide 51

    Randomness

    Why is randomness important in statistics?

    Random sample = representative sample

    - The best way to get a representative sample = choose a proportion of a population at random

    - Every possible experimental unit has an equal chance of being selected, without bias

    Slide 52

    Random Sampling - selection so that each unit has an equal chance of being selected

    Slide 53

    Systematic Sampling - Select some starting point and then select every k-th element in the population
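A minimal sketch of systematic sampling in Python follows. The slide only says to select "some starting point"; choosing that point at random among the first k units is an assumption added here, as are the function name `systematic_sample` and the numbered population of 100 units.

```python
import random


def systematic_sample(units, k, seed=None):
    """Pick a starting point among the first k units, then take every k-th unit."""
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    start = rng.randrange(k)  # assumed: random start in positions 0 .. k-1
    return units[start::k]


# Hypothetical population of 100 numbered units, every 10th unit selected
units = list(range(1, 101))
sample = systematic_sample(units, 10, seed=3)
print(sample)
```

With 100 units and k = 10 the sample always contains 10 units spaced exactly 10 apart; only the starting point varies.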

    Slide 54

    Stratified Sampling - subdivide the population into at least two different subgroups that share the same characteristics, then draw a sample from each subgroup (or stratum)
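Stratified sampling can be sketched as grouping the units by a shared characteristic and then drawing a random sample within each group. In the example below the population (60 students labelled by gender), the equal sample size per stratum and the function name `stratified_sample` are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, per_stratum, seed=None):
    """Group units into strata, then draw per_stratum units at random from each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    # rng.sample raises ValueError if a stratum has fewer than per_stratum units
    return {name: rng.sample(members, per_stratum) for name, members in strata.items()}


# Hypothetical population: 60 students, each tagged with a stratum label
students = [("student%d" % i, "female" if i % 2 else "male") for i in range(1, 61)]
sample = stratified_sample(students, lambda s: s[1], 5, seed=7)

for stratum, members in sample.items():
    print(stratum, [name for name, _ in members])
```

Every stratum is guaranteed to be represented, which a simple random sample of the same total size does not guarantee.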

    Slide 55

    Cluster Sampling - divide the population into sections (or clusters); randomly select some of those clusters; choose all members from selected clusters
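Cluster sampling reverses the logic of stratified sampling: the random choice is made among whole clusters, and every member of a chosen cluster is included. The sketch below assumes a hypothetical population of 8 classes with 5 students each; the names `cluster_sample` and `classes` are illustrative.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select n_clusters clusters and keep all members of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sample cluster names, not units
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical population: 8 classes (clusters) of 5 students each
classes = {
    "class%d" % c: ["c%d_s%d" % (c, i) for i in range(1, 6)]
    for c in range(1, 9)
}
sample = cluster_sample(classes, 3, seed=5)

for name, members in sample.items():
    print(name, members)
```

Only the selected classes need to be visited, which is why cluster sampling is often cheaper in practice than sampling individual units across the whole population.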

    Slide 56

    Major Points

    If sample data are not collected in an appropriate way, the data may be completely useless.

    Randomness typically plays a critical role in determining which data to collect.

    Slide 57

    Random

    Systematic

    Stratified

    Cluster

    Methods of Sampling

    Slide 58

    Sampling Error: the difference between a sample result and the true population result; such an error results from chance sample fluctuations

    Nonsampling Error: sample data that are incorrectly collected, recorded, or analyzed (such as by selecting a biased sample, using a defective instrument, or copying the data incorrectly)
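Sampling error can be made concrete with a small simulation: draw many random samples from a known population and record how far each sample mean falls from the population mean. The simulated population (10,000 values with mean near 70), the sample sizes and the seed below are assumptions made for illustration only.

```python
import random
import statistics

rng = random.Random(11)

# Hypothetical population with a known mean
population = [rng.gauss(70, 10) for _ in range(10_000)]
mu = statistics.mean(population)


def sampling_errors(n, repeats=200):
    """Sample mean minus population mean, for repeated random samples of size n."""
    return [statistics.mean(rng.sample(population, n)) - mu for _ in range(repeats)]


# Typical size of the sampling error for small and for large samples
spread_small = statistics.pstdev(sampling_errors(10))
spread_large = statistics.pstdev(sampling_errors(400))

print("n = 10: ", round(spread_small, 2))
print("n = 400:", round(spread_large, 2))
```

The errors arise purely from chance and shrink as the sample size grows. A nonsampling error, such as a biased selection or a miscalibrated instrument, would not shrink this way however large the sample.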

    Definitions

    Precision vs. Accuracy?

    Slide 59

    Recap

    In this section we have looked at:

    Types of studies and experiments

    Controlling the effects of variables

    (replication and sample size)

    Randomization

    Types of sampling

    Sampling errors

    Slide 60

    HOMEWORK

    Chernick: Introductory Biostatistics for the Health Sciences

    2.1; 2.2; 2.8; 2.14

    3.1