STAT 3022 Data Analysis class slides 1


  • 7/29/2019 STAT 3022 Data Analysis class slides 1


    Chapter 1

    Drawing Statistical Conclusions

    STAT 3022

    School of Statistics, University of Minnesota

    January 27, 2013

  • Some Basics: Summary statistics

    Sample mean: $\bar{X} = \sum_{i=1}^{n} X_i / n$

    Sample standard deviation: $s = \sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 / (n - 1)}$

    Median, Q1, Q3, interquartile range (IQR): $\mathrm{IQR} = Q_3 - Q_1$

    STAT 3022 | Chapter 1 Drawing Statistical Conclusions 2 / 16
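As a quick check of these formulas, the R sketch below computes each summary for a small hypothetical sample (the numbers are made up for illustration) and compares the hand-coded versions with R's built-in `mean()`, `sd()`, `median()`, `quantile()` and `IQR()`. R's default quartile rule (type 7) is only one of several conventions, so Q1 and Q3 computed by hand or in other software may differ slightly.

```r
# Hypothetical sample, made up for illustration
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Sample mean: sum of the X_i divided by n
xbar <- sum(x) / n
# Sample standard deviation: note the n - 1 divisor
s <- sqrt(sum((x - xbar)^2) / (n - 1))

# The hand-coded values agree with the built-in functions
print(c(xbar = xbar, mean = mean(x)))
print(c(s = s, sd = sd(x)))

# Median, quartiles (R's default type-7 rule) and IQR = Q3 - Q1
print(median(x))
q <- quantile(x, probs = c(0.25, 0.75))
print(q)
print(unname(q[2] - q[1]))
print(IQR(x))
```

For this sample the mean is 5, the median is 4.5, the quartiles are 4 and 5.5, and so the IQR is 1.5.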

  • Some Basics

    Roll a six-sided die 3 times; outcomes: 1, 2, 6

    Sample mean: $\bar{X} = (1 + 2 + 6)/3 = 3$. Population mean: $\mu = ?$

    Law of large numbers: $\bar{X}_n \to \mu$ as $n \to \infty$
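The law of large numbers can be watched in a short simulation. This sketch assumes a fair die, whose population mean is $\mu = (1 + 2 + \cdots + 6)/6 = 3.5$; the seed and the number of rolls are arbitrary choices.

```r
set.seed(3022)  # arbitrary seed so the run is reproducible
n <- 10000
# n rolls of a fair six-sided die
rolls <- sample(1:6, size = n, replace = TRUE)
# Running sample mean X-bar_n after 1, 2, ..., n rolls
running_mean <- cumsum(rolls) / seq_len(n)

mu <- mean(1:6)  # population mean of a fair die, 3.5
print(running_mean[c(3, 10, 100, 1000, 10000)])

# Plot the running mean (log-scaled x-axis) with a dashed line at mu
plot(running_mean, type = "l", log = "x",
     xlab = "Number of rolls n", ylab = "Sample mean")
abline(h = mu, lty = 2)
```

Early sample means can sit far from 3.5, as the 3-roll sample above did; by a few thousand rolls the running mean stays close to $\mu$.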

  • Some Basics

  • Some Basics: Graphical summary

  • Case Study: Observational Study

    Question: Did a bank discriminatorily pay higher starting salaries to men than to women?

    Data: Beginning salaries for 32 men and 61 women, all skilled, entry-level employees hired between 1969 and 1977.

    Perform exploratory data analysis using graphical and numerical summaries of the data.

  • Graphical Summary

    [Figure: histograms of starting salary in two panels, "Male" and "Female"; x-axis: Starting Salary (ticks from 3000 to 9000); y-axis: Frequency]

  • Interpreting Histograms

    Relative frequency histograms let us visually display general characteristics of the distribution of a particular variable:

    - Central tendency: Do men tend to be paid more than women?
    - Spread: What is the range of most salaries?
    - Symmetry: Is either distribution skewed? Are there any outliers?

    Histograms are used to show broad features, not exquisite detail.
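A sketch of how such stacked histograms might be drawn in R. The bank data are not reproduced here, so the salaries below are simulated placeholders: the group sizes (32 men, 61 women) come from the case study, but the means and standard deviations are arbitrary assumptions and say nothing about the actual case. Using the same break points for both panels keeps the salary axes comparable, which matters when judging center and spread by eye.

```r
set.seed(1)
# Placeholder data: 32 "male" and 61 "female" salaries drawn from normal
# distributions with arbitrary, made-up parameters (NOT the bank's data)
salary <- round(c(rnorm(32, mean = 6000, sd = 700),
                  rnorm(61, mean = 5100, sd = 550)))
sex <- factor(rep(c("Male", "Female"), times = c(32, 61)))

# Shared break points covering all salaries, so both panels use the same bins
breaks <- pretty(range(salary), n = 12)

op <- par(mfrow = c(2, 1))  # two panels stacked vertically
h_male <- hist(salary[sex == "Male"], breaks = breaks, xlim = range(breaks),
               main = "Male", xlab = "Starting Salary", ylab = "Frequency")
h_female <- hist(salary[sex == "Female"], breaks = breaks, xlim = range(breaks),
                 main = "Female", xlab = "Starting Salary", ylab = "Frequency")
par(op)  # restore the previous layout

# Side-by-side boxplots are another quick way to compare center, spread and skew
boxplot(salary ~ sex, ylab = "Starting Salary")
```

The y-axis here shows counts, as in the figure above; `hist(..., freq = FALSE)` would instead plot on the density scale, where the bar areas sum to 1.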

  • Numerical Summary

    R scripts lec1_1.R and lec1_2.R (in class)
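The scripts lec1_1.R and lec1_2.R are not reproduced here. As a hedged sketch of the kind of group-wise numerical summary an analysis like this might use, the code below applies R's `tapply()` to a tiny made-up data set (not the bank data) and reports each group's mean, standard deviation and IQR.

```r
# Hypothetical toy data, NOT the bank data: starting salaries by sex
salary <- c(5000, 6000, 7000, 4000, 4500, 5000, 5500)
sex <- factor(c("Male", "Male", "Male", "Female", "Female", "Female", "Female"))

# Min, Q1, median, mean, Q3 and max for each group
print(tapply(salary, sex, summary))

# Mean, standard deviation and IQR for each group, side by side
group_mean <- tapply(salary, sex, mean)
group_sd <- tapply(salary, sex, sd)
group_iqr <- tapply(salary, sex, IQR)
print(round(cbind(mean = group_mean, sd = group_sd, IQR = group_iqr), 1))
```

For these toy numbers the male group has mean 6000 and standard deviation 1000, and the female group has mean 4750 and IQR 750.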

  • Normal Distribution

    Bell shaped, defined by the formula $\frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

    Two parameters: mean $\mu$, variance $\sigma^2$ (standard deviation $\sigma = \sqrt{\sigma^2}$)
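One practical consequence of the two-parameter description: R's normal-distribution functions take the standard deviation `sd`, not the variance, so a variance of $\sigma^2 = 4$ must be passed as `sd = 2`. The simulation below (parameter values chosen arbitrarily) checks that a large normal sample has mean close to $\mu$ and variance close to $\sigma^2$, and that the density peaks at $x = \mu$ with height $1/(\sqrt{2\pi}\,\sigma)$.

```r
set.seed(42)
mu <- 10
sigma2 <- 4            # variance
sigma <- sqrt(sigma2)  # standard deviation, which is what rnorm() and dnorm() expect

z <- rnorm(1e5, mean = mu, sd = sigma)
print(c(mean = mean(z), var = var(z)))  # close to mu = 10 and sigma^2 = 4

# Height of the bell curve at its center, x = mu
peak <- dnorm(mu, mean = mu, sd = sigma)
print(c(peak = peak, formula = 1 / (sqrt(2 * pi) * sigma)))
```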

  • Normal Distribution

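    The normal distribution $N(\mu, \sigma^2)$ is defined by

    $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

    Standard normal distribution $N(0, 1)$:

    $\phi(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}x^2}$

    Why is the standard normal distribution important?

    $f(x) = \frac{1}{\sigma}\,\phi\left(\frac{x-\mu}{\sigma}\right)$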
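The identity $f(x) = \phi((x-\mu)/\sigma)/\sigma$ says any normal density is a shifted and rescaled copy of the standard normal density. The R sketch below, with arbitrarily chosen $\mu = 2$ and $\sigma = 3$, evaluates the density three ways (the explicit formula, the standardized form, and `dnorm()`) and checks that they agree. It also checks the matching fact for probabilities, $P(X \le x) = \Phi((x-\mu)/\sigma)$, where $\Phi$ denotes the standard normal distribution function.

```r
mu <- 2
sigma <- 3
x <- seq(-7, 11, by = 0.5)

# 1. Density from the N(mu, sigma^2) formula
f_formula <- exp(-(x - mu)^2 / (2 * sigma^2)) / (sqrt(2 * pi) * sigma)
# 2. Density via the standard normal: f(x) = phi((x - mu) / sigma) / sigma
f_standardized <- dnorm((x - mu) / sigma) / sigma
# 3. R's built-in normal density (note: sd, not variance)
f_builtin <- dnorm(x, mean = mu, sd = sigma)

print(all.equal(f_formula, f_builtin))
print(all.equal(f_standardized, f_builtin))

# The same standardization works for probabilities: P(X <= x) = Phi((x - mu) / sigma)
print(all.equal(pnorm(x, mean = mu, sd = sigma), pnorm((x - mu) / sigma)))
```

All three comparisons print `TRUE`, which is why tables or software for $N(0, 1)$ alone suffice for every normal distribution.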

  • Normal Distribution

    Why is the normal distribution so important?

  • Normal Distribution

    What is this distribution?

  • Central Limit Theorem
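One common way to see the Central Limit Theorem at work is by simulation: averages of many independent rolls of a fair die (a flat, distinctly non-normal population) are approximately normal, with mean $\mu = 3.5$ and standard deviation $\sigma/\sqrt{n}$, where $\sigma = \sqrt{35/12}$ is the standard deviation of a single roll. The sample size n = 30, the number of repetitions and the seed below are arbitrary choices for this sketch.

```r
set.seed(2013)  # arbitrary seed for reproducibility
n <- 30         # rolls averaged in each sample
reps <- 10000   # number of simulated samples

# Each column holds one sample of n fair-die rolls; colMeans gives its X-bar
rolls <- matrix(sample(1:6, n * reps, replace = TRUE), nrow = n)
means <- colMeans(rolls)

mu <- 3.5               # mean of one fair-die roll
sigma <- sqrt(35 / 12)  # standard deviation of one fair-die roll
z <- (means - mu) / (sigma / sqrt(n))  # standardized sample means

print(c(mean = mean(means), sd = sd(means), theory_sd = sigma / sqrt(n)))
# Proportion of standardized means at or below 1 versus the N(0, 1) value
print(c(simulated = mean(z <= 1), normal = pnorm(1)))

# Histogram of z on the density scale with the N(0, 1) curve overlaid
hist(z, breaks = 40, freq = FALSE, main = "Standardized means of 30 die rolls")
curve(dnorm(x), add = TRUE, lwd = 2)
```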

  • Normal Distribution in R

    > dnorm(0, mean = 0, sd = 1) # density
    [1] 0.3989423
    > dnorm(0, mean = 0, sd = 2)
    [1] 0.1994711
    >
    > pnorm(1, mean = 0, sd = 1) # distribution function
    [1] 0.8413447
    > pnorm(1, mean = 0, sd = 1, lower.tail = FALSE)
    [1] 0.1586553
    >
    > qnorm(0.5, mean = 2, sd = 1) # quantile function
    [1] 2
    > qnorm(0, mean = 2, sd = 1)
    [1] -Inf
    >
    > rnorm(5, mean = 0, sd = 1) # random generation
    [1] 2.2867947 1.3311000 1.9408290 -0.5366956 1.1687528
    > rnorm(5)
    [1] -0.48693373 0.02950848 -1.03232990 -0.24314950 -0.42515522
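The four functions follow R's d/p/q/r naming for the density, distribution function, quantile function and random generation. A few identities connect them, sketched below: `lower.tail = FALSE` gives the complementary probability, `qnorm()` inverts `pnorm()`, and `rnorm()` output changes on every call unless the random seed is fixed with `set.seed()` (the seed value here is arbitrary).

```r
# Lower and upper tail probabilities at 1 add to 1
p_lower <- pnorm(1, mean = 0, sd = 1)
p_upper <- pnorm(1, mean = 0, sd = 1, lower.tail = FALSE)
print(p_lower + p_upper)

# qnorm() is the inverse of pnorm(): the quantile at probability p_lower is 1
print(qnorm(p_lower, mean = 0, sd = 1))

# The median of N(2, 1) is its mean, 2
print(qnorm(0.5, mean = 2, sd = 1))

# rnorm() draws differ on each call; resetting the seed reproduces them
set.seed(3022)
draws1 <- rnorm(5)
set.seed(3022)
draws2 <- rnorm(5)
print(identical(draws1, draws2))
```

This is also why the two `rnorm()` calls in the transcript above printed different numbers, and why rerunning them will not reproduce those exact values.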

  • ???
