Upload
herb
View
34
Download
0
Tags:
Embed Size (px)
DESCRIPTION
STA 291 Spring 2010. Lecture 1 Dustin Lueker. Topics. Statistical terminology Descriptive statistics Probability and distribution functions Inferential statistics Estimation (confidence intervals) Hypothesis testing Simple linear regression and correlation. Why study Statistics?. - PowerPoint PPT Presentation
Citation preview
STA 291Spring 2010
Lecture 1Dustin Lueker
Statistical terminology Descriptive statistics Probability and distribution functions Inferential statistics
◦ Estimation (confidence intervals)◦ Hypothesis testing
Simple linear regression and correlation
Topics
STA 291 Spring 2010 Lecture 1
Research in all fields is becoming more quantitative◦ Research journals◦ Most graduates will need to be familiar with basic
statistical methodology and terminology Newspapers, advertising, surveys, etc.
◦ Many statements contain statistical arguments Computers make complex statistical
methods easier to use
Why study Statistics?
STA 291 Spring 2010 Lecture 1
Many times statistics are used in an incorrect and misleading manner
Purposely misused◦ Companies/people wanting to further their
agenda Cooking the data
Completely making up data Massaging the numbers
Altering values to get desired result Incidentally misused
◦ Using inappropriate methods Vital to understand a method before using it
Lies, Damn Lies, and Statistics
STA 291 Spring 2010 Lecture 1
Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data
Applicable to a wide variety of academic disciplines◦ Physical sciences◦ Social sciences◦ Humanities
Statistics are used for making informed decisions◦ Business◦ Government
What is Statistics?
STA 291 Spring 2010 Lecture 1
Design •Planning research studies•How to best obtain the required data•Assuring that our data is representational of the entire population
Description •Summarizing data•Exploring patterns in the data•Extract/condense information
Inference •Make predictions based on the data•‘Infer’ from sample to population•Summarize results
General Statistical Methodology
STA 291 Spring 2010 Lecture 1
Population◦ Total set of all subjects of interest
Entire group of people, animals, products, etc. about which we want information
Elementary Unit◦ Any individual member of the population
Sample◦ Subset of the population from which the study
actually collects information◦ Used to draw conclusions about the whole
population
Basic Terminology
STA 291 Spring 2010 Lecture 1
Variable◦ A characteristic of a unit that can vary among
subjects in the population/sample Ex: gender, nationality, age, income, hair color, height,
disease status, state of residence, grade in STA 291 Parameter
◦ Numerical characteristic of the population Calculated using the whole population
Statistic◦ Numerical characteristic of the sample
Calculated using the sample
Basic Terminology
STA 291 Spring 2010 Lecture 1
Why take a sample? Why not take a census? Why not measure all of the units in the population?◦ Accuracy
May not be able to find every unit in the population◦ Time
Speed of response from units◦ Money◦ Infinite Population◦ Destructive Sampling or Testing
Data Collection and Sampling Theory
STA 291 Spring 2010 Lecture 1
University Health Services at UK conducts a survey about alcohol abuse among students◦ 200 of the students are sampled and asked to
complete a questionnaire◦ One question is “have you regretted something
you did while drinking?” What is the population? What is the sample?
Example
STA 291 Spring 2010 Lecture 1
‘Flavors’ of Statistics Descriptive Statistics
◦ Summarizing the information in a collection of data
Inferential Statistics◦ Using information from a sample to make
conclusions/predictions about the population Ex: using a sample statistic to estimate a population
parameter
STA 291 Spring 2010 Lecture 1
Example The Current Population Survey of about 60,000
households in the United States in 2002 distinguishes three types of families: Married-couple (MC), Female householder and no husband (FH), Male householder and no wife (MH)
It indicated that 5.3% of “MC”, 26.5% of “FH”, and 12.1% of “MH” families have annual income below the poverty level◦ Are these numbers statistics or parameters?
The report says that the percentage of all “FH” families in the USA with income below the poverty level is at least 25.5% but no greater than 27.5%◦ Is this an example of descriptive or inferential statistics?
STA 291 Spring 2010 Lecture 1
Univariate vs. Multivariate Univariate data
◦ Consists of observations on a single attribute Multivariate data
◦ Consists of observations on several attributes Special case
Bivariate Data Consists of observations on two attributes
STA 291 Spring 2010 Lecture 1
Quantitative or Numerical◦ Variable with numerical values associated with
them Qualitative or Categorical
◦ Variables without numerical values associated with them
Scales of Measurement
STA 291 Spring 2010 Lecture 1
Ordinal◦ Disease status, company rating, grade in STA 291
Ordinal variables have a scale of ordered categories, they are often treated in a quantitative manner (A = 4.0, B = 3.0, etc.) One unit can have more of a certain property than
another unit Nominal
◦ Gender, nationality, hair color, state of residence Nominal variables have a scale of unordered
categories It does not make sense to say, for example, that green
hair is greater/higher/better than orange hair
Qualitative Variables
STA 291 Spring 2010 Lecture 1
Quantitative◦ Age, income, height
Quantitative variables are measured numerically, that is, for each subject a number is observed The scale for quantitative variables is called interval
scale
Quantitative Variables
STA 291 Spring 2010 Lecture 1
A study about oral hygiene and periodontal conditions among institutionalized elderly measured the following◦ Nominal (Qualitative): Requires assistance from staff?
Yes No
◦ Ordinal (Qualitative): Plaque score No visible plaque Small amounts of plaque Moderate amounts of plaque Abundant plaque
◦ Interval (Quantitative): Number of teeth
Example
STA 291 Spring 2010 Lecture 1
A birth registry database collects the following information on newborns◦ Birth weight: in grams◦ Infant’s Condition:
Excellent Good Fair Poor
◦ Number of prenatal visits◦ Ethnic background:
African-American Caucasian Hispanic Native American Other
What are the appropriate scales? Quantitative (Interval) Qualitative (Ordinal, Nominal)
Example
STA 291 Spring 2010 Lecture 1
Statistical methods vary for quantitative and qualitative variables
Methods for quantitative data cannot be used to analyze qualitative data
Quantitative variables can be treated in a less quantitative manner◦ Height: measured in cm/in
Interval (Quantitative) Can be treated at Qualitative
Ordinal: Short Average Tall
Nominal: <60in or >72in 60in-72in
Importance of Different Types of Data
STA 291 Spring 2010 Lecture 1
Try to measure variables as detailed as possible◦ Quantitative
More detailed data can be analyzed in further depth
◦ Caution: Sometimes ordinal variables are treated as quantitative (ex: GPA)
Other Notes on Variable Types
STA 291 Spring 2010 Lecture 1
A variable is discrete if it can take on a finite number of values◦ Gender◦ Nationality◦ Hair color◦ Disease status◦ Grade in STA 291◦ Favorite MLB team
Qualitative variables are discrete
Discrete Variables
STA 291 Spring 2010 Lecture 1
Continuous variables can take an infinite continuum of possible real number values◦ Time spent studying for STA 291 per day
43 minutes 2 minutes 27.487 minutes 27.48682 minutes
Can be subdivided into more accurate values Therefore continuous
Continuous Variables
STA 291 Spring 2010 Lecture 1
Number of children in a family Distance a car travels on a tank of gas % grade on an exam
Examples
STA 291 Spring 2010 Lecture 1
Quantitative variables can be discrete or continuous
Age, income, height?◦ Depends on the scale
Age is potentially continuous, but usually measured in years (discrete)
Discrete or Continuous
STA 291 Spring 2010 Lecture 1