LECTURE I Statistics

Two different meanings of statistics:

1. ‘summary numbers resulting from data analysis’
2. ‘procedures used to organize and analyze facts numerically’

Descriptive statistics

• Used when one wants to describe the data that have been collected:

– The mean age in the class is 24.
– The median home price in Orange County is $305,000.
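As a minimal sketch of descriptive statistics, the Python snippet below computes a mean and a median with the standard-library `statistics` module. The ages and home prices are made-up illustrative numbers, not the data behind the examples above; note how one very expensive home pulls the mean far above the median.

```python
from statistics import mean, median

# Hypothetical ages of the students in one class (illustrative only).
ages = [22, 23, 24, 25, 26]

# Hypothetical sale prices of four homes (illustrative only);
# the last one is an unusually expensive outlier.
home_prices = [250_000, 290_000, 320_000, 900_000]

mean_age = mean(ages)                # 24
median_price = median(home_prices)   # 305000.0, average of the two middle values
mean_price = mean(home_prices)       # 440000, pulled up by the outlier

print(f"Mean age: {mean_age}")
print(f"Median home price: ${median_price:,.0f}")
print(f"Mean home price:   ${mean_price:,.0f}")
```

This is one reason a median is often reported for home prices: a few extreme values barely move it.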

Inferential statistics

• Used when one wants to generalize from the collected data, i.e. to make inferences about a larger group based on it:

– Because the mean age of this class is 24, the mean age of all classes is 24.
– Based on data from insurance companies, males under 18 are more likely than females under 18 to get into accidents.

The research process

• Specify research goals

• Review the literature

• Formulate hypotheses

• Measure and record

• Analyze the data

• Invite scrutiny

The W’s of data

• 1. WHO: each observation belongs to somebody or something. If observations come from individuals, those individuals are called ‘subjects’, ‘respondents’, ‘participants’ or ‘cases’. The unit of observation does not have to be an individual, though. If observations come from inanimate objects, they are usually called ‘experimental units’.

Example:

• You collected data on your patients’ ages. Each observation comes from a single patient, so each patient is a ‘case’ or a ‘participant’.

• You collected data from different web sites and recorded how many links each web site has. Here each observation comes from a different web site, so each web site is called an ‘experimental unit’.

• 2. WHAT: The characteristics recorded about each individual are called variables.

• 3. WHERE

• 4. WHEN

• 5. HOW
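One common way to organize the WHO and the WHAT is a table in which each row is one case and each column is one variable. The sketch below assumes hypothetical patient records stored as Python dictionaries; the field names (`patient_id`, `age`, `smoking_status`) are illustrative, not part of the lecture.

```python
# Hypothetical patient records (illustrative only).
# Each dictionary is one observation from one case (WHO);
# every field other than the identifier is a variable (WHAT).
records = [
    {"patient_id": "P01", "age": 34, "smoking_status": "never"},
    {"patient_id": "P02", "age": 51, "smoking_status": "current"},
    {"patient_id": "P03", "age": 29, "smoking_status": "former"},
]

cases = [r["patient_id"] for r in records]
variables = sorted(key for key in records[0] if key != "patient_id")

print("Cases:", cases)          # ['P01', 'P02', 'P03']
print("Variables:", variables)  # ['age', 'smoking_status']
```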

Definitions

• Population – the collection of all elements in the study.
• Census – the collection of data from every element in a population.
• Sample – a sub-collection of elements drawn from the population.
• Random sample – selected in such a way that each element in the population has an equal chance of being selected.
• Sampling frame – a list of the elements in the population.
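The definitions above can be sketched in Python with `random.sample`, which draws elements without replacement so that every element of the sampling frame has the same chance of selection. The population of exam scores is simulated and hypothetical; the census computes the population mean from every element, while the random sample only estimates it.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the illustration is reproducible

# Hypothetical population: exam scores of 1,000 students (simulated).
# The sampling frame here is simply the list of student IDs.
population = {student_id: random.randint(40, 100) for student_id in range(1000)}
sampling_frame = list(population)

# Census: collect data from every element of the population.
census_mean = mean(population.values())

# Random sample: 50 IDs drawn without replacement from the frame.
sample_ids = random.sample(sampling_frame, k=50)
sample_mean = mean(population[i] for i in sample_ids)

print(f"Population mean (census): {census_mean:.1f}")
print(f"Sample mean (n = 50):     {sample_mean:.1f}")
```

The two means will usually be close but not identical; using the sample mean to say something about the population mean is an instance of inferential statistics.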

SAMPLING:

• Sampling is the process by which you select the sample from the population.

• The sample has to be representative; if it is not, it is a biased sample and the results cannot be generalized to the population.
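A small simulation can show why a biased sample does not generalize. The sketch below assumes a hypothetical population of ages and compares a simple random sample with a biased one that only reaches the youngest people (as if a survey about adults were run in a single undergraduate class).

```python
import random
from statistics import mean

random.seed(2)  # fixed seed so the illustration is reproducible

# Hypothetical population: 100 people at each age from 18 to 67.
population = [age for age in range(18, 68) for _ in range(100)]
true_mean = mean(population)  # 42.5

# Representative: a simple random sample of 100 people.
random_sample = random.sample(population, k=100)

# Biased: only the 100 youngest people are reached.
biased_sample = sorted(population)[:100]

print(f"Population mean:    {true_mean}")
print(f"Random-sample mean: {mean(random_sample):.1f}")  # should be near 42.5
print(f"Biased-sample mean: {mean(biased_sample):.1f}")  # 18.0
```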

How to get a representative sample?

• Random sampling

• Systematic sampling

• Stratified sampling

• Cluster sampling

• Convenience sampling

• Simple random sample – n subjects are selected in a way that every possible sample of size n has the same chance of being chosen

• Stratified sample – subdivide the population into at least two subpopulations (strata) whose members share the same characteristic, then draw a sample from each group.

• Systematic sample – select every kth element in the population.

• Cluster sample – divide the population into sections/clusters, then randomly select a few of those sections, and then choose all of the members of the selected sections.

• Convenience sample – use what is readily available
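The five sampling designs can be sketched with Python's `random` module. The population of 100 hypothetical students, the two strata (`year` 1 or 2) and the clusters (classrooms of 10) are illustrative assumptions; the systematic sample starts at a random position among the first k elements, which is a common convention but is not stated on the slide.

```python
import random

random.seed(3)  # fixed seed so the illustration is reproducible

# Hypothetical population of 100 students: an ID, a year (the stratum)
# and a classroom of 10 students (the cluster).
population = [
    {"id": i, "year": 1 if i < 60 else 2, "classroom": i // 10}
    for i in range(100)
]

def simple_random(pop, n):
    # Every possible subset of size n is equally likely.
    return random.sample(pop, n)

def systematic(pop, k):
    # Every kth element, starting at a random position among the first k.
    start = random.randrange(k)
    return pop[start::k]

def stratified(pop, key, n_per_stratum):
    # Split into strata by `key`, then draw a simple random sample from each.
    strata = {}
    for unit in pop:
        strata.setdefault(unit[key], []).append(unit)
    return [u for group in strata.values() for u in random.sample(group, n_per_stratum)]

def cluster(pop, key, n_clusters):
    # Randomly choose whole clusters and keep every member of each.
    labels = sorted({unit[key] for unit in pop})
    chosen = set(random.sample(labels, n_clusters))
    return [u for u in pop if u[key] in chosen]

def convenience(pop, n):
    # Whatever is readily available: here, simply the first n in the list.
    return pop[:n]

print(len(simple_random(population, 10)))             # 10
print(len(systematic(population, 10)))                # 10
print(len(stratified(population, "year", 5)))         # 10
print(len(cluster(population, "classroom", 2)))       # 20
print([u["id"] for u in convenience(population, 3)])  # [0, 1, 2]
```

Only the convenience sample involves no randomness at all, which is why it is the design most likely to be biased.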

What is a variable?

• A variable is anything that can take on different values or amounts across time or across subjects.

* IQ
* size of a classroom
* midterm scores
* depression
* motivation
* drug dosage
* SES (socioeconomic status)
* teaching experience
* race, ethnicity

Hypothesis

• A statement that describes a relationship between at least two variables; these statements are based on either research or personal knowledge.

– The majority of Americans run red lights.
– Research claims that the mean body temperature of healthy adults is not 98.6 °F.

Depending on the context of the research, a variable can play one of the following roles:

• Dependent variable is the variable of main interest. It is observed but not manipulated. It is the variable on which the effect of other variables is investigated.

• Independent variable is the variable whose effect on the dependent variable is investigated.

• Control variable is any variable other than the above that can affect the independent–dependent relationship.

Example:

• Catholics are more likely to vote for Bush.

• DEPENDENT VARIABLE: voting preference.
• INDEPENDENT VARIABLE: religion.

(the way that you vote depends on your religion)

• Control variables:

Context-independent classification of variables:

• Qualitative variables: also known as ‘categorical variables’. Their values differ in kind rather than in amount, and there is no unit of measurement.

• Quantitative variables: the numbers assigned to the values of quantitative variables represent differing amounts of the characteristic being measured.

Which are quantitative, which are qualitative?

• Gender

• IQ scores

• Age

• Ethnicity

• Number of years of experience

• Smoking status

A quantitative variable can be

• Discrete [e.g. number of students in a class, number of kids one has]

• Continuous [time, weight, ability, achievement, IQ]

Summary of types of variables

Types of variables
• Qualitative
• Quantitative
  – Discrete
  – Continuous

SCALES OF MEASUREMENT

• 1. Nominal Scale: The simplest scale. It provides names or labels only. The numbers assigned to the labels are completely arbitrary, so the labels cannot be put in a meaningful order, and there is no magnitude of measurement.

Ex: Gender is measured on nominal scale.

1 = Male, 2 = Female

does not mean Female is bigger, better or stronger.
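A short sketch can make the arbitrariness concrete. Assuming a handful of hypothetical responses, the Python code below shows that counts and the mode do not depend on which numbers are attached to the labels, while the "average code" changes as soon as the arbitrary coding is swapped, so it carries no information.

```python
from collections import Counter
from statistics import mean, mode

# Hypothetical responses on a nominal variable (illustrative only).
genders = ["Male", "Female", "Female", "Male", "Female"]

# Two equally valid, arbitrary codings of the same labels.
coding_a = {"Male": 1, "Female": 2}
coding_b = {"Male": 2, "Female": 1}

counts = Counter(genders)     # frequencies are meaningful
most_common = mode(genders)   # so is the mode: 'Female'

# The "mean code" depends entirely on the arbitrary coding.
mean_a = mean(coding_a[g] for g in genders)  # 1.6
mean_b = mean(coding_b[g] for g in genders)  # 1.4

print(counts, most_common, mean_a, mean_b)
```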

• 2. Ordinal Scale: The numbers assigned on an ordinal scale tell us about the ranking of each observation, so they can be put in a meaningful order.

Ex: How difficult is this course?

1 = not at all difficult
2 = a bit difficult
3 = extremely difficult

BE CAREFUL! THE DIFFERENCES BETWEEN THE NUMBERS ARE MEANINGLESS.
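Because only the order of ordinal codes is meaningful, any order-preserving recoding is just as valid as 1, 2, 3. The Python sketch below uses hypothetical answers from two course sections: comparing the sections by their means gives opposite conclusions under two valid codings, while comparing them by their medians does not.

```python
from statistics import mean, median

# Hypothetical answers to "How difficult is this course?" from two sections,
# coded 1 = not at all, 2 = a bit, 3 = extremely difficult.
section_a = [2, 2, 2, 2, 2]
section_b = [1, 1, 1, 3, 3]

# Another order-preserving coding: the ranking of the categories is the same,
# only the (meaningless) gaps between the numbers change.
stretch = {1: 1, 2: 2, 3: 10}
a2 = [stretch[x] for x in section_a]
b2 = [stretch[x] for x in section_b]

# The comparison of means flips depending on the arbitrary coding...
print(mean(section_a) > mean(section_b))  # True  (2 vs 1.8)
print(mean(a2) > mean(b2))                # False (2 vs 4.6)

# ...but the comparison of medians does not.
print(median(section_a) > median(section_b))  # True (2 vs 1)
print(median(a2) > median(b2))                # True (2 vs 1)
```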

• 3. Interval Scale: The numbers assigned on an interval scale are meaningful, and the differences among the numbers are also meaningful. There is no absolute zero, so the ratios of the numbers are meaningless.

Ex: Temperature is measured on interval scale.

The difference between 30 F and 40 F is equal to the difference between 50 F and 60 F. But 60 F is not twice as hot as 30 F.
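One way to check this numerically is to convert the temperatures to another interval scale. In the Python sketch below (the helper names `f_to_c` and `f_to_k` are illustrative), equal Fahrenheit differences stay equal in Celsius, but the "ratio" 60/30 = 2 turns into about -14 in Celsius, because the zero points of both scales are arbitrary. For contrast, the kelvin scale has an absolute zero, and there 60 F is only about 6% "hotter" than 30 F.

```python
import math

def f_to_c(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

def f_to_k(f):
    """Convert degrees Fahrenheit to kelvins (a scale with an absolute zero)."""
    return f_to_c(f) + 273.15

# Equal differences in Fahrenheit stay equal in Celsius...
diff_low = f_to_c(40) - f_to_c(30)
diff_high = f_to_c(60) - f_to_c(50)
print(math.isclose(diff_low, diff_high))  # True

# ...but the ratio 60/30 = 2 does not survive the change of units.
print(60 / 30)                  # 2.0
print(f_to_c(60) / f_to_c(30))  # about -14.0
print(f_to_k(60) / f_to_k(30))  # about 1.06
```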

• 4. Ratio Scale: The numbers assigned on a ratio scale are meaningful, their differences are meaningful, and their ratios are meaningful, because there is an absolute zero point.

Example: Weight is measured on ratio scale.

40 pounds is twice as heavy as 20 pounds.
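By contrast with temperature, changing the unit of a ratio-scale variable keeps ratios intact, because every unit of weight shares the same zero. A minimal Python sketch, using the exact definition of the international pound (1 lb = 0.45359237 kg):

```python
import math

LB_TO_KG = 0.45359237  # exact definition of the international avoirdupois pound

heavy_lb, light_lb = 40, 20
heavy_kg, light_kg = heavy_lb * LB_TO_KG, light_lb * LB_TO_KG

# Pounds and kilograms share the same absolute zero (no weight),
# so the ratio is the same whichever unit is used.
print(heavy_lb / light_lb)                     # 2.0
print(math.isclose(heavy_kg / light_kg, 2.0))  # True
```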

More examples

• Ice cream flavors - NOMINAL

• The speed of five runners in a 1-mile race, as measured by the runner’s order of finish. 1 for winner, 2 for second, etc. - ORDINAL

• The height above ground level of the floors in a particular 10-story apartment building, as measured by the number of each floor, assuming that the first floor is at ground level. - INTERVAL

• The number of people going to a particular movie theater each night as a measure of the theater’s gross income from ticket sales, assuming each ticket costs $7.00. - RATIO

• Population of all eighth grade students in the US, with X representing the region of the country in which the student lives. 1 = northeast, 2 = north central, 3 = south, and 4 = west. - NOMINAL

• Toss a coin 100 times and X represents the number of heads obtained for each set of 100 tosses. - RATIO

Uses and abuses of statistics

• small samples – even a large sample can be biased
• precise numbers – a statistic that is very precise is not necessarily accurate
• guesstimates – e.g. estimating how many people attended the Million Man March
• distorted percentages
• partial pictures
• deliberate distortions
• loaded questions – e.g. “Since we already have enough nuclear warheads to blow up the world, should more federal money be spent on the defense budget?”
• misleading graphs – see text!
• pictographs – often drawn distorted
• pollster pressure – answering to favor self-image
• bad samples

WHY DID WE LEARN THIS ANYWAY?!?

• Each of your variables is measured on one of these scales, and the type of scale determines which operations (or calculations) you can carry out with the data that represent the variable. For instance, if you measured something on a nominal scale, you cannot take its average!
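The point of this slide can be summarized as a lookup table. The Python sketch below is a hypothetical helper (not from the lecture) that follows the usual textbook hierarchy: each scale allows everything the scales below it allow, plus something new (counts and the mode for nominal data, ranking and the median for ordinal data, differences and the mean for interval data, ratios for ratio data).

```python
# Hypothetical helper: which summaries are meaningful on each scale.
SCALES = ["nominal", "ordinal", "interval", "ratio"]

NEW_OPERATIONS = {
    "nominal": ["counts", "mode"],
    "ordinal": ["ranking", "median"],
    "interval": ["differences", "mean"],
    "ratio": ["ratios"],
}

def permitted_operations(scale: str) -> list[str]:
    """Return every operation that is meaningful for data on `scale`."""
    if scale not in SCALES:
        raise ValueError(f"unknown scale: {scale!r}")
    ops: list[str] = []
    # Each scale inherits the operations of all the scales below it.
    for s in SCALES[: SCALES.index(scale) + 1]:
        ops.extend(NEW_OPERATIONS[s])
    return ops

print("mean" in permitted_operations("nominal"))  # False
print(permitted_operations("interval"))
# ['counts', 'mode', 'ranking', 'median', 'differences', 'mean']
```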