
Descriptive Statistics for one Variable

Variables and measurements

• A variable is a characteristic of an individual or object in which the researcher is interested. For example, the SAT score of a college student.

• For a particular individual or object, the variable takes a value called a measurement. For example, John’s SAT score is 720.

Different Types of Variables

• Some variables are quantitative variables, like the time a person takes to finish a task or the person’s age.

• Other variables are qualitative variables, such as the person’s nationality or the person’s preferred sport.

• In these notes we will work with quantitative variables.

• All the measurements collected from individuals for a particular variable are referred to as the “data”.

• Our data will contain the measurements for only one variable.

Statistics has two major chapters:

• Descriptive Statistics

• Inferential statistics

Statistics

Descriptive Statistics

• Provides numerical and graphic procedures to summarize the information in the data in a clear and understandable way

Inferential Statistics

• Provides procedures to draw inferences about a population from a sample

Population and Samples

The population under study is the set of all individuals of interest for the research.

We will see that, in practice, the variable is measured only for a part of the population.

The part of the population for which we collect measurements is called a sample.

The number of individuals in a sample is denoted by n.

In these notes and examples we will assume that our data correspond to a sample of the population under study.

Descriptive Measures

• Central Tendency measures. They are computed in order to give a “center” around which the measurements in the data are distributed.

• Variation or Variability measures. They describe “data spread” or how far away the measurements are from the center.

• Relative Standing measures. They describe the relative position of a specific measurement in the data.

Measures of Central Tendency

• Mean: Sum of all measurements in the data divided by the number of measurements.

• Median: A number such that at most half of the measurements are below it and at most half of the measurements are above it.

• Mode: The most frequent measurement in the data.

Example of Mean

Measurements (x)    Deviation (x - mean)
3                   -1
5                    1
5                    1
1                   -3
7                    3
2                   -2
6                    2
7                    3
0                   -4
4                    0
Sum: 40             Sum: 0

• MEAN = 40/10 = 4

• Notice that the sum of the “deviations” is 0.

• Notice that every single observation enters into the computation of the mean.
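The computation in the mean example can be reproduced with a few lines of Python. This is only a sketch using the ten measurements listed above; the variable names are illustrative and not part of the original notes.

```python
# Measurements from the mean example above.
measurements = [3, 5, 5, 1, 7, 2, 6, 7, 0, 4]

# Mean: sum of all measurements divided by the number of measurements.
mean = sum(measurements) / len(measurements)

# Deviation of each measurement from the mean.
deviations = [x - mean for x in measurements]

print(mean)             # 4.0
print(sum(deviations))  # 0.0
```

The deviations always sum to zero because the mean balances the measurements on either side of it.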

Example of Median

Measurements (x)    Measurements Ranked (x)
3                   0
5                   1
5                   2
1                   3
7                   4
2                   5
6                   5
7                   6
0                   7
4                   7
Sum: 40             Sum: 40

• Median: (4+5)/2 = 4.5

• Notice that only the two central values are used in the computation.

• The median is not sensitive to extreme values
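As a rough sketch, the median of the example can be computed by ranking the data and taking the middle value, or the average of the two middle values when the count is even. The function name `median` and the extreme value 700 are assumptions made here for illustration only.

```python
def median(values):
    """Middle value of the ranked data (average of the two middle values if n is even)."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2

measurements = [3, 5, 5, 1, 7, 2, 6, 7, 0, 4]
print(sorted(measurements))  # [0, 1, 2, 3, 4, 5, 5, 6, 7, 7]
print(median(measurements))  # 4.5

# Hypothetical variation: replace one of the 7s by an extreme value.
with_outlier = [3, 5, 5, 1, 7, 2, 6, 700, 0, 4]
print(median(with_outlier))                   # 4.5
print(sum(with_outlier) / len(with_outlier))  # 73.3
```

The median stays at 4.5 while the mean jumps from 4 to 73.3, which illustrates why the median is described as not sensitive to extreme values.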

Example of Mode

Measurements (x): 3, 5, 5, 1, 7, 2, 6, 7, 0, 4

• In this case the data have two modes: 5 and 7

• Both measurements are repeated twice

Example of Mode

Measurements (x): 3, 5, 1, 1, 4, 7, 3, 8, 3

• Mode: 3

• Notice that it is possible for a data set not to have any mode, which happens when no measurement is repeated.
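One way to find the mode or modes is to count how often each measurement occurs. The sketch below uses the ten measurements of the first mode example; the helper name `modes`, the choice to return an empty list when nothing is repeated, and the three-value data set are assumptions made for illustration.

```python
from collections import Counter

def modes(values):
    """Return the most frequent values, or [] if no value is repeated."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # convention chosen here: no repeated value means no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([3, 5, 5, 1, 7, 2, 6, 7, 0, 4]))  # [5, 7]

# Hypothetical data in which every measurement occurs once.
print(modes([2, 9, 4]))  # []
```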

Measures of Variability

• Range

• Variance

• Standard Deviation

The Range

• Definition: The range of a data set is the difference between the largest and the smallest measurements in the data.

• To find the range, first order the data from least to greatest. Then subtract the smallest value from the largest value in the set.

• Example: A marathon race was completed by 7 participants. What is the range of the times given in hours below? 2.3 hr, 8.7 hr, 3.5 hr, 5.1 hr, 4.9 hr, 7.1 hr, 4.2 hr

Ordering the data from least to greatest, we get: 2.3, 3.5, 4.2, 4.9, 5.1, 7.1, 8.7. So highest - lowest = 8.7 hr - 2.3 hr = 6.4 hr.

Answer: The range of the marathon times is 6.4 hr.
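The same calculation can be sketched in Python. The variable names are illustrative; the result is rounded to one decimal place because floating-point subtraction of 8.7 and 2.3 is not exact.

```python
# Marathon times in hours, as listed in the example.
times_hr = [2.3, 8.7, 3.5, 5.1, 4.9, 7.1, 4.2]

ranked = sorted(times_hr)            # order from least to greatest
data_range = ranked[-1] - ranked[0]  # largest minus smallest

print(ranked)                # [2.3, 3.5, 4.2, 4.9, 5.1, 7.1, 8.7]
print(round(data_range, 1))  # 6.4
```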

The Range is not Enough

Consider the following examples of data:

1, 1, 1, 1, 8
1, 2, 4, 6, 8
1, 8, 1, 8, 1

In the three cases the range is the same: Range = 7

However, the three series exhibit completely different distributions of values along the range of values.
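A quick check, sketched in Python with illustrative names, confirms that the range alone cannot tell the three series apart.

```python
# The three series from the example.
series = [
    [1, 1, 1, 1, 8],
    [1, 2, 4, 6, 8],
    [1, 8, 1, 8, 1],
]

# Range of each series: largest value minus smallest value.
ranges = [max(s) - min(s) for s in series]
print(ranges)  # [7, 7, 7]
```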

The sample variance

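The variance takes into account the deviations of the data around the mean. The formula for the sample variance is as follows:

$$ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} $$

where $\bar{x}$ is the mean of the data and $n$ is the sample size.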

The standard deviation is the square root of the variance:

$$ s = \sqrt{s^2} = \sqrt{\text{Variance}} $$

Notice that the mean and the standard deviation have the same unit as the measurements.

Variance (for a sample)

• Steps:
– Compute each deviation
– Square each deviation
– Sum all the squares
– Divide by the data size (sample size) minus one: n - 1
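The four steps can be written out as a small Python function. This is a minimal sketch; the function name `sample_variance` is chosen here for illustration. It is applied to the three series from "The Range is not Enough", which share the same range but turn out to have different variances.

```python
def sample_variance(values):
    """Sample variance following the four steps listed above."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]  # compute each deviation
    squares = [d ** 2 for d in deviations]   # square each deviation
    total = sum(squares)                     # sum all the squares
    return total / (n - 1)                   # divide by n - 1

series = [
    [1, 1, 1, 1, 8],
    [1, 2, 4, 6, 8],
    [1, 8, 1, 8, 1],
]

# Rounded to one decimal: approximately 9.8, 8.2 and 14.7.
for s in series:
    print(s, round(sample_variance(s), 1))
```

The function needs at least two measurements, since it divides by n - 1.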

Example of Variance

Measurements (x)    Deviations (x - mean)    Square of deviations
3                   -1                        1
5                    1                        1
5                    1                        1
1                   -3                        9
7                    3                        9
2                   -2                        4
6                    2                        4
7                    3                        9
0                   -4                       16
4                    0                        0
Sum: 40             Sum: 0                   Sum: 54

• Variance = 54/9 = 6

• It is a measure of “spread”.

• Notice that the larger the deviations (positive or negative) the larger the variance

The standard deviation

• It is defined as the square root of the variance

• In the previous example:

• Variance = 6

• Standard deviation = Square root of the variance = Square root of 6 ≈ 2.45

• The standard deviation summarizes the deviations in one number
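These numbers can be checked with Python's standard `statistics` module, whose `variance` and `stdev` functions use the sample formulas with n - 1 in the denominator. The sketch below uses the ten measurements of the variance example.

```python
import statistics

# Measurements from the variance example.
measurements = [3, 5, 5, 1, 7, 2, 6, 7, 0, 4]

variance = statistics.variance(measurements)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(measurements)      # square root of the sample variance

print(float(variance))    # 6.0
print(round(std_dev, 2))  # 2.45
```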

Percentiles

• The p-th percentile is a number such that at most p% of the measurements are below it and at most 100 – p percent of the data are above it.

• For example, if the 85th percentile of a certain data set is 340, then about 15% of the measurements in the data are above 340 and about 85% of the measurements are below 340.

• Notice that the median is the 50th percentile
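The definition can be turned into a direct check: given a candidate number, count how many measurements lie below and above it. The sketch below is only an illustration of the definition as stated in these notes (software packages use several different conventions to compute percentiles); the function name `satisfies_percentile` is an assumption.

```python
def satisfies_percentile(values, candidate, p):
    """True if at most p% of values are below candidate and at most (100 - p)% are above."""
    n = len(values)
    below = sum(1 for x in values if x < candidate)
    above = sum(1 for x in values if x > candidate)
    # Integer comparison of counts against percentages avoids rounding issues.
    return below * 100 <= p * n and above * 100 <= (100 - p) * n

measurements = [3, 5, 5, 1, 7, 2, 6, 7, 0, 4]

print(satisfies_percentile(measurements, 4.5, 50))  # True: 4.5 is the median
print(satisfies_percentile(measurements, 6, 50))    # False: 70% of the data are below 6
```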

Tchebichev’s Rule

The standard deviation can be used to construct an interval enclosing an important percent of the data. In fact, this rule says that for any data set:

• At least 75% of the measurements differ from the mean less than twice the standard deviation.

• At least 89% of the measurements differ from the mean less than three times the standard deviation.

Note: This is a general property called Tchebichev’s Rule: for any k > 1, at least 1 - 1/k² of the observations fall within k standard deviations of the mean. It is true for every data set.
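The bound 1 - 1/k² is easy to evaluate, and it can be compared with what actually happens in a data set. The following sketch uses the ten measurements from the earlier examples; the function names are illustrative assumptions.

```python
import statistics

def tchebichev_bound(k):
    """Minimum fraction of measurements within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

def fraction_within(values, k):
    """Observed fraction of values closer than k standard deviations to the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    inside = sum(1 for x in values if abs(x - mean) < k * sd)
    return inside / len(values)

measurements = [3, 5, 5, 1, 7, 2, 6, 7, 0, 4]

print(tchebichev_bound(2))               # 0.75
print(round(tchebichev_bound(3), 3))     # 0.889
print(fraction_within(measurements, 2))  # 1.0
```

In this small data set all ten measurements lie within two standard deviations of the mean, which is consistent with the guaranteed minimum of 75%.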

Example of Tchebichev’s Rule

Suppose that for a certain data set:

• Mean = 20

• Standard deviation =3

Then:

• At least 75% of the measurements are between 14 and 26

• At least 89% of the measurements are between 11 and 29
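The two intervals come from subtracting and adding k standard deviations to the mean. A minimal sketch, with an illustrative function name:

```python
def interval_around_mean(mean, sd, k):
    """Interval from mean - k*sd to mean + k*sd."""
    return (mean - k * sd, mean + k * sd)

# Values from the example: mean = 20, standard deviation = 3.
print(interval_around_mean(20, 3, 2))  # (14, 26): at least 75% of the measurements
print(interval_around_mean(20, 3, 3))  # (11, 29): at least 89% of the measurements
```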

Further Notes

• When the Mean is greater than the Median the data distribution is typically skewed to the Right.

• When the Median is greater than the Mean the data distribution is typically skewed to the Left.

• When Mean and Median are very close to each other the data distribution is approximately symmetric.

Empirical Rule (68-95-99.7 Rule)

For “Normal Distributions” (data sets whose histograms are bell or mound shaped):

• Approx. 68% of values are within 1 standard deviation of the mean

• Approx. 95% of values are within 2 standard deviations of the mean

• Approx. 99.7% of values are within 3 standard deviations of the mean

Example of Empirical Rule

Suppose that the hourly wages of a certain type of workers have a “normal distribution” (bell shaped histogram). Assume also that the mean is $14 with a standard deviation of $1.5

Then we have:

1 standard deviation = $1.5
2 standard deviations = $3.0
3 standard deviations = $4.5

What does the empirical rule allow us to say?

Solution

The empirical rule allows us to say that:

• Approx. 68% of workers in this occupation earn wages that are within 1 standard deviation of the mean:
– Between $14 - $1.5 and $14 + $1.5
– Between $12.5 and $15.5

• Approx. 95% of workers in this occupation earn wages that are within 2 standard deviations of the mean:
– Between $14 - $3 and $14 + $3
– Between $11.0 and $17.0

• Approx. 99.7% of workers in this occupation earn wages that are within 3 standard deviations of the mean:
– Between $14 - $4.5 and $14 + $4.5
– Between $9.5 and $18.5
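The three intervals in the solution can be reproduced with a short Python sketch. It uses the mean of $14 that appears in the solution's arithmetic and the standard deviation of $1.5; the function and variable names are illustrative assumptions.

```python
def empirical_interval(mean, sd, k):
    """Interval within k standard deviations of the mean."""
    return (mean - k * sd, mean + k * sd)

mean_wage = 14.0  # mean hourly wage used in the solution
sd_wage = 1.5     # standard deviation of hourly wages

# Approximate shares given by the empirical rule for bell shaped data.
shares = {1: "68%", 2: "95%", 3: "99.7%"}

for k, share in shares.items():
    low, high = empirical_interval(mean_wage, sd_wage, k)
    print(f"Approx. {share} of wages are between ${low:.2f} and ${high:.2f}")
```

Unlike Tchebichev's Rule, these percentages are approximations that apply only when the histogram of the data is roughly bell shaped.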
