DISPLAYING DISTRIBUTIONS WITH GRAPHS

Section 1.1

INTRODUCTION TO STEMPLOTS

Video:

https://www.learner.org/courses/againstallodds/unitpages/unit02.html

STEMPLOT (STEM-AND-LEAF PLOT)

GRAPHING QUANTITATIVE DATA - STEMPLOTS

Also referred to as a stem-and-leaf plot.

Gives a quick picture of the shape of a distribution

Includes the actual numerical values in the graph

Works best for:

- Small number of observations (Guideline: 15 – 150)

HOW TO MAKE A STEMPLOT

Separate each observation into a stem (consisting of all but the final, right-most digit) and a leaf (the final digit).

- Stems may have as many digits as needed (can trim if needed)

- Each leaf contains only 1 digit

Write the stems in a vertical column with the smallest at the top, and draw a vertical line at the right of this column

Write each leaf in the row to the right of its stem, increasing in order out from the stem

- Leaves are normally written to the right of the stems, unless making a back-to-back stemplot.
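The steps above can be sketched in Python. This is a minimal illustration rather than a standard routine: the function name `stemplot` and the sample observations are invented for the example, and the sketch assumes non-negative whole-number data.

```python
def stemplot(values):
    """Return the rows of a stemplot for non-negative whole numbers."""
    leaves = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # stem: all but the final digit; leaf: final digit
        leaves.setdefault(stem, []).append(leaf)
    rows = []
    # List every stem from smallest to largest, including empty ones,
    # so that gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = f"{stem} | " + " ".join(str(d) for d in leaves.get(stem, []))
        rows.append(row.rstrip())
    return rows

# Hypothetical observations, invented for this illustration.
sample = [12, 25, 31, 14, 27, 25, 48, 10]
print("\n".join(stemplot(sample)))
```

Because the values are sorted before they are split, the leaves in each row already increase outward from the stem.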

LITERACY IN ISLAMIC COUNTRIES

EX: CREATE A STEMPLOT OF FEMALE LITERACY RATES

You can use your calculator to sort data…as well as many other useful things

2 | 9
3 | 1 8
4 | 6
5 |
6 | 0 3 3
7 | 0 1 1 8
8 | 2 5 6
9 | 9 9 9

The overall pattern of the stemplot is irregular.

- This is often the case when there are very few observations.

There appear to be 2 clusters of data.

Suggests we may want to investigate the variation in literacy.

BACK-TO-BACK STEMPLOTS

Leaves on each side are ordered out from the common stem.

 Female |    | Male
      9 |  2 |
    8 1 |  3 |
      6 |  4 |
        |  5 | 0
  3 3 0 |  6 | 8 8
8 1 1 0 |  7 | 0 8
  6 5 2 |  8 | 3 4 5 9
  9 9 9 |  9 | 2 2 4 5 6
        | 10 | 0 0 0

Be sure to label each set of leaves (here, Female and Male).
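A back-to-back stemplot can be sketched in Python as follows. The function name `back_to_back` is invented for the example. The two lists hold the literacy rates as read from the stemplots in this section; that reading, in particular the three male values of 100 on stem 10, is an assumption about the original slide.

```python
def back_to_back(left, right, left_label, right_label):
    """Return the rows of a back-to-back stemplot with a shared stem column."""
    def group(values):
        leaves = {}
        for v in sorted(values):
            leaves.setdefault(v // 10, []).append(v % 10)
        return leaves

    left_leaves, right_leaves = group(left), group(right)
    stems = range(min(min(left_leaves), min(right_leaves)),
                  max(max(left_leaves), max(right_leaves)) + 1)
    # Width of the left column: widest row of leaves, or the label if longer.
    width = max([len(left_label)] +
                [2 * len(left_leaves.get(s, [])) - 1 for s in stems])
    rows = [f"{left_label:>{width}} |    | {right_label}"]
    for s in stems:
        # Reverse the left leaves so they increase outward from the stem.
        left_part = " ".join(str(d) for d in reversed(left_leaves.get(s, [])))
        right_part = " ".join(str(d) for d in right_leaves.get(s, []))
        rows.append(f"{left_part:>{width}} | {s:>2} | {right_part}".rstrip())
    return rows

# Literacy rates (percent) as read from the stemplots in this section.
female = [29, 31, 38, 46, 60, 63, 63, 70, 71, 71, 78, 82, 85, 86, 99, 99, 99]
male = [50, 68, 68, 70, 78, 83, 84, 85, 89, 92, 92, 94, 95, 96, 100, 100, 100]
print("\n".join(back_to_back(female, male, "Female", "Male")))
```

Sharing one column of stems is what makes the two distributions directly comparable row by row.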

STEMPLOT GUIDELINES

Stemplots do not work well for large data sets where each stem must hold a large number of leaves.

- becomes cluttered, loses shape

Two helpful modifications:

- Splitting stems into 2 (0-4 and 5-9) or 5 (0-1, 2-3, 4-5, 6-7, and 8-9)

- Trimming: when observed values have many digits, remove the last digit or digits before making the stemplot (trim, do not round)

Have to use your best judgment in deciding when to split or trim.

- Keep in mind that the purpose of the stemplot is to display the shape of a distribution.

- Statistical software will often make this decision for you.
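Trimming and splitting can be combined in one small Python sketch. The function name `trimmed_split_stemplot`, its parameters, and the sample values are all invented for illustration; the values are not the actual tuition figures. Trimming is done with integer division, so digits are dropped rather than rounded.

```python
def trimmed_split_stemplot(values, leaf_unit, splits=1):
    """Stemplot rows after trimming to `leaf_unit` and splitting each stem.

    splits=1 keeps stems whole, 2 uses leaves 0-4 / 5-9,
    5 uses leaves 0-1 / 2-3 / 4-5 / 6-7 / 8-9.
    Assumes non-negative whole-number values.
    """
    if splits not in (1, 2, 5):
        raise ValueError("splits must be 1, 2, or 5")
    per_row = 10 // splits                 # how many leaf digits each row covers
    rows = {}
    for v in sorted(values):
        trimmed = v // leaf_unit           # trim (truncate), never round
        stem, leaf = divmod(trimmed, 10)
        rows.setdefault((stem, leaf // per_row), []).append(leaf)
    last = max(rows)
    stem, part = min(rows)
    out = []
    while (stem, part) <= last:            # walk every row, including empty ones
        leaves = "".join(str(d) for d in rows.get((stem, part), []))
        out.append(f"{stem} | {leaves}".rstrip())
        part += 1
        if part == splits:
            stem, part = stem + 1, 0
    return out

# Hypothetical dollar amounts, invented for this illustration.
amounts = [9870, 10250, 11999, 12400, 15600, 21050, 34800]
print("\n".join(trimmed_split_stemplot(amounts, leaf_unit=1000, splits=2)))
```

Note that 11999 is trimmed to stem 1, leaf 1; rounding would have given leaf 2 instead.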

VIRGINIA TUITION FOR 06-07 SCHOOL YEAR

Goal: Trim data and make a stemplot

Minimum number of digits for a stemplot: 2

Here we will use ten-thousands for the stem and thousands for the leaf; we base this on the smallest value.

0 | 9
1 | 0122223444444556667788999
2 | 1111222225
3 | 4

Ok, let’s split the stems into two (0-4 and 5-9)

0 |
0 | 9
1 | 0122223444444
1 | 556667788999
2 | 111122222
2 | 5
3 | 4
3 |

Now split stems into five (0-1, 2-3, 4-5, 6-7, and 8-9)

0 | 9
1 | 01
1 | 22223
1 | 44444455
1 | 66677
1 | 88999
2 | 1111
2 | 22222
2 | 5
2 |
2 |
3 |
3 |
3 | 4

Stemplot Examples

INTRODUCTION TO HISTOGRAMS

Video:

https://www.learner.org/courses/againstallodds/unitpages/unit03.html

HISTOGRAMS

Stemplots:

• display the actual values of the observations, making them awkward for large data sets.

• divide the observations into groups (stems) determined by the number system, not by judgment.

Histograms do not have these limitations.

A histogram breaks the range of values into classes (quantitative) and displays only the count or percent of the observations that fall into each class.

You can choose any convenient number of classes (usually 5-8) but you must always choose classes of equal width.

Too few classes gives a "skyscraper" graph, with all values in a few classes with tall bars.

Too many classes gives a "pancake" graph, with most classes holding one or no observations.

HISTOGRAMS (CONT)

Good to know: Histograms are slow to construct by hand

They also do not display the actual values observed

Because of this we use stemplots for small data sets

CREATING A HISTOGRAM

Determine a starting point for the classes and a common width.

First find the range of the values

Then determine a convenient starting point and width.

Count the number of individual observations in each class (this will determine the height of the class).

Graph the histogram (Be sure to leave no space between classes, unless a class is empty).
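The counting step can be sketched in Python. The function name `class_counts` and the data are invented for the example. The sketch also assumes a convention the slides do not state: each class includes its left endpoint but not its right.

```python
def class_counts(values, start, width):
    """Count observations per equal-width class.

    Assumed convention: a class [lo, hi) includes lo but not hi.
    """
    if start > min(values):
        raise ValueError("start must not exceed the smallest observation")
    n_classes = int((max(values) - start) // width) + 1
    counts = [0] * n_classes
    for v in values:
        counts[int((v - start) // width)] += 1
    return [(start + i * width, start + (i + 1) * width, c)
            for i, c in enumerate(counts)]

# Hypothetical observations, invented for this illustration.
data = [12, 15, 21, 22, 22, 28, 33, 35, 36, 51]
print("range:", min(data), "to", max(data))
for lo, hi, count in class_counts(data, start=10, width=10):
    # One '#' per observation; the empty class shows up as a gap.
    print(f"{lo:>3} to <{hi:<3} {'#' * count}")
```

Each count becomes the height of one bar, and the empty 40 to 50 class appears as the only gap between bars.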

Create a histogram based on the data below

Data range is 81 - 145

Start at 75, go up by 10
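For the starting point and width chosen here, the class boundaries can be listed with a short Python sketch. The function name `class_edges` is invented for the example, and the sketch assumes each class includes its left endpoint but not its right, so the largest value (145) needs a class that begins at 145.

```python
def class_edges(data_min, data_max, start, width):
    """Return equal-width class boundaries that cover data_min..data_max."""
    if start > data_min:
        raise ValueError("start must not exceed the smallest observation")
    edges = [start]
    # Keep adding boundaries until the largest value lies below the last one.
    while edges[-1] <= data_max:
        edges.append(edges[-1] + width)
    return edges

edges = class_edges(81, 145, start=75, width=10)
classes = list(zip(edges[:-1], edges[1:]))
for lo, hi in classes:
    print(f"{lo} to <{hi}")
print(len(classes), "classes")
```

Under that assumption the choice yields 8 classes, which sits within the usual 5-8 guideline.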

CLASS AND COUNTS

HISTOGRAMS VS. BAR GRAPHS

Histograms resemble bar graphs, but are very different.

Bar Graphs are for a categorical variable.

Histograms are for a quantitative variable.

Bar Graphs have a blank space to separate the items being compared.

Histograms have no space (unless there is an empty class).

This indicates that all values of the variable are covered.

DESCRIBING DISTRIBUTIONS

When you are asked to describe a distribution, you are being asked to describe its overall pattern.

Look for an overall pattern and for any striking deviations from that pattern.

The overall pattern consists of: Shape, Center, and Spread.

An important deviation from the pattern is an Outlier. This is an individual value that falls outside the overall pattern.

For now, use your best judgement to determine if an individual is an outlier. Later we will learn a specific set of guidelines for determining outliers.

DESCRIBING DISTRIBUTIONS

Use SOCS or CUSS to help you remember the steps in describing distributions.

Shape, Outliers, Center, Spread

or...

Center, Unusual, Shape, Spread

We will be using this ALL year☺

CENTER

For now we will describe the center of a distribution by its midpoint:

the value with about half of the observations taking smaller values and half taking larger values.

There is usually a peak associated with the center of the distribution (when unimodal).

Later we will learn additional measures of center.

[Figure: two example distributions with the center marked]
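One way to make the midpoint concrete is sketched below in Python. The function name `midpoint` is invented for the example, and averaging the two middle values when the count is even is an assumed convention, not something the slides specify. The data are the female literacy rates as read from the stemplot earlier in this section.

```python
def midpoint(values):
    """Return the middle value of the sorted observations."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the middle pair

# Female literacy rates (percent) as read from the stemplot in this section.
female = [29, 31, 38, 46, 60, 63, 63, 70, 71, 71, 78, 82, 85, 86, 99, 99, 99]
print("midpoint:", midpoint(female))
```

With 17 observations the midpoint is the 9th value in sorted order, which leaves 8 observations on each side.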

UNUSUAL / OUTLIERS

For now, identifying outliers is a matter of judgement.

Look for points that are clearly apart from the body of the data, not just the largest or smallest value.

Later we will learn an objective process for determining if an outlier is present.

[Figure: two example plots, one labeled "Probably not an outlier" and one labeled "Probably an outlier"]

SHAPE

Stemplots and histograms display the shape of data in the same way.

Imagine the stemplot on its side so that the larger values lie to the right.

Peaks: Does the distribution have one or several major peaks (modes)?

[Figure: example distributions labeled Unimodal, Bimodal, and Multimodal]

SHAPE (CONT.)

Skew: Is the distribution approximately symmetric or is it skewed in one direction?

[Figure: example distributions labeled Symmetric, Left Skewed, and Right Skewed]

SHAPE (CONT. AGAIN)

• Some variables commonly have distributions with predictable shapes.

• Biological measurements on specimens from the same species and sex often have symmetric distributions.

• Salaries, savings, and home prices often have right-skewed distributions.

SPREAD

The spread of a distribution can be described by giving the smallest (minimum) and largest (maximum) values.

Later we will learn additional ways to describe the spread.

[Figure: example distribution; spread is from 40 to 100]
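Describing spread this way takes only the minimum and maximum, as in this short Python sketch. The function name `spread` is invented for the example, and the data are the female literacy rates as read from the stemplot earlier in this section, not the distribution in the figure.

```python
def spread(values):
    """Describe spread by the smallest and largest observations."""
    return min(values), max(values)

# Female literacy rates (percent) as read from the stemplot in this section.
female = [29, 31, 38, 46, 60, 63, 63, 70, 71, 71, 78, 82, 85, 86, 99, 99, 99]
low, high = spread(female)
print(f"Spread is from {low} to {high}")
```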
