
Statistical Data Presentation Tools


Page 1: Statistical Data Presentation Tools

Descriptive statistics enable us to understand data through summary values and graphical presentations. Summary values not only include the average, but also the spread, median, mode, range, and standard deviation.

It is important to look at summary statistics along with the data set to understand the entire picture, as the same summary statistics may describe very different data sets. Descriptive statistics can be illustrated in an understandable fashion by presenting them graphically using statistical and data presentation tools.
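As a rough illustration of why summary values should be read together, the sketch below computes a few of them for two made-up data sets (the numbers are illustrative assumptions, not taken from the slides). Both sets share the same mean and median, yet their range and standard deviation differ widely.

```python
import statistics


def summarize(values):
    """Return a few common summary values for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


# Made-up illustrative data sets (assumed for this sketch only).
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

for name, data in (("tight", tight), ("wide", wide)):
    print(name, summarize(data))
```

Looking only at the mean or median here would hide the difference between the two sets; the spread measures reveal it.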

STATISTICAL / DATA PRESENTATION TOOLS

Page 2: Statistical Data Presentation Tools

When creating graphic displays, keep in mind the following questions:

• What am I trying to communicate?
• Who is my audience?
• What might prevent them from understanding this display?
• Does the display tell the entire story?

STATISTICAL / DATA PRESENTATION TOOLS

Page 3: Statistical Data Presentation Tools

Several types of statistical/data presentation tools exist, including:

(a) charts displaying frequencies (bar, pie, and Pareto charts), (b) charts displaying trends (run and control charts), (c) charts displaying distributions (histograms), and (d) charts displaying associations (scatter diagrams).

Different types of data require different kinds of statistical tools. There are two types of data. Attribute data are countable data or data that can be put into categories: e.g., the number of people willing to pay, the number of complaints, percentage who want blue/percentage who want red/percentage who want yellow. Variable data are measurement data, based on some continuous scale: e.g., length, time, cost.

STATISTICAL / DATA PRESENTATION TOOLS

Page 4: Statistical Data Presentation Tools

To Show | Use | Data Needed
Frequency of occurrence: simple percentages or comparisons of magnitude | Bar chart, pie chart, Pareto chart | Tallies by category (data can be attribute data or variable data divided into categories)
Trends over time | Line graph, run chart, control chart | Measurements taken in chronological order (attribute or variable data can be used)
Distribution: variation not related to time | Histogram | Forty or more measurements (not necessarily in chronological order; variable data)
Association: looking for a correlation between two things | Scatter diagram | Forty or more paired measurements (measures of both things of interest; variable data)

CHOOSING DATA DISPLAY TOOLS

Page 5: Statistical Data Presentation Tools

In descriptive statistics, a box plot is a convenient way of graphically depicting groups of numerical data through their five-number summaries:

• the smallest observation (sample minimum),
• lower quartile (Q1),
• median (Q2),
• upper quartile (Q3),
• and largest observation (sample maximum).

A boxplot may also indicate which observations, if any, might be considered outliers. Boxplots display differences between populations without making any assumptions of the underlying statistical distribution: they are non-parametric.

The spacings between the different parts of the box help indicate the degree of dispersion (spread) and skewness in the data, and identify outliers. Boxplots can be drawn either horizontally or vertically.
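The five-number summary behind a box plot can be sketched in a few lines. The data are the sorted values used in the stemplot example later in this deck. Two choices here are assumptions rather than anything the slides specify: the quartiles use the "inclusive" method of Python's `statistics.quantiles`, and outliers are flagged with the common 1.5 × IQR convention. Other quartile definitions give slightly different results.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    data = sorted(values)
    # Assumption: "inclusive" quartiles; other definitions exist.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]


def iqr_outliers(values, k=1.5):
    """Flag values more than k * IQR outside the quartiles (a common convention)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


data = [44, 46, 47, 49, 63, 64, 66, 68, 68, 72, 72, 75, 76, 81, 84, 88, 106]
print(five_number_summary(data))
print(iqr_outliers(data))
```

With these assumptions, the value 106 lies beyond the upper fence and would be drawn as a separate outlier point.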

BOXPLOT

Page 6: Statistical Data Presentation Tools

BOXPLOT

Page 7: Statistical Data Presentation Tools

A stemplot (or stem-and-leaf plot), in statistics, is a device for presenting quantitative data in a graphical format, similar to a histogram, to assist in visualizing the shape of a distribution.

Unlike histograms, stemplots retain the original data to at least two significant digits, and put the data in order, thereby easing the move to order-based inference and non-parametric statistics.

A basic stemplot contains two columns separated by a vertical line. The left column contains the stems and the right column contains the leaves.

STEMPLOT

Stem and Leaf Graph used for Japanese Train Time Table

Page 8: Statistical Data Presentation Tools

CONSTRUCTING A STEMPLOT

To construct a stem plot, the observations must first be sorted in ascending order. Here is the sorted set of data values that will be used in the following example:

44 46 47 49 63 64 66 68 68 72 72 75 76 81 84 88 106

Next, we must determine what the stems will represent and what the leaves will represent.

Typically, the leaf contains the last digit of the number and the stem contains all of the other digits. In the case of very large numbers, the data values may be rounded to a particular place value (such as the hundreds place) that will be used for the leaves. The remaining digits to the left of the rounded place value are used as the stem.

In this example, the leaf represents the ones place and the stem will represent the rest of the number (tens place and higher).

Page 9: Statistical Data Presentation Tools

The stemplot is drawn with two columns separated by a vertical line. The stems are listed to the left of the vertical line. It is important that each stem is listed only once and that no numbers are skipped, even if it means that some stems have no leaves. The leaves are listed in increasing order in a row to the right of each stem. When a value is repeated in the data (such as two 44s), the plot must show each occurrence: with the numbers 44 44 46 47 49, the row would read 4 | 4 4 6 7 9.

 4 | 4 6 7 9
 5 |
 6 | 3 4 6 8 8
 7 | 2 2 5 6
 8 | 1 4 8
 9 |
10 | 6

key: 6|3=63
leaf unit: 1.0
stem unit: 10.0
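The construction steps above can be sketched as a short function. This is a minimal sketch that assumes non-negative integers with a leaf unit of 1 and a stem unit of 10; the function name `stemplot` is hypothetical.

```python
from collections import defaultdict


def stemplot(values):
    """Return stemplot rows for non-negative integers (leaf = ones digit)."""
    leaves = defaultdict(list)
    for v in sorted(values):  # sorting keeps the leaves in increasing order
        stem, leaf = divmod(v, 10)
        leaves[stem].append(str(leaf))
    lowest, highest = min(leaves), max(leaves)
    width = len(str(highest))
    rows = []
    # List every stem once, including stems that have no leaves.
    for stem in range(lowest, highest + 1):
        row = f"{stem:>{width}} | " + " ".join(leaves.get(stem, []))
        rows.append(row.rstrip())
    return rows


data = [44, 46, 47, 49, 63, 64, 66, 68, 68, 72, 72, 75, 76, 81, 84, 88, 106]
print("\n".join(stemplot(data)))
```

Iterating over the full range of stems, rather than only those present in the data, is what keeps the empty stems 5 and 9 in the plot.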

CONSTRUCTING A STEMPLOT

Page 10: Statistical Data Presentation Tools

Rounding may be needed to create a stemplot. Based on the following set of data, the stemplot below would be created:

-23.678758, -12.45, -3.4, 4.43, 5.5, 5.678, 16.87, 24.7, 56.8

For negative numbers, a negative sign is placed in front of the stem, which is still the value X / 10 (ignoring the remainder). Non-integers are rounded. This allows the stem-and-leaf plot to retain its shape, even for more complicated data sets, as in the example below:

-2 | 4
-1 | 2
-0 | 3
 0 | 4 6 6
 1 | 7
 2 | 5
 3 |
 4 |
 5 | 7
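A rough sketch of the rounding and negative-stem handling follows. It rests on assumptions the slides do not spell out: halves are rounded away from zero (so 5.5 becomes 6), values from -9 to -1 go under a separate "-0" stem, and leaves within a stem are listed in increasing digit order. The helper names are hypothetical.

```python
import math


def round_half_up(x):
    """Round to the nearest integer, with halves rounded away from zero."""
    magnitude = math.floor(abs(x) + 0.5)
    return magnitude if x >= 0 else -magnitude


def signed_stemplot(values):
    """Return stemplot rows for rounded values that may be negative."""
    rows = {}
    for v in sorted(round_half_up(x) for x in values):
        tens, leaf = divmod(abs(v), 10)
        # Negative values get their own stems, so "-0" sits just below "0".
        position = tens if v >= 0 else -tens - 1
        rows.setdefault(position, []).append(leaf)
    lines = []
    for position in range(min(rows), max(rows) + 1):
        label = str(position) if position >= 0 else f"-{-position - 1}"
        leaves = " ".join(str(leaf) for leaf in sorted(rows.get(position, [])))
        lines.append(f"{label:>2} | {leaves}".rstrip())
    return lines


data = [-23.678758, -12.45, -3.4, 4.43, 5.5, 5.678, 16.87, 24.7, 56.8]
print("\n".join(signed_stemplot(data)))
```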

CONSTRUCTING A STEMPLOT

Page 11: Statistical Data Presentation Tools

A scatter plot or scattergraph is a type of mathematical diagram using Cartesian coordinates to display values for two variables for a set of data.

The data is displayed as a collection of points, each having the value of one variable determining the position on the horizontal axis and the value of the other variable determining the position on the vertical axis. This kind of plot is also called a scatter chart, scatter diagram and scatter graph.

SCATTER PLOT

Page 12: Statistical Data Presentation Tools

A scatter plot is often used when one variable is under the control of the experimenter. A parameter that the experimenter systematically increments or decrements is called the control parameter or independent variable and is customarily plotted along the horizontal axis. The measured or dependent variable is customarily plotted along the vertical axis. If no dependent variable exists, either variable can be plotted on either axis, and the scatter plot illustrates only the degree of correlation (not causation) between the two variables.
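The degree of correlation that a scatter plot suggests visually can also be put into a number. The slides do not say which measure they have in mind; the sketch below assumes Pearson's correlation coefficient and uses made-up paired measurements purely for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up paired measurements (assumed for this sketch only).
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]
print(round(pearson_r(x_values, y_values), 3))
```

A value near +1 or -1 indicates a strong linear association, and a value near 0 a weak one. As the text notes, neither says anything about causation.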

SCATTER PLOT

Page 13: Statistical Data Presentation Tools

A 3D scatter plot allows for the visualization of multivariate data of up to four dimensions. The scatter plot takes multiple scalar variables and uses them for different axes in phase space. The different variables are combined to form coordinates in the phase space, which are displayed using glyphs and colored using another scalar variable.

SCATTER PLOT

Page 14: Statistical Data Presentation Tools

In statistics, a histogram is a graphical representation of the distribution of experimental data. It is an estimate of the probability distribution of a continuous variable. A histogram consists of tabular frequencies, shown as adjacent rectangles erected over discrete intervals (bins), each with an area equal to the frequency of the observations in the interval. The height of a rectangle is therefore equal to the frequency density of the interval, i.e., the frequency divided by the width of the interval. The total area of the histogram is equal to the number of data points. A histogram may also be normalized to display relative frequencies. It then shows the proportion of cases that fall into each of several categories, with the total area equal to 1.

The categories are usually specified as consecutive, non-overlapping intervals of a variable. The categories (intervals) must be adjacent, and often are chosen to be of the same size.
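The relationship between frequency, bin width, and rectangle height can be checked with a small sketch. The values are those from the stemplot example in this deck; the bin edges are an assumption chosen to show unequal widths, and each bin is treated as half-open (left edge included, right edge excluded).

```python
def histogram(values, edges):
    """Count values per bin and compute each bin's frequency density."""
    bins = []
    for left, right in zip(edges, edges[1:]):
        count = sum(1 for v in values if left <= v < right)
        width = right - left
        # Height of the rectangle = frequency / bin width.
        bins.append({"interval": (left, right), "count": count,
                     "width": width, "density": count / width})
    return bins


data = [44, 46, 47, 49, 63, 64, 66, 68, 68, 72, 72, 75, 76, 81, 84, 88, 106]
edges = [40, 60, 70, 80, 110]  # assumed bin edges, unequal widths

bins = histogram(data, edges)
total_area = sum(b["density"] * b["width"] for b in bins)
normalized_area = sum(b["density"] / len(data) * b["width"] for b in bins)

for b in bins:
    print(b["interval"], b["count"], round(b["density"], 3))
print(total_area, normalized_area)
```

The total area comes out as the number of data points, and dividing every height by that number gives the normalized histogram with total area 1.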

HISTOGRAM

Page 15: Statistical Data Presentation Tools

HISTOGRAM

Histograms are used to plot the density of data, and often for density estimation: estimating the probability density function of the underlying variable. The total area of a histogram used for probability density is always normalized to 1. If the intervals on the x-axis all have width 1, then such a histogram is identical to a relative frequency plot.

Page 16: Statistical Data Presentation Tools

BAR CHART

A bar chart or bar graph is a chart with rectangular bars whose lengths are proportional to the values that they represent. The bars can be plotted vertically or horizontally.

Page 17: Statistical Data Presentation Tools

Bar charts are used for plotting discrete (or 'discontinuous') data, i.e., data that take distinct values rather than values on a continuous scale.

Some examples of discontinuous data include 'shoe size' or 'eye color', for which you would use a bar chart.

In contrast, some examples of continuous data would be 'height' or 'weight'.

A bar chart can also be used to present continuous data once the measurements have been divided into categories.

BAR CHART

Page 18: Statistical Data Presentation Tools

A pie chart (or a circle graph) is a circular chart divided into sectors, illustrating proportion. In a pie chart, the arc length of each sector (and consequently its central angle and area) is proportional to the quantity it represents. If angles are measured in turns, a share of n percent corresponds to n centiturns (hundredths of a full turn). Together, the sectors create a full disk. It is named for its resemblance to a pie that has been sliced.
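The proportionality between quantity and central angle is simple arithmetic, as the following sketch shows. The category counts are made up for illustration, and the function name `sector_angles` is hypothetical.

```python
def sector_angles(quantities):
    """Map each category to its central angle in degrees."""
    total = sum(quantities.values())
    return {label: 360 * value / total for label, value in quantities.items()}


# Made-up tallies (assumed for this sketch only).
preferences = {"blue": 50, "red": 30, "yellow": 20}
angles = sector_angles(preferences)
print(angles)
print(sum(angles.values()))
```

Each sector's angle is its share of the total multiplied by 360 degrees, so the angles always add up to a full circle.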

PIE CHART

Pie chart of populations of English native speakers

Page 19: Statistical Data Presentation Tools

DOT PLOTS

A dot chart or dot plot is a statistical chart consisting of a group of data points plotted on a simple scale. Dot plots are used for continuous, quantitative, univariate data. Data points may be labelled if there are few of them.

Dot plots are one of the simplest statistical plots, and are suitable for small to moderate sized data sets. They are useful for highlighting clusters and gaps, as well as outliers.

Their other advantage is the conservation of numerical information. When dealing with larger data sets (around 20–30 or more data points) the related stemplot, box plot or histogram may be more efficient, as dot plots may become too cluttered after this point.
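A text-only dot plot can be sketched by stacking one mark per observation above each value on the scale. This minimal sketch assumes single-digit integer values so that the columns line up; the data are made up for illustration, and the function name `dot_plot` is hypothetical.

```python
from collections import Counter


def dot_plot(values):
    """Return rows of a text dot plot for single-digit integer values."""
    counts = Counter(values)
    lowest, highest = min(counts), max(counts)
    tallest = max(counts.values())
    scale = range(lowest, highest + 1)
    rows = []
    # Build the stacks from the top level down to the axis.
    for level in range(tallest, 0, -1):
        row = " ".join("o" if counts[x] >= level else " " for x in scale)
        rows.append(row.rstrip())
    rows.append(" ".join(str(x) for x in scale))  # the axis labels
    return rows


# Made-up observations (assumed for this sketch only).
observations = [0, 1, 1, 3, 3, 3, 4]
print("\n".join(dot_plot(observations)))
```

Even in this small example the cluster at 3 and the gap at 2 are easy to see, and every individual observation remains visible.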

Page 20: Statistical Data Presentation Tools

DOT PLOTS

A dot plot of 50 random values from 0 to 9.

Page 21: Statistical Data Presentation Tools

LINE GRAPHS

A line chart or line graph is a type of graph that displays information as a series of data points connected by straight line segments. It is a basic type of chart common in many fields. It is an extension of a scatter graph, created by connecting a series of points that represent individual measurements with line segments. A line chart is often used to visualize a trend in data over intervals of time (a time series), so the line is often drawn chronologically.