Chapter 2: Organizing and Visualizing Variables

Dr. Joerg Wild

Henan University of Technology

Statistics Chapter 2: Organizing and Visualizing Variables

Table of contents

1 Organizing Categorical Variables

2 Organizing Numerical Variables

3 Visualizing Categorical Variables

4 Visualizing Numerical Variables

5 Visualizing Two Numerical Variables

6 The Challenge in Organizing and Visualizing Variables

Organizing Categorical Variables

Categorical Data Are Organized By Utilizing Tables

Organizing Categorical Data: Summary Table

A summary table tallies the frequencies or percentages of items in a set of categories so that you can see differences between categories.

Figure 1: Main Reason Young Adults Shop Online
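
As a minimal sketch of how a summary table can be tallied, the Python snippet below counts categories and converts the counts to percentages. The responses are hypothetical and only illustrate the mechanics; they are not the data behind Figure 1.

```python
from collections import Counter

# Hypothetical survey responses (illustrative only, not the data behind Figure 1).
responses = ["Price", "Convenience", "Price", "Selection", "Price",
             "Convenience", "Price", "Selection", "Convenience", "Price"]

frequencies = Counter(responses)
total = sum(frequencies.values())

# One row per category: (frequency, percentage of all responses).
summary_table = {
    category: (count, 100 * count / total)
    for category, count in frequencies.most_common()
}

for category, (count, percent) in summary_table.items():
    print(f"{category:<12} {count:>3} {percent:>6.1f}%")
```

Sorting the rows by frequency, as most_common() does here, makes the differences between categories easier to see.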

Summary Table

The sample of 316 retirement funds for the “Choice Is Yours” scenario includes the variable risk that has the defined categories Low, Average, and High.

Figure 2: Summary Table of Levels of Risk of Retirement Funds

Contingency Table

Adding a new dimension, the table below presents the completed contingency table after all 316 funds have been tallied. This table shows that there are 143 retirement funds that have the fund type Growth and risk level Low. In summarizing all six joint responses, the table reveals that Growth and Low is the most frequent joint response in the sample of 316 retirement funds.

Figure 3: Contingency Table Displaying Fund Type and Risk Level
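
A contingency table can be built by tallying each joint response, that is, each pair of categories. The sketch below does this in Python for a handful of hypothetical (fund type, risk level) records. The records are made up for illustration and are not the 316 funds from the sample.

```python
from collections import Counter

# Hypothetical (fund type, risk level) records; NOT the 316 funds from the sample.
funds = [("Growth", "Low"), ("Growth", "Low"), ("Value", "Low"),
         ("Growth", "Average"), ("Value", "Average"), ("Growth", "Low"),
         ("Value", "High"), ("Growth", "Average")]

fund_types = ["Growth", "Value"]
risk_levels = ["Low", "Average", "High"]

# Tally every joint response (pair of categories).
joint_counts = Counter(funds)

# Print the table with fund types as rows and risk levels as columns.
print(f"{'':<8}" + "".join(f"{r:>9}" for r in risk_levels) + f"{'Total':>9}")
for fund_type in fund_types:
    row = [joint_counts[(fund_type, r)] for r in risk_levels]
    print(f"{fund_type:<8}" + "".join(f"{n:>9}" for n in row) + f"{sum(row):>9}")

# The most frequent joint response in this hypothetical data.
most_frequent, count = joint_counts.most_common(1)[0]
print("Most frequent joint response:", most_frequent, count)
```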

Contingency Table

Figure 4: Contingency Table Displaying Fund Type and Risk Level, Based on Percentage of Overall Total

Figure 5: Contingency Table Displaying Fund Type and Risk Level, Based on Percentage of Row Total

Contingency Table

Figure 6: Contingency Table Displaying Fund Type and Risk Level, Based on Percentage of Column Total
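
The three percentage versions of a contingency table differ only in the total used as the divisor: the overall total, the row total, or the column total. The following Python sketch shows this with hypothetical counts (not the counts from the retirement funds sample); the function name percentages is an illustrative choice.

```python
# Hypothetical counts (illustrative only): rows are fund types, columns are risk levels.
counts = {
    "Growth": {"Low": 6, "Average": 3, "High": 1},
    "Value": {"Low": 2, "Average": 3, "High": 1},
}
risk_levels = ["Low", "Average", "High"]

row_totals = {t: sum(row.values()) for t, row in counts.items()}
column_totals = {r: sum(counts[t][r] for t in counts) for r in risk_levels}
overall_total = sum(row_totals.values())


def percentages(basis):
    """Return each cell as a percentage of the overall, row, or column total."""
    table = {}
    for fund_type in counts:
        for risk in risk_levels:
            if basis == "overall":
                divisor = overall_total
            elif basis == "row":
                divisor = row_totals[fund_type]
            elif basis == "column":
                divisor = column_totals[risk]
            else:
                raise ValueError(f"unknown basis: {basis}")
            table[(fund_type, risk)] = 100 * counts[fund_type][risk] / divisor
    return table


for basis in ("overall", "row", "column"):
    cell = percentages(basis)[("Growth", "Low")]
    print(f"Growth and Low as % of {basis} total: {cell:.1f}")
```

With these made-up counts, the same cell reads as 37.5% of all funds, 60.0% of the Growth funds, and 75.0% of the Low risk funds, which is why the basis of a percentage should always be stated.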

Organizing Numerical Variables

Tables Used For Organizing Numerical Data

Array - Unordered vs. Ordered

Organizing Numerical Data: Frequency Distribution 1/2

The frequency distribution is a summary table in which the data are arranged into numerically ordered classes.
You must give attention to selecting the appropriate number of class groupings for the table, determining a suitable width of a class grouping, and establishing the boundaries of each class grouping to avoid overlapping.

Organizing Numerical Data: Frequency Distribution 2/2

The number of classes depends on the number of values in the data. With a larger number of values, typically there are more classes. In general, a frequency distribution should have at least 5 but no more than 15 classes.
To determine the width of a class interval, you divide the range (highest value - lowest value) of the data by the number of class groupings desired.
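
The class-width rule can be sketched in Python as follows. The data are the 15 fast-food meal costs listed in the stem-and-leaf example of this chapter. The choice of 6 desired classes, rounding the width up to a whole dollar, and starting the first class at a whole dollar are assumptions made for this illustration.

```python
import math

# The 15 meal costs (in $) from the stem-and-leaf example in this chapter.
meal_costs = [7.42, 6.29, 5.83, 6.50, 8.34, 9.51, 7.10, 6.80,
              5.90, 4.89, 6.50, 5.52, 7.90, 8.30, 9.60]

desired_classes = 6  # assumption: any value from 5 to 15 follows the guideline

# Width = range / number of class groupings desired.
data_range = max(meal_costs) - min(meal_costs)   # about 4.71
raw_width = data_range / desired_classes         # about 0.785
class_width = math.ceil(raw_width)               # assumption: round up to a whole dollar

# Assumption: the first class starts at a whole dollar at or below the minimum.
first_lower = math.floor(min(meal_costs))
num_classes = math.floor((max(meal_costs) - first_lower) / class_width) + 1

# Each value falls in exactly one class: lower boundary <= value < upper boundary.
frequencies = [0] * num_classes
for cost in meal_costs:
    frequencies[int((cost - first_lower) // class_width)] += 1

for i, frequency in enumerate(frequencies):
    lower = first_lower + i * class_width
    print(f"${lower} but less than ${lower + class_width}: {frequency}")
```

Defining each class as "lower boundary but less than upper boundary" is one way to keep the classes from overlapping.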

Frequency Distribution, Example Meal Cost

Figure 7: Frequency Distributions of the Meal Costs for 50 City Restaurants and 50 Suburban Restaurants

Frequency Distribution, Example Returns

Figure 8: Frequency Distributions of the One-Year Return Percentage for Growth and Value Funds

The Relative Frequency Distribution and the Percentage Distribution, Example Meal Cost

Figure 9: Relative Frequency Distributions and Percentage Distributions of the Meal Costs at City and Suburban Restaurants
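
A relative frequency is a class frequency divided by the total number of values, and a percentage is that proportion multiplied by 100. The Python sketch below illustrates this with the 15 fast-food meal costs from the stem-and-leaf example in this chapter (not the restaurant data in Figure 9). The $1-wide classes are an assumption.

```python
import math
from collections import Counter

# The 15 meal costs (in $) from the stem-and-leaf example in this chapter.
meal_costs = [7.42, 6.29, 5.83, 6.50, 8.34, 9.51, 7.10, 6.80,
              5.90, 4.89, 6.50, 5.52, 7.90, 8.30, 9.60]

# Assumption: $1-wide classes, each keyed by its lower boundary.
frequencies = Counter(math.floor(cost) for cost in meal_costs)
total = len(meal_costs)

relative_frequencies = {lower: frequencies[lower] / total
                        for lower in sorted(frequencies)}
percentages = {lower: 100 * proportion
               for lower, proportion in relative_frequencies.items()}

for lower in relative_frequencies:
    print(f"${lower} but less than ${lower + 1}: "
          f"{relative_frequencies[lower]:.3f}  {percentages[lower]:.1f}%")
```

Because both columns are scaled by the total, they allow groups of different sizes to be compared directly.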

The Relative Frequency Distribution and the Percentage Distribution, Example Returns

Figure 10: Relative Frequency Distributions and Percentage Distributions of the One-Year Return Percentage for Growth and Value Funds

The Cumulative Distribution, Example Meal

Figure 11: Developing the Cumulative Percentage Distribution for City Restaurant Meal Costs
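
A cumulative percentage distribution can be developed by keeping a running total of the class frequencies and dividing each running total by the number of values. The sketch below uses class frequencies tallied from the 15 fast-food meal costs in the stem-and-leaf example of this chapter, grouped into assumed $1-wide classes from $4 to $10. It does not reproduce the city restaurant figures.

```python
from itertools import accumulate

# Frequencies of the 15 fast-food meal costs in assumed $1-wide classes:
# $4 to <$5, $5 to <$6, ..., $9 to <$10.
upper_boundaries = [5, 6, 7, 8, 9, 10]
frequencies = [1, 3, 4, 3, 2, 2]
total = sum(frequencies)

# Running total of frequencies, expressed as a percentage of all values.
cumulative_percentages = [100 * running / total
                          for running in accumulate(frequencies)]

for upper, percent in zip(upper_boundaries, cumulative_percentages):
    print(f"Less than ${upper}: {percent:.1f}%")
```

The last entry always reaches 100%, which is a quick check that no value was left out of the tally.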

The Cumulative Distribution, Example Meal

Figure 12: Cumulative Percentage Distributions of the Meal Costs for City and Suburban Restaurants

The Cumulative Distribution, Example Returns

Figure 13: Cumulative Percentage Distributions of the One-Year Return Percentages for Growth and Value Funds

Visualizing Categorical Variables

Visualizing Categorical Data Through Graphical Displays

Bar Chart And Pie Chart

Figure 14: Excel bar chart (left) and pie chart (right) for reasons for shopping online

Bar Chart

Figure 15: Reviewing the bar chart, you see that low risk is the largest category, followed by average risk. Very few of the funds have high risk.

Pie Chart

Figure 16: Reviewing the pie chart, you see that more than two-thirds of the funds are low risk, about 30% are average risk, and only about 4% are high risk.

Visualizing Numerical Variables

The Stem-and-Leaf Display

Definition: A stem-and-leaf display visualizes data by presenting the data as one or more row-wise stems that represent a range of values. In turn, each stem has one or more leaves that branch out to the right of their stem and represent the values found in that stem. For stems with more than one leaf, the leaves are arranged in ascending order.

The Stem-and-Leaf Display - Example

Suppose you collect the following meal costs (in $) for 15 classmates who had lunch at a fast-food restaurant:

7.42 6.29 5.83 6.50 8.34 9.51 7.10 6.80 5.90 4.89 6.50 5.52 7.90 8.30 9.60
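
One way to build a stem-and-leaf display for these 15 meal costs is sketched below in Python. The slide does not say how the leaves are formed, so this sketch assumes the whole dollars are the stems and the cost rounded to the nearest ten cents supplies the leaf.

```python
from collections import defaultdict

# The 15 meal costs (in $) listed in the example above.
meal_costs = [7.42, 6.29, 5.83, 6.50, 8.34, 9.51, 7.10, 6.80,
              5.90, 4.89, 6.50, 5.52, 7.90, 8.30, 9.60]

stems = defaultdict(list)
for cost in meal_costs:
    # Assumption: round to the nearest ten cents, so 7.42 becomes 74 tenths.
    tenths = round(cost * 10)
    stem, leaf = divmod(tenths, 10)  # 74 tenths -> stem 7, leaf 4
    stems[stem].append(leaf)

# Stems in ascending order, with the leaves of each stem in ascending order.
display = {stem: sorted(leaves) for stem, leaves in sorted(stems.items())}

for stem, leaves in display.items():
    print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}")
```

Under these assumptions the stem 6 carries the most leaves (3, 5, 5, 8), so most of the classmates paid between $6 and $7.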

The Histogram

Definition: A histogram visualizes data as a vertical bar chart in which each bar represents a class interval from a frequency or percentage distribution. In a histogram, you display the numerical variable along the horizontal (X) axis and use the vertical (Y) axis to represent either the frequency or the percentage of values per class interval.
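
As a rough sketch of the idea, the Python snippet below draws a text rendering of a frequency histogram for the 15 fast-food meal costs from the stem-and-leaf example. Note that this rendering is turned on its side (class intervals run down the page and bar length shows the frequency), whereas a proper histogram places the class intervals on the horizontal axis. The $1-wide classes are an assumption.

```python
import math
from collections import Counter

# The 15 meal costs (in $) from the stem-and-leaf example in this chapter.
meal_costs = [7.42, 6.29, 5.83, 6.50, 8.34, 9.51, 7.10, 6.80,
              5.90, 4.89, 6.50, 5.52, 7.90, 8.30, 9.60]

# Assumption: $1-wide class intervals, each keyed by its lower boundary.
frequencies = Counter(math.floor(cost) for cost in meal_costs)

# One bar per class interval, including any empty intervals in between.
bars = []
for lower in range(min(frequencies), max(frequencies) + 1):
    label = f"{lower} to <{lower + 1}"
    frequency = frequencies[lower]
    bars.append(f"{label:<10} {'#' * frequency} ({frequency})")

print("\n".join(bars))
```

Unlike a bar chart of categories, the bars here stand for adjacent numerical intervals, so empty intervals are kept rather than skipped.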

The Histogram - Example

Figure 17: Frequency histograms for meal costs at city and suburban restaurants

The Histogram - Example

Figure 18: Excel frequency histograms for the one-year return percentages for the growth and value funds

Visualizing Two Numerical Variables

The Scatter Plot

Definition: A scatter plot explores the possible relationship between two numerical variables by plotting the values of one numerical variable on the horizontal, or X, axis and the values of a second numerical variable on the vertical, or Y, axis.

The Scatter Plot - Example

Figure 19: Revenues and Values for NBA Teams

The Scatter Plot - Example

Figure 20: Scatter plot of revenue and value for NBA teams

The Time-Series Plot

Definition: A time-series plot plots the values of a numerical variable on the Y axis and plots the time period associated with each numerical value on the X axis. A time-series plot can help you visualize trends in data that occur over time.

The Time-Series Plot - Example

Figure 21: Movie Revenues (in $billions) from 1995 to 2013

The Time-Series Plot - Example

Figure 22: Time-series plot of movie revenue per year from 1995 to 2013

The Challenge in Organizing and Visualizing Variables

Obscuring Data

Figure 23: Information overload, presenting too many details, can obscure data and hamper decision making

Creating False Impressions, 1/2

Figure 24: Left: One-Year Percentage Change in Year-to-Year Sales for the Month of April; Right: Percentage Change for Three Consecutive Years

Creating False Impressions, 2/2

Figure 25: Market shares of companies in “two” industries

Chartjunk, 1/3

Figure 26: Two visualizations of market share of soft drinks

Chartjunk, 2/3

Figure 27: Two visualizations of Australian wine exports to the United States, in millions of gallons

Chartjunk, 3/3

Figure 28: Visualization of the amount of land planted with grapes for the wine industry

Graphical Errors, 1/3

Figure 29: No Relative Basis

Graphical Errors, 2/3

Figure 30: Compressing the Vertical Axis

Graphical Errors, 3/3

Figure 31: No Zero Point on the Vertical Axis

Best Practices for Constructing Visualizations

Use the simplest possible visualization
Include a title
Label all axes
Include a scale for each axis if the chart contains axes
Begin the scale for a vertical axis at zero
Use a constant scale
Avoid 3D effects
Avoid chartjunk

Summary

Methods to organize variables.
Methods to visualize variables.
Methods to organize or visualize more than one variable at the same time.
Principles of proper visualizations.
