ENV 2006 2.1
Envisioning Information
Lecture 2
Simple Graphs and Charts
Ken Brodlie
School of Computing
University of Leeds
ENV 2006 2.2
Lecture Outline
• Preliminaries
  – Definitions
  – Datatypes
• Simple Data Presentation
  – Graphs and charts
ENV 2006 2.3
Fundamentals
• Basic Datatypes correspond to different levels of measurement
• Data can be:
  – Categorical – labels
  – Numerical – numbers
• Categorical
  – Nominal
    • No sense of order
    • Apples, oranges, …
  – Ordinal
    • Ordered in sequence
    • January, February, …
• Numerical
  – Continuous
    • Real numbers
    • Height of students in class
  – Discrete
    • Typically whole numbers
    • Marks in an exam
ENV 2006 2.4
Question
• Give an example for each class in which numbers are involved…
• Categorical - nominal
• Categorical - ordinal
• Numerical – continuous
• Numerical - discrete
ENV 2006 2.5
Exploratory Data Analysis
• Pioneering figure is John Tukey
• New approach to data analysis, heavily based on visualization, as an alternative to classical data analysis
• See Wikipedia
• Two-stage process:
  – Exploratory: search for evidence using all tools available
  – Confirmatory: evaluate strength of evidence using classical data analysis
ENV 2006 2.6
Simple Data Presentation
ENV 2006 2.7
Simple Data Presentation
• Simple data tables are often presented as line graphs, bar graphs, pie charts, dot graphs, histograms…
• Which should we use and when?
ENV 2006 2.8
Line Graph
• Fundamental technique of data presentation
• Used to compare two variables
– X-axis is often the control variable
– Y-axis is the response variable
• Good at:
  – Showing specific values
  – Trends
  – Trends in groups (using multiple line graphs)
Students participating in sporting activities
Mobile phone use
Note: graph labelling is fundamental
Any critical comments here?
ENV 2006 2.9
Simple Representations – Bar Graph
• Bar graph
  – Presents categorical variables
  – Height of bar indicates value
  – Double bar graph allows comparison
  – Note spacing between bars
  – Can be horizontal (when would you use this?)
Internet use at a school
Number of police officers
Note more space for labels
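A horizontal bar graph can be sketched in plain text to see why it leaves more room for labels. This is only an illustration: the function name `horizontal_bars` and the figures are made up, not taken from the slides.

```python
def horizontal_bars(counts, width=40, mark="#"):
    """Return text lines: one horizontal bar per category, scaled to `width`."""
    if not counts:
        return []
    largest = max(counts.values())
    label_width = max(len(label) for label in counts)
    lines = []
    for label, value in counts.items():
        # Bar length is proportional to the value; the largest value fills `width`.
        length = round(width * value / largest) if largest > 0 else 0
        lines.append(f"{label.ljust(label_width)} | {mark * length} {value}")
    return lines


# Made-up figures, for illustration only.
officers = {"Region A": 120, "Region B": 60, "Region C": 30}
for line in horizontal_bars(officers, width=20):
    print(line)
```

Each label sits on its own row, so long category names do not collide as they would under vertical bars.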
ENV 2006 2.10
Dot Graph
• Very simple but effective…
• Horizontal to give more space for labelling
ENV 2006 2.11
Pie Chart
• Pie chart summarises a set of categorical/nominal data
• But use with care…
• … many segments are harder to compare than the bars of a bar chart
Should we have a long lecture?
Favourite movie genres
ENV 2006 2.12
Histograms
• Histograms summarise discrete or continuous data that are measured on an interval scale
• No gaps if variable is continuous
Distribution of salaries in a company
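The counting behind a histogram can be sketched as follows, assuming equal-width bins over a chosen range. The function name `histogram` and the salary figures are hypothetical; the slides do not prescribe a binning rule.

```python
def histogram(values, low, high, bins):
    """Count values in `bins` equal-width intervals covering [low, high]."""
    if bins <= 0 or high <= low:
        raise ValueError("need bins > 0 and high > low")
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        if v < low or v > high:
            continue  # values outside the chosen range are ignored
        # Adjacent bins share an edge, so there are no gaps between bars;
        # the top edge `high` is assigned to the last bin.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return counts


# Made-up salaries in thousands, for illustration only.
salaries = [20, 25, 30, 39, 40, 55, 60]
print(histogram(salaries, low=20, high=60, bins=4))
```

Because the intervals tile the whole range, the bars of a continuous variable touch, which is the "no gaps" point above.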
ENV 2006 2.13
Scatter Plot
• Used to present measurements of two variables
• Effective if a relationship exists between the two variables
Car ownership by household income
Example taken from NIST Handbook – evidence of strong positive correlation
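The strength of a linear relationship seen in a scatter plot is often summarised by the Pearson correlation coefficient. A minimal sketch, with a hypothetical function name `pearson_r` and made-up data (not the NIST example):

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Made-up data, for illustration only.
income = [10, 20, 30, 40, 50]
cars = [0.4, 0.9, 1.1, 1.6, 2.0]
print(round(pearson_r(income, cars), 3))
```

A value near +1 indicates a strong positive linear relationship, near -1 a strong negative one, and near 0 little linear relationship.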
ENV 2006 2.14
Scatter Plots in Excel
• The scatter plot is a fundamental tool in Excel
• Chart type XY (Scatter) and subtype Unconnected Points
http://www2.ncsu.edu:8010/ncsu/chemistry/resource/excel/excel.html
ENV 2006 2.15
Regression Line
• Excel allows you to add a linear regression line (trend line)
Remember: correlation does not imply causality, i.e. a relationship exists but one variable is not necessarily causing the other; there may be a third factor.
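A linear trend line is normally an ordinary least-squares fit. The sketch below shows that calculation under this assumption; the names `fit_line` and `residuals` are illustrative and are not Excel functions.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs, ys, slope, intercept):
    """Observed y minus the value predicted by the fitted line."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Made-up data, for illustration only.
xs = [0, 1, 2, 3]
ys = [1.1, 2.9, 5.2, 6.8]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
print(residuals(xs, ys, slope, intercept))
```

The residuals show how far each point lies from the line; a pattern in them suggests a straight line is not an adequate model.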
ENV 2006 2.16
Tukey Sum-Difference Plot
Better understanding of residuals …
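The slide does not define the plot, so the sketch below assumes the usual form of a Tukey sum-difference plot: for paired values (x, y), the difference y - x is plotted against the sum x + y. Applied to observed and fitted values, the difference is the residual. The function name `sum_difference` and the data are illustrative only.

```python
def sum_difference(xs, ys):
    """Return (sum, difference) pairs: (x + y, y - x) for each pair of values."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    return [(x + y, y - x) for x, y in zip(xs, ys)]


# Made-up fitted and observed values, for illustration only.
fitted = [1.0, 3.0, 5.0, 7.0]
observed = [1.5, 2.5, 5.5, 7.0]
for total, diff in sum_difference(fitted, observed):
    print(total, diff)
```

Plotting the difference on the vertical axis spreads small deviations over the full height of the plot, which is why residuals are easier to read than in the original scatter plot.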
ENV 2006 2.17
Box Plots
• In some situations we have, not a single data value at a point, but a number of data values, or even a probability distribution
• When might this occur?
• Tukey proposed the idea of a boxplot to visualize the distribution of values
• For explanation and some history, see:
http://mathworld.wolfram.com/Box-and-WhiskerPlot.html
http://en.wikipedia.org/wiki/Box_plot
M – median
Q1, Q3 – quartiles
Whiskers – 1.5 * interquartile range
Dots – outliers
http://www.upscale.utoronto.ca/GeneralInterest/Harrison/Visualisation/Visualisation.html
Darwin’s plant study
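The quantities drawn in a box plot can be computed as in this sketch. It assumes one common convention: whiskers extend to the most extreme data values lying within 1.5 * interquartile range of the quartiles, and anything beyond is an outlier. Quartile definitions vary between packages; here Python's `statistics.quantiles` with `method="inclusive"` is used. The function name `box_stats` and the data are made up (this is not Darwin's data).

```python
import statistics


def box_stats(values):
    """Median, quartiles, whisker ends and outliers for a box plot."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least 2 values")
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers stop at the last data values inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Made-up data, for illustration only.
print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

For this sample the value 100 lies beyond the upper fence, so it would be drawn as a dot rather than stretching the whisker.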
ENV 2006 2.18
Acknowledgement
• Thanks to Statistics Canada – an excellent web site for simple data presentation
  – http://www.statcan.ca/english/edu/power/toc/contents.htm
ENV 2006 2.19
Exercise for next week
• Understand a bit more about the merits of pie charts and bar graphs
• Create a dataset with roughly equal numbers in each class
• Which is best if the task is to discriminate?
ENV 2006 2.20
Exercise for next week
• Over the next week look for examples of basic graphs
  – In newspapers, magazines or other print media
  – On news web sites or other electronic media
• Analyse two examples
  – One should be an example where you think the use of graphics is good
  – One should be bad
• Be ready next week to present these results to the class…
ENV 2006 2.21
Envisioning Information : Practical Work
Gnuplot
R
Excel