Eurostat
THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION
Data visualization in Python
Martijn Tennekes
Outline
• Overview of data visualization in Python
• ggplot2
• tmap
• tabplot
Which packages/functions
• Standard charts (e.g. line chart, bar chart, scatter plot):
  – Matplotlib
  – Pandas
  – Seaborn
  – ggplot
  – Altair
• Thematic maps:
  – Folium
• Other visualisations:
  – Bokeh (interactive plots)
ggplot
• Based on one of the most popular R packages (ggplot2) for academic publications
• Based on the Grammar of Graphics (Wilkinson, 2005)
• Charts are built up according to this grammar:
  – data
  – mapping / aesthetics
  – geoms
  – stats
  – scales
  – coord
  – facets
• Pandas DataFrames are used natively in ggplot.
ggplot and qplot
ggplot(mpg, aes(x='displ', y='cty')) + geom_point()
• Data: DataFrame
• Aesthetics: x, y, color, fill, shape
• Geometry: points
• Layers and transformations are stacked with +
Shortcut function: qplot (quick plot):
qplot(diamonds.carat, diamonds.price)
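As a rough illustration of how layers can be stacked with +, the sketch below uses Python operator overloading. The Plot and Geom classes are hypothetical stand-ins written for this example and are not part of the ggplot package; the data values are made up.

```python
class Geom:
    """Hypothetical layer object; only remembers which kind of geometry it is."""

    def __init__(self, kind):
        self.kind = kind


class Plot:
    """Hypothetical plot object holding data, an aesthetic mapping and layers."""

    def __init__(self, data, mapping):
        self.data = data
        self.mapping = mapping
        self.layers = []

    def __add__(self, layer):
        # Return a new Plot so that the left operand stays unchanged.
        new = Plot(self.data, self.mapping)
        new.layers = self.layers + [layer]
        return new


data = {"displ": [1.8, 2.0, 2.8], "cty": [18, 20, 16]}  # made-up values
base = Plot(data, {"x": "displ", "y": "cty"})
plot = base + Geom("point") + Geom("line")
print([layer.kind for layer in plot.layers])
```

Because each + returns a new object, a base plot can be reused with different layers.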
Aesthetics
Mapping of data to visual attributes of geometric objects:
– Position: x, y
– Color: color
– Shape: shape
ggplot(aes(x='carat', y='price', color='clarity'), diamonds) + geom_point()
ggplot(aes(x='carat', y='price', shape="cut"), diamonds) + geom_point()
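To make the idea of an aesthetic mapping concrete, here is a minimal sketch of how a categorical variable could be translated into colors. The function map_to_palette, the palette and the sample values are assumptions made for this example; the ggplot package does this internally and its actual implementation may differ.

```python
def map_to_palette(values, palette):
    """Assign each distinct value a palette entry, in order of first appearance."""
    lookup = {}
    for value in values:
        if value not in lookup:
            # Cycle through the palette if there are more categories than colors.
            lookup[value] = palette[len(lookup) % len(palette)]
    return [lookup[value] for value in values], lookup


clarity = ["SI1", "VS2", "SI1", "IF", "VS2"]  # made-up sample of categories
colors, legend = map_to_palette(clarity, ["red", "green", "blue"])
print(colors)
print(legend)
```

The returned lookup table plays the role of the legend: it records which category was drawn in which color.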
Geom
ggplot(mpg, aes(x='displ', y='cty')) + geom_point() + geom_line()
• Geometric objects:
• Points, lines, polygons, …
• Functions start with “geom_”
• Geoms for margins (intervals) are also available:
  – geom_errorbar(), geom_pointrange(), geom_linerange()
  – Note: they require the aesthetics ymin and ymax.
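The ymin and ymax aesthetics usually have to be computed before plotting. The sketch below shows one possible way, taking the group mean plus and minus one standard deviation. The helper error_bar_limits and the measurements are assumptions for illustration; other interval definitions (e.g. confidence intervals) are equally valid.

```python
import statistics


def error_bar_limits(groups):
    """Return {group: (ymin, y, ymax)} with y the mean and limits one sd away."""
    limits = {}
    for name, values in groups.items():
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # sample standard deviation
        limits[name] = (mean - sd, mean, mean + sd)
    return limits


measurements = {"a": [1, 2, 3], "b": [2, 4, 6]}  # made-up values
limits = error_bar_limits(measurements)
for name, (ymin, y, ymax) in limits.items():
    print(name, ymin, y, ymax)
```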
Stat
• stat_smooth() and stat_density() apply statistical transformations to the data
• Most geoms have a default stat (and the other way round)
• geom and stat form a layer
• One or more layers form a plot
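A statistical transformation turns raw data into the values that are actually drawn. As a simple example, a histogram first counts observations per bin. The sketch below shows such a binning step in plain Python; the function bin_counts and the sample values are assumptions for illustration and do not reproduce the ggplot implementation.

```python
def bin_counts(values, bins):
    """Count how many values fall into each of `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        # All values are identical: put everything in the first bin.
        return [len(values)] + [0] * (bins - 1)
    counts = [0] * bins
    for value in values:
        # The maximum value is assigned to the last bin.
        index = min(int((value - lo) / width), bins - 1)
        counts[index] += 1
    return counts


counts = bin_counts([1, 2, 2, 3, 5, 8, 9, 10], bins=3)  # made-up values
print(counts)
```

A geom then only has to draw one bar per count, which is why a geom and a stat together form a layer.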
Scales (and axes)
• A scale defines how the values of a variable are mapped to the values of an aesthetic
• Therefore:
  – A scale belongs to one aesthetic (x, y, color, fill, etc.)
  – The axis is an essential part of a scale
• With scale_XXX, the scales and axes can be adjusted (XXX stands for a combination of aesthetic and type of scale, e.g. scale_fill_gradient)
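What a gradient scale does can be sketched as a linear interpolation between two colors. The function gradient_scale and its default colors are assumptions made for this example (it assumes vmin < vmax); it is not the code behind scale_fill_gradient.

```python
def gradient_scale(value, vmin, vmax, low=(255, 255, 255), high=(0, 0, 255)):
    """Map a number in [vmin, vmax] to a hex color between `low` and `high`."""
    t = (value - vmin) / (vmax - vmin)
    t = max(0.0, min(1.0, t))  # clamp values outside the scale limits
    rgb = [round(lo + t * (hi - lo)) for lo, hi in zip(low, high)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


print(gradient_scale(0, 0, 10))
print(gradient_scale(10, 0, 10))
print(gradient_scale(20, 0, 10))
```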
Coord
• A chart is drawn in a coordinate system. This can be transformed.
• A pie chart has a polar coordinate system.
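import numpy as np
import pandas as pd
from ggplot import *  # provides ggplot, aes, geom_point, coord_polar

df = pd.DataFrame({"x": np.arange(100)})
df['y'] = df.x * 10
# polar coords
p = ggplot(df, aes(x='x', y='y')) + geom_point() + coord_polar()
print(p)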
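The transformation to a polar coordinate system can be sketched as follows: x is mapped to an angle and y to a radius. The function to_polar is hypothetical, and the orientation (angle zero at the top, increasing clockwise) is an assumption borrowed from the ggplot2 convention; the Python ggplot package may differ in detail.

```python
import math


def to_polar(x, y, x_min, x_max):
    """Map x to an angle and y to a radius; return screen coordinates."""
    theta = 2 * math.pi * (x - x_min) / (x_max - x_min)
    # Angle 0 points upwards and angles increase clockwise (assumed convention).
    return y * math.sin(theta), y * math.cos(theta)


top = to_polar(0, 10, 0, 100)
right = to_polar(25, 10, 0, 100)
print(top, right)
```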
Facets
• With facets, small multiples are created.
• Each facet shows a subset of the data.
ggplot(diamonds, aes(x='price')) + \
    geom_histogram() + \
    facet_grid("cut")
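Conceptually, faceting splits the data into subsets by the value of one variable and draws the same chart for each subset. A minimal sketch of the splitting step, with a hypothetical helper split_by_facet and made-up rows:

```python
from collections import defaultdict


def split_by_facet(rows, key):
    """Group rows (dicts) into one list per distinct value of `key`."""
    panels = defaultdict(list)
    for row in rows:
        panels[row[key]].append(row)
    return dict(panels)


rows = [  # made-up records
    {"cut": "Ideal", "price": 326},
    {"cut": "Good", "price": 327},
    {"cut": "Ideal", "price": 340},
]
panels = split_by_facet(rows, "cut")
for cut, subset in panels.items():
    print(cut, [row["price"] for row in subset])
```

Each entry of panels corresponds to one small multiple.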
Facets example
ggplot(chopsticks, aes(x='chopstick_length', y='food_pinching_effeciency')) + \
    geom_point() + \
    geom_line() + \
    scale_x_continuous(breaks=[150, 250, 350]) + \
    facet_wrap("individual")
Facets example 2
ggplot(diamonds, aes(x="carat", y="price", color="color", shape="cut")) + geom_point() + facet_wrap("clarity")
ggplot tips
• You can annotate plots:
ggplot(mtcars, aes(x='mpg')) + geom_histogram() + \
    xlab("Miles per Gallon") + ylab("# of Cars")
• Assign a plot to a variable, for instance g:
g = ggplot(mpg, aes(x='displ', y='cty')) + geom_point()
• The function save saves the plot to the desired format:
g.save("myimage.png")
Folium: Thematic maps
• A thematic map is a visualization where statistical information with a spatial component is shown.
• Other libraries are: Basemap, Cartopy, Iris
• Folium builds on the data wrangling strengths of the Python ecosystem and the mapping strengths of the Leaflet.js library.
• Manipulate your data in Python, then visualize it on a Leaflet map via Folium.
Folium features
• Built-in tilesets from OpenStreetMap, MapQuest Open, MapQuest Open Aerial, Mapbox, and Stamen
• Supports custom tilesets with Mapbox or Cloudmade API keys.
• Supports GeoJSON and TopoJSON overlays, as well as the binding of data to those overlays to create choropleth maps with color-brewer color schemes.
Basic Maps
folium.Map(location=[50.89, 5.99], zoom_start=14)
Basic maps
folium.Map(location=[50.89, 5.99], zoom_start=14, tiles='Stamen Toner')
GeoJSON/TopoJSON Overlays
ice_map = folium.Map(location=[-59, -11], tiles='Mapbox Bright', zoom_start=2)
ice_map.geo_json(geo_path=geo_path)
ice_map.geo_json(geo_path=topo_path, topojson='objects.antarctic_ice_shelf')
ice_map.create_map(path='ice_map.html')
Choropleth maps
map = folium.Map(location=[48, -102], zoom_start=3)
map.choropleth(geo_path=state_geo, data=state_data,
               columns=['State', 'Unemployment'],
               key_on='feature.id',
               fill_color='YlGn', fill_opacity=0.7, line_opacity=0.2,
               legend_name='Unemployment Rate (%)')
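The binding of data to a GeoJSON overlay, as done by key_on='feature.id', can be sketched without Folium: each feature's id is looked up in a table of values and the value is converted to a fill color. Everything below is hypothetical (regions "A" and "B", their geometry, the rates, the breaks and the palette); it only illustrates the join, not Folium's internals.

```python
import json

# Hypothetical regions; geometry and rates are invented for this sketch.
geo = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "A", "properties": {"name": "Region A"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]}},
        {"type": "Feature", "id": "B", "properties": {"name": "Region B"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[1, 0], [2, 0], [2, 1], [1, 1], [1, 0]]]}},
    ],
}
rates = {"A": 3.5, "B": 8.1}                  # value per feature id
breaks = [5.0, 7.5]                           # class boundaries
palette = ["#ffffcc", "#78c679", "#006837"]   # arbitrary light-to-dark greens


def fill_color(value):
    """Return the palette color of the class that `value` falls into."""
    index = sum(value >= b for b in breaks)
    return palette[index]


# Join on feature id and store the resulting color with each feature.
for feature in geo["features"]:
    feature["properties"]["fill"] = fill_color(rates[feature["id"]])

geojson_text = json.dumps(geo)
print([f["properties"]["fill"] for f in geo["features"]])
```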
The Grammar of Graphics
ggplot2: Layered Grammar of Graphics
• Defaults
  – Data
  – Aesthetics
• Layers
  – Data
  – Aesthetics
  – Geometry
  – Statistics
  – Position
• Scales
• Coordinates
• Facets
tmap: Layered Grammar of Thematic Maps
• Group: 1 shape and 1 or more layers
  – Shape: coordinates and topology (spatial types: polygons, points, lines, raster), data, map projection, bounding box
  – Layers: aesthetics, statistics, scale
• Facets
Creating a choropleth
tm_shape(NLD_muni, projection="rd") +
  tm_fill()
Creating a choropleth (2)
tm_shape(NLD_muni, projection="rd") +
  tm_fill("blue")
Creating a choropleth (3)
tm_shape(NLD_muni, projection="rd") +
  tm_fill("population")
Creating a choropleth (4)
tm_shape(NLD_muni, projection="rd") +
  tm_fill("population", convert2density=TRUE, style="kmeans", title="Population per km2")
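What convert2density and style="kmeans" do can be approximated in a few lines: population counts are divided by area, and the resulting densities are grouped into classes with one-dimensional k-means. This is a sketch under assumptions: the functions kmeans_1d and class_breaks, the starting centers and all numbers are invented here, and tmap's own classification may produce different breaks.

```python
import bisect


def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm in one dimension; returns the sorted cluster centers."""
    data = sorted(values)
    # Deterministic start: spread the initial centers over the sorted data.
    centers = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        updated = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return sorted(centers)


def class_breaks(centers):
    """Place a class boundary halfway between neighbouring centers."""
    return [(a + b) / 2 for a, b in zip(centers, centers[1:])]


populations = [1000, 2200, 900, 45000, 19000, 52000]  # made-up counts
areas_km2 = [10, 20, 9, 50, 20, 10]                   # made-up areas
densities = [p / a for p, a in zip(populations, areas_km2)]

centers = kmeans_1d(densities, k=3)
breaks = class_breaks(centers)
classes = [bisect.bisect_right(breaks, d) for d in densities]
print(breaks, classes)
```

Each class index can then be mapped to one color of the palette.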
Creating a choropleth (5)
tm_shape(NLD_muni, projection="rd") +
  tm_fill("population", convert2density=TRUE, style="kmeans", title="Population per km2") +
  tm_borders(alpha = .5)
Creating a choropleth (6)
tm_shape(NLD_muni, projection="rd") +
  tm_fill("population", convert2density=TRUE, style="kmeans", title="Population per km2") +
  tm_borders(alpha = .5) +
  tm_shape(NLD_prov) +
  tm_borders(lwd = 2)
Creating a choropleth (7)
tm_shape(NLD_muni, projection="rd") +
  tm_fill("population", convert2density=TRUE, style="kmeans", title="Population per km2") +
  tm_borders(alpha = .5) +
  tm_shape(NLD_prov) +
  tm_borders(lwd = 2) +
  tm_text("name", size = .8, shadow = TRUE, bg.color = "white", bg.alpha = .25)
Creating a choropleth with qtm
qtm(NLD_muni)
qtm(NLD_muni, fill="population", convert2density=TRUE)
qtm(NLD_muni, fill="population", convert2density=TRUE, fill.style="kmeans", fill.title="Population per km2") +
  qtm(NLD_prov, fill=NULL, text="name", text.size=.7, borders.lwd=2, text.bg.color="white", text.bg.alpha=.25, shadow=TRUE)
Example: choropleth
tm_shape(World) +
  tm_fill("income_grp", palette="-Blues", title="Income classification") +
  tm_borders() +
  tm_text("iso_a3", size="AREA") +
  tm_format_World()
Example: bubble map
tm_shape(World) +
  tm_fill("grey70") +
  tm_shape(metro) +
  tm_bubbles("X2010", col = "growth", border.col = "black", border.alpha = .5,
             style="fixed", breaks=c(-Inf, 0, 2, 4, 6, Inf),
             palette="-RdYlBu", title.size="Metro population",
             title.col="Growth rate (%)") +
  tm_format_World()
Example: choropleth + bubble map
tm_shape(World) +
  tm_fill("income_grp", palette="-Blues", contrast = .5, title="Income class") +
  tm_borders() +
  tm_text("iso_a3", size="AREA") +
  tm_shape(metro) +
  tm_bubbles("X2010", col = "growth", border.col = "black", border.alpha = .5,
             style="fixed", breaks=c(-Inf, 0, 2, 4, 6, Inf),
             palette="-RdYlBu", title.size="Metro population",
             title.col="Growth rate (%)") +
  tm_format_World(bg.color = "gray80")
Example: raster map
pal8 <- c("#33A02C", "#B2DF8A", "#FDBF6F", "#1F78B4", "#999999", "#E31A1C", "#E6E6E6", "#A6CEE3")
tm_shape(land, ylim = c(-88,88)) +
  tm_raster("cover_cls", palette = pal8, title="Global Land Cover") +
  tm_shape(World) +
  tm_borders() +
  tm_format_World(legend.bg.color = "white", legend.bg.alpha=.2,
                  legend.frame="gray50", legend.width=.2)
Example: raster map (with dotmap)
pal8 <- c("#33A02C", "#B2DF8A", "#FDBF6F", "#1F78B4", "#999999", "#E31A1C", "#E6E6E6", "#A6CEE3")
tm_shape(land, ylim = c(-88,88)) +
  tm_raster("cover_cls", palette = pal8, title="Global Land Cover") +
  tm_shape(World) +
  tm_borders() +
  qtm(metro, dot.color="#E31A1C") +
  tm_format_World(legend.bg.color = "white", legend.bg.alpha=.2,
                  legend.frame="gray50", legend.width=.2)
Example: classic map
... + style_classic()
Small multiples
tm_shape(NLD_muni) +
tm_polygons("population",style="kmeans",convert2density = TRUE) +
tm_facets(by="province",free.coords=TRUE,drop.shapes=TRUE) +
tm_layout(legend.show = FALSE,outer.margins=0)
OpenStreetMap layer
osm_NLD <- read_osm(NLD_muni)
qtm(osm_NLD) +
  tm_shape(NLD_muni) +
  tm_polygons("population", convert2density=TRUE, style="kmeans", alpha=.7, palette="Reds")
Interactive maps
• All maps can be made interactive.
• tmap contains two modes:
  – plot: static maps, shown in the graphics device window; can be exported to png, jpg, pdf, etc.
  – view: interactive maps, shown in the viewing window or in the browser; can be exported to standalone HTML files
# switch to plot mode:
tmap_mode("plot")
# switch to view mode:
tmap_mode("view")
# toggle between modes:
ttm()
Some convenient functions
Read ESRI shape file:
NLD_muni <- read_shape("NLD_2014_muni.shp")
Set map projection:
NLD_muni <- set_projection(NLD_muni, "longlat")
Append data:
NLD_muni <- append_data(NLD_muni, NLD_data, key.shp="code", key.data="muni_code")
Crop shapes:
NLD_twitter <- crop(twitter, NLD_muni)
Save to image:
tm_twitter <- tm_shape(NLD_muni) + tm_polygons() + tm_shape(NLD_twitter) + tm_dots()
save_tmap(tm_twitter, filename = "twitter.png", width = 600, height = 800)
Create animation:
animation_tmap(...)
Animation
Day Time Population per municipality based on mobile phone network data
tm_dtp <- tm_shape(dtp) +
  tm_polygons(paste0("dtp", 0:23), ...) +
  tm_shape(NLD_prov) +
  tm_borders(lwd = 2) +
  tm_credits("...") +
  tm_facets(ncol=1, nrow=1)
animation_tmap(tm_dtp, filename = "dtp.gif", width = 600, height = 800, delay = 40)
Tableplot
library(tabplot)
# load data
library(ggplot2)
data(diamonds)
tableplot(diamonds)
• Tableplots can be created with the package tabplot
• It works well with very large tabular data (dozens of variables, millions of rows). Internally, it makes use of the ff and ffbase packages, which store data on disk rather than in memory.
• Speed is ensured by preprocessing the data.
• Standard deviations can be shown for numeric variables.
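The kind of preprocessing behind a tableplot can be sketched as follows: the rows are sorted by one variable, divided into a fixed number of row bins, and a summary (here the mean) is computed per bin, so that only the bins need to be drawn. The function binned_means, the choice of the mean and the sample rows are assumptions for this illustration; the tabplot package itself is written in R and may work differently in detail.

```python
def binned_means(rows, sort_key, value_key, n_bins):
    """Sort rows by `sort_key` (descending) and average `value_key` per row bin."""
    ordered = sorted(rows, key=lambda row: row[sort_key], reverse=True)
    size = len(ordered)
    means = []
    for b in range(n_bins):
        chunk = ordered[b * size // n_bins:(b + 1) * size // n_bins]
        if chunk:  # bins can be empty when there are fewer rows than bins
            means.append(sum(row[value_key] for row in chunk) / len(chunk))
    return means


rows = [  # made-up records
    {"carat": 0.2, "price": 300}, {"carat": 0.3, "price": 400},
    {"carat": 0.5, "price": 1000}, {"carat": 0.7, "price": 2000},
    {"carat": 1.0, "price": 5000}, {"carat": 1.5, "price": 9000},
]
means = binned_means(rows, sort_key="carat", value_key="price", n_bins=3)
print(means)
```

Because the number of bins is fixed, the amount of data to draw does not grow with the number of rows.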
Tableplot Dutch virtual census test data from 2008
Summary
• R is very suitable for data visualization
• The ggplot2 package is the standard for non-spatial charts
• The tmap package is a package in the same style for spatial data visualization.
• The tabplot package can be used to visualize large tabular data.
References
• http://yhat.github.io/ggplot/
• https://folium.readthedocs.io/en/latest/