Eurostat THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION Data visualization in Python Martijn Tennekes

Eurostat

THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION

Data visualization in Python

Martijn Tennekes

Outline

• Overview of data visualization in Python

• ggplot2

• tmap

• tabplot

Which packages/functions

• Standard charts (e.g. line chart, bar chart, scatter plot):

    – Matplotlib
    – Pandas
    – Seaborn
    – ggplot
    – Altair

• Thematic maps:

    – Folium

• Other visualisations:

    – Bokeh (interactive plots)

ggplot

• Based on ggplot2, one of the most popular R packages for academic publications

• Based on the Grammar of Graphics (Wilkinson, 2005)

• Charts are built up according to this grammar:

    – data
    – mapping / aesthetics
    – geoms
    – stats
    – scales
    – coord
    – facets

• Pandas DataFrames are used natively in ggplot.

ggplot and qplot

ggplot(mpg, aes(x='displ', y='cty')) + geom_point()

Data: a pandas DataFrame

Aesthetics: x, y, color, fill, shape

Geometry: points

Layers and transformations are stacked with +

Shortcut function: qplot (quick plot):

qplot(diamonds.carat, diamonds.price)
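The stacking with + can be pictured with a small sketch in plain Python. The Plot and Layer classes are hypothetical stand-ins, not the ggplot API; they only illustrate how operator overloading lets every + return a new plot with one more layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """Hypothetical stand-in for a geom or stat layer (not the ggplot API)."""
    name: str


@dataclass(frozen=True)
class Plot:
    """Hypothetical plot specification: an aesthetic mapping plus stacked layers."""
    mapping: dict
    layers: tuple = ()

    def __add__(self, layer):
        # Return a new plot instead of mutating, so partial plots stay reusable.
        return Plot(self.mapping, self.layers + (layer,))


base = Plot({"x": "displ", "y": "cty"})
scatter = base + Layer("point")
both = scatter + Layer("line")

print([layer.name for layer in both.layers])
```

Because each + returns a fresh object, an intermediate plot such as scatter can be kept and extended in several directions.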

Aesthetics

Mapping of data to visual attributes of geometric objects:

– Position: x, y
– Color: color
– Shape: shape

ggplot(aes(x='carat', y='price', color='clarity'), diamonds) + geom_point()

Aesthetics

Mapping of data to visual attributes of geometric objects:

– Position: x, y
– Color: color
– Shape: shape

ggplot(aes(x='carat', y='price', shape="cut"), diamonds) + geom_point()

Geom

ggplot(mpg, aes(x='displ', y='cty')) + geom_point() + geom_line()

• Geometric objects:

    – Points, lines, polygons, …
    – Functions start with "geom_"

• Also for margins (error ranges):

    – geom_errorbar(), geom_pointrange(), geom_linerange()
    – Note: they require the aesthetics ymin and ymax.

Stat

• stat_smooth() and stat_density() apply a statistical transformation to the data before it is drawn

• Most geoms have a default stat (and the other way round)

• geom and stat form a layer

• One or more layers form a plot
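A statistical transformation can be sketched without any plotting library. The helper stat_bin below is hypothetical (its name and the sample prices are assumptions); it mimics what the default stat of a histogram does: raw values go in, per-bin counts come out, and the geom then draws the counts as bars.

```python
from collections import Counter


def stat_bin(values, binwidth):
    """Count how many values fall into each bin of width `binwidth`.

    Hypothetical helper for illustration; returns {bin_start: count}.
    """
    counts = Counter((v // binwidth) * binwidth for v in values)
    return dict(sorted(counts.items()))


# Made-up prices for illustration only.
prices = [320, 350, 410, 980, 1020, 1500]
binned = stat_bin(prices, binwidth=500)
print(binned)
```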

Scales (and axes)

• A scale defines how the values of a variable are mapped to an aesthetic

• Therefore:

    – A scale belongs to one aesthetic (x, y, color, fill, etc.)

• The axis is an essential part of a scale

• With scale_XXX, the scales and axes can be adjusted (XXX stands for a combination of aesthetic and type of scale, e.g. scale_fill_gradient)
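What a scale does can be shown with a minimal sketch in plain Python. The functions linear_scale and gradient are hypothetical helpers, not part of ggplot; they map data values onto a position range and onto a colour gradient, which is roughly the job of a continuous position scale and of a gradient fill scale.

```python
def linear_scale(domain, out_range):
    """Return a function mapping values in `domain` linearly onto `out_range`."""
    d0, d1 = domain
    r0, r1 = out_range

    def scale(value):
        t = (value - d0) / (d1 - d0)   # position within the data domain, 0..1
        t = min(max(t, 0.0), 1.0)      # clamp values outside the domain
        return r0 + t * (r1 - r0)

    return scale


def gradient(low_rgb, high_rgb, domain):
    """Return a function mapping a value to an RGB colour between two end colours."""
    channels = [linear_scale(domain, (lo, hi)) for lo, hi in zip(low_rgb, high_rgb)]
    return lambda value: tuple(round(ch(value)) for ch in channels)


# One scale per aesthetic: x position in pixels, fill colour from white to blue.
x_scale = linear_scale(domain=(0, 100), out_range=(0, 640))
fill_scale = gradient((255, 255, 255), (0, 0, 255), domain=(0, 100))

print(x_scale(25), fill_scale(100))
```

The axis or legend is the inverse view of the same function: it tells the reader which data value a position or colour stands for.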

Coord

• A chart is drawn in a coordinate system. This can be transformed.

• A pie chart has a polar coordinate system.

import numpy as np
import pandas as pd
from ggplot import *

df = pd.DataFrame({"x": np.arange(100)})
df['y'] = df.x * 10

# polar coords
p = ggplot(df, aes(x='x', y='y')) + geom_point() + coord_polar()
print(p)
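The effect of a coordinate transformation can be sketched with the standard library alone. The function to_polar_xy is a hypothetical helper, and the choice to map x to the angle and y to the radius is an assumption made for this illustration.

```python
import math


def to_polar_xy(x, y, x_max):
    """Place a data point in a polar system: x becomes the angle, y the radius."""
    theta = 2 * math.pi * x / x_max
    return (y * math.cos(theta), y * math.sin(theta))


# The straight line y = 10 * x turns into a spiral in polar coordinates.
points = [to_polar_xy(x, 10 * x, x_max=100) for x in range(100)]
print(points[25])
```

The data and the geom stay the same; only the mapping from data coordinates to the drawing surface changes.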

Facets

• With facets, small multiples are created.

• Each facet shows a subset of the data.

ggplot(diamonds, aes(x='price')) + \
    geom_histogram() + \
    facet_grid("cut")
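Faceting amounts to splitting the data by a variable and drawing the same chart once per subset. A minimal sketch of the splitting step, with a hypothetical facet helper and made-up rows:

```python
from collections import defaultdict


def facet(rows, by):
    """Split rows (dicts) into one subset per distinct value of column `by`."""
    panels = defaultdict(list)
    for row in rows:
        panels[row[by]].append(row)
    return dict(panels)


# Made-up rows for illustration only.
rows = [
    {"cut": "Ideal", "price": 326},
    {"cut": "Good", "price": 327},
    {"cut": "Ideal", "price": 340},
]
panels = facet(rows, by="cut")
for cut, subset in panels.items():
    print(cut, [r["price"] for r in subset])
```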

Facets example

ggplot(chopsticks, aes(x='chopstick_length', y='food_pinching_effeciency')) + \
    geom_point() + \
    geom_line() + \
    scale_x_continuous(breaks=[150, 250, 350]) + \
    facet_wrap("individual")

Facets example 2

ggplot(diamonds, aes(x="carat", y="price", color="color", shape="cut")) + geom_point() + facet_wrap("clarity")

ggplot tips

• You can annotate plots:

ggplot(mtcars, aes(x='mpg')) + geom_histogram() + \
    xlab("Miles per Gallon") + ylab("# of Cars")

• Assign a plot to a variable, for instance g:

g = ggplot(mpg, aes(x='displ', y='cty')) + geom_point()

• The save method writes the plot to a file in the desired format:

g.save("myimage.png")

Folium: Thematic maps

• A thematic map is a visualization where statistical information with a spatial component is shown.

• Other libraries are: Basemap, Cartopy, Iris

• Folium builds on the data wrangling strengths of the Python ecosystem and the mapping strengths of the Leaflet.js library.

• Manipulate your data in Python, then visualize it on a Leaflet map via Folium.

Folium features

• Built-in tilesets from OpenStreetMap, MapQuest Open, MapQuest Open Aerial, Mapbox, and Stamen

• Supports custom tilesets with Mapbox or Cloudmade API keys.

• Supports GeoJSON and TopoJSON overlays, as well as the binding of data to those overlays to create choropleth maps with color-brewer color schemes.

Basic Maps

folium.Map(location=[50.89, 5.99], zoom_start=14)

Basic maps

folium.Map(location=[50.89, 5.99], zoom_start=14, tiles='Stamen Toner')

GeoJSON/TopoJSON Overlays

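# Note: geo_json() and create_map() are methods of early folium releases;
# later releases use folium.GeoJson(...).add_to(m) and m.save(...) instead.
ice_map = folium.Map(location=[-59, -11], tiles='Mapbox Bright', zoom_start=2)
ice_map.geo_json(geo_path=geo_path)
ice_map.geo_json(geo_path=topo_path, topojson='objects.antarctic_ice_shelf')
ice_map.create_map(path='ice_map.html')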

Choropleth maps

# named m rather than map, to avoid shadowing the built-in map()
m = folium.Map(location=[48, -102], zoom_start=3)
m.choropleth(geo_path=state_geo, data=state_data,
             columns=['State', 'Unemployment'],
             key_on='feature.id',
             fill_color='YlGn', fill_opacity=0.7, line_opacity=0.2,
             legend_name='Unemployment Rate (%)')
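The binding of data to an overlay is essentially a join between the GeoJSON features and the data table on a shared key. The sketch below is not folium code: resolve and bind are hypothetical helpers, and the features and unemployment figures are made up. It only illustrates what a key such as 'feature.id' refers to.

```python
# Made-up GeoJSON and data for illustration only.
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "AL", "properties": {"name": "Alabama"}, "geometry": None},
        {"type": "Feature", "id": "AK", "properties": {"name": "Alaska"}, "geometry": None},
    ],
}
unemployment = {"AL": 7.1, "AK": 6.8}


def resolve(feature, key_on):
    """Follow a dotted path such as 'feature.id' or 'feature.properties.name'."""
    value = feature
    for part in key_on.split(".")[1:]:   # skip the leading 'feature'
        value = value[part]
    return value


def bind(geojson, data, key_on):
    """Return {key: (feature, value)} for every feature that has a data value."""
    bound = {}
    for feature in geojson["features"]:
        key = resolve(feature, key_on)
        if key in data:
            bound[key] = (feature, data[key])
    return bound


bound = bind(geojson, unemployment, key_on="feature.id")
print({key: value for key, (_, value) in bound.items()})
```

Once every feature carries a value, a colour scale turns that value into a fill colour per polygon.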

The Grammar of Graphics

ggplot2: Layered Grammar of Graphics

• Defaults: Data, Aesthetics
• Coordinates
• Scales
• Layers: Data, Aesthetics, Geometry, Statistics, Position
• Facets

tmap: Layered Grammar of Thematic Maps

• Group:

    – Shape (1 per group): coordinates and topology; Data, Map projection, Bounding box. Spatial types: polygons, points, lines, raster
    – Layers (1 or more per group): Aesthetics, Statistics, Scale

• Facets

Creating a choropleth

tm_shape(NLD_muni, projection="rd") +
    tm_fill()

Creating a choropleth (2)

tm_shape(NLD_muni, projection="rd") +
    tm_fill("blue")

Creating a choropleth (3)

tm_shape(NLD_muni, projection="rd") +
    tm_fill("population")

Creating a choropleth (4)

tm_shape(NLD_muni, projection="rd") +
    tm_fill("population", convert2density=TRUE, style="kmeans", title="Population per km2")
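The two ideas behind the convert2density and style="kmeans" arguments can be sketched in plain Python: first turn absolute counts into counts per square kilometre, then group the densities into classes. This is not how tmap implements them; to_density and kmeans_1d are hypothetical helpers, the data is made up, and the clustering is a plain one-dimensional k-means with an assumed starting rule.

```python
def to_density(population, area_km2):
    """Convert absolute counts to counts per square kilometre."""
    return [p / a for p, a in zip(population, area_km2)]


def kmeans_1d(values, k, iterations=20):
    """Group values into at most k classes with a plain one-dimensional k-means."""
    ordered = sorted(values)
    # Start with centres spread evenly over the sorted values.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in ordered:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            groups[nearest].append(v)
        # Keep the old centre when a group ends up empty.
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return [g for g in groups if g]


# Made-up municipalities for illustration only.
population = [1000, 1200, 50000, 52000, 800000]
area_km2 = [10, 10, 50, 40, 200]
density = to_density(population, area_km2)
classes = kmeans_1d(density, k=3)
print(classes)
```

Each resulting class would receive one colour of the palette, so municipalities with similar densities end up with the same fill.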

Creating a choropleth (5)

tm_shape(NLD_muni, projection="rd") +
    tm_fill("population", convert2density=TRUE, style="kmeans", title="Population per km2") +
    tm_borders(alpha = .5)

Creating a choropleth (6)

tm_shape(NLD_muni, projection="rd") +
    tm_fill("population", convert2density=TRUE, style="kmeans", title="Population per km2") +
    tm_borders(alpha = .5) +
tm_shape(NLD_prov) +
    tm_borders(lwd = 2)

Creating a choropleth (7)

tm_shape(NLD_muni, projection="rd") +
    tm_fill("population", convert2density=TRUE, style="kmeans", title="Population per km2") +
    tm_borders(alpha = .5) +
tm_shape(NLD_prov) +
    tm_borders(lwd = 2) +
    tm_text("name", size = .8, shadow = TRUE, bg.color = "white", bg.alpha = .25)

Creating a choropleth with qtm

qtm(NLD_muni)

qtm(NLD_muni, fill="population", convert2density=TRUE)

qtm(NLD_muni, fill="population", convert2density=TRUE, fill.style="kmeans", fill.title="Population per km2") +
qtm(NLD_prov, fill=NULL, text="name", text.size=.7, borders.lwd=2, text.bg.color="white", text.bg.alpha=.25, shadow=TRUE)

Example: choropleth

tm_shape(World) +
    tm_fill("income_grp", palette="-Blues", title="Income classification") +
    tm_borders() +
    tm_text("iso_a3", size="AREA") +
    tm_format_World()

Example: bubble map

tm_shape(World) +
    tm_fill("grey70") +
tm_shape(metro) +
    tm_bubbles("X2010", col = "growth",
               border.col = "black", border.alpha = .5,
               style="fixed", breaks=c(-Inf, 0, 2, 4, 6, Inf),
               palette="-RdYlBu",
               title.size="Metro population",
               title.col="Growth rate (%)") +
    tm_format_World()

Example: choropleth + bubble map

tm_shape(World) +
    tm_fill("income_grp", palette="-Blues", contrast = .5, title="Income class") +
    tm_borders() +
    tm_text("iso_a3", size="AREA") +
tm_shape(metro) +
    tm_bubbles("X2010", col = "growth",
               border.col = "black", border.alpha = .5,
               style="fixed", breaks=c(-Inf, 0, 2, 4, 6, Inf),
               palette="-RdYlBu",
               title.size="Metro population",
               title.col="Growth rate (%)") +
    tm_format_World(bg.color = "gray80")

Example: raster map

pal8 <- c("#33A02C", "#B2DF8A", "#FDBF6F", "#1F78B4", "#999999", "#E31A1C", "#E6E6E6", "#A6CEE3")

tm_shape(land, ylim = c(-88,88)) +
    tm_raster("cover_cls", palette = pal8, title="Global Land Cover") +
tm_shape(World) +
    tm_borders() +
    tm_format_World(legend.bg.color = "white", legend.bg.alpha=.2,
                    legend.frame="gray50", legend.width=.2)

Example: raster map (with dotmap)

pal8 <- c("#33A02C", "#B2DF8A", "#FDBF6F", "#1F78B4", "#999999", "#E31A1C", "#E6E6E6", "#A6CEE3")

tm_shape(land, ylim = c(-88,88)) +
    tm_raster("cover_cls", palette = pal8, title="Global Land Cover") +
tm_shape(World) +
    tm_borders() +
qtm(metro, dot.color="#E31A1C") +
    tm_format_World(legend.bg.color = "white", legend.bg.alpha=.2,
                    legend.frame="gray50", legend.width=.2)

Example: classic map

... + style_classic()

Small multiples

tm_shape(NLD_muni) +
    tm_polygons("population", style="kmeans", convert2density = TRUE) +
    tm_facets(by="province", free.coords=TRUE, drop.shapes=TRUE) +
    tm_layout(legend.show = FALSE, outer.margins=0)

OpenStreetMap layer

osm_NLD <- read_osm(NLD_muni)

qtm(osm_NLD) +
tm_shape(NLD_muni) +
    tm_polygons("population", convert2density=TRUE, style="kmeans", alpha=.7, palette="Reds")

Interactive maps

• All maps can be made interactive.

• tmap contains two modes:

    – plot: static maps, shown in the graphics device window; can be exported to png, jpg, pdf, etc.
    – view: interactive maps, shown in the viewing window or in the browser; can be exported to standalone HTML files

# switch to plot mode:
tmap_mode("plot")

# switch to view mode:
tmap_mode("view")

# toggle between modes:
ttm()

Some convenient functions

Read ESRI shape file:

NLD_muni <- read_shape("NLD_2014_muni.shp")

Append data:

NLD_muni <- append_data(NLD_muni, NLD_data, key.shp="code", key.data="muni_code")

Set map projection:

NLD_muni <- set_projection(NLD_muni, "longlat")

Crop shapes:

NLD_twitter <- crop(twitter, NLD_muni)

Create animation:

animation_tmap(...)

Save to image:

tm_twitter <- tm_shape(NLD_muni) + tm_polygons() + tm_shape(NLD_twitter) + tm_dots()
save_tmap(tm_twitter, filename = "twitter.png", width = 600, height = 800)

Animation

Daytime population per municipality, based on mobile phone network data

tm_dtp <- tm_shape(dtp) +
    tm_polygons(paste0("dtp", 0:23), ...) +
tm_shape(NLD_prov) +
    tm_borders(lwd = 2) +
    tm_credits("...") +
    tm_facets(ncol=1, nrow=1)

animation_tmap(tm_dtp, filename = "dtp.gif", width = 600, height = 800, delay = 40)

Tableplot

library(tabplot)

# load data
library(ggplot2)
data(diamonds)

tableplot(diamonds)

• Tableplots can be created with the package tabplot

• It works well with very large tabular data (dozens of variables, millions of rows). Internally, it makes use of the ff and ffbase packages, which store data on disk rather than in memory.

• Speed is ensured by preprocessing the data.

• Standard deviations can be shown for numeric variables.
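The preprocessing idea behind a tableplot can be sketched in plain Python: sort the rows by one variable, cut them into a fixed number of row bins, and keep only a summary per bin, so that the drawing step never touches the individual rows. The function tableplot_bins is a hypothetical helper, not the tabplot implementation; the descending sort order and the sample rows are assumptions.

```python
from statistics import mean, pstdev


def tableplot_bins(rows, sort_by, columns, n_bins):
    """Sort rows by one column, cut them into row bins, and summarise each bin."""
    ordered = sorted(rows, key=lambda r: r[sort_by], reverse=True)
    size = -(-len(ordered) // n_bins)   # ceiling division: rows per bin
    summary = []
    for start in range(0, len(ordered), size):
        chunk = ordered[start:start + size]
        # Per column: (mean, standard deviation) of the rows in this bin.
        summary.append({
            col: (mean(r[col] for r in chunk), pstdev(r[col] for r in chunk))
            for col in columns
        })
    return summary


# Made-up rows for illustration only.
rows = [{"carat": c, "price": p} for c, p in
        [(0.3, 400), (1.0, 5000), (0.5, 1500), (0.7, 2500)]]
bins = tableplot_bins(rows, sort_by="price", columns=["carat", "price"], n_bins=2)
for b in bins:
    print(b)
```

With millions of rows and, say, 100 bins, the plot only needs 100 summary values per variable, which is why the preprocessing step keeps drawing fast.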

Tableplot: Dutch virtual census test data from 2008

Summary

• R is very suitable for data visualization

• The ggplot2 package is the standard for non-spatial charts

• The tmap package is a package in the same style for spatial data visualization.

• The tabplot package can be used to visualize large tabular data.

References

• http://yhat.github.io/ggplot/

• https://folium.readthedocs.io/en/latest/
