© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Unlocking Open Data in
Public Sector Account Manager, DACH
Loft + Lab Munich
11th November 2016
What this session is about
1) Open Data: Concepts, Examples & Trends
2) AWS as a Platform for Open Data
3) Case Study: Provide Open Data on AWS
4) Case Study: Use Open Data on AWS
Open Data: Concepts,
Examples & Trends
“Open data is data that can be
freely used, shared and built-
on by anyone, anywhere, for
Definition by Open Knowledge Foundation, 2013
The 8 Open Government Data Principles
5. Machine processable
8. License-free
OGD Principles: https://opengovdata.org/
Why Open Data?
2. Releasing social and commercial value
3. Participation and engagement
McKinsey report from October 2013
EC study from November 2015: Creating Value through Open Data: Study on the Impact of Re-use of Public Data Resources
Open Data Portal of Deutsche Bahn: http://data.deutschebahn.com/
AWS as a Platform
for Open Data
Why does AWS care about Open Data?
Many of our commercial sector customers rely on quality open data as much as they
rely on our cloud infrastructure services.
Many of our public sector customers use AWS to make their data available to a global
community of researchers, entrepreneurs, students, and fellow government agencies.
Sharing data makes it accessible to a large and growing community of
researchers, entrepreneurs, and enterprises.
The cloud allows users from anywhere to take their algorithms to
data rather than downloading data to their computing resources.
Data Acquisition in the Cloud
Open data as a platform
Data Creation, Data Enrichment
Data at Rest
Lower cost of knowledge
A Rich Set of Programmable Services
Analytics, App Services, Developer Tools and Operations, Mobile Services
Core Services: CDN, Compute (VMs, Auto-Scaling and Load Balancing)
(Relational, NoSQL, and Caching)
(VPC, DX, and DNS)
(Object, Block, and
Infrastructure Availability Zones
Providing Open Data on AWS
Case Study: Transport for London
Graphics from TfL, October 2016
Why open data at TfL?
Optimal use of transport network
The API supports all the data
requirements of the TfL
Every data-driven aspect of
the website (including maps) is
powered by the unified API.
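To make the idea of a single unified API concrete, here is a minimal Python sketch of how a client might ask for line status. The base URL `https://api.tfl.gov.uk`, the `/Line/Mode/{mode}/Status` path and the JSON field names are assumptions not taken from these slides, and the sample response is hand-written for illustration only. Check the current TfL API documentation before relying on any of them.

```python
import json

# Assumed base URL of the TfL unified API (not stated in the slides).
TFL_API_BASE = "https://api.tfl.gov.uk"


def line_status_url(mode: str) -> str:
    """Build the (assumed) status endpoint URL for one transport mode."""
    return f"{TFL_API_BASE}/Line/Mode/{mode}/Status"


def summarize_status(payload: str) -> dict:
    """Map each line name to its list of status descriptions.

    The field names 'name', 'lineStatuses' and
    'statusSeverityDescription' are assumed response fields.
    """
    lines = json.loads(payload)
    return {
        line["name"]: [
            status["statusSeverityDescription"]
            for status in line.get("lineStatuses", [])
        ]
        for line in lines
    }


# Hand-written sample response, for illustration only (not real TfL data).
SAMPLE_RESPONSE = (
    '[{"id": "victoria", "name": "Victoria", '
    '"lineStatuses": [{"statusSeverityDescription": "Good Service"}]}]'
)

if __name__ == "__main__":
    print(line_status_url("tube"))
    print(summarize_status(SAMPLE_RESPONSE))
```

In a real client the payload would come from an HTTP GET on the URL returned by `line_status_url`. The sketch keeps the network call out so that it runs offline.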
Some of the multi-modal core
datasets included and available to
Journey Planning (current and
Status (current and future)
Disruptions (current) and Planned
(instant and websockets)
Embarkation points and facilities
Routes and lines (topology and
Almost 500 apps produced.
Playground for innovation.
Apps by public transportation
authorities: MVV, MVG, DB.
No info how to access data, lacks
Graphic from TfL, October 2016
Outcomes and Cloud Benefits
Customers save time,
New jobs and investments
in startup and tech
Usage of data has since
Data consolidation and
Pay for what you use
Lower maintenance costs
Automation and consistency
Blue/green deployment –
MWD Advisors case study: https://d0.awsstatic.com/analyst-reports/MWD_AWS_TFL_Case_Study_Sept_2015.pdf
Solutions for providing Open Data on AWS
Open data platforms
Using Open Data on AWS
Public Data Sets on AWS
Several high-value datasets are available for anyone to access for free on AWS.
Landsat on AWS, 3K Rice Genome, NEXRAD on AWS
More available Public Datasets on AWS…
GDELT: Over a quarter-billion records monitoring the world's broadcast, print, and web news from
nearly every corner of every country, updated daily
IRS 990 Filings on AWS: Machine-readable data from certain electronic 990 forms filed with the IRS
from 2011 to present
Common Crawl Corpus: A corpus of web crawl data composed of over 5 billion web pages
TCGA on AWS: Raw and processed genomic, transcriptomic, and epigenomic data from The Cancer
Genome Atlas (TCGA) available to qualified researchers via the Cancer Genomics Cloud
ICGC on AWS: Whole genome sequence data available to qualified researchers via The International
Cancer Genome Consortium (ICGC)
1000 Genomes Project: A detailed map of human genetic variation
Multimedia Commons: A collection of nearly 100M images and videos with audio and visual features
Google Books Ngrams: A dataset containing Google Books n-gram corpuses
A list of other Public Datasets is available here.
Accessing and processing Landsat data
What is Landsat on AWS?
How to access Landsat on AWS?
How to use Landsat on AWS?
Landsat on AWS
We have committed to make up to 1
petabyte of Landsat imagery readily
available as objects on Amazon S3.
All Landsat 8 scenes from 2015 and
2016 are available, along with a
selection of cloud-free scenes from
2013 and 2014.
All new Landsat 8 scenes are made
available each day (~700 per day),
often within hours of production.
Landsat on AWS makes each band of
each scene readily available as objects
on Amazon S3. Data can be accessed
programmatically via HTTP and quickly
deployed to any of our products for
analysis and processing.
Users do not need to worry about local
storage and have access to virtually
unlimited computing power on demand.
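As a rough illustration of programmatic HTTP access to a single band, the Python sketch below builds the URL of one band file from a Landsat 8 scene ID. The bucket endpoint `landsat-pds.s3.amazonaws.com` and the key layout `L8/<path>/<row>/<scene>/<scene>_B<band>.TIF` are assumptions not stated on these slides, and the helper name `landsat_band_url` is hypothetical. Verify the current layout before using it.

```python
# Assumed public endpoint of the Landsat bucket (not stated in the slides).
LANDSAT_BASE_URL = "https://landsat-pds.s3.amazonaws.com"


def landsat_band_url(scene_id: str, band: int,
                     base_url: str = LANDSAT_BASE_URL) -> str:
    """Build the HTTP URL of one band (GeoTIFF) of a Landsat 8 scene.

    Assumes a 21-character scene ID such as 'LC81390452014295LGN00',
    where characters 3-5 are the WRS path and 6-8 the WRS row.
    """
    if len(scene_id) != 21 or not scene_id.startswith("LC8"):
        raise ValueError(f"unexpected Landsat 8 scene ID: {scene_id!r}")
    path, row = scene_id[3:6], scene_id[6:9]
    return f"{base_url}/L8/{path}/{row}/{scene_id}/{scene_id}_B{band}.TIF"


if __name__ == "__main__":
    # Band 4 of an example scene; fetch it with any HTTP client.
    print(landsat_band_url("LC81390452014295LGN00", 4))
```

Because each band is a separate object, a job that needs only one or two bands can request just those files instead of a whole scene archive.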
Undifferentiated heavy lifting