© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Unlocking Open Data in the Cloud
Grischa Gundelsweiler
Public Sector Account Manager, DACH
Loft + Lab Munich
11th November 2016
What this session is about
1) Open Data: Concepts, Examples & Trends
2) AWS as a Platform for Open Data
3) Case Study: Provide Open Data on AWS
4) Case Study: Use Open Data on AWS
Open Data: Concepts, Examples & Trends
“Open data is data that can be freely used, shared and built-on
by anyone, anywhere, for any purpose.”
Definition by the Open Knowledge Foundation
The 8 Open Government Data Principles
1. Complete
2. Primary
3. Timely
4. Accessible
5. Machine processable
6. Non-discriminatory
7. Non-proprietary
8. License-free
Why Open Data?
2. Releasing social and commercial value
3. Participation and engagement
McKinsey report from October
EC study from November 2015: Creating Value through Open Data: Study on the Impact of Re-use of Public Data Resources
Open Data Portal of Deutsche Bahn
AWS as a Platform for Open Data
Why does AWS care about Open Data?
Many of our commercial sector customers rely on quality open
data as much as they rely on our cloud infrastructure services.
Many of our public sector customers use AWS to make their data
available to a global community of researchers, entrepreneurs,
students, and fellow government agencies.
Sharing data makes it accessible to a large and growing
community of researchers, entrepreneurs, and enterprises.
The cloud allows users from anywhere to take their algorithms to
data rather than downloading data to their computing resources.
Data Acquisition in the Cloud
Open data as a platform
Data Creation
Data Enrichment
Data at Rest (Object storage)
Lower cost of knowledge (Efficiency)
A Rich Set of Programmable Services
Key Management and Storage
Resource and Usage Auditing
Analytics App Services Developer Tools and Operations Mobile
Resource Templates Identity
Queuing and Notifications
Core Services
CDN
Compute (VMs, Auto-Scaling and Load Balancing)
Databases (Relational, NoSQL, and Caching)
Networking (VPC, DX, and DNS)
Storage (Object, Block, and Archival)
Infrastructure
Availability Zones
Points of Presence
Sharing and Collaboration
Technical and Business Support
Security and Pricing Reports
Training and Certification
Providing Open Data on AWS
Case Study: Transport for London
Graphics from TfL, October 2016
Why open data at TfL?
Transparency
Reach
Optimal use of transport network
Economic
The API supports all the data requirements of the TfL website.
Every data-driven aspect of the website (including maps) is powered
by the unified API.
Some of the multi-modal core datasets included and available to
Journey Planning (current and future)
Status (current and future)
Disruptions (current) and Planned works (future)
Arrival/departure predictions (instant and websockets)
Timetables
Embarkation points and facilities
Routes and lines (topology and geographical)
Fares
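As a rough illustration of how an app might consume status data from such an API, the sketch below builds a request URL and summarises a response. The base URL `https://api.tfl.gov.uk`, the `/Line/Mode/{mode}/Status` path, and the JSON field names are assumptions that do not appear on the slide, and the sample payload is made up for the example.

```python
import json
from urllib.parse import quote

# Assumed base URL; the slide does not give the endpoint.
BASE_URL = "https://api.tfl.gov.uk"


def status_url(mode: str) -> str:
    """Build the (assumed) line-status URL for one transport mode."""
    return f"{BASE_URL}/Line/Mode/{quote(mode, safe='')}/Status"


def summarise_status(payload: str) -> dict:
    """Map each line name to its list of status descriptions.

    The field names ('name', 'lineStatuses', 'statusSeverityDescription')
    are assumptions about the response shape.
    """
    summary = {}
    for line in json.loads(payload):
        statuses = line.get("lineStatuses", [])
        summary[line["name"]] = [
            s.get("statusSeverityDescription", "unknown") for s in statuses
        ]
    return summary


# Made-up sample response, for illustration only.
SAMPLE = json.dumps([
    {"name": "Victoria",
     "lineStatuses": [{"statusSeverityDescription": "Good Service"}]},
    {"name": "Central",
     "lineStatuses": [{"statusSeverityDescription": "Minor Delays"}]},
])

if __name__ == "__main__":
    print(status_url("tube"))
    print(summarise_status(SAMPLE))
```

Keeping URL construction and response parsing in separate functions lets the parsing be checked against recorded payloads without network access.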
Almost 500 apps produced.
Playground for innovation.
Improving
Apps by public transportation authorities: MVV, MVG, DB. No information on how to access the data; documentation is lacking.
Graphic from TfL, October 2016
Outcomes
Customers save time, economic benefits
New jobs and investments in startup and tech ecosystem
Usage of data has since doubled
Data
Cloud Benefits
Pay for what you use
Lower maintenance costs
Elasticity
Automation and consistency
Blue/green deployment (zero downtime)
Highly
MWD Advisors case study
Solutions for providing Open Data on AWS
Using Open Data on AWS
Public Data Sets on AWS
Several high-value datasets are available for anyone to access for free on AWS. Examples include:
Landsat on AWS
3K Rice Genome
NEXRAD on AWS
More available Public Datasets on AWS…
GDELT: Over a quarter-billion records monitoring the world's broadcast, print, and web news from nearly every corner of every country, updated daily.
IRS 990 Filings on AWS: Machine-readable data from certain electronic 990 forms filed with the IRS from 2011 to present
Common Crawl Corpus: A corpus of web crawl data composed of over 5 billion web pages
TCGA on AWS: Raw and processed genomic, transcriptomic, and epigenomic data from The Cancer Genome Atlas (TCGA) available to qualified researchers via the Cancer Genomics Cloud
ICGC on AWS: Whole genome sequence data available to qualified researchers via The International Cancer Genome Consortium (ICGC)
1000 Genomes Project: A detailed map of human genetic variation
Multimedia Commons: A collection of nearly 100M images and videos with audio and visual features and annotations
Google Books Ngrams: A dataset containing Google Books n-gram corpuses
A list of other Public Datasets is available here.
Accessing and processing Landsat data
What is Landsat on AWS?
How to access Landsat on AWS?
How to use Landsat on AWS?
Landsat on AWS
We have committed to make up to 1 petabyte of Landsat imagery
readily available as objects on Amazon S3.
All Landsat 8 scenes from 2015 and 2016 are available, along
with a selection of cloud-free scenes from 2013 and 2014.
All new Landsat 8 scenes are made available each day (~700 per
day), often within hours of production.
Landsat on AWS
Landsat on AWS makes each band of each scene readily available
as objects on Amazon S3. Data can be accessed programmatically via
HTTP and quickly deployed to any of our products for analysis and
Users do not need to worry about local storage and have access
to virtually unlimited computing power on demand.
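To make "each band of each scene as an object" concrete, here is a minimal sketch that derives an object URL from a scene ID and a band number. The bucket endpoint `landsat-pds.s3.amazonaws.com`, the `L8/<path>/<row>/<scene>/` key layout, and the example scene ID are assumptions that do not appear on the slide and may have changed since; `parse_scene_id` and `band_url` are hypothetical helper names.

```python
from typing import NamedTuple

# Assumed public endpoint and key layout; neither appears on the slide.
BUCKET_URL = "https://landsat-pds.s3.amazonaws.com"


class Scene(NamedTuple):
    path: str          # WRS-2 path, zero-padded to 3 digits
    row: str           # WRS-2 row, zero-padded to 3 digits
    year: int          # acquisition year
    day_of_year: int   # acquisition day of year


def parse_scene_id(scene_id: str) -> Scene:
    """Split a 21-character pre-collection Landsat scene ID into its parts."""
    if len(scene_id) != 21:
        raise ValueError(f"unexpected scene ID length: {scene_id!r}")
    return Scene(
        path=scene_id[3:6],
        row=scene_id[6:9],
        year=int(scene_id[9:13]),
        day_of_year=int(scene_id[13:16]),
    )


def band_url(scene_id: str, band: int) -> str:
    """Build the (assumed) HTTP URL of one band of one scene."""
    scene = parse_scene_id(scene_id)
    return (f"{BUCKET_URL}/L8/{scene.path}/{scene.row}/"
            f"{scene_id}/{scene_id}_B{band}.TIF")


if __name__ == "__main__":
    # Illustrative scene ID only.
    print(band_url("LC81390452014295LGN00", 4))
```

Because every band has a predictable URL, an analysis job can reference the exact objects it needs instead of copying whole scenes first.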
Undifferentiated heavy lifting
We use GDAL to add “internal tiling” to each Landsat on AWS TIFF, which allows developers to use HTTP range GETs to access
specific portions of each scene.
This allows people to only access the data they need when they need it.
[Figure: tile numbering (1 to 36) in a standard TIFF object compared with an internally tiled TIFF]
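The idea can be sketched in a few lines: work out which tiles cover the pixel window of interest, then ask for only those bytes with an HTTP `Range` header. This is a simplified illustration, not the way GDAL does it internally; the 6 x 6 grid, the 512-pixel tile size, the byte offsets, and the example URL are all assumptions chosen to mirror the 36-tile figure.

```python
from urllib.request import Request


def tiles_for_window(x0, y0, x1, y1, tile_size, tiles_per_row):
    """Return 1-based, row-major tile numbers covering pixels [x0, x1) x [y0, y1)."""
    first_col, last_col = x0 // tile_size, (x1 - 1) // tile_size
    first_row, last_row = y0 // tile_size, (y1 - 1) // tile_size
    return [
        row * tiles_per_row + col + 1
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


def range_header(offset, length):
    """HTTP Range header value for `length` bytes starting at `offset`."""
    if length <= 0:
        raise ValueError("length must be positive")
    return f"bytes={offset}-{offset + length - 1}"


def tile_request(url, offset, length):
    """Build (but do not send) a GET request for one tile's bytes."""
    return Request(url, headers={"Range": range_header(offset, length)})


if __name__ == "__main__":
    # A 1024 x 512 pixel window in a 6 x 6 grid of 512-pixel tiles.
    print(tiles_for_window(512, 512, 1536, 1024, tile_size=512, tiles_per_row=6))
    # Hypothetical URL and byte range, for illustration only.
    req = tile_request("https://example.com/scene_B4.TIF", 1024, 512)
    print(req.get_header("Range"))
```

In a real tiled TIFF, the offset and length of each tile come from the file's TileOffsets and TileByteCounts tags, so a client reads the header first and then requests only the tiles it needs.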
Shortwave infrared
Urban areas
Think of URLs instead of copies
Wellington, New
Using Landsat on S3
Landsat on Amazon
ArcGIS Server on
AWS US West Oregon Region
reliable, performant data access
Usage in the first year:
Over 400,000 scenes available
Over 1 billion hits globally
Used for new product development by:
Landsat on AWS
Small investment, big impact:
Public dataset hosted in FRA
Apps for agriculture, disaster relief, vegetation monitoring,
property taxation, ...
Used for new product development by:
Sentinel-2 on AWS
Depending on your role, your goals
Use open data in your projects / your organisation
Provide open data from your organisation
Build a new business on open data
AWS offers
Technology platform that constantly evolves
Enablement through workshops, training, ProServ
Customer and partner ecosystem to connect and build