© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Unlocking Open Data in the Cloud

Grischa Gundelsweiler, Public Sector Account Manager, DACH
Loft + Lab Munich, 11th November 2016

What this session is about

1) Open Data: Concepts, Examples & Trends
2) AWS as a Platform for Open Data
3) Case Study: Provide Open Data on AWS
4) Case Study: Use Open Data on AWS


Open Data: Concepts, Examples & Trends


“Open data is data that can be freely used, shared and built-on by anyone, anywhere, for any purpose.”

Definition by Open Knowledge Foundation, 2013: http://blog.okfn.org/2013/10/03/defining-open-data/

The 8 Open Government Data Principles

1. Complete
2. Primary
3. Timely
4. Accessible
5. Machine processable
6. Non-discriminatory
7. Non-proprietary
8. License-free

OGD Principles: https://opengovdata.org/

Why Open Data?

1. Transparency

2. Releasing social and commercial value

3. Participation and engagement


McKinsey report from October 2013: http://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/open-data-unlocking-innovation-and-performance-with-liquid-information

EC study from November 2015: Creating Value through Open Data: Study on the Impact of Re-use of Public Data Resources
https://www.europeandataportal.eu/sites/default/files/edp_creating_value_through_open_data_0.pdf

Open Data Portal of Deutsche Bahn: http://data.deutschebahn.com/


AWS as a Platform for Open Data


Why does AWS care about Open Data?

• Many of our commercial sector customers rely on quality open data as much as they rely on our cloud infrastructure services.

• Many of our public sector customers use AWS to make their data available to a global community of researchers, entrepreneurs, students, and fellow government agencies.

Sharing data makes it accessible to a large and growing community of researchers, entrepreneurs, and enterprises.


The cloud allows users from anywhere to take their algorithms to data rather than downloading data to their computing resources.

Data Acquisition in the Cloud


Open data as a platform

[Diagram labels: Data Creation, Data Enrichment, Sensemaking, Data at Rest (object storage), Basic APIs, Complex APIs, Consumer applications, Algorithmic policy, Data-driven journalism, Data Catalogs, Focused data dashboards, Predictive modeling, Visualizations, Lower cost of knowledge (efficiency)]


A Rich Set of Programmable Services


Administration and Security: Access Control, Identity Management, Key Management and Storage, Monitoring and Logs, Resource and Usage Auditing

Platform Services
• Analytics: Data Pipelines, Data Warehouse, Hadoop, Real-Time Streaming Data
• App Services: App Streaming, Email, Queuing and Notifications, Search, Transcoding, Workflow
• Developer Tools and Operations: Application Lifecycle Management, Containers, Deployment, DevOps, Event-Driven Computing, Resource Templates
• Mobile Services: Identity, Mobile Analytics, Push Notifications, Sync

Core Services: CDN, Compute (VMs, Auto-Scaling and Load Balancing), Databases (Relational, NoSQL, and Caching), Networking (VPC, DX, and DNS), Storage (Object, Block, and Archival)

Infrastructure: Availability Zones, Points of Presence, Regions

Enterprise Applications: Business Email, Sharing and Collaboration, Virtual Desktop

Technical and Business Support: Account Management, Partner Ecosystem, Professional Services, Security and Pricing Reports, Solutions Architects, Support, Training and Certification

Providing Open Data on AWS


Case Study: Transport for London

Graphics from TfL, October 2016

Why open data at TfL?

• Transparency
• Reach
• Optimal use of transport network
• Economic benefit
• Innovation
• …


Available Datasets

The API supports all the data requirements of the TfL website. Every data-driven aspect of the website (including maps) is powered by the unified API.

Some of the multi-modal core datasets included and available to developers are:
• Journey Planning (current and future)
• Status (current and future)
• Disruptions (current) and Planned works (future)
• Arrival/departure predictions (instant and websockets)
• Timetables
• Embarkation points and facilities
• Routes and lines (topology and geographical)
• Fares
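The unified API is a plain HTTP/JSON interface, so consuming a dataset such as "Status (current)" takes only a few lines. A minimal sketch, assuming the public base URL https://api.tfl.gov.uk and a `/Line/{ids}/Status` endpoint that returns a JSON list of line objects with `name` and `lineStatuses` fields (each status carrying `statusSeverityDescription`); check TfL's current API documentation before relying on either, and note TfL may ask you to register for an application key. The helper names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the TfL unified API.
API_BASE = "https://api.tfl.gov.uk"


def line_status_url(line_ids):
    """Build the (assumed) Line Status URL for one or more line IDs."""
    ids = ",".join(quote(line_id) for line_id in line_ids)
    return f"{API_BASE}/Line/{ids}/Status"


def summarize_status(lines):
    """Map each line name to its list of status descriptions.

    Assumes each element of `lines` is a dict with 'name' and
    'lineStatuses', where every status has 'statusSeverityDescription'.
    """
    return {
        line["name"]: [s["statusSeverityDescription"]
                       for s in line.get("lineStatuses", [])]
        for line in lines
    }


def fetch_status(line_ids):
    """Fetch and summarize current line status (requires network access)."""
    with urlopen(line_status_url(line_ids)) as resp:
        return summarize_status(json.load(resp))
```

For example, `fetch_status(["victoria", "central"])` would request `https://api.tfl.gov.uk/Line/victoria,central/Status`. Keeping URL construction and parsing separate from the network call makes both easy to test offline.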


London
Almost 500 apps produced. Playground for innovation. Improving transportation, collaboratively.

Munich
Apps by public transportation authorities (MVV, MVG, DB). No information on how to access the data; documentation is lacking.

Graphic from TfL, October 2016

Outcomes
• Customers save time, economic benefits
• New jobs and investments in startup and tech ecosystem
• Usage of data has since doubled
• Data consolidation and quality

Cloud Benefits
• Pay for what you use
• Lower maintenance costs
• Elasticity
• Automation and consistency
• Blue/green deployment: zero downtime
• Highly secure

MWD Advisors case study: https://d0.awsstatic.com/analyst-reports/MWD_AWS_TFL_Case_Study_Sept_2015.pdf

Solutions for providing Open Data on AWS

Open data platforms:
• Catalog
• Publish
• Discover
• Visualize
• Analyze
• Share
• …


Using Open Data on AWS


Public Data Sets on AWS

Several high-value datasets are available for anyone to access for free on AWS. Examples include:
• Landsat on AWS
• 3K Rice Genome
• NEXRAD on AWS


More available Public Datasets on AWS…

• GDELT: Over a quarter-billion records monitoring the world's broadcast, print, and web news from nearly every corner of every country, updated daily.
• IRS 990 Filings on AWS: Machine-readable data from certain electronic 990 forms filed with the IRS from 2011 to present.
• Common Crawl Corpus: A corpus of web crawl data composed of over 5 billion web pages.
• TCGA on AWS: Raw and processed genomic, transcriptomic, and epigenomic data from The Cancer Genome Atlas (TCGA), available to qualified researchers via the Cancer Genomics Cloud.
• ICGC on AWS: Whole genome sequence data available to qualified researchers via The International Cancer Genome Consortium (ICGC).
• 1000 Genomes Project: A detailed map of human genetic variation.
• Multimedia Commons: A collection of nearly 100M images and videos with audio and visual features and annotations.
• Google Books Ngrams: A dataset containing Google Books n-gram corpora.

A list of other Public Datasets is available on the AWS website.


Accessing and processing Landsat data

What is Landsat on AWS?

How to access Landsat on AWS?

How to use Landsat on AWS?


Landsat on AWS

We have committed to make up to 1 petabyte of Landsat imagery readily available as objects on Amazon S3.

All Landsat 8 scenes from 2015 and 2016 are available, along with a selection of cloud-free scenes from 2013 and 2014.

All new Landsat 8 scenes are made available each day (~700 per day), often within hours of production.


Landsat on AWS

Landsat on AWS makes each band of each scene readily available as objects on Amazon S3. Data can be accessed programmatically via HTTP and quickly deployed to any of our products for analysis and processing.

Users do not need to worry about local storage and have access to virtually unlimited computing power on demand.
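Because each band of each scene is its own S3 object, a single band can be addressed directly by URL instead of downloading the whole USGS archive. A minimal sketch, assuming the 2016-era key layout `L8/<path>/<row>/<scene ID>/<scene ID>_B<band>.TIF` under https://landsat-pds.s3.amazonaws.com; the helper name and the scene ID used below are hypothetical, and the landsat-pds bucket has since been retired, so these keys may no longer resolve.

```python
# Assumed HTTP endpoint of the (historical) landsat-pds bucket.
BASE_URL = "https://landsat-pds.s3.amazonaws.com"


def band_url(path: int, row: int, scene_id: str, band: int) -> str:
    """Return the URL of one band of a Landsat 8 scene.

    Assumes the layout L8/<path>/<row>/<scene_id>/<scene_id>_B<band>.TIF,
    with WRS-2 path and row zero-padded to three digits.
    """
    if not 1 <= band <= 11:
        raise ValueError("Landsat 8 bands are numbered 1-11")
    return f"{BASE_URL}/L8/{path:03d}/{row:03d}/{scene_id}/{scene_id}_B{band}.TIF"
```

For example, `band_url(72, 89, "LC80720892016001LGN00", 4)` yields a red-band URL under the `L8/072/089/` prefix; any HTTP client or GIS tool running on Amazon EC2 can then read it in place.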

[Diagram: USGS .tar archives → individual .tiff objects in s3://landsat-pds → processing on Amazon EC2]


Undifferentiated heavy lifting

We use GDAL to add “internal tiling” on each Landsat on AWS tiff, which allows developers to use HTTP range gets to access specific portions of each scene.

This allows people to access only the data they need, when they need it.

[Figure: standard TIFF object vs. internally tiled TIFF object, each divided into 36 numbered blocks]
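A rough sketch of what a client does with an internally tiled TIFF: work out which tiles overlap the pixel window it needs, then fetch only those tiles with HTTP Range requests. Assumptions: square tiles of `tile_size` pixels (512 in the example is illustrative), numbered row by row from 0 at the top-left; the byte offset and length of each tile come from the TIFF's TileOffsets and TileByteCounts tags, which a library such as GDAL reads for you. The helper names are hypothetical.

```python
from urllib.request import Request


def tiles_for_window(x, y, width, height, tile_size, image_width):
    """Return row-major indices of the tiles overlapping a pixel window.

    (x, y) is the top-left pixel of the window; tiles are assumed square
    and numbered left to right, top to bottom, starting at 0.
    """
    tiles_across = -(-image_width // tile_size)  # ceiling division
    first_col, last_col = x // tile_size, (x + width - 1) // tile_size
    first_row, last_row = y // tile_size, (y + height - 1) // tile_size
    return [r * tiles_across + c
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]


def range_request(url, offset, length):
    """Build an HTTP request for `length` bytes starting at `offset`."""
    # The Range header end position is inclusive, hence the -1.
    return Request(url, headers={"Range": f"bytes={offset}-{offset + length - 1}"})
```

For example, `tiles_for_window(600, 100, 500, 300, 512, 2048)` returns `[1, 2]`: the window covers tile columns 1 and 2 of the first tile row, so only two tiles are transferred. Passing a request from `range_request` to `urllib.request.urlopen` returns just those bytes when the server honors Range headers, as S3 does.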


RGB: visible light

Infrared: vegetation

Shortwave infrared: urban areas

Think of URLs instead of copies
Wellington, New Zealand: https://landsat-pds.s3.amazonaws.com/L8/072/089/
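In the "URLs instead of copies" model, a color composite is simply a set of band URLs. The slide does not name the bands, so the mapping below uses commonly cited Landsat 8 combinations (4-3-2 natural color, 5-4-3 color infrared for vegetation, 7-6-4 for urban areas) and the same assumed 2016-era key layout as the Wellington prefix; the scene ID and helper names are hypothetical.

```python
# Commonly used Landsat 8 band combinations, in (red, green, blue) order.
COMPOSITES = {
    "natural_color": (4, 3, 2),   # visible light
    "color_infrared": (5, 4, 3),  # near infrared highlights vegetation
    "urban": (7, 6, 4),           # shortwave infrared highlights urban areas
}

# Assumed HTTP endpoint of the (historical) landsat-pds bucket.
BASE_URL = "https://landsat-pds.s3.amazonaws.com"


def composite_urls(path, row, scene_id, composite):
    """Return the three band URLs needed to render a composite.

    Assumes the layout L8/<path>/<row>/<scene_id>/<scene_id>_B<band>.TIF.
    """
    bands = COMPOSITES[composite]
    prefix = f"{BASE_URL}/L8/{path:03d}/{row:03d}/{scene_id}"
    return [f"{prefix}/{scene_id}_B{band}.TIF" for band in bands]
```

A viewer can hand these three URLs to a rendering tool directly; no copy of the scene is ever made.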

Using Landsat on S3

[Diagram: Landsat on Amazon S3 → ArcGIS Server on Amazon EC2 (AWS US West (Oregon) Region) → user; reliable, performant data access]

Usage in the first year:
• Over 400,000 scenes available
• Over 1 billion hits globally

Used for new product development by:

Landsat on AWS

Small investment, big impact:
• Public dataset hosted in FRA
• Apps for agriculture, disaster relief, vegetation monitoring, property taxation, …

Used for new product development by:


Sentinel-2 on AWS

Next steps

Depending on your role, your goals:
• Use open data in your projects / your organisation
• Provide open data from your organisation
• Build a new business on open data

AWS offers:
• Technology platform that constantly evolves
• Enablement through workshops, training, ProServ
• Customer and partner ecosystem to connect and build


Thank you!


Grischa Gundelsweiler


Recommended