Unlocking Open Data in the Cloud - Amazon Web Services


  • © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

    Unlocking Open Data in the Cloud

    Grischa Gundelsweiler

    Public Sector Account Manager, DACH

    Loft + Lab Munich, 11th November 2016

  • What this session is about

    1) Open Data: Concepts, Examples & Trends

    2) AWS as a Platform for Open Data

    3) Case Study: Provide Open Data on AWS

    4) Case Study: Use Open Data on AWS


  • Open Data: Concepts, Examples & Trends


  • “Open data is data that can be freely used, shared and built on by anyone, anywhere, for any purpose.”

    Definition by Open Knowledge Foundation, 2013 http://blog.okfn.org/2013/10/03/defining-open-data/


  • The 8 Open Government Data Principles

    1. Complete

    2. Primary

    3. Timely

    4. Accessible

    5. Machine processable

    6. Non-discriminatory

    7. Non-proprietary

    8. License-free

    OGD Principles: https://opengovdata.org/


  • Why Open Data?

    1. Transparency

    2. Releasing social and commercial value

    3. Participation and engagement

  • McKinsey report from October 2013: http://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/open-data-unlocking-innovation-and-performance-with-liquid-information


  • EC study from November 2015, "Creating Value through Open Data: Study on the Impact of Re-use of Public Data Resources": https://www.europeandataportal.eu/sites/default/files/edp_creating_value_through_open_data_0.pdf


  • Open Data Portal of Deutsche Bahn: http://data.deutschebahn.com/



  • AWS as a Platform for Open Data


  • Why does AWS care about Open Data?

    Many of our commercial sector customers rely on quality open data as much as they rely on our cloud infrastructure services.

    Many of our public sector customers use AWS to make their data available to a global community of researchers, entrepreneurs, students, and fellow government agencies.

    Sharing data makes it accessible to a large and growing community of researchers, entrepreneurs, and enterprises.


  • The cloud allows users from anywhere to take their algorithms to data rather than downloading data to their computing resources.

    Data Acquisition in the Cloud


  • Open data as a platform

    Data Creation

    Data Enrichment

    Sensemaking

    Data at Rest (Object storage)

    Basic APIs

    Complex APIs

    Consumer applications

    Algorithmic policy

    Data-driven journalism

    Data Catalogs

    Focused data dashboards

    Predictive modeling


    Lower cost of knowledge (Efficiency)


  • A Rich Set of Programmable Services


    Administration and Security

    Access Control

    Identity Management

    Key Management and Storage

    Monitoring and Logs

    Resource and Usage Auditing

    Platform Services

    Analytics

    App Services

    Developer Tools and Operations

    Mobile Services

    Data Pipelines

    Data Warehouse


    Real-Time Streaming Data

    Application Lifecycle Management




    Event-Driven Computing

    Resource Templates

    Identity

    Mobile Analytics

    Push Notifications


    App Streaming


    Queuing and Notifications




    Core Services

    CDN

    Compute (VMs, Auto-Scaling and Load Balancing)

    Databases (Relational, NoSQL, and Caching)

    Networking (VPC, DX, and DNS)

    Storage (Object, Block, and Archival)

    Infrastructure

    Availability Zones

    Points of Presence


    Enterprise Applications

    Business Email

    Sharing and Collaboration

    Virtual Desktop

    Technical and Business Support

    Account Management

    Partner Ecosystem

    Professional Services

    Security and Pricing Reports

    Solutions Architects

    Support

    Training and Certification

  • Providing Open Data on AWS


  • Case Study: Transport for London

    Graphics from TfL, October 2016

  • Why open data at TfL?

    Transparency

    Reach

    Optimal use of transport network

    Economic benefit

    Innovation

    …


  • Available Datasets

    The API supports all the data requirements of the TfL website. Every data-driven aspect of the website (including maps) is powered by the unified API.

    Some of the multi-modal core datasets included and available to developers are:

    Journey Planning (current and future)

    Status (current and future)

    Disruptions (current) and Planned works (future)

    Arrival/departure predictions (instant and websockets)

    Timetables

    Embarkation points and facilities

    Routes and lines (topology and geographical)

    Fares
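    As an illustration of how a developer might consume the Status dataset, here is a minimal Python sketch. The base URL `https://api.tfl.gov.uk`, the `/Line/Mode/{modes}/Status` path, and the response field names (`name`, `lineStatuses`, `statusSeverityDescription`) are assumptions that do not appear in the slides, and the sample payload is hand-written rather than real API output. Check the current TfL API documentation before relying on any of them.

```python
from urllib.parse import quote

# Assumed base URL of the TfL unified API (not stated in the slides).
TFL_API_BASE = "https://api.tfl.gov.uk"


def line_status_url(modes):
    """Build a status request URL for one or more transport modes.

    The path layout is an assumption about the unified API.
    """
    joined = ",".join(quote(mode, safe="") for mode in modes)
    return f"{TFL_API_BASE}/Line/Mode/{joined}/Status"


def summarize_status(payload):
    """Map each line name to its list of status descriptions.

    Assumes `payload` is a list of line objects, each with a `name` and a
    `lineStatuses` list whose entries carry `statusSeverityDescription`.
    """
    return {
        line["name"]: [
            status["statusSeverityDescription"]
            for status in line.get("lineStatuses", [])
        ]
        for line in payload
    }


# Hand-written sample in the assumed response shape (not real API output).
SAMPLE_PAYLOAD = [
    {"id": "victoria", "name": "Victoria",
     "lineStatuses": [{"statusSeverityDescription": "Good Service"}]},
    {"id": "central", "name": "Central",
     "lineStatuses": [{"statusSeverityDescription": "Minor Delays"}]},
]
```

    In practice the URL from `line_status_url(["tube"])` would be fetched with an HTTP client and the decoded JSON passed to `summarize_status`. Keeping the parsing separate from the network call makes it easy to test against stored responses.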


  • London



    Almost 500 apps produced. Playground for innovation. Improving transportation, collaboratively.

    Apps by public transportation authorities: MVV, MVG, DB. No information on how to access the data; documentation is lacking.

  • Graphic from TfL, October 2016

  • Outcomes and Cloud Benefits

    Outcomes:

    Customers save time, economic benefits

    New jobs and investments in startup and tech ecosystem

    Usage of data has since doubled

    Data consolidation and quality

    Cloud Benefits:

    Pay for what you use

    Lower maintenance costs

    Elasticity

    Automation and consistency

    Blue/green deployment – zero downtime

    Highly secure

    MWD Advisors case study: https://d0.awsstatic.com/analyst-reports/MWD_AWS_TFL_Case_Study_Sept_2015.pdf


  • Solutions for providing Open Data on AWS

    Open data platforms: Catalog, Publish, Discover, Visualize, Analyze, Share, …


    https://aws.amazon.com/de/government-education/open-data/

  • Using Open Data on AWS


  • Public Data Sets on AWS

    Several high-value datasets are available for anyone to access for free on AWS. Examples include:

    Landsat on AWS

    3K Rice Genome

    NEXRAD on AWS


  • More available Public Datasets on AWS…

    GDELT: Over a quarter-billion records monitoring the world's broadcast, print, and web news from nearly every corner of every country, updated daily. https://aws.amazon.com/public-data-sets/gdelt/

    IRS 990 Filings on AWS: Machine-readable data from certain electronic 990 forms filed with the IRS from 2011 to present. http://aws.amazon.com/public-data-sets/irs-990/

    Common Crawl Corpus: A corpus of web crawl data composed of over 5 billion web pages. https://aws.amazon.com/public-data-sets/common-crawl/

    TCGA on AWS: Raw and processed genomic, transcriptomic, and epigenomic data from The Cancer Genome Atlas (TCGA), available to qualified researchers via the Cancer Genomics Cloud. http://aws.amazon.com/public-data-sets/tcga/

    ICGC on AWS: Whole genome sequence data available to qualified researchers via The International Cancer Genome Consortium (ICGC). http://aws.amazon.com/public-data-sets/icgc/

    1000 Genomes Project: A detailed map of human genetic variation. http://aws.amazon.com/1000genomes/

    Multimedia Commons: A collection of nearly 100M images and videos with audio and visual features and annotations. http://aws.amazon.com/public-data-sets/multimedia-commons/

    Google Books Ngrams: A dataset containing Google Books n-gram corpuses. https://aws.amazon.com/datasets/google-books-ngrams/

    A list of other Public Datasets is available at https://aws.amazon.com/datasets/

  • https://aws.amazon.com/earth/

  • Accessing and processing Landsat data

    What is Landsat on AWS?

    How to access Landsat on AWS?

    How to use Landsat on AWS?


  • Landsat on AWS

    We have committed to make up to 1 petabyte of Landsat imagery readily available as objects on Amazon S3.

    All Landsat 8 scenes from 2015 and 2016 are available, along with a selection of cloud-free scenes from 2013 and 2014.

    All new Landsat 8 scenes are made available each day (~700 per day), often within hours of production.


  • Landsat on AWS

    Landsat on AWS makes each band of each scene readily available as objects on Amazon S3. Data can be accessed programmatically via HTTP and quickly deployed to any of our products for analysis and processing.

    Users do not need to worry about local storage and have access to virtually unlimited computing power on demand.
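    To make the "each band of each scene as an object" idea concrete, here is a minimal Python sketch that derives the HTTP URL of a single band from a Landsat 8 scene ID. The endpoint `https://landsat-pds.s3.amazonaws.com/L8` and the `path/row/scene/scene_Bn.TIF` key layout are assumptions that do not appear in the slides. They reflect how the dataset was commonly described around the time of this talk and may no longer be valid. The scene ID used in the usage note is only an example value.

```python
import re

# Assumed public endpoint for Landsat on AWS (not stated in the slides).
LANDSAT_BASE = "https://landsat-pds.s3.amazonaws.com/L8"

# Pre-collection Landsat 8 scene ID layout: sensor (3 chars), WRS path (3),
# WRS row (3), year (4), day of year (3), ground station (3), version (2).
SCENE_ID = re.compile(r"^(L[COT]8)(\d{3})(\d{3})(\d{4})(\d{3})([A-Z]{3})(\d{2})$")


def band_url(scene_id, band):
    """Return the assumed HTTP URL of one band (1-11) of a Landsat 8 scene."""
    match = SCENE_ID.match(scene_id)
    if match is None:
        raise ValueError(f"not a Landsat 8 scene ID: {scene_id!r}")
    if band not in range(1, 12):
        raise ValueError(f"Landsat 8 has bands 1-11, got {band}")
    path, row = match.group(2), match.group(3)
    # One object per band, grouped by WRS path and row, then by scene.
    return f"{LANDSAT_BASE}/{path}/{row}/{scene_id}/{scene_id}_B{band}.TIF"
```

    For example, `band_url("LC81390452014295LGN00", 4)` yields the URL of the red band for path 139, row 045. Because every band is a separate object, a job running on Amazon EC2 can fetch only the bands it needs instead of downloading whole scenes.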

    Amazon EC2





  • Undifferentiated heavy lifting