  • © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

    Unlocking Open Data in the Cloud

    Grischa Gundelsweiler

    Public Sector Account Manager, DACH

    Loft + Lab Munich

    11th November 2016

  • What this session is about

    1) Open Data: Concepts, Examples & Trends

    2) AWS as a Platform for Open Data

    3) Case Study: Provide Open Data on AWS

    4) Case Study: Use Open Data on AWS

  • Open Data: Concepts, Examples & Trends


  • “Open data is data that can be freely used, shared and built-on by anyone, anywhere, for any purpose.”

    Definition by Open Knowledge Foundation, 2013

    http://blog.okfn.org/2013/10/03/defining-open-data/

  • The 8 Open Government Data Principles

    1. Complete

    2. Primary

    3. Timely

    4. Accessible

    5. Machine processable

    6. Non-discriminatory

    7. Non-proprietary

    8. License-free

    OGD Principles: https://opengovdata.org/

  • Why Open Data?

    1. Transparency

    2. Releasing social and commercial value

    3. Participation and engagement

  • McKinsey report from October 2013

    http://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/open-data-unlocking-innovation-and-performance-with-liquid-information

  • EC study from November 2015: Creating Value through Open Data: Study on the Impact of Re-use of Public Data Resources

    https://www.europeandataportal.eu/sites/default/files/edp_creating_value_through_open_data_0.pdf

  • Open Data Portal of Deutsche Bahn

    http://data.deutschebahn.com/


  • AWS as a Platform for Open Data

  • Why does AWS care about Open Data?

    Many of our commercial sector customers rely on quality open data as much as they rely on our cloud infrastructure services.

    Many of our public sector customers use AWS to make their data available to a global community of researchers, entrepreneurs, students, and fellow government agencies.

    Sharing data makes it accessible to a large and growing community of researchers, entrepreneurs, and enterprises.


  • The cloud allows users from anywhere to take their algorithms to data rather than downloading data to their computing resources.

    Data Acquisition in the Cloud


  • Open data as a platform

    [Diagram labels: Data Creation, Data Enrichment, Sensemaking; Data at Rest (object storage), Basic APIs, Complex APIs, Data Catalogs; consumer applications, algorithmic policy, data-driven journalism, focused data dashboards, predictive modeling, visualizations; lower cost of knowledge (efficiency).]

  • A Rich Set of Programmable Services

    Administration and Security: Access Control, Identity Management, Key Management and Storage, Monitoring and Logs, Resource and Usage Auditing

    Platform Services

    Analytics: Data Pipelines, Data Warehouse, Hadoop, Real-Time Streaming Data

    App Services: App Streaming, Email, Queuing and Notifications, Search, Transcoding, Workflow

    Developer Tools and Operations: Application Lifecycle Management, Containers, Deployment, DevOps, Event-Driven Computing, Resource Templates

    Mobile Services: Identity, Mobile Analytics, Push Notifications, Sync

    Core Services: CDN, Compute (VMs, Auto-Scaling and Load Balancing), Databases (Relational, NoSQL, and Caching), Networking (VPC, DX, and DNS), Storage (Object, Block, and Archival)

    Infrastructure: Availability Zones, Points of Presence, Regions

    Enterprise Applications: Business Email, Sharing and Collaboration, Virtual Desktop

    Technical and Business Support: Account Management, Partner Ecosystem, Professional Services, Security and Pricing Reports, Solutions Architects, Support, Training and Certification

  • Providing Open Data on AWS


  • Case Study: Transport for London

    Graphics from TfL, October 2016

  • Why open data at TfL?

    Transparency

    Reach

    Optimal use of transport network

    Economic benefit

    Innovation

    …

  • Available Datasets

    The API supports all the data requirements of the TfL website. Every data-driven aspect of the website (including maps) is powered by the unified API.

    Some of the multi-modal core datasets included and available to developers are:

    Journey Planning (current and future)

    Status (current and future)

    Disruptions (current) and Planned works (future)

    Arrival/departure predictions (instant and websockets)

    Timetables

    Embarkation points and facilities

    Routes and lines (topology and geographical)

    Fares
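    As a rough illustration of how a developer might consume the "Status" dataset listed above, the sketch below only builds a request URL. The base URL `https://api.tfl.gov.uk`, the `/Line/Mode/{mode}/Status` path and the `app_key` query parameter are not given in these slides; they are assumptions based on TfL's public API documentation and should be checked against the current docs.

```python
from urllib.parse import quote, urlencode

# Assumed base URL of the TfL unified API (not stated in the slides).
TFL_API_BASE = "https://api.tfl.gov.uk"


def line_status_url(mode, app_key=None):
    """Build the URL that asks for the current status of all lines of one mode.

    `mode` is a transport mode such as "tube" or "bus"; `app_key` is an
    optional API key (assumed parameter name).
    """
    url = f"{TFL_API_BASE}/Line/Mode/{quote(mode, safe='')}/Status"
    if app_key:
        url += "?" + urlencode({"app_key": app_key})
    return url


print(line_status_url("tube"))
```

    The response would be JSON that can be fetched with any HTTP client; the sketch stops at the URL so that it runs without network access.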


  • London

    Almost 500 apps produced.

    Playground for innovation.

    Improving transportation, collaboratively.

    Munich

    Apps by public transportation authorities: MVV, MVG, DB.

    No information on how to access data; documentation is lacking.

  • Graphic from TfL, October 2016

  • Outcomes

    Customers save time, economic benefits

    New jobs and investments in startup and tech ecosystem

    Usage of data has since doubled

    Data consolidation and quality

    Cloud Benefits

    Pay for what you use

    Lower maintenance costs

    Elasticity

    Automation and consistency

    Blue/green deployment (zero downtime)

    Highly secure

    MWD Advisors case study: https://d0.awsstatic.com/analyst-reports/MWD_AWS_TFL_Case_Study_Sept_2015.pdf

  • Solutions for providing Open Data on AWS

    Open data platforms

    Catalog, Publish, Discover, Visualize, Analyze, Share, …

    https://aws.amazon.com/de/government-education/open-data/

  • Using Open Data on AWS


  • Public Data Sets on AWS

    Several high-value datasets are available for anyone to access for free on AWS. Examples include:

    Landsat on AWS

    3K Rice Genome

    NEXRAD on AWS

  • More available Public Datasets on AWS…

    GDELT: Over a quarter-billion records monitoring the world's broadcast, print, and web news from nearly every corner of every country, updated daily. https://aws.amazon.com/public-data-sets/gdelt/

    IRS 990 Filings on AWS: Machine-readable data from certain electronic 990 forms filed with the IRS from 2011 to present. http://aws.amazon.com/public-data-sets/irs-990/

    Common Crawl Corpus: A corpus of web crawl data composed of over 5 billion web pages. https://aws.amazon.com/public-data-sets/common-crawl/

    TCGA on AWS: Raw and processed genomic, transcriptomic, and epigenomic data from The Cancer Genome Atlas (TCGA), available to qualified researchers via the Cancer Genomics Cloud. http://aws.amazon.com/public-data-sets/tcga/

    ICGC on AWS: Whole genome sequence data available to qualified researchers via The International Cancer Genome Consortium (ICGC). http://aws.amazon.com/public-data-sets/icgc/

    1000 Genomes Project: A detailed map of human genetic variation. http://aws.amazon.com/1000genomes/

    Multimedia Commons: A collection of nearly 100M images and videos with audio and visual features and annotations. http://aws.amazon.com/public-data-sets/multimedia-commons/

    Google Books Ngrams: A dataset containing Google Books n-gram corpuses. https://aws.amazon.com/datasets/google-books-ngrams/

    A list of other Public Datasets is available at https://aws.amazon.com/datasets/

  • https://aws.amazon.com/earth/

  • Accessing and processing Landsat data

    What is Landsat on AWS?

    How to access Landsat on AWS?

    How to use Landsat on AWS?


  • Landsat on AWS

    We have committed to make up to 1 petabyte of Landsat imagery readily available as objects on Amazon S3.

    All Landsat 8 scenes from 2015 and 2016 are available, along with a selection of cloud-free scenes from 2013 and 2014.

    All new Landsat 8 scenes are made available each day (~700 per day), often within hours of production.


  • Landsat on AWS

    Landsat on AWS makes each band of each scene readily available as objects on Amazon S3. Data can be accessed programmatically via HTTP and quickly deployed to any of our products for analysis and processing.

    Users do not need to worry about local storage and have access to virtually unlimited computing power on demand.
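    A minimal sketch of what "accessed programmatically via HTTP" can look like: building the URL of a single band of a single scene. The host `landsat-pds.s3.amazonaws.com` and the `L8/<path>/<row>/` prefix appear elsewhere in this deck; the per-scene folder and the `<scene_id>_B<band>.TIF` file name are assumptions about the bucket layout, and the scene ID used below is hypothetical.

```python
# Host and "L8" prefix as shown in the deck; the rest of the key layout is assumed.
LANDSAT_BASE = "https://landsat-pds.s3.amazonaws.com"


def band_url(path, row, scene_id, band):
    """Return the HTTP URL of one band of one Landsat 8 scene.

    Assumed key layout: L8/<path>/<row>/<scene_id>/<scene_id>_B<band>.TIF
    with path and row zero-padded to three digits.
    """
    return f"{LANDSAT_BASE}/L8/{path:03d}/{row:03d}/{scene_id}/{scene_id}_B{band}.TIF"


# Hypothetical scene ID, for illustration only.
print(band_url(72, 89, "LC80720892016001LGN00", 4))
```

    Because every band is its own object, a tool that only needs one band never has to download the full scene archive.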

    [Diagram labels: USGS (.tar), s3://landsat-pds (.tiff), Amazon EC2]

  • Undifferentiated heavy lifting

    We use GDAL to add “internal tiling” on each Landsat on AWS tiff, which allows developers to use HTTP range gets to access specific portions of each scene.

    This allows people to only access the data they need when they need it.

    [Figure: a standard tiff object and an internally tiled tiff object, each drawn as a grid of 36 numbered blocks in different orders.]
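    The HTTP range gets mentioned on the internal-tiling slide can be sketched with the Python standard library as follows. The helper names are hypothetical, and the byte offsets are placeholders: in practice a TIFF reader takes the offset and length of each tile from the file's tile index (GDAL can do this transparently through its `/vsicurl/` virtual file system).

```python
import urllib.request


def range_header(offset, length):
    """Build an HTTP Range header covering `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends.
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def fetch_bytes(url, offset, length):
    """Fetch only part of a remote object (performs network I/O when called)."""
    request = urllib.request.Request(url, headers=range_header(offset, length))
    with urllib.request.urlopen(request) as response:
        return response.read()


# Header for the first kilobyte of an object, e.g. to read the start of a TIFF.
print(range_header(0, 1024))
```

    `fetch_bytes` is defined but not called here, so the sketch runs without network access.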

  • RGB: Visible light

    Infrared: Vegetation

    Shortwave infrared: Urban areas

    Think of URLs instead of copies

    Wellington, New Zealand

    https://landsat-pds.s3.amazonaws.com/L8/072/089/
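    To make "URLs instead of copies" concrete, the sketch below maps the three composites named on this slide to lists of band URLs. The slide does not give band numbers; the combinations used here (4-3-2, 5-4-3, 7-6-4) are commonly cited Landsat 8 choices and are an assumption, as are the file-name pattern and the placeholder scene ID.

```python
# Commonly used Landsat 8 band combinations, in (red, green, blue) channel order.
# The slides name the composites but not the bands; these numbers are assumptions.
COMPOSITES = {
    "visible": (4, 3, 2),      # RGB: visible light
    "vegetation": (5, 4, 3),   # near infrared highlights vegetation
    "urban": (7, 6, 4),        # shortwave infrared highlights urban areas
}

# Path/row prefix for Wellington, New Zealand, as given on the slide.
WELLINGTON_PREFIX = "https://landsat-pds.s3.amazonaws.com/L8/072/089/"


def composite_urls(prefix, scene_id, composite):
    """Return the three band URLs that make up one composite of a scene.

    Assumes each scene lives under <prefix><scene_id>/ and each band is
    stored as <scene_id>_B<band>.TIF.
    """
    return [
        f"{prefix}{scene_id}/{scene_id}_B{band}.TIF"
        for band in COMPOSITES[composite]
    ]


# "SCENE_ID" is a placeholder, not a real scene.
for url in composite_urls(WELLINGTON_PREFIX, "SCENE_ID", "vegetation"):
    print(url)
```

    Switching from one rendering to another then means requesting three different URLs rather than keeping a separate copy of the imagery.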

  • Using Landsat on S3

    [Diagram: a user connects to ArcGIS Server on Amazon EC2, which reads Landsat on Amazon S3; both are in the AWS US West (Oregon) Region for reliable, performant data access.]

  • Landsat on AWS

    Usage in the first year:

    Over 400,000 scenes available

    Over 1 billion hits globally

    Used for new product development by: (company logos in the original slide)

    Sentinel-2 on AWS

    Small investment, big impact:

    Public dataset hosted in FRA

    Apps for agriculture, disaster relief, vegetation monitoring, property taxation, ...

    Used for new product development by: (company logos in the original slide)

  • Next steps

    Depending on your role, your goals

    Use open data in your projects / your organisation

    Provide open data from your organisation

    Build a new business on open data

    AWS offers

    Technology platform that constantly evolves

    Enablement through workshops, training, ProServ

    Customer and partner ecosystem to connect and build

  • Thank you!

    [email protected]

    Grischa Gundelsweiler