1-3 Keller Pivotal Apps & Data @ Dell EMC Forum Vienna...


GLOBAL SPONSORS

Creating Customer Value with Cloud Native Apps & Analytics

© Copyright 2017 Pivotal Software, Inc. All rights Reserved.


September 2017
Martin Keller
mkeller@pivotal.io

DIGITALIZATION CHANGES EVERYTHING.

WHEN WILL POLITICS CHANGE?

Delivering information in context..

..personalized..

..in real-time

“Companies need to learn how to catch people or things in the act of doing something and affect the outcome”

Great software companies leverage analytics and insights – how do they accomplish that?

Open Source Innovation

Parallel Processing

Cloud Native Continuous Delivery

Loosely-coupled Microservices

Data Science and Machine Learning


Smart Data Driven Apps

Logistics

Pivotal 2017

Important Capabilities

• Ability to store and integrate volumes of data from multiple sources

• Moving beyond basic business intelligence and reporting to more sophisticated data science and predictive modeling techniques

• System must deliver insights about likely next actions in ways advisors can consume and take action on them

• Results of these actions must be fed back into the system to continually improve the predictive models
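The feedback loop described in the last capability can be sketched as a model that is updated every time the outcome of a recommended action is observed. This is a minimal illustration under stated assumptions, not Pivotal's implementation: the class name `OnlineLogisticModel`, the two features and the learning rate are all invented for the example.

```python
import math


class OnlineLogisticModel:
    """Tiny logistic regression updated one observation at a time (SGD)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features):
        """Return the estimated probability that the action succeeds."""
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    def feed_back(self, features, outcome):
        """Update the model with an observed outcome (1 = success, 0 = failure)."""
        error = outcome - self.predict(features)
        self.weights = [w + self.learning_rate * error * x
                        for w, x in zip(self.weights, features)]
        self.bias += self.learning_rate * error


model = OnlineLogisticModel(n_features=2)
before = model.predict([1.0, 0.0])      # prediction of the untrained model

# Hypothetical history: actions with features [1, 0] succeeded, [0, 1] failed.
for _ in range(200):
    model.feed_back([1.0, 0.0], 1)
    model.feed_back([0.0, 1.0], 0)

after = model.predict([1.0, 0.0])       # prediction after the feedback
```

The point of the sketch is the shape of the loop: predict, act, observe, update. A production system would batch the feedback and retrain in the data platform rather than in application memory.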

Data Architecture Pivotal Inc.

DATA FEEDS

DATA SOURCES

ANALYTIC APPS

Fast Ingest / Pipelining

Pipelines to consume streaming and batch data from various endpoints

Raw Data Landing Zone

Distributed Memory-based Processing

Realtime Data Insights

Statistical Tools, Expert System, Machine Learning

Advanced Analytics / MPP

Hadoop Data Lakes

Massively Parallel Architecture

Public Cloud Data Lakes

Predefined Libraries

Programmatic

GPText

Parallel Configurable Data Load

High Speed Ingestion

Analytical data to cache

In-Memory Data Grid

Parallel Data Load and External Tables

Pivotal Data Suite

In-DB Predictive Analytics

Data Temperature: Hot, Warm, Cold

PIVOTAL GEMFIRE

PIVOTAL GREENPLUM

(Data Warehouse)

Pivotal Data Suite

PIVOTAL GREENPLUM: Data warehouse database based on open source Greenplum Database

PIVOTAL GEMFIRE: Open source application and transaction data grid based on Apache Geode

Open source data management portfolio

Complete Platform

Mission Critical

Deployment Options

Open Source

Flexible Licensing

Advanced Data Analytics

Complete platform

Based on open source

Deployment options

Hadoop native SQL

Flexible licensing

Advanced data services

Pivotal Data Suite: Open data management portfolio

Pivotal Data Suite

OSS Support: Spring XD & Spring Cloud Data Flow

OSS Support: PostgreSQL

Playground .. Connected Cars

Example Application


Connected Car Demo (YouTube link)

CONNECTED CAR

PREDICT THE DESTINATION

PREDICT THE RANGE


Real-time car telematics

• Driving data from in-car OBD2 port
• In-depth view on driving
• Framework to train models on batch data and use them for real-time prediction
• Predict journey destination and fuel consumption
• Build app in collaboration with Pivotal Labs
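The "train on batch data, predict in real time" pattern from the bullets above can be sketched as two separate steps: a batch fit over historical trips, then cheap scoring of each incoming event. This is a simplified illustration, not the demo's code; the function names, the single distance feature and the sample trips are assumptions.

```python
from statistics import mean


def train_batch(trips):
    """Fit fuel = slope * distance + intercept by ordinary least squares."""
    xs = [distance for distance, _ in trips]
    ys = [fuel for _, fuel in trips]
    x_mean, y_mean = mean(xs), mean(ys)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    return slope, y_mean - slope * x_mean


def predict_stream(model, events):
    """Score incoming events one by one with the already trained model."""
    slope, intercept = model
    for distance_km in events:
        yield slope * distance_km + intercept


# Hypothetical historical trips: (distance in km, fuel used in litres).
historical_trips = [(10, 0.9), (20, 1.6), (40, 3.0), (80, 5.8)]
model = train_batch(historical_trips)

# Hypothetical live feed of planned trip distances in km.
for litres in predict_stream(model, [15, 60]):
    print(f"predicted fuel: {litres:.2f} l")
```

Separating the two steps means the expensive fit can run where the historical data lives, while the per-event scoring stays small enough for a low-latency path.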

[Diagram: Roads and Cars, with "1" and "many" relationship labels]

Pivotal Offerings

“Companies need to learn how to catch people or things in the act of doing something and affect the outcome”

Data Suite:
• Spring Cloud Data Flow - open source data management
• GemFire: In-Memory Data Grid
• Greenplum: Data Warehouse

Pivotal Cloud Foundry (PCF)
• Industry’s Leading Cloud-Native Platform

Pivotal Container Service (PKS)
• Production-Grade Kubernetes

Spring Boot, Cloud and Data Flow
• Modern-Java microservices framework

Pivotal Labs & Data Science
• Build a smart app end-to-end
• Focus on a specific analytical model / data-microservice


Pivotal Cloud Cache Explained

In-memory caching as an on-demand, managed service on PCF

Pivotal Cloud Cache: In-Memory Performance

In-memory performance with cloud-native scalability and availability

Horizontally scalable architecture

High volume transactions

Blazing fast reads: 10-100x faster than disk

High Availability

Across application and caching layers

In-memory cloud-native cache

Microservices Need Performance and Scalability

Microservices with large, frequently accessed data sets need a cache layer

Performance and scalability of data
● Add servers to a shared Pivotal Cloud Cache cluster
● Reduces the pressure to scale rigid backing stores
● Enables availability and resilience
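How a cache layer takes pressure off a backing store can be illustrated with a look-aside cache: read from memory first, and only go to the store on a miss. The sketch below is an in-process stand-in, not the Pivotal Cloud Cache client API; `LookAsideCache`, `slow_backing_store` and the TTL value are hypothetical.

```python
import time


class LookAsideCache:
    """In-process stand-in for a cache cluster placed in front of a slow store."""

    def __init__(self, load_from_store, ttl_seconds=60.0):
        self._load = load_from_store
        self._ttl = ttl_seconds
        self._entries = {}          # key -> (value, expiry timestamp)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > time.monotonic():
            self.hits += 1
            return entry[0]
        # Miss or expired entry: read from the backing store and remember the result.
        self.misses += 1
        value = self._load(key)
        self._entries[key] = (value, time.monotonic() + self._ttl)
        return value


store_calls = []


def slow_backing_store(key):
    """Hypothetical backing store; records each call so the load is visible."""
    store_calls.append(key)
    return f"profile-for-{key}"


cache = LookAsideCache(slow_backing_store)
for _ in range(1000):
    cache.get("customer-42")
```

After the loop, `store_calls` holds a single entry while the cache has served the remaining reads, which is the effect the slide describes as reducing the pressure to scale the backing store.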

App Instance

1

Prepackaged for Simple Consumption

• Easy accessibility through Marketplace

• Instant Provisioning

• Bind to apps through easy to use interface

• Common access control and audit trails across services

Services Marketplace

• MySQL
• New Relic
• Single Sign-On
• RabbitMQ
• Config Server
• Service Directory
• Circuit Breaker
• Signal Sciences
• Crunchy PostgreSQL
• Pivotal Cloud Cache
• Redis
• AND MORE

Developers get self-service access to Pivotal Cloud Cache on Pivotal Cloud Foundry
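Once a service instance is bound to an app, Cloud Foundry passes its connection details to the app in the `VCAP_SERVICES` environment variable as JSON. The sketch below shows one way an app might look up the credentials of a bound instance. The helper name, the instance name `my-cache` and the sample payload are assumptions for illustration; real labels and credential keys depend on the service broker.

```python
import json
import os


def find_service_credentials(instance_name, environ=os.environ):
    """Return the credentials of a bound service instance, or None if not bound."""
    services = json.loads(environ.get("VCAP_SERVICES", "{}"))
    for instances in services.values():
        for instance in instances:
            if instance.get("name") == instance_name:
                return instance.get("credentials", {})
    return None


# Hypothetical payload for illustration only.
example_env = {
    "VCAP_SERVICES": json.dumps({
        "example-cache-service": [{
            "name": "my-cache",
            "credentials": {"locators": ["10.0.0.1[55221]"]},
        }]
    })
}

credentials = find_service_credentials("my-cache", example_env)
```

Because the credentials arrive through the environment, the same application artifact can be bound to different service instances in different spaces without code changes.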


[Diagram: two sites connected by WAN Data Sync]

Rio: Web Application, GemFire Cluster, Oracle RAC, Mainframe

São Paulo: Web Application, GemFire Cluster, Oracle RAC, Mainframe


GemFire

Distributed, in-memory NoSQL data grid for big data apps that need:
• Scale-out performance
• Consistent database operations across globally distributed nodes
• High availability, resilience, and global scale
• Powerful & standards-based developer features
• Easy administration of distributed nodes

Based on Apache Geode (incubating)

Pivotal GemFire – Usage By the Numbers


China Railways
• 5,700 train stations
• 4.6 million tickets per day
• 20 million daily users
• 3TB operational data in-memory
• 40,000 visits per second
• >1,500,000,000 hits per day

Indian Railways
• 7,000 stations
• 23 million passengers daily
• 120,000 concurrent users
• 10,000 transactions per minute
• >1,200,000,000 hits per day

World: ~7,349,000,000

~37% of the world population


Pivotal Greenplum

Run Anywhere, Mature, OSS, Analytical MPP

AN OPEN SOURCE DATA WAREHOUSE

BATTLE TESTED IN PRODUCTION

BUILT FOR DIVERSE ANALYTICAL USE CASES

AVAILABLE ANYWHERE YOU NEED IT

WHAT IS GREENPLUM?


The Pivotal Greenplum Database is…

A Highly-Scalable, Shared-Nothing Database

• Leading MPP architecture, including a patented next-generation optimizer

• Optimized architecture and features for loading and queries

• Start small, scale as needed

• Polymorphic storage, compression, partitioning
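The shared-nothing MPP idea can be sketched in a few lines: rows are spread across segments by hashing a distribution key, each segment aggregates only its own rows, and a coordinator merges the partial results. This is a conceptual illustration only; Greenplum's actual hash function, planner and executor differ, and every name below is hypothetical.

```python
import zlib
from collections import defaultdict


def distribute(rows, key, n_segments):
    """Assign each row to exactly one segment by hashing its distribution key."""
    segments = [[] for _ in range(n_segments)]
    for row in rows:
        segment_id = zlib.crc32(str(row[key]).encode()) % n_segments
        segments[segment_id].append(row)
    return segments


def local_sum(segment_rows, group_by, value):
    """Each segment aggregates only the rows it owns."""
    totals = defaultdict(float)
    for row in segment_rows:
        totals[row[group_by]] += row[value]
    return totals


def parallel_group_sum(segments, group_by, value):
    """Coordinator step: merge the partial results from all segments."""
    merged = defaultdict(float)
    for segment_rows in segments:       # would run concurrently on a real cluster
        for group, subtotal in local_sum(segment_rows, group_by, value).items():
            merged[group] += subtotal
    return dict(merged)


# Hypothetical sales rows.
rows = [{"order_id": i, "region": "AT" if i % 2 else "DE", "amount": 10.0}
        for i in range(100)]
segments = distribute(rows, key="order_id", n_segments=4)
totals = parallel_group_sum(segments, group_by="region", value="amount")
```

Since no segment needs another segment's rows to compute its partial sums, adding segments spreads both the data and the work, which is what "start small, scale as needed" relies on.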

A Platform for Advanced Analytics on Any (and All) Data

• Rich ecosystem (SAS, R, BI & ETL tools)

• In-DB Analytics (MADlib, Custom, languages: R, Java, Python, PERL, C, C++)

• High degree of SQL completeness so analysts can use a language they know

• Domain: Geospatial, Text processing (GPText)

An Enterprise Ready Platform Capable of Flexing With Your Needs

• Available as needed – either as an appliance or software

• Secures data in-place, in flight, and with authentication to suit

• Capable of managing a variety of mixed workloads

Functions

Linear Systems
• Sparse and Dense Solvers
• Linear Algebra

Matrix Factorization
• Singular Value Decomposition (SVD)
• Low Rank

Generalized Linear Models
• Linear Regression
• Logistic Regression
• Multinomial Logistic Regression
• Ordinal Regression
• Cox Proportional Hazards Regression
• Elastic Net Regularization
• Robust Variance (Huber-White), Clustered Variance, Marginal Effects

Other Machine Learning Algorithms
• Principal Component Analysis (PCA)
• Association Rules (Apriori)
• Topic Modeling (Parallel LDA)
• Decision Trees
• Random Forest
• Conditional Random Field (CRF)
• Clustering (K-means)
• Cross Validation
• Naïve Bayes
• Support Vector Machines (SVM)
• Prediction Metrics
• K-Nearest Neighbors

Descriptive Statistics
Sketch-Based Estimators
• CountMin (Cormode-Muth.)
• FM (Flajolet-Martin)
• MFV (Most Frequent Values)
Correlation and Covariance
Summary

Utility Modules
Array and Matrix Operations
Sparse Vectors
Random Sampling
Probability Functions
Data Preparation
PMML Export
Conjugate Gradient
Stemming
Sessionization
Pivot

Inferential Statistics
Hypothesis Tests

Time Series
• ARIMA

Path Functions
• Operations on Pattern Matches

Graph
• Single Source Shortest Path
• Page Rank

Jan 2017
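To give a feel for the sketch-based estimators in the list above, here is a small standalone Count-Min sketch. It is not the in-database implementation; the class name, the table dimensions and the SHA-256 based hashing are assumptions chosen to keep the example deterministic and short.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter; it never under-estimates a count."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _columns(self, item):
        # One hash per row, derived by salting the item with the row index.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row, col in self._columns(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions can only inflate a cell, so the minimum is the best guess.
        return min(self.table[row][col] for row, col in self._columns(item))


sketch = CountMinSketch()
for word in ["gemfire"] * 5 + ["greenplum"] * 3 + ["geode"]:
    sketch.add(word)

gemfire_estimate = sketch.estimate("gemfire")
```

The memory used is fixed by `width * depth` regardless of how many distinct items are seen, which is why this family of estimators suits very large tables: the trade-off is a bounded over-estimate instead of an exact count.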

Procedural Languages

• User Defined Types

• User Defined Functions

• User Defined Aggregates

• Import of libraries from open source

Greenplum Geospatial Big Data

Current Key Features:
• Points, Lines, Polygons, Perimeter, Area, Intersection, Contains, Distance, Long/Lat
• Spatial Indexes & Bounding Boxes
• Round earth calculations
• Raster Support
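A "round earth" distance calculation can be illustrated with the haversine formula, which gives the great-circle distance between two longitude/latitude points. In Greenplum this kind of calculation runs inside the database; the function below is only a standalone sketch, and the function name and sample coordinates are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate city-centre coordinates, used only as sample input.
vienna = (48.2082, 16.3738)
munich = (48.1351, 11.5820)
distance = haversine_km(*vienna, *munich)
```

A flat (planar) distance on raw degrees would be noticeably off at this latitude because a degree of longitude shrinks toward the poles, which is the reason spherical calculations are listed as a separate feature.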

Integrated Text Analytics

GPText: SQL Warehousing + Text
• Leveraging Apache Solr and GPDB
• 5 years commercial production experience
• MADlib integration for machine learning on text data
• PL/Python and PL/Java integration for Natural Language Processing

Use Cases
• Communications compliance and monitoring
• Customer sentiment analysis
• Document search and query
• Social media processing, etc.

Questions?

THANK YOU!

Backup

Greenplum Hadoop & Cloud Connectors

[Diagram labels]

Data Temperature: Hot, Warm, Cold

Operational Analytics & SQL

Data Lake & Cold Storage

SLA Driven & Iterative

Parallel High Speed SQL Transfer

Public & Private Data Lakes

Batch & AdHoc

Gemfire Greenplum Connector (GGC)

[Diagram labels]

Data Temperature: Hot, Warm

Custom Apps: App 1, App 2

Data science, analytics & ML

Transactional: Native API, REST / HTTP

Analytical: ANSI SQL

Push Updates

Parallel Configurable Data Load

Transactional data

Write behind

Analytical data to cache
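The "write behind" label in this diagram refers to a pattern in which the in-memory grid acknowledges writes immediately and persists them to the slower store afterwards, usually in batches. The sketch below shows the pattern in plain Python; it is not the GemFire Greenplum Connector API, it flushes synchronously for simplicity, and all names are hypothetical.

```python
class WriteBehindCache:
    """Accepts writes in memory and flushes them to a slower store in batches."""

    def __init__(self, flush_to_store, batch_size=3):
        self._flush_to_store = flush_to_store
        self._batch_size = batch_size
        self._data = {}
        self._pending = []          # writes not yet persisted

    def put(self, key, value):
        self._data[key] = value     # visible to readers immediately
        self._pending.append((key, value))
        if len(self._pending) >= self._batch_size:
            self.flush()

    def get(self, key):
        return self._data.get(key)

    def flush(self):
        """Hand all pending writes to the store as one batch."""
        if self._pending:
            self._flush_to_store(list(self._pending))
            self._pending.clear()


warehouse_batches = []              # hypothetical stand-in for the warehouse
cache = WriteBehindCache(warehouse_batches.append, batch_size=3)
for i in range(7):
    cache.put(f"trade-{i}", i)
cache.flush()                       # persist the remainder
```

Batching is the design choice that matters here: an analytical store generally loads many rows at once far more efficiently than it handles row-by-row inserts, so the transactional side stays fast while the warehouse receives bulk loads.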
