22
2019 Storage Developer Conference. © Scality. 1 Data Management in a Cloud-Agnostic World Vianney Rancurel Scality

Data Management in a Cloud-Agnostic World · Kafka topics: for asynchronous calls, the “function” is triggered by a Kafka event in a topic, and the status (success/failure) is

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Data Management in a Cloud-Agnostic World · Kafka topics: for asynchronous calls, the “function” is triggered by a Kafka event in a topic, and the status (success/failure) is

2019 Storage Developer Conference. © Scality. 1

Data Management in a Cloud-Agnostic World

Vianney RancurelScality

Page 2: Data Management in a Cloud-Agnostic World · Kafka topics: for asynchronous calls, the “function” is triggered by a Kafka event in a topic, and the status (success/failure) is

2019 Storage Developer Conference. © Scality. 2

Agenda

- Current state of data - Zenko cloud-agnostic data management

platform and use cases- Zenko new workflow engine and use cases

Page 3: Data Management in a Cloud-Agnostic World · Kafka topics: for asynchronous calls, the “function” is triggered by a Kafka event in a topic, and the status (success/failure) is

2019 Storage Developer Conference. © Scality. 3

Current State of Data

Page 4: Data Management in a Cloud-Agnostic World · Kafka topics: for asynchronous calls, the “function” is triggered by a Kafka event in a topic, and the status (success/failure) is

2019 Storage Developer Conference. © Scality. 4

Current State of Data- Files and objects are in separate

storage silos (business and technical).- Data locations are increasingly

dispersed (on-premise and in different clouds).

- Admins must also maintain legacy storage platforms.

- Companies need to integrate and leverage new object storage platforms.

- Creating data workflows—especially hybrid or multi-cloud flows—is difficult.

- Building apps that can leverage data is hard/slow.

Photo by Yung Chang on Unsplash

Page 5: Data Management in a Cloud-Agnostic World · Kafka topics: for asynchronous calls, the “function” is triggered by a Kafka event in a topic, and the status (success/failure) is

2019 Storage Developer Conference. © Scality. 5

Cloud-Agnostic Data Management

Photo by Carlos Muza on Unsplash

▪ Single interface for data control across legacy, hybrid, and multi-cloud storage architectures

▪ Global data visibility and search▪ Data governance▪ Data mobility▪ Programmatic and GUI control▪ No vendor lock-in▪ Faster app development

Page 6: Data Management in a Cloud-Agnostic World · Kafka topics: for asynchronous calls, the “function” is triggered by a Kafka event in a topic, and the status (success/failure) is

2019 Storage Developer Conference. © Scality. 6

Workflow Engine

Photo by Genessa Panainte on Unsplash

▪ Can execute operations on data, wherever it is.

▪ Leverages data.▪ Automates tasks (antivirus,

auto-tagging, de-identification, anonymization, etc).

▪ Gets insights from data (machine learning, AI, etc).

▪ Instead of automating using duct tape (e.g. scripts), we manage it with a data-centric UI.

Page 7: Data Management in a Cloud-Agnostic World · Kafka topics: for asynchronous calls, the “function” is triggered by a Kafka event in a topic, and the status (success/failure) is

2019 Storage Developer Conference. © Scality. 7

Architecture of a Cloud-AgnosticData Management Platform

(Zenko)

Page 8: Data Management in a Cloud-Agnostic World · Kafka topics: for asynchronous calls, the “function” is triggered by a Kafka event in a topic, and the status (success/failure) is

2019 Storage Developer Conference. © Scality. 8

Concepts

- Storage location: a bucket (somewhere on-premise or in the cloud) or a mountpoint (on a filesystem).

Page 9: Data Management in a Cloud-Agnostic World · Kafka topics: for asynchronous calls, the “function” is triggered by a Kafka event in a topic, and the status (success/failure) is

2019 Storage Developer Conference. © Scality. 9

Zenko (presented SDC 2018)

Page 10: Data Management in a Cloud-Agnostic World · Kafka topics: for asynchronous calls, the “function” is triggered by a Kafka event in a topic, and the status (success/failure) is

2019 Storage Developer Conference. © Scality. 10

Zenko Today: Predefined “Workflows”

- Replication of storage locations- Lifecycle (expiration / transition) per object

storage location- Out-of-band updates (metadata ingestion)

Page 11: Data Management in a Cloud-Agnostic World · Kafka topics: for asynchronous calls, the “function” is triggered by a Kafka event in a topic, and the status (success/failure) is

2019 Storage Developer Conference. © Scality. 11

Cloud-Agnostic Platform Use Cases

ENTERPRISE MEDIA &ENTERTAINMENT

HEALTHCARE SERVICEPROVIDERS

BANKS, MANUFACTURERS, PHARMA

TV, BROADCASTERS, STUDIOS, POST-PROD, DISTRIBUTION

HOSPITALS, MEDICAL GROUPS, UNIVERSITIES

CABLE, TELCO, MOBILE, HOSTERS & CLOUD

Cloud DR/HA for Data

Cloud Media Workflows

Long-term Archive with active search

Data Management as a Service

Page 12: Data Management in a Cloud-Agnostic World · Kafka topics: for asynchronous calls, the “function” is triggered by a Kafka event in a topic, and the status (success/failure) is

2019 Storage Developer Conference. © Scality. 12

Zenko Architecture

cloudserver

kafka

queues

mongodb

Buckets,Objects,Users

redisuser

object storage protocol metrics

metrics

metadataoperation log

asyncoperations

temporary storage

data

cloud1

cloud2

cloud3

data

data

on premise fs storage

on premise object storage

Kubernetes / prometheus

cosmos

metadata

out-of-band updates (metadata)

UI / management

portal

vault

Auth

Etc

replication

backbeat

lifecycle

ingestion

Page 13: Data Management in a Cloud-Agnostic World · Kafka topics: for asynchronous calls, the “function” is triggered by a Kafka event in a topic, and the status (success/failure) is

2019 Storage Developer Conference. © Scality. 13

Zenko’s New Workflow Engine

Page 14: Data Management in a Cloud-Agnostic World · Kafka topics: for asynchronous calls, the “function” is triggered by a Kafka event in a topic, and the status (success/failure) is

2019 Storage Developer Conference. © Scality. 14

Workflow EngineWhat if I could manage my data flows with a graphical UI ?

▪ Parallelize some processing▪ Serialize other tasks▪ Call “on-premise” or remote “cloud” functions

Page 15: Data Management in a Cloud-Agnostic World · Kafka topics: for asynchronous calls, the “function” is triggered by a Kafka event in a topic, and the status (success/failure) is

2019 Storage Developer Conference. © Scality. 15

Workflow Engine Use Cases

ENTERPRISE MEDIA &ENTERTAINMENT

HEALTHCARE SERVICEPROVIDERS

BANKS, MANUFACTURERS, PHARMA

TV, BROADCASTERS, STUDIOS, POST-PROD, DISTRIBUTION

HOSPITALS, MEDICAL GROUPS, UNIVERSITIES

CABLE, TELCO, MOBILE, HOSTERS & CLOUD

Antivirus, Encryption, Compression, compliance

Smart tiering, video transcoding

De-identification for AI traning, auto-tagging

Antispam, Antivirus

Page 16: Data Management in a Cloud-Agnostic World · Kafka topics: for asynchronous calls, the “function” is triggered by a Kafka event in a topic, and the status (success/failure) is

2019 Storage Developer Conference. © Scality. 16

Zenko Workflow Engine Architecture

cloudserver

kafka

queues

mongodb

Buckets,Objects,Users

redisuser

object storage protocol metrics

metrics

metadataoperation log

asyncoperations

temporary storage

data

cloud1

cloud2

cloud3

data

data

on premise fs storage

on premise object storage

Kubernetes / prometheus

cosmos

metadata

out-of-band updates (metadata)

UI / management

portal

vault

Auth

Etc

k8s function engine

cloud functions

backbeatworkflow engine

cloud functions

Page 17: Data Management in a Cloud-Agnostic World · Kafka topics: for asynchronous calls, the “function” is triggered by a Kafka event in a topic, and the status (success/failure) is

2019 Storage Developer Conference. © Scality. 17

Kubernetes Function Engines▪ We leverage Kubernetes-based function engines (also called serverless engines)

e.g., Kubeless, Fission, KNative, as runtime engines for workflow blocks.▪ Users create “functions” as Kubernetes pods through existing serverless CLI.

Zenko can list them and register them to the Workflow Manager UI.▪ The serverless engine can autoscale “functions” based on load (e.g. heavy CPU

processing).▪ “Functions” can be triggered by:

▪ CLI (for tests)▪ HTTP (for synchronous calls), e.g., for custom authorization in Zenko.▪ Kafka topics: for asynchronous calls, the “function” is triggered by a Kafka

event in a topic, and the status (success/failure) is stored in another Kafka topic. This is what we leverage in Zenko for asynchronous processing.

▪ The serverless engine has its own Prometheus sidecars for monitoring the “functions” activity.

▪ Zenko does the control (retry policy, reporting successes and failures through a UI dashboard).

Page 18: Data Management in a Cloud-Agnostic World · Kafka topics: for asynchronous calls, the “function” is triggered by a Kafka event in a topic, and the status (success/failure) is

2019 Storage Developer Conference. © Scality. 18

Workflow Engine: A Real-Life Use Case

An auto-tagging feedback loop:1) Use TensorFlow locally to auto-tag pictures2) Check the score3) Call a cloud AI service4) Train our model5) Update tags

Page 19: Data Management in a Cloud-Agnostic World · Kafka topics: for asynchronous calls, the “function” is triggered by a Kafka event in a topic, and the status (success/failure) is

2019 Storage Developer Conference. © Scality. 19

Workflow Engine: How it worksA simple workflow with two external functions (serverless engine)

- Writes to two clouds- Only the metadata (including the

data’s location) is transferred in the topics (e.g., Function1 reads/writes data from/to the RING).

Page 20: Data Management in a Cloud-Agnostic World · Kafka topics: for asynchronous calls, the “function” is triggered by a Kafka event in a topic, and the status (success/failure) is

2019 Storage Developer Conference. © Scality. 20

Demo

Page 21: Data Management in a Cloud-Agnostic World · Kafka topics: for asynchronous calls, the “function” is triggered by a Kafka event in a topic, and the status (success/failure) is

2019 Storage Developer Conference. © Scality. 21

Questions ?

1,000+ registered Zenko Orbit users

Forum: https://forum.zenko.ioWebsite: www.zenko.io/blogGithub: https://github.com/scality/zenko

Page 22: Data Management in a Cloud-Agnostic World · Kafka topics: for asynchronous calls, the “function” is triggered by a Kafka event in a topic, and the status (success/failure) is

2019 Storage Developer Conference. © Scality. 22

Thank you.