GOOGLE CLOUD BIG DATA IN THE BIGQUERY, APACHE BEAM, … · 2017. 6. 29. · BIG DATA IN THE GOOGLE...

BIG DATA IN THE GOOGLE CLOUD BIGQUERY, APACHE BEAM, DATAFLOW

2017.06.12.

Kassai Csaba - Lead Data Architect

Farkas Péter - Data Engineer

BIG DATA IN THE GOOGLE CLOUD

Agenda

● Google Cloud Storage

● Google BigQuery

● Apache Beam

● Google Cloud Pub/Sub

● Google Cloud Dataflow

● Case studies

GCP

Cloud Storage

Google Cloud Storage

Consistency

When a write succeeds, the latest copy of the object is guaranteed to be returned to any GET, globally. This applies to PUTs of new or overwritten objects and to DELETEs.

Google Cloud Storage

Object Lifecycle Management

Actions

● delete live/archived objects

● “downgrade” storage class

Conditions

● age

● create time

● live/archived

● number of newer versions

● storage class

Google Cloud Storage

Pricing

● Storage

● Data retrieval (Nearline, Coldline)

● Network

● Operations

Google Cloud Storage

Quickstart

https://cloud.google.com/storage/docs/quickstart-console

https://cloud.google.com/storage/docs/quickstart-gsutil

A fast, economical and fully managed enterprise data warehouse for large-scale data analytics

Google BigQuery

enterprise data warehouse

fast & large-scale

fully-managed

economical

Google BigQuery

Dremel

Google BigQuery

Structure

[Diagram labels: SQL Query, Petabit Network, Storage, Compute, Streaming Ingest, Fast Batch Load]

Google BigQuery

Columnar storage

[Figure: a table with columns c1 to c5 and rows labelled 20160101 to 20160105; size labels shown: 60 GB, 125 GB, 80 GB, 45 GB, 99 GB]

Google BigQuery

(Almost) append-only

● Data Manipulation Language (DML): supported, but with many constraints

○ No required field

○ Empty streaming buffer

○ Partitioned tables are not supported

○ No multi-statement transaction

○ Limited concurrency

● Use it as an append-only database when possible

A BRIEF INTRODUCTION TO BIGQUERY

Structure / Dataset

PROJECT > DATASETS

● Contain a collection of tables and views

● Access control is applied to all tables and views in the dataset

● ACLs for Readers, Writers and Owners

● Access can be granted to datasets for users who are not members of the project

A BRIEF INTRODUCTION TO BIGQUERY

Structure / Table

PROJECT > DATASETS > TABLES

● Collection of columns and rows

● Data stored in managed storage

● Have a schema: describes strongly typed columns of values

● Views are supported: virtual tables defined by a SQL query

A BRIEF INTRODUCTION TO BIGQUERY

Structure / Job

[Hierarchy diagram labels: PROJECT, DATASETS, JOBS, TABLES]

● Used to start all potentially long-running actions

● Examples: queries, importing / exporting data, copying data

● Can be cancelled

A BRIEF INTRODUCTION TO BIGQUERY

Schema - Types

● INT64, FLOAT64, STRING, BOOL, BYTES

● DATE, DATETIME, TIME, TIMESTAMP

● ARRAY: an ordered list of zero or more elements of non-ARRAY values

● STRUCT: a container of ordered fields, each with a type (required) and a field name (optional)

A BRIEF INTRODUCTION TO BIGQUERY

Query results

TEMPORARY TABLES

● Used by caching

● Free storage

● Limited lifetime

USER-DEFINED TABLES

● Permanent

● Billed

A BRIEF INTRODUCTION TO BIGQUERY

Pricing

STORAGE

● Priced per GB per month

● Prorated per MB, per second

● Discount after 90 days

● 10 GB per month is free

QUERIES

● Priced by the amount of data processed by the query

● First 1 TB per month is free

● Cached results are free

● Queries that fail with an error are free

STREAMING INSERT

● Insert row by row via the REST API

● Priced per GB

A BRIEF INTRODUCTION TO BIGQUERY

Interfaces

● Web UI

● CLI

● RESTful API

A BRIEF INTRODUCTION TO BIGQUERY

BQ basic exercises

https://cloud.google.com/bigquery/quickstart-web-ui

goo.gl/jxU7a5

A BRIEF INTRODUCTION TO BIGQUERY

BQ as ETL tool

Source: daily snapshots of the source table as CSV

Target: dimension table with Type-2 history in BQ

A BRIEF INTRODUCTION TO BIGQUERY

BQ as ETL tool - Source

Schema

● STORENO: unique id of the store

● STORENAME: name of the store

● CHAIN: name of the chain the store belongs to. Can be null.

● STORETYPE: type of the store, INTERNAL or EXTERNAL. Only INTERNAL stores should be imported into BQ.

● BATCHDATE: the date when the snapshot was created

Location: https://console.cloud.google.com/storage/browser/bdf-bigquery-demo/storedata/

Separator: ‘;’
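The rule that only INTERNAL stores are imported can be illustrated with a small pre-filter. This is a minimal Java sketch, not the solution shown in the deck: the class and method names are hypothetical, the column order is assumed to follow the schema list above (STORENO;STORENAME;CHAIN;STORETYPE;BATCHDATE), and the sample rows in main are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical pre-filter for the store snapshot lines (illustration only).
class StoreSnapshotFilter {

    // Assumed column order: STORENO;STORENAME;CHAIN;STORETYPE;BATCHDATE
    static final int STORETYPE_INDEX = 3;

    // Keeps only the lines whose STORETYPE column equals INTERNAL.
    static List<String> internalOnly(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            // A limit of -1 keeps trailing empty fields, e.g. an empty CHAIN.
            String[] fields = line.split(";", -1);
            if (fields.length > STORETYPE_INDEX
                    && "INTERNAL".equals(fields[STORETYPE_INDEX])) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Invented sample rows, not taken from the demo bucket.
        List<String> sample = Arrays.asList(
                "1;Main Street;ChainA;INTERNAL;2017-06-01",
                "2;Partner Shop;;EXTERNAL;2017-06-01",
                "3;Corner Store;;INTERNAL;2017-06-01");
        internalOnly(sample).forEach(System.out::println);
    }
}
```

The same filter can equally be expressed as a WHERE clause once the raw data is loaded; the sketch only makes the rule explicit.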

A BRIEF INTRODUCTION TO BIGQUERY

BQ as ETL tool - Target

BQ - Schema

● code: unique id of the store

● name: name of the store

● chain: name of the chain the store belongs to. Can be null.

● valid_from: start of validity (Type-2 history)

● valid_to: end of validity (Type-2 history)

A BRIEF INTRODUCTION TO BIGQUERY

BQ as ETL tool - Solution

Data import:

bq load --autodetect --field_delimiter=';' --replace {project_id}:bdf_demo.store_raw gs://bdf-bigquery-demo/storedata/*

Query for data transformation:

https://bigquery.cloud.google.com/savedquery/862243936433:2283d8e8c4e942e1ae5ed8f7ed3d1cbd

View for proper Type-2 history:

SELECT * EXCEPT(deleted)
FROM (
  SELECT
    *,
    LEAD(valid_from) OVER (PARTITION BY code ORDER BY valid_from) AS valid_to
  FROM `{project_id}.bdf_demo.store`
)
WHERE deleted = FALSE
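What the LEAD window function does in this view can be reproduced outside BigQuery. This is a minimal Java sketch under stated assumptions, not the authors' code: the Type2History class, its Version type and the withValidTo helper are hypothetical names, and dates are ISO strings so that lexical order equals chronological order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical emulation of
// LEAD(valid_from) OVER (PARTITION BY code ORDER BY valid_from) AS valid_to.
class Type2History {

    // One stored version of a store row.
    static final class Version {
        final String code;
        final String name;
        final String validFrom; // ISO date, e.g. "2017-06-01"
        final boolean deleted;
        String validTo;         // null for the latest version of a code

        Version(String code, String name, String validFrom, boolean deleted) {
            this.code = code;
            this.name = name;
            this.validFrom = validFrom;
            this.deleted = deleted;
        }
    }

    // Sets validTo of every version to the validFrom of the next version with
    // the same code (mutating the input objects), then drops deleted versions.
    static List<Version> withValidTo(List<Version> rows) {
        // PARTITION BY code (TreeMap keeps the output order deterministic).
        Map<String, List<Version>> byCode = new TreeMap<>();
        for (Version v : rows) {
            byCode.computeIfAbsent(v.code, k -> new ArrayList<>()).add(v);
        }
        List<Version> result = new ArrayList<>();
        for (List<Version> versions : byCode.values()) {
            // ORDER BY valid_from
            versions.sort(Comparator.comparing((Version v) -> v.validFrom));
            for (int i = 0; i < versions.size(); i++) {
                Version v = versions.get(i);
                // LEAD: value of the following row, or null on the last row.
                v.validTo = (i + 1 < versions.size())
                        ? versions.get(i + 1).validFrom
                        : null;
                // WHERE deleted = FALSE is applied after LEAD was computed.
                if (!v.deleted) {
                    result.add(v);
                }
            }
        }
        return result;
    }
}
```

Computing valid_to before filtering matters: a deleted marker row still closes the validity interval of the version before it, and only then is it removed from the output.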

PROGRAMMING MODEL RUNNERS

APACHE BEAM MODEL

Processing- vs event-time

Source: The world beyond batch: Streaming 102 (Tyler Akidau)

APACHE BEAM MODEL

Watermark

A watermark with a value of time X makes the statement: “all input data with event times less than X have been observed.” As such, watermarks act as a metric of progress when observing an unbounded data source with no known end.
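To make the definition concrete, the following Java sketch tracks a heuristic watermark and flags late elements. It is an illustration under assumptions, not how Beam or Dataflow compute watermarks: the HeuristicWatermark class is hypothetical, and the "largest event time seen minus a fixed maximum delay" rule is just one simple heuristic.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical heuristic watermark: largest event time seen minus a fixed delay.
class HeuristicWatermark {

    private final Duration maxDelay;
    private Instant watermark = Instant.MIN;

    HeuristicWatermark(Duration maxDelay) {
        this.maxDelay = maxDelay;
    }

    // Observes one element and returns true if it is late, i.e. its event
    // time is earlier than the watermark at the moment it arrives.
    boolean observe(Instant eventTime) {
        boolean late = eventTime.isBefore(watermark);
        Instant candidate = eventTime.minus(maxDelay);
        // A watermark only moves forward.
        if (candidate.isAfter(watermark)) {
            watermark = candidate;
        }
        return late;
    }

    Instant current() {
        return watermark;
    }

    public static void main(String[] args) {
        HeuristicWatermark wm = new HeuristicWatermark(Duration.ofSeconds(30));
        Instant t0 = Instant.parse("2017-06-12T00:00:00Z");
        System.out.println(wm.observe(t0.plusSeconds(60))); // false
        System.out.println(wm.observe(t0.plusSeconds(10))); // true
    }
}
```

In the second call the watermark already stands at t0 + 30 s, so an element with event time t0 + 10 s contradicts the watermark's claim and is reported as late.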

APACHE BEAM MODEL

Watermark

Source: The world beyond batch: Streaming 102 (Tyler Akidau)

APACHE BEAM MODEL

Pipeline structure

Source → PCollection → PTransform → PCollection → Sink

APACHE BEAM MODEL

Concepts

● What results are being computed?

● Where in event time are they being computed?

● When in processing time are they materialized?

● How do earlier results relate to later refinements?

APACHE BEAM MODEL

What are you computing?

PTransform types:

● Element-wise

● Aggregation

● Composite

APACHE BEAM MODEL

What are you computing?

PCollection<Integer> salesRecords = ...;
PCollection<Integer> totalSales = salesRecords
    .apply(Sum.integersGlobally());

APACHE BEAM MODEL

What are you computing?

Source: The world beyond batch: Streaming 102 (Tyler Akidau)

APACHE BEAM MODEL

Where in event time?

Windowing strategies (each illustrated for Key 1, Key 2 and Key 3):

● Fixed

● Sliding

● Per-Session

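APACHE BEAM MODEL

Where in event time?

PCollection<Integer> salesRecords = ...;
PCollection<Integer> totalSales = salesRecords
    .apply(Window.<Integer>into(FixedWindows.of(Duration.standardMinutes(2))))
    // withoutDefaults() is needed because the input is no longer in the global window
    .apply(Sum.integersGlobally().withoutDefaults());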

APACHE BEAM MODEL

Where in event time?

Source: The world beyond batch: Streaming 102 (Tyler Akidau)
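The effect of fixed windowing on a sum can be shown without the Beam SDK. This is a minimal Java sketch with hypothetical names (FixedWindowSum, windowStart, sumPerWindow); it assumes windows are aligned to the epoch, which is also the default alignment of Beam's FixedWindows.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone illustration of fixed (tumbling) windows.
class FixedWindowSum {

    // Inclusive start of the fixed window that contains the event time.
    static long windowStart(long eventTimeMillis, long sizeMillis) {
        // floorMod keeps the result correct for negative timestamps too.
        return eventTimeMillis - Math.floorMod(eventTimeMillis, sizeMillis);
    }

    // Each event is {eventTimeMillis, value}; returns the sum per window start.
    static Map<Long, Integer> sumPerWindow(long[][] events, long sizeMillis) {
        Map<Long, Integer> sums = new TreeMap<>();
        for (long[] event : events) {
            sums.merge(windowStart(event[0], sizeMillis), (int) event[1], Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        long twoMinutes = 120_000L;
        long[][] events = {{0L, 5L}, {119_999L, 7L}, {120_000L, 3L}, {250_000L, 2L}};
        // prints {0=12, 120000=3, 240000=2}
        System.out.println(sumPerWindow(events, twoMinutes));
    }
}
```

Only the event time decides the window; the order in which the events arrive does not change the result.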

APACHE BEAM MODEL

When in processing time?

Triggers:

● Time-based

● Data-driven

● Composite

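APACHE BEAM MODEL

When in processing time?

PCollection<Integer> salesRecords = ...;
PCollection<Integer> totalSales = salesRecords
    // A complete Beam pipeline must also set withAllowedLateness(...) and an accumulation mode here.
    .apply(Window.<Integer>into(FixedWindows.of(Duration.standardMinutes(1)))
        .triggering(AfterWatermark.pastEndOfWindow()
            .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane()
                .plusDelayOf(Duration.standardMinutes(1)))
            .withLateFirings(AfterPane.elementCountAtLeast(1))))
    .apply(Sum.integersGlobally().withoutDefaults());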

APACHE BEAM MODEL

When in processing time?

Source: The world beyond batch: Streaming 102 (Tyler Akidau)

APACHE BEAM MODEL

How refinements relate?

Firing           Elements   Discarding   Accumulating   Accumulating & Retracting
Early            3, 4       7            7              7
Watermark        2, 6       8            15             15, -7
Late             3          3            18             18, -15
Total observed   18         18           40             18
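The three accumulation modes in the table can be recomputed from the raw panes. This is a minimal stand-alone Java sketch with hypothetical names (AccumulationModes and its methods); it models only the arithmetic of the table, not the Beam runtime.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the three accumulation modes for one window.
class AccumulationModes {

    private static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Discarding: each firing reports only the elements since the last firing.
    static List<Integer> discarding(List<List<Integer>> panes) {
        List<Integer> out = new ArrayList<>();
        for (List<Integer> pane : panes) {
            out.add(sum(pane));
        }
        return out;
    }

    // Accumulating: each firing reports the running total of the window.
    static List<Integer> accumulating(List<List<Integer>> panes) {
        List<Integer> out = new ArrayList<>();
        int running = 0;
        for (List<Integer> pane : panes) {
            running += sum(pane);
            out.add(running);
        }
        return out;
    }

    // Accumulating & retracting: the running total plus a retraction
    // (negative copy) of the total reported by the previous firing.
    static List<List<Integer>> accumulatingAndRetracting(List<List<Integer>> panes) {
        List<List<Integer>> out = new ArrayList<>();
        Integer previous = null;
        int running = 0;
        for (List<Integer> pane : panes) {
            running += sum(pane);
            List<Integer> firing = new ArrayList<>();
            firing.add(running);
            if (previous != null) {
                firing.add(-previous);
            }
            out.add(firing);
            previous = running;
        }
        return out;
    }

    public static void main(String[] args) {
        // Early, on-time (watermark) and late panes from the table.
        List<List<Integer>> panes = Arrays.asList(
                Arrays.asList(3, 4), Arrays.asList(2, 6), Arrays.asList(3));
        System.out.println(discarding(panes));                // [7, 8, 3]
        System.out.println(accumulating(panes));              // [7, 15, 18]
        System.out.println(accumulatingAndRetracting(panes)); // [[7], [15, -7], [18, -15]]
    }
}
```

Summing every emitted value gives the "Total observed" row: 18 for discarding, 40 for accumulating, and 18 again once retractions cancel the earlier totals.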

APACHE BEAM MODEL

What Where When How

PCollection<Integer> salesRecords = ...;
PCollection<Integer> totalSales = salesRecords
    .apply(Window.<Integer>into(FixedWindows.of(Duration.standardMinutes(1)))
        .triggering(AfterProcessingTime.pastFirstElementInPane()
            .plusDelayOf(Duration.standardMinutes(1)))
        .discardingFiredPanes())
    .apply(Sum.integersGlobally().withoutDefaults());

APACHE BEAM MODEL

Live demo - Events

[Figure: the ten demo events (values 5, 7, 3, 2, 6, 4, 1, 9, 7, 8) on time axes labelled 23:59 to 00:03; the totals shown are 10, 27 and 15]

APACHE BEAM MODEL

Live demo - Pipeline 1

pipeline
    .apply(HumanIO.read()).setCoder(StickyNotesCoder.of())
    .apply(Window.<Integer>into(FixedWindows.of(Duration.standardMinutes(1)))
        .triggering(AfterWatermark.pastEndOfWindow())
        .accumulatingFiredPanes())
    .apply(Sum.integersGlobally().withoutDefaults())
    .apply(FlipChartIO.write());

APACHE BEAM MODEL

Live demo - Pipeline 1: processing time and watermark at each step

time 00:00:02, watermark 23:59:00
time 00:00:17, watermark 23:59:00
time 00:00:21, watermark 23:59:00
time 00:00:27, watermark 00:00:00
time 00:00:48, watermark 00:00:00
time 00:00:50, watermark 00:00:00
time 00:00:58, watermark 00:00:00
time 00:01:02, watermark 00:00:00
time 00:01:12, watermark 00:00:00
time 00:01:16, watermark 00:01:00
time 00:01:48, watermark 00:01:00
time 00:02:01, watermark 00:01:00
time 00:02:22, watermark 00:02:00

APACHE BEAM MODEL

Live demo - Pipeline 2

pipeline
    .apply(HumanIO.read()).setCoder(StickyNotesCoder.of())
    .apply(Window.<Integer>into(FixedWindows.of(Duration.standardMinutes(1)))
        .triggering(AfterWatermark.pastEndOfWindow()
            .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane()
                .plusDelayOf(Duration.standardSeconds(30)))
            .withLateFirings(AfterPane.elementCountAtLeast(1)))
        .accumulatingFiredPanes())
    .apply(Sum.integersGlobally().withoutDefaults())
    .apply(FlipChartIO.write());

APACHE BEAM MODEL

Live demo - Pipeline 2: processing time and watermark at each step

time 00:00:02, watermark 23:59:00
time 00:00:17, watermark 23:59:00
time 00:00:21, watermark 23:59:00
time 00:00:27, watermark 00:00:00
time 00:00:47, watermark 00:00:00
time 00:00:48, watermark 00:00:00
time 00:00:50, watermark 00:00:00
time 00:00:58, watermark 00:00:00
time 00:01:02, watermark 00:00:00
time 00:01:12, watermark 00:00:00
time 00:01:16, watermark 00:01:00
time 00:01:32, watermark 00:01:00
time 00:01:48, watermark 00:01:00
time 00:02:01, watermark 00:01:00
time 00:02:18, watermark 00:01:00
time 00:02:22, watermark 00:02:00

APACHE BEAM MODEL

Live demo - Pipeline 3

pipeline
    .apply(HumanIO.read()).setCoder(StickyNotesCoder.of())
    .apply(Window.<Integer>into(FixedWindows.of(Duration.standardMinutes(1)))
        .triggering(AfterWatermark.pastEndOfWindow()
            .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane()
                .plusDelayOf(Duration.standardSeconds(30)))
            .withLateFirings(AfterPane.elementCountAtLeast(1)))
        .discardingFiredPanes())
    .apply(Sum.integersGlobally().withoutDefaults())
    .apply(FlipChartIO.write());

Events

Google Cloud Pub/Sub

● Messaging with a many-to-many topology

● Topic - subscription model

● No-ops

● At-least-once delivery

● REST API

● Scalable: 10,000 messages/sec by default
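At-least-once delivery means a subscriber may receive the same message more than once, so handlers are usually written to be idempotent. The following Java sketch shows one way to do that by remembering message IDs. It is an assumption-laden illustration: DeduplicatingHandler is a hypothetical class, it does not use the Pub/Sub client library, and a real system would need a bounded or persistent store instead of an in-memory set.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical wrapper that makes a message handler idempotent.
class DeduplicatingHandler {

    // IDs of messages that were already processed (in memory, unbounded).
    private final Set<String> seenIds = new HashSet<>();
    private final Consumer<String> delegate;

    DeduplicatingHandler(Consumer<String> delegate) {
        this.delegate = delegate;
    }

    // Processes the payload once per message ID.
    // Returns true if the payload was processed, false for a redelivery.
    boolean handle(String messageId, String payload) {
        if (!seenIds.add(messageId)) {
            return false; // duplicate delivery: skip the side effect
        }
        delegate.accept(payload);
        return true;
    }

    public static void main(String[] args) {
        DeduplicatingHandler handler = new DeduplicatingHandler(System.out::println);
        handler.handle("m1", "sale #1");
        handler.handle("m1", "sale #1"); // redelivery, printed only once
        handler.handle("m2", "sale #2");
    }
}
```

A duplicate still has to be acknowledged by the subscriber; the wrapper only prevents the side effect from running twice.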

Intro

https://cloud.google.com/pubsub/docs/quickstart-console

https://cloud.google.com/pubsub/docs/quickstart-cli

CLOUD DATAFLOW

The Cloud Dataflow runner

● Fully managed, no-ops execution environment

● Seamless integration with other GCP services

● Autoscaling

CLOUD DATAFLOW

Fully managed

● Dynamic work rebalancing

● Graph optimization

● Worker lifecycle management

CLOUD DATAFLOW

Monitoring interface

CLOUD DATAFLOW

Logging

CLOUD DATAFLOW

Codelab

goo.gl/k0qH7a

USE CASE

Retail BI system

Processing and storing sales transactions in real time, in order to support:

● Performance metrics

● Demand prediction

● Logistics optimization

● Collecting and selling insights

USE CASE

Architecture
