
Intro to SnappyData Webinar


Page 1: Intro to SnappyData Webinar

SnappyData
Unified Stream, Interactive Analytics in a Single In-Memory Cluster with Spark

www.snappydata.io

Jags Ramnarayan
CTO, Co-founder, SnappyData
Feb 2016

Page 2: Intro to SnappyData Webinar

SnappyData - an EMC/Pivotal Spin-Out

● New Spark-based open source project started by Pivotal GemFire founders and engineers
● Decades of in-memory data management experience
● Focus on real-time, operational analytics: Spark inside an OLTP + OLAP database


Page 3: Intro to SnappyData Webinar

Lambda Architecture (LA) for Analytics

Page 4: Intro to SnappyData Webinar

Is Lambda Complex?

(Diagram: an application spanning a batch layer - Impala, HBase - and a speed layer - Cassandra, Redis, ..)

Page 5: Intro to SnappyData Webinar

Is Lambda Complex?

(Diagram: an application spanning a batch layer - Impala, HBase - and a speed layer - Cassandra, Redis, ..)

• Application has to federate queries?
• Complex application - deal with multiple data models? Disparate APIs?
• Slow?
• Expensive to maintain?

Page 6: Intro to SnappyData Webinar

Can we simplify and optimize?
Can the batch and speed layers use a single, unified serving DB?

Page 7: Intro to SnappyData Webinar

Deeper Look – Ad Impression Analytics

Page 8: Intro to SnappyData Webinar

Ad Impression Analytics

Ref - https://chimpler.wordpress.com/2014/07/01/implementing-a-real-time-data-pipeline-with-spark-streaming/

Page 9: Intro to SnappyData Webinar

Bottlenecks in the Write Path

● Stream micro-batches in parallel from Kafka to each Spark executor
● Emit key-value pairs so we can group by Publisher, Geo
● Execute GROUP BY … expensive Spark shuffle …
● Shuffle again in the DB cluster … data format changes … serialization costs

Page 10: Intro to SnappyData Webinar

Bottlenecks in the Write Path

Shuffle costs:
• Aggregations - GroupBy, MapReduce
• Joins with other streams, reference data
• Replication (HA) in the fast data store

Copying, serialization costs:
• Data models across Spark and the fast data store are different
• Data flows through too many processes
• In-JVM copying - excessive copying in Java-based scale-out stores

Page 11: Intro to SnappyData Webinar

Goal - Localize Processing

• Can we localize processing with state and avoid the shuffle?
• Can Kafka, Spark partitions, and the data store share the same partitioning policy? (partition by Advertiser, Geo)
• Embedded state: Spark's native mapWithState, Apache Samza, Flink? Good, scalable KV stores, but is this enough?
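The shared-partitioning idea above can be sketched in a few lines. This is an illustrative toy, not SnappyData code: the names and the CRC32-based policy are assumptions, but they show why agreeing on one hash function and partition count keeps each (advertiser, geo) key on a single node end to end.

```python
import zlib

# Hypothetical shared partitioning policy: if the Kafka producer, the
# Spark stream, and the data store all hash (advertiser, geo) over the
# same partition count, every record for a key lands on the same node
# and a GROUP BY advertiser, geo never needs a shuffle.

NUM_PARTITIONS = 8

def partition_for(advertiser: str, geo: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stable hash of the composite key (advertiser, geo)."""
    key = f"{advertiser}|{geo}".encode()
    return zlib.crc32(key) % num_partitions

# Producer-side Kafka partition and store-side bucket agree by construction.
kafka_partition = partition_for("acme", "US")
store_bucket = partition_for("acme", "US")
assert kafka_partition == store_bucket  # same key, same node, no shuffle
```

The same function could be applied at every tier (queue, stream processor, table buckets), which is the colocation property the slide is asking for.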

Page 12: Intro to SnappyData Webinar

Impedance Mismatch for Interactive Queries

On ingested ad impressions we want to run queries like:
- Find total uniques for a certain ad, grouped by geography and month
- Impression trends for advertisers, or detect outliers in the bidding price

Unfortunately, in most KV-oriented stores this is very inefficient:
- Not suited for scans, aggregations, or distributed joins on large volumes
- Memory utilization in a KV store is poor

Page 13: Intro to SnappyData Webinar

Why columnar storage?

Page 14: Intro to SnappyData Webinar

In-Memory Columns, but Still Slow?

  SELECT SUBSTR(sourceIP, 1, X),
         SUM(adRevenue)
  FROM uservisits
  GROUP BY SUBSTR(sourceIP, 1, X)

Berkeley AMPLab Big Data Benchmark - AWS m2.4xlarge; total of 342 GB

Page 15: Intro to SnappyData Webinar

Can We Use Statistical Methods to Shrink Data?

• It is not always possible to store the data in full
  Many applications (telecoms, ISPs, search engines) can't keep everything
• It is inconvenient to work with data in full
  Just because we can doesn't mean we should
• It is faster to work with a compact summary
  Better to explore data on a laptop than a cluster

Ref: Graham Cormode - Sampling for Big Data

Can we use statistical techniques to understand the data, synthesize something very small, but still answer analytical queries?
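The sampling idea can be shown with a minimal sketch (illustrative only; the data and the 1% fraction are assumptions): a small uniform sample answers an average query close to the exact answer while touching a fraction of the rows.

```python
import random

# Toy sketch: estimate AVG(bid) from a 1% uniform sample instead of
# scanning all 100,000 rows. Seeded so the run is reproducible.

random.seed(42)
bids = [random.uniform(0.1, 5.0) for _ in range(100_000)]

sample = random.sample(bids, k=1000)  # 1% uniform sample

exact = sum(bids) / len(bids)
estimate = sum(sample) / len(sample)

# The compact summary answers the query with a small relative error.
relative_error = abs(estimate - exact) / exact
assert relative_error < 0.05
```

This is the simplest possible synopsis; the later slide on stratified samples and probabilistic structures shows how error bounds are kept low even for skewed groups.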

Page 16: Intro to SnappyData Webinar

SnappyData: A New Approach

Single unified HA cluster: OLTP + OLAP + Stream for real-time analytics

Vision: drastically reduce the cost and complexity of modern big data

(Diagram: a batch design with high throughput, rapidly maturing, combined with a real-time design center - low latency, HA, concurrent - matured over 13 years)

Page 17: Intro to SnappyData Webinar

SnappyData: A New Approach

Single unified HA cluster: OLTP + OLAP + Stream for real-time analytics

Real-time operational analytics - TBs in memory

(Architecture diagram: rows, columnar storage, indexes, transactions, stream processing, HDFS, AQP, and an MPP DB, accessed via ODBC, JDBC, REST, and Spark APIs - Scala, Java, Python, R)

First commercial project on Approximate Query Processing (AQP)

Page 18: Intro to SnappyData Webinar

Tenets, Guiding Principles

● Memory is the new bottleneck for speed
● Memory densities will follow Moore's law
● 100% Spark compatible - powerful, concise; simplify the runtime
● Aim for Google-search-like speed for analytic queries
● Dramatically reduce the costs associated with analytics - distributed systems are expensive, especially in production

Page 19: Intro to SnappyData Webinar

Use Case Patterns

• Stream ingestion database for Spark - process streams, transform, real-time scoring, store, query
• In-memory database for apps - highly concurrent apps, SQL cache, OLTP + OLAP
• Analytic caching pattern - caching for analytics over any "big data" store (esp. MPP); federate queries between samples and the backend

Page 20: Intro to SnappyData Webinar

Why Spark - Confluence of Streaming, Interactive, Batch

Unifies batch, streaming, and interactive computation
Easy to build sophisticated applications
Supports iterative, graph-parallel algorithms
Powerful APIs in Scala, Python, Java

(Diagram: Spark core underpinning Spark Streaming for streaming; SQL and BlinkDB for batch and interactive; GraphX for data-parallel, iterative work; MLlib for sophisticated algorithms)

Source: Spark Summit presentation by Ion Stoica

Page 21: Intro to SnappyData Webinar

Spark Cluster

Page 22: Intro to SnappyData Webinar

Snappy Spark Cluster Deployment Topologies

Unified cluster:
• Snappy store and the Spark executor share the JVM memory
• Reference-based access - zero copy

Split cluster:
• SnappyStore is isolated but uses the same column format as Spark for high throughput

Page 23: Intro to SnappyData Webinar

Simple API - Spark Compatible

● Access a table as a DataFrame; the catalog is automatically recovered
● Store: an RDD[T]/DataFrame can be stored in SnappyData tables
● Access from remote SQL clients
● Additional APIs for updates, inserts, deletes

  // Save a DataFrame using the Snappy or Spark context …
  context.createExternalTable("T1", "ROW", myDataFrame.schema, props)

  // Save using the DataFrame API
  dataDF.write.format("ROW").mode(SaveMode.Append).options(props).saveAsTable("T1")

  val impressionLogs: DataFrame = context.table(colTable)
  val campaignRef: DataFrame = context.table(rowTable)
  val parquetData: DataFrame = context.table(parquetTable)
  // … now use any of the DataFrame APIs …

Page 24: Intro to SnappyData Webinar

Extends Spark

  CREATE [TEMPORARY] TABLE [IF NOT EXISTS] table_name (
      <column definition>
  ) USING 'JDBC | ROW | COLUMN'
  OPTIONS (
      COLOCATE_WITH 'table_name',        -- default none
      PARTITION_BY 'PRIMARY KEY | column name',  -- replicated table by default
      REDUNDANCY '1',                    -- manage HA
      PERSISTENT "DISKSTORE_NAME ASYNCHRONOUS | SYNCHRONOUS",
                                         -- empty string maps to the default disk store
      OFFHEAP "true | false",
      EVICTION_BY "MEMSIZE 200 | COUNT 200 | HEAPPERCENT",
      …
  ) [AS select_statement];

Page 25: Intro to SnappyData Webinar

Simple to Ingest Streams Using SQL

Consume from stream → transform raw data → continuous analytics → ingest into the in-memory store → overflow table to HDFS

  CREATE STREAM TABLE AdImpressionLog (<columns>)
  USING directkafka_stream OPTIONS (
      <socket endpoints>
      "topics 'adnetwork-topic'",
      "rowConverter 'AdImpressionLogAvroDecoder'"
  )

  // Register a continuous query
  streamingContext.registerCQ(
      "select publisher, geo, avg(bid) as avg_bid, count(*) imps,
       count(distinct(cookie)) uniques
       from AdImpressionLog window (duration '2' seconds, slide '2' seconds)
       where geo != 'unknown' group by publisher, geo")
    .foreachDataFrame(df => {
      df.write.format("column").mode(SaveMode.Append)
        .saveAsTable("adImpressions")
    })

Page 26: Intro to SnappyData Webinar

AdImpression Demo
Spark and SQL code walkthrough, interactive SQL

Page 27: Intro to SnappyData Webinar

AdImpression Demo
Performance comparison to Cassandra, MemSQL - TBD

Page 28: Intro to SnappyData Webinar

AdImpression Ingest Performance

Loading from Parquet files using 4 cores; stream ingest (Kafka + Spark Streaming to the store) using 1 core

Page 29: Intro to SnappyData Webinar

Why Are Ingestion and Querying Fast?

Linear scaling with partition pruning: the input queue, stream, and in-memory DB all share the same partitioning strategy

Page 30: Intro to SnappyData Webinar

How Does It Scale with Concurrency?

● The parallel query engine skips Spark SQL scheduling for low-latency queries
● Column tables: for fast scans/aggregations; also automatically compressed
● Row tables: fast key-based or selective queries; can have any number of secondary indexes

Page 31: Intro to SnappyData Webinar

How Does It Scale with Concurrency?

● Distributed shuffle for joins, ordering, etc. is expensive
● Two techniques minimize this:

1) Colocation of related tables - all related records are colocated. For instance, with the 'Users' table partitioned across nodes, all related 'Ad impressions' can be colocated on the same partition. The parallel query then executes the join locally on each partition.

2) Replication of 'dimension' tables - joins to dimension tables are always localized
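The colocated-join technique can be sketched as follows (illustrative only; the tables and hash policy are made up, not SnappyData internals): when both tables are partitioned by the same key, the join runs independently inside each partition with no data movement between nodes.

```python
# Toy colocated join: 'users' and 'impressions' are both partitioned
# by user, so each partition joins only its own slice of both tables.

def partition_by(records, key_fn, n):
    """Hash-partition records into n buckets by key_fn."""
    parts = [[] for _ in range(n)]
    for r in records:
        parts[hash(key_fn(r)) % n].append(r)
    return parts

users = [{"user": "u1", "segment": "auto"}, {"user": "u2", "segment": "travel"}]
impressions = [{"user": "u1", "bid": 0.4}, {"user": "u1", "bid": 0.7},
               {"user": "u2", "bid": 0.2}]

N = 4
user_parts = partition_by(users, lambda r: r["user"], N)
imp_parts = partition_by(impressions, lambda r: r["user"], N)

# Local join: no partition ever needs rows from another partition.
joined = []
for up, ip in zip(user_parts, imp_parts):
    segments = {u["user"]: u["segment"] for u in up}
    joined += [{**i, "segment": segments[i["user"]]} for i in ip if i["user"] in segments]

assert len(joined) == 3  # every impression found its user locally
```

Replicating a small dimension table (technique 2) achieves the same locality without requiring matching partitioning, since a full copy sits on every node.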

Page 32: Intro to SnappyData Webinar

Low-Latency Interactive Analytic Queries - Exact or Approximate

  Select avg(volume), symbol from T1 where <time range> group by symbol

  Select avg(volume), symbol from T1 where <time range> group by symbol
  with error 0.1 confidence 0.8

(Chart: speed/accuracy tradeoff - error vs. execution time (sample size); queries on the entire dataset take minutes, while interactive queries on samples return in about 2 seconds)
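One common way an engine can back a "with error 0.1 confidence 0.8" clause is a confidence interval on a sample via the central limit theorem. The sketch below is illustrative only and is not SnappyData's actual estimator; the data and z-score table are assumptions.

```python
import math
import random

# Illustrative CLT-based estimator: estimate avg(volume) from a sample
# and report a half-width such that the true mean falls inside
# [estimate - half, estimate + half] with the requested confidence.

def avg_with_error(sample, confidence=0.8):
    """Return (estimate, half_width) of a two-sided CLT confidence interval."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)  # sample variance
    # z-scores for common two-sided confidence levels
    z = {0.8: 1.2816, 0.9: 1.6449, 0.95: 1.96}[confidence]
    return mean, z * math.sqrt(var / n)

random.seed(7)
volumes = [random.uniform(100, 200) for _ in range(2000)]  # true mean ~150

est, half = avg_with_error(volumes, confidence=0.8)
assert half > 0
assert abs(est - 150) < half + 5  # estimate lands near the true mean
```

The larger the sample, the narrower the interval, which is exactly the error/time tradeoff the chart above depicts.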

Page 33: Intro to SnappyData Webinar

Key Feature: Synopses Data

● Maintain stratified samples
  ○ Intelligent sampling to keep error bounds low
● Probabilistic data structures
  ○ TopK for time series (CMS with time aggregation, item aggregation)
  ○ Histograms, HyperLogLog, Bloom filters, wavelets

  CREATE SAMPLE TABLE sample-table-name
  USING columnar OPTIONS (
      BASETABLE 'table_name',            -- source column table or stream table
      [ SAMPLINGMETHOD "stratified | uniform" ]
      STRATA name (
          QCS ("comma-separated-column-names")
          [ FRACTION "frac" ]
      )+                                 -- one or more QCS
  )
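To show what one of the probabilistic structures above looks like, here is a minimal Count-Min Sketch (a generic textbook version, not SnappyData's implementation; sizes and hashing are assumptions): it answers frequency queries over a stream in a small fixed amount of memory, and it only ever overestimates.

```python
import zlib

# Minimal Count-Min Sketch: approximate item frequencies in O(width * depth)
# space. Each item increments one cell per row; the estimate is the minimum
# over its cells, so collisions can only inflate counts, never deflate them.

class CountMinSketch:
    def __init__(self, width=256, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item: str):
        # One independent-ish hash per row, derived by salting CRC32.
        for row in range(self.depth):
            yield row, zlib.crc32(f"{row}:{item}".encode()) % self.width

    def add(self, item: str, count: int = 1):
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item: str) -> int:
        return min(self.table[row][col] for row, col in self._cells(item))

cms = CountMinSketch()
for _ in range(1000):
    cms.add("publisher_1")
cms.add("publisher_2", 5)

assert cms.estimate("publisher_1") >= 1000  # never underestimates
```

Pairing such a sketch with per-window time buckets is one way to support the TopK-for-time-series queries the slide mentions.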

Page 34: Intro to SnappyData Webinar

Unified Cluster Architecture

Page 35: Intro to SnappyData Webinar

How Do We Extend Spark for Real Time?

• Spark executors are long-running; a driver failure doesn't shut down the executors
• Driver HA - drivers run "managed" with a standby secondary
• Data HA - consensus-based clustering integrated for eager replication

Page 36: Intro to SnappyData Webinar

How Do We Extend Spark for Real Time?

• Bypass the scheduler for low-latency SQL
• Deep integration with Spark Catalyst (SQL) - colocation optimizations, index use, etc.
• Full SQL support - persistent catalog, transactions, DML

Page 37: Intro to SnappyData Webinar

Performance – Spark vs Snappy (TPC-H)

See the ACM SIGMOD 2016 paper for details, available on the snappydata.io blog

Page 38: Intro to SnappyData Webinar

Performance - Snappy vs. In-Memory DB (YCSB)

Page 39: Intro to SnappyData Webinar

Unified OLAP/OLTP Streaming with Spark

● Far fewer resources: a TB problem becomes a GB problem
  ○ CPU contention drops
● Far less complex
  ○ A single cluster for stream ingestion, continuous queries, interactive queries, and machine learning
● Much faster
  ○ Compressed data managed in distributed memory in columnar form reduces volume and is much more responsive

Page 40: Intro to SnappyData Webinar

www.snappydata.io

SnappyData is Open Source

● Available for download on GitHub today
● https://github.com/SnappyDataInc/snappydata
● Learn more: www.snappydata.io/blog
● Connect:
  ○ twitter: www.twitter.com/snappydata
  ○ facebook: www.facebook.com/snappydata
  ○ linkedin: www.linkedin.com/snappydata
  ○ slack: http://snappydata-slackin.herokuapp.com

Page 41: Intro to SnappyData Webinar

EXTRAS


Page 42: Intro to SnappyData Webinar

Colocated Row/Column Tables in Spark

(Diagram: three nodes, each running a Spark executor with tasks, the Spark block manager, stream processing, and colocated row and column tables)

● Spark executors are long-lived and shared across multiple apps
● The Snappy memory manager and the Spark block manager are integrated

Page 43: Intro to SnappyData Webinar

Tables Can Be Partitioned or Replicated

(Diagram: each node holds a consistent replica of the replicated tables, plus partitioned-table buckets - A-H, I-P, Q-W - with partition replicas hosted on peer nodes)

Data is partitioned with one or more replicas.
Use partitioned tables for large fact tables and replicated tables for small dimension tables.

Page 44: Intro to SnappyData Webinar

SnappyData Components

(Diagram: Spark core with micro-batch streaming and Spark SQL Catalyst, over the store's transactions, OLAP job scheduler, OLAP/OLTP queries, AQP, P2P cluster and replication service, row/column tables, indexes, sample tables, and shared-nothing logs, accessed via Spark programs, JDBC, and ODBC)