7/27/2019 CAP1250-Fast Data Meets Big Data.pdf
1/41
Fast Data Meets Big Data
Jags Ramnarayan, VMware, Inc. -- Chief Architect, GemFire/SQLFire
Mike Stolz, VMware, Inc. -- Global Senior Staff Architect
APP-CAP1250
#vmworldapps
Disclaimer
This session may contain product features that are
currently under development.
This session/overview of the new technology represents
no commitment from VMware to deliver these features in
any generally available product.
Features are subject to change, and must not be included in
contracts, purchase orders, or sales agreements of any kind.
Technical feasibility and market demand will affect final delivery.
Pricing and packaging for any new technologies or features
discussed or presented have not been determined.
What's Common?
Fast Data Meets Big Data
Big Data allows you to find opportunities you didn't know you had
Fast Data allows you to respond to opportunities before they are gone
Working together, they enable entirely new business models
The Database is Being Stretched
Big Data: Petabytes vs. Gigabytes; Democratize BI
Flexible Data: Multi-structured data; Developer productivity
Fast Data: Low latency expectations; Horizontal scale
Cloud Delivery: Virtualized; Offered as-a-Service
Need a Horizontally Scalable, Elastic Data Management Solution
Elastic:
Add/remove data servers dynamically
Grow or shrink dynamically with no interruption of service or data loss
Tiered Data Strategy
Looking Back: Traditional Batch Analytics
[Diagram labels: Business Events; CEP/BAM (vendor proprietary); Polling UI; OLTP Data (Validate, Enrich); OLTP Query/Update; Online DB / RDBMS (structured); Other Data; Batch Delay; Transform; Delayed Batch Processing; Data Warehouse (structured); Analytic Queries]
Pipeline with Hadoop (Log Analytics)
ACQUIRE: Logs, raw data in files
TRANSFORM: Sequential process to filter, extract, transform (HDFS, MapReduce)
ANALYZE: Analytics, visualization (SQL DB, MPP DB, batch SQL)
The Real-time Pipeline
ACQUIRE: Stream data (social, SaaS, transactional)
PROCESS IN REAL TIME: Filter, enrich, correlate; low latency
BATCH ANALYZE: Raw data in batches; HDFS, MapReduce; SQL DB, MPP DB (batch or single events?)
[Diagram labels: Online DB; Filtered events; Derived Insight; Online Apps]
New Architecture for Real-time (Custom)
Streams
In-Memory Data Grids: Buffer data, process events, in-memory map-reduce (VMware GemFire, SQLFire, Oracle Coherence, etc.)
Stream Processing: Derive insight with continuous event processing (Apache S4, Storm, Esper, StreamBase, GemFire)
BATCH ANALYZE: SQL DB, MPP DB, HDFS, MapReduce
What Principles Drive This Architecture?
Very low latency ingest, highly scalable (write scalability)
Support structured as well as unstructured data
Real-time processing cannot throttle incoming stream(s)
Highly parallelizable with minimum I/O (network and disk)
Be elastic
Cannot lose events; otherwise the derived value is questionable
Post-process raw and derived events (batch analytics)
STAGE to SCALE
Staged Event-Driven Architecture (SEDA)
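The staging idea can be illustrated with a small sketch. This is not code from the session: the class name `StagedPipelineSketch`, the end-of-stream marker, and the two stages (filter, then transform) are assumptions chosen for illustration. Each stage runs on its own thread and hands events to the next stage through a queue, so stages can be scaled or replaced independently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical two-stage pipeline: each stage has its own thread and input queue. */
final class StagedPipelineSketch {
    private static final String END = "__END_OF_STREAM__"; // illustrative end marker

    /** Runs filter -> transform over the events and returns the transformed output. */
    static List<String> run(List<String> events) {
        // Unbounded queues so producers never block; a real system needs an overflow policy.
        BlockingQueue<String> toFilter = new LinkedBlockingQueue<>();
        BlockingQueue<String> toTransform = new LinkedBlockingQueue<>();
        List<String> out = new ArrayList<>();

        // Stage 1: drop blank events and pass the rest downstream.
        Thread filter = new Thread(() -> {
            try {
                for (String e = toFilter.take(); !END.equals(e); e = toFilter.take()) {
                    if (!e.trim().isEmpty()) toTransform.add(e);
                }
                toTransform.add(END);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        });

        // Stage 2: normalize each surviving event.
        Thread transform = new Thread(() -> {
            try {
                for (String e = toTransform.take(); !END.equals(e); e = toTransform.take()) {
                    out.add(e.trim().toUpperCase());
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        });

        filter.start();
        transform.start();
        toFilter.addAll(events);
        toFilter.add(END);
        try {
            filter.join();
            transform.join(); // join makes the writes to 'out' visible to the caller
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(java.util.Arrays.asList(" a ", "  ", "b")));
    }
}
```

With one thread per stage and FIFO queues, event order is preserved; adding threads to a stage would trade ordering for throughput.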
Acquire, Transform, Filter
(Fast Ingest)
In-memory Data Grid Concepts
Distributed, memory-oriented key-value store
Queryable, indexable, and transactional
Distributed namespace of Maps (key-value)
Called Regions (GemFire, Hibernate), Cache (Oracle), etc.
Two key storage models: Replication, Partitioning
Replicated Region: synchronous replication for slow-changing data
Partitioned Region: partition for large data or highly transactional data, with a redundant copy
Handle thousands of concurrent connections
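A minimal sketch of how a partitioned region might place an entry, assuming a simple hash-to-bucket scheme. The class `PartitionPlacementSketch`, the bucket count, and the "next member holds the redundant copy" rule are illustrative assumptions, not GemFire's actual placement algorithm.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical placement logic: key -> bucket -> primary member and redundant member. */
final class PartitionPlacementSketch {
    private final List<String> members;
    private final int bucketCount;

    PartitionPlacementSketch(List<String> members, int bucketCount) {
        if (members.size() < 2) {
            throw new IllegalArgumentException("need at least 2 members for a redundant copy");
        }
        this.members = new ArrayList<>(members);
        this.bucketCount = bucketCount;
    }

    /** Maps a key to a bucket; floorMod keeps the result non-negative. */
    int bucketFor(Object key) {
        return Math.floorMod(key.hashCode(), bucketCount);
    }

    /** Member that owns the primary copy of the key's bucket. */
    String primaryFor(Object key) {
        return members.get(bucketFor(key) % members.size());
    }

    /** Member that holds the redundant copy: always a different member than the primary. */
    String redundantFor(Object key) {
        return members.get((bucketFor(key) + 1) % members.size());
    }
}
```

A replicated region, by contrast, would keep every entry on every member, which is why the deck reserves it for slow-changing data.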
Acquire, Cache, Transform, etc.
High-ingest partitioned buffering
Expiry based on TTL, idle time
Windows: count, heap size, LRU eviction
Works with rigid or flexible schema (JSON, Objects, SQL)
Cache frequently used DB data for transforming and massaging
Partitioned listeners for filtering, event transformation, etc.
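The expiry and eviction bullets can be sketched with the JDK alone. The class `ExpiringLruBufferSketch` is hypothetical and only covers a count-based window with LRU eviction plus a time-to-live measured from the last write; idle-time expiry and heap-size windows are not shown. Time is passed in explicitly so the behavior is deterministic.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical buffer: count-bounded with LRU eviction, entries expire ttlMillis after write. */
final class ExpiringLruBufferSketch<K, V> {
    private static final class Timed<V> {
        final V value;
        final long writtenAt;
        Timed(V value, long writtenAt) { this.value = value; this.writtenAt = writtenAt; }
    }

    private final long ttlMillis;
    private final LinkedHashMap<K, Timed<V>> map;

    ExpiringLruBufferSketch(final int maxEntries, long ttlMillis) {
        this.ttlMillis = ttlMillis;
        // accessOrder=true keeps the least recently used entry first in iteration order.
        this.map = new LinkedHashMap<K, Timed<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Timed<V>> eldest) {
                return size() > maxEntries; // evict the LRU entry once the window is full
            }
        };
    }

    synchronized void put(K key, V value, long nowMillis) {
        map.put(key, new Timed<>(value, nowMillis));
    }

    /** Returns the value, or null if it is absent or older than the TTL. */
    synchronized V get(K key, long nowMillis) {
        Timed<V> entry = map.get(key);
        if (entry == null) return null;
        if (nowMillis - entry.writtenAt >= ttlMillis) {
            map.remove(key); // lazily expire on read
            return null;
        }
        return entry.value;
    }

    synchronized int size() {
        return map.size();
    }
}
```

Expiry here is lazy (checked on read); a production grid would typically also expire entries in the background so that memory is reclaimed without reads.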
Continuously Available
Complex Stream Processing
Continuous Filtering Using CQs
CQ: Continuous Queries
Apps subscribe to streams using queries:
Select * from Tweets where tweetCount > 10 and userId in (X,Y,Z)
When data changes, subscribers are pushed async events reliably
All related data is accessible at memory speeds
[Diagram labels: Streams; Distributed Processing]
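The continuous-query idea can be sketched in plain Java without any grid API. Everything here is hypothetical (`TweetStats`, `ContinuousQueryRegionSketch`, `registerCq`): a subscriber registers a predicate equivalent to the query's where clause, and every later write that matches is pushed to it. For brevity the sketch delivers events synchronously on the writer's thread, whereas the slide describes asynchronous, reliable delivery.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical value type for the Tweets region. */
final class TweetStats {
    final String userId;
    final int tweetCount;
    TweetStats(String userId, int tweetCount) {
        this.userId = userId;
        this.tweetCount = tweetCount;
    }
}

/** Hypothetical region that pushes matching changes to registered continuous queries. */
final class ContinuousQueryRegionSketch {
    private static final class Subscription {
        final Predicate<TweetStats> where;
        final Consumer<TweetStats> listener;
        Subscription(Predicate<TweetStats> where, Consumer<TweetStats> listener) {
            this.where = where;
            this.listener = listener;
        }
    }

    private final Map<String, TweetStats> data = new ConcurrentHashMap<>();
    private final List<Subscription> subscriptions = new CopyOnWriteArrayList<>();

    /** Registers a standing query; only writes made after registration are pushed. */
    void registerCq(Predicate<TweetStats> where, Consumer<TweetStats> listener) {
        subscriptions.add(new Subscription(where, listener));
    }

    /** Stores the new value, then notifies every subscription whose predicate matches it. */
    void put(TweetStats stats) {
        data.put(stats.userId, stats);
        for (Subscription s : subscriptions) {
            if (s.where.test(stats)) s.listener.accept(stats);
        }
    }
}
```

A subscriber for the query on the slide would register a predicate such as `t -> t.tweetCount > 10 && users.contains(t.userId)`, where `users` holds X, Y, and Z.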
Correlate, Joins with Data
Accessing Historical Data in Real-time
Correlations/Joins with History
Option 1) Keep history in memory
Option 2) Keep in MPP DB (Greenplum DB)
Option 3) Keep in HDFS
Real-time Aggregations with Parallel Processing
Map-reduce for Real Time
Hadoop M/R is for sequential batch processing and not for real time
Java stored procedure
@DistributedFunction(regionName="trades")
public List AnalyzeTrades(@FilterKey Set months, String portfolio) {
    ...
}
Parallel, data-aware function execution
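To make the map-reduce shape of data-aware execution concrete, here is a sketch that uses only the JDK. It does not reproduce the annotations on the slide or any GemFire API: `Trade`, `ParallelTradeAnalysisSketch`, and `totalFor` are hypothetical names, each inner list stands in for the trades held by one member, and a parallel stream stands in for shipping the function to the members.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical trade record. */
final class Trade {
    final String portfolio;
    final int month;
    final double amount;
    Trade(String portfolio, int month, double amount) {
        this.portfolio = portfolio;
        this.month = month;
        this.amount = amount;
    }
}

/** Hypothetical analysis: compute partial sums next to each partition, then combine them. */
final class ParallelTradeAnalysisSketch {
    static double totalFor(List<List<Trade>> partitions, Set<Integer> months, String portfolio) {
        return partitions.parallelStream()              // one task per partition
                .mapToDouble(local -> local.stream()    // "map": scan only the local data
                        .filter(t -> months.contains(t.month) && t.portfolio.equals(portfolio))
                        .mapToDouble(t -> t.amount)
                        .sum())
                .sum();                                 // "reduce": combine the partial results
    }
}
```

In a real grid, a filter on the partitioning key would also let the runtime skip members that hold no matching data; this sketch scans every partition.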
Ingestion to Hadoop, MPP DB
Staged Pipeline: Bringing It All Together
Use Spring Integration to orchestrate the pipeline
Patterns: pub-sub, splitters, routers, transformers, etc.
Distribution of Analytic Results
Multi-Site Capability
Single Cluster Spanning Data Centers
Active Everywhere
[Diagram labels: Data Fabric Node (repeated)]
Multi-Site Capability
Active Everywhere
Asynchronous, Fault Tolerant, Bi-Directional WAN Gateway
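One way to picture an asynchronous gateway is a local write that returns immediately while a queue of pending events is shipped to the remote site in batches. The sketch below is an assumption-laden illustration, not GemFire's gateway: `AsyncGatewaySketch`, `flush`, and the plain `Map` standing in for the remote site are hypothetical, and a real gateway would run the sender on its own thread, persist the queue, and handle writes in both directions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical one-way gateway: local writes are queued and shipped to a remote site later. */
final class AsyncGatewaySketch {
    private final Map<String, String> local = new HashMap<>();
    private final Deque<String[]> pending = new ArrayDeque<>(); // ordered {key, value} events

    /** The local write completes immediately; replication is queued, not awaited. */
    void put(String key, String value) {
        local.put(key, value);
        pending.addLast(new String[] {key, value});
    }

    /** Ships up to batchSize queued events in order; returns how many were delivered. */
    int flush(Map<String, String> remote, int batchSize) {
        int sent = 0;
        while (sent < batchSize && !pending.isEmpty()) {
            String[] event = pending.peekFirst();
            remote.put(event[0], event[1]); // if delivery threw, the event would stay queued
            pending.removeFirst();          // dequeue only after a successful delivery
            sent++;
        }
        return sent;
    }

    int pendingCount() {
        return pending.size();
    }

    Map<String, String> localView() {
        return new HashMap<>(local);
    }
}
```

Because events are removed only after delivery, a failed or disconnected link leaves them queued for the next flush, which is the property that lets the two sites converge once connectivity returns.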
Global Data Distribution
Distribute
GemFire can keep clusters that are distributed around the world
synchronized in real time and can operate reliably in Disconnected,
Intermittent, and Low-Bandwidth network environments.
Bringing It All Together: What Would It Look Like?
Existing technologies working together