Development of concurrent services using In-Memory Data Grids

OTN Tour 2014

by Julio Lorenzo

This Presentation

• Gives a basic explanation of what an IMDG solution is

• Explains how it works and how it can be used within an architecture

• Shows some use cases

• Yes, it is all in English

Agenda

• Why?

• What Is It?

• Brief History

• How It Works

• Use cases

Introduce Yourself!

Hi! I’m Julio

Why IMDG?

To get performance close to that of big data platforms while still using legacy systems

Because solving these problems with traditional technology is very expensive

Why IMDG?

Today, more than ever, there are many choices when it comes to storing your data

Just a few years ago

• Storage

– Oracle

– IBM DB2

– IBM Informix

– SAP Sybase

– MySQL

– PostgreSQL

– SQL Server

– Lotus Notes

• Cache

– Memcached

– EhCache

• New Generation

– GridGain

– Tangosol (aka Coherence)

– Terracotta

So... What For?

Really?

The Need for Speed, In Real Time…

So, What Is an IMDG?

In-Memory Data Grid (IMDG)

Data management software whose data structures reside entirely in RAM and are distributed among multiple servers

Handles Big Data’s “big three Vs”: velocity, variability, volume

In-Memory Data Management

It is an IT solution that enables:

Scale-out Computing

Every node adds its CPU and RAM to the cluster, where they can be used by all nodes

Resilience

Nodes can fail at random without data loss and with minimal performance impact on running applications

(non-disruptive automated detection and recovery)

Programming Model

A way for developers to easily program the cluster of machines as if it were a single machine

Fast, Big Data

It enables very large data sets to be manipulated in main memory

Dynamic Scalability

Nodes (computers) can dynamically join the other computers in a grid (cluster)

Elastic Main Memory

Every node adds its RAM to the cluster’s memory pool

Computational Grids

Reliable job execution, scheduling and load balancing

Which Products?

• Oracle Coherence

• Hazelcast

• GridGain

• GigaSpaces

• Terracotta

• Red Hat Infinispan

• VMware Pivotal GemFire

Brief History

• Cache: in-process caching of a Key->Value data structure

• Distributed Cache: partitioned cache nodes

• IMDG: partitioned system of record

• IMDG.next()

How It Works

Kernel View

Conceptual View

Detail View

Clustered Caching Explained

Partitioned, Fault Tolerant, Self-Healing Cache

• Each node in the cluster holds a percentage of the primary data locally

• Back-up of primary data is distributed across all other nodes

• Logical view of all data from any node

• All nodes verify health of each other

• In the event a node is unhealthy, other nodes diagnose state

• Unhealthy node isolated from cluster

• Remaining nodes redistribute primary and back-up responsibilities to healthy nodes
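
The partitioning and self-healing steps above can be sketched as a toy, single-process model. This is only an illustration under stated assumptions: the class `PartitionedCache`, its methods, and the "next node holds the backup" rule are hypothetical and are not the API or algorithm of any product named in this deck.

```python
import zlib


class PartitionedCache:
    """Toy model: every key has one primary copy and one backup copy."""

    def __init__(self, node_names):
        # Each node keeps its primary entries and the backups it holds for others.
        self.nodes = {name: {"primary": {}, "backup": {}} for name in node_names}

    def _owners(self, key):
        # Deterministic hash picks the primary owner; the next node is the backup.
        names = sorted(self.nodes)
        index = zlib.crc32(key.encode()) % len(names)
        return names[index], names[(index + 1) % len(names)]

    def put(self, key, value):
        primary, backup = self._owners(key)
        self.nodes[primary]["primary"][key] = value
        if backup != primary:  # no separate backup when only one node is left
            self.nodes[backup]["backup"][key] = value

    def get(self, key):
        # Logical view of all data: any caller can resolve the owner of any key.
        primary, _ = self._owners(key)
        return self.nodes[primary]["primary"].get(key)

    def fail_node(self, name):
        # Isolate the unhealthy node, then rebuild from the surviving copies.
        self.nodes.pop(name)
        surviving = {}
        for node in self.nodes.values():
            surviving.update(node["backup"])
            surviving.update(node["primary"])
            node["primary"].clear()
            node["backup"].clear()
        # Redistribute primary and backup responsibilities to healthy nodes.
        for key, value in surviving.items():
            self.put(key, value)


cache = PartitionedCache(["node-1", "node-2", "node-3"])
for i in range(100):
    cache.put(f"key-{i}", i)
cache.fail_node("node-2")
```

Because each key lives on two different nodes, losing any single node leaves at least one copy of every entry. A real grid would detect the failure through health checks between nodes and would move far less data than this full rebuild does.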


Caching Patterns

Cache Aside: developer manages the cache

• Check the cache before reading from data source

• Put data into cache after reading from data source

• Evict or update cache when updating data source
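
The three cache-aside steps above could look roughly like this. It is a minimal sketch, assuming plain dictionaries stand in for both the cache and the data source; `ProductDao` and its method names are hypothetical.

```python
class ProductDao:
    """Cache-aside: the DAO code itself decides when to touch the cache."""

    def __init__(self, database, cache):
        self.database = database  # dict standing in for the data source
        self.cache = cache        # dict standing in for the cache

    def find(self, key):
        # 1. Check the cache before reading from the data source.
        if key in self.cache:
            return self.cache[key]
        value = self.database.get(key)
        # 2. Put the data into the cache after reading it.
        if value is not None:
            self.cache[key] = value
        return value

    def update(self, key, value):
        self.database[key] = value
        # 3. Evict the stale entry when the data source is updated.
        self.cache.pop(key, None)


database = {"p1": "laptop"}
cache = {}
dao = ProductDao(database, cache)
first_read = dao.find("p1")   # miss: loaded from the database, then cached
dao.update("p1", "tablet")    # write: cached entry evicted
second_read = dao.find("p1")  # miss again: reloaded with the new value
```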

Cache

DAO

Caching Patterns

Read Through/Write Through

• All data reads/writes occur through cache

• Cache miss causes load from data source automatically

• Updates to cache written synchronously to the data source
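
In contrast to cache aside, here the application talks only to the cache. A minimal sketch, assuming the data source is reachable through two hypothetical callables, `loader` and `writer`:

```python
class ReadWriteThroughCache:
    """The cache itself loads on a miss and writes to the data source on put."""

    def __init__(self, loader, writer):
        self._entries = {}
        self._loader = loader  # called as loader(key) on a cache miss
        self._writer = writer  # called as writer(key, value) on every put

    def get(self, key):
        # Read through: a miss triggers a load from the data source.
        if key not in self._entries:
            self._entries[key] = self._loader(key)
        return self._entries[key]

    def put(self, key, value):
        # Write through: the data source is updated before put() returns.
        self._writer(key, value)
        self._entries[key] = value


store = {"a": 1}  # dict standing in for the data source
loads = []        # records which keys were loaded from the store


def load(key):
    loads.append(key)
    return store.get(key)


cache = ReadWriteThroughCache(load, store.__setitem__)
cache.get("a")     # miss: loaded from the store
cache.get("a")     # hit: no second load
cache.put("b", 2)  # written to the store synchronously
```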

DAO Cache

Caching Patterns

Write Behind

• All data writes occur through cache

• Updates to cache written asynchronously to the data source
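
Write behind differs from write through only in when the data source is updated. A rough sketch, assuming a single background thread drains a queue of pending writes; `WriteBehindCache`, `flush`, and the `writer` callable are hypothetical names.

```python
import queue
import threading


class WriteBehindCache:
    """put() returns immediately; a background thread writes to the data source."""

    def __init__(self, writer):
        self._entries = {}
        self._pending = queue.Queue()
        self._writer = writer  # called as writer(key, value)
        threading.Thread(target=self._drain, daemon=True).start()

    def put(self, key, value):
        self._entries[key] = value
        self._pending.put((key, value))  # queued, not yet written

    def get(self, key):
        return self._entries.get(key)

    def _drain(self):
        # Runs on the background thread and applies queued writes in order.
        while True:
            key, value = self._pending.get()
            try:
                self._writer(key, value)
            finally:
                self._pending.task_done()

    def flush(self):
        # Block until every queued write has reached the data source.
        self._pending.join()


store = {}  # dict standing in for the data source
cache = WriteBehindCache(store.__setitem__)
cache.put("a", 1)
value_in_cache = cache.get("a")  # visible in the cache right away
cache.flush()                    # wait for the asynchronous write
```

The trade-off is that a crash between `put` and the background write can lose the update, which is why real grids pair write behind with backup copies of the pending queue.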

DAO Cache

Sharding

Unlimited Data and Processing Capacity

• Data is load balanced across the data grid.

• Data and processing capacity scales linearly.

• Ownership responsibilities also partitioned.

• Access and update latency are constant.

• Best for large sets of frequently updated data.
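
One common way to get this kind of load balancing is to hash each key to a fixed number of partitions and spread the partitions over the nodes. The sketch below assumes that scheme; the partition count of 257 and the round-robin assignment are illustrative choices, not a description of any specific product.

```python
import zlib
from collections import Counter

PARTITION_COUNT = 257  # hypothetical fixed number of partitions


def partition_of(key):
    # A key always hashes to the same partition, whatever the cluster size.
    return zlib.crc32(key.encode()) % PARTITION_COUNT


def assign_partitions(nodes):
    # Spread partition ownership evenly (round robin) over the nodes.
    return {p: nodes[p % len(nodes)] for p in range(PARTITION_COUNT)}


def owner_of(key, assignment):
    # Constant-time lookup: hash the key, then look up the partition owner.
    return assignment[partition_of(key)]


three_nodes = assign_partitions(["node-1", "node-2", "node-3"])
four_nodes = assign_partitions(["node-1", "node-2", "node-3", "node-4"])

load_with_three = Counter(three_nodes.values())
load_with_four = Counter(four_nodes.values())
```

Adding a fourth node lowers the number of partitions each node owns, which is where the linear scaling comes from. Round robin reshuffles more partitions than necessary when a node joins; production grids typically try to move as few as possible.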

Diagram: In-Memory Data Grid (Data, Applications, Processes, Virtual Load Balancing)

Fault Tolerance

High Availability

• Automatic fault tolerance management.

• Backups stored on separate machine.

• Even distribution of backup responsibilities.

• Configurable number of backup copies.

• Once-and-only-once processing guarantees.

Diagram: In-Memory Data Grid (Data, Applications, Fault Tolerance Management)

Replicated Caching

Rapid Access to Reference Data

• Entire data set is replicated.

• Data is stored in application-ready format.

• Data access is immediate.

• Updates are replicated across the data grid.

• Best for small sets of static data.
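
A replicated topology can be pictured as every node holding a full copy. The following is a minimal single-process sketch; `ReplicatedCache` and the node names are hypothetical.

```python
class ReplicatedCache:
    """Every node holds the whole data set; writes go to all replicas."""

    def __init__(self, node_names):
        self.replicas = {name: {} for name in node_names}

    def put(self, key, value):
        # Updates are replicated across the grid: cost grows with node count.
        for replica in self.replicas.values():
            replica[key] = value

    def get(self, node_name, key):
        # Reads are purely local to the node that asks.
        return self.replicas[node_name].get(key)


currencies = ReplicatedCache(["node-1", "node-2", "node-3"])
currencies.put("USD", "US Dollar")
local_read = currencies.get("node-3", "USD")
```

Reads never leave the node, but every write touches every replica, which is why this topology suits small, mostly static reference data.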

Diagram: In-Memory Data Grid (Replication, Data, Applications)

Near Caching

Rapid Data Access

• Blend of replicated and partitioned topologies.

• Recently used data is stored locally.

• Repetitive data access is local and immediate.

• Automatically populated upon data access.

• Automatic invalidation of updated data.

• Scale tiers independently.
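
A near cache can be sketched as a small local front in each application, backed by the shared grid. The code below assumes an LRU front and a simple callback for invalidation; `BackingCache`, `NearCache`, and `max_size` are hypothetical names.

```python
from collections import OrderedDict


class BackingCache:
    """Stand-in for the partitioned grid behind the near caches."""

    def __init__(self):
        self._entries = {}
        self._listeners = []

    def add_invalidation_listener(self, listener):
        self._listeners.append(listener)

    def get(self, key):
        return self._entries.get(key)

    def put(self, key, value):
        self._entries[key] = value
        # Automatic invalidation: tell every near cache the key has changed.
        for listener in self._listeners:
            listener(key)


class NearCache:
    """Small local front that keeps recently used entries (LRU)."""

    def __init__(self, backing, max_size=2):
        self._backing = backing
        self._front = OrderedDict()
        self._max_size = max_size
        backing.add_invalidation_listener(self._invalidate)

    def _invalidate(self, key):
        self._front.pop(key, None)

    def get(self, key):
        if key in self._front:
            self._front.move_to_end(key)  # mark as recently used
            return self._front[key]
        value = self._backing.get(key)
        if value is not None:
            # Automatically populated upon data access.
            self._front[key] = value
            if len(self._front) > self._max_size:
                self._front.popitem(last=False)  # drop least recently used
        return value

    def local_keys(self):
        return list(self._front)


grid = BackingCache()
grid.put("a", 1)
app = NearCache(grid)
app.get("a")      # first access goes to the grid and fills the front
grid.put("a", 2)  # update in the grid invalidates the local copy
fresh = app.get("a")
```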

Diagram: In-Memory Data Grid (Data, Applications)

Parallel Processing

Querying, Processing, Aggregating in the Data Grid

• Send the processing to where the data lives.

• Processing performed in parallel across the grid.

– Query the Data Grid

– Continuous Query Cache

– Parallel Processing on the Data Grid

– Map/Reduce Aggregation

• Once-and-only-once guarantees.

• Processing scales with the grid.
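
"Send the processing to where the data lives" can be illustrated with a small map/reduce style aggregation. In this sketch each dictionary stands in for the data held by one node and a thread pool stands in for the grid; `local_sum` and `grid_sum` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def local_sum(partition, predicate):
    # "Map" step: runs next to the data and returns only a small partial result.
    return sum(value for value in partition.values() if predicate(value))


def grid_sum(partitions, predicate):
    # Run the map step on every partition in parallel, then reduce the partials.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = pool.map(lambda p: local_sum(p, predicate), partitions)
        return sum(partials)


# Each dict stands in for the entries owned by one node.
partitions = [
    {"order-1": 100, "order-2": 250},
    {"order-3": 40},
    {"order-4": 900, "order-5": 10},
]

large_orders_total = grid_sum(partitions, lambda amount: amount >= 100)
```

Only the partial sums cross the (simulated) network, not the entries themselves, so the cost of the query grows with the number of nodes rather than with the size of the data set.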

Diagram: In-Memory Data Grid (Application, Processing Unit)

Event Notifications

Event Driven Architectures

• Grid based event notification

– Java Bean Model, key and filter based events

• “Live Objects”

– Objects can respond to own state changes

– State always recoverable

– Build complex Staged Event Driven Architectures
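
Key based and filter based events can be sketched with a cache that calls registered listeners on every update. `ObservableCache` and its methods are hypothetical names; real products expose their own listener interfaces.

```python
class ObservableCache:
    """Cache that notifies listeners by key or by filter on every put."""

    def __init__(self):
        self._entries = {}
        self._listeners = []  # pairs of (matches, callback)

    def add_key_listener(self, key, callback):
        # Key based: fire only for changes to one specific key.
        self._listeners.append((lambda k, old, new: k == key, callback))

    def add_filter_listener(self, predicate, callback):
        # Filter based: fire for any change that satisfies the predicate.
        self._listeners.append((predicate, callback))

    def put(self, key, value):
        old = self._entries.get(key)
        self._entries[key] = value
        for matches, callback in self._listeners:
            if matches(key, old, value):
                callback(key, old, value)


events = []
orders = ObservableCache()
orders.add_key_listener(
    "order-1", lambda key, old, new: events.append(("key", key, old, new))
)
orders.add_filter_listener(
    lambda key, old, new: new > 100,
    lambda key, old, new: events.append(("large", key, new)),
)
orders.put("order-1", 50)   # matches the key listener only
orders.put("order-2", 500)  # matches the filter listener only
```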

Diagram: In-Memory Data Grid (Applications)

Architecture Patterns

• Message Broker / Process Sync

• Web Session Clustering

• Service Bus Cache

• Service IMDG

Some Real Life Scenarios

WebLogic and Coherence Integration

Built In, Out of the Box

• Administration, operations and management built into WebLogic

• Declarative scale out session management

• Cache data access with sync/async read/write through

• Analytics, events and compute

Diagram: WebLogic servers with Coherence nodes (Data Cache, Query/Event), in three panels:

• Declarative Session Management

• Persistence Caching with Read and Write Through

• Query, Compute and Event

• Business Experience Software

• Real time processing

• Real time analysis

• Input: 2000 events/s

• Stored: 4000 records/s

Architecture Design

Elastic Blend

Some Common Use Cases

Fast, Transactional Data Access

• Inventory management

• Financial reference data

• Real time transactional data

Real Time Stream Processing

• Fraud Detection

• Click Stream Analysis

• Real time analytics

• Continuous calculation

Heavyweight Offline Calculations

• Trade Reconciliation

• Pattern analysis and detection

• Number crunching

Caching

• Database offloading

• Content heavy websites

What's Up?

• IMDG is an IT solution

• It can be seen as an elastic service

• Deployed stand-alone or embedded

• Provides built in:

– Distributed Data Cache

– Process Communication (Queue/Topic)

– Process Sync (Locks)

– Process Execution (Queries, MPP, Map/Reduce)

• Increases performance and scalability, reduces bottlenecks

Q&A

@jlorenzocima

Acknowledgment

• Uri Cohen - In-Memory Data Grids, Demystified

• Oracle Coherence - Coherence-cnt1983393.pdf

• Hazelcast - Home Site

• GridGain - Home Site