DESCRIPTION
Prepared for OTN Tour 2014, this presentation gives a basic explanation of an IMDG solution, explains how it works and how it can be used within an architecture, and shows some use cases. Enjoy.
Development of concurrent services using In-Memory Data Grids
OTN Tour 2014
by Julio Lorenzo
This Presentation
• Gives a basic explanation of an IMDG solution
• Explains how it works and how it can be used within an architecture
• Shows some use cases
• Yes, it is all in English
Agenda
• Why?
• What is?
• Brief History
• How It Works
• Use cases
Introduce Yourself!
Hi! I’m Julio
Why IMDG?
To get performance close to that of big data platforms while keeping legacy systems
Because solving these problems with traditional technology is very expensive
Why IMDG?
Today, more than ever, there are many choices when it comes to storing your data
Just a few years ago
• Storing
– Oracle
– IBM DB2
– IBM Informix
– SAP Sybase
– MySQL
– PostgreSQL
– SQL Server
– Lotus Notes
• Cache
– Memcached
– EhCache
– New Generation: GridGain, Tangosol (aka Coherence), Terracotta
So... What For?
Really?
The Need for Speed, In Real Time…
Then, What is IMDG?
In-Memory Data Grid (IMDG)
It is data management software whose data structures reside entirely in RAM and are distributed among multiple servers
It handles Big Data's "big three V's": Velocity, Variability, Volume
In-Memory Data Management
It is an IT solution that enables:
Scale-out Computing
Every node adds its CPU and RAM to the cluster, where they can be used by all nodes
Resilience
Nodes can fail randomly without data loss while minimizing performance impact to running applications
(non-disruptive automated detection and recovery)
Programming Model
A way for developers to easily program the cluster of machines as if it were a single machine
Fast, Big Data
It enables very large data sets to be manipulated in main memory
Dynamic Scalability
Nodes (computers) can dynamically join the other computers in a grid (cluster)
Elastic Main Memory
Every node adds its RAM to the cluster's memory pool
Computational Grids
Reliable job execution, scheduling and load balancing
Which Products?
• Oracle Coherence
• Hazelcast
• GridGain
• GigaSpaces
• Terracotta
• Red Hat Infinispan
• VMware Pivotal GemFire
Brief History
• Cache: in-process caching of a Key->Value data structure
• Distributed Cache: partitioned cache nodes
• IMDG: partitioned system of record
• IMDG.next()
How It Works
Kernel View
Conceptual View
Detail View
Clustered Caching Explained
Partitioned, Fault Tolerant, Self-Healing Cache
• Each node in the cluster holds a percentage of the primary data locally
• Back-up of primary data is distributed across all other nodes
• Logical view of all data from any node
• All nodes verify health of each other
• In the event a node is unhealthy, other nodes diagnose state
• Unhealthy node isolated from cluster
• Remaining nodes redistribute primary and back-up responsibilities to healthy nodes
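The self-healing behaviour listed above can be illustrated with a toy model. This is only a sketch under stated assumptions: `MiniGrid`, its methods and the byte-sum hash are hypothetical names made up for this example, not the API of Coherence or any other product, and health checking is reduced to an explicit `fail()` call. It tolerates the loss of one node at a time.

```python
class MiniGrid:
    """Toy partitioned cache: every key has one primary and one backup node."""

    def __init__(self, node_names):
        self.nodes = {name: {"primary": {}, "backup": {}} for name in node_names}

    def _owners(self, key):
        names = sorted(self.nodes)
        index = sum(key.encode("utf-8")) % len(names)  # deterministic toy hash
        return names[index], names[(index + 1) % len(names)]

    def put(self, key, value):
        primary, backup = self._owners(key)
        self.nodes[primary]["primary"][key] = value
        self.nodes[backup]["backup"][key] = value

    def get(self, key):
        # Logical view: any caller can read any key, wherever it is stored.
        primary, _ = self._owners(key)
        return self.nodes[primary]["primary"].get(key)

    def fail(self, name):
        """Isolate a node, then redistribute primary and backup duties."""
        del self.nodes[name]  # the failed node's memory is gone
        recovered = {}
        for store in self.nodes.values():
            recovered.update(store["backup"])
            recovered.update(store["primary"])
            store["primary"].clear()
            store["backup"].clear()
        for key, value in recovered.items():
            self.put(key, value)  # re-place entries on the surviving nodes


grid = MiniGrid(["node-a", "node-b", "node-c"])
for i in range(20):
    grid.put(f"k{i}", i)
grid.fail("node-b")
print(all(grid.get(f"k{i}") == i for i in range(20)))
```

Because each entry lives on two different nodes, the survivors together still hold a copy of everything after a single failure.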
Caching Patterns
Cache Aside: developer manages the cache
• Check the cache before reading from data source
• Put data into cache after reading from data source
• Evict or update cache when updating data source
[Diagram: Cache and DAO]
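The three cache-aside steps can be sketched as follows. It is a minimal illustration, assuming a plain dict stands in for both the data source and the cache; `CacheAsideDao` and its method names are hypothetical.

```python
class CacheAsideDao:
    """DAO in which the developer manages the cache explicitly."""

    def __init__(self, data_source):
        self.data_source = data_source  # dict standing in for a database
        self.cache = {}
        self.source_reads = 0           # counts trips to the data source

    def find(self, key):
        if key in self.cache:           # check the cache before reading
            return self.cache[key]
        self.source_reads += 1
        value = self.data_source.get(key)
        if value is not None:
            self.cache[key] = value     # put data into the cache after reading
        return value

    def save(self, key, value):
        self.data_source[key] = value   # update the data source
        self.cache.pop(key, None)       # evict the stale cache entry


dao = CacheAsideDao({"sku-1": 10})
dao.find("sku-1")
dao.find("sku-1")          # second read is served from the cache
print(dao.source_reads)
```

Evicting on write, rather than updating the cached value, is one of the two options the slide mentions; it keeps the write path simple at the cost of one extra read later.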
Caching Patterns
Read Through/Write Through
• All data reads/writes occur through cache
• Cache miss causes load from data source automatically
• Updates to cache written synchronously to the data source
[Diagram: DAO and Cache]
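In contrast to cache-aside, here the cache itself talks to the data source. The sketch below assumes the data source is reachable through two callables, `loader` and `writer`; these names and the `ReadWriteThroughCache` class are hypothetical, chosen only for illustration.

```python
class ReadWriteThroughCache:
    """All reads and writes go through the cache, which owns the data source calls."""

    def __init__(self, loader, writer):
        self._loader = loader    # called on a cache miss
        self._writer = writer    # called synchronously on every put
        self._entries = {}

    def get(self, key):
        if key not in self._entries:
            # Read through: the miss triggers the load automatically.
            # Note: this simple version also caches a None result.
            self._entries[key] = self._loader(key)
        return self._entries[key]

    def put(self, key, value):
        # Write through: the data source is updated before put() returns.
        self._writer(key, value)
        self._entries[key] = value


database = {"a": 1}
cache = ReadWriteThroughCache(database.get, database.__setitem__)
cache.put("b", 2)
print(cache.get("a"), database["b"])
```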
Caching Patterns
Write Behind
• All data writes occur through cache
• Updates to cache written asynchronously to the data source
[Diagram: DAO and Cache]
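A write-behind cache acknowledges the write immediately and persists it later. The following sketch makes that delay explicit with a `flush()` method; in a real grid the flush would typically run on a timer or background thread, which is left out here so the example stays deterministic. `WriteBehindCache` and `writer` are hypothetical names.

```python
class WriteBehindCache:
    """Writes land in the cache first and reach the data source later."""

    def __init__(self, writer):
        self._writer = writer
        self._entries = {}
        self._dirty = {}   # key -> latest value not yet written to the source

    def put(self, key, value):
        self._entries[key] = value
        self._dirty[key] = value   # repeated updates to one key are coalesced

    def get(self, key):
        return self._entries.get(key)

    def flush(self):
        """Write all pending updates to the data source; return how many."""
        batch, self._dirty = self._dirty, {}
        for key, value in batch.items():
            self._writer(key, value)
        return len(batch)


database = {}
cache = WriteBehindCache(database.__setitem__)
cache.put("x", 1)
cache.put("x", 2)        # the data source has not been touched yet
written = cache.flush()  # only the latest value of "x" is written
print(written, database)
```

The trade-off is that updates still waiting in the queue are lost if the cache fails before the flush, unless the pending entries are themselves backed up.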
Sharding
Unlimited Data and Processing Capacity
• Data is load balanced across the data grid.
• Data and processing capacity scales linearly.
• Ownership responsibilities are also partitioned.
• Access and update latency are constant.
• Best for large sets of frequently updated data.
[Diagram: applications reach the data in the In-Memory Data Grid, which spans four processes, through virtual load balancing]
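One common way to spread data across a grid is to hash each key into a fixed number of partitions and then map partitions to nodes. The sketch below assumes that scheme; the function names and the partition count of 257 are illustrative choices, not taken from the slides or from a specific product.

```python
import zlib
from collections import Counter


def partition_for(key, partition_count=257):
    """Map a key to a partition with a stable hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % partition_count


def owner_of(key, nodes, partition_count=257):
    """Map the key's partition to the node that owns it."""
    return nodes[partition_for(key, partition_count) % len(nodes)]


nodes = ["node-a", "node-b", "node-c", "node-d"]
load = Counter(owner_of(f"order-{i}", nodes) for i in range(1000))
print(load)  # shows how many of the 1000 keys each node owns
```

Because the lookup is a hash computation rather than a search, finding the owner of a key costs the same no matter how much data the grid holds.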
Fault Tolerance
High Availability
• Automatic fault tolerance management.
• Backups stored on separate machine.
• Even distribution of backup responsibilities.
• Configurable number of backup copies.
• Once-and-only-once processing guarantees.
[Diagram: applications reach the data in the In-Memory Data Grid, which spans four processes, through virtual load balancing and fault tolerance management]
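The backup rules above (separate machine, even distribution, configurable number of copies) can be sketched as a placement function. This assumes each name in `nodes` is a distinct machine and uses a simple round-robin layout; `assign_owners` is a hypothetical helper, not a product API.

```python
from collections import Counter


def assign_owners(partition, nodes, backup_count=1):
    """Return (primary, backups) for a partition, all on different nodes."""
    if backup_count >= len(nodes):
        raise ValueError("need more nodes than backup copies")
    start = partition % len(nodes)
    owners = [nodes[(start + i) % len(nodes)] for i in range(backup_count + 1)]
    return owners[0], owners[1:]


nodes = ["n1", "n2", "n3", "n4"]
primary, backups = assign_owners(5, nodes, backup_count=2)
print(primary, backups)

# With 8 partitions and 1 backup each, backup duty is spread evenly.
backup_load = Counter(assign_owners(p, nodes)[1][0] for p in range(8))
print(backup_load)
```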
Replicated Caching
Rapid Access to Reference Data
• Entire data set is replicated.
• Data is stored in application ready format.
• Data access is immediate.
• Updates are replicated across the data grid.
• Best for small sets of static data.
[Diagram: the data set is replicated to each of the four processes of the In-Memory Data Grid used by the applications]
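For comparison with the partitioned topology, a replicated cache can be sketched like this. It is a toy model in which every member holds a full copy in a local dict; `ReplicatedCache` and the member names are hypothetical.

```python
class ReplicatedCache:
    """Every member keeps a complete copy of the data set."""

    def __init__(self, member_names):
        self.replicas = {name: {} for name in member_names}

    def put(self, key, value):
        # An update is applied to every member, so write cost grows with the grid.
        for replica in self.replicas.values():
            replica[key] = value

    def get(self, member, key):
        # A read never leaves the member that asks for it.
        return self.replicas[member].get(key)


rates = ReplicatedCache(["app-1", "app-2", "app-3"])
rates.put("EUR/USD", 1.36)
print(rates.get("app-3", "EUR/USD"))
```

Reads are local and immediate, while each write touches every member, which is why the slide recommends this topology for small sets of static data.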
Near Caching
Rapid Data Access
• Blend of replicated and partitioned topologies.
• Recently used data is stored locally.
• Repetitive data access is local and immediate.
• Automatically populated upon data access.
• Automatic invalidation of updated data.
• Scale tiers independently.
[Diagram: three applications, each with a local near cache, reach the data in the In-Memory Data Grid, which spans four processes, through virtual load balancing]
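A near cache puts a small local cache in front of the partitioned grid. The sketch below assumes the grid can call back registered listeners when a key changes; `PartitionedBackend`, `NearCache` and `max_local` are hypothetical names, and the backend is a single dict rather than a real cluster.

```python
from collections import OrderedDict


class PartitionedBackend:
    """Stand-in for the partitioned grid tier."""

    def __init__(self):
        self.data = {}
        self.listeners = []

    def get(self, key):
        return self.data.get(key)

    def put(self, key, value):
        self.data[key] = value
        for invalidate in self.listeners:
            invalidate(key)  # automatic invalidation of updated data


class NearCache:
    """Local front cache holding the most recently used entries."""

    def __init__(self, backend, max_local=100):
        self.backend = backend
        self.local = OrderedDict()
        self.max_local = max_local
        self.remote_reads = 0
        backend.listeners.append(self._invalidate)

    def _invalidate(self, key):
        self.local.pop(key, None)

    def get(self, key):
        if key in self.local:
            self.local.move_to_end(key)     # mark as recently used
            return self.local[key]
        self.remote_reads += 1
        value = self.backend.get(key)       # populated upon data access
        self.local[key] = value
        if len(self.local) > self.max_local:
            self.local.popitem(last=False)  # drop the least recently used
        return value


backend = PartitionedBackend()
backend.put("price", 1)
near = NearCache(backend)
near.get("price")
near.get("price")            # served locally
backend.put("price", 2)      # invalidates the local copy
print(near.get("price"), near.remote_reads)
```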
Parallel Processing
Querying, Processing, Aggregating in the Data Grid
• Send the processing to where the data lives.
• Processing performed in parallel across the grid.
– Query the Data Grid
– Continuous Query Cache
– Parallel Processing on the Data Grid
– Map/Reduce Aggregation
• Once-and-only-once guarantees.
• Processing scales with the grid.
[Diagram: an application sends a processing unit to the four processes of the In-Memory Data Grid]
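The idea of sending the processing to where the data lives can be sketched as a small map/reduce aggregation. Here each partition is a local dict and a thread pool stands in for the grid nodes, which is an assumption made only to keep the example self-contained; `aggregate`, `map_fn` and `reduce_fn` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def aggregate(partitions, map_fn, reduce_fn):
    """Run map_fn on every partition in parallel, then combine the partial results."""
    if not partitions:
        return reduce_fn([])
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(map_fn, partitions))  # one task per partition
    return reduce_fn(partials)


# Order amounts, already split across three partitions of the grid.
partitions = [
    {"o1": 10, "o2": 5},
    {"o3": 7},
    {"o4": 20, "o5": 1},
]

total = aggregate(partitions, lambda part: sum(part.values()), sum)
large = aggregate(
    partitions,
    lambda part: sum(1 for amount in part.values() if amount >= 10),
    sum,
)
print(total, large)  # prints: 43 2
```

Only the small partial results travel back to the caller, not the entries themselves, which is what lets this style of processing scale with the grid.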
Event Notifications
Event Driven Architectures
• Grid based event notification
– Java Bean Model, key and filter based events
• “Live Objects”
– Objects can respond to own state changes
– State always recoverable
– Build complex Staged Event Driven Architectures
[Diagram: three applications receive events from the four processes of the In-Memory Data Grid]
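Key-based and filter-based events can be illustrated with a small observable cache. This is a sketch in the spirit of the listener model mentioned above, not the listener API of any particular product; `ObservableCache`, `add_listener` and `value_filter` are hypothetical names.

```python
class ObservableCache:
    """Cache that notifies listeners registered for a key, a filter, or both."""

    def __init__(self):
        self._entries = {}
        self._listeners = []  # tuples of (key or None, filter or None, callback)

    def add_listener(self, callback, key=None, value_filter=None):
        self._listeners.append((key, value_filter, callback))

    def put(self, key, new_value):
        old_value = self._entries.get(key)
        self._entries[key] = new_value
        for wanted_key, value_filter, callback in self._listeners:
            if wanted_key is not None and wanted_key != key:
                continue  # key-based listener for a different key
            if value_filter is not None and not value_filter(new_value):
                continue  # filter-based listener that does not match
            callback(key, old_value, new_value)


events = []
accounts = ObservableCache()
accounts.add_listener(lambda k, old, new: events.append(("key", k, new)), key="acct-1")
accounts.add_listener(
    lambda k, old, new: events.append(("large", k, new)),
    value_filter=lambda balance: balance > 1000,
)
accounts.put("acct-1", 50)
accounts.put("acct-2", 5000)
print(events)
```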
Architecture Patterns
Message Broker / Process Sync
Web Session Clustering
Service Bus Cache
Service IMDG
Some Real-Life Scenarios
WebLogic and Coherence Integration
Built In, Out of the Box
• Administration, operations and management built into WebLogic
• Declarative scale-out session management
• Cache data access with sync/async read/write through
• Analytics, events and compute
[Diagram: WebLogic servers paired with Coherence nodes, in three panels: Declarative Session Management; Persistence Caching with Read and Write Through (Data Cache); Query, compute and event (Query/Event)]
• Business Experience Software
• Real-time processing
• Real-time analysis
• Input: 2000 events/s
• Stored: 4000 records/s
Architecture Design
Elastic Blend
Some Common Use Cases
Fast, Transactional Data Access
• Inventory management
• Financial reference data
• Real time transactional data
Real Time Stream Processing
• Fraud Detection
• Click Stream Analysis
• Real time analytics
• Continuous calculation
Heavyweight Offline Calculations
• Trade Reconciliation
• Pattern analysis and detection
• Number crunching
Caching
• Database offloading
• Content heavy websites
What's Up?
• IMDG is an IT solution
• It can be seen as an elastic service
• Deployed stand-alone or embedded
• Provides built in:
– Distributed Data Cache
– Process Communication (Queue/Topic)
– Process Sync (Locks)
– Process Execution (Queries, MPP, Map/Reduce)
• Increases performance and scalability, reduces bottlenecks
Q&A
@jlorenzocima
Acknowledgment
• Uri Cohen - In-Memory Data Grids, Demystified
• Oracle Coherence - Coherence-cnt1983393.pdf
• Hazelcast - Home Site
• GridGain - Home Site