7/31/2019 Gig as Paces DataGrid OGF Oct07
Data-Awareness and Low-Latency on the Enterprise Grid
Getting the Most out of Your Grid with Enterprise IMDG
Shay Hassidim
Deputy CTO
Oct 2007
Overall Presentation Goal
Understand the Space-Based Architecture model and its 4 verbs.
Understand the data contention challenge and the latency challenge with Enterprise Grid based applications.
Understand why a typical In-Memory Data Grid can't solve the above problems and why the Enterprise IMDG can.
About Myself: Shay Hassidim
B.Sc. in Electrical, Computer & Telecommunications Engineering, with a focus on Neural Networks & Artificial Intelligence, Ben-Gurion University, graduated 1994
Object and Multi-Dimensional DBMS expert
Extensive knowledge of Object-Oriented & Distributed Systems
Consultant for Telecom, Healthcare, Defense & Finance projects
Technical skills: MATLAB, C, C++, .Net, PowerBuilder, Visual Basic, Java, XML, CORBA, J2EE, ODMG, JDO, Hibernate, SQL, JMS, JMX, IDE, GUI, Jini, ODBMS, RDBMS, JavaSpaces
In the past:
Sirius Technologies Israel - VMDB Applications & Tools Team Leader
Versant Corp. US - Tools Lead Architect, R&D
Since 2003 - GigaSpaces VP Product Management (based in Israel)
Since 2007 - GigaSpaces Deputy CTO (based in NY)
GigaSpaces Technical Overview
The Basics: Data Grid Caching Topologies
Partitioned Cache
Replicated Cache
Master / Local Cache
So... What is Space-Based Architecture?
Utilizing a single logical/virtual resource to share:
Data (data provisioning)
Logic (logic processing)
Events (event propagation)
All three are objects!
Services:
Interact with each other through the space
Can be co-located with data/events for faster results
Are deployed and managed in an adaptive and fail-safe way
Space-Based SOA Using 4 Simple Verbs
The verbs: Write, Read, Take, Notify
Write + Read = IMDG (Caching)
Write + Notify = Messaging
Write + Take = Parallel Processing
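A minimal sketch of the four verbs, assuming a toy single-process space. ToySpace, notifyOn and the predicate-based templates are hypothetical names used only for illustration; this is not the GigaSpaces or JavaSpaces API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical single-process stand-in for a space; not the GigaSpaces API. */
final class ToySpace<T> {
    private final List<T> entries = new ArrayList<>();
    private final List<Predicate<T>> templates = new ArrayList<>();
    private final List<Consumer<T>> listeners = new ArrayList<>();

    /** Write: store the entry, then call every listener whose template matches it. */
    synchronized void write(T entry) {
        entries.add(entry);
        for (int i = 0; i < templates.size(); i++) {
            if (templates.get(i).test(entry)) {
                listeners.get(i).accept(entry);
            }
        }
    }

    /** Read: return the first matching entry and leave it in the space (null if none). */
    synchronized T read(Predicate<T> template) {
        for (T entry : entries) {
            if (template.test(entry)) {
                return entry;
            }
        }
        return null;
    }

    /** Take: return the first matching entry and remove it from the space (null if none). */
    synchronized T take(Predicate<T> template) {
        Iterator<T> it = entries.iterator();
        while (it.hasNext()) {
            T entry = it.next();
            if (template.test(entry)) {
                it.remove();
                return entry;
            }
        }
        return null;
    }

    /** Notify: register interest in future writes that match the template. */
    synchronized void notifyOn(Predicate<T> template, Consumer<T> listener) {
        templates.add(template);
        listeners.add(listener);
    }
}

final class FourVerbsDemo {
    public static void main(String[] args) {
        ToySpace<String> space = new ToySpace<>();
        List<String> seen = new ArrayList<>();
        space.notifyOn(s -> s.startsWith("order"), seen::add); // Write + Notify = messaging
        space.write("order-1");
        System.out.println(space.read(s -> s.equals("order-1"))); // Write + Read = caching
        System.out.println(space.take(s -> s.equals("order-1"))); // Write + Take = work distribution
        System.out.println(space.read(s -> s.equals("order-1"))); // the entry is gone after the take
        System.out.println(seen);
    }
}
```

The only difference between read and take is whether the entry stays in the space, which is what lets the same store act as a cache in one pairing and as a work queue in another.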
IMDG: Distributed In-Memory Query Support
Enables aggregation of data transparently
Supports SQL query semantics
Continuous query via notifications
Local view: client-side cache
[Diagram: a space proxy reads from a partitioned clustered space using a parallel query; the local view is updated using continuous query.]
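The parallel query in the diagram can be pictured as a scatter-gather: the same predicate runs against every partition and the matches are merged for the caller. The sketch below assumes each partition is just an in-memory list; PartitionedQuery and query are hypothetical names, not part of any GigaSpaces API.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical partitioned store; each inner list stands for one partition's memory. */
final class PartitionedQuery {
    /** Run the same predicate on every partition in parallel and merge the matches. */
    static <T> List<T> query(List<List<T>> partitions, Predicate<T> where) {
        return partitions.parallelStream()
                .flatMap(partition -> partition.stream().filter(where))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions =
                List.of(List.of(1, 8, 3), List.of(12, 5), List.of(7, 20));
        // Comparable in spirit to: SELECT * FROM prices WHERE value > 6
        System.out.println(query(partitions, value -> value > 6));
    }
}
```

The caller sees one result list and never learns how many partitions were consulted, which is the sense in which aggregation is transparent.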
Data Virtualization: IMDG Accessed by All Popular APIs and Programming Languages
Provides a true data grid that supports a variety of standards-based data APIs
The API becomes just a view
The same data can be accessed via multiple APIs
Combines the benefits of the relational model with the OO model
[Diagram: applications access the clustered space through the JDBC, Map/JCache, Space and CPP/.Net APIs.]
Integration with External Database: 2 Basic Models
Write/read-through and write-behind: enables lazy load of data from the DB to the cache and async persistency
Complete mirroring of cache data into the DB
Also supports black-box persistency into an RDBMS and an index file (light embedded ODBMS), sync/async
Hibernate cache plug-in provides a 2nd-level cache for Hibernate-based applications
Seamless Integration with External Data Sources
The Mirror Service ensures reliable synchronization with minimal performance overhead.
Data is propagated seamlessly from the IMDG to the external data source and vice versa through the CacheStore.
[Diagram: each IMDG partition loads from and stores to the external data source; the Mirror Service is fed by reliable async replication.]
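A rough sketch of the load/store flow, assuming a lazy load on a cache miss and asynchronous persistence of writes. ToyStore and MirroredCache are hypothetical names; the real CacheStore and Mirror Service interfaces are not shown in this deck, and a real mirror would drain the queue on a background thread instead of an explicit flush call.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical interface standing in for the external data source. */
interface ToyStore {
    String load(String key);
    void store(String key, String value);
}

final class MirroredCache {
    private final Map<String, String> memory = new ConcurrentHashMap<>();
    private final BlockingQueue<String[]> pending = new LinkedBlockingQueue<>();
    private final ToyStore store;

    MirroredCache(ToyStore store) {
        this.store = store;
    }

    /** Read-through: on a miss, lazily load the value from the external store. */
    String get(String key) {
        return memory.computeIfAbsent(key, store::load);
    }

    /** Write-behind: update memory immediately and queue the change for the store. */
    void put(String key, String value) {
        memory.put(key, value);
        pending.add(new String[] {key, value});
    }

    /** Mirror step: push all queued changes to the store in write order; returns the count. */
    int flush() {
        int written = 0;
        String[] change;
        while ((change = pending.poll()) != null) {
            store.store(change[0], change[1]);
            written++;
        }
        return written;
    }

    public static void main(String[] args) {
        Map<String, String> db = new ConcurrentHashMap<>(Map.of("k1", "from-db"));
        MirroredCache cache = new MirroredCache(new ToyStore() {
            public String load(String key) { return db.get(key); }
            public void store(String key, String value) { db.put(key, value); }
        });
        System.out.println(cache.get("k1"));        // loaded lazily from the store
        cache.put("k2", "v2");                      // visible in memory at once
        System.out.println(db.containsKey("k2"));   // the store has not been updated yet
        cache.flush();
        System.out.println(db.get("k2"));           // the store has caught up
    }
}
```

Because put returns before the store is touched, the application's write latency does not depend on the database; the cost is a window in which the store lags the grid.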
SBA: Real-Time SOA for Stateful Services
Services can be Java, C++, .Net
Content-based routing
Shared state to enable stateful services
Enterprise Data Grid: Unique Features
Feature: Extended and standard query based on SQL, and the ability to connect to the IMDG using a standard JDBC connector.
Benefits: Makes the IMDG accessible to standard reporting tools. Makes accessing the IMDG just like accessing a JDBC-compatible database, reducing the learning curve.
Feature: SQL-based continuous query support.
Benefits: Brings relevant data close to the local memory of the relevant application instance.
Feature: Central management, monitoring and control.
Benefits: Allows the entire IMDG to be controlled and viewed from an administrator's console.
Feature: Mirror Service: transparent persistence of data from the entire IMDG to a legacy database or other data source.
Benefits: Allows seamless integration with existing reporting and back-office systems.
Feature: Real-time event notification: application instances can selectively subscribe to specific events.
Benefits: Provides capabilities usually provided by messaging systems, including slow-consumer support, FIFO, batching, pub/sub and content-based routing.
GigaSpaces Solution for Enterprise Grid
Great, but...
What about stateful applications? The data contention challenge
How can I bring front-office applications to the grid? The latency challenge
The Data Contention Challenge
Only stateless applications can scale up freely on the Grid.
Any application that needs to:
a. Share state between more than one instance (service/process)
b. Store state using a central database
cannot scale easily!
This covers:
Checkpointing partial analysis results to enable recovery.
Managing a workflow involving more than one process.
Common data that needs to be shared between processes.
The Latency Challenge
The Enterprise Grid is designed for batch applications:
Each client request is submitted as a job.
Hardware resources are allocated.
Relevant software instances (service/process) are scheduled to run on the resources and perform the work.
Impracticable in low-latency environments!
Why?
An interactive application receives thousands of client requests per second, each of which needs to be fulfilled within milliseconds.
It is impossible to respond fast enough with a job-based approach.
Throughput would be severely limited due to the need to schedule and launch large numbers of application instances.
Three-Stage Approach to the Solution
1. In-Memory Data Grid (IMDG)
2. Data-Aware Grid using SLA-driven containers
3. Adding front-office applications to the Grid using Declarative Space-Based Architecture (SBA)
Stage 1: In-Memory Data Grid (IMDG)
Data is stored in the memory of numerous physical machines instead of, or alongside, a database.
Eliminates I/O, network and CPU load.
Partitions the data and moves it closer to the application.
However, an IMDG in a distributed enterprise environment is only a partial solution!
Stage 2: Data-Aware Grid Using SLA-Driven Containers
Common wisdom holds that it is much easier to bring the business logic to the data than to bring the data to the business logic.
But not all IMDGs support data & business logic co-locality!
This results in:
Unnecessary overhead caused by remote calls from business logic to IMDG instances.
Data duplication, because business logic elements that use the same data are not necessarily concentrated around the relevant IMDG instance.
And worst of all, data contention, because several business logic elements might access the same IMDG instance, leading to exactly the problem the IMDG was meant to solve!
Requirements for a Data-Aware Grid
The Enterprise Grid must know which data is stored on which IMDG instances.
There must be a way to guarantee data affinity: tasks must always be executed with the relevant data coupled to them.
Stage 2: Enterprise IMDG Deployment Requirements
Deploying a shared IMDG rather than a specific IMDG per application offers:
Improved resource utilization
With the IMDG as a shared resource, memory and CPUs available to the IMDG instances can be shared between different applications, depending on their current data loads.
It is also much easier to scale the IMDG to respond to changing data needs.
Lower total cost of ownership
Installation, testing, configuration, maintenance and administration of the IMDG are performed centrally for all the applications on the Grid.
Stage 2: Enterprise IMDG Requirements for Grid Environments
Sensitivity to demand for data vs. available resources
Free (memory) resources when there is no need for them
Multi-tenancy
Continuous high availability
Hot fail-over
Versioning: it should be possible to upgrade or update the IMDG instances without affecting the data or interrupting access.
Configuration changes: it should be possible to change configuration without affecting the availability of the IMDG instances.
Schema evolution: changing the data structure (i.e. adding or modifying classes) should not affect the existing data and should not require downtime.
Isolation (groups, instances, data)
Content-based security
Explicit control over IMDG instance locations (manual relocation while the system is running)
Integration with existing systems
Stages 2 and 3: Strategies for Adding Data Awareness to the Grid
Scenario: IMDG instances are deployed directly by the Enterprise Grid (without SLA-Driven Containers).
Method of providing data awareness: Integration using affinity keys. The Enterprise Grid and users submitting tasks share special keys that identify the data relevant to each task. In this way the Enterprise Grid can execute tasks on the same machine as the relevant data.
Scenario: SLA-Driven Containers are launched by the Enterprise Grid (each container launches the relevant IMDG instances).
Method of providing data awareness: Provided implicitly. Data-intensive procedures can run in the SLA-Driven Container, together (co-located) with the IMDG instances. Because the container itself is data aware, data affinity can be guaranteed without making the Enterprise Grid itself data aware.
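The affinity-key scenario can be sketched as a directory that the scheduler consults before placing a task. This is only an illustration under the assumption that the grid scheduler keeps a key-to-node map; AffinityScheduler, register and place are hypothetical names, not an Enterprise Grid API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical scheduler-side directory: affinity key -> node that holds the data. */
final class AffinityScheduler {
    private final Map<String, String> keyToNode = new HashMap<>();

    /** Called when an IMDG instance holding the data for a key starts on a node. */
    void register(String affinityKey, String node) {
        keyToNode.put(affinityKey, node);
    }

    /** Pick the node for a task: the data's node if known, otherwise the fallback node. */
    String place(String affinityKey, String fallbackNode) {
        return keyToNode.getOrDefault(affinityKey, fallbackNode);
    }

    public static void main(String[] args) {
        AffinityScheduler scheduler = new AffinityScheduler();
        scheduler.register("portfolio-17", "node-A");
        System.out.println(scheduler.place("portfolio-17", "node-B")); // runs next to its data
        System.out.println(scheduler.place("portfolio-99", "node-B")); // no affinity known
    }
}
```

In the second scenario this directory disappears from the scheduler: the container that hosts the IMDG instance also hosts the logic, so placement follows the data without the grid tracking keys.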
Stage 3: Adding Front-Office to the Grid Using Declarative SBA
All services are collocated on the same machine, in a processing unit
Transparent data affinity via content-based routing (i.e. hash-based load balancing)
Sharing can be done in local memory => the lowest possible latency.
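Hash-based routing can be illustrated in a few lines: the routing value of each object decides which partition, and therefore which processing unit, receives it. The sketch assumes the partition is derived from hashCode modulo the partition count; HashRouter and partitionFor are hypothetical names, and the actual GigaSpaces routing function is not given in this deck.

```java
/** Hypothetical content-based router: routing value -> partition index. */
final class HashRouter {
    /** Map a routing value to a partition index in the range [0, partitions). */
    static int partitionFor(Object routingValue, int partitions) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(routingValue.hashCode(), partitions);
    }

    public static void main(String[] args) {
        long[] orderTypes = {1L, 2L, 3L, 1L};
        for (long type : orderTypes) {
            System.out.println("type " + type + " -> partition " + partitionFor(type, 3));
        }
    }
}
```

Because equal routing values always land on the same partition, every service that works on one order finds its state in local memory.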
Stage 3: Declarative SBA (cont.)
So what is this processing unit?
A mini-application which can perform the entire business process.
It accepts a user request, performs all steps of the transaction on its own, and provides a result.
Removes the need for sharing of state and partial results between different components of the application running on different physical machines.
Stage 3: SLA-Driven Application Service Container
Provides built-in support for deployment of Spring-based applications
Virtualizes the network and physical resources from the application
Handles fail-over, scaling and relocation policies using SLA-based definitions
Provides distributed dependency injection to handle partial failure and deployment dependency
Provides a single point of access for monitoring and management
Stage 3: SLA-Driven Deployment
SLA: failover policy, scaling policy, system requirements, space cluster topology
PU services: beans definition
Stage 3: Continuous High Availability
[Diagram: a failure followed by fail-over.]
Dynamic Partitioning = Dynamic Capacity Growth
[Diagram: three partitions (Partition 1 holding E, F; Partition 2 holding A, B; Partition 3 holding C, D), each with a primary (P) and a backup (B). They start in GSCs on VM 1, VM 2 and VM 3 (2G each); partitions are then moved to GSCs on VM 4 and VM 5 (4G each). The maximum capacity labels read 2G, 4G and 6G.]
At some point the free memory on VM 1 drops below 20%. It is time to increase the capacity: move Partition 1 to another GSC and recover the data from the running backup!
Later, Partition 2 needs to move. After the move, the data is recovered from the backup.
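The relocation rule on this slide (move a partition once free memory falls below 20%) can be sketched as follows. ToyContainer, Relocator and the megabyte figures are hypothetical and chosen for illustration; the sketch only does the memory bookkeeping and does not model the recovery from the backup.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical container model: a name, a heap size and the partitions it hosts. */
final class ToyContainer {
    final String name;
    final long capacityMb;
    final List<String> partitions = new ArrayList<>();
    long usedMb;

    ToyContainer(String name, long capacityMb) {
        this.name = name;
        this.capacityMb = capacityMb;
    }

    /** Fraction of the heap that is still free, between 0.0 and 1.0. */
    double freeRatio() {
        return (double) (capacityMb - usedMb) / capacityMb;
    }
}

final class Relocator {
    static final double MIN_FREE = 0.20; // the 20% threshold mentioned on the slide

    /** Move one partition off a container that is low on memory; true if a move happened. */
    static boolean relocateIfLow(ToyContainer from, ToyContainer to, long partitionMb) {
        if (from.freeRatio() >= MIN_FREE || from.partitions.isEmpty()) {
            return false;
        }
        String partition = from.partitions.remove(0);
        from.usedMb -= partitionMb;
        // On the slide the relocated instance recovers its data from the running backup;
        // here only the memory accounting is updated.
        to.partitions.add(partition);
        to.usedMb += partitionMb;
        return true;
    }

    public static void main(String[] args) {
        ToyContainer vm1 = new ToyContainer("VM 1", 2048);
        ToyContainer vm4 = new ToyContainer("VM 4", 4096);
        vm1.partitions.add("Partition 1");
        vm1.usedMb = 1700; // about 17% free, below the threshold
        System.out.println(relocateIfLow(vm1, vm4, 1700));
        System.out.println(vm4.partitions);
    }
}
```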
A Closer Look at OpenSpaces and Declarative SBA Development
Declarative Spring-SBA: How It Works
Step 1: Implement the POJO domain model
Step 2: Implement the POJO services
Step 3: Wire the services through Spring
Step 4: Packaging
Deploy to the Grid (scale-out)
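The POJO-Based Data Domain Model
import com.gigaspaces.annotation.pojo.SpaceClass;
import com.gigaspaces.annotation.pojo.SpaceId;
import com.gigaspaces.annotation.pojo.SpaceRouting;

@SpaceClass
public class Data {
    private String id; // fields backing the accessors shown on the slide
    private Long type;
    private boolean processed;
    // remaining fields and setters are elided on the slide

    @SpaceId(autoGenerate = true) // key of the entry
    public String getId() {
        return id;
    }

    @SpaceRouting // decides the partition the entry is routed to
    public Long getType() {
        return type;
    }

    public void setProcessed(boolean processed) {
        this.processed = processed;
    }
}
SpaceClass indicates that this is a space entry. SpaceClass includes class-level attributes such as FIFO and persistent.
SpaceId is used to define the key for that entry.
SpaceRouting is used to set the data affinity, i.e. to define the partition to which this entry will be routed.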
Order Processor Service Bean
import org.openspaces.events.adapter.SpaceDataEvent;

public class DataProcessor implements IDataProcessor {
    // Invoked by the event container for each matching Data object.
    @SpaceDataEvent
    public Data processData(Data data) {
        data.setProcessed(true);
        data.setData("PROCESSED : " + data.getRawData());
        // reset the id as we use auto generate true
        data.setId(null);
        System.out.println(" ------ PROCESSED : " + data);
        return data;
    }
}
The SpaceDataEvent annotation marks the processData method as the one that needs to be called when an event is triggered.
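Direct Data Loader Client
[Diagram: the data loader writes to the Space BUS through a space proxy. A polling event container takes entries for the Order Processor service bean, whose results are written back as processed orders. A notify event container passes processed orders to the Routing service bean.]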
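The take, process, write loop of a polling event container can be approximated with two queues. This is a toy model under the assumption that the container blocks on a take with a timeout and writes the bean's return value back; ToyPollingContainer and drain are hypothetical names and this is not the OpenSpaces container.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.UnaryOperator;

/** Hypothetical polling container: take an entry, call the service bean, write the result. */
final class ToyPollingContainer {
    /** Process entries until none arrives within waitMillis; returns the number handled. */
    static <T> int drain(BlockingQueue<T> space, BlockingQueue<T> processed,
                         UnaryOperator<T> serviceBean, long waitMillis) {
        int handled = 0;
        try {
            T entry;
            // poll with a timeout plays the role of a blocking take
            while ((entry = space.poll(waitMillis, TimeUnit.MILLISECONDS)) != null) {
                processed.add(serviceBean.apply(entry)); // write the result back
                handled++;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag for the caller
        }
        return handled;
    }

    public static void main(String[] args) {
        BlockingQueue<String> space = new LinkedBlockingQueue<>();
        BlockingQueue<String> processed = new LinkedBlockingQueue<>();
        space.add("order-1");
        space.add("order-2");
        int handled = drain(space, processed, raw -> "PROCESSED : " + raw, 50);
        System.out.println(handled + " handled, first result: " + processed.peek());
    }
}
```

The service bean stays a plain function of its input, which is why the same bean can be driven by a polling container or a notify container without change.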
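Space-Based Remoting
[Diagram: the Order Processor client invokes an order proxy created by SpaceServiceProxyFactoryBean. The proxy writes a SpaceInvokeData entry to the Space BUS. On the service side, SpaceServiceExporter takes the SpaceInvokeData, the OrderProcessor delegator calls ProcessData on the Order Processor service bean, and the result is written back to the space.]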
Space-Based Remoting: Inherent Scalability/Reliability
[Diagram: the same remoting flow. The client-side proxy (SpaceServiceProxyFactoryBean) writes SpaceInvokeData to the Space BUS; SpaceServiceExporter takes it, the OrderProcessor delegator calls ProcessData on the OrderProcessor service bean, and the result is written back.]
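The remoting flow reduces to three steps: the proxy writes an invocation entry, an exporter takes it and calls the bean, and the proxy takes the result. The sketch below models that with a queue and a result map; ToyRemoting, submit, serveOne and takeResult are hypothetical names and do not reproduce the OpenSpaces remoting classes.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.UnaryOperator;

/** Hypothetical space-based remoting: invocations and results travel as entries. */
final class ToyRemoting {
    private final BlockingQueue<String[]> invocations = new LinkedBlockingQueue<>();
    private final Map<String, String> results = new ConcurrentHashMap<>();
    private final AtomicLong ids = new AtomicLong();

    /** Client proxy: write an invocation entry and return its id. */
    String submit(String argument) {
        String id = "call-" + ids.incrementAndGet();
        invocations.add(new String[] {id, argument});
        return id;
    }

    /** Exporter: take one invocation, call the service bean, write the result. */
    boolean serveOne(UnaryOperator<String> serviceBean) {
        String[] call = invocations.poll();
        if (call == null) {
            return false; // nothing was waiting
        }
        results.put(call[0], serviceBean.apply(call[1]));
        return true;
    }

    /** Client proxy: take the result for a call (null if it is not ready yet). */
    String takeResult(String id) {
        return results.remove(id);
    }

    public static void main(String[] args) {
        ToyRemoting space = new ToyRemoting();
        String id = space.submit("order-7");
        space.serveOne(arg -> "PROCESSED : " + arg);
        System.out.println(space.takeResult(id));
    }
}
```

Since each invocation is taken by exactly one exporter, adding more exporters that call serveOne against the same space spreads the load without any change on the client side, which is one way to read the scalability claim in the slide title.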
Looking into the Future: Many Enhancements!
Enhance performance
Built-in InfiniBand support: Voltaire, Cisco
Enhance database integration
Enhance the Space Mirror support (async persistency)
Enhance partnership and integration with grid vendors: DataSynapse, Platform Computing, Sun Grid Engine, Microsoft Compute Cluster Server
Enhance CPP and .Net support
Performance optimization: the first goal is parity with Java
Support for complex object mapping
Conclusions and Summary
A typical IMDG won't help you
You need a data-aware Enterprise IMDG to solve the data contention and latency challenges.
Data affinity needs its twin: data & business logic locality
The Enterprise IMDG co-locates the data with the business logic
Using a self-sufficient, autonomic processing unit deployed into an SLA-based container that scales via the Enterprise Grid
The Enterprise IMDG brings the front office into the grid
Makes the grid a utility model for a wide spectrum of applications across the organization
Case Studies
A Dynamically Scalable Architecture for Data-Intensive Trading Analysis Applications
Most financial organizations today use Excel or reporting databases as their main trading analysis tools. These are very difficult to scale.
The solution is to create a shared In-Memory Data Grid (IMDG) which stores the trading data in a shared pool of machines. Common data calculation and analysis run on that pool as well, leveraging the available memory and CPU resources.
JavaSpaces is a powerful model for distributed persistence. GigaSpaces is a JavaSpaces vendor providing enterprise features.
Spring hides the details of the JavaSpaces model, allowing effort to be focused on requirements rather than frameworks.
Using a shared data grid for all users
Running analytics close to the data to improve performance and leverage the available resources
Reconciliation Calculation
Questions?
Thank you!