A brief introduction to modern caching technologies, from distributed memcached to modern data grids like Oracle Coherence. Slides were presented during a distributed caching tech talk in Moscow, May 17, 2012.
From distributed caches to in-memory data grids
TechTalk by Max A. Alexejev ([email protected])
Memory Hierarchy
Cost grows toward the top of the hierarchy; storage term grows toward the bottom:
• Registers: <1 ns
• L1 cache: ~4 cycles, ~1 ns
• L2 cache: ~10 cycles, ~3 ns
• L3 cache: ~42 cycles, ~15 ns
• DRAM: >65 ns
• Flash / SSD / USB
• HDD
• Tapes, remote systems, etc.
Software caches
• Improve response times by reducing data access latency
• Offload persistent storages
• Only work for IO-bound applications!
Caches and data location
• Local
• Remote: hierarchical, distributed, or shared
Hierarchical caches rely on a consistency protocol; distributed caches rely on a distribution algorithm.
Ok, so how do we grow beyond one node?
Data replication
Pros and Cons of replication

Pros:
• Best read performance (for local replicated caches)
• Fault-tolerant cache (both local and remote)
• Can be smart: replicate only part of the CRUD cycle

Cons:
• Poor write performance
• Additional network load
• Can scale only vertically: limited by a single machine's size
• Master-master replication requires a complex consistency protocol
Ok, so how do we grow beyond one node?
Data distribution
Pros and Cons of data distribution

Pros:
• Can scale horizontally beyond a single machine's size
• Read and write performance scales horizontally

Cons:
• No fault tolerance for cached data
• Increased read latency (due to network round-trips and serialization expenses)
What do high-load applications need from a cache?
Low latency and linear horizontal scalability: a distributed cache.
Cache access patterns: Cache Aside

For reading data:
1. Application asks for some data for a given key.
2. Check the cache.
3. If the data is in the cache, return it to the user.
4. If the data is not in the cache, fetch it from the DB, put it in the cache, and return it to the user.

For writing data:
1. Application writes some new data or updates existing data.
2. Write it to the cache.
3. Write it to the DB.

Overall:
• Improves read performance
• Offloads DB reads
• Introduces race conditions for writes
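The cache-aside flow above can be sketched in a few lines of Python. This is a minimal in-process sketch: `db` and `cache` are plain dicts standing in for a persistent store and a cache node, and the function names are illustrative, not any product's API.

```python
# Minimal cache-aside sketch: the application drives both the cache and the DB.
db = {"user:1": "Alice"}   # stands in for a persistent store
cache = {}                 # stands in for a cache node

def read(key):
    if key in cache:           # cache hit: serve directly
        return cache[key]
    value = db.get(key)        # cache miss: go to the DB
    if value is not None:
        cache[key] = value     # populate the cache for next time
    return value

def write(key, value):
    cache[key] = value         # no ordering guarantee vs. a concurrent writer:
    db[key] = value            # this is where the write race condition lives
```

Note that two concurrent `write` calls can leave the cache and the DB holding different values, which is exactly the race condition the slide warns about.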
Cache access patterns: Read Through

For reading data:
1. Application asks for some data for a given key.
2. Check the cache.
3. If the data is in the cache, return it to the user.
4. If the data is not in the cache, the cache itself fetches it from the DB, saves the retrieved value, and returns it to the user.

Overall:
• Reduces read latency
• Offloads read load from the underlying storage
• May have blocking behavior, thus helping with the dog-pile effect
• Requires "smarter" cache nodes
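The difference from cache-aside is that the miss handling moves into the cache itself. A hedged sketch, where the cache is given a loader callback it invokes on a miss (the class and parameter names are made up for illustration):

```python
# Read-through sketch: the cache owns the miss handling via a loader callback.
class ReadThroughCache:
    def __init__(self, loader):
        self._data = {}
        self._loader = loader    # invoked by the cache itself on a miss

    def get(self, key):
        if key not in self._data:
            # the cache, not the application, fetches from the backing store
            self._data[key] = self._loader(key)
        return self._data[key]

db = {"cfg:timeout": 30}                 # stands in for a persistent store
cache = ReadThroughCache(loader=db.get)  # cache knows how to load misses
```

A real read-through cache would also block concurrent readers of the same missing key so that only one loader call hits the DB, which is the dog-pile protection the slide mentions.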
Cache access patterns: Write Through

For writing data:
1. Application writes some new data or updates existing data.
2. Write it to the cache.
3. The cache then synchronously writes it to the DB.

Overall:
• Slightly increases write latency
• Provides natural invalidation
• Removes race conditions on writes
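Sketched in the same style as the previous patterns (illustrative names, a dict standing in for the DB), write-through just means every put is synchronously propagated:

```python
# Write-through sketch: every put is synchronously propagated to the DB.
class WriteThroughCache:
    def __init__(self, db):
        self._data = {}
        self._db = db

    def put(self, key, value):
        self._data[key] = value   # update the cache...
        self._db[key] = value     # ...and, synchronously, the DB
                                  # (this DB round-trip is the added latency)

    def get(self, key):
        return self._data.get(key)

db = {}
cache = WriteThroughCache(db)
```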
Cache access patterns: Write Behind

For writing data:
1. Application writes some new data or updates existing data.
2. Write it to the cache.
3. The cache adds the write request to its internal queue.
4. Later, the cache asynchronously flushes the queue to the DB on a periodic basis and/or when the queue size reaches a certain limit.

Overall:
• Dramatically reduces write latency, at the price of an inconsistency window
• Provides write batching
• May provide deduplication of updates
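The queueing, batching and deduplication above can be sketched as follows. A toy sketch with made-up names: the queue is a dict, so repeated updates of one key collapse into the latest value, and the flush happens on a size trigger (a real cache also flushes on a timer):

```python
# Write-behind sketch: puts go to an internal queue; a flush pushes them to
# the DB in one batch, deduplicating repeated updates of the same key.
class WriteBehindCache:
    def __init__(self, db, max_queue=100):
        self._data, self._db = {}, db
        self._queue, self._max = {}, max_queue  # dict queue: dedup by key

    def put(self, key, value):
        self._data[key] = value
        self._queue[key] = value     # only the latest value survives
        if len(self._queue) >= self._max:
            self.flush()             # size-triggered flush

    def flush(self):
        self._db.update(self._queue)  # one batched write to the DB
        self._queue.clear()

db = {}
cache = WriteBehindCache(db)
```

Until `flush` runs, the cache and the DB disagree: that is the inconsistency window the slide describes.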
A variety of products on the market: Memcached, Hazelcast, Terracotta, EhCache, Oracle Coherence, Riak, Redis, MongoDB, Cassandra, GigaSpaces, Infinispan, …
Let's sort them out!

• KV caches: Memcached, Ehcache, …
• NoSQL: Redis, Cassandra, MongoDB, …
• Data Grids: Oracle Coherence, GemFire, GigaSpaces, GridGain, Hazelcast, Infinispan

Some products are really hard to sort, like Terracotta in both DSO and Express modes.
Why don't we have any distributed in-memory RDBMS?

Master – MultiSlaves configuration:
• Is, in fact, an example of replication
• Helps with read distribution, but does not help with writes
• Does not scale beyond a single master

Horizontal partitioning (sharding):
• Helps with reads and writes for datasets with good data affinity
• Does not work nicely with join semantics (i.e., there are no distributed joins)
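The sharding idea can be sketched as routing each row by an affinity key. A toy sketch with made-up names; `zlib.crc32` is used only because it is a stable stdlib hash:

```python
# Sharding sketch: route each row to a shard by hashing its affinity key.
# All reads and writes for one user land on the same shard; a query joining
# rows across users would have to touch several shards, which is the hard part.
import zlib

NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]  # dicts stand in for databases

def shard_for(user_id):
    # stable hash so the key-to-shard mapping survives process restarts
    return zlib.crc32(user_id.encode()) % NUM_SHARDS

def put(user_id, row):
    shards[shard_for(user_id)][user_id] = row

def get(user_id):
    return shards[shard_for(user_id)].get(user_id)
```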
Key-Value caches
• Memcached and EHCache are good examples to look at
• Keys and values are arbitrary binary (serializable) entities
• Basic operations are put(K,V), get(K), replace(K,V), remove(K)
• May provide group operations like getAll(…) and putAll(…)
• Some operations provide atomicity guarantees (CAS, inc/dec)
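The basic interface and the CAS guarantee can be sketched in-process. A hedged sketch in the spirit of memcached's `gets`/`cas` pair (per-key version counters); the class and method names are illustrative, not any product's API:

```python
# KV-cache interface sketch with an atomic compare-and-swap (CAS).
import threading

class KVCache:
    def __init__(self):
        self._data, self._versions = {}, {}
        self._lock = threading.Lock()

    def put(self, k, v):
        with self._lock:
            self._data[k] = v
            self._versions[k] = self._versions.get(k, 0) + 1

    def gets(self, k):
        # returns the value together with its version token
        with self._lock:
            return self._data.get(k), self._versions.get(k, 0)

    def cas(self, k, v, version):
        # write succeeds only if nobody changed the key since we read it
        with self._lock:
            if self._versions.get(k, 0) != version:
                return False
            self._data[k] = v
            self._versions[k] = version + 1
            return True
```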
Memcached
• Developed for LiveJournal in 2003
• Has client libraries in PHP, Java, Ruby, Python and many others
• Nodes are independent and don't communicate with each other
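Because the nodes are independent, it is the client that decides which node owns a key, classically by hashing the key over the node list. A sketch of the naive modulo scheme (the node addresses are hypothetical):

```python
# Client-side key distribution: the client, not the server, picks the node.
import hashlib

nodes = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]  # hypothetical

def node_for(key):
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    # naive modulo scheme; production clients prefer consistent hashing
    # (e.g. ketama) so that adding a node remaps far fewer keys
    return nodes[h % len(nodes)]
```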
EHCache
• Initially named "Easy Hibernate Cache"
• Java-centric, mature product with open-source and commercial editions
• The open-source version provides only replication capabilities; distributed caching requires a commercial license for both EHCache and Terracotta TSA
NoSQL Systems
A whole bunch of different products with both persistent and non-persistent storage options. Let's call them storages and caches, respectively.
• Built to provide good horizontal scalability
• Try to fill the feature gap between pure KV stores and full-blown RDBMS
Case study: Redis
• Written in C, supported by VMware
• Client libraries for C, C#, Java, Scala, PHP, Erlang, etc.
• Single-threaded async implementation
• Has configurable persistence
• Works with K-V pairs, where K is a string and V may be a number, a string, or an object (JSON)
• Provides 5 interfaces: strings, hashes, lists, sets, sorted sets
• Supports transactions

    hset users:goku powerlevel 9000
    hget users:goku powerlevel
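The hash semantics behind `hset`/`hget` can be mimicked in-process. A toy sketch of the command semantics only, using nested dicts, not the Redis protocol or any client library:

```python
# Toy sketch of Redis hash semantics (HSET / HGET) using nested dicts.
store = {}

def hset(key, field, value):
    created = field not in store.setdefault(key, {})
    store[key][field] = value
    return 1 if created else 0   # Redis reports how many new fields were set

def hget(key, field):
    return store.get(key, {}).get(field)
```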
Use cases: Redis
• Good for fixed lists, tagging, ratings, counters, analytics and queues (pub-sub messaging)
• Has Master – MultiSlave replication support. The master node is currently a SPOF.
• Distributed Redis was named "Redis Cluster" and is currently under development
Case study: Cassandra
• Written in Java, developed at Facebook.
• Inspired by Amazon Dynamo's replication mechanics, but uses a column-based data model.
• Good for logs processing, index storage, voting, jobs storage etc.
• Bad for transactional processing.
• Want to know more? Ask Alexey!
In-Memory Data Grids
A new generation of caching products that tries to combine the benefits of the replicated and distributed schemes.
IMDG: Evolution
Modern IMDGs combine two lineages:
• Data Grids: reliable storage and live data balancing among grid nodes
• Computational Grids: reliable job execution, scheduling and load balancing
IMDG: Caching concepts
• Implements a KV cache interface
• Provides indexed search by values
• Provides a reliable distributed locks interface
• Caching scheme (partitioned or distributed) may be specified per cache or per cache service
• Provides event subscriptions for entries (change notifications)
• Configurable fault tolerance for distributed schemes (HA)
• Equal distribution of data (and read/write load) among grid nodes
• Live data redistribution when nodes go up or down: no data loss, no client termination
• Supports RT, WT, WB caching patterns and hierarchical caches (near caching)
• Supports atomic computations on grid nodes
IMDG: Under the hood
• All data is split into a number of sections, called partitions.
• A partition, rather than an entry, is the atomic unit of data migration when the grid rebalances. The number of partitions is fixed for the cluster's lifetime.
• Indexes are distributed among grid nodes.
• Clients may or may not be part of the grid cluster.
IMDG under the hood: Request routing

For get() and put() requests:
1. The cluster member that makes a request calculates the key's hash code.
2. The partition number is calculated from this hash code.
3. The node is identified by the partition number.
4. The request is routed to the identified node and executed there, and the results are sent back to the client member that initiated the request.

For filter queries:
1. The cluster member initiating the request sends it to all storage-enabled nodes in the cluster.
2. The query is executed on every node using distributed indexes, and partial results are sent to the requesting member.
3. The requesting member merges the partial results locally.
4. The final result set is returned from the filter method.
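The get()/put() routing steps above can be sketched as a two-stage lookup: key hash to partition, partition to owning node. A hedged sketch; the partition count and node names are hypothetical, and a real grid recomputes the partition-to-node assignment on every rebalance:

```python
# Routing sketch for get()/put() in an IMDG: key hash -> partition -> node.
PARTITIONS = 271                         # hypothetical fixed partition count
nodes = ["node-a", "node-b", "node-c"]   # hypothetical grid members

# partition -> owning node; in a real grid this map changes on rebalance,
# while the key -> partition mapping below never does
assignment = {p: nodes[p % len(nodes)] for p in range(PARTITIONS)}

def partition_of(key):
    return hash(key) % PARTITIONS

def owner_of(key):
    return assignment[partition_of(key)]
```

Because migration moves whole partitions, only the `assignment` map has to change when a node joins or leaves; keys never change partition, which is what makes live rebalancing tractable.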
IMDG: Advanced use-cases
• Messaging
• Map-Reduce calculations
• Cluster-wide singleton
• And more…
GC tuning for large grid nodes
• The easy way: rolling restarts of storage-enabled cluster nodes. Cannot be used in every project, though.
• The complex way: fine-tune the CMS collector to ensure it always keeps up cleaning garbage concurrently under normal production workload.
• The expensive way: use the off-heap storages provided by some vendors (Oracle, Terracotta), which use the direct memory buffers available to the JVM.
IMDG: Market players
• Oracle Coherence: commercial, free for evaluation use
• GigaSpaces: commercial
• GridGain: commercial
• Hazelcast: open-source
• Infinispan: open-source
Terracotta
The company behind EHCache, Quartz and the Terracotta Server Array. Acquired by Software AG.
Terracotta Server Array
All data is split into a number of sections, called stripes. Stripes consist of 2 or more Terracotta nodes: one of them is the Active node, the others have Passive status. All data is distributed among stripes and replicated inside stripes.
Open-source limitation: only one stripe. Such a setup supports HA, but will not distribute cache data. I.e., it is not horizontally scalable.
QA Session
And thank you for coming!