Building Highly Scalable Spring Applications using In-Memory Data Grids


By John Blum & Luke Shannon
@john_blum

SPRINGONE2GX, WASHINGTON, DC
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/

Presenters

John Blum - @john_blum
Spring Data GemFire Project Lead
Apache Geode Committer
GemFire Engineer/Technical Lead
Pivotal Software, Inc.

Luke Shannon
Field/Community Engineer
Apache Geode Committer
Pivotal Software, Inc.


We are awesome!

Agenda
Introduction to Apache Geode
  Distributed System & In-Memory Database Concepts
Overview of Spring Data GemFire
  How to build highly scalable applications
Spring Data GemFire in Action
  Fast Foot Shoes Demo
  Caching Demo (?)
What's New
Q&A


The introduction to Apache Geode covers the Why, the What and the How.

Let's get started

Why Apache Geode?
Motivation
  Volume of Data (Big Data)
  Velocity of Data (Fast Data)
  Verity of Data (Data Accuracy)

Apache Geode enables new and existing Spring applications to operate at cloud scale in a consistent, highly available and predictable manner, so they can transact on and analyze big, fast data in real time and achieve meaningful, impactful business results.


Applications need to manage large quantities of data under extreme load, accurately, resiliently and reliably.

Big Data == data lake (any and all data)
Fast Data == processing streams of events in (near) real time

It's all about data access.

What is Apache Geode?
In a nutshell

Open Source core of Pivotal GemFire
https://pivotal.io/big-data/pivotal-gemfire

Apache Incubator project
https://wiki.apache.org/incubator/GeodeProposal
http://geode.incubator.apache.org/



What is Apache Geode?

"A distributed, in-memory compute and data management platform that elastically scales to achieve high-throughput, low-latency access to big, fast data powering business-critical, analytical applications in real time." (John Blum)

[Figure: Elastic capacity (+/- nodes); linear scalability (ops/sec as nodes are added); latency-optimized data distribution]


Data is stored in memory for improved performance (lower-latency access) and distributed across the cluster for high availability and high read/write throughput, with the option to persist data to disk for durability.

Scale out rather than up.

Throughput (the number of operations per second) increases as more nodes are added to the cluster.

Data is stored in distributed, highly concurrent, in-memory data structures to minimize context switching and contention.
Data is replicated & partitioned for fast, predictable read/write throughput.

Apache Geode Use Cases
Persistent, OLTP/OLAP Database (System of Record)
JSR-107 Cache Provider (Key/Value Store)
HTTP Session State Management
Distributed L2 Caching for Hibernate
Memcached Server (Gemcached)
Message Bus with guaranteed message delivery
Glorified version of ConcurrentHashMap


Database: ACID properties, local and global (JTA) transactions, indexing, querying (OQL) and functions.

Cache: eviction, expiration, overflow (to disk), read-through, write-through and write-behind.

Messaging: Apache Geode enables event-based application architectures with Register Interest (RI), and Pivotal GemFire builds on that with Continuous Queries (CQ).

ConcurrentMap: a Region implements the java.util.concurrent.ConcurrentMap interface.
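The Cache and ConcurrentMap notes above can be illustrated together. Because a Region implements ConcurrentMap, caching code written only against that interface can be exercised with a plain ConcurrentHashMap in a unit test. What follows is a minimal sketch under stated assumptions. ThroughCache, withHashMap, the loader function and the systemOfRecord map are all hypothetical names. This is not Geode's CacheLoader/CacheWriter API, and it does not model write-behind (asynchronous writes).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical sketch (not Geode's CacheLoader/CacheWriter API): read-through and
// write-through caching that depends only on the ConcurrentMap interface for storage.
class ThroughCache<K, V> {
    private final ConcurrentMap<K, V> storage; // any ConcurrentMap implementation
    private final Function<K, V> loader;       // called on a cache miss (read-through)
    private final Map<K, V> systemOfRecord;    // updated before the cache (write-through)

    ThroughCache(ConcurrentMap<K, V> storage, Function<K, V> loader, Map<K, V> systemOfRecord) {
        this.storage = storage;
        this.loader = loader;
        this.systemOfRecord = systemOfRecord;
    }

    // Convenience factory backing the cache with a local ConcurrentHashMap.
    static <K, V> ThroughCache<K, V> withHashMap(Function<K, V> loader, Map<K, V> systemOfRecord) {
        return new ThroughCache<>(new ConcurrentHashMap<>(), loader, systemOfRecord);
    }

    // Read-through: return the cached value, or load, cache and return it.
    // If the loader returns null, nothing is cached and null is returned.
    V get(K key) {
        return storage.computeIfAbsent(key, loader);
    }

    // Write-through: update the system of record first, then the cache, so a failed
    // write to the system of record leaves the cache untouched.
    void put(K key, V value) {
        systemOfRecord.put(key, value);
        storage.put(key, value);
    }

    boolean isCached(K key) {
        return storage.containsKey(key);
    }
}
```

The sketch writes to the system of record before the cache. That ordering is a deliberate choice: a failure can leave the cache stale-but-absent, but never holding a value the system of record rejected.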

China Railway Corporation

GemFire runs on ten primary x86 servers with over two terabytes of memory, plus ten backup servers. This has replaced 72 UNIX boxes and a traditional RDBMS with a more efficient, cost-effective approach.

With so many people relying on the website for travel, it must be continuously available. Demand has far exceeded expectations, and projections show as much as 50% growth per year as mobile phone access is added.

https://pivotal.io/big-data/case-study/scaling-online-sales-for-the-largest-railway-in-the-world-china-railway-corporation

China Railway Corporation

20 million users per day; 40,000 visits per second
4.5 million ticket purchases & …
Spikes of 15,000 tickets sold per minute

"The system is operating with solid performance and uptime. Now, we have a reliable, economically sound production system that supports record volumes and has room to grow."

Dr. Jiansheng Zhu, Vice Director of China Academy of Railway Sciences


How Apache Geode Works
Stores data in memory
  JVM heap + off-heap
Functions as a Distributed System, In-Memory Data Grid (IMDG)
  Pools system resources across multiple nodes in a cluster to manage both application state and behavior
  Includes: memory, CPU, network & (optionally) disk
(Optionally) stores data to disk
  In Oplogs | HDFS for overflow & persistence


In a nutshell, this is how Apache Geode is implemented under the hood:

Stores data in memory on puts.

Stores data to disk (synchronously by default, or asynchronously) for persistence and overflow.
Oplogs are append-only, so compaction is necessary.
HDFS support is new, and Geode can feed Apache Spark processing streams.
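A small model shows why append-only oplogs need compaction. In this hypothetical in-memory sketch (AppendOnlyLog is an assumption, not Geode's disk store implementation), every put or destroy appends a record and never modifies one in place. Recovery replays the log in order. Compaction rewrites the log so that only the latest live record per key remains, reclaiming the space taken by superseded and destroyed entries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of an append-only operation log (not Geode's Oplog classes).
class AppendOnlyLog {
    static final class Entry {
        final String key;
        final String value; // null marks a destroy (tombstone)

        Entry(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    private List<Entry> log = new ArrayList<>();

    // Every put or destroy appends a record; existing records are never rewritten.
    void put(String key, String value) { log.add(new Entry(key, value)); }

    void destroy(String key) { log.add(new Entry(key, null)); }

    // Recovery replays the log in order, so the last record for a key wins.
    Map<String, String> recover() {
        Map<String, String> state = new HashMap<>();
        for (Entry e : log) {
            if (e.value == null) {
                state.remove(e.key);
            } else {
                state.put(e.key, e.value);
            }
        }
        return state;
    }

    // Compaction keeps only the latest live record per key.
    void compact() {
        List<Entry> compacted = new ArrayList<>();
        for (Map.Entry<String, String> live : recover().entrySet()) {
            compacted.add(new Entry(live.getKey(), live.getValue()));
        }
        log = compacted;
    }

    int size() { return log.size(); }
}
```

Without compaction, the log for a frequently updated key grows without bound, even though only its last record matters.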

Memory Management
Apache Geode manages memory using:
  Eviction: LRU
  Expiration: Time-To-Live (TTL), Idle Timeout (TTI)
  Auto resource management: critical/eviction heap % thresholds
  (Region) data compression: Snappy
  JVM/GC tuning
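For the LRU eviction bullet above, here is a minimal sketch of entry-count LRU eviction built on java.util.LinkedHashMap in access order. The LruCache class is a hypothetical illustration, not Geode's eviction implementation. It models only "destroy the least recently used entry once a maximum entry count is exceeded", and does not cover memory-based or heap-based eviction or overflow to disk.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of entry-count LRU eviction (not Geode's eviction implementation).
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true: iteration runs from least- to most-recently accessed entry.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true removes the eldest
    // (least recently used) entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

In use, a get() refreshes an entry's recency, so with a limit of two, putting "a" and "b", reading "a", then putting "c" evicts "b".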



Auto resource management can actually prevent cache (Region) put operations once heap usage crosses the critical threshold.
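Geode exposes the two thresholds as critical and eviction heap percentages on its ResourceManager. The decision logic they imply can be sketched as follows. HeapGuard and its Action values are hypothetical, and the 80/90 percentages in the usage below are illustrative, not Geode defaults. Above the eviction threshold, puts are accepted but entries are evicted. At or above the critical threshold, new data is refused to protect the node from an OutOfMemoryError.

```java
// Hypothetical sketch of heap-threshold resource management (not Geode's ResourceManager).
class HeapGuard {
    enum Action { ACCEPT, ACCEPT_AND_EVICT, REJECT }

    private final double evictionPercent; // start evicting at or above this heap usage
    private final double criticalPercent; // refuse new data at or above this heap usage

    HeapGuard(double evictionPercent, double criticalPercent) {
        if (evictionPercent >= criticalPercent) {
            throw new IllegalArgumentException("eviction threshold must be below the critical threshold");
        }
        this.evictionPercent = evictionPercent;
        this.criticalPercent = criticalPercent;
    }

    // Decide how a put is handled, given current heap usage as a percentage (0-100).
    Action onPut(double heapUsedPercent) {
        if (heapUsedPercent >= criticalPercent) {
            return Action.REJECT;           // protect the node from running out of heap
        }
        if (heapUsedPercent >= evictionPercent) {
            return Action.ACCEPT_AND_EVICT; // accept, but evict entries to free heap
        }
        return Action.ACCEPT;
    }

    // Current JVM heap usage as a percentage of the maximum heap size.
    static double currentHeapUsedPercent() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return 100.0 * used / rt.maxMemory();
    }
}
```

For example, new HeapGuard(80.0, 90.0).onPut(HeapGuard.currentHeapUsedPercent()) classifies a put against the live JVM heap.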


Where to begin?
[Figure: a single Data Node; an Application connected to the Data Node]

What about load?


It's all about data storage & access.

Start with a GemFire/Geode Data Node, a single cache node.
Add distributed Regions to store data.
Then either start a cache server and connect a cache client application, or
run the application as a peer cache node (with an embedded cache).

Where to begin?
[Figure: Data Node]


But what happens when too many clients overload the node? OutOfMemoryErrors!

Where to begin?

[Figure: Locator]

High read throughput
What about writes?


Form a GemFire cluster with a Locator (or multicast networking)

Scale out to handle load
Data is highly available
Durable with replication & disk persistence
Resilient to node failure; shared-nothing architecture (each node is independent)

Client connection pool with Locator:
  Load balancing
  Failover
  Single-hop, low/predictable-latency data access
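The load balancing and failover that the client pool gets from the Locator can be approximated as "pick the least-loaded live server". This is a simplified, hypothetical sketch: ServerSelector, the server names and the connection-count load metric are assumptions, not the Locator's actual algorithm or a Geode API. It also does not model single-hop routing to the node that owns a key.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of locator-style server selection for a client connection pool.
class ServerSelector {
    private final List<String> servers;
    private final int[] connections;            // current connection count per server
    private final Set<String> down = new HashSet<>();

    ServerSelector(List<String> servers) {
        this.servers = new ArrayList<>(servers);
        this.connections = new int[this.servers.size()];
    }

    // Record that a server has failed, so future requests fail over to the others.
    void markDown(String server) {
        down.add(server);
    }

    // Load balancing: choose the live server with the fewest connections
    // (ties go to the earlier server in the list).
    // Failover: servers marked down are skipped entirely.
    String acquire() {
        int best = -1;
        for (int i = 0; i < servers.size(); i++) {
            if (down.contains(servers.get(i))) {
                continue;
            }
            if (best < 0 || connections[i] < connections[best]) {
                best = i;
            }
        }
        if (best < 0) {
            throw new IllegalStateException("no servers available");
        }
        connections[best]++;
        return servers.get(best);
    }
}
```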

Where to begin?

[Figure: Locator]

High read/write throughput
What about consistency?


Switch to PARTITION Regions (shard the data):
  High read/write throughput
  Control of redundancy level, partitioning policy (default is hash by key; use a PartitionResolver to customize), and collocation
  Automatic rebalancing and redundancy recovery in the case of peer data node failure
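Hash partitioning with a custom resolver and redundant copies can be sketched in a few lines. In this hypothetical model (PartitionModel and its round-robin placement are assumptions, not Geode internals), a key, or the routing object a resolver derives from it, is hashed into one of a fixed number of buckets. Each bucket is placed on a primary node plus redundantCopies secondaries, each on a distinct node. The usage below uses 113 buckets to mirror Geode's default total-num-buckets setting. Geode's real bucket placement balances load rather than rotating round-robin.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical model of hash partitioning with redundancy (not Geode's internal classes).
class PartitionModel<K> {
    private final int totalBuckets;
    private final List<String> nodes;
    private final int redundantCopies;
    private final Function<K, Object> resolver; // stands in for a PartitionResolver's routing object

    PartitionModel(int totalBuckets, List<String> nodes, int redundantCopies, Function<K, Object> resolver) {
        if (redundantCopies >= nodes.size()) {
            throw new IllegalArgumentException("each copy of a bucket needs its own node");
        }
        this.totalBuckets = totalBuckets;
        this.nodes = new ArrayList<>(nodes);
        this.redundantCopies = redundantCopies;
        this.resolver = resolver;
    }

    // Hash the routing object into a bucket; floorMod keeps negative hash codes in range.
    int bucketFor(K key) {
        return Math.floorMod(resolver.apply(key).hashCode(), totalBuckets);
    }

    // Primary first, then redundant copies on distinct nodes (simple round-robin placement).
    List<String> ownersOf(int bucket) {
        List<String> owners = new ArrayList<>();
        for (int i = 0; i <= redundantCopies; i++) {
            owners.add(nodes.get((bucket + i) % nodes.size()));
        }
        return owners;
    }
}
```

Routing by the key itself (k -> k) is the "hash by key" default. A resolver such as k -> k.split("\\|")[0], applied to hypothetical "customerId|orderId" keys, sends all of a customer's orders to the same bucket. That is the idea behind collocating related data.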

Partition Region


Consistency is achieved by writing to the Primary PARTITION (and then secondaries) and us