The Limits of Java Performance:
Breaking through the Scalability Barriers imposed by the Java Platform.
Ron Kleinman, Lead Product Technologist
2 ©2008 Azul Systems, Inc.
Agenda
• Java Platform Scalability Barriers? What Scalability Barriers?
─ The language is the platform. How big a platform?
• Avoiding the Barriers: What do people do today?
• When the Barriers can't be avoided: Scalability Design Patterns
─ Scaling Out: When performance fades, add some blades
─ Scaling Up: Performance Gains through Virtual Domains
─ Scaling Middleware: What seems local is remote
─ Scaling External: Moving to a bigger house
• Focused Solution: Java Compute Appliance
• Building it out: Leveraging an integrated Appliance Architecture
Scalability Barriers: What Scalability Barriers?
There just seems to be something about Managed Runtime Environments
“Perhaps the most commonly asked questions regarding memory management in .NET are: "How long does a garbage collection take?" and "How can I control when the garbage collector runs?" Apprehensive that "pauses" caused by garbage collections will be perceived by users, application developers often search for ways to control when garbage collections occur.”
- Steven Pratschner, Microsoft Program Manager for the .NET Common Language Runtime
“Ruby’s garbage collector (GC) has become a problem for the Luz user experience. The GC process can cause the entire application to pause for upwards of 200 ms at a time (on a 1.2 GHz P3), which is simply unacceptable for an application doing real-time animation where, to achieve even 24 fps, a new frame must be generated every 42 ms. As a result, we see ‘hiccups’ in the animation.”
- Gnome Coder
What Java Platform Scalability Barriers?
• 1. Resource Limit on the Maximum # of usable GB of Memory
─ Unused memory must be freed and defragmented
─ All "in use" references must be found, flagged and changed
─ GC pauses scale linearly with memory size (~1 GB max)
• 2. Resource Limit on the Maximum # of usable CPUs
─ Synchronized methods are "large grained"
─ The lock suspends all but one thread
─ Lock contention, not data contention
─ 4 CPUs vs. 400
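The second barrier above can be made concrete in a few lines of Java. This is a hypothetical sketch (the class and method names are ours, not from the deck): one coarse-grained synchronized method serializes every caller, even when the callers touch unrelated data.

```java
import java.util.HashMap;
import java.util.Map;

// A single coarse-grained synchronized method takes the object's one
// monitor, so ALL callers queue behind it -- even 400 threads reading
// 400 different symbols. That is lock contention, not data contention.
class CoarseGrainedCache {
    private final Map<String, Double> prices = new HashMap<>();

    // With 400 CPUs, up to 399 threads can be suspended here at once.
    public synchronized Double get(String symbol) {
        return prices.get(symbol);
    }

    public synchronized void put(String symbol, double price) {
        prices.put(symbol, price);
    }
}
```

The method is thread-safe, but its throughput is bounded by the single monitor regardless of how many CPUs are available.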
“Avoiding the Barriers” (Living within 4 CPU / 4 GB constraints)
• Force garbage collection to occur at non-peak times
• Write components in C or C++
─ Use native components rather than Java components (via JNI)
• Limit Java cache sizes
─ Reuse your own memory (keep your own pool)
• Hand-crafted fine tuning: 20+ GC algorithm settings
─ GC algorithm dependency (a la VMS Fortran "File Open")
• Throw more hardware at the problem (may not work)
• Recode the Application
─ Increase CPU concurrency with finer-grained locking (read/write)
─ Attack GC pauses with Real Time Java extensions
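The "finer-grained locking (read/write)" recoding option can be sketched as follows. This is an illustrative refactor (names are ours): replacing a coarse synchronized method with a `java.util.concurrent.locks.ReentrantReadWriteLock` so concurrent readers no longer queue behind one monitor.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Finer-grained locking for a read-mostly cache: many readers may hold
// the read lock simultaneously; only writers are exclusive.
class ReadMostlyCache {
    private final Map<String, Double> prices = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public Double get(String symbol) {
        lock.readLock().lock();   // any number of readers at once
        try {
            return prices.get(symbol);
        } finally {
            lock.readLock().unlock();
        }
    }

    public void put(String symbol, double price) {
        lock.writeLock().lock();  // writers remain exclusive
        try {
            prices.put(symbol, price);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Note this is exactly the manual recoding effort the slides argue against having to do: it helps CPU scalability but does nothing about GC pauses.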
“Confronting the Barriers”
“If something can’t go on forever, it won’t.”
-Herb Stein, Former Chair of the Council of Economic Advisers
(pre-2005)
Dealing with Increasing Peak Loading: The Trade Exchange Program

[Diagram: the Application and its StockDB Cache inside one JVM, running on an Operating System with local Memory and CPUs]
Maintaining Service Level Agreements in the face of massively increasing demand
• # Stock Feeds up
─ More sources of data to correlate
• # Trades up
─ Greater volume of transactions to handle
• # Metrics up
─ More things to monitor for each trade
• Processing / Metric up
─ "Secret Sauce" trading algorithms more complex
• Required maximum response times way down
─ 1-2 msec and lower
─ Significant swings in latency jitter intolerable
─ A GC pause can cost $$$
Java Application Scalability Design Patterns: Adding Computing Capacity

• Multiple Real Application Instances
─ 1. Horizontal (Scale Out - with Commodity Servers)
─ 2. Vertical (Scale Up - with Hypervisor Domains on Enterprise Servers)
• Single Virtual Application Instance
─ 3. Middleware (Scale Virtually - with customized software modules)
• Single Real Application Instance
─ 4. External (Scale Specialized - with Java Compute Appliances)
1. Horizontal Scale Out to host multiple instances: Add more commodity servers to the Data Center

[Diagram: two commodity servers, each with its own Operating System, Memory and CPUs, running its own Application and JVM with a StockDB Cache shard: [A-L] on one server, [M-Z] on the other]
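The [A-L] / [M-Z] split above implies a routing layer in front of the two servers. A minimal sketch, assuming symbols are sharded by first letter (the class name and scheme are illustrative, not from the slides):

```java
// Route each stock symbol to the server holding its shard:
// shard 0 serves [A-L], shard 1 serves [M-Z].
class ShardRouter {
    static final int SHARD_A_TO_L = 0;
    static final int SHARD_M_TO_Z = 1;

    static int shardFor(String symbol) {
        char first = Character.toUpperCase(symbol.charAt(0));
        return (first <= 'L') ? SHARD_A_TO_L : SHARD_M_TO_Z;
    }
}
```

The issues row in the comparison below ("Refactor Data (Shards)") is exactly this: the data model must be partitionable, and cross-shard queries need extra plumbing.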
2. Vertical Scale Up to host multiple instances: Create more virtual servers on a Hypervisor

[Diagram: one physical server whose Memory and CPUs are partitioned by a Hypervisor into two virtual servers, each with its own Operating System, JVM, Application and StockDB Cache shard ([A-L] and [M-Z])]
Breaking through the Java Platform Scalability Barrier: Hardware Servers vs. Virtual Servers

Scale: Pure Horizontal
Strategy: Separate Application Instances on Hardware Servers
Advantages:
• Easy expansion via addition of homogeneous commodity servers
• "Cloud-izable" / Hadoop-ish
Issues:
• Refactor Data (Shards)
• Recode Application
• Peak load swings can exceed resource limits (partial crashes, load management)
• Over-provisioning (server sprawl)

Scale: Pure Vertical
Strategy: Separate Application Instances on Virtual Servers
Advantages:
• Hypervisor provides better resource utilization
• Reduces server sprawl; easier to manage
Issues:
• Same Java Platform limitations within each instance: refactor data, recode application
• Peak load swings can still exceed JVM memory capacity / result in huge pauses
• Cloud "orthogonal"
3. Memory Scale Out to multiple systems: Use Middleware to simulate one huge memory heap

[Diagram: multiple commodity servers, each running an Operating System, JVM and Application with instrumented byte codes, connected through a Virtual Memory Hub to a Federated Global Memory Cache (StockDB [A-Z]) backed by physical memory on a central system]
Breaking through the Java Platform Scalability Barrier: Multiple Local Memory Heaps vs. Single Global Memory Heap

Scale: Pure Horizontal
Strategy: Separate Application Instances on Commodity Servers with separate local memory
Advantages:
• Easy expansion via addition of homogeneous commodity servers
• "Cloud-izable" / Hadoop-ish
Issues:
• Refactor Data
• Recode Application
• Peak load swings can exceed resource limits (partial crashes, load management)
• Over-provisioning (server sprawl)

Scale: + Shared Global Memory
Strategy: Shared global memory supported by Java byte code instrumentation (get/put element)
Advantages:
• Selected object elements shared, dynamically cached, transparently updated from a central source
• Effective JVM memory limits transparently bypassed
Issues:
• Not all data elements can be shared (e.g. hash keys)
• Cache misses can cause widely varying response latencies
• Performance depends on data usage (reads >> writes is good)
• Multiple points of partial failure
• Central hub limits Cloud Computing
• Global thread locks tough to scale
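The "get/put element" instrumentation row above suggests a middleware contract roughly like the following. This is an assumed shape for illustration only, not the API of any actual product: a real implementation would fetch misses from the central hub and propagate writes to peer nodes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Assumed contract: byte code instrumentation rewrites object element
// reads and writes into get/put calls against a federated cache.
interface GlobalMemoryCache {
    Object getElement(String key);             // may incur a remote cache miss
    void putElement(String key, Object value); // transparently replicated
}

// In-process stand-in so the contract can be exercised locally.
class LocalStubCache implements GlobalMemoryCache {
    private final Map<String, Object> local = new ConcurrentHashMap<>();

    public Object getElement(String key) {
        return local.get(key);
    }

    public void putElement(String key, Object value) {
        local.put(key, value);
    }
}
```

The latency issue in the table follows directly from the interface: `getElement` costs a local map lookup on a hit but a network round trip to the hub on a miss.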
4. Externally Scale on a Specialized Java Appliance: Add physical memory and CPUs as needed

[Diagram: the Application and its StockDB Cache run in a JVM on the Java Compute Appliance (kernel, Memory, CPUs), while a JVM Proxy remains on the original deployed host]
Scalability with the Appliance Design Pattern

[Diagram: applications transparently utilize remote hardware (Memory, CPUs, Network, Storage) through an Appliance that externalizes the resource so it can be physically isolated, shared, centrally managed, expanded in capacity, and extended in functionality]
Example #1: Router (Share, Manage, Scale Up, Extend)

Resource externalized: Network

[Diagram: applications on a server (Operating System, Memory, CPUs) reach the network through a Router Appliance that adds guaranteed message delivery, auto-encryption, a protocol gateway, and high-bandwidth WAN connections]
Example #2: Storage Area Network (Share, Manage, Scale Up, Extend)

Resource externalized: Storage

[Diagram: applications on a server (Operating System, Memory, CPUs) store data on a Storage Area Network (SAN) that adds flash as storage, disk mirroring, and need-based allocation]
Example #3: Java Compute Appliance (JCA) (Share, Manage, Scale Up, Extend)
Transparently bring the Application to the Resources

[Diagram: a Proxy JVM on the original deployed system (Operating System, Memory, CPUs) hands the Java Application off to the JCA, which runs it on an Appliance JVM over an optimized kernel with hundreds of GBs of memory and hundreds of CPUs]
Example #3: Java Compute Appliance (JCA): Complete Java Application / Deployed Platform separation

• JVM: Decouples a Java Application from the OS
─ Decoupled from local hardware (& any Hypervisor)
─ Decoupled from connected appliances
─ Decoupled from Middleware
─ Last remaining resource connections are Memory and CPUs
• Move the Java Application to its Computing Resources
─ Decouple from the original deployment platform entirely
─ Transparently redeploy on a Java Compute Appliance (JCA)
─ Use Appliance Memory and CPUs
─ Same appliance advantages apply: Share, Centrally Manage, Expand, Optimize / Extend
─ And some other ones as well (Stability)
An integrated Java Compute Appliance
[Diagram: the Vega 3 appliance (up to 864 CPU cores, 768 GB of memory, on-chip hardware extensions, the Azul Thread Execution Kernel (AzTEK)), hosting multiple Mission-Critical Java Applications, each on its own Azul VM]
#1. The GC Pause Scalability Barrier: Make the problem part of the solution

• Problem
• Solution: Maximum usable Memory Limit removed
─ Scale from 1 to 100 GB heap
─ Constant response latencies of 1-3 msec
─ No change to existing Java code
Impact of Garbage Collection (Actual Financial Service Trading system under load)

[Chart: "Java Pause Time Comparison, Derivatives Trading Application" - pause time in seconds (0 to 30) per GC iteration; the native pause times spike into the tens of seconds, while the Azul pause times trace is a flat line near zero]

Performance Impact / Complexity Impact

Native Configuration:
-Xms2g -Xmx2g -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:TargetSurvivorRatio=80 -XX:CMSInitiatingOccupancyFraction=85
-XX:SurvivorRatio=8 -XX:MaxNewSize=320m -XX:NewSize=320m
-XX:MaxTenuringThreshold=10

Azul Configuration:
-Xms3g -Xmx3g
#2. The Large-Grained Lock Scalability Barrier: A two-tier integrated approach

• Serialized portions of a program severely limit scalability
• Amdahl's Law: Efficiency = 1 / (N * ((1 - P) + P/N))
(N = # of concurrent threads, P = run-time fraction of parallelizable code)
─ At 4 threads: 5% serialized code = 87%+ efficiency
─ At 400 threads: 5% serialized code = <5% efficiency!
• Solution
─ Automated: Optimistic Thread Concurrency (OTC)
─ Manual: Real Time Performance Monitoring (RTPM)
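The two efficiency figures above follow directly from the formula; a quick check in Java:

```java
// Amdahl's Law, efficiency form: Efficiency = 1 / (N * ((1 - P) + P/N)),
// with P the parallelizable fraction and N the number of threads.
class Amdahl {
    static double efficiency(double p, int n) {
        return 1.0 / (n * ((1.0 - p) + p / n));
    }

    public static void main(String[] args) {
        // P = 0.95, i.e. 5% serialized code:
        System.out.printf("N = 4:   %.1f%%%n", 100 * efficiency(0.95, 4));   // ~87%
        System.out.printf("N = 400: %.1f%%%n", 100 * efficiency(0.95, 400)); // ~4.8%
    }
}
```

So on a 400-way machine, 95% of each core's capacity is wasted waiting on the 5% of serialized code, which is what the next two slides attack.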
Optimistic Thread Concurrency (OTC)
• Strategy: Assume no data contention
• How it Works
─ Java synchronized block: similar to a DB transaction
─ The block is transactional around synchronized {…}
─ Transparent roll-back if an object element is impacted
─ JVM dynamic lock levels (Speculative, Thick)
─ Runtime-profile based
• Where it Works
─ Thread instances access different variables
─ Thread instances access the same variables for read
─ Hash table for a product database: 100 readers for every writer
• When it Works
─ Parallel execution of all threads in the same synchronized method
─ Competition for actual data elements, not the lock
─ Amdahl's law: Efficiency reflects actual data (not lock) contention times
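The "where it works" case can be made concrete. The sketch below (names are illustrative) is the classic read-mostly synchronized hash table: under OTC the JVM speculatively executes these critical sections in parallel and rolls back only when a writer actually conflicts, with no source changes required.

```java
import java.util.HashMap;
import java.util.Map;

// Read-mostly product table, ~100 readers per writer. The code keeps its
// ordinary synchronized blocks; OTC treats each block as a transaction.
class ProductTable {
    private final Map<String, Double> products = new HashMap<>();

    public Double lookup(String id) {
        synchronized (products) {    // readers dominate, so speculative
            return products.get(id); // execution almost always commits
        }
    }

    public void update(String id, double price) {
        synchronized (products) {    // the rare writer forces roll-back and
            products.put(id, price); // retry of concurrently speculating readers
        }
    }
}
```

This is the key contrast with the manual ReadWriteLock refactor shown earlier: the source stays coarse-grained, and the runtime supplies the fine-grained behavior.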
+ Real Time Performance Monitoring (RTPM)
• JVM-assisted deep visibility into Application Performance
─ Threads (list / states, trace, lock contention details, CPU usage)
─ GC (cycle phase results, min/max pauses, memory used / freed)
─ Memory (detailed live-objects breakdown, updated every GC cycle)
─ Socket IO (open connections, quantity of data, associated latency)
• Performance Bottleneck and Problem Detection
─ Multi-core processing & concurrency
─ Memory demands and memory leaks
─ Multithreading race conditions (**)
• Zero Overhead
─ Monitoring won't impact the application being monitored
─ No disturbance to the production environment
• Real Time / Always On
─ Allows identification of performance problems as they happen
─ No application restarts
A Java Compute Appliance gives Java Applications Room to Scale
TRADITIONAL → JAVA COMPUTE APPLIANCE
• Garbage collection pauses → Pauseless GC
• 2 GB heaps → Up to 670 GB heaps
• Instabilities due to resource limitations → No resource-related restarts
• Over-provisioning and server sprawl → Server consolidation
• Lock contention → OTC / RTPM
• 2-4 CPUs → 100s of CPUs
JCA Product Proof Point: Winner of the largest single-instance JVM benchmark

[Chart: SPECjbb2005 single-instance JVM scores (0 to 1,600,000) comparing the 7380, E25K, Itanium RX6600, T5220, PowerEdge 2950 and P570, with price annotations of $4.5M vs. $0.75M]
Breaking through the Java Platform Scalability Barrier: Distributed Global Memory Heap vs. Appliance-enhanced JVM

Scale: Shared Global Memory
Strategy: Separate Application Instances on Commodity Servers provided with a local dynamically updated cache of shared global memory
Advantages:
• Usable JVM memory limits transparently extended via a local cache of a larger global memory
• Easy CPU scalability via addition of commodity servers
Issues:
• Not all data elements can be shared (e.g. hash keys)
• Cache misses can cause widely varying response latencies
• Performance depends on data usage (reads >> writes is good)
• Multiple points of partial failure
• Global data locks tough to scale

Scale: External Java Appliance
Strategy: Single Application Instance on an Appliance provided with massive amounts of usable CPUs and memory
Advantages:
• Transparent JVM scalability to full utilization of all memory and CPU resources on the JCA
• No "resource limit" crashes
• Predictably low response latencies
Issues:
• Not applicable to all Java apps: heavy JNI use, "chatty" DB applications, single-threaded
• IT objections to a new hardware configuration
• SaaS, not Cloud
Summary
• Java barriers to scalability are becoming more painful:
─ Memory utilization limited by GC pauses
─ CPU utilization limited by coarse-grained thread locks ("synchronized")
• Obvious workarounds take you only so far
• Standard "scale out" & "scale up" strategies have drawbacks
─ Server sprawl, code modifications, partial failures, ...
• Additional (and transparent) scalability solutions are possible for "Managed Environments"
─ Shared Global Memory
─ External Java Compute Appliance
• No one answer is right in all cases
References
• Azul Engineer to Engineer Technical Site
─ http://www.azulsystems.com/e2e/
• VMS Fortran File Open Options
─ http://www.astro.virginia.edu/class/oconnell/astr511/idl_5.1_html/idl130.htm
• My email address
─ [email protected]
Breaking through the Java Platform Scalability Barrier: Transparent App Redeployment on a Java Compute Appliance

Scale: Horizontal
Strategy: Separate Application Instances on Commodity Servers
Strengths:
• Easy expansion via addition of homogeneous commodity servers
• "Cloud-izable"
Issues:
• Refactor Data
• Recode Application
• Peak load swings can exceed resource limits (partial crashes, load management)
• Over-provisioning (server sprawl)

Scale: Vertical
Strategy: Separate Application Instances on Virtual Servers
Strengths:
• Hypervisor provides better resource utilization
• Reduces server sprawl; easier to administer
Issues:
• Same Java Platform limitations within each instance: refactor data, recode application
• Peak load swings can still exceed JVM memory capacity
• Cloud orthogonal

Scale: External Appliance
Strategy: Redeploy a single instance of the Application on a specialized JCA
Strengths:
• Enough usable memory & CPUs to operate as before
• No "resource limit" crashes
• Scalability is transparent
Issues:
• Not applicable to all Java Applications: heavy JNI use, "chatty" DB applications, single-threaded
• IT objection to a new hardware configuration
• SaaS, not Cloud
Other Advantage of an Integrated Appliance: Low-level hooks into Kernel / Hardware

• Compute Pool Manager
─ Central view of Appliance resources
─ Policy-based management
─ Establish resource guarantees
─ Set application resources (min, max, redundancy, etc.)
• Real Time Performance Monitor
─ Zero-cost Java application probes
─ Extensive memory & thread usage info
─ Isolate problems even in Production
Java Compute Appliance Summary Value Proposition: Share, Manage, Scale Up, Extend

• Large (100s of GB) heap support / no user-visible pauses
─ Reduce maximum response latency / jitter
─ Reduce total application instance count (fewer / larger instances)
─ End crashes due to hitting the memory limit under peak loading
─ Enable new design alternatives (e.g. cache the entire database in memory)
• Hardware-assisted Optimistic Thread Concurrency
─ LHF critical bottlenecks minimized
• JVM-assisted Real Time Performance Monitor
─ Critical bottlenecks discovered
• No new APIs required (ex: Real Time Java)
─ Nor any code changes for tuning / performance
• No changes to application deployment procedures