
Virtualizing and Tuning Large Scale Java Platforms


DESCRIPTION

Speakers: Emad Benjamin and Guillermo Tantachuco. The session covers various GC tuning techniques, with a particular focus on tuning large-scale JVM deployments. Come to this session to learn a GC tuning recipe that can give you the best configuration for latency-sensitive applications. While most enterprise-class Java workloads fit into a scaled-out set of JVM instances with heaps of less than 4GB, there are workloads in the in-memory database space that require fairly large JVMs. This session takes a deep dive into the issues and the optimal configurations for tuning large JVMs in the range of 4GB to 128GB. The GC tuning recipe shared is a refinement of 15 years of GC engagements, adapted in recent years for tuning some of the largest JVMs in the industry using plain HotSpot and the CMS GC policy. You should walk away with the ability to commence a decent GC tuning exercise on your own. The session summarizes the techniques and the JVM options needed to accomplish this task. Naturally, when tuning large-scale JVM platforms, the underlying hardware cannot be ignored, so the session takes a detour from traditional GC tuning talks and dives into how to optimally size a platform for efficient memory consumption. Lastly, the session covers the vFabric reference architecture, for which a comprehensive performance study was done.


Page 1: Virtualizing and Tuning Large Scale Java Platforms

© 2013 SpringOne 2GX. All rights reserved. Do not distribute without permission.

Virtualizing and Tuning Large Scale Java Platforms
By Guillermo Tantachuco & Emad Benjamin

Page 2: Virtualizing and Tuning Large Scale Java Platforms

About the Speaker – Guillermo Tantachuco

Over 18 years of experience as a software engineer and architect

Has been at Pivotal and VMware for the past 3 years as a Sr. Field Engineer – received the 2012 President's Club Award!

At Pivotal, Guillermo works with customers to understand their business needs and challenges, and helps them seize new opportunities by leveraging Pivotal solutions to modernize their IT architecture

Guillermo is passionate about his family, business, technology and soccer.

Page 3: Virtualizing and Tuning Large Scale Java Platforms

About the Speaker – Emad Benjamin

I have been with VMware for the last 8 years, working on Java and vSphere – received the 2011 VMware President's Club Award.

20 years of experience as a Software Engineer/Architect, with the last 15 years focused on Java development

Open source contributions

Prior work with Cisco, Oracle, and banking/trading systems

Authored the following books

Page 4: Virtualizing and Tuning Large Scale Java Platforms


Agenda

Overview

Design and Sizing Java Platforms

Performance

Best Practices and Tuning ( & GC Tuning )

Tuning Demo

Customer Success Stories

Questions

Page 5: Virtualizing and Tuning Large Scale Java Platforms


Java Platforms Overview

Page 6: Virtualizing and Tuning Large Scale Java Platforms

Conventional Java Platforms

Java platforms are multi-tier and multi-organization

Tiers: Load Balancer Tier (load balancers) -> Web Server Tier (web servers) -> Java App Tier (Java applications) -> DB Server Tier (DB servers)

Organizational key stakeholder departments:
• Load Balancer Tier: IT Operations – Network Team
• Web Server Tier: IT Operations – Server Team
• Java App Tier: IT Apps – Java Dev Team
• DB Server Tier: IT Ops & Apps Dev Team

Page 7: Virtualizing and Tuning Large Scale Java Platforms

Middleware Platform Architecture on vSphere

SHARED, ALWAYS-ON INFRASTRUCTURE: VMware vSphere

SHARED INFRASTRUCTURE SERVICES: Capacity on Demand, High Availability, Dynamic

APPLICATION SERVICES: Load Balancers (as VMs), Web Servers, Java Application Servers, DB Servers

The result: high-uptime, scalable, and dynamic enterprise Java applications

Page 8: Virtualizing and Tuning Large Scale Java Platforms


Java Platforms Design and Sizing

Page 9: Virtualizing and Tuning Large Scale Java Platforms

Design and Sizing of Java Platforms on vSphere

Step 1 – Establish Load Profile

From production logs/monitoring reports, measure:
• Concurrent Users
• Requests Per Second
• Peak Response Time
• Average Response Time

Establish your response time SLA

Step 2 – Establish Benchmark

Iterate through benchmark tests until you are satisfied with the load profile metrics and your intended SLA. After each benchmark iteration you may have to adjust the application configuration, and adjust the vSphere environment to scale out/up, in order to arrive at your desired number of VMs, number of vCPUs, and RAM configuration

Step 3 – Size Production Environment

The size of the production environment will have been established in Step 2; either roll out the environment from Step 2, or build a new one based on the numbers established

Page 10: Virtualizing and Tuning Large Scale Java Platforms

Step 2 – Establish Benchmark

ESTABLISH BUILDING BLOCK VM (Scale Up Test) – establish vertical scalability:
• How many JVMs should run on one VM?
• How large should the VM be in terms of vCPU and memory?

DETERMINE HOW MANY VMs (Scale Out Test) – establish horizontal scalability:
• How many building block VMs do you need to meet your response time SLAs without exceeding 70%-80% CPU saturation?
• Establish your horizontal scalability factor before bottlenecks appear in your application

Scale out test flow:
• SLA OK? If yes, the test is complete
• If not, investigate the bottlenecked layer: network, storage, application configuration, or vSphere
• If the scale-out bottleneck is removed, iterate the scale out test
• If it is a building block application/VM configuration problem, adjust and iterate

Page 11: Virtualizing and Tuning Large Scale Java Platforms

Design and Sizing of HotSpot JVMs on vSphere

VM Memory contains:
• Guest OS Memory
• JVM Memory, which contains:
  • JVM Max Heap (-Xmx) – the "Heap"; Initial Heap is set with -Xms
  • Perm Gen (-XX:MaxPermSize)
  • Java stacks (-Xss per thread)
  • Other memory: direct native memory ("off-the-heap") and non-direct memory

Page 12: Virtualizing and Tuning Large Scale Java Platforms

Design and Sizing of HotSpot JVMs on vSphere

Guest OS Memory: approx 1GB (depends on OS/other processes)

Perm Size is an area additional to the -Xmx (Max Heap) value and is not GC-ed, because it contains class-level information

"Other mem" is additional memory required for NIO buffers, the JIT code cache, classloaders, socket buffers (receive/send), JNI, and GC internal info

VM Memory = Guest OS Memory + JVM Memory

JVM Memory = JVM Max Heap (-Xmx value) + JVM Perm Size (-XX:MaxPermSize) + NumberOfConcurrentThreads * (-Xss) + "other mem"

If you have multiple JVMs (N JVMs) on a VM, then: VM Memory = Guest OS Memory + N * JVM Memory

Page 13: Virtualizing and Tuning Large Scale Java Platforms

Sizing Example

JVM Max Heap (-Xmx): 4096m; Initial Heap (-Xms): 4096m
Perm Gen (-XX:MaxPermSize): 256m
Java stacks (-Xss): 256k * 100 threads
Other mem: ~217m

JVM Memory = 4588m
Guest OS Memory: ~500m used by the OS
VM Memory = 5088m; set the memory reservation to 5088m
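
As a sketch, the JVM flags that realize this example could look like the following (the main class name com.example.MyAppServer is a hypothetical placeholder; the thread count is a property of the application, not a flag):

java -Xms4096m -Xmx4096m -XX:MaxPermSize=256m -Xss256k com.example.MyAppServer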

Page 14: Virtualizing and Tuning Large Scale Java Platforms

Larger JVMs for In-Memory Data Management Systems

JVM Max Heap (-Xmx): 30g; Initial Heap (-Xms): 30g
Perm Gen (-XX:MaxPermSize): 0.5g
Java stacks (-Xss): 1M * 500 threads
Other mem: ~1g

JVM Memory for SQLFire = 32g
Guest OS Memory: 0.5-1g used by the OS
VM Memory for SQLFire = 34g; set the memory reservation to 34g

Page 15: Virtualizing and Tuning Large Scale Java Platforms

NUMA Local Memory with Overhead Adjustment

NUMA local memory =
  (Physical RAM on vSphere host
   - (Number of VMs on vSphere host * 1% RAM overhead)
   - vSphere RAM overhead)
  / Number of sockets on vSphere host

Page 16: Virtualizing and Tuning Large Scale Java Platforms

Middleware ESXi Cluster

Each host: 96GB RAM, 2 sockets, 8 pCPU per socket

Memory available for all VMs = 96 * 0.98 - 1GB => 94GB
Per-NUMA-node memory => 94/2 => 47GB

Middleware component VMs: 47GB RAM with 8 vCPU, one per NUMA node

Locator/heartbeat VM for the middleware: DO NOT vMotion

Page 17: Virtualizing and Tuning Large Scale Java Platforms

96GB RAM on the server; each NUMA node has 94/2 => 47GB

The ESX scheduler places 8-vCPU VMs with less than 47GB RAM each onto a single NUMA node

If a VM is sized greater than 47GB or 8 CPUs, then NUMA interleaving occurs and can cause a 30% drop in memory throughput performance

Page 18: Virtualizing and Tuning Large Scale Java Platforms

128GB RAM on the server

The ESXi scheduler places 2-vCPU VMs with less than 20GB RAM each within a single NUMA node

A 4-vCPU VM with 40GB RAM is split by ESXi into 2 NUMA clients (available in ESX 4.1+)

Page 19: Virtualizing and Tuning Large Scale Java Platforms

Java Platform Categories – Category 1

Category 1: 100s to 1000s of JVMs

Smaller JVMs: < 4GB heap, 4.5GB Java process, and 5GB for the VM

vSphere hosts with < 96GB RAM are more suitable: by the time you stack the many JVM instances, you are likely to reach the CPU boundary before you can consume all of the RAM. For example, if you instead chose a vSphere host with 256GB RAM, then 256/4.5GB => 57 JVMs, which would clearly hit the CPU boundary

Multiple JVMs per VM

Use resource pools to manage LOBs (e.g. Resource Pool 1: Gold LOB 1; Resource Pool 2: Silver LOB 2)

Use 4-socket servers to get more cores

Page 20: Virtualizing and Tuning Large Scale Java Platforms

Most Common Sizing and Configuration Question

Option 1 – Scale out VM and JVM (best): four 2-vCPU VMs, each with one 2GB-heap JVM (JVM-1, JVM-2, JVM-3, JVM-4)

Option 2 – Scale up JVM heap size (2nd best): two 2-vCPU VMs, each with one 4GB-heap JVM (JVM-1A, JVM-2A)

Option 3 – Scale up VM and JVM (3rd best): fewer, larger VMs, each running multiple JVMs (JVM-1, JVM-2)

Page 21: Virtualizing and Tuning Large Scale Java Platforms

What Else to Consider When Sizing?

Vertical and horizontal placement of mixed workloads (JVM-1 through JVM-4, each running Job and Web components)

Mixed workloads such as a job scheduler vs. a web app require different GC tuning:
• Job schedulers care about throughput
• Web apps care about minimizing latency and response time

You can't have both reduced response time and increased throughput without compromise – it is best to separate the concerns for optimal tuning

Page 22: Virtualizing and Tuning Large Scale Java Platforms

Java Platform Categories – Category 2

Category 2: a dozen very large JVMs

Fewer JVMs, < 20

Very large JVMs, 32GB to 128GB

Always deploy 1 VM per NUMA node and size it to fit perfectly

1 JVM per VM

Choose 2-socket vSphere hosts, and install ample memory, 128GB to 512GB

Examples are in-memory databases, like SQLFire and GemFire

Apply latency-sensitive best practices: disable interrupt coalescing on the pNIC and vNIC, and use a dedicated vSphere cluster

Use 2-socket servers to get larger NUMA nodes

Page 23: Virtualizing and Tuning Large Scale Java Platforms

Java Platform Categories – Category 3

Category 3: Category-1 workloads accessing data from Category-2 (e.g. Resource Pool 1: Gold LOB 1; Resource Pool 2: Silver LOB 2)

Page 24: Virtualizing and Tuning Large Scale Java Platforms


Java Platforms Performance

Page 25: Virtualizing and Tuning Large Scale Java Platforms

Performance Perspective

See the Performance of Enterprise Java Applications on VMware vSphere 4.1 and SpringSource tc Server study at http://www.vmware.com/resources/techresources/10158.

Page 26: Virtualizing and Tuning Large Scale Java Platforms

Performance Perspective

See the Performance of Enterprise Java Applications on VMware vSphere 4.1 and SpringSource tc Server study at http://www.vmware.com/resources/techresources/10158.

[Chart: % CPU and response time (R/T) vs. load, with the 80% CPU threshold marked]

Page 27: Virtualizing and Tuning Large Scale Java Platforms

SQLFire vs. Traditional RDBMS

SQLFire scaled 4x compared to the RDBMS

Response times of SQLFire are 5x to 30x faster than the RDBMS

Response times on SQLFire remain stable and constant with increased load

RDBMS response times increase with increased load

Page 28: Virtualizing and Tuning Large Scale Java Platforms

Load Testing SpringTrader Using Client-Server Topology

SpringTrader Application Tier: 4 Application Services (SpringTrader Application Service, Integration Services / Integration Patterns)

SpringTrader Data Tier: SQLFire Member 1 and SQLFire Member 2, with redundant locators

Page 29: Virtualizing and Tuning Large Scale Java Platforms

vFabric Reference Architecture Scalability Test

[Chart: scaling from 1 Application Services VM (left axis, 0-4.0x) and number of users (right axis, 0-12,000) vs. number of Application Services VMs (1-4)]

Maximum passing users and scaling with this topology: 10,400 user sessions, or 3,300 txns per second

Page 30: Virtualizing and Tuning Large Scale Java Platforms

10k Users Load Test Response Time

[Chart: operation 90th-percentile response time in seconds (0-7) vs. number of users (0-12,000), with four Application Services VMs; operations: HomePage, Register, Login, DashboardTab, PortfolioTab, TradeTab, GetHoldingsPage, GetOrdersPage, SellOrder, GetQuote, BuyOrder, Logout, MarketSummary]

At 10,400 user sessions: approx. 0.25 seconds response time

Page 31: Virtualizing and Tuning Large Scale Java Platforms


Java Platforms Best Practices and Tuning

Page 32: Virtualizing and Tuning Large Scale Java Platforms

Most Common VM Size for Java Workloads

2-vCPU VM with 1 JVM, for tier-1 production workloads

Maintain this ratio as you scale out or scale up, i.e. 1 JVM : 2 vCPU

Scale-out is preferred over scale-up, but both can work

You can diverge from this ratio for less critical workloads

Building block: 2-vCPU VM, 1 JVM (-Xmx4096m), approx 5GB RAM reservation

Page 33: Virtualizing and Tuning Large Scale Java Platforms

However, for Large JVMs + CMS

For large JVMs, start with a 4+ vCPU VM with 1 JVM (8-128GB heap), for tier-1 in-memory data management system types of production workloads

Likely increase the JVM size, instead of launching a second JVM instance

A 4+ vCPU VM allows ParallelGCThreads to be allocated 50% of the available vCPUs to the JVM, i.e. 2+ GC threads

The ability to increase ParallelGCThreads is critical to YoungGen scalability for large JVMs

ParallelGCThreads should be allocated 50% of the vCPUs available to the JVM, and not more; you want to ensure other vCPUs remain available for other transactions
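
As a worked instance of the 50% rule (the heap size and class name are illustrative assumptions): on an 8-vCPU VM running one large JVM, allocate 4 GC threads and leave the remaining vCPUs for application transactions:

java -Xms64g -Xmx64g -XX:ParallelGCThreads=4 com.example.DataGridServer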

Page 34: Virtualizing and Tuning Large Scale Java Platforms

Which GC?

ESX doesn't care which GC you select, because of the degree of independence of Java from the OS, and of the OS from the hypervisor

Page 35: Virtualizing and Tuning Large Scale Java Platforms

GC Policy Types

GC Policy Type: Serial GC
• Mark, sweep and compact algorithm
• Both minor and full GC are stop-the-world
• Stop-the-world GC means the application is stopped while GC is executing
• Not a very scalable algorithm
• Suited for smaller (<200MB) JVMs, like client machines

GC Policy Type: Throughput GC
• Parallel GC
• Similar to Serial GC, but uses multiple worker threads in parallel to increase throughput
• Both Young and Old Generation collections are multi-threaded, but still stop-the-world
• Number of threads allocated by -XX:ParallelGCThreads=<nThreads>
• NOT concurrent, meaning when the GC worker threads run, they pause your application threads. If this is a problem, move to CMS, where GC threads are concurrent.

Page 36: Virtualizing and Tuning Large Scale Java Platforms

GC Policy Types

GC Policy Type: Concurrent GC (CMS)
• Concurrent Mark and Sweep, no compaction
• Concurrent implies that when GC is running it doesn't pause your application threads – this is the key difference from throughput/parallel GC
• Suited for applications that care more about response time than throughput
• CMS does use more heap when compared to throughput/Parallel GC
• CMS works on the OLD gen concurrently, but the young generation is collected using ParNewGC, a parallel version of the throughput collector
• Has multiple phases:
  • Initial mark (short pause)
  • Concurrent mark (no pause)
  • Pre-cleaning (no pause)
  • Re-mark (short pause)
  • Concurrent sweeping (no pause)

GC Policy Type: G1
• Only in Java 7 and mostly experimental; equivalent to CMS + compaction
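
As a sketch of selecting the throughput collector for a small JVM (heap sizes are illustrative; the class name is a hypothetical placeholder):

java -Xms4096m -Xmx4096m -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:ParallelGCThreads=2 com.example.BatchJobRunner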

Page 37: Virtualizing and Tuning Large Scale Java Platforms

Tuning GC – Art Meets Science!

Either you tune for throughput or for latency, one at the cost of the other. Tuning decisions:

Reduce latency (Web):
• improved R/T
• reduced latency impact
• slightly reduced throughput

Increase throughput (Job):
• improved throughput
• longer R/T
• increased latency impact

Page 38: Virtualizing and Tuning Large Scale Java Platforms

Parallel Young Gen and CMS Old Gen

Thread types: application threads, minor GC threads, concurrent mark and sweep GC threads

Young Generation (-Xmn), with survivor spaces S0 and S1: minor GC is parallel, using -XX:+UseParNewGC and -XX:ParallelGCThreads

Old Generation (-Xmx minus -Xmn): major GC is concurrent, using -XX:+UseConcMarkSweepGC

Page 39: Virtualizing and Tuning Large Scale Java Platforms

What to measure when tuning GC?

Young Gen minor GC: duration and frequency

Old Gen major GC: duration and frequency
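
One common way to capture these measurements – standard HotSpot GC logging, not specific to this deck – is to run with the following options and read the pause durations and timestamps from the log (the class name is a hypothetical placeholder):

java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xloggc:gc.log com.example.MyAppServer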

Page 40: Virtualizing and Tuning Large Scale Java Platforms

Why are Duration and Frequency of GC Important?

Young Gen minor GC: duration and frequency
Old Gen major GC: duration and frequency

We want to ensure regular application user threads get a chance to execute in between GC activity

Page 41: Virtualizing and Tuning Large Scale Java Platforms

Further GC Tuning Considerations

General approach to investigating latency:
• Determine minor GC duration
• Determine minor GC frequency
• Determine worst full GC duration
• Determine worst full GC frequency

Minor GC measurements drive Young Generation refinements

Full GC measurements drive Old Generation refinements

The decision to switch to -XX:+UseConcMarkSweepGC:
• If the throughput collector's worst-case Full GC duration/frequency is not tolerable compared with the application's latency requirements

Page 42: Virtualizing and Tuning Large Scale Java Platforms

High Level GC Tuning Recipe

Step A – Young Gen Tuning: measure minor GC duration and frequency; adjust -Xmn (Young Gen size) and/or ParallelGCThreads

Step B – Old Gen Tuning: measure major GC duration and frequency; adjust heap space (-Xmx), or adjust CMSInitiatingOccupancyFraction

Step C – Survivor Spaces Tuning: adjust -Xmn and/or the survivor spaces

Page 43: Virtualizing and Tuning Large Scale Java Platforms

Impact of Increasing Young Generation (-Xmn)

Less frequent minor GC, but longer duration; potentially increased major GC frequency

You can mitigate the increase in major GC frequency by increasing -Xmx

You can mitigate the increase in minor GC duration by increasing ParallelGCThreads

Page 44: Virtualizing and Tuning Large Scale Java Platforms

Impact of Reducing Young Generation (-Xmn)

More frequent minor GC, but shorter duration; potentially increased major GC duration

You can mitigate the increase in major GC duration by decreasing -Xmx

Page 45: Virtualizing and Tuning Large Scale Java Platforms

Survivor Spaces

Survivor Space Size = -Xmn / (-XX:SurvivorRatio + 2)

• Decreasing the survivor ratio causes an increase in survivor space size
• An increase in survivor space size causes the Eden space to be reduced, hence:
  • Minor GC frequency will increase
  • More frequent minor GC causes objects to age quicker
• Use -XX:+PrintTenuringDistribution to measure how effectively objects age in the survivor spaces
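
As a worked example using the heap from the next slide: with -Xmn of 1350m and -XX:SurvivorRatio=8, each survivor space is 1350m / (8 + 2) = 135m, leaving 1350m - 2*135m = 1080m for Eden.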

Page 46: Virtualizing and Tuning Large Scale Java Platforms

Sizing The Java Heap

JVM Max Heap (-Xmx): 4096m, divided into:
• YoungGen (-Xmn, 1350m): Eden space plus Survivor Space 1 and Survivor Space 2 – quick minor GC
• OldGen (2746m, i.e. -Xmx minus -Xmn): slower full GC

Page 47: Virtualizing and Tuning Large Scale Java Platforms

Decrease Survivor Spaces by Increasing Survivor Ratio

Increasing -XX:SurvivorRatio reduces the survivor space size (S0, S1), which enlarges Eden; hence minor GC frequency is reduced, with a slight increase in minor GC duration

Page 48: Virtualizing and Tuning Large Scale Java Platforms

Increasing Survivor Ratio Impact on Old Generation

With smaller survivor spaces (S0, S1), there is increased tenuring/promotion to the old gen, and hence increased major GC activity

Page 49: Virtualizing and Tuning Large Scale Java Platforms

Why are Duration and Frequency of GC Important?

Young Gen minor GC: duration and frequency
Old Gen major GC: duration and frequency

We want to ensure regular application user threads get a chance to execute in between GC activity

Page 50: Virtualizing and Tuning Large Scale Java Platforms

CMS Collector Example

java -Xms30g -Xmx30g -Xmn10g -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
     -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
     -XX:+ScavengeBeforeFullGC -XX:TargetSurvivorRatio=80 -XX:SurvivorRatio=8
     -XX:+UseBiasedLocking -XX:MaxTenuringThreshold=15 -XX:ParallelGCThreads=4
     -XX:+UseCompressedOops -XX:+OptimizeStringConcat -XX:+UseCompressedStrings
     -XX:+UseStringCache

This JVM configuration scales up and down effectively

-Xmx = -Xms, and -Xmn is 33% of -Xmx

-XX:ParallelGCThreads = minimum 2, but less than 50% of the vCPUs available to the JVM. NOTE: ideally use it on 4+ vCPU VMs; if used on 2-vCPU VMs, drop the -XX:ParallelGCThreads option and let Java select it

Page 51: Virtualizing and Tuning Large Scale Java Platforms

IBM JVM – GC Choice

-Xgcpolicy:Optthruput (default)
Description: performs the mark and sweep operations during garbage collection while the application is paused, to maximize application throughput. Mostly not suitable for multi-CPU machines.
Usage example: apps that demand high throughput but are not very sensitive to the occasional long garbage collection pause (e.g. job schedulers)

-Xgcpolicy:Optavgpause
Description: performs the mark and sweep concurrently while the application is running, to minimize pause times; this provides the best application response times. There is still a stop-the-world GC, but the pause is significantly shorter. After GC, the app threads help out and sweep objects (concurrent sweep).
Usage example: apps sensitive to long latencies, e.g. transaction-based systems where response times are expected to be stable (e.g. web apps)

-Xgcpolicy:Gencon
Description: treats short-lived and long-lived objects differently, to provide a combination of lower pause times and high application throughput. Before the heap fills up, each app thread helps out and marks objects (concurrent mark).
Usage example: latency-sensitive apps where objects in the transaction don't survive beyond the transaction commit (e.g. web apps)
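
A minimal sketch of selecting one of these policies on an IBM JVM (heap sizes and class name are illustrative assumptions; the policy name is typically given in lowercase on the command line):

java -Xms4g -Xmx4g -Xgcpolicy:gencon com.example.MyAppServer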

Page 52: Virtualizing and Tuning Large Scale Java Platforms


Demo

Page 53: Virtualizing and Tuning Large Scale Java Platforms

Load Testing SpringTrader Using Client-Server Topology

Demo setup: jMeter drives load against SpringTrader, monitored with jConsole

Page 54: Virtualizing and Tuning Large Scale Java Platforms

Results

Executing jMeter scripts 10 times – 5,000 samples per jMeter run

1st scenario:
  10M  Latency:     61  32  20  18  13  16  18  17  20  16   (average 23.1)
  10M  Throughput: 270 411 491 562 541 540 507 537 486 511   (average 485.6)
  164M Latency:     45  10   7   5   6   6   6   6   7   6   (average 10.4; 54.98% better)
  164M Throughput: 345 521 597 605 541 548 530 544 558 545   (average 533.4; 9.84% better)

2nd scenario:
  164M Latency:     64  12   6   5   6   5   5   6   5   6   (average 12; 49.79% better)
  164M Throughput: 248 570 612 626 583 614 632 617 590 595   (average 568.7; 13.02% better)
  10M  Latency:     56  33  22  18  18  18  17  17  19  21   (average 23.9)
  10M  Throughput: 297 416 515 526 554 560 555 559 544 506   (average 503.2)

Page 55: Virtualizing and Tuning Large Scale Java Platforms

Results

[Charts: latency and throughput per run (1-10) for the 10M vs. 164M configurations]

The 164M configuration shows 55% better R/T and 10% better throughput than the 10M configuration

Page 56: Virtualizing and Tuning Large Scale Java Platforms

Middleware on VMware – Best Practices

Enterprise Java Applications on VMware Best Practices Guide
http://www.vmware.com/resources/techresources/1087

Best Practices for Performance Tuning of Latency-Sensitive Workloads in vSphere VMs
http://www.vmware.com/resources/techresources/10220

vFabric SQLFire Best Practices Guide
http://www.vmware.com/resources/techresources/10327

vFabric Reference Architecture
http://tinyurl.com/cjkvftt

Page 57: Virtualizing and Tuning Large Scale Java Platforms

Middleware on VMware – Best Practices Summary

Follow the design and sizing examples we discussed thus far

Set appropriate memory reservations

Leave HT enabled; size based on vCPU = 1.25 pCPU if needed

RHEL 6 and SLES 11 SP1 have a tickless kernel that does not rely on a high-frequency interrupt-based timer, and is therefore much friendlier to virtualized latency-sensitive workloads

Do not overcommit memory

Locator/heartbeat processes should not be vMotion® migrated; doing so could lead to network split-brain problems

vMotion over 10Gbps when doing scheduled maintenance

Use affinity and anti-affinity rules to avoid redundant copies on the same VMware ESX®/ESXi host

Page 58: Virtualizing and Tuning Large Scale Java Platforms

Middleware on VMware – Best Practices

Disable NIC interrupt coalescing on the physical and virtual NIC; this is extremely helpful in reducing latency for latency-sensitive virtual machines

Disable virtual interrupt coalescing for VMXNET3
• It can lead to some performance penalties for other virtual machines on the ESXi host, as well as higher CPU utilization to deal with the higher rate of interrupts from the physical NIC

This implies it is best to use a dedicated ESX cluster for middleware platforms
• All hosts are configured the same way for latency sensitivity, and this ensures non-middleware workloads, such as other enterprise applications, are not negatively impacted
• This is applicable to Category 2 workloads

Page 59: Virtualizing and Tuning Large Scale Java Platforms

Middleware on VMware – Benefits

Flexibility to change compute resources, VM sizes, and add more hosts

Ability to apply hardware and OS patches while minimizing downtime

Create a more manageable system through reduced middleware sprawl

Ability to tune the entire stack within one platform

Ability to monitor the entire stack within one platform

Ability to handle seasonal workloads, committing resources when they are needed and then removing them when not needed

Page 60: Virtualizing and Tuning Large Scale Java Platforms


Customer Success Stories

Page 62: Virtualizing and Tuning Large Scale Java Platforms

Cardinal Health Virtualization Journey

2005 – 2008: Centralized IT Shared Service
• Consolidation: <40% virtual; <2,000 VMs; <2,355 physical
• Data center optimization: 30 DCs to 2 DCs; transition to blades
• <10% utilization; <10:1 VM/physical ratio
• Low-criticality systems; 8x5 applications

2009 – 2011: Capital Intensive – High Response
• Internal cloud: >58% virtual; >3,852 VMs; <3,049 physical
• Power remediation; P2Vs on refresh; HW commoditization
• 15% utilization; 30:1 VM/physical ratio
• Business-critical systems: SAP ~382, WebSphere ~290

2012 – 2015: Variable Cost Subscription Services
• Cloud resources: >90% virtual; >8,000 VMs; <800 physical
• Optimizing DCs; internal disaster recovery; metered service offerings (SaaS, PaaS, IaaS)
• Shrinking HW footprint; >50% utilization; >60:1 VM/physical ratio
• Heavy-lifting systems: database servers

Page 63: Virtualizing and Tuning Large Scale Java Platforms

Why Virtualize WebSphere on VMware

DC strategy alignment
• Pooled resource capacity (~15% utilization)
• Elasticity – for changing workloads
• Unix to Linux
• Disaster recovery

Simplification and manageability
• One high-availability solution for thousands of VMs, instead of thousands of individual high-availability solutions
• Network & system management in the DMZ

Five-year cost savings ~ $6 million
• Hardware savings ~ $660K
• WAS licensing ~ $862K
• Unix to Linux ~ $3.7M
• DMZ ports ~ >$1M

Page 64: Virtualizing and Tuning Large Scale Java Platforms

Thank you – are there any questions?

Emad Benjamin, [email protected]

You can get the book here:

https://www.createspace.com/3632131

Page 66: Virtualizing and Tuning Large Scale Java Platforms

Why have Java developers chosen Spring?

Core model: DI, AOP, TX

J(2)EE usability

Testable, lightweight programming model

Application portability

Powerful service abstractions

Deployment flexibility

Page 67: Virtualizing and Tuning Large Scale Java Platforms

Spring

Around the core model: deploy to cloud or on premise; big, fast, flexible data (GemFire); web, integration, batch

Page 68: Virtualizing and Tuning Large Scale Java Platforms

Spring Stack

Spring Framework: DI, AOP, TX, JMS, JDBC, MVC, Testing, ORM, OXM, Scheduling, JMX, REST, Caching, Profiles, Expression, HATEOAS

Standards: JPA 2.0, JSF 2.0, JSR-250, JSR-330, JSR-303, JTA, JDBC 4.1, JMX 1.0+, Java EE 1.4+/SE5+

Runtimes: WebSphere 6.1+, WebLogic 9+, GlassFish 2.1+, Tomcat 5+

Clouds: OpenShift, Google App Eng., Heroku, AWS Beanstalk, Cloud Foundry

Portfolio: Spring Web Flow, Spring Security, Spring Batch, Spring Integration, Spring Security OAuth, Spring Social (Twitter, LinkedIn, Facebook), Spring Web Services, Spring AMQP

Spring Data: Redis, HBase, MongoDB, JDBC, JPA, QueryDSL, Neo4j, GemFire, Solr, Splunk

Spring for Apache Hadoop: HDFS, MapReduce, Hive, Pig, Cascading

Spring XD: SI/Batch

Page 69: Virtualizing and Tuning Large Scale Java Platforms

Learn More. Stay Connected.


Twitter: twitter.com/springsource

YouTube: youtube.com/user/SpringSourceDev

Google +: plus.google.com/+springframework

LinkedIn: springsource.org/linkedin

Facebook: facebook.com/groups/springsource