8/14/2019 Large Application Clusters http://slidepdf.com/reader/full/large-application-clusters 1/21

Large Application Clusters

About Dan

• Responsible for TurboTax Online performance and New Business Initiatives

• Previously responsible for turbotax.com, intuit.com, quickbooks.com, quicken.com, back-end order systems, content services, and many others

• The Intuit eStore cluster is the largest stateful JBoss cluster in the world

• Best Buy, American Airlines, Sony Online Entertainment, MTV, Rubbermaid, Oprah, Plantronics, JCrew, Neiman Marcus, FAA, US Army, Kodak, American Eagle Outfitters, Cabelas, Finish Line, Lexmark, Alcatel, Deutsche Post, B&Q, Abbott Labs, United Health Group, Sun Microsystems, GlaxoSmithKline, AT&T Wireless, Yellowpages, and many others

Agenda

• Intros and Expectations

• Overview: Typical 3 Tier Architecture

• What to do when your cluster starts getting huge

• Tweak, Tune, Reduce: Before and After 

• Discussion

Overview: 3 Tier Architecture

[Diagram: Load Balancer → Web Server 1, Web Server 2 → App Server 1, App Server 2 → Database]

• Load Balancer: F5 BigIP/Cisco CSS
• Web Servers: Apache
• App Servers: Tomcat/JBoss/WebSphere
• Database: Oracle/SQL Server/MySQL
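
To make the request path concrete, here is a minimal Python sketch of one request flowing through the three tiers, assuming plain round-robin selection at the load balancer and again at the web-to-app hop. The class and server names are hypothetical; a stateful cluster like the eStore would normally pin each session to one app server (sticky sessions) rather than rotating.

```python
from itertools import cycle


class RoundRobinTier:
    """One tier of interchangeable servers, picked in strict rotation (hypothetical model)."""

    def __init__(self, name, servers):
        self.name = name
        self._rotation = cycle(servers)

    def pick(self):
        return next(self._rotation)


def handle_request(web_tier, app_tier, database="Database"):
    """Follow one request: load balancer -> web server -> app server -> database."""
    web = web_tier.pick()  # the load balancer chooses a web server
    app = app_tier.pick()  # the web server forwards to an app server
    return (web, app, database)


web_tier = RoundRobinTier("web", ["Web Server 1", "Web Server 2"])
app_tier = RoundRobinTier("app", ["App Server 1", "App Server 2"])

for _ in range(3):
    print(handle_request(web_tier, app_tier))
```

Round-robin is only the simplest policy; it is used here because it makes the tier-by-tier fan-out easy to see.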

Overview: A Large Cluster (Old eStore)

[Diagram: Load Balancer → 45 web servers (WS) → 45 app servers (AS) → Database]

• 45x Apache Servers
• 45x App Servers, 4x JVMs each
• 180x JBoss

How do you manage this?

Some things that come to mind:

• Can you improve performance and reduce the number of machines needed?
• How do you validate performance on something this big?
• Code deployment to 300 machines?!
• Can the network take it?
• Who's the poor guy who gets called when something breaks?

Why me?
That looks like fun!

How do you manage this?

How we approach it:

• Start on the individual tiers and improve/reduce
• Expand up to the big-picture level and TEST IN PRODUCTION!
• Blow our testing lab to smithereens
• Tweak and tune services, operating systems and network devices
• Went to 64-bit Linux/Java: bigger heap sizes!
• Improved our connection management: F5 BigIP tuning

Overview: 3 Tier Architecture (New eStore)

[Diagram: Load Balancer → 22 web servers (WS) → 45 app servers (AS) → Database]

• 22x Apache Servers
• 45x App Servers, 3x JVMs each
• 135x JBoss
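
The before/after numbers on the two eStore slides can be checked with a little arithmetic. This Python sketch only re-derives figures already on the slides (45 vs 22 Apache servers, 4 vs 3 JVMs on each of 45 app servers); the dictionary layout is just for illustration.

```python
OLD_ESTORE = {"apache_servers": 45, "app_servers": 45, "jvms_per_app_server": 4}
NEW_ESTORE = {"apache_servers": 22, "app_servers": 45, "jvms_per_app_server": 3}


def jboss_instances(cluster):
    """Total JBoss JVMs = app servers x JVMs per app server."""
    return cluster["app_servers"] * cluster["jvms_per_app_server"]


def physical_machines(cluster):
    """Web servers plus app servers (load balancer and database not counted)."""
    return cluster["apache_servers"] + cluster["app_servers"]


def reduction_pct(before, after):
    """Percentage reduction from before to after, rounded to one decimal."""
    return round(100 * (before - after) / before, 1)


for label, metric in [("Apache servers", lambda c: c["apache_servers"]),
                      ("JBoss JVMs", jboss_instances),
                      ("Web + app machines", physical_machines)]:
    before, after = metric(OLD_ESTORE), metric(NEW_ESTORE)
    print(f"{label}: {before} -> {after} ({reduction_pct(before, after)}% fewer)")
```

Run as-is, it reports 45 -> 22 Apache servers (51.1% fewer), 180 -> 135 JBoss JVMs (25.0% fewer) and 90 -> 67 web and app machines (25.6% fewer).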

Overview: Silo Approach

Rib Sandwich Architecture

[Diagram: Load Balancer → Silo 1 to Silo 5 → Database; Silo 1 is shown expanded as 4 web servers (WS) and 4 app servers (AS)]

Some Benefits:
• Performance testing
• Rolling restarts and code deployments
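
One way to read the rolling-restart benefit: with five independent silos, you can take one silo out of the load balancer rotation, restart or redeploy it, and put it back while the other four carry the traffic. The Python sketch below assumes that reading; `restart` and `healthy` are hypothetical callbacks standing in for your own deployment and health-check tooling, and the actual load balancer calls are not shown.

```python
def rolling_restart(silos, restart, healthy):
    """Restart silos one at a time, keeping the rest in rotation.

    restart(silo, serving) and healthy(silo) are hypothetical callbacks;
    `serving` is the tuple of silos still taking traffic during the restart.
    Returns the silos in rotation at the end.
    """
    in_rotation = list(silos)
    for silo in silos:
        in_rotation.remove(silo)  # pull this silo out of the LB pool
        restart(silo, tuple(in_rotation))
        if not healthy(silo):
            # Leave the bad silo out and stop, so one bad build can't spread.
            raise RuntimeError(f"{silo} failed its health check; rollout halted")
        in_rotation.append(silo)  # healthy again: back into the pool
    return in_rotation


silos = ["Silo 1", "Silo 2", "Silo 3", "Silo 4", "Silo 5"]
log = []
result = rolling_restart(silos, lambda s, serving: log.append((s, serving)), lambda s: True)
print(result)
```

Halting on the first failed health check is a design choice: it caps the damage of a bad deployment at one silo, i.e. one fifth of capacity.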

Overview: Testing Approach

[Diagram: Load Balancer → Silo 1 to Silo 5 → Database; Silo 1 is shown expanded as 4 web servers (WS) and 4 app servers (AS)]

The Prod Scale-Down Test: Poor man's performance test
• It's free
• Gives you a real-world answer about your capacity and usage models
• Some risk to customers on the site: good monitoring is essential
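
Assuming the scale-down test works by pulling silos out of the load balancer rotation so the remaining silos absorb all production traffic, and that traffic is split evenly, the extra load on each surviving silo is easy to estimate. A minimal Python sketch (the 1,000 requests/second figure is purely illustrative):

```python
def _remaining(total_silos, removed):
    remaining = total_silos - removed
    if remaining < 1:
        raise ValueError("at least one silo must stay in rotation")
    return remaining


def load_multiplier(total_silos, removed):
    """How many times its normal share each remaining silo carries."""
    return total_silos / _remaining(total_silos, removed)


def per_silo_rps(total_rps, total_silos, removed):
    """Requests/second landing on each remaining silo, assuming an even split."""
    return total_rps / _remaining(total_silos, removed)


for removed in range(4):
    print(f"{removed} silo(s) out: {load_multiplier(5, removed):.2f}x normal load, "
          f"{per_silo_rps(1000, 5, removed):.0f} req/s per silo")
```

Taking two of five silos out already puts each survivor at about 1.67x its normal load, which is why the slide stresses monitoring.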

Overview: Testing Approach

[Diagram: Load Balancer → Silo 1 to Silo 5 → Database]

300k Virtual Users!!

Real World Production Testing?! Awesome!

Observing and Tuning: Load Balancer

Features and Focus Areas:
• OneConnect
• Content Compression
• SSL Acceleration
• Optimized TCP Profiles
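
OneConnect is F5's feature for reusing server-side connections across client requests. The toy Python model below is not F5's implementation; it only counts how many backend TCP connections get opened with and without reuse, assuming requests arrive in fixed-size batches and each connection goes back to an idle pool when its request completes.

```python
from collections import deque


class BackendPool:
    """Toy model of load-balancer-to-web-server connections (not F5's implementation)."""

    def __init__(self, reuse):
        self.reuse = reuse
        self.idle = deque()  # connections waiting to be reused
        self.opened = 0      # backend TCP connections opened so far

    def acquire(self):
        if self.reuse and self.idle:
            return self.idle.popleft()  # reuse an idle connection
        self.opened += 1                # otherwise open a new one
        return f"conn-{self.opened}"

    def release(self, conn):
        if self.reuse:
            self.idle.append(conn)  # keep it open for the next request
        # without reuse the connection is simply closed and forgotten


def connections_opened(requests, concurrency, reuse):
    """Serve `requests` in batches of `concurrency`; count new backend connections."""
    pool = BackendPool(reuse)
    for start in range(0, requests, concurrency):
        batch = [pool.acquire() for _ in range(min(concurrency, requests - start))]
        for conn in batch:
            pool.release(conn)
    return pool.opened


print(connections_opened(1000, 10, reuse=False))  # one connection per request
print(connections_opened(1000, 10, reuse=True))   # one per concurrent slot
```

Fewer new backend connections means fewer TCP handshakes and fewer sockets for the web servers to manage.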

Observing and Tuning: Web Server - Apache

Threading Models

• PreFork: Apache keeps a pool of pre-forked, single-threaded child processes, and each one handles one request at a time. This is the original model and works great.
• Worker: a multi-threaded model in which each child process runs many threads. It's much easier on memory because fewer processes are in use.

The really important question is: Do you have enough threads?
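
One common way to answer that question (a standard queueing rule, not something from the slides) is Little's law: the average number of requests in flight equals the arrival rate times the average time each request spends in the server. The Python sketch below turns that into a thread/process count to compare against Apache's limit (MaxClients in Apache 2.2, MaxRequestWorkers in 2.4); the 25% headroom default and the example traffic figures are assumptions.

```python
import math


def workers_needed(requests_per_sec, avg_response_sec, headroom=0.25):
    """Little's law: requests in flight = arrival rate x time in system.

    Adds a safety margin (`headroom`, an assumed 25% by default) and rounds up.
    """
    in_flight = requests_per_sec * avg_response_sec
    return math.ceil(in_flight * (1 + headroom))


# Hypothetical example: 400 requests/s per web server, each held for 0.5 s on average.
print(workers_needed(400, 0.5))  # 200 in flight on average, 250 with headroom
```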

Observing and Tuning: Web Server - Apache

Apache mod_status: a simple way to find out how many threads are in use!
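
mod_status also has a machine-readable form: requesting the status page with `?auto` appended (for example `http://your-web-server/server-status?auto`, where the host and the `/server-status` location depend on your configuration) returns plain `Key: value` lines, including `BusyWorkers` and `IdleWorkers` on Apache 2.x. This Python sketch parses that text; the sample is made up, and fetching it over HTTP is left out. Note that busy plus idle counts only the workers currently running, so also compare busy against your configured maximum.

```python
def parse_server_status(text):
    """Turn mod_status '?auto' output ('Key: value' per line) into a dict."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            stats[key.strip()] = value.strip()
    return stats


def worker_usage(text):
    """Return (busy, idle, busy share of running workers)."""
    stats = parse_server_status(text)
    busy = int(stats["BusyWorkers"])
    idle = int(stats["IdleWorkers"])
    return busy, idle, busy / (busy + idle)


# Made-up sample of '?auto' output, trimmed to a few fields.
SAMPLE = """Total Accesses: 123456
BusyWorkers: 180
IdleWorkers: 20
Scoreboard: WWWW_K__
"""

busy, idle, fraction = worker_usage(SAMPLE)
print(f"{busy} busy, {idle} idle ({fraction:.0%} of running workers busy)")
```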

Observing and Tuning: Operating System

File Descriptors: In Unix-based operating systems, a network socket is just a special type of file, so every open connection uses up a file descriptor.

There's a HARD and a SOFT limit on file descriptors.

To find out the HARD limit: ulimit -Hn
• This is the ceiling that a user or process can never raise its soft limit above
• Changing it is a root-level config change that may require a system reboot

To find out the SOFT limit: ulimit -n
• This is the number of file descriptors a particular shell, and every process started from it, can have open
• This can be changed without a reboot (up to the hard limit), but you will need to put it in a shell script or profile so it gets set every time the shell starts
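
The same soft/hard pair can be read from inside a process. This Python sketch uses the standard `resource` module (Unix only) to read `RLIMIT_NOFILE` and raise the soft limit toward a target without exceeding the hard limit; the target of 8192 in the example is arbitrary.

```python
import resource


def fd_limits():
    """Return (soft, hard) file descriptor limits for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(target):
    """Raise the soft limit toward `target`, never above the hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = fd_limits()
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited; nothing to raise
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # the soft limit can never exceed the hard limit
    if target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return fd_limits()


print("soft/hard before:", fd_limits())
print("soft/hard after: ", raise_soft_limit(8192))
```

Like `ulimit -n`, this only affects the current process and its children; it does not persist across restarts.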

Observing and Tuning: Operating System

TCP stack?? What the heck is that?
It's the set of configs and behaviors that your OS uses for network communication.

tcp_time_wait_interval
• Default = 240,000 ms (4 minutes); Recommended = 60,000 ms (1 minute), sometimes lower
• http://www.sean.de/Solaris/soltune.html

tcp_conn_req_max_q and tcp_conn_req_max_q0
• Defaults = 128 and 1024 respectively
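
Why the time-wait interval matters: a connection lingers in TIME_WAIT for that long on the side that closes it first, so in steady state roughly (connections closed per second) x (interval in seconds) sockets are parked there, each still holding kernel memory and a local address/port combination. A minimal Python sketch of that estimate; the rate of 500 connections per second is an illustrative assumption.

```python
def time_wait_sockets(closed_per_sec, time_wait_ms):
    """Steady-state estimate: connections closed per second x seconds in TIME_WAIT."""
    return closed_per_sec * time_wait_ms / 1000


RATE = 500  # illustrative: connections actively closed per second by this host
for label, interval_ms in [("default", 240_000), ("recommended", 60_000)]:
    print(f"{label}: ~{time_wait_sockets(RATE, interval_ms):,.0f} sockets in TIME_WAIT")
```

At that rate, cutting the interval from 4 minutes to 1 minute shrinks the TIME_WAIT backlog from about 120,000 sockets to about 30,000.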

Observing and Tuning: Application Server

• ThreadHandler Pool: each thread handles one page request (JSP)
• Defaults are often too low or too high: JBoss = 100, WebSphere = 20
• Find out how many are in use with the web admin consoles (JMX-Console...)
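
A rough way to size the pool per JVM is to split the cluster's peak concurrent requests across the JVMs that will be serving, then add headroom. The Python sketch below uses the new eStore's 135 JBoss instances; the peak of 10,800 concurrent requests, the 25% headroom, and the five equal 27-JVM silos (135 / 5) are illustrative assumptions, not figures from the slides.

```python
import math


def threads_per_jvm(peak_concurrent, jvms_serving, headroom=0.25):
    """Request-handling threads each JVM needs at peak, plus a safety margin."""
    return math.ceil(peak_concurrent / jvms_serving * (1 + headroom))


PEAK = 10_800       # hypothetical peak concurrent requests for the whole cluster
JVMS = 135          # new eStore: 45 app servers x 3 JVMs
JVMS_PER_SILO = 27  # assumes 5 equal silos

print(threads_per_jvm(PEAK, JVMS))                  # all silos in rotation
print(threads_per_jvm(PEAK, JVMS - JVMS_PER_SILO))  # one silo out for a rolling restart
```

With these made-up numbers the answer is 100 threads per JVM, rising to 125 while one silo is out, which shows how a pool size that is fine on a normal day can be too low during a restart.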

Observing and Tuning: Akamai for Static Assets

• Helps free up web server resources
• Better customer experience
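
Offloading static assets usually comes down to serving images, stylesheets and scripts from the CDN's hostname instead of your own. This Python sketch rewrites such URLs; the hostname `static.example-cdn.net` and the extension list are hypothetical placeholders, and how Akamai itself is configured for a site is outside its scope.

```python
from urllib.parse import urlsplit, urlunsplit

CDN_HOST = "static.example-cdn.net"  # hypothetical CDN hostname
STATIC_EXTENSIONS = (".css", ".js", ".gif", ".jpg", ".png", ".ico")


def asset_url(url):
    """Point static assets at the CDN host; leave dynamic pages on the origin."""
    parts = urlsplit(url)
    if parts.path.lower().endswith(STATIC_EXTENSIONS):
        return urlunsplit((parts.scheme, CDN_HOST, parts.path, parts.query, parts.fragment))
    return url


print(asset_url("http://www.example.com/images/logo.png"))
print(asset_url("http://www.example.com/cart/checkout.jsp"))
```

Every asset served this way is one less request for the Apache tier to spend a thread on.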

Discussion - Q&A

Real performance problems?

Architecture challenges?

Organizational issues?

Thank you!

For more information:

Dan Bartow

Sr. Manager - Performance Engineering

Consumer [email protected]

[email protected]

www.turbotax.com