8/14/2019 Large Application Clusters
http://slidepdf.com/reader/full/large-application-clusters 1/21
About Dan
• Responsible for TurboTax Online performance and New Business Initiatives
• Previously responsible for turbotax.com, intuit.com,
quickbooks.com, quicken.com, back-end order systems,
content services, and many others
• The Intuit eStore cluster is the largest stateful JBoss cluster
in the world
• Best Buy, American Airlines, Sony Online Entertainment, MTV, Rubbermaid, Oprah, Plantronics, JCrew, Neiman Marcus, FAA, US Army, Kodak, American Eagle Outfitters, Cabelas, Finish Line, Lexmark, Alcatel,
Deutsche Post, B&Q, Abbott Labs, United Health Group, Sun Microsystems, GlaxoSmithKline, AT&T Wireless,
Yellowpages, and many others
Agenda
• Intros and Expectations
• Overview: Typical 3 Tier Architecture
• What to do when your cluster starts getting huge
• Tweak, Tune, Reduce: Before and After
• Discussion
Overview: 3 Tier Architecture
• Load Balancer: F5 BigIP/Cisco CSS
• Web Server 1, Web Server 2: Apache
• App Server 1, App Server 2: Tomcat/JBoss/WebSphere
• Database: Oracle/SQLServer/MySQL
Overview: A Large Cluster (Old eStore)
• Load Balancer
• 45x Apache Servers (WS)
• 45x App Servers (AS), 4x JVMs each = 180x JBoss
• Database
How do you manage this?
Some things that come to mind
• Can you improve performance and reduce the number of machines needed?
• How do you validate performance on something this big?
• Code deployment to 300 machines?!
• Can the network take it?
• Who’s the poor guy who gets called when something breaks?
• Why me?
• That looks like fun!
How do you manage this?
How we approach it
• Start on the individual tiers and improve/reduce
• Expand up to the big picture level and TEST IN PRODUCTION!
• Blow our testing lab to smithereens
• Tweak and tune services, operating systems and network devices
• Went to 64 bit Linux/Java - bigger heap sizes!
• Improved our connection management - F5 BigIP tuning
Overview: 3 Tier Architecture (New eStore)
• Load Balancer
• 22 Apache Servers (WS)
• 45 App Servers (AS), 3 JVMs each = 135x JBoss
• Database
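The before and after numbers for the old and new eStore can be checked with a little arithmetic. The sketch below is only an illustration: the server and JVM counts come from the slides, while the `ClusterLayout` class and `reduction_pct` helper are names made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterLayout:
    """Server counts for one version of the cluster (figures from the slides)."""
    name: str
    apache_servers: int
    app_servers: int
    jvms_per_app_server: int

    @property
    def jboss_instances(self) -> int:
        # Each JVM on an app server runs one JBoss instance.
        return self.app_servers * self.jvms_per_app_server


def reduction_pct(before: int, after: int) -> float:
    """Percentage reduction going from `before` to `after`."""
    return 100.0 * (before - after) / before


old = ClusterLayout("Old eStore", apache_servers=45, app_servers=45, jvms_per_app_server=4)
new = ClusterLayout("New eStore", apache_servers=22, app_servers=45, jvms_per_app_server=3)

for layout in (old, new):
    print(f"{layout.name}: {layout.apache_servers} Apache, {layout.jboss_instances} JBoss")

print(f"JBoss instances reduced by {reduction_pct(old.jboss_instances, new.jboss_instances):.0f}%")
print(f"Apache servers reduced by {reduction_pct(old.apache_servers, new.apache_servers):.0f}%")
```

The app server hardware count stays at 45; the saving on that tier is one JVM per machine.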
Overview: Silo Approach
Rib Sandwich Architecture
[Diagram: Load Balancer; Silo 1, Silo 2, Silo 3, Silo 4, Silo 5, each silo containing WS and AS; Database]
Some Benefits:
• Performance testing
• Rolling restarts and code deployments
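A rolling restart or deployment across silos might look roughly like the loop below. This is a minimal simulation under assumptions: the silo names match the slide, but `rolling_deploy` and the `deploy` callback are hypothetical, and the real steps for pulling a silo out of the load balancer are not described in the talk.

```python
from typing import Callable, List


def rolling_deploy(silos: List[str], deploy: Callable[[str], None]) -> int:
    """Deploy to one silo at a time.

    Returns the smallest number of silos that were in service at any moment,
    which is the capacity the site had to survive on during the rollout.
    """
    in_service = set(silos)
    min_in_service = len(in_service)
    for silo in silos:
        in_service.remove(silo)          # hypothetical: take silo out of rotation
        min_in_service = min(min_in_service, len(in_service))
        deploy(silo)                     # hypothetical: push code / restart JVMs
        in_service.add(silo)             # hypothetical: put silo back in rotation
    return min_in_service


silos = ["Silo 1", "Silo 2", "Silo 3", "Silo 4", "Silo 5"]
deployed: List[str] = []
lowest = rolling_deploy(silos, deployed.append)
print(f"Deployed to {len(deployed)} silos; never fewer than {lowest} in service")
```

With five silos, only one fifth of the capacity is out of rotation at any point in the rollout.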
Overview: Testing Approach
The Prod Scale-Down Test - Poor man’s performance test
[Diagram: Load Balancer; Silo 1, Silo 2, Silo 3, Silo 4, Silo 5, each silo containing WS and AS; Database]
• It’s free
• Gives you real-world answers about your capacity and usage models
• Some risk to customers on the site - good monitoring is essential
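The idea behind a scale-down test can be sketched as follows: as silos are taken out of rotation, the same production traffic lands on fewer silos, so the load per silo rises. The traffic figure and abort threshold below are invented for illustration, and it is an assumption that load spreads evenly across silos; the function names are hypothetical.

```python
from typing import List, Tuple


def per_silo_load(total_users: int, active_silos: int) -> float:
    """Users each silo carries if traffic is spread evenly (an assumption)."""
    if active_silos < 1:
        raise ValueError("at least one silo must stay in service")
    return total_users / active_silos


def scale_down_plan(total_users: int, silos: int,
                    max_users_per_silo: float) -> List[Tuple[int, float]]:
    """List (active_silos, load_per_silo) steps, stopping before the abort threshold."""
    steps = []
    for active in range(silos, 0, -1):
        load = per_silo_load(total_users, active)
        if load > max_users_per_silo:
            break  # the next step would exceed what we are willing to risk
        steps.append((active, load))
    return steps


# Hypothetical numbers: 10,000 concurrent users, 5 silos, abort above 4,000 per silo.
for active, load in scale_down_plan(10_000, 5, 4_000):
    print(f"{active} silos in service -> about {load:,.0f} users per silo")
```

The last step that stays healthy under monitoring gives a real-world estimate of what one silo can carry.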
Overview: Testing Approach
[Diagram: Load Balancer; Silo 1, Silo 2, Silo 3, Silo 4, Silo 5; Database]
300k Virtual Users!!
Real World Production Testing?! Awesome!
Observing and Tuning: Load Balancer
Features and Focus Areas
• OneConnect
• Content Compression
• SSL Acceleration
• Optimized TCP Profiles
Observing and Tuning: Web Server - Apache
Threading Models
• PreFork: keeps a pool of pre-forked Apache processes, each handling one request at a time. This is the original model and works great.
• Worker: a multithreaded model. It’s much easier on memory because there are fewer processes in use.
The really important question is: Do you have enough threads?
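One rough way to approach the "enough threads" question is Little's Law: the average number of busy threads is about the request rate multiplied by the average response time. This rule of thumb is not from the talk; the function name, the traffic numbers and the 25% headroom are all assumptions for illustration.

```python
import math


def threads_needed(requests_per_second: float,
                   avg_response_seconds: float,
                   headroom: float = 0.25) -> int:
    """Estimate worker threads via Little's Law, plus a safety margin.

    busy threads ~= arrival rate * average time each request holds a thread
    """
    busy = requests_per_second * avg_response_seconds
    return math.ceil(busy * (1 + headroom))


# Hypothetical load: 400 requests/second, each holding a thread for 0.5 seconds.
print(threads_needed(400, 0.5))               # steady-state estimate with headroom
print(threads_needed(400, 0.5, headroom=0))   # bare Little's Law figure
```

An estimate like this is only a starting point; the measured number of busy workers on the live servers is the figure to trust.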
Observing and Tuning: Web Server - Apache
Apache mod_status
A simple way to find out how many threads are in use!
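mod_status also offers a machine-readable view (requested by appending `?auto` to the status URL) that includes `BusyWorkers` and `IdleWorkers` lines. The sketch below parses such a report; the sample text and its numbers are made up, and the `parse_worker_counts` helper is a hypothetical name. Fetching the report from a live server is left out because the status URL depends on how the server is configured.

```python
# Made-up sample in the style of mod_status "?auto" output.
SAMPLE_STATUS = """\
Total Accesses: 184392
BusyWorkers: 37
IdleWorkers: 63
Scoreboard: _W_K__R_W
"""


def parse_worker_counts(status_text: str) -> dict:
    """Extract busy/idle worker counts from a mod_status ?auto report."""
    fields = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # skip lines that are not "Key: value"
            fields[key] = value.strip()
    busy = int(fields["BusyWorkers"])
    idle = int(fields["IdleWorkers"])
    return {"busy": busy, "idle": idle, "utilization": busy / (busy + idle)}


counts = parse_worker_counts(SAMPLE_STATUS)
print(f"{counts['busy']} busy, {counts['idle']} idle "
      f"({counts['utilization']:.0%} of workers in use)")
```

Sampling this regularly during peak traffic shows how close the server gets to running out of workers.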
Observing and Tuning: Operating System
File Descriptors: In Unix based operating systems, a network socket is handled like a special type of file, so every open connection uses up a file descriptor
There’s a HARD and SOFT limit on file descriptors
• To find out the HARD limit: ulimit -Hn
This is the ceiling that the soft limit can never be raised above by an unprivileged user or process
Changing this is a root level config change, on some systems requiring a system reboot
• To find out the SOFT limit: ulimit -n
This is the limit of file descriptors a particular shell, and the processes started from it, can have available
This can be changed without a reboot but you will need to put it in a shell script or profile so it gets set every time the shell is started
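The same two limits can be read from inside a process. The sketch below uses Python's standard `resource` module (Unix only); the helper names `fd_limits` and `clamp_soft_target` are made up for this example, and the target of 65,536 descriptors is an arbitrary illustration.

```python
import resource


def fd_limits():
    """Return (soft, hard) file descriptor limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def clamp_soft_target(target: int, hard: int) -> int:
    """Highest soft limit <= target that can be set without raising the hard limit."""
    if hard == resource.RLIM_INFINITY:
        return target
    return min(target, hard)


soft, hard = fd_limits()
print(f"soft limit: {soft}, hard limit: {hard}")

# Hypothetical target: what we could ask for if we wanted 65,536 descriptors.
wanted = clamp_soft_target(65_536, hard)
print(f"could request a soft limit of {wanted} without root")
# To apply it for this process only:
# resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
```

A server process that hits the soft limit starts refusing new connections, so it is worth comparing this number against the expected count of open sockets.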
Observing and Tuning: Operating System
TCP stack?? - what the heck is that?
It’s the set of configs and behaviors that your OS uses for network communication
• tcp_time_wait_interval
Default = 240,000ms/4 Minutes
Recommended = 60,000ms/1 Minute (sometimes lower)
• tcp_conn_req_max_q and tcp_conn_req_max_q0
Default = 128 and 1024
http://www.sean.de/Solaris/soltune.html
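To see why the TIME_WAIT interval matters, consider how many closed connections are still held at any moment: roughly the rate at which connections close multiplied by how long each one lingers. The sketch below applies that steady-state estimate to the default and recommended intervals from the slide; the rate of 500 closes per second and the function name are assumptions for illustration.

```python
def sockets_in_time_wait(closed_per_second: float, time_wait_ms: int) -> int:
    """Steady-state count of sockets lingering in TIME_WAIT.

    Each closed connection is held for time_wait_ms, so the number held
    at once is about (closes per second) * (seconds held).
    """
    return round(closed_per_second * time_wait_ms / 1000)


RATE = 500  # hypothetical: connections closed per second on one server

for label, interval_ms in (("default", 240_000), ("recommended", 60_000)):
    held = sockets_in_time_wait(RATE, interval_ms)
    print(f"{label:>11}: {interval_ms:,} ms -> about {held:,} sockets in TIME_WAIT")
```

Since each lingering socket still occupies a file descriptor and kernel memory, cutting the interval to a quarter cuts that overhead by the same factor.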
Observing and Tuning: Application Server
• ThreadHandler Pool - Each thread handles a page request (JSP)
• Defaults are often too low or too high - JBoss = 100, WebSphere = 20
• Find out how many are in use with the web admin consoles (JMX-Console...)
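The effect of a handler pool's size can be shown with a small simulation: no matter how many requests arrive, the number processed at once never exceeds the pool size, and the rest wait in a queue. This is a generic Python illustration, not JBoss or WebSphere code; `peak_concurrency` and the timings are made up for the example.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def peak_concurrency(pool_size: int, requests: int,
                     service_seconds: float = 0.02) -> int:
    """Run `requests` fake page requests through a fixed pool; return the
    highest number that were being handled at the same time."""
    lock = threading.Lock()
    active = 0
    peak = 0

    def handle(_request_id: int) -> None:
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(service_seconds)  # stand-in for rendering a page
        with lock:
            active -= 1

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        list(pool.map(handle, range(requests)))
    return peak


print("peak in flight with pool of 4: ", peak_concurrency(4, 20))
print("peak in flight with pool of 10:", peak_concurrency(10, 20))
```

If the observed peak sits at the pool size for long stretches, requests are queueing and the pool is probably too small for the load.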
Observing and Tuning: Akamai for Static Assets
• Helps free up web server resources
• Better customer experience
Discussion - Q&A
Real performance problems?
Architecture challenges?
Organizational issues?
Thank you!
For more information:
Dan Bartow
Sr. Manager - Performance Engineering
Consumer [email protected]
www.turbotax.com