IDEAGEN Performance Engineering for Cloud Computing LERO
Performance Engineering for Cloud Computing
John Murphy
Performance Engineering Lab
Lero © 2011. 8th European Performance Engineering Workshop ‐ EPEW 2011
• Cloud overview
• Previous results
• Example: Logging
• Future directions
The same approach(es) can be applied to solve queueing problems in very different areas
Can the same approach(es) for Performance Engineering be applied to solve problems in very different areas?
Challenges in the Cloud
What is Cloud Computing?
XXXX as a Service ....
While (Hype=True) {
Replace XXXX with Anything
}
Cloud Washing Required
Cloud Users
Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, and Matei Zaharia. 2010. A view of cloud computing. Commun. ACM 53, 4 (April 2010), 50‐58.
Evolving Critical Systems
Cloud Computing
“As‐A‐Service” market sizing
Report II, October 2010
Challenges in the Cloud
Flavours of Cloud Computing
Public Cloud (Amazon, Google, Microsoft)
Private Cloud (many)
Hybrid Cloud Computing
Surge / Utility Computing
Challenges in the Cloud
So what’s really new:
• Infinite computing resources on demand
• No up‐front capital expenditure (capex)
• Pay for what you use
Infinite computing resources on demand
Ability to follow surges in workload
• No capacity planning required
• Speed of surge important to provide capacity
• Data center utilisation low
• Lower price possible due to statistical multiplexing of many demands (self‐similarity an issue)
Workload varies over the day
Workload varies over the season
Workload varies with events
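A minimal sketch of the statistical multiplexing point above. The two demand series are invented for illustration (they are not from the talk); they only show that when peaks fall at different times, the capacity needed for the combined load is lower than the sum of the individual peaks.

```python
# Hypothetical demand (in servers) for two tenants over three time slots.
# The numbers are invented purely to illustrate the idea.
daytime_tenant = [10, 80, 30]
nighttime_tenant = [70, 20, 40]


def peak(demand):
    """Capacity needed to serve a demand series at its busiest slot."""
    return max(demand)


# Dedicated provisioning: each tenant sizes for its own peak.
dedicated_capacity = peak(daytime_tenant) + peak(nighttime_tenant)

# Shared provisioning: the provider sizes for the peak of the combined load.
combined = [a + b for a, b in zip(daytime_tenant, nighttime_tenant)]
shared_capacity = peak(combined)

print(dedicated_capacity, shared_capacity)
```

If the tenants' peaks coincide, the shared peak approaches the sum of the individual peaks and the saving disappears, which is why correlated or self‐similar traffic is flagged as an issue.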
Pay for what you use
• Amazon: Physical Hardware (EC2 instances), control kernel upwards, lots of state information
• Google: Applications on AppEngine (web applications), separation between compute and storage
• Microsoft: Azure more flexible than the AppEngine
Cost: 1 machine for 1000 hours = 1000 machines for 1 hour
Pay as you go, or usage based pricing (not renting)
Pay per box, or pay per resources used
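The cost equivalence above can be sketched as follows, assuming simple per machine‐hour billing. The rate is a hypothetical placeholder, not a real provider price, and real pricing may add minimum charges or tiers that break the symmetry.

```python
# Hypothetical price in cents per machine-hour (an assumption, not a real tariff).
RATE_CENTS_PER_MACHINE_HOUR = 10


def usage_cost_cents(machines, hours, rate=RATE_CENTS_PER_MACHINE_HOUR):
    """Usage-based cost: only the total number of machine-hours matters."""
    return machines * hours * rate


one_machine_long = usage_cost_cents(machines=1, hours=1000)
many_machines_short = usage_cost_cents(machines=1000, hours=1)

print(one_machine_long, many_machines_short)
```

Under this model the bill is the same, but the second option finishes the work 1000 times sooner.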
Challenges in the Cloud
Ten Challenges in Cloud Computing [1]
1. Business Continuity & Service Availability
2. Data Lock In
3. Data Confidentiality/Auditability
4. Performance Unpredictability
5. Scalable Storage
6. Bugs in Large Scale Distributed Systems
7. Scaling Quickly
8. Reputation Fate Sharing
9. Data Transfer Bottlenecks
10. Software Licensing
[1] ”A View of Cloud Computing”, by Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, and Matei Zaharia
Typical Enterprise Systems
Data Data Everywhere
• 150 billion GB (150 exabytes) of data created in 2005; eight times that amount (1,200 exabytes) in 2010
• The amount of enterprise data will grow about 650% over the next five years, the vast majority of it unstructured, or not included in any database.
• Log data is the fastest‐growing data source at large organizations
• Many organizations are currently producing terabytes of log data per month
Cloud Services
• Not deployed in house
• Services need to handle 1000’s of customers
• Services need to handle 1000’s of enterprise systems
• Processing higher volumes of data required
Log Management
• Automatic Collection, Analysis & Visualization of Log Data
• Use Cases:
Problem Determination
Operations
Security
Compliance & Auditing
Log Maths
100,000 log messages / second
x 300 bytes / log message ~ 28.6 MB / second
x 3600 seconds ~ 100.6 GB / hour
x 24 hours ~ 2.35 TB / day
x 365 days ~ 860.5 TB / year
x 3 years ~ 2.52 PB
From Anton Chuvakin’s Blog Aug 2010
http://chuvakin.blogspot.com/
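The arithmetic on the Log Maths slide can be checked with a short script. One observation, stated as an assumption about how the figures were derived: they match when MB, GB, TB and PB are read as binary units (powers of 1024). The per‐day figure comes out at about 2.357, so the slide's 2.35 looks truncated rather than rounded.

```python
MESSAGES_PER_SECOND = 100_000
BYTES_PER_MESSAGE = 300

# Raw byte counts at each step of the slide.
bytes_per_second = MESSAGES_PER_SECOND * BYTES_PER_MESSAGE
bytes_per_hour = bytes_per_second * 3600
bytes_per_day = bytes_per_hour * 24
bytes_per_year = bytes_per_day * 365
bytes_three_years = bytes_per_year * 3

# Convert using binary units (assumption: 1 MB = 1024**2 bytes, and so on).
mb_per_second = bytes_per_second / 1024**2
gb_per_hour = bytes_per_hour / 1024**3
tb_per_day = bytes_per_day / 1024**4
tb_per_year = bytes_per_year / 1024**4
pb_three_years = bytes_three_years / 1024**5

print(f"{mb_per_second:.1f} MB/s, {gb_per_hour:.1f} GB/hour, "
      f"{tb_per_day:.3f} TB/day, {tb_per_year:.1f} TB/year, "
      f"{pb_three_years:.2f} PB over 3 years")
```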
Typical Log Volumes
Customer Type                          Log Volumes            Events per Second   Events per Day
Large Cloud Provider                   50 Terabytes per Day   2,000,000           172,000,000,000
Large Social Media Organisation        25 Terabytes per Day   1,000,000
Telecom Middleware / Applications      1 Terabyte per Day     50,000
Large Organisation (>1000 employees)   300 GB per Day         15,000
Online Marketing Org                   100 GB per Day         5,000               432,000,000
Small Data Centre                      10 GB per Day          500
SAAS Educational Tools                 5 GB per Day           250
Single IBM Test Team                   2 GB per Day           100
Online Multimedia                      700 MB per Day         35
Early Stage Start up                   50 MB per Day          25                  2,000,000
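The Events per Day figures that are filled in appear to be Events per Second multiplied by the 86,400 seconds in a day, rounded on the slide. A small sketch of that conversion (the function name is illustrative only):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def events_per_day(events_per_second):
    """Scale a sustained per-second event rate to a full day."""
    return events_per_second * SECONDS_PER_DAY


large_cloud_provider = events_per_day(2_000_000)
online_marketing_org = events_per_day(5_000)
early_stage_start_up = events_per_day(25)

print(f"{large_cloud_provider:,}")
print(f"{online_marketing_org:,}")
print(f"{early_stage_start_up:,}")
```

This assumes the per‐second rate is a daily average; peak rates during surges would be higher.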
Partial results in log management
• High volume data processing
• Correlation
• Searching / Indexing
• Pattern detection (symptom database)
• Real time requirements
Real Time Correlation Engine (RTCE)
• IBM & UCD Research (since 2007)
• In house deployment
• In use across 10’s of IBM teams (Dublin, US, China)
• Ability to process 80,000 events per second
RTCE details
[Figure: RTCE architecture diagram. Labels: testing environment with Nodes a to d forming a network of agents; node detail with Components a and b, each with a Log and a Log Agent; inter‐agent communication; presentation via an agent web server; Users a to d.]
logentries.com
• Log Management as a Service
Built on Amazon Web Services
Scales Horizontally
o CPU
o Storage
Distributed File System / NoSQL DBs (Hadoop)
Needs to handle TB per day (2TB per customer)
Key log research challenges
• Scalable hardware resources
o Cloud
o Auto scaling
• Indexing large volumes of data in real time
o NoSQL / Columnar Storage
• Processing millions of events per second
o Bloom Filters
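For readers unfamiliar with Bloom filters, a minimal sketch follows. It is a generic textbook version, not the implementation used in RTCE or logentries.com; the class name, sizes and hashing scheme are all assumptions chosen for illustration. A Bloom filter answers "have I possibly seen this item?" in constant space per item, with no false negatives and a tunable false positive rate, which suits high‐rate event streams.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: a bit array plus several hash positions per item."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed byte.
        data = item.encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means definitely never added; True means possibly added.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


seen = BloomFilter()
seen.add("host42: disk failure on /dev/sda")
print(seen.might_contain("host42: disk failure on /dev/sda"))
```

A production system would size `num_bits` and `num_hashes` from the expected item count and target false positive rate, and would likely use a faster non‐cryptographic hash.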
Conclusions
• Cloud computing is a tag for the next while...
• Major issues still to be fully addressed
• Previous performance engineering in enterprise, grid, data centre or mainframe research can probably feed into the solutions
Thank you!