
Performance Engineering for Cloud Computing

John Murphy

Performance Engineering Lab

Lero © 2011. Slide 1. 8th European Performance Engineering Workshop - EPEW 2011

• Cloud overview

• Previous results

• Example: Logging

• Future directions


The same approach(es) can be applied to solve queueing problems in very different areas

Can the same approach(es) for Performance Engineering be applied to solve problems in very different areas?


Challenges in the Cloud

What is Cloud Computing?

XXXX as a Service ....

While (Hype=True) {
    Replace XXXX with Anything
}

Cloud Washing Required


Cloud Users

Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, and Matei Zaharia. 2010. A view of cloud computing. Commun. ACM 53, 4 (April 2010), 50-58.

Evolving Critical Systems

Cloud Computing

“As‐A‐Service” market sizing

Report II, October 2010


Challenges in the Cloud

Flavours of Cloud Computing

Public Cloud (Amazon, Google, Microsoft)

Private Cloud (many)

Hybrid Cloud Computing

Surge / Utility Computing


Challenges in the Cloud

So what’s really new:

• Infinite computing resources on demand

• No capital expenditure (capex)

• Pay for what you use



Infinite computing resources on demand

Ability to follow surges in workload

• No capacity planning required

• The speed of a surge determines how quickly capacity must be provided

• Data centre utilisation is typically low

• Lower price possible due to statistical multiplexing of many demands (self-similarity is an issue)

Workload varies over the day

Workload varies over the season

Workload varies with events
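A minimal sketch of the statistical-multiplexing argument, assuming hypothetical tenants whose load follows a smooth daily cycle. The `diurnal_load` model, its base and amplitude, and the tenants' peak hours are illustrative assumptions, not data from the talk. The code compares the capacity needed if each tenant is provisioned for its own peak with the capacity needed when all tenants share one pool.

```python
import math


def diurnal_load(hour: int, peak_hour: int, base: float = 10.0, amplitude: float = 40.0) -> float:
    """Hypothetical load for one tenant: a daily cycle that peaks at peak_hour."""
    phase = 2 * math.pi * (hour - peak_hour) / 24
    return base + amplitude * (1 + math.cos(phase)) / 2


def capacity_needed(peak_hours: list[int]) -> tuple[float, float]:
    """Return (capacity if each tenant is provisioned for its own peak,
    capacity if all tenants share one pool sized for the aggregate peak)."""
    hours = range(24)
    per_tenant = sum(max(diurnal_load(h, p) for h in hours) for p in peak_hours)
    pooled = max(sum(diurnal_load(h, p) for p in peak_hours) for h in hours)
    return per_tenant, pooled


# Six hypothetical tenants whose daily peaks fall 4 hours apart (e.g. different time zones).
separate, shared = capacity_needed([0, 4, 8, 12, 16, 20])
print(f"provisioned separately: {separate:.0f}, pooled: {shared:.0f}")
```

The six peaks are spread evenly around the day, which is the best case for pooling. Correlated or self-similar bursts, the issue flagged on the slide, push the pooled peak back towards the sum of the individual peaks.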


Pay for what you use

• Amazon: EC2 instances look much like physical hardware; the user controls the stack from the kernel upwards, and instances hold lots of state information

• Google: applications on App Engine (web applications), with a separation between compute and storage

• Microsoft: Azure, more flexible than App Engine

Cost: 1 machine for 1000 hours = 1000 machines for 1 hour

Pay as you go, or usage-based pricing (not renting)

Pay per box, or pay for the resources used
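The cost equation above holds because usage-based pricing charges only for total machine-hours. A minimal sketch, assuming a hypothetical flat price per machine-hour and a hypothetical billing granularity (`billing_unit_hours`); neither is a real provider's rate or policy.

```python
import math


def usage_cost(machines: int, hours_each: float, price_per_machine_hour: float,
               billing_unit_hours: float = 1.0) -> float:
    """Usage-based cost: machines x billed hours x price.
    Each machine's run time is rounded up to whole billing units (a hypothetical policy)."""
    billed_hours = math.ceil(hours_each / billing_unit_hours) * billing_unit_hours
    return machines * billed_hours * price_per_machine_hour


PRICE = 0.10  # hypothetical price per machine-hour

serial = usage_cost(1, 1000, PRICE)         # 1 machine for 1000 hours
parallel = usage_cost(1000, 1, PRICE)       # 1000 machines for 1 hour: same cost, finished 1000x sooner
short_burst = usage_cost(1000, 0.1, PRICE)  # a 6-minute burst is still billed as a full unit per machine
print(serial, parallel, short_burst)
```

The last line shows why the billing unit matters for "pay per box": with coarse units, very short bursts across many machines cost as much as running them for the whole unit.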


Challenges in the Cloud

Ten Challenges in Cloud Computing [1]

1. Business Continuity & Service Availability
2. Data Lock-In
3. Data Confidentiality/Auditability
4. Performance Unpredictability
5. Scalable Storage
6. Bugs in Large-Scale Distributed Systems
7. Scaling Quickly
8. Reputation Fate Sharing
9. Data Transfer Bottlenecks
10. Software Licensing

[1] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, and Matei Zaharia, “A View of Cloud Computing”, Commun. ACM 53, 4 (April 2010), 50-58.

Typical Enterprise Systems


Data, Data Everywhere

• 150 billion GB (150 exabytes) of data created in 2005; eight times that amount (1,200 exabytes) in 2010

• The amount of enterprise data will grow about 650% over the next five years, the vast majority of it unstructured (not held in any database).

• Log data is the fastest-growing data source at large organizations

• Many organizations are currently producing terabytes of log data per month
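The growth figures above can be turned into an annual rate with the compound-growth formula. A minimal sketch; reading "grow about 650%" as growth *by* 650% (a final factor of 7.5) is an interpretation, not something the slide states.

```python
def annual_growth_rate(growth_factor: float, years: float) -> float:
    """Compound annual growth rate implied by an overall growth factor over `years` years."""
    return growth_factor ** (1 / years) - 1


# 150 EB (2005) -> 1,200 EB (2010): an 8x increase over five years.
print(f"world data: {annual_growth_rate(1200 / 150, 5):.1%} per year")

# Enterprise data, assuming "grow about 650%" means a final size of 7.5x today's.
print(f"enterprise data: {annual_growth_rate(1 + 6.5, 5):.1%} per year")
```

Both readings land at roughly 50% growth per year, which is why storage has to scale continuously rather than in occasional capacity upgrades.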


Cloud Services

• Not deployed in-house

• Services need to handle 1000s of customers

• Services need to handle 1000s of enterprise systems

• Processing higher volumes of data required


Log Management

• Automatic collection, analysis & visualization of log data

• Use cases:

Problem Determination

Operations

Security

Compliance & Auditing


Log Maths

100,000 log messages / second

x 300 bytes / log message ~ 28.6 MB / second

x 3600 seconds ~ 100.6 GB / hour

x 24 hours ~ 2.36 TB / day

x 365 days ~ 860.5 TB / year

x 3 years ~ 2.52 PB

(binary units: 1 MB = 2^20 bytes, 1 GB = 2^30 bytes, and so on)

From Anton Chuvakin’s blog, Aug 2010: http://chuvakin.blogspot.com/
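The log maths can be reproduced directly. A small sketch: the slide's figures match binary units (1 MB = 2^20 bytes), which is an inference from the numbers rather than something the slide states. Rounded to two decimals, the daily volume comes out at 2.36 TB (2.357 before rounding).

```python
MESSAGES_PER_SECOND = 100_000
BYTES_PER_MESSAGE = 300

# Each step multiplies the previous rate by the length of the next time period.
bytes_per_second = MESSAGES_PER_SECOND * BYTES_PER_MESSAGE
bytes_per_hour = bytes_per_second * 3600
bytes_per_day = bytes_per_hour * 24
bytes_per_year = bytes_per_day * 365
bytes_per_3_years = bytes_per_year * 3

MB, GB, TB, PB = 2**20, 2**30, 2**40, 2**50  # binary units

print(f"{bytes_per_second / MB:.1f} MB/s")
print(f"{bytes_per_hour / GB:.1f} GB/hour")
print(f"{bytes_per_day / TB:.2f} TB/day")
print(f"{bytes_per_year / TB:.1f} TB/year")
print(f"{bytes_per_3_years / PB:.2f} PB in 3 years")
```

With decimal units (1 TB = 10^12 bytes) the same chain gives about 2.59 TB per day and 2.84 PB over three years, so the unit convention should be stated whenever such figures are quoted.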

Typical Log Volumes

Customer Type | Log Volume | Events per Second | Events per Day
Large Cloud Provider | 50 TB per day | 2,000,000 | 172,000,000,000
Large Social Media Organisation | 25 TB per day | 1,000,000 |
Telecom Middleware/Applications | 1 TB per day | 50,000 |
Large Organisation (>1000 employees) | 300 GB per day | 15,000 |
Online Marketing Org | 100 GB per day | 5,000 | 432,000,000
Small Data Centre | 10 GB per day | 500 |
SaaS Educational Tools | 5 GB per day | 250 |
Single IBM Test Team | 2 GB per day | 100 |
Online Multimedia | 700 MB per day | 35 |
Early Stage Start-up | 50 MB per day | 25 | 2,000,000
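One way to sanity-check the Typical Log Volumes table is to compute the average event size each row implies. A minimal sketch, assuming decimal units (1 GB = 10^9 bytes) and a constant event rate over the day; both are assumptions, since the table does not state them.

```python
SECONDS_PER_DAY = 86_400
MB, GB, TB = 10**6, 10**9, 10**12  # decimal units assumed

# (customer type, daily log volume in bytes, events per second), taken from the Typical Log Volumes table
ROWS = [
    ("Large Cloud Provider", 50 * TB, 2_000_000),
    ("Large Social Media Organisation", 25 * TB, 1_000_000),
    ("Telecom Middleware/Applications", 1 * TB, 50_000),
    ("Large Organisation (>1000 employees)", 300 * GB, 15_000),
    ("Online Marketing Org", 100 * GB, 5_000),
    ("Small Data Centre", 10 * GB, 500),
    ("SaaS Educational Tools", 5 * GB, 250),
    ("Single IBM Test Team", 2 * GB, 100),
    ("Online Multimedia", 700 * MB, 35),
    ("Early Stage Start-up", 50 * MB, 25),
]


def implied_event_size(bytes_per_day: int, events_per_second: int) -> float:
    """Average bytes per event implied by a daily volume and a steady event rate."""
    return bytes_per_day / (events_per_second * SECONDS_PER_DAY)


for name, volume, eps in ROWS:
    events_per_day = eps * SECONDS_PER_DAY
    print(f"{name:40s} {events_per_day:>16,d} events/day  ~{implied_event_size(volume, eps):.0f} B/event")
```

Most rows imply an average of a few hundred bytes per event, in line with the 300-byte message assumed in the log maths; a row that falls far outside that range is worth re-checking against its source.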

Partial results in log management

• High volume data processing

• Correlation

• Searching / Indexing

• Pattern detection (symptom database)

• Real-time requirements
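Pattern detection against a symptom database can be sketched as matching each incoming log line against a list of known patterns, each paired with a diagnosis. This is a minimal illustration only: the `Symptom` type, the two entries and the sample log lines are all made up, and the slides do not describe how the actual symptom database is structured.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Symptom:
    """One entry in a hypothetical symptom database: a known log pattern and its advice."""
    name: str
    pattern: re.Pattern
    advice: str


SYMPTOMS = [
    Symptom("out-of-memory", re.compile(r"OutOfMemoryError"),
            "Increase heap size or look for a leak"),
    Symptom("connection-refused", re.compile(r"Connection refused.*port (\d+)"),
            "Check that the service on that port is up"),
]


def match_symptoms(line: str) -> list[tuple[str, str]]:
    """Return (symptom name, advice) for every known symptom found in a log line."""
    return [(s.name, s.advice) for s in SYMPTOMS if s.pattern.search(line)]


# Made-up sample log lines.
log = [
    "2011-10-12 10:01:02 ERROR java.lang.OutOfMemoryError: Java heap space",
    "2011-10-12 10:01:05 WARN Connection refused to db01 port 5432",
    "2011-10-12 10:01:07 INFO request served in 12 ms",
]
for line in log:
    for name, advice in match_symptoms(line):
        print(f"{name}: {advice}  <- {line}")
```

Checking every pattern against every line is linear in the size of the database, so at tens of thousands of events per second a real system would need to combine or pre-filter patterns.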


Real Time Correlation Engine (RTCE)

• IBM & UCD research (since 2007)

• In-house deployment

• In use across tens of IBM teams (Dublin, US, China)

• Able to process 80,000 events per second
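Correlating events from several log agents starts with merging their streams into one time-ordered stream. A minimal sketch, assuming each agent emits events already sorted by timestamp, clocks are synchronised, and events separated by at most `window` seconds belong to the same incident; this gap-based grouping is an illustrative technique, not the RTCE's algorithm, which the slides do not describe. Node names and messages are made up.

```python
import heapq
from typing import Iterable, Iterator

Event = tuple[float, str, str]  # (timestamp in seconds, node, message)


def merge_streams(*streams: Iterable[Event]) -> Iterator[Event]:
    """Lazily merge per-agent streams (each sorted by timestamp) into one ordered stream."""
    return heapq.merge(*streams, key=lambda e: e[0])


def correlate(events: Iterable[Event], window: float) -> list[list[Event]]:
    """Group consecutive events whose gap is at most `window` seconds into one incident."""
    groups: list[list[Event]] = []
    for event in events:
        if groups and event[0] - groups[-1][-1][0] <= window:
            groups[-1].append(event)
        else:
            groups.append([event])
    return groups


node_a = [(10.0, "node-a", "disk latency high"), (50.0, "node-a", "request ok")]
node_b = [(10.4, "node-b", "timeout calling node-a"), (11.0, "node-b", "retry failed")]

for incident in correlate(merge_streams(node_a, node_b), window=2.0):
    print([f"{node}: {msg}" for _, node, msg in incident])
```

Because `heapq.merge` is lazy, the merge holds only one pending event per agent in memory, which suits unbounded streams.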


RTCE details

[Figure: RTCE architecture (testing environment). Labels: Nodes a to d; Components a and b, each with a log agent; network of agents; inter-agent communication; node detail with presentation layer and agent web server; Users a to d.]

logentries.com

• Log Management as a Service

• Built on Amazon Web Services

• Scales horizontally:
  o CPU
  o Storage

• Distributed file system / NoSQL DBs (Hadoop)

• Needs to handle TBs per day (2 TB per customer)
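Horizontal scaling of storage requires a rule for deciding which machine holds which customer's logs. Consistent hashing is one common choice; this is an assumption, since the slides do not say how logentries.com partitions its data. A minimal sketch with hypothetical node and customer names:

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Assigns keys (e.g. customer IDs) to storage nodes on a hash ring.

    Adding a node only moves the keys on the arcs it takes over,
    so the cluster can grow one machine at a time.
    """

    def __init__(self, nodes: list[str], replicas: int = 100):
        self.replicas = replicas      # virtual points per node, to spread load evenly
        self._hashes: list[int] = []  # sorted ring positions
        self._owners: list[str] = []  # node owning each position
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def add_node(self, node: str) -> None:
        for i in range(self.replicas):
            h = self._hash(f"{node}#{i}")
            pos = bisect.bisect(self._hashes, h)
            self._hashes.insert(pos, h)
            self._owners.insert(pos, node)

    def node_for(self, key: str) -> str:
        # First ring position clockwise from the key's hash, wrapping around at the end.
        pos = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self._owners[pos]


ring = ConsistentHashRing(["store-1", "store-2", "store-3"])
customers = [f"customer-{i}" for i in range(1000)]
before = {c: ring.node_for(c) for c in customers}
ring.add_node("store-4")
after = {c: ring.node_for(c) for c in customers}
moved = [c for c in customers if before[c] != after[c]]
print(f"{len(moved)} of {len(customers)} customers moved; new owners: {sorted({after[c] for c in moved})}")
```

With naive `hash(key) % number_of_nodes` placement, adding a fourth node would move about three quarters of the customers; here only the customers taken over by the new node move.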


Key log research challenges

• Scalable hardware resources
  o Cloud
  o Auto scaling

• Indexing large volumes of data in real time
  o NoSQL / columnar storage

• Processing millions of events per second
  o Bloom filters
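A Bloom filter answers "have we possibly seen this before?" in constant time and a fixed amount of memory. It may give false positives but never false negatives, which makes it a cheap pre-filter before a more expensive lookup. A minimal sketch using the standard sizing formulas and double hashing; the class, its parameters and the example keys are illustrative, not taken from the talk.

```python
import hashlib
import math


class BloomFilter:
    """Space-efficient set membership: may report false positives, never false negatives."""

    def __init__(self, expected_items: int, false_positive_rate: float):
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hash functions.
        self.size = math.ceil(-expected_items * math.log(false_positive_rate) / (math.log(2) ** 2))
        self.num_hashes = max(1, round(self.size / expected_items * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit halves of one SHA-256 digest.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step, never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


seen = BloomFilter(expected_items=1_000_000, false_positive_rate=0.01)
seen.add("host-17:conn-reset")
print("host-17:conn-reset" in seen)  # True: added items are always found
print("host-99:disk-full" in seen)   # almost certainly False
```

For a million items at a 1% false-positive rate, this sizing uses about 9.6 million bits (roughly 1.2 MB), far less than storing the keys themselves.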


Conclusions

• Cloud computing is a tag for the next while...

• Major issues still to be fully addressed

• Previous performance engineering research on enterprise, grid, data centre or mainframe systems can probably feed into the solutions

Thank you!
