
Performance Engineering for Cloud Computing

John Murphy

Performance Engineering Lab

Lero © 2011. 8th European Performance Engineering Workshop ‐ EPEW 2011

• Cloud overview

• Previous results

• Example: Logging

• Future directions


The same approach(es) can be applied to solve queueing problems in very different areas

Can the same approach(es) for Performance Engineering be applied to solve problems in very different areas?


Challenges in the Cloud

What is Cloud Computing?

XXXX as a Service ....

While (Hype=True) {
Replace XXXX with Anything
}

Cloud Washing Required


Cloud Users

Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, and Matei Zaharia. 2010. A view of cloud computing. Commun. ACM 53, 4 (April 2010), 50‐58.

Evolving Critical Systems

Cloud Computing

“As‐A‐Service” market sizing

Report II, October 2010



Flavours of Cloud Computing

Public Cloud (Amazon, Google, Microsoft)

Private Cloud (many)

Hybrid Cloud Computing

Surge / Utility Computing



So what’s really new:

• Infinite computing resources on demand

• No cap ex

• Pay for what you use




Infinite computing resources on demand

Ability to follow surges in workload

• No capacity planning required

• Speed of surge important to provide capacity

• Data center utilisation low

• Lower price possible due to statistical multiplexing of many demands (self‐similarity an issue)

Workload varies over the day

Workload varies over the season

Workload varies with events
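
The statistical multiplexing point can be illustrated with a small sketch. The demand curve, the tenant count and the peak hours below are hypothetical values chosen for illustration, not figures from the talk. It compares the capacity needed when every tenant provisions for its own peak with the capacity a shared pool needs for the peak of the combined demand.

```python
import math

HOURS = 24


def daily_demand(peak_hour, base=10.0, swing=8.0):
    """Hypothetical hourly demand (in servers) for one tenant, peaking at peak_hour."""
    return [
        base + swing * math.cos(2 * math.pi * (hour - peak_hour) / HOURS)
        for hour in range(HOURS)
    ]


def capacity_needed(tenants):
    """Return (sum of individual peaks, peak of the combined demand)."""
    dedicated = sum(max(demand) for demand in tenants)
    pooled = max(sum(demand[hour] for demand in tenants) for hour in range(HOURS))
    return dedicated, pooled


# Four tenants whose daily peaks are spread evenly around the clock (assumed).
tenants = [daily_demand(peak_hour=p) for p in (3, 9, 15, 21)]
dedicated, pooled = capacity_needed(tenants)
print(f"dedicated: {dedicated:.1f} servers, pooled: {pooled:.1f} servers")

# If every tenant peaks at the same hour, pooling saves nothing.
correlated = [daily_demand(peak_hour=9) for _ in range(4)]
print(capacity_needed(correlated))
```

The second call hints at why the slide flags self‐similarity as an issue: when demands are correlated and burst together, the pooled peak approaches the sum of the individual peaks and the multiplexing gain shrinks.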


Pay for what you use

• Amazon: Physical Hardware (EC2 instances), control kernel upwards, lots of state information

• Google: Applications on AppEngine (web applications), separation between compute and storage

• Microsoft: Azure more flexible than the AppEngine

Cost: 1 machine for 1000 hours = 1000 machines for 1 hour

Pay as you go, or usage based pricing (not renting)

Pay per box, or pay per resources used
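
The cost equality on this slide follows directly from usage based pricing. A minimal sketch, assuming a flat hypothetical rate per machine hour (the rate is illustrative and not a real provider price):

```python
def usage_cost(machines, hours, rate_per_machine_hour):
    """Cost under pure usage based pricing: pay only for machine-hours consumed."""
    return machines * hours * rate_per_machine_hour


RATE = 0.10  # hypothetical price per machine-hour

slow = usage_cost(machines=1, hours=1000, rate_per_machine_hour=RATE)
fast = usage_cost(machines=1000, hours=1, rate_per_machine_hour=RATE)

# Same bill either way; the second option finishes 1000 times sooner.
print(slow, fast)
```

Both runs consume 1000 machine-hours, so under this pricing model the bill is identical and only the elapsed time differs.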



Ten Challenges in Cloud Computing [1]

1. Business Continuity & Service Availability
2. Data Lock In
3. Data Confidentiality/Auditability
4. Performance Unpredictability
5. Scalable Storage
6. Bugs in Large Scale Distributed Systems
7. Scaling Quickly
8. Reputation Fate Sharing
9. Data Transfer Bottlenecks
10. Software Licensing


[1] ”A View of Cloud Computing”, by Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, and Matei Zaharia

Typical Enterprise Systems


Data Data Everywhere

• 150 billion GB (150 exabytes) of data created in 2005; eight times that amount (1,200 exabytes) in 2010

• The amount of enterprise data will grow about 650% over the next five years, the vast majority of it unstructured, or not included in any database.

• Log data is the fastest‐growing data source at large organizations

• Many organizations are currently producing terabytes of log data per month



Cloud Services

• Not deployed in house

• Services need to handle 1000s of customers

• Services need to handle 1000s of enterprise systems

• Processing higher volumes of data required


Log Management

• Automatic Collection, Analysis & Visualization of Log Data

• Use Cases:

Problem Determination

Operations

Security

Compliance & Auditing


Log Maths

100,000 log messages / second

x 300 bytes / log message ~ 28.6 MB / second

x 3600 seconds ~ 100.6 GB / hour

x 24 hours ~ 2.35 TB / day

x 365 days ~ 860.5 TB / year

x 3 years ~ 2.52 PB

From Anton Chuvakin’s Blog Aug 2010


http://chuvakin.blogspot.com/
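
The log volume arithmetic can be reproduced with a short script. This is a sketch that assumes the MB, GB, TB and PB figures are binary units (powers of 1024), which is the interpretation under which the quoted numbers come out; the variable names are illustrative.

```python
MESSAGES_PER_SECOND = 100_000
BYTES_PER_MESSAGE = 300

# Assumption: binary units (1 MB = 1024**2 bytes, and so on).
MB, GB, TB, PB = 1024**2, 1024**3, 1024**4, 1024**5

bytes_per_second = MESSAGES_PER_SECOND * BYTES_PER_MESSAGE
bytes_per_hour = bytes_per_second * 3600
bytes_per_day = bytes_per_hour * 24
bytes_per_year = bytes_per_day * 365
bytes_three_years = bytes_per_year * 3

per_second_mb = bytes_per_second / MB
per_hour_gb = bytes_per_hour / GB
per_day_tb = bytes_per_day / TB
per_year_tb = bytes_per_year / TB
three_years_pb = bytes_three_years / PB

print(f"{per_second_mb:.1f} MB / second")
print(f"{per_hour_gb:.1f} GB / hour")
print(f"{per_day_tb:.2f} TB / day")
print(f"{per_year_tb:.1f} TB / year")
print(f"{three_years_pb:.2f} PB over 3 years")
```

With decimal units the same workload is 30 MB per second and about 2.59 TB per day, so the choice of unit convention matters when sizing storage.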

Typical Log Volumes

Customer Type | Log Volume | Events per Second | Events per Day
Large Cloud Provider | 50 TB per day | 2,000,000 | 172,000,000,000
Large Social Media Organisation | 25 TB per day | 1,000,000 |
Telecom Middleware / Applications | 1 TB per day | 50,000 |
Large Organisation (>1000 employees) | 300 GB per day | 15,000 |
Online Marketing Org | 100 GB per day | 5,000 | 432,000,000
Small Data Centre | 10 GB per day | 500 |
SAAS Educational Tools | 5 GB per day | 250 |
Single IBM Test Team | 2 GB per day | 100 |
Online Multimedia | 700 MB per day | 35 |
Early Stage Start up | 50 MB per day | 25 | 2,000,000
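
The events per day column follows from the events per second column. A minimal sketch of the conversion, assuming a constant event rate over the whole day (real workloads vary, as noted earlier in the talk):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def events_per_day(events_per_second):
    """Events in one day at a constant rate of events_per_second."""
    return events_per_second * SECONDS_PER_DAY


for customer, eps in [
    ("Large Cloud Provider", 2_000_000),
    ("Online Marketing Org", 5_000),
    ("Early Stage Start up", 25),
]:
    print(f"{customer}: {events_per_day(eps):,} events per day")
```

The table's daily figures appear to be rounded versions of these products, for example 172,800,000,000 quoted as 172,000,000,000.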

Partial results in log management

• High volume data processing

• Correlation

• Searching / Indexing

• Pattern detection (symptom database)

• Real time requirements



Real Time Correlation Engine RTCE

• IBM & UCD Research (since 2007)

• In house deployment

• In use across 10s of IBM teams (Dublin, US, China)

• Ability to process 80,000 events per second


RTCE details

[Figure: RTCE architecture diagram. Labels: components a and b, each with a log and an agent; network of agents across nodes a to d; inter-agent communication; presentation; agent; web server; users a to d. Panels: node detail, testing environment.]

logentries.com

• Log Management as a Service

Built on Amazon Web Services

Scales horizontally

o CPU

o Storage

Distributed File System / NoSQL DBs (Hadoop)

Needs to handle TB per day (2TB per customer)

Key log research challenges

• Scalable hardware resources

Cloud

Auto scaling

• Indexing large volumes of data in real time

NoSQL / Columnar Storage

• Processing millions of events per second

Bloom Filters
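
The slide only names Bloom filters as a candidate technique. The following is a generic textbook sketch of one, not the RTCE or logentries.com implementation; the class name, the sizing defaults and the sample log lines are all illustrative assumptions. A Bloom filter answers "have I possibly seen this item?" in constant time and fixed memory, with no false negatives and a tunable false positive rate.

```python
import hashlib
import math


class BloomFilter:
    """Generic Bloom filter sketch for string items (illustrative, not tuned)."""

    def __init__(self, expected_items, false_positive_rate=0.01):
        # Standard sizing formulas for bit count m and hash count k.
        ln2 = math.log(2)
        self.num_bits = max(
            1, math.ceil(-expected_items * math.log(false_positive_rate) / ln2**2)
        )
        self.num_hashes = max(1, round(self.num_bits / expected_items * ln2))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k bit positions from two 64-bit hash values.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


# Hypothetical usage: remember which log message patterns have been seen.
seen = BloomFilter(expected_items=1000)
seen.add("ERROR disk full on node-a")
print("ERROR disk full on node-a" in seen)  # an added item is always reported present
```

In a log pipeline a structure like this could sit in front of a slower index lookup, so that only events that might match a known pattern pay for the expensive check; whether that fits a given system is a design decision outside what the slide states.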


Conclusions

• Cloud computing is a tag for the next while...

• Major issues still to be fully addressed

• Previous performance engineering in enterprise, grid, data centre or mainframe research can probably feed into the solutions

Thank you!

