Large dataset processing in the Cloud Kevin Glenny and GridwiseTech team

Page 2

Simplified data-oriented system

Internal or external data sources

Applications working on data

Page 3

IT systems are constantly growing

Increased number of users

Increased number of applications

Increased amount of data

Page 4

IT systems are constantly growing

Infrastructure bottleneck

Page 5

Example

Electronics manufacturer

24/7 production

Report computation too long for decision making

2.5 million transactions daily

4TB data to manage

Page 6

What is Cloud computing?

"Transparent access to capabilities using a pay-per-use business model"

Benefits:
– Dynamic scaling
– Pay-for-use
– Off-shored administration

Page 7

What are the delivery models?

SaaS (Software as a Service)
– Salesforce.com, 63,000 clients

PaaS (Platform as a Service)
– Google App Engine (2008), Microsoft Azure (2008)

IaaS (Infrastructure as a Service)
– Amazon Elastic Compute Cloud, 8.2 million instances launched since 2006

Page 8

Application data processing

Database sharding (MySQL, PostgreSQL etc.)

NoSQL (Google's Bigtable, Amazon's Dynamo etc.)

Data-grid (GigaSpaces XAP, Oracle Coherence, Infinispan etc.)
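As an illustration of the first option, a minimal sketch of hash-based sharding is shown here. It is not the presenters' implementation: the function names `shard_for` and `partition_keys` are hypothetical, and the modulo-of-a-hash placement rule is only one common choice (range-based or consistent-hashing schemes are alternatives).

```python
import hashlib
from typing import Iterable, List


def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard index using a stable hash.

    hashlib is used instead of the built-in hash() because the latter
    is randomized per process for strings.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition_keys(keys: Iterable[str], num_shards: int) -> List[List[str]]:
    """Group keys into num_shards buckets, one bucket per database shard."""
    shards: List[List[str]] = [[] for _ in range(num_shards)]
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


if __name__ == "__main__":
    # Hypothetical transaction identifiers, for illustration only.
    sample = [f"txn-{i}" for i in range(10)]
    for index, bucket in enumerate(partition_keys(sample, 4)):
        print(index, bucket)
```

With this simple rule, changing the number of shards moves most keys to a different shard, which is one reason production systems often prefer consistent hashing.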

Page 9

Data-grid and sharding in the Cloud

All data processing and persistence in the Cloud

Achievements:
• Near real-time
• Dynamic scaling (application and resources)
• Pay-per-use
• Reduced administration
• HA

Page 10

Remaining issues

Getting large datasets in and out of the Cloud
– Bandwidth is limited on the client side
– Resort to mailing hard drives!

Performance – 2 to 50% slowdown

Data security/privacy – trust

SLAs – plan for the worst
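To make the bandwidth issue concrete, the following rough sketch estimates how long an upload would take. The 4 TB figure is taken from the manufacturer example (decimal terabytes assumed); the 10 and 100 Mbit/s link speeds and the `transfer_days` helper are assumptions for illustration, and protocol overhead is ignored unless an `efficiency` below 1 is passed.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(dataset_bytes: float, link_bits_per_second: float,
                  efficiency: float = 1.0) -> float:
    """Days needed to move dataset_bytes over a link.

    efficiency is the assumed usable fraction of the nominal link
    rate (0 < efficiency <= 1).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = dataset_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / SECONDS_PER_DAY


FOUR_TB = 4 * 10**12  # 4 TB from the example, assuming decimal terabytes

if __name__ == "__main__":
    for mbit in (10, 100):  # hypothetical client uplink speeds
        days = transfer_days(FOUR_TB, mbit * 1e6)
        print(f"{mbit} Mbit/s: {days:.1f} days")
```

Under these assumptions the upload takes roughly 37 days at 10 Mbit/s and under 4 days at 100 Mbit/s, which shows why shipping physical drives can be the faster route.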

Page 11

Conclusions

Datasets in data-oriented systems grow, causing bottlenecks

Datasets in the Cloud can be processed using scalable technologies

Challenges remain
– Main challenge: how to get the data to the Cloud?