Exploring cloud for data warehousing


Cloud computing is creating a new era for IT by providing a set of services that appear to have infinite capacity, immediate deployment and high availability at trivial cost. These are all appealing to someone running a data warehouse when data volume, use and cost are growing at a rapid rate.

Today most organizations look at cloud as a way to lower data center and IT costs. While cost reduction is a real benefit, there is more value in the increased scalability, speed to procure (and give up) resources, and ease of delivery in cloud environments.

Database workloads are particularly challenging in the cloud. Cloud deployments beyond a moderate scale favor shared-nothing database architectures designed to run transparently in a multi-node environment. We are still in an early period of standardization and design of software to run in the cloud. Not all workloads are suitable for deployment on a collection of small virtualized servers today. Business intelligence and analytic database workloads fall into this area, raising the importance of analysis for fit with public and private cloud options.


Exploring Cloud Computing Options for Data Warehousing

July 26, 2012

Mark Madsen

@markmadsen

www.ThirdNature.net

Cloud Computing

"…a model for enabling ubiquitous, convenient, on‐demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction."

What people see: seemingly infinite resource to apply to performance problems on short notice and at low cost

http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf

Generators: Expensive Product

Generators: Commodity Product

Generators as a Service: Electricity

The Natural Process of Commoditization

Simon Wardley, A Lifecycle Approach to Cloud Computing

Managing Hardware Resources

Systems are sized for the peak workload, with the expectation that it will fluctuate.

[Chart: resources over time, demand versus capacity]

Idle resources = low utilizations = money wasted

[Chart: resources over time, demand versus capacity, with the gap between them marked as idle resources]

Not enough resource is (much) worse than too much.

[Chart: resources over time, demand versus capacity]

Maintaining capacity just above the peak as workloads increase is the art of capacity planning.

One problem is the large step when upgrading to more resources, equating to a large capital cost.

[Chart: resources over time, demand versus capacity]

Great performance after an upgrade, bad performance at year‐end before the next upgrade.

A steady decline can be worse for user perception than constant mediocre performance.

[Chart: resources over time, demand versus capacity, with idle capacity marked]

What everyone would like: elastic capacity

Pay for the resources you use when you use them, not up front for the entire system that supplies them. Just like electricity.

[Chart: resources over time, capacity tracking demand]
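The contrast between stepped upgrades and elastic capacity can be made concrete with a little arithmetic. The sketch below is illustrative only: the demand figures, the `STEP` upgrade size and the function names are assumptions invented for the example, not data from the presentation. It sizes a fixed system in large steps so that it always covers the peak seen so far, then compares the idle share of that system with an idealized elastic one that tracks demand exactly.

```python
import math

# Hypothetical monthly peak demand in arbitrary resource units (illustrative only).
demand = [40, 45, 50, 60, 55, 70, 80, 75, 90, 100, 95, 120]

STEP = 50  # assumed size of one hardware upgrade, in the same units


def stepped_capacity(demand, step):
    """Fixed capacity bought in whole steps, kept at or above the peak seen so far."""
    capacity, peak = [], 0
    for d in demand:
        peak = max(peak, d)
        capacity.append(math.ceil(peak / step) * step)
    return capacity


def idle_fraction(demand, capacity):
    """Share of provisioned capacity that went unused over the whole period."""
    return 1 - sum(demand) / sum(capacity)


fixed = stepped_capacity(demand, STEP)
elastic = list(demand)  # idealized: elastic capacity equals demand in every period

print("stepped capacity:", fixed)
print("idle share, stepped:", round(idle_fraction(demand, fixed), 3))
print("idle share, elastic:", idle_fraction(demand, elastic))
```

Real elastic services provision in discrete units and with some delay, so the elastic idle share is never exactly zero; the point of the sketch is only the size of the gap between the two models.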

Five Key Cloud Characteristics

1. On‐demand self‐service

2. Network accessibility

3. Resource pooling

4. Measured service

5. Elasticity

Cloud Architecture

Started with virtual machines

Lots of servers, lots of virtual nodes. But in public clouds:

• Storage can be, and often is, separated

• VMs don’t run across nodes

• Great for OLTP, not so much for BI

• Implies new software architectures

Database Architecture and the Cloud

Virtualizing on a single server makes no sense for a database that needs the full resources.

If your server hardware environment looks like this:

then it’s probably good for lightweight transaction processing, simple storage and retrieval, and procedural computations on data.

If you want to use it for a data warehouse, you need:

• A shared‐nothing database

• A proper storage architecture

• Dynamic licensing
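To make the shared‐nothing requirement concrete, here is a minimal sketch of the idea, assuming a hypothetical four-node cluster and made-up `(customer_id, amount)` rows. Each row is assigned to exactly one node by hashing its distribution key, each node aggregates only the rows it owns, and a coordinator merges the partial results. The names `node_for`, `distribute`, `local_totals` and `query_totals` are invented for illustration and do not correspond to any particular database product.

```python
from collections import defaultdict
from zlib import crc32

NUM_NODES = 4  # assumed cluster size


def node_for(key, num_nodes=NUM_NODES):
    """Pick the owning node from a stable hash of the distribution key."""
    return crc32(key.encode("utf-8")) % num_nodes


def distribute(rows, num_nodes=NUM_NODES):
    """Place each (customer_id, amount) row on exactly one node."""
    nodes = [[] for _ in range(num_nodes)]
    for customer_id, amount in rows:
        nodes[node_for(customer_id, num_nodes)].append((customer_id, amount))
    return nodes


def local_totals(node_rows):
    """Each node aggregates only the rows it owns; no disk or memory is shared."""
    totals = defaultdict(float)
    for customer_id, amount in node_rows:
        totals[customer_id] += amount
    return dict(totals)


def query_totals(nodes):
    """Coordinator step: merge the per-node partial results."""
    merged = {}
    for node_rows in nodes:
        # A given key lives on one node only, so partial results never collide.
        merged.update(local_totals(node_rows))
    return merged


rows = [("c1", 10.0), ("c2", 5.0), ("c1", 2.5), ("c3", 7.0), ("c2", 1.0)]
nodes = distribute(rows)
print(query_totals(nodes))
```

Because the aggregation key is also the distribution key, no rows move between nodes during the query. Queries that group or join on a different column would need a redistribution step, which is where network and storage architecture start to matter.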

Three Models of Deployment

1. Public cloud

2. Leased / hosted private cloud

3. Private cloud

Benefits and Rationale

Why did you / are you considering a move to the cloud?

Two primary reasons:

▪ Cost reduction

▪ Reduced time to value

IBM global survey of IT and line-of-business decision makers

Unexpected Benefits

Speed to deploy:

▪ Opex vs capex means faster approvals and less planning

▪ Provision on‐demand means ability to do all those small projects that needed resources and staff to set up

Performance management:

▪ Resource‐oriented fixes done in minutes

▪ Instead of static resources and fluctuations in performance, set static SLAs and fluctuate the resources

Administration:

▪ No more hardware or operating system upgrades to deal with
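The performance management point, holding the SLA fixed and letting the resources vary, can be sketched as a simple proportional scaling rule. This is an assumption-laden illustration rather than a description of any provider's autoscaler: the function `nodes_needed`, its parameters and the node limits are hypothetical, and the rule assumes query time falls roughly in proportion to the number of nodes, which real analytic workloads only approximate.

```python
import math


def nodes_needed(current_nodes, observed_seconds, sla_seconds,
                 min_nodes=1, max_nodes=32):
    """Return a node count that should bring observed query time back to the SLA.

    Assumes (hypothetically) that query time scales inversely with node count.
    """
    target = math.ceil(current_nodes * observed_seconds / sla_seconds)
    # Clamp to the assumed limits of the environment.
    return max(min_nodes, min(max_nodes, target))


# Queries take 30 s against a 10 s SLA on 4 nodes: scale out.
print(nodes_needed(4, 30.0, 10.0))

# Queries take 5 s against a 10 s SLA on 8 nodes: scale in and stop paying for idle nodes.
print(nodes_needed(8, 5.0, 10.0))
```

The design choice worth noting is that the SLA is the input and the resource count is the output, the reverse of sizing a fixed system and accepting whatever performance it delivers.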

Public Cloud Challenges

1. Multi‐tenant servers and unpredictable I/O performance

2. Legal problems:

▪ Data co‐mingling in multi‐tenant databases

▪ Data locality and national laws

3. Cloud compatibility for data integration and data management tools (environment, data movement)

4. Security requirements

When these are a concern, private clouds may be the better option today.

What are manager preferences?

[Chart: share of managers who prefer not to use cloud, prefer private cloud, or prefer public cloud, shown for two workloads: "Data mining, text mining, or other analytics" and "Data warehouses or data marts". Values shown on the chart: 9%, 21%, 52%, 44%, 39%, 35%.]

IBM global survey of IT and line-of-business decision makers

Comparison of Models

New and growing use cases drive the need to expand

The use cases are now interactive applications, lower latency data, complex analytics and rapidly growing data volumes.

Image Attributions

Thanks to the people who supplied the images used in this presentation:

Commoditization diagram - from A Lifecycle Approach to Cloud Computing, © Simon Wardley

tesla coil train - http://www.flickr.com/photos/winterhalter/27364687

Amazon Virtual Private Cloud diagram - © Amazon, Inc.

caged_tower_melbourne.jpg - http://www.flickr.com/photos/vermininc/2227512763

About the Presenter

Mark Madsen is president of Third Nature, a technology research and consulting firm focused on business intelligence, analytics and information management. Mark is an award-winning author, architect and former CTO whose work has been featured in numerous industry publications. During his career Mark received awards from the American Productivity & Quality Center, TDWI, Computerworld and the Smithsonian Institution. He is an international speaker, contributing editor at Intelligent Enterprise, and manages the open source channel at the Business Intelligence Network. For more information or to contact Mark, visit http://ThirdNature.net.

About Third Nature

Third Nature is a research and consulting firm focused on new and emerging technology and practices in business intelligence, analytics and performance management. If your question is related to BI, analytics, information strategy and data, then you're in the right place.

Our goal is to help companies take advantage of information-driven management practices and applications. We offer education, consulting and research services to support business and IT organizations as well as technology vendors.

We fill the gap between what the industry analyst firms cover and what IT needs. We specialize in product and technology analysis, so we look at emerging technologies and markets, evaluating technology and how it is applied rather than vendor market positions.
