
WHITE PAPER

Lean Data Warehouse Practices Optimize Data Warehouses with Better Visibility into Data Usage

This document contains Confidential, Proprietary and Trade Secret Information (“Confidential Information”) of Informatica Corporation and may not be copied, distributed, duplicated, or otherwise reproduced in any manner without the prior written consent of Informatica.

While every attempt has been made to ensure that the information in this document is accurate and complete, some typographical errors or technical inaccuracies may exist. Informatica does not accept responsibility for any kind of loss resulting from the use of information contained in this document. The information contained in this document is subject to change without notice.

The incorporation of the product attributes discussed in these materials into any release or upgrade of any Informatica software product—as well as the timing of any such release or upgrade—is at the sole discretion of Informatica.

Protected by one or more of the following U.S. Patents: 6,032,158; 5,794,246; 6,014,670; 6,339,775; 6,044,374; 6,208,990; 6,850,947; 6,895,471; or by the following pending U.S. Patents: 09/644,280; 10/966,046; 10/727,700.

This edition published February 2012


Table of Contents

Executive Summary
Big Data Getting Bigger, Analytical Complexity Exploding
    Inadequacy of Legacy Monitoring Tools
    Inadequacy of Existing Data Warehouse Management Practices
Introducing Lean Data Warehouses Practices
Implement Lean Data Warehouse Practices and Take Meaningful Action Based on the Analysis to Deliver Immediate and Quantifiable Benefits
Informatica Solutions for Lean Data Warehouses
    Informatica Data Warehouse Advisor
Conclusion


Executive Summary

As technological advances have reshaped business, government, and consumer life, business intelligence (BI) applications and data warehouse deployments have grown from departmental to enterprise-wide in recent years. As a result, the appetite for data is insatiable, and analytic data volumes are growing exponentially, with data warehouse systems of hundreds of terabytes and even petabytes becoming the norm.

With exploding data volumes and increasing analytic complexity, information technology (IT) managers are under siege to respond to business needs while reducing the costs associated with data delivery. Unfortunately, data managers, application database administrators, data architects, and analytic application managers do not have the required instrumentation to gain visibility and understand what data is used or unused and, more importantly, decipher how data is being used to retain and optimize the most relevant assets.

Lean Data Warehouses is a best-practices methodology for gaining greater visibility into the data warehouse environment by monitoring business activity and data usage, and managing data growth in data warehouses. Based on this visibility, organizations can reduce data management costs and ensure scalability of both infrastructure and available IT resources.

The key objectives of Lean Data Warehouses are to:

• Justify costs, prioritize, and invest resources based on business utilization

• Retain and optimize the most relevant data and processes

• Respond faster and ensure scalability and performance

Lean Data Warehouses is one of the three pillars of Lean Data Management best practices and aims to address the challenges of managing Big Data warehouses (See Figure 1). Lean Data Management is adapted from Lean Manufacturing practices that emphasize waste elimination to reduce costs. The other two pillars of Lean Data Management address the challenges of managing Big Applications and Big Application Portfolios.

To gain tangible benefits, a comprehensive solution for Lean Data Warehouses based on usage monitoring should be combined with best practices to analyze how data is used and take meaningful action. The best practices to leverage usage monitoring to deliver immediate and quantifiable benefits include:

• Developing key performance indicators (KPIs) to expose business utilization and consumption

• Identifying unused and unnecessary data

• Streamlining data loads and archiving data based on identification of unused and infrequently accessed data

• Optimizing and tuning databases based on actual data usage

• Reducing unnecessary complexity to improve scalability and performance

Figure 1. The three pillars of Lean Data Management practices. [Diagram: Lean Applications (archive production, subset nonproduction, improve performance, reduce maintenance); Lean Data Warehouses (monitor usage, identify dormant data, optimize infrastructure, optimize processes); Lean Application Portfolios (retire legacy applications, maintain data access, eliminate costs, enforce retention, improve compliance and e-discovery). Shared goal: manage Big Data, reduce costs, meet SLAs.]


Effective usage monitoring requires a solution that integrates with the BI, data warehouse, and data integration stacks to provide a complete view into business activity and data usage. Informatica® Data Warehouse Advisor is a software solution that monitors how business units and departments use data so that IT organizations can improve operational efficiency, scalability, and performance and control data delivery costs.

Once dormant data is identified, Informatica Data Archive should be used to move the inactive data out of production instances to substantially reduce production data size, costs, and maintenance windows and increase data warehouse availability and performance.

To further manage data growth in nonproduction environments, Informatica Lean Data Warehouse practices employ the Test Data Management solution (TDM), based on Informatica Data Subset software. This solution significantly reduces the footprint of nonproduction data warehouses by creating smaller, referentially intact subset copies of production that contain only the most relevant data for the user.

Together, Informatica Data Archive, Data Subset, and Data Warehouse Advisor furnish the solutions to support Lean Data Warehouse practices.

In the rest of this white paper, we will discuss:

• The challenges around data growth and the drivers for adopting a Lean Data Warehouses approach

• Inadequacies of legacy data warehouse monitoring tools and data warehouse management practices

• How Informatica Data Warehouse Advisor, Data Archive, and Data Subset provide the right technology and solutions for implementing Lean Data Warehouse practices

Big Data Getting Bigger, Analytical Complexity Exploding

Over the past several years, organizations have invested heavily in business intelligence (BI) applications and data warehousing to provide better access to enterprise data. The deployment of analytic applications has evolved from small departmental “power” users using disparate tools to widely accessed enterprise BI applications.

As business use of analytic applications has exploded, so has the appetite for more data. Big Analytic Data in data warehouses is getting bigger by the minute as the amount of raw data and the number of source systems continue to burgeon. Data stored in data warehouses is now growing into the tens and hundreds of terabytes very rapidly. A recent study conducted by the Aberdeen Group showed that large enterprises saw 41 percent growth in data per year from 2009 to 2010.[1] The same study also found that 50 percent of organizations reported that much of their data is not accessed or is underutilized by the business.

In this exploding environment, IT managers have the unenviable task of having to respond faster to increasingly demanding business needs while reducing the costs associated with data delivery. Additionally, the pressure on IT organizations to control costs comes when organizations are mandating expanded deployments of BI and data warehouse implementations. Compounding the problem is the fact that an enterprise deployment of BI and data warehousing requires complex interaction and collaboration between different functional groups before, during, and most importantly after the implementation. In addition, unlike off-the-shelf transactional applications, a BI and data warehouse deployment is constantly evolving. The number of users, query volume, and query complexity change with alarming irregularity as data marts and data warehouses grow in amazing size and complexity.

[1] Data Management for BI, Aberdeen Group, December 2010


Unfortunately, most inefficiency in BI and data warehousing environments arises from a lack of understanding how applications and data are actually used across the organization. Data managers, application database administrators, data architects, and analytical application managers are hampered by a lack of tools designed to provide visibility into what data is used or unused and, more importantly, how data is being used. This information is critical to retain and optimize the most relevant data assets.

Data warehouses experience many of the same data growth issues as transactional systems: higher infrastructure costs to support dormant data in production, large and growing maintenance windows, lower user productivity due to poor performance, the creation of derivative data marts, and the multiplier effect in the many nonproduction copies, with the associated risk of data breaches.

These issues are all more significant for data warehouses because data warehouses integrate data from multiple applications, as well as historical information for analytical reporting, so they are much larger and usually grow faster than transactional systems. Instead of a few terabytes, data warehouses typically contain tens to hundreds of terabytes. According to industry estimates, data warehouses actively use only the first year’s worth of data, but maintaining historical data can easily increase the storage requirement to as much as 20 times the current year’s data size. Based on this estimate and the massive potential size of data warehouses, the impact on cost could be substantial.
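The arithmetic behind that estimate is easy to make concrete. The sketch below is purely illustrative: the current-year data size is a made-up figure, and the per-terabyte managed-storage cost is derived from the $70,000 / 10 TB figure shown in Figure 2 of this paper.

```python
# Illustrative only: if the business actively uses roughly one year's worth
# of data, but retained history inflates the warehouse to 20x that size,
# the dormant share of the warehouse (and its carrying cost) is easy to bound.
current_year_tb = 5                  # hypothetical current-year data size
total_tb = 20 * current_year_tb      # the 20x industry estimate cited above
dormant_tb = total_tb - current_year_tb

cost_per_tb = 7_000                  # implied by Figure 2 ($70,000 / 10 TB)
dormant_cost = dormant_tb * cost_per_tb

print(f"dormant data: {dormant_tb} TB ({dormant_tb / total_tb:.0%} of the warehouse)")
print(f"managed-storage cost tied up in dormant data: ${dormant_cost:,}")
```

Under these assumptions, 95 percent of the warehouse is dormant, which is in line with the usage findings reported later in this paper.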

Inadequacy of Legacy Monitoring Tools

To help address the issues noted above, most IT managers call upon legacy application and database monitoring tools or application audit logs. However, these solutions are simply not designed to provide the visibility needed to understand business activity and data usage to help organizations manage data warehouses more efficiently.

Legacy application monitoring tools are primarily designed for monitoring and tracing transactional applications. They enable application developers to conduct load testing and transaction profiling necessary to design, test, and deploy transactional applications in production. The legacy application monitoring tools are not designed to provide visibility into how the business uses analytical data.

In addition, legacy database monitoring tools are designed mainly for system database administrators and focus on monitoring system metrics such as CPU, memory, buffer pools, and SQL explain plans to tune the databases. The legacy database monitoring tools do not integrate with analytical applications and therefore cannot provide insight into how the business users and applications interact with analytic data. These legacy monitoring tools, which were designed for transactional database systems and for the use of system DBAs, do not supply adequate analysis of data usage (at an object level) in any meaningful way that can be consumed by data architects, application managers, data warehouse developers, and data management executives.

Most analytic application vendors provide audit logs to monitor application activity. These logs are designed specifically for application administrators to monitor activity for document management, change tracking, security administration, and job scheduling. Because application audit logs are limited to activity related to the application servers, they do not correlate user and application activity with data usage and impact on the data warehouse servers; therefore, they cannot indicate how data is being used by the business.

“We have too much of the same information that typical DBA tools provide. But that is not useful to understand what our business users are doing and how data is used.”

Director of Data Management, Large Financial Services Organization


Inadequacy of Existing Data Warehouse Management Practices

A data warehouse environment can face many challenges associated with explosive data growth, including declining performance, lengthening maintenance windows, burgeoning production infrastructure costs, the inability to meet SLAs, and the multiplier effect. In addition, the usual practice of making complete copies of production for nonproduction purposes compounds these issues.

A common practice to address declining performance is data warehouse tuning. But if data warehouse DBAs spend a lot of time tuning, they won’t have time for more proactive activities, unless more DBAs are hired. Moreover, the data warehouse can reach a point where tuning is no longer effective.

Longer maintenance windows (backup, disaster recovery, replication, and upgrades) due to explosive data growth mean that maintenance tasks need to be broken up into multiple, shorter windows. This requires more complex planning, or data warehouse availability will suffer. Because the database keeps growing, you may find yourself addressing this issue repeatedly, eventually running out of time and cutting into your operational hours.

One of the most common ways to cope with explosive data growth is to purchase hardware to accommodate it: additional storage and server upgrades are the typical answer. But production systems run on the most expensive storage and servers, and much of the data they hold has limited value to the organization. Before buying, be sure the hardware is indeed necessary to support the data residing on it.

Most IT organizations make a complete copy of the production system for their nonproduction environments. Creating full nonproduction copies is inefficient and expensive in terms of maintenance, support, and storage costs (See Figure 2). As development and testing environments get bogged down with unnecessary and obsolete data, system performance suffers and IT teams struggle to meet service-level agreements.

Figure 2. The multiplier effect of data growth in nonproduction environments. [Diagram: a 10 TB production data warehouse feeds six full-size 10 TB copies in test and development. Without Lean Data Warehouses, the managed cost of storage is $70,000 for the 10 TB production data warehouse plus $420,000 for the six full-size production copies, or $490,000 overall. Using full-size copies of the production data warehouse for testing and development projects strains storage capacity, can cause unnecessary delays, and becomes costly over time.]
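The storage figures in Figure 2, and in the corresponding lean scenario shown later in Figure 5, follow from a simple linear cost model; the $7,000-per-terabyte rate below is implied by the paper's $70,000 / 10 TB figure rather than stated directly.

```python
def managed_storage_cost(production_tb, copy_tb, num_copies, cost_per_tb=7_000):
    """Managed storage cost for production plus its nonproduction copies.

    The default cost per terabyte is implied by this paper's figures
    ($70,000 for a 10 TB production data warehouse).
    """
    return (production_tb + copy_tb * num_copies) * cost_per_tb

full_copies = managed_storage_cost(10, 10, 6)  # six full-size 10 TB copies
lean_copies = managed_storage_cost(10, 3, 6)   # six 3 TB subsets, as in Figure 5

print(f"without lean practices: ${full_copies:,}")  # $490,000
print(f"with lean subsets:      ${lean_copies:,}")  # $196,000
print(f"storage cost avoided:   ${full_copies - lean_copies:,}")
```

The model reproduces the paper's $490,000 and $196,000 totals, making the roughly $294,000 saving from subsetting explicit.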


Introducing Lean Data Warehouses Practices

Data warehouses in large enterprises are now routinely growing into the tens and hundreds of terabytes, and data management costs and complexity are growing exponentially. Organizations cannot keep supporting this growth without a prohibitive impact on resource and infrastructure costs. Just as business units rely on data to make informed decisions that drive profitability and cost controls, IT organizations have to assess how the business is utilizing (or underutilizing) application and data assets to make informed decisions.

Lean Data Warehouses is the practice of monitoring and assessing business activity and data usage and managing data growth to deliver increased operational efficiency, ensure scalability of infrastructure and available IT resources, and reduce data management costs.

For example, a large healthcare organization recently analyzed data usage and found that 87 percent of its data was not being utilized (when reviewed over a three-month period) and that just two schemas out of 5,700 received more than 60 percent of the query workload. This analysis allowed the organization to focus its optimization efforts on what was most relevant (and frequently used) and to begin streamlining data loads by eliminating what was not needed. The organization suspended an infrastructure upgrade, saving several million dollars, and instead focused on getting more out of its existing infrastructure.

Justify costs, prioritize and invest resources based on business utilization

With BI and data warehousing becoming mission critical for organizations, IT teams continue to be pressured to deliver sustained value to the business with shrinking IT budgets. Faced with growing data volumes and increasing business demands for faster and more relevant information, enterprise IT should clearly measure and evaluate how the business is utilizing existing investments to help justify costs, prioritize resources, and make informed investments.

By tracking and measuring business activity, utilization, and data usage trends, organizations can assess and identify underutilized IT assets along with highly utilized but underperforming assets to reduce operational costs and efficiently plan for future capacity and resources.

Retain and optimize the most relevant data and processes

As data volumes continue to explode and business users clamor for access to more data, IT teams need to understand how users interact with data to ensure that the most relevant business information is retained and made readily available. Organizations should track and assess what data is being used, what data is no longer being used, and what data is never used. Dormant or inactive data that is no longer used but needs to be retained for compliance should be archived. Informatica Data Archive allows you to relocate inactive data in data warehouses to either another data warehouse instance on a lower-cost infrastructure or to an optimized file archive format, which is highly compressed and immutable, while maintaining easy access from any reporting tool. Data that is never used should be eliminated from data warehouse loads. Additionally, IT organizations should understand how the business is using data to focus optimization efforts on data that is most relevant to the business. With insight into how data is being used, IT services can be better aligned with the business while reducing the costs of storage and data management.

“I don’t want to spend millions of dollars in additional hardware without first figuring out how we can better utilize what we support today. I want the ability to measure who is doing what and what is used and unused to manage our data and associated infrastructure more efficiently.”

IT Director, Large Healthcare Services Company


Respond faster and ensure scalability and performance

BI and data warehouse systems are growing in data size and complexity and must be continuously available to support diverse, often global user communities. Business users expect that IT teams proactively discover and respond to issues before critical business services are adversely affected. Instead of managing the BI, data warehouse, and data integration stacks as independent silos, organizations should provide end-to-end visibility to the multifunctional teams responsible for data delivery. By deploying a solution that provides an integrated view into the activity of BI users and applications with a correlated view into data warehouse usage and performance, IT organizations can improve operational efficiency and reduce the time and effort required to diagnose performance bottlenecks.

Reduce the size of production and nonproduction instances to lower costs

Although archiving reduces the size of your production data warehouse and any corresponding copies, creating data subsets in nonproduction copies can further shrink data sizes to address the Big Data challenges in nonproduction environments. Making subsets involves providing a smaller set of data from production based on the most relevant functional or time slices, while maintaining data and referential integrity and meeting the needs of the nonproduction users, such as developers and testers.

Implement Lean Data Warehouse Practices and Take Meaningful Action Based on the Analysis to Deliver Immediate and Quantifiable Benefits

Develop key performance indicators (KPIs) to expose business utilization and consumption

Too often, IT organizations rely on intuition and gut instinct to make key decisions relating to hardware and software investments and performance optimizations. By measuring and analyzing usage activity from a business-centric view, IT organizations not only can develop key performance objectives and metrics but can measure the results and variances as well. Delivering actionable information across the IT organization lets the support staff work more efficiently to achieve strategic and tactical objectives. It also lets senior IT managers measure the effectiveness of their investment, drive initiatives that can gain the most business value, and reduce the total cost of ownership.

Identify unused and unnecessary data to streamline data loads and archive inactive data

Because business users are constantly demanding more data, it is imperative for IT managers to assess and identify data that is being unnecessarily loaded every day (and often many times a day) into the warehouse but is not used or required. By identifying unused data (e.g., schemas, tables, and columns), IT can collaborate with the business to develop a more efficient process of sourcing only the necessary data, streamlining data loads and improving data load times as well.
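As a rough illustration of the kind of analysis a usage-monitoring tool automates, the sketch below compares a warehouse's table catalog against the tables actually referenced in a window of logged queries. The table names, log contents, and naive regex are all hypothetical simplifications; a real tool works from instrumented query traffic, not raw SQL strings.

```python
import re

# Hypothetical inputs: the warehouse catalog and logged query text
# captured over an observation window (e.g., three months).
catalog = {"sales.orders", "sales.order_lines", "ref.fx_rates", "stage.clicks_raw"}

query_log = [
    "SELECT o.order_id, l.amount FROM sales.orders o "
    "JOIN sales.order_lines l ON l.order_id = o.order_id",
    "SELECT rate FROM ref.fx_rates WHERE currency = 'EUR'",
]

# Collect every schema-qualified name in the log, keeping only real tables
# (intersecting with the catalog filters out alias.column hits like o.order_id).
referenced = set()
for sql in query_log:
    referenced |= set(re.findall(r"\b\w+\.\w+\b", sql)) & catalog

unused = catalog - referenced
print("candidates to drop from loads or to archive:", sorted(unused))
```

Here the staging table never appears in the log, so it surfaces as a candidate for removal from daily loads or for archiving, exactly the collaboration point with the business described above.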


In addition, by identifying dormant data or data that’s no longer used, IT organizations can develop a plan for archiving historical data in a lower-cost infrastructure (See Figure 3). Informatica Data Archive provides a highly compressed (up to 98 percent), immutable, secure optimized archive that can be accessed easily and quickly for e-discovery or reporting purposes. The operational benefits of archiving are clear: minimize maintenance windows and substantially reduce the footprint of your data warehouse (which sits on the most expensive infrastructure), while moving inactive data to the highly compressed archive on less expensive infrastructure.

For example, a large pharmaceutical company has been able to reduce data management costs by more than $500,000 annually by only retaining data that is relevant and used by the business and archiving inactive data. In addition, it continually streamlines its data loads by pruning unnecessary data and has reduced batch load times by 50 percent.

Figure 3. Monitor data usage and identify dormant records for archiving


Optimize and tune databases based on data usage

A significant challenge for database administrators is lack of knowledge about how columns of data are actually used. Very often, indexes are created to tune the database based on evaluating explain plans of individual SQL statements. This approach is misleading because the data warehouse query workload is generally ad hoc. By identifying frequently used data that would benefit from indexing (e.g., columns most often used in WHERE, ORDER BY, or GROUP BY clauses), DBAs can focus on indexing strategies that provide the most efficient performance (See Figure 4). For example, a leading financial services organization saved weeks of time and effort by implementing indexing strategies based on how columns of data are being used rather than reacting to individual SQL statements. It leverages usage monitoring and associated analytics to periodically identify frequently used columns that would benefit from indexing and proactively notifies the relevant data warehouse administrators to take appropriate action.

Figure 4. Optimize the data warehouse by monitoring BI reports and used columns
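A minimal sketch of that column-level analysis might tally how often each column appears in the WHERE, GROUP BY, and ORDER BY clauses of logged queries. The query text below is hypothetical, and a real monitoring tool would use a proper SQL parser rather than regular expressions.

```python
import re
from collections import Counter

query_log = [
    "SELECT region, SUM(amount) FROM sales WHERE order_date >= '2011-01-01' GROUP BY region",
    "SELECT * FROM sales WHERE order_date = '2011-06-30' ORDER BY amount",
    "SELECT customer_id FROM sales WHERE order_date > '2011-03-01'",
]

usage = Counter()
for sql in query_log:
    # Extract each WHERE / GROUP BY / ORDER BY clause, then count the
    # lowercase identifiers (column names) it references.
    clauses = re.findall(
        r"(?:WHERE|GROUP BY|ORDER BY)\s+(.+?)(?=\s+(?:GROUP BY|ORDER BY)|$)", sql
    )
    for clause in clauses:
        usage.update(re.findall(r"\b[a-z_]+\b", clause))

# Columns dominating filter/sort/group clauses are the index candidates.
for column, count in usage.most_common():
    print(column, count)
```

In this toy workload, order_date dominates the filtering clauses, so it would be the first index candidate, regardless of what any single explain plan suggests.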


Reduce complexity to improve scalability and performance

By analyzing how users and analytic applications interact with data, the IT teams responsible for data delivery can implement strategies to reduce complexity and improve the end-user experience. Application performance can be significantly improved by identifying frequent data conversions or functions performed by the applications that are better suited to being performed on the data warehouse. Frequent and expensive data aggregations can be discovered that are better performed during the data loading stage. Complex queries with an inordinate number of joins and subqueries should be identified to drive the redesign of analytical reports and reduce the impact on data warehouses. And finally, with more and more ad hoc reporting done by business users, poorly written reports (e.g., reports generating unconstrained queries) should be identified and modified.
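The heuristics above can be sketched as a simple query screen. The thresholds and the sample query are hypothetical; production tooling would rely on actual workload statistics rather than string counting.

```python
def flag_query(sql, max_joins=5, max_subqueries=3):
    """Flag candidates for redesign: join-heavy, subquery-heavy,
    or unconstrained (no WHERE clause) queries."""
    s = " ".join(sql.upper().split())  # normalize whitespace
    issues = []
    if s.count(" JOIN ") > max_joins:
        issues.append("too many joins")
    if s.count("(SELECT ") > max_subqueries:
        issues.append("too many subqueries")
    if " WHERE " not in f" {s} ":
        issues.append("unconstrained query (no WHERE clause)")
    return issues

# A typical ad hoc report query with no row filter gets flagged:
print(flag_query("SELECT region, SUM(amount) FROM sales GROUP BY region"))
# -> ['unconstrained query (no WHERE clause)']
```

Running such a screen across the monitored query log surfaces the reports most worth redesigning before they degrade warehouse performance for everyone else.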

For example, a large manufacturing organization that supports some 20,000 analytical reports used by more than 10,000 global users evaluates how analytical applications and reports interact with data during the test and development stages to optimize end-user experience and performance. It also monitors usage on production systems to compare data usage and associated performance characteristics against benchmarks created in test and development to rapidly discover areas of optimization. The manufacturing organization estimates savings in excess of $2 million in costs from improving aggregated processing times by over 22,000 hours, which directly impacts business users’ experience and productivity and ensures scalability of existing investments in infrastructure.

Shrink nonproduction instances and further reduce costs

With Informatica Data Subset, your IT organization can quickly build and update nonproduction systems with a small subset of production data and provision nonproduction copies faster. You can easily customize provisioning rules to meet your organization’s changing business requirements. By testing configuration updates with current, realistic data before introducing them into production, you reduce deployment risk. By shrinking the footprint of nonproduction environments, your IT organization can significantly reduce infrastructure and maintenance costs (See Figure 5). You also lower training costs by standardizing on a single approach and infrastructure, while improving training effectiveness by using reliable, production-like data in training environments.

Figure 5. Create lean targeted copies of production data warehouses in nonproduction environments to reduce infrastructure costs and increase development and testing efficiency. [Diagram: a 10 TB production data warehouse feeds six smaller, targeted 3 TB production copies, or lean clients, in test and development. With Lean Data Warehouse practices, the managed cost of storage is $70,000 for the 10 TB production data warehouse plus $126,000 for the six 3 TB copies, or $196,000 overall. By using smaller, targeted clones of the production data warehouse for new projects, testing teams experience less lag time and save considerable storage costs.]


Informatica Solutions for Lean Data Warehouses

Informatica Data Warehouse Advisor

Informatica Data Warehouse Advisor is a software solution that monitors how business units and departments use data so that IT organizations can improve operational efficiency, scalability, and performance and control data delivery costs.

Informatica Data Warehouse Advisor monitors business users’ activity from business intelligence (BI) tools, such as MicroStrategy, SAP BusinessObjects, IBM Cognos, and Oracle BI. It tracks who accesses which reports and which data in the data warehouse is accessed by those reports. The software also monitors data usage, such as which tables, columns, and records in the data warehouse are used most often and who accesses and uses which data, including sensitive data.

Informatica Data Warehouse Advisor measures data warehouse query performance. It determines which queries are issued to identify, for example, the indexes that need to be created on the tables to improve query performance.

The software also monitors Informatica PowerCenter® workflow performance. It correlates PowerCenter workflow and data warehouse workloads so that PowerCenter workflows can be scheduled to run during lower data warehouse workloads, thus improving workflow performance. With better visibility into operations against the data warehouse, IT can troubleshoot PowerCenter workflow errors easier and faster.

Informatica Data Archive

Informatica Data Archive manages data growth in Big Data warehouses by relocating data that Informatica Data Warehouse Advisor identifies as dormant. Informatica Data Archive is highly scalable, high-performance software that helps IT organizations cost-effectively manage the proliferation of data volumes in a variety of enterprise applications. The software enables IT teams to safely and easily archive structured data in databases, enterprise applications, and data warehouses and then readily access it when needed.

With Informatica Data Archive, IT organizations can identify and move inactive data to another, lower-cost data warehouse infrastructure or to a secure, highly compressed, immutable file. Easy access is maintained to the combined production and archived data from any reporting or BI tool. Archiving data to a highly compressed file, with up to 98 percent compression, also greatly reduces storage requirements.


Informatica Data Subset

Informatica Data Subset is flexible enterprise software that automates the creation of smaller, targeted databases from large, complex databases. Integrated with the Informatica PowerCenter platform for built-in scalability, robustness, and enterprise-wide connectivity to access any nonproduction database, the software supports creating subsets of all enterprise data regardless of database, platform, or location.

Each data subset is a referentially intact, compact copy of production data that enables IT organizations to dramatically reduce the time, effort, and disk space needed to support nonproduction environments. By quickly replicating and refreshing production data with only the most relevant, realistic, high-quality application data, Informatica Data Subset eliminates the need to create full database copies. The software helps untangle complex transactional systems and data warehouses by separating out functionally related data.
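"Referentially intact" means that every foreign key in the subset still points at a row that is also in the subset. The core selection rule can be sketched on two toy tables; the table and column names here are invented for illustration and have nothing to do with Informatica Data Subset's actual implementation.

```python
# Hypothetical sketch of a referentially intact subset: keep a slice of
# parent rows, then keep only the child rows whose foreign keys
# reference that slice. Tables and columns are illustrative.

customers = [
    {"id": 1, "region": "EMEA"},
    {"id": 2, "region": "APAC"},
    {"id": 3, "region": "EMEA"},
]
orders = [
    {"id": 10, "customer_id": 1},
    {"id": 11, "customer_id": 2},
    {"id": 12, "customer_id": 3},
]

def subset(parents, children, fk, keep):
    """Keep parents matching `keep`; keep children whose `fk` column
    references a kept parent, so no foreign key is left dangling."""
    kept_parents = [p for p in parents if keep(p)]
    kept_ids = {p["id"] for p in kept_parents}
    kept_children = [c for c in children if c[fk] in kept_ids]
    return kept_parents, kept_children

small_customers, small_orders = subset(
    customers, orders, "customer_id", keep=lambda p: p["region"] == "EMEA"
)
print(len(small_customers), len(small_orders))  # prints 2 2
```

A production tool must walk entire foreign-key graphs, including self-references and cycles, but the per-edge rule is the same set-membership filter shown here.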

[Figure 6 diagram: business users and departments drive BI applications against a data warehouse fed by ETL; data is classified as active, dormant, or unused, with unused data removed from the data warehouse load.]

Figure 6: Use Lean Data Warehouse practices to monitor data usage, identify dormant data that can be archived, eliminate unused data from data warehouse loads, and create lean subsets in nonproduction environments to further reduce costs


About Informatica

Informatica Corporation (NASDAQ: INFA) is the world’s number one independent provider of data integration software. Organizations around the world rely on Informatica to gain a competitive advantage with timely, relevant, and trustworthy data for their top business imperatives. Worldwide, over 4,630 enterprises depend on Informatica for data integration, data quality, and big data solutions to access, integrate, and trust their information assets residing on premises and in the cloud. For more information, call +1 650-385-5000 (1-800-653-3871 in the U.S.), or visit www.informatica.com.

Conclusion

Data warehouses in large enterprises are routinely growing into the tens and hundreds of terabytes, and the associated data management costs and complexity are growing with them. Organizations cannot continue supporting this growth without a prohibitive impact on resource and infrastructure costs.

The Lean Data Warehouse practice is composed of best practices and solutions that leverage usage monitoring and data growth management to increase operational efficiency, ensure the scalability of infrastructure and available IT resources, and reduce data growth management costs. Effective usage monitoring requires a solution that integrates with the BI, data warehouse, and data integration stacks to provide a comprehensive view into business activity and data usage.

Monitoring data usage is only the first step in the Lean Data Warehouse practice. Once you see how data is used, you need to act on it by:

• Eliminating unused data from data loads

• Optimizing the data warehouse schema

• Proactively archiving data periodically to reduce the data warehouse size

• Creating lean subsets of production data in nonproduction environments to further reduce costs
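The first step above, eliminating unused data from loads, reduces to a simple filter once usage statistics exist: load only the columns that queries actually touch. The column names and query counts below are invented for illustration; they do not reflect any Informatica tooling.

```python
# Hypothetical sketch: drop columns that no query touches from the
# ETL load list. Column names and usage counts are illustrative.

load_columns = ["order_id", "customer_id", "amount",
                "legacy_flag", "fax_number"]
query_counts = {  # from usage monitoring: queries touching each column
    "order_id": 1200,
    "customer_id": 950,
    "amount": 1100,
    "legacy_flag": 0,
    "fax_number": 0,
}

# Keep a column only if at least one query has ever used it.
lean_load = [c for c in load_columns if query_counts.get(c, 0) > 0]
print(lean_load)  # prints ['order_id', 'customer_id', 'amount']
```

In practice a team would use a grace threshold (e.g., untouched for twelve months) rather than a strict zero, but the load list shrinks by the same filtering logic.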

Informatica delivers best-in-class technology and solutions for implementing Lean Data Warehouse practices (see Figure 6). With Informatica Lean Data Warehouse solutions, you’ll lower the total cost of ownership of your data warehouses and other applications by:

• Reducing storage, server, software, and maintenance costs

• Improving data warehouse performance

• Increasing data warehouse availability

• Supporting compliance with internal, industry, and governmental mandates and regulations

Together, Informatica and your IT organization can align the business value of data in your data warehouses with the most appropriate and cost-effective IT infrastructure to manage it.


Worldwide Headquarters, 100 Cardinal Way, Redwood City, CA 94063, USA
Phone: 650.385.5000  Fax: 650.385.5500  Toll-free in the US: 1.800.653.3871  www.informatica.com

© 2012 Informatica Corporation. All rights reserved. Printed in the U.S.A. Informatica, the Informatica logo, The Data Integration Company, Ultra Messaging, and RulePoint are trademarks or registered trademarks of Informatica Corporation in the United States and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners. First Published: December 2011 1887 (02/16/2012)