Best Practices in Database Archiving and Information Lifecycle Management
An InformationWeek Webcast
Sponsored by
Webcast Logistics
Today’s Presenter
Carl Olofson,
Research Vice President,
Application Development and Deployment,
IDC
Copyright IDC. Reproduction is forbidden unless authorized. All rights reserved.
Best Practices in Database Archiving and Information Lifecycle Management
How ILM Saves Money, Reduces Risk
Carl Olofson
Research Vice President
IDC
May 2011
May-11© IDC Visit us at IDC.com and follow us on Twitter: @IDC 5
Agenda
The Problem
Unchecked database growth
Hidden costs of large databases
Security and privacy in test data
Information Lifecycle Management
What is ILM?
Database archiving
– Requirements of database archiving
– Benefits of database archiving
Test data masking
– How data is masked
– Benefits of data masking
Conclusions / Recommendations
Unchecked Database Growth
As a database grows…
It requires larger indices
It consumes more storage
It requires specialized administration to tune
It needs more processor power to execute queries and updates
The hidden costs include
More storage administration
More downtime for reorgs
Larger batch windows for backups
Polling Question #1
How rapidly is your main production database growing?
Under 10% per year
10% per year
25% per year
Over 25% per year
Elements of Test Data Management
Selecting the data
Must be a referentially complete subset of the database
Must reflect realistic patterns of data to ensure valid testing
Protecting sensitive data
Sensitive data must be masked to prevent unauthorized viewing
Masked data needs to make sense to the test system.
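A minimal sketch of what "referentially complete" means in practice, assuming a hypothetical four-table schema (customers, products, orders, order_items) held as plain Python lists. The table and column names are illustrative and do not come from the webcast. Starting from a chosen set of orders, the subset pulls in every row those orders reference, so no foreign key in the test data points at a missing row.

```python
def subset_by_orders(db, order_ids):
    """Return a subset of db with the chosen orders plus every row they reference."""
    order_ids = set(order_ids)
    orders = [o for o in db["orders"] if o["id"] in order_ids]
    items = [i for i in db["order_items"] if i["order_id"] in order_ids]
    customer_ids = {o["customer_id"] for o in orders}
    product_ids = {i["product_id"] for i in items}
    return {
        "customers": [c for c in db["customers"] if c["id"] in customer_ids],
        "products": [p for p in db["products"] if p["id"] in product_ids],
        "orders": orders,
        "order_items": items,
    }


def dangling_references(db):
    """List (table, row id, column) for each foreign key that points at a missing row."""
    customers = {c["id"] for c in db["customers"]}
    products = {p["id"] for p in db["products"]}
    orders = {o["id"] for o in db["orders"]}
    problems = []
    for o in db["orders"]:
        if o["customer_id"] not in customers:
            problems.append(("orders", o["id"], "customer_id"))
    for i in db["order_items"]:
        if i["order_id"] not in orders:
            problems.append(("order_items", i["id"], "order_id"))
        if i["product_id"] not in products:
            problems.append(("order_items", i["id"], "product_id"))
    return problems


# Hypothetical sample data, for illustration only.
SAMPLE_DB = {
    "customers": [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}],
    "products": [{"id": 10, "sku": "X"}, {"id": 11, "sku": "Y"}],
    "orders": [{"id": 100, "customer_id": 1}, {"id": 101, "customer_id": 2}],
    "order_items": [
        {"id": 1000, "order_id": 100, "product_id": 10},
        {"id": 1001, "order_id": 101, "product_id": 11},
    ],
}

test_db = subset_by_orders(SAMPLE_DB, [100])
```

A naive subset that copies only some orders but all order items would fail the `dangling_references` check, which is the kind of defect that makes test runs misleading.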
Security and Privacy in Test Data
Normal Security Is Often Suspended for Test Data
Confidential data could be compromised
Privacy requirements could be breached
Corporate policies may be violated
Contractual requirements and government regulations could lead to legal culpability
In-House Masking Is Inadequate
Simplistic results create unrealistic test data
Code must be changed as the database changes, an unreasonable burden on in-house IT
Polling Question #2
In what role is the person in your organization primarily responsible for refreshing test data?
DBA
Development Manager
Project Leader
Developer
Other
Information Lifecycle Management (ILM)
Manage
Define
Protect
Test
Archive
The Basic Elements of ILM
Definition
Policies governing data creation, management, removal
Security
Encryption and access control at a granular level
Protection
Blocking access to sensitive data, including test data
Test data protection is done through data masking
Archiving
Removal of inactive data from the live database
Storage in a compressed, read-only datastore
The Data Masking Challenge
Application testing requirements
Using simple XXXX or #### or “Lorem ipsum” is usually not adequate for robust application testing.
Data must be representative of actual data in value range and distribution.
Masked data must “make sense”; zip codes correlate to city and state, for instance.
Secured information, such as personal identification, should not be inferable from the masked data.
The fake data should be consistent.
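The requirements above can be illustrated with a minimal sketch, assuming a hypothetical customer row with `name`, `city`, `state`, and `zip` fields. It is not how any particular masking product works. A keyed hash (HMAC from the Python standard library) maps each real value to a replacement from a lookup list, so the same input always yields the same fake value, and city, state, and ZIP are replaced as one unit so they stay correlated. The key, the field names, and the lookup lists are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical masking key; in practice it would be kept out of the test environment.
MASKING_KEY = b"example-key-not-for-production"

# Small illustrative lookup lists; a real tool would use far larger ones.
FAKE_NAMES = ["Robin Hale", "Casey Lund", "Morgan Pike", "Dana Frost"]
# City, state, and ZIP are kept together so the masked address still "makes sense".
FAKE_LOCATIONS = [
    ("Boston", "MA", "02108"),
    ("Austin", "TX", "78701"),
    ("Denver", "CO", "80202"),
    ("Seattle", "WA", "98101"),
]


def pick(value, field, choices):
    """Deterministically map a real value to one of the replacement choices."""
    message = f"{field}:{value}".encode("utf-8")
    digest = hmac.new(MASKING_KEY, message, hashlib.sha256).digest()
    return choices[int.from_bytes(digest[:8], "big") % len(choices)]


def mask_customer(row):
    """Return a copy of row with name and address replaced by consistent fake values."""
    city, state, zip_code = pick(row["zip"], "location", FAKE_LOCATIONS)
    masked = dict(row)
    masked.update(
        name=pick(row["name"], "name", FAKE_NAMES),
        city=city,
        state=state,
        zip=zip_code,
    )
    return masked
```

Because the mapping depends only on the key and the input value, a customer masked in one table receives the same fake name in every other table, which keeps joins meaningful. With lookup lists this small, many real values collapse onto the same replacement, so the sketch does not preserve the value distribution that robust testing calls for.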
Archiving: Types of Data
Reference
Created in response to a stand-alone event.
Randomly retrieved without requiring context
Active until a special event
Examples: Customer, Patient, Product
Transactional
Created at the start of a business process.
Retrieved in the context of a transaction
Deactivated at the end of a business process.
Examples: Sales order, treatment, shipment
Streaming
Created at reception of a streamed item
Inactive immediately (cannot be updated)
Classes of Data
Active
Data that is still being updated.
Includes reference and transactional data.
Inactive
Data no longer active, but retained for query and reporting
Includes historical and streamed data
Historical data is inactive transaction data
– Sales order completed, revenue recognized
– Inventory item sold and picked up
– Patient treatment completed, patient discharged
Buildup of Inactive Data
Hypothetical Example
Suppose we have a sales order table
We start the year with 10,000 orders per month
Orders grow at 1% per month
Each order takes 60 days to complete (recognize revenue)
Orders in process are active data
Completed orders are inactive data
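The arithmetic behind this example can be sketched as follows. This is a rough model, not the calculation behind the original slide: it assumes the table is empty on January 1 and treats 60 days as two months, so an order counts as active in the month it is placed and in the following month. The function and parameter names are illustrative.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def simulate_order_table(initial_orders=10_000, monthly_growth=0.01, active_months=2):
    """Return (month, active_rows, inactive_rows) for each month of the year."""
    # New orders placed in each month, growing at the given monthly rate.
    new_orders = [round(initial_orders * (1 + monthly_growth) ** k)
                  for k in range(len(MONTHS))]
    results = []
    for k, month in enumerate(MONTHS):
        total = sum(new_orders[: k + 1])
        # Orders placed in the most recent `active_months` months are still in process.
        active = sum(new_orders[max(0, k + 1 - active_months): k + 1])
        results.append((month, active, total - active))
    return results


if __name__ == "__main__":
    for month, active, inactive in simulate_order_table():
        share = inactive / (active + inactive)
        print(f"{month}: active={active:>6,} inactive={inactive:>7,} inactive share={share:.0%}")
```

Under these assumptions the active row count stays close to two months of orders, while inactive rows accumulate every month; by December more than 80 percent of the rows in the table are inactive.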
Buildup of Inactive Transaction Data
[Chart: Sales Order Table, rows per month (Jan to Dec, axis 0 to 160,000), showing Active rows, Inactive rows, and Inactive %]
Inactive Data Clogs the Database
DBMS Overhead
Big Indexes
Storage demand
Slower queries
Slower transaction processing
Operational Overhead
DBA tuning
Disruption for unload/reload and reorg
Longer backup batch windows
Polling Question #3
Think of transaction data that you retain. What is your required retention period?
3-5 years
6-10 years
Over 10 years
We don’t have a retention policy
Approaches to “Aging Out” Data
Partitioning
Move data to low frequency partition on 2nd or 3rd tier storage
Use local partition indexes to avoid growth of global table indexes
Perform maintenance operations by physical partition
Problem: this approach impacts the whole table, and creates a complex operational and management challenge that extends across the database
Archiving
Select referentially complete subsets of inactive data
Move the inactive data to an archiving system outside the database
Ensure that the archive can support SQL and that queries can, if necessary, be executed in an integrated manner with those of the live database.
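The archiving steps above can be sketched with Python's standard `sqlite3` module. This is an illustration under stated assumptions, not a description of any archiving product: an attached second SQLite database stands in for the external archive, and the `orders` / `order_items` schema, the sample rows, and the function names are hypothetical. Completed orders older than a cutoff are moved together with their line items in one transaction, and a `UNION ALL` query shows live and archived rows being read together.

```python
import sqlite3


def open_demo_database():
    """Create an in-memory live database with a second, attached archive database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS archive")
    for schema in ("main", "archive"):
        conn.execute(f"CREATE TABLE {schema}.orders "
                     "(id INTEGER PRIMARY KEY, status TEXT, completed_on TEXT)")
        conn.execute(f"CREATE TABLE {schema}.order_items "
                     "(id INTEGER PRIMARY KEY, order_id INTEGER, sku TEXT)")
    conn.executemany("INSERT INTO main.orders VALUES (?, ?, ?)", [
        (1, "completed", "2010-03-15"),
        (2, "completed", "2011-04-20"),
        (3, "open", None),
    ])
    conn.executemany("INSERT INTO main.order_items VALUES (?, ?, ?)", [
        (10, 1, "A"), (11, 1, "B"), (12, 2, "A"), (13, 3, "C"),
    ])
    conn.commit()
    return conn


def archive_completed_orders(conn, cutoff_date):
    """Move orders completed before cutoff_date, with their line items, to the archive."""
    # ISO-8601 date strings compare correctly as plain text.
    inactive = ("SELECT id FROM main.orders "
                "WHERE status = 'completed' AND completed_on < ?")
    args = (cutoff_date,)
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("INSERT INTO archive.order_items "
                     f"SELECT * FROM main.order_items WHERE order_id IN ({inactive})", args)
        conn.execute("INSERT INTO archive.orders "
                     f"SELECT * FROM main.orders WHERE id IN ({inactive})", args)
        # Delete child rows first, while the parent rows still identify them.
        conn.execute(f"DELETE FROM main.order_items WHERE order_id IN ({inactive})", args)
        moved = conn.execute(f"DELETE FROM main.orders WHERE id IN ({inactive})", args).rowcount
    return moved


def all_orders(conn):
    """Query live and archived orders together as one result set."""
    return conn.execute(
        "SELECT id, status, 'live' AS source FROM main.orders "
        "UNION ALL "
        "SELECT id, status, 'archive' AS source FROM archive.orders "
        "ORDER BY id"
    ).fetchall()
```

Copying the parent order and its line items in the same transaction is what keeps the archived subset referentially complete; open orders and their items never leave the live tables.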
Benefits of Archiving
Database benefits
Faster queries
Less index maintenance overhead
Smaller dataspaces and simpler schema than partitioning option
Requires less CPU; license/maintenance savings for DB and applications
Operational benefits
Less schema maintenance than partitioning option
Stable backup windows
Much less data reorganization
Application Retirement
Inactive Applications
Applications become inactive when they are no longer used, and their functions have been migrated elsewhere.
They commonly still have data that must be retained for corporate policy or legal reasons.
For this reason, enterprises keep them running, maintaining them, and paying fees for them even though they are inactive.
Retiring Inactive Applications
All their data is inactive, so it may be archived altogether
The archiving system must retain the ability to report on the data.
The savings in servers, storage, software, and operations costs can be very significant.
Critical Requirements of Database Archiving
DBMS Support
Must support ongoing versions of major RDBMS including DB2, Informix, Oracle, Sybase ASE, Microsoft SQL Server, and MySQL
Must record schema and schema changes to support data retrieval even after data definitions have changed.
Must support SQL and ODBC/JDBC used by applications.
Technical requirements
Random data retrieval
Compressed, optimized based on read-only access
Reasonable performance on 2nd and 3rd tier storage
Data Governance
Purpose is to ensure that data is trustworthy
Data is well defined, and maintenance is rational
Original source is known
Sequence and agents of update are known (provenance)
Data is valid and consistent
No unauthorized access has happened
No sensitive data is visible to unauthorized personnel
Data is retained as required without compromising performance
Business Benefits
Database development and management addresses known business needs
Trade secrets are not exposed and confidences are not compromised
Ensures compliance with contractual and legal requirements
Reduces risk of actual or opportunity cost due to data-driven application error
ILM and Data Governance
Data Governance
Uniform Data Definition & Policy Management
Information Lifecycle Management
Managed Data Selection & Retention
Database Subsetting
Database Archiving
Data Protection
Test Data Masking
Trust Management
Validity and Consistency Assurance
Data Quality and Profiling
Data Cleansing
Security & Monitoring
Access Control and Encryption
Provenance Tracking
Access Log Analysis
ILM and Database Development and Management Tools
Database Development and Management Tools (DDMT)
Software used by DBAs and data managers to manage the size, performance, and reliability/recoverability of databases
Includes DBA tools, database replication software, development and optimization software, and database archiving / ILM.
The ILM Segment of the DDMT Market
Just 4.6% in 2009, but the fastest growing segment; the only segment to show positive growth in that tough economic year.
Projected to show the greatest growth of all DDMT segments to 2014, with a forecast CAGR of 9.9% from $90 million to $188 million.
What’s IBM’s Share in the ILM Market Segment?
ILM segment revenue share by vendor:
IBM 56%
Informatica 13%
HP 11%
CA 4%
Solix 4%
Other 12%
Total = $89.9 million
Source: IDC, 2010
Conclusions and Recommendations
Conclusions
Data governance is critical because the utility and trustworthiness of enterprise data cannot be left to chance.
ILM addresses the key dimension of data size management in relation to data retention, and test data management.
These functions cannot be developed and maintained in-house.
Recommendations
Users should carefully review their data access and retention policies and ensure that those policies are carried out.
In most cases, the best approach to ensuring data retention without bloating the databases is to employ database archiving.
Test data management is not trivial; find professionally developed data masking and subsetting tools.
IBM’s InfoSphere Optim leads the market in addressing these key ILM requirements.
© 2011 IBM Corporation
Information Management
IBM InfoSphere Optim solutions
Managing data throughout its lifecycle in heterogeneous environments
Discover, Understand, Classify
Speed understanding and project time through relationship discovery within and across data sources
Understand sensitive data to protect and secure it
Test Data Management
Easily refresh & maintain right sized non-production environments, while reducing storage costs
Improve application quality and deploy new functionality more quickly
Data Growth Management
Reduce hardware, storage & maintenance costs
Streamline application upgrades and improve application performance
Data Masking
Protect sensitive information from misuse & fraud
Prevent data breaches and associated fines
Application Retirement
Safely retire legacy & redundant applications while retaining the data
Ensure application-independent access to archive data
[Diagram labels: Production, Training, Development, Test; Archive, Subset, Mask, Retire]
Managing Data Across its Lifecycle
Information Governance: Quality Management – Lifecycle – Security & Privacy
Discover & Define
Discover where data resides
Classify & define data and relationships
Define policies
Develop & Test
Develop database structures & code
Create & refresh test data
Validate test results
Optimize, Archive & Access
Enhance performance
Manage data growth
Report & retrieve archived data
Consolidate & Retire
Rationalize application portfolio
Enable compliance with retention & e-discovery
You can’t govern what you don’t understand
Define business objects for archival and test data applications
– Automation of manual activities accelerates time to value
Discover data transformation rules and heterogeneous relationships
– Business insight into data relationships reduces project risk
Identify hidden sensitive data for privacy
– Provides consistency across information agenda projects
[Figure: Distributed Data Landscape (Discover & Define stage)]
Employ effective test data management practices
• Create targeted, right-sized test environments
• Substitute sensitive data with fictionalized yet contextually accurate data
• Easily refresh, reset and maintain test environments
• Compare data to pinpoint and resolve application defects faster
• Accelerate release schedules
[Figure: Subset & Mask (Develop & Test stage). A 2 TB Production or Production Clone database feeds smaller Development, Unit Test, Training, and Integration Test environments; the sizes shown are 100 GB, 25 GB, 50 GB, and 25 GB.]
Archive historical data for data growth management
Data Archiving is an intelligent process for moving inactive or infrequently accessed data that still has value, while providing the ability to search and retrieve the data
Can selectively restore archived data records
[Figure: Optimize, Archive & Access stage. Diagram labels: Production (Current, Historical); Archive; Retrieve; Restored Data; Archives (Historical Data, Reference Data); Universal Access to Application Data through ODBC / JDBC, XML, Report Writer, Application, Mashup Center, Data Find.]
Retire redundant and legacy applications
Preserve application data in its business context
– Capture all related data, including transaction details, reference data & associated metadata
– Capture any related reference data that may reside in other application databases
Retire out-of-date packaged applications as well as legacy custom applications
– Leverage out-of-box support of packaged applications to quickly identify & extract the complete business object
Shut down legacy system without a replacement
– Provide fast and easy retrieval of data for research and reporting, as well as audits and e-discovery requests
[Figure: Consolidate & Retire stage. Infrastructure before Retirement (users, each with a separate application, database, and data) compared with Archived Data after Consolidation (users reaching archive data through a single archive engine).]
Resources to Learn More!
InfoSphere Optim Solutions page: http://www-01.ibm.com/software/data/optim/
– IDC Worldwide Database Development and Management Tools 2009 Vendor and Segment Analysis Report
– Whitepaper: Control Application Data Growth Before It Controls Your Business
– Whitepaper: Enterprise Strategies to Improve Application Testing
– InfoSphere Optim Solutions for Custom and Packaged Applications Solution Brief
Q&A Session
Please Submit Your Questions Now