Page 1: 525 ibm optim

Best Practices in Database Archiving

and Information Lifecycle

An InformationWeek Webcast

Sponsored by

Page 2: 525 ibm optim

Webcast Logistics

Page 3: 525 ibm optim

Today’s Presenter

Carl Olofson,

Research Vice President,

Application Development and Deployment,

IDC

Page 4: 525 ibm optim

Copyright IDC. Reproduction is forbidden unless authorized. All rights reserved.

Best Practices in Database Archiving and Information Lifecycle Management

How ILM Saves Money, Reduces Risk

Carl Olofson

Research Vice President

IDC

May 2011

Page 5: 525 ibm optim

May-11© IDC Visit us at IDC.com and follow us on Twitter: @IDC 5

Agenda

The Problem

Unchecked database growth

Hidden costs of large databases

Security and privacy in test data

Information Lifecycle Management

What is ILM?

Database archiving

– Requirements of database archiving

– Benefits of database archiving

Test data masking

– How data is masked

– Benefits of data masking

Conclusions / Recommendations

Page 6: 525 ibm optim

Unchecked Database Growth

As a database grows…

It requires larger indices

It consumes more storage

It requires specialized administration to tune

It needs more processor power to execute queries and updates

The hidden costs include

More storage administration

More downtime for reorgs

Larger batch windows for backups

Page 7: 525 ibm optim

Polling Question #1

How rapidly is your main production database growing?

Under 10% per year

10% per year

25% per year

Over 25% per year

Page 8: 525 ibm optim

Elements of Test Data Management

Selecting the data

Must be a referentially complete subset of the database

Must reflect realistic patterns of data to ensure valid testing

Protecting sensitive data

Sensitive data must be masked to prevent unauthorized viewing

Masked data needs to make sense to the test system.
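
A minimal sketch of what "referentially complete" can mean in practice, using Python's built-in sqlite3 module. The customers and orders tables and the select_subset helper are hypothetical illustrations, not part of any product: every order chosen for the test set brings along the customer row its foreign key points to.

```python
import sqlite3

def build_demo_db():
    """Create a tiny hypothetical source database with a parent/child relationship."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id),
            amount REAL
        );
        INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
        INSERT INTO orders VALUES
            (10, 1, 25.0), (11, 2, 40.0), (12, 2, 15.0), (13, 3, 99.0);
    """)
    return conn

def select_subset(conn, order_ids):
    """Return the chosen orders plus every customer row they reference."""
    marks = ",".join("?" for _ in order_ids)
    orders = conn.execute(
        f"SELECT id, customer_id, amount FROM orders "
        f"WHERE id IN ({marks}) ORDER BY id",
        list(order_ids),
    ).fetchall()
    # Follow the foreign key so that no selected order is left without its parent.
    parent_ids = sorted({row[1] for row in orders})
    marks = ",".join("?" for _ in parent_ids)
    customers = conn.execute(
        f"SELECT id, name FROM customers WHERE id IN ({marks}) ORDER BY id",
        parent_ids,
    ).fetchall()
    return customers, orders

customers, orders = select_subset(build_demo_db(), [11, 13])
# Every customer_id in the subset resolves to a customer that was also extracted.
assert {o[1] for o in orders} <= {c[0] for c in customers}
```

A real schema would need the same walk repeated across every relationship, in both directions where children must follow their parents, which is part of why the selection step is harder than it first looks.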

Page 9: 525 ibm optim

Security and Privacy in Test Data

Normal Security Is Often Suspended for Test Data

Confidential data could be compromised

Privacy requirements could be breached

Corporate policies may be violated

Contractual requirements and government regulations could lead to legal culpability

In-House Masking Is Inadequate

Simplistic results create unrealistic test data

Code must be changed as the database changes, an unreasonable burden on in-house IT

Page 10: 525 ibm optim

Polling Question #2

In what role is the person in your organization primarily responsible for refreshing test data?

DBA

Development Manager

Project Leader

Developer

Other

Page 11: 525 ibm optim

Information Lifecycle Management (ILM)

Manage

Define

Protect

Test

Archive

Page 12: 525 ibm optim

The Basic Elements of ILM

Definition

Policies governing data creation, management, removal

Security

Encryption and access control at a granular level

Protection

Blocking access to sensitive data, including test data

Test data protection is done through data masking

Archiving

Removal of inactive data from the live database

Storage in a compressed, read-only datastore

Page 13: 525 ibm optim

The Data Masking Challenge

Application testing requirements

Using simple XXXX or #### or “Lorem ipsum” is usually not adequate for robust application testing.

Data must be representative of actual data in value range and distribution.

Masked data must “make sense”; zip codes correlate to city and state, for instance.

Secured information, such as personal identification, should not be inferable from the masked data.

The fake data should be consistent.
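
The requirements above can be illustrated with a small sketch, assuming a keyed hash (HMAC) is used to pick substitutes from lookup tables. The LOCATIONS and FIRST_NAMES tables, the field names, and the mask_customer helper are all hypothetical; commercial masking tools use much richer techniques. The point shown is that the same input always maps to the same substitute (consistency) and that city, state, and ZIP are replaced together so they still correlate.

```python
import hashlib
import hmac

# Hypothetical lookup of internally consistent locations (city, state, ZIP).
LOCATIONS = [
    ("Springfield", "IL", "62701"),
    ("Portland", "OR", "97201"),
    ("Austin", "TX", "78701"),
    ("Albany", "NY", "12207"),
]
# Hypothetical pool of substitute first names.
FIRST_NAMES = ["Alex", "Jordan", "Morgan", "Riley", "Casey"]

def _pick(secret: bytes, field: str, value: str, choices):
    """Deterministically map a real value to one of the substitute choices."""
    digest = hmac.new(secret, f"{field}:{value}".encode(), hashlib.sha256).digest()
    return choices[int.from_bytes(digest[:8], "big") % len(choices)]

def mask_customer(record: dict, secret: bytes) -> dict:
    """Replace identifying fields with realistic, repeatable substitutes."""
    # City, state and ZIP are swapped as one unit so they stay correlated.
    city, state, zip_code = _pick(secret, "location", record["zip"], LOCATIONS)
    return {
        "id": record["id"],  # surrogate key kept so joins across tables still work
        "first_name": _pick(secret, "first_name", record["first_name"], FIRST_NAMES),
        "city": city,
        "state": state,
        "zip": zip_code,
    }

masked = mask_customer(
    {"id": 7, "first_name": "Maria", "zip": "10001"}, secret=b"demo-key"
)
print(masked)
```

Because the mapping is keyed, the original values cannot be recomputed from the masked ones without the secret, but a lookup table this small would distort the value distribution; a realistic table would need to be far larger.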

Page 14: 525 ibm optim

Archiving: Types of Data

Reference

Created in response to a stand-alone event.

Randomly retrieved without requiring context

Active until a special event

Examples: Customer, Patient, Product

Transactional

Created at the start of a business process.

Retrieved in the context of a transaction

Deactivated at the end of a business process.

Examples: Sales order, treatment, shipment

Streaming

Created at reception of a streamed item

Inactive immediately (cannot be updated)

Page 15: 525 ibm optim

Classes of Data

Active

Data that is still being updated.

Includes reference and transactional data.

Inactive

Data no longer active, but retained for query and reporting

Includes historical and streamed data

Historical data is inactive transaction data

– Sales order completed, revenue recognized

– Inventory item sold and picked up

– Patient treatment completed, patient discharged

Page 16: 525 ibm optim

Buildup of Inactive Data

Hypothetical Example

Suppose we have a sales order table

We start the year with 10,000 orders per month

Orders grow at 1% per month

Each order takes 60 days to complete (recognize revenue)

Orders in process are active data

Completed orders are inactive data
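
The arithmetic of this hypothetical example can be sketched in a few lines. Two simplifications are assumptions of the sketch, not statements from the slide: the table starts empty in January, and the 60-day completion time is treated as two calendar months, so the two most recent months of orders count as active.

```python
def simulate_order_buildup(months=12, initial_orders=10_000,
                           monthly_growth=0.01, months_to_complete=2):
    """Return (month, active_rows, inactive_rows, inactive_share) per month."""
    monthly_orders = []
    rows = []
    for m in range(months):
        # New orders this month, compounding at the monthly growth rate.
        monthly_orders.append(round(initial_orders * (1 + monthly_growth) ** m))
        total = sum(monthly_orders)
        # Orders placed in the last `months_to_complete` months are still in process.
        active = sum(monthly_orders[-months_to_complete:])
        inactive = total - active
        rows.append((m + 1, active, inactive, inactive / total))
    return rows

for month, active, inactive, share in simulate_order_buildup():
    print(f"month {month:2d}: active={active:6d} inactive={inactive:7d} "
          f"inactive share={share:.0%}")
```

Under these assumptions the active rows stay at roughly two months' worth of orders while the inactive rows accumulate, so that by the twelfth month more than four fifths of the table is inactive.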

Page 17: 525 ibm optim

Buildup of Inactive Transaction Data

[Chart: Sales Order Table. Rows (0 to 160,000) by month, Jan through Dec; series: Inactive, Active, Inactive %]

Page 18: 525 ibm optim


Inactive Data Clogs the Database

DBMS Overhead

Big Indexes

Storage demand

Slower queries

Slower transaction processing

Operational Overhead

DBA tuning

Disruption for unload/reload and reorg

Longer backup batch windows

Page 19: 525 ibm optim

Polling Question #3

Think of transaction data that you retain. What is your required retention period?

3-5 years

6-10 years

Over 10 years

We don’t have a retention policy

Page 20: 525 ibm optim

Approaches to “Aging Out” Data

Partitioning

Move data to low frequency partition on 2nd or 3rd tier storage

Use local partition indexes to avoid growth of global table indexes

Perform maintenance operations by physical partition

Problem: this approach impacts the whole table, and creates a complex operational and management challenge that extends across the database

Archiving

Select referentially complete subsets of inactive data

Move the inactive data to an archiving system outside the database

Ensure that the archive can support SQL and that queries can, if necessary, be executed in an integrated manner with those of the live database.
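
The archiving steps above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: a second attached SQLite database stands in for the external archive store, the orders and order_items tables and the archive_completed helper are hypothetical, and a real archive would add compression and read-only storage. The sketch moves a referentially complete set (completed orders together with their items) and then queries live and archived rows in one statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A second database stands in for the external archive store (an assumption).
conn.execute("ATTACH DATABASE ':memory:' AS archive")
conn.executescript("""
    CREATE TABLE main.orders (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE main.order_items (order_id INTEGER, sku TEXT);
    CREATE TABLE archive.orders (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE archive.order_items (order_id INTEGER, sku TEXT);
    INSERT INTO main.orders VALUES (1, 'completed'), (2, 'open'), (3, 'completed');
    INSERT INTO main.order_items VALUES (1, 'A'), (1, 'B'), (2, 'C'), (3, 'D');
""")

INACTIVE = "SELECT id FROM main.orders WHERE status = 'completed'"

def archive_completed(conn):
    """Move completed orders and their items to the archive in one transaction."""
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute(
            f"INSERT INTO archive.order_items "
            f"SELECT * FROM main.order_items WHERE order_id IN ({INACTIVE})"
        )
        conn.execute(
            "INSERT INTO archive.orders "
            "SELECT * FROM main.orders WHERE status = 'completed'"
        )
        # Delete children first, while the parent rows still identify them.
        conn.execute(f"DELETE FROM main.order_items WHERE order_id IN ({INACTIVE})")
        conn.execute("DELETE FROM main.orders WHERE status = 'completed'")

archive_completed(conn)

# An integrated query across the live database and the archive.
all_orders = conn.execute("""
    SELECT id, 'live' AS source FROM main.orders
    UNION ALL
    SELECT id, 'archive' AS source FROM archive.orders
    ORDER BY id
""").fetchall()
print(all_orders)
```

After the move the live tables hold only the open order and its item, while reports that need history can still reach the archived rows through the combined query.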

Page 21: 525 ibm optim

Benefits of Archiving

Database benefits

Faster queries

Less index maintenance overhead

Smaller dataspaces and simpler schema than partitioning option

Requires less CPU; license/maintenance savings for DB and applications

Operational benefits

Less schema maintenance than partitioning option

Stable backup windows

Much less data reorganization

Page 22: 525 ibm optim

Application Retirement

Inactive Applications

Applications become inactive when they are no longer used and their functions have been migrated elsewhere.

They commonly still have data that must be retained for corporate policy or legal reasons.

For this reason, enterprises keep running them, maintaining them, and paying fees for them even though they are inactive.

Retiring Inactive Applications

All their data is inactive, so it may be archived altogether

The archiving system must retain the ability to report on the data.

The savings in servers, storage, software, and operations costs can be very significant.

Page 23: 525 ibm optim

Critical Requirements of Database Archiving

DBMS Support

Must support ongoing versions of major RDBMS including DB2, Informix, Oracle, Sybase ASE, Microsoft SQL Server, and MySQL

Must record schema and schema changes to support data retrieval even after data definitions have changed.

Must support SQL and ODBC/JDBC used by applications.

Technical requirements

Random data retrieval

Compressed and optimized for read-only access

Reasonable performance on 2nd and 3rd tier storage
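
One way to picture the requirement to record the schema with the data is sketched below. The JSON batch layout and the write_archive_batch and read_archive_batch helpers are hypothetical, chosen only to show the idea: each archived batch carries the column list that was in force when it was written, so it can still be read after the live table definition has changed.

```python
import json

def write_archive_batch(rows, columns, schema_version):
    """Serialize rows together with the column list that was in force."""
    return json.dumps(
        {"schema_version": schema_version, "columns": columns, "rows": rows}
    )

def read_archive_batch(blob):
    """Rebuild row dicts using the schema stored in the batch, not today's schema."""
    batch = json.loads(blob)
    return [dict(zip(batch["columns"], row)) for row in batch["rows"]]

# Batch written before a hypothetical 'currency' column was added to the table.
old_batch = write_archive_batch([[1, 9.5]], ["id", "total"], schema_version=1)
# Batch written after the schema change.
new_batch = write_archive_batch(
    [[2, 12.0, "USD"]], ["id", "total", "currency"], schema_version=2
)

print(read_archive_batch(old_batch))
print(read_archive_batch(new_batch))
```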

Page 24: 525 ibm optim

Data Governance

Purpose is to ensure that data is trustworthy

Data is well defined, and maintenance is rational

Original source is known

Sequence and agents of update are known (provenance)

Data is valid and consistent

No unauthorized access has happened

No sensitive data is visible to unauthorized personnel

Data is retained as required without compromising performance

Business Benefits

Database development and management addresses known business needs

Trade secrets are not exposed and confidences are not compromised

Ensures contractual and legal requirements compliance

Reduces risk of actual or opportunity cost due to data-driven application error

Page 25: 525 ibm optim

ILM and Data Governance

Data Governance

Uniform Data Definition & Policy Management

Information Lifecycle Management

Managed Data Selection & Retention

Database Subsetting

Database Archiving

Data Protection

Test Data Masking

Trust Management

Validity and Consistency Assurance

Data Quality and Profiling

Data Cleansing

Security & Monitoring

Access Control and Encryption

Provenance Tracking

Access Log Analysis

Page 26: 525 ibm optim

ILM and Database Development and Management Tools

Database Development and Management Tools (DDMT)

Software used by DBAs and data managers to manage the size, performance, and reliability/recoverability of databases

Includes DBA tools, database replication software, development and optimization software, and database archiving / ILM.

The ILM Segment of the DDMT Market

Just 4.6% in 2009, but the fastest growing segment; the only segment to show positive growth in that tough economic year.

Projected to show the greatest growth of all DDMT segments to

2014, with a forecast CAGR of 9.9% from $90 m to $188 m.
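
As a quick check of the forecast arithmetic, the standard compound annual growth rate formula can be applied to the endpoints quoted above. The five-year span (2009 to 2014) is an assumption read from the surrounding text, not a figure stated with the forecast.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1

# Endpoints in $ millions as quoted; the year span is assumed to be 2009-2014.
implied = cagr(90, 188, 2014 - 2009)
print(f"implied CAGR: {implied:.1%}")
```

Under that assumption the endpoints imply a rate of roughly 16% a year rather than 9.9%, so the stated rate or one of the endpoints may rest on a different base year or revenue definition than the one assumed here.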

Page 27: 525 ibm optim

What’s IBM’s Share in the ILM Market Segment?

[Pie chart: ILM segment revenue share by vendor. Total = $89.9 million. Source: IDC, 2010]

IBM: 56%

Informatica: 13%

HP: 11%

CA: 4%

Solix: 4%

Other: 12%

Page 28: 525 ibm optim


Conclusions and Recommendations

Conclusions

Data governance is critical because the utility and trustworthiness of enterprise data cannot be left to chance.

ILM addresses the key dimension of data size management in relation to data retention, and test data management.

These functions cannot be developed and maintained in-house.

Recommendations

Users should carefully review their data access and retention policies and ensure that those policies are carried out.

In most cases, the best approach to ensuring data retention without bloating the databases is to employ database archiving.

Test data management is not trivial; find professionally developed data masking and subsetting tools.

IBM’s InfoSphere Optim leads the market in addressing these key ILM requirements.

Page 29: 525 ibm optim

Page 30: 525 ibm optim

© 2011 IBM Corporation

Information Management

Easily refresh & maintain right-sized non-production environments, while reducing storage costs

Improve application quality and deploy new functionality more quickly

Speed understanding and project time through relationship discovery within and across data sources

Understand sensitive data to protect and secure it

IBM InfoSphere Optim solutions

Managing data throughout its lifecycle in heterogeneous environments

Production

Training

Development

Test

Archive

Subset

Mask

Reduce hardware, storage & maintenance costs

Streamline application upgrades and improve application performance

Data Growth Management

Test Data Management

Data Masking

Protect sensitive information from misuse & fraud

Prevent data breaches and associated fines

Retire

Discover

Understand

Classify

Discover

Safely retire legacy & redundant applications while retaining the data

Ensure application-independent access to archive data

Application Retirement

Page 31: 525 ibm optim


Managing Data Across its Lifecycle

Information Governance

Quality Management – Lifecycle – Security & Privacy

Discover & Define: Discover where data resides; Classify & define data and relationships; Define policies

Develop & Test: Develop database structures & code; Create & refresh test data; Validate test results

Optimize, Archive & Access: Enhance performance; Manage data growth; Report & retrieve archived data

Consolidate & Retire: Rationalize application portfolio; Enable compliance with retention & e-discovery

Page 32: 525 ibm optim


You can’t govern what you don’t understand

Define business objects for archival and test data applications

– Automation of manual activities accelerates time to value

Discover data transformation rules and heterogeneous relationships

– Business insight into data relationships reduces project risk

Identify hidden sensitive data for privacy

– Provides consistency across information agenda projects


Distributed Data Landscape

Discover & Define

Page 33: 525 ibm optim


100 GB

25 GB

50 GB

• Create targeted, right-sized test environments

• Substitute sensitive data with fictionalized yet contextually accurate data

• Easily refresh, reset and maintain test environments

• Compare data to pinpoint and resolve application defects faster

• Accelerate release schedules

Employ effective test data management practices

Production or Production Clone

25 GB

2 TB

Development

Unit Test

Training

Integration Test

Subset & Mask

Develop & Test

Page 34: 525 ibm optim


Archive historical data for data growth management

Current

Production

Historical

Archive

Retrieve

Restored Data

Universal Access to Application Data

Data Archives

Historical Data

Reference Data

Data Archiving is an intelligent process for moving inactive or infrequently accessed data that still has value, while providing the ability to search and retrieve the data

Can selectively restore archived data records

ODBC / JDBC, XML, Report Writer, Application, Mashup Center, Data Find

Optimize, Archive & Access

Page 35: 525 ibm optim


Retire redundant and legacy applications

Preserve application data in its business context

– Capture all related data, including transaction details, reference data & associated metadata

– Capture any related reference data that may reside in other application databases

Retire out-of-date packaged applications as well as legacy custom applications

– Leverage out-of-box support of packaged applications to quickly identify & extract the complete business object

Shut down legacy system without a replacement

– Provide fast and easy retrieval of data for research and reporting, as well as audits and e-discovery requests

Consolidate & Retire

[Diagram: Infrastructure before Retirement (users, each with a separate application, database, and data) compared with Archived Data after Consolidation (users sharing one archive engine and archive data)]

Page 36: 525 ibm optim


Resources to Learn More!

InfoSphere Optim Solutions page: http://www-01.ibm.com/software/data/optim/

– IDC Worldwide Database Development and Management Tools 2009 Vendor and Segment Analysis Report

– Whitepaper: Control Application Data Growth Before It Controls Your Business

– Whitepaper: Enterprise Strategies to Improve Application Testing

– InfoSphere Optim Solutions for Custom and Packaged Applications Solution Brief

Page 37: 525 ibm optim

Q&A Session

Please Submit Your Questions Now