Intelligent Archiving Strategies: Toward ILM Arun Taneja, Founder and Consulting Analyst, Taneja Group Alex Gorbansky, Senior Analyst, Taneja Group

Intelligent Archiving Strategies: Toward ILM

Arun Taneja, Founder and Consulting Analyst, Taneja Group
Alex Gorbansky, Senior Analyst, Taneja Group

Agenda

A Bit of Historical Perspective

Why Archive?

What to Archive?

The ILM Panacea

Developing an Operational Archival Strategy

Key Considerations

Representative Vendors and Solutions

Conclusions

Archival ≠ Backup

BACKUP

Copying production data to an alternative medium for restorability in the event of data loss, corruption, or unavailability.

ARCHIVAL

Retention of historical data for future access for business reasons such as audits, customer issues, or litigation.

Some History On Archiving

3000 BCE, Ancient Egypt:
• Library of Alexandria
• Engravings

Middle Ages, Shift from Feudalism to Nation State:
• Records
• Property rights

1600s, American colonists:
• Births
• Marriages
• Businesses

1789, French Revolution:
• Property records

1884, American Historical Association:
• Archival standards
• Marriages
• Businesses

Archival Business Drivers Today

REGULATORY COMPLIANCE REQUIREMENTS

EXPLOSIVE DATA GROWTH

APPLICATION PERFORMANCE DEGRADATION

RISING COSTS

What to Archive?

Structured Data:
• ERP/CRM DB tiers
• Business transactions

Unstructured Data:
• Documents
• X-Rays
• Check Images
• Voice recording

Semi-structured Data:
• Email
• Instant Messaging

ILM…ShmILM

“ILM” is an abstract framework for describing the processes and technology used to manage information throughout its life according to its business value.

“ILM” is NOT the panacea for your storage management challenges.

Archival is a key component of what vendors are calling “ILM”

Applications: ERP, CRM, Email, Call Recording, Image Access

Application Data: Structured, Unstructured, Semi-Structured

Policies and Rules: Business Context, Referential Integrity, Regulatory Compliance

Data Movement Technologies: Snapshots, HSM, Replication, Backup, Archival

Storage Infrastructure Tiers: Primary, Secondary, Tertiary

Developing an Archival Strategy

1. PLAN
• When/How
• Data Classification
• Requirements

2. DESIGN

3. IMPLEMENT

4. REPORT & TEST

Why Plan and When to Start

Upfront planning will result in significant benefits in future phases.

Develop an Archival Strategy as part of your application design and development process.

Engage Key Stakeholders:
• Application Owners
• Business Decision Makers: Compliance Officers, Legal

Identify Key Archival Business Drivers:
• Regulatory Compliance
• Other: Data Growth, Increasing Costs, Poor Performance

The Data Classification Puzzle

Assess the application data in your shop according to the following categories:
• Structured: database
• Unstructured: files, videos, images
• Semi-structured: email

Identify specific data sets impacted by regulatory compliance:
• Examples: Email, Medical Records, Call Recordings

Requirements Definition

Engage Application Owners

Compliance is not the ONLY archival driver

Separate requirements processes for applications impacted by compliance.

Compliance-specific:

• Retention period

• Media characteristics

• Data restorability rates

• Access control policies

• Data availability/DR

General archival:

• Data Access Patterns

• Restore time requirements

• Application performance

• Cost structure

• Access control policies

• Data availability/DR

Taming the Compliance Monster

1. Understand the Regulations: Significant Variance by Industry

2. Assess/Communicate Requirements to Key Business Stakeholders

3. Judge Products for Yourself: just because a vendor says a solution is “Compliant” doesn’t make it so.

4. Stay abreast of changes in regulatory mandates.

Defining Key Archival Metrics

Archive Distribution Percentages Across:
• Online: Disk, Object-based storage
• Near-line: Optical, Tape (local)
• Off-line: Off-site vaults

Number of data copies:
• Local
• Remote
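The distribution metric above can be computed from a simple capacity inventory. The following is a minimal sketch, assuming a hypothetical inventory of (tier, gigabytes) pairs; the function name, tier labels, and figures are illustrative and not taken from the presentation.

```python
from collections import defaultdict

# Tier labels follow the slide: online, near-line, off-line.
TIERS = ("online", "near-line", "off-line")


def distribution_percentages(inventory):
    """Return the share of archived capacity held on each tier, in percent.

    `inventory` is an iterable of (tier, size_gb) pairs; this structure is
    an assumption made for the sketch.
    """
    totals = defaultdict(float)
    for tier, size_gb in inventory:
        if tier not in TIERS:
            raise ValueError(f"unknown tier: {tier!r}")
        totals[tier] += size_gb
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {tier: 0.0 for tier in TIERS}
    return {tier: 100.0 * totals[tier] / grand_total for tier in TIERS}


# Illustrative numbers only.
sample_inventory = [
    ("online", 200.0),
    ("near-line", 300.0),
    ("off-line", 500.0),
]
shares = distribution_percentages(sample_inventory)
print(shares)
```

Tracking these percentages over time would show whether the archive is drifting toward the more expensive online tier.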

Designing an Archival Solution

Requires an application-specific assessment: look for commonality in application requirements

Wholly enterprise-wide strategies will be difficult to build and sustain

Evaluate alternative solutions based on application requirements and metrics

Don’t Ignore the Organizational Dynamics

Archival Touches Multiple Organizations:
• IT – Applications
• IT – Infrastructure
• Legal
• Users

Consequences of mistakes are enormous:
• Fines
• Litigation

Consider organizing a cross-functional team led by an archival champion with a combination of technical and business expertise

Comprehensive Application Assessment

Data Classification Exercise

Data Set Size and Historical and Predicted Data Growth Rates based on business drivers

Is Regulatory Compliance an Issue?

Data Valuation over Time:
• Access patterns of data 90 days old and older
• Cost of data loss
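The 90-day access-pattern check above can be sketched as a partition of data sets by last-access date. This is an illustration only: the record layout, file names, and the fixed reference date are hypothetical, and a real assessment would draw last-access times from the application or an SRM tool.

```python
from datetime import datetime, timedelta

AGE_THRESHOLD = timedelta(days=90)  # the 90-day mark used on the slide


def split_by_last_access(records, now):
    """Partition (name, last_access) pairs into active data and archive candidates."""
    active, candidates = [], []
    for name, last_access in records:
        if now - last_access >= AGE_THRESHOLD:
            candidates.append(name)
        else:
            active.append(name)
    return active, candidates


# Hypothetical sample data with a fixed reference date for repeatability.
now = datetime(2024, 6, 30)
records = [
    ("invoice_0001.pdf", datetime(2024, 6, 1)),
    ("invoice_0002.pdf", datetime(2024, 1, 15)),
    ("call_0003.wav", datetime(2023, 11, 2)),
]
active, candidates = split_by_last_access(records, now)
print("active:", active)
print("archive candidates:", candidates)
```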

Going it alone can be difficult

Available resources:
• Services organizations: GlassHouse, Accenture, EDS, storage vendors
• Application Management Tools: File-Level SRM, Precise

Budgetary Requirements

Components of the Archival Stack

Management & Control: Application Specific Module
• Discovery and analysis of data assets
• Business rules and policies definitions
• Identification and movement of specific data to appropriate storage medium
• Management, indexing of data and metadata
• Access control mechanism

Physical Repository: Storage Infrastructure
• Physical archive repository
• Data Preservation and Protection
• Indexing Technologies for Retrieval

[Diagram labels: Application Data, Data Flow]

Structured Data Archival Challenges to Investigate

ERP deployments are still very nascent

Preventing application downtime during archival

Preserving referential data integrity:
• Archival of core data and associated data in other tables

Enforcing single read-only state across related data

Delivering transparent access to archived/combined data via native app UI:
• Maintaining performance of remote queries and union views.

Update process:
• Restate vs. entire reload
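The referential-integrity and union-view points above can be illustrated with a small SQLite sketch. It assumes a hypothetical two-table schema (orders and their line items) with matching archive tables; the table and function names are invented for the example, and it does not attempt to enforce a read-only state on the archive.

```python
import sqlite3


def archive_order(conn, order_id):
    """Move one order and its line items to the archive tables in one transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute(
            "INSERT INTO orders_archive SELECT * FROM orders WHERE id = ?",
            (order_id,),
        )
        conn.execute(
            "INSERT INTO order_items_archive "
            "SELECT * FROM order_items WHERE order_id = ?",
            (order_id,),
        )
        # Remove child rows before the parent so no orphans are left behind.
        conn.execute("DELETE FROM order_items WHERE order_id = ?", (order_id,))
        conn.execute("DELETE FROM orders WHERE id = ?", (order_id,))


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_id INTEGER, sku TEXT);
    CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE order_items_archive (id INTEGER PRIMARY KEY, order_id INTEGER, sku TEXT);
    INSERT INTO orders VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO order_items VALUES (10, 1, 'A-1'), (11, 1, 'A-2'), (12, 2, 'B-1');
    """
)
archive_order(conn, 1)

# A union view presents live and archived rows as one combined result set.
conn.execute(
    "CREATE VIEW all_orders AS "
    "SELECT * FROM orders UNION ALL SELECT * FROM orders_archive"
)
print(conn.execute("SELECT id, customer FROM all_orders ORDER BY id").fetchall())
```

Moving the parent row and its dependent rows inside a single transaction means a failure part-way through leaves the production tables unchanged.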

Unstructured Data Considerations

Scalability

Sustained performance with data growth:
• Hierarchical file systems limited at large scales

Content Access and Visibility

Using metadata to intelligently manage and maintain the archive addresses traditional file system limitations

Scalability of Index (Content addresses)

Email Archival Challenges

Stringent regulations: SEC Rule 17a-4
• Non-rewriteable, non-erasable media
• Verification of writes
• Serialize units of media

Solution Requirements:
• Server-based capture
• Support for multiple distributed Email Servers

Meta Data Holds Real Value

Traditional File Systems:
• Digital asset tied to specific infrastructure
• No value outside of infrastructure context

Object-based systems:
• Self-describing attributes for digital asset
• Enables powerful policy-based data movement applications

Meta Data is data about data

Object Age and creation date

Object Change History

Associated application/users

Access control

Priority/Criticality

Data Access/Frequency
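The metadata attributes listed above can drive policy-based data movement. The sketch below is one hypothetical way to model them in Python; the field names, the priority scale, and the thresholds in `choose_tier` are assumptions made for illustration, not rules from the presentation.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ObjectMetadata:
    """Self-describing attributes for one archived object (illustrative)."""
    object_id: str
    created: date
    application: str
    owner: str
    priority: int            # 1 = most critical; the scale is an assumption
    accesses_last_90d: int
    change_history: list = field(default_factory=list)


def choose_tier(meta, today):
    """Pick a storage tier from metadata alone; thresholds are hypothetical."""
    age_days = (today - meta.created).days
    if meta.priority == 1 or meta.accesses_last_90d > 10:
        return "primary"
    if age_days < 365:
        return "secondary"
    return "tertiary"


today = date(2024, 6, 30)
xray = ObjectMetadata("obj-001", date(2020, 3, 1), "imaging", "radiology", 3, 0)
contract = ObjectMetadata("obj-002", date(2024, 5, 1), "crm", "legal", 1, 2)
print(choose_tier(xray, today), choose_tier(contract, today))
```

Because the decision reads only the metadata record, the same policy can be applied regardless of where the object currently resides.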

Choosing the Right Storage Medium

[Chart: storage media positioned by Amount of Data against Probability of Reuse, with a Recovery Time scale (< seconds, minutes, hours to days) and a Life Expectancy scale (1 week, 1 month, 3 months, 1 year, 18–30 years). Media shown: Disk Systems, Object Storage, D2D Systems, Libraries, Drives.]

Key Considerations for Storage Media

Cost

Access time

Application access method:
• NFS/CIFS
• Application-specific API

Reliability/Availability

Data Preservation Capability

Scalability

Archival solution integration

Storage Media Considerations

Primary Storage
• Pros: No risk of data loss; instantaneous access
• Cons: Exorbitant costs; performance degradation

Secondary Storage (SATA)
• Pros: Cost effective; solid access time
• Cons: Integration; enforcing preservation; management

Object Storage
• Pros: Fit for large unstructured files; elimination of data redundancy; WORM-like preservation
• Cons: Price premium; performance scalability with index growth

Tape
• Pros: Most cost effective; removable; integrated WORM
• Cons: Access time; reliability

Shifting towards an On-line Model

[Diagram: Primary, SATA, Object Storage, Tape]

Representative Vendors

Archival Solutions
• Structured Data: OuterBay, Princeton Softech, Applimation, Ixos
• Email: Legato, KVS, Assentor
• Unstructured: Documentum, FileNet, NICE

Storage Platforms
• SATA: CLARiiON, STK, IBM, Nexsan
• Object: COPAN, Centera, Archivas, Permabit, DCT
• Tape: STK, Quantum, ADIC, IBM

Start with your application vendor

Trust But Verify

Develop processes to periodically access historical data to test:
• Data integrity
• Access time

Manage capacity growth using vendor-supplied reporting tools
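A periodic integrity and access-time test of the kind described above can be sketched with checksums. This example assumes digests were recorded when the data was archived; the `read_object` callable and the in-memory store are hypothetical stand-ins for a real archive retrieval interface.

```python
import hashlib
import time


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as hex."""
    return hashlib.sha256(data).hexdigest()


def verify_sample(read_object, catalog):
    """Re-read archived objects and compare them with their recorded digests.

    `read_object` takes an object id and returns bytes; `catalog` maps
    object ids to the digest recorded at archive time. Both are assumptions
    for this sketch. Returns a list of (object_id, ok, seconds_to_read).
    """
    results = []
    for object_id, expected in catalog.items():
        start = time.perf_counter()
        data = read_object(object_id)
        elapsed = time.perf_counter() - start
        results.append((object_id, sha256_of(data) == expected, elapsed))
    return results


# In-memory stand-in for an archive, with one object silently corrupted.
store = {"a": b"quarterly report", "b": b"call recording"}
catalog = {key: sha256_of(value) for key, value in store.items()}
store["b"] = b"call recordinG"

report = verify_sample(store.__getitem__, catalog)
failed = [object_id for object_id, ok, _ in report if not ok]
print("failed integrity check:", failed)
```

The elapsed times in the report give a rough view of access time per object, which can be compared against the restore-time requirements gathered during planning.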

Summary

Archival is not backup and is not just about compliance

Successful strategy requires application-centric approach

Engage with key corporate stakeholders to define requirements and select solutions

Look for automated and interoperable software and hardware modules.

Be Paranoid!