




ibm.com/redbooks

Front cover

ILM Library: Techniques with Tivoli Storage and IBM TotalStorage Products

Charlotte Brooks
Giacomo Chiapparini

Wim Feyants
Pallavi Galgali

Vinicius Franco Jose

Learn about basic ILM concepts

Use TPC for Data to assess ILM readiness

Stages to ILM implementation


International Technical Support Organization

ILM Library: Techniques with Tivoli Storage and IBM TotalStorage Products

February 2006

SG24-7030-00


© Copyright International Business Machines Corporation 2006. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

First Edition (February 2006)

Note: Before using this information and the product it supports, read the information in “Notices” on page xi.


Contents

Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii

Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii

Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
The team that wrote this redbook . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Become a published author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
Comments welcome . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv

Part 1. ILM overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

Chapter 1. Introduction to ILM . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1 What is ILM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Why ILM is needed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.2.1 IT challenges and how ILM can help . . . . . . . . . . . . . . . . . . . . . . 5
1.3 ILM elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

1.3.1 Tiered storage management . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3.2 Long-term data retention . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3.3 Data lifecycle management . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3.4 Policy-based archive management . . . . . . . . . . . . . . . . . . . . . . . 14

1.4 Standards and organizations . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.4.1 Storage Networking Industry Association (SNIA) . . . . . . . . . . . . . . . 15

1.5 IT Infrastructure Library and value of ILM . . . . . . . . . . . . . . . . . . 16
1.5.1 What is ITIL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.5.2 ITIL management processes . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.5.3 ITIL and ILM value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

Chapter 2. ILM within an On Demand storage environment . . . . . . . . . . . . . . 21
2.1 Information On Demand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

2.1.1 Infrastructure Simplification . . . . . . . . . . . . . . . . . . . . . . . . 22
2.1.2 Business Continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.1.3 Information Lifecycle Management . . . . . . . . . . . . . . . . . . . . . . 23

2.2 IBM and ILM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3 IBM Information On Demand environment . . . . . . . . . . . . . . . . . . . . . 25
2.4 Supporting ILM through On Demand storage environment . . . . . . . . . . . . . 28

Chapter 3. Implementing ILM . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.1 Logical stages in ILM implementation . . . . . . . . . . . . . . . . . . . . . 30

3.1.1 Assessment and planning . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.1.2 Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.1.3 Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.1.4 Flow diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

3.2 IBM ILM consulting and services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

Chapter 4. Product overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.1 Summary of IBM products for ILM . . . . . . . . . . . . . . . . . . . . . . . . 38
4.2 TotalStorage Productivity Center for Data . . . . . . . . . . . . . . . . . . . 38

4.2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.2.2 Key aspects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39



4.2.3 Product highlights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.3 SAN Volume Controller . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

4.3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.3.2 Virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.3.3 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

4.4 IBM TotalStorage DS family of disk products . . . . . . . . . . . . . . . . . . 48
4.4.1 Enterprise disk storage . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.4.2 Mid-range disk storage . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

4.5 IBM TotalStorage tape solutions . . . . . . . . . . . . . . . . . . . . . . . . 49
4.5.1 IBM Virtualization Engine TS7510 . . . . . . . . . . . . . . . . . . . . . . 49

4.6 Tivoli Storage Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.6.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.6.2 Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.6.3 Tivoli Storage Manager applications . . . . . . . . . . . . . . . . . . . . . 51
4.6.4 Tivoli Storage Manager APIs and DR550 . . . . . . . . . . . . . . . . . . . . 55

4.7 DB2 Content Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.7.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.7.2 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.7.3 Standards and data model . . . . . . . . . . . . . . . . . . . . . . . . . . 59

4.8 DB2 CommonStore . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.8.1 DB2 CommonStore for Exchange Server . . . . . . . . . . . . . . . . . . . . . 59
4.8.2 DB2 CommonStore for Lotus Domino . . . . . . . . . . . . . . . . . . . . . . 60
4.8.3 DB2 CommonStore for SAP . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

4.9 More information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

Part 2. Evaluating ILM for your organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

Chapter 5. An ILM quick assessment . . . . . . . . . . . . . . . . . . . . . . . . 65
5.1 Initial steps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.2 Getting business and storage information . . . . . . . . . . . . . . . . . . . . 66
5.3 Defining data collection reports . . . . . . . . . . . . . . . . . . . . . . . . 67

5.3.1 Creating groups of data . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.3.2 Collecting reports from TPC for Data . . . . . . . . . . . . . . . . . . . . 69

5.4 Classifying data and analyzing reports . . . . . . . . . . . . . . . . . . . . 89
5.4.1 Types of data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.4.2 Data classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

5.5 Defining actions with classified data . . . . . . . . . . . . . . . . . . . . . 97
5.5.1 Actions for non-business files . . . . . . . . . . . . . . . . . . . . . . . 97
5.5.2 Actions for duplicate files . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.5.3 Actions for temporary files . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.5.4 Actions for stale files . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.5.5 Actions to RDBMSs space . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

5.6 ILM - Return on investment (ROI) . . . . . . . . . . . . . . . . . . . . . . . 98
5.6.1 Data classification and storage cost . . . . . . . . . . . . . . . . . . . . 99
5.6.2 Data management and personnel cost . . . . . . . . . . . . . . . . . . . . . 99
5.6.3 Long-term retention and non-compliancy penalties cost . . . . . . . . . . . . 99
5.6.4 Backup/archiving solutions cost - Disk or tape . . . . . . . . . . . . . . . 100

5.7 ILM Services offerings from IBM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

Chapter 6. The big picture for an ILM implementation framework . . . . . . . . . . 103
6.1 The big picture and why you should care about it . . . . . . . . . . . . . . . 104

6.1.1 Business consulting, assessment, definition . . . . . . . . . . . . . . . . . 105
6.1.2 Application and server hardware . . . . . . . . . . . . . . . . . . . . . . . 106
6.1.3 Software infrastructure and automation . . . . . . . . . . . . . . . . . . . 107



6.1.4 Hardware infrastructure . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
6.1.5 Management tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

6.2 What to do now - The many entry points to ILM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

Part 3. Sample solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

Chapter 7. ILM initial implementation . . . . . . . . . . . . . . . . . . . . . . . 119
7.1 Storage management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

7.1.1 Capacity management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
7.1.2 Service level management . . . . . . . . . . . . . . . . . . . . . . . . . . 124

7.2 Optimization of storage occupation . . . . . . . . . . . . . . . . . . . . . . 127
7.2.1 Reclaimable space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
7.2.2 Avoiding over allocation . . . . . . . . . . . . . . . . . . . . . . . . . . 146

7.3 Tiered storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
7.3.1 What storage devices to use . . . . . . . . . . . . . . . . . . . . . . . . . 150

Chapter 8. Enforcing data placement . . . . . . . . . . . . . . . . . . . . . . . . 153
8.1 Moving from the initial ILM scenario . . . . . . . . . . . . . . . . . . . . . 154
8.2 Requirements for data placement enforcement . . . . . . . . . . . . . . . . . . 155

8.2.1 Data classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
8.2.2 Enforcing data placement . . . . . . . . . . . . . . . . . . . . . . . . . . 161

Chapter 9. Data lifecycle and content management solution . . . . . . . . . . . . . 165
9.1 Moving from the previous steps . . . . . . . . . . . . . . . . . . . . . . . . 166
9.2 Placement in function of moment in lifecycle . . . . . . . . . . . . . . . . . 166

9.2.1 Determining the value of the data . . . . . . . . . . . . . . . . . . . . . . 167
9.2.2 Placement of data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
9.2.3 Movement of data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
9.2.4 Using document management systems . . . . . . . . . . . . . . . . . . . . . . 175

9.3 E-mail management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
9.3.1 Reclaim invalid space . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
9.3.2 E-mail archiving . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179

9.4 IBM System Storage Archive Manager . . . . . . . . . . . . . . . . . . . . . . 181
9.4.1 Chronological archive retention . . . . . . . . . . . . . . . . . . . . . . . 181
9.4.2 Event-based retention policy . . . . . . . . . . . . . . . . . . . . . . . . 182

Related publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
How to get IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
Help from IBM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187





Figures

1-1 Information Lifecycle Management . . . . . . . . . . . . . . . . . . . . . . . 4
1-2 Data value changes over time . . . . . . . . . . . . . . . . . . . . . . . . . 5
1-3 ILM elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1-4 Traditional non-tiered storage environment . . . . . . . . . . . . . . . . . . 8
1-5 Multi-tiered storage environment . . . . . . . . . . . . . . . . . . . . . . . 9
1-6 ILM policies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1-7 Information value changes . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1-8 Value of information and archive/retrieve management . . . . . . . . . . . . . 15
1-9 SNIA vision for ILM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1-10 ITIL processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2-1 IS, BC, and ILM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2-2 Business Continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2-3 Convergence of technologies . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2-4 Information On Demand storage environment . . . . . . . . . . . . . . . . . . . 25
2-5 Information Assets and Systems . . . . . . . . . . . . . . . . . . . . . . . . 26
3-1 Data classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3-2 Information classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3-3 Storage tiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3-4 Flow diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4-1 Storage resource management lifecycle . . . . . . . . . . . . . . . . . . . . . 39
4-2 First screen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4-3 Availability report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4-4 Asset Report of a computer . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4-5 Largest files by computer . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4-6 SVC block virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4-7 SVC components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4-8 Tivoli Storage Manager components . . . . . . . . . . . . . . . . . . . . . . . 50
4-9 Tivoli Storage Manager for Mail . . . . . . . . . . . . . . . . . . . . . . . . 52
4-10 Tivoli Storage Manager for Databases . . . . . . . . . . . . . . . . . . . . . 52
4-11 Tivoli Storage Manager for Application Servers . . . . . . . . . . . . . . . . 53
4-12 Tivoli Storage Manager for ERP . . . . . . . . . . . . . . . . . . . . . . . . 53
4-13 Tivoli Storage Manager for Hardware . . . . . . . . . . . . . . . . . . . . . 54
4-14 Tivoli Storage Manager for Space Management . . . . . . . . . . . . . . . . . 55
4-15 TSM APIs and DR550 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4-16 Enterprise content management components . . . . . . . . . . . . . . . . . . . 57
4-17 IBM content management portfolio . . . . . . . . . . . . . . . . . . . . . . . 57
5-1 Quick Assessment steps . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5-2 Access File Summary report . . . . . . . . . . . . . . . . . . . . . . . . . . 70
5-3 Access Time Summary report . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5-4 Disk Capacity Summary report . . . . . . . . . . . . . . . . . . . . . . . . . 73
5-5 Oldest Orphaned Files report . . . . . . . . . . . . . . . . . . . . . . . . . 74
5-6 Storage Access Times report . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5-7 Storage Capacity report . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
5-8 Storage Modification Times report . . . . . . . . . . . . . . . . . . . . . . . 77
5-9 Total Freespace report . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5-10 User Space Usage report . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5-11 Wasted Space report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5-12 Largest Files report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81



5-13 Duplicate Files report . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5-14 File Types Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5-15 Access Time report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5-16 Modification Time Reporting . . . . . . . . . . . . . . . . . . . . . . . . . 85
5-17 Database Storage by Computer report . . . . . . . . . . . . . . . . . . . . . 86
5-18 Database Storage by Computer report table . . . . . . . . . . . . . . . . . . 87
5-19 Total Database Free report . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5-20 Segments with Wasted Space report . . . . . . . . . . . . . . . . . . . . . . 89
5-21 Access Time Reporting by report group . . . . . . . . . . . . . . . . . . . . 93
5-22 Access Time reporting file systems . . . . . . . . . . . . . . . . . . . . . . 94
5-23 Modification Time Reporting by report group . . . . . . . . . . . . . . . . . 95
5-24 Modification Time reporting file systems . . . . . . . . . . . . . . . . . . . 96
6-1 ILM implementation framework at service level maturity . . . . . . . . . . . . 105
6-2 Business, assessment, and ongoing tasks . . . . . . . . . . . . . . . . . . . . 106
6-3 Server types and agents . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
6-4 Software components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
6-5 Storage Hardware Infrastructure . . . . . . . . . . . . . . . . . . . . . . . . 110
6-6 Placement of WBEM and CIM technology . . . . . . . . . . . . . . . . . . . . . 113
7-1 Generic process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
7-2 Capacity management tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
7-3 Space usage over time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
7-4 Service level management activities . . . . . . . . . . . . . . . . . . . . . . 124
7-5 Steps to create a service level agreement . . . . . . . . . . . . . . . . . . . 126
7-6 Overview of space usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
7-7 Top 10 file types using the most space . . . . . . . . . . . . . . . . . . . . 130
7-8 Defining non-business data . . . . . . . . . . . . . . . . . . . . . . . . . . 132
7-9 Ratio between temporary space and used space remains constant . . . . . . . . . 134
7-10 Ratio between temporary used space and total used space increasing . . . . . . 135
7-11 Decrease in ratio between temporary and used space . . . . . . . . . . . . . . 136
7-12 Creating a profile - Defining statistics to gather . . . . . . . . . . . . . . 137
7-13 Defining the file filters . . . . . . . . . . . . . . . . . . . . . . . . . . 137
7-14 Defining the scan systems . . . . . . . . . . . . . . . . . . . . . . . . . . 138
7-15 Defining the scan profile . . . . . . . . . . . . . . . . . . . . . . . . . . 138
7-16 Generating a report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
7-17 Temporary space report . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
7-18 Organizational and project-based file-sharing structure . . . . . . . . . . . 140
7-19 Moving stale data in a two-tier Tivoli Storage Manager HSM solution . . . . . 142
7-20 TPC for data access time reporting . . . . . . . . . . . . . . . . . . . . . . 143
7-21 HSM Data placement in function of time . . . . . . . . . . . . . . . . . . . . 145
7-22 Overview of two-tier HSM implementation . . . . . . . . . . . . . . . . . . . 146
7-23 File system unused space reporting . . . . . . . . . . . . . . . . . . . . . . 147
7-24 Defining the space allocation trigger level . . . . . . . . . . . . . . . . . 148
7-25 Database unused space report . . . . . . . . . . . . . . . . . . . . . . . . . 148
7-26 Handling over-allocated file systems and databases . . . . . . . . . . . . . . 149
7-27 Matching data classes to storage tiers . . . . . . . . . . . . . . . . . . . . 150
8-1 Initial static ILM implementation . . . . . . . . . . . . . . . . . . . . . . . 154
8-2 Adding automated data placement . . . . . . . . . . . . . . . . . . . . . . . . 155
8-3 Adding file-based location rules . . . . . . . . . . . . . . . . . . . . . . . 156
8-4 Complete picture of data types to tiers mapping . . . . . . . . . . . . . . . . 161
8-5 Server tiered volume mapping . . . . . . . . . . . . . . . . . . . . . . . . . 162
8-6 Enforcing data placement using TPC for Data . . . . . . . . . . . . . . . . . . 164
9-1 Adding the lifecycle dimension . . . . . . . . . . . . . . . . . . . . . . . . 166
9-2 The changing value of data over time . . . . . . . . . . . . . . . . . . . . . 167



9-3 Business process to data mapping . . . . . . . . . . . . . . . . . . . . . . . 168
9-4 Cost versus benefit for storage placement . . . . . . . . . . . . . . . . . . . 170
9-5 Process example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
9-6 E-mail propagation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
9-7 E-mail archiving diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
9-8 Standard IBM System Storage Archive Manager archive retention . . . . . . . . . 181
9-9 Event driven archiving mechanism - Honoring RETVER . . . . . . . . . . . . . . 182
9-10 Event driven archiving mechanism - Honoring RETMIN - case 2 . . . . . . . . . 183





Notices

This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive Armonk, NY 10504-1785 U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrates programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. You may copy, modify, and distribute these sample programs in any form without payment to IBM for the purposes of developing, using, marketing, or distributing application programs conforming to IBM's application programming interfaces.


Trademarks

The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:

Eserver®, Redbooks (logo)™, developerWorks®, iSeries™, xSeries®, z/OS®, AIX®, Domino®, DB2 Universal Database™, DB2®, Enterprise Storage Server®, FlashCopy®, Informix®, IBM®, Lotus Notes®, Lotus®, Notes®, OS/390®, POWER5™, Redbooks™, System Storage™, Tivoli®, TotalStorage®, VideoCharger™, Virtualization Engine™, WebSphere®

The following terms are trademarks of other companies:

JDBC, Streamline, and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

Excel, Microsoft, Outlook, PowerPoint, Visio, Visual Basic, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

Intel, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Linux is a trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.


Preface

Every organization has large amounts of data to store, use, and manage, and for most, this quantity is increasing. However, over time, the value of this data changes. How can we map data to appropriately priced storage media, so that it can be accessed in a timely manner when needed, retained for as long as required, and disposed of when no longer needed? Information Lifecycle Management (ILM) provides solutions. What is ILM? ILM is the process of managing information from creation, through its useful life, to its eventual destruction, in a manner that aligns storage costs with the changing business value of information. We can think of ILM as an integrated solution of five IT management and infrastructure components working together: service management (service levels), content management, workflow management (or process management), storage management, and storage infrastructure.

This IBM® Redbook will help you understand what ILM is, why it is of value to you in your organization, and some suggested ways to implement it using IBM products. It focuses particularly on data life cycle management. Look for other Redbooks™ on topics such as archive and retention management.

The team that wrote this redbook

This redbook was produced by a team of specialists from around the world working at the International Technical Support Organization, San Jose Center.

Charlotte Brooks is an IBM Certified IT Specialist and Project Leader for Storage Solutions at the International Technical Support Organization, San Jose Center. She has 15 years of experience with IBM in storage hardware and software support, deployment, and management. She has written many Redbooks, and has developed and taught IBM classes in all areas of storage and storage management. Before joining the ITSO in 2000, she was the Technical Support Manager for Tivoli® Storage Manager in the Asia Pacific Region.

Giacomo Chiapparini is an IBM Certified System Expert for Open Systems Storage Solutions in IBM Global Services Switzerland. He has eight years of practical experience in designing, implementing, and supporting different storage solutions across the country. He is an SNIA certified professional and holds product certifications for Linux®, AIX®, and Windows®. His areas of expertise include storage products, storage networking, and open systems server hardware with corresponding operating systems.

Wim Feyants is an IBM Certified IT Specialist in Belgium. He has 11 years of experience in different IT fields. His areas of expertise include storage infrastructure and storage management solutions, and designing and implementing them for clients. He has written extensively on different storage-related matters, including a number of Redbooks.

Pallavi Galgali is a Software Engineer at the IBM India Software Lab in Pune, India. Pallavi has been involved in development and maintenance projects with products such as SAN File System and Advanced Distributed File System. She has co-authored the IBM Redbook The IBM TotalStorage Solutions Handbook, SG24-5250, and an article on developerWorks® titled A comparison of security subsystems on AIX, Linux, and Solaris. She holds a degree in Computer Engineering from Pune Institute of Computer Technology, India. Her areas of expertise include storage networking, file systems, and device drivers.

Vinicius Franco Jose is a Senior IT Specialist at IBM Brazil. He has been in the IT industry for eight years and has extensive experience implementing UNIX® and Storage solutions. His areas of expertise include several TotalStorage® Disk and Tape solutions, Tivoli Storage products, and TPC. He also has experience in storage networking, SAN Volume Controller, and SAN File System. He holds product certifications including AIX, Tivoli Storage Manager, and TPC for Data. He is currently working in IBM Global Services on client support and services delivery. He is also a member of the ILM IT Solution group in Brazil, deploying solutions for ILM projects.

Figure 1 The team: Wim, Charlotte, Pallavi, Giacomo, Vinicius

Thanks to the following people for their contributions to this book:

David Bartlett, Larry Heathcote, BJ Klingenberg, Toby Marek, Scott McPeek, Dave Russell, Evan Salop, Chris Saul, Scott Selvig, Alan Stuart, Sergei Varbanov
IBM

Emma Jacobs, Mary Lovelace, Sangam Racherla
International Technical Support Organization, San Jose Center

Julie Czubik
International Technical Support Organization, Poughkeepsie Center

Taya Wyss
Enterprise Strategy Group

Phillip Mills
SNIA Rep for IBM

Become a published author

Join us for a two- to six-week residency program! Help write an IBM Redbook dealing with specific products or solutions, while getting hands-on experience with leading-edge technologies. You'll team with IBM technical professionals, Business Partners, and/or clients. Your efforts will help increase product acceptance and client satisfaction. As a bonus, you'll develop a network of contacts in IBM development labs, and increase your productivity and marketability.

Find out more about the residency program, browse the residency index, and apply online at:

ibm.com/redbooks/residencies.html

Comments welcome

Your comments are important to us!

We want our Redbooks to be as helpful as possible. Send us your comments about this or other Redbooks in one of the following ways:

- Use the online Contact us review redbook form found at:

  ibm.com/redbooks

- Send your comments in an email to:

  [email protected]

- Mail your comments to:

  IBM Corporation, International Technical Support Organization
  Dept. QXXE Building 80-E2
  650 Harry Road
  San Jose, California 95120-6099


Part 1 ILM overview

In this part we introduce basic definitions and concepts for ILM, as well as some of the core IBM and Tivoli products in this solution space.


Chapter 1. Introduction to ILM

Information is essential to any business. Organizations face the challenge of efficiently managing information throughout its lifecycle, in line with its business value. The quantity of information grows and its value changes over time, making it increasingly costly and complex to store and manage.

This chapter discusses the importance of ILM and its benefits, and introduces the elements of data lifecycle management.


1.1 What is ILM

Information Lifecycle Management (ILM) is a process for managing information through its lifecycle, from conception until disposal, in a manner that optimizes storage and access at the lowest cost.

ILM is not just hardware or software—it includes processes and policies to manage the information. It is designed upon the recognition that different types of information can have different values at different points in their lifecycle. Predicting storage needs and controlling costs can be especially challenging as the business grows.

The overall objectives of managing information with Information Lifecycle Management are to help reduce the total cost of ownership (TCO) and help implement data retention and compliance policies. In order to effectively implement ILM, owners of the data need to determine how information is created, how it ages, how it is modified, and if/when it can safely be deleted. ILM segments data according to value, which can help create an economical balance and sustainable strategy to align storage costs with businesses objectives and information value. The adoption of ILM technologies and processes, as shown in Figure 1-1, turns that strategy into a business reality.

Figure 1-1 Information Lifecycle Management

1.2 Why ILM is needed

In order to run your business efficiently, you need fast access to your stored data. But in today's business environment, you face increasing challenges: the explosion of the sheer volume of digital information, the increasing cost of storage management, tight regulatory requirements for data retention, and manual business and IT processes that are increasingly complex and error prone.

Although the total value of stored information has increased overall, not all data is created equal, and the value of that data to business operations fluctuates over time. This is shown in Figure 1-2 on page 5, and is commonly referred to as the data lifecycle. The existence of the data lifecycle means that all data cannot be treated the same.


Figure 1-2 Data value changes over time (data value versus time, from 7 days to 10 years, for database, development code, e-mail, productivity files, and MPEG data; source: Enterprise Strategy Group)

Figure 1-2 shows typical data values of different types of data, mapped over time. Most frequently, the value of data decreases over time, albeit at different rates of decline. However, infrequently accessed or inactive data can become suddenly valuable again as events occur, or as new business initiatives or projects are taken on. Historically, the need to retain information has resulted in a “buy more storage” mentality. However, this approach has only served to increase overall storage management costs and complexity, and has increased the demand for hard-to-find qualified personnel.

Executives today are tasked with reducing overall spending while supporting an ever-increasing number of service and application demands. While support and management tasks increase, IT departments are being asked to justify their position by demonstrating business value to the enterprise. IT must also develop and enhance the infrastructure in order to support business initiatives while facing some or all of these data storage issues:

- Costs associated with e-mail management can reduce employee productivity in many companies.
- Backup and recovery windows continue to expand as data volumes grow unmanaged.
- Inactive data consumes valuable, high-performance disk storage space.
- Duplicate data copies are consuming additional storage space.
- As data continues to grow and management costs increase, budgets continue to be under pressure.

1.2.1 IT challenges and how ILM can help

There are many challenges facing business today that make organizations think about managing their information more efficiently and effectively. Among these are some particular issues that might motivate you to develop an ILM strategy and solution:

- Information and data growing faster than the storage budget.
- What data can I delete and when? What to keep and for how long?
- Disk dedicated to specific applications, which inhibits sharing.
- Duplicated copies of files and other data. Where are they and how much space do they use?
- No mapping of the value of the data to the value of the hardware on which it is stored.


- Longer time required to back up data, while the window keeps shrinking.
- Storage performance does not meet requirements.
- Low utilization of existing assets; for example, in open environments, storage utilization rates of around 30 percent are quite typical.
- Manual processes causing potential business risk due to errors.
- Regulatory requirements dictate long-term retention for certain data.
- Inability to achieve backup/recovery/accessibility objectives for critical data.
- Inability to grow the support staff to keep up with the demand for storage management in an increasingly complex environment.
- Multiple backup and restore approaches and processes.
- Storage management requirements not well defined.

In response to these challenges, it is necessary to define specific objectives to support and improve information management:

- Control demand for storage and create policies for allocation.
- Reduce hardware, software, and storage personnel costs.
- Improve personnel efficiency, optimizing systems and productivity.
- Define and enforce policies to manage the lifecycle of data.
- Define and implement the appropriate storage strategy to address current and future business requirements.

In the next section, we describe the major ILM solution components and how they can help you to overcome these challenges, and propose an ILM assessment for planning and design.

1.3 ILM elements

To manage the data lifecycle and make your business ready for on demand, there are four main elements that can move your business toward an ILM structured environment, as shown in Figure 1-3 on page 7. They are:

- Tiered storage management
- Long-term data retention
- Data lifecycle management
- Policy-based archive management


Figure 1-3 ILM elements

In the next four sections we describe each of these elements in detail:

- 1.3.1, "Tiered storage management" on page 7
- 1.3.2, "Long-term data retention" on page 9
- 1.3.3, "Data lifecycle management" on page 12
- 1.3.4, "Policy-based archive management" on page 14

1.3.1 Tiered storage management

Most organizations today seek a storage solution that can help them manage data more efficiently. They want to reduce the costs of storing large and growing amounts of data and files, and maintain business continuity. Through tiered storage, you can reduce overall disk-storage costs, with benefits such as:

- Reducing overall disk-storage costs by allocating the most recent and most critical business data to higher performance disk storage, while moving older and less critical business data to lower cost disk storage.
- Speeding business processes by providing high-performance access to the most recent and most frequently accessed data.
- Reducing administrative tasks and human errors. Older data can be moved to lower cost disk storage automatically and transparently.

Typical storage environment

Storage environments typically have multiple tiers of data value, such as application data that is needed daily and archive data that is accessed infrequently. But typical storage configurations offer only a single tier of storage, as in Figure 1-4 on page 8, which limits the ability to optimize cost and performance.

[Figure 1-3 text: "The process of managing information, from creation to disposal, in a manner that aligns costs with the changing value of information." Tiered Storage: incorporates tiered storage and advanced SAN technologies, with storage ranging from enterprise disk and midrange disk to tape, to optimize costs and availability. Long-Term Data Retention: addresses risk and compliance objectives; leverages Content Management and Records Management technologies. Data Life Cycle Management: exploits Hierarchical Storage Management for any data that must be protected and retained for a period of time and then disposed of; establishes policies and automation to move data among different storage systems. Policy-based Archive Management: e-mail, database, and application archive; focused offerings driven by efficiency of major applications.]


Figure 1-4 Traditional non-tiered storage environment

Multi-tiered storage environment

A tiered storage environment is the infrastructure needed to align storage cost with the changing value of information. The tiers are related to data value: the most critical data is allocated to higher performance disk storage, while less critical business data is allocated to lower cost disk storage.

Each storage tier provides different performance metrics and disaster recovery capabilities. Creating classes and storage device groups is an important step in configuring a tiered storage ILM environment; we provide details of this in later chapters of this book. Figure 1-5 on page 9 shows a multi-tiered storage environment.
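As an illustration of how such classes might drive placement, the following Python sketch maps a file's age and criticality to one of three hypothetical tiers. The tier names and thresholds are our own examples, not taken from any IBM product:

```python
from dataclasses import dataclass

@dataclass
class FileInfo:
    name: str
    days_since_access: int
    critical: bool

def placement_tier(f: FileInfo) -> str:
    """Return the storage tier a placement policy might choose."""
    if f.critical or f.days_since_access <= 30:
        return "enterprise-disk"   # recent or critical data: high performance
    if f.days_since_access <= 365:
        return "midrange-disk"     # aging data: lower-cost disk
    return "tape-archive"          # inactive data: cheapest tier

print(placement_tier(FileInfo("ledger.db", 5, True)))        # enterprise-disk
print(placement_tier(FileInfo("q1_report.doc", 120, False))) # midrange-disk
```

In a real environment, the class definition would also capture disaster recovery requirements and performance metrics per tier, as described above.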


Figure 1-5 Multi-tiered storage environment

An IBM ILM solution in a tiered storage environment is designed to:

- Reduce the total cost of ownership (TCO) of managing information. It can help optimize data costs and management, freeing expensive disk storage for the most valuable information.
- Segment data according to value. This can help create an economical balance and sustainable strategy to align storage costs with business objectives and information value.
- Help make decisions about moving, retaining, and deleting data, because ILM solutions are closely tied to applications.
- Manage information and determine how it should be managed based on content, rather than migrating data based on technical specifications. This approach can result in more responsive management, and offers you the ability to retain or delete information in accordance with business rules.
- Provide the framework for a comprehensive enterprise content management strategy.

Key products of IBM for tiered storage solutions and storage virtualization solutions are:

- IBM TotalStorage SAN Volume Controller (SVC)
- IBM TotalStorage DS family of disk storage: DS4000, DS6000, and DS8000
- IBM TotalStorage tape drives, tape libraries, and virtual tape solutions

For details of these, see Chapter 4, “Product overview” on page 37.

1.3.2 Long-term data retention

There is a rapidly growing class of data that is best described by the way in which it is managed rather than by the arrangement of its bits. The most important attribute of this kind of data is its retention period, hence it is called retention managed data, and it is typically kept in an archive or a repository. In the past it has been variously known as archive data, fixed content data, reference data, unstructured data, and other terms implying its read-only nature. It is often measured in terabytes and is kept for long periods of time, sometimes forever.

In addition to the sheer growth of data, laws and regulations governing the storage and secure retention of business and client information are increasingly becoming part of the business landscape, making data retention a major challenge to any institution. An example is the US Sarbanes-Oxley Act of 2002.

Businesses must comply with these laws and regulations. Regulated information can include e-mail, instant messages, business transactions, accounting records, contracts, or insurance claims processing, all of which can have different retention periods, for example, for 2 years, for 7 years, or retained forever. Moreover, some data must be kept just long enough and no longer. Indeed, content is an asset when it needs to be kept; however, data kept past its mandated retention period could also become a liability. Furthermore, the retention period can change due to factors such as litigation. All these factors mandate tight coordination and the need for ILM.
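The core of such coordination is simple to state, even if hard to operate: a record may be disposed of only when its retention period has elapsed and no hold applies. A minimal Python sketch of that rule follows; the function and its parameters are our own illustration, not any product's API:

```python
from datetime import date, timedelta

def may_delete(created: date, retention_days: int, today: date,
               on_hold: bool = False) -> bool:
    """A record may be deleted only after its retention period expires,
    and never while a litigation hold applies."""
    expires = created + timedelta(days=retention_days)
    return today >= expires and not on_hold

seven_years = 7 * 365
# A 1998 record has passed a seven-year retention by 2006:
print(may_delete(date(1998, 1, 1), seven_years, date(2006, 1, 1)))  # True
# ...but not while litigation extends the hold:
print(may_delete(date(1998, 1, 1), seven_years, date(2006, 1, 1), on_hold=True))  # False
```

The `on_hold` flag captures the point made above: retention periods can be extended by factors such as litigation, so disposal logic must check more than the calendar.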

Not only are there numerous state and governmental regulations that must be met for data storage, but there are also industry-specific and company-specific ones. And of course these regulations are constantly being updated and amended. Organizations need to develop a strategy to ensure that the correct information is kept for the correct period of time, and is readily accessible when it needs to be retrieved at the request of regulators or auditors.

It is easy to envisage the exponential growth in data storage that will result from these regulations and the accompanying requirement for a means of managing this data. Overall, the management and control of retention managed data is a significant challenge for the IT industry when taking into account factors such as cost, latency, bandwidth, integration, security, and privacy.

Regulations examples

It is not within the scope of this book to enumerate and explain the regulations in existence today. For illustration purposes only, we list some of the major regulations and accords in Table 1-1, summarizing their intent and applicability.

Table 1-1 Some regulations and accords affecting companies

Regulation                | Intention                                                                   | Applicability
SEC/NASD                  | Prevent securities fraud.                                                   | All financial institutions and companies regulated by the SEC
Sarbanes-Oxley Act        | Ensure accountability for public firms.                                     | All public companies trading on a U.S. exchange
HIPAA                     | Privacy and accountability for health care providers and insurers.          | Health care providers and insurers, both human and veterinarian
Basel II (The New Accord) | Promote greater consistency in the way banks and banking regulators approach risk management across national borders. | Financial industry
21 CFR 11                 | Approval accountability.                                                    | FDA regulation of pharmaceutical and biotechnology companies


For example, in Table 1-2 we list some requirements found in SEC 17a-4 to which financial institutions and broker-dealers must comply. Information produced by these institutions, regarding solicitation and execution of trades and so on, is referred to as compliance data, a subset of retention-managed data.

Table 1-2 Some SEC/NASD requirements

Requirement                                                                   | Met by
Capture all correspondence (unmodified) [17a-4(f)(3)(v)].                     | Capture incoming and outgoing e-mail before reaching users.
Store in non-rewritable, non-erasable format [17a-4(f)(2)(ii)(A)].            | Write Once Read Many (WORM) storage of all e-mail and all documents.
Verify automatically recording integrity and accuracy [17a-4(f)(2)(ii)(B)].   | Validated storage to magnetic WORM.
Duplicate data and index storage [17a-4(f)(3)(iii)].                          | Mirrored or duplicate storage servers (copy pools).
Enforce retention periods on all stored data and indexes [17a-4(f)(3)(iv)(c)].| Structured records management.
Search/retrieve all stored data and indexes [17a-4(f)(2)(ii)(D)].             | High-performance search and retrieval.

IBM ILM data retention strategy

Regulations and other business imperatives, as we just briefly discussed, stress the need for an Information Lifecycle Management process and tools to be in place. The unique experience of IBM with the broad range of ILM technologies, and its broad portfolio of offerings and solutions, can help businesses address this particular need and provide them with the best solutions to manage their information throughout its lifecycle. IBM provides a comprehensive and open set of solutions to help.

IBM has products that provide content management, data retention management, and sophisticated storage management, along with the storage systems to house the data. To specifically help companies with their risk and compliance efforts, the IBM Risk and Compliance framework is another tool designed to illustrate the infrastructure capabilities needed to help address the myriad of compliance requirements. Using the framework, organizations can standardize the use of common technologies to design and deploy a compliance architecture that may help them deal more effectively with compliance initiatives.

For more details about the IBM Risk and Compliance framework, visit:

http://www-306.ibm.com/software/info/openenvironment/rcf/

Key products of IBM for data retention and compliance solutions are:

- IBM Tivoli Storage Manager, including IBM System Storage™ Archive Manager
- IBM DB2® Content Manager Family, which includes DB2 Content Manager, Content Manager OnDemand, CommonStore for Exchange Server, CommonStore for Lotus® Domino®, and CommonStore for SAP
- IBM DB2 Records Manager
- IBM TotalStorage DS4000 with S-ATA disks
- IBM System Storage DR550
- IBM TotalStorage Tape (including WORM) products


For details on these products, see Chapter 4, “Product overview” on page 37.

1.3.3 Data lifecycle management

At its core, the process of ILM moves data up and down a path of tiered storage resources, including high-performance, high-capacity disk arrays, lower-cost disk arrays such as serial ATA (SATA), tape libraries, and permanent archival media where appropriate. Yet ILM involves more than just data movement; it encompasses scheduled deletion and regulatory compliance as well. Because decisions about moving, retaining, and deleting data are closely tied to application use of data, ILM solutions are usually closely tied to applications.

ILM has the potential to provide the framework for a comprehensive information-management strategy, and helps ensure that information is stored on the most cost-effective media. This helps enable administrators to make use of tiered and virtual storage, as well as process automation. By migrating unused data off of more costly, high-performance disks, ILM is designed to help:

- Reduce costs to manage and retain data.
- Improve application performance.
- Reduce backup windows and ease system upgrades.
- Streamline data management.
- Allow the enterprise to respond to demand in real time.
- Support a sustainable storage management strategy.
- Scale as the business grows.

ILM is designed to recognize that different types of information can have different value at different points in their lifecycle. As shown in Figure 1-6 on page 13, data can be allocated to a specific storage level aligned to its cost, with policies defining when and where data will be moved.
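Products such as IBM Tivoli Storage Manager for Space Management implement this movement transparently. Purely to illustrate the policy idea, and not any product's mechanism, this Python sketch lists files whose last access is older than a threshold, as candidates for migration to a lower tier:

```python
import os
import time

def migration_candidates(root: str, idle_days: int):
    """Yield paths under root that have not been accessed for at least
    idle_days, as candidates for migration to lower-cost storage."""
    cutoff = time.time() - idle_days * 86400  # seconds per day
    for dirpath, _, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.stat(path).st_atime < cutoff:  # last access before cutoff
                yield path
```

A real hierarchical storage manager replaces a migrated file with a stub so that applications can still open it, triggering a transparent recall from the lower tier.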

Important: The IBM offerings are intended to help clients address the numerous and complex issues relating to data retention in regulated and non-regulated business environments. Nevertheless, each client’s situation is unique, and laws, regulations, and business considerations impacting data retention policies and practices are constantly evolving. Clients remain responsible for ensuring that their information technology systems and data retention practices comply with applicable laws and regulations, and IBM encourages clients to seek appropriate legal counsel to ensure their compliance with those requirements. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the client is in compliance with any law.

12 ILM Library: Techniques with Tivoli Storage and IBM TotalStorage Products


Figure 1-6 ILM policies

Sometimes, however, the value of a piece of information changes: data that was previously inactive and was migrated to lower-cost storage may be needed again and should be processed on high-performance disk. A data lifecycle management policy can be defined to move the information back to enterprise storage, keeping the storage cost aligned to the data value, as illustrated in Figure 1-7.

Figure 1-7 Information value changes

Key products of IBM for lifecycle management are:

� IBM TotalStorage Productivity Center
� IBM TotalStorage SAN Volume Controller (SVC)
� IBM Tivoli Storage Manager, including IBM System Storage Archive Manager
� IBM Tivoli Storage Manager for Space Management

Chapter 1. Introduction to ILM 13


For details of these products, see Chapter 4, “Product overview” on page 37.

1.3.4 Policy-based archive management

As businesses of all sizes migrate to e-business solutions and a new way of doing business, they already have mountains of data and content that have been captured, stored, and distributed across the enterprise. This wealth of information provides a unique opportunity. By incorporating these assets into e-business solutions, and at the same time delivering newly generated information media to their employees and clients, a business can reduce costs and information redundancy and leverage the potential profit-making aspects of their information assets.

Growth of information in corporate databases, such as Enterprise Resource Planning (ERP) systems and e-mail systems, prompts organizations to consider moving unused data off high-cost disks. They need to:

� Identify database data that is no longer being regularly accessed and move it to an archive where it remains available.

� Define and manage what to archive, when to archive, and how to archive from the mail system or database system to the back-end archive management system.

Database archive solutions can help improve performance for online databases, reduce backup times, and improve application upgrade times.

E-mail archiving solutions are designed to reduce the size of corporate e-mail systems by moving e-mail attachments and/or messages to an archive from which they can easily be recovered if needed. This action helps reduce the need for end-user management of e-mail, improves the performance of e-mail systems, and supports the retention and deletion of e-mail.
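As a sketch of the selection logic such a solution applies, the following moves attachments of messages older than a cutoff into an archive store, leaving a reference behind for later recall. The message layout, the `MemoryArchive` class, and the 90-day cutoff are hypothetical names and values for this illustration, not the behavior of any specific product.

```python
from datetime import datetime, timedelta

ARCHIVE_AGE = timedelta(days=90)  # illustrative cutoff

class MemoryArchive:
    """Stand-in for a back-end archive (e.g. a content repository)."""
    def __init__(self):
        self._objects = {}
    def put(self, data):
        key = len(self._objects)   # returns a retrieval key
        self._objects[key] = data
        return key
    def get(self, key):
        return self._objects[key]

def archive_old_attachments(messages, archive_store, now):
    """Move attachments of old messages into the archive, replacing each
    with a reference so it can be recovered later if needed."""
    moved = 0
    for msg in messages:
        if now - msg["date"] < ARCHIVE_AGE:
            continue  # message still young; leave it alone
        for name, data in list(msg["attachments"].items()):
            ref = archive_store.put(data)
            msg["attachments"][name] = ("archived", ref)
            moved += 1
    return moved

store = MemoryArchive()
inbox = [
    {"date": datetime(2005, 6, 1), "attachments": {"report.pdf": b"%PDF..."}},
    {"date": datetime(2006, 1, 28), "attachments": {"new.doc": b"..."}},
]
archive_old_attachments(inbox, store, now=datetime(2006, 2, 1))
```

After the pass, the old message holds only a small reference while the young one is untouched, which is how such tools shrink the mail store without losing access.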

The way to do this is to migrate and store all information assets in an e-business enabled content manager. ERP databases and e-mail solutions generate large volumes of information and data objects that can be stored in content management archives. An archive solution allows you to free system resources, while maintaining access to the stored objects for later reference. Letting the content manager manage and migrate data objects gives a solution ready access to newly created, higher-value information, while still being able to retrieve data that has been archived on less expensive media, as shown in Figure 1-8 on page 15.


Figure 1-8 Value of information and archive/retrieve management

Key products of IBM for archive management are:

� IBM Tivoli Storage Manager, including IBM System Storage Archive Manager
� IBM DB2 Content Manager family of products
� IBM DB2 CommonStore family of products

For details of these products, see Chapter 4, “Product overview” on page 37.

1.4 Standards and organizations

The success and adoption of any new technology, and any improvement to existing technology, is greatly influenced by standards. Standards are the basis for the interoperability of hardware and software from different, and often rival, vendors. Although standards bodies and organizations such as the Internet Engineering Task Force (IETF), American National Standards Institute (ANSI), and International Organization for Standardization (ISO) publish these formal standards, other organizations and industry associations, such as the Storage Networking Industry Association (SNIA), play a significant role in defining the standards and market development and direction.

1.4.1 Storage Networking Industry Association (SNIA)

The Storage Networking Industry Association is an international computer system industry forum of developers, integrators, and IT professionals who evolve and promote storage networking technology and solutions. SNIA was formed to ensure that storage networks become efficient, complete, and trusted solutions across the IT community. IBM is one of the founding members of this organization. SNIA is uniquely committed to delivering storage networking solutions to a broader market. Through its Storage Management Initiative (SMI) and its Storage Management Initiative Specification (SMI-S), SNIA creates and promotes the adoption of a highly functional, interoperable management interface for multi-vendor storage networking products. SMI-S makes multi-vendor storage networks simpler to implement and easier to manage. IBM has led the industry in not only supporting the SMI-S initiative, but also using it across its hardware and software product lines. The specification covers fundamental operations of communications between management console clients and devices, auto-discovery, access, security, the ability to provision volumes and disk resources, LUN mapping and masking, and other management operations.

Data Management Forum

SNIA has formed the Data Management Forum (DMF) to focus on defining, implementing, qualifying, and teaching improved methods for the protection, retention, and lifecycle management of data.

Vision for ILM by SNIA and DMF

The Data Management Forum defines ILM as a new management practice for the datacenter. ILM is not a specific product, nor is it just about storage and data movement to low-cost disk. It is a standards-based approach to automating datacenter operations by using business requirements, business processes, and the value of information to set policies and service level objectives for how the supporting storage, compute, and network infrastructure operates.

The key question that flows from this vision of ILM is: how do we get there? These capabilities do not fully exist today, and this is the work of SNIA and the Data Management Forum: to unify the industry towards a common goal, to develop the relevant standards, to facilitate interoperability, and to conduct market education around ILM. Figure 1-9 illustrates the SNIA vision for ILM.

Figure 1-9 SNIA vision for ILM

For additional information about the various activities of SNIA and DMF see its Web site at:

http://www.snia.org

1.5 IT Infrastructure Library and value of ILM

The intent of this section is to introduce you to the IT Infrastructure Library (ITIL) and the value of ILM within the ITIL methodology. We begin by defining ITIL and its Service Support processes.


1.5.1 What is ITIL

ITIL is a process-based methodology used by IT departments to ensure that they can deliver IT services to end users in a controlled and disciplined way. It incorporates a set of best practices that are applicable to all IT organizations, no matter what size or what technology is used. ITIL is used to create and deliver service management processes. These tasks are made easier by the use of service and system management tools.

Over recent decades, multiple IT process models have been developed. ITIL is the only one that is not proprietary.

� Late 1970s: Information Systems Management Architecture (ISMA) (IBM)
� Late 1980s: IT Infrastructure Library V1 (ITIL) (CCTA - now OGC)
� 1995: IT Process Model (ITPM) (IBM)
� 2000: Enterprise Operational Process Framework (IBM)
� 2000: IT Service Management Reference Model (ITSM) (HP)
� 2000–2001: Microsoft® Operations Framework (MOF) (Microsoft)
� 2001–2002: IT Infrastructure Library V2 (ITIL) (OGC)

ITIL comprises a library of books describing best practices for IT services management, covering the goals, activities, inputs, and outputs of processes. ITIL takes a worldwide approach to IT management, and its methodology accepts that specific procedures can vary from organization to organization. ITIL is not tied to any particular vendor, and IBM has been involved with ITIL since its inception in 1988.

1.5.2 ITIL management processes

The ITIL approach to creating and managing service management processes is widely recognized around the world and the adoption of its principles is clearly growing, as evidenced by new groups appearing in more countries every year.

The service management disciplines are grouped into the two areas of Service Support and Service Delivery. There are now eleven basic processes used in the areas of Service Support and Service Delivery, as shown in Figure 1-10 on page 18. Since it can take a long time to implement these disciplines, it is not uncommon to find only some of the processes in use initially.

Note: ITIL is a registered trademark of the OGC. OGC is the UK Government's Office of Government Commerce. CCTA is the Central Computer and Telecommunications Agency.


Figure 1-10 ITIL processes

Now we briefly explain each component of Service Support and Service Delivery.

Service Support

The processes in the Service Support group are all concerned with providing stability and flexibility for the provisioning of IT Services.

Configuration Management

Configuration Management is responsible for registering all components in the IT service (including clients, contracts, SLAs, hardware and software components, and more) and for maintaining a repository of configurable attributes and relationships between the components.

Service Desk

The Service Desk acts as the main point-of-contact for the users of the service.

Incident Management

Incident Management registers incidents, allocates severity, and coordinates the efforts of the support teams to ensure timely and correct resolution of problems. Escalation times are noted in the SLA and are as such agreed between the client and the IT department. Incident Management also provides statistics to Service Level Management to demonstrate the service levels achieved.

Problem Management

Problem Management implements and uses procedures to perform problem diagnosis and identify solutions that correct problems. It registers solutions in the configuration repository, and agrees on escalation times internally with Service Level Management during the SLA negotiation. It provides problem resolution statistics to support Service Level Management.



Change Management

Change Management ensures that the impact of a change to any component of a service is well known, and the implications regarding service level achievements are minimized. This includes changes to the SLA documents and the Service Catalog, as well as organizational changes and changes to hardware and software components.

Release Management

Release Management manages the master software repository and deploys software components of services. It deploys changes at the request of Change Management, and provides management reports on the deployment.

Service Delivery

The processes in the Service Delivery group are all concerned with providing quality, cost-effective IT services.

Service Level Management

The purpose of Service Level Management is to manage client expectations and negotiate Service Delivery Agreements. This involves finding out the client’s requirements and determining how these can best be met within the agreed budget. Service Level Management works together with all IT disciplines and departments to plan and ensure delivery of services. This involves setting measurable performance targets, monitoring performance, and taking action when targets are not met.

Financial Management for IT Services

Financial Management registers and maintains cost accounts related to the usage of IT services. It delivers cost statistics and reports to Service Level Management to assist in obtaining the right balance between service cost and delivery. It assists in pricing the services in the Service Catalog and Service Level Agreements.

IT Service Continuity Management

Service Continuity Management plans and ensures the continuing delivery—or minimum outage—of the service by reducing the impact of disasters, emergencies, and major incidents. This work is done in close collaboration with the company’s business continuity management, which is responsible for protecting all aspects of the company’s business—including IT.

Capacity Management

Capacity Management is responsible for planning and ensuring that adequate capacity with the expected performance characteristics is available to support the Service Delivery. It delivers capacity usage, performance, and workload management statistics, as well as trend analysis, to Service Level Management.

Availability Management

Availability Management is responsible for planning and ensuring the overall availability of the services. It provides management information in the form of availability statistics—including security violations—to Service Level Management. This discipline may also include negotiating underpinning contracts with external suppliers, and a definition of maintenance windows and recovery times.


1.5.3 ITIL and ILM value

ILM is a service-based solution with policies and processes. The ITIL methodology provides the processes needed to deliver and support the storage services that manage the lifecycle of information.

The ILM components (tiered storage, archive management, long-term retention, and data lifecycle management), aligned to ITIL processes, form a powerful solution for IT organizations to manage their data. By implementing ILM within the ITIL methodology, organizations can achieve both sets of objectives: managing the data lifecycle while providing quality, stability, flexibility, and cost-effective IT services.


Chapter 2. ILM within an On Demand storage environment

This chapter explains how ILM fits into the strategy for IBM On Demand Business. It outlines the pillars of Information On Demand, that is:

� Infrastructure Simplification
� Business Continuity
� Information Lifecycle Management

It then describes the IBM On Demand storage environment, which helps organizations achieve Information On Demand, and discusses the ILM components of this environment.


© Copyright IBM Corp. 2006. All rights reserved. 21


2.1 Information On Demand

In today’s ever more competitive and growing business environment, information is an increasingly valuable, but costly, organizational asset.

The volume of information is growing very rapidly in most organizations. And with this, the need to protect and manage information also continues to increase. Organizations are seeking to reduce costs, improve efficiency, and increase effectiveness by aligning IT investments according to information value and business needs. This is a step towards Information On Demand. With Information On Demand, business can respond with flexibility and speed to client requirements and market opportunity. Getting there involves three aspects:

� Infrastructure Simplification (IS): Simplification of the underlying IT infrastructure and its management to lower the cost and complexity

� Business Continuity (BC): Assuring security and durability of information

� Information Lifecycle Management (ILM): Efficiently managing information over its lifecycle

Figure 2-1 IS, BC, and ILM

2.1.1 Infrastructure Simplification

Infrastructure simplification is a process by which organizations contain expenses, enable business growth, and reduce operational risks by optimizing IT resources. Simplified infrastructures hold the promise of improved system optimization and Total Cost of Ownership (TCO), higher personnel productivity, and greater application availability through infrastructure resiliency. IBM products are designed to help clients obtain these benefits through consolidation, virtualization, and automated management. Once simplified, the infrastructure can be better managed for lower cost and with fewer errors.

Note: For more information about Infrastructure Simplification, see:

http://www-1.ibm.com/servers/storage/solutions/is/

2.1.2 Business Continuity

The business climate in today’s on demand era is highly competitive. Clients, employees, suppliers, and business partners expect to be able to tap into your information at any hour of the day from any corner of the globe. If you have continuous business operations, people can get what they need from your business—helping bolster your success and competitive advantage. Thus downtime is unacceptable today. Businesses must also be increasingly sensitive to issues of client privacy and data security, so that vital information assets are not compromised. To achieve all this, you need a comprehensive Business Continuity plan for your business.

As shown in Figure 2-2, Business Continuity can be achieved with:

� High Availability
� Continuous Operations
� Disaster Recovery

High Availability is achieved by means of fault tolerant, failure resistant infrastructure supporting continuous application processing.

Continuous Operations imply non-disruptive backups and system maintenance coupled with continuous availability of applications.

Disaster Recovery means protection against unplanned outages such as natural disasters through reliable and predictable recovery methods.

Figure 2-2 Business Continuity

Note: For more information about Business Continuity, see:

http://www-1.ibm.com/servers/storage/solutions/business_continuity/

2.1.3 Information Lifecycle Management

In Chapter 1, “Introduction to ILM” on page 3, we defined ILM as a process for managing information through its lifecycle, from conception until disposal, in a manner that optimizes storage and access at the lowest cost. The most efficient ILM strategy for a business manages information according to its value. For small and medium sized enterprises, predicting storage needs and controlling costs can be especially challenging as the business grows.

IBM’s unique experience with the broad range of ILM technologies, and its broad portfolio of offerings and solutions, including Tivoli Storage Management software, TotalStorage hardware, TotalStorage Open Software, and DB2 Information Management software, can help provide businesses with the best solutions to manage their information throughout its lifecycle.

2.2 IBM and ILM

At IBM, ILM is being addressed by the convergence of several technologies, as shown in Figure 2-3.

Figure 2-3 Convergence of technologies

Tivoli software provides automated storage management functions via TotalStorage Productivity Center and Tivoli Storage Manager, which help effectively manage the growth of information. IBM TotalStorage (and IBM System Storage) provides hardware offerings such as the DR550, DS8000, DS6000, and DS4000, as well as tape solutions, to enable ILM implementations. IBM TotalStorage also provides virtualization offerings such as the SAN Volume Controller (SVC), which help clients use their storage environments optimally. IBM also offers DB2’s integrated content management software to manage and archive unstructured data.

IBM has a very long history in the area of Information Lifecycle Management with a variety of products. IBM has marketed tape drives and libraries since 1952, and has sold disk storage systems, today including the DS4000 and Enterprise Storage Server® (ESS), since 1957. The world’s first Hierarchical Storage Manager was developed by IBM in 1974. Tivoli Storage Manager, which provides backup, archive, and lifecycle management for a wide range of operating platforms, has been available for more than 10 years. In the area of content management, IBM’s DB2 Content Manager has been available since 1988.

Note: For more information about Information Lifecycle Management, refer to:

http://www-1.ibm.com/servers/storage/solutions/ilm/

IBM provides extensive planning and solutions services through its IBM Global Services organizations to assist businesses in developing their ILM strategies, providing assessments, helping businesses meet the challenges of compliance, etc.

Thus IBM offers a complete one-stop storage solution, combining its wide range of products with services, education, and financing.

2.3 IBM Information On Demand environment

To achieve Information On Demand via Infrastructure Simplification, Business Continuity, and Information Lifecycle Management, as explained above, IBM has developed the Information On Demand environment, shown in Figure 2-4. From an ILM perspective, we are focused mainly on the bottom element, Information Assets and Systems; however, the diagram helps to position this in the wider IT and business environment.

Figure 2-4 Information On Demand storage environment

Figure 2-5 on page 26 shows the details of the Information Assets and Systems box.


Figure 2-5 Information Assets and Systems

Here we describe the building blocks of this environment; then, in 2.4, “Supporting ILM through On Demand storage environment” on page 28, we discuss the components that belong specifically to Information Lifecycle Management.

Systems

The Systems layer is the hardware infrastructure for the information assets. It includes the servers, networking, and storage systems. In the storage arena, IBM provides a complete range of hardware, providing flexibility in the choice of service quality and cost structure. The products support a common, industry standard management interface (SMI-S). For more information about this, see the Web site:

http://www.ibm.com/servers/storage/

Resource virtualization

Resource virtualization products are designed to improve the flexibility and utilization of the hardware. Resource virtualization includes virtualization of both the servers and the storage. For server virtualization, IBM provides Virtual Machines, Hypervisor, Virtual Ethernet, and Virtual I/O. Storage virtualization includes tape virtualization (Virtual Tape Server and the TS7000 series), disk virtualization (SAN Volume Controller), and array partitioning (for example, LPARs on the IBM TotalStorage DS8000). Disk virtualization works by pooling the storage volumes, files, and file systems into a single logical repository of capacity for centralized management. This repository can include storage capacity from multiple vendors and platforms in heterogeneous environments. Virtualization products also comply with SMI-S. For more information about this, see the Web site:

http://www.ibm.com/servers/storage/software/virtualization/index.html

Infrastructure management

Infrastructure management provides a single point of management and automation, and is also considered from both the server and storage perspectives. For servers, IBM Director and Enterprise Workload Manager provide the control point and workload management function, respectively. For storage, infrastructure management is designed to make resource sharing possible across the enterprise, including heterogeneous networks. It interacts with the lower layers of the environment, such as virtualization and hardware, using common and open interfaces. It helps empower administrators by providing an integrated view of the entire storage environment, including software and hardware. Storage infrastructure management provides insight into the historic, operational, and predictive analytics of the storage environment that, in turn, can help administrators improve storage capacity and network utilization, and help avoid business outages. It also supports policy-based automation, such as capacity provisioning, performance optimization, and data management, helping to provide outstanding business agility. For information about products related to storage infrastructure management, see the Web site:

http://www.ibm.com/servers/storage/software/center/index.html

Retention and Lifecycle Management

This is what we cover in detail in this book. It includes archive management for files, e-mail, databases, and applications, as well as HSM.

Archive Management

Archive management provides complete solutions designed to help enterprises archive, retain, and manage data to help satisfy regulatory, legal, and other business requirements. Archive management products are interoperable with many content management products available in the marketplace, including the IBM DB2 Content Management family. For more information about product offerings in this area, see the Web site:

http://www.ibm.com/software/tivoli/products/storage-mgr-data-reten/

HSM

IBM Open Software Family Hierarchical Storage Management (HSM) capabilities provide a way to capture low-activity or inactive data and feed it into a hierarchy of lower cost or tiered storage. This helps control data storage growth and costs. Automated, policy-based capabilities determine where data should be stored, based on factors such as its criticality to the business, how accessible and available it should be, and the cost structures of available devices. Interoperability with IBM Content Management and Records Management products allows enterprise data to be moved from one medium to another with efficiency while helping avoid disruptions in service. For more information about product offerings in this area, see the Web site:

http://www.ibm.com/software/tivoli/products/storage-mgr-space/
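The stub-and-recall mechanism that HSM products typically use can be illustrated with ordinary files: the migrated file is replaced by a small stub, and the data is recalled when the stub is read. `STUB_MARKER`, `migrate`, and `read_with_recall` are invented names for this sketch; real HSM implementations work inside the file system, transparently to applications.

```python
import os

STUB_MARKER = b"HSM-STUB:"  # illustrative marker, not a real product format

def migrate(path, tier_dir):
    """Replace a low-activity file with a small stub pointing at its
    migrated copy; the full data moves to cheaper storage (tier_dir)."""
    dest = os.path.join(tier_dir, os.path.basename(path))
    os.replace(path, dest)              # move data to the lower tier
    with open(path, "wb") as stub:
        stub.write(STUB_MARKER + dest.encode())

def read_with_recall(path):
    """Transparent access: if the file is a stub, recall the data first.
    Simplified -- reads the whole file at once."""
    with open(path, "rb") as f:
        head = f.read()
    if not head.startswith(STUB_MARKER):
        return head                     # ordinary file, nothing to do
    dest = head[len(STUB_MARKER):].decode()
    with open(dest, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:         # recall to the original tier
        f.write(data)
    os.remove(dest)
    return data

os.makedirs("tier2", exist_ok=True)
with open("report.dat", "wb") as f:
    f.write(b"cold data")
migrate("report.dat", "tier2")
print(read_with_recall("report.dat"))   # b'cold data'
```

The point of the stub is that the file keeps its name and place in the namespace, so applications need no changes while the bulk of the data sits on cheaper media.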

Business Continuity

Business Continuity is the process of maintaining availability of IT services, including timely recovery in the event of failure. As with Retention and Lifecycle Management and Infrastructure Management, Business Continuity is also considered from a server and storage perspective. Servers use various clustering techniques to maintain availability. From a storage perspective, Advanced Copy Services (or Replication), including FlashCopy®, Metro Mirror, and Global Mirror, also provide service availability. IBM also offers a range of products for managing recovery.

Advanced Copy Services

Advanced copy and mirroring functions are designed to help reduce application downtime and provide real-time remote mirroring for disaster recovery. For more information about this, see the Web site:

http://www.ibm.com/servers/storage/disk/enterprise/advanced_copy.html

Recovery Management

Recovery management solutions are designed to quickly and reliably recover enterprise data when needed, utilizing centralized Web-based management, intelligent backup and archiving (with minimal or no impact on application availability), and automated policy-based data migration copy services. For more information about product offerings in this area, see the Web site:

http://www.ibm.com/software/tivoli/products/storage-mgr/


2.4 Supporting ILM through On Demand storage environment

At the heart of our delivery of Information Lifecycle Management solutions is the Information On Demand environment, as described in the previous section. In support of Information Lifecycle Management, the Information On Demand storage environment delivers:

� A complete hardware infrastructure offering different media types with different qualities of service and cost structures

� Storage Infrastructure Management to help IT managers categorize their data

� Virtualization software to pool the different cost/quality-of-service storage hardware and provide policies that automatically place the different categories of data on the right cost of storage

� Hierarchical Storage Management software to control storage growth

� Archive Management software to help manage the cost of storing bookkeeping and compliance data for long periods of time

� Content Management software for integrating and delivering unstructured business information


Chapter 3. Implementing ILM

This chapter gives an overview of the logical stages involved in implementing ILM in any organization. We also mention some IBM products that prove helpful during these stages. For detailed information about these products, please refer to Chapter 4, “Product overview” on page 37. A more detailed consideration of this process follows in the next chapters.

Finally, it gives information about IBM’s consulting and services for ILM, including the IBM four-step process for an ILM setup.


3.1 Logical stages in ILM implementation

In simple terms, an ILM implementation mainly involves identifying the right data and keeping it on the right media at the right time, until its ultimate disposal.

The whole implementation process can be subdivided into three main stages, as follows:

� Assessment and planning
� Execution
� Monitoring

This section describes these three stages in detail.

3.1.1 Assessment and planning

Assessment and planning is the first and most important stage in an ILM implementation. It is a sequential process that involves phases such as gathering service levels, classifying data, finding information classes, designing storage tiers, and deciding ILM policies. This section explains these phases in more detail.

Gathering service levels

This phase lets you understand the requirements and objectives for your organization's data. Service levels can be defined in terms of availability, performance, recoverability, accessibility, security, support, and billing. Service levels may be expressed using language like:

• I need RAID 5 level of availability for all of my business data.

• I need 500 MB per second of performance for my media data.

• I have a Recovery Time Objective (RTO) of 25 minutes for all SAP data.

• Data that is less than 30 days old needs to be on RAID 5, while all other data can be on RAID 1.

These are just some examples of service level requirements; they vary from organization to organization, depending on the types of data and their value to the business. Thus, in this step we gather all the service levels required by the organization's different departments for their various types of data.
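Requirements like these can be captured in a simple machine-readable form so that later phases can match data against them. A minimal sketch in Python (the record fields and values are illustrative assumptions, not from any IBM tool):

```python
# Hypothetical service-level records gathered from departments;
# field names such as "rto_minutes" are invented for illustration.
service_levels = [
    {"data": "business data", "availability": "RAID 5"},
    {"data": "media data", "throughput_mb_s": 500},
    {"data": "SAP data", "rto_minutes": 25},
]

def requirements_for(data_name):
    """Return every recorded requirement for one type of data."""
    return [sl for sl in service_levels if sl["data"] == data_name]

print(requirements_for("SAP data"))  # one record, with a 25-minute RTO
```

Keeping the requirements structured this way makes the later mapping of data classes to information classes a mechanical comparison rather than a manual exercise.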

Data classification

Data classification is a very important phase in an ILM strategy, because a business does not want to spend money protecting and storing data that is not needed or used, or is not business critical.

Data classification can be defined as the sorting of like data, based on consumer wants and needs, into data classes that simplify and clarify the business process requirements.

Once we study the data from different angles or dimensions, such as file size, access patterns, access times, file type, and file age, we can categorize it into different classes according to its value to the business.

Data classification may differ from organization to organization, depending on the kind of data they have. The classes might be application specific, like SAP data or e-mail data, or department specific, like HR data or development data. One example of data classification is shown in Figure 3-1 on page 31. Here, valid data is all the business-critical data. Stale or orphan data is data that is no longer required but is still being stored. Non-business files can be employees' personal data. All other terms used in the classification are self-explanatory. In a classification like this, we want to manage the valid data and the system files, while minimizing, archiving, or eliminating all the other data. We will show other examples of data classes in later chapters of this book.

Figure 3-1 Data classification

IBM TotalStorage Productivity Center for Data (TPC for Data) is very helpful in the data classification process. See Chapter 4, “Product overview” on page 37, for more information about this product.
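The classification dimensions discussed above (file type, age, size) lend themselves to simple automated rules. The following is a hedged sketch of one possible rule set; the thresholds and extension lists are invented for illustration, and this is not the logic TPC for Data actually uses:

```python
import time

# Assumed set of "business" file extensions, purely for illustration.
BUSINESS_EXTENSIONS = {".doc", ".xls", ".db"}

def classify(path, mtime, now=None):
    """Toy classifier: assign a file to a data class by type and age."""
    now = time.time() if now is None else now
    age_days = (now - mtime) / 86400
    if path.endswith((".tmp", ".dmp")):
        return "temp/dump"
    ext = path[path.rfind("."):] if "." in path else ""
    if ext in BUSINESS_EXTENSIONS:
        # Business data goes stale after an assumed one-year threshold.
        return "valid" if age_days < 365 else "stale"
    return "non-business"

print(classify("report.doc", time.time()))  # "valid"
```

In practice a tool like TPC for Data gathers these attributes across the enterprise; the point of the sketch is only that classification reduces to rules over measurable file properties.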

Finding information classes

In this phase, we study the different service levels and data classes that were identified in the previous phases, so that we can define information classes that have similar service levels.

Thus, information classes are the groups of data classes that have similar service level requirements.

If we try to derive information classes from the data classes mentioned in Figure 3-1, we may come up with the classes named A, B, and C, with decreasing service level objectives.

• Information class A: All valid data goes in this class.

• Information class B: All system files, redundant application data, log files, dump files, and temp files go in this class.

• Information class C: All duplicate data, stale/orphan data, and non-business data goes in this class.

Figure 3-2 on page 32 shows this mapping.
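The mapping in Figure 3-2 amounts to a lookup from data class to information class. Expressed as code, with assignments simply mirroring the A/B/C example above:

```python
# Lookup table from data class to information class (mirrors the
# A/B/C grouping described in the text; names are illustrative).
INFORMATION_CLASS = {
    "valid data": "A",
    "system files": "B",
    "redundant application data": "B",
    "log files": "B",
    "dump files": "B",
    "temp files": "B",
    "duplicate data": "C",
    "stale/orphan data": "C",
    "non-business data": "C",
}

print(INFORMATION_CLASS["log files"])  # B
```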


Figure 3-2 Information classes

Designing storage tiers

This phase takes input from all the previous phases. Here we need to consider the types of storage available, budgetary constraints on new purchases, and the data growth trends that will drive future storage demand. After considering all these factors, we design storage tiers for the in-scope environment. These tiers can exploit the attributes of different device types, such as cost, performance, and concurrent access.

Storage tiers are often expressed as follows:

• Platinum
  – Mirrored, potentially remotely, with online recovery points
  – High performance, high availability, high-end SAN fabric
  – High duty cycles, frequently backed up
  – Example: Enterprise disk, disk subsystem replication, director-class switches

• Gold
  – Mirrored, with less frequent online recovery points than platinum
  – Less cache
  – Less frequently backed up
  – Example: Enterprise disk, disk subsystem replication, director-class switches

• Silver
  – RAID, but no mirroring
  – Lower performance, less redundancy for components
  – Backed up once a day
  – Example: Mid-range disk, Fibre Channel SAN with lower cost switches

• Bronze
  – SATA disks
  – Fibre Channel or iSCSI SAN

Figure 3-3 on page 33 also shows how the quality of service and cost can be mapped against storage tiers.


Figure 3-3 Storage tiers

As shown above, we categorize the available (or planned) storage into a hierarchy, to exploit its features and optimize usage to reduce cost.
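Tier definitions like the ones above can be kept as structured configuration so that policies can refer to tiers by name. A sketch follows; the schema is an assumption, and the attribute values loosely follow the tier examples above:

```python
# Illustrative tier catalog; the schema ("media", "mirrored", "backup")
# is invented for this sketch.
STORAGE_TIERS = {
    "platinum": {"media": "enterprise disk", "mirrored": True, "backup": "frequent"},
    "gold": {"media": "enterprise disk", "mirrored": True, "backup": "less frequent"},
    "silver": {"media": "mid-range disk", "mirrored": False, "backup": "daily"},
    "bronze": {"media": "SATA disk", "mirrored": False, "backup": "on demand"},
}

def tiers_with(attribute, value):
    """List the tiers whose configuration matches a given attribute value."""
    return [name for name, attrs in STORAGE_TIERS.items() if attrs.get(attribute) == value]

print(tiers_with("mirrored", True))  # ['platinum', 'gold']
```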

Deciding ILM policies

This is the final phase in the assessment and planning stage of implementing ILM. Here we map information classes to storage tiers. We define policies for active (frequently accessed) as well as inactive data. Policies for active data cover data placement on the various tiers and data movement or migration across tiers. Policies for inactive data cover backup, archiving, retention, and destruction, as required for regulatory compliance. Policies can address file and database ownership, retrieval from archive, quotas, threshold reporting, compliance, and the deletion, backup, and recovery of files or databases.

Policies may look like:

• Data on tier 3 will get backed up every day.
• Move information class A data that is older than 30 days from tier 1 to tier 2.
• Destroy all information class B data that is older than one year.

Clearly defined policies are essential for easier data path management. The success of the whole ILM implementation depends heavily on well-defined policies.
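The example policies above reduce to simple rules over an information class, an age, and a current tier. A minimal, hypothetical policy evaluator (not any product's actual engine; tier names and return values are illustrative):

```python
def next_action(info_class, age_days, tier):
    """Evaluate the three sample policies from the text, in order."""
    if info_class == "B" and age_days > 365:
        return "destroy"
    if info_class == "A" and age_days > 30 and tier == "tier 1":
        return "migrate to tier 2"
    if tier == "tier 3":
        return "back up daily"
    return "no action"

print(next_action("A", 45, "tier 1"))  # migrate to tier 2
```

In a real deployment such rules are enforced automatically by products like Tivoli Storage Manager rather than hand-written code.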

3.1.2 Execution

After assessment and planning, in the execution stage we actually implement what was designed and planned. The main tasks involved are:

• Implementation of the storage tiers
• Enforcement of the policies

We first simplify the storage infrastructure and consolidate it. We pool the storage according to its classes of service. We can also use block level virtualization to create logical volumes from physical disks. At the end we come up with the storage tiers, as designed in the planning stage.


IBM’s wide range of disk and tape solutions can be used to design the storage tiers. The IBM SAN Volume Controller provides block level virtualization. For more information about these products please see Chapter 4, “Product overview” on page 37.

Once all the tiers are in place, we enforce the ILM policies. Products like Tivoli Storage Manager provide the facilities to enforce these policies automatically. Please see Chapter 4, “Product overview” on page 37, for more information about these products.

3.1.3 Monitoring

The last, but not least, stage is monitoring. In this stage, we monitor the in-scope environment for major changes in data types, sizes, and access trends. Such changes may be reflected in the data classes or information classes in the future. We also monitor whether the performance delivered by the storage tiers is as expected. This stage verifies that the ILM implementation delivers its expected returns.

IBM TotalStorage Productivity Center is a very useful product for this stage.

3.1.4 Flow diagram

The flow diagram in Figure 3-4 gives a complete view of the logical stages of ILM implementation explained in the sections above.

Figure 3-4 Flow diagram

3.2 IBM ILM consulting and services

ILM offers organizations a methodology for evolving existing data-centric storage infrastructures into business-centric infrastructures that can more efficiently manage data according to its value to the organization. Often, IT departments do not


have the in-house expertise to design ILM storage infrastructure solutions. In this situation, IBM Global Services provides a comprehensive ILM storage infrastructure design offering that provides IT departments with the plan they need to help transform their existing data storage infrastructure into a robust, business-centric infrastructure that takes optimal advantage of ILM and provides greater value to the organization.

IBM takes into consideration the following factors while designing solutions:

• Data mix
• Current storage infrastructure
• Data retention requirements
• ILM storage infrastructure goals and objectives
• Short and long term opportunities for storage "wins"

It then provides a custom-tailored design for ILM storage infrastructure implementation that can help:

• Optimize existing storage efficiencies.
• Manage the costs of storage infrastructure changes.
• Increase compliance and ROI.
• Validate the implementation tools under consideration.
• Ensure a smooth rollout.

The solution offering process involves four main steps, as follows.

Step 1: ILM data collection

Collect the information and data related to the in-scope environment, both automatically and manually.

Step 2: Analysis and data classification

Define the classes of data, the ILM policies for each class, and the requirements for how data in each class should be stored throughout its lifecycle (capacity, performance, availability, protection, retention).

Identify opportunities for so-called quick wins, for example, data cleanup, rationalized space usage, and adaptive capacity plans.

Step 3: Methodology and architecture definition

Define the model and architecture for the storage technology, and the storage management processes and organization that support the data class requirements and the ILM storage infrastructure policies.

Establish a preliminary business case for ILM.

Step 4: Solution roadmap and recommendations

Establish a decision model, apply the defined architecture to known ILM storage infrastructure vendor solutions, and select the "best-fit" solution.

Identify existing gaps between current and target environments and then create a complete program for change relative to the deployment and implementation of the selected ILM storage infrastructure solution.

At the end of this four-step process the client organization will have:

• Recommendations for optimizing existing storage for short-term "wins"


• A custom blueprint for a more effective ILM storage infrastructure and variable-cost storage hierarchy

• A description of corresponding storage services and classes of services

• A roadmap that shows the IT department how to get from where it is to where it wants to be

• A validated business case that anticipates short-term and long-term ROI

For more information about IGS offerings for ILM, contact your IBM representative.


Chapter 4. Product overview

This chapter gives an overview of various IBM products that aid clients in the process of implementing ILM.


4.1 Summary of IBM products for ILM

We saw in Chapter 2, "ILM within an On Demand storage environment" on page 21, that ILM can be a key step towards evolving to an On Demand storage environment. IBM has solutions for every aspect of ILM.

Because data classification is a very important step in ILM implementation, IBM TotalStorage Productivity Center for Data provides rich functionality for enterprise-wide data reporting.

For building storage tiers, IBM Tivoli Storage Manager and IBM TotalStorage SAN Volume Controller (backed by a variety of IBM and other storage systems) are the main contributors. For the storage infrastructure itself, IBM also offers a wide range of disk storage systems starting from entry-level and mid-range disk systems to enterprise disk systems, plus tape systems ranging from a single tape drive to tape libraries that store petabytes of data.

For matching the information classes to storage tiers and applying policies, the following products are indicated:

• IBM Tivoli Storage Manager for Space Management
• IBM TotalStorage SAN File System
• IBM Tivoli Storage Manager Archive
• IBM System Storage Archive Manager
• IBM DB2 CommonStore
• IBM System Storage DR550

In the area of content management, IBM has products like IBM DB2 Content Manager and IBM DB2 Records Manager.

This chapter gives a brief overview of these products, which form the basis of an ILM implementation.

4.2 TotalStorage Productivity Center for Data

As a component of the IBM TotalStorage Productivity Center, IBM TotalStorage Productivity Center for Data is designed to help you improve your storage ROI by:

• Improving storage utilization
• Enabling intelligent capacity planning
• Helping you manage more storage with the same staff
• Supporting high application availability

From an ILM perspective, this is a very helpful tool that provides various reports on enterprise data. These reports play a major role in the data classification process. The tool also proves useful for ongoing monitoring once the initial ILM implementation is complete.

4.2.1 Overview

IBM TotalStorage Productivity Center (TPC) for Data helps discover, monitor, and create enterprise policies for disks, storage volumes, file systems, files, and databases. Knowing where all your storage is located, and the properties of your data, places you in a better position to act intelligently on your data.

Architected for efficiency and ease-of-use, IBM TotalStorage Productivity Center for Data uses a single agent per server to provide detailed information without a high consumption of network bandwidth or CPU cycles.


Figure 4-1 shows the concept of storage resource management from a lifecycle perspective. The idea is to establish a base understanding of the storage environment, with an emphasis on discovering areas where simple actions can deliver rapid return on investment. Ideally, the process should identify potential areas of exposure, evaluate the data residing on the servers, set up control mechanisms for autonomic management, and start the capacity planning process by predicting growth.

Figure 4-1 Storage resource management lifecycle

TPC for Data monitors storage assets, capacity, and usage across an enterprise. It can look at:

• Storage from a host perspective: Manage all the host-attached storage, capacity, and resources attributed to file systems, users, directories, and files, as well as the view of the host-attached storage from the storage subsystem perspective.

• Storage from an application perspective: Monitor and manage the storage activity inside different database entities, including instance, tablespace, and table.

• Storage utilization: Provide chargeback information so that storage usage is justified or accounted for.

TPC for Data provides over 300 standardized reports (and the ability to customize reports) about file systems, databases, and storage infrastructure. These reports provide the storage administrator information about:

• Assets
• Availability
• Capacity
• Usage
• Usage violation
• Backup

With this information, the storage administrator can:

• Discover and monitor storage assets enterprise-wide.
• Report on enterprise-wide assets, files, file systems, databases, users, and applications.
• Provide alerts (set by the user) on issues such as capacity problems and policy violations.
• Support chargebacks by usage or capacity.

4.2.2 Key aspects

This section discusses the key aspects of TPC for Data.


Basic menu

Figure 4-2 shows the IBM TotalStorage Productivity Center initial screen, which displays when the application is started. It shows a quick summary of the overall health of the storage environment, and can highlight potential problem areas for further investigation.

Figure 4-2 First screen

This screen contains four viewable areas, which cycle among seven predefined panels. It shows the following statistics:

• Enterprise-wide summary: Shows statistics accumulated from all the agents, such as total file system capacity available, total file system capacity used, total number of monitored servers, total number of users, and total number of disks.

• File system used space: Displays a pie chart showing the distribution of used and free space in all file systems.

• Users consuming the most space: By default, displays a bar chart of the users who are using the largest amount of file system space.

• Monitored server summary: Shows a table of total disk file system capacity for the monitored servers, sorted by operating system type.

• File systems with least free space percentage: Shows a table of the fullest file systems, including the percentage of free space, the total file system capacity, and the file system mount point.

• Users consuming the most space report: Shows the same information as the Users Consuming the Most Space panel, but in table format.


• Alerts pending: Shows active alerts that have been triggered but are still pending.

Discover and monitor information

TPC for Data uses three methods to discover information about the assets in the storage environment: pings, probes, and scans. These are typically set up to run automatically as scheduled tasks. You can define different ping, probe, and scan jobs to run against different Agents or groups of Agents (for example, to run a regular probe of all Windows systems), according to your particular requirements.

Pings

A ping is a standard ICMP ping that checks registered agents for availability. If an agent does not respond to a ping (or a predefined number of pings), you can set up an alert to take some action. The actions could be one, any, or all of:

• SNMP trap
• Notification at login
• Entry in the Windows event log
• Run a script
• Send e-mail to a specified user or users

Pings are used to generate Availability Reports, which list the percentage of times a computer has responded to the ping. An example of an Availability Report for Ping is shown in Figure 4-3.

Figure 4-3 Availability report
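The availability percentage in such a report is simply the fraction of pings answered. A sketch of the arithmetic (the report itself is, of course, produced by TPC for Data):

```python
def availability_percent(ping_results):
    """ping_results: list of booleans, True when the agent answered the ping."""
    if not ping_results:
        return 0.0
    return 100.0 * sum(ping_results) / len(ping_results)

# Three answered pings out of four:
print(availability_percent([True, True, True, False]))  # 75.0
```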


Probes

Probes are used to gather information about the assets and system resources of monitored servers, such as processor count and speed, memory size, disk count and size, and file systems. Probes also gather information about the files, instances, logs, and objects that make up the monitored databases. Data collected by probes is used in the Asset Reports. Figure 4-4 shows an Asset Report for a detected computer.

Figure 4-4 Asset Report of a computer

Scans

The scan process is used to gather statistics about the usage and trends of server storage. Scans also gather information about storage usage and trends within the monitored databases. The data collected by scan jobs is tailored by Profiles and stored in the enterprise repository. It supplies the data for the capacity, usage, usage violation, and backup reporting functions. These reports can be scheduled to run regularly, or run ad hoc by the administrator.

Profiles limit the scanning according to the parameters specified in the profile. Profiles are used in scan jobs to specify which file patterns are scanned, which attributes are gathered, which summary views are available in reports, and the retention period for the statistics. TPC for Data supplies a number of default profiles that can be used, and additional profiles can be defined. Some of the defaults include:

• Largest files: Gathers statistics on the largest files
• Largest directories: Gathers statistics on the largest directories
• Most obsolete files: Gathers statistics on the most obsolete files

Figure 4-5 on page 43 shows a sample report of the largest files by computer.


Figure 4-5 Largest files by computer
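A "largest files" profile boils down to walking the file system and keeping the top entries by size. Below is a stand-alone approximation using only the Python standard library; this is not how the TPC for Data agent is implemented, just the underlying idea:

```python
import heapq
import os

def largest_files(root, n=10):
    """Return up to n (size, path) pairs for the largest files under root."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                continue  # file removed or unreadable mid-scan; skip it
    return heapq.nlargest(n, sizes)
```

A real agent would also record ownership, timestamps, and retention metadata for the repository, and would throttle itself to limit CPU and I/O impact.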

Reporting

Reporting in TPC for Data is very rich, with over 300 predefined views and the capability to customize those standard views. Reports can be scheduled or run as needed. You can also create your own reports for particular needs and run them on demand or in batch (regularly). Reports can be produced in table format or in a variety of chart (graph) views, and can be exported to CSV or HTML formats for external use.

Reports are generated against data already in the repository. A common practice is to schedule scans and probes just before running reports.

Reporting can be done at almost any level in the system, from the enterprise down to a specific entity and any level in between. Reports can be produced either system-wide or grouped into views, such as by computer or operating system type.

TPC for Data allows you to group information about similar entities (disk, file systems, and so on) from different servers or business units into a Summary Report so that business and technology administrators can manage an enterprise infrastructure. Or you can summarize information from a specific server. The flexibility and choice of configuration is entirely up to the administrator.

Major reporting categories for file systems and databases are:

• Assets Reporting uses the data collected by probes to build a hardware inventory of the storage assets. You can then navigate through a hierarchical view of the assets by drilling down through computers, controllers, disks, file systems, directories, and exports. For databases, information about instances, databases, tables, and data files is presented for reporting.

• Availability Reporting shows responses to ping jobs, as well as computer uptime.

• Capacity Reporting shows how much storage capacity is installed, how much of the installed capacity is being used, and how much is available for future growth. Reporting is done by disk and file system, and for databases, by database.

• Usage Reporting shows the usage and growth of storage consumption, grouped by file system or computer, for individual users or enterprise-wide.

• Usage Violation Reporting shows violations of the corporate storage usage policies, as defined through TPC for Data. Violations are either of Quota (defining how much storage a user or group of users is allowed) or Constraint (defining which file types, owners, and file sizes are allowed on a computer or storage entity). You can define what action should be taken when a violation is detected, for example, an SNMP trap, an e-mail, or running a user-written script.

• Backup Reporting identifies files which are at risk because they have not been backed up.

Alerts

An alert defines an action to be performed if a particular event occurs or condition is found. Alerts can be set on physical objects (computers and disks) and/or logical objects (file systems, directories, users, databases, and operating system user groups). Alerts can tell you, for instance, if a disk has many recent defects, or if a file system or database is approaching capacity.

Alerts on computers and disks come from the output of probe jobs and generate an alert for each object that meets the triggering condition. If you have specified a triggered action (running a script, sending an e-mail, and so on), then that action will happen if the condition is met.

Alerts on file systems, directories, users, and operating system user groups come from the combined output of a probe and a scan. Again, if you have specified an action, that action will be performed if the condition is met.

An alert registers in the alert log; you can also define one, some, or all of the following additional actions:

• Send an e-mail indicating the nature of the alert.
• Run a specific script with relevant parameters supplied from the content of the alert.
• Make an entry in the Windows event log.
• Pop up the next time the user logs in to IBM TotalStorage Productivity Center for Data.
• Send an SNMP trap.
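An alert, then, pairs a triggering condition with a set of configured actions. A hypothetical sketch of that pairing for a file system capacity alert (the action names mirror the list above; the actual dispatch is omitted):

```python
def check_capacity_alert(used_bytes, total_bytes, threshold=0.9, actions=("log",)):
    """Return the configured actions to fire when a file system crosses
    a fullness threshold; an empty list means no alert. The threshold
    and action names are illustrative assumptions."""
    if total_bytes and used_bytes / total_bytes >= threshold:
        return list(actions)
    return []

print(check_capacity_alert(95, 100, actions=("send e-mail", "SNMP trap")))
```

A monitoring loop would call such a check after each probe or scan and hand the returned actions to the appropriate notifier.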

4.2.3 Product highlights

The following are the main product highlights of TPC for Data.

Enterprise reporting

TPC for Data supports over 300 comprehensive enterprise-wide reports designed to help administrators make intelligent capacity management decisions based on current and trended historical data.

Policy-based management

Policy-based management enables administrators to set thresholds, while monitoring can detect when thresholds have been exceeded and issue an alert or initiate a predefined action.


Automated file system extension

This feature enables administrators to ensure application availability by providing on demand storage for file systems.

Direct Tivoli Storage Manager integration

This allows administrators to initiate a Tivoli Storage Manager archive or backup via a constraint or directly from a file report, simplifying policy-based actions.

Database capacity reporting

This feature is designed to enable administrators to see how much storage is being consumed by users, groups of users, and OSs within the database application.

Chargeback capabilities

Chargeback capabilities are designed to provide usage information by department, group, or user, making data owners aware of, and accountable for, their data usage.

Advanced provisioning

TPC for Data is an integral piece of the IBM TotalStorage Productivity Center with Advanced Provisioning solution, and is designed to allow you to automate capacity provisioning through automated workflows.

4.3 SAN Volume Controller

The SAN Volume Controller (SVC) is designed to simplify the storage infrastructure by enabling changes to the physical storage with minimal or no disruption to applications. SAN Volume Controller implements virtualization by combining the capacity from multiple disk storage systems into a single storage pool, which can be managed from a central point. This is simpler to manage, helps to increase utilization, and improves application availability. SAN Volume Controller's extensive support for non-IBM storage systems, including EMC, HP, and HDS, enables a tiered storage environment that better matches the cost of the storage to the value of the data. It also allows advanced copy services to be applied across storage systems from many different vendors, to help further simplify operations.

4.3.1 Overview

Storage area networks (SANs) enable companies to share storage resources across the enterprise. But for many companies, information resources are spread over a variety of locations and storage environments, often with products from different vendors. To achieve higher utilization of resources, companies need to be able to share their storage resources from all of their environments, regardless of vendor. While storage needs rise rapidly, and companies operate on lean budgets and staffing, the best solution is one that leverages the investments already made and provides growth when needed. IBM TotalStorage SAN Volume Controller (SVC) offers a solution that can help strengthen existing SANs by increasing storage capacity, efficiency, uptime, administrator productivity, and functionality.

Note: For more information about IBM TotalStorage Productivity Center for Data refer to the IBM TotalStorage Productivity Center V2.3: Getting Started, SG24-6490, and to the Web site:

http://www-03.ibm.com/servers/storage/software/center/data/index.html


The SAN Volume Controller combines hardware and software into a comprehensive, modular appliance. Using xSeries® server technology in highly reliable clustered pairs, SVC is designed to avoid single points of failure. SAN Volume Controller software is a highly available cluster optimized for performance and ease of use.

The following are the key features of the IBM TotalStorage SAN Volume Controller.

Storage utilization

The SAN Volume Controller is designed to help increase the amount of storage capacity that is available to host applications. By pooling the capacity from multiple disk arrays within the storage area network (SAN), it enables host applications to access capacity beyond their own island of SAN storage.

High scalability

An I/O group is formed by combining a pair of high-performance, redundant Intel® processor-based servers. Each I/O group contains 4 GB of mirrored cache memory. Highly available I/O groups are the basic configuration unit of a cluster. Adding another I/O group can help increase cluster performance and bandwidth.

At its base level, a SAN Volume Controller contains a single I/O group. It can scale up to support four I/O groups. For every cluster, the SAN Volume Controller supports up to 4096 virtual disks.

Personnel productivity

The SAN Volume Controller is designed to help improve administrator productivity by enabling management at the cluster level, and to provide a single point of control over all the storage it manages.

The SAN Volume Controller provides a comprehensive, easy-to-use graphical interface for central management. This simple interface incorporates the Storage Management Initiative Specification (SMI-S) application programming interface (API) and further demonstrates the IBM commitment to open standards. With this single interface, administrators can perform configuration, management, and service tasks over storage volumes from disparate storage controllers. The SAN Volume Controller allows administrators to map disk storage volumes to virtual pooled volumes to help better use existing storage.

Application availability

By pooling storage into a single reservoir, the SAN Volume Controller insulates host applications from physical changes to the storage pool, so that applications continue to run without disruption.

The SAN Volume Controller includes a dynamic data-migration function to help administrators migrate storage from one device to another, without taking it offline. This helps administrators reallocate and scale storage capacity without disrupting applications.

The solution supports both local area network (LAN)-free and server-free backups. Through the IBM FlashCopy function, administrators can make point-in-time copies of mission-critical data on lower cost storage devices, such as Serial Advanced Technology Attachment (SATA) devices. The SAN Volume Controller also supports the IBM TotalStorage Multipath Subsystem Device Driver (SDD). This mature multipathing software provides failover and load-balancing capabilities.

Tiered storage
In most IT environments, inactive data makes up a high proportion, if not the bulk, of the total stored data. SAN Volume Controller is designed to help administrators control storage growth more effectively by moving low-activity or inactive data into a hierarchy of lower-cost storage, freeing disk space on higher-value storage for more important, active data. The SAN Volume Controller is designed to enable you to match the cost of storage to the value of data.

4.3.2 Virtualization
SVC provides block aggregation and volume management for disk storage within the SAN. In simpler terms, this means that the SVC manages a number of back-end storage controllers and maps the physical storage within those controllers to logical disk images that can be seen by application servers and workstations in the SAN. The SAN is zoned in such a way that the application servers cannot see the back-end storage, preventing any possible conflict between the SVC and the application servers both trying to manage the back-end storage.

The SVC I/O groups are connected to the SAN in such a way that all back-end storage and all application servers are visible to all of the I/O groups. The SVC I/O groups see the storage presented to the SAN by the back-end controllers as a number of disks, known as Managed Disks or MDisks. The MDisks are collected into groups, known as Managed Disk Groups. The MDisks that are used in the creation of a particular VDisk must all come from the same Managed Disk Group. A VDisk is the SVC device that appears to a host system as a SAN-attached disk. Each MDisk is divided into a number of extents (default minimum size 16 MB, maximum size 512 MB), which are numbered sequentially from the start to the end of each MDisk. Conceptually, this might be represented as shown in Figure 4-6.

Figure 4-6 SVC block virtualization: extents from MDisk1, MDisk2, and MDisk3 in a Managed Disk Group are combined to create the striped virtual disk VDisk1. A VDisk is a collection of extents (each 16 MB to 512 MB).
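The extent-striping concept behind Figure 4-6 can be sketched in a few lines of Python. This is an illustrative model with invented names (`make_striped_vdisk` and the extent tuples are not part of any SVC interface), not the SVC implementation:

```python
# Hypothetical sketch of SVC-style block virtualization: extents from
# several MDisks in one Managed Disk Group are striped to form a VDisk.
# All names here are illustrative, not the SVC API.

EXTENT_MB = 16  # SVC extent size is configurable from 16 MB to 512 MB

def make_striped_vdisk(mdisks, num_extents):
    """Allocate extents round-robin across the MDisks of one group."""
    vdisk = []
    for i in range(num_extents):
        mdisk = mdisks[i % len(mdisks)]   # stripe across the MDisks
        extent_index = i // len(mdisks)   # next free extent on that MDisk
        vdisk.append((mdisk, extent_index))
    return vdisk

# A 96 MB striped VDisk over three MDisks needs 6 extents of 16 MB:
vdisk1 = make_striped_vdisk(["MDisk1", "MDisk2", "MDisk3"], 96 // EXTENT_MB)
print(vdisk1[:3])  # the first extents come from MDisk1, MDisk2, MDisk3 in turn
```

Striping in this round-robin fashion spreads I/O for one VDisk across all the MDisks (and therefore back-end arrays) in the group.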

Chapter 4. Product overview 47


4.3.3 Architecture
The IBM TotalStorage SAN Volume Controller is a modular solution that consists of a Master Console for management, up to eight cluster nodes (added in pairs), and dual UPS for write cache data protection (Figure 4-7). The nodes are the hardware elements of the SAN Volume Controller. SVC combines these nodes (servers) to create a high availability cluster. Each of the servers in the cluster is populated with 4 GB of high-speed memory, which serves as the cluster cache.

Figure 4-7 SVC components

The storage engines (or storage nodes) are always installed in pairs and combined into a high availability cluster.

SVC nodes within the cluster are grouped into pairs (called I/O groups), with a single pair being responsible for serving I/O on a given VDisk. One node within the I/O group represents the preferred path for I/O to a given VDisk (the other node represents the non-preferred path). This preference alternates between nodes as each VDisk is created within an I/O group, to balance the workload evenly between the two nodes.
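As a hedged sketch of this alternation (the function and node names are invented, not SVC code), preferred-node assignment might be modeled like this:

```python
# Illustrative sketch (not the SVC implementation) of how preferred-node
# ownership can alternate between the two nodes of an I/O group as
# VDisks are created, balancing the workload across the pair.

def assign_preferred_nodes(io_group, vdisk_names):
    """Alternate the preferred node for each VDisk created in the group."""
    assignments = {}
    for i, name in enumerate(vdisk_names):
        assignments[name] = io_group[i % 2]   # alternate node A, node B
    return assignments

paths = assign_preferred_nodes(("nodeA", "nodeB"), ["vd0", "vd1", "vd2", "vd3"])
```

With four VDisks, each node of the pair ends up as the preferred path for two of them.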

4.4 IBM TotalStorage DS family of disk products
IBM has a wide variety of disk products to meet a range of price/performance needs. All of these can be included in a tiered storage environment, either standalone or behind a SAN Volume Controller.

4.4.1 Enterprise disk storage
The IBM TotalStorage DS6000 and IBM TotalStorage DS8000 are designed for high reliability, scalability, capacity, performance, and multiplatform support.

4.4.2 Mid-range disk storage
The IBM TotalStorage DS4000 family, including the DS4800, DS4500, DS4300, and DS4100, offers a range of price/performance/functionality options for lower storage tiers.

Note: For more information about the IBM TotalStorage SAN Volume Controller, see IBM TotalStorage SAN Volume Controller, SG24-6423, and the Web site:

http://www-03.ibm.com/servers/storage/software/virtualization/svc/index.htm


4.5 IBM TotalStorage tape solutions
IBM tape products are designed to address business continuity, infrastructure simplification, and Information Lifecycle Management requirements. As IT budgets continue to shrink, tape products have become even more appealing to help reduce costs by providing a cost-effective alternative to primary disk storage. IBM offers tape systems and tape accessories that range from a single tape drive to tape libraries that store petabytes of data, and has virtualization offerings that combine the performance of disk with the affordability of tape. IBM also has offerings that are designed to help address regulatory or compliance requirements. Tape is removable and portable, provides high volumetric efficiency (amount of data that can be stored in a small form factor), and has a long life. All of these factors make tape a critical part of any tiered storage environment.

4.5.1 IBM Virtualization Engine TS7510
The IBM Virtualization Engine™ TS7510 has been developed to exploit a tiered storage hierarchy. Data that resides on tape in an ILM environment should be data that will usually be accessed infrequently, but needs to be stored on a cost-effective, reliable medium. The TS7510 architecture provides fast access to data on virtual volumes on the disk buffer, and cost-effective storage by migrating virtual volumes to tape. It has throughput of up to 600 MB/sec and supports a native storage capacity of up to 46 TB. It can be configured with up to 128 virtual tape libraries, 1024 virtual drives, and 8192 virtual volumes.

4.6 Tivoli Storage Manager
IBM Tivoli Storage Manager and its complementary products provide a comprehensive solution focused on the key data protection activities of backup, archive, recovery, space management, hierarchical storage management, and disaster recovery planning.

4.6.1 Overview
IBM Tivoli Storage Manager is an enterprise-wide solution that:

- Provides backup-restore and archive-retrieve solutions and stores backup and archive copies of data in off-site storage

- Scales to protect hundreds of computers running a dozen operating systems

- Provides intelligent data move and store techniques

Note: For more information about IBM TotalStorage disk products, see the Web site:

http://www.ibm.com/servers/storage/disk/

Note: For more information about IBM TotalStorage tape products, see the Web site:

http://www.ibm.com/servers/storage/tape/

Note: For more information about the IBM Virtualization Engine TS7510, see Introducing the IBM Virtualization Engine TS7510: Tape Virtualization for Open Systems Servers, SG24-7189, and the Web site:

http://www.ibm.com/servers/storage/tape/virtualization/index.html


- Provides optional modules that allow business-critical applications, which run 24 hours a day, 365 days a year, to use data protection with no interruption in service

Designed for a heterogeneous environment, IBM Tivoli Storage Manager uses Local Area Network (LAN), Wide Area Network (WAN), Internet, and Storage Area Network (SAN) connectivity to provide smart data move and store techniques, comprehensive policy-based automation, and data management.

4.6.2 Components
Figure 4-8 shows the different components of Tivoli Storage Manager.

Figure 4-8 Tivoli Storage Manager components

Backup-archive clients
The Tivoli Storage Manager client sends data to, and retrieves data from, a Tivoli Storage Manager server. The Tivoli Storage Manager backup-archive client must be installed on every machine that needs to transfer data to server-managed storage, called storage pools. Data can be recovered to the same client machine that initially transferred it, or to another client with a compatible file system format if that client has been given permission.

Storage manager server
In a traditional LAN configuration, the role of a storage manager server is to store the backup or archive data from the backup-archive clients that it supports to storage media.

It also maintains a database of information to keep track of the data it manages, including policy management objects, users, administrators, and client nodes.

(Figure 4-8 shows backup-archive clients and Web clients, each with a scheduler; the Storage Manager server with its TSM database, TSM recovery log, storage pool hierarchy, tape library, and device driver; the Administration Center and administrative interface; and a policy domain containing an active policy set with a management class and its backup and archive copy groups.)


Administration center
The administration center runs on the Integrated Solutions Console (ISC). It provides a task-oriented graphical user interface for storage administrators, with support for tasks including:

- Creating server maintenance scripts
- Scheduling
- Adding storage devices
- Setting policy domains
- User management
- Viewing the health monitor

Tivoli Storage Manager database
IBM Tivoli Storage Manager saves information in the Tivoli Storage Manager database about each file, raw logical volume, or database that it backs up, archives, or migrates. This information includes the file name, file size, management class, copy group, location of the files in Tivoli Storage Manager server storage, and all other information except for the data itself. The data is stored in the storage pools.

Tivoli Storage Manager recovery log
The TSM recovery log keeps track of all changes made to the Tivoli Storage Manager database, so that if a system outage occurs, a record of the changes is available for recovery.

Policy-based management
Business policy is used to centrally manage backup-archive client data. Policies are created by the administrator and stored in the database on the server. They are maintained in the form of policy domains, policy sets, and management classes, as shown in Figure 4-8 on page 50.
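As an illustrative sketch of how policy binds client data to management classes (the function and rule format here are invented; the real backup-archive client uses include-exclude option lists), a file could be mapped like this:

```python
# Simplified, hypothetical sketch of policy binding: an ordered list of
# include patterns maps client files to management classes. Names such
# as DAILY and RETAIN7YR are made up for illustration.
import fnmatch

def bind_mgmt_class(path, rules, default="STANDARD"):
    """Return the management class of the last matching include rule."""
    bound = default
    for pattern, mgmt_class in rules:
        if fnmatch.fnmatch(path, pattern):
            bound = mgmt_class   # later, more specific rules win
    return bound

rules = [("/home/*", "DAILY"), ("/home/finance/*", "RETAIN7YR")]
print(bind_mgmt_class("/home/finance/ledger.xls", rules))  # RETAIN7YR
```

The management class a file is bound to then determines, through its copy groups, how many versions are kept, where they are stored, and for how long.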

Storage pool hierarchy and tape libraries
A storage pool is a collection of like media that provides storage for backed-up, archived, and migrated data. Storage pools can be chained to create a storage hierarchy.

Tivoli Storage Manager also supports a variety of tape library types including manual libraries, SCSI libraries, 349X and 358X (LTO) libraries, and external libraries.
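A minimal sketch of a chained storage pool hierarchy follows, assuming a simplified high-migration threshold (the class, pool names, and numbers are illustrative, not Tivoli Storage Manager internals):

```python
# Hypothetical sketch of chained storage pools: when a pool passes its
# high migration threshold, data moves to the next pool in the chain
# (for example, from a disk pool to a tape pool).

class StoragePool:
    def __init__(self, name, capacity_gb, next_pool=None, high_pct=80):
        self.name, self.capacity, self.used = name, capacity_gb, 0
        self.next_pool, self.high_pct = next_pool, high_pct

    def store(self, size_gb):
        self.used += size_gb
        # Migrate down the chain until utilization drops to the threshold.
        while self.next_pool and self.used * 100 / self.capacity > self.high_pct:
            self.used -= 1            # migrate 1 GB at a time (simplified)
            self.next_pool.store(1)

tape = StoragePool("TAPEPOOL", 10000)
disk = StoragePool("DISKPOOL", 100, next_pool=tape)
disk.store(90)                        # exceeds the 80% threshold
print(disk.used, tape.used)
```

After the store, the disk pool has drained back to its threshold and the overflow sits in the tape pool, which is the essence of a disk-to-tape hierarchy.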

4.6.3 Tivoli Storage Manager applications
IBM Tivoli Storage Manager offers several optional software modules to handle special needs and applications. The following section provides an overview of these software modules.

IBM Tivoli Storage Manager for Mail
IBM Tivoli Storage Manager for Mail is a software module for IBM Tivoli Storage Manager that automates the data protection of e-mail servers running either Lotus Domino or Microsoft Exchange. This module utilizes the application program interfaces (APIs) provided by e-mail application vendors to perform online hot backups without shutting down the e-mail server, and improves data restore performance.

Figure 4-9 on page 52 shows how Tivoli Storage Manager for Mail works.


Figure 4-9 Tivoli Storage Manager for Mail

IBM Tivoli Storage Manager for Databases
IBM Tivoli Storage Manager for Databases is a software module that works with IBM Tivoli Storage Manager to protect a wide range of application data through the protection of the underlying database management systems holding that data. Tivoli Storage Manager for Databases exploits the backup-certified utilities and interfaces provided for Oracle and Microsoft SQL Server. In conjunction with Tivoli Storage Manager, this module automates data protection tasks and allows database servers to continue running their primary applications while they back up and restore data to and from offline storage.

DB2 no longer requires a separate data protection module; Tivoli Storage Manager support is built into the database itself.

Figure 4-10 shows how Tivoli Storage Manager for Databases works.

Figure 4-10 Tivoli Storage Manager for Databases

IBM Tivoli Storage Manager for Application Servers
IBM Tivoli Storage Manager for Application Servers is a software module that works with IBM Tivoli Storage Manager to better protect the infrastructure and application data and improve the availability of WebSphere® Application Servers. It works with the WebSphere Application Server software to provide an applet GUI to do reproducible, automated online backup of a WebSphere Application Server environment, including the WebSphere administration database (DB2 Universal Database™), configuration data, and deployed application program files.


Figure 4-11 shows how Tivoli Storage Manager for Application Servers works.

Figure 4-11 Tivoli Storage Manager for Application Servers

IBM Tivoli Storage Manager for Enterprise Resource Planning
Specifically designed and optimized for the SAP R/3 environment, IBM Tivoli Storage Manager for Enterprise Resource Planning (ERP) provides automated data protection, reduces the CPU performance impact of data backups and restores on the R/3 server, and greatly reduces the administrator workload necessary to meet data protection requirements. Tivoli Storage Manager for ERP builds on the SAP database and includes a set of database administration functions integrated with R/3 for database control and administration.

Figure 4-12 shows how Tivoli Storage Manager for ERP works.

Figure 4-12 Tivoli Storage Manager for ERP

IBM Tivoli Storage Manager for Hardware
IBM Tivoli Storage Manager for Hardware improves the data protection of your business-critical databases and ERP applications that require 24-hour by 365-day availability. This software module helps IBM Tivoli Storage Manager and its other data protection modules to perform high-efficiency data backups and archives of your most business-critical applications while eliminating nearly all performance impact on database or ERP servers.

Figure 4-13 on page 54 shows how Tivoli Storage Manager for Hardware works.


Figure 4-13 Tivoli Storage Manager for Hardware

IBM System Storage Archive Manager
IBM System Storage Archive Manager (formerly IBM Tivoli Storage Manager for Data Retention) facilitates compliance with the most stringent regulatory requirements in the most flexible and function-rich manner. It helps manage and simplify the retrieval of the ever-increasing amount of data that organizations must retain for strict records retention regulations. Many of the regulations demand the archiving of records, e-mails, design documents, and other data for many years, in addition to requiring that the data not be changed or deleted.
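The retention behavior described above can be sketched conceptually (the class and method names are invented, not the product's API): archived objects are immutable and cannot be deleted before their retention period expires.

```python
# Illustrative sketch of retention-managed archiving, in the spirit of
# System Storage Archive Manager: objects cannot be altered or deleted
# before their retention period expires. Not the product's real API.
import datetime

class RetentionArchive:
    def __init__(self):
        self._objects = {}   # name -> (data, expiry date)

    def archive(self, name, data, retain_days):
        if name in self._objects:
            raise PermissionError("archived objects are immutable")
        expiry = datetime.date.today() + datetime.timedelta(days=retain_days)
        self._objects[name] = (data, expiry)

    def delete(self, name, today=None):
        today = today or datetime.date.today()
        if today < self._objects[name][1]:
            raise PermissionError("retention period has not expired")
        del self._objects[name]

arc = RetentionArchive()
arc.archive("email-2006-02.mbox", b"...", retain_days=365 * 7)
```

Any attempt to overwrite or delete the object before the seven-year retention period has passed fails, which is the contract such regulations require.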

IBM Tivoli Storage Manager for Space Management
IBM Tivoli Storage Manager for Space Management frees administrators and users from manual file system pruning tasks and defers the need to purchase additional disk storage by automatically and transparently migrating rarely accessed files to Storage Manager storage, while the most frequently used files remain in the local file system. This can be used to implement Hierarchical Storage Management (HSM).

Figure 4-14 on page 55 shows how Tivoli Storage Manager for Space Management works.
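Conceptually, space management selects migration candidates by access age; the following is a hedged sketch with invented names and a simplified criterion, not the product's selection logic:

```python
# Hypothetical sketch of HSM-style space management: files not accessed
# for a given number of days are migration candidates; in a real HSM
# system a small stub is left behind so access stays transparent.

def migrate_cold_files(files, last_access_days, threshold_days=90):
    """Return (kept, migrated) lists based on last-access age."""
    kept, migrated = [], []
    for f in files:
        (migrated if last_access_days[f] > threshold_days else kept).append(f)
    return kept, migrated

ages = {"report.doc": 5, "logs-2004.tar": 400, "notes.txt": 30}
kept, migrated = migrate_cold_files(list(ages), ages)
```

Here only the file untouched for over a year is migrated, while recently used files stay on local disk.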


Figure 4-14 Tivoli Storage Manager for Space Management

4.6.4 Tivoli Storage Manager APIs and DR550
The Tivoli Storage Manager APIs are used for Tivoli's own data protection products, but they are also documented and published. This allows ISVs to integrate their solutions with Tivoli Storage Manager and extend its functionality. In particular, various vendors have used the APIs to provide bare metal recovery solutions for various platforms. Vendors exploiting these APIs for disaster recovery include Cristie, UltraBac Software, and Symantec.

The IBM System Storage DR550 is a packaged solution that uses IBM System Storage Archive Manager, as shown in Figure 4-15 on page 56. The DR550 is a storage solution designed for fast affordable access to retention managed data. The following are the main feature highlights of the DR550:

- It is designed as a preconfigured solution to help store, retrieve, manage, share, and secure regulated and nonregulated data.

- It supports nondisruptive enterprise scalability of up to 56 TB physical capacity.

- It is designed to offer automatic provisioning, migration, expiration, and archiving capabilities.

- It offers a comprehensive suite of software tools for policy-based and event-based data management.

- It is designed to avoid a single point of failure.

For more information about the DR550, see Understanding the IBM TotalStorage DR550, SG24-7091, and the Web site:

http://www-03.ibm.com/servers/storage/disk/dr/

Figure 4-15 on page 56 shows how the Tivoli Storage Manager API fits into the DR550 solution.


Figure 4-15 TSM APIs and DR550

4.7 DB2 Content Manager
Content lifecycle management is a very important part of ILM, since unstructured data such as e-mails, messages, and documents forms a large part of organizational data.

Figure 4-16 on page 57 shows components of enterprise content management.

Note: For more information about IBM Tivoli Storage Manager, see the Web site:

http://www-306.ibm.com/software/tivoli/products/storage-mgr/

For more information about the IBM System Storage DR550, see Understanding the IBM TotalStorage DR550, SG24-7091.

(Figure 4-15 shows a Document Management System from an ISV and IBM DB2 Content Management applications, each using the Tivoli Storage Manager API to store data in IBM System Storage Archive Manager on the IBM System Storage DR550, backed by IBM TotalStorage DS4100 disk.)


Figure 4-16 Enterprise content management components

IBM has a wide portfolio of content management products to deal with these different components of content management. Figure 4-17 shows the different products in this portfolio with their basic functionality.

Figure 4-17 IBM content management portfolio

Here we cover DB2 Content Manager, which is a very important product in this portfolio.

(Figure 4-16 shows presentation and deployment through portals, Web browsers, and rich clients; content control and integration covering document management, Web content management, imaging, multimedia, collaboration, search and federation, output and reports, forms, records management, and digital rights, all under workflow management and process monitoring; and a content repository of library servers, catalogs, and resource managers for text, image, video, audio, and print streams, with version control, check-in/check-out, annotation, access control, and physical data storage.

Figure 4-17 maps the portfolio to these functions: DB2 Content Manager for imaging and document management; DB2 Content Manager OnDemand for COLD; DB2 Content Manager VideoCharger for digital asset management; DB2 CommonStore for archiving from SAP, Microsoft Exchange, and Lotus Domino; DB2 Records Manager for records management; Workplace Web Content Management for Web content management; DB2 Document Manager for document management; and DB2 Information Integrator Content Edition for content integration.)


DB2 Content Manager manages all types of digitized content including HTML and XML Web content, document images, electronic office documents, printed output, audio, and video. V8.3 expands record management integration and workflow.

4.7.1 Overview
Unlike simple file systems, Content Manager uses a powerful relational database to provide indexed search, security, and granular access control at the individual content item level. Content Manager provides check-in and check-out capabilities, version control, object-level access control, a flexible data model that enables compound document management, and advanced searching based on user-defined attributes. It also includes workflow functionality, automatically routing and tracking content through a business process according to predefined rules.

It provides the content infrastructure for solutions such as:

- Compliance in a regulated life sciences environment
- Records management
- Document lifecycle management
- Lotus Notes® e-mail management
- Exchange Server e-mail management
- Monitoring electronic messages
- Digital media
- Web content management
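The check-in/check-out and version-control behavior described in the overview can be sketched as follows (an illustrative model with invented names, not the Content Manager API):

```python
# Minimal sketch of check-in/check-out with version control: an item
# checked out by one user is locked against updates by others, and each
# check-in creates a new version while keeping the history.

class ContentItem:
    def __init__(self, content):
        self.versions = [content]
        self.locked_by = None

    def check_out(self, user):
        if self.locked_by:
            raise RuntimeError(f"already checked out by {self.locked_by}")
        self.locked_by = user
        return self.versions[-1]

    def check_in(self, user, new_content):
        if self.locked_by != user:
            raise RuntimeError("check out the item first")
        self.versions.append(new_content)   # new version, history kept
        self.locked_by = None

doc = ContentItem("draft v1")
doc.check_out("alice")
doc.check_in("alice", "draft v2")
```

The lock prevents two users from updating the same item concurrently, while the version list preserves every prior revision.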

The multi-tier, distributed architecture of DB2 Content Manager offers:

- Scalability to grow from a single department to a geographically dispersed enterprise

- Openness to support multiple operating systems (including Linux), databases, applications, and resources

- A secure environment and a single source of access for administration

- A powerful and expressive XML-ready data model

4.7.2 Architecture
The Content Manager multi-tier distributed architecture is fully Web enabled, scalable, and extensible. It includes five core components, as follows.

Library Server
The Library Server is the central source for indexing, describing, locating, organizing, and managing enterprise content. It locates stored objects using a variety of search technologies, provides secured access to content, and manages transactions.

Mid-tier Server
The Mid-tier Server is the Web-exploiting broker that mediates between the client and the Library Server. It supports the enhanced Content Manager API toolkit and manages connections to the Library Server and, optionally, to the Resource Managers.

Resource Manager
Resource Managers (formerly called Object Servers) are specialized repositories optimized to manage the storage, retrieval, and archival of enterprise content. DB2 Content Manager for Multiplatforms V8 includes document, image, and rich media resource managers. DB2 Content Manager VideoCharger™ provides streaming media support. DB2 Content Manager OnDemand manages resources for high volume print output (COLD) data.


eClient
eClient is the browser-based thin client that provides the graphical user interface to Content Manager and related systems. The eClient communicates through the mid-tier server and/or directly to Resource Managers, enabling fast and secure delivery of objects while maintaining full transactional support with referential integrity.

Client for Windows
Client for Windows is the desktop client that exploits the client-server architecture, providing out-of-the-box capabilities for supporting high volume, high performance, production-level document applications.

4.7.3 Standards and data model
Content Manager is based on industry standards and Internet protocols. It supports HTTP, FTP, RTSP, JDBC™, SQL, and LDAP.

Designed to be fully open to any application, Content Manager publishes a robust set of application programming interfaces (APIs) to handle different types of content for unified search, retrieval, workflow, access control management, and system administration.

Content Manager V8 provides a powerful XML-ready, physical data model. The advanced model can capture structural and relationship information across all types of content and its associated metadata or attributes. It facilitates the integration of structured data with unstructured content.

The V8 data model, combined with OO APIs, allows faster development of new and more sophisticated applications with greater flexibility. Applications which exploit fully linked relationships and management of virtual or compound documents can be created.

4.8 DB2 CommonStore
DB2 CommonStore is within the IBM content management portfolio and works in the area of archiving, as shown in Figure 4-17 on page 57. DB2 CommonStore middleware seamlessly integrates SAP, Lotus Domino, and Microsoft Exchange Server with IBM archives. It has three different products, as follows:

- DB2 CommonStore for Exchange Server
- DB2 CommonStore for Lotus Domino
- DB2 CommonStore for SAP

4.8.1 DB2 CommonStore for Exchange Server
DB2 CommonStore for Exchange Server manages e-mail archiving and retrieval. It helps:

- Trim the size of the Exchange database to reduce storage costs.
- Improve e-mail system performance.
- Provide virtually unlimited mailbox space for each user.

Note: For more information about DB2 Content Manager and other products mentioned in the IBM content manager portfolio, please see the Web sites:

http://www-306.ibm.com/software/data/cm/cmgr/mp/
http://www-306.ibm.com/software/data/


Automated offloading to archives reduces the requirements of managing e-mail server growth. It provides direct access to archives, letting users view archived items via a browser or the Outlook® desktop.

V8.3 supports Exchange 5.5 Servers and easier migration from Exchange 5.5 to Exchange 2000 Server or Exchange 2003 Server. It also supports:

- Additional Outlook client platforms

- Windows Services to automate tasks

- More than 600 IBM or non-IBM storage devices to meet business and content lifecycle needs

- HTTPS to prevent unauthorized access to critical data

CommonStore is a middleware server between the Exchange Server mail server system and the back-end archive management system. CommonStore does not store data or documents, but defines and manages what to archive, when to archive, and how to archive from the mail system to the back-end archive management system.
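Conceptually, such middleware offloads message content to the archive and leaves a small pointer in the mail store; the following is a hedged sketch with invented names, not CommonStore's real interfaces:

```python
# Conceptual sketch of e-mail offloading: the mail store keeps a small
# pointer stub while the message body moves to the back-end archive.
# The offload policy here (size-based) is purely illustrative.

def offload(mailbox, archive, min_size):
    """Move large message bodies to the archive, leaving pointer stubs."""
    for msg_id, body in list(mailbox.items()):
        if len(body) >= min_size:
            archive[msg_id] = body
            mailbox[msg_id] = f"<archived:{msg_id}>"   # pointer stub

mailbox = {"m1": "short note", "m2": "x" * 5000}
archive = {}
offload(mailbox, archive, min_size=1000)
```

After the offload, the mail store holds only the stub for the large message, which is how the Exchange database shrinks while users retain one-click access to the archived content.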

The back-end archive management system can be one of the following IBM repositories:

- DB2 Content Manager for Multiplatforms or DB2 Content Manager for z/OS®
- IBM DB2 Content Manager OnDemand for Multiplatforms
- IBM Tivoli Storage Manager

The companion product, Tivoli Storage Manager for Mail, automates the data protection of e-mail servers running either Lotus Domino or Microsoft Exchange. This module utilizes the application program interfaces (APIs) provided by e-mail application vendors to perform online "hot" backups without shutting down the e-mail server and improve data-restore performance. Tivoli Storage Manager for Mail protects the growing amount of new and changing data that should be securely backed up to help maintain 24x365 application availability. Refer to Figure 4-9 on page 52. Tivoli Storage Manager for Mail allows client access to backed up e-mail after it has restored the entire set, while CommonStore allows a client immediate access to archived mail at a click of a button from an existing user client interface, Notes client, or Outlook, providing the best of both worlds.

The CommonStore solution can also be extended with DB2 Content Manager or DB2 Content Manager OnDemand, providing access for a user population beyond the messaging system users. The CommonStore archive for e-mail messages and documents can be accessed by any user who has access rights, including messaging system clients, Content Manager or Content Manager OnDemand clients, and Internet and intranet users.

4.8.2 DB2 CommonStore for Lotus Domino
DB2 CommonStore for Lotus Domino manages e-mail archiving and retrieval for any Notes database or server platform. It helps:

- Trim the size of the Notes database to reduce storage costs.
- Improve e-mail system performance.
- Provide virtually unlimited user mailbox space.

It is tightly integrated with Domino ND6 and with Content Manager repositories, and provides options to integrate e-mail content with images and facsimiles and for policy-driven archiving. It supports more than 600 IBM or non-IBM storage devices, providing flexibility to meet business and content lifecycle needs. V8.3 integrates records management capabilities and strengthens security, search, and archiving.


DB2 CommonStore for Lotus Domino connects the world of Lotus Notes and Domino with the electronic archive. Three archives are supported:

- Tivoli Storage Manager allows you to archive attachments only, so the document remains in the Notes database, but has a pointer inserted that points to the location of the attachment.

- DB2 Content Manager captures, indexes, manages, and distributes electronic documents (including scanned paper, faxes, electronic documents, images, audio, video, and e-mail).

- DB2 Content Manager OnDemand stores and manages e-mail and automatically captures, indexes, and manages print streams.

4.8.3 DB2 CommonStore for SAP
DB2 CommonStore for SAP helps clients:

- Offload operational SAP databases.
- Work with non-SAP documents from within SAP Business Objects.
- Process business documents that reside in an external archiving system.

It supports any SAP operational database, such as DB2 Universal Database, Informix®, or Oracle.

CommonStore for SAP is a middleware server between the SAP ArchiveLink interface and a (required) back-end archive. It integrates documents into SAP applications such as:

- SAP Document Management System (DMS), allowing users to archive in batch instead of one document at a time.

- SAP R/3 Document Finder, for access to any enterprise content stored in Content Manager or DB2 CM OnDemand, not just SAP archived items.

- SAP Workflow Integration, to start workflow from document capture outside SAP.

DB2 CommonStore does not store any data or documents itself. Instead, it manages the data and document archive process defined by the SAP ArchiveLink protocol, storing and retrieving archived content to and from the back-end archive management repositories. The back-end archive management system can be one of the following IBM repositories:

� IBM DB2 Content Manager
� IBM DB2 Content Manager OnDemand
� IBM Tivoli Storage Manager

4.9 More information

Here we give only an overview of the specific IBM products. For more information about these, as well as some other IBM products that are helpful when implementing ILM, see the following resources:

� The IBM TotalStorage Solutions Handbook, SG24-5250:

http://www.redbooks.ibm.com/redbooks/pdfs/sg245250.pdf

� IBM System Storage and TotalStorage Web site:

http://www.storage.ibm.com

Note: For more information about DB2 CommonStore, see the Web site:

http://www-306.ibm.com/software/data/commonstore/

Chapter 4. Product overview 61


� IBM Disk Storage Systems:

http://www-03.ibm.com/servers/storage/disk/index.html

� IBM Tape Systems:

http://www-03.ibm.com/servers/storage/tape/index.html

� IBM TotalStorage Productivity Center with Advanced Provisioning:

http://www-03.ibm.com/servers/storage/software/center/provisioning/

� IBM Tivoli Software Web site:

http://www-306.ibm.com/software/tivoli/


Part 2 Evaluating ILM for your organization

In this part we introduce techniques for evaluating and proposing the value of ILM in an organization.


© Copyright IBM Corp. 2006. All rights reserved. 63



Chapter 5. An ILM quick assessment

This chapter provides a methodology to perform a quick assessment for ILM solutions. It will help you:

� Get information about data usage profiles in your current environment.

� Use IBM TotalStorage Productivity Center for Data in the assessment, including the best reports to run.

� Collect, classify, and analyze data.

� Discuss Return On Investment (ROI) on an ILM project.


5.1 Initial steps

One of the most important steps in implementing an ILM solution is to classify the current data and define the classes aligned to its value and service level requirements. This is the first logical step, as explained in Chapter 3, “Implementing ILM” on page 29.

The assessment objectives are to collect information about the current storage environment and to create reports that enable storage administrators to take action. These actions allow you to store information on the storage device with the most appropriate cost, maximize utilization of the installed storage devices, and improve storage growth planning through better forecasting and trending.

An ILM assessment can be executed by performing the steps shown in Figure 5-1, beginning with documenting the current environment information and concluding with defining appropriate actions leading to the assessment findings.

Figure 5-1 Quick Assessment steps

The next sections describe each step and help you to create a quick assessment for an ILM solution.

5.2 Getting business and storage information

First of all, you need to know the current storage environment and the specific business and technical drivers for improving data management. Collecting the following information is useful and makes it easier to understand the current storage environment:

� Types of storage technologies running

� Quantities of enterprise, mid-range, and long-term retention storage

[Figure 5-1 flow: get business and storage environment information → define data collection reports (IBM TPC for Data) → classify data and analyze TPC for Data reports → define actions (delete data, migrate data, and so on) → calculate ROI → assessment findings]


� Terabytes of storage installed, terabytes in use, growth rate, and reasons for the growth

� How many people are managing the storage environment and what tools they are using

� What the objectives are for the storage environment

� What the main storage management problems to be solved are and their priority

� What, if any, data classes are currently defined and whether a tiered-storage environment has been implemented to support these classes

� The main applications running and their objectives for storage performance, capacity, availability, and recoverability

This is a comprehensive list; you may not be able to collect all of the information, for reasons of time, practicality, or complexity. In 5.3, “Defining data collection reports” on page 67, we discuss methods to help collect this information and classify the data.

5.3 Defining data collection reports

This section provides some steps to collect data, define reports, and evaluate gains with an ILM solution. These steps can be summarized as:

1. Define current storage tiers in use.

2. Match file systems in use to storage tiers creating file system groups.

3. Choose the best reports to run.

To collect reports from current storage in use and classify data, we use IBM TotalStorage Productivity Center for Data (TPC for Data). For more information about TPC for Data, see Chapter 4, “Product overview” on page 37.

For installation, configuration, and usage instructions of TPC for Data, access:

http://publib.boulder.ibm.com/infocenter/tivihelp/v4r1/index.jsp?toc=/com.ibm.itpc_data.doc/toc.xml

5.3.1 Creating groups of data

The first step is to define the tiers of storage where the current data is allocated. Table 5-1 shows some typical tiers used to classify storage type. Most enterprises have a subset of these tiers; you should consider only the tiers currently installed.

Table 5-1 Storage tiers

Tier   Storage type
T0     Local Direct Attached Storage (DAS)
T1     Storage Area Network (SAN) - Enterprise storage
T2     Storage Area Network (SAN) - Mid-range storage
T3     Low-cost archive storage

After identifying tiers per storage type, we can create groups for file systems considering the type and which tier of storage they are using.

Table 5-2 on page 68 shows an example of file system groups for a typical company that has Windows and UNIX systems. The entries in the Storage tier column map back to the storage tiers shown in Table 5-1.


Table 5-2 File system groups

File system group   Server type   File system type   Storage tier
T0-Workstation      Workstation   All                T0
T0-Windows-OS       Windows       OS                 T0
T0-Windows-App      Windows       Application        T0
T0-Unix-OS          UNIX          OS                 T0
T0-Unix-App         UNIX          Application        T0
T1-Windows          Windows       Any                T1
T1-Unix             UNIX          Any                T1
T2-Windows          Windows       Any                T2
T2-Unix             UNIX          Any                T2
T3-Windows          Windows       Any                T3
T3-Unix             UNIX          Any                T3

More details about creating file system groups can be found in Chapter 2, “Monitoring,” of the manual IBM TotalStorage Productivity Center for Data User’s Guide, GC32-1728.

After creating file system groups, the system administrators should assign each actual file system to one file system group only. All file systems being monitored should be assigned to a group. These file system groups will be used to collect data using TPC for Data.

Table 5-3 shows an example of assigning file systems to file system groups.

Table 5-3 Matching file systems to groups

Computer          File system        File system group
AIX_Server1       /usr               T0-Unix-OS
AIX_Server1       /db_datafiles (a)  T1-Unix
AIX_Server1       /products          T0-Unix-App
Windows_Server1   C:                 T0-Windows-OS
Windows_Server1   E: (b)             T2-Windows

a. Assuming /db_datafiles is created on enterprise storage.
b. Assuming E: is a drive on mid-range storage.

Why create file system groups? Creating file system groups enables TPC for Data reports to be generated at the file system group level, which facilitates analysis of the usage of each tier. Remember that data classification is used to select candidate files to be moved from one storage tier to another. By creating reports for these file system groups, you can analyze each data type in a storage tier separately.
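The grouping in Tables 5-1 through 5-3 can be sketched as a small classification helper. The group names follow the tables; the classify function, its rules, and the assignments mapping are illustrative assumptions of ours, not part of TPC for Data.

```python
# Storage tiers from Table 5-1 (illustrative lookup only).
TIERS = {
    "T0": "Local Direct Attached Storage (DAS)",
    "T1": "SAN - Enterprise storage",
    "T2": "SAN - Mid-range storage",
    "T3": "Low-cost archive storage",
}

def classify(server_type, fs_type, tier):
    """Return a file system group name in the style of Table 5-2."""
    if server_type == "Workstation":
        return "T0-Workstation"
    os_part = "Windows" if server_type == "Windows" else "Unix"
    if tier == "T0":
        # T0 groups are split by OS versus application file systems.
        suffix = "OS" if fs_type == "OS" else "App"
        return f"T0-{os_part}-{suffix}"
    return f"{tier}-{os_part}"

# Assignments in the style of Table 5-3 (one group per file system).
assignments = {
    ("AIX_Server1", "/usr"): classify("UNIX", "OS", "T0"),
    ("AIX_Server1", "/db_datafiles"): classify("UNIX", "Any", "T1"),
    ("Windows_Server1", "C:"): classify("Windows", "OS", "T0"),
    ("Windows_Server1", "E:"): classify("Windows", "Any", "T2"),
}
```

Each monitored file system maps to exactly one group, mirroring the one-group-per-file-system rule stated above.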


5.3.2 Collecting reports from TPC for Data

For a quick assessment, we propose some useful TPC for Data reports, divided into four classes:

� System Reports
� File Reports
� Access Load Reports
� Database Reports

We will use these reports to collect information about servers, disks, files, users, databases, and file systems that will help to classify data and define the data value classes.

The System Reports in TPC for Data contain standard reports that are automatically generated for all the machines on the network that are being monitored. These predefined reports provide a quick and efficient view of the enterprise data. Data for these system reports is gathered during the last scan scheduled for each computer.

For more information about the scan process, see Chapter 2, “Monitoring,” in IBM TotalStorage Productivity Center for Data User’s Guide, GC32-1728.

File Reports are reports on the files found during the scan process. During the scan, TPC for Data gathers a vast number of statistics and attributes about the files. These can be reported in a variety of ways:

� Largest Files Reporting - Information about the largest files in the environment

� Most Obsolete Files Reporting - Information about the files in the environment that have not been accessed in the longest period of time

� Duplicate Files Reporting - Information about the files in the environment during a scan that have duplicate file names

� Orphan Files Reporting - Information about files owned by users that are no longer active in the environment

� File Size Distribution Reporting - Information about the distribution of file sizes across storage resources

� File Summary Reporting - Summary information about the files in the environment

� File Types Reporting - Information about the storage usage of different file types in the environment

The Access Load Reports monitor and report on the usage and growth of storage consumption. These reports can be viewed for specific file systems and computers, for groups of file systems and computers, or across the entire enterprise. Use these reports to:

� View which servers and file systems are experiencing the heaviest (or lightest) load of data and storage access.

� Identify wasted space by pinpointing files that are no longer needed or have not been accessed for the longest time.

The Database Reports provide both overview and detailed information about the storage used by tables in a Relational Database Management System (RDBMS), including Oracle, Sybase SQL Server, Microsoft SQL Server, and UDB/DB2. The reporting features are very powerful; you can select the instances, tablespaces, databases, devices, containers, data files, fragments, tables, control files, redo logs, archive log directories, and even users to report on.


These reports help you project future storage consumption and maximize the storage assets currently in place by eliminating wasted space and making the most of the space available.

Best TPC for Data system reports for ILM

We selected the System Reports that provide the most value for ILM as our best practice system reports for an ILM assessment.

Access File Summary

The Access File Summary report provides overview information for files used by computers, file systems, and other resources. Through this report you can view the historical number of files for each resource. The historical chart can be generated to show daily, weekly, or monthly history.

This report will provide the following information:

� Total Size - Total size of the storage space consumed by the files on a network
� File Count - Total number of files on a network
� Directory Count - Total number of directories on a network
� Avg File Size - Average storage space consumed by each of the files on a network
� File system Capacity - Total storage capacity of the files on a network

Figure 5-2 is an example of this report, showing historical values and predicted trends for space used by files. This report helps with storage capacity planning.

Figure 5-2 Access File Summary report


Access Time Summary

The Access Time Summary report provides a summary of the number of files in the environment and when they were last accessed: during the past day, the past week, over a year ago, and so on.

This report provides the following information:

� Last Accessed <= 1 day - The number and total size of the files accessed within the last day

� Last Accessed 1 day — 1 week - The number and total size of the files accessed between 1 day and 1 week ago

� Last Accessed 1 week — 1 month - The number and total size of the files accessed between 1 week and 1 month ago

� Last Accessed 1 month — 2 months - The number and total size of the files accessed between 1 month and 2 months ago

� Last Accessed 2 months — 3 months - The number and total size of the files accessed between 2–3 months ago

� Last Accessed 3 months — 6 months - The number and total size of the files accessed between 3–6 months ago

� Last Accessed 6 months — 9 months - The number and total size of the files accessed between 6–9 months ago

� Last Accessed 9 months — 1 year - The number and total size of the files accessed between 9 months and 1 year ago

� Last Accessed > 1 year - The number and total size of the files accessed over a year ago

� Total Count - Total number of files across a network

� Total Size - Total size of the space consumed by the files on a network

� Average Age - Average age of files on a network measured by days, hours, minutes, and seconds
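The access-age buckets above can be sketched in a few lines of Python. The bucket boundaries follow the report; the directory walk and the helper names (bucket_for, access_time_summary) are our own illustration, not TPC for Data code.

```python
import os
import time

DAY = 86400  # seconds per day

# Upper bound in days for each last-accessed bucket, per the report above.
BUCKETS = [
    ("<= 1 day", 1), ("1 day - 1 week", 7), ("1 week - 1 month", 30),
    ("1 month - 2 months", 60), ("2 months - 3 months", 90),
    ("3 months - 6 months", 180), ("6 months - 9 months", 270),
    ("9 months - 1 year", 365), ("> 1 year", float("inf")),
]

def bucket_for(age_days):
    """Return the report bucket label for a file age in days."""
    for label, limit in BUCKETS:
        if age_days <= limit:
            return label
    return BUCKETS[-1][0]

def access_time_summary(root, now=None):
    """Count files and total bytes per last-accessed bucket under root."""
    now = time.time() if now is None else now
    summary = {label: [0, 0] for label, _ in BUCKETS}  # label -> [count, bytes]
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            try:
                st = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue  # skip files that vanish mid-scan
            label = bucket_for((now - st.st_atime) / DAY)
            summary[label][0] += 1
            summary[label][1] += st.st_size
    return summary
```

Note that atime behavior depends on mount options (for example, relatime), so a production scan would need to account for that.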

Figure 5-3 on page 72 shows an example of this report.


Figure 5-3 Access Time Summary report

Disk Capacity Summary

The Disk Capacity Summary report reports and charts disk capacity, per disk, per computer, per cluster, per computer group, per domain, or for the whole network.

This report provides the following information:

� Capacity - Total storage capacity of the disks on the computers within a network

� File system Used Space - Amount of used storage space on the file systems within a network

� File system Free Space - Amount of unused storage space on the file systems within a network

� Raw Volume Space - Space on host-side logical volumes that is not occupied by file systems

� Overhead - Total RAID/Mirror redundancy (For example, two 1 GB disks mirrored together would have an overhead of 1 GB.)

� Unallocated Space - Space assigned to a (monitored) host that is not part of any logical volume.

� Unknown LUN Capacity - LUN capacity of unknown usage

Figure 5-4 on page 73 shows an example of this report.


Figure 5-4 Disk Capacity Summary report

Oldest Orphaned Files

The Oldest Orphaned Files report provides information about the files with the oldest creation dates whose owners are no longer registered as users on the computer or network.

This report provides the following information:

� Access Time - Date and time when an orphaned file was last accessed.

� Computer - Name of the computer where an orphaned file is stored.

� File system - File system where an orphaned file is stored.

� Path - Full path to the location of an orphaned file.

� Owner - Operating system internal ID of the user who owned the orphaned file. This is the internal ID the operating system uses to identify the user, and not the user ID.

� OS User Group - OS user group to which the owner of an orphaned file belongs.

� Physical Size - Physical size of an orphaned file (measured in kilobytes, megabytes, or gigabytes).

� Modification Time - Date and time when an orphaned file was last modified.

� Create Time - Date and time when an orphaned file was created.
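The notion of an orphaned file can be illustrated on UNIX with the standard pwd module: a file is orphaned when its owner's numeric UID no longer has an entry in the user database. The is_orphaned helper (with an injectable lookup function so the behavior can be exercised) is our own sketch of the concept, not TPC for Data code.

```python
import os
import pwd  # UNIX-only user database

def is_orphaned(path, getpwuid=pwd.getpwuid):
    """True if the file's owner UID no longer maps to a known user."""
    uid = os.stat(path).st_uid
    try:
        getpwuid(uid)  # raises KeyError when the user has been removed
        return False
    except KeyError:
        return True
```

On Windows the equivalent check would resolve the owner SID instead; the idea, a file whose owner no longer resolves to a known account, is the same.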

Figure 5-5 on page 74 shows an example of this report.


Figure 5-5 Oldest Orphaned Files report

Storage Access Times

The Storage Access Times report indicates when files were last accessed.

This report provides the following information:

� Computer - Name of a computer against which the report was run

� Last Accessed <= 1 day Count, Total Size - Number of files that were accessed within the last 24 hours and the total physical size of those files (measured in kilobytes, megabytes, or gigabytes)

� Last Accessed 1 day - 1 week Count, Total Size - Number of files that were accessed between 1 day to 1 week previous and the total physical size of those files (measured in kilobytes, megabytes, or gigabytes)

� Last Accessed 1 week - 1 month Count, Total Size - Number of files that were accessed between 1 week to 1 month previous and the total physical size of those files (measured in kilobytes, megabytes, or gigabytes)

� Last Accessed 1 month - 1 year Count, Total Size - Number of files that were accessed between 1 month to 1 year previous and the total physical size of those files (measured in kilobytes, megabytes, or gigabytes)

� Last Accessed > 1 year Count, Total Size - Number of files that were accessed over one year previous and the total physical size of those files (measured in kilobytes, megabytes, or gigabytes)

� Total Count - Total of all the counts

� Total Size - Total of all the sizes

� Average Age - Average time since the file was last accessed


Figure 5-6 shows an example of this report.

Figure 5-6 Storage Access Times report

Storage Capacity

The Storage Capacity report provides storage capacity information about each monitored system.

This report provides the following information:

� Computer - Name of a computer against which the report was run

� Capacity - Total storage capacity for a computer

� Unallocated Space - Amount of unused storage space on a computer (not in file systems seen by this operating system)

� OS Type - Operating system running on a computer

� Network Address - Network address of a computer

� IP Address - IP address of a computer

� Time Zone - Time zone in which a computer is running

Figure 5-7 on page 76 shows an example of this report.


Figure 5-7 Storage Capacity report

Storage Modification Times

The Storage Modification Times report provides information about files within the network that were modified:

� Within the last 24 hours
� Between 24 hours and one week previous
� Between one week and one month previous
� Between one month and one year previous
� More than one year previous

This report provides the following information:

� Last Modified <= 1 day Count, Total Size - Number of files that were modified in the last 24 hours and the total physical size of those files (measured in kilobytes, megabytes, or gigabytes)

� Last Modified 1 day - 1 week Count, Total Size - Number of files that were modified between 1 day to 1 week previous and the total physical size of those files (measured in kilobytes, megabytes, or gigabytes)

� Last Modified 1 week - 1 month Count, Total Size - Number of files that were modified between 1 week to 1 month previous and the total physical size of those files (measured in kilobytes, megabytes, or gigabytes)

� Last Modified 1 month - 1 year Count, Total Size - Number of files that were modified between 1 month to 1 year previous and the total physical size of those files (measured in kilobytes, megabytes, or gigabytes)

� Last Modified > 1 year Count, Total Size - Number of files that were modified over a year previous and the total physical size of those files (measured in kilobytes, megabytes, or gigabytes)


� Total Count - Sum of the files

� Total Size - Sum of the size of the files

� Average Age - Average age since files were modified

Figure 5-8 shows an example of this report.

Figure 5-8 Storage Modification Times report

Total Freespace

The Total Freespace report shows the total amount of unused storage across a network.

This report provides the following information:

� Free Space - Total amount of storage space available on a network
� Percent Free Space - Percentage of total space that is unused on a network
� Used Space - Amount of used storage space on a network
� Capacity - Total amount (capacity) of storage space on a network
� File Count - Total number of files on a network
� Directory Count - Total number of directories on a network
� Percent Free Inodes - Percent of free inodes on a network
� Used Inodes - Number of used inodes on a network
� Free Inodes - Number of free inodes on a network
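As a rough cross-check of these headline numbers for a single file system, Python's standard library can produce capacity, used, and free space directly. The freespace_summary helper and its field names are illustrative assumptions of ours, keyed to mirror the report fields above.

```python
import shutil

def freespace_summary(path):
    """Report-style capacity/used/free figures for one file system."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return {
        "Capacity": usage.total,
        "Used Space": usage.used,
        "Free Space": usage.free,
        "Percent Free Space": round(100.0 * usage.free / usage.total, 1),
    }
```

Inode counts are not exposed by shutil; on UNIX they could be read with os.statvfs (f_files, f_ffree) in the same fashion.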

Figure 5-9 on page 78 shows an example of this report.


Figure 5-9 Total Freespace report

User Space Usage

The User Space Usage report provides storage statistics related to a specific user.

This report provides the following information:

� User Name - ID of a user

� Total Size - Total amount of space used by a user

� File Count - Number of files owned/created by a user

� Directory Count - Number of directories owned/created by a user

� Largest File - Largest file owned by a user

� 2nd Largest File - Second-largest file owned by a user

� File Size <1KB Count, Total Size - Number and total space usage of files under 1 KB in size

� File Size 1KB - 10KB Count, Total Size - Number and total space usage of files between 1 KB and 10 KB in size

� File Size 10KB - 100KB Count, Total Size - Number and total space usage of files between 10 KB and 100 KB in size

� File Size 100KB - 1MB Count, Total Size - Number and total space usage of files between 100 KB and 1 MB in size

� File Size 1MB - 10MB Count, Total Size - Number and total space usage of files between 1 MB and 10 MB in size


� File Size 10MB - 100MB Count, Total Size - Number and total space usage of files between 10 MB and 100 MB in size

� File Size 100MB - 500MB Count, Total Size - Number and total space usage of files between 100 MB and 500 MB in size

� File Size > 500MB Count, Total Size - Number and total space usage of files over 500 MB in size
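The size buckets in this report can be sketched as follows. The boundaries follow the report text; size_distribution itself is an illustrative helper of ours, not TPC for Data code.

```python
# Upper bound in bytes for each size bucket, per the report above.
SIZE_BUCKETS = [
    ("<1KB", 1 << 10), ("1KB - 10KB", 10 << 10), ("10KB - 100KB", 100 << 10),
    ("100KB - 1MB", 1 << 20), ("1MB - 10MB", 10 << 20),
    ("10MB - 100MB", 100 << 20), ("100MB - 500MB", 500 << 20),
    (">500MB", float("inf")),
]

def size_distribution(sizes):
    """Map file sizes (bytes) into report-style count/total-size buckets."""
    dist = {label: [0, 0] for label, _ in SIZE_BUCKETS}  # label -> [count, bytes]
    for size in sizes:
        for label, limit in SIZE_BUCKETS:
            if size < limit:
                dist[label][0] += 1
                dist[label][1] += size
                break
    return dist
```

Feeding this the sizes of one user's files yields the Count and Total Size columns of the report.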

Figure 5-10 shows an example of this report.

Figure 5-10 User Space Usage report

Wasted Space

The Wasted Space report provides storage statistics on non-OS files not accessed in the last year and on orphan files.

This report provides the following information:

� Computer - Name of the computer that contains wasted space
� Total Size - Total amount of space used by the obsolete and orphan files
� File Count - Total number of obsolete and orphan files
� Directory Count - Total number of orphan directories
� Avg File Size - Average size of obsolete and orphan files

Figure 5-11 on page 80 shows an example of this report generated with the following conditions: (ATTRIBUTES include any of (ORPHANED) OR (NAME matches none of ('?:\WINNT\system*\%', '/usr/lib/%', '/usr/bin/%', '/sbin/%', '/usr/sbin/%', '/etc/%') AND LAST ACCESSED earlier than 365 days 06:00 ago)).
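The filter quoted above can be re-expressed as plain Python to make its logic explicit: a file counts as wasted if it is orphaned, or if it lies outside the listed OS paths and has not been accessed for more than a year. This is a hedged sketch of ours, the is_wasted helper and its inputs are illustrative, and the TPC '%' wildcards are translated to fnmatch-style '*' patterns.

```python
import fnmatch

# The OS path exclusions from the report condition, with '%' -> '*'.
OS_PATTERNS = [
    "?:\\WINNT\\system*\\*", "/usr/lib/*", "/usr/bin/*",
    "/sbin/*", "/usr/sbin/*", "/etc/*",
]

def is_wasted(path, orphaned, days_since_access):
    """Mirror of the Wasted Space condition for a single file."""
    if orphaned:
        return True
    is_os_file = any(fnmatch.fnmatch(path, pat) for pat in OS_PATTERNS)
    # "earlier than 365 days 06:00 ago" -> older than 365.25 days
    return (not is_os_file) and days_since_access > 365.25
```

Running a predicate like this over a scan's file list yields the same candidate set the report visualizes.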


Figure 5-11 Wasted Space report

Best TPC for Data File Reports for ILM

We selected the File Reports that provide the most value for ILM as our best practice file reports for an ILM assessment.

All of them can be used to report at the file system group level. Use the groups created in 5.3.1, “Creating groups of data” on page 67, to generate the following reports.

Largest Files report

The Largest Files report provides detailed information about the largest files found in the environment. The report can be viewed by directory, directory group, file system, file system group, cluster, computer, computer group, domain, and for the entire network.

The default largest files profile is set to collect the 20 largest files per file system. This value may be increased if required.
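Collecting the N largest files per file system can be sketched with a bounded min-heap, so memory use stays proportional to N rather than to the number of files scanned. largest_files is an illustrative helper of ours, not the TPC for Data implementation; top_n defaults to 20 to match the default profile described above.

```python
import heapq
import os

def largest_files(root, top_n=20):
    """Return [(size, path)] for the top_n largest files under root."""
    heap = []  # min-heap of (size, path), capped at top_n entries
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # skip files removed during the walk
            if len(heap) < top_n:
                heapq.heappush(heap, (size, path))
            elif size > heap[0][0]:
                heapq.heapreplace(heap, (size, path))  # evict current smallest
    return sorted(heap, reverse=True)
```
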

Figure 5-12 on page 81 shows an example of this report.


Figure 5-12 Largest Files report

Duplicate Files report

The Duplicate Files report shows duplicate files found during a scan. This data can be analyzed to identify files that might no longer be needed and could be wasting storage space. The report can be viewed by directory, directory group, file system, file system group, cluster, computer, computer group, domain, and for the entire network.
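Matching duplicates by file name, as this report does, can be sketched as follows. duplicate_names is an illustrative helper of ours; a real cleanup pass would also compare sizes or checksums before treating same-named files as true duplicates.

```python
import os
from collections import defaultdict

def duplicate_names(root):
    """Return {file_name: [paths]} for names that occur more than once."""
    by_name = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            by_name[name].append(os.path.join(dirpath, name))
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}
```
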

Figure 5-13 on page 82 shows an example of this report.


Figure 5-13 Duplicate Files report

File Types report

The File Types report shows data organized by the file types found during a scan. TPC for Data collects storage usage information for file types such as .exe, .zip, .sys, .pdf, .doc, .dll, .wav, .mp3, and .avi. Each of these file types is represented by its own row in the generated reports. Use these reports to:

� Relate the space used by applications to the total capacity and used space. For example, view the total amount of storage consumed by Acrobat files and Lotus Notes mail databases.

� View which applications are consuming the most space across a given set of storage resources.

� View the total amount of storage consumed by different types of data, like non-business and temporary files. Details about non-business files, temporary files, and other types of data are in 5.4.1, “Types of data” on page 90.

Figure 5-14 on page 83 shows an example of this report.


Figure 5-14 File Types Report

Best TPC for Data access load reports for ILM

We selected the access load reports that provide the most value for ILM as our best practice access load reports for an ILM assessment.

They can be used to report at the file system group level. Use the groups created in 5.3.1, “Creating groups of data” on page 67, to generate the following reports.

Access Time report

The Access Time report shows the amount of data and the number of files that have been accessed during the last day, week, month, year, and more. The information can be viewed at the directory level, file system level, computer level, domain level, and for the entire network.

This report will help to identify which files are candidates to be migrated to another storage tier by identifying files which are infrequently accessed but are currently stored on high-cost storage (or the opposite).

Figure 5-15 on page 84 shows an example of this report.


Figure 5-15 Access Time report

Modification Time report

The Modification Time report shows the amount of data and the number of files that have been modified during the last day, week, month, year, and beyond. This information can be viewed at the directory level, file system level, computer level, domain level, and for the entire network.

This report will help to identify which files are candidates to be migrated to another storage tier by showing mismatches between modification times and the cost of the storage used.

Figure 5-16 on page 85 shows an example of this report.


Figure 5-16 Modification Time Reporting

Best TPC for Data database reports for ILM

We selected the database reports that provide the most value for ILM as our best practice database reports for an ILM assessment.

Database Storage By Computer

The Database Storage By Computer report shows information about the databases in the environment, sorted by the computer or computers on which they are stored.

This report provides the following information:

� Computer - Name of the computer where the databases are located
� Total Size - Amount of space consumed by the databases on the computers
� Data File Capacity - Storage capacity of the data files within the databases
� Data File Free space - Amount of free space available in the databases’ data files
� Tablespace Count - Number of tablespaces associated with the databases
� Data File Count - Number of data files associated with the tablespaces in the databases
� Log File Count - Number of log files associated with the databases

Figure 5-17 on page 86 shows an example of a Database Storage By Computer report with size distribution among the databases.


Figure 5-17 Database Storage by Computer report

This report can also be viewed in table layout. Check the free space for each database, as shown in Figure 5-18 on page 87.


Figure 5-18 Database Storage by Computer report table

Total Database Freespace or DMS Container Freespace report

The Total Database Freespace report provides information about the total free space for data files at a network-wide level, as well as the percentage of free space, the total used space, the number of free extents, and the number of data files.

This report provides the following information:

� Free Space - Amount of free space on all data files in the network
� Percent Free - Percent of free space available on the data files in the network
� Used Space - Amount of used space on the data files in the network
� Total Size - Total size of data files in the network
� Free Extents - Number of free extents on the data files in the network
� Coalesced Extents - Number of coalesced extents on the databases in the network
� Number Data Files - Number of data files on the databases in the network

The Total DMS Container Freespace report shows the total free space for the containers associated with DMS tablespaces on UDB Instances within the environment.

This report provides the following information:

- Free Space - Amount of free space available on the DMS containers within a network
- Percent Free - Percentage of free space available on the DMS containers within a network
- Used Space - Amount of storage space consumed on the DMS containers within a network
- Total Size - Total amount of space on the DMS containers within a network
- Number of Containers - Number of DMS containers within a network


Figure 5-19 shows an example of this report.

Figure 5-19 Total Database Free report

Segments with Wasted Space
The Segments with Wasted Space report provides information about Oracle segments containing allocated space that is currently empty or not being used. This report can help discover space that can be reclaimed and allocated to other objects. This report is available for Oracle databases only.

This report provides the following information:

- Empty Used Space - Amount of empty used space within a segment (table, index, etc.)
- Segment Creator - Owner of a segment
- Segment Name - Name of a segment
- Computer - Name of the computer on which the segment’s Instance resides
- Instance - SID of an Oracle Instance
- Database - Name of the database to which the segment belongs
- Tablespace - Name of the tablespace to which the segment belongs
- Partition - Partition on which the segment is stored
- Segment Type - Type of the segment, including:

– Table
– Table partition
– Table subpartition
– Nested table
– Cluster
– Index
– Index partition
– Index subpartition
– Lobindex
– Lobsegment
– Lob partition
– Lob subpartition

- Parent Type - Subset of the segment type
- Parent Creator - Owner of a segment
- Parent Name - Name of a segment
- Total Size - Amount of space allocated to a segment
- Number of Extents - Number of extents allocated to a segment
- Freelist Blocks - Number of blocks on the freelist chain
- Initial Extent - Size of the first extent allocated to a segment
- Next Extent - Amount of space Oracle will retrieve when allocating another extent
- Maximum Extents - Maximum number of extents that Oracle would allocate to an object
- Percent Increase - Percent increase in size Oracle will allocate for the next extent

Figure 5-20 shows an example of this report.

Figure 5-20 Segments with Wasted Space report

5.4 Classifying data and analyzing reports
Classifying data is important because organizations should not pay to store or protect data that is not used or that is not critical to the business. After classifying data, they will be able to reclaim storage space by taking actions like deleting, moving, and archiving data.

The following list contains the reports we selected in 5.3.2, “Collecting reports from TPC for Data” on page 69. The best system reports are:

- Access File Summary


- Access Time Summary
- Disk Capacity Summary
- Oldest Orphaned Files
- Storage Access Times
- Storage Capacity
- Storage Modification Times
- Total Freespace
- User Space Usage
- Wasted Space

The best database reports are:

- Access File Summary
- Total Database Freespace
- Segments with Wasted Space

These system reports and database reports provide information about storage infrastructure, occupancy, storage allocation, and usage for a single computer, database, or entire network, making it easier to understand the storage environment currently in use.

The best file reports are:

- Largest Files Reporting
- Duplicated Files Reporting
- File Types Reporting

The best access load reports are:

- Access Time Reporting
- Modification Time Reporting

The next step is to classify the data, understand how to use each report above, separate all types of files, and view the amount of storage consumed by each of them. The next section provides an overview of each type of data.

5.4.1 Types of data
The business value of files typically changes over time. Some files have no value at all, while some are only of temporary value. This section describes different types of files that should be identified in a storage environment. The objective of classifying data is to reclaim storage space and find candidate files for moving to lower-cost storage or for deletion if they have no business value.

Non-business files
These are files that do not belong to any business application. One approach to identifying these files is to use their extension; for example, an .mp3 file may represent a personal music file and be a non-business file. However, the file types that are business-related are often industry specific; for a media company, .mp3 files may well be business-critical data. Therefore, organizations should individually define what data is related to their business. We discuss approaches to this in Chapter 7, “ILM initial implementation” on page 119.

Duplicate files
These might be copies of the same file that are created in different locations to share data among different applications, or files that are duplicated by users, typically on fileservers. Duplicate files are usually identified by their file name and size.


Temporary files
These are files that are created and, after being used, should be deleted. If not managed, these files can use storage space needed for critical data. They are mostly identified by their extension, for example, .zip, .bkp, .old, and might also be older log files, dump files, and so on.

Stale files
These are files that belong to users who no longer exist (also known as orphaned files), and files that have not been accessed in a period of time or have an access rate below a certain threshold. They are mostly identified by their last access time or modification time.

Valid files
These are all remaining files that are related to the business. Critical data, application data, and all files with value to the business are considered valid data. This data should be protected and allocated to high-cost or low-cost storage according to its value.

5.4.2 Data classification
This section describes steps to classify and view the amount of storage used by different types of data. The steps are:

1. Reporting non-business files
2. Reporting duplicate files
3. Reporting temporary files
4. Classifying valid data and verifying stale files to be migrated to other tiers
5. Reporting database unused space

Reporting non-business files
As mentioned before, organizations and system administrators should define which data has no value to the business. A quick way to find non-business files is to search for file extensions defined as non-business, and then investigate these files to see how much storage space is being consumed.

Use the “File Types report” on page 82 to report storage space occupied by file extensions defined as non-business.

This report can be viewed by the filespace group. Using the groups defined in 5.3.1, “Creating groups of data” on page 67, you can evaluate how much space these files are using in each storage tier.
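The extension-based approach described above can be approximated outside TPC for Data as well. The following Python sketch (a local illustration only, not part of the product; the extension list is a hypothetical policy you would replace with your own classification) totals the space used per non-business extension under a file system root:

```python
import os
from collections import defaultdict

# Hypothetical extension list an organization might define as non-business;
# adjust this to match your own data classification policy.
NON_BUSINESS_EXTS = {".mp3", ".avi", ".jpg", ".mov"}

def space_by_extension(root):
    """Summarize bytes used per non-business extension under a file system root."""
    usage = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if ext in NON_BUSINESS_EXTS:
                try:
                    usage[ext] += os.path.getsize(os.path.join(dirpath, name))
                except OSError:
                    pass  # file removed or unreadable mid-scan; skip it
    return dict(usage)
```

Running this per mount point gives a rough per-tier view comparable to grouping the File Types report by file system group.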

Reporting duplicate files
Duplicate files are usually created by fileserver users, or intentionally created by administrators to share files among different applications. These files may be located on the same or different servers.

The quickest way to find duplicate files is to search for files with the same name, date, and size, as reported by the “Duplicate Files report” on page 81. This report analyzes metadata information only. File content is not checked by TPC for Data, so duplicate files with different names, for example, will not be identified.

This report can also be viewed by the file system group previously defined. Then the space that these files are using in each storage tier can be easily viewed, too.
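The metadata-only matching described above can be sketched in Python. The grouping key (name, size, modification time) is our illustrative choice, and matches are only candidates precisely because content is never compared:

```python
import os
from collections import defaultdict

def find_duplicate_candidates(roots):
    """Group files by (name, size, modification time) across one or more roots.

    Metadata only, mirroring the approach described in the text: file content
    is never read, so identical files with different names are not caught, and
    every match is only a candidate that should be verified before action."""
    groups = defaultdict(list)
    for root in roots:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                try:
                    st = os.stat(path)
                except OSError:
                    continue  # skip files that vanish mid-scan
                key = (name, st.st_size, int(st.st_mtime))
                groups[key].append(path)
    return {k: v for k, v in groups.items() if len(v) > 1}
```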


Reporting temporary files
Temporary files are files that no longer have value for any application. These files can usually be deleted. The quickest way to find temporary files is to search for the file extensions most common for temporary files, for example, .tmp, .bak, .log, .dmp, .txt, .zip, .bkp, .old, and core.

Use the “File Types report” on page 82 to report storage space occupied by file extensions most commonly found in temporary files.

This report can also be viewed by file system groups previously defined. By using these groups you can evaluate how much space these files are using in each storage tier.

Classify valid data and verify stale files
After viewing invalid data (non-business, temporary, and duplicate files), the next step is to analyze the valid data and check whether files are allocated to the correct storage tier. The following reports help to evaluate storage usage and search for stale data.

Last access date report
Create a report showing the amount of storage accessed during different periods of time for each file system group or storage tier.

First, create a reporting file system group that includes the file system groups of the same tier created in 5.3.1, “Creating groups of data” on page 67. Table 5-4 shows an example.

Table 5-4 Reporting file system groups

For more details about reporting on file system groups in TPC for Data, see Chapter 5, “Reporting,” in IBM TotalStorage Productivity Center for Data User’s Guide, GC32-1728.

After creating report groups, use the report “Access Time report” on page 83 to show the amount of storage last accessed:

- Between 3 and 6 months ago
- Between 6 and 9 months ago
- Between 9 and 12 months ago
- More than 1 year ago

Generate this report by the report groups created in Table 5-4.

Figure 5-21 on page 93 shows an example of this report generated by report groups, where the amount of storage for each tier can be evaluated.

Reporting file system group    Monitoring file system groups
T1-Total                       T1-Windows, T1-Unix
T2-Total                       T2-Windows, T2-Unix


Figure 5-21 Access Time Reporting by report group

Evaluate the amount of storage accessed during these periods. File systems whose data falls entirely into one or more of these periods are good candidates for migration to a lower-cost storage tier, as shown in Figure 5-22 on page 94.
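A rough local approximation of this access-time bucketing can be sketched in Python. This is a hypothetical helper, not the TPC for Data report itself, and it assumes POSIX last-access (atime) semantics:

```python
import os
import time

# Age buckets in days, mirroring the Access Time report ranges used above.
BUCKETS = [(90, 180, "3-6 months"), (180, 270, "6-9 months"),
           (270, 365, "9-12 months"), (365, None, "over 1 year")]

def bucket_by_last_access(root, now=None):
    """Sum file sizes per last-access age bucket for one file system tree."""
    now = now if now is not None else time.time()
    totals = {label: 0 for _lo, _hi, label in BUCKETS}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            try:
                st = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue  # skip files that vanish mid-scan
            age_days = (now - st.st_atime) / 86400
            for lo, hi, label in BUCKETS:
                if age_days >= lo and (hi is None or age_days < hi):
                    totals[label] += st.st_size
    return totals
```

Running this once per tier’s file systems yields per-tier totals comparable to generating the report by the report groups in Table 5-4.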


Figure 5-22 Access Time reporting file systems

Last modification date report
Create a report showing the amount of storage modified during different periods of time for each file system group or storage tier.

Use the report described in “Modification Time report” on page 84 to show the amount of storage last modified:

- Between 3 and 6 months ago
- Between 6 and 9 months ago
- Between 9 and 12 months ago
- More than 1 year ago

Generate this report using the report groups created in Table 5-4 on page 92.

Figure 5-23 on page 95 shows an example of this report generated by report groups. Use this report to evaluate the amount of storage per tier.


Figure 5-23 Modification Time Reporting by report group

Evaluate the amount of storage modified during these periods. File systems whose data falls entirely into one or more of these periods could be good candidates for migration to a lower-cost storage tier, as shown in Figure 5-24 on page 96.


Figure 5-24 Modification Time reporting file systems

Largest files report
Create a report showing the largest files for each file system group or storage tier.

Use the report described in “Largest files report” on page 96 to show the largest files for each file system, file system group, or tier (report group), and evaluate whether they are allocated to the appropriate storage tier. You can define filters for this report to exclude file paths or file names that correspond to valid large files; for example, exclude all files in /oracle or ending in .dbf, since these correspond to database files. Generate this report using the report groups shown in Table 5-4 on page 92.

For more details about creating filters for reports on TPC for Data, see Chapter 5, “Reporting,” in IBM TotalStorage Productivity Center for Data User’s Guide, GC32-1728.

Reporting database unused space
Databases are typically large consumers of storage space, and a complete ILM assessment should analyze their data files to report the amount of storage used and unused by them.

Use the report described in “Total Database Freespace or DMS Container Freespace report” on page 87 to show the amount of free space remaining in the datafiles. Utilization of 60 percent or less in database files is common, meaning the remaining space cannot be used by other files. This report checks the data files to show how much space is free.

Another useful report, specifically for Oracle databases, is described in “Segments with Wasted Space” on page 88. Use this report to show the unused segments and to check how much allocated space is being wasted.


5.5 Defining actions with classified data
The goal of the quick assessment is to collect, as quickly as possible, information about storage usage to help administrators view the current status of the environment, and to help them take actions to reclaim storage space and improve storage capacity and availability.

These actions are important decisions and should be taken considering the information provided by data classification reports, administrators’ knowledge about their systems, and best practices and recommendations of product vendors.

The following sections provide general suggestions to help manage the space used by each type of data.

For more details and proposed solutions for ILM and data classification, see Chapter 7, “ILM initial implementation” on page 119.

5.5.1 Actions for non-business files
The section “Reporting non-business files” on page 91 shows TPC for Data-generated reports with the file types selected as non-business and the amount of storage they are using.

Ultimately, the best choice might be to simply delete these files, considering that enterprise storage space should not be used for non-business files. However, at this stage of the assessment, it is important to know how much space is being used by these files. If reports show large amounts of storage occupied by non-business files, an administrator should further investigate why these files are being created, and take appropriate actions to stem this growth, for example, by implementing quotas for users’ personal directories or issuing guidelines to users on the storage of non-business files.

5.5.2 Actions for duplicate files
In “Reporting duplicate files” on page 91, TPC for Data generated reports for duplicate files and the amount of storage they are using.

Possible solutions for duplicate files are:

- Share files on network drives so that only one copy is accessed by all users.
- Create symbolic links to allow applications to share files.
- Delete unnecessary duplicate files.

Sharing duplicate files means reorganizing files to be located in a shared place where all applications can use them. The sharing solution can be a simple shared folder or Network Attached Storage (NAS).

Creating symbolic links is a useful and fast solution for duplicate files in the same UNIX system. They can be easily created between versions of duplicate files or directories, saving storage space and preserving access paths needed by applications.
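On a UNIX-style system, the symbolic-link approach above might be sketched as follows. The `replace_with_symlink` helper is hypothetical, and you should verify that the two files really are identical (and coordinate with the owning applications) before replacing anything:

```python
import os

def replace_with_symlink(duplicate_path, master_path):
    """Replace a duplicate file with a symbolic link to the master copy,
    preserving the access path that applications expect.

    Illustrative sketch only: verify the files are truly identical and that
    no application depends on them being independent before running this."""
    if not os.path.isfile(master_path):
        raise FileNotFoundError(master_path)
    os.remove(duplicate_path)
    os.symlink(master_path, duplicate_path)
```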

Deleting duplicated files may be the best option, but this should be done carefully to ensure that users and applications still have access to their files.

5.5.3 Actions for temporary files
In “Reporting temporary files” on page 92, TPC for Data generated reports on temporary files and the amount of storage they are using.


Often temporary files can be deleted on a cyclical basis, for example, deleting files in temporary directories that are older than 90 days (or whatever policy you want to set), or archiving older log files. It is important to monitor the amount of storage space used by temporary files and check whether the growth rate is constant or variable; it should be proportional to the overall used-storage growth rate.
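A cyclical cleanup policy like the 90-day example above can be sketched in Python. The extension list and retention period are illustrative policy choices, and the sketch only lists expired candidates rather than deleting them, leaving deletion as a separate, deliberate step:

```python
import os
import time

# Hypothetical policy values; extensions and retention period should match
# what your organization defines as temporary data.
TEMP_EXTS = {".tmp", ".bak", ".dmp", ".old"}
RETENTION_DAYS = 90

def expired_temp_files(root, now=None):
    """List temporary files older than the retention period under a root."""
    now = now if now is not None else time.time()
    cutoff = now - RETENTION_DAYS * 86400
    expired = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if os.path.splitext(name)[1].lower() in TEMP_EXTS:
                path = os.path.join(dirpath, name)
                try:
                    if os.stat(path).st_mtime < cutoff:
                        expired.append(path)
                except OSError:
                    pass  # file removed mid-scan; skip it
    return expired
```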

5.5.4 Actions for stale files
In “Classify valid data and verify stale files” on page 92, TPC for Data generated reports with the amount of storage last accessed and last modified in different periods of time.

Analyzing these reports, we can select some file systems to be migrated to a lower-cost storage tier. In the first report, described in “Last access date report” on page 92, files last accessed between 3 and 9 months ago are good candidates to be evaluated and migrated to a lower tier. Files last accessed more than 9 months ago may be considered candidates for migration to the lowest-cost storage, or for archival to sequential media (for example, optical or tape).

In the second report, described in “Last modification date report” on page 94, files last modified between 6 and 9 months ago may be considered candidates for migration to lower-cost storage. But a further study should be done to check which storage they are using and which application is using them. If they are located on high-performance write-cached disks, evaluate whether storage with this characteristic is still needed for these files.

5.5.5 Actions for RDBMS space
In “Reporting database unused space” on page 96, TPC for Data generated reports with the amount of storage used by RDBMS databases, data file free space, and segments wasting storage.

Check the storage space unused by databases and the amount of free space remaining in data files, and report them to database administrators. This space can be used to satisfy future needs and growth. When more space is needed, expand the database into the unused space instead of allocating more disks. If data files have free space, reclaim it with procedures that reduce the database size. For example, check with database administrators whether it is possible to export and import the database to reduce the storage space.

5.6 ILM - Return on investment (ROI)
We should expect an ILM project to generate a return on investment; but how can this be calculated? ILM is a service level solution, and an ROI calculation should consider savings in many areas of investment: hardware, software, services, and support costs, among others. The main ROI factors focus on improvements in application availability, storage utilization, and personnel productivity. Some or all of the main elements of ILM can be implemented to meet data management needs:

- Tiered storage management
- Long-term data retention
- Data lifecycle management
- Policy-based archive management

Organizations should invest in information management solutions to be able to grow storage, and to back up and move data without impacting access. They should have centralized control for backup, restore, and copy services, and a common pool for unallocated storage. They need detailed knowledge of storage device utilization to make it easier to plan storage capacity and growth.

The next sections describe some factors that improve storage return on investment.

5.6.1 Data classification and storage cost

After reclaiming space by deleting invalid files and extra versions of duplicate files, the ROI is demonstrated by:

- Additional space on existing storage, calculated from the delta between current and future utilization of the existing storage.
- Reduced utilization of current storage, and thus a reduced amount of disk storage that needs to be purchased.
- Data on storage with the appropriate quality of service and cost. Enterprise storage resources are used for critical data only, and the reclaimed space is used for critical data growth or new critical applications. Recoverability, availability, and accessibility of enterprise storage data are all improved, because invalid data is not consuming critical storage tier space.
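The first of these elements reduces to simple arithmetic. The figures in this sketch are purely illustrative, not measured values:

```python
def reclaimed_space_savings(total_tb, util_before, util_after, cost_per_tb):
    """Value of space reclaimed on a storage tier: the delta between
    utilization before and after cleanup, priced at the tier's cost per TB.

    All inputs are illustrative assumptions, not vendor or customer figures."""
    reclaimed_tb = total_tb * (util_before - util_after)
    return reclaimed_tb, reclaimed_tb * cost_per_tb

# Example: a 100 TB enterprise tier cleaned from 85% down to 70% utilization,
# at an assumed 3,000 (currency units) per TB.
tb, value = reclaimed_space_savings(100, 0.85, 0.70, cost_per_tb=3000)
```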

5.6.2 Data management and personnel cost
An important element in calculating ROI for an ILM solution relates to application outages; in particular, how many of these outages are due to problems caused by the difficulty of managing storage. The cost of outages is measured in terms of revenue, profit, or savings lost due to downtime. In many organizations, between 10 and 20 percent of outages are related to problems with storage management, for example, application outages caused by no free space.
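This outage-cost element is also straightforward arithmetic. The sketch below uses the midpoint of the 10 to 20 percent range cited above; all inputs are illustrative assumptions:

```python
def storage_related_outage_cost(outage_hours_per_year, cost_per_hour,
                                storage_share=0.15):
    """Annual outage cost attributable to storage management problems.

    storage_share defaults to 15%, the midpoint of the 10-20% range cited
    in the text; hours and hourly cost are illustrative inputs."""
    return outage_hours_per_year * cost_per_hour * storage_share

# Example: 40 hours of outages a year at an assumed 50,000 per hour.
cost = storage_related_outage_cost(40, 50000)
```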

With ILM techniques, manual effort is reduced for functions such as:

- Gathering storage usage information and analyzing it
- Planning and implementing new applications that require significant amounts of storage
- Moving and migrating data
- Planning for storage growth
- Going server to server to individually manage storage

Manual effort can be reduced by investing in:

- Better tools that make it easier to provision new storage for new applications, or that do so automatically.
- The ability to move and migrate data nondisruptively, by virtualizing storage and defining policies to automatically migrate data among storage tiers.
- Managing server data from a centralized point.
- Centralizing access to storage.

These techniques will strongly improve return on investment, reducing storage personnel cost and application and storage downtime.

5.6.3 Long-term retention and non-compliance penalty costs
The first step in a company’s compliance efforts should be to assess the effectiveness of its current internal controls and information management processes. Identification of risks and controls, and evaluation of the effectiveness of those controls, are important processes for organizations that need to comply with regulations. Some non-compliance penalties should be considered when investing in long-term retention solutions. For example, Sarbanes-Oxley (SOX), which requires CEOs and CFOs to personally certify quarterly and annual financial statements, can bring fines of up to $5 million or 20 years in prison if violated. Other industries, like healthcare and life sciences, insurance, and banking, have their own compliance regulations and penalty costs that are important inputs when considering ROI for ILM long-term solutions.

In defining the technical, business, and regulatory requirements for archiving space, it is critical to consider a solution's key features and whether those features meet the organization's archiving and compliance needs. ILM techniques for retention, driven by storage costs and return on investment, are a major focal point in approaching a solution of this nature. The impact of the solution with respect to an overall enterprise content management initiative, as well as support for a greater storage management and compliance infrastructure, is key to defining an ILM project's success.

5.6.4 Backup/archiving solutions cost - Disk or tape
Tape solutions have been used as the main backup and archive storage media for many years. But with the need to recover critical data quickly and make data available within seconds after a disaster, disk solutions have also become an important media type for fast recovery.

The emergence of Serial ATA (SATA) technology and falling disk costs are making disk another viable media option for backup and archive storage.

Disk-to-disk backup can make copies instantly. In what is usually called a snapshot copy, point-in-time data is copied from one disk to another and can be restored rapidly.

For instant disaster recovery, disk-to-disk remote mirroring solutions make data available within seconds after the disk in the primary site is lost. A target disk in the secondary site is continuously updated with primary site changes.

Disk-to-disk backup has many benefits, but cannot replace tape completely. Tape has several unique benefits that should be considered when investing in backup and archive storage, for example:

- For disaster recovery environments, tape is removable for offsite storage and it is inexpensive.

- For backup requirements where several versions are needed. Disk-to-disk backup can be used for the current backup version, but the cost and capacity of disk make it less viable for maintaining multiple (older) backup versions. Tape is very useful in this instance to store many versions and provide historical point-in-time recovery.

- For scalability and growth. The capacity needed to back up data grows at the same rate as data volumes increase. Disk growth needs more controllers, floor space, and software; tape is much easier to scale by simply adding more tape cartridges.

- For long-term retention. Regulations and compliance require lengthy retention periods (years) for data that may not be accessed for a long time. Using disk to store all compliance data would be very costly; tape is the most cost-effective storage for long-term archival requirements.

When investing in backup or archive storage, consider not only the price per gigabyte of storage, but also:

- Disk raw capacity and average utilization
- Tape average utilization
- Hardware configuration cost


- Controllers and software for servers
- Environmental costs

Typically, including all of the factors above in the total cost calculation shows that disk solutions can be up to 11 times more expensive than tape solutions.

Therefore, the best choice between disk and tape solutions for backup and archive depends on access patterns, the amount of data, and recovery time objectives. If the business states that critical data must be restored in seconds, disk solutions are the best choice. For fast restore of some data at a lower storage cost, tape is the best solution.
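An effective cost-per-gigabyte comparison that folds in utilization and overhead, as suggested above, might look like this sketch. All prices and capacities here are made-up illustrations, not vendor figures:

```python
def effective_cost_per_gb(media_price, raw_capacity_gb, avg_utilization,
                          overhead_cost=0.0):
    """Effective cost per usable GB: media price plus overhead (controllers,
    software, environmentals), divided by the capacity actually used.

    All figures passed in are illustrative assumptions, not vendor pricing."""
    usable_gb = raw_capacity_gb * avg_utilization
    return (media_price + overhead_cost) / usable_gb

# Illustrative comparison: a disk unit at 50% average utilization with heavy
# overhead vs. a tape cartridge at 80% utilization with modest overhead.
disk = effective_cost_per_gb(10000, 5000, 0.50, overhead_cost=15000)
tape = effective_cost_per_gb(120, 400, 0.80, overhead_cost=40)
```

Comparing the two results, rather than raw price per gigabyte alone, is what reveals the kind of cost gap between disk and tape that the text describes.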

5.7 ILM Services offerings from IBM
You can engage IBM to help perform the ILM assessment in your enterprise. Among the offerings is the 4–6 week ILM Assessment study, which addresses the following:

- Introduction to ILM concepts, solutions, tools, products, and services
- High-level review of storage issues and challenges
- Prioritized recommendations, benefits, financial analysis, and next steps

You may then choose to have a deeper analysis of the storage environment to help you answer questions about the following:

- Development of an ILM strategy, roadmap, or implementation plan
- Data classification review and recommendations
- ROI and TCO analysis for an ILM solution
- Regulatory compliance analysis and guidelines

For more information about these and other service offerings, please consult your IBM representative.


Chapter 6. The big picture for an ILM implementation framework

After reading the previous chapters, you now know the different business drivers for implementing ILM, the key technology enablers, and the different benefits. You know that ILM requires a change in thinking, to leave the traditional function-based storage model and evolve to a service-based storage model.

In this chapter you will learn about:

- The big picture and why you should care about it
- Some entry points to ILM, whether ILM is for you, and why there is not one packaged solution
- Why ILM is more than tiered storage
- Why ILM is not the final step to the ultimate storage model
- How the different elements of ILM fit together


© Copyright IBM Corp. 2006. All rights reserved. 103


6.1 The big picture and why you should care about it
There has been much discussion about what ILM is. The best way to understand it is to know where ILM best practices fit in your IT environment. We give you an example of such an IT infrastructure, shown in the “big picture” in Figure 6-1 on page 105. This figure represents a sample environment only. There are four main blocks, represented by the large boxes:

- Business, Consulting, Assessment, Definition
- Application, Server Hardware
- Software Infrastructure, Automation
- Hardware Infrastructure

We will show you on the following pages what components are part of each group and how they interact. This will enable you to recognize where your current software and hardware fit. It also gives you a starting point to implement ILM best practices, and to place them logically in a new framework.

Finally, on the left side of Figure 6-1 on page 105 you see a description of the major components; on the right side, the orchestration tools.


Figure 6-1 ILM implementation framework at service level maturity

6.1.1 Business consulting, assessment, definitionThe aim of ILM best practices is to align information to business requirements through management policies and service levels associated with applications, metadata, and data. If your organization is large, has complex IT environments and distributed storage solutions, the initial assessment can be quite challenging. IBM can work with you to define this baseline or you can do it yourself. We provide some guidelines for this assessment in the following chapters. During the assessment, you will identify the value, lifecycle, and classification of information for each business unit. This collaboration requires not only input from IT service management, but also from the business processes and business applications’ owners. This


Chapter 6. The big picture for an ILM implementation framework 105


collaboration is very time intensive, but crucial for the rest of the ILM implementation. Depending on the size of your organization, this baseline assessment may take weeks or even months. IBM services can help you shorten this time by giving your organization a common context and frame of reference for discussing ILM within your organization. We recommend using a Storage Resource Management (SRM) tool (such as IBM TotalStorage Productivity Center for Data) to identify the information assets and infrastructure resources and services. An overview of this was given in Chapter 4, “Product overview” on page 37. Chapter 5, “An ILM quick assessment” on page 65, gives you a selection of reports we used to identify data.

After a first step in ILM implementation, you need a tool like TPC for Data to monitor the changes. You want to compare the monitored data to your new service levels to see whether you have met the expected results and benefits. You may want to refine the service levels you defined. The next step would be enforcement of those policies. Although this task will still be manual work at the beginning of your ILM implementation, your policy-based services now enable you to automate these tasks with ILM management tools.
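The comparison of monitored data against defined service levels can be sketched in a few lines. This is only an illustration of the idea, not TPC for Data output; the class names, metrics, and thresholds are invented for the example:

```python
# Hedged sketch: compare monitored storage metrics against illustrative
# service-level targets. All names and thresholds here are hypothetical.
service_levels = {
    "platinum": {"max_utilization": 0.70, "max_orphaned_gb": 5},
    "gold":     {"max_utilization": 0.80, "max_orphaned_gb": 20},
    "silver":   {"max_utilization": 0.90, "max_orphaned_gb": 50},
}

def check_service_level(storage_class, utilization, orphaned_gb):
    """Return a list of service-level violations for one monitored class."""
    target = service_levels[storage_class]
    violations = []
    if utilization > target["max_utilization"]:
        violations.append(f"utilization {utilization:.0%} exceeds "
                          f"{target['max_utilization']:.0%}")
    if orphaned_gb > target["max_orphaned_gb"]:
        violations.append(f"{orphaned_gb} GB orphaned data exceeds "
                          f"{target['max_orphaned_gb']} GB")
    return violations

# Example: a gold-class pool at 85% utilization with 10 GB of orphaned data
print(check_service_level("gold", 0.85, 10))
```

Reports defined in an SRM tool play the same role: each report encodes one policy threshold, and a non-empty result set signals a service-level violation to act on.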

Figure 6-2 summarizes the steps in this part of the implementation.

Figure 6-2 Business, assessment, and ongoing tasks

6.1.2 Application and server hardware

Applications or physical servers are often used to define service levels and tiers.

The choice of operating system may dictate the tools available. Using a single operating system or a small number of platforms simplifies the communication to each ILM management tool. This is because, for key monitored and ILM-operated servers using IBM software, an agent is required. This agent is available for most major operating system platforms. For specific details see the IBM support site for each product. Organizations with


many different operating systems see a great benefit in having only one type of agent across the different operating systems. In our big picture we used these four agents, as shown in Figure 6-3:

- Tivoli Storage Manager client

The Tivoli Storage Manager client is the backup agent. It is used to communicate with a Tivoli Storage Manager server for backup and archive, and can transfer data over either the LAN or the SAN. The Tivoli Storage Manager client optionally includes the Hierarchical Storage Management (HSM) agent.

- TPC Agent

The TotalStorage Productivity Center (TPC) agent is used to communicate with the TPC server components. It is used to monitor the file systems from a server point of view.

- CM Agent

The Content Management (CM) agent communicates with the CM server over the LAN. This agent is used to check the policies and can execute scripts on policy violation.

- SDD Driver

A common problem in complex storage configurations is the requirement to install and maintain multiple device drivers to handle hardware from different vendors. With a virtualized solution, such as the IBM SAN Volume Controller (SVC), you need only one disk multipathing driver, while supporting hardware from many storage vendors.

Figure 6-3 Server types and agents

Figure 6-3 also shows a sample tiered storage setup, which is mapped to the storage classes. So in this setup, the Web, database, and mail servers have been determined to require a platinum class of storage, and within that class are two storage tiers, T1 and T2. In the gold class, T2 and T3 are used, while in the silver class, T3 is used, plus an additional tape-based tier, T4.
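A class-to-tier mapping like this one is easy to capture in a lookup structure. The tier assignments below follow the mapping just described; the server names and helper function are ours, purely for illustration:

```python
# Storage classes mapped to the tiers they may use, following the
# platinum/gold/silver example described in the text.
class_to_tiers = {
    "platinum": ["T1", "T2"],
    "gold":     ["T2", "T3"],
    "silver":   ["T3", "T4"],
}

# Hypothetical servers assigned to a storage class.
server_class = {"web01": "platinum", "db01": "platinum", "file01": "gold"}

def allowed_tiers(server):
    """Return the storage tiers a server's class entitles it to use."""
    return class_to_tiers[server_class[server]]

print(allowed_tiers("web01"))   # a platinum server may use T1 and T2
```

Keeping this mapping in one place, rather than implicit in individual LUN assignments, is what later makes policy enforcement and reporting mechanical.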

6.1.3 Software infrastructure and automation

ILM best practices require some software tools to be standardized in your enterprise.

Figure 6-4 on page 109 shows the major software components of a full ILM implementation. These are:

- Backup/Restore/Archive/Tape Library Management/HSM tool


- Storage Resource Management (SRM) tool
- Provisioning Manager
- Content Manager
- Data mover to automate the enforcement of policies

Each software component is described in more detail in Chapter 4, “Product overview” on page 37. We suggest first looking at what you already have; nearly every client has one or more automated backup products. These should be standardized on one comprehensive product if possible. Many enterprise backup software solutions support a wide range of operating systems, which is key for a centralized solution. The backup solution should provide a script and/or API for automation, and also allow automatic deletion of data (for compliance, or because it no longer has business value) after it has been archived.

The SRM tool is used during the assessment phase to collect data on how much data you have. In the post-implementation phase, the SRM tool (for example, TPC for Data) allows you to monitor your service levels. In this case, you define reports according to your defined policies to represent the service levels you want to meet. A next step would be the automation of data placement. This requires, among other things, a provisioning manager tool (for example, Tivoli Provisioning Manager).

Hierarchical Storage Management (HSM) is the tool used to dynamically move data to less expensive storage tiers. According to your defined SLAs, which are represented as policies, you use HSM to analyze infrequently accessed data and move it to the lower tier, leaving a stub file behind. If any application or user needs to access the file, the HSM agent will automatically recall it from storage.
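The migration decision itself is simple: select files that have not been accessed for some time and exceed a minimum size. The following sketch illustrates that selection logic only; the thresholds and field names are ours, and this is not the actual Tivoli Storage Manager for Space Management policy syntax:

```python
import time

DAY = 86400  # seconds per day

def select_for_migration(files, min_age_days=90, min_size_mb=1):
    """Pick files not accessed for min_age_days and larger than min_size_mb.
    A real HSM product would replace each migrated file with a stub
    and transparently recall it on access."""
    now = time.time()
    candidates = []
    for f in files:
        idle_days = (now - f["atime"]) / DAY
        if idle_days >= min_age_days and f["size_mb"] >= min_size_mb:
            candidates.append(f["path"])
    return candidates

# Illustrative file metadata: one old report, one active database file.
files = [
    {"path": "/data/report_2004.pdf", "atime": time.time() - 200 * DAY, "size_mb": 12},
    {"path": "/data/current.db",      "atime": time.time() - 1 * DAY,   "size_mb": 500},
]
print(select_for_migration(files))  # only the old, rarely used report qualifies
```

In practice the age and size thresholds are exactly where your SLA-derived policies plug in: each storage tier gets its own values.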

Most backup/HSM and SRM tools (including Tivoli Storage Manager and TPC for Data) use metadata such as the creation/modification/access dates, file name, and file type for operation. For more sophisticated file management criteria, you need to implement a content management solution; see Chapter 9, “Data lifecycle and content management solution” on page 165, for more information.

If data retention is needed for compliance solutions, you can use an appliance-based solution, such as the IBM System Storage DR550, or a more general purpose solution, such as IBM System Storage Archive Manager.


Figure 6-4 Software components

6.1.4 Hardware infrastructure

The hardware environment has three components:

- Storage environment
- Backup environment
- Archive environment

In each of these, we might use different types of storage media. See Figure 6-5 on page 110 for an overview. It shows the more expensive and reliable storage on the left. As you move to the right, the class of storage becomes lower, down to backup, and finally archives. This corresponds to the selection of tiers: from left to right, tier 1 to tier 4 in this example. In a horizontal perspective, we show the virtualized environment, using the SAN Volume Controller (SVC). Overall, the components can be managed from a centralized SMI-S management point.


Figure 6-5 Storage Hardware Infrastructure

Storage environment

As client space requirements grow, the request for new storage solutions is not based only on user, application, and storage subsystem features. Serviceability of the new configuration is another major consideration, which requires the flexibility to choose among different vendors and technologies. Market-leading products in particular may often be incompatible with other vendors’ hardware products. Often the interconnection is simply impossible, or can only be achieved by giving up the advantage of premium features. In storage products, these are often copy services, such as synchronous copy (PPRC), FlashCopy, and snapshot features. Although most enterprise products include such additional features, many of them are not compatible between different products, even from the same vendor. Virtualization, with a product such as the SVC, is a key means to address this problem. As you can see in Figure 6-5, the SVC sits between the different storage products and the virtual logical drives presented to the server operating systems. It provides these advantages:

- Independence from disk vendors
- Tested interoperability
- Simplification in storage server deployment, because of a single multipath disk driver
- Reliability, by mirroring virtual disks across different vendors’ storage subsystems
- A single provisioning point for all your disk management tasks
- Freedom of online migration across tiers

For a detailed description of the SVC features, see Chapter 4, “Product overview” on page 37. ILM best practices require the flexibility to move data across tiers. SVC is one possibility for manually migrating data into the new storage environment. While implementing ILM best practices, the ability to build tiers is one of the foundation tasks to be done. Usually this includes various move tasks across an existing storage infrastructure, but also the task of migrating to new storage products. Downtime can be minimized by leveraging the import and migrate features of SVC. As the SVC moves data across the SAN, the migration can be done with minimal impact to the servers.

Among storage and server practitioners, a very commonly used storage partitioning rule is to have the same logical drive size for every server. While this simplifies administration, the big disadvantage is wasted space. ILM best practices save space by assigning exactly the space required. Inevitably, there will be a need for more space in the future, or to spread access evenly among all available physical disk resources. This is another feature of


SVC. You now use pools with standard-sized chunks to create so-called virtual disks. You can either increase individual virtual disks on the SVC, or add new virtual disks and concatenate them with a Logical Volume Manager (LVM). If a virtual disk is not being used, it can be deleted, returning the free space to the pool for future re-allocation.
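Chunk-based allocation from a shared pool can be illustrated with a toy model. This is only a sketch of the general technique; the chunk size and class design are ours, not the SVC implementation (SVC calls its allocation units extents):

```python
class Pool:
    """Toy model of chunk-based virtual disk allocation from a shared pool,
    in the spirit of the pooling described in the text."""
    def __init__(self, total_chunks, chunk_mb=512):
        self.free = total_chunks
        self.chunk_mb = chunk_mb
        self.vdisks = {}

    def create_vdisk(self, name, size_mb):
        # Round the request up to whole chunks, as chunk-based allocators do.
        chunks = -(-size_mb // self.chunk_mb)
        if chunks > self.free:
            raise ValueError("pool exhausted")
        self.free -= chunks
        self.vdisks[name] = chunks

    def delete_vdisk(self, name):
        # Freed chunks return to the pool for future re-allocation.
        self.free += self.vdisks.pop(name)

pool = Pool(total_chunks=100)           # 100 x 512 MB = 50 GB pool
pool.create_vdisk("web01_data", 4096)   # consumes 8 chunks
pool.delete_vdisk("web01_data")
print(pool.free)                        # all 100 chunks are free again
```

The point of the model is the last two operations: growth and deletion change only pool bookkeeping, not physical disk layout, which is what makes exact, per-server space assignment practical.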

Interoperability often requires diligence to find a common supported level of features. Advanced features like copy services are almost never interoperable among vendors. With a tiered storage environment there can be a need to copy data across tiers, that is, from an enterprise storage product to a midrange or low-cost storage product; the copy service feature is almost never compatible outside of the component class. SVC provides SAN-based copy services that are independent of storage class and vendor. This also simplifies the management of copy service licenses across the enterprise, as you only need one license. Calculate your potential savings in copy service licenses.

Backing up data from the storage environment

Most backup solutions are built on a client-server structure. It is the responsibility of the backup server to decide where the backup data is stored; thus the movement is always controlled by a backup server. This also means that backup metadata is transported across a LAN connection, while the backed-up data itself can be sent through either a LAN or a SAN connection. ILM best practices help simplify the backup configuration by reducing the amount of data to be backed up. See the IBM Redbook IBM Tivoli Storage Manager Implementation Guide, SG24-5416, for more details.

Backup environment

Traditionally, the backup environment consists of tape libraries as the ultimate backup destination. Consolidation can be achieved by using midrange to enterprise libraries, which can be divided into independent partitions. This gives the flexibility to move resources, such as tape drives or tape slots, across logical libraries without hardware intervention.

Today’s backup solutions are often based on the principle of “save everything” because it is simple to implement. However, such a mentality leads to an expensive implementation because of:

- High hardware cost for many duplicated backup copies
- Higher tape usage, and more tape drives required for tape handling
- Increasing backup time as data grows
- Increasing restore time as data grows

As your environment evolves to an ILM process-based storage environment, you are not only saving data storage costs, but also reducing backup times.

Archive environment

Today, data archiving is controlled not only by law, but also by corporate governance rules and the protection of corporate assets. Hence, there are company-critical assets that need to be archived. Typically the retention time for this data is several years; however, for specific data, this can vary from a few months to forever. And this can change, especially if there is ongoing litigation requiring your archived data. To differentiate transactional data from retention data, you should consider some storage and data characteristics, as described in Table 6-1 on page 112.


Table 6-1 Retention-data characteristics

ILM best practices urge you to perform the classification of your data as one of the first steps. But it is important to understand that no tool can attach retention value to your data automatically. This is one characteristic defined in your information classes during the first ILM assessment. The output of this ILM assessment therefore determines what ILM policies you will use to automate your data management. It is important to understand that these policies are constantly changing, depending on the numerous compliance rules. You should consider seeking appropriate legal counsel to ensure your solution is in compliance with those requirements. Clients remain responsible for ensuring that their information technology systems and data retention practices comply with applicable laws and regulations.
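Once the assessment has assigned a retention period to each information class, evaluating a piece of data against its policy is mechanical. A minimal sketch, with class names and retention periods invented for illustration (the real values come out of your assessment and applicable regulations):

```python
from datetime import date, timedelta

# Illustrative retention policies per information class.
retention_policies = {
    "financial_records": timedelta(days=7 * 365),
    "email":             timedelta(days=3 * 365),
    "project_data":      timedelta(days=365),
}

def expired(info_class, created, today=None):
    """True if data of this class has passed its retention period
    and is therefore eligible for expiration or shredding."""
    today = today or date.today()
    return today - created > retention_policies[info_class]

print(expired("email", date(2001, 1, 1), today=date(2006, 1, 1)))  # True
```

Because the policies change with the regulations, keeping them in one table like this, rather than scattered across scripts, is what keeps the automation auditable.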

The key software products of the IBM data retention and compliance solution are IBM Tivoli Storage Manager (including IBM System Storage Archive Manager), IBM DB2 Content Manager, and TPC for Data. The required hardware infrastructure uses disk space and WORM-capable tape drives or optical libraries. The IBM System Storage DR550 is a preconfigured solution for data retention and is described further in Understanding the IBM TotalStorage DR550, SG24-7091. The main focus of the DR550 is to provide a secure storage system, where deletion or modification of data is completely disallowed except through well-defined retention and expiration policies.

ILM-based archiving practices do not focus only on compliance. Depending on the defined service levels for your data, you may want to archive non-compliance-based data. The same tools can be used to analyze your data, for example, TPC for Data.

6.1.5 Management tools

Administrators are confronted with a huge number of different hardware management tools. This makes management more error prone, and human error is usually the most common cause of service outages. With the increase in different tools, much time and effort is expended on education and support of these software products. Enterprise clients also face the complexity of interoperability between different vendors. ILM best practices move

Storage characteristics of retention-managed data include:

- Variable data retention period: Usually a minimum of a few months, up to forever.

- Variable data volume: Many clients are starting with 5 to 10 TB of storage for this kind of application (archive) in an enterprise. It also usually consists of a large number of small files.

- Data access frequency: Write once, read rarely, or read never. See the data life cycle entry in the following list.

- Data read/write performance: Write handles volume; read varies by industry and application.

- Data protection: Pervasive client requirements for non-erasability, non-rewritability, and destructive erase (data shredding) when the retention policy expires.

Data characteristics of retention-managed data include:

- Data life cycle: Usage after capture, 30 to 90 days, and then near zero. Some industries have peaks that require access, such as check images in the tax season.

- Data rendering after long-term storage: Ability to view or use data stored in a very old data format (say, after 20 years).

- Data mining: With all this data being saved, we believe there is intrinsic value in the content of the archive that could be exploited.


toward a centralized management solution. A first step was made by the Storage Networking Industry Association (SNIA) in establishing the Storage Management Initiative Specification (SMI-S). The interface can be used to implement the following functionality:

- Volume management
- LUN masking/mapping
- Asset management
- Status/event monitoring

This interface is based on the Common Information Model (CIM) and Web-Based Enterprise Management (WBEM) technology. With CIM/WBEM technology, clients are able to manage multi-vendor products with a single management application. Figure 6-6 shows where this technology can be found.

Figure 6-6 Placement of WBEM and CIM technology

We used the IBM TotalStorage Productivity Center (TPC) to manage our lab storage environment. There is a TPC section in 4.2, “TotalStorage Productivity Center for Data” on page 38. Cross-vendor management of hardware is a very young and growing field, so you should carefully review the compatibility and interoperability features of a management tool. The best way to test the functionality in your environment is to perform a proof of concept (POC) with management tools from different vendors. As the next step in ILM best practices is to automate your tasks, you should pay close attention to the ability to use scripts with these management tools.

As a goal, centralized storage infrastructure management should be able to combine or automate several subtasks to help the administrator with daily management tasks. An example could be the following situation: Your application administrator requires more disk space. You, as the storage administrator, know the required service level and the derived profiles. Now you need to know the required size and performance of the new disk space. Today this task would involve the use of several tools on different storage products. It is a time-consuming task that many administrators have to skip because of a lack of time or tools, or because higher management does not understand the need for a new investment. Now imagine a centralized storage management tool that checks all your storage assets and gives you the best option by enforcing the service levels, estimates your storage growth in advance, and produces a final report that enables exact chargeback to your clients.


Manual storage management is a risk. The intent of management automation is to eliminate the possibility of human error and to reduce the time needed to provide storage to a server. These tasks are commonly known as storage provisioning. An example product is IBM Tivoli Provisioning Manager (TPM) with a storage-only focus. How does it work today without storage provisioning automation? Manually. A manual storage provisioning process requires widespread knowledge of products, from software to hardware. It involves many different management tools and, in enterprise companies, the coordination of several administrators across business lines. This is time consuming and introduces many opportunities for misunderstanding and error. As an example, here is a list of the common steps that would be required to assign a new SAN-attached LUN to a server:

1. Add a volume (storage subsystem).

a. Select a storage subsystem.
b. Select or create a new volume.
c. Select host HBA ports (WWNs).
d. Select subsystem controller ports (WWNs).
e. Map the volume to controller ports.
f. Map the volume to host HBA ports.

2. Set paths (SAN fabric switches).

g. Determine whether multiple paths are required.
h. Create or update zones.
i. Get an active zone set.
j. Add a zone to the zone set.
k. Activate the zone set.

3. Set up replication (if necessary).

4. Map the HBA LUNs to the operating system and file system.

5. Update the volume group and file system (host server).

a. Add the physical volume to the volume group.
b. Add the physical volume to the logical volume.
c. Create or extend the file system.

6. Extend the application to use additional space.

7. Reconfigure backup.

This is an impressive list of skills for an administrator to have, and it requires a lot of planning and coordination. Furthermore, you need to know your current environment: You need documentation of the actual configuration of each involved device, and also a change management document. Often each change requires a myriad of checks to be sure you have up-to-date information. Tivoli Provisioning Manager, especially in conjunction with TPC, can be used to automate these tasks. Its ability to perform tasks across end-to-end components and management disciplines is the great benefit of using TPM. If you are now thinking of automating tasks like the one in the example above, bear in mind that it is not always a good idea to simply copy the manual steps into an automation process. A considerable amount of time should be spent defining the automation steps. Walk through the steps in your workflows repeatedly to know where the risks are. Usually, defining the first automation task is the most complex, but once this first step is taken, TPM will significantly simplify and reduce the amount of time needed to manage your environment.
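A provisioning workflow engine essentially chains the manual steps listed earlier and rolls back completed steps when one fails. The sketch below shows only that sequencing-and-rollback pattern; the step names are illustrative placeholders and this is not TPM's workflow language:

```python
def provision_lun(request, steps):
    """Run provisioning steps in order; on failure, undo completed steps
    in reverse order, as a workflow engine would. Each step is a tuple
    of (name, do_action, undo_action); the actions here are placeholders
    for real subsystem, fabric, and host operations."""
    completed = []
    try:
        for name, do, undo in steps:
            do(request)
            completed.append((name, undo))
    except Exception:
        for name, undo in reversed(completed):
            undo(request)
        raise
    return [name for name, _ in completed]

log = []  # records what each placeholder action "did"
steps = [
    ("create volume",   lambda r: log.append("volume created"),
                        lambda r: log.append("volume deleted")),
    ("update zoning",   lambda r: log.append("zone added"),
                        lambda r: log.append("zone removed")),
    ("map LUN to host", lambda r: log.append("LUN mapped"),
                        lambda r: log.append("LUN unmapped")),
]
print(provision_lun({"host": "web01", "size_gb": 50}, steps))
```

This is also why simply transcribing the manual steps is not enough: each step needs an explicit, tested undo action before the workflow is safe to run unattended.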


6.2 What to do now - The many entry points to ILM

When considering ILM, it is common for organizations to start with a target like implementing tiered storage to reduce cost. This transformation is one ILM practice, but there are more. In fact, tiering is sometimes erroneously considered to be synonymous with ILM. Most small and medium businesses (SMBs) first encounter ILM because of tiering. Other entry points for ILM are in conjunction with the introduction of new disaster recovery capabilities, new security in classification of data, or new regulations or compliance requirements for long-term data archiving.

The foundation of all these entry points is tiering. Tiering your storage means defining a hierarchy of storage systems based on service requirements. But in an ILM context, tiering is only the second step, that is, the output of the first step. The first step, as we have seen already, is to know what kind of data you have and to add business value to the data.

As covered in previous chapters, going through a complete ILM process involves working through a number of business processes; establishing service level agreements (SLAs); classifying data; creating information classes; and implementing, regularly monitoring, and reviewing your changes. You will need to consider, when deciding whether to embark on an ILM project, whether the end cost/efficiency savings will justify this investment. Get the big picture first. Where is ILM evolving from? What is the roadmap?

Many service offerings are available today for a first ILM assessment. After one week of work, the output of such a review may not be quite what was expected, particularly for smaller enterprises, where the number of applications and services is limited to a few business-critical applications. Such small environments have service levels bound to every single application, and the hardware infrastructure is homogeneous, containing one single class of disk storage. Your review may end up telling you to have three service levels for your three applications, which was exactly the way you were running your environment before ILM. The advantages lie more in the storage cost savings you can achieve by eliminating the need to buy more storage. You should ask yourself why there is significant data growth in your company. The answer depends highly on you as the client. It may be because of new applications, or you may simply be running out of volume space, to name a few possibilities. As companies grow their user base, storage requests increase, and more and more space is wasted on redundant data. ILM best practices define policies and give you the tools to manage all this data by stemming the constant growth in storage requirements. It is much less expensive to introduce these practices at an early stage of storage growth. If you see the possibility of your data growing significantly in the near term, you should consider implementing ILM best practices first. And although a final ILM implementation involves significant automation, we suggest starting simply: Analyze your environment and decide, after you know about your data, what could be automated.

Note: For more information about TPM see the following IBM Redbooks and Redpaper:

- An Introduction to Storage Provisioning with Tivoli Provisioning Manager and TotalStorage Productivity Center, REDP-3900

- Exploring Storage Management Efficiencies and Provisioning - Understanding IBM TotalStorage Productivity Center and IBM TotalStorage Productivity Center with Advanced Provisioning, SG24-6373

- Provisioning On Demand Introducing IBM Tivoli Intelligent ThinkDynamic Orchestrator, SG24-8888


A common misunderstanding is to implement tiered storage and expect this alone to account for significant storage savings. In these cases, storage costs often continue to rise. How can this be? The fact is that even though the raw dollars per megabyte of storage continue to decrease, overall costs increase, because management costs form an increasingly higher proportion of the total, combined with the costs of monitoring and regulatory compliance. Enforcement of your service levels must be done by using your policies, which are the output of an ILM assessment, to configure storage resource management software.

Now that we have looked at the big picture, we can go into some specifics. There are examples of particular implementations in Chapter 7, “ILM initial implementation” on page 119, Chapter 8, “Enforcing data placement” on page 153, and Chapter 9, “Data lifecycle and content management solution” on page 165.


Part 3 Sample solutions

This section provides a three-stage solution process for implementing ILM.


© Copyright IBM Corp. 2006. All rights reserved. 117

Page 136: Ilm library techniques with tivoli storage and ibm total storage products sg247030

118 ILM Library: Techniques with Tivoli Storage and IBM TotalStorage Products

Page 137: Ilm library techniques with tivoli storage and ibm total storage products sg247030

Chapter 7. ILM initial implementation

This chapter describes how to implement parts of an ILM solution that will reduce overall storage infrastructure and management costs in a short period of time. The solution will be based on three pillars, which are:

- Establishing a storage service management process
- Reducing allocated storage space
- Implementing tiered storage

As this is the primary entry point, we provide an ILM concept that is static in the sense that data will not move automatically between the active disk storage tiers over time. The solution is based on these three components:

� TotalStorage Productivity Center for Data� SAN Volume Controller� IBM Tivoli Storage Manager for Space Management


7.1 Storage management

A first step in the initial ILM implementation scenario is to implement and adapt storage management practices. The goal of this step is to create a service-based storage environment that has clearly defined goals and rules on what will be delivered to the users.

When starting to define a service-based environment, several items should be documented and created. The starting point is always the governance model. The governance model is the high-level service definition, which includes:

• The principles, which define the general rules by which the storage service is implemented and run. For each defined principle, the benefits that it brings to the business should be listed, as they are the justification for adopting it. In addition, principles should include the key tasks, required organization, and cost for the application of the principle. An example of a principle could be: “All business related data residing on the IT systems will be stored on a centrally managed storage infrastructure.” Without such principles there is no foundation for any decision leading to change, making it very difficult to justify the changes. This becomes even more important when embarking on ILM projects, as they set a direction for the future that might take some time to realize fully.

• The policies, which define how things will work from a high-level perspective. For example, a policy could define who has the responsibility for a certain task or how a certain goal will be reached. A policy could, for example, define the security level for certain data, who can access it, and why.

• Finally, the guidelines, which provide a view of the future: the expected goals and their benefits. For example, in an ILM context, a guideline could be that storage devices should be utilized to a certain level (for example, 85 percent), or that data should be stored according to its business value.

Although the above might seem abstract and excessive, going through this exercise sets a basic framework to facilitate the decisions required to reach the final goal. Implementing an ILM-based solution will require cooperation and decision making at a global level in an organization, with clear and stated management support.

In addition to the governance model, the service definition should also contain the processes that will be followed. The following list provides an overview of possible processes involved in storage management activities:

• Capacity Management
• Provisioning Management
• Performance Management
• Procurement
• Monitoring and Alerting
• Reporting
• Backup
• Asset Management
• Incident/Problem Management
• Policy Management, Relationship Management, Billing and Charge back

When mapped to the ITIL framework, the following two processes are most closely related to the first step of the ILM solution:

• Capacity management
• Service level management

Although other ITIL processes like availability management, business continuity management, change management, and financial management are also important, they will not have a direct input for the ILM implementation tasks. There will be relationships between all of these processes. For example, availability management will be required to control violations in the service level managed by the service level management process. Also, financial management will be called upon for expansion of capacity when the capacity management process requests such an expansion.

In the following two sections we focus on the primary two processes, which will provide valuable input for the design of the ILM solution.

7.1.1 Capacity management

The first process that we discuss is the capacity management process. The capacity management process focuses on the following three points:

• Monitoring the IT infrastructure and supporting components (resources) from a performance and usage aspect

• Improving efficiency by performing well defined tuning actions

• Planning for future requirements and growth, including the refresh of technology

The main goal of this process is to have the infrastructure available to meet requirements as they arise, at optimal cost. If this process is not followed, infrastructure expansions tend to be reactive (rather than proactive), and cost analyses are done ad hoc for individual requirements, rather than included in a general plan. Costs are often higher when requests are handled and procured individually rather than grouped together (more components in one procurement action).

In addition, a capacity management plan ensures that no components are forgotten when a new request is fulfilled. For example, when adding systems to a storage environment, you need to review not only storage capacity, but also SAN capacity, cabling, host FC HBAs, and so on.

A process as defined by ITIL will always have an input and an output (see Figure 7-1). Depending on the type of process, the required input information and output results will differ.

Figure 7-1 Generic process

For the capacity management process, the following input can be used:

• A technology review, in order to understand how technology can help to achieve the strategy and goals. Since we are providing a storage service that relies heavily on the available storage infrastructure, knowing the current capabilities will help to define the possibilities on offer. An example of this is the available I/O rate on disk devices. It makes no sense to specify a rate that is higher than that currently available on the market.

Note: The ITIL definition of capacity management includes performance management.


• The existing service level agreements (SLAs). In 7.1.2, “Service level management” on page 124, we describe the service level management process. The link to the capacity management process is that a good service level agreement should specify the expected growth and future projects that are planned. For example, a service level agreement might state that a certain application currently needs 100 GB of disk capacity, and that it will need an additional 100 GB in 6 months. In addition, the SLA will also state the expected performance levels.

• The planned change requests.

• The financial and budgetary information, which provides a view on what can be spent on capacity or performance enhancements, both for the planned growth and the tuning operations.

The output of the capacity management process includes the capacity plan, the required capacity reports, alert levels, and recommendations to include in the service level requirements.

Now let us consider the capacity management process activities or tasks.

Tasks in capacity management

Figure 7-2 gives an overview of the tasks that make up the capacity management process—monitoring, change, tuning, and analysis.

Figure 7-2 Capacity management tasks

The monitoring activity monitors the utilization of the storage resources, covering both usage (current and future) and performance. Because these metrics are part of the service level definitions, monitoring can also be used to report on service level violations.

The reports required can be created using TPC for Data. An example is shown in Figure 7-3 on page 123.


Figure 7-3 Space usage over time

In addition to the growth reports, TPC for Data can also be used to create exception reports whenever the capacity of a storage component reaches a defined high threshold.
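The logic behind such an exception report is simple. TPC for Data implements this natively; the following sketch only illustrates the idea, and the file system names and 85 percent threshold are invented examples.

```python
# Hypothetical sketch of a capacity exception report. TPC for Data provides
# this natively; the names and the 85 percent threshold are invented examples.

def capacity_exceptions(filesystems, threshold=0.85):
    """Return (name, utilization) pairs for components over the threshold."""
    return [(fs["name"], fs["used_gb"] / fs["capacity_gb"])
            for fs in filesystems
            if fs["used_gb"] / fs["capacity_gb"] > threshold]

report = capacity_exceptions([
    {"name": "/data", "used_gb": 90, "capacity_gb": 100},
    {"name": "/home", "used_gb": 40, "capacity_gb": 100},
])
print(report)  # [('/data', 0.9)]
```

In practice, the threshold would be the utilization guideline agreed in the governance model, and the input would come from TPC for Data scans rather than hand-built dictionaries.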

The analysis activity will be used to show trends in performance and utilization, and can be used to plan for growth. Note that growth can be attributed to two factors:

• Natural growth from existing applications and usage
• Change-induced growth from changes to existing applications or the implementation of new applications

This analysis process will only reveal natural growth. Change-induced growth is an input to the capacity management process coming from the change management process. In addition, the analysis phase can be used to compare current utilization against the baseline set in the initial architecture.

The tuning step and subsequent change step define and implement better resource usage techniques based on the results of the analysis phase. After a change action, all phases are repeated to review the change and to find new issues or exceptions to the defined service levels.

The capacity plan

One of the goals of capacity management is the creation of a storage capacity plan. A capacity plan will be the primary input for all further storage infrastructure investments. As a result, it will allow the fulfillment of future storage requests, in line with business requirements. The plan should contain a current usage report, and future utilization trend information—ideally, split into near-term, mid-term, and long-term.
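The trend arithmetic behind such a plan can be sketched in a few lines. This is an illustrative helper only: the capacities, growth rate, and 85 percent ceiling are invented examples, and real figures would come from TPC for Data trend reports.

```python
# Illustrative sketch of capacity-plan arithmetic: given current usage and a
# steady monthly growth figure, project when utilization reaches a planning
# ceiling. All numbers are invented examples.

def months_until_ceiling(capacity_gb, used_gb, growth_gb_per_month, ceiling=0.85):
    """Months until usage reaches the planning ceiling (for example, 85%)."""
    headroom_gb = capacity_gb * ceiling - used_gb
    if headroom_gb <= 0:
        return 0.0  # already at or over the ceiling: plan an upgrade now
    return headroom_gb / growth_gb_per_month

# 10 TB array, 6 TB used, growing 250 GB per month -> ceiling hit in 10 months
print(months_until_ceiling(10_000, 6_000, 250))  # 10.0
```

A real plan would of course distinguish natural growth from change-induced growth, which this linear projection cannot capture.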


7.1.2 Service level management

Along with capacity management, service level management is a very important part of the storage management optimization process. The goal of service level management is to create and improve an IT service (like storage), so that it aligns with client requirements and cost justifications. The base component of the service level management practice is the service level agreement (SLA). An SLA is a two-sided agreement between the service provider and the user. It documents the service targets and responsibilities. Although focus is usually given to establishing the targets, the responsibilities are equally important. They document who is responsible for what, and set the conditions under which the service levels are met. Figure 7-4 describes the activities in the service level management process and the relationships between them.

Figure 7-4 Service level management activities

Establishing a service level

The targets or service level requirements are a set of specifications that the storage service should meet. The following list provides the high-level topics that can be included in a service requirement definition:

• Availability

The availability component of the service level specifies the time periods in which the service is available. It includes the overall availability (for example, 99.9 percent) and can include specifications for planned and unplanned downtime, acceptable periods for planned downtime, and required advance notice.

• Performance

The performance component of an SLA sets the minimum performance levels that the infrastructure should meet. The most common metrics used are throughput and response time.

• Recoverability

The recoverability component defines what data should be recovered in the case of failure, and how long recovery should take. Commonly used specifications are the recovery time objective (RTO) and the recovery point objective (RPO). The RTO is the time needed to recover from a disaster (in other words, how long you can afford to be without your systems). The RPO describes the age of the data you want the ability to restore in the event of a disaster. For example, if your RPO is 6 hours, you want to be able to restore systems back to the state they were in no longer than 6 hours before the disaster. To achieve this, you need to be making backups or other data copies at least every 6 hours. Any data created or modified inside your recovery point objective will either be lost or must be recreated during a recovery. If your RPO is that no data is lost, synchronous remote copy solutions are required.

It might be useful to describe the recoverability service levels related to different types of disasters. Clearly, the RTO will be different in the case of a site-wide disaster, compared to an accidental file deletion (which some users may also describe as a disaster).

• Accessibility

Probably the most abstract component of the storage service level, the accessibility component also describes the most important part of data management from an ILM perspective. Accessibility provides information about required capacity and planned growth, but also about the conditions for moving data from one storage class to another. In addition, it can describe the access patterns of a certain data type, for example, block or file, sequential or random.

• Security

Data classification includes a security component, for example, confidential data, auditable data, and so on. It can also include regulatory requirements for data—data that must be retained for set periods of time, and so on.

• Support

The support component describes what the help desk will do, and when it will do it. It should define response times for different types of incidents, as well as document the types of incidents to which the help desk will respond.

• Billing

Finally, the billing component describes the methodology used to charge for storage services and set the cost, mostly in terms of the capacity used. This component is only required in the SLA for environments that have implemented billing for internal IT services.
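The recoverability arithmetic described above (an RPO of 6 hours implies copies at least every 6 hours) can be sketched as a simple check. The helper and interval values below are invented for illustration, not part of any product.

```python
# Worked example for the recoverability component: a backup schedule meets an
# RPO only if copies are taken at least as often as the RPO allows.
# The intervals are illustrative assumptions, not recommended values.

def meets_rpo(backup_interval_hours: float, rpo_hours: float) -> bool:
    """True if no two consecutive copies are further apart than the RPO."""
    if rpo_hours == 0:
        return False  # zero data loss needs synchronous remote copy, not backups
    return backup_interval_hours <= rpo_hours

print(meets_rpo(4, 6))  # True: 4-hourly copies satisfy a 6-hour RPO
print(meets_rpo(8, 6))  # False: up to 8 hours of data could be lost
```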

As this list indicates, creating a service level for storage can be a complex and time-consuming task. However, in most cases, starting with just a subset (for example, one or two) of the components will allow the construction of data classes and subsequent storage classes or storage tiers. When considering storage tiers, the four most applicable components are availability, performance, recoverability, and accessibility.

A way to approach the actual creation of the requirements is to ask the users for their most important attribute of the storage service—availability, performance, recoverability, or accessibility. You can then focus on the most important component or components, reducing the complexity of the storage service level. As the service matures, additional components can be added. A second approach is to focus on the user pains. By asking which part of the storage service can be improved or what are the biggest issues, a view of things to fix and to define can be easily gained. While this has the potential advantage of fixing issues and improving the user satisfaction in the short-term, be careful not to set unrealistic expectations for the result. This might disappoint the users and inhibit further cooperation on service level agreements.

Creating a service level agreement

The flowchart in Figure 7-5 on page 126 provides an example of the steps involved in creating an SLA.


Figure 7-5 Steps to create a service level agreement

The steps are:

1. Define who will be involved in the SLA creation process; in other words, who is the user. For storage services, determining the users is not always as straightforward as it might seem, because the storage service is mostly a component of a larger IT function (for example, e-mail, Internet, intranet, file serving) delivered to business users. In most cases, the system or application administrators can provide the required input for the service level requirements. This presumes that an SLA already exists for the overriding IT function. If this is not so, the users of the IT function should be included in the service level definition process.

2. Create a questionnaire to allow the users to define the most important components of the storage service. As noted above, this can be done based on availability, performance, recoverability, and accessibility. The most difficult part here is to create a questionnaire that translates storage parameters into user terminology. Be careful not to overstate the requirement: users tend to set standards that are higher than the ones they actually require. To avoid this, work with a predefined set of available service levels, including a charge back or indicative cost component.

You should analyze the questionnaire answers to decide which components to focus on.

3. A draft service level proposal can be created based on the input received. It should be reviewed early on to see whether the required service levels have achievable targets. It is pointless to create a service level definition for a storage service that is impossible to deliver with available technology at a reasonable cost. It is also important in this phase to check whether monitoring can be accomplished for the most important parts of the service level objectives. For more details on the monitoring part, see the next section, “Monitoring” on page 127. If the review identifies either achievability or monitoring as critical factors, you may have to renegotiate the requirements.

Once the feasibility check is complete (and successful), the service level objectives can be agreed upon to create the service level agreement.

Note: A single function can have multiple service levels, depending on the type of data.


Monitoring

Monitoring is a critical part of service level management. It includes monitoring all components of the service level to allow further analysis. When defining monitors, the most important thing is that the monitors measure something meaningful, as expressed in the service level, and that they are aligned with the users’ perceptions. It is a good practice to combine measurements with user reviews to ensure that the measurements are in line with what the user experiences.

In addition, remember that the service level agreement consists of two parts:

• The agreed-upon service level objectives

• The agreed-upon conditions or user responsibilities under which these objectives can be obtained

As a result, the monitoring activity should also cover the responsibilities and conditions defined in the service level, to make sure that they are not the reason a service level is missed.

Reporting

Reports allow the user and service provider to review the service levels delivered. The reporting part of a service level agreement should include the following:

• How the reporting will be done and to whom it will be distributed. This also includes details on how measurements will be taken.

• What will be reported on.

• When reports will be released. There are two basic types of report, triggered by different events:

– A periodic report, triggered on a time basis

– A service level breach report, triggered by an exception condition, which reports on the incident and eventual corrective actions

7.2 Optimization of storage occupation

This section discusses the storage occupation optimization part of the initial ILM implementation scenario. It provides information about how to reduce the occupied space, turning it into allocatable free space. Free space allows the storage administrator to have capacity available for future requirements, delaying capacity upgrades and reducing their frequency.

Figure 7-6 on page 128 provides an overview of storage space allocation, and how the space is divided up.


Figure 7-6 Overview of space usage

The initial capacity, or raw capacity, is the total capacity of all disk devices available in the storage subsystems. This raw capacity will typically be grouped into RAID arrays. The overhead introduced at this level depends on the chosen RAID level. For RAID 10 arrays, it is around 50 percent. For RAID 5 arrays in a 7-disk plus parity configuration, the overhead is 12.5 percent. These overheads are required for availability reasons.
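The overhead figures above follow directly from the array geometry, as the following sketch shows. The array width is an assumption for illustration; real subsystems also reserve space for spares and metadata, which this sketch ignores.

```python
# Sketch: usable capacity after RAID overhead, per the figures in the text.
# The array width is an assumption; real subsystems also reserve space for
# spares and metadata, which this sketch ignores.

def usable_capacity_gb(raw_gb: float, raid_level: str,
                       disks_per_array: int = 8) -> float:
    """Capacity left for host allocation after RAID redundancy overhead."""
    if raid_level == "RAID10":
        return raw_gb * 0.5                                # mirroring: 50% overhead
    if raid_level == "RAID5":
        # one parity disk per array; 7 data + 1 parity -> 1/8 = 12.5% overhead
        return raw_gb * (disks_per_array - 1) / disks_per_array
    raise ValueError(f"unknown RAID level: {raid_level!r}")

print(usable_capacity_gb(1000, "RAID10"))  # 500.0
print(usable_capacity_gb(1000, "RAID5"))   # 875.0
```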

The usable capacity is the actual amount of capacity that will be available for allocation to the hosts. This capacity is divided into two parts:

• Allocated capacity, which is space that is currently allocated to hosts

• Free capacity, which is the part of the usable capacity that we want to increase so that it is usable for new host allocation requirements

To allow an increase in the free capacity, we focus on two parts of the allocated capacity:

• The part of the allocated space that is currently not in use, that is, unused space. For details on this, see 7.2.2, “Avoiding over allocation” on page 146.

• The part of the used space that is reclaimable. Reclaimable space is currently occupied by data that has little to no business value. The next section, 7.2.1, “Reclaimable space” on page 128, discusses the different components of this part of the used space, and shows techniques to reclaim it.

7.2.1 Reclaimable space

As explained above, data is of two types: valuable business data and data that can be considered unnecessary (non-business data). When optimizing storage, it is a good practice to first reduce (or remove) this data from the storage infrastructure. Consider an analogy of moving from one house to another. When you do, you have two options:

• Perform a cleanup/consolidation activity before moving, throwing away anything that is broken, or that you no longer need. While this requires some effort up front, it results in faster packing and faster moving using fewer resources, and makes the job of fitting everything in the new house much easier, since everything can go directly where it is supposed to go.

• Pack everything, move everything, and clean up after the move. While this is easier, as no pre-sorting is required, you will spend longer packing and moving everything in your house; use more resources like boxes, truck space, and gas; and the unpacking will be more difficult. This is because you have more things than you need, so the tendency is to dump everything in any room, then potentially have several iterations of moving things to their final storage location. If you have not reviewed your possessions before moving, you might even buy a larger house than you really need, just to have space to store unnecessary items.

In the storage optimization process, sorting the data and removing any unnecessary data before starting the move to tiered storage makes the capacity planning and the actual data placement faster and more accurate. The unnecessary data is of one of the following types:

• Non-business data - Data that has no business use. Often this data is collected and stored by users contrary to (unenforced) company policies (personal files or files downloaded from the Internet).

• Temporary files - For example, files that are created during installations, or dump files.

• Duplicate data - Multiple versions of the same data object.

• Stale data - Data that has not been accessed in a long period of time, that belongs to users who are no longer active, or that was part of obsolete applications.

We now discuss techniques to locate and determine these data types.

Non-business data

While it is clear that non-business data should not reside on enterprise-class storage, few people actually implement rules or processes to enforce this. While this might seem illogical, the most common reasons are very understandable, namely:

• It is difficult to identify what is business data and what is not.
• There is no accurate idea of how much storage space is consumed by this type of data.
• There are no clear SLAs with end users on how to manage non-business data.

Distinguishing business data from non-business data is not an easy task. While a naive blanket rule might be to prohibit files based on their file extension, for example, media files like mp3, mpeg, wmv, and wma, the growing popularity of technologies such as e-learning and podcasting makes these file types appropriate to a business context in some cases. Therefore such rules are too coarsely grained. Another issue is that it is not always easy to know what volume this type of data represents: if non-business data has not even been identified, how can you measure how much storage it is consuming? Finally, in most cases, there is no existing agreement between users and the storage service provider on what types of data should be stored on the storage systems.

Here we give one methodology for differentiating between business and non-business data.

1. A prerequisite for this method is the adoption of a critical guiding principle—all managed data should be described within a service level agreement (as described in 7.1.2, “Service level management” on page 124).


Adopting this principle will allow us to map the data to the services supported and identify which users and applications should have which data. Then we can proceed.

2. Map each function described by an SLA to the actual applications used. For example, the e-mail service might be provided by the Lotus Domino application. File serving should also be included.

3. After identifying the applications, the next step is to identify the location of the application data. The data location can be defined as the system on which it resides, and/or the directories in which the data is placed. Depending on the consolidation of applications on a system, choose one or both.

4. Using TPC for Data, create a report of the types of files (based on file extension) that use the most space. A sample is shown in Figure 7-7. You should limit the number of types detected to reduce the subsequent complexity; depending on the usage level, between 10 and 20 file extensions might be sufficient. The actual number of file types to investigate depends on the utilization. For example, if the top five file types use 95 percent of your storage, do not bother examining 20 file types. If the top 20, however, only represent 50 percent, you will need to add more file types.

Figure 7-7 Top 10 file types using the most space

5. Create a matrix to map applications to file types, indicating which file type is used by which application. You will need the help of the application or system administrators. For example, a file extension of .nsf is a Lotus Domino database, and an extension of .xls is used by Microsoft Excel. Table 7-1 provides an example.

Table 7-1 Define which application uses what file type

              Application A   Application B   Conclusion
File type 1   Yes             No              Used by A
File type 2   No              No              Not used
File type 3   Yes             Yes             Used by A&B

In the above table, we can see that file type 2 is not used by any application. As a result, this type of data should not exist. The other file types are considered valid, since they are used by at least one application.

6. Create a TPC for Data report that shows the file types relative to their location. Based on the other information gathered, we can now create an exception report that lists all file types that are not in their defined location. With the above information we can now conclude the following about the file types existing in the storage environment:

– File types that belong to an application and are in the correct location. These are part of the business data.

– File types allowed by the applications, but that are not in the correct location. These might be non-business data.

– File types that are not associated with any application, and as a result are not defined by an IT function. These files are probably non-business data.

7. After having identified the non-business data, the final step is to deal with it. Basically, there are three options:

– Create exception reports and communicate them to the users, indicating violations of the service level agreement.

– Move identified data to a quarantine location, from which data can be recovered if required.

– Delete the files.

This whole process is summarized in Figure 7-8 on page 132.
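The location and file-type checks in steps 5 and 6 can be sketched as follows. This is a hypothetical illustration: the extensions, application names, and directory paths are invented examples, and a real implementation would take its input from TPC for Data reports.

```python
# Hypothetical sketch of the step 5-6 checks: classify a file by mapping its
# extension to an application and verifying its location. All extensions,
# application names, and paths below are invented examples.
import os

APP_BY_EXTENSION = {".nsf": "Lotus Domino", ".xls": "Microsoft Excel"}
LOCATION_BY_APP = {"Lotus Domino": "/domino/data",
                   "Microsoft Excel": "/shares/finance"}

def classify(path: str) -> str:
    ext = os.path.splitext(path)[1].lower()
    app = APP_BY_EXTENSION.get(ext)
    if app is None:
        return "probably non-business"    # no application claims this file type
    if path.startswith(LOCATION_BY_APP[app]):
        return "business"                 # valid type, correct location
    return "possibly non-business"        # valid type, wrong location

print(classify("/domino/data/mail.nsf"))  # business
print(classify("/home/user/song.mp3"))    # probably non-business
```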


Figure 7-8 Defining non-business data

The issue with this method is that it might be difficult to apply to file servers. File servers often do not have specific file types that are considered to be allowed. If there is such a specification, then there is no problem. If there is no such specification, then it is very likely that there is non-business data on the file servers, as that is a common place for users to store files. As a result, an additional control should be added to the location and file type conditions we defined above. One way to do this is to create a TPC for Data report listing the top space-using file types as a function of the users to whom these files belong.

When this list has been generated, it should be carefully reviewed. Some users will be able to justify using a lot of storage for a certain type of file (for example, a graphics designer can easily justify having a high number of image files). For users who do not have a business need for a particular file type (or who exceed an acceptable quota for these types of files), further investigation will be needed to define the nature of these files more precisely. This carries some risk:

• Creating a negative user relationship
• Needing to repeat these analyses regularly

As a result, be careful before starting this investigation process. Make sure the possible space reclamation is sufficient to justify the steps taken. If the suspected data volumes are only a small fraction of the space you want to reclaim, it is probably not worth pursuing. This assumes that a reclamation goal has actually been set. A quick way to determine the potential space gain is to look at the top space-using file types: if the reported space usage is much less than the reclamation goal, the results of this operation might not be optimal.

In addition, these operations should be defined clearly in the SLA as part of the reporting activity. An example of an SLA statement could be:

“A monthly review will be performed by the storage management team, which will generate and distribute a list of the following usage violations for the file servers:

• The top 20 file types that use the most disk space

• The users owning the above files

For each violation, a justification will need to be provided. If the files cannot be qualified as business data, a delete operation will be performed after informing the file owner.”

Temporary files

A second category of non-business data is temporary files. Temporary files are typically created at one point in time, and are intended to be deleted shortly afterwards. This, however, is not always the case. As a result, controlling the space occupied by temporary files might be an easy way to reclaim space. Again, the main issue is identifying these files, as they can be located in different locations throughout the file systems. One advantage we have is that they usually follow some naming convention. As a result, the following rules could be used to detect them:

• Temporary application files

– Look for files ending with tmp (*.tmp) and files containing a tilde (~) somewhere in the file name (~*.*, *.~*).

– Look for temporary directories or file systems like tmp, temp, and temporary.

• Dump and trace files

– Look for files ending with dmp (*.dmp) or having a name like dump (dump*).
– Look for files ending with trc (*.trc) or having a name like trace (trace*).
– Look for directories or file systems like dmp or dump.

• Log files

– Look for files starting or ending with log (log* or *log).
– Look for log directories.

This list can be supplemented with specific temporary files. Again, the application administrators should be involved to help define what is temporary data. It could be included in the application data definition activity in the determination of non-business data (see “Non-business data” on page 129 for details).
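The naming rules above translate directly into wildcard patterns. A minimal sketch in Python — the pattern and directory lists are a starting point taken from the rules above, not an exhaustive definition of what counts as temporary data:

```python
import fnmatch

# Patterns from the detection rules above; extend per application input.
TEMP_FILE_PATTERNS = ["*.tmp", "~*.*", "*.~*", "*.dmp", "dump*",
                      "*.trc", "trace*", "log*", "*log"]
TEMP_DIR_NAMES = {"tmp", "temp", "temporary", "dmp", "dump", "log"}

def is_temporary(path):
    """Return True if a path matches the temporary-data naming rules."""
    norm = path.lower().replace("\\", "/")   # tolerate Windows paths
    name = norm.rsplit("/", 1)[-1]
    if any(fnmatch.fnmatch(name, pat) for pat in TEMP_FILE_PATTERNS):
        return True
    # Any parent directory named like a temporary location also qualifies.
    return any(part in TEMP_DIR_NAMES for part in norm.split("/")[:-1])
```

Summing file sizes for every path where `is_temporary` returns True gives the same figure as the TPC for Data profile built later in this section.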

Once you have identified the correct types, reports from TPC for Data can be analyzed to determine the space occupied by the temporary files. It should be clear that this space cannot be considered automatically deletable: temporary files can contain valid information that is required for operations. They should follow one rule, however: there is no reason for temporary data to grow more rapidly than other data. So the amount of temporary data must be compared to the total used space and monitored over time. Ideally, the ratio should stay constant.

Chapter 7. ILM initial implementation 133

First, define the baseline ratio (that is, the current percentage of space used by temporary files). Again, TPC for Data can give this information. Next, monitor to see if the percentage changes over time. The following cases will explain what the results could be:

• Constant ratio

The first possible result (see Figure 7-9) is that the ratio of the space used by temporary files to the total used space remains constant over time. This means that there is no increase in temporary files; or better, no increase larger than the increase in total used space. Make sure to trend this ratio over a reasonable amount of time, and compare start values with end values. You do not want to mistake a one-off spike for a continuing trend, for example, when lots of trace/dump data is collected for problem determination, or during an application migration.

Figure 7-9 Ratio between temporary space and used space remains constant

If the ratio is constant, there is a good chance that you will not be able to reclaim any of the temporary space.

• Increasing ratio

Your monitoring may show the ratio increasing, as shown in Figure 7-10 on page 135. If so, it is likely that applications are not cleaning up their temporary files, or that system and application administrators are using high-end disk space to store dumps or trace files. It certainly warrants a closer investigation. You should also realize that the increasing space consumption has probably been going on for some time; it did not just start when you began monitoring. This means that the initial ratio is probably too high, and a cleanup operation might provide significant space gains.

Note: Even if the ratio remains constant, you should look at the initial ratio between temporary space and used space. If this seems high, a review of used temporary space might be required to determine if too much is being kept. You could do a one-time purge of very old trace/dump and other temporary data to reclaim some space.

Figure 7-10 Ratio between temporary used space and total used space increasing

A point of discussion is the actual increase percentage at which this investigation should be triggered. Suppose you are monitoring a system that has a 20 percent ratio between temporary and total used space. If you see an increase of 1 percent per month in the ratio, your ratio will increase to 32 percent by the end of the year. If you have a yearly growth of 20 percent on your total storage, the temporary space would represent 6 percent of this growth for one year.

• Decreasing ratio

A final possibility is a (sudden) decrease in the ratio. This could be caused by:

– A cleanup operation in the temporary space
– An increase in the used space

Neither of these necessarily indicate a problem situation. However, you should reset the baseline to the lower ratio to allow accurate future trending.

Note: A sudden increase in the ratio, with an almost constant curve before and after the increase, does not mean there is an issue, as this could simply be the result of a reclamation operation in the used data volume not impacting the temporary space. If this occurs, you should, however, reset the baseline ratio to allow further comparisons.


Figure 7-11 Decrease in ratio between temporary and used space

When starting to review the temporary space, be sure to add a section in the service level objectives to cover this operation. For example, you could add a clause in the conditions part of the objectives that indicates what is an acceptable ratio between temporary and used space. If this ratio is exceeded, a corrective action (remove files) will be performed, or justification will be required from the group owning the system or application.
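Such an SLA clause boils down to a threshold check against the baseline ratio. A minimal sketch — the function name, the returned labels, and the 5 percent allowance are illustrative assumptions, not values from this book:

```python
def check_temp_ratio(temp_gb, used_gb, baseline_ratio, max_increase=0.05):
    """Compare the current temporary/used ratio with the baseline.
    Returns 'ok', 'investigate' (ratio grew past the allowance and
    justification or cleanup is needed), or 'reset-baseline' (ratio
    dropped; re-baseline to keep future trending accurate)."""
    ratio = temp_gb / used_gb
    if ratio > baseline_ratio + max_increase:
        return "investigate"
    if ratio < baseline_ratio - max_increase:
        return "reset-baseline"
    return "ok"
```

The 'reset-baseline' branch corresponds to the decreasing-ratio case above: a drop is not a problem, but the baseline must be lowered so later comparisons remain meaningful.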

Now we show how to create TPC for Data reports on temporary space in the file systems using filters on directory names. For example, we could define a filter that shows the space usage in any directory that starts with tmp or temp.

Start by creating a profile that will gather this information, as shown in Figure 7-12 on page 137.


Figure 7-12 Creating a profile - Defining statistics to gather

On the File Filters tab, define filters to search for the temporary directories (Figure 7-13).

Figure 7-13 Defining the file filters

When the profile has been created, define a scan that will use this profile (Figure 7-14 on page 138 and Figure 7-15 on page 138).


Figure 7-14 Defining the scan systems

Figure 7-15 Defining the scan profile

Now run the scan and go to the reporting section. The reports we need are located in the File Summary section. Select the correct profile and generate the report (Figure 7-16 on page 139).


Figure 7-16 Generating a report

In this example, the results (shown in Figure 7-17) are shown by file system. Ideally, the temporary space should be analyzed system wide. This is because we are looking at the overall ratio between temporary space and used space. Temporary data often resides in a different file system from the application data, for example, the file system in which the operating system and applications are installed. The generated report shows the space used by temporary directories, as well as the space used in the entire file system.

Figure 7-17 Temporary space report

Duplicate data

A third component of the non-business data is duplicate data. Duplicate data is made up of:

• Data intentionally duplicated (replicated) to allow two applications to use the same data.

• Data duplicated by users, typically on file servers. Maybe everyone wanted their own personal copy of the CEO’s last address to the stockholders.

We discuss the first case in Chapter 8, “Enforcing data placement” on page 153.


For the second case, since duplicate files basically contain the same information, they should not exist. With TPC for Data, duplicate files can easily be located throughout a file system. This means that we do not have the same detection problem as with the previous two types of non-business data. The problem here is to reclaim the space used by duplicate files.

Duplicate data cannot simply be consolidated by deleting all of the extra versions of a file. Doing this would obliterate one of the important pieces of metadata of the file, namely the path in which it is located. As most users and applications rely on the path as the means of locating the file, removing or changing the path would mean a loss of data. So, in order to remove or reduce duplicate data, data sharing practices must change.

On UNIX systems, symbolic links might solve the above issue. This works only if the files remain duplicates over time.
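A minimal sketch of that consolidation on a UNIX-style system — the helper name is ours, and anything like this should only run after you have verified that the files really are identical:

```python
import os

def consolidate(canonical, duplicate):
    """Replace `duplicate` with a symbolic link pointing at `canonical`,
    so the familiar path keeps working while the space is reclaimed.
    Note that the link breaks if `canonical` is later removed or renamed."""
    if os.path.realpath(canonical) == os.path.realpath(duplicate):
        return  # already the same file, or already linked
    os.remove(duplicate)
    os.symlink(os.path.abspath(canonical), duplicate)
```

This preserves the path metadata discussed above: users keep opening the duplicate's path, but only one copy consumes space.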

On Windows file servers, consider rearranging the data, moving from a user-centric schema for the home directories to an organization-centric schema. For organizations that are project oriented, a project-based directory might also be appropriate. See Figure 7-18 for an example.

Figure 7-18 Organizational and project-based file-sharing structure

With the above structure in place, it becomes possible for end users to share data files among several people, reducing the need for each of them to keep their own copy. Once this is available, TPC for Data can be used to generate lists of duplicate data still residing in user directories, and this list can then be used to build awareness or to enforce the use of the shared directories.

Note: TPC for Data detects duplicate files by comparing the metadata of the files. However, it will not scan the files for contents. This means that there is no 100 percent guarantee that both files will actually be identical. On the other hand, it also does not detect duplicate information stored in files with different metadata (for example, different names).
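If you want that 100 percent guarantee before deleting or linking anything, a content hash settles it. A sketch (the helper names are ours; this is independent verification, not a TPC for Data feature):

```python
import hashlib

def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks so large files
    do not have to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def truly_identical(path_a, path_b):
    """Confirm a metadata-based duplicate candidate by comparing contents."""
    return file_digest(path_a) == file_digest(path_b)
```

Running this over the duplicate candidates reported by the metadata scan separates true duplicates from files that merely share a name and size.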

(Figure 7-18 shows the directory structure: standard user home directories; a directory per organization with a Members subdirectory, containing files for all members of the organization, and a Global subdirectory, containing files from members of the organization for sharing with other organizations; and a directory per project with a Team subdirectory, containing files for the project team, and a Global subdirectory, containing files for other project teams.)

There is much that could be done by future technology solutions in the area of duplicate data. We have considered here only file-based data. Semi-structured data, like e-mail systems, are an ideal breeding place for duplicate data copies because of frequent forwarding of attachments. Ideally, a solution to this should be implemented at the mail server, so that attachments would actually point to a central file repository. Another way to attack this is to restrict the maximum size of forwarded e-mails, although this has separate user-related issues.

Structured data, like databases, also often contains duplicate data because of lack of control or poor design: tables with duplicate information, duplicate records, and so on. We expect solutions and service offerings that address these issues to become more prevalent in the future.

Stale data

A final type of non-business data is stale data. Stale data is data that has not been accessed in a certain period of time, or data that has access rates below a certain threshold. Stale data can be separated further into the following categories:

• Obsolete data (for example, information that is no longer current or data that is no longer used by applications)

• Archive style data (for example, snapshots or point-in-time copies of particular files)

• Data belonging to users who no longer work for the company (also commonly called orphaned data)

• Data that is accessed periodically, but with long periods of inactivity (for example, financial data that is used at the end of the quarter or year)

These are quite distinct categories; therefore, different handling methods are required. Obsolete data, for example, could be deleted; archive and orphaned data could go to offline tape storage; periodic data could go to nearline storage.

The issue again, however, is to distinguish these four types of data. Unless they are clearly defined already (which is highly improbable), there is no intrinsic way to determine which data is of which type. Therefore, we can only differentiate them according to the available metadata, that is, the inactivity period.

A key point is to have a clear view of this inactivity metric. In other words, how long does a file need to be inactive before it can be considered stale? The definition of these periods should be part of the service level agreement (accessibility part). Once this is defined, a two-tiered archiving solution might be created, which will:

• Store data on archival disk for a period of time, after a certain inactivity period has passed

• Move data to tape storage after a certain time has passed, typically equal to the inactivity time of the periodic data

This process, shown graphically in Figure 7-19 on page 142, can be accomplished using Tivoli Storage Manager for Space Management (commonly referred to as HSM).


Figure 7-19 Moving stale data in a two-tier Tivoli Storage Manager HSM solution

The figure shows a setup using a two-tiered Tivoli Storage Manager solution with HSM. The space management client is installed on the servers managing the volumes that are eligible for space management. When a file meets the conditions for migration, it will be automatically moved to the Tivoli Storage Manager server storage, leaving a stub file behind at the original location. A stub file is basically a pointer to the content of the file. To a user, it appears that the file still resides on the original file system. However, the user only sees the pointer or stub. When accessed, a so-called recall of the file will be performed automatically and transparently, copying the file back from Tivoli Storage Manager managed storage (disk or tape) to a file system managed storage space (disk). The conditions for moving a file depend on the operating system on which the HSM client is installed, but typically include file age (can be defined by creation, last access, or last modification date), file location, or name (including the extension), and space used within the original file system.

The advantage of setting up a two-tiered HSM solution is that files can be retrieved from disk instead of tape in a short initial period of time. This is ideal for files that have a very high probability of being accessed in a certain period of time (like the periodically accessed files), since the recall time will be limited to the network transfer time of the file. If it has moved to tape, there is additional tape processing time, including mounting the tape and moving to the location of the required file.
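The two-tier placement rule just described reduces to a simple age test. A minimal sketch — the function name is ours, and the 30/90-day defaults are taken from the example worked later in this section:

```python
def hsm_tier(inactive_days, x=30, y=90):
    """Where a file lives in the two-tier HSM design described above.
    x: days of inactivity before migration to the TSM disk storage pool.
    y: days of inactivity before onward migration to the tape pool."""
    if inactive_days < x:
        return "active disk"
    if inactive_days < y:
        return "TSM disk pool"
    return "TSM tape pool"
```

In the real environment this decision is made by the space management client and the TSM server migration settings, not by a script; the sketch only makes the policy explicit.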

With the above in mind, the following steps can be performed to establish an HSM-based inactive data management system.

A first step, as usual, is to determine the applicability and benefit of an HSM-based solution. TPC for Data can create reports that will show the average age of files, based on the last access date. There are three dates applicable to the file age:

• Last access date
• Last modification date
• Creation date

Of these three values, the last access date is probably the most accurate one when thinking in terms of utilization frequency of a file.

Based on this report, you can show the data access patterns. Figure 7-20 on page 143 is an example of a TPC report on last access times for files.

Note: As well as this general two-tiered implementation, Tivoli Storage Manager for Space Management can also make use of different management classes, allowing management of files (for example, move from disk to tape). To do this, however, a clear definition of file types and/or location must first be obtained.


Figure 7-20 TPC for data access time reporting

The above report can guide you on what the ideal retention periods would be to keep inactive or stale data on your disk subsystems. For the two-tiered HSM solution explained above, you need to determine two periods:

• An initial stale period (x), after which the data will be moved to the primary Tivoli Storage Manager disk storage

• A second period (y), after which the data moves from disk storage to tape storage

Table 7-2 on page 144 provides an example of the distribution of data over date last accessed. When analyzing the data, you can add a column to indicate possible savings on your primary data tiers. To do this, make a cumulative summation of the percentages per access period, starting with the oldest data. The table shows that 10 percent of the data has not been accessed in over 1 year. A further 5 percent has not been accessed in more than 9 months, but less than a year. Therefore, if we moved all data that has not been accessed in 9 months or longer to lower tier storage, the potential capacity savings on our prime storage would be 10 + 5 percent = 15 percent.

Note: The above report can be created at different levels. You can display the access distribution relative to the number of files or relative to the total space used by the files. As our primary concern at this point is space reclamation, it should be viewed at the level of the total space consumed by the files.


Table 7-2   Usage in function of access date example

  Data accessed                  Percentage of total data   Savings percentage
  Less than 1 day                10                         100
  Between 1 day and 1 week        5                          90
  Between 1 week and 1 month     10                          85
  Between 1 and 2 months         20                          75
  Between 2 and 3 months          5                          55
  Between 3 and 6 months         30                          50
  Between 6 and 9 months          5                          20
  Between 9 months and 1 year     5                          15
  Over 1 year                    10                          10

You should first define the time period a file must be inactive before it is moved from disk to Tivoli Storage Manager storage pools. While this period should be part of the SLA, it is a good idea to look at the above table and come up with a proposed value. A common value for this is one month. This means that the potential space savings for this data profile will be 75 percent of the total volume (referring again to the Savings percentage column).

Besides the volumes themselves, this data also gives you important information about the prospective load on the HSM environment. The table shows that 20 percent of the data volume is accessed every 1 or 2 months. If the total storage is 1000 GB, this means 200 GB is accessed every 1 or 2 months. Then, assuming 30 days in a month, the largest access frequency for that group of data is 200 GB divided by 30 days, or approximately 7 GB per day. Using this, we can create a view of the environment, as shown in Table 7-3 (using a 1000 GB environment).

Table 7-3   Access loads for data based on age

  Data accessed                  Volume    Days   Load         Cumulative load
  Less than 1 day                100 GB      0    N/A          N/A
  Between 1 day and 1 week        50 GB      1    50 GB/day    84 GB/day
  Between 1 week and 1 month     100 GB      7    15 GB/day    34 GB/day
  Between 1 and 2 months         200 GB     30    7 GB/day     19 GB/day
  Between 2 and 3 months          50 GB     60    8 GB/day     12 GB/day
  Between 3 and 6 months         300 GB     90    3 GB/day     4 GB/day
  Between 6 and 9 months          50 GB    180    0.3 GB/day   0.8 GB/day
  Between 9 months and 1 year     50 GB    270    0.2 GB/day   0.5 GB/day
  Over 1 year                    100 GB    365    0.3 GB/day   0.3 GB/day

Based on this, if we archive all files that have not been accessed in one month or longer, the average load on the recall process will be about 19 GB per day (the worst case average, shown in the Cumulative load column). As a result, the archiving solution should be designed to handle this load.
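The Savings percentage column follows directly from the distribution in Table 7-2. This standalone sketch (the bucket labels and helper names are ours, not from TPC for Data) reproduces the arithmetic:

```python
# Age distribution from Table 7-2: percent of total data per access-age
# bucket, ordered from most recently accessed to oldest.
DISTRIBUTION = [
    ("less than 1 day", 10),
    ("1 day to 1 week", 5),
    ("1 week to 1 month", 10),
    ("1 to 2 months", 20),
    ("2 to 3 months", 5),
    ("3 to 6 months", 30),
    ("6 to 9 months", 5),
    ("9 months to 1 year", 5),
    ("over 1 year", 10),
]

def savings_percent(cutoff):
    """Primary-capacity savings if every bucket at least as old as `cutoff`
    is moved to a lower tier: a cumulative sum starting from the oldest."""
    labels = [label for label, _ in DISTRIBUTION]
    return sum(pct for _, pct in DISTRIBUTION[labels.index(cutoff):])

def daily_load(volume_gb, period_days):
    """Worst-case daily access (recall) volume for one bucket, in GB/day."""
    return volume_gb / period_days
```

The one-month cutoff used in the text gives 75 percent, and the 200 GB bucket accessed over roughly 30 days gives the "approximately 7 GB per day" figure.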

Next, we should determine how long the archived data will remain on the Tivoli Storage Manager disk storage pools before being moved to tape. Here the periodically accessed data might be a good indicator. If you have data that is typically accessed in 3-month (quarterly) cycles, you could plan to leave this data on the Tivoli Storage Manager disk storage pools for that period, depending also on the total volume of such data. This means that we would need to keep the data an additional 2 months on Tivoli Storage Manager disk (remember, we already assumed that data was only moved to Tivoli Storage Manager after an inactivity period of one month). Figure 7-21 shows an overview of where the data will reside as a function of time.

Figure 7-21 HSM Data placement in function of time

This means that 50 percent of the current active data will be moved to tape, based on the 50 percent cumulative percentage amount shown in Table 7-2 on page 144 for the “between 3 and 6 months” last access.

Figure 7-22 on page 146 summarizes the above information.

Note: The above numbers reflect the expected average recall load for the HSM solution, that is, how much will move back from archival storage to active storage. Assuming a constant overall volume, we can then also conclude that if 19 GB per day is recalled, 19 GB per day is also archived.

(Figure 7-21 timeline: Day 0 — file A is accessed for the last time on the active disk tier. Day 30 — file A is migrated to the TSM disk storage pool by the space management client; a pointer is left on the active disk. Day 90 — file A is migrated from the TSM disk storage pool to the TSM tape storage pool using the TSM MigrateByAge function.)

Figure 7-22 Overview of two-tier HSM implementation

The starting point is one (or more) active storage tiers containing 1000 GB of data. The first rule adopted (which should be included in the SLA data accessibility section) is that all data that has not been accessed in 30 days is migrated to Tivoli Storage Manager storage pools using the HSM client. The immediate result (based on a TPC for Data analysis) is that 750 GB will be migrated from the active data tiers, freeing up 75 percent of the storage. The second step is a TSM migration process from disk storage pools to tape storage pools that moves files that have resided at least two months in the disk storage pool (and as a result have not been accessed in 90 days). The result is that the Tivoli Storage Manager disk pool will contain 250 GB and the tape pool 500 GB. The average load is calculated using the same TPC for Data report as the last access time analysis.

The above information applies to file systems accessed at a file level. For file systems containing database or e-mail type (structured or semi-structured) data, archiving should be under the control of the database application, in combination with a content management system. As the complexity of this surpasses the scope of this first ILM implementation level, it is discussed later in Chapter 9, “Data lifecycle and content management solution” on page 165.

7.2.2 Avoiding over-allocation

After non-business data residing on the storage infrastructure, over-allocated file system and application space is the second big contributor to non-optimal usage of the available capacity. Typically this is caused by a lack of planning when assigning the initial file system space, or by overestimating the space required in order to avoid a subsequent out-of-space condition. This can be hard to fix, as older server systems may not allow you to easily change the file system space assignment. Use of volume managers (for example, LVM on AIX) makes this less of an issue, since they allow online expansion of file systems. Virtualization, for example, SAN Volume Controller (SVC), also allows you to extend allocated volumes online. This means that when a file system reaches an upper usage limit, it can easily be extended to match the additional space requirements, without impacting the attached systems' functionality.

Database space containers also tend to be over-allocated. In some cases, this is due again to a lack of good growth numbers. Using correct capacity management (as described in 7.1.1, “Capacity management” on page 121), this problem can be resolved for the future.

To assess the severity of the current situation, TPC for Data can again be used, specifically the file system free space report (see Figure 7-23). This report provides an overview of the global over-allocation, as well as views per system or per file system.

Figure 7-23 File system unused space reporting

If the report shows an issue with over-allocation, you need a process to fix it. A first step is to create a capacity plan. This will provide you with a (funded) growth pattern, based on historical data. In addition, the SLA accessibility component will also provide you with expected growth patterns. Next, define a process to handle space allocation. The crucial thing here is to define how long it will take to allocate space in case of an expansion requirement, that is, the elapsed time between the triggering of the process (manual or automatic) and the actual availability of the extra capacity for the host. When this information is known, the curve shown in Figure 7-24 can be used to define the level at which an alert should be posted, triggering the space allocation process.

Note: Online file system expansion is not supported on all operating systems. For details see:

http://www-03.ibm.com/servers/storage/software/virtualization/svc/index.html

Figure 7-24 Defining the space allocation trigger level

Once the trigger level is defined, TPC for Data can be used to define exception reports to indicate that the maximum usage level defined has been reached.
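The trigger level in Figure 7-24 can be derived from the growth rate and the allocation lead time. A sketch of that calculation — the function name and the optional safety margin are our assumptions, not from the Redbook:

```python
def allocation_trigger(capacity_gb, growth_gb_per_day, lead_time_days,
                       margin_days=0):
    """Usage percentage at which the expansion process must start so the
    new space arrives before the file system fills. The headroom covers
    the growth expected during the allocation procedure's lead time,
    plus an optional safety margin in days."""
    headroom_gb = growth_gb_per_day * (lead_time_days + margin_days)
    return max(0.0, 100.0 * (capacity_gb - headroom_gb) / capacity_gb)
```

For example, a 1000 GB file system growing 5 GB per day with a 10-day allocation lead time must alert at 95 percent usage; a longer lead time (as with adding raw capacity, per the note above) pushes the trigger lower.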

As well as file systems, databases can also be using too much space. A complicating factor is that raw logical volumes are often used for database table spaces. If this is so, the file system reports will not provide any information, as these raw volumes are only accessible by the database management system itself. To allow scanning the space usage of database systems, the database component of TPC for Data must be used. Figure 7-25 provides an example of a database free space report.

Figure 7-25 Database unused space report

Once the allocation issues have been identified, you want to fix them, to reduce the total amount of available free space. This will involve moving data, which may require downtime/application outages.

Figure 7-26 on page 149 provides an overview of actions and their interconnection to fix and monitor space allocations.

Note: When defining the lead time to add space to a file system, a distinction should be made between allocations from the free capacity and allocations requiring the addition of raw capacity, as they both have significantly different lead times.


Figure 7-26 Handling over-allocated file systems and databases

7.3 Tiered storage

A final part of the initial ILM implementation is the addition of storage tiers to your storage environment. As explained in Chapter 3, “Implementing ILM” on page 29, storage tiers enable you to distribute data based on the required service level; in effect, they allow each SLA level to be implemented at the appropriate cost point. To allow the creation of tiers, the capacity management and service level management processes should be in place. Capacity management is required for sizing the different tiers and for adding capacity when growth requires it. Service level management is required to match the capabilities of the storage tiers to the actual user requirements as agreed upon in the SLA.

When designing the different storage tiers, it is likely that the number of different data classes (a data class is a collection of data that has the same service level requirements) is higher than the practical limit of storage tiers. Normally, two or three active storage tiers should be sufficient to fill most requirements. Figure 7-27 on page 150 provides an overview on how to do this.

(Figure 7-26 flowchart elements: growth planning; define the allocation trigger, using the allocation procedure lead time; create a usage-level report; detect over-allocation; fix over-allocation; and allocate capacity when the maximum usage level is reached. Capacity management and service level management are the governing processes.)

Figure 7-27 Matching data classes to storage tiers

Match each data class to a higher capability storage class. For some data classes (Data6 in this example) requirements might be so high that the resulting infrastructure cost would be too high. For this type of data, a reiteration of the definition of the data class might be required, and a mapping to a lower level storage class might be necessary.
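The best-match step in Figure 7-27 can be sketched as picking, for each data class, the lowest-capability tier that still meets the requirement. The sketch below uses assumed names (class, tier, and level names are illustrative), and collapses capability into a single ordinal level:

```python
LEVELS = {"low": 0, "medium": 1, "high": 2, "very high": 3}

def map_to_tier(requirement, tiers):
    """Return the lowest-capability tier that satisfies the requirement, or
    None when no tier suffices (the Data6 case, forcing a class redefinition)."""
    candidates = [(LEVELS[cap], name) for name, cap in tiers.items()
                  if LEVELS[cap] >= LEVELS[requirement]]
    return min(candidates)[1] if candidates else None

tiers = {"tier1": "very high", "tier2": "high", "tier3": "medium"}
print(map_to_tier("high", tiers))  # tier2: the cheapest tier that suffices
print(map_to_tier("low", tiers))   # tier3
```

Returning None for an unsatisfiable requirement mirrors the Data6 situation described above: the data class definition must be reiterated or mapped to a lower storage class.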

7.3.1 What storage devices to use

The storage tiers defined here are static: under normal conditions, data classes will not move to a different tier over time. That next stage in ILM evolution, movement over time, is covered in Chapter 9, “Data lifecycle and content management solution” on page 165. Clearly there is a wide choice of disk and tape technology that could be used to implement this kind of tiered storage. However, we suggest using the SVC for the higher tiered storage for the following reasons:

� The SVC allows easy integration and movement of existing storage volumes to the newly assigned storage tiers, without disrupting access to the data residing on these volumes. In addition, the disk migration functionality can also be used on rare occasions where a data class needs to move from one tier to another. Note that the movement occurs on the logical volume level; files are not moved individually.

� The SVC provides additional functions, which can reduce the operational management complexities of having different types of storage subsystems for different tiers. These functions include:

– Common copy functions (remote copy and Point-in-Time Copy) for all storage components

– Common volume assignments

– Possibility to extend logical volume sizes

As discussed in “Stale data” on page 141, Tivoli Storage Manager for Space Management can also be used to add two tiers for archival storage. Typically, archived data can effectively use tape as a medium, which is cheaper, since the total cost of ownership of tape is around 10 to

[Figure content: data classes Data1 through Data6 plotted by their requirements against storage tiers 3, 2, and 1 in order of increasing capability; Data6 exceeds the capability of even tier 1]


20 times lower than disk. Using Tivoli Storage Manager for Space Management for archive data also introduces automatic data movement.

This completes the first ILM implementation scenario. In the next chapters we expand on this initial scenario and add the automatic data movement functions.


Chapter 8. Enforcing data placement

In this chapter we discuss the second step in an ILM implementation process. Chapter 7, “ILM initial implementation” on page 119, implemented service level management and other storage management related processes, defined an initial data classification, and implemented storage tiers; this chapter builds on those initial definitions.

The focus in this chapter is the expansion of the data classes, moving to an individual file placement level. In addition, we define a way to enforce data placement and automatically move files that do not comply with the rules.


© Copyright IBM Corp. 2006. All rights reserved. 153


8.1 Moving from the initial ILM scenario

With the initial ILM implementation explained in Chapter 7, “ILM initial implementation” on page 119, we related data placement to the initial value of the data. This means that each data class is mapped to a certain storage tier (based on input received from the service level management process), and that these mappings are static over time (see Figure 8-1). In addition, mapping was performed at the logical volume level, so data class granularity was limited.

Figure 8-1 Initial static ILM implementation

As a second step, we maintain the static mapping over time, but we add a function that automatically places data as a function of its data class, thus enforcing the mapping of each data class to the correct storage tier (see Figure 8-2 on page 155). In addition, we allow data classes to become more detailed, covering individual data types rather than whole applications or systems (as was done in Chapter 7, “ILM initial implementation” on page 119).

[Figure content: service level management statically maps each data class to one storage tier (tier 1, tier 2, tier 3)]


Figure 8-2 Adding automated data placement

The initial ILM design shown in Chapter 7, “ILM initial implementation” on page 119, can be optimized in two ways:

� Add increased control over where data is placed. With the initial implementation, if a DBA places a database on a file server, the only point of control is the SLA violation, which is triggered after the event. In this chapter, we show a way to fix this automatically.

� Increase the granularity of data class assignments. In the first design, an entire logical disk needed to be mapped to one tier. In this design, we can create data classes that include individual file or database objects.

8.2 Requirements for data placement enforcement

When implementing the next step in ILM, we add additional requirements to the storage environment. From Chapter 7, “ILM initial implementation” on page 119, we retain the following three main requirements:

� Create and apply storage management policies, with a focus on the service level objectives definition.

� Define initial data classes, dividing the total data volumes into valid and invalid data.

� Create a tiered storage environment, using the SAN Volume Controller as a single front-end to simplify operations when using different back-end storage devices.

Now we need to add the following two functions:

� Add a function that places data according to the SLA.

� Define rules for data placement according to file types. These rules should be included in the SLAs. Figure 8-3 on page 156 shows how this ties into the already existing ILM architecture, including the defined service level objectives (SLOs).

[Figure content: as in Figure 8-1, service level management maps data classes to tiers 1, 2, and 3, now with a data placement enforcement component added]


Figure 8-3 Adding file-based location rules

8.2.1 Data classification

In 7.2.1, “Reclaimable space” on page 128, we discussed the fact that stored data can be split into two parts:

� The valid data, which has a business value
� The invalid data, consisting mainly of non-business data and duplicate data

We start this discussion assuming both types of data are already identified, so that we can focus on the valid data. The first distinction to make is between known and unknown data. Known data is data that is assigned to or associated with an application; unknown data is data of unknown origin. A principle adopted earlier states that all valid data should be assigned to an SLA and, as a consequence, to an application. Unknown data therefore cannot be part of the valid data pool, and can as a result be considered invalid.

As we said, known data is assigned to or associated with an application. Remember the table we created, assigning certain file types to certain applications (see Table 8-1).

Table 8-1 Define which application uses what file type

[Figure content: all data divides into valid and invalid data; valid data divides into known data (tied to an application and function, with the SLA modified from system- or volume-wide SLOs to specific data SLOs) and unknown data (should not exist, classified as invalid); well defined, definable data is matched by rules on location, file name (parts), date, and owner (UID/GID) to the data classes on tiers 1 through 3]

              Application A   Application B   Conclusion
File type 1   Yes             No              Used by A
File type 2   No              No              Not used
File type 3   Yes             Yes             Used by A&B


This table should be our starting point, as it defines the known data. However, since our placement policies will have more detail than just the application or the system, we must expand the table, adding a clear definition of each file type per application.

At this point, file identification and classification should be done using the file’s metadata. Unfortunately, a file’s metadata is rather limited, which makes this task more difficult. When defining the data parts of each application, we should come up with a description for each type of file based on the file name (or parts of it), the location (or parts of it), the dates (creation, modification, or last access), the owning user or group (UNIX only), or a combination of these attributes.
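Such metadata descriptions translate naturally into matching rules. The following is a minimal illustrative sketch; the rules, data class numbers, and paths are hypothetical, and in practice the rules would live in the TPC for Data policy definitions rather than a stand-alone script:

```python
import fnmatch

# Hypothetical rules: each matches on parts of the file's metadata
# (name pattern and location substring); first match wins. Dates and
# owner could be added as further predicates in the same way.
RULES = [
    {"name": "mail*.nsf", "location": "\\notes\\data", "data_class": 1},
    {"name": "*.doc",     "location": "\\homedirs",    "data_class": 2},
    {"name": "*.mp3",     "location": "\\homedirs",    "data_class": 3},
]

def classify(path):
    """Return the data class for a file, or None if it is undefinable
    (to be caught by a catch-all data class)."""
    path = path.lower()
    folder, _, name = path.rpartition("\\")
    for rule in RULES:
        if fnmatch.fnmatch(name, rule["name"]) and rule["location"] in folder:
            return rule["data_class"]
    return None

print(classify("D:\\homedirs\\user1\\report.doc"))     # 2
print(classify("D:\\user1\\notes\\data\\mail01.nsf"))  # 1
print(classify("D:\\homedirs\\user1\\unknown.xyz"))    # None
```

The None result is exactly the undefinable data discussed next: it must either be transformed into something definable or fall into a catch-all class.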

As always, there is a chance that some of the data cannot be defined or identified by this method. In this case, two options exist:

� Review the data assignment to see if somehow the data can be transformed or moved to a location in which it becomes definable.

� Create a catch-all data class, which will be used for all the undefinable data. You will need to identify a storage tier to place it on. One approach could be that, since you do not know the explicit value of the data, you should place it on the highest tier, to be safe. Or you could choose a lower tier.

Once the data is defined, rules need to be created to assign each type of data to a certain data class and as a result to a certain tier. It is important here that the SLOs reflect these rules, and that they are agreed upon in an SLA.

Table 8-2 shows an example of a data classification exercise based on three identification rules:

� The extension of the file
� The location of the file
� The file name or part of the file name (specification)

Table 8-2 Data classification based on application and file metadata

Applications          Data identification                                         Data requirements
                      Extension                 Location (a, b)   Specification   Performance  Capacity   Availability

Office automation
  Word                doc, dot                  homedirs\<uid>    All             Low          Very high  Medium
  Excel®              xls, xla, xlt, csv        homedirs\<uid>    All             Low          Very high  Medium
  PowerPoint®         ppt, pot, pps             homedirs\<uid>    All             Low          Very high  Medium
  Visio®              vsd, vss, vdx, vsx, vtx   homedirs\<uid>    All             Low          Very high  Medium
  Text based          txt, rtf, ??              homedirs\<uid>    All             Low          Very high  Medium
  Acrobat             pdf, ps                   homedirs\<uid>    All             Low          Very high  Low
  Imaging             jpg, gif, bmp, tiff       homedirs\<uid>    All             Low          Very high  Low
  Archives            zip, rar                  homedirs\<uid>    All             Low          Very high  Low

Intranet/Internet


Here is the basic process for constructing a table like this:

1. Define which IT functions will be considered. Examples here are file sharing (could be office automation as well), Internet, and e-mail.

2. Define the applications that are used in the function. These can be applications that offer the function, for example, Lotus Domino for e-mail, or applications that are used to handle the content that the function produces. For example, the Internet function might allow users to download multimedia files, which are then handled by the appropriate applications.

3. Identify the data assigned to the application. Note that we are looking at the actual data, and not the application files (executables, configuration) themselves. A first identifier can be the file extension. For example, a Microsoft Word document will typically have doc as the extension, and Lotus Domino databases might have nsf or ntf as the extension. Sometimes, however, extensions are not so clear. For example, basic text documents might have txt or rtf extensions, but could have others as well; take, for example, a read.me file. Where classification by extension is sometimes straightforward, at other times it is not possible; this is the undefinable part of our data. Under normal conditions, this should not form a large proportion of the total data, since a file type that does not use a well defined extension is also unlikely to be commonly used. If it does turn out to be a commonly used extension, the location (directory) or part of the file name can also be used as an identifier to classify the data further.

To be assured that undefined data does not form a significant proportion of the total data, use the TPC for Data file type analysis reports, listing space used by file type.

4. Identify the location or locations of the application data in the file system structure. Where this was less important initially, because file systems were typically handled as one entity and placed as such, it now becomes mandatory to perform this step for two reasons:

– By defining the location, you will be able to place data accordingly.

– The location defines the rules for data placement compliance. If a certain file type does not meet the location rule, it will be considered as invalidly placed and will need to be moved.

When defining the location, we should remember why we are doing this classification: to place data (individual files) on a tiered storage infrastructure. In a non-virtualized file system environment, files will end up on different drives (or file system mount points) based on their classification. So it is not useful to define the location as, for example, D:\homedirs\<uid>\data, because the drive letter might change depending on the final tier placement. Instead, use a location relative to the file system name (D:\ in this case), so a correct location would be \homedirs\<uid>\data. Also, decide whether data may legitimately be located in subdirectories, perhaps specifying which ones. For example, you could allow any data in any subdirectory of a file server's \homedirs\<uid>\ tree, except files located in temporary directories, indicated by a \temp directory.

Table 8-2 (continued)

Applications          Extension          Location (a, b)     Specification     Performance  Capacity   Availability

Intranet/Internet
  Multimedia          mp3, wmv           homedirs\<uid>      All               Low          Very high  Low
  Acrobat             pdf                homedirs\<uid>      All               Low          Very high  Low
  HTML                html, htm, xml     homedirs\<uid>      All               Low          Medium     Low

E-mail
  Domino              nsf, ntf           <uid>\notes\data    mail db mail*.nsf High         High       High
  Domino, other db's                                                           Low          Very high  Medium

a. Including all subdirectories starting from this path, except any directory with temp as the directory name. Temporary files should be placed on the lowest available storage tier.
b. If a file is in a directory called \homedirs\<uid>\application, do not move it.

5. Add any further specification for the data, including the file name. For example, a Domino server might contain mail databases and other databases (for example, a team database). A rule could be that mail databases have a higher requirement than team databases. This means that there should be a unique identifier for these mail databases, for example, all files starting with mail.

This completes the definition of the different data types based on their metadata. The second half of the table lays out the actual requirements for these different data types.

In this example, we created requirement definitions based on three characteristics: performance, capacity, and availability. For each of these requirements, a classification indicates the importance (ranging from low through medium and high to very high). There are many different ways to define the requirements of the different data types, as explained in “Establishing a service level” on page 124. For this example, though, we took a basic approach to establishing the SLOs, without detailed definitions of the different requirements (“What does it mean if someone says he requires high performance?”). It is clear that in a real-life situation, these classifications should be mapped to actual numbers, so that interpretation is kept to a minimum and the SLOs can be measured afterwards.

This completes the classification of the data and the definition of their requirements shown in Table 8-2 on page 157. Next, we need to define the different data classes that came out of the above analysis. Basically, this means that we need to record all the distinct combinations of the capacity, performance, and availability requirements.
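Recording the distinct combinations can be automated trivially. The sketch below reproduces only a few rows of Table 8-2 for illustration; the requirement tuples are (performance, capacity, availability):

```python
def derive_data_classes(requirements):
    """Collect the distinct (performance, capacity, availability) combinations,
    number them in order of appearance, and assign each data type to one."""
    classes = {}     # combination -> data class number
    assignment = {}  # data type -> data class number
    for data_type, combo in requirements.items():
        if combo not in classes:
            classes[combo] = len(classes) + 1  # Data class 1, 2, ...
        assignment[data_type] = classes[combo]
    return classes, assignment

reqs = {
    "Domino mail": ("high", "high", "high"),
    "Word":        ("low", "very high", "medium"),
    "Excel":       ("low", "very high", "medium"),
    "Imaging":     ("low", "very high", "low"),
    "HTML":        ("low", "medium", "low"),
}
classes, assignment = derive_data_classes(reqs)
print(len(classes))         # 4 distinct classes, as in Table 8-3
print(assignment["Excel"])  # 2 (same combination as Word)
```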

Table 8-3 shows the four different classes we would require based on the input.

Table 8-3 Data classes

Data class     Performance   Capacity    Availability
Data class 1   High          High        High
Data class 2   Low           Very high   Medium
Data class 3   Low           Very high   Low
Data class 4   Low           Medium      Low

Combining Table 8-2 on page 157 and Table 8-3 gives us the data classification table, Table 8-4.

Table 8-4 Application to data class mappings

Applications   Data class 1   Data class 2   Data class 3   Data class 4
Office automation


This completes the initial data classification. A final part of the data classification exercise is to map the data classes to the actual storage tiers. Assume the available tiers for online storage (non-archive) have been defined as shown in Table 8-5.

Table 8-5 Tier definitions

Storage tiers   Performance   Capacity    Availability
Tier 1          Very high     High        Very high
Tier 2          Medium        Very high   Medium
Tier 3          Low           Very high   Low

If we compare the tiers (Table 8-5) to the data class requirements (Table 8-3 on page 159), we can see that we do not have exact matches. Also, the number of required data classes exceeds the number of available storage tiers. This means that we must define a best match for the different data classes. The requirements of data class 1 are higher than those available in tier 2; as a result, data class 1 is mapped to tier 1. Data class 2 has higher requirements than tier 3, but lower ones than tier 2; as a result, data class 2 can be mapped to tier 2. Data classes 3 and 4 will be mapped to tier 3.

Table 8-6 Data class to storage tier mapping

               Tier 1   Tier 2   Tier 3
Data class 1     X
Data class 2              X

Table 8-4 (continued)

Applications               Data class 1   Data class 2   Data class 3   Data class 4
Office automation
  Word                                        X
  Excel                                       X
  PowerPoint                                  X
  Visio                                       X
  Text based                                  X
  Acrobat                                                    X
  Imaging                                                    X
  Archives                                                   X
Intranet/Internet
  Multimedia                                                 X
  Acrobat                                                    X
  HTML                                                                      X
E-mail
  Domino - Mail databases      X
  Domino - non-mail                           X


Figure 8-4 puts it all together.

Figure 8-4 Complete picture of data types to tiers mapping

8.2.2 Enforcing data placement

We have just explained how to map certain data types to storage tiers using the data classes. In this section we define rules that allow us to enforce the agreed data classes.

We use TPC for Data to detect data placement violations and to create alerts that will initiate the correct placement procedures. As a result, we must define our rules based on the view TPC for Data has of our data. TPC for Data works in a server-centric mode of operation, meaning it will analyze the data from the application or file server perspective (in contrast to analyzing files from a storage subsystem perspective). This means that the rules themselves must be defined from a server point of view.

Continuing the example used so far, we look at the data from two servers:

� FileServer1, which hosts Lotus Notes user databases
� FileServer2, which hosts user office automation and Internet files

Assume we will create the volume mappings shown in Figure 8-5 on page 162. Our two servers are mapped to the tiered storage, with server 1 using the D:\ drive for tier 1 and the

Table 8-6 (continued)

               Tier 1   Tier 2   Tier 3
Data class 3                        X
Data class 4                        X

[Figure content: the office automation, intranet, and e-mail data types are grouped by the service level agreement into data classes 1 through 4, which are mapped to storage tiers T1, T2, and T3]


E:\ drive for tier 2. Server 2 uses drive D:\ for tier 2 and drive E:\ for tier 3. From a user point of view, this means four mappings to the file servers are required, two for each server.

Figure 8-5 Server tiered volume mapping

We can define rules using two techniques:

� Prescriptive rules, defining which files belong on a certain tier and move scripts to move all data that does not belong there.

� Exception rules, defining which files do not belong on a certain tier and move scripts to move them to another tier.

Table 8-7 provides an overview of the applicable rules.

Table 8-7 Rule definition

Rule ID Rule

1 If the system name is FileServer1, move all files that do not correspond to mail*.nsf and are on the D: (tier1) drive to the E: drive (tier 2).

2 If the system name is FileServer1, move all files that correspond to mail*.nsf and are on the E: drive to the D: drive.

3 If the system name is FileServer1, create a report of all files that do not end in nsf or ntf.

4 If the system name is FileServer2, move all files that end with mp3 or wmv and that are in the D:\homedirs directory (tier 2) to the E:\homedirs directory (tier 3), except if they are located in the application subfolder.

5 If the system name is FileServer2, move all files that end with pdf or ps and that are in the D:\homedirs directory (tier 2) to the E:\homedirs directory (tier 3), except if they are located in the application subfolder.

6 If the system name is FileServer2, move all files that end with jpg, gif, bmp, or tiff and that are in the D:\homedirs directory (tier 2) to the E:\homedirs directory (tier 3), except if they are located in the application subfolder.

7 If the system name is FileServer2, move all files that end with zip or rar and that are in the D:\homedirs directory (tier 2) to the E:\homedirs directory (tier 3), except if they are located in the application subfolder.

[Figure content: FileServer1 and FileServer2 attached to the tiered storage over the SAN; FileServer1 maps drive D: to tier 1 and E: to tier 2, FileServer2 maps drive D: to tier 2 and E: to tier 3; client workstations access the four shares \\server1\d$, \\server1\e$, \\server2\d$, and \\server2\e$ over the LAN]


Note that we have explicitly stated the server name to which each rule should apply. This will avoid undesirable side effects of moving files on other servers. Being explicit in this way gives greater control over what gets moved, but also increases the number of rules required and associated complexity.

The first three rules set the environment for FileServer1, which contains the Domino databases for the users. Rule 1 states that if it is not a mail database, it should be moved to tier 2. Rule 2 actually moves mail databases to tier 1, as we want all mail databases to be on the same tier for performance reasons. The third rule creates a list of non-Domino files that are on this file server. As they should not be here, a report is created that lists violations of this rule. Note that the actual data movement will be triggered by TPC for Data, but that the script required to execute the move needs to be written separately. TPC for Data provides the ability to launch a Visual Basic® script (Windows) or a Perl script (UNIX). As the scripts execute on the system detecting the violation (in this case, FileServer1), this means that the script can only move data between drives to which it has access. This means that rule 3 cannot be coded to move non-Domino files to FileServer2, as FileServer1 has no access to these disks.
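TPC for Data only triggers the script; the move logic itself is ours to write. As an illustration of what a rule 4 style move script has to do, here is a sketch in Python rather than Visual Basic or Perl (the roots, extensions, and exception folder mirror rule 4, but everything else is hypothetical; this is not TPC for Data code):

```python
import shutil
from pathlib import Path

def enforce_rule4(src_root="D:/homedirs", dst_root="E:/homedirs",
                  extensions=(".mp3", ".wmv")):
    """Move multimedia files from the tier 2 drive to the tier 3 drive,
    preserving the path relative to the file system root and skipping
    'application' subfolders (the exception in Table 8-7)."""
    moved = []
    src_root, dst_root = Path(src_root), Path(dst_root)
    for path in list(src_root.rglob("*")):  # snapshot before moving files
        if not path.is_file() or path.suffix.lower() not in extensions:
            continue
        rel = path.relative_to(src_root)
        if "application" in (part.lower() for part in rel.parts):
            continue  # exception: keep application-linked data together
        target = dst_root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target))
        moved.append(rel)  # a real script could also e-mail the owning user here
    return moved
```

Preserving the relative path keeps the user's directory structure intact on the target tier, so only the drive letter of the access path changes.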

Rules 4 through 8 provide the data placement for the home directories of the users, acting on the application classes defined before (see Table 8-2 on page 157). One addition is that the rules do not act on files located in the application subdirectory of the \homedirs\<uid> directory. This allows users to install application data that an application links to; because applications can require that their data is kept together, these files are left in place.

Rule 9 moves all temporary files to tier 3, while rule 10 is the global exception rule, which lists (but does not move) all exceptions to the defined rules. If the number of files detected in this manner turns out to be too large, the rules (and service level) should be reviewed to include these files.

Figure 8-6 on page 164 shows an overview of the above.

Table 8-7 (continued)

Rule ID Rule

8 If the system name is FileServer2, move all files that end with htm, html, or xml and that are in the D:\homedirs directory (tier 2) to the E:\homedirs directory (tier 3), except if they are located in the application subfolder.

9 If the system name is FileServer2, move all files that are in a directory called tmp in the D:\homedirs directory (tier 2) to the E:\homedirs directory (tier 3), except if they are located in the application subfolder.

10 If the system name is FileServer2, create a report that lists all files that are in the D:\homedirs directory and not in an application subfolder, and that do not end in doc, dot, xls, xla, xlt, csv, ppt, pot, pps, vsd, vss, vdx, vsx, vtx, txt, or rtf.


Figure 8-6 Enforcing data placement using TPC for Data

One of the potential issues with implementing these data placement enforcement rules is reduced user satisfaction. As the rules are enforced, user data is moved to different file systems, changing directory access paths. While this might seem radical initially, there are several ways to mitigate the risk of dissatisfied users:

� Communicate the service level to users, explaining that they should place their data in the appropriate location. Correct data placement allows correct usage of the available resources: if a user places data that does not require high performance or availability on a high tier, fewer resources remain for the data that does require them.

� Inform users of file movement. This can be done by adding e-mail messages in the scripts, informing a user that his data has been moved.

� Configure applications in such a way that they use the correct location as the default location.

This completes the discussion on how to implement data placement enforcement.

[Figure content: the data classes from the service level agreement are tied to locations (D:\ and E:\ homedirs and notes\data paths, with drive letters relative to the server containing the data) by enforcement rules R1 through R10; TPC for Data runs a file types report, and on a constraint violation (for example, rule 4: all MP3 and WVM files must be on T3) an alert runs a script that moves the data and informs the user]


Chapter 9. Data lifecycle and content management solution

In this chapter we discuss the third and final step in an ILM implementation process. After implementing service level management and other storage management related processes, defining an initial data classification, implementing storage tiers, and enforcing data placement, we now continue and add the following to the implementation:

� Data management as a function of the lifecycle of the data
� Data management for database data and e-mail environments


9.1 Moving from the previous steps

In the final ILM implementation step we continue along the road taken in the initial implementation. This means that we look further at how to place data in the appropriate storage tier based on the service level agreement. However, we add two new topics:

� Add the time or lifecycle dimension to the placement policy.
� Look at structured and semi-structured data, comprising databases and mail systems.

These additions bring us to a fully integrated ILM solution for all types of data, with placement that follows the value of the data.

Figure 9-1 shows a summary of the placement of data as a function of its lifecycle. We still have the different data classes, but they are no longer bound to one storage tier. As the requirements might vary according to the point in the data's lifecycle, a data class can be assigned to different storage tiers at different points in time (though never to more than one storage tier at any single point in time). This means that we need to add the lifecycle dimension to the service level agreement, and add a method to move the data between tiers whenever needed.

Figure 9-1 Adding the lifecycle dimension

In the following section we go into further detail on how to define a lifecycle and how to manage the data accordingly.
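The idea that a data class maps to exactly one tier at any point in time, but to different tiers over its lifecycle, can be captured as a per-class schedule. An illustrative sketch (the class names, age thresholds, and tiers are invented):

```python
# Hypothetical lifecycle SLOs: for each data class, the tier that applies
# from a given age (in days) onward; thresholds are listed in ascending order.
LIFECYCLE_SLO = {
    "class1": [(0, "tier1"), (30, "tier2"), (180, "tier3")],
    "class2": [(0, "tier2"), (90, "tier3")],
}

def tier_for(data_class, age_days):
    """Return the single tier that applies to this class at this age."""
    tier = None
    for threshold, candidate in LIFECYCLE_SLO[data_class]:
        if age_days >= threshold:
            tier = candidate  # later thresholds override earlier ones
    return tier

print(tier_for("class1", 10))   # tier1
print(tier_for("class1", 45))   # tier2
print(tier_for("class2", 400))  # tier3
```

The automatic data mover's job is then to compare each object's current tier against `tier_for` and migrate on mismatch.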

9.2 Placement in function of moment in lifecycle

As explained in 1.2, “Why ILM is needed” on page 4, data value changes over time. This means that the placement explained in Chapter 7, “ILM initial implementation” on page 119, and Chapter 8, “Enforcing data placement” on page 153, provides us with a system that places data according to its initial value, or rather, the value of the data as reflected in the service level agreement covering it. Where this is already

[Figure content: service level management with SLOs defined as a function of the lifecycle of the data; an automatic data mover reassigns data classes among tier 1, tier 2, and tier 3]


a step forward towards a single, well-defined placement of data, it does mean that old data stays mixed with newer data. Assuming that data loses its value over time, this means that we are actually storing data on a storage tier that exceeds the current value of the data.

In Chapter 7, “ILM initial implementation” on page 119, we discussed the possibility of using Tivoli Storage Manager for Space Management (HSM) solutions to move data to archival-type storage as a function of the time the data has not been used (accessed). The solution we present here is distinct from an HSM solution in that:

� HSM solutions typically move data off file systems, leaving a stub file (link) that automatically recalls the file when required. An archive solution moves the entire file, and the retrieve must be explicitly performed.

� HSM movement is done based on date only, while other events might be important indicators that the data value has changed. An archive solution can act on such events.

9.2.1 Determining the value of the data

Information or data value changes over time. This is a fact on which everyone seems to agree. The issue, however, is to understand how and why this happens. Figure 9-2 shows the results of a study performed by the Enterprise Storage Group, indicating the change in value of information over time. We discussed this earlier in 1.2, "Why ILM is needed" on page 4; however, we repeat the chart for clarification.

Figure 9-2 The changing value of data over time

The data shown in the chart are averages across various industries and can differ strongly in particular environments. In addition, the time dimension (X-axis) can also be thought of as an event dimension. Take the example of a sales cycle. At some point, an invoice is issued to the client. From that point until the invoice is actually paid, the information needs to be kept at hand. Now imagine you have a rule stating that all invoices must be paid within 30 days. Does this mean that after 30 days the information can be considered less important? No. The actual trigger for the change in the information's value is the event of the invoice being paid, even if this happens before (or after) the 30-day period expires. So instead of defining the value of this particular information as a function of time, a more meaningful and accurate way is to define the value as a function of a certain event (which itself can be the expiration of a certain period of time). The actual event, or change trigger, should be dictated by the business processes supported by the information and ultimately the data.

(Chart: X-axis shows time, from 7 days to 10 years; Y-axis shows data value, from 0 to 120. Series: database, development code, e-mail, productivity files, MPEG. Source of graph: Enterprise Storage Group.)

Chapter 9. Data lifecycle and content management solution 167


This means that a first step in analyzing information as a function of its position in its lifecycle is to map out the business processes, indicating the links to the information and data used (a business process needs and creates information, which is made up of data). Figure 9-3 shows an overview of the relationship between the business process and the data.

Figure 9-3 Business process to data mapping

Once the links between the different process cycles and the data have been made, the value of the data in each business cycle must be defined. Note that one piece of data is typically used throughout the entire process, meaning that the change trigger is typically associated with the movement from one cycle to another.

Next, the value of the data for a business process should be defined at each stage or cycle. Determining the value of the data is not easy, and requires knowledge of the business. The value of data can be defined by the potential loss if the data is unavailable. The availability of data can be defined using the following five states:

� Data is available with low response times (guaranteed performance).

This is the state that will typically be required for interactive transaction-based processes. For example, a person working in a financial environment should be able to enter transactions without noticeable delays.

� Data is available with higher response times (guaranteed availability and capacity).

This state is normally acceptable for asynchronous-type processes, where the operation handling the data takes much longer than accessing it. A good example is e-mail. A delay of a second in sending an e-mail is not noticeable, and should not have a negative impact on business.

� Data is available on archival storage, and access can be delayed in the order of minutes.

This is the first archival state, typically achieved by using HSM solutions. Time to access data includes the time to retrieve the data from the Tivoli Storage Manager disk or tape storage pools. This is normally sufficient for parts of the business process that involve handling of old data. For example, a yearly inventory check could be allowed to wait for data.



� Data is available on archival media, and access can be delayed for hours.

In this case, the data is available on tape media (or similar), but not readily available. As a result, the recall of the data involves fetching and handling offline or off-site tape media. Normally, this is usable only for process steps that occur on very rare occasions. An example is an audit where there is lead time available to retrieve data.

� Data is unavailable.

In this case, data is no longer available. This state is normally applicable when either of the following is true:

– Data is no longer required.

– Data can be recreated from other sources, and the expected frequency of use is very low.

A second step in determining the value is to create a loss-versus-cost analysis for each process cycle and its associated data. The loss, as explained above, is the potential loss in benefit if the data is located on a storage class that is not the ideal one. The ideal storage class for each data type is the one where the loss equals zero. The cost is the actual storage cost for each storage class. It is evident that a high-performance disk solution costs more per GB stored than a low-cost tape solution.

Table 9-1 shows an example of how to do this.

Table 9-1 Loss and cost

For each process cycle, first determine the ideal class. As stated, this is the one where the loss is 0. For example, process cycle 1 requires storage class 1, process cycle 2 requires storage class 2, and process cycle 3 requires storage class 3. Next, the loss is noted for each storage class that is not the ideal one. For example, process cycle 1 would incur a loss of 15 units if its data were placed on storage class 3. Such a loss can arise, for example, because the duration of a transaction dictates its cost.

The second part of the table is the cost. For each process cycle, we determined the data attached to it, and as a result the capacity required for it. Each storage class or tier has an associated cost per capacity. As a result, it is easy to determine how much it will cost to store the data from a process cycle on a certain storage class.

Next, it is time to strike the balance. Each solution costs a certain amount. The total cost of each storage class is the sum of the storage cost and the loss. Figure 9-4 on page 170 shows this for one business process.

Process ID                        Storage class 1   Storage class 2   Storage class 3

Process cycle 1   Loss                   0                 2                15
                  Storage cost          10                 7                 5

Process cycle 2   Loss                  N/A                0                15
                  Storage cost          N/A                7                 5

Process cycle 3   Loss                  N/A               N/A                0
                  Storage cost          N/A               N/A                5


Figure 9-4 Cost versus benefit for storage placement

The cost for storage class 1 equals its storage cost alone: as this is the ideal storage class, the loss of placing data in that class is zero. The cost for the other two storage classes is the sum of their storage cost and the loss. As you can see, the total cost for a lower storage class can actually be lower than that of the ideal solution (from a requirements point of view). In that case, there is no real reason not to place the data on the second storage class. One thing to keep in mind is that not meeting service levels can also have negative impacts other than financial ones, such as user dissatisfaction or loss of image. Therefore, make sure the impact of the lower service level on these other factors is well known and documented.

Table 9-2 shows the above conclusion in numbers.

Table 9-2 Total solution cost

Process ID                                           Storage class 1   Storage class 2   Storage class 3

Process cycle 1   Loss                                      0                 2                15
                  Storage cost                             10                 7                 5
                  Total cost (loss + storage cost)         10                 9                20
                  Benefit to ideal solution                 0                 1               -10
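The arithmetic in Table 9-2 can be sketched in a few lines of Python. This is a minimal illustration only; the figures are the example values from the tables above, and None stands for N/A (a storage class that cannot host that process cycle's data):

```python
# Minimal sketch of the loss-versus-cost analysis from the tables above.

def total_cost(loss, storage_cost):
    """Total cost of placing a cycle's data on one storage class."""
    if loss is None or storage_cost is None:
        return None  # N/A: class not applicable to this cycle
    return loss + storage_cost

def benefit_vs_ideal(losses, storage_costs):
    """Benefit of each storage class relative to the ideal one (loss == 0)."""
    totals = [total_cost(l, c) for l, c in zip(losses, storage_costs)]
    ideal_total = totals[losses.index(0)]  # the ideal class has zero loss
    return [None if t is None else ideal_total - t for t in totals]

# Process cycle 1: losses 0/2/15, storage costs 10/7/5 per storage class
print(benefit_vs_ideal([0, 2, 15], [10, 7, 5]))       # prints [0, 1, -10]
# Process cycle 2: storage class 1 is not applicable
print(benefit_vs_ideal([None, 0, 15], [None, 7, 5]))  # prints [None, 0, -13]
```

Note how, for process cycle 1, storage class 2 shows a positive benefit of 1: placing the data one class lower is actually cheaper than the ideal class, which matches the observation above.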

(Chart: for each of storage classes 1 to 3, the total bar is the cost of storing the data on that class plus the potential loss if the data is not on the correct storage class; the ideal storage class is the one with zero loss.)


This concludes the discussion on determining the value and cost of the storage solution, based on the position of process data in the process lifecycle. In the next topic, we look at how to perform the placement of this data as a function of the business process.

9.2.2 Placement of data

Once the value or use of the data is defined, the next step is to define a trigger that indicates when the data changes from one state to another. From a business point of view, this might be rather easy. For example, the fact that an invoice has been paid provides an easy and logical indication that the data associated with the order can change state, and that the process can move along. The problem, however, is linking the event in the business process to a state change notification for the data.

In order to understand the difficulties with this process, we must first make the distinction between the different types of data from a management perspective. A common way to describe differences is as follows:

� Structured data, which indicates data that is organized in a specific way, and where content has a limited level of freedom. In most cases, structured data is the data stored in the tables of a database. The advantage of structured data is that it is quite easy to create data classes based on the information stored, as the information itself can be used. For example, a database record could have a field indicating that the information has been handled and can be archived.

� Unstructured data. Unstructured data is characterized by the fact that its content and format are totally free, without any rules. Typically, these are the normal files we find in our file systems. The main issue with unstructured data is defining rules to identify and manage (or classify) it, as the information available is limited to the standard metadata. Content-based solutions, which inspect the information stored in the file, tend to be very difficult to apply. Imagine that you need to figure out, based on the content of a Microsoft Word document, whether the file is a business document. Short of a person manually scanning the document (which would be cost-prohibitive), another approach could be to search its content against a list of business terms. But this list would have to be so large that it would basically filter nothing. In addition, problems like language, spelling errors, and others complicate this process. One way around this is the use of template documents, which makes the data lean toward the semi-structured approach.

� Semi-structured data is data that lies between structured and unstructured data. Content is still mainly free format, but a lot of other information is fixed and needs to comply with certain rules. For example, e-mail messages can be considered semi-structured. The

Table 9-2 Total solution cost (continued)

Process ID                                           Storage class 1   Storage class 2   Storage class 3

Process cycle 2   Loss                                     N/A                0                15
                  Storage cost                             N/A                7                 5
                  Total cost (loss + storage cost)         N/A                7                20
                  Benefit to ideal solution                N/A                0               -13

Process cycle 3   Loss                                     N/A               N/A                0
                  Storage cost                             N/A               N/A                5
                  Total cost (loss + storage cost)         N/A               N/A                5
                  Benefit to ideal solution                N/A               N/A                0


content of an e-mail message is free; however, some fields, such as sender, destination, subject, urgency, and confidentiality, are well defined. As a result, we can create more intricate rules for managing these types of data, as we have more information available.

For structured data (databases), this might be a field in the database indicating a change in state. However, for unstructured and semi-structured data, no metadata exists that indicates this state change. As a result, our approach of moving data based on metadata, as described in Chapter 8, "Enforcing data placement" on page 153, cannot be used as such for event-driven data placement. This means that an additional layer must be added to describe the document's state inside the business process, adding information to the existing document metadata. Such products are commonly known as document management solutions. A document management solution typically consists of a database describing the information in a way that is in line with the business process using the document. As a result, the document management system can be used to determine on which storage medium a document should reside.

If a document management system is not available, it might be worthwhile to approximate the event changes by using the available metadata. One way to do this is to link a file's location or name to its state; an example is a user moving all project-related documentation to an archive folder. As you can understand, this leaves the responsibility of placing the data with the user, and, more importantly, there is no way to identify these files if they are not placed correctly. As a result, this way of working might prove inefficient and unreliable for providing correct data placement.

Another indicator might be the date values of a file. As explained earlier, the metadata of a file contains three dates:

� Creation date
� Last write or modification date
� Last access date

The creation date typically cannot be used to describe state changes of a document, as it is fixed and only indicates when the file was created.

The remaining two dates do provide us with information that indicates the use of the file. The last modification date reflects when a file was last opened for update, and the last access date reflects when a file was last opened for read. This means that these two dates provide us with the required information about when a file was last used, and what action was performed on it. The date stamp most likely to provide the required information is the last access date. Deletion of data can be done by both solutions.
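These three date stamps can be read with ordinary file system calls. A minimal Python sketch, using only the standard library (the file path is hypothetical):

```python
# Read the three date stamps of a file from its metadata.
import os
import time

def file_dates(path):
    st = os.stat(path)
    return {
        "created": st.st_ctime,   # creation time on Windows (inode change on UNIX)
        "modified": st.st_mtime,  # last write or modification
        "accessed": st.st_atime,  # last access (read)
    }

def days_since_access(path, now=None):
    """Days elapsed since the file was last accessed."""
    now = time.time() if now is None else now
    return (now - os.stat(path).st_atime) / 86400.0
```

Note that on some systems last-access tracking is relaxed for performance reasons (for example, relatime mounts on Linux), so the access date may be updated less precisely than the modification date.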

As explained above, using this last access date as the sole indicator of the file's position in the business process would be incorrect. On the other hand, business events that trigger file state changes can typically be defined in terms of the period during which the data has not been used. Let us explain using the invoice example shown in Figure 9-5 on page 173.


Figure 9-5 Process example

Figure 9-5 shows four distinct items:

� The process step, for example, the creation of the invoice.

� The event that moves from one step in the process to the next one.

� The data action, which can be summarized using the following five actions: Create, access, modify, delete, and move. In the ILM context, we are interested especially in any operation that is not done by the process itself, which means the move or delete operations.

� The data, which in this example is simplified to a single file object.

When the invoice is issued, a document is created (invoice.doc in this example). Next, the process goes into a wait status, which can take a maximum of 30 days, as this is the period of time in which invoices are due for payment. When the wait period has elapsed, or if the payment is received, the next step begins. If the payment is not received within this 30-day period, the invoice will be reissued and a reminder will be sent. As a result, the original invoice is opened to provide input for the reminder invoice (reminder.doc) and another 30-day wait period begins. If the payment is received within this 30-day period, the invoice cycle is complete and the documents attached to it can be archived. Archiving consists of moving the documents to the next storage tier (online or archive type).

Now, the goal of this example is to find a way to define the different data actions as a function of the events occurring. Table 9-3 on page 174 provides an example of how to do this, listing every action that occurs on each part of the data.



Table 9-3 Data actions in function of event

Once the above is done, the next step is to find a way to express the data move or deletion actions in terms of the time elapsed since one of the dates describing these files (creation, access, modification). The easiest way to do this is to consider which events are time driven. In this example, the only time-driven event is the passing of 30 days without receiving payment. Looking at Table 9-3, we can see that this event involves both files in the process. As a result, we can conclude that if a file has not been accessed in 30 days, the event indicating that the payment has been made has occurred. This means that the files may be moved to the archive if they have not been accessed in more than 30 days.

Note that we are approximating the events in terms of the metadata of a file. As a result, a move based on the last access date may not always correctly reflect the actual business process. For example, if the invoice is paid in, say, 5 days, the data will still reside on the initial storage tier for 25 extra days (30 - 5 = 25) until the 30-day counter has elapsed. A second example is the possibility that more than one time-driven event occurs in our process, each using a different delay. If we modified our example and stated that a reminder invoice must be paid within a 10-day period (rather than a 30-day period), this would be the case. The most correct solution (from a process point of view) would then be to take the largest delay, reducing the risk of not having the data at the correct location at the correct point in the process. The downside is that data might remain longer than required on a higher-class storage tier.
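A sketch of this approximation, under the assumption stated above that files untouched for the full 30-day payment period are candidates for the archive tier (the folder name and threshold are taken from the example, not from any product default):

```python
# List files in a folder whose last access date is older than a cutoff;
# these are candidates for moving to the archive tier.
import os
import time

def archive_candidates(folder, days=30, now=None):
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # seconds in the retention window
    candidates = []
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        if os.path.isfile(path) and os.stat(path).st_atime < cutoff:
            candidates.append(path)
    return candidates
```

In a real deployment this selection would be performed by the reporting or space management product rather than by an ad hoc script; the sketch only shows the rule being applied.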

9.2.3 Movement of data

With the above defined, the next question is how to actually perform the move and delete operations defined by the process events. We offer two possibilities:

� TPC for Data with attached scripts
� Tivoli Storage Manager for Space Management clients, including the Windows HSM client

Both solutions can be used to move data from one location to another; however, their use depends on the type of move you are performing. When the purpose is to move files from one file system to another (attached to another storage tier), TPC for Data is the most appropriate solution. When moving the data to archival-type storage, the space management solution is best. The actual selection can be made based on the possible use states of the data, as explained above. Table 9-4 on page 175 shows the selection process.

                Invoice sent              Invoice paid           30 days elapsed           Invoice resent

Invoice.doc     Create document           Move document.         Open document             No action
                (inherent to                                     (inherent to
                process step).                                   process step).

Reminder.doc    Document does not         Move document          Create document           No action
                exist.                    (if it exists).        (inherent to
                                                                 process step).

Note: The above table describes the data portion in terms of effective file objects, using the name of the file. In real life, these should be noted in more general descriptive ways, like the path name, part of the file name, the extension, or the owner. This means that instead of using individual file names, file or data classes should be used.


Table 9-4 Best product to move data as a function of data use

The TPC for Data movement process is the same as the one described in 8.2.2, "Enforcing data placement" on page 161. An exception report is generated, listing files that have not been accessed or modified in a certain time period and that match the specification of the data class to which they adhere. The actual data movement is then done by running a script (automatically triggered by TPC for Data) using the list of files provided.
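The shape of such a script might look as follows. This is an illustrative sketch only: we assume the generated report contains one file path per line, and the report name and tier directories are hypothetical.

```python
# Move every file named in a report (one path per line) to a target
# directory representing the next storage tier.
import os
import shutil

def move_listed_files(list_file, target_dir):
    moved = []
    with open(list_file) as report:
        for line in report:
            src = line.strip()
            if not src or not os.path.isfile(src):
                continue  # skip blank lines and files gone since the scan
            dst = os.path.join(target_dir, os.path.basename(src))
            shutil.move(src, dst)  # works across file systems
            moved.append(dst)
    return moved
```

A production script would also handle name collisions in the target directory and log failures, which are omitted here for brevity.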

For the space management solution, files will be migrated to the IBM Tivoli Storage Manager server. This function is described in “Stale data” on page 141.

9.2.4 Using document management systems

Document management systems add a layer to a normal document, allowing better management as a function of its lifecycle. IBM Lotus Domino Document Manager provides such a layer.

Document Manager is a document management system running on a Lotus Domino server, built on the storage metaphor of file rooms, file cabinets, binders (folders), and documents.

The Document Manager solution is built from the following components:

� The library

The library is the entry point into Document Manager. It is the main view from which users navigate the storage system to access documents. You can have more than one library per document manager server. Each library is a separate storage hierarchy with separate access control. While you cannot share documents between libraries, you can move documents and file cabinets from one library to another.

� File rooms

The file room provides a way to logically categorize file cabinets to facilitate navigation. All file cabinets are contained in a file room. When creating a new file cabinet, the user can add it to an existing file room or create a new file room.

� File cabinets

Document Manager uses file cabinets to organize and manage binders and documents. File cabinets consist of Notes database (.nsf) files that reside on the Domino server. Because file cabinets are .nsf files, the necessary Notes forms for entering information (metadata) into a document are contained in the file cabinet along with the document content. The views for accessing the information and the application logic that automates the processes related to the document are also included.

� Binders

Data use (action to this state from any higher one)              Action required   TPC for Data   Space management

Data is accessed and updated on limited occasions and
needs to be readily available.                                   Move                   X

Data is dormant for a defined period of time, after
which it will be accessed.                                       Move                   X               X

Data needs to be available, but chances of reuse are
very low.                                                        Move                                   X

Data is no longer required.                                      Delete                 X               X


The Document Manager binder is a container within a file cabinet that is used to group documents logically. Each binder has attributes that facilitate organization and retrieval. System-generated attributes associated with every binder include the title, type, author, creation date, modification date, and number of documents. User-defined attributes can also be applied to every binder within the file cabinet regardless of binder type.

Binders can also be grouped in categories to facilitate organization of large collections of documents.

� Documents

A document in Document Manager is the information that is being managed. It can be a data file such as a word-processing document or a spreadsheet, an OLE object, or a Notes document. It is given a descriptive title and saved in a binder within a file cabinet.

Each document has attributes, or metadata, that facilitate document organization and retrieval, generally describing the piece of information that is saved in the document repository. System-generated attributes are associated with every document and may include, for example, the document author, creation date, date of last modification, or document title. Application attributes are specific to the individual application and may include, for example, the project name, document type, or proposal number. These attributes can be configured.

Access to document content and attributes is limited to authorized managers, editors, and readers.

The check-in/check-out feature of Document Manager ensures that only one user can modify a document at a time. When the document is checked out, it is locked in Document Manager. When it is checked back in, it can be as a new draft, a new version, or an update.
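The locking behavior can be pictured with a toy registry. This is an illustrative sketch of the check-out concept only, not Document Manager's actual implementation:

```python
# Toy check-out/check-in registry: one holder per document at a time.
class CheckoutRegistry:
    def __init__(self):
        self._holders = {}  # document id -> user currently holding it

    def check_out(self, doc_id, user):
        if doc_id in self._holders:
            raise RuntimeError(
                f"{doc_id} is already checked out by {self._holders[doc_id]}")
        self._holders[doc_id] = user  # document is now locked for this user

    def check_in(self, doc_id, user):
        if self._holders.get(doc_id) != user:
            raise RuntimeError("only the current holder can check the document in")
        del self._holders[doc_id]  # document is unlocked again
```

A second user attempting to check out a held document is refused until the first user checks it back in, which is the guarantee described above.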

Document Manager adds certain ILM functions to the storage and management of documents. These include:

� Automation of document versioning

For example, if a user checks in a newer version or draft of a document, its version number can be automatically increased. Other possibilities include the ability to manually set a version or to automatically overwrite the previous version, keeping only the most recent one.

� Document review and approval options

Collaborative documents are often formally reviewed by a group and finally approved for publication. Any current draft document can be submitted for review by any draft editor who then checks out the latest draft, opens the working copy, and sets up and initiates a review cycle. When the review cycle is complete, the approval cycle can be set up and initiated.

When you create a document type, you can specify default review parameters for documents of this type (including parameters such as when the documents should be reviewed or approved), identify the reviewers and approvers and their roles in the review and approval process (editor, reviewer, approver, manager), and specify the type of review or approval (serial or parallel).

� Document archiving and retrieval options

When a document has progressed through its lifecycle and is no longer needed for regular and instant access, you can archive the document's content to an external storage facility where it can easily be recalled. Archiving large, out-of-date documents frees space for current Document Manager documents.

A document can be archived based on criteria you specify in the document type form, or it can be manually marked for archiving. A proxy document that contains the profile and


security information is retained in the file cabinet so that the document can be retrieved from the archive.

The following periodic background agents are used for the archiving and retrieval processes:

– Mark for Archive identifies documents that are ready for archiving based on the archive triggers defined on the document type form. If a current version of a document is being archived, the agent locks the document so that it cannot be checked out for editing before being archived.

– Archive to File System extracts a file attachment from the document, stores it in the external file system, and creates a proxy document.

– Retrieve from File System uses the information in the proxy document to retrieve the file attachment and restore it to the Document Manager document.

– The Archive to Tivoli Storage Manager add-in provides the capability to archive and retrieve Document Manager data on a Tivoli Storage Manager server. The add-in operates as a task that runs daily. It requires that a Tivoli Storage Manager Client be available on the Document Manager server.

– When a document is marked for IBM Tivoli Storage Manager retrieval, the add-in task attempts to retrieve any archived rich text and/or file attachments from the appropriate Tivoli Storage Manager server, as specified by the corresponding document type.

This completes the section on document management and file management in the lifecycle dimension. In the next topic, we discuss e-mail management.

9.3 E-mail management

Until now, our main scope was file systems containing normal file objects. In this section, we start looking at application data, in particular e-mail messages. As discussed earlier, e-mail messages are considered semi-structured data, making them suitable for more intricate management policies. One complication, however, is that e-mail messages are not file-level objects, meaning they must be managed through the e-mail applications with an additional content management tool.

In the following sections, we look at how to do this, using the same approach as the one we used until now:

� Reclaim invalid space.
� Establish policies.
� Automate data placement.

9.3.1 Reclaim invalid space

As with normal file systems, a starting point in implementing ILM practices in e-mail environments is the reclamation of space currently used by information that serves no business purpose, or that contains data no longer considered usable. As with normal file systems, e-mail environments tend to be a breeding ground for invalid data.

Depending on the policies in place, invalid data space usage can be even more significant than in normal file system environments. Think about the ease of distributing documents, business related or not, to large numbers of users. When creating policies for e-mail environments, one of the key points to embrace is collaboration. Actually, this is what e-mail messaging is all about: allowing employees and clients to share information, improving collaboration. But, strangely enough, the use of e-mail attachments can tend to lessen the


efficiency of collaboration by creating separate, not necessarily identical, versions of information. Imagine an e-mail message sent with a document, requesting a review of the information. In most cases, all receivers of the document will read, correct, and resend the document. This means that at that point, n different versions of the document exist, making the information more difficult to manage and increasing the storage used.

Figure 9-6 shows an example of what could happen. Imagine a person sends an e-mail to five colleagues, requesting feedback. If all five answer with an updated version, and reply to everyone in the initial addressee list, you would have 25 copies of the original message after the first round of replies. If a document was attached, you would need 25 times the space for storing one piece of information.

Figure 9-6 E-mail propagation
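The counting behind this example is simple; a back-of-the-envelope sketch of the first round of replies:

```python
# Simplified counting of the reply-all scenario above: each of the n
# recipients sends one reply (with the attachment) to all n addresses on
# the list, so after the first round the attachment is stored n * n times.

def attachment_copies(n_recipients):
    replies = n_recipients           # every recipient sends one reply
    copies_per_reply = n_recipients  # each reply lands in n mailboxes
    return replies * copies_per_reply

print(attachment_copies(5))  # prints 25
```

The quadratic growth per round is exactly why a single shared store, as suggested below, scales so much better than attachment distribution.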

The above clearly shows the risk involved in using e-mail as the sole tool for information sharing. It might be even worse if any of the addressees feel the need to copy additional people. A way around this is to create a shared store in which information can be shared in a more intelligent and efficient way. Team rooms, intranet Web sites, forums, and blogs are such solutions. Instead of spreading out the information, you are actually consolidating it. The advantage of working this way is twofold:

� Less space is required.

� Consolidation of information, ensuring that everybody can access all information regarding a topic and that no multiple, diverging versions of documents exist.

While the above is not really part of the scope of an ILM solution, it is a sensible initial step towards storage optimization. Remember that reducing the amount of storage only makes further steps easier. What should be done from an e-mail management point of view is the implementation of two rules that indirectly discourage people from using the e-mail system as document storage: limiting mailbox sizes and limiting attachment sizes.

TPC for Data can provide you with a general look at how much space is used by the mail system, based on a file type report. This means that for Lotus Domino environments, you can obtain usage numbers for individual mail boxes (as each mail box has its own Domino mail database file). For MS Exchange environments it depends on the configuration, but in most cases you will only be able to see the global file size used by the mail database stores.
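To make this concrete, the following sketch approximates such a per-mailbox file type report with a plain filesystem scan. It is illustrative only: the mail root path, the assumption of one .nsf file per user, and the function name are ours, not TPC for Data's actual reporting interface.

```python
import os
from collections import defaultdict

def mailbox_usage(mail_root, extensions=(".nsf",)):
    """Approximate a per-mailbox usage report, similar in spirit to a
    TPC for Data file type report. Assumes one mail database file per
    user, as in a Lotus Domino mail server (hypothetical layout)."""
    usage = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(mail_root):
        for name in filenames:
            if name.lower().endswith(extensions):
                path = os.path.join(dirpath, name)
                usage[os.path.relpath(path, mail_root)] += os.path.getsize(path)
    return dict(usage)
```

Sorting the result by size, descending, highlights the largest mailboxes, much like a "largest files" view.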


9.3.2 E-mail archiving

E-mail archiving can be used to act upon the stale or old data in e-mail environments. This reduces the size of the active mail databases, improving performance and manageability. Figure 9-7 shows a diagram of a basic archiving system, where a content management solution provides an interface between the e-mail application and a storage manager, including the archival storage used.

Figure 9-7 E-mail archiving diagram

Today, many organizations have implemented a non-central archiving solution, in which the user initiates the archive task, saving older e-mails to a so-called archive mail database or file. The disadvantage of working this way is that the archive is difficult to manage. The first possibility is that users save the file on their local workstation disk drive. As workstations are often not backed up, there is a serious risk of file loss. A second possibility is to store the archive on centrally managed storage. The problem is that backup operations then always back up the complete file, even if only a small part has changed. As these files tend to grow large, this means considerable backup overhead.

As with normal file archiving, a first step in the process is to define the rules for archiving e-mail messages.

IBM provides two solutions for e-mail archiving: CommonStore for Lotus Domino and CommonStore for MS Exchange. The following sections explain how the products work and what they bring to the e-mail archiving task.

CommonStore for Lotus Domino

CommonStore for Lotus Domino moves documents or folders from Lotus Notes databases to an archive location, from where they can be retrieved when required. Archiving can be done manually or through an automated, policy-driven solution. A policy archives documents based on the following criteria:

� The age of the object. The age can be determined based on the creation date or the last modification date. In addition, you can specify whether the age is counted up to the current date or up to a specified date. For example, a file created on 1/1/2005 could be compared against 31/1/2005 and show an age of 30 days.

� The size of the object, allowing documents larger than a certain size to be archived. This can be combined with a condition that applies the rule only in databases larger than a certain size.

� Any object that matches a Lotus Notes formula.
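The age- and size-based criteria above can be sketched as a simple predicate. This is an illustrative model of such a policy, not CommonStore's actual configuration syntax; the field names and default thresholds are invented for the example.

```python
from datetime import date

def should_archive(doc, reference_date, max_age_days=365,
                   min_doc_size=1_000_000, min_db_size=None, db_size=0):
    """Illustrative CommonStore-style policy check: archive a document
    when it is older than max_age_days, or when it is larger than
    min_doc_size (optionally only in databases larger than min_db_size).
    The doc dict with 'last_modified' and 'size' keys is hypothetical."""
    age = (reference_date - doc["last_modified"]).days
    if age > max_age_days:
        return True
    if doc["size"] > min_doc_size:
        # Size rule optionally applies only to sufficiently large databases.
        if min_db_size is None or db_size > min_db_size:
            return True
    return False
```

A formula-based criterion would replace this predicate with the evaluation of a Lotus Notes formula against the document.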


If the archive process finds an object that matches the policy, it performs an archive action. The archiving can be done at different levels, which include:

� Archiving of the document attachments in Notes documents
� Entire Notes documents, including the attachments and information
� Possibility to archive in other formats, including XML, ASCII, or RTF

When CommonStore sends the archived object, it can send it to any of the following three destinations:

� DB2 Content Manager
� DB2 Content Manager OnDemand
� Tivoli Storage Manager

After archiving the object, there are different possibilities on what to do with the original documents:

� Leave them untouched (only good as a backup solution).
� Keep a pointer to the archived document (transparent to users).
� Delete all information (need for an external archive search and retrieval tool).

As stated above, CommonStore can work with two Content Manager solutions. There are advantages to doing this, rather than archiving directly to Tivoli Storage Manager. One advantage is that the Content Manager database stores the metadata for the archived object. As a result, searches and queries can be performed even if the stub file is no longer available in the original Lotus Domino database.

Both applications (DB2 Content Manager and DB2 Content Manager OnDemand) are capable of ultimately sending the stored files to Tivoli Storage Manager. In effect, you are adding one layer between CommonStore and Tivoli Storage Manager.

The differences between DB2 Content Manager and DB2 Content Manager OnDemand are at the functional and flexibility level. DB2 Content Manager is a complete product that allows advanced interaction through its available APIs. As a result, DB2 Content Manager can serve as middleware for more complex, user-written applications. One advantage that DB2 Content Manager in combination with Lotus Domino has over DB2 Content Manager OnDemand is a feature called single-instance store: if one person sends an e-mail with a PDF attachment to 50 people, the archiving system is clever enough to save just one copy of the PDF file and link the 50 archived e-mails to this one physical file, saving a large amount of storage space.

The advantage of DB2 Content Manager OnDemand over DB2 Content Manager in an archiving environment with Tivoli Storage Manager is that it performs aggregation and compression of the archived objects. This means that it does not send single mail objects to the IBM Tivoli Storage Manager server, but aggregated, larger files. As a result, storing and retrieving them performs better (especially when they are located on tape drives), and the impact on the IBM Tivoli Storage Manager server's database is smaller.

CommonStore for MS Exchange

CommonStore for MS Exchange is very similar to CommonStore for Domino in functionality. The main differences lie in the definition of the policies, as they reflect the mail application structure. For example, CommonStore for MS Exchange allows archiving based on the size of user mailboxes, rather than the database size.

All available archiving options are also the same, and include DB2 Content Manager and DB2 Content Manager OnDemand, as well as a direct interface to Tivoli Storage Manager.


9.4 IBM System Storage Archive Manager

The IBM System Storage Archive Manager product is the new, renamed version of the Tivoli Storage Manager for Data Retention product. IBM System Storage Archive Manager is an archiving solution, based on Tivoli Storage Manager, with event-driven archive management. In a standard IBM Tivoli Storage Manager solution, archives are deleted automatically when a certain retention period has elapsed. The retention period is defined as the number of days the archive is stored on the IBM Tivoli Storage Manager server. With IBM System Storage Archive Manager, the retention period is controlled by the archiving client application through the client API component. This means that each archive can have a different retention policy, based on the policy set in the application sending the data to the IBM System Storage Archive Manager server. There are two possibilities for controlling when data is deleted or expired:

� Chronological archive retention
� Event-based archive retention

IBM System Storage Archive Manager controls archive retention using three parameters: RETVER, RETINIT, and RETMIN. The retain version value (RETVER) within the archive copy group specifies the number of days to retain each archive object. This has always been available within the IBM Tivoli Storage Manager archive copy groups.

IBM System Storage Archive Manager introduces the RETINIT parameter, which specifies when the time specified by the retain version (RETVER=n days) attribute is initiated. The possible values for this parameter are creation or event, which control whether the data follows chronological or event-based retention rules. By setting this parameter to creation (RETINIT=creation) in the archive copy group, you specify that the retention time specified by the RETVER attribute is initiated at the time an archive copy is stored on the server. This is referred to as chronological archive retention. By setting this parameter to event (RETINIT=event) in the archive copy group, you specify that the retention time (RETVER=n days) for the archived data is initiated by an application that utilizes API function calls. If the application never initiates the retention, the data is retained indefinitely. This method of archive retention is referred to as event-based archive retention.

The RETMIN value indicates the minimum number of days an archive needs to be kept, regardless of the value of the RETVER parameter. This was also introduced with the IBM System Storage Archive Manager.
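On the Tivoli Storage Manager server, these attributes live in the archive copy group. As an illustrative sketch only (the domain, policy set, management class, and storage pool names here are invented, not taken from this book's setup), an event-based copy group could be defined as:

```
define copygroup STANDARD STANDARD ARCHMC type=archive destination=ARCHIVEPOOL retver=365 retinit=event retmin=730
```

After the policy set is activated, objects archived under this management class follow the event-based retention rules described in the following sections.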

9.4.1 Chronological archive retention

Figure 9-8 shows how a chronological retention policy works.

Figure 9-8 Standard IBM System Storage Archive Manager archive retention


With RETINIT=creation and RETVER=365 days, a file that is archived on day 0 is retained for 365 days and then becomes eligible for expiration. In this case, 365 days after the data was created, all references to that data are deleted from the database.

9.4.2 Event-based retention policy

In certain situations, data retention periods cannot be easily defined, or they depend on events taking place after the data is archived. To address this problem, archives can now be managed based on the occurrence of an event. This means that the retention counter only starts when an event occurs. To do this, the RETINIT parameter must be set to EVENT. To be able to maintain the archival storage, however, a second parameter was added, called RETMIN. This parameter controls the minimum retention period of the archive from the time of archiving.

Figure 9-9 shows an event-driven archiving mechanism.

Figure 9-9 Event driven archiving mechanism - Honoring RETVER

In the above example, the archived data is retained for a minimum of 730 days (RETMIN=730). If the retention time (RETVER) is activated through an event, IBM System Storage Archive Manager assigns an expiration date for this object, which is the date of the event (day x) plus the RETVER value (365 days). As a result, the expiration of the object will occur on day x+365.

If the expiration event occurs and the RETVER period ends before the RETMIN value expires, the file is kept until the RETMIN value expires (see Figure 9-10 on page 183).
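The interaction of the three parameters can be sketched as a small model. This is an illustrative reconstruction of the expiration rules described above, not actual IBM System Storage Archive Manager code; the function name and signature are ours.

```python
from datetime import date, timedelta

def expiration_date(archive_date, retver_days, retinit="creation",
                    retmin_days=0, event_date=None):
    """Model of the ISSAM expiration rules: with RETINIT=creation the
    RETVER clock starts at archive time; with RETINIT=event it starts
    at the signaled event, and the object is never deleted earlier
    than RETMIN days after archiving. Returns None while retention is
    still indefinite (event not yet signaled)."""
    if retinit == "creation":
        return archive_date + timedelta(days=retver_days)
    if event_date is None:
        return None  # no event yet: data is retained indefinitely
    candidate = event_date + timedelta(days=retver_days)  # day x + RETVER
    minimum = archive_date + timedelta(days=retmin_days)  # day 0 + RETMIN
    return max(candidate, minimum)
```

For the scenarios in Figures 9-9 and 9-10: with RETVER=365 and RETMIN=730, an event on day 500 yields expiration on day 865, while an event on day 100 yields day 730, because 100+365 is less than RETMIN.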


Figure 9-10 Event driven archiving mechanism - Honoring RETMIN - case 2

The IBM System Storage DR550 (DR550) is an integrated solution of IBM System Storage Archive Manager and the required hardware, including a POWER5™ processor, DS4000 disk, and optional tape devices, which provides policy-based non-erasable, non-rewriteable storage. For more information on this product, see Understanding the IBM TotalStorage DR550, SG24-7091.

Note: Event-based expiration can only be used with applications that use the IBM Tivoli Storage Manager API to send the event. A number of independent software vendor applications are available and certified for both IBM System Storage Archive Manager and the DR550. In addition, the IBM Content Management suite of applications is ready to use the benefits of the IBM System Storage Archive Manager solution. This includes the following products:

� IBM DB2 Content Manager for Multiplatforms
� IBM Content Manager for z/OS
� IBM DB2 Content Manager OnDemand for Multiplatforms
� IBM DB2 Content Manager OnDemand for z/OS and OS/390®
� IBM DB2 CommonStore for Exchange
� IBM DB2 CommonStore for Lotus Domino
� IBM DB2 CommonStore for SAP
� IBM Backup Recovery and Media Services for iSeries™


Related publications

The publications listed in this section are considered particularly suitable for a more detailed discussion of the topics covered in this redbook.

IBM Redbooks

For information on ordering these publications, see “How to get IBM Redbooks” on page 185. Note that some of the documents referenced here may be available in softcopy only.

� IBM TotalStorage Productivity Center V2.3: Getting Started, SG24-6490

� IBM TotalStorage SAN Volume Controller, SG24-6423

� Understanding the IBM TotalStorage DR550, SG24-7091

� The IBM TotalStorage Solutions Handbook, SG24-5250

� IBM Tivoli Storage Manager Implementation Guide, SG24-5416

� An Introduction to Storage Provisioning with Tivoli Provisioning Manager and TotalStorage Productivity Center, REDP-3900

� Exploring Storage Management Efficiencies and Provisioning - Understanding IBM TotalStorage Productivity Center and IBM TotalStorage Productivity Center with Advanced Provisioning, SG24-6373

� Provisioning On Demand Introducing IBM Tivoli Intelligent ThinkDynamic Orchestrator, SG24-8888

How to get IBM Redbooks

You can search for, view, or download Redbooks, Redpapers, Hints and Tips, draft publications and Additional materials, as well as order hardcopy Redbooks or CD-ROMs, at this Web site:

ibm.com/redbooks

Help from IBM

IBM Support and downloads

ibm.com/support

IBM Global Services

ibm.com/services





SG24-7030-00 ISBN 0738496049

INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION

BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE

IBM Redbooks are developed by the IBM International Technical Support Organization. Experts from IBM, Customers and Partners from around the world create timely technical information based on realistic scenarios. Specific recommendations are provided to help you implement IT solutions more effectively in your environment.

For more information: ibm.com/redbooks

ILM Library: Techniques with Tivoli Storage and IBM TotalStorage Products

Learn about basic ILM concepts

Use TPC for Data to assess ILM readiness

Stages to ILM implementation

Every organization has large amounts of data to store, use, and manage. For most, this quantity is increasing. However, over time, the value of this data changes. How can we map data to an appropriate storage media, so that it can be accessed in a timely manner when needed, retained for as long as required, and disposed of when no longer needed? Information Lifecycle Management (ILM) provides solutions. ILM is the process of managing information—from creation, through its useful life, to its eventual destruction—in a manner that aligns storage costs with the changing business value of information. We can think of ILM as an integrated solution of five IT management and infrastructure components working together: Service management (service levels), content management, workflow management (or process management), storage management, and storage infrastructure.

This IBM Redbook will help you understand what ILM is and why it is of value to you in your organization, and provide you with suggested ways to implement it using IBM products.

Back cover