Information Lifecycle Management Modeling using PowerDesigner

White Paper

www.sybase.com

Information Lifecycle Management Modeling using PowerDesigner®

Ke Li
Software Engineering Development

Table of Contents

1 Introduction
1.1 What is ILM
1.2 Generic ILM implementation
2 PowerDesigner Solution
2.1 Data classification
2.2 Model the databases
2.3 Model the Policy and Storages
2.4 Data Archiving Management
2.5 Simulation
2.6 Generic workflow
3 Example
3.1 Introduction
3.2 Model the IQ database
3.3 Define Lifecycle policy
3.4 Apply lifecycle policy to Tables
3.5 Generate Script and execute
3.6 Summary
4 Prospects
5 Conclusion
6 References

Abstract

Information is a key asset for an enterprise, and constructing an IT environment that manages business information well is becoming more and more important. The concept of Information Lifecycle Management (ILM) aims to provide solutions and guidelines for this purpose.

This article introduces the PowerDesigner solution for ILM, which is a modeling approach. We will also show an example to explain how to implement the lifecycle management using PowerDesigner based on a Sybase IQ data warehouse.


1 Introduction

Business information consists of any form of data with business context, such as emails, images, structured databases, and even files not stored on an electronic medium. It is always a key asset for an enterprise, and its volume is growing rapidly. Managing and maintaining such large amounts of data becomes extremely complex. In a competitive business environment, managing enterprise information properly and effectively becomes more and more important.

Groups, companies, and vendors around the world are trying to present a solution to better meet enterprises’ needs. Information Lifecycle Management (ILM) is one of the concepts that serves this purpose.

1.1 What is ILM

Information Lifecycle Management has a broad vision: its key purpose is to use proper processes, tools, and services to align business needs with IT infrastructure. The Storage Networking Industry Association (SNIA) is one of the groups working on this subject. Here is their definition of ILM:

The policies, processes, practices, services and tools used to align the business value of information with the most appropriate and cost effective infrastructure from the time information is created through its final disposition. Information is aligned with business requirements through management policies and service levels associated with applications, metadata, and data.

If ILM is applied to an enterprise properly, it could:

• Increase corporate competitiveness
• Reduce the cost of maintaining both hardware and software
• Improve process/application performance and effectiveness
• Enable regulatory compliance and data protection

ILM is high-level guidance; its implementation varies between enterprises and environments. The policies, processes, and tools used should meet the needs of the particular enterprise, but there are still some generic ways to implement ILM.

1.2 Generic ILM implementation

Basically, data can be stored on different storage devices, for example high-speed disk drives, optical discs, or tapes. The idea is to put different data onto different types of storage devices based on requirements such as performance, cost, security, and recovery speed.


Figure 1-1 – Tiered storages

Statistics show that data is accessed less frequently as time passes, so storing all data on primary storage leads to high maintenance costs. As the following figure shows, the same data becomes more and more inactive over time, while the total data volume continues to grow.

Figure 1-2 – Data access frequency decreases

Moving data from high-performance, expensive devices to lower-performance, cheaper devices reduces the overall maintenance cost, and the data remaining on primary storage can be handled more efficiently.

The most common criterion for classifying data is performance versus cost; however, other aspects such as data protection, accessibility, or security can be considered, too.

Data administrators can apply lifecycle management manually, but there are many data types and schemas, and the data volume is very large. Defining proper policies and managing them easily is complex and challenging.

Therefore, Sybase PowerDesigner introduces a modeling solution to help users achieve these goals.


2 PowerDesigner Solution

PowerDesigner is an “ALL-IN-ONE” enterprise modeling tool with superior data modeling and metadata management capabilities.

Database products such as Sybase IQ provide strong data management and analysis functionality, but usually offer only a basic user interface and limited interaction.

Therefore, Sybase has introduced an ILM solution based on PowerDesigner. It is designed to facilitate applying ILM to enterprises with a modeling approach. Compared with administration tools tied to a specific database, PowerDesigner is a conceptual modeling tool that provides different modules to support different databases.

We’ll be focusing on PowerDesigner 15.1 and the ILM support for structured data in a relational database.

2.1 Data classification

As discussed above, the business needs are to reduce maintenance cost and increase efficiency. PowerDesigner models the following data-level objectives to meet these requirements:

• Availability: For what period should the data be available on a given storage device?
• Accessibility: Should the data be read-only, write-only, or read-write?

2.2 Model the databases

The PowerDesigner Physical Data Model (PDM) contains meta-classes such as Tables, Columns, Indexes, and so on to model the corresponding concepts in a database. Users can create a physical data model from scratch, or create one based on an existing database. Once the link between the model and the real database is established, any change made to the database can be applied to the model, and vice versa.

2.3 Model the Policy and Storages

For ILM modeling, PowerDesigner adds several new meta-classes to the physical data model:

Figure 2-1 – Meta-model for ILM policy


The lifecycle object is a policy: it defines the rules for managing the data in tables. A lifecycle can be divided into several phases. For each phase we specify a retention period, which is how long the data should stay on a given storage. The total retention of the lifecycle is the sum of the retention periods of all its phases; after data has been retained for the total retention period, it should be purged or moved to off-line devices.

On the storage meta-class we specify the access attributes and permissions to model the accessibility and security requirements. After the policy is properly defined, we can apply it to tables. The same policy can be applied to several tables; for each table, users need to specify when lifecycle management starts by setting a start date attribute on the table.

The database must provide the ability to archive data. Almost all modern database management systems support partitioning, and lifecycle management is based on this technology. In our modeling we specify a partition range attribute to indicate the data granularity for migration. The smaller the partition range, the more frequently data is moved, but the smaller the amount of data moved each time.

The following table shows the detailed specification for each attribute in the meta-classes:

Lifecycle: The data lifecycle policy.
    Start date: The date on which lifecycle management starts. It is a reference value and can be overridden by table instances.
    Total retention: The total period for which data is retained; expired data is purged.
    Partition range: The duration covered by each partition; data created within this duration is put into the same partition. Migration is based on partitions, so this value determines the granularity of the data migration.

Phase: One stage of the lifecycle policy.
    Retention: The period for which data is retained in this stage.

Storage: The logical storage device (named “Tablespace” in PowerDesigner when modeling an IQ data warehouse).
    Accessibility: The data protection attribute: read-only or read-write.
    Cost: The storage cost by data size (per GB).

Table
    Start date: The date on which lifecycle management starts.
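As a reading aid, the meta-classes and attributes listed above can be sketched as plain Python data classes. This is only an illustrative sketch, not PowerDesigner's internal model or API: the class and field names and the month-based units are assumptions chosen to mirror the attribute list, and the storage names and cost values in the usage lines are made up.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Storage:
    """Logical storage device (hypothetical stand-in for the storage meta-class)."""
    name: str
    accessibility: str   # "read-only" or "read-write"
    cost_per_gb: float   # storage cost by data size


@dataclass
class Phase:
    """One stage of the lifecycle policy."""
    name: str
    retention_months: int
    storage: Storage


@dataclass
class Lifecycle:
    """The data lifecycle policy."""
    name: str
    start_date: date
    partition_range_months: int
    phases: List[Phase] = field(default_factory=list)

    @property
    def total_retention_months(self) -> int:
        # Total retention is the sum of the retention of all phases.
        return sum(phase.retention_months for phase in self.phases)


@dataclass
class Table:
    """A table to which a lifecycle policy is applied."""
    name: str
    lifecycle: Lifecycle
    start_date: Optional[date] = None  # overrides the policy start date when set

    @property
    def effective_start_date(self) -> date:
        return self.start_date or self.lifecycle.start_date


# Usage with made-up values: one policy applied to two tables.
primary = Storage("primary", "read-write", cost_per_gb=10.0)
archive = Storage("archive", "read-only", cost_per_gb=1.0)
policy = Lifecycle(
    name="example_policy",
    start_date=date(2009, 7, 20),
    partition_range_months=1,
    phases=[Phase("active", 3, primary), Phase("archived", 12, archive)],
)
orders = Table("orders", policy)                             # inherits the policy start date
audit = Table("audit", policy, start_date=date(2010, 1, 1))  # overrides it
```

Keeping the start date optional on the table mirrors the rule that the lifecycle start date is a reference value that table instances can override.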

2.4 Data Archiving Management

Once we have defined the lifecycle policy and applied it to the tables, our modeling task is finished. We then need to interact with database management or administration tools to perform the archiving.

PowerDesigner provides strong code-generation mechanisms. Users can generate data archiving SQL scripts that actually move and archive the data. The scripts differ depending on the target database.

The generated code contains:

• Partition creation scripts
• Partition movement scripts
• Drop partition and truncate data scripts

Database administrators can integrate these scripts into database administration tools and control their execution.


2.5 Simulation

PowerDesigner also provides some simulation features. Cost saving analysis estimates the cost saved for a table by comparing the cost with the lifecycle policy applied against the cost without it.

Users should specify the following information as inputs:

• The cost for each type of storage device
• The estimated row growth rate (per year) for specific tables
• The number of rows (per year) as a base number for the growth rate

The data size is estimated from the data types of the columns. Users can also choose to generate reports for the analysis, in which they can clearly see the cost saving for each year and for the whole lifecycle period.
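The idea behind the cost saving analysis can be illustrated with a small calculation. The sketch below is not PowerDesigner's algorithm. It assumes monthly cohorts of data, a cost expressed per GB per month, a fixed average row size instead of a per-column estimate, and the same purge date with or without tiering. All function names and figures are hypothetical.

```python
from typing import List, Optional, Tuple

Phase = Tuple[int, float]  # (retention in months, assumed cost per GB per month)


def tier_cost(age_months: int, phases: List[Phase]) -> Optional[float]:
    """Return the cost rate for data of the given age, or None once it is purged."""
    upper = 0
    for retention, cost in phases:
        upper += retention
        if age_months < upper:
            return cost
    return None


def storage_cost(gb_per_month: List[float], phases: List[Phase]) -> float:
    """Sum the storage cost of every monthly cohort over the simulated horizon."""
    horizon = len(gb_per_month)
    total = 0.0
    for created, gb in enumerate(gb_per_month):
        for month in range(created, horizon):
            cost = tier_cost(month - created, phases)
            if cost is None:  # the cohort has passed the total retention
                break
            total += gb * cost
    return total


def monthly_volumes(rows_per_year: float, growth_rate: float,
                    row_bytes: int, years: int) -> List[float]:
    """Estimate the GB created per month, growing the yearly row count each year."""
    volumes: List[float] = []
    for year in range(years):
        rows = rows_per_year * (1 + growth_rate) ** year
        volumes.extend([rows / 12 * row_bytes / 1e9] * 12)
    return volumes


def cost_saving(gb_per_month: List[float], phases: List[Phase]) -> float:
    """Cost with everything on the first tier minus the cost with the tiered policy."""
    total_retention = sum(retention for retention, _ in phases)
    untiered = [(total_retention, phases[0][1])]
    return storage_cost(gb_per_month, untiered) - storage_cost(gb_per_month, phases)


# Hypothetical inputs: three tiers with decreasing cost per GB per month.
policy = [(3, 10.0), (12, 4.0), (60, 1.0)]
volumes = monthly_volumes(rows_per_year=50_000_000, growth_rate=0.2,
                          row_bytes=200, years=7)
print(f"Estimated saving over 7 years: {cost_saving(volumes, policy):.2f}")
```

Comparing against a single tier with the same total retention isolates the effect of tiering; a different baseline, such as never purging, would give a different figure.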

2.6 Generic workflow

The generic workflow for implementing ILM in PowerDesigner is shown in the following figure:

Figure 2-2 – Generic workflow of ILM implementation

3 Example

In this section, we discuss a simplified “Fund Management” example to see how to use PowerDesigner to implement ILM on a Sybase IQ data warehouse.

The software environment is:

• Sybase PowerDesigner 15.1
• Sybase IQ 15.0
• Sybase Central Java Edition 6.0


3.1 Introduction

Assume the “NATIONAL” fund management corporation runs its business on large amounts of marketing and trading information. The company needs to analyze the data to make daily business decisions and to guide long-term strategy.

Different kinds of analysis have different requirements. Daily business decisions require real-time reports so that the company can act on market changes. Long-term analysis requires large amounts of historical data and generates reports on specific views; here performance is not critical, but the maintenance cost should be lower. Data that is rarely accessed but still needs to be on-line must be protected and kept at a lower cost.

So we propose the following hierarchical storage system to meet the business needs:

• Primary storage: High-performance devices providing real-time analysis and reports
• Near-line storage: Lower-cost devices with medium access speed, providing analysis for long-term purposes
• Historical storage: Very low-cost devices providing data protection and keeping large amounts of data for rarely required analysis

3.2 Model the IQ database

Using the “Reverse engineering” command and selecting the prepared data source of the IQ database, we can create a physical data model based on the schema in the database. Note that we should select the “Sybase IQ 15.x” DBMS for the physical model.

Figure 3-1 – Data reverse engineering options

After the PowerDesigner model is created, the link between the model and the real database is established, and users can synchronize the model and the database.

The import includes all the tables and storages. Note that the storage concept is named “Dbspace” in the IQ database and “Tablespace” in PowerDesigner.


We can take a simplified table “Transactions” as an example for our further discussion:

Figure 3-2 – Customer table and Transactions table

3.3 Define Lifecycle policy

Now we would like to define the lifecycle policy and apply it to the “Transactions” table.

For each phase, we will specify how long the data will be retained:

Phase                                  Retention    Storage
Daily business phase                   3 Months     Primary storage
Fundamental analysis phase             1 Year       Near-line storage
Historical and trend analysis phase    5 Years      Historical storage
Total                                  75 Months

In total, by default, the data is purged after 6 years and 3 months (75 months). However, we can still choose to keep it on historical storage, or move it to tapes.

In the physical model, we can create a Lifecycle object, and specify the following attributes:

Figure 3-3 – Lifecycle general properties

We can specify a start date to perform lifecycle management, which could be overridden by the one on the table. Here we set July 20th, 2009. The total retention is 75 months.


To simplify, we use one month as the partition range. This means that data created within one month is moved together. In this case, data created between “7/20/2009” and “8/20/2009” is grouped in one partition. After it has been retained on primary storage for 3 months, that is, on “11/20/2009”, this partition is moved to near-line storage.
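The date arithmetic above can be checked with a few lines of Python. This is a minimal sketch, not the logic PowerDesigner uses. The helper names are hypothetical. It assumes that retention is counted from the end of the partition range, which matches the 11/20/2009 move date in the text, and that the later phases and the purge follow the same rule.

```python
import calendar
from datetime import date
from typing import Dict, List, Tuple


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the month's length."""
    total = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(total, 12)
    day = min(d.day, calendar.monthrange(year, month0 + 1)[1])
    return date(year, month0 + 1, day)


def partition_schedule(start: date, range_months: int, index: int,
                       phases: List[Tuple[str, int]]) -> Dict[str, object]:
    """Compute the bounds of partition `index` and the date it leaves each storage."""
    begin = add_months(start, index * range_months)
    end = add_months(start, (index + 1) * range_months)
    leaves = []
    elapsed = 0
    for storage, retention in phases:
        elapsed += retention  # assumed: retention accumulates from the partition end
        leaves.append((storage, add_months(end, elapsed)))
    return {"begin": begin, "end": end, "leaves": leaves}


# Phases of the example policy: (storage, retention in months).
phases = [("primary", 3), ("near-line", 12), ("historical", 60)]
first = partition_schedule(date(2009, 7, 20), 1, 0, phases)
print(first["begin"], first["end"])
for storage, when in first["leaves"]:
    print(f"leaves {storage} storage on {when}")
```

Under these assumptions, the date on which a partition leaves the last storage is the date on which it becomes eligible for purging.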

We can create several phases under the lifecycle object, and we need to specify a retention period and the type of storage that will be storing the data:

Figure 3-4 – Phase properties

3.4 Apply lifecycle policy to Tables

Once the lifecycle policy is defined, we can apply it to tables. Just add tables to the grid using the “Add” button on the toolbar:

Figure 3-5 – Tables applying this lifecycle policy

The start date can be specified here for a specific table; lifecycle management for that table starts on this date. A partition key should be specified for the partitioning; in this case, it should be the column holding the data creation date (of type “date” or “datetime”).


3.5 Generate Script and execute

After the policy is defined and applied to the table, we can generate partitioning scripts for the Sybase IQ data warehouse.

Because the lifecycle policy has no end date, we should specify a date range for the generation; only scripts that need to be executed within this range are generated. Another option specifies whether to create all the partitions in the first script file, or to create each partition when it is needed.

Figure 3-6 – Script generation options

The generated scripts are grouped by execution date. The purge data and drop partition scripts are put in a separate script file, so that the database administrator can decide whether to execute them or not. Note that the generated scripts only manage data created after the start date; older data needs to be managed manually.
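The selection and grouping rules described above can be sketched as follows. The sketch does not emit IQ SQL and is not PowerDesigner's generator: the action labels, the partition names, and the rule that retention is counted from the end of each partition range are assumptions for illustration only. It models the option of creating each partition when it is needed.

```python
import calendar
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

Plan = Dict[date, List[str]]


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the month's length."""
    total = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(total, 12)
    day = min(d.day, calendar.monthrange(year, month0 + 1)[1])
    return date(year, month0 + 1, day)


def plan_actions(start: date, range_months: int, phases: List[Tuple[str, int]],
                 range_from: date, range_to: date) -> Tuple[Plan, Plan]:
    """Return (regular, purge) actions keyed by execution date within the range."""
    regular: Plan = defaultdict(list)
    purge: Plan = defaultdict(list)
    index = 0
    while True:
        begin = add_months(start, index * range_months)
        if begin > range_to:  # later partitions cannot have actions in the range
            break
        end = add_months(start, (index + 1) * range_months)
        name = f"p{index + 1}"  # hypothetical partition name
        events = [(begin, regular, f"create {name}")]
        elapsed = 0
        for i, (_, retention) in enumerate(phases):
            elapsed += retention
            when = add_months(end, elapsed)
            if i + 1 < len(phases):
                events.append((when, regular, f"move {name} to {phases[i + 1][0]}"))
            else:
                # Purge actions are kept apart so the administrator can review them.
                events.append((when, purge, f"drop {name}"))
        for when, bucket, text in events:
            if range_from <= when <= range_to:
                bucket[when].append(text)
        index += 1
    return dict(regular), dict(purge)


phases = [("primary", 3), ("near-line", 12), ("historical", 60)]
regular, purge = plan_actions(date(2009, 7, 20), 1, phases,
                              date(2009, 7, 20), date(2009, 12, 31))
for when in sorted(regular):
    print(when, regular[when])
```

Partitions are always enumerated from the start date, so moves that fall inside the requested range are found even when the partition itself was created before the range begins.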

The following are some typical scripts generated for this example:


Copyright © 2009 Sybase, Inc. All rights reserved. Unpublished rights reserved under U.S. copyright laws. Sybase, the Sybase logo and PowerDesigner are trademarks of Sybase, Inc. or its subsidiaries. All other trademarks are the property of their respective owners. ® indicates registration in the United States. Specifications are subject to change without notice. 09/09 L03237

Database administrators are responsible for executing the scripts at the proper time.

3.6 Summary

Once the scripts are generated, PowerDesigner’s work for ILM is complete. Because PowerDesigner is a front-end tool in the whole process, it should not be bound too tightly to the back-end. Instead, it provides specific solutions for different back-end systems. Once a policy is modeled, code generation can be adapted to different databases.

For PowerDesigner 15.1, the ILM is only supported for Sybase IQ 15.x. More DBMS will be supported in later versions.

4 Prospects

As described above, PowerDesigner supports a storage-tiering ILM solution based on data retention policies. In the future, more policies could be supported, for example policies based on access statistics, to better meet business requirements.

Data migration between heterogeneous database products could also be supported. Moving data from OLTP databases (such as ASE) to OLAP databases (such as IQ) is a common use case.

5 Conclusion

The ILM support in PowerDesigner 15.1 is an extension of its data modeling solutions. It provides a way to manage a hierarchical storage system based on a time stamp policy, which meets general business requirements and greatly reduces the effort of implementing such a system manually.

Sybase PowerDesigner will continue focusing on this area, to provide new and improved solutions for the information lifecycle management.

6 References

• Sybase PowerDesigner 15.1 user documentation
• Sybase IQ 15.0 user documentation
• SNIA: Information Lifecycle Management Roadmap
• Implementing ILM using Oracle Database 10g
• SNIA: ILM and Tiered Storage
