Essential Elements and Metrics for a Data Warehouse TCOE
Amita Awasthi
Infosys Limited (NASDAQ: INFY)

conference.qaiglobalservices.com/stc2013/PDFs/Amita_Awasthi...



Abstract

Experience shows that a data warehouse is a must for large organizations, as it provides insight into huge volumes of data and enables them to take business decisions. Testing the data warehouse therefore becomes critical, because any issue with the quality of data in the warehouse can lead to major problems downstream.

Data warehouse testing is not only a functional testing area but also a topic of research, with rapid evolution of tools and technology. The latest trend is Big Data, which can support all 3 Vs of data (Volume, Velocity, Variety), each of which is a big challenge for a data warehouse. All the top data warehouse appliance, ETL and BI tool vendors are racing to extend their offerings to support Hadoop and other Big Data platforms. Big Data may also become a source of information for the regular Enterprise Data Warehouse (EDW), feeding it unstructured data and enabling advanced analytics. Researchers are working out how to achieve the maximum benefit from combining the EDW and Big Data.

There are other advancements as well, such as data warehouse on cloud and Mobile Business Intelligence. For any organization to keep track of these advancements and extract the maximum benefit from its data, a dedicated Data Warehouse Testing Center of Excellence is essential. This paper elaborates on and discusses the essential elements and metrics for a data warehouse testing center of excellence.


The information in this paper is the result of work done in defining the data warehouse testing center of excellence roadmap for two clients over the last eight months.

This paper explains the essential components and benefits of moving to a data warehouse testing (DWT) center of excellence (COE).


Figure: DWT COE set-up for an FSI client. A DWT Manager and the Process, People and Infra/Tools functions (test environment management, tools management, templates and estimates, staffing, KM, methodologies, quality KPIs, automation ROI, project management, test strategy), supported by a Project Manager, Tools SME, Domain SME, Solution Architect and Technology SME, serve DWT Projects 1 to 5, organized in Groups 1 to 4.

Key Challenges

• Lack of DWT-skilled resources; no competency development plan in place
• Too much time spent collecting data/metrics; DWT-specific metrics not defined
• Unable to focus on the latest trends in the DWT market
• No centralized repository for processes, training, tools, SMEs, best practices, templates etc.
• No clear direction on career opportunities for DWH testers
• Lessons learnt and best practices from similar projects are not documented and shared across all DWT projects

Figure: People, Process and Technology around the DWT COE.

Key Takeaways

• Characteristics of a Data Warehouse TCOE
• Metrics specific to Data Warehouse Testing
• Data Warehouse Competency enablement framework
• Latest technology trends in Data Warehousing and how QA is prepared for them
• How your existing testing landscape can be transformed into a Data Warehouse TCOE


Target Audience

Audience prerequisites: basic knowledge of databases and data warehouse testing

Intended audience: managers, technical leads and testers of a data warehouse testing project


Speaker's Profile

Amita Awasthi is a PMP-certified Project Manager with Infosys. She completed her B.Tech at HBTI, Kanpur. During her 13 years at Infosys she has gained experience in handling large virtual teams and different types of clients, projects, people and technologies. She has been recognized at the organization level and is a winner of the “Infosys Excellence Award”, “KM Trailblazer Champion” and “People’s Manager”.

She is an SME for data warehouse testing and for Perfaware, the Infosys DWH testing solution. Thought leadership, knowledge management, project and program management, data warehouse testing and Big Data are her key areas of interest, and she has presented papers in internal and external forums (http://www.stepinforum.org/stepin-summit-2012/plenaries/amita_awasti_track.html).

She is currently managing multiple projects for a major US-based banking customer and actively contributes to unit-level activities.

The author can be reached at [email protected]


Context & Background

In today’s world we all rely on data to make informed decisions. For any large organization the data warehouse is the holy grail of information, helping it analyze the past and make decisions for today and the future. With continuous technology innovation in this space, it is no longer limited to “after the fact” analysis.

Managing and executing data warehouse testing projects has become more challenging and interesting as the service offering itself is being refined by the latest technology trends. We have seen many of our clients struggling with a decentralized way of managing DWH testing projects; they have either moved to, or defined future roadmaps for moving to, the DWH testing Center of Excellence model.

According to the Gartner Hype Cycle for Information Infrastructure, 2012, “the Logical Data Warehouse (LDW) is a new data management architecture for analytics which combines the strengths of traditional repository warehouses with alternative data management and access strategy. The LDW will form a new best practices by the end of 2015.”

Transforming to a Data Warehouse TCOE requires certain essential elements, metrics and a defined roadmap.

Reference: http://www.compositesw.com/solutions/logical-data-warehouse/


The objective of this paper is to elaborate on the three essential elements of a Data Warehouse TCOE:

1. People

2. Process

3. Technology

The solution presented here covers the challenges clients face in a traditional data warehouse testing setup, and the market perspective and trends we are seeing today. It can be used as a skeleton framework to assess the current DWH testing state and to outline the roadmap for moving to a mature DWT COE end state.

The paper also defines metrics specific to data warehousing that are crucial for measuring data quality and data load.


DWH Testing Challenges in a typical implementation

Figure: Typical DWH data flow: Source → ETL → Staging → ETL → Data Warehouse (metadata, raw data, summary data) → Data Publishing (outbound extracts, data marts, in-memory databases) → Reporting and Analytics (reports & dashboards, ad hoc analysis, mobile apps).

• Data quality checks are not performed on source system data; typical DQ checks include duplicate check, null value check, metadata check and pattern check
• Heterogeneous data sources
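The source-side checks listed above (duplicate, null value and pattern checks) are usually plain SQL queries run against the source extract. The sketch below is a minimal illustration, assuming an in-memory SQLite table stands in for the source system; the table `src_customer`, its columns, the helper `run_dq_checks` and the e-mail regex are all hypothetical. Metadata checks are left out.

```python
import re
import sqlite3


def run_dq_checks(conn, table, key_cols, mandatory_cols, patterns):
    """Count records failing basic source-side data quality checks.

    Hypothetical helper: table and column names are interpolated into SQL,
    so they are assumed to come from a trusted test configuration.
    """
    results = {}

    # Duplicate check: surplus copies of each business key (first copy not counted).
    keys = ", ".join(key_cols)
    results["duplicates"] = conn.execute(
        f"SELECT COALESCE(SUM(cnt - 1), 0) FROM "
        f"(SELECT COUNT(*) AS cnt FROM {table} GROUP BY {keys} HAVING COUNT(*) > 1)"
    ).fetchone()[0]

    # Null value check: records where any mandatory column is NULL.
    null_filter = " OR ".join(f"{col} IS NULL" for col in mandatory_cols)
    results["null_violations"] = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {null_filter}"
    ).fetchone()[0]

    # Pattern check: non-NULL values that do not fully match the expected regex.
    for col, pattern in patterns.items():
        regex = re.compile(pattern)
        rows = conn.execute(f"SELECT {col} FROM {table} WHERE {col} IS NOT NULL")
        results[f"pattern_mismatch_{col}"] = sum(
            1 for (value,) in rows if not regex.fullmatch(str(value))
        )
    return results


# Illustrative source extract (hypothetical table and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_customer (cust_id INTEGER, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO src_customer VALUES (?, ?, ?)",
    [(1, "Asha", "asha@example.com"), (2, "Ravi", None),
     (2, "Ravi", None), (3, None, "not-an-email")],
)
result = run_dq_checks(
    conn, "src_customer", key_cols=["cust_id"], mandatory_cols=["name", "email"],
    patterns={"email": r"[^@\s]+@[^@\s]+\.[a-z]+"},
)
print(result)
```

Running it prints `{'duplicates': 1, 'null_violations': 3, 'pattern_mismatch_email': 1}`. Counting only the surplus copies of a key, rather than every row in a duplicate group, is one possible convention for "# of duplicate records".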

QA Challenges
• Static testing not performed prior to test execution
• Schema validations not done
• Sampling strategy used, causing incomplete test coverage
• Exhaustive testing not done due to lack of automation
• Huge volume of information coming into the DWH
• How much history to store in the data warehouse: storage infrastructure vs. cost and analytical requirements
• Consistency of data, to ensure correctness between reporting, ad hoc queries and analytics
• Defects caught very late in the life cycle, during the review of extracts and reports
• No performance testing done for ad hoc reports & queries
• E2E data reconciliation not done from reports back to source data
• Lack of skilled resources, lack of a DWH competency enablement framework, lack of a dedicated DWH research track, lack of differentiators and accelerators
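Schema validation is a static check that can run before any data is loaded. Below is a minimal sketch, assuming SQLite's catalog (`sqlite_master`, `PRAGMA table_info`, `PRAGMA foreign_key_list`) stands in for the target database's data dictionary; the documented schema (`EXPECTED_COLUMNS`, `EXPECTED_PKS`, `EXPECTED_FKS`) and all table names are hypothetical. It compares the documented schema with the actual one and reports missing tables and columns, data type mismatches and PK/FK differences.

```python
import sqlite3

# Hypothetical documented schema (in practice taken from the data model documents).
EXPECTED_COLUMNS = {
    "dim_customer": {"cust_key": "INTEGER", "cust_name": "TEXT"},
    "dim_product": {"prod_key": "INTEGER"},
    "fact_sales": {"sale_id": "INTEGER", "cust_key": "INTEGER", "amount": "REAL"},
}
EXPECTED_PKS = {"dim_customer": ["cust_key"], "fact_sales": ["sale_id"]}
EXPECTED_FKS = {("fact_sales", "cust_key"): ("dim_customer", "cust_key")}


def actual_schema(conn):
    """Read columns, primary keys and foreign keys from SQLite's catalog."""
    tables = [r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
    columns, pks, fks = {}, {}, {}
    for t in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        info = conn.execute(f"PRAGMA table_info({t})").fetchall()
        columns[t] = {row[1]: row[2].upper() for row in info}
        # pk > 0 gives the column's position within the primary key
        pks[t] = [row[1] for row in sorted(info, key=lambda r: r[5]) if row[5] > 0]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        for row in conn.execute(f"PRAGMA foreign_key_list({t})"):
            fks[(t, row[3])] = (row[2], row[4])
    return columns, pks, fks


def validate_schema(conn):
    """Compare the documented schema against the actual database schema."""
    columns, pks, fks = actual_schema(conn)
    report = {"missing_tables": [], "missing_columns": [], "type_mismatches": [],
              "pk_mismatches": [], "missing_fks": []}
    for table, cols in EXPECTED_COLUMNS.items():
        if table not in columns:
            report["missing_tables"].append(table)
            continue
        for col, exp_type in cols.items():
            if col not in columns[table]:
                report["missing_columns"].append((table, col))
            elif columns[table][col] != exp_type:
                report["type_mismatches"].append((table, col, exp_type, columns[table][col]))
    for table, exp_pk in EXPECTED_PKS.items():
        if table in pks and pks[table] != exp_pk:
            report["pk_mismatches"].append(table)
    for key, ref in EXPECTED_FKS.items():
        if fks.get(key) != ref:
            report["missing_fks"].append(key)
    return report


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (cust_key INTEGER PRIMARY KEY, cust_name TEXT);
    CREATE TABLE fact_sales (
        sale_id INTEGER,
        cust_key INTEGER REFERENCES dim_customer(cust_key),
        amount TEXT
    );
""")
report = validate_schema(conn)
print(report)
```

Here the report flags the missing `dim_product` table, the `amount` column declared as TEXT instead of REAL, and the missing primary key on `fact_sales`. On a real warehouse the same comparison would typically read the vendor's catalog views, such as `information_schema` on databases that provide it.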


How clients are dealing with DWH Testing Challenges – Market Perspective

• Market data shows that clients who don’t have a TCOE are working towards setting one up
• Many Infosys clients have achieved significant cost savings and quality improvements by implementing a TCOE, and have also been able to compress testing timelines
• A DWT COE offers:
  • A cost-effective solution
  • Increased focus on reuse
  • Improved data quality and availability of systems
  • Improved time to market to meet stringent timeline requirements
  • Effective DW&BI, which enables better management decisions and reduces risks
  • Strategic direction for the organization in terms of tools, licensing, processes and technology


Characteristics of a Data Warehouse TCOE

• Better Quality Through Data Test Strategies (exhaustive, aggregate, sampling, risk-based testing etc.)
• Building Data Quality as a Practice (metadata, pattern, statistical, relationship and business rules analysis etc. early in the lifecycle)
• End-to-End Coverage of the DW Lifecycle (a defined DWH lifecycle ensures complete coverage of functional and non-functional requirements, as well as end-to-end data reconciliation)
• Efficiency Through Automated Data Testing (ETL validation, data quality analysis and performance testing can be automated using in-house or market tools)
• Centralization and Better Utilization of ETL/BI/DWT Tools (using strategic tools across the organization helps save on license costs and training requirements and improves utilization)
• DWH Testing Career with a Defined Growth Path (a defined career path motivates people to learn and grow)
• DWH Test Academy to skill/re-skill people, perform assessments and improve technical capability (a DWH testing skill plan for beginner, intermediate and expert levels; planned technical assessments to ensure skill-level improvement)
• Centralized Repository for all DWH Testing Artifacts (process documents, templates, checklists, questionnaires etc.)


• Metrics-Driven QA Framework (data quality, data load, response time etc.)
• Better Deployment and Utilization of Resources (centralized control of DWH testers, deployed to projects based on project skill-set requirements)
• Knowledge and Best Practices Sharing across data testing projects (lessons learnt, defect repository, in-house tools created etc.)
• Keeping up with Continuously Evolving DW Technology (benchmarking against industry standards for data testing in terms of tools, preparedness to adopt new technology, trends etc.)
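The exhaustive and aggregate test strategies and the end-to-end reconciliation mentioned above come down to comparing source and target data sets. A minimal sketch, assuming the staging table (`stg_sales`) and the warehouse table (`dw_sales`) are reachable from one SQLite connection (in practice they usually sit on different systems); the helper `reconcile` and all names are hypothetical. The aggregate check compares row counts and a control total; the exhaustive check uses `EXCEPT` in both directions so that every row is compared rather than a sample.

```python
import sqlite3


def reconcile(conn, source, target, columns, amount_col):
    """Aggregate and exhaustive source-to-target reconciliation (hypothetical helper).

    Table/column names are interpolated into SQL, so they must come from a
    trusted test configuration.
    """
    cols = ", ".join(columns)

    # Aggregate reconciliation: row counts plus a control total on one amount column.
    agg_sql = "SELECT COUNT(*), COALESCE(SUM({amt}), 0) FROM {tbl}"
    src_count, src_total = conn.execute(agg_sql.format(amt=amount_col, tbl=source)).fetchone()
    tgt_count, tgt_total = conn.execute(agg_sql.format(amt=amount_col, tbl=target)).fetchone()

    # Exhaustive reconciliation: set difference of full rows in both directions.
    missing = conn.execute(
        f"SELECT {cols} FROM {source} EXCEPT SELECT {cols} FROM {target}").fetchall()
    unexpected = conn.execute(
        f"SELECT {cols} FROM {target} EXCEPT SELECT {cols} FROM {source}").fetchall()

    return {
        "source_count": src_count, "target_count": tgt_count,
        "source_total": src_total, "target_total": tgt_total,
        "missing_in_target": sorted(missing),
        "unexpected_in_target": sorted(unexpected),
    }


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_sales (sale_id INTEGER, amount REAL);
    CREATE TABLE dw_sales (sale_id INTEGER, amount REAL);
""")
conn.executemany("INSERT INTO stg_sales VALUES (?, ?)", [(1, 100.0), (2, 250.0), (3, 75.5)])
conn.executemany("INSERT INTO dw_sales VALUES (?, ?)", [(1, 100.0), (2, 205.0)])

report = reconcile(conn, "stg_sales", "dw_sales", ["sale_id", "amount"], "amount")
print(report)
```

Because `EXCEPT` is a set operation, duplicate rows collapse; the uniqueness check is what catches those. The aggregate check alone would only say the totals differ, while the row-level comparison pinpoints the wrongly loaded amount for sale 2 and the missing sale 3.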


Figure: Characteristics of a Data Warehouse TCOE, arranged around the DWH TCOE under People, Tools/Technology and Process: efficiency through automated data testing; centralization and better utilization of ETL/BI/DWT tools; evaluating technology trends and identifying new tools for adoption, to keep up with continuously evolving DW technology; DWH testing career with a defined growth path; DWH Test Academy to skill/re-skill people, perform assessments and improve technical capability; better deployment and utilization of resources; better quality through data test strategies; building data quality as a practice; end-to-end coverage of the DW lifecycle; metrics-driven QA framework; centralized repository for all DWH testing artifacts; knowledge and best practices sharing across data testing projects.


People

People Competency Framework and Roles in DWH Testing

Level 1
• DWH concepts
• SQL query writing
• Excel macros
• Data validations
• Basic query tools and reporting

Level 2
• ETL testing
• Test data management
• Test strategy
• Defect analysis
• ETL & BI tools
• Automation

Level 3
• End-to-end solution usage: estimation, planning, data modeling, ETL, data validation, reporting, technology trends, appliance testing

Level 4
• Consulting: DWT, appliance testing, Big Data, Mobile BI, DW testing on cloud, analytics testing

• Continuous improvement of individual technical competency
• Clarity on roles and the career path ahead
• Awareness of which trainings to attend and which certifications to pursue, and of thought leadership


Process

Processes and Best Practices for DWH Testing

Test Automation
• Test automation tools such as QuerySurge, Informatica Data Validation etc.
• Excel-based tools (macros) that can automate test steps such as test case creation, query creation, data comparison etc.

Defect Repository
• Can be created based on our experience and referred to, to ensure all critical scenarios are covered in test planning and scripting

Risk Repository
• A ready-to-use DWT risk repository portal, invaluable for test risk planning

BVA Repository
• A repository of business value articulation (BVA) case studies that can be used to implement best practices across similar projects

Templates and Checklists
• Reusable templates for test planning, test strategy, status reporting etc.
• Reusable checklists for test plan review, pre-execution checks, execution checks etc.

Competency Development
• DWT-specific training program for different competency levels: basic, intermediate and advanced
• DWT tool-specific training program to create tool SMEs

Thought Leadership
• Research initiatives and a repository of DWT publications to keep up to date with the latest trends in DWH
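The Excel-macro style automation described above (generating test cases and queries from a mapping) can be sketched in a few lines of Python. This is an illustration only, assuming a hypothetical mapping structure (`TableMapping`, `ColumnMapping`) and a hypothetical transformation rule (`UPPER(cust_name)`); it does not describe how QuerySurge or Informatica Data Validation work internally. Each mapping yields a row-count test and a source-minus-target test, which are then executed against a toy SQLite database.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ColumnMapping:
    source_expr: str  # source column, or the transformation rule applied to it
    target_col: str


@dataclass
class TableMapping:
    source_table: str
    target_table: str
    columns: List[ColumnMapping]


def generate_test_cases(m: TableMapping) -> List[Tuple[str, str, str]]:
    """Turn one mapping into (test case id, description, SQL) triples."""
    src_cols = ", ".join(c.source_expr for c in m.columns)
    tgt_cols = ", ".join(c.target_col for c in m.columns)
    return [
        ("TC01_ROW_COUNT",
         f"Row count of {m.source_table} matches {m.target_table}",
         f"SELECT (SELECT COUNT(*) FROM {m.source_table}) - "
         f"(SELECT COUNT(*) FROM {m.target_table}) AS count_diff"),
        ("TC02_MINUS",
         f"All transformed {m.source_table} rows are present in {m.target_table}",
         f"SELECT {src_cols} FROM {m.source_table} "
         f"EXCEPT SELECT {tgt_cols} FROM {m.target_table}"),
    ]


mapping = TableMapping("stg_customer", "dim_customer", [
    ColumnMapping("cust_id", "cust_id"),
    ColumnMapping("UPPER(cust_name)", "cust_name"),  # hypothetical rule: names upper-cased
])
cases = generate_test_cases(mapping)

# Run the generated queries on a toy database; zero/empty results mean "pass".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_customer (cust_id INTEGER, cust_name TEXT);
    CREATE TABLE dim_customer (cust_id INTEGER, cust_name TEXT);
    INSERT INTO stg_customer VALUES (1, 'asha'), (2, 'ravi');
    INSERT INTO dim_customer VALUES (1, 'ASHA'), (2, 'Ravi');
""")
results = {tc_id: conn.execute(sql).fetchall() for tc_id, _, sql in cases}
print(results)
```

A zero count difference and an empty minus result mean a pass. Here the minus query returns `(2, 'RAVI')`, flagging a target value that does not follow the upper-case rule. Keeping the mapping as data, rather than hand-writing each query, is what makes the test cases reusable across projects.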


Process

DWH Test Metrics (for each category: direct metrics, then derived metrics)

Uniqueness
• Direct: # of duplicate records
• Derived: # of duplicate records / total # of records

Correctness & Consistency
• Direct: # of records with pattern mismatch; # of fields with inconsistent data occurrence
• Derived: # of records with pattern mismatch / total # of records

Completeness
• Direct: # of records with null values in non-nullable fields; # of records with blank values in non-blank fields
• Derived: # of records with null values in non-nullable fields / total # of records; # of records with blank values / total # of records

Timeliness
• Direct: delay in receiving data or feed files (hours/days)
• Derived: # of days of delay in receiving data / test execution duration

Phase Containment
• Direct: # of data quality defects caught in each phase of the project
• Derived: # of data quality defects caught in one phase / total # of data quality defects caught in the project

Data Load
• Direct: # of records loaded in target; # of records rejected; # of valid rejects; total # of records in source
• Derived: # of records loaded in target / (total # of records in source - # of valid rejects)

Schema Validation
• Direct: # of entities missing from the defined schema; # of entities mismatching the defined schema; # of data type mismatches for the fields
• Note: schema validation means comparing the defined/documented database schema with the actual DB schema; PK/FK constraints are also checked here

Performance
• Direct: report response time; time taken to complete the end-to-end data load
• Derived: % adherence can be calculated if SLAs are defined for report response time and E2E data load time
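Most derived metrics above are simple ratios of the direct metrics. The sketch below computes a few of them; the helper names and input figures are made up for illustration, and the SLA adherence definition (share of runs completing within the SLA) is an assumption, since the paper only says % adherence can be calculated once SLAs are defined. Undefined ratios (zero denominator) return `None` rather than a misleading zero.

```python
from typing import Dict, List, Optional


def ratio(numerator: float, denominator: float) -> Optional[float]:
    """Derived metric as a fraction; None when the denominator is zero (undefined)."""
    return numerator / denominator if denominator else None


def phase_containment(defects_by_phase: Dict[str, int]) -> Dict[str, Optional[float]]:
    # Share of all data quality defects caught in each phase.
    total = sum(defects_by_phase.values())
    return {phase: ratio(count, total) for phase, count in defects_by_phase.items()}


def data_load_ratio(loaded: int, source_total: int, valid_rejects: int) -> Optional[float]:
    # Records loaded / (records in source - records validly rejected by ETL rules).
    return ratio(loaded, source_total - valid_rejects)


def sla_adherence_pct(measurements: List[float], sla: float) -> Optional[float]:
    # Assumed definition: percentage of measured runs finishing within the SLA.
    within = sum(1 for m in measurements if m <= sla)
    r = ratio(within, len(measurements))
    return None if r is None else 100 * r


# Hypothetical direct metrics collected from one test cycle.
metrics = {
    "uniqueness": ratio(5, 1000),           # duplicate records / total records
    "completeness_nulls": ratio(12, 1000),  # nulls in non-nullable fields / total records
    "timeliness": ratio(2, 10),             # days of feed delay / test execution days
    "phase_containment": phase_containment({"requirements": 4, "system test": 10, "UAT": 6}),
    "data_load": data_load_ratio(loaded=931, source_total=1000, valid_rejects=20),
    "sla_adherence_pct": sla_adherence_pct([2.0, 3.5, 6.0, 4.0], sla=5.0),  # response secs
}
print(metrics)
```

With these inputs the data load ratio is 931 / (1000 - 20) = 0.95 and three of the four report runs meet the 5-second SLA, giving 75% adherence.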


Tools/Technology

Top 3 Technology Trends in DWH/BI

“Big Data Drives Tomorrow’s BI”
• Enables storage of huge volumes of data (petabytes)
• Advantage of storing and analyzing unstructured data from social networks and the public domain
• Helps in understanding and predicting customer behavior, which can be used for cross-selling products, customer loyalty management, real-time fraud detection, compliance checks etc.
• All top BI vendors are offering Big Data capabilities

“Elastic DWH in the Cloud”
• Lower cost in a pay-per-use model; the high costs of over-provisioning can be avoided
• Expertise in building and maintaining a DWH is no longer needed within the organization itself
• An elastic data warehousing system in the cloud would automatically increase or decrease the number of nodes used, allowing one to save money

“Information on the Move”
• Moving from a wired world to a wireless world, taking advantage of smartphones/tablets
• Technological advancement has created the need for information to be available on the go, for faster decision making, better customer service, more efficient business processes and improved employee productivity
• Most of the top banks have their banking apps available on mobile
• All top BI vendors are offering mobile BI capability

DWT Assessment and Transformation Roadmap

1. Establish a DWH Testing Center of Excellence
2. Enhance and standardize the current DWH testing process framework for the E2E test life cycle by following a standard lifecycle approach
3. Implement key DWH test metrics
4. Identify strategic test tools and integrate current tools to enable end-to-end automation; standardize the use of automation frameworks across projects
5. Leverage the TDM function for better quality and timely provisioning of test data
6. Set up a centralized knowledge repository for all DWH project templates, checklists, test artifacts, lessons learnt, trackers, questionnaires, training material etc.
7. Set up a centralized training academy to skill/re-skill DWH resources, with technical assessments
8. Prepare the DWH QA organization to adopt new capabilities/services

Process Evaluation

• Identify transformation initiatives based on QA assessment recommendations
• Categorize initiatives into short-, medium- and long-term milestones
• Develop the plan for deploying each initiative

Benefits of establishing a Data Warehouse TCOE
• Cost saving
• Knowledge sharing
• Competency enablement
• Process adherence & improvement
• Adapting to the latest market trends
• Improved control over projects
• Faster time to market
• Improved system availability
• Better resource utilization

Expected ROI of DWT COE - Key Dimensions
(For each key dimension: element, metrics to track for success, typical improvement)

People
• Improved resource utilization. Metric: resource utilization %. Typical improvement: 10–15%
• Reduced resource on-boarding time. Metric: time taken to on-board a resource, from request to deployment. Typical improvement: 15–30%
• Improved competency level. Metric: technical assessment results (# of people moved from lower levels to higher levels). Typical improvement: helps in better project execution

Process
• Following standardized DWT processes. Metrics: Process Compliance Index; cost of quality. Typical improvement: 5–10%
• Re-use of test strategy, templates, best practices, queries etc. Metrics: testing cycle time; % of reuse. Typical improvement: 8–10%
• Predictive profiling of defects and proactive strategies. Metrics: defect removal effectiveness; defect slippage. Typical improvement: 5–10%
• Early validations to catch defects early in the life cycle. Metric: defect containment metrics. Typical improvement: 10–20%

Tools/Technology
• Automation of test process and execution. Metrics: % reduction in test execution effort; testing coverage. Typical improvement: 10–25%
• Internal test infrastructure/tool consolidation/virtualization. Metric: % reduction in license/infra cost. Typical improvement: 5–10%

Conclusion

Data warehouse testing is no longer limited to data and report testing. It is one of the most rapidly changing technology areas, and organizations need to make dedicated investments to keep up with market trends.

Gartner sees the future of the data warehouse as the “Logical Data Warehouse”. Real-time analytics, data visualization and the domain knowledge needed to test industry-specific use cases have become essential elements of data warehouse testing.

The benefits of a DWT COE can no longer be ignored; moving to a DWT COE is the path ahead for large organizations to make maximum use of their gold mine of data.
