APPLICATIONS

A WHITE PAPER SERIES

A paper on industry-best testing practices to deliver zero defects and ensure requirement-output alignment

Proven Testing Techniques in Large Data Warehousing Projects

1. Executive Summary

Refining databases and streamlining data warehousing (DWH) are quickly becoming integral requirements in every business. Decision-makers are now realizing the need to study their business, scrutinize their data, and optimize available information to their advantage, in order to stay competitive.

Business information is available in many forms, but mostly in knowledge repositories of unstructured data. While data warehousing projects are on the rise, testing plays a significant role in determining the success of each project, by evaluating the final product to ensure it meets the specified business needs and scope of work. However, there are two key challenges involved in data warehousing projects: increased complexity and the significant volume of data.

To ensure a methodical analysis of the end product, businesses should focus on the following areas:

• Data completeness and quality check
• Referential integrity of facts and dimensions
• Risk-based testing
• Data obfuscation
• Effective defect management
• Communication process
• Adherence to compliance standards

The aim of this whitepaper is to outline the key points of each testing aspect, while including a few critical success factors to help you cover all your bases and ensure meticulous and zero-defect solutions and services.

TABLE OF CONTENTS

1. EXECUTIVE SUMMARY
2. INDUSTRY-BEST PRACTICES IN DWH TESTING
   2.1 DATA COMPLETENESS AND QUALITY CHECK
   2.2 BI REPORT DATA TESTING
   2.3 PERFORMANCE VALIDATION OF ETL AND REPORTS
3. CRITICAL SUCCESS FACTORS FOR TESTING
   3.1 REFERENTIAL INTEGRITY OF FACTS AND DIMENSIONS
   3.2 RISK-BASED TESTING
   3.3 DATA OBFUSCATION
   3.4 EFFECTIVE DEFECT MANAGEMENT
   3.5 FOCUS ON AUTOMATION
4. SYNTEL’S BI/DW AND ANALYTICS OFFERINGS SOLUTION
5. ABOUT SYNTEL



2. Best Practices in Data Warehouse Testing

Testing activities in data warehousing projects begin in the requirement-gathering phase and are carried out in an iterative manner. Every component of the project needs to be tested, both independently and when integrated; this ranges from the data model, ETL scripts and reporting scripts to the user interface reporting layer.

The important milestones involved in the data warehousing testing lifecycle are outlined below:

Preparation: review the BRD; set up test data; define the testing strategy; establish entry and exit criteria; write test cases; hold SME discussions; set up the test environment.

Execution: smoke testing (basic testing; jobs and reports are accessible); ETL data validation (cleanliness, completeness, quality, business transformations); report validation (report validity, relevance of data, thoroughness, availability of data, consistency); integration testing (ETL and BO integration; complete cycle validation); performance testing (check NFRs; scalability; performance SLAs; peak user testing; peak load testing); user acceptance testing (SME data validation; end-user demo and validation).

Process Improvement: defect metrics; review of performance statistics; lessons learnt.

2.1. DATA COMPLETENESS AND QUALITY CHECK

An integral part of DWH testing is verifying the quality and completeness of data. Data completeness testing ensures that all expected records from the source are loaded into the database, reconciling against error and reject records. A data quality check ascertains that proper and accurate data, per the recommended standards, is processed into the data warehouse; this includes data transformation testing.

The following activities are recommended to determine data completeness and quality:

• Data extraction process for both historical and incremental loads
• Data cleansing checks based on standards; here, the testing reject threshold is important
• ‘Source to target’ transformation validation for thoroughness and accuracy
• Historical and incremental transformation process validation
• ‘Reject and error’ record analysis and validation
• Scenario-based testing with specified transformation rules
• Record reconciliation testing by comparing source, error, reject and target records to prevent record leakage (see the sketch below)
• Data load process checks for both historical and incremental loads
• Negative testing for all the above-mentioned cases

Note that data profiling is not a data quality validation activity; it relates to source data analysis and is usually conducted by SMEs and development teams. Testing teams should focus on target data scenarios.

2.2. BI REPORT DATA TESTING

Another important aspect of DWH testing is confirming the accuracy and completeness of business intelligence (BI) reports. Reports may vary in appearance, turnaround time, accuracy and usability, but testing them is of paramount importance, as the reporting layer is reflected in the UI and is what end users will eventually see. The following activities are key while testing BI reports:

• Restriction of user access to reports, with multiple layers of security
• Validation of the accuracy and relevance of the data displayed in each report (see the sketch below)
• Ensuring sufficient information for analyzing graphical reports
• Relevance of the options in the drop-down lists in each report
• Testing of pop-up reports and child reports, with proper data flow from parent reports
• Functionality of additional features, such as report storage in PDF format and print options

2.3. PERFORMANCE VALIDATION OF ETL AND REPORTS

Loading and populating the data warehouse with relevant and complete data, and ensuring the relevance of reports, constitutes 50% of business expectations. However, these tasks have to be completed within a given timeline and should be scalable to support the ever-growing system. Testing the performance of ETL and reports for responsiveness and scalability is therefore critical to the success of the design.

Although there are many non-functional requirements (NFRs) surrounding the performance of ETL and report response, it would be helpful to follow these guidelines:

• Execution with peak production volume, to check for completion of the ETL process within the agreed window
• Analysis of ETL loading times with a smaller amount of data, to gauge scalability issues
• Verification of ETL processing times, component by component, to identify areas of improvement (see the timing sketch below)
• Testing the timing of the reject process, and developing processes for managing large volumes of rejected data
• Shutdown of the server during ETL execution, to test for restartability
• Simulation of maximum concurrent user load for all BI reports and for ad-hoc reports
• Ensuring access to BI reports during ETL loads

3. Critical Success Factors for Testing

3.1. REFERENTIAL INTEGRITY OF FACTS AND DIMENSIONS

Many data warehouses are modeled as dimensions and facts. In these scenarios, the important task is to test the integrity between the dimensions and facts carefully. Since dimensions can have multiple representations, such as slowly changing dimensions (SCDs), testing must check both the references and the point in time they refer to. Table-level integrity constraints are not usually enforced in large data warehouses, so the checks have to be tested at the ETL layer and not at the database layer.

3.2. RISK-BASED TESTING

As the data present in data warehouses is huge, it is impossible to test every piece of data available. It is therefore important to work with the business SMEs to identify risk-prone areas while finalizing test cases. Key risk-prone areas include the following:

• Items whose failure would cause the greatest damage to the project carry the highest risk and should be tested most thoroughly
• Items that will be used frequently should also be considered for risk-based testing, as the probability of failure is very high

After discussing the report criticality with the end users, these items should be documented in the test plan.

3.3. DATA OBFUSCATION

In most DWH testing cases, a subset of production data is considered for performing testing activities. However, if this data contains sensitive information it can pose a potential risk.

In such cases, data obfuscation can be used to prepare the test data in the test bed, but this is not an easy task.

Process owners need to consider factors such as secure information masking, catering to specific data needs, ensuring referential integrity, and data readability. In large DWH testing projects involving secure information, it is advisable to use data masking or test data generation tools.

3.4. EFFECTIVE DEFECT MANAGEMENT

In large projects, defects are assigned across streams, followed by careful coordination, analysis and improvements to close them. Defect tracking tools such as HP Quality Center (HPQC) and Test Director can be very helpful.

In data validation, scenario-based testing is predominant. Not all scenarios will be listed as test cases, but the results of every scenario need to be captured effectively. A defect triage meeting can serve as a forum to discuss all cross-stream defects, recurring with all stream members to understand and close defects.

3.5. FOCUS ON AUTOMATION

The need for additional or repeated testing in large projects arises from factors such as changes in requirements, defects, design changes or enhancements. Automating the testing process reduces the time, manpower and effort invested. In a data warehouse testing environment, the following items could be automated:

• Test data generation
• Regression testing suite (see the sketch below)
• Performance testing suite
• Data profiling tools (these do not directly pertain to testing, but can help)

Although there are tools available for these automations, teams could choose to build a customized tool if the project needs are specific.

Testing activities in a large data warehouse project are much more complex than in normal software testing, and necessitate careful coordination and a proper understanding of the data. A capable IT partner will be able to collaborate with you, understand your business and assess your project, while ensuring a no-defect environment with smooth and streamlined processes.


4. Syntel's BI/DW and Analytics Offerings Solution

Syntel has delivered more than 700 Business Intelligence–Data Warehouse projects worldwide, across various industries. Our dedicated BI Practice is geared to provide quality services across the BI-DW systems lifecycle, by leveraging the cost effectiveness of onsite-offsite delivery. Syntel's value proposition is driven by an experienced team, with mature methodologies to provide consultancy across domain and application areas.

With our comprehensive domain knowledge of our clients’ industries, including trends, competitive environments, customers and stakeholders, our BI-DW-based solutions support clients’ overarching business strategy, while ensuring that the final output is aligned to their business needs. Our solutions include customized approaches, proven practices, innovative frameworks and adept techniques that streamline organizational activities, deliver applications with superior quality, and ensure a zero-defect environment.

Syntel’s solutions allow us to guide organizations through a transformational journey by reducing risks, optimizing costs and providing business benefits.

Some of Syntel’s in-house accelerators, developed by the BI-DW team, are as follows:

Business Challenges → Syntel’s Accelerators

• Poor data quality; delayed time-to-market; high risk of implementing BI projects
→ SmartData, Syntel’s data quality enrichment tool and data governance framework; delivers 40% functionality at a fractional cost of products

• Increasing complexity due to new data sources, new data elements from existing sources, and increased efforts with a lack of documentation
→ Data Integration Framework to improve time-to-market; automated source-to-target mapping documentation using SmartMap, covering 80% of analysis efforts

• Fragmented reporting environment; high total cost of ownership; poor visibility into enterprise data
→ Accelerators for report migration with 50-60% automation (e.g. CoBo: Cognos to BO; ActJasper: Actuate to Jasper); Cognos-to-SSRS migration framework; report rationalization framework

• Insurance KPI reporting
→ PerformINS, a proprietary KPI reporting solution; Plug & Play BI Solution, saving 30% of efforts and costs; 60+ key performance indicators (KPIs) to provide insights into business with readily available dashboards and reports

Syntel can help you build defect-free applications, compliant with industry and regulatory requirements, accelerated by our innovative BI-DW solutions. For more information on Syntel’s capabilities and how we can leverage industry-best techniques to deliver seamless, error-free business output, visit www.syntelinc.com

CONSULTING SERVICES - Assessment, Strategy and Roadmap

DATA MANAGEMENT
• Data modeling and architecture • Data integration • Data quality and governance • Master data management • Metadata management • Large-size data warehouses • Upgrade and platform migration services

BUSINESS INSIGHTS
• Analytical and operational reporting • Intuitive dashboards and scorecards • Report inventory rationalization • Mobile BI • Upgrade and platform migration services • Reporting services on Cloud • Performance tuning

BUSINESS FORESIGHTS
• Data mining • Statistical model development • Big Data analytics • Predictive modeling • Text mining • Forecasting and optimization


5. About Syntel

Visit Syntel's website at www.syntelinc.com

SYNTEL
525 E. Big Beaver, Third Floor
Troy, MI 48083

Syntel (NASDAQ:SYNT) is a leading global provider of integrated information technology and Knowledge Process Outsourcing (KPO) solutions spanning the entire lifecycle of business and information systems and processes.

The Company is driven by its mission to create new opportunities for clients by harnessing the passion, talent and innovation of Syntel employees worldwide. Syntel leverages dedicated Centers of Excellence, a flexible Global Delivery Model, and a strong track record of building collaborative client partnerships to create sustainable business advantage for Global 2000 organizations.

Syntel is assessed at SEI CMMi Level 5, and is ISO 27001 and ISO 9001:2008 certified. As of June 30, 2012, Syntel employed more than 20,000 people worldwide. To learn more, visit us at www.syntelinc.com