
Data Verification In QA Department Final


Data warehouse and ETL testing should be conducted according to a process and checklist. This presentation provides an overview of recommended methods.


Page 1: Data Verification In QA Department Final

Database and ETL

Testing Process

Methods, Issues, Recommendations

Jan. 19, 2010

W. Yaddow

[email protected]

For internal use only – Not for external distribution

Page 2

Agenda

1. QA objectives for ETLs & data loading projects

2. Samples of QA data defect discoveries

3. Data quality tools / techniques used by QA team

4. ETL & data loading verification checks

5. Lessons learned in data verification

6. Recommendations for continued early involvement by QA

Page 3

Data defects… a definition

Data defects: deviations from the correctness of data, generally errors occurring prior to processing of data for analytics or reporting. Errors can result from the data model, low-level design, data mapping, or data loading prior to processing in an application.

Note: Data issues on displays or reports are not considered data defects when they result from service calls or computation errors within an application.

Page 4

QA objectives for ETL & data integration

1. Assure that all the records in source systems that should be migrated to a database are extracted -- no more, no less.

2. Verify that all of the components of the ETL / load process complete with no defects.

3. Verify that all of the source data is correctly transformed into dimension, fact and other tables.

4. Analyze ETL / load exception logs
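The first objective above — migrate exactly the records that should be migrated, no more, no less — is commonly verified by reconciling row counts and key sets between source and target. A minimal sketch in Python using an in-memory SQLite database as a stand-in for the actual source and warehouse systems (table and column names are illustrative, not from the deck):

```python
import sqlite3

# Stand-in source and target tables; in practice these would live in
# separate databases reached through the appropriate drivers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, amount REAL);
    CREATE TABLE tgt_orders (order_id INTEGER, amount REAL);
    INSERT INTO src_orders VALUES (1, 10.0), (2, 20.0), (3, 30.0);
    INSERT INTO tgt_orders VALUES (1, 10.0), (2, 20.0), (3, 30.0);
""")

def reconcile(conn, src, tgt, key):
    """Compare row counts and flag keys present on only one side."""
    src_count = conn.execute(f"SELECT COUNT(*) FROM {src}").fetchone()[0]
    tgt_count = conn.execute(f"SELECT COUNT(*) FROM {tgt}").fetchone()[0]
    # EXCEPT yields keys extracted but never loaded (and vice versa).
    missing = conn.execute(
        f"SELECT {key} FROM {src} EXCEPT SELECT {key} FROM {tgt}"
    ).fetchall()
    extra = conn.execute(
        f"SELECT {key} FROM {tgt} EXCEPT SELECT {key} FROM {src}"
    ).fetchall()
    return src_count, tgt_count, missing, extra

src_n, tgt_n, missing, extra = reconcile(
    conn, "src_orders", "tgt_orders", "order_id")
print(src_n == tgt_n and not missing and not extra)  # True when counts match
```

The same count-and-key comparison scales to the real extract by pointing the two queries at the production source and staging connections.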

Page 5

QA role in data verification projects

1. QA develops verification methods to support data integration specific to projects.

2. QA executes tasks that demonstrate data verification is a critical link between the DS, DSO, application development and analytics teams.

3. QA continues to demonstrate that early data testing is the most efficient means of identifying and correcting defects.

Page 6

Sample of data defect discoveries

Application   % Data Defects   % Data Defects High or Critical Severity
App1               39%                          48%
App2               26%                          70%
App3               26%                          33%
App4                6%                          59%
App5               29%                          68%

Note: Data as of 10/23/2009

Page 7

Data integration & ETL error injection points

DATA TRACK PHASES                  ARTIFACTS                                            QA TASKS

1) DB design & planning
Data and analysis requirements     Data design & requirements                           Reviews, comments
                                   Source data planning & profiling                     Reviews, comments
Data flow and load design          Data model                                           Reviews, comments
                                   Logical & physical data flow diagrams                Reviews, comments
                                   Data movement low-level design (LLD)                 Reviews, comments
                                   Data mappings & transformations, source to target    Reviews, test planning, test case development
                                   ETL design & logic                                   Reviews, comments
                                   SQL and PL/SQL for data loads                        Reviews, comments
                                   Data cleansing plan                                  Reviews, comments
                                   Data load and ETL developer test plan                Reviews, comments

2) ETL, data load
Data load / ETL execution          Extract, transform, load                             Reviews, verification, defect reports
Data load / ETL load inspection    Workflow logs, session logs, error log tables,       Reviews, verification, defect reports
                                   reject tables

Page 8

Small sample; data verification

Page 9

ETL & data loading verification checks

Basic ETL and PL/SQL verifications conducted by QA:

Verify mappings, source to target

Verify that all tables and specified fields were loaded from source to staging

Verify that keys were properly generated using sequence generator

Verify that not-null fields are populated

Verify no data truncation in each field

Verify data types and formats are as specified in design phase

Verify no duplicate records in target tables.

Verify transformations against the data low-level designs (LLDs)

Verify that numeric fields are populated with correct precision

Verify that every ETL session completed with only planned exceptions

Verify all cleansing, transformation, error and exception handling

Verify PL/SQL calculations and data mappings
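Several of the checks above (not-null population, duplicate detection, truncation) lend themselves to scripted verification with the SQL tools the deck mentions. A sketch in Python with SQLite standing in for the target schema; the dimension table and its fields are hypothetical examples, not from the deck:

```python
import sqlite3

# A toy target dimension table standing in for a loaded warehouse table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        customer_key  INTEGER,
        customer_name TEXT,
        country_code  TEXT      -- designed as a fixed 2-character code
    );
    INSERT INTO dim_customer VALUES
        (1, 'Acme Corp', 'US'),
        (2, 'Globex',    'DE'),
        (3, 'Initech',   'US');
""")

# Check 1: not-null fields are actually populated.
nulls = conn.execute(
    "SELECT COUNT(*) FROM dim_customer WHERE customer_name IS NULL"
).fetchone()[0]

# Check 2: no duplicate records on the key.
dupes = conn.execute("""
    SELECT customer_key, COUNT(*) FROM dim_customer
    GROUP BY customer_key HAVING COUNT(*) > 1
""").fetchall()

# Check 3: no truncation -- country_code should be exactly 2 characters.
truncated = conn.execute(
    "SELECT COUNT(*) FROM dim_customer WHERE LENGTH(country_code) <> 2"
).fetchone()[0]

print(nulls, len(dupes), truncated)  # 0 0 0 when the load is clean
```

In practice each check becomes a reusable query template run against every target table after an ETL session completes.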

Page 10

Data verification training overview

1. Data Quality Overview

2. Testing: DQ Categories / Checks

3. Testing: DQ Case Study

4. DQ Test Management (planning, design, execution, tools)

5. DQ Benefits & Challenges

Page 11

QA steps: data integration verification (1)

Data integration planning (data model, LLDs)

1. Gain understanding of data to be reported by the application… and the tables upon which each report is based (orgs, ratings, countries, analysts, etc.).

2. Review, understand data model – gain understanding of keys, flows from source to target

3. Review, understand data LLDs and mappings: add, update sequences for all sources of each target table

ETL Planning and testing (source inputs & ETL design)

1. Participate in ETL design reviews

2. Gain in-depth knowledge of ETL sessions, the order of execution, constraints, transformations

3. Participate in development ETL test case reviews

4. After ETLs are run, use checklists for QA assessments of rejects, session failures, errors

Page 12

QA steps: data integration verification (2)

Assess ETL logs: session, workflow, errors

1. Review ETL workflow outputs, source to target counts

2. Verify source to target mapping docs with loaded tables using TOAD and other tools

3. After ETL runs or manual data loads, assess data in every table with a focus on key fields (dirty data, incorrect formats, duplicates, etc.), using TOAD and Excel tools (SQL queries, filtering, etc.)

GUI and report validations

1. Compare reports with target data.

2. Verify that reporting meets user expectations

Analytics test team data validation

1. Test data as it is integrated into application.

2. Provide tools and tests for data validation.

Page 13

From Source to Data Warehouse… Unit Testing

• Know data transformation rules!

• Run test cases for each transformation rule; include positive & negative situations

• Row counts: Source = DWH (destination) + Rejected

• Verify process correctly uses all required data including metadata

• Cross reference DWH Dimensions and fact tables to source tables

• Verify all business rule computations are correct

• Verify database queries, expected vs actual results

• Rejects are correctly handled and conform to business rules

• Slowly changing dimensions (e.g., address, marital status) are processed correctly

• Correctness of surrogate keys (e.g., time zones, currencies) in fact tables
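The slowly-changing-dimension check in particular can be scripted: for a type-2 dimension, each business key's effective-date ranges must chain without gaps or overlaps, and exactly one row should be current. A sketch assuming `effective_from` / `effective_to` columns with a `9999-12-31` sentinel for the current row (all names and the sentinel convention are illustrative assumptions):

```python
import sqlite3
from collections import defaultdict

# Toy type-2 dimension history for two customers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_address (
        customer_id    INTEGER,
        address        TEXT,
        effective_from TEXT,   -- ISO dates; '9999-12-31' marks current row
        effective_to   TEXT
    );
    INSERT INTO dim_address VALUES
        (1, '10 Main St', '2008-01-01', '2009-06-30'),
        (1, '22 Oak Ave', '2009-06-30', '9999-12-31'),
        (2, '5 Elm Rd',   '2009-01-01', '9999-12-31');
""")

rows = conn.execute("""
    SELECT customer_id, effective_from, effective_to
    FROM dim_address ORDER BY customer_id, effective_from
""").fetchall()

history = defaultdict(list)
for cust, start, end in rows:
    history[cust].append((start, end))

violations = []
for cust, spans in history.items():
    # Exactly one open-ended (current) row per business key.
    if sum(1 for _, end in spans if end == '9999-12-31') != 1:
        violations.append((cust, 'current-row count'))
    # Consecutive ranges must chain: each row ends where the next begins.
    for (s1, e1), (s2, e2) in zip(spans, spans[1:]):
        if e1 != s2:
            violations.append((cust, 'gap or overlap'))

print(violations)  # [] when the dimension history is consistent
```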

Page 14

Transforming Data, Source to Target

Page 15

DQ tools / techniques used by QA team

TOAD / SQL Navigator
• Data profiling for value range & boundary analysis
• Null field analysis
• Row counting
• Data type analysis
• Referential integrity analysis (key analysis)
• Distinct value analysis by field
• Duplicate data analysis (fields and rows)
• Cardinality analysis
• PL/SQL stored procedure & package verification

Excel
• Data filtering for profile analysis
• Data value sampling
• Data type analysis

MS Access
• Table and data analysis across schemas

QTP
• Automated testing of templates and application screens

Analytics tools
• J – statistics, visualization, data manipulation
• Perl – data manipulation, scripting
• R – statistics

Page 16

Data defect findings by QA team

Data defect types found on six projects:

1. Inadequate ETL and stored procedure design documents

2. Field values are null when specified as “Not Null”.

3. Field constraints and SQL not coded correctly for Informatica ETL

4. Excessive ETL errors discovered after entry to QA

5. Source data does not meet table mapping specifications (e.g., dirty data)

6. Source-to-target mappings: 1) often not reviewed, 2) in error, and 3) not consistently maintained through the dev lifecycle

7. Data models are not adequately maintained during development lifecycle

8. Target data does not meet mapping specifications

9. Duplicate field values when defined to be DISTINCT

10. ETL SQL / transformation errors leading to missing rows and invalid field values

11. Constraint violations in source

12. Target data is incorrectly stored in nonstandard formats

13. Table keys are incorrect for important relationship linkages

Page 17

Lessons learned

1. Formal QA data track verifications should continue early in the ETL design and data load process (independent of application development).

2. With access to ETL dev environment, QA can prepare for formal testing and offer feedback to dev team

3. Offshore teams need adequate and more representative samples of data for data planning and design

4. Data models, LLDs, ETL design, and data mapping documents need to be kept in sync until transition

5. QA resourcing for projects must include needs to accommodate data track verifications

Page 18

Recommendations for data verifications

Detailed Recommendations for Development, QA, Data Services

1. Need analysis of (a) source data quality and (b) data field profiles before input to Informatica and other data-build services.

2. QA should participate in all data model and data mapping reviews.

3. Need complete review of ETL error logs and resolution of errors by ETL teams before DB turn-over to QA.

4. Early use of QC during ETL and stored procedure testing to target vulnerable process areas.

5. Substantially improved documentation of PL/SQL stored procedures.

6. QA needs a dev or separate environment for early data testing. QA should be able to modify data in order to perform negative tests. (QA currently performs only positive tests because application and database tests run in parallel in the same environment.)

7. Need substantially enhanced verification of target tables after each ETL load before data turn-over to QA.

8. Need mandatory maintenance of data models and source to target mapping / transformation rules documents from elaboration until transition.

9. Investments in more Informatica and off-the-shelf data quality analysis tools for pre and post ETL.

10. Investments in automated DB regression test tools and training to support frequent data loads.

Page 19

Important resource for DB testers