ETL Testing: A primer for Testers on Data Warehouses, ETL, Business Intelligence and how to test them.
Are you hearing and reading about Big Data, Enterprise Data Warehouses (EDW), the ETL Process and Business Intelligence (BI)? The software markets for EDW and BI are quickly approaching $22 billion, according to Gartner, and Big Data is growing at an exponential pace. Are you being tasked to test these environments or would you like to learn about them and be prepared for when you are asked to test them? RTTS, the Software Quality Experts, provided this groundbreaking webinar, based upon our many years of experience in providing software quality solutions for more than 400 companies.
You will learn the answers to the following questions:
• What is Big Data and what does it mean to me?
• What are the business reasons for building a Data Warehouse and for using Business Intelligence software?
• How do Data Warehouses, Business Intelligence tools and ETL work from a technical perspective?
• Who are the primary players in this software space?
• How do I test these environments?
• What tools should I use?
This video is geared towards:
• QA Testers
• Automated Test Engineers
• Quality Assurance Analysts
• Business Analysts
• Developers
• Project Managers
...and anyone else who (a) is new to the EDW space, (b) wants to be educated in the business and technical sides and (c) wants to understand how to test them.
© 2011 Real-Time Technology Solutions, Inc. New York, Philadelphia, Atlanta. www.rtts.com
What is a Data Warehouse and How Do I Test It?
A primer for Testers on Data Warehouses, the ETL process and Business Intelligence and how to test them
RTTS is the leading provider of software quality for critical business applications
Fast Facts
• Founded: 1996 - consulting firm
• Locations: New York (HQ), Atlanta, Philly, Phoenix
• Geographic region: Americas, EMEA, APAC
• Customer profile: Fortune 1000
o 350+ customers
o 500+ projects
• Strategic Partners: HP, IBM, MSFT, Oracle
• RTTS’ Software: QuerySurge™, TOMOS ALM™
The Software Quality Experts
Overview
• What is Big Data?
• What is a Data Warehouse?
o About the ETL Process
o The Data Warehouse marketplace
• What is Business Intelligence?
o The architecture
o The BI marketplace
• Testing the DW Architecture
o Entry points
o The Mapping document
o Functional test implementation
o Test Tools
• Testing BI
o Functional test implementation
o Performance Testing
• Data Warehouse Test Tool demo
• Q&A
What is Big Data?
Big data – defined as data with too much volume, velocity and variability to work on normal database architectures.
What is Big Data?
“The market for big data is $70 billion and growing by 15% a year.” - EMC COO Pat Gelsinger
Size
Defined as 5 petabytes or more
• 1 petabyte = 1,000 terabytes
• 1,000 terabytes = 1,000,000 gigabytes
• 1,000,000 gigabytes = 1,000,000,000 megabytes
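The unit chain above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (factor of 1,000) units as on the slide; the function and constant names are illustrative only.

```python
# Decimal storage units, as used on the slide: each step is a factor of 1,000.
MB_PER_GB = 1_000
GB_PER_TB = 1_000
TB_PER_PB = 1_000

BIG_DATA_THRESHOLD_PB = 5  # the slide's working definition of "big data"


def petabytes_to_megabytes(petabytes):
    """Convert petabytes to megabytes using decimal units."""
    return petabytes * TB_PER_PB * GB_PER_TB * MB_PER_GB


print(petabytes_to_megabytes(1))                      # 1000000000
print(petabytes_to_megabytes(BIG_DATA_THRESHOLD_PB))  # 5000000000
```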
Big Data Impact
Wal-Mart handles more than 1 million customer transactions every hour.
• data imported into databases that contain > 2.5 petabytes of data
• the equivalent of 167 times the information contained in all the books in the US Library of Congress
Facebook handles 40 billion photos from its user base.
Google processes 1 Terabyte per hour
Twitter processes 85 million tweets per day
eBay processes 80 Terabytes per day
Others
Big Data Solutions
Big data requires exceptional technologies to efficiently process large quantities of data within tolerable elapsed times.
Technologies include:
• massively parallel processing (MPP) databases
• data warehouses
• data mining grids
• distributed file systems
• distributed databases
• cloud computing platforms
• the Internet
• scalable storage systems
What is a Data Warehouse?
Data Warehouse
• Typically a relational database that is designed for query and analysis rather than for transaction processing
• A place where historical data is stored for archival, analysis and security purposes
• Contains either raw data or formatted data
• Combines data from multiple sources:
o Sales
o Salaries
o Operational data
o Human resource data
o Inventory data
o Web logs
o Social networks
o Internet text and docs
o Other
[Diagram: Legacy DB, CRM/ERP DB and Finance DB as data sources]
Data Warehouse – Business Case
Why build a Data Warehouse?
• Data stored in operational systems (OLTP) is not easily accessible
• OLTP systems are not designed for end-user analysis
• The data in OLTP is constantly changing
• OLTP systems may lack historical data
• Diverse forms of data are stored in different platforms
Data Warehouse – Business Case
The Data Warehouse Business Solution
• Collects data from different sources (other databases, files, web services, etc.)
• Integrates data into logical business areas
• Provides direct access to data with powerful reporting tools (BI)
Data Warehouse – about the data
The Data Warehouse data
• Subject-oriented
• Integrated
• Non-volatile
• Time-variant
Data Warehouse – the ETL process
ETL = Extract, Transform, Load
Why ETL?
The data warehouse needs to be loaded regularly (daily/weekly) so that it can serve its purpose of facilitating business analysis.
Extract – data is read from one or more OLTP systems and copied into the warehouse.
Transform – removing inconsistencies, adding missing fields, summarizing detailed data and deriving new fields to store calculated data.
Load – map the data and load it into the DW.
[Diagram: data load from Legacy DB, CRM/ERP DB and Finance DB]
Data Warehouse – the ETL process
[Diagram: Source Data → ETL Process (Extract, Transform, Load) → Target DW]
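The three ETL steps can be sketched in a few lines. This is only an illustration, not the presenters' implementation: the in-memory SQLite databases, the `orders` and `fact_orders` tables and the cleaning rules are all hypothetical stand-ins for a real OLTP source and warehouse target.

```python
import sqlite3

# In-memory stand-ins for an OLTP source and a warehouse target (hypothetical schemas).
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

source.execute(
    "CREATE TABLE orders (id INTEGER, customer TEXT, unit_price REAL, quantity INTEGER)"
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, " alice ", 10.0, 2), (2, "BOB", 5.5, 4), (3, None, 3.0, 1)],
)
target.execute(
    "CREATE TABLE fact_orders (order_id INTEGER, customer TEXT, order_total REAL)"
)


def extract(conn):
    """Extract: read raw rows from the source system."""
    return conn.execute(
        "SELECT id, customer, unit_price, quantity FROM orders ORDER BY id"
    ).fetchall()


def transform(rows):
    """Transform: remove inconsistencies, fill missing fields, derive new fields."""
    cleaned = []
    for order_id, customer, unit_price, quantity in rows:
        name = (customer or "UNKNOWN").strip().title()  # normalize / fill missing value
        total = round(unit_price * quantity, 2)         # derived, calculated field
        cleaned.append((order_id, name, total))
    return cleaned


def load(conn, rows):
    """Load: write the transformed rows into the warehouse table."""
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()


load(target, transform(extract(source)))
loaded = target.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall()
print(loaded)  # [(1, 'Alice', 20.0), (2, 'Bob', 22.0), (3, 'Unknown', 3.0)]
```

Keeping each step in its own function mirrors the check points a tester would want: the output of each step can be inspected separately.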
Data Warehouse – the marketplace
“The data warehousing market will see a compound annual growth rate of 11.5% through 2013 to reach a total of $13.2 billion in revenue.” - Consulting Specialist, The 451 Group
Data Warehouse size
• Small data warehouses: < 5 TB
• Midsize data warehouses: 5 TB - 20 TB
• Large data warehouses: > 20 TB
- Analyst firm Gartner
Leaders in Data Warehouse Data Management Systems
- Analyst firm Gartner’s ‘Magic Quadrant for Data Warehouse Database Management Systems’
Data Warehouse – the marketplace
Delivery Models
• Stand-alone DBMS software
• Cloud offerings
• Data warehouse appliances
Leading Appliance Makers
Business Intelligence (BI)
B.I. – What is it?
• Software applications used in spotting, digging out, and analyzing business data
• Provides easy access to data, uses it in day-to-day operations and integrates data into logical business areas
• Provides historical, current and predictive views of business operations
• Made up of several related activities, including data mining, online analytical processing, querying and reporting
Business Intelligence (BI) - Who uses it?
Wal-Mart uses vast amounts of data and category analysis to dominate the industry.
Amazon and Yahoo follow a "test and learn" approach to business changes.
Hardee’s, Wendy’s, and T.G.I. Friday’s use BI to make strategic decisions.
Business Intelligence (BI) & Data Marts
Data Mart
A database that has the same characteristics as a data warehouse, but is usually smaller and is focused on the data for one division or one workgroup within an enterprise.
It typically holds aggregated data and some granular data. It is a subset of the DW, which makes Business Intelligence reporting more efficient.
[Diagram: Source Data (Legacy DB, CRM/ERP DB, Finance DB) → ETL Process → Target DW → ETL Process → Data Mart]
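To make the "aggregated subset of the DW" idea concrete, here is a minimal sketch that derives a mart table for one division from a warehouse table. The table names (`dw_sales`, `mart_electronics`) and the sample rows are hypothetical, and SQLite stands in for the real warehouse.

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.execute(
    "CREATE TABLE dw_sales (division TEXT, product TEXT, sale_date TEXT, amount REAL)"
)
dw.executemany(
    "INSERT INTO dw_sales VALUES (?, ?, ?, ?)",
    [
        ("Electronics", "TV", "2010-09-01", 500.0),
        ("Electronics", "TV", "2010-09-02", 450.0),
        ("Electronics", "Radio", "2010-09-02", 40.0),
        ("Grocery", "Milk", "2010-09-01", 3.0),
    ],
)

# The mart keeps only one division and stores aggregated rows.
dw.execute(
    """
    CREATE TABLE mart_electronics AS
    SELECT product, COUNT(*) AS sales_count, SUM(amount) AS total_amount
    FROM dw_sales
    WHERE division = 'Electronics'
    GROUP BY product
    """
)

mart_rows = dw.execute(
    "SELECT product, sales_count, total_amount FROM mart_electronics ORDER BY product"
).fetchall()
print(mart_rows)  # [('Radio', 1, 40.0), ('TV', 2, 950.0)]
```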
Business Intelligence (BI)
[Diagram: Source Data (Legacy DB, CRM/ERP DB, Finance DB) → ETL Process → Target DW → ETL Process → Data Mart]
B.I.– the marketplace
“Worldwide business intelligence (BI) platform, analytic applications and performance management (PM) software revenue reached $10.5 billion in 2010, a 13.4 percent increase from 2009 revenue of $9.3 billion”
“The four large "stack" vendors (SAP, Oracle, IBM and Microsoft) continue to consolidate the market, owning 59 percent of the market share.”
- Analyst firm Gartner
Leaders in BI
- Analyst firm Forrester Research’s ‘Forrester Wave’
Testing a Data Warehouse Architecture
Testing a DW – Resources Involved
• Business Analysts create requirements
• QA Testers develop and execute test plans and test cases. Skill set required: very strong SQL!
• Architects set up test environments
• Developers perform unit tests
• DBAs test for performance and stress
• Business Users perform functional User Acceptance Tests
Testing the Data Warehouse
For the purposes of this presentation, we will focus on a strategy for Testers.
An effective data warehouse testing strategy focuses on the main structures within the data warehouse architecture:
1) The Sources
2) The ETL layer
3) The data warehouse itself
4) The front-end (BI) data warehouse applications
Testing the Data Warehouse - Entry Points
Recommended functional test strategy: Test every entry point in the system (feeds, databases, internal messaging, front-end transactions).
The goal: provide rapid localization of data issues between points
[Diagram: Source Data (Legacy DB, CRM/ERP DB, Finance DB) → ETL Process → Target DW → ETL Process → Data Mart → BI, with test entry points marked along the flow]
Testing the Data Warehouse - Entry Points
Possible architecture
[Diagram: Source Data (Legacy DB, CRM/ERP DB, Finance DB, files), Staging DB, Data Marts and BI connected by multiple ETL processes, with test entry points marked between stages]
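One way to get the "rapid localization" that entry-point testing aims for is to profile the data at every entry point and find the first leg where the profiles diverge. The sketch below assumes, purely for illustration, four hypothetical tables that should hold the same rows at each stage; in a real architecture the comparison per leg would follow the mapping rules rather than a plain count and sum.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Hypothetical tables, one per entry point along the data flow.
stages = ["source_orders", "staging_orders", "dw_orders", "mart_orders"]
for table in stages:
    db.execute(f"CREATE TABLE {table} (order_id INTEGER, amount REAL)")

rows = [(1, 10.0), (2, 20.0), (3, 30.0)]
db.executemany("INSERT INTO source_orders VALUES (?, ?)", rows)
db.executemany("INSERT INTO staging_orders VALUES (?, ?)", rows)
db.executemany("INSERT INTO dw_orders VALUES (?, ?)", rows[:2])    # simulated defect: one row lost
db.executemany("INSERT INTO mart_orders VALUES (?, ?)", rows[:2])


def profile(conn, table):
    """Return (row count, amount total) for one entry point."""
    return conn.execute(
        f"SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM {table}"
    ).fetchone()


def first_broken_leg(conn, tables):
    """Compare adjacent entry points; return the first leg whose profiles differ."""
    for upstream, downstream in zip(tables, tables[1:]):
        if profile(conn, upstream) != profile(conn, downstream):
            return (upstream, downstream)
    return None


print(first_broken_leg(db, stages))  # ('staging_orders', 'dw_orders')
```

Because every leg is checked, the defect is pinned to the staging-to-warehouse ETL instead of surfacing only in a BI report.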
Testing the DW – Mapping Document
a.k.a. Source to Target Map
It’s the critical element required to efficiently plan the ETL process.
Intention: capture business rules, data flow mapping and data movement requirements.
The mapping doc specifies:
• Source input definition
• Target/output details
• Business & data transformation rules
• Absolute data quality requirements
• Optional data quality requirements
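A mapping document is often kept in a spreadsheet, but its contents can be thought of as structured data. The sketch below is one hypothetical representation: the `MappingRule` class and its fields are illustrative, the column names are borrowed from the example queries in this deck, and which rules are marked optional is an assumption made only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingRule:
    """One row of a source-to-target map (structure is illustrative)."""
    source: str            # source input definition
    target: str            # target/output details
    rule: str              # business / data transformation rule
    required: bool = True  # absolute (True) vs. optional (False) data quality requirement


MAPPING = [
    MappingRule("Sales.Customer.idCustomer", "dw.user_.idUser", "copy unchanged"),
    MappingRule("Sales.Orders.idOrder", "dw.Purchase.idPurchase", "copy unchanged"),
    MappingRule("Sales.Product.name", "dw.Item.name", "copy unchanged"),
    MappingRule(
        "Sales.OrderStatus.idOrderStatus",
        "dw.PurchaseStatus.status",
        "derive 'Returned' / 'Delivered' / 'Processing' from status id and dates",
    ),
    # Marked optional only for the sake of the example.
    MappingRule("Sales.Customer.firstName", "dw.user_.firstName", "copy unchanged", required=False),
]


def rules_for_target_table(mapping, table):
    """Select the rules that feed one target table, e.g. to plan its test queries."""
    return [r for r in mapping if r.target.startswith(table + ".")]


absolute = [r.target for r in MAPPING if r.required]
print(len(absolute))  # 4
print([r.source for r in rules_for_target_table(MAPPING, "dw.user_")])
# ['Sales.Customer.idCustomer', 'Sales.Customer.firstName']
```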
Testing the DW – Mapping Document
Source:
SELECT c.idCustomer "Customer ID",
       c.lastName "Customer Last Name",
       c.firstName "Customer First Name",
       o.idOrder "Order Number",
       p.name "Product Name",
       op.quantity "Quantity Ordered",
       CASE
         WHEN os.idOrderStatus = 5 AND o.refundDate IS NOT NULL THEN 'Returned'
         WHEN (os.idOrderStatus = 3 OR os.idOrderStatus = 4) AND o.shipDate IS NOT NULL THEN 'Delivered'
         ELSE 'Processing'
       END "Order Status"
FROM Sales.Orders o, Sales.OrderStatus os, Sales.OrderProduct op,
     Sales.Product p, Sales.Category cat, Sales.Customer c
WHERE o.order_idOrderStatus = os.idorderstatus
  AND op.orderProduct_idOrder = o.idOrder
  AND op.orderProduct_idProduct = p.idProduct
  AND p.product_idCategory = cat.idCategory
  AND cat.name = 'Electronics'
  AND o.order_idCustomer = c.idCustomer
  AND o.orderDate BETWEEN '01-SEP-10' AND '07-SEP-10'
ORDER BY c.idCustomer, c.lastName, c.firstName, o.idorder

Target:
SELECT u.idUser "Customer ID",
       u.lastName "Customer Last Name",
       u.firstName "Customer First Name",
       p.idPurchase "Purchase Number",
       i.name "Item Name",
       oi.quantity "Quantity Ordered",
       ps.status "Purchase Status"
FROM dw.Purchase p, dw.PurchaseStatus ps, dw.OrderItem oi,
     dw.Item i, dw.user_ u, dw.category cat
WHERE p.purchase_idPurchaseStatus = ps.idPurchaseStatus
  AND oi.orderItem_idPurchase = p.idPurchase
  AND oi.orderItem_idItem = i.idItem
  AND p.purchase_idUser = u.idUser
  AND i.item_idCategory = cat.idCategory
  AND cat.name = 'Electronics'
  AND SUBSTR(p.purchaseDate, 1, 5) BETWEEN '09-01' AND '09-07'
  AND SUBSTR(p.purchaseDate, -2) = '10'
ORDER BY u.idUser, u.lastname, u.firstname, p.idpurchase
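The source and target queries work as a pair: the source query applies the mapping's transformation rule to compute what the warehouse should contain, and the target query reads what it actually contains. Below is a reduced, runnable sketch of that idea. The two tables are simplified, hypothetical versions of the ones in the queries above, and SQLite stands in for the real databases.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE src_orders (idOrder INTEGER, idOrderStatus INTEGER, shipDate TEXT, refundDate TEXT);
    CREATE TABLE dw_purchase (idPurchase INTEGER, status TEXT);

    INSERT INTO src_orders VALUES (1, 5, '2010-09-02', '2010-09-06');
    INSERT INTO src_orders VALUES (2, 3, '2010-09-03', NULL);
    INSERT INTO src_orders VALUES (3, 1, NULL, NULL);

    INSERT INTO dw_purchase VALUES (1, 'Returned');
    INSERT INTO dw_purchase VALUES (2, 'Delivered');
    INSERT INTO dw_purchase VALUES (3, 'Delivered');  -- simulated ETL defect
    """
)

# Source query: applies the transformation rule to compute the expected values.
SOURCE_SQL = """
    SELECT idOrder,
           CASE
               WHEN idOrderStatus = 5 AND refundDate IS NOT NULL THEN 'Returned'
               WHEN idOrderStatus IN (3, 4) AND shipDate IS NOT NULL THEN 'Delivered'
               ELSE 'Processing'
           END
    FROM src_orders
    ORDER BY idOrder
"""
# Target query: reads what the ETL actually loaded.
TARGET_SQL = "SELECT idPurchase, status FROM dw_purchase ORDER BY idPurchase"

expected = db.execute(SOURCE_SQL).fetchall()
actual = db.execute(TARGET_SQL).fetchall()

# Row-by-row comparison; this simple form assumes both result sets have the same length.
mismatches = [(e, a) for e, a in zip(expected, actual) if e != a]
print(mismatches)  # [((3, 'Processing'), (3, 'Delivered'))]
```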
Testing the DW – Implementation
Implementation of Functional Test
What is going on in the marketplace?
1. Manual Execution
2. Automated execution with standard test tools
3. Bulk automation with DW Test Tool
Testing the DW – Manual Testing Flow
1. Review mapping docs
2. Write SQL in favorite editor
3. Run tests
4. Dump results to a file
5. Compare results manually or with a compare tool
6. Report defects and issues
[Diagram: tools and tasks laid out along a timeline]
Testing the DW – Manual Testing Flow
Manual ETL Testing Flow: Comments
• Place check points across each leg so that each transformation is checked.
• If a file compare tool is used, care must be taken to ensure that the result rows for each query are in the same order (the database is under no obligation to return rows in a specified order unless the SQL includes an ORDER BY).
• This process can quickly result in hundreds or thousands of pairs of queries.
• Only a very small sampling of the data can be verified.
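The row-ordering caveat is easy to demonstrate. In this minimal sketch (sample rows are made up), two result sets hold the same data in different orders: a naive comparison reports a false failure, while an order-insensitive comparison with `collections.Counter` does not. Adding an explicit ORDER BY to both queries is the other common fix.

```python
from collections import Counter

source_rows = [(1, "Alice"), (2, "Bob"), (3, "Cara")]
target_rows = [(3, "Cara"), (1, "Alice"), (2, "Bob")]  # same data, different order

naive_equal = source_rows == target_rows                        # order-sensitive
multiset_equal = Counter(source_rows) == Counter(target_rows)   # order-insensitive


def diff_rows(source, target):
    """Return (rows missing from target, rows unexpected in target), ignoring order."""
    src, tgt = Counter(source), Counter(target)
    return sorted((src - tgt).elements()), sorted((tgt - src).elements())


print(naive_equal, multiset_equal)                          # False True
print(diff_rows(source_rows, target_rows + [(4, "Dan")]))   # ([], [(4, 'Dan')])
```

Using a multiset rather than a plain set keeps duplicate rows significant, which matters when an ETL defect loads a row twice.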
Testing the DW – Automated Testing Flow
Functional Automation ETL Testing Flow
1. Similar to the manual flow: extract mappings from the mapping document.
2. Write pairs of queries that test between any two points in the architecture.
3. Issue the queries via a functional automation tool.
4. Have the functional scripts dump the query result sets to files.
5. Compare the files, either by writing automation code or by using a file compare tool.
This process is dependent on the speed of the automation tool; only a fraction of the data can be covered per ETL per build.
[Diagram: Functional Tester issuing SQL (source) and SQL (target) query pairs; Legacy DB, CRM/ERP DB, Finance DB]
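Steps 2 to 5 of the automated flow can be sketched as a small harness. This is not how any particular functional automation tool works; it is a plain-Python illustration in which the query pairs, table names and file names are hypothetical and SQLite stands in for the source and target databases.

```python
import csv
import filecmp
import sqlite3
import tempfile
from pathlib import Path

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE src_customer (id INTEGER, last_name TEXT);
    CREATE TABLE dw_user (id INTEGER, last_name TEXT);
    INSERT INTO src_customer VALUES (1, 'Smith');
    INSERT INTO src_customer VALUES (2, 'Jones');
    INSERT INTO dw_user VALUES (1, 'Smith');
    INSERT INTO dw_user VALUES (2, 'Jonse');  -- simulated typo introduced by the ETL
    """
)

# Step 2: pairs of queries (source, target), each with an explicit ORDER BY.
QUERY_PAIRS = {
    "customer_ids": (
        "SELECT id FROM src_customer ORDER BY id",
        "SELECT id FROM dw_user ORDER BY id",
    ),
    "customer_names": (
        "SELECT id, last_name FROM src_customer ORDER BY id",
        "SELECT id, last_name FROM dw_user ORDER BY id",
    ),
}


def dump(conn, sql, path):
    """Steps 3-4: run one query and dump its result set to a CSV file."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(conn.execute(sql))


def run_pairs(conn, pairs, out_dir):
    """Step 5: compare the dumped files for each pair; True means they match."""
    results = {}
    for name, (source_sql, target_sql) in pairs.items():
        source_file = Path(out_dir) / f"{name}_source.csv"
        target_file = Path(out_dir) / f"{name}_target.csv"
        dump(conn, source_sql, source_file)
        dump(conn, target_sql, target_file)
        results[name] = filecmp.cmp(source_file, target_file, shallow=False)
    return results


with tempfile.TemporaryDirectory() as out_dir:
    results = run_pairs(db, QUERY_PAIRS, out_dir)
print(results)  # {'customer_ids': True, 'customer_names': False}
```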
Testing the DW – DW Test Tool
Data Warehouse Test Automation tool
• Performs bulk verification of up to 100% of all data
• Provides a huge increase in coverage and verification of your data
• Tremendously decreases your testing time and costs (i.e. huge ROI)
Testing the DW – Functional Test of BI
Functional Testing of BI
1. Extract mappings from mapping doc for the data mart
2. Execute reports
3. Verify that data is correct
o Verify to the source
o Verify field lengths and field-level data
o Verify logical dependencies of fields
Functional Tester
Automation tools can and should be used for regression purposes.
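Field-length and logical-dependency checks on report output lend themselves to regression automation. The sketch below assumes a report has already been exported as a list of rows; the field names, length limits and the "a Delivered order must have a ship date" rule are hypothetical examples, not requirements from the source.

```python
# Rows as they might be exported from a BI report (sample data is made up).
report_rows = [
    {"order_id": 1, "status": "Delivered", "ship_date": "2010-09-03", "customer": "Smith"},
    {"order_id": 2, "status": "Delivered", "ship_date": None, "customer": "Jones"},
    {"order_id": 3, "status": "Processing", "ship_date": None, "customer": "A" * 60},
]

MAX_LENGTHS = {"status": 10, "customer": 50}  # hypothetical field-length limits


def check_row(row):
    """Return a list of problems found in one report row."""
    problems = []
    # Field lengths: no value may exceed the length defined for its field.
    for field, limit in MAX_LENGTHS.items():
        value = row[field]
        if value is not None and len(value) > limit:
            problems.append(f"{field} longer than {limit} characters")
    # Logical dependency: a delivered order must carry a ship date.
    if row["status"] == "Delivered" and row["ship_date"] is None:
        problems.append("Delivered order has no ship_date")
    return problems


failures = {row["order_id"]: check_row(row) for row in report_rows if check_row(row)}
print(failures)
# {2: ['Delivered order has no ship_date'], 3: ['customer longer than 50 characters']}
```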
Testing the DW – Performance Test of BI
For Business Intelligence (BI) applications, performance requirements must be met during batch report execution and normal user activity.
Since most BI applications are customized to meet the specific business requirements and data model of the organization, it is risky to rely on the initial performance testing done by the software vendor prior to release.
It is therefore common practice to test the performance of BI applications before their initial deployment and before any major system updates and upgrades.
Performance Tester
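A performance test of a BI report boils down to running it under a realistic number of concurrent users and comparing response times with the stated requirement. The following is a toy sketch of that idea only: the report query, the five simulated users and the 2-second threshold are all hypothetical, and a real test would drive the actual BI application with a load testing tool.

```python
import sqlite3
import time
from concurrent.futures import ThreadPoolExecutor

CONCURRENT_USERS = 5
MAX_SECONDS = 2.0  # hypothetical performance requirement


def run_report(_user_id):
    """Simulate one user running a report; return its duration in seconds."""
    conn = sqlite3.connect(":memory:")  # one connection per simulated user
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [(f"region-{i % 10}", float(i)) for i in range(5_000)],
    )
    started = time.perf_counter()
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
    elapsed = time.perf_counter() - started
    conn.close()
    return elapsed


# Run the report for all simulated users at the same time.
with ThreadPoolExecutor(max_workers=CONCURRENT_USERS) as pool:
    timings = list(pool.map(run_report, range(CONCURRENT_USERS)))

slowest = max(timings)
print(f"slowest report: {slowest:.4f}s, requirement met: {slowest <= MAX_SECONDS}")
```

Only the query itself is timed, so data setup does not distort the measurement.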
QuerySurge™ DEMONSTRATION
Automating ETL Testing
DEMO