8/17/2019 Flat File Testing
What are Flat Files?
Flat files are extensively used for exchanging data between enterprises or between organizations within an enterprise. Flat files come in two forms: delimited files, such as CSV (comma separated value) files, and fixed width files.
What is Flat File Testing?
Flat File testing is the process of validating the quality of the data in the flat file, as well as ensuring that the data in the flat file has been consumed appropriately by the application or ETL process.
Challenges in Flat File Testing
Testing of inbound flat files presents unique challenges because the producer of the flat file is usually a different organization within the enterprise or an external vendor. Consequently, there might be differences in the format and content of the files, since there is no easy way to enforce data type and data quality constraints on the data in the flat files. Issues in flat file data can cause failures in the consuming process. While file processing requirements differ from project to project, this use case focuses on some of the common checks that need to be performed when validating flat files.
Flat File Testing Categories
FLAT FILE INGESTION TESTING
When data is moved using flat files between enterprises or between organizations within an enterprise, it is important to perform a set of file ingestion validations on the inbound flat files before consuming the data in those files.
File name validation
Files are FTP'ed or copied over to a specific folder for processing. These files usually have a specific naming convention so that the process consuming the file is able to understand the contents and date. From a testing standpoint, the file name pattern needs to be validated to verify that it meets the requirement.
Example: A government agency gets files from multiple vendors on a periodic basis. The arriving files should follow a naming convention of 'CompanyCode_ContentType_DateTimestamp.csv'. However, the files coming in from a specific vendor do not have the correct company name.
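A file name check like this is easy to automate with a regular expression. The sketch below assumes a hypothetical convention of the form 'CompanyCode_ContentType_DateTimestamp.csv'; the exact pattern would come from the project's file specification.

```python
import re

# Hypothetical naming convention for illustration:
# CompanyCode_ContentType_DateTimestamp.csv, e.g. ACME_Orders_20190817T120000.csv
FILE_NAME_PATTERN = re.compile(
    r"^(?P<company>[A-Z]+)_(?P<content>[A-Za-z]+)_(?P<ts>\d{8}T\d{6})\.csv$"
)

def validate_file_name(name):
    """Return True if the file name matches the expected naming convention."""
    return FILE_NAME_PATTERN.match(name) is not None

print(validate_file_name("ACME_Orders_20190817T120000.csv"))  # follows the convention
print(validate_file_name("orders-2019.csv"))                  # wrong pattern
```

A named-group pattern also lets the test extract the company code and timestamp for further checks, such as verifying that the timestamp matches the expected arrival date.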
Size and Format of the flat files
Although flat files are generally delimited or fixed width, it is common to have a header and a footer in these files. Sometimes these headers have a row count that can be used to verify that the file contains all of the expected data.
Some of the relevant checks are:
Verify that the size of the file is within the expected range, where applicable.
Verify that the header, footer and column heading rows have the expected format and appear in the expected location within the flat file.
Perform any row count checks to cross-check the data in the header with the values in the delimited data.
Example: A financial reporting company generates files with a header that contains the summary amount and line items that carry the detailed split. The sum of the amounts in the line items should match the summary amount in the header.
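The header cross-checks above can be sketched in a few lines. The record layout here is an assumption for illustration: a header line "H,summary_amount,row_count" followed by detail lines "D,amount".

```python
# Sketch of a header/detail cross-check, assuming a hypothetical layout:
# first line "H,<summary_amount>,<row_count>", then detail lines "D,<amount>".
def check_header_totals(lines):
    """Return (count_ok, amount_ok) comparing the header against the detail rows."""
    _, summary_amount, row_count = lines[0].split(",")
    details = [float(line.split(",")[1]) for line in lines[1:]]
    count_ok = int(row_count) == len(details)
    # Compare with a small tolerance to avoid float rounding surprises.
    amount_ok = abs(float(summary_amount) - sum(details)) < 0.01
    return count_ok, amount_ok

flat_file = ["H,600.00,3", "D,100.00", "D,200.00", "D,300.00"]
print(check_header_totals(flat_file))
```

In a real project the amounts would typically be parsed as exact decimals rather than floats, but the shape of the check is the same.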
File arrival, processing and deletion times
Files arrive periodically into a specific network folder or an FTP location before getting consumed by a process. Usually, there are specific requirements that need to be met regarding the file arrival time, the order of arrival and file retention.
Example: A pharma company gets a set of files from a vendor on a daily basis. The process consuming these files expects the complete set of files to be available before processing.
1. A file that was supposed to arrive yesterday was delayed and came in sometime after today's file arrived, causing issues due to the difference in the order in which the files were processed.
2. After a file gets processed, it is supposed to be moved to a specific directory where it is retained for a specified period of time and then deleted. However, the file did not get copied over.
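A completeness gate for such a daily file set can be sketched as a simple set difference between the expected and arrived file names (the names below are hypothetical):

```python
# Sketch: verify that the complete expected set of daily files has arrived
# before kicking off the consuming process. File names are made up for illustration.
def missing_files(expected, arrived):
    """Return the expected file names that have not arrived yet, sorted."""
    return sorted(set(expected) - set(arrived))

expected = ["accounts_20190817.csv", "balances_20190817.csv", "trades_20190817.csv"]
arrived = ["accounts_20190817.csv", "trades_20190817.csv"]

gaps = missing_files(expected, arrived)
if gaps:
    print("Hold processing; still waiting for:", gaps)
```

The same idea extends to ordering checks by comparing file modification times against the expected arrival sequence.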
Automate file ingestion testing using ETL Validator
ETL Validator comes with a Component Test Case and a File Watcher which can be used to test flat files.
Flat File Component: The flat file component is part of the Component Test Case. It can be used to define data type and data quality rules on the incoming flat file. The data in the flat file can also be compared with data from the database.
File Watcher: Using File Watcher, test plans can be triggered automatically when a new file comes into a directory, so that the test cases on the file can be executed automatically before the files are used further by the consuming process.
SFTP Connection: Makes it easy to compare and validate flat files located in a remote SFTP location.
FLAT FILE DATA TYPE TESTING
The purpose of Data Type testing is to verify that the type and length of the data in the flat file are as expected.
Data Type Check
Verify that the type and format of the data in the inbound flat file match the expected data types for the file. For date, timestamp and time data types, the values are expected to be in a specific format so that they can be parsed by the consuming process.
Example: An ID column of the flat file is expected to have only numbers. However, a few rows in the flat file have characters.
Data Length Check
The length of string and number data values in the flat file should match the maximum allowed length for those columns.
Example: Data for the comments column has more than 5000 characters in the inbound flat file while the limit for the corresponding column in the database is only 2000 characters.
Not Null Check
Verify that any required data elements in the flat file have data for all of the rows.
Example: Date of Birth is a required data element, but some of the records are missing values in the inbound flat file.
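The data type, length and not-null checks above can be combined into one row-level validator. The column rules below are assumptions for illustration, not a real file specification; they mirror the examples in this section (a numeric ID, a date of birth, and a length-limited comments column).

```python
from datetime import datetime

# Sketch of row-level data type, length and not-null checks for a delimited file.
# The column rules are hypothetical, chosen to match the examples in the text.
RULES = {
    "id":            {"type": "int", "required": True},
    "date_of_birth": {"type": "date", "required": True},   # expected format: YYYY-MM-DD
    "comments":      {"type": "str", "max_length": 2000, "required": False},
}

def check_row(row):
    """Return a list of violation messages for one row (a dict of column -> raw string)."""
    errors = []
    for col, rule in RULES.items():
        value = row.get(col, "")
        if value == "":
            if rule["required"]:
                errors.append(f"{col}: required value is missing")
            continue
        if rule["type"] == "int" and not value.isdigit():
            errors.append(f"{col}: expected only digits, got {value!r}")
        if rule["type"] == "date":
            try:
                datetime.strptime(value, "%Y-%m-%d")
            except ValueError:
                errors.append(f"{col}: not a valid YYYY-MM-DD date")
        if "max_length" in rule and len(value) > rule["max_length"]:
            errors.append(f"{col}: length {len(value)} exceeds {rule['max_length']}")
    return errors

bad_row = {"id": "A12", "date_of_birth": "", "comments": "ok"}
print(check_row(bad_row))  # flags the non-numeric id and the missing date of birth
```

Running every inbound row through such a validator and collecting the messages gives a reject report that can be reconciled against the consuming process's own rejections.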
Automate flat file data type testing with ETL Validator
ETL Validator provides the capability to specify data type checks on the flat file in the Flat File Component. Based on the data types specified, ETL Validator automatically checks all the records in the incoming flat file to find any invalid records.
FLAT FILE DATA QUALITY TESTING
The purpose of Data Quality tests is to verify the accuracy of the data in the inbound flat files.
Duplicate Data Checks
Check for duplicate rows in the inbound flat file with the same unique key column, or with a unique combination of columns, as per the business requirement.
Example: The business requirement says that the combination of First Name, Last Name, Middle Name and Date of Birth should be unique for the Customer list flat file.
Sam"le )uer( to identif( du"licates /assuming that the flat file data can !e im"orted into a
data!ase ta!le0SELECT fst_name, lst_name, mid_name, date_of_birth, count(1) FROM Customer RO!" #$
fst_name, lst_name, mid_name %&' count(1)*1
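When the file has not yet been loaded into a database, the same duplicate check can be run directly against the raw CSV. This sketch keys on the same business-key columns as the example requirement; the sample data is made up.

```python
import csv
import io
from collections import Counter

# Sketch: detect duplicate rows in the raw flat file before any database load,
# keyed on the business-key columns from the example requirement.
KEY_COLUMNS = ["fst_name", "lst_name", "mid_name", "date_of_birth"]

def duplicate_keys(csv_text):
    """Return the business-key tuples that occur more than once in the file."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(tuple(row[c] for c in KEY_COLUMNS) for row in reader)
    return [key for key, n in counts.items() if n > 1]

sample = """fst_name,lst_name,mid_name,date_of_birth
John,Doe,A,1980-01-01
Jane,Roe,B,1975-05-05
John,Doe,A,1980-01-01
"""
print(duplicate_keys(sample))  # the John Doe row appears twice
```

For large files this streams row by row and only holds the key counts in memory, so it scales better than loading the whole file.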
Reference Data Checks
Flat file standards may dictate that the values in certain columns should adhere to the values in a domain. Verify that the values in the inbound flat file conform to reference data standards.
Example: Values in the country_code column should have a valid country code from a Country Code domain.
SELECT DISTINCT country_code FROM address
MINUS
SELECT country_code FROM country
Data Validation Rules
Many data fields can contain a range of values that cannot be enumerated. However, there are reasonable constraints or rules that can be applied to detect situations where the data is clearly wrong. Instances of fields containing values that violate the defined validation rules represent a quality gap that can impact inbound flat file processing.
Example: Date of birth (DOB) should not be a date in the future.
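Such a rule can be sketched as a small predicate. The lower bound of 1900 below is an assumed sanity limit for illustration, not something stated in the file specification.

```python
from datetime import date

# Sketch of a data validation rule: a date of birth in the inbound file should
# not be in the future (and, as an assumed sanity bound, not before 1900).
def valid_date_of_birth(dob, today=None):
    """Return True if dob falls in a plausible range as of 'today'."""
    today = today or date.today()
    return date(1900, 1, 1) <= dob <= today

print(valid_date_of_birth(date(1985, 6, 1)))
print(valid_date_of_birth(date(2999, 1, 1)))  # clearly wrong: in the future
```

Passing `today` explicitly keeps the rule deterministic in automated test runs.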
ETL Validator supports defining data quality rules in the Flat File Component for automating data quality testing without writing any database queries. Custom rules can be defined and added to the Data Model template.
FLAT FILE DATA COMPLETENESS TESTING
Data in the inbound flat files is generally processed and loaded into a database. In some cases the output may also be another flat file. The purpose of Data Completeness tests is to verify that all the expected data is loaded into the target from the inbound flat file. Some of the tests that can be run are: compare and validate counts, aggregates (min, max, sum, avg) and actual data between the flat file and the target.
Record Count Validation
Compare the count of records in the flat file and the database table. Check for any rejected records.
Example: A simple count of records comparison between the source and target tables.
Source Query (assuming the flat file data is loaded into a 'customer' table for validation):
SELECT count(1) src_count FROM customer
Target Query:
SELECT count(1) tgt_count FROM customer_dim
Column Data Profile Validation
Column or attribute level data profiling is an effective tool to compare source and target data without actually comparing the entire data set. It is similar to comparing the checksums of your source and target data. These tests are essential when testing large amounts of data.
Some of the common data profile comparisons that can be done between the flat file and the target are:
Compare unique values in a column between the flat file and the target.
Compare max, min, avg, max length and min length values for columns, depending on the data type.
Compare null values in a column between the flat file and the target.
For important columns, compare the data distribution (frequency) in a column between the flat file and the target.
Example 1: Compare column counts with values (non-null values) between the source and target for each column based on the mapping.
Source Query (assuming the flat file data is loaded into a 'customer' table for validation):
SELECT count(row_id), count(fst_name), count(lst_name), avg(revenue) FROM customer
Target Query:
SELECT count(row_id), count(first_name), count(last_name), avg(revenue) FROM customer_dim
Example 2: Compare the number of customers by country between the source and target.
Source Query (assuming the flat file data is loaded into a 'customer' table for validation):
SELECT country, count(*) FROM customer GROUP BY country
Target Query:
SELECT country_cd, count(*) FROM customer_dim GROUP BY country_cd
Compare entire flat file and target data
Compare the data (values) between the flat file and the target data, effectively validating 100% of the data. In regulated industries such as finance and pharma, 100% data validation might be a compliance requirement. It is also a key requirement for data migration projects. However, performing 100% data validation is a challenge when large volumes of data are involved. This is where ETL testing tools such as ETL Validator can be used, because they have an inbuilt ELV engine (Extract, Load, Validate) capable of comparing large volumes of data.
Example: Write a source query on the flat file that matches the data in the target table after transformation.
Source Query (assuming the flat file data is loaded into a 'customer' table for validation):
SELECT cust_id, fst_name, lst_name, fst_name || ' ' || lst_name, DOB FROM Customer
Target Query:
SELECT integration_id, first_name, last_name, full_name, date_of_birth FROM Customer_dim
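Once the source and target query results are in hand, the 100% comparison itself reduces to matching rows by key and diffing the remaining columns. A minimal in-memory sketch, with made-up sample rows:

```python
# Sketch of a full data comparison between transformed source rows and target
# rows, keyed by an id column. Row contents here are illustrative only.
def compare_datasets(source_rows, target_rows, key="id"):
    """Return (keys missing from target, keys whose rows differ from the source)."""
    src = {row[key]: row for row in source_rows}
    tgt = {row[key]: row for row in target_rows}
    missing = sorted(set(src) - set(tgt))
    mismatched = sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k])
    return missing, mismatched

source = [{"id": 1, "full_name": "John Doe"}, {"id": 2, "full_name": "Jane Roe"}]
target = [{"id": 1, "full_name": "John Doe"}]
print(compare_datasets(source, target))  # customer 2 is missing from the target
```

This holds both sides in memory, which is exactly why dedicated tools with an Extract-Load-Validate engine are preferred at large volumes; the sketch shows the logic, not the scaling strategy.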
Automate flat file data completeness testing using ETL Validator
ETL Validator comes with a Flat File Component and a Data Profile Component as part of the Component Test Case for automating the comparison of flat file and target data. It takes care of loading the flat file data into a table for running validations.
Data Profile Component: Automatically computes a profile of the flat file data and the target query results - count, count distinct, nulls, avg, max, min, max length and min length.
Component Test Case: Provides a visual test case builder that can be used to compare multiple flat files and targets.
FLAT FILE DATA TRANSFORMATION TESTING
Data in the inbound flat file is transformed by the consuming process and loaded into the target (table or file). It is important to test the transformed data. There are two approaches for testing transformations - white box testing and black box testing.
Transformation testing using the White Box approach
White box testing is a testing technique that examines the program structure and derives test data from the program logic/code. For transformation testing, this involves reviewing the transformation logic from the flat file data ingestion design document and the corresponding code to come up with test cases.
The steps to be followed are listed below:
Review the transformation design document.
Apply transformations on the flat file data using SQL or a procedural language such as PL/SQL to reflect the ETL transformation logic.
Compare the results of the transformed data with the data in the target table or target flat file.
The advantage of this approach is that the tests can be rerun easily on a larger data set. The disadvantage is that the tester has to reimplement the transformation logic.
Example: In a financial company, the interest earned on a savings account depends on the daily balance in the account for the month. The daily balance for the month is part of an inbound CSV file for the process that computes the interest.
1. Review the requirement and design for calculating the interest.
2. Implement the logic using your favorite programming language.
3. Compare your output with the data in the target table.
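The three steps above can be sketched for the interest example. The interest rule here (monthly interest = average daily balance times a monthly rate) and the rate are assumptions for illustration; the real rule would come from the design document.

```python
# Sketch of white-box transformation testing: the tester reimplements the
# documented rule and compares the result against the value loaded into the
# target table. The rule below (avg daily balance * monthly rate) is assumed.
def expected_interest(daily_balances, monthly_rate):
    """Recompute the monthly interest from the daily balances in the inbound CSV."""
    avg_daily_balance = sum(daily_balances) / len(daily_balances)
    return round(avg_daily_balance * monthly_rate, 2)

daily_balances = [1000.0] * 15 + [2000.0] * 15   # balances from the inbound file
target_interest = 7.50                            # value found in the target table
assert expected_interest(daily_balances, 0.005) == target_interest
```

Because the logic is reimplemented independently, the same comparison can be rerun over every account in a larger data set, which is the main strength of the white box approach noted above.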
Transformation testing using the Black Box approach
Black-box testing is a method of software testing that examines the functionality of an application without peering into its internal structures or workings. For transformation testing, this involves reviewing the transformation logic from the mapping design document and setting up the test data appropriately.
The steps to be followed are listed below:
Review the requirements document to understand the transformation requirements.
Prepare test data in the flat file to reflect different transformation scenarios.
Come up with the transformed data values, or the expected values, for the test data from the previous step.
Compare the results of the transformed test data in the target table with the expected values.
The advantage of this approach is that the transformation logic does not need to be reimplemented during the testing. The disadvantage is that the tester needs to set up test data for each transformation scenario and come up with the expected values for the transformed data manually.
Example: In a financial company, the interest earned on a savings account depends on the daily balance in the account for the month.
1. Review the requirement for calculating the interest.
2. Set up test data in the flat file for various scenarios of daily account balance.
3. Compare the transformed data in the target table with the expected values for the test data.
Automate data transformation testing using ETL Validator
ETL Validator comes with a Component Test Case which can be used to test transformations using the White Box approach or the Black Box approach.
Visual Test Case Builder: The component test case has a visual test case builder that makes it easy to rebuild the transformation logic for testing purposes.
Workschema: ETL Validator's workschema stores the test data from the source and target queries. This makes it easy for the tester to implement transformations and compare using a Script Component.
Benchmark Capability: Makes it easy to baseline the target table (expected data) and compare the latest data with the baselined data.
FLAT FILE INGESTION PERFORMANCE TESTING
The goal of "erformance testing is to alidate that the "rocess consuming the
in!ound flat files is a!le to handle flat files with the ex"ected data olumes and
in!ound arrial fre)uenc(%
%xample 3, #he process ingesting the flat file might perform well when the data when there are only a few
records in the file but perform bad when there is large number of rows.
%xample 4, #he flat file ingestion process may also perform bad as the data volumes increase in the
target table.
End-to-End Data Testing of Flat File Ingestion
Integration testing of the inbound flat file ingestion process and the related applications involves the following steps:
Estimate the expected data volumes in each of the source flat files for the consuming process for the next 1-3 years.
Set up test data for performance testing, either by generating sample flat files or by obtaining sample files.
Execute the flat file ingestion process to load the test data into the target.
Execute the flat file ingestion process again with a large amount of data in the target tables to identify bottlenecks.
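The test-data setup step above can be sketched by generating a sample flat file at the estimated volume. The column names and distributions below are assumptions for illustration; a fixed random seed keeps the generated file reproducible across runs.

```python
import csv
import io
import random

# Sketch of the performance-test data setup: generate a large sample CSV to
# exercise the ingestion process at the expected data volumes. Column names
# and value distributions are hypothetical.
def generate_sample_file(num_rows, seed=42):
    """Return CSV text with a header row followed by num_rows generated rows."""
    random.seed(seed)
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["cust_id", "country", "revenue"])
    for i in range(num_rows):
        writer.writerow([i, random.choice(["US", "UK", "IN"]),
                         round(random.uniform(10, 999), 2)])
    return buf.getvalue()

data = generate_sample_file(100_000)
print(len(data.splitlines()))  # header plus 100,000 generated rows
```

For very large volumes the same generator would write directly to a file on disk instead of an in-memory buffer, but the generation logic is identical.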