Flat File Testing


  • 8/17/2019 Flat File Testing


    What are Flat Files?

    Flat files are extensively used for exchanging data between enterprises or between organizations within an enterprise. Flat files come in two forms: delimited files, such as CSV (comma-separated values) files, and fixed-width files.

    What is Flat File Testing?

    Flat file testing is the process of validating the quality of data in the flat file, as well as ensuring that the data in the flat file has been consumed appropriately by the application or ETL process.

    Challenges in Flat File Testing

    Testing of inbound flat files presents unique challenges because the producer of the flat file is usually a different organization within the enterprise or an external vendor. Consequently, the format and content of the files may differ from what is expected, since there is no easy way to enforce data type and data quality constraints on the data in flat files. Issues in flat file data can cause failures in the consuming process. While file processing requirements differ from project to project, this use case lists some of the common checks that need to be performed to validate flat files.

    Flat File Testing Categories

    FLAT FILE INGESTION TESTING


    When data is moved using flat files between enterprises or organizations within an enterprise, it is important to perform a set of file ingestion validations on the inbound flat files before consuming the data in those files.

    File name validation

    Files are FTP'ed or copied over to a specific folder for processing. These files usually follow a specific naming convention so that the process consuming the file is able to understand the contents and date. From a testing standpoint, the file name pattern needs to be validated to verify that it meets the requirement.

    Example: A government agency gets files from multiple vendors on a periodic basis. The arriving files should follow the naming convention 'CompanyCodeContentTypeDateTimestamp.csv'. However, the files coming in from a specific vendor do not have the correct company name.
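The naming-convention check can be automated with a regular expression. The Python sketch below rests on assumptions that are not part of the convention above: the fields are separated by underscores, company codes and content types are upper-case alphanumerics, and the timestamp is 14 digits (YYYYMMDDHHMMSS). `validate_file_name` is a hypothetical helper.

```python
import re
from datetime import datetime

# Hypothetical concrete form of the convention, e.g. "ACME_INVOICE_20190817093000.csv".
# The separator and timestamp layout are assumptions; take the real ones from
# the file interface specification.
FILE_NAME_PATTERN = re.compile(
    r"(?P<company>[A-Z0-9]+)_(?P<content>[A-Z]+)_(?P<ts>[0-9]{14})\.csv"
)


def validate_file_name(name, expected_company):
    """Return a list of problems with the file name (empty list means valid)."""
    match = FILE_NAME_PATTERN.fullmatch(name)
    if match is None:
        return ["name does not match the naming convention"]
    problems = []
    if match["company"] != expected_company:
        problems.append(f"unexpected company code {match['company']!r}")
    try:
        # Rejects impossible dates and times, such as month 13.
        datetime.strptime(match["ts"], "%Y%m%d%H%M%S")
    except ValueError:
        problems.append(f"invalid timestamp {match['ts']!r}")
    return problems


print(validate_file_name("ACMX_INVOICE_20190817093000.csv", "ACME"))
# ["unexpected company code 'ACMX'"]
```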

    Size and Format of the flat files

    Although flat files are generally delimited or fixed width, it is common to have a header and footer in these files. Sometimes these headers contain a row count that can be used to verify that the file contains the entire data as expected.

    Some of the relevant checks are:

    Verify that the size of the file is within the expected range, where applicable.

    Verify that the header, footer and column heading rows have the expected format and the expected location within the flat file.

    Perform row count checks to cross-check the data in the header with the values in the delimited data.

    Example: A financial reporting company generates files with a header that contains the summary amount, while the line items carry the detailed split. The sum of the amounts in the line items should match the summary amount in the header.
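A Python sketch of the size, header/footer and header-total checks. The record layout is hypothetical: a header row `H,<record_count>,<summary_amount>`, a column heading row `id,amount`, the detail rows, and a footer row `T,<record_count>`. Amounts are parsed with `Decimal` so that the sum comparison is not affected by binary floating-point rounding.

```python
import csv
import io
import os
from decimal import Decimal


def file_size_in_range(path, min_bytes, max_bytes):
    """Check that the file size falls within the expected range."""
    return min_bytes <= os.path.getsize(path) <= max_bytes


def check_header_and_footer(text):
    """Validate the hypothetical layout described above; return a list of problems."""
    rows = list(csv.reader(io.StringIO(text)))
    if len(rows) < 3 or rows[0][:1] != ["H"] or rows[-1][:1] != ["T"]:
        return ["missing header or footer row"]
    problems = []
    if rows[1] != ["id", "amount"]:
        problems.append("unexpected column heading row")
    details = rows[2:-1]
    header_count, summary_amount = int(rows[0][1]), Decimal(rows[0][2])
    footer_count = int(rows[-1][1])
    if header_count != len(details):
        problems.append(f"header count {header_count} != {len(details)} detail rows")
    if footer_count != len(details):
        problems.append(f"footer count {footer_count} != {len(details)} detail rows")
    # The line items must add up to the summary amount carried in the header.
    detail_total = sum((Decimal(row[1]) for row in details), Decimal("0"))
    if detail_total != summary_amount:
        problems.append(f"summary amount {summary_amount} != line item total {detail_total}")
    return problems


sample = "H,2,30.50\nid,amount\n1,10.25\n2,20.25\nT,2\n"
print(check_header_and_footer(sample))  # []
```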

    File arrival, processing and deletion times

    Files arrive periodically into a specific network folder or FTP location before being consumed by a process. Usually, there are specific requirements regarding the file arrival time, the order of arrival and how long the files are retained.

    Example: A pharma company gets a set of files from a vendor on a daily basis. The process consuming these files expects the complete set of files to be available before processing.

    1. A file that was supposed to arrive yesterday was delayed. It came in sometime after today's file arrived, causing issues because the files were processed in a different order.

    2. After the files get processed, they are supposed to be moved to a specific directory, where they are retained for a specified period of time and then deleted. However, the file did not get copied over.
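The arrival-order, completeness and archiving requirements from the example can be checked from a log of file arrivals and a listing of the archive directory. This Python sketch assumes a hypothetical daily set of three file types and represents each arrival as a `(file_type, business_date, arrival_time)` tuple; how that log is collected (FTP server logs, directory listings) is left open, and the retention period is a parameter.

```python
from datetime import date, datetime, timedelta

# Hypothetical: the vendor sends these three file types for every business date.
EXPECTED_TYPES = {"orders", "shipments", "returns"}


def check_arrivals(arrivals):
    """arrivals: list of (file_type, business_date, arrival_time) tuples.

    Reports business dates with an incomplete file set, and files that arrived
    after a file for a later business date (i.e. out of order)."""
    problems = []
    types_by_date = {}
    for file_type, business_date, _ in arrivals:
        types_by_date.setdefault(business_date, set()).add(file_type)
    for business_date, types in sorted(types_by_date.items()):
        missing = EXPECTED_TYPES - types
        if missing:
            problems.append(f"{business_date}: missing {sorted(missing)}")

    latest_date = None
    for file_type, business_date, _ in sorted(arrivals, key=lambda a: a[2]):
        if latest_date is not None and business_date < latest_date:
            problems.append(f"{file_type} for {business_date} arrived after a file for {latest_date}")
        else:
            latest_date = business_date
    return problems


def missing_from_archive(processed_files, archived_files):
    """Files that were processed but never copied to the archive directory."""
    return sorted(set(processed_files) - set(archived_files))


def overdue_for_deletion(archived_on, today, retention_days):
    """archived_on: {file name: archive date}. Files kept longer than the retention period."""
    cutoff = today - timedelta(days=retention_days)
    return sorted(name for name, day in archived_on.items() if day < cutoff)


log = [
    ("orders", date(2019, 8, 16), datetime(2019, 8, 16, 6, 0)),
    ("orders", date(2019, 8, 17), datetime(2019, 8, 17, 6, 0)),
    ("shipments", date(2019, 8, 16), datetime(2019, 8, 17, 9, 0)),  # late file
]
for problem in check_arrivals(log):
    print(problem)
```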


    Automate file ingestion testing using ETL Validator

    ETL Validator comes with Component Test Case and File Watcher, which can be used to test flat files.

    Flat File Component: The Flat File component is part of the Component Test Case. It can be used to define data type and data quality rules on the incoming flat file. The data in the flat file can also be compared with data from the database.

    File Watcher: Using File Watcher, test plans can be triggered automatically when a new file comes into a directory, so that the test cases on the file are executed before the files are used further by the consuming process.

    SFTP Connection: Makes it easy to compare and validate flat files located in a remote SFTP location.
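The File Watcher idea, running tests as soon as a new file lands in a directory, can be approximated with a simple polling loop. This is a generic Python sketch, not how ETL Validator implements the feature; `on_new_file` stands for whatever hypothetical test runner should be triggered.

```python
import os
import time


def poll_once(directory, seen, on_new_file):
    """Call on_new_file(path) for every regular file not seen in earlier passes."""
    with os.scandir(directory) as entries:
        names = sorted(entry.name for entry in entries if entry.is_file())
    for name in names:
        if name not in seen:
            seen.add(name)
            on_new_file(os.path.join(directory, name))


def watch(directory, on_new_file, interval_seconds=30, max_passes=None):
    """Poll the directory every interval_seconds; stop after max_passes if given."""
    seen = set()
    passes = 0
    while max_passes is None or passes < max_passes:
        poll_once(directory, seen, on_new_file)
        passes += 1
        time.sleep(interval_seconds)
```

A real watcher also has to avoid picking up a file that is still being written, for example by waiting until its size stops changing or until a separate trigger file appears.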

    FLAT FILE DATA TYPE TESTING

    The purpose of data type testing is to verify that the type and length of the data in the flat file are as expected.

    Data Type Check

    Verify that the type and format of the data in the inbound flat file match the expected data type for the file. For date, timestamp and time data types, the values are expected to be in a specific format so that they can be parsed by the consuming process.

    Example: An ID column of the flat file is expected to have only numbers. However, a few rows in the flat file have characters.
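A minimal Python sketch of a data type check over a delimited file. The column names, the rule mapping and the `%Y-%m-%d` date format are hypothetical; in practice they come from the file specification. Empty values are skipped here and left to the not-null check.

```python
import csv
import io
import re
from datetime import datetime


def is_integer(value):
    # ASCII digits with an optional sign; str.isdigit() would also accept characters like "²".
    return re.fullmatch(r"[+-]?[0-9]+", value) is not None


def is_iso_date(value):
    # Assumed date format; strptime also rejects impossible dates such as June 31.
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


# Hypothetical column -> type check mapping for a customer file.
TYPE_RULES = {"id": is_integer, "date_of_birth": is_iso_date}


def find_type_errors(text):
    """Return (data_row_number, column, value) for every value of the wrong type."""
    errors = []
    for row_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        for column, check in TYPE_RULES.items():
            value = row.get(column)
            if value in (None, ""):
                continue  # missing values are the job of the not-null check
            if not check(value):
                errors.append((row_no, column, value))
    return errors


sample = "id,date_of_birth\n101,1980-02-29\n1O2,1975-06-31\n"
print(find_type_errors(sample))
# [(2, 'id', '1O2'), (2, 'date_of_birth', '1975-06-31')]
```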

    Data Length Check

    The length of string and number data values in the flat file should not exceed the maximum allowed length for those columns.


    Example: Data for the comments column has more than 5000 characters in the inbound flat file, while the limit for the corresponding column in the database is only 2000 characters.
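A length check in the same style, as a Python sketch. `MAX_LENGTHS` is a hypothetical stand-in for limits that would normally be read from the target table definition. `len()` counts characters; if the database limit is expressed in bytes, measure `len(value.encode("utf-8"))` instead, because multi-byte characters can push a value over a byte limit even when the character count fits.

```python
import csv
import io

# Hypothetical limits; normally taken from the target column definitions.
MAX_LENGTHS = {"customer_id": 10, "comments": 2000}


def find_length_violations(text, max_lengths=MAX_LENGTHS):
    """Return (data_row_number, column, actual_length, limit) for values that are too long."""
    violations = []
    for row_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        for column, limit in max_lengths.items():
            value = row.get(column) or ""
            if len(value) > limit:
                violations.append((row_no, column, len(value), limit))
    return violations


sample = "customer_id,comments\n42,ok\n43," + "x" * 5001 + "\n"
print(find_length_violations(sample))  # [(2, 'comments', 5001, 2000)]
```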

    Not Null Check

    Verify that any required data elements in the flat file have data for all the rows.

    Example: Date of Birth is a required data element, but some of the records are missing values in the inbound flat file.
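A not-null check as a Python sketch. `REQUIRED_COLUMNS` is hypothetical, and treating whitespace-only values as missing is a choice made for this sketch that the file specification may or may not share.

```python
import csv
import io

# Hypothetical list of required columns for the customer file.
REQUIRED_COLUMNS = ["customer_id", "date_of_birth"]


def find_missing_required(text, required=REQUIRED_COLUMNS):
    """Return (data_row_number, column) for each required value that is missing."""
    missing = []
    for row_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        for column in required:
            value = row.get(column)
            # DictReader fills short rows with None; blank or whitespace-only
            # values are also treated as missing in this sketch.
            if value is None or not value.strip():
                missing.append((row_no, column))
    return missing


sample = "customer_id,date_of_birth\n1,1980-01-01\n2,\n3\n"
print(find_missing_required(sample))  # [(2, 'date_of_birth'), (3, 'date_of_birth')]
```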

    Automate flat file data type testing with ETL Validator

    ETL Validator provides the capability to specify data type checks on the flat file in the Flat File Component. Based on the data types specified, ETL Validator automatically checks all the records in the incoming flat file to find any invalid records.

    FLAT FILE DATA QUALITY TESTING

    The purpose of data quality tests is to verify the accuracy of the data in the inbound flat files.

    Duplicate Data Checks

    Check for duplicate rows in the inbound flat file with the same unique key column or a unique combination of columns, as per the business requirement.

    Example: The business requirement says that the combination of First Name, Last Name, Middle Name and Date of Birth should be unique for the Customer list flat file.

    Sample query to identify duplicates (assuming that the flat file data can be imported into a database table):

    SELECT fst_name, lst_name, mid_name, date_of_birth, count(1)
    FROM Customer
    GROUP BY fst_name, lst_name, mid_name, date_of_birth
    HAVING count(1) > 1
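The duplicate query can be tried end to end with Python's built-in `sqlite3` module and an in-memory table standing in for the imported flat file; the sample rows are made up for illustration.

```python
import sqlite3

DUPLICATE_QUERY = """
    SELECT fst_name, lst_name, mid_name, date_of_birth, count(1)
    FROM customer
    GROUP BY fst_name, lst_name, mid_name, date_of_birth
    HAVING count(1) > 1
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (fst_name TEXT, lst_name TEXT, mid_name TEXT, date_of_birth TEXT)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [
        ("Ann", "Lee", "M", "1980-01-01"),
        ("Ann", "Lee", "M", "1980-01-01"),  # duplicate of the row above
        ("Ann", "Lee", "M", "1990-05-05"),  # same name, different date of birth
    ],
)
print(conn.execute(DUPLICATE_QUERY).fetchall())  # [('Ann', 'Lee', 'M', '1980-01-01', 2)]
```

GROUP BY places NULLs in the same group, so two otherwise identical rows whose middle name was loaded as NULL are still reported as duplicates.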

    Reference Data Checks

    Flat file standards may dictate that the values in certain columns adhere to the values in a domain. Verify that the values in the inbound flat file conform to reference data standards.


    Example: Values in the country_code column should be valid country codes from a Country Code domain.

    select distinct country_code from address
    minus
    select country_code from country
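`MINUS` is Oracle syntax; the standard SQL equivalent, supported by SQLite, PostgreSQL and SQL Server, is `EXCEPT`. A runnable Python/`sqlite3` sketch of the same check with made-up sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE country (country_code TEXT PRIMARY KEY);
    CREATE TABLE address (address_id INTEGER, country_code TEXT);
    INSERT INTO country VALUES ('US'), ('IN'), ('DE');
    INSERT INTO address VALUES (1, 'US'), (2, 'XX'), (3, 'DE'), (4, 'XX');
    """
)

# EXCEPT already returns distinct rows, so DISTINCT is optional here.
INVALID_CODES_QUERY = """
    SELECT country_code FROM address
    EXCEPT
    SELECT country_code FROM country
    ORDER BY 1
"""
print([code for (code,) in conn.execute(INVALID_CODES_QUERY)])  # ['XX']
```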

    Data Validation Rules

    Many data fields can contain a range of values that cannot be enumerated. However, there are reasonable constraints or rules that can be applied to detect situations where the data is clearly wrong. Fields containing values that violate the defined validation rules represent a quality gap that can impact inbound flat file processing.

    Example: Date of birth (DOB) …
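Validation rules like these can be expressed as named predicates that are evaluated against every parsed record. The two rules below (a date of birth must not lie in the future, revenue must not be negative) are hypothetical examples chosen for this Python sketch, and the records are assumed to be parsed into Python types already.

```python
from datetime import date

# Hypothetical rules: (description, predicate(record, today) -> True when valid).
RULES = [
    ("date_of_birth is not in the future", lambda rec, today: rec["date_of_birth"] <= today),
    ("revenue is not negative", lambda rec, today: rec["revenue"] >= 0),
]


def rule_violations(records, today):
    """Return (record_number, rule_description) for every rule a record breaks."""
    return [
        (number, description)
        for number, record in enumerate(records, start=1)
        for description, is_valid in RULES
        if not is_valid(record, today)
    ]


records = [
    {"date_of_birth": date(1980, 1, 1), "revenue": 100},
    {"date_of_birth": date(2030, 1, 1), "revenue": -5},
]
print(rule_violations(records, today=date(2019, 8, 17)))
# [(2, 'date_of_birth is not in the future'), (2, 'revenue is not negative')]
```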

    ETL Validator supports defining data quality rules in the Flat File Component for automating data quality testing without writing any database queries. Custom rules can be defined and added to the Data Model template.

    FLAT FILE DATA COMPLETENESS TESTING

    Data in the inbound flat files is generally processed and loaded into a database. In some cases, the output may also be another flat file. The purpose of data completeness tests is to verify that all the expected data is loaded in the target from the inbound flat file. Some of the tests that can be run are: compare and validate counts, aggregates (min, max, sum, avg) and actual data between the flat file and target.

    Record Count Validation

    Compare the count of records in the flat file and the database table. Check for any rejected records.

    Example: A simple comparison of record counts between the source and target tables.

    Source Query (assuming the flat file data is loaded into the 'customer' table for validation)

    SELECT count(1) src_count FROM customer 

    Target Query

    SELECT count(1) tgt_count FROM customer_dim
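When the source and target live in different databases, each count query runs on its own connection. A Python sketch using `sqlite3` connections as stand-ins; the `rejected` argument, a count of records the load process is known to have rejected, is an assumption about how rejected records are tracked.

```python
import sqlite3


def compare_counts(src_conn, tgt_conn, rejected=0):
    """Compare record counts; any difference not explained by rejects is reported."""
    src = src_conn.execute("SELECT count(1) src_count FROM customer").fetchone()[0]
    tgt = tgt_conn.execute("SELECT count(1) tgt_count FROM customer_dim").fetchone()[0]
    return {"src_count": src, "tgt_count": tgt, "unexplained": src - tgt - rejected}


# Demo: both tables in one in-memory database, with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (cust_id INTEGER);
    CREATE TABLE customer_dim (integration_id INTEGER);
    INSERT INTO customer VALUES (1), (2), (3);
    INSERT INTO customer_dim VALUES (1), (2);
    """
)
print(compare_counts(conn, conn))              # {'src_count': 3, 'tgt_count': 2, 'unexplained': 1}
print(compare_counts(conn, conn, rejected=1))  # {'src_count': 3, 'tgt_count': 2, 'unexplained': 0}
```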

    Column Data Profile Validation

    Column or attribute level data profiling is an effective tool to compare source and target data without actually comparing the entire data. It is similar to comparing the checksums of your source and target data. These tests are essential when testing large amounts of data.

    Some of the common data profile comparisons that can be done between the flat file and target are:

    Compare unique values in a column between the flat file and target

    Compare max, min, avg, max length and min length values for columns, depending on the data type

    Compare null values in a column between the flat file and target

    For important columns, compare the data distribution (frequency) in a column between the flat file and target

    Example 1: Compare column counts with values (non-null values) between source and target for each column based on the mapping.

    Source Query (assuming the flat file data is loaded into the 'customer' table for validation)

    SELECT count(row_id), count(fst_name), count(lst_name), avg(revenue) FROM customer

    Target Query

    SELECT count(row_id), count(first_name), count(last_name), avg(revenue) FROM customer_dim

    Example 2: Compare the number of customers by country between the source and target.

    Source Query (assuming the flat file data is loaded into the 'customer' table for validation)

    SELECT country, count(*) FROM customer GROUP BY country

    Target Query

    SELECT country_cd, count(*) FROM customer_dim GROUP BY country_cd
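The profile comparisons listed above can also be computed outside the database. A Python sketch over two lists of column values (`None` standing for null); it assumes each column holds values of a single comparable type and leaves out avg, which only applies to numeric columns. The example shows why the frequency comparison is worth adding: the two columns have identical profiles but different distributions.

```python
from collections import Counter


def profile(values):
    """Profile one column: counts, nulls, min/max and min/max string length."""
    non_null = [v for v in values if v is not None]
    lengths = [len(str(v)) for v in non_null]
    return {
        "count": len(non_null),
        "distinct": len(set(non_null)),
        "nulls": len(values) - len(non_null),
        "min": min(non_null, default=None),
        "max": max(non_null, default=None),
        "min_length": min(lengths, default=None),
        "max_length": max(lengths, default=None),
    }


def compare_profiles(source_values, target_values):
    """Return {metric: (source, target)} for every metric that differs."""
    src, tgt = profile(source_values), profile(target_values)
    return {metric: (src[metric], tgt[metric]) for metric in src if src[metric] != tgt[metric]}


def distribution_diff(source_values, target_values):
    """Return {value: (source_frequency, target_frequency)} where frequencies differ."""
    src, tgt = Counter(source_values), Counter(target_values)
    return {v: (src[v], tgt[v]) for v in src.keys() | tgt.keys() if src[v] != tgt[v]}


source = ["US", "US", "IN", None]
target = ["US", "IN", "IN", None]
print(compare_profiles(source, target))   # {}
print(distribution_diff(source, target))  # {'US': (2, 1), 'IN': (1, 2)} (key order may vary)
```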

    Compare entire flat file and target data

    Compare data (values) between the flat file and target data, effectively validating 100% of the data. In regulated industries such as finance and pharma, 100% data validation might be a compliance requirement. It is also a key requirement for data migration projects. However, performing 100% data validation is a challenge when large volumes of data are involved. This is where ETL testing tools such as ETL Validator can be used, because they have an inbuilt ELV (Extract, Load, Validate) engine capable of comparing large volumes of data.

    Example: Write a source query on the flat file that matches the data in the target table after transformation.

    Source Query (assuming the flat file data is loaded into the 'customer' table for validation)

    SELECT cust_id, fst_name, lst_name, fst_name||' '||lst_name, DOB FROM Customer

    Target Query

    SELECT integration_id, first_name, last_name, full_name, date_of_birth FROM Customer_dim
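One way to compare the full result sets is to run the source and target queries through `EXCEPT` in both directions: rows returned by the first direction are missing or wrong in the target, rows from the second are unexpected in the target. A Python/`sqlite3` sketch with made-up data; note that `EXCEPT` is a set operation, so it does not detect a row that is duplicated on one side, which is what the duplicate check covers.

```python
import sqlite3

SOURCE_QUERY = "SELECT cust_id, fst_name, lst_name, fst_name || ' ' || lst_name, dob FROM customer"
TARGET_QUERY = "SELECT integration_id, first_name, last_name, full_name, date_of_birth FROM customer_dim"

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (cust_id INTEGER, fst_name TEXT, lst_name TEXT, dob TEXT);
    CREATE TABLE customer_dim (integration_id INTEGER, first_name TEXT, last_name TEXT,
                               full_name TEXT, date_of_birth TEXT);
    INSERT INTO customer VALUES (1, 'Ann', 'Lee', '1980-01-01'), (2, 'Raj', 'Iyer', '1975-06-30');
    INSERT INTO customer_dim VALUES (1, 'Ann', 'Lee', 'Ann Lee', '1980-01-01'),
                                    (2, 'Raj', 'Iyer', 'RajIyer', '1975-06-30');
    """
)


def full_compare(conn):
    """Return (rows missing or different in the target, unexpected target rows)."""
    missing = conn.execute(f"{SOURCE_QUERY} EXCEPT {TARGET_QUERY}").fetchall()
    unexpected = conn.execute(f"{TARGET_QUERY} EXCEPT {SOURCE_QUERY}").fetchall()
    return missing, unexpected


missing, unexpected = full_compare(conn)
print(missing)     # [(2, 'Raj', 'Iyer', 'Raj Iyer', '1975-06-30')]
print(unexpected)  # [(2, 'Raj', 'Iyer', 'RajIyer', '1975-06-30')]
```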

    Automate flat file data completeness testing using ETL Validator

    ETL Validator comes with the Flat File Component and Data Profile Component as part of the Component Test Case for automating the comparison of flat file and target data. It takes care of loading the flat file data into a table for running validations.

    Data Profile Component: Automatically computes the profile of the flat file data and target query results: count, count distinct, nulls, avg, max, min, max length and min length.

    Component Test Case: Provides a visual test case builder that can be used to compare multiple flat files and the target.


    FLAT FILE DATA TRANSFORMATION TESTING

    Data in the inbound flat file is transformed by the consuming process and loaded into the target (table or file). It is important to test the transformed data. There are two approaches for testing transformations: white box testing and black box testing.

    Transformation testing using the White Box approach

    White box testing is a testing technique that examines the program structure and derives test data from the program logic/code.

    For transformation testing, this involves reviewing the transformation logic in the flat file data ingestion design document and the corresponding code to come up with test cases.

    The steps to be followed are listed below:

    Review the transformation design document

    Apply transformations on the flat file data using SQL or a procedural language such as PL/SQL to reflect the ETL transformation logic

    Compare the results of the transformed data with the data in the target table or target flat file.

    The advantage of this approach is that the tests can be rerun easily on a larger data set. The disadvantage is that the tester has to reimplement the transformation logic.

    Example: In a financial company, the interest earned on a savings account depends on the daily balance in the account for the month. The daily balance for the month is part of an inbound CSV file for the process that computes the interest.

    1. Review the requirement and design for calculating the interest.

    2. Implement the logic using your favorite programming language.

    3. Compare your output with data in the target table.
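Steps 2 and 3 might look like the following Python sketch. The interest rule here (sum of the daily balances times the annual rate, divided by 365, rounded half-up to cents) is a hypothetical stand-in: the real formula must come from the requirement and design documents reviewed in step 1.

```python
from decimal import Decimal, ROUND_HALF_UP


def monthly_interest(daily_balances, annual_rate):
    """Hypothetical rule: sum(daily balances) * annual_rate / 365, rounded half-up to cents.

    Balances and rate are passed as strings (as read from the CSV file) so that
    Decimal represents them exactly."""
    total = sum((Decimal(balance) for balance in daily_balances), Decimal("0"))
    interest = total * Decimal(annual_rate) / Decimal(365)
    return interest.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def mismatches(expected_by_account, target_by_account):
    """Step 3: {account: (expected, target)} wherever the target value differs."""
    return {
        account: (expected, target_by_account.get(account))
        for account, expected in expected_by_account.items()
        if target_by_account.get(account) != expected
    }


expected = {"A1": monthly_interest(["1000.00"] * 30, "0.0365")}
print(expected)                                       # {'A1': Decimal('3.00')}
print(mismatches(expected, {"A1": Decimal("3.01")}))  # {'A1': (Decimal('3.00'), Decimal('3.01'))}
```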

    Transformation testing using the Black Box approach

    Black box testing is a method of software testing that examines the functionality of an application without peering into its internal structures or workings. For transformation testing, this involves reviewing the transformation logic in the mapping design document and setting up the test data appropriately.

    The steps to be followed are listed below:


    Review the requirements document to understand the transformation requirements

    Prepare test data in the flat file to reflect different transformation scenarios

    Come up with the transformed data values, or the expected values, for the test data from the previous step

    Compare the results of the transformed test data in the target table with the expected values.

    The advantage of this approach is that the transformation logic does not need to be reimplemented during testing. The disadvantage is that the tester needs to set up test data for each transformation scenario and come up with the expected values for the transformed data manually.

    Example: In a financial company, the interest earned on a savings account depends on the daily balance in the account for the month.

    1. Review the requirement for calculating the interest.

    2. Set up test data in the flat file for various scenarios of daily account balance.

    3. Compare the transformed data in the target table with the expected values for the test data.
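In the black box version, the expected values are worked out by hand for each scenario written into the test file, and only the comparison is automated. A Python sketch; the account IDs, the scenarios and the interest rule used to derive the hand-calculated values (sum of daily balances times 3.65% divided by 365, over a 30-day month) are all hypothetical.

```python
from decimal import Decimal

# Expected interest per test scenario, calculated by hand under the hypothetical rule.
EXPECTED_INTEREST = {
    "ACC_ZERO_BALANCE": Decimal("0.00"),       # 30 days at 0.00
    "ACC_CONSTANT_BALANCE": Decimal("3.00"),   # 30 days at 1000.00: 30000 * 0.0365 / 365
    "ACC_MID_MONTH_DEPOSIT": Decimal("1.50"),  # 15 days at 0.00, then 15 days at 1000.00
}


def check_target(target_rows):
    """target_rows: (account_id, interest) pairs read from the target table.

    Returns {account: (expected, actual)} for wrong or missing results. Values are
    converted via str() so that floats returned by a database driver compare cleanly."""
    actual = {account: Decimal(str(interest)) for account, interest in target_rows}
    return {
        account: (expected, actual.get(account))
        for account, expected in EXPECTED_INTEREST.items()
        if actual.get(account) != expected
    }


print(check_target([("ACC_ZERO_BALANCE", "0.00"), ("ACC_CONSTANT_BALANCE", "3.00")]))
# {'ACC_MID_MONTH_DEPOSIT': (Decimal('1.50'), None)}
```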

    Automate data transformation testing using ETL Validator

    ETL Validator comes with Component Test Case, which can be used to test transformations using the White Box approach or the Black Box approach.

    Visual Test Case Builder: The Component Test Case has a visual test case builder that makes it easy to rebuild the transformation logic for testing purposes.

    Workschema: ETL Validator's workschema stores the test data from source and target queries. This makes it easy for the tester to implement transformations and compare the results using a Script Component.

    Benchmark Capability: Makes it easy to baseline the target table (expected data) and compare the latest data with the baselined data.

    FLAT FILE INGESTION PERFORMANCE TESTING


    The goal of performance testing is to validate that the process consuming the inbound flat files is able to handle flat files with the expected data volumes and inbound arrival frequency.

    Example 1: The process ingesting the flat file might perform well when there are only a few records in the file but perform poorly when there is a large number of rows.

    Example 2: The flat file ingestion process may also perform poorly as the data volume in the target table increases.

    End-to-End Data Testing of Flat File Ingestion

    Integration testing of the inbound flat file ingestion process and the related applications involves the following steps:

    Estimate the expected data volumes in each of the source flat files for the consuming process for the next 1-3 years.

    Set up test data for performance testing, either by generating sample flat files or by getting sample flat files.

    Execute the flat file ingestion process to load the test data into the target.

    Execute the flat file ingestion process again with large data volumes in the target tables to identify bottlenecks.
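Generating sample files and timing the ingestion at several sizes can be scripted. In this Python sketch the file layout is made up, and `sqlite_ingest` is only a stand-in for the real ingestion process; because the target table is not emptied between runs, each later run also loads into a fuller table, which is the situation described in Example 2.

```python
import csv
import io
import random
import sqlite3
import time


def generate_flat_file(n_rows, seed=0):
    """Synthetic customer CSV with a heading row and n_rows data rows (made-up layout)."""
    rng = random.Random(seed)
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["cust_id", "fst_name", "lst_name", "revenue"])
    for i in range(1, n_rows + 1):
        writer.writerow([i, f"First{i}", f"Last{i}", round(rng.uniform(0, 10000), 2)])
    return buffer.getvalue()


def time_ingestion(ingest, row_counts):
    """Run ingest(text) on generated files of each size; return {rows: seconds}."""
    timings = {}
    for n_rows in row_counts:
        text = generate_flat_file(n_rows)
        start = time.perf_counter()
        ingest(text)
        timings[n_rows] = time.perf_counter() - start
    return timings


# Stand-in for the real ingestion process.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER, fst_name TEXT, lst_name TEXT, revenue REAL)")


def sqlite_ingest(text):
    rows = list(csv.reader(io.StringIO(text)))[1:]  # skip the heading row
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", rows)
    conn.commit()


for n_rows, seconds in time_ingestion(sqlite_ingest, [1_000, 10_000, 100_000]).items():
    print(f"{n_rows:>7} rows: {seconds:.3f} s")
```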