A MAJOR SEMINAR ON Data Validation System For A Relational Database Guided By: Ms. Swati Jain (HOD I.T Dept.) Submitted By: Ajay Kumar (IT/09/18) Submitted To: Ms. Shazia Haque Mr. Manish Prajapati DEPARTMENT OF INFORMATION TECHNOLOGY POORNIMA COLLEGE OF ENGINEERING, JAIPUR POORNIMA COLLEGE OF ENGINEERING , JAIPUR

Data Validation


Page 1: Data Validation

A MAJOR SEMINAR ON

Data Validation System For A Relational Database

Guided By: Ms. Swati Jain (HOD I.T Dept.)

Submitted By: Ajay Kumar (IT/09/18)

Submitted To: Ms. Shazia Haque

Mr. Manish Prajapati

DEPARTMENT OF INFORMATION TECHNOLOGY POORNIMA COLLEGE OF ENGINEERING, JAIPUR


Page 2: Data Validation

SEMINAR OUTLINE

• Introduction
• About base paper
• Need of data validation
• Methods of data validation
• Data validation techniques
• Relational Database
• Relational model
• RDBMS
• Base & derived relation
• Relational operators
• Normalizations
• Conclusion
• References

Page 3: Data Validation

AUTHORS

BAAH Barida, Computer Science Department, University of Port Harcourt, Port Harcourt, Nigeria

Kabari, Ledisi Giok* (Member, IEEE), Computer Science Department, Rivers State Polytechnic, Bori, Nigeria

Page 4: Data Validation

JOURNAL

The journal (IJARCS) focuses on theories, methods and applications in computer science and relevant fields.

It is an international scientific journal that aims to contribute to constant scientific research and training, so as to promote research in the field of computer science.

It covers areas like computer engineering, computer networks, biometrics and bioinformatics, database management systems, artificial intelligence, software engineering and many more.

Page 5: Data Validation

INTRODUCTION

Data validation is the process of ensuring that a program operates on clean, correct and useful data.

The simplest data validation verifies that the characters provided come from a valid set.

Incorrect data validation can lead to data corruption or a security vulnerability.

A validation process involves two distinct steps: (a) Validation Check (b) Post-Check action
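The two steps can be sketched as follows. This is only an illustration: the function names and the digits-only rule are assumptions made for the example, not something taken from the base paper.

```python
def validation_check(value: str) -> bool:
    """Validation check: is the value a non-empty string of ASCII digits?"""
    return value.isascii() and value.isdigit()


def post_check_action(value: str, ok: bool) -> str:
    """Post-check action: decide what to do with the result of the check."""
    if ok:
        return f"accepted: {value}"
    # Here the action is to reject the input and tell the user why.
    return f"rejected: {value!r} must contain digits only"


for entry in ["2024", "20x4"]:
    print(post_check_action(entry, validation_check(entry)))
```

Keeping the check separate from the action makes it possible to reuse the same check with a different action, for example logging a warning instead of rejecting.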

Page 6: Data Validation

NEED OF DATA VALIDATION

• To avoid system failure.
• To check the validity and consistency of data before using the data set.

Page 7: Data Validation

METHODS OF DATA VALIDATION

Character check – Ensures that only the expected characters are present in a field.

Batch totals – Checks for missing records. A numeric field is added up across all the records in a batch, and the total is compared with the expected value.

Check digits – Used for numerical data. An extra digit, calculated from the other digits of the number, is added to the end of the number. When the data is entered, the computer repeats the calculation and compares the result with the check digit.

Consistency checks – Checks that the data in one field corresponds to the data in related fields, e.g. a delivery date should not be earlier than the order date.

Control totals – A total is computed on one or more columns that are present in almost all records of a table.

Cross-system consistency checks – Compares data held in different systems to confirm that it is consistent.
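A minimal sketch of two of the methods on this slide. The slide does not name a check digit scheme, so the Luhn algorithm is used here as one well-known example; the function names and the sample records are hypothetical.

```python
def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk the payload from right to left, doubling every second digit,
    # starting with the rightmost one.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def has_valid_check_digit(number: str) -> bool:
    """Recompute the check digit and compare it with the last digit entered."""
    return luhn_check_digit(number[:-1]) == int(number[-1])


def batch_total_matches(records, field, expected_total) -> bool:
    """Batch total: the sum of one numeric field must equal the expected total."""
    return sum(record[field] for record in records) == expected_total


batch = [{"amount": 120}, {"amount": 80}, {"amount": 50}]
print(has_valid_check_digit("79927398713"))
print(batch_total_matches(batch, "amount", 250))
```

If a record goes missing from the batch, the recomputed total no longer matches the expected one, which is how the batch total reveals the loss.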

Page 8: Data Validation

CONT…

Data type checks – Checks the data type of the input data; if it is not of the desired data type, an error message is displayed to the user.

File existence check – Checks whether a file with the specified name exists.

Format check – Ensures that the data is in a specified format, e.g. dates have to be in the format DD/MM/YYYY. Regular expressions can be used for this type of check.

Hash totals – Like a batch total, but computed on one or more numeric fields that appear in the tuples of a relation in the relational database.

Limit check – Unlike a range check, the data is checked against only one limit, i.e. an upper limit or a lower limit.

Logic check – Ensures that the input value does not create a logical error.
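Three of the checks on this slide, sketched under simple assumptions: the expected data type is an integer, the date format is DD/MM/YYYY as in the slide, and the limit check uses an upper limit only. All function names are hypothetical.

```python
import re
from datetime import datetime

# Two digits, slash, two digits, slash, four digits.
DATE_PATTERN = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{4}")


def format_check(text: str) -> bool:
    """True if text has the shape DD/MM/YYYY and is a real calendar date."""
    if not DATE_PATTERN.fullmatch(text):
        return False
    try:
        datetime.strptime(text, "%d/%m/%Y")
    except ValueError:
        return False
    return True


def data_type_check(text: str) -> bool:
    """True if text can be read as an integer."""
    try:
        int(text)
    except ValueError:
        return False
    return True


def limit_check(value: int, upper: int) -> bool:
    """Limit check: only one bound (here the upper limit) is tested."""
    return value <= upper


print(format_check("25/12/2024"), format_check("31/02/2024"))
print(data_type_check("42"), data_type_check("4.2"))
print(limit_check(18, upper=60), limit_check(75, upper=60))
```

The regular expression alone would accept 31/02/2024, so the sketch also parses the date; that second step is a design choice of this example rather than part of the format check as the slide defines it.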

Page 9: Data Validation

CONT…

Presence check – Ensures that important data is not left out.

Range check – Ensures that the entered data lies within a specified range.

Referential integrity – In a relational database, two tables are linked using a primary key and a foreign key. For foreign key validation, every foreign key value in the referencing table must refer to a valid tuple in the referenced table.

Spelling and grammar check – Looks for spelling and grammar errors.

Uniqueness check – Checks that values which must be unique are not duplicated. This can be applied to fields like address, mobile number, etc.
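In a relational database, several of these checks can be declared as constraints so that the database itself rejects bad rows. The sketch below uses Python's built-in sqlite3 module; the table names, column names and sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE student (
        roll_no INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,                            -- presence check
        mobile  TEXT UNIQUE,                              -- uniqueness check
        marks   INTEGER CHECK (marks BETWEEN 0 AND 100),  -- range check
        dept_id INTEGER NOT NULL
                REFERENCES department(dept_id)            -- referential integrity
    );
    INSERT INTO department VALUES (1, 'IT');
    INSERT INTO student VALUES (1, 'Asha', '9000000001', 82, 1);
""")


def try_insert(row):
    """Return None if the row is accepted, else the constraint error text."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO student VALUES (?, ?, ?, ?, ?)", row)
    except sqlite3.IntegrityError as exc:
        return str(exc)
    return None


bad_rows = {
    "presence": (2, None, "9000000002", 70, 1),
    "uniqueness": (3, "Ravi", "9000000001", 70, 1),
    "range": (4, "Meena", "9000000004", 140, 1),
    "referential integrity": (5, "Kiran", "9000000005", 70, 99),
}
for check, row in bad_rows.items():
    print(check, "->", try_insert(row))
```

Each bad row violates exactly one constraint, so the printed error text shows which check caught it.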

Page 10: Data Validation

DATA VALIDATION TECHNIQUES

1. Accept Known Good –

• Also known as ‘whitelist’ or ‘positive’ validation.

• The entered data must be one of a set of tightly constrained known good values.

• Any entered data that doesn’t match should be rejected.

• The data should be:
o Length checked
o Range checked, if it is a numeric value
o Checked for correct syntax or grammar
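A small sketch of the Accept Known Good strategy. The allowed country codes, the username grammar and the length and range limits are all assumptions made for the example.

```python
import re

ALLOWED_COUNTRIES = {"IN", "NG", "US"}              # hypothetical known good values
USERNAME_PATTERN = re.compile(r"[a-z][a-z0-9_]*")   # hypothetical username grammar


def accept_country(code: str) -> bool:
    """Accept only values from the fixed set of known good codes."""
    return code in ALLOWED_COUNTRIES


def accept_username(name: str) -> bool:
    """Length check first, then the syntax check against the grammar."""
    return 3 <= len(name) <= 16 and USERNAME_PATTERN.fullmatch(name) is not None


def accept_age(text: str) -> bool:
    """Numeric value: must be ASCII digits and lie within the range 0 to 150."""
    return text.isascii() and text.isdigit() and 0 <= int(text) <= 150


print(accept_country("IN"), accept_country("XX"))
print(accept_username("ajay_18"), accept_username("ajay; DROP TABLE"))
print(accept_age("21"), accept_age("-5"))
```

Anything that does not match is rejected without the code having to know why it is bad.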

 

Page 11: Data Validation

CONT…

2. Reject Known Bad –

• Also known as ‘negative’ or ‘blacklist’ validation.

• The Reject Known Bad strategy is very dangerous, because the set of ‘known bad’ data has to be maintained and will always be incomplete.

• This strategy usually relies on regular expressions, and to validate the data every regular expression has to be run over every field. That is the reason this strategy is slow and not secure.
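The weakness of this strategy can be seen in a short sketch. The two ‘known bad’ patterns below are hypothetical; a real list would be longer, but it would have the same problem.

```python
import re

# A hypothetical list of known bad patterns that has to be maintained by hand.
KNOWN_BAD = [re.compile(r"<script>"), re.compile(r"javascript:")]


def passes_blacklist(text: str) -> bool:
    """True if the text matches none of the known bad patterns."""
    return not any(pattern.search(text) for pattern in KNOWN_BAD)


print(passes_blacklist("<script>alert(1)</script>"))    # caught by the list
print(passes_blacklist("<SCRIPT >alert(1)</SCRIPT >"))  # a variant the list does not cover
```

The second input is just as unwanted as the first, but it passes because nobody has added that variant to the list yet.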

Page 12: Data Validation

CONT…

3. Sanitize –

• Rather than being accepted or rejected, the entered data is converted into an acceptable format.

• Sanitize with Whitelist –

o Any characters which are not part of an approved list can be removed, encoded or replaced.

• Sanitize with Blacklist –

o Eliminate or translate characters (such as to HTML entities or to remove quotes) in an effort to make the input "safe". As most fields have a particular grammar, it is simpler, faster, and more secure to simply validate a single correct positive test than to try to include complex and slow sanitization routines for all current and future attacks.
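Both kinds of sanitization can be sketched in a few lines. The approved character list is an assumption made for the example; `html.escape` is the Python standard library function that translates `&`, `<`, `>` and quotes to HTML entities.

```python
import html
import re


def sanitize_with_whitelist(text: str) -> str:
    """Remove every character that is not an ASCII letter, digit, space or hyphen."""
    return re.sub(r"[^A-Za-z0-9 \-]", "", text)


def sanitize_with_blacklist(text: str) -> str:
    """Translate the characters &, <, > and quotes to HTML entities."""
    return html.escape(text, quote=True)


print(sanitize_with_whitelist("O'Brien <admin>"))
print(sanitize_with_blacklist("<b>bold</b>"))
```

Note that the whitelist version silently changes the name O'Brien, which illustrates why validating against a grammar is often preferable to sanitizing.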

Page 13: Data Validation

RELATIONAL DATABASE

A method for structuring data in the form of sets of records or tuples so that relations between different entities and attributes can be used for data access and transformation.

A database that is perceived by the user as a collection of two dimensional tables.

Each table contains one or more columns that define the attributes of that table.

Page 14: Data Validation

RDBMS

A database system made up of files with data elements in a two-dimensional array (rows and columns).

This database management system has the capability to recombine data elements to form different relations, resulting in great flexibility of data usage.

Page 15: Data Validation

RELATIONAL MODEL

In this model data is stored in tables. Each table contains a column for each field.

Applications access data by specifying queries, which use operations such as select, project and join.

The relational model contains the following components:

• Collection of objects or relations
• Set of operations to act on the relations
• Data integrity for accuracy and consistency

Page 16: Data Validation

CONT…

Page 17: Data Validation

BASE AND DERIVED RELATION

Base
o Relations that store data.
o In implementations these are called “tables”.

Derived
o Relations that are derived from the base relations.
o Operators can also be applied to these derived relations.
o In implementations these are called “views” or “queries”.
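The difference between a base and a derived relation can be sketched with Python's built-in sqlite3 module. The table, the view and the pass mark of 40 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Base relation: stores the data.
    CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, marks INTEGER);
    INSERT INTO student VALUES (1, 'Asha', 82), (2, 'Ravi', 35), (3, 'Meena', 67);
    -- Derived relation: stores no data, it is computed from the base relation.
    CREATE VIEW passed AS SELECT roll_no, name FROM student WHERE marks >= 40;
""")

passed_before = conn.execute("SELECT name FROM passed ORDER BY roll_no").fetchall()

# A new row in the base table shows up in the view without any extra step.
conn.execute("INSERT INTO student VALUES (4, 'Kiran', 90)")
passed_after = conn.execute("SELECT name FROM passed ORDER BY roll_no").fetchall()

print(passed_before)
print(passed_after)
```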

Page 18: Data Validation

RELATIONAL OPERATORS

Queries made against the relational database, and the derived relations in the database, are expressed in a relational calculus or a relational algebra.

In total, eight operators are found in relational theory, namely SELECT, PROJECT, JOIN, INTERSECT, UNION, DIFFERENCE, PRODUCT and DIVIDE.

Page 19: Data Validation

OPERATOR: SELECT

• Needs a single table as its operand.
• Can be used to list all row values, or it can yield only those row values that match a specified criterion.
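A minimal sketch of SELECT, with a table modelled as a list of dictionaries. The `select` function and the sample rows are hypothetical.

```python
students = [
    {"roll_no": 1, "name": "Asha", "dept": "IT"},
    {"roll_no": 2, "name": "Ravi", "dept": "CS"},
    {"roll_no": 3, "name": "Meena", "dept": "IT"},
]


def select(relation, predicate=lambda row: True):
    """Return the rows that satisfy the predicate (all rows by default)."""
    return [row for row in relation if predicate(row)]


all_students = select(students)
it_students = select(students, lambda row: row["dept"] == "IT")
print([row["name"] for row in it_students])
```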

Page 20: Data Validation

OPERATOR: PROJECT

• Uses a single table as its operand.
• Yields all values for selected attributes.
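A minimal sketch of PROJECT, again with a table modelled as a list of dictionaries. It follows relational algebra in dropping duplicate rows from the result (SQL keeps them unless DISTINCT is used); the function name and sample rows are hypothetical.

```python
students = [
    {"roll_no": 1, "name": "Asha", "dept": "IT"},
    {"roll_no": 2, "name": "Ravi", "dept": "CS"},
    {"roll_no": 3, "name": "Meena", "dept": "IT"},
]


def project(relation, attributes):
    """Keep only the named attributes; duplicate rows are dropped, order is kept."""
    seen = set()
    result = []
    for row in relation:
        values = tuple(row[a] for a in attributes)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attributes, values)))
    return result


print(project(students, ["name", "dept"]))
print(project(students, ["dept"]))
```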

Page 21: Data Validation

OPERATOR: UNION

• Needs two tables as its operands.
• Combines all rows from two tables, excluding duplicate rows.
• Tables used as operands must be UNION compatible with each other.
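A sketch of UNION with a simple compatibility test. Here two tables are treated as UNION compatible when they have the same attribute names in the same order, which is a simplification; the `Relation` type and the sample data are hypothetical.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    header: tuple     # attribute names, in order
    rows: frozenset   # one tuple per row


def union(r: Relation, s: Relation) -> Relation:
    """Combine the rows of two union-compatible relations; duplicates collapse."""
    if r.header != s.header:
        raise ValueError("operands are not UNION compatible")
    return Relation(r.header, r.rows | s.rows)


it_club = Relation(("roll_no", "name"), frozenset({(1, "Asha"), (2, "Ravi")}))
chess_club = Relation(("roll_no", "name"), frozenset({(2, "Ravi"), (3, "Meena")}))

members = union(it_club, chess_club)
print(sorted(members.rows))
```

Because the rows are held in a set, the row that appears in both tables is kept only once.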

Page 22: Data Validation

OPERATOR: INTERSECT

• Needs two tables as its operands.
• Yields only the rows that appear in both the tables.
• Operand tables must be UNION compatible with each other.
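With tables modelled as sets of tuples, INTERSECT maps directly onto Python's set intersection. The two sample tables are hypothetical and share the same attributes (roll number, name).

```python
# Both tables have the attributes (roll_no, name), so they are UNION compatible.
enrolled_dbms = {(1, "Asha"), (2, "Ravi"), (3, "Meena")}
enrolled_networks = {(2, "Ravi"), (3, "Meena"), (4, "Kiran")}

# Rows that appear in both tables.
enrolled_in_both = enrolled_dbms & enrolled_networks
print(sorted(enrolled_in_both))
```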

Page 23: Data Validation

OPERATOR: DIFFERENCE

• Needs two tables as its operands.
• Yields all rows in one table that are not found in the other table; that is, it subtracts one table from the other.
• Requires the UNION compatibility of the operand tables.
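A sketch of DIFFERENCE using Python's set subtraction on two hypothetical tables. Unlike UNION and INTERSECT, the order of the operands matters.

```python
# Both tables have the attributes (roll_no, name), so they are UNION compatible.
enrolled_dbms = {(1, "Asha"), (2, "Ravi"), (3, "Meena")}
enrolled_networks = {(2, "Ravi"), (3, "Meena"), (4, "Kiran")}

# Rows in the first table that are not found in the second.
only_dbms = enrolled_dbms - enrolled_networks
# Swapping the operands gives a different result.
only_networks = enrolled_networks - enrolled_dbms

print(only_dbms, only_networks)
```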

Page 24: Data Validation

OPERATOR: PRODUCT

• Needs two tables as its operands.
• Yields all possible pairs of rows from the two tables.
• The yielded result is also known as the Cartesian product.
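A sketch of PRODUCT using `itertools.product` from the Python standard library. The two sample tables are hypothetical; each row is a tuple, and a result row is the two source rows joined end to end.

```python
from itertools import product

students = [("Asha",), ("Ravi",)]
subjects = [("DBMS",), ("Networks",), ("OS",)]

# Every student row is paired with every subject row.
pairs = [student + subject for student, subject in product(students, subjects)]

print(len(pairs))
print(pairs[0])
```

The result has as many rows as the two row counts multiplied together, which is why PRODUCT is rarely used on large tables without a following SELECT.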

Page 25: Data Validation

OPERATOR: DIVIDE

DIVIDE requires the use of one single-column table and one two-column table
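A sketch of DIVIDE under the setup the slide describes: a two-column table divided by a single-column table. The result contains each first-column value that is paired with every value of the single-column table. The function name and sample data are hypothetical, and the single-column table is modelled as a plain set of values.

```python
# Two-column table: (student, subject).
enrolled = {
    ("Asha", "DBMS"), ("Asha", "OS"),
    ("Ravi", "DBMS"),
    ("Meena", "DBMS"), ("Meena", "OS"), ("Meena", "Networks"),
}
# Single-column table: the subjects every student is required to take.
required = {"DBMS", "OS"}


def divide(dividend, divisor):
    """Return each first-column value that is paired with every divisor value."""
    candidates = {a for a, _ in dividend}
    return {a for a in candidates if all((a, b) in dividend for b in divisor)}


print(sorted(divide(enrolled, required)))
```

Ravi is left out because he is enrolled in only one of the two required subjects.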

Page 26: Data Validation

OPERATOR: JOIN

• Allows us to combine information from two tables.
• Uses two tables having a common attribute as its operands.
• JOIN allows the use of independent tables, linked by common attributes, resulting in the minimum possible redundancy.
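A sketch of a JOIN on one common attribute, with tables modelled as lists of dictionaries. The function name, table contents and the `dept_id` attribute are hypothetical.

```python
students = [
    {"roll_no": 1, "name": "Asha", "dept_id": 10},
    {"roll_no": 2, "name": "Ravi", "dept_id": 20},
]
departments = [
    {"dept_id": 10, "dept_name": "IT"},
    {"dept_id": 20, "dept_name": "CS"},
    {"dept_id": 30, "dept_name": "ECE"},
]


def join(r, s, attribute):
    """Combine each pair of rows that agree on the common attribute."""
    return [
        {**row_r, **row_s}
        for row_r in r
        for row_s in s
        if row_r[attribute] == row_s[attribute]
    ]


for row in join(students, departments, "dept_id"):
    print(row)
```

The department name is stored once per department rather than once per student, and the join brings the two together only when a query needs them.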

Page 27: Data Validation

NORMALIZATION

Normalization is the process of splitting tables with redundant information into two or more tables.

The goal of normalization is to reduce or even eliminate data redundancy.

1st Normal Form (1NF)
o There are no duplicated rows in the table.
o Each cell is single-valued (i.e., there are no repeating groups or arrays).
o Entries in a column (attribute, field) are of the same kind.

Page 28: Data Validation

CONT…

2nd Normal Form (2NF)
o A table is in 2NF if it is in 1NF and if all non-key attributes are dependent on all of the key.

3rd Normal Form (3NF)
o A table is in 3NF if it is in 2NF and if it has no transitive dependencies.

Boyce-Codd Normal Form (BCNF)
o A table is in BCNF if it is in 3NF and if every determinant is a candidate key.
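The idea of a transitive dependency, and its removal by splitting the table, can be sketched as follows. The `holds` function and the sample table are hypothetical, and a dependency that holds in sample rows is only evidence; whether it holds in general is a design decision.

```python
def holds(rows, determinant, dependent):
    """True if, in these rows, each determinant value maps to one dependent value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        if seen.setdefault(key, value) != value:
            return False
    return True


# roll_no is the key. dept_name depends on dept_id, which is not a key:
# roll_no -> dept_id -> dept_name is a transitive dependency, so not 3NF.
students = [
    {"roll_no": 1, "name": "Asha", "dept_id": 10, "dept_name": "IT"},
    {"roll_no": 2, "name": "Ravi", "dept_id": 20, "dept_name": "CS"},
    {"roll_no": 3, "name": "Meena", "dept_id": 10, "dept_name": "IT"},
]
print(holds(students, ["roll_no"], ["dept_id"]))
print(holds(students, ["dept_id"], ["dept_name"]))

# Split into two tables so that each department name is stored only once.
student = [{k: r[k] for k in ("roll_no", "name", "dept_id")} for r in students]
department = list(
    {r["dept_id"]: {"dept_id": r["dept_id"], "dept_name": r["dept_name"]}
     for r in students}.values()
)
print(department)
```

After the split, the name "IT" appears in one row instead of two, which is the reduction in redundancy that normalization aims for.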

Page 29: Data Validation

CONCLUSION

Data validation is concerned with the client side, or end user. It ensures that only clean, correct and useful data are accepted, while data that are not useful to the relational database system are rejected with an error message that alerts the user or client while they are entering data into the database system.

Page 30: Data Validation

REFERENCES

• A. Maydanchik, “Data Quality Assessment”, Technics Publications, LLC (2007)

• D. Scott and R. Sharp, “Specifying and enforcing application-level web security policies”, IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 4 (2003)

• E. F. Codd, “Derivability, Redundancy, and Consistency of Relations Stored in Large Data Banks”, IBM Research Report, 1969

• E. F. Codd, The Relational Model for Database Management, Addison-Wesley Publishing Company, 1990

Page 31: Data Validation
Page 32: Data Validation

QUERIES