NADRA REGISTRATION SYSTEM
MCS 1
Group Members:
A-Qudoos Butt
Ch. Usman Asif
Abeer Iqbal
Imran Ali
Submitted to
Mam Zar Afshan
Revision History
Date Description Author Comments
<date> <Version 1> <Your Name> <First Revision>
Document Approval
The following Software Requirements Specification has been accepted and approved by the
following:
Signature Printed Name Title Date
Table of Contents
1. INTRODUCTION.......................................................................................................................
1.1 PURPOSE...................................................................................................................................
1.2 SCOPE.......................................................................................................................................
1.3 DEFINITIONS, ACRONYMS, AND ABBREVIATIONS....................................................................
1.4 REFERENCES.............................................................................................................................
1.5 OVERVIEW................................................................................................................................
2. GENERAL DESCRIPTION.......................................................................................................
2.1 PRODUCT PERSPECTIVE............................................................................................................
2.2 PRODUCT FUNCTIONS...............................................................................................................
2.3 USER CHARACTERISTICS..........................................................................................................
2.4 GENERAL CONSTRAINTS..........................................................................................................
2.5 ASSUMPTIONS AND DEPENDENCIES.........................................................................................
3. SPECIFIC REQUIREMENTS...................................................................................................
3.1 EXTERNAL INTERFACE REQUIREMENTS...................................................................................
3.1.1 User Interfaces..................................................................................................................
3.1.2 Hardware Interfaces.........................................................................................................
3.1.3 Software Interfaces...........................................................................................................
3.1.4 Communications Interfaces..............................................................................................
3.2 FUNCTIONAL REQUIREMENTS..................................................................................................
3.2.1 <Functional Requirement or Feature #1>.......................................................................
3.2.2 <Functional Requirement or Feature #2>.......................................................................
3.3 USE CASES...............................................................................................................................
3.3.1 Use Case #1......................................................................................................................
3.3.2 Use Case #2......................................................................................................................
3.5 NON-FUNCTIONAL REQUIREMENTS.........................................................................................
3.5.1 Performance.....................................................................................................................
3.5.2 Reliability..........................................................................................................................
3.5.3 Availability........................................................................................................................
3.5.4 Security.............................................................................................................................
3.5.5 Maintainability.................................................................................................................
3.5.6 Portability.........................................................................................................................
3.7 DESIGN CONSTRAINTS.................................................................................................................
3.8 LOGICAL DATABASE REQUIREMENTS..........................................................................................
3.9 OTHER REQUIREMENTS................................................................................................................
4. ANALYSIS MODELS.................................................................................................................
4.1 SEQUENCE DIAGRAMS..............................................................................................................
4.2 STATE-TRANSITION DIAGRAMS (STD)....................................................................
4.3 DATA FLOW DIAGRAMS (DFD)...............................................................................
5. CHANGE MANAGEMENT PROCESS.............................................................................................
NADRA REGISTRATION SYSTEM
1. INTRODUCTION
NADRA stands for National Database & Registration Authority. NADRA has gained
international recognition for its success in providing solutions for identification,
e-governance and secure documents that deliver the multi-pronged goals of mitigating identity
theft, safeguarding the interests of its clients, and facilitating the public. Employing more
than 17,000 people in more than 800 domestic offices and five international offices,
NADRA is one of the largest organizations in Pakistan. All adult citizens of Pakistan must
register for the Computerized National Identity Card (CNIC), with a unique number, upon
reaching the age of 18. It serves as an identification document that authenticates an individual's
identity as a citizen of Pakistan. Before the introduction of the CNIC, manual National Identity
Cards (NICs) were issued to citizens of Pakistan. Today, the Government has shifted all its
existing records of National Identity Cards (NICs) to the central computerized database
managed by NADRA. New CNICs are machine-readable and carry facial and fingerprint
information. Every citizen is required to have a NIC number, and the number is required for
many activities such as getting a driver's license or passport, registering a vehicle, receiving
social insurance/Zakat funding, enrolling in a school, college or technical institute, filing a
legal affidavit, wiring funds, paying taxes, opening a bank account, and getting a utility
connection (electricity, phone, mobile phone, water and sewer, natural gas).
1.1 PURPOSE:
This Software Requirement Specification (SRS) document states in precise and explicit language
those functions and capabilities that the software system must provide, as well as states any
required constraints by which the system must abide. The application whose requirements have
been specified in this document is: CNIC Data Mart and Fraud Detection by the use of Business
Intelligence. The document decomposes the problem into component parts. This act of writing
down software requirements in a well-designed format organizes information, places borders
around the problem, solidifies ideas, and helps break down the problem into its component parts
in an orderly fashion.
This automated system is being developed because the existing manual database system poses
many problems: data inconsistency and redundancy, difficulty in keeping data secure, and
occasional loss of citizens' records, which makes it hard for users to perform operations on the
data. For these reasons, we decided to build a new system.
1.2 INTENDED AUDIENCE AND READING SUGGESTIONS:
This document records the development team's understanding of NADRA's
requirements for the project prior to any actual design or development work. It is a two-way
insurance policy that assures that both the organization and the development team understand
each other's requirements at this given point in time.
Thus, the intended audiences of this document are both the developers of the project and the
users.
1.3 REFERENCES FOR THIS PROJECT:
1. www.nadra.gov.pk
2. www.techwr-l.com
3. www.kimballgroup.com/html/articles
2 SCOPE OF FINAL YEAR PROJECT
2.1 SCOPE OF WORK
The project encompasses the development of data mart for the Computerized National Identity
Cards (CNIC) issued to the citizens of Pakistan. The project involves gathering of data from
multiple sources and bringing them to one interface for extraction, transformation and loading
into the CNIC Data Mart. The data mart will be created on functional lines and will be based on a
natural break of the data in the National Data Warehouse. The data mart will be designed with
bilingual support (Urdu and English) therefore Unicode handling will be involved throughout the
project life cycle.
Mentioned below are the tasks and components included in the scope of the project:
* Data from multiple Online Transaction Processing (OLTP) systems will be extracted. In
this process a set of data will be identified and retrieved from the operational systems to
be brought to one interface. Only the data relevant to analysis will be extracted to the
staging area.
* Transformation will be performed on the data extracted into the staging area. In this
process data will be converted from application-specific data into enterprise data, and new
data will be created using formulas and calculations. The data will be validated, cleansed
and integrated from multiple tables and source systems by applying business rules to
make it conformable to the logical and subject-oriented structure of the data mart.
* The formatted (transformed) data will then be loaded into the dimensions and cubes of the
target data mart schema in the form of batches. Loading the data in batches will ensure
efficient loading of the data into the data mart.
2.2 WORK OUT OF THE SCOPE OF THE PROJECT
Mentioned below are the tasks and components not included in the scope of the project:
* Data from existing databases of citizens will be used for populating the CNIC data mart;
hence the project does not include the process of acquiring data from NIC applicants.
* The project deals with business rules only relevant to the process of duplicate detection.
Other business rules identified and implemented on the National Data Warehouse fall
outside the scope of this project.
* The project involves the development of a data mart for the Computerized National
Identity Cards (CNIC) only. Hence, any other national data kept at NADRA will not be
included in the scope of the project.
* The data that is separated out from the data mart as prospective duplicates can be
dealt with in various ways. One of these ways makes use of image-processing tools for
matching fingerprints and photos of the records singled out as probable duplicates. Such
activities will not be a part of the project.
* A data mining algorithm will be employed for the purpose of finding fraud-specific
trends and patterns only. Thus the data mining algorithm that is eventually
employed will not cater to any other aspects of data mining.
2.3 GOALS AND OBJECTIVES FOR PROJECT
The proposed system is required to meet the following objectives:
* To improve the existing system design by introducing the 'Dimensional' data modeling
technique for designing the National Data Warehouse, as an alternative to the 'ER' data
modeling technique, aiming to ensure efficient information storage, processing and
retrieval and better user understandability: "The dimensional model is the only viable
technique for achieving both user understandability and high query performance".
* To provide an efficient utility for data extraction, transformation and loading for the
National Data Warehouse.
3 GENERAL DESCRIPTION:
This section describes the general factors that affect the product and its requirements. It does
not state specific requirements; it only makes those requirements easier to understand.
3.1 PRODUCT PERSPECTIVE
Among the various projects of NADRA, one is the issuance of state-of-the-art National
Identity Cards (NICs) to all adult citizens of Pakistan. These NICs are duly backed by the
computerized database and data warehouse respectively called the Citizens’ Database and
National Data Warehouse (NDWH).
NADRA has created National Data Warehouse, which is integrated and interfaced with the
citizen databases for optimum utilization by all users ensuring economy of effort and resources.
The project that is going to be developed is a component of a larger system i.e. the functional
National Data Warehouse. The functionality of the project will complement the existing
functional data warehouse.
3.2 PRODUCT FUNCTIONS FOR THIS PROJECT.
The list of the major functions that will be performed in the data warehouse application
is given below:
* The system will extract data from the multiple operational source systems and bring it to
a common staging area.
* The system will transform the extracted data into strategic information before loading it
into the data warehouse. Data will be cleansed and validated for accuracy, ensuring
that all values conform to a standard definition.
* The system will load the extracted and cleansed data into the data mart. The major set of
functions consists of taking the prepared data, applying it to the data warehouse, and
storing it in the database there. The tool built for loading data will load the data into the
data mart in the form of batches.
3.3 USER CHARACTERISTICS
The intended users of this software are the Database Administrators or DBAs. Thus, the
Database Administrator class or the DBA class will emerge as the only user class that will
interact with the end product.
The DBA class will have the privileges of initiating and stopping the ETL functions (that
include Extraction, Transformation and Loading of data).
After deployment, the extent of user interaction with the end product will usually be
confined only to report viewing.
3.4 GENERAL CONSTRAINTS
This section describes the issues that might limit the options available to the development
team. Following are some of the general constraints:
* Hardware Limitations
Massive quantity of CNIC data will be required to carry forward the operations of the
software. Because of the large bulk of data, machines with large storage capacity and high
processing power will be required.
* Security Constraints
The CNIC data at NADRA is highly private and will require a considerable amount of security
measures to be applied during the design, development and the usage of the software.
* Maintenance Issues
Once handed over to the organization, the software, its documentation, its handling and its
maintenance will completely be the responsibility of the organization.
4 SPECIFIC REQUIREMENTS:
4.1 USER INTERFACES
Due to the limited user interaction, the number of user interfaces will be limited. The subsequent
paragraphs will provide a brief description of each of the user interfaces.
The first interface will be the “User Login” interface. A number of DBAs will use the system
and each of them will have their own logins and access rights for using the data and the software.
The users will log on to the system, by providing their login names and passwords, to initiate any
operation.
The next interface, after logging in to the system, will provide the users with access to the
system functions (according to their access rights). A user having complete rights will be able to
manipulate the Data Warehousing functions. The Data Warehousing functions include the
sequential tasks of data extraction, data transformation (according to the required format) and
data loading (into the data mart). The user will be allowed to initiate or to stop the sequential
execution of the ETL functions. The user will be able to view the progress of these operations.
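The login and access-rights behavior described above can be sketched as follows. This is an illustrative Python sketch, not the project's actual implementation; the user names, the salting scheme, and the rights model are assumptions made for demonstration only.

```python
import hashlib

def _hash(password: str) -> str:
    # Simplified salted hash for illustration; real systems use per-user salts.
    return hashlib.sha256(f"salt:{password}".encode()).hexdigest()

# Hypothetical user store: login name -> (password hash, access rights).
# In the actual system this would live in the user-accounts database.
USERS = {
    "dba_admin": (_hash("secret1"), {"extract", "transform", "load"}),
    "dba_viewer": (_hash("secret2"), {"view_reports"}),
}

def authenticate(login: str, password: str) -> bool:
    """Check the supplied credentials against the stored hash."""
    entry = USERS.get(login)
    return entry is not None and _hash(password) == entry[0]

def can_run(login: str, function: str) -> bool:
    """A user may initiate an ETL function only if it is in their access rights."""
    entry = USERS.get(login)
    return entry is not None and function in entry[1]
```

A user with complete rights (here `dba_admin`) passes `can_run` for every ETL function, while a report-viewing user does not; this mirrors the requirement that system functions are exposed according to access rights.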
4.2 SOFTWARE INTERFACES
The data mart developed as a part of the project will have an interface with multiple
heterogeneous OLTP systems at NADRA for data collection. After data collection, ETL
functions will be performed to load this data into the data mart.
For an efficient implementation it is important to employ those tools and the technologies that
best support the features of the system under consideration. For this system the tools and
technologies adopted are described below:
4.3 HARDWARE INTERFACES:
Described below are the basic hardware and software components used by the registration
system, which affect its overall performance.
ORACLE DATABASE SERVER
The database management system employed for the creation of the data mart will be Oracle
Database Server. The Oracle Enterprise Edition version of Oracle supports a large number of
users or large databases with advanced features for extensibility, performance, and management.
The Oracle database is a broad and powerful product. It has also added some performance
enhancements that specifically apply to data warehousing applications.
Oracle features full Unicode 3.0 support. National Language Support (NLS) provides character
sets and associated functionality, such as date and numeric formats, for a variety of languages.
All data may be stored as Unicode, or select columns may be incrementally stored as Unicode.
MICROSOFT VISUAL STUDIO
The data mart front-end application will be developed using Visual C# in the Microsoft Visual
Studio environment. With an extensive set of visual designers, a range of programming
languages, and integrated Visual Database Tools, Visual Studio enables users to build powerful
software quickly.
SQL SERVER
The data homogenization area for the data mart application will be developed in the native
DBMS of the source systems i.e. Microsoft SQL Server 2000. Microsoft SQL Server 2000 also
includes powerful features to support multilingual operations and environments. Extensive
multilingual features make SQL Server 2000 a compelling database product and applications
platform.
MICROSOFT ACCESS 2000
Microsoft Access is a part of MS Office that provides a database solution for this final year
project. It works very well on Windows and is used as the database in many applications. It is
a powerful and easy-to-use database. In this system, Microsoft Access 2000 will be employed
for developing the database for application user accounts.
MICROSOFT WINDOWS PROFESSIONAL
The citizen data mart application will be developed to run on Microsoft Windows Professional
2000 platform.
4.4 USER DOCUMENTATION
User manuals will be delivered along with the end product and the other deliverables. The user
manuals will familiarize the users with the working of the end product.
5 EXTERNAL INTERFACE REQUIREMENTS:
The user interface and software interface requirements for the project are specified in Sections
4.1 and 4.2 above and are not repeated here.
5.3 SYSTEM FEATURES
The following section provides a list of the features present in the system to be developed for
this project. These features have been elaborated by describing the distinct requirements
associated with each feature, so as to provide a better understanding of what each feature
constitutes.
6 DATA EXTRACTION
Data that will be used in a data warehouse must be extracted from the operational systems that
contain the source data. Data is initially extracted during the data warehouse creation process,
and on-going periodic extractions occur during updates of the data warehouse. Data extraction
can be a simple operation, if the source data resides in a single relational database, or a very
complex operation, if the source data resides in multiple heterogeneous operational systems.
Given below are the functional requirements of the data extraction module:
Functional Requirements:
1. The extraction module should enable the user to extract data from the multiple
operational systems currently in use.
2. The data extracted from the diverse data sources, e.g. the ‘Online Data Acquisition
Database’ and the ‘Form Based Registration Database’, must be homogenized. The
data must be consolidated from these different operational systems on a common
platform.
3. The data extraction process should bring all the source data into a common,
consistent format.
4. All irregularities or inconsistencies that might exist between the disparate sources
should be removed during the homogenization process. All inconsistencies should
be resolved for common data elements coming from multiple sources.
5. Preliminary data cleansing should be performed on the data extracted from the
multiple data sources. It should be ensured that there are no validation errors in the
data extracted from the source systems.
o The data with missing mandatory fields or insufficient information must be
rejected.
o Default values should be provided for missing fields if applicable.
o The data should be checked for correctness and accuracy as well.
o Information about all the rejected records must be recorded for future reference.
The reasons for rejection should also be maintained so that they can be
communicated to the required person.
6. A log of all tasks performed should be written in a log file so as to maintain an
audit trail of all the activities carried out.
6.1 DATA TRANSFORMATION
After extraction, the data needs to be transformed into strategic information before it is loaded
into the data warehouse. Since this feature of the system takes as input the data that has been
extracted from multiple data sources, various kinds of data transformations inevitably have to
be performed before the extracted data is moved from the source systems into the data
warehouse. Data transformation is the cleansing and validation of data for accuracy, ensuring
that all values conform to a standard definition.
The data has to be transformed according to standards because it comes from many dissimilar
source systems. It has to be ensured that after all the data is put together, the combined data does
not violate any business rules.
One major effort in data transformation is the improvement of data quality. This includes filling
the missing values for attributes in the extracted data. Data quality is of paramount importance in
a data warehouse because the effect of strategic decisions based on incorrect information can be
devastating.
Described below are the functional requirements for the data transformation module:
6.2 Functional requirements:
1. The data extraction process should bring all the source data into a common, consistent
format.
2. All irregularities or inconsistencies that might exist between the disparate sources should
be removed during the homogenization process. All inconsistencies should be resolved
for common data elements coming from multiple sources.
3. Preliminary data cleansing should be performed on the data extracted from the multiple
data sources.
o It should be ensured that there are no validation errors in the data extracted from
the source systems.
o The data with missing mandatory fields or insufficient information must be
rejected.
o Default values should be provided for missing fields if applicable.
o The data should be checked for correctness and accuracy as well.
o Information about all the rejected records must be recorded for future reference.
The reasons for rejection should also be maintained so that they can be
communicated to the required person.
4. After the data has been extracted from the data sources, the basic validation checks have
been performed and incorrect or incomplete records have been rejected, the extraction
module should export the data to the staging area so that a sequence of transformations
can be applied to the data, to make it ready to be loaded into the data mart.
5. In case the extraction process is cancelled by the user or in case of an error, the module
should roll back the performed activities so that the homogenization and staging areas are
ready for the ensuing extraction.
6. A log of all tasks performed should be written in a log file so as to maintain an audit trail
of all the activities carried out.
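The transformation requirements above, bringing source data into a common, consistent format and rolling back on error so the staging area is ready for the ensuing extraction, can be sketched as follows. This is an illustrative Python sketch; the specific rules shown (date normalization, name casing, CNIC key format) are assumed examples of business rules, not the project's actual rule set.

```python
from datetime import datetime

def to_iso_date(value: str) -> str:
    """Normalise dates from differing source formats into one standard format."""
    for fmt in ("%d/%m/%Y", "%d-%m-%Y", "%Y-%m-%d"):  # assumed source formats
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {value!r}")

def transform(staging_records):
    """Apply transformations to staged records; roll back all work on error.

    Discarding partial work on failure leaves the staging area untouched and
    ready for the ensuing extraction, as the requirements demand.
    """
    transformed = []
    try:
        for rec in staging_records:
            transformed.append({
                "cnic_no": rec["cnic_no"].replace("-", ""),  # one canonical key format
                "name": rec["name"].strip().title(),         # consistent casing
                "date_of_birth": to_iso_date(rec["date_of_birth"]),
            })
    except (KeyError, ValueError):
        return []  # roll back: discard the partial transformation
    return transformed
```

Each rule converts application-specific values into the enterprise format expected by the data mart; a production version would also write each step to the audit-trail log file.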
6.4 DATA LOADING
This feature incorporates the tasks that have to be performed to load the data that has been
extracted and cleansed into the data warehouse. The major set of functions consists of taking the
prepared data, applying it to the data warehouse, and storing it in the database there. Load images
are created to correspond to the target files to be loaded into the data warehouse database.
Described below are the functional requirements for the data loading module:
6.5 Functional requirements:
1. The data loading module should be initiated only when the data has been completely
cleansed and transformed.
2. The data should be loaded sequentially in the form of batches to reduce the loading
time, since loading the data warehouse may otherwise take an inordinate amount of time.
3. ‘Initial Load’ should be used to load the data into the data mart for the very first time.
‘Load’ mode should be used for initial loading. All the further runs should be applied
using ‘Append’ mode or ‘constructive merge’.
4. ‘Incremental load’ should be used for applying ongoing changes as necessary in a
periodic manner. The ‘constructive merge’ mode should be used for incremental loading,
as the historical perspective of data is important.
5. A record of all information about the data load should be written in a log file so as to
maintain an audit trail of all the activities carried out.
6. In case the loading process is cancelled by the user or in case of an error, the module
should roll back the performed activities.
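The load modes described above can be sketched as follows: batched loading, an initial 'Load' mode that populates the data mart for the very first time, and an incremental 'constructive merge' that appends changed rows while preserving the historical perspective of the data. This is an illustrative Python sketch over in-memory batches; the batch size and record shape are assumptions.

```python
def batched(records, batch_size):
    """Yield records in fixed-size batches, to keep each load manageable."""
    for i in range(0, len(records), batch_size):
        yield records[i:i + batch_size]

def initial_load(target, records, batch_size=2):
    """'Load' mode: used only for the very first population of the data mart."""
    target.clear()
    for batch in batched(records, batch_size):
        target.extend(batch)

def constructive_merge(target, records, batch_size=2):
    """Incremental load: append new versions without overwriting old rows,
    preserving the historical perspective of the data."""
    for batch in batched(records, batch_size):
        target.extend(batch)
```

A real loader would commit each batch to the database and record the load in the audit-trail log file, rolling back the current batch on cancellation or error.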
7 NON FUNCTIONAL REQUIREMENTS
7.1 EASY-TO-USE GRAPHICAL USER INTERFACE (GUI)
An effective and friendly Graphical User Interface is critical for effective system performance,
and is a most important part of every project. The users’ view of a system is conditioned chiefly
by experience with its interface. If the user interface is unsatisfactory, the users’ view of the
system will be negative regardless of any niceties of internal computer processing; the system
may be described as hard to learn, or clumsy, tiring and slow to use.
Keeping in mind the significance of a good GUI, all the interfaces for the different workflows
of the system processes must be in accordance with a good standard format, and consistency
must be maintained throughout. Every minute GUI attribute must be given due significance,
and end-user satisfaction must be borne in mind while placing, arranging, assigning and
relating icons, buttons and menus.
7.2 EFFICIENCY
Efficiency of a data warehouse system is concerned with the minimum query processing time as
well as optimal use of the system resources. In designing the proposed system, the efficiency
factor must be taken well into consideration and various mechanisms such as indexing should be
used.
7.3 SECURITY REQUIREMENTS
The data that is eventually to be loaded into the data mart is confidential and its security is of
paramount importance. To assure the confidentiality, integrity and availability of data, security
measures which ensure that different categories of corporate data are protected to the degree
necessary must be employed. Effective and efficient access control restrictions will have to be
enforced so that the end-users can access only the data or programs for which they have
legitimate privileges.
7.4 DATA INTEGRITY REQUIREMENTS
To guarantee data integrity, i.e. assurance that data or information has not been altered or
destroyed in an unauthorized manner, a control mechanism will have to be used to prevent all
users from updating and deleting the data in the data mart. It should also be ensured that the
various components of the system are accessible only through grant of rights by the
administrator.
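The integrity control above, permitting reads and loads but blocking updates and deletes except through a grant of rights by the administrator, can be sketched as follows. This is an illustrative Python sketch; in the actual system this policy would be enforced through database privileges rather than application code, and the class and method names here are invented for demonstration.

```python
class DataMartAccess:
    """Wrapper enforcing that ordinary users cannot update or delete rows."""

    def __init__(self):
        self._rows = []
        self._granted = set()  # users granted rights by the administrator

    def grant(self, user: str):
        """Administrator grants a user the right to modify the data mart."""
        self._granted.add(user)

    def insert(self, user: str, row: dict):
        self._rows.append(row)  # loading new data is always permitted

    def select(self, user: str):
        return list(self._rows)  # reading is always permitted

    def delete(self, user: str, index: int):
        if user not in self._granted:
            raise PermissionError("update/delete requires an administrator grant")
        del self._rows[index]
```

The wrapper illustrates the requirement directly: every user can insert and select, but `delete` raises a `PermissionError` unless the administrator has granted the user that right.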
7.5 FLEXIBLE ARCHITECTURE
Flexibility is the effort needed to modify an operational program. In the case of the design and
development of a data warehouse/data mart, not all of the requirements are known up front.
Missing parts of the requirements usually show up after users begin to use the data warehouse.
Thus, one of the requirements of the data mart architecture is that it should be flexible so that it
can accommodate additional user needs as and when they surface.
7.6 PERFORMANCE REQUIREMENTS
The performance of a Data Warehouse is largely a function of the quantity and type of data
stored within a database and the query/data loading workload placed upon the system.
When designing and managing the data warehouse there are numerous decisions that need to be
made that can have a huge impact on the final performance of the system. Following are some of
the requirements that will have to be fulfilled by proper design of the data mart to boost
performance of the system:
* Ensuring the consistency of data from disparate data sources.
* Selecting a proper data modeling technique for the data warehouse design.
* Ensuring the proper amount of data partitioning, indexing, aggregation and
summarization.
* Ensuring proper management of data storage.
* Periodic updates and purging of data warehouse data.
Besides these software performance requirements, the hardware will also have an impact on
performance, as described below:
A Data Warehouse is required to run queries on a large table that involves full table scans.
The response times for these queries are very critical. Therefore the performance will be
affected by the choice of machines employed to run the various data matching algorithms.
A powerful machine with a good processing speed will influence the time required to
perform functions on massive amounts of data.
7.7 SOFTWARE QUALITY ATTRIBUTES
The following table lists the software quality attributes required of the end product:
S.No Software Quality Attributes
1 Correctness
2 Efficiency
3 Flexibility
4 Maintainability
5 Interoperability
6 Security/Integrity
7 Usability
8 Testability
9 Reliability
10 Reusability
11 Robustness
Following is a brief description of each quality attribute:
* Correctness is the extent to which a program/software satisfies its specifications and
fulfills the user’s mission objectives.
* Efficiency is the amount of computing resources and code required to perform a function.
* Flexibility is the effort needed to modify an operational program.
* Maintainability is the effort required to locate and fix an error in an operational program.
* Interoperability is the effort needed to couple one system with another.
* Reliability is the extent to which a program performs with required precision.
* Integrity means the property that data or information has not been altered or destroyed
in an unauthorized manner.
* Reusability is the extent to which a program can be reused in another application.
* Testability is the effort needed to test a program to ensure it performs as intended.
* Usability is the effort required to learn, operate, prepare input for, and interpret output
of a program.
* Robustness is the resilience of the system, especially when under stress or when
confronted with invalid input.
8 Acronyms and Abbreviations
Term Description
SRS Software Requirements Specification
NADRA National Database and Registration Authority
CNIC Computerized National Identity Card
BI Business Intelligence
REQ Requirement