Market Basket Analysis
Software Requirement Specification
MADHAV INSTITUTE OF
TECHNOLOGY AND SCIENCE
Gwalior-474005
M.P.
PROJECT GUIDE:
Prof. Akhilesh Tiwari
Department of CSE & IT
MITS, Gwalior
IBM
ASSIDUOUS GROUP
TEAM MEMBERS:
Anushri Jain
Shruti Goyal
Ashish Bandil
Ajit Singh Kushwah
Market Basket Analysis Version I
Software Requirement Specification 30-01-2012
Assiduous Group
Table of Contents
1. Introduction
1.1 Purpose
1.2 Scope
1.3 Definitions, Acronyms, and Abbreviations
1.4 References
1.5 Technologies to be used
1.6 Overview
2. Overall Description
2.1 Product Perspective
2.2 Software Interface
2.3 Hardware Interface
2.4 Product Functions
2.5 User Characteristics
2.6 Constraints
2.7 Architecture Design
2.8 Assumptions and Dependencies
1. Introduction:
1.1 Purpose:
The amount of data being collected in databases today far exceeds our ability to reduce
and analyze data without the use of automated analysis techniques. Many scientific and
transactional business databases grow at a phenomenal rate. Knowledge discovery in
databases (KDD) is the field that is evolving to provide automated analysis solutions.
In view of the above, the purpose of this project is to analyze market basket data in order to extract
hidden trends and the buying behavior of customers.
1.2 Scope:
Suppose that, as the manager of an All Electronics branch, you would like to learn the buying
habits of your customers. Specifically, you wonder, "Which groups or sets of items are
customers likely to purchase on a given trip to the store?" To answer this question,
market basket analysis may be performed on the retail data of customer transactions at
your store. The results may be used to plan marketing or advertising strategies, as well as
catalog design. For instance, market basket analysis may help managers design different
store layouts. In one strategy, items that are frequently purchased together can be placed
in close proximity in order to further encourage the sale of such items together.
Although Market Basket Analysis conjures up pictures of shopping carts
and supermarket shoppers, it is important to realize that there are many other areas in
which it can be applied. These include:
Analysis of credit card purchases.
Analysis of telephone calling patterns.
Identification of fraudulent medical insurance claims (consider cases where common rules are broken).
Analysis of telecom service purchases.
1.3 Definitions, Acronyms, and Abbreviations:
1.3.1 Data Mining
Data Mining refers to extracting or "mining" knowledge from large amounts of data. Many
people treat data mining as a synonym for another popularly used term, knowledge
discovery in databases, or KDD. Others view data mining as simply an essential step in the
process of knowledge discovery in databases.
1.3.2 Data Mining Techniques
Fast technological changes and related research have led to the development of many
data mining techniques and systems. Because of the inherent differences in the data
model, specific techniques are developed to mine different types of databases. Different
classification schemes have been used in the literature to categorize data mining
methods, based on the kind of databases to be studied (such as transactional databases,
relational databases, spatial databases, temporal databases, multimedia databases, and
Internet information databases), the kind of technique to be utilized (such as
autonomous knowledge miner, data driven miner, and query driven miner), and the
kind of knowledge to be discovered. According to the last classification scheme, the
following are the common data mining techniques:
Association Rule
Classification
Clustering
Sequence Rule
Generalization and Summarization etc.
Since the proposed project is related to Association Rule Mining, a brief description of
Association Rule Mining is given below.
1.3.3 Association Rule
Mining association rules in transactional or relational databases has recently attracted a
lot of attention in the database community. The task is to find interesting associations or
correlations among a large set of data, i.e. to identify sets of items or predicates that
frequently occur together and then formulate rules that characterize their relationship.
For example, one may find, from a large set of transaction data, an association rule such
as: if a customer buys "X", he/she usually buys "Y" in the same transaction. Here "X" and
"Y" are individual items or sets of items. Retail stores frequently use association rules to
assist marketing, advertising, floor management, and inventory control.
Although association rules are directly applicable to retail business, they can also be used for
other purposes.
A formal statement of the association rule problem is given in [1].
Let $I = \{i_1, i_2, i_3, i_4, \ldots, i_m\}$ be a set of $m$ distinct literals called items.
Let $D$ be a set of transactions (of variable length) over $I$. Each transaction contains a
set of items $\{i_1, i_2, i_3, i_4, \ldots, i_k\} \subseteq I$.
An association rule is an implication of the form $X \Rightarrow Y$, where $X, Y \subset I$ and
$X \cap Y = \emptyset$. Here $X$ is called the antecedent or body and $Y$ is called the consequent
or head of the rule.
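As a minimal sketch of this definition, the items, transactions and a rule can be modelled with sets. The class name `AssociationRule`, the method `is_valid`, and the sample items are hypothetical and chosen only for illustration. Requiring a non-empty antecedent and consequent is an added assumption, not part of the statement above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssociationRule:
    """A rule X => Y (hypothetical helper type, for illustration only)."""
    antecedent: frozenset  # X, the body of the rule
    consequent: frozenset  # Y, the head of the rule

    def is_valid(self, items: frozenset) -> bool:
        """Check that X and Y are drawn from I and that X and Y are disjoint.

        Assumption: X and Y must also be non-empty.
        """
        x, y = self.antecedent, self.consequent
        return bool(x) and bool(y) and x <= items and y <= items and not (x & y)


# I: the set of distinct items; D: variable-length transactions over I.
ITEMS = frozenset({"bread", "butter", "milk", "jam"})
TRANSACTIONS = [
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "jam"}),
    frozenset({"milk"}),
]

rule = AssociationRule(frozenset({"bread", "butter"}), frozenset({"milk"}))
print(rule.is_valid(ITEMS))  # True
```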
1.4 References:
[1] R. Agrawal, T. Imielinski, and A. Swami. Mining Association Rules between Sets of
Items in Large Databases. In Proceedings of the ACM SIGMOD International Conference on
Management of Data, Washington D.C., May 1993, pp. 207-216.
[2] Abraham Silberschatz, Henry F. Korth, and S. Sudarshan. Database System
Concepts. McGraw-Hill, Fifth Edition, 2006.
[3] Margaret H. Dunham. Data Mining. Pearson Education, Seventh Edition, 2005.
1.5 Technologies to be used:
J2EE: (Servlet, JSP, JAXP, Java Beans) Java Platform, Enterprise Edition or Java EE is a widely
used platform for server programming in the Java programming language. The Java
Platform (Enterprise Edition) differs from the Java Standard Edition Platform (Java SE)
in that it adds libraries which provide functionality to deploy fault-tolerant, distributed,
multi-tier Java software, based largely on modular components running on an application
server.
JAVA: Application architecture. Java is an object-oriented programming language
developed by Sun Microsystems, a company best known for its high-end UNIX
workstations. The Java language was designed to be small, simple, and portable across
platforms and operating systems, both at the source and at the binary level, which means that
Java programs (applets and applications) can run on any machine that has the Java virtual
machine (JVM) installed.
DB2 9.7: IBM Database. DB2 is a database management system that delivers a
flexible and cost-effective database platform to build robust on-demand business
applications, and it supports the J2EE and web services standards.
RAD 7.0: Development tool. IBM Rational Application Developer for WebSphere
Software is an integrated development environment (IDE), made
by IBM's Rational Software division, for visually designing, constructing, testing, and
deploying Web services, portals, and Java (J2EE) applications.
1.6 Overview:
Overall Description: Processes during the tenure of the project:
(i) Study of Apriori Algorithm
(ii) Data Collection
(iii) Implementation of Apriori Algorithm
(iv) Development of user interface
(v) Application of Apriori on collected market basket data
(vi) Analysis of results
Specific Requirements:
Real-life dataset (market basket data)
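Since steps (i), (iii) and (v) centre on the Apriori algorithm, a compact sketch of its level-wise search may help. This is an illustrative Python version under simplifying assumptions (transactions held in memory, minimum support given as an absolute count), not the project's Java implementation. The function name `apriori` and the sample baskets are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for all itemsets with count >= min_support.

    transactions: iterable of sets of items; min_support: absolute count.
    """
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count step: one scan of the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]
frequent_itemsets = apriori(baskets, min_support=3)
print(frequent_itemsets[frozenset({"bread", "milk"})])  # 3
```

The prune step is what distinguishes Apriori from brute-force counting: a candidate is discarded without scanning the data as soon as one of its subsets is known to be infrequent.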
2. Overall Description:
2.1 Product Perspective:
Client Tier
It implements the "look and feel" of an application. It is responsible for the presentation
of data, receiving user events and controlling the user interface. Most e-commerce
applications are web-based. The languages used are a combination of
HTML, CSS and JavaScript.
Application Tier
This layer implements the business logic of the applications. It is usually powered by a
Java Application Server (WebSphere). There are several sub-layers within the application
layer.
Data Tier
This is the layer that manages the persistence of application information. It is
usually powered by a relational database server (MS SQL Server).
Stored Procedures and Functions are used to execute database server-side
processes pertinent to data integrity. Business logic processes should be part of
application layer in general, not part of data layer.
Fig 1: Object-oriented scenario (of three-tier architecture)
2.2 Software Interface:
Front End Client: HTML, Dreamweaver
Web Server: Apache, Tomcat, WebSphere
Back End: DB2 9.7
2.3 Hardware Interface:
Minimum Requirements:
Tier         Software             Processor                         RAM     Disk Space
Client Side  Internet Explorer 6  Intel Pentium III or AMD 800 MHz  128 MB  100 MB
Server Side  WebSphere            Intel Pentium III or AMD 800 MHz  1 GB    3.5 GB
Data Tier    DB2                  Intel Pentium III or AMD 800 MHz  256 MB  500 MB
2.4 Product Functions:
The proposed product will include the following functions:
1. Specify input data: Define the data to be mined; the data may be in the form of a
dataset file or any other file.
2. Process data / preprocess the input data.
3. Select technique/algorithm: Select the appropriate data mining algorithm.
4. Work on results: Select visualization tools to analyze the result.
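Functions 1 and 2 can be illustrated with a small loader. The sketch below assumes a hypothetical input format of one transaction per line with comma-separated item names; the specification does not fix a file format, so the function name `load_transactions`, the separator and the normalisation rules are all illustrative assumptions.

```python
import io


def load_transactions(lines, sep=","):
    """Parse one transaction per line into a list of item sets.

    Assumed (hypothetical) format: item names separated by `sep`.
    """
    transactions = []
    for line in lines:
        # Preprocess: trim whitespace, normalise case, drop empty fields.
        items = {field.strip().lower() for field in line.split(sep)}
        items.discard("")
        if items:  # skip blank lines
            transactions.append(frozenset(items))
    return transactions


# An in-memory stand-in for a dataset file.
sample = io.StringIO("Bread, Milk\n\nbread,butter, ,MILK\n")
print(load_transactions(sample))
```

Any iterable of text lines works as input, so an opened file object can be passed in place of the `io.StringIO` sample.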
2.5 User Characteristics:
Users can be characterized as:
1. General (Non-Technical User): This category includes general users having no
technical knowledge.
2. Technical User: This category includes users having technical knowledge.
3. Analyst: This category includes users having the ability to analyze the data as well
as the results.
2.6 Constraints:
The proposed application requires a user-specified Support and Confidence framework as
constraints, described as follows:
Support and Confidence Framework:
Support: The support of a rule is the
number of transactions that include all items in the antecedent and consequent parts of the
rule. (The support is sometimes expressed as a percentage of the total number of records
in the database.)
Support is a measure of how often the collection of items in an association occurs together, as a
percentage of all the transactions.
Example: in 2% of the purchases at a hardware store, both a pick and a shovel were bought.
Rules originating from the same itemset have identical support but can have
different confidence.
support = #tuples(LHS, RHS) / N, where N is the total number of transactions
Confidence: The confidence of a rule is the
ratio of the number of transactions that include all items in the consequent as well as the
antecedent (namely, the support) to the number of transactions that include all items in
the antecedent.
The confidence of rule "B given A" is a measure of how likely it is that B
occurs when A has occurred.
A confidence of 100% means that B always occurs if A has occurred.
confidence = #tuples(LHS, RHS) / #tuples(LHS)
Example: bread and butter ⇒ milk [confidence 90%, support 1%]
For example, if a supermarket database has 100,000 point-of-sale transactions, out of
which 2,000 include both items A and B and 800 of these include item C, the association
rule "If A and B are purchased then C is purchased on the same trip" has a support of 800
transactions (alternatively 0.8% = 800/100,000) and a confidence of 40% (= 800/2,000).
One way to think of support is that it is the probability that a randomly selected
transaction from the database will contain all items in the antecedent and the consequent,
whereas the confidence is the conditional probability that a randomly selected transaction
will include all the items in the consequent given that the transaction includes all the
items in the antecedent.
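The two formulas can be checked against the worked example with a short sketch. The function names and the synthetic transaction list are hypothetical; the data is constructed only so that the counts match the example (100,000 transactions, 2,000 containing A and B, 800 of those also containing C).

```python
def support_count(transactions, itemset):
    """Number of transactions containing every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(transactions, lhs, rhs):
    """support = #tuples(LHS, RHS) / N"""
    return support_count(transactions, set(lhs) | set(rhs)) / len(transactions)


def confidence(transactions, lhs, rhs):
    """confidence = #tuples(LHS, RHS) / #tuples(LHS)"""
    both = support_count(transactions, set(lhs) | set(rhs))
    return both / support_count(transactions, lhs)


# Synthetic data mirroring the worked example (item D is filler).
transactions = (
    [frozenset({"A", "B", "C"})] * 800
    + [frozenset({"A", "B"})] * 1_200
    + [frozenset({"D"})] * 98_000
)

print(support(transactions, {"A", "B"}, {"C"}))     # 0.008
print(confidence(transactions, {"A", "B"}, {"C"}))  # 0.4
```

Note that `confidence` raises `ZeroDivisionError` when the antecedent never occurs; a fuller implementation would need to decide how to report that case.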
2.7 Architecture Design:
The architecture of our product is based on the 3-tier architecture.
The architecture of a database system is greatly influenced by the underlying computer
system on which the database system runs. Database systems can be centralized, or
client-server, where one server machine executes work on behalf of multiple client machines.
In a three-tier architecture, the client machine acts merely as a front end and does
not contain any direct database calls. Instead, the client end communicates with an
application server, usually through a forms interface. The application server in turn
communicates with a database system to access data. The business logic of the
application, which says what actions to carry out under what conditions, is embedded in
the application server, instead of being distributed across multiple clients. Three-tier
architectures are more appropriate for large applications, and for applications that run on
the World Wide Web.
The architecture is given in [2].
Fig 2: Three tier architecture
Client Tier
It implements the "look and feel" of an application. It is responsible for the presentation
of data, receiving user events and controlling the user interface. Most e-commerce
applications are web-based. The languages used are a combination of
HTML, CSS and JavaScript. JSP or ASP is used for dynamic content.
HTML is a Web authoring markup language for defining content structures and
rendering a web page.
JavaScript is commonly used for client-side validation. JavaScript also has some
control over the look-and-feel of a page in dynamic HTML.
Application Tier
This layer implements the business logic of the applications. It is usually powered by a
Java Application Server (WebLogic or WebSphere). There are several sub-layers within
the application layer.
Control Layer is the interface layer between presentation tier and application tier.
The implementation of this layer is dependent on the languages used for
implementing the presentation tier.
Transaction Layer usually implements business processes that may involve many
business objects. In J2EE architecture, session beans are commonly used for
implementing the transaction layer. Transaction Layer and Business Object Layer
are not constrained by the programming languages for the presentation and the
database used for persistence.
Business Object Layer consists of objects that represent business entities which
always should be 100% independent of database used for data persistence.
Data Access Object (DAO) Layer is the interface between the application tier and
persistence tier. Besides the methods for "creating", "retrieving", "updating" and
"removing" a business object from database, DAO objects implement other
business-specific methods as well. Even with JDBC, DAO objects may not be
100% database independent.
Data Tier
This is the layer that manages the persistence of application information. It is
usually powered by a relational database server (MS SQL Server).
Stored Procedures and Functions are used to execute database server-side
processes pertinent to data integrity. Business logic processes should be part of
application layer in general, not part of data layer.
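The separation between the business object, DAO and transaction layers described above can be sketched as follows. This is an illustrative Python outline, not the project's J2EE code. All class and method names are hypothetical, and a dictionary stands in for the relational database so that the example is self-contained.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    """Business object: knows nothing about how it is stored."""
    transaction_id: int
    items: frozenset


class TransactionDao(ABC):
    """DAO layer: the only interface the application tier uses for persistence."""

    @abstractmethod
    def create(self, transaction: Transaction) -> None: ...

    @abstractmethod
    def retrieve(self, transaction_id: int) -> Transaction: ...

    @abstractmethod
    def update(self, transaction: Transaction) -> None: ...

    @abstractmethod
    def remove(self, transaction_id: int) -> None: ...

    @abstractmethod
    def find_containing(self, items: frozenset) -> list: ...


class InMemoryTransactionDao(TransactionDao):
    """Stand-in for a database-backed DAO (assumption: a dict replaces the data tier)."""

    def __init__(self):
        self._rows = {}

    def create(self, transaction):
        self._rows[transaction.transaction_id] = transaction

    def retrieve(self, transaction_id):
        return self._rows[transaction_id]

    def update(self, transaction):
        self._rows[transaction.transaction_id] = transaction

    def remove(self, transaction_id):
        del self._rows[transaction_id]

    def find_containing(self, items):
        # A business-specific query beyond plain create/retrieve/update/remove.
        return [t for t in self._rows.values() if items <= t.items]


class BasketService:
    """Transaction layer: business logic written against the DAO interface only."""

    def __init__(self, dao: TransactionDao):
        self._dao = dao

    def support_count(self, items) -> int:
        return len(self._dao.find_containing(frozenset(items)))


dao = InMemoryTransactionDao()
dao.create(Transaction(1, frozenset({"bread", "milk"})))
dao.create(Transaction(2, frozenset({"bread"})))
print(BasketService(dao).support_count({"bread"}))  # 2
```

Because `BasketService` depends only on the abstract `TransactionDao`, a database-backed implementation could replace the in-memory one without changes to the business logic.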
2.8 Assumptions and Dependencies:
The application assumes and depends on the user-specified Support and Confidence framework
defined in Section 2.6 (Constraints), including the formulas for support and confidence and the
worked example given there.
Special Thanks
We convey special thanks to our department and to our
college. We also convey special thanks to all the
software and websites that have helped us greatly in
doing the project.