
    Market Basket Analysis

    Software Requirement Specification

    MADHAV INSTITUTE OF TECHNOLOGY AND SCIENCE

    Gwalior-474005, M.P.

    PROJECT GUIDE: Prof. Akhilesh Tiwari

    Department of CSE & IT

    MITS, Gwalior

    IBM

    ASSIDUOUS GROUP


    TEAM MEMBERS:

    Anushri Jain

    Shruti Goyal

    Ashish Bandil

    Ajit Singh Kushwah


    Market Basket Analysis Version I

    Software Requirement Specification 30-01-2012

    Assiduous Group

    Table of Contents

    1. Introduction

    1.1 Purpose

    1.2 Scope

    1.3 Definitions, Acronyms, and Abbreviations

    1.4 References

    1.5 Technologies to be used

    1.6 Overview

    2. Overall Description

    2.1 Product Perspective

    2.2 Software Interface

    2.3 Hardware Interface

    2.4 Product Functions

    2.5 User Characteristics

    2.6 Constraints

    2.7 Architecture Design

    2.8 Assumptions and Dependencies


    1. Introduction:

    1.1 Purpose:


    The amount of data being collected in databases today far exceeds our ability to reduce and analyze it without automated analysis techniques. Many scientific and transactional business databases grow at a phenomenal rate. Knowledge discovery in databases (KDD) is the field that is evolving to provide automated analysis solutions.

    In view of the above, the purpose of this project is to analyze market basket data in order to extract hidden trends and the buying behavior of customers.

    1.2 Scope:

    Suppose that, as the manager of an All Electronics branch, you would like to learn the buying habits of your customers. Specifically, you wonder: "Which groups or sets of items are customers likely to purchase on a given trip to the store?" To answer this question, market basket analysis may be performed on the retail data of customer transactions at your store. The results may be used to plan marketing or advertising strategies, as well as catalog design. For instance, market basket analysis may help managers design different store layouts. In one strategy, items that are frequently purchased together can be placed in close proximity in order to further encourage the sale of such items together.

    Although market basket analysis conjures up pictures of shopping carts and supermarket shoppers, it is important to realize that there are many other areas in which it can be applied. These include:

    Analysis of credit card purchases.

    Analysis of telephone calling patterns.

    Identification of fraudulent medical insurance claims (consider cases where common rules are broken).

    Analysis of telecom service purchases.


    1.3 Definitions, Acronyms, and Abbreviations:

    1.3.1 Data Mining

    Data mining refers to extracting or "mining" knowledge from large amounts of data. Many people treat data mining as a synonym for another popularly used term, knowledge discovery in databases (KDD). More precisely, data mining is an essential step in the process of knowledge discovery in databases.

    1.3.2 Data Mining Techniques

    Fast technological change and related research have led to the development of many data mining techniques and systems. Because of the inherent differences in data models, specific techniques are developed to mine different types of databases. The literature classifies data mining methods in three ways: by the kind of database to be studied (such as transactional, relational, spatial, temporal, multimedia, and Internet information databases), by the kind of technique to be utilized (such as autonomous knowledge miner, data-driven miner, and query-driven miner), and by the kind of knowledge to be discovered. According to the last classification scheme, the following are the common data mining techniques:

    Association Rule

    Classification

    Clustering

    Sequence Rule

    Generalization and Summarization, etc.

    Since the proposed project is related to association rule mining, a brief description of association rule mining is given below.


    1.3.3 Association Rule


    Mining association rules in transactional or relational databases has recently attracted a lot of attention in the database community. The task is to find interesting associations or correlations among a large set of data, that is, to identify sets of items or predicates that frequently occur together and then formulate rules that characterize their relationship. For example, one may find, from a large set of transaction data, an association rule such as: if a customer buys "X", he/she usually buys "Y" in the same transaction. Here "X" and "Y" are individual items or sets of items. Retail stores frequently use association rules to assist marketing, advertising, floor management, and inventory control. Although the rules are directly applicable to retail business, they can also be used for other purposes.

    A formal statement of the association rule problem is given in [1].

    Let $I = \{i_1, i_2, i_3, i_4, \ldots, i_m\}$ be a set of $m$ distinct literals called items. Let $D$ be a set of variable-length transactions over $I$. Each transaction contains a set of items $\{i_1, i_2, i_3, i_4, \ldots, i_k\} \subseteq I$.

    An association rule is an implication of the form $X \Rightarrow Y$, where $X, Y \subset I$ and $X \cap Y = \emptyset$. Here $X$ is called the antecedent or body and $Y$ is called the consequent or head of the rule.

    1.4 References:

    [1] R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Washington, D.C., May 1993, pp. 207-216.

    [2] Abraham Silberschatz, Henry F. Korth, and S. Sudarshan. Database System Concepts. McGraw-Hill, Fifth Edition, 2006.

    [3] Margaret H. Dunham. Data Mining. Pearson Education, Seventh Edition, 2005.


    1.5 Technologies to be used:


    J2EE (Servlet, JSP, JAXP, Java Beans): Java Platform, Enterprise Edition (Java EE) is a widely used platform for server programming in the Java programming language. It differs from the Java Standard Edition platform (Java SE) in that it adds libraries that provide functionality to deploy fault-tolerant, distributed, multi-tier Java software, based largely on modular components running on an application server.

    JAVA: Application architecture. Java is an object-oriented programming language developed by Sun Microsystems, a company best known for its high-end UNIX workstations. The Java language was designed to be small, simple, and portable across platforms and operating systems, both at the source and at the binary level, which means that Java programs (applets and applications) can run on any machine that has the Java Virtual Machine (JVM) installed.

    DB2 9.7: IBM database. DB2 is a database management system that delivers a flexible and cost-effective database platform for building robust on-demand business applications, and it supports the J2EE and web services standards.

    RAD 7.0: Development tool. IBM Rational Application Developer for WebSphere Software is an integrated development environment (IDE), made by IBM's Rational Software division, for visually designing, constructing, testing, and deploying Web services, portals, and Java (J2EE) applications.


    1.6 Overview:


    Overall Description: the processes carried out during the tenure of the project are:

    (i) Study of the Apriori algorithm

    (ii) Data collection

    (iii) Implementation of the Apriori algorithm

    (iv) Development of the user interface

    (v) Application of Apriori to the collected market basket data

    (vi) Analysis of results

    Specific Requirements:

    Real-life dataset (market basket data)
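    The Apriori algorithm named in the steps above is not spelled out in this specification. As a rough orientation only, the following is a minimal textbook-style sketch of its level-wise search for frequent itemsets, assuming Java 9 or later. The class name AprioriSketch, the method names, and the sample data are hypothetical and are not the team's implementation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only: all names here are hypothetical.
final class AprioriSketch {

    // Returns every itemset contained in at least minCount transactions.
    static List<Set<String>> frequentItemsets(List<Set<String>> transactions, int minCount) {
        List<Set<String>> result = new ArrayList<>();

        // Level 1 candidates: each distinct item on its own.
        Set<Set<String>> candidates = new HashSet<>();
        for (Set<String> t : transactions) {
            for (String item : t) {
                candidates.add(Set.of(item));
            }
        }

        while (!candidates.isEmpty()) {
            // Count each candidate and keep those meeting the minimum count.
            List<Set<String>> frequent = new ArrayList<>();
            for (Set<String> c : candidates) {
                long count = transactions.stream().filter(t -> t.containsAll(c)).count();
                if (count >= minCount) {
                    frequent.add(c);
                }
            }
            result.addAll(frequent);

            // Join step: unite pairs of frequent k-itemsets into (k+1)-itemsets.
            // Prune step: keep a candidate only if all of its k-subsets are frequent.
            Set<Set<String>> lookup = new HashSet<>(frequent);
            Set<Set<String>> next = new HashSet<>();
            for (int i = 0; i < frequent.size(); i++) {
                for (int j = i + 1; j < frequent.size(); j++) {
                    Set<String> union = new TreeSet<>(frequent.get(i));
                    union.addAll(frequent.get(j));
                    if (union.size() == frequent.get(i).size() + 1
                            && allSubsetsFrequent(union, lookup)) {
                        next.add(union);
                    }
                }
            }
            candidates = next;
        }
        return result;
    }

    // True if removing any single item from the candidate leaves a frequent itemset.
    static boolean allSubsetsFrequent(Set<String> candidate, Set<Set<String>> frequent) {
        for (String item : candidate) {
            Set<String> subset = new TreeSet<>(candidate);
            subset.remove(item);
            if (!frequent.contains(subset)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up sample data, not taken from the project's dataset.
        List<Set<String>> db = List.of(
            Set.of("bread", "butter", "milk"),
            Set.of("bread", "butter"),
            Set.of("bread", "milk"),
            Set.of("butter"));
        // With a minimum count of 2 this finds three single items and two pairs.
        for (Set<String> itemset : frequentItemsets(db, 2)) {
            System.out.println(itemset);
        }
    }
}
```

    The sketch rescans all transactions for every candidate, which is simple to read but slow on large datasets; a real implementation would typically count all candidates of one level in a single pass.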

    2. Overall Description:

    2.1 Product Perspective:

    Client Tier

    It implements the "look and feel" of an application. It is responsible for presenting data, receiving user events, and controlling the user interface. Most e-commerce applications are web-based, and the client tier is written in a combination of HTML, CSS, and JavaScript.

    Application Tier

    This layer implements the business logic of the application. It is usually powered by a Java application server (WebSphere). There are several sub-layers within the application layer.


    Data Tier


    This is the layer that manages the persistence of application information. It is usually powered by a relational database server (for example, MS SQL Server; this project uses DB2 9.7 as its back end).

    Stored procedures and functions are used to execute database server-side processes pertinent to data integrity. In general, business logic should be part of the application layer, not the data layer.


    Fig 1: Object-oriented scenario (of three-tier architecture)


    2.2 Software Interface:


    Front End Client: HTML, Dreamweaver

    Web Server: Apache, Tomcat, WebSphere

    Back End: DB2 9.7

    2.3 Hardware Interface:

    Minimum Requirements:

    Tier         Software             Processor                         RAM     Disk Space
    Client Side  Internet Explorer 6  Intel Pentium III or AMD 800 MHz  128 MB  100 MB
    Server Side  WebSphere            Intel Pentium III or AMD 800 MHz  1 GB    3.5 GB
    Data Tier    DB2                  Intel Pentium III or AMD 800 MHz  256 MB  500 MB


    2.4 Product Functions:


    The proposed product will include the following functions:

    1. Specify input data: Define the data to be mined; the data may be in the form of a dataset file or any other file.

    2. Process data: Preprocess the input data.

    3. Select technique/algorithm: Select the appropriate data mining algorithm.

    4. Work on results: Select visualization tools to analyze the results.
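    Functions 1 and 2 above can be illustrated with a small sketch, assuming Java 9 or later. The input format (one transaction per line, items separated by commas) is an assumption made for illustration, since this specification does not fix a file format. The class name InputSketch and the method parse are likewise hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only: names and input format are assumptions.
final class InputSketch {

    // Turns raw text lines into transactions (sets of item names).
    // Preprocessing: trim whitespace, lower-case names, drop empty
    // entries, drop duplicate items, and skip blank lines.
    static List<Set<String>> parse(List<String> lines) {
        List<Set<String>> transactions = new ArrayList<>();
        for (String line : lines) {
            Set<String> items = new TreeSet<>();
            for (String raw : line.split(",")) {
                String item = raw.trim().toLowerCase(Locale.ROOT);
                if (!item.isEmpty()) {
                    items.add(item);
                }
            }
            if (!items.isEmpty()) {
                transactions.add(items);
            }
        }
        return transactions;
    }

    public static void main(String[] args) {
        // In practice the lines could come from java.nio.file.Files.readAllLines.
        List<String> lines = List.of("Bread, Butter,milk", "", "bread , bread");
        System.out.println(parse(lines)); // [[bread, butter, milk], [bread]]
    }
}
```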

    2.5 User Characteristics:

    Users can be characterized as:

    1. General (non-technical) user: This category includes general users with no technical knowledge.

    2. Technical user: This category includes users with technical knowledge.

    3. Analyst: This category includes users who are able to analyze the data as well as the results.

    2.6 Constraints:

    The proposed application requires a user-specified support and confidence framework as constraints, described as follows.

    Support and Confidence Framework:

    Support: The first number is called the support for the rule. The support is simply the number of transactions that include all items in the antecedent and consequent parts of the rule. (The support is sometimes expressed as a percentage of the total number of records in the database.)

    It is a measure of how often the collection of items in an association occurs together, as a percentage of all the transactions.

    For example, in 2% of the purchases at a hardware store, both a pick and a shovel were bought.

    Rules originating from the same itemset have identical support but can have different confidence.

    support = #tuples(LHS, RHS) / N, where N is the total number of transactions

    Confidence: The other number is known as the confidence of the rule. Confidence is the ratio of the number of transactions that include all items in the consequent as well as the antecedent (namely, the support) to the number of transactions that include all items in the antecedent.

    The confidence of the rule "B given A" is a measure of how likely it is that B occurs when A has occurred.

    100% means that B always occurs if A has occurred.

    confidence = #tuples(LHS, RHS) / #tuples(LHS)

    Example: bread and butter $\Rightarrow$ milk [90%, 1%]

    For example, if a supermarket database has 100,000 point-of-sale transactions, out of which 2,000 include both items A and B and 800 of these include item C, the association rule "If A and B are purchased then C is purchased on the same trip" has a support of 800 transactions (alternatively 0.8% = 800/100,000) and a confidence of 40% (= 800/2,000). One way to think of support is as the probability that a randomly selected transaction from the database contains all items in the antecedent and the consequent, whereas the confidence is the conditional probability that a randomly selected transaction includes all the items in the consequent given that it includes all the items in the antecedent.
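    The two formulas and the supermarket figures above can be checked with a short sketch, assuming Java 9 or later. The class name SupportConfidenceSketch and its method names are hypothetical, and the four-transaction dataset in main is made up purely to show how the counts are obtained from raw transactions.

```java
import java.util.List;
import java.util.Set;

// Illustrative sketch only: all names here are hypothetical.
final class SupportConfidenceSketch {

    // Number of transactions that contain every item of the given itemset.
    static long count(List<Set<String>> transactions, Set<String> items) {
        return transactions.stream().filter(t -> t.containsAll(items)).count();
    }

    // support = #tuples(LHS, RHS) / N
    static double support(long countBoth, long totalTransactions) {
        return (double) countBoth / totalTransactions;
    }

    // confidence = #tuples(LHS, RHS) / #tuples(LHS)
    static double confidence(long countBoth, long countAntecedent) {
        return (double) countBoth / countAntecedent;
    }

    public static void main(String[] args) {
        // Figures from the supermarket example: 100,000 transactions,
        // 2,000 contain A and B, and 800 of those also contain C.
        System.out.println(support(800, 100_000));  // 0.008, i.e. 0.8%
        System.out.println(confidence(800, 2_000)); // 0.4, i.e. 40%

        // Made-up dataset: rule "A and B => C".
        List<Set<String>> db = List.of(
            Set.of("A", "B", "C"),
            Set.of("A", "B"),
            Set.of("A", "C"),
            Set.of("B"));
        long both = count(db, Set.of("A", "B", "C")); // antecedent and consequent
        long lhs = count(db, Set.of("A", "B"));       // antecedent only
        System.out.println(support(both, db.size())); // 0.25
        System.out.println(confidence(both, lhs));    // 0.5
    }
}
```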

    2.7 Architecture Design:

    The architecture of the developed product is based on the three-tier architecture.

    The architecture of a database system is greatly influenced by the underlying computer system on which the database system runs. Database systems can be centralized, or client-server, where one server machine executes work on behalf of multiple client machines.


    In a three-tier architecture, the client machine acts merely as a front end and does not contain any direct database calls. Instead, the client end communicates with an application server, usually through a forms interface. The application server in turn communicates with a database system to access data. The business logic of the application, which says what actions to carry out under what conditions, is embedded in the application server instead of being distributed across multiple clients. Three-tier applications are more appropriate for large applications and for applications that run on the World Wide Web.

    The architecture is given in [2].


    Fig 2: Three tier architecture


    Client Tier


    It implements the "look and feel" of an application. It is responsible for presenting data, receiving user events, and controlling the user interface. Most e-commerce applications are web-based, and the client tier is written in a combination of HTML, CSS, and JavaScript. JSP or ASP is used for dynamic content.

    HTML is a Web authoring markup language for defining content structure and rendering a web page.

    JavaScript is commonly used for client-side validation. It also has some control over the look and feel of a page in dynamic HTML.

    Application Tier

    This layer implements the business logic of the application. It is usually powered by a Java application server (WebLogic or WebSphere). There are several sub-layers within the application layer.

    Control Layer is the interface layer between the presentation tier and the application tier. Its implementation depends on the languages used to implement the presentation tier.

    Transaction Layer usually implements business processes that may involve many business objects. In the J2EE architecture, session beans are commonly used to implement the transaction layer. The Transaction Layer and the Business Object Layer are not constrained by the programming languages used for presentation or by the database used for persistence.

    Business Object Layer consists of objects that represent business entities, which should always be 100% independent of the database used for data persistence.

    Data Access Object (DAO) Layer is the interface between the application tier and the persistence tier. Besides the methods for "creating", "retrieving", "updating", and "removing" a business object from the database, DAO objects implement other business-specific methods as well. Even with JDBC, DAO objects may not be 100% database independent.
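    The separation between a business object and its DAO can be sketched as follows. This is an illustration under stated assumptions, not the project's design: the names DaoSketch, Item, ItemDao, and InMemoryItemDao are hypothetical, and an in-memory map stands in for the database so that the example runs without a JDBC connection.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative sketch only: Item, ItemDao and InMemoryItemDao are
// hypothetical names, not classes defined by this specification.
final class DaoSketch {

    // Business object: holds data only and knows nothing about the database.
    static final class Item {
        final int id;
        final String name;

        Item(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // DAO interface: the only persistence operations the application tier sees.
    interface ItemDao {
        void create(Item item);
        Optional<Item> retrieve(int id);
        void update(Item item);
        void remove(int id);
    }

    // In-memory stand-in so the sketch runs without a database.
    // A production version would implement ItemDao with JDBC calls instead.
    static final class InMemoryItemDao implements ItemDao {
        private final Map<Integer, Item> rows = new HashMap<>();

        @Override public void create(Item item) { rows.put(item.id, item); }
        @Override public Optional<Item> retrieve(int id) { return Optional.ofNullable(rows.get(id)); }
        @Override public void update(Item item) { rows.put(item.id, item); }
        @Override public void remove(int id) { rows.remove(id); }
    }

    public static void main(String[] args) {
        ItemDao dao = new InMemoryItemDao();
        dao.create(new Item(1, "bread"));
        dao.update(new Item(1, "brown bread"));
        System.out.println(dao.retrieve(1).map(i -> i.name).orElse("missing")); // brown bread
        dao.remove(1);
        System.out.println(dao.retrieve(1).isPresent()); // false
    }
}
```

    Because callers depend only on the ItemDao interface, the storage behind it could be swapped without touching the business object or the layers above it.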

    Data Tier

    This is the layer that manages the persistence of application information. It is usually powered by a relational database server (for example, MS SQL Server).

    Stored procedures and functions are used to execute database server-side processes pertinent to data integrity. In general, business logic should be part of the application layer, not the data layer.

    2.8 Assumptions and Dependencies:

    The application depends on the user-specified support and confidence framework described in Section 2.6 (Constraints). The definitions of support and confidence, and the supermarket example given there, apply here unchanged.


    Special Thanks

    We convey special thanks to our department and to our college. We also thank the software and websites that helped us a great deal in carrying out the project.
