Identifying Syntax Semantic


  • 8/17/2019 Identifying Syntax Semantic


    identifying syntax and semantic relation between articles

    Chapter 1

    INTRODUCTION

    The Web has undergone exponential growth since its birth, and this expansion has generated a number of problems; in this paper we address two of these:

    1. The proliferation of documents that are identical or almost identical.
    2. The instability of URLs.

    The basis of our approach is a mechanism for discovering when two documents are "roughly the same"; that is, for discovering when they have the same content except for modifications such as formatting, minor corrections, webmaster signature, or logo. Similarly, we can discover when a document is "roughly contained" in another. Applying this mechanism to the entire collection of documents found by the AltaVista spider yields a grouping of the documents into clusters of closely related items. As explained below, this clustering can help solve the problems of document duplication and URL instability.

    The duplication problem arises in two ways. First, there are documents that are found in multiple places in identical form. Some examples are:

    • FAQ (Frequently Asked Questions) or RFC (Request For Comments) documents.
    • The online documentation for popular programs.
    • Documents stored in several mirror sites.
    • Legal documents.

    Second, there are documents that are found in almost identical incarnations because they are:

    1. Different versions of the same document.
    2. The same document with different formatting.
    3. The same document with site specific links, customizations or contact information.
    4. Combined with other source material to form a larger document.

    The instability problem arises when a particular URL becomes undesirable because:

    • The associated document is temporarily unavailable or has moved.
    • The URL refers to an old version and the user wants the current version.
    • The URL is slow to access and the user wants an identical or similar document that will be faster to retrieve.

    In all these cases, the ability to find documents that are syntactically similar to a given document allows the user to find other, acceptable versions of the desired item.

    URNs (Uniform Resource Names) have often been suggested as a way to provide functionality similar to that outlined above. URNs are a generalized form of URLs (Uniform Resource Locators). However, instead of naming a resource directly, as URLs do by giving a specific server, port and file name for the resource, URNs point to the resource indirectly through a name server. The name server is able to translate the URN to the "best" (based on some criteria) URL of the resource. The main advantage of URNs is that they are location independent. A single, stable URN can track a resource as it is renamed or moves from server to server. A URN could direct a user to the instance of a replicated resource that is in the nearest mirror site, or is given in a desired language. Unfortunately, progress towards URNs has been slow. The mechanism we present here provides an alternative solution.
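The notions of "roughly the same" and "roughly contained" can be made concrete with shingles, that is, contiguous sequences of words taken from a document. The Java sketch below is only an illustration under stated assumptions: the class name ShingleSimilarity, the tokenization rule (lower case, split on non-alphanumeric characters) and the shingle width of 4 words are choices made here and are not taken from the project. Resemblance is computed as the shared fraction of the union of two shingle sets, and containment as the fraction of one document's shingles that also occur in the other.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Word-level shingling with resemblance and containment (illustrative sketch). */
final class ShingleSimilarity {

    /** Returns the set of contiguous w-word shingles of the text. */
    static Set<String> shingles(String text, int w) {
        // Assumed tokenization: lower case, split on anything that is not a letter or digit.
        List<String> words = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        Set<String> result = new HashSet<>();
        for (int i = 0; i + w <= words.size(); i++) {
            result.add(String.join(" ", words.subList(i, i + w)));
        }
        return result;
    }

    /** Number of shingles the two sets have in common. */
    private static int overlap(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }

    /** Resemblance: shared shingles divided by all distinct shingles; 0.0 if both sets are empty. */
    static double resemblance(Set<String> a, Set<String> b) {
        int common = overlap(a, b);
        int union = a.size() + b.size() - common;
        return union == 0 ? 0.0 : (double) common / union;
    }

    /** Containment of a in b: shared shingles divided by the shingles of a; 0.0 if a is empty. */
    static double containment(Set<String> a, Set<String> b) {
        return a.isEmpty() ? 0.0 : (double) overlap(a, b) / a.size();
    }

    public static void main(String[] args) {
        Set<String> full = shingles("a rose is a rose is a rose", 4);
        Set<String> part = shingles("a rose is a rose", 4);
        System.out.println(resemblance(full, part)); // 2 of 3 distinct shingles are shared
        System.out.println(containment(part, full)); // every shingle of the shorter text occurs in the longer one
    }
}
```

A resemblance close to 1 suggests two documents are roughly the same, while a containment close to 1 suggests the first is roughly contained in the second; where to set the cut-off is left open here.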

    Identical documents do not need to be handled specially in our algorithm, but they add to the computational workload and can be eliminated quite easily. Identical documents obviously share the same set of shingles and so, for the clustering algorithm, we only need to keep one representative from each group of identical documents. Therefore, for each document we generate a fingerprint that covers its entire contents. When we find documents with identical fingerprints, we eliminate all but one from the clustering algorithm. After the clustering has been completed, the other identical documents are added into the cluster containing the one kept version.

    We can expand the collection of identical documents with the "lexically-equivalent" documents and the "shingle-equivalent" documents. The lexically-equivalent documents are identical after they have been converted to canonical form. The shingle-equivalent documents are documents that have identical shingle values after the set of shingles has been selected. Obviously, all identical documents are lexically-equivalent, and all lexically-equivalent documents are shingle-equivalent. We can find each set of documents with a single fingerprint:

    • Identical documents are found with the fingerprint of the entire original contents.
    • Lexically-equivalent documents are found with the fingerprint of the entire canonicalized contents.
    • Shingle-equivalent documents are found with the fingerprint of the set of selected shingles.
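The three fingerprints described above can be sketched in Java as follows. This is a minimal illustration, not the project's implementation. SHA-256 from the standard library is swapped in as the fingerprint function, the canonical form (lower case with whitespace collapsed) is an assumption, and every shingle is kept rather than a selected subset. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Locale;
import java.util.TreeSet;

/** Fingerprints for identical, lexically-equivalent and shingle-equivalent documents (sketch). */
final class DocumentFingerprints {

    /** Hex-encoded SHA-256 digest, used here as a stand-in fingerprint function. */
    static String fingerprint(String content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content.getBytes(StandardCharsets.UTF_8))) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    /** Assumed canonical form: lower case, runs of whitespace collapsed to one space. */
    static String canonicalize(String content) {
        return content.toLowerCase(Locale.ROOT).trim().replaceAll("\\s+", " ");
    }

    /** Fingerprint of the entire original contents. */
    static String identicalKey(String content) {
        return fingerprint(content);
    }

    /** Fingerprint of the entire canonicalized contents. */
    static String lexicalKey(String content) {
        return fingerprint(canonicalize(content));
    }

    /** Fingerprint of the sorted set of w-word shingles (all shingles are kept in this sketch). */
    static String shingleKey(String content, int w) {
        String[] words = canonicalize(content).split(" ");
        TreeSet<String> shingles = new TreeSet<>();
        for (int i = 0; i + w <= words.length; i++) {
            shingles.add(String.join(" ", Arrays.copyOfRange(words, i, i + w)));
        }
        return fingerprint(String.join("\n", shingles));
    }

    public static void main(String[] args) {
        String a = "The  Quick brown fox";
        String b = "the quick brown fox";
        System.out.println(identicalKey(a).equals(identicalKey(b))); // false: the raw contents differ
        System.out.println(lexicalKey(a).equals(lexicalKey(b)));     // true: same canonical form
    }
}
```

Documents that share a key can be grouped in a map from key to document list, so that only one representative per group enters the clustering step.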

    Objectives of Project - Modules of the Project

    [1] Design and Development of the Article Submission Algorithm, which is used to submit the articles.
    [2] Design and Development of the Data Cleaning Algorithm, which is used to remove the unwanted data known as stop words.
    [3] Design and Development of the Tokenization Algorithm, which is used to obtain tokens in a text document.
    [4] Design and Development of the Syntactic Relation Algorithm to find the syntactic relations between documents.
    [5] Design and Development of the Semantic Relation Algorithm to find the semantic relations between documents.
    [6] Design and Development of the Score Computation Algorithm, used to compute the scores of the documents.
    [...]

     Fig: Stages for Relation Algorithm

    The following goals are defined:

    1. Article Submission: This module is responsible for storage of articles.
    2. Data Cleaning: This module is used to remove stop words from the article.
    3. Tokenization: This process is used to obtain all the keywords of the article and assign them a unique id as well as the web site id.
    4. Syntactic Relation: This module is responsible for finding the syntactic relations of articles, i.e. the verbs, adverbs and adjectives of articles.
    5. Semantic Relation: This module is used to find the various semantic relations, such as hypernymy.
    6. Score Computation: This is used to measure the score with respect to syntactic and semantic relations.

    [Figure: Stages for Relation Algorithm. Boxes: Article Submission, Data Cleaning, Token Determination, Syntactic Relation, Semantic Relation, Score Computation, similarity measure]

    Problem Definition

    Semantic and syntactic relations have played an important role in applications in recent years, especially in the Semantic Web, Information Retrieval, Information Extraction, and Question Answering. Semantic and syntactic relations carry the main ideas in sentences or paragraphs. This project presents our proposed algorithms for identifying semantic and syntactic relations between objects and their properties in order to enrich a domain specific ontology, namely the Computing Domain Ontology, which is used in an Information Extraction system.

    Previous Approach

    Disadvantages of Previous Approach

    Proposed Approach

    The proposed approach is to automatically identify the syntactic and semantic relations that might be found in text documents of articles of a specific domain. Afterward, we extract these relations in order to enrich the domain specific ontology. This ontology can be used in many applications, such as Information Retrieval, Information Extraction, and Question Answering, focusing on the computing domain. For this purpose, we propose a methodology which combines Natural Language Processing (NLP) and Machine Learning.

    Methodology

     Fig: Methodology of the Project

    The figure shows the methodology of the project.

    Article Submission

    The Article Submission module is used for submitting an article with its name and description.

    View Articles

    This module is responsible for viewing the articles.

    Data Cleaning

    This module is responsible for preprocessing and cleaning of the text data. It uses a list of stop words to analyze the text and removes those stop words from each of the articles. After the data cleaning process is completed, the clean data can be represented as a set.

    Stopwords

    [Figure: Methodology of the Project. Boxes: Article Submission; View Articles; Data Cleaning; Stop Word Analysis; Tokenization; Identify Syntax Relations (find verb, adverb, noun); Identify Semantic Relations (synonyms, hyponyms, hypernyms of instance data); Find whether the articles are similar based on syntax and semantics]

    Stop words are words which do not have any specific meaning of their own. The data mining forum has defined a set of such keywords. Stop words are filtered out before or after processing of natural language data (text). There is no single definitive list of stop words which all tools use, and such a filter is not always used. The list of stop words used in the algorithm is as follows:

    a, able, about, across, after, all, almost, also, am, among, an, and, any, are, as, at, be, because, been, but, by, can, cannot, could, dear, did, do, does, either, else, ever, every, for, from, get, got, had, has, have, he, her, hers, him, his, how, however, i, if, in, into, is, it, its, just, least, let, like, likely, may, me, might, most, must, my, neither, no, nor, not, of, off, often, on, only, or, other, our, own, rather, said, say, says, she, should, since, so, some, than, that, the, their, them, then, there, these, they, this, tis, to, too, twas, us, wants, was, we, were, what, when, where, which, while, who, whom, why, will, with, would, yet, you, your
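The data cleaning step can be sketched in Java with the stop word list given above. This is only an illustration under assumptions: the class name DataCleaning, the method names and the tokenization rule (lower case, split on non-alphanumeric characters) are hypothetical, and the text is split into tokens first so that stop words can be matched as whole words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Tokenization and stop word removal using the stop word list from the text (sketch). */
final class DataCleaning {

    private static final String STOP_WORD_LIST =
        "a,able,about,across,after,all,almost,also,am,among,an,and,any,are,as,at,be,because,been,but,by,"
        + "can,cannot,could,dear,did,do,does,either,else,ever,every,for,from,get,got,had,has,have,he,her,hers,"
        + "him,his,how,however,i,if,in,into,is,it,its,just,least,let,like,likely,may,me,might,most,must,my,"
        + "neither,no,nor,not,of,off,often,on,only,or,other,our,own,rather,said,say,says,she,should,since,so,"
        + "some,than,that,the,their,them,then,there,these,they,this,tis,to,too,twas,us,wants,was,we,were,what,"
        + "when,where,which,while,who,whom,why,will,with,would,yet,you,your";

    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(STOP_WORD_LIST.split(",")));

    /** Splits text into lower-case tokens on any run of non-alphanumeric characters. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Returns the tokens that are not stop words, in their original order. */
    static List<String> removeStopWords(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    /** Tokenizes the text and drops the stop words. */
    static List<String> clean(String text) {
        return removeStopWords(tokenize(text));
    }

    public static void main(String[] args) {
        // prints [compiler, translate, source, code]
        System.out.println(clean("The compiler is able to translate the source code"));
    }
}
```

Wrapping the result in a HashSet gives the set representation of the clean data mentioned in the Data Cleaning description.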

    Syntax Analyzer

    Syntactic relations are the relations between concepts or words in a sentence with respect to verbs and adverbs.

    Identifying the Semantic Relations

    As mentioned above, the sentence layer also includes sentences that are derived from synonyms, hyponyms and hypernyms of instances of the ingredient layer. We use WordNet to find a set of synonyms, hyponyms and hypernyms of instances from the ingredient layer. WordNet is an ontology that includes many different domains. However, we only focus on the computing domain.

    Articles are similar based on Syntax and Semantic

    For the articles, the syntactic relations (such as verbs and adverbs) are found first, and then the semantic relations are found based on synonyms, hyponyms and hypernyms. Once the syntactic and semantic values are found, two articles are considered the same if their similarity value is greater than the threshold (…0%).
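One way to picture the score computation is the Java sketch below. It is an assumption-laden illustration rather than the project's algorithm. The syntactic score is taken as the Jaccard overlap of two term sets, and the semantic score as the same overlap after each term is mapped to a concept through a small hand-made lexicon standing in for a WordNet lookup. The two scores are simply averaged, and the threshold is a parameter. The class name RelationScore, the lexicon entries and the averaging rule are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Combined syntactic and semantic similarity score for two term sets (sketch). */
final class RelationScore {

    /** Jaccard coefficient of two sets; 0.0 when both are empty. */
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        int union = a.size() + b.size() - common.size();
        return union == 0 ? 0.0 : (double) common.size() / union;
    }

    /** Replaces each term by its concept if the lexicon has one (toy stand-in for WordNet). */
    static Set<String> toConcepts(Set<String> terms, Map<String, String> lexicon) {
        Set<String> concepts = new HashSet<>();
        for (String term : terms) {
            concepts.add(lexicon.getOrDefault(term, term));
        }
        return concepts;
    }

    /** Average of the term-level (syntactic) and concept-level (semantic) overlap. */
    static double score(Set<String> a, Set<String> b, Map<String, String> lexicon) {
        double syntactic = jaccard(a, b);
        double semantic = jaccard(toConcepts(a, lexicon), toConcepts(b, lexicon));
        return (syntactic + semantic) / 2.0;
    }

    /** Two articles count as similar when the score exceeds the given threshold. */
    static boolean similar(Set<String> a, Set<String> b, Map<String, String> lexicon, double threshold) {
        return score(a, b, lexicon) > threshold;
    }

    public static void main(String[] args) {
        Map<String, String> lexicon = new HashMap<>();
        lexicon.put("build", "compile");    // hypothetical synonym entry
        lexicon.put("rapidly", "quickly");  // hypothetical synonym entry
        Set<String> first = new HashSet<>(Arrays.asList("compile", "program", "quickly"));
        Set<String> second = new HashSet<>(Arrays.asList("build", "program", "rapidly"));
        System.out.println(score(first, second, lexicon));        // syntactic 1/5, semantic 3/3, averaged
        System.out.println(similar(first, second, lexicon, 0.5)); // true for this illustrative threshold
    }
}
```

In a fuller version the term sets would come from a part-of-speech tagger and the lexicon from WordNet; neither is part of the Java standard library, so both are replaced by plain collections here.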


    Chapter 2

    LITERATURE SURVEY

    In the paper [1] titled "Syntactic clustering of the web" the authors have developed an efficient way to determine the syntactic similarity of files and have applied it to every document on the World Wide Web. Using this mechanism, they built a clustering of all the documents that are syntactically similar. Possible applications include a "Lost and Found" service, filtering the results of Web searches, updating widely distributed web-pages, and identifying violations of intellectual property rights.

    In the paper [2] titled "Efficient near-duplicate detection for Q&A forum" the authors address the issue of redundant data in large-scale collections of Q&A forums. They propose and evaluate a novel algorithm for automatically detecting near-duplicate Q&A threads. The main idea is to use a distributed index and the MapReduce framework to calculate pairwise similarity and identify redundant data in a fast and scalable way. The proposed method was evaluated on a real-world data collection crawled from a popular Q&A forum. Experimental results show that the proposed method can effectively and efficiently detect near-duplicate content in large web collections. Two distributed inverted index methods are used to calculate similarities in parallel using the MapReduce framework. The authors defined the near-duplicate Q&A thread and used the evaluated signatures, parallel similarity calculation and a linear combination method to extract near-duplicates. Experimental results on the real-world collection show that the proposed method can be effectively and efficiently used to detect near-duplicates. About 1.

    The similarity measure can be acquired by comparing the exterior tokens of inter-sentences, but the relevance measure can be obtained only by comparing the interior meaning of the sentences. In this paper, the authors describe a method to explore the quantified conceptual relations of word-pairs by using the definition of a lexical item in a modern Chinese standard dictionary, and propose a practical approach to measure inter-sentence relevance. The results of the examples show that the approach can solve the problem of how to measure the relevance of two sentences without (or with very low) similarity but with a certain relevance. This method is also compatible with the current cosine similarity method.

    In the paper [4] titled "Detection and Optimized Disposal of Near Duplicate Pages" the authors describe that the search engine is an important tool for users to access network information resources. However, a large number of duplicate and near-duplicate pages add to the user's burden. Currently, search engines only remove duplicate pages, but do not yet have any effective strategies for detecting and disposing of near-duplicate pages. This paper analyzed the existing algorithms to select an appropriate algorithm to detect near-duplicate pages, and optimized the disposing strategy to ensure that near-duplicate pages would not take up too much space in search results while still being used effectively. These measures allow users to retrieve needed information more easily.

    In the paper [5] titled "Text Based Similarity Metrics and Delta for Semantic Web Graphs" the authors describe that recognizing that two Semantic Web documents or graphs are similar and characterizing their differences is useful in many tasks, including retrieval, updating, version control and knowledge base editing. They describe several text-based similarity metrics that characterize the relation between Semantic Web graphs and evaluate these metrics for three special cases of similarity: similarity in classes and properties, similarity disregarding differences in base-URIs, and versioning relationship. They apply these techniques to a special use case, generating a delta between versions of a Semantic Web graph. They evaluated their system on several tasks using a collection of graphs from the archive of the Swoogle Semantic Web search engine.

    In the paper [...]

    In the paper [9] titled "Adaptive near-duplicate detection via similarity learning" the authors present a novel near-duplicate document detection method that can easily be tuned for a particular domain. The method represents each document as a real-valued sparse k-gram vector, where the weights are learned to optimize for a specified similarity function, such as the cosine similarity or the Jaccard coefficient. Near-duplicate documents can be reliably detected through this improved similarity measure. In addition, these vectors can be mapped to a small number of hash-values as document signatures through the locality sensitive hashing scheme for efficient similarity computation.
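To make the two similarity functions named above concrete, the following Java sketch builds a sparse word k-gram vector for each document and compares two vectors with the cosine similarity and the Jaccard coefficient. It is a simplified illustration, not the method of paper [9]. Raw k-gram counts stand in for the learned weights, the Jaccard coefficient is taken over the sets of k-grams that occur, and the hashing step is omitted. The class name KGramVectors and the choice k = 2 are assumptions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Sparse k-gram vectors with cosine similarity and Jaccard coefficient (sketch). */
final class KGramVectors {

    /** Maps each word k-gram of the text to its count (raw counts, not learned weights). */
    static Map<String, Double> vector(String text, int k) {
        Map<String, Double> counts = new HashMap<>();
        String trimmed = text.toLowerCase(Locale.ROOT).trim();
        if (trimmed.isEmpty()) {
            return counts;
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i + k <= words.length; i++) {
            String gram = String.join(" ", Arrays.copyOfRange(words, i, i + k));
            counts.merge(gram, 1.0, Double::sum);
        }
        return counts;
    }

    /** Sum of squared weights of a sparse vector. */
    private static double squaredNorm(Map<String, Double> v) {
        double sum = 0.0;
        for (double weight : v.values()) {
            sum += weight * weight;
        }
        return sum;
    }

    /** Cosine similarity: dot product divided by the product of the norms; 0.0 for an empty vector. */
    static double cosine(Map<String, Double> x, Map<String, Double> y) {
        double dot = 0.0;
        for (Map.Entry<String, Double> entry : x.entrySet()) {
            dot += entry.getValue() * y.getOrDefault(entry.getKey(), 0.0);
        }
        double norms = Math.sqrt(squaredNorm(x) * squaredNorm(y));
        return norms == 0.0 ? 0.0 : dot / norms;
    }

    /** Jaccard coefficient over the sets of k-grams that occur in each vector. */
    static double jaccard(Map<String, Double> x, Map<String, Double> y) {
        Set<String> common = new HashSet<>(x.keySet());
        common.retainAll(y.keySet());
        int union = x.size() + y.size() - common.size();
        return union == 0 ? 0.0 : (double) common.size() / union;
    }

    public static void main(String[] args) {
        Map<String, Double> x = vector("the cat sat on the mat", 2);
        Map<String, Double> y = vector("the cat sat on the rug", 2);
        System.out.println(cosine(x, y));  // 4 shared 2-grams out of 5 in each document
        System.out.println(jaccard(x, y)); // 4 shared 2-grams out of 6 distinct ones
    }
}
```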

    In the paper [10] titled "Learning to extract keyphrases from text" the authors describe that recent commercial software, such as Microsoft's Word 97 and Verity's Search 9[...]

    Chapter 3

    Software Requirement Specifications

    3.1 Software Requirements Specifications

    A Software Requirements Specification (SRS) is a complete description of the behavior of the system to be developed. It includes the functional and non-functional requirements for the software to be developed. The functional requirements state what the software should do, and the non-functional requirements state the constraints on the design or implementation of the system. Requirements must be measurable, testable, related to identified needs or opportunities, and defined to a level of detail sufficient for system design.

    What the software has to do is directly perceived by its users, either human users or other software systems. The common understanding between the user and the developer is captured in the requirements document. Writing a software requirements specification reduces development effort, as careful review of the document can reveal omissions, misunderstandings, and inconsistencies early in the development cycle, when these problems are easier to correct. The SRS discusses the product but not the project that developed it; hence the SRS serves as a basis for later enhancement of the finished product. The SRS may need to be altered, but it does provide a foundation for continued production evaluation.

    Resource Requirement

    NetBeans IDE [version illegible]: NetBeans is a multi-language software development environment comprising an integrated development environment (IDE) and an extensible plug-in system. It is written primarily in Java and can be used to develop applications in Java and, by means of the various plug-ins, in other languages as well, including C, C++, COBOL, Python, Perl, PHP, and others. NetBeans employs plug-ins in order to provide all of its functionality on top of (and including) the runtime system, in contrast to some other applications where functionality is typically hard coded. The NetBeans SDK includes the NetBeans Java development tools (JDT), offering an IDE with a built-in incremental Java compiler and a full model of the Java source files. This allows for advanced refactoring techniques and code analysis. The IDE also makes use of a workspace, in this case a set of metadata over a flat file space allowing external file modifications as long as the corresponding workspace "resource" is refreshed afterwards.

    Java Development Kit: The Java Development Kit (JDK) is an Oracle Corporation product aimed at Java developers. Since the introduction of Java, it has been by far the most widely used Java SDK. On 17 November 2006, Sun announced that it would be released under the GNU General Public License (GPL), thus making it free software. This happened in large part on 8 May 200[...]

    • jar - the archiver, which packages related class libraries into a single JAR file. This tool also helps manage JAR files.
    • javah - the C header and stub generator, used to write native methods
    • javap - the class file disassembler
    • javaws - the Java Web Start launcher for JNLP applications
    • jconsole - Java Monitoring and Management Console
    • jdb - the debugger
    • jhat - Java Heap Analysis Tool (experimental)
    • jinfo - This utility gets configuration information from a running Java process or crash dump. (experimental)
    • jmap - This utility outputs the memory map for Java and can print shared object memory maps or heap memory details of a given process or core dump. (experimental)
    • jps - Java Virtual Machine Process Status Tool, which lists the instrumented HotSpot Java Virtual Machines (JVMs) on the target system. (experimental)
    • jrunscript - Java command-line script shell.
    • jstack - utility which prints Java stack traces of Java threads (experimental)
    • jstat - Java Virtual Machine statistics monitoring tool (experimental)
    • jstatd - jstat daemon (experimental)
    • policytool - the policy creation and management tool, which can determine policy for a Java runtime, specifying which permissions are available for code from various sources
    • VisualVM - visual tool integrating several command line JDK tools and lightweight performance and memory profiling capabilities.
    • wsimport - generates portable JAX-WS artifacts for invoking a web service.
    • xjc - Part of the Java API for XML Binding (JAXB) API. It accepts an XML schema and generates Java classes.

    Experimental tools may not be available in future versions of the JDK.

    The JDK also comes with a complete Java Runtime Environment, usually called a private runtime, due to the fact that it is separated from the "regular" JRE and has extra contents. It consists of a Java Virtual Machine and all of the class libraries present in the production environment, as well as additional libraries only useful to developers, such as the internationalization libraries and the IDL libraries.

    Copies of the JDK also include a wide selection of example programs demonstrating the use of almost all portions of the Java API.

    Swing: The Java Foundation Classes (JFC) consist of five major parts: AWT, Swing, Accessibility, Java 2D, and Drag and Drop. Java 2D has become an integral part of AWT, Swing is built on top of AWT, and Accessibility support is built into Swing. The five parts of JFC are certainly not mutually exclusive, and Swing is expected to merge more deeply with AWT in future versions of Java. Swing is a set of classes that provides more powerful and flexible components than are possible with the AWT. In addition to the familiar components, Swing supplies tabbed panes, scroll panes, trees, and tables. It provides a single API capable of supporting multiple look-and-feels so that developers and end-users are not locked into a single platform's look-and-feel. The Swing library makes heavy use of the MVC software design pattern, which conceptually decouples the data being viewed from the user interface controls through which it is viewed. Swing possesses several traits:

    1. Platform independence, both in terms of its expression and its implementation.
    2. Extensibility, which allows for the "plugging" of various custom implementations of specified framework interfaces. Users can provide their own custom implementations of these components to override the default implementations.
    3. Component orientation, which allows a component to respond to a well-known set of commands specific to it. Specifically, Swing components are JavaBeans components, compliant with the JavaBeans Component Architecture specifications.
    4. Customizability: users can programmatically customize a standard Swing component by assigning specific borders, colors, backgrounds, opacities, etc.
    5. Configurability, which allows Swing to respond at runtime to fundamental changes in its settings.
    6. Look and feel: one can specialize the look and feel of widgets by modifying the default (via runtime parameters), by deriving from an existing one, by creating one from scratch, or, beginning with J2SE 5.0, by using the Synth Look and Feel, which is configured with an XML property file.

     J2EE Platform

    As you might already know, J2EE is a platform for executing server side Java applications. Before J2EE was born, server side Java applications were written using vendor specific APIs. Each vendor had unique APIs and architectures. This resulted in a huge learning curve for Java developers and architects, who had to learn and program with each of these API sets, and in higher costs for the companies. The development community could not reuse the lessons learnt in the trenches. Consequently the entire Java developer community was fragmented, isolated and stunted, thus making it very difficult to build serious enterprise applications in Java. Fortunately the introduction of J2EE and its adoption by the vendors has resulted in standardization of its APIs. This in turn reduced the learning curve for server side Java developers. The J2EE specification defines a whole lot of interfaces and a few classes. Vendors (like BEA and IBM, for instance) have provided implementations for these interfaces adhering to the J2EE specifications. These implementations are called J2EE Application Servers.

    The J2EE application servers provide infrastructure services such as threading, pooling and transaction management out of the box. The application developers can thus concentrate on implementing business logic. Consider a J2EE stack from a developer perspective. At the bottom of the stack is Java 2 Standard Edition (J2SE). J2EE Application Servers run in the Java Virtual Machine (JVM) sandbox. They expose the standard J2EE interfaces to the application developers. Two types of applications can be developed and deployed on J2EE application servers: Web applications and EJB applications.

    These applications are deployed and executed in "containers". The J2EE specification defines containers for managing the lifecycle of server side components. There are two types of containers: Servlet containers and EJB containers. Servlet containers manage the lifecycle of web applications and EJB containers manage the lifecycle of EJBs.

     J2EE web application

    Any web application that runs in the servlet container is called a J2EE web application. The servlet container implements the Servlet and JSP specifications. It provides various entry points for handling the requests originating from a web browser. There are three entry points for the browser into the J2EE web application: Servlet, JSP and Filter. You can create your own Servlets by extending the javax.servlet.http.HttpServlet class and implementing the doGet() and doPost() methods. You can create JSPs simply by creating a text file containing JSP markup tags. You can create Filters by implementing the javax.servlet.Filter interface. The servlet container becomes aware of Servlets and Filters when they are declared in a special file called web.xml. A J2EE web application has exactly one web.xml file.

    A servlet is the most basic J2EE web component. It is managed by the servlet container. All servlets implement the Servlet interface directly or indirectly. In general terms, a servlet is the endpoint for requests adhering to a protocol. However, the Servlet specification mandates implementation for servlets that handle HTTP requests only. But you should know that it is possible to implement the servlet and the container to handle other protocols such as FTP too. When writing Servlets for handling HTTP requests, you generally subclass the HttpServlet class. HTTP defines several methods of request submission, including GET, POST, PUT, HEAD and DELETE. Of these, GET and POST are the forms of request submission most relevant to application developers. Hence your subclass of HttpServlet should implement two methods, doGet() and doPost(), to handle GET and POST respectively.


    Presentation Tier Strategies

    Technologies used for the presentation tier can be roughly classified into three categories:

    1. Markup based Rendering (e.g. JSPs)
    2. Template based Transformation (e.g. Velocity, XSLT)
    3. Rich content (e.g. Macromedia Flash, Flex, Laszlo)

     Markup based Rendering 

    JSPs are perfect examples of markup based presentation tiers. In markup based presentation, a variety of tags are defined (just like HTML tags). The tag definitions may be purely for presentation or they can contain business logic. They are mostly client tier specific, e.g. JSP tags producing HTML content. A typical JSP is interpreted in the web container, which generates HTML as a result. This HTML is then rendered in the web browser.

    In the last section, you saw how Servlets produced output HTML in addition to executing business logic. So why aren't Servlets used for the presentation tier? The answer lies in the separation of concerns essential in real world J2EE projects. Back in the days when JSPs didn't exist, servlets were all that you had to build J2EE web applications. They handled requests from the browser, invoked middle tier business logic and rendered responses in HTML to the browser. Now that's a problem. A Servlet is a Java class coded by Java programmers. It is okay to handle browser requests and have business and presentation logic in the servlets since that is where they belong. HTML formatting and rendering is the concern of the page author, who most likely does not know Java. So, the question arises, how to separate these two concerns intermingled in Servlets? JSPs are the answer to this dilemma. JSPs are servlets in disguise!


    The philosophy behind JSP is that the page authors know HTML. HTML is a markup
    language. Hence learning a few more markup tags will not cause a paradigm shift for the page
    authors. At least it is much easier than learning Java and OO! JSP provides some standard tags
    and Java programmers can provide custom tags. Page authors can write server side pages by
    mixing HTML markup and JSP tags. Such server side pages are called JSPs. JSPs are called
    server side pages because it is the servlet container that interprets them to generate HTML. The
    generated HTML is sent to the client browser.

    JSPs are server side pages. Server side pages in some other languages are parsed every time
    they are accessed and are hence expensive. In J2EE, the expensive parsing is replaced by
    generating a Java class from the JSP. The first time a JSP is accessed, its contents are parsed and
    an equivalent Java class is generated, so subsequent accesses are fast. Here is the twist to the
    story: the Java classes that are generated by parsing JSPs are nothing but Servlets! In other
    words, every JSP is parsed at runtime (or precompiled) to generate Servlet classes.

    Presentation Logic and Business Logic – What's the difference?

    The term Business Logic refers to the middle tier logic, the core of the system, usually
    implemented as core Java. The code that controls the JSP navigation, handles user inputs and
    invokes appropriate business logic is referred to as Presentation Logic. The actual JSP, the
    front end to the user, contains HTML and custom tags to render the page and as little logic as
    possible. A rule of thumb is: the dumber the JSP gets, the easier it is to maintain. In reality
    however, some of the presentation logic percolates to the actual JSP, making it tough to draw a
    line between the two.

    Model 1 architecture is the easiest way of developing JSP based web applications. It
    cannot get any easier. In Model 1, the browser directly accesses JSP pages. In other words, user
    requests are handled directly by the JSP. Consider an HTML page with a hyperlink to a JSP. When
    the user clicks on the hyperlink, the JSP is directly invoked. This is shown in the figure.


    The servlet container parses the JSP and executes the resulting Java servlet. The JSP
    contains embedded code and tags to access the Model JavaBeans. The Model JavaBeans contain
    attributes for holding the HTTP request parameters from the query string. In addition they contain
    logic to connect to the middle tier or directly to the database using JDBC to get the additional
    data needed to display the page. The JSP is then rendered as HTML using the data in the Model
    JavaBeans and other Helper classes and tags.

    Problems with Model 1 Architecture

    Model 1 architecture is easy. There is some separation between content (Model JavaBeans) and
    presentation (JSP). This separation is good enough for smaller applications. Larger applications
    have a lot of presentation logic. In Model 1 architecture, the presentation logic usually leads to a
    significant amount of Java code embedded in the JSP in the form of scriptlets. This is ugly and a
    maintenance nightmare even for experienced Java developers. In large applications, JSPs are
    developed and maintained by page authors. The intermingled scriptlets and markup result in an
    unclear definition of roles and are very problematic.


    Application control is decentralized in Model 1 architecture since the next page to be displayed
    is determined by the logic embedded in the current page. Decentralized navigation control can
    cause headaches. All this leads us to the Model 2 architecture of designing JSP pages.

    Model 2 Architecture - MVC

    The Model 2 architecture for designing JSP pages is, in reality, Model View Controller (MVC)
    applied to web applications. Hence the two terms can be used interchangeably in the web world.
    MVC originated in SmallTalk and has since made its way into the Java community. Model 2
    architecture and its derivatives are the cornerstones for all serious and industrial strength web
    applications designed in the real world. Hence it is essential for you to understand this
    paradigm thoroughly. The figure shows the Model 2 (MVC) architecture.

    The main difference between Model 1 and Model 2 is that in Model 2, a controller handles the
    user request instead of another JSP. The controller is implemented as a Servlet. The following
    steps are executed when the user submits the request.

    1. The Controller Servlet handles the user's request. (This means the hyperlink
       in the JSP should point to the controller servlet.)
    2. The Controller Servlet then instantiates appropriate JavaBeans based on the
       request parameters (and optionally also based on session attributes).
    3. The Controller Servlet then, by itself or through a controller helper, communicates with the
       middle tier or directly with the database to fetch the required data.
    4. The Controller sets the resultant JavaBeans (either the same or a new one) in one
       of the following contexts: request, session or application.
    5. The controller then dispatches the request to the next view based on the
       request URL.
    6. The View uses the resultant JavaBeans from Step 4 to display data.

    The sole function of the JSP in Model 2 architecture is to display the data from the JavaBeans
    set in the request, session or application scopes.

    Advantages of Model 2 Architecture

    Since there is no presentation logic in the JSP, there are no scriptlets. This means fewer
    nightmares. [Note that although Model 2 is directed towards elimination of scriptlets, it does not
    architecturally prevent you from adding scriptlets. This has led to widespread misuse of Model 2
    architecture.]

    With MVC you can have as many controller servlets in your web application as you need. In
    fact you can have one Controller Servlet per module. However there are several advantages of
    having a single controller servlet for the entire web application.


    In a typical web application, there are several tasks that you want to do for every
    incoming request. For instance, you have to check if the user requesting an operation is
    authorized to do so. You also want to log the user's entry and exit from the web application for
    every request. You might like to centralize the logic for dispatching requests to other views. The
    list goes on. If you have several controller servlets, chances are that you have to duplicate the
    logic for all the above tasks in all those places. A single controller servlet for the web application
    lets you centralize all the tasks in a single place. The result is elegant code that is easier to
    maintain.

    Web applications based on Model 2 architecture are easier to maintain and extend since
    the views do not refer to each other and there is no presentation logic in the views. It also allows
    you to clearly define the roles and responsibilities in large projects, thus allowing better
    coordination among team members.

    Controller gone bad – Fat Controller

    If MVC is all that great, why do we need Struts after all? The answer lies in the difficulties
    associated in applying bare bone MVC to real world complexities. In medium to large
    applications, centralized control and processing logic in the servlet, the greatest plus of MVC, is
    also its weakness. Consider a mediocre application with 1 JSPs. Assume that each page has five
    hyperlinks (or five form submissions). The total number of user requests to be handled in the
    application is [...] MVC framework, a centralized controller
    servlet handles every user request. For each type of incoming request there is an "if" block in the
    doGet method of the controller Servlet to process the request and dispatch to the next view. For
    this mediocre application of ours, the controller Servlet has [...]

    MVC with configurable controller

    When an application gets large you cannot stick to bare bone MVC. You have to extend it somehow
    to deal with these complexities. One mechanism of extending MVC that has found
    widespread adoption is based on a configurable controller Servlet. The MVC with configurable
    controller servlet is shown in the figure.

    When the HTTP request arrives from the client, the Controller Servlet looks up in a
    properties file to decide on the right Handler class for the HTTP request. This Handler class is
    referred to as the Request Handler. The Request Handler contains the presentation logic for that
    HTTP request including business logic invocation. In other words, the Request Handler does
    everything that is needed to handle the HTTP request. The only difference so far from the bare
    bone MVC is that the controller servlet looks up in a properties file to instantiate the Handler
    instead of calling it directly.


    At this point you might be wondering how the controller servlet would know to
    instantiate the appropriate Handler. The answer is simple. Two different HTTP requests cannot
    have the same URL. Hence you can be certain that the URL uniquely identifies each HTTP
    request on the server side and hence each URL needs a unique Handler. In simpler terms, there is
    a one-to-one mapping between the URL and the Handler class. This information is stored as
    key-value pairs in the properties file. The Controller Servlet loads the properties file on startup
    to find the appropriate Request Handler for each incoming URL request.

    The controller servlet uses Java Reflection to instantiate the Request Handler. However
    there must be some sort of commonality between the Request Handlers for the servlet to
    generically instantiate the Request Handler. The commonality is that all Request Handler classes
    implement a common interface. Let us call this common interface the Handler Interface. In its
    simplest form, the Handler Interface has one method, say execute(). The controller servlet reads
    the properties file to instantiate the Request Handler.

    The Controller Servlet instantiates the Request Handler in the doGet() method and
    invokes the execute() method on it using Java Reflection. The execute() method invokes
    appropriate business logic from the middle tier and then selects the next view to be presented to
    the user. The controller servlet forwards the request to the selected JSP view. All this happens in
    the doGet() method of the controller servlet. The doGet() method lifecycle never changes.
    What changes is the Request Handler's execute() method. You may not have realized it,
    but you just saw how Struts works in a nutshell! Struts is a controller servlet based configurable
    MVC framework that executes predefined methods in the handler objects. Instead of using a
    properties file Struts uses XML to store more useful information.
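
    The lookup-and-instantiate mechanism described above can be sketched in plain Java. This is
    only an illustration under stated assumptions: the class names, URL paths and view names are
    hypothetical, the properties "file" is built in memory, and dispatch() merely stands in for the
    doGet() method of a real controller servlet (no servlet API is used).

```java
import java.util.Properties;

public class ConfigurableControllerSketch {

    /** The common "Handler Interface" that every Request Handler implements. */
    interface RequestHandler {
        /** Runs the handler and returns the name of the next view. */
        String execute(String userName);
    }

    /** Hypothetical handler mapped to the /login URL. */
    static class LoginHandler implements RequestHandler {
        public String execute(String userName) {
            return userName.isEmpty() ? "login.jsp" : "welcome.jsp";
        }
    }

    /** Hypothetical handler mapped to the /logout URL. */
    static class LogoutHandler implements RequestHandler {
        public String execute(String userName) {
            return "goodbye.jsp";
        }
    }

    /** Stands in for the properties file: key = URL path, value = handler class name. */
    static final Properties MAPPINGS = new Properties();
    static {
        MAPPINGS.setProperty("/login", LoginHandler.class.getName());
        MAPPINGS.setProperty("/logout", LogoutHandler.class.getName());
    }

    /** Plays the role of doGet(): look up the class name, instantiate it reflectively, execute. */
    public static String dispatch(String url, String userName) {
        String className = MAPPINGS.getProperty(url);
        if (className == null) {
            return "error.jsp"; // no handler configured for this URL
        }
        try {
            RequestHandler handler = (RequestHandler) Class.forName(className)
                    .getDeclaredConstructor()
                    .newInstance();
            return handler.execute(userName);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(dispatch("/login", "alice"));
        System.out.println(dispatch("/logout", ""));
        System.out.println(dispatch("/unknown", ""));
    }
}
```

    Note that dispatch() never changes when a new URL is added; only the mapping and a new
    handler class are needed, which is the point the text makes about doGet().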

    Struts


    In Struts, there is only one controller servlet for the entire web application. This
    controller servlet is called ActionServlet and resides in the package org.apache.struts.action.
    It intercepts every client request and populates an ActionForm from the HTTP request
    parameters. ActionForm is a normal JavaBeans class. It has several attributes corresponding to
    the HTTP request parameters and getter and setter methods for those attributes. You have to
    create your own ActionForm for every HTTP request handled through the Struts framework by
    extending the org.apache.struts.action.ActionForm class.

    For the lack of better terminology, let us coin a term to describe classes such as ActionForm:
    View Data Transfer Object. A View Data Transfer Object is an object that holds the data from the
    HTML page and transfers it around in the web tier framework and application classes.

    The ActionServlet then instantiates a Handler. The Handler class name is obtained from
    an XML file based on the URL path information. This XML file is referred to as the Struts
    configuration file and is by default named struts-config.xml.


    The Handler is called Action in the Struts terminology. You create it by extending the
    Action class in the org.apache.struts.action package. The Action class defines a method called
    execute(). You override this method in your own Actions and invoke the business logic in it.
    The execute() method returns an ActionForward that identifies the next view (JSP) to be shown
    to the user. The ActionServlet forwards to the selected view.
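
    The form-then-action flow can be mimicked without the Struts library. The sketch below is an
    assumption-laden illustration: LoginForm, SketchAction and LoginAction are hypothetical
    stand-ins, not Struts classes, and execute() is simplified to take only the form and return a
    view name (the real Struts method takes more arguments and returns an ActionForward).

```java
import java.lang.reflect.Method;
import java.util.Collections;
import java.util.Map;

public class StrutsFlowSketch {

    /** Stand-in for a form bean: one property per expected request parameter. */
    public static class LoginForm {
        private String userName = "";
        public String getUserName() { return userName; }
        public void setUserName(String userName) { this.userName = userName; }
    }

    /** Stand-in for an Action: subclasses put their logic in execute(). */
    public abstract static class SketchAction {
        public abstract String execute(LoginForm form);
    }

    /** Hypothetical action that picks the next view from the form contents. */
    public static class LoginAction extends SketchAction {
        @Override
        public String execute(LoginForm form) {
            return form.getUserName().isEmpty() ? "login.jsp" : "welcome.jsp";
        }
    }

    /** Copies each request parameter into the matching String setter, if one exists. */
    static void populate(Object form, Map<String, String> params) {
        for (Map.Entry<String, String> entry : params.entrySet()) {
            String name = entry.getKey();
            if (name.isEmpty()) {
                continue;
            }
            String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
            try {
                Method method = form.getClass().getMethod(setter, String.class);
                method.invoke(form, entry.getValue());
            } catch (NoSuchMethodException ignored) {
                // The form has no such property; the parameter is skipped.
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot set " + name, e);
            }
        }
    }

    /** Plays the controller's part: fill the form, run the action, return the view name. */
    public static String handle(Map<String, String> params) {
        LoginForm form = new LoginForm();
        populate(form, params);
        return new LoginAction().execute(form);
    }

    public static void main(String[] args) {
        System.out.println(handle(Collections.singletonMap("userName", "alice")));
        System.out.println(handle(Collections.<String, String>emptyMap()));
    }
}
```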

    Now, that was Struts in a nutshell. Struts is of course more than just this. It is a full-fledged
    presentation framework. Throughout the development of the application, both the page author
    and the developer need to coordinate and ensure that any changes to one area are appropriately
    handled in the other. Struts aids in rapid development of web applications by separating the
    concerns in projects. For instance, it has custom tags for JSPs. The page author can concentrate
    on developing the JSPs using custom tags that are specified by the framework. The application
    developer works on creating the server side representation of the data and its interaction with a
    back end data repository. Further, it offers a consistent way of handling user input and
    processing it.

    2.2. Operating Environment

    2.2.1. Hardware Requirements

    The hardware requirements of the project are summarized in the following table

    Sl No   Parameter                    Description
    1       RAM                          [...]00 MB - 1 GB
    2       Hard Disk                    120 GB - 1[...]0 GB
    3       Java Development Kit         Version JDK 1.[...]
    4       Database                     MySQL
    5       Database Front End           Heidi SQL, Toad for MySQL
    6       Tool for Java Development    Eclipse
    7       Front End Technology         JSP
    8       Framework                    Spring Framework
    9       Server                       Tomcat [...].0


    2.2.2. Software Requirements

    The software requirements are summarized in the following table

    Sl No   Parameter Name                                   Parameter Value
    1       Development Language                             Java
    2       Java Development Kit Version                     JDK 1.[...]
    3       Java Run Time Environment                        JRE
    4       Database for Routing Tables Backend              MySQL
    5       Database Front End for Routing Tables            Heidi SQL
    6       Database Front End for exporting Excel Sheets    Toad for MySQL
    7       Development Tool                                 Eclipse
    8       Server Type                                      Web Server
    9       Web Server                                       Tomcat [...].0
    10      Framework Used                                   Struts Framework
    11      View Technology Used                             Java Server Pages
    12      Designing                                        Cascading Style Sheets

    2.3. Functional Requirements

    The following are the functional requirements of the project

    1. Article Submission - This module is responsible for storage of articles.
    2. Data Cleaning - This module is used to remove stop words from the article.
    3. Tokenization - This process is used to obtain all the keywords of the article and assign
       them a unique id as well as the web site id.
    4. Syntactic Relation - This module is responsible for finding the syntactic relations of
       articles, i.e. the verbs, adverbs and adjectives of articles.
    5. Semantic Relation - This module is used to find the various semantic relations, i.e.
       hypernymy.
    6. Score Computation - This is used to measure the score with respect to syntactic and
       semantic relations.
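
    As a rough illustration of the Data Cleaning and Tokenization modules, the sketch below removes
    stop words and then assigns each distinct keyword a unique id. It is a minimal sketch under
    assumptions: the stop word list is a tiny hypothetical sample, splitting on non-letters is an
    assumed tokenization rule, and the web site id is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class ArticlePreprocessingSketch {

    /** Tiny illustrative stop word list; a real list would be much larger. */
    static final Set<String> STOP_WORDS = new HashSet<>(
            Arrays.asList("a", "an", "the", "is", "of", "and", "in", "to"));

    /** Data cleaning: lower-case the text, split on non-letters, drop stop words. */
    public static List<String> clean(String article) {
        List<String> kept = new ArrayList<>();
        for (String word : article.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                kept.add(word);
            }
        }
        return kept;
    }

    /** Tokenization: give each distinct keyword an id, numbered by first appearance. */
    public static Map<String, Integer> tokenize(List<String> words) {
        Map<String, Integer> ids = new LinkedHashMap<>();
        for (String word : words) {
            if (!ids.containsKey(word)) {
                ids.put(word, ids.size() + 1);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<String> words = clean("The cat and the cat in a hat");
        System.out.println(words);
        System.out.println(tokenize(words));
    }
}
```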

    2.4. Non functional requirements

    Interface requirements

    • How will the new system interface with its environment?
    • User interfaces and "user-friendliness"
    • Interfaces with other systems

    Performance requirements

    • time/space bounds: workloads, response time, throughput and available storage space,
      e.g. "the system must handle 1,000 transactions per second"
    • reliability: the availability of components; integrity of information maintained and
      supplied to the system, e.g. "system must have less than 1hr downtime per three months"
    • security: e.g. permissible information flows, or who can do what
    • survivability: e.g. system will need to survive fire, natural catastrophes, etc

    Operating requirements

    • physical constraints (size, weight)
    • personnel availability & skill level
    • accessibility for maintenance
    • environmental conditions

    8. Summary

    This chapter describes the Software Requirements Specification: Operating
    Environment (Hardware Requirements and Software Requirements), Functional Requirements,
    Non functional requirements, User characteristics, Applications of Project and Advantages of
    System.

    Chapter 3


    High Level Design

    3.1. High Level Design

    Design is one of the most important phases of software development. The design is a
    creative process in which a system organization is established that will satisfy the functional
    and non-functional system requirements. Large systems are always decomposed into
    sub-systems that provide some related set of services. The output of the design process is a
    description of the software architecture.

    Data Flow Diagram - Level 0

    Level 0 is the initial level data flow diagram and is generally called the
    context-level diagram. It is common practice for a designer to draw a context-level
    DFD first, which shows the interaction between the system and outside entities. This
    context-level DFD is then exploded to show more detail of the system being
    modeled.

    [Figure: Articles; Identify Syntax and Semantic Relation; Relation Matrix and Similarity]

    Fig: DFD Level 0


    DFD Level 1

    [Figure: Article Submission; Data Cleaning; Tokenization; Syntactic Relation; Semantic
    Relation; Score and Similarity Obtained]


    3.4.1 Data Flow Diagram - Level 2

    [Figure: Articles; Read the List of Articles; Data Cleaning (Stopwords); Tokenization; Find all
    Syntax Relations (verb, adverb and adjective); Find all Semantic Relations; Score and
    Similarity Obtained]

    Fig. 3.4 Level 2


    Activity diagram

    [Figure: JSP; web.xml; *-servlet.xml; Model; Controller; Delegate; Service; Data Access;
    Database]

    Fig: Activity Diagram


    The above figure describes the system architecture that is followed in industry for
    the development of any routing software.

    The user interface is designed in HTML/JSP pages. A request goes to the web container,
    which resolves it against the web.xml file: it first matches the URL pattern, takes the
    corresponding servlet name, searches for that servlet name in the servlet tag, reads the
    servlet class and creates an object of ActionServlet. The ActionServlet then delegates its
    job to the RequestProcessor.

    The RequestProcessor looks up the action to be called in struts-config.xml. The
    corresponding action form is called first and then the action is called. The action class
    calls the delegate, the delegate calls the service and the service calls the Data Access
    layer. The results travel back in exactly the opposite order, and the resultant JSP page
    is loaded.

    Model

    This is the Plain Old Java Object which has the getters and setters. The setters are
    called automatically, so the data the user has entered is available in it.

    Controller

    This is the class which fetches the user entered data, processes it, calls the
    delegate layer and obtains the results.

    Delegate

    Delegate is the layer which contains nothing but a call to an appropriate service.

    Service

    This is the layer which is responsible for the entire algorithmic implementation. It
    contains the heavyweight implementation of all the algorithms. Further, the service
    requires the help of the Data Access Layer for some operations, and of many other
    helper classes.

    Data Access Layer

    This is the layer which deals only with the CRUD operations, namely Create, Retrieve,
    Update and Delete. It has no other usage. This layer is used to fetch the data
    from the routing tables.

    Database

    This is the place where all the tables are stored.
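
    The chain of layers described above can be sketched in plain Java. Everything here is a
    hypothetical illustration: the class names are invented, an in-memory map replaces the MySQL
    tables, and a trivial word count stands in for the real algorithms of the service layer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LayeredFlowSketch {

    /** Data Access Layer: CRUD only, here over an in-memory map instead of a database. */
    static class ArticleDao {
        private final Map<Integer, String> table = new LinkedHashMap<>();
        private int nextId = 1;

        int create(String text) {
            int id = nextId++;
            table.put(id, text);
            return id;
        }
        String retrieve(int id) { return table.get(id); }
        void update(int id, String text) { table.put(id, text); }
        void delete(int id) { table.remove(id); }
    }

    /** Service: holds the algorithmic work and uses the DAO for storage. */
    static class ArticleService {
        private final ArticleDao dao;
        ArticleService(ArticleDao dao) { this.dao = dao; }

        int store(String text) { return dao.create(text); }

        int wordCount(int id) {
            String text = dao.retrieve(id);
            if (text == null || text.trim().isEmpty()) {
                return 0;
            }
            return text.trim().split("\\s+").length;
        }
    }

    /** Delegate: contains nothing but calls to the appropriate service. */
    static class ArticleDelegate {
        private final ArticleService service;
        ArticleDelegate(ArticleService service) { this.service = service; }

        int store(String text) { return service.store(text); }
        int wordCount(int id) { return service.wordCount(id); }
    }

    /** Controller role: takes the user input, calls the delegate and returns the result. */
    public static int submitAndCount(String articleText) {
        ArticleDelegate delegate = new ArticleDelegate(new ArticleService(new ArticleDao()));
        int id = delegate.store(articleText);
        return delegate.wordCount(id);
    }

    public static void main(String[] args) {
        System.out.println(submitAndCount("syntax and semantic relation"));
    }
}
```

    The request travels controller, delegate, service, DAO, and the result returns in the
    opposite order, as the text describes.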

    Use case diagrams

    The Use Case Diagram is described in the following figure.

    [Figure: User; Article Submission; Data Cleaning (Stop Words); Tokenizing; Syntactic
    Relation; Semantic Relation; Score Computation]

    Fig: Use Case Diagram


    Chapter 4

    Detailed Design

    4.1. Purpose

    Detailed Design is the phase in which the internal logic of each of the modules specified in
    the high-level design is determined. In this phase the details and algorithmic design of each
    of the modules are specified. Other low-level components and subcomponents are also
    described. Each subsection of this section refers to or contains a detailed description of a
    system software component. This chapter also discusses the control flow in the software in
    much more detail by explaining each of the functionalities of the software modules.

    This chapter presents the following

    • Life Cycle of Generic Flow
    • Flowchart for each module

    4.1.1 Life Cycle of Generic Flow

    This section deals with the life cycle of Security Vulnerability Detection, Analysis and
    Remediation in Enterprise Applications, and with state diagrams and possible transitions
    between the states.

    [Figure: View JSP; Action Form; Action]

    Fig: Life Cycle of the Process


    The following are the stages for any user action

    1. View - this is the location in which the user enters the data and performs some action.
    2. Action Form - this is the POJO which contains the variables as defined in the view, along
       with their setters and getters. The data is bound to it automatically.
    3. Action - this is a class which contains the execute method, which is responsible for
       handling the logic of the project and for delegating the result to an
       appropriate view. It also makes use of helper methods to perform business logic.

    4.2. User Interface Design

    4.2.1. Hardware Interfaces

    There are no specific hardware interfaces used in the system

    4.2.2. Software Graphical User Interfaces

    Abbreviated GUI (pronounced GOO-ee). A program interface that takes advantage of
    the computer's graphics capabilities to make the program easier to use. Well-designed graphical
    user interfaces can free the user from learning complex command languages. On the other hand,
    many users find that they work more effectively with a command-driven interface, especially if
    they already know the command language.

    Graphical user interfaces, such as Microsoft Windows and the one used by the Apple
    Macintosh, feature the following basic components:


    • pointer - A symbol that appears on the display screen and that you move
      to select objects and commands. Usually, the pointer appears as a small angled
      arrow. Text-processing applications, however, use an I-beam pointer that is shaped like a
      capital I.
    • pointing device - A device, such as a mouse or trackball, that enables you to select
      objects on the display screen.
    • icons - Small pictures that represent commands, files, or windows. By moving the
      pointer to the icon and pressing a mouse button, you can execute a command
      or convert the icon into a window. You can also move the icons around the display screen
      as if they were real objects on your desk.
    • desktop - The area on the display screen where icons are grouped is often referred to
      as the desktop because the icons are intended to represent real objects on a real desktop.
    • windows - You can divide the screen into different areas. In each window, you
      can run a different program or display a different file. You can move windows around the
      display screen, and change their shape and size at will.
    • menus - Most graphical user interfaces let you execute commands by selecting a
      choice from a menu.

    The Graphical User Interface is developed using HTML and JSP.

    JavaServer Pages Technology

    JavaServer Pages (JSP) technology allows you to easily create web content that has both
    static and dynamic components. JSP technology makes available all the dynamic capabilities of
    Java Servlet technology but provides a more natural approach to creating static content. The
    main features of JSP technology are as follows:

    1. A language for developing JSP pages, which are text-based documents that describe how
       to process a request and construct a response
    2. An expression language for accessing server-side objects
    3. Mechanisms for defining extensions to the JSP language

    A JSP page is a text document that contains two types of text: static data, which
    can be expressed in any text-based format (such as HTML, SVG, WML, and XML), and
    JSP elements, which construct dynamic content.

    CHAPTER

    Detailed Design

    Detailed design is the phase in which the internal logic of each module specified in the high-level design is determined. In this phase, the details and algorithmic design of each module are specified. Other low-level components and subcomponents are also described. Each subsection of this section refers to or contains a detailed description of a system software component. This chapter also discusses the control flow in the software and describes the software modules in more detail by explaining each piece of functionality.

    This chapter presents the following:

    • Life Cycle of Generic Flow

    • Flowchart for each module

    Fig: Life Cycle of the Process

    The following are the stages for any user action:

    • View: this is the location in which the user enters the data and performs some action.

    • Model: this is the POJO that contains the variables defined in the view, together with their setter and getter methods. The data is bound to it automatically.

    • Controller: this is a class containing the execute method, which is responsible for handling the logic of the project and for delegating the result to an appropriate view. It also makes use of helper methods to perform business logic.

    The detailed design can be described using flowcharts.
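
    The three stages above can be illustrated with a minimal Java sketch. It is not the project's actual code: the names ArticleForm, ArticleController, bind, and the view names "success" and "error" are hypothetical, and the binding that the text says happens automatically is imitated here by a hand-written bind method.

```java
import java.util.Map;

// Model: a plain Java object (POJO) whose fields mirror the inputs of the view.
// Class and field names are hypothetical.
final class ArticleForm {
    private String articleName;
    private String articleDescription;

    public String getArticleName() { return articleName; }
    public void setArticleName(String articleName) { this.articleName = articleName; }
    public String getArticleDescription() { return articleDescription; }
    public void setArticleDescription(String articleDescription) {
        this.articleDescription = articleDescription;
    }
}

// Controller: execute() runs the logic and returns the name of the view to show next.
final class ArticleController {
    static final String SUCCESS = "success"; // hypothetical view name
    static final String ERROR = "error";     // hypothetical view name

    // Stand-in for the automatic binding step: copies request parameters into the model.
    static ArticleForm bind(Map<String, String> params) {
        ArticleForm form = new ArticleForm();
        form.setArticleName(params.get("articleName"));
        form.setArticleDescription(params.get("articleDescription"));
        return form;
    }

    // Helper method holding a small piece of business logic.
    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    // Delegates to the error view when either field is missing, otherwise to the success view.
    String execute(ArticleForm form) {
        if (isBlank(form.getArticleName()) || isBlank(form.getArticleDescription())) {
            return ERROR;
        }
        return SUCCESS;
    }
}
```

    In this sketch the controller returns only a view name, so the choice of the page that renders the result stays outside the business logic.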

    [Figure: life-cycle diagram with boxes labelled View, JSP, Model and Controller]

    Article Module

    The article module is responsible for the storage of articles. The article name and the article description act as its input.
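
    A minimal sketch of such a module is given here, assuming an in-memory map in place of the real storage. The class name ArticleStore is hypothetical, and the rule that blank input or an already stored name produces a validation error is an assumption made for illustration, not something the text states.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the application's article storage.
final class ArticleStore {
    // Maps article name to article description, in insertion order.
    private final Map<String, String> articles = new LinkedHashMap<>();

    // Returns true when the article is stored and false on a validation error.
    // Assumption: blank input or a name that is already stored is a validation error.
    boolean store(String name, String description) {
        if (name == null || name.trim().isEmpty()
                || description == null || description.trim().isEmpty()) {
            return false;
        }
        String key = name.trim();
        if (articles.containsKey(key)) {
            return false;
        }
        articles.put(key, description);
        return true;
    }

    // Number of articles stored so far.
    int size() {
        return articles.size();
    }
}
```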

    [Flowchart: Start; input Article Name and Article Description; retrieve the list of article names in the application; check whether the article is in the article name list; the outcomes are "Storage of article is successful" and "Validation error" (one branch is labelled YES)]

    Fig: Article Module

    Data Cleaning

    This is the process in which the data is cleaned of unwanted symbols and stop words. The data first undergoes a delimitation process and then a cleaning process. The set of stop words used for data mining is listed below.

    List of Stop Words

    Stop Word

     A

    about 

    above

    across

    after

    afterwards

    again

    against 

    all 

    almost 

    alone

    along

    already

    also

    although

    always

    am

    among

    amongst 

    amoungst 

    amount 

    an

    and 

    another

    any

    anyhow

    anyone

    anything

    anyway

    anywhere

    are

    around 

    as

    at 

    back

    be

    became

    because

    become

    becomes

    becoming

    been

    before

    beforehand 

    behind 

    being

    below

    beside

    besides

    between

    beyond 

    bill 

    both

    bottom

    but 

    by

    call 

    can

    cannot 

    cant 

    co

    computer

    con

    could 

    couldnt 

    cry

    de

    describe

    detail 

    do

    done

    down

    due

    during

    each

    eg

    eight 

    either

    eleven

    else

    elsewhere

    empty

    enough

    etc

    even

    ever

    every

    everyone

    everything

    everywhere

    except 

     few

     fifteen

     fify

     fill 

     find 

     fire

     first 

     five

     for

     former

     formerly

     forty

     found 

     four

     from

     front 

     full 

     further

    get 

    give

    go

    had 

    has

    hasnt 

    have

    he

    hence

    her

    here

    hereafter

    hereby

    herein

    hereupon

    hers

    herself

    him

    himself

    his

    how

    however

    hundred 

    ie

    if 

    in

    inc

    indeed 

    interest 

    into

    is

    it 

    its

    itself

    keep

    last 

    latter

    latterly

    least 

    less

    ltd 

    made

    many

    may

    me

    meanwhile

    might 

    mill 

    mine

    more

    moreover

    most 

    mostly

    move

    much

    must 

    my

    myself

    name

    namely

    neither

    never

    nevertheless

    next 

    nine

    no

    nobody

    none

    noone

    nor

    not 

    nothing

    now

    nowhere

    of 

    off 

    often

    on

    once

    one

    only

    onto

    or

    other

    others

    otherwise

    our

    ours

    ourselves

    out 

    over

    own

     part 

     per

     perhaps

     please

     put 

    rather

    re

    same

    see

    seem

    seemed 

    seeming

    seems

    serious

    several 

    she

    should 

    show

    side

    since

    sincere

    six 

    sixty

    so

    some

    somehow

    someone

    something

    sometime

    sometimes

    somewhere

    still 

    such

    system

    take

    ten

    than

    that 

    the

    their

    them

    themselves

    then

    thence

    there

    thereafter

    thereby

    therefore

    therein

    thereupon

    these

    they

    thick

    thin

    third 

    this

    those

    though

    three

    through

    throughout 

    thru

    thus

    to

    together

    too

    top

    toward 

    towards

    twelve

    twenty

    two

    un

    under

    until 

    up

    upon

    us

    very

    via

    was

    we

    well 

    were

    what 

    whatever

    when

    whence

    whenever

    where

    whereafter

    whereas

    whereby

    wherein

    whereupon

    wherever

    whether

    which

    while

    whither

    who

    whoever

    whole

    whom

    whose

    why

    will 

    with

    within

    without 

    would 

    yet 

    you

    your

    yours

    yourself 

    yourselves
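
    The two steps described for data cleaning, delimiting the text and then removing symbols and stop words, could be sketched in Java roughly as follows. This is an illustration under stated assumptions rather than the project's implementation: the class name DataCleaner is hypothetical, STOP_WORDS holds only a short excerpt of the list above, and treating every character other than a letter or digit as an unwanted symbol (and lowercasing the words) is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class DataCleaner {
    // A short excerpt of the stop word list, kept small for illustration.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "about", "and", "are", "in", "is", "of", "the", "to", "with"));

    // Splits the text on commas and whitespace, strips symbols, and drops stop words.
    static List<String> clean(String text) {
        List<String> kept = new ArrayList<>();
        for (String raw : text.split("[,\\s]+")) {
            // Assumption: anything that is not a letter or digit is an unwanted symbol.
            String word = raw.replaceAll("[^\\p{L}\\p{N}]", "").toLowerCase(Locale.ROOT);
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                kept.add(word);
            }
        }
        return kept;
    }
}
```

    A hash set is used for the stop words so that each membership check takes constant time on average, which matters once the full list is loaded.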

    Module 2: Data Cleaning

    The flowchart for the data cleaning process is given below.

    [Figure: block diagram with blocks labelled Unclean Data, Stop Words Repository, Data Extraction Using Delimiter and Data Cleaning Using Stop Words, and Repository]

    [Flowchart: Start; Website URL; extract the individual words with the help of a delimiter such as a comma or a space; clean the symbols and, if a word is a stop word, remove it; the clean data is stored in the repository; Stop]

    Fig: Data Cleaning Process Flowchart

    Module 3: Tokenization

    After the data is cleaned, the token extraction process begins, in which all the words in the articles are treated as tokens and extracted. Token extraction is again done with the help of delimiters. The flowchart for token extraction is given below.

    [Figure: block diagram with blocks labelled Clean Data, Data Extraction Using Delimiter, and Keywords Repository]

    Fig: Keyword Extraction Process

    The flowchart for the token extraction process is given below.

    [Flowchart: Start; clean data; extract the individual words with the help of a delimiter such as a comma or a space; i = 1; while i <= number of tokens, store the token in the repository; Stop]

    Fig: Token Extraction Process Flowchart

    The figure shows the token extraction process, in which the clean data is scanned to obtain the tokens.
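
    The loop in the token extraction flowchart can be sketched in Java as shown here. The class name TokenExtractor and the in-memory list used as the repository are hypothetical stand-ins, since the text does not describe how tokens are actually stored.

```java
import java.util.ArrayList;
import java.util.List;

final class TokenExtractor {
    // Stand-in for the token repository; the real storage is not described in the text.
    private final List<String> repository = new ArrayList<>();

    // Splits clean data on commas and spaces, stores each token, and returns how many were stored.
    int extract(String cleanData) {
        String[] tokens = cleanData.trim().split("[, ]+");
        int stored = 0;
        // The flowchart counts from 1, so the array index is i - 1.
        for (int i = 1; i <= tokens.length; i++) {
            String token = tokens[i - 1];
            if (!token.isEmpty()) {
                repository.add(token);
                stored++;
            }
        }
        return stored;
    }

    // Tokens stored so far, in extraction order.
    List<String> tokens() {
        return repository;
    }
}
```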