mahendra-varma
8/17/2019 Identifying Syntax Semantic
1/63
identifying syntax and semantic relation between articles
Chapter 1
INTRODUCTION
The Web has undergone exponential growth since its birth, and this expansion has generated a number of problems; in this paper we address two of these:
1. The proliferation of documents that are identical or almost identical.
2. The instability of URLs.
The basis of our approach is a mechanism for discovering when two documents are "roughly the same"; that is, for discovering when they have the same content except for modifications such as formatting, minor corrections, webmaster signature, or logo. Similarly, we can discover when a document is "roughly contained" in another. Applying this mechanism to the entire collection of documents found by the AltaVista spider yields a grouping of the documents into clusters of closely related items. As explained below, this clustering can help solve the problems of document duplication and URL instability.
The duplication problem arises in two ways. First, there are documents that are found in multiple places in identical form. Examples include FAQ (Frequently Asked Questions) and RFC (Request For Comments) documents, the online documentation for popular programs, documents stored in several mirror sites, and legal documents. Second, there are documents that are found in almost identical incarnations because they are:
1. Different versions of the same document.
2. The same document with different formatting.
3. The same document with site specific links, customizations or contact information.
4. Combined with other source material to form a larger document.
The instability problem arises when a particular URL becomes undesirable because the associated document is temporarily unavailable or has moved, because the URL refers to an old version and the user wants the current version, or because the URL is slow to access and the user wants an identical or similar document that will be faster to retrieve. In all these cases, the ability to find documents that are syntactically similar to a given document allows the user to find other, acceptable versions of the desired item.
URNs (Uniform Resource Names) have often been suggested as a way to provide functionality similar to that outlined above. URNs are a generalized form of URLs (Uniform Resource Locators). However, instead of naming a resource directly, as URLs do by giving a specific server, port and file name for the resource, URNs point to the resource indirectly through a name server. The name server is able to translate the URN to the "best" (based on some criteria) URL of the resource. The main advantage of URNs is that they are location independent. A single, stable URN can track a resource as it is renamed or moves from server to server. A URN could direct a user to the instance of a replicated resource that is in the nearest mirror site, or is given in a desired language. Unfortunately, progress towards URNs has been slow. The mechanism we present here provides an alternative solution.
Identical documents do not need to be handled specially in our algorithm, but they add to the computational workload and can be eliminated quite easily. Identical documents obviously share the same set of shingles and so, for the clustering algorithm, we only need to keep one representative from each group of identical documents. Therefore, for each document we generate a fingerprint that covers its entire contents. When we find documents with identical fingerprints, we eliminate all but one from the clustering algorithm. After the clustering has been completed, the other identical documents are added into the cluster containing the one kept version.
We can expand the collection of identical documents with the "lexically-equivalent" documents and the "shingle-equivalent" documents. The lexically-equivalent documents are identical after they have been converted to canonical form. The shingle-equivalent documents are documents that have identical shingle values after the set of shingles has been selected. Obviously, all identical documents are lexically-equivalent, and all lexically-equivalent documents are shingle-equivalent. We can find each set of documents with a single fingerprint:
1. Identical documents are found with the fingerprint of the entire original contents.
2. Lexically-equivalent documents are found with the fingerprint of the entire canonicalized contents.
3. Shingle-equivalent documents are found with the fingerprint of the set of selected shingles.
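The three fingerprints and the notions of "roughly the same" and "roughly contained" can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the canonical form (lowercasing, dropping tag-like markup, collapsing whitespace), the shingle width `w=4`, the use of SHA-1 as the fingerprint, and the use of all shingles rather than a selected subset. Resemblance is computed as the size of the intersection of the two shingle sets over the size of their union, and containment as the intersection over the size of the first set.

```python
import hashlib
import re


def canonicalize(text):
    """Assumed canonical form: drop tag-like markup, lowercase, collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)
    return " ".join(text.lower().split())


def shingles(text, w=4):
    """Set of contiguous w-word sequences (shingles) of the canonical text."""
    words = canonicalize(text).split()
    if len(words) < w:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def fingerprint(data):
    """Fingerprint of a string (SHA-1 is an arbitrary choice for this sketch)."""
    return hashlib.sha1(data.encode("utf-8")).hexdigest()


def identical_fp(text):
    """Fingerprint of the entire original contents."""
    return fingerprint(text)


def lexical_fp(text):
    """Fingerprint of the entire canonicalized contents."""
    return fingerprint(canonicalize(text))


def shingle_fp(text, w=4):
    """Fingerprint of the (here: complete) set of shingles, in sorted order."""
    return fingerprint("\n".join(" ".join(s) for s in sorted(shingles(text, w))))


def resemblance(a, b, w=4):
    """|S(a) & S(b)| / |S(a) | S(b)|: how close a and b are to 'roughly the same'."""
    sa, sb = shingles(a, w), shingles(b, w)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def containment(a, b, w=4):
    """|S(a) & S(b)| / |S(a)|: how far a is 'roughly contained' in b."""
    sa, sb = shingles(a, w), shingles(b, w)
    return len(sa & sb) / len(sa) if sa else 1.0
```

Two copies of a page that differ only in markup and letter case get different `identical_fp` values but the same `lexical_fp` and `shingle_fp`, so only one of them needs to enter the clustering step.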
Objectives of Project - Modules of the Project
[1] Design and Development of the Article Submission Algorithm, which is used to submit the articles.
[2] Design and Development of the Data Cleaning Algorithm, which is used to remove the unwanted data known as stop words.
[3] Design and Development of the Tokenized Algorithm, which is used to obtain tokens in a text document.
[4] Design and Development of the Syntactic Relation Algorithm, to find the syntactic relations between documents.
[5] Design and Development of the Semantic Relation Algorithm, to find the semantic relations between documents.
[6] Design and Development of the Score Computation Algorithm, used to compute the scores of the documents.
Fig: Stages for Relation Algorithm
The following goals are defined:
1. Article Submission - This module is responsible for storage of articles.
2. Data Cleaning - This module is used to remove stop words from the article.
3. Tokenization - This process is used to obtain all the keywords of the article and assign each of them a unique id as well as the web site id.
4. Syntactic Relation - This module is responsible for finding the syntactic relations of articles, i.e. the verbs, adverbs and adjectives of the articles.
5. Semantic Relation - This module is used to find the various semantic relations, e.g. hypernymy.
6. Score Computation - This is used to measure the score with respect to syntactic and semantic relations.
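The tokenization goal in the list above can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names, the token pattern, and the row layout `(token_id, site_id, token)` are all assumptions. Each distinct token receives one numeric id, and every occurrence is stored together with the id of the site (article) it came from.

```python
import re
from itertools import count


def tokenize(text):
    """Split text into lowercase word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def index_tokens(articles):
    """articles: dict mapping a (hypothetical) site id to the article text.

    Returns rows of (token_id, site_id, token); a distinct token always
    gets the same token_id, regardless of which article it appears in.
    """
    next_id = count(1)
    token_ids = {}
    rows = []
    for site_id, text in articles.items():
        for tok in tokenize(text):
            if tok not in token_ids:
                token_ids[tok] = next(next_id)
            rows.append((token_ids[tok], site_id, tok))
    return rows
```

For example, `index_tokens({"a1": "Web pages change.", "a2": "Pages move"})` gives the token `pages` the same id in both articles.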
[Figure: stages of the relation algorithm: Article Submission, Data Cleaning, Token Determination, Syntactic Relation, Semantic Relation, Score Computation, similarity measure]
Problem Definition
Semantic and syntactic relations have played an important role in many applications in recent years, especially in the Semantic Web, Information Retrieval, Information Extraction, and Question Answering. Semantic and syntactic relations carry the main ideas of sentences or paragraphs. This project presents our proposed algorithms for identifying semantic and syntactic relations between objects and their properties in order to enrich a domain specific ontology, namely the Computing Domain Ontology, which is used in an Information Extraction system.
Previous Approach
Disadvantages of Previous Approach
Proposed Approach
The proposed approach is to automatically identify the syntactic and semantic relations that might be found in text documents of articles of a specific domain. Afterward, we extract these relations in order to enrich a domain specific ontology. This ontology can be used in many applications, such as Information Retrieval, Information Extraction, and Question Answering, focusing on the computing domain. For this purpose, we propose a methodology which combines Natural Language Processing (NLP) and Machine Learning.
Methodology
Fig: Methodology of the Project
The figure shows the methodology of the project.
Article Submission
The Article Submission module is used for submitting an article with its article name and article description.
View Articles
This module is responsible for viewing the articles.
Data Cleaning
This module is responsible for preprocessing and cleaning the text data. The module makes use of stop words to perform the analysis and do the cleaning. Data Cleaning removes the stop words from each of the articles. After the data cleaning process is completed, the clean data can be represented as a set.
[Figure: methodology boxes: Article Submission, View Articles, Data Cleaning (stop word analysis), Tokenization, Identify Syntax Relations (find verb, adverb, noun), Identify Semantic Relations (synonyms, hyponyms, hypernyms of instance data), Find whether the articles are similar based on syntax and semantics]
Stopwords
These are the set of words which do not carry any specific meaning. The data mining forum has defined a set of such keywords. Stop words are words which are filtered out before or after processing of natural language data (text). There is no single definitive list of stop words which all tools use, and such a filter is not always used. The list of stop words used in the algorithm is as follows:
a, able, about, across, after, all, almost, also, am, among, an, and, any, are, as, at, be, because, been, but, by,
can, cannot, could, dear, did, do, does, either, else, ever, every, for, from, get, got, had, has, have, he, her, hers,
him, his, how, however, i, if, in, into, is, it, its, just, least, let, like, likely, may, me, might, most, must, my,
neither, no, nor, not, of, off, often, on, only, or, other, our, own, rather, said, say, says, she, should, since, so,
some, than, that, the, their, them, then, there, these, they, this, tis, to, too, twas, us, wants, was, we, were,
what, when, where, which, while, who, whom, why, will, with, would, yet, you, your
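As a minimal sketch of the data cleaning step, the following Python code filters an article against the stop list from this section. The function name `clean` and the token pattern are assumptions for illustration; the report's own implementation is not shown in the source.

```python
import re

# The stop list from this section, split into a set for fast lookup.
STOP_WORDS = set((
    "a,able,about,across,after,all,almost,also,am,among,an,and,any,are,as,at,be,"
    "because,been,but,by,can,cannot,could,dear,did,do,does,either,else,ever,every,"
    "for,from,get,got,had,has,have,he,her,hers,him,his,how,however,i,if,in,into,is,"
    "it,its,just,least,let,like,likely,may,me,might,most,must,my,neither,no,nor,not,"
    "of,off,often,on,only,or,other,our,own,rather,said,say,says,she,should,since,so,"
    "some,than,that,the,their,them,then,there,these,they,this,tis,to,too,twas,us,"
    "wants,was,we,were,what,when,where,which,while,who,whom,why,will,with,would,"
    "yet,you,your"
).split(","))


def clean(text):
    """Lowercase the text, split it into words, and drop the stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]
```

Calling `clean("The Web has undergone exponential growth since its birth")` keeps only the content words `web`, `undergone`, `exponential`, `growth` and `birth`.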
Syntax Analyzer
Syntactic relations are the relations between concepts or words in a sentence with respect to verbs and adverbs.
Identifying the Semantic Relations
As mentioned above, the sentence layer also includes sentences that are derived from synonyms, hyponyms and hypernyms of instances of the ingredient layer. We use WordNet to find a set of synonyms, hyponyms and hypernyms of instances from the ingredient layer. WordNet is an ontology that covers many different domains. However, we focus only on the computing domain.
Articles are similar based on Syntax and Semantic
For each pair of articles, the syntactic relations (verbs, adverbs) are found first, and then the semantic relations are found based on synonyms, hyponyms and hypernyms. Once the syntactic and semantic values are found, the articles are considered the same if the resulting similarity value is greater than the threshold percentage.
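The comparison described above can be sketched in Python under loudly stated assumptions. The tiny `POS` and `RELATED` dictionaries are hypothetical stand-ins for a part-of-speech tagger and for WordNet lookups. Averaging two Jaccard overlaps is one possible scoring rule, not necessarily the one used in the project. The default `threshold=0.5` is an arbitrary placeholder, not the value from the source.

```python
# Hypothetical stand-in for a part-of-speech tagger.
POS = {
    "stores": "verb", "saves": "verb", "quickly": "adverb",
    "file": "noun", "document": "noun", "server": "noun",
}

# Hypothetical stand-in for WordNet: word -> synonyms, hyponyms, hypernyms.
RELATED = {
    "stores": {"saves"},
    "saves": {"stores"},
    "file": {"document"},
    "document": {"file"},
}


def jaccard(a, b):
    """Overlap of two sets: |a & b| / |a | b| (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def syntactic_terms(tokens):
    """Tokens tagged as verb or adverb."""
    return {t for t in tokens if POS.get(t) in ("verb", "adverb")}


def semantic_terms(tokens):
    """Tokens expanded with their related words."""
    expanded = set(tokens)
    for t in tokens:
        expanded |= RELATED.get(t, set())
    return expanded


def similarity(tokens_a, tokens_b):
    """Average of the syntactic and the semantic overlap (an assumed rule)."""
    syn = jaccard(syntactic_terms(tokens_a), syntactic_terms(tokens_b))
    sem = jaccard(semantic_terms(tokens_a), semantic_terms(tokens_b))
    return (syn + sem) / 2


def same_article(tokens_a, tokens_b, threshold=0.5):
    """True when the similarity exceeds the (placeholder) threshold."""
    return similarity(tokens_a, tokens_b) > threshold
```

With these toy tables, `["server", "stores", "file", "quickly"]` and `["server", "saves", "document", "quickly"]` share only one syntactic term literally but become identical after semantic expansion, so their combined score passes the placeholder threshold.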
Chapter 2
LITERATURE SURVEY
In the paper [1] titled "Syntactic clustering of the web" the authors have developed an efficient way to determine the syntactic similarity of files and have applied it to every document on the World Wide Web. Using this mechanism, they built a clustering of all the documents that are syntactically similar. Possible applications include a "Lost and Found" service, filtering the results of Web searches, updating widely distributed web pages, and identifying violations of intellectual property rights.
In the paper [2] titled "Efficient near-duplicate detection for Q&A forum" the authors address the issue of redundant data in large-scale collections of Q&A forums. They propose and evaluate a novel algorithm for automatically detecting near-duplicate Q&A threads. The main idea is to use a distributed index and the MapReduce framework to calculate pairwise similarity and identify redundant data in a fast and scalable way. The proposed method was evaluated on a real-world data collection crawled from a popular Q&A forum. Experimental results show that the proposed method can effectively and efficiently detect near-duplicate content in large web collections. The authors present two distributed inverted index methods to calculate similarities in parallel using the MapReduce framework. They define the near-duplicate Q&A thread and use the evaluated signatures, parallel similarity calculation and a linear combination method to extract near-duplicates. Experimental results on the real-world collection show that the proposed method can be effectively and efficiently used to detect near-duplicates. About 1.
The similarity measure can be acquired by comparing the exterior tokens of sentences, but the relevance measure can be obtained only by comparing the interior meaning of the sentences. In this paper, the authors describe a method to explore the quantified conceptual relations of word pairs by using the definition of a lexical item in a modern Chinese standard dictionary, and propose a practical approach to measure inter-sentence relevance. The results of the examples show that the approach can solve the problem of how to measure the relevance of two sentences without (or with very low) similarity but with a certain relevance. This method is also compatible with the current cosine similarity method.
In the paper titled "Detection and Optimized Disposal of Near Duplicate Pages" the authors observe that the search engine is an important tool for users to access network information resources. However, a large number of duplicate and near-duplicate pages adds to the user's burden. Currently, search engines only remove duplicate pages, and do not yet have effective strategies for detecting and disposing of near-duplicate pages. The paper analyzes the existing algorithms to select an appropriate algorithm for detecting near-duplicate pages, and optimizes the disposal strategy to ensure that near-duplicate pages do not take up too much space in search results while still being used effectively. This allows users to retrieve needed information more easily.
In the paper titled "Text Based Similarity Metrics and Delta for Semantic Web Graphs" the authors note that recognizing that two Semantic Web documents or graphs are similar and characterizing their differences is useful in many tasks, including retrieval, updating, version control and knowledge base editing. They describe several text-based similarity metrics that characterize the relation between Semantic Web graphs and evaluate these metrics for three special cases of similarity: similarity in classes and properties, similarity disregarding differences in base URIs, and versioning relationship. They apply these techniques to a special use case: generating a delta between versions of a Semantic Web graph. The system was evaluated on several tasks using a collection of graphs from the archive of the Swoogle Semantic Web search engine.
n the paper 8
In the paper [9] titled "Adaptive near-duplicate detection via similarity learning" the authors present a novel near-duplicate document detection method that can easily be tuned for a particular domain. The method represents each document as a real-valued sparse k-gram vector, where the weights are learned to optimize for a specified similarity function, such as the cosine similarity or the Jaccard coefficient. Near-duplicate documents can be reliably detected through this improved similarity measure. In addition, these vectors can be mapped to a small number of hash values as document signatures through the locality sensitive hashing scheme for efficient similarity computation.
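The two similarity functions named above can be illustrated on sparse k-gram vectors. This is only a sketch under assumptions: the weights here are raw k-gram counts rather than learned weights, `k=2` is arbitrary, and the locality sensitive hashing step is left out.

```python
import math
from collections import Counter


def kgrams(text, k=2):
    """Sparse vector of word k-grams: maps each k-gram to its count."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + k]) for i in range(len(words) - k + 1))


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dict-like counters."""
    dot = sum(w * v[g] for g, w in u.items() if g in v)
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def jaccard(u, v):
    """Jaccard coefficient of the sets of k-grams present in each vector."""
    a, b = set(u), set(v)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

For "the cat sat on the mat" and "the cat sat on the rug", four of the five bigrams in each sentence are shared, so the cosine similarity is 4/5 and the Jaccard coefficient is 4/6.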
In the paper [10] titled "Learning to extract keyphrases from text" the authors describe that recent commercial software, such as Microsoft's Word 97 and Verity's Search 9
Chapter 3
Software Requirement Specifications
3.1 Software Requirements Specifications
A Software Requirements Specification (SRS) is a complete description of the behavior of the system to be developed. It includes the functional and non-functional requirements for the software to be developed. The functional requirements state what the software should do, and the non-functional requirements state the constraints on the design or implementation of the system. Requirements must be measurable, testable, related to identified needs or opportunities, and defined to a level of detail sufficient for system design.
What the software has to do is directly perceived by its users, either human users or other software systems. The common understanding between the user and developer is captured in the requirements document. Writing a software requirements specification reduces development effort, as careful review of the document can reveal omissions, misunderstandings, and inconsistencies early in the development cycle, when these problems are easier to correct. The SRS discusses the product but not the project that developed it; hence the SRS serves as a basis for later enhancement of the finished product. The SRS may need to be altered, but it does provide a foundation for continued production evaluation.
Resource Requirement
NetBeans IDE: NetBeans is a multi-language software development environment comprising an integrated development environment (IDE) and an extensible plug-in system. It is written primarily in Java and can be used to develop applications in Java and, by means of the various plug-ins, in other languages as well, including C, C++, COBOL, Python, Perl, PHP, and others.
NetBeans employs plug-ins in order to provide all of its functionality on top of (and including) the runtime system, in contrast to some other applications where functionality is typically hard coded. The NetBeans SDK includes the NetBeans Java development tools (JDT), offering an IDE with a built-in incremental Java compiler and a full model of the Java source files. This allows for advanced refactoring techniques and code analysis. The IDE also makes use of a workspace, in this case a set of metadata over a flat file space allowing external file modifications as long as the corresponding workspace "resource" is refreshed afterwards.
Java Development Kit: The Java Development Kit (JDK) is an Oracle Corporation product aimed at Java developers. Since the introduction of Java, it has been by far the most widely used Java SDK. On 17 November 2006, Sun announced that it would be released under the GNU General Public License (GPL), thus making it free software. This happened in large part on 8 May 2007.
The JDK's tools include:
• jar - the archiver, which packages related class libraries into a single JAR file. This tool also helps manage JAR files.
• javah - the C header and stub generator, used to write native methods
• javap - the class file disassembler
• javaws - the Java Web Start launcher for JNLP applications
• jconsole - Java Monitoring and Management Console
• jdb - the debugger
• jhat - Java Heap Analysis Tool (experimental)
• jinfo - gets configuration information from a running Java process or crash dump (experimental)
• jmap - outputs the memory map for Java and can print shared object memory maps or heap memory details of a given process or core dump (experimental)
• jps - Java Virtual Machine Process Status Tool; lists the instrumented HotSpot Java Virtual Machines (JVMs) on the target system (experimental)
• jrunscript - Java command-line script shell
• jstack - utility which prints Java stack traces of Java threads (experimental)
• jstat - Java Virtual Machine statistics monitoring tool (experimental)
• jstatd - jstat daemon (experimental)
• policytool - the policy creation and management tool, which can determine policy for a Java runtime, specifying which permissions are available for code from various sources
• VisualVM - visual tool integrating several command-line JDK tools and lightweight performance and memory profiling capabilities
• wsimport - generates portable JAX-WS artifacts for invoking a web service
• xjc - part of the Java API for XML Binding (JAXB) API; it accepts an XML schema and generates Java classes
Experimental tools may not be available in future versions of the JDK.
The JDK also comes with a complete Java Runtime Environment, usually called a private runtime because it is separated from the "regular" JRE and has extra contents. It consists of a Java Virtual Machine and all of the class libraries present in the production environment, as well as additional libraries only useful to developers, such as the internationalization libraries and the IDL libraries.
Copies of the JDK also include a wide selection of example programs demonstrating the use of almost all portions of the Java API.
Swing: The Java Foundation Classes (JFC) consist of five major parts: AWT, Swing, Accessibility, Java 2D, and Drag and Drop. Java 2D has become an integral part of AWT, Swing is built on top of AWT, and Accessibility support is built into Swing. The five parts of JFC are certainly not mutually exclusive, and Swing is expected to merge more deeply with AWT in future versions of Java. Swing is a set of classes that provides more powerful and flexible components than are possible with the AWT. In addition to the familiar components, Swing supplies tabbed panes, scroll panes, trees, and tables. It provides a single API capable of supporting multiple look-and-feels so that developers and end users are not locked into a single platform's look-and-feel. The Swing library makes heavy use of the MVC software design pattern, which conceptually decouples the data being viewed from the user interface controls through which it is viewed.
Swing possesses several traits: 1. Platform independence 2. Extensibility 3. Component orientation 4. Customizability 5. Configurability 6. Look and feel. Swing is platform independent both in terms of its expression and its implementation. Extensibility allows for the "plugging" of various custom implementations of specified framework interfaces: users can provide their own custom implementation of these components to override the default implementations. Component orientation means that a component responds to a well-known set of commands specific to that component. Specifically, Swing components are Java Beans components, compliant with the Java Beans Component Architecture specifications. Customizability lets users programmatically customize a standard Swing component by assigning specific borders, colors, backgrounds, opacities, etc. Configurability allows Swing to respond at runtime to fundamental changes in its settings. Finally, look and feel allows one to specialize the look and feel of widgets by modifying the default via runtime parameters, by deriving from an existing one, by creating one from scratch, or, beginning with J2SE 5.0, by using the Look and Feel which is configured with an XML property file.
J2EE Platform
As you may already know, J2EE is a platform for executing server side Java applications. Before J2EE was born, server side Java applications were written using vendor specific APIs. Each vendor had unique APIs and architectures. This resulted in a huge learning curve for Java developers and architects, who had to learn and program with each of these API sets, and in higher costs for the companies. The development community could not reuse the lessons learnt in the trenches. Consequently the entire Java developer community was fragmented, isolated and stunted, making it very difficult to build serious enterprise applications in Java. Fortunately the introduction of J2EE and its adoption by the vendors has resulted in standardization of its APIs. This in turn reduced the learning curve for server side Java developers. The J2EE specification defines a whole lot of interfaces and a few classes. Vendors (like BEA and IBM, for instance) have provided implementations for these interfaces adhering to the J2EE specifications. These implementations are called J2EE Application Servers.
The J2EE application servers provide infrastructure services such as threading, pooling and transaction management out of the box. The application developers can thus concentrate on implementing business logic. Consider a J2EE stack from a developer perspective. At the bottom of the stack is Java 2 Standard Edition (J2SE). J2EE Application Servers run in the Java Virtual Machine (JVM) sandbox. They expose the standard J2EE interfaces to the application developers. Two types of applications can be developed and deployed on J2EE application servers: Web applications and EJB applications.
These applications are deployed and executed in "containers". The J2EE specification defines containers for managing the lifecycle of server side components. There are two types of containers: Servlet containers and EJB containers. Servlet containers manage the lifecycle of web applications and EJB containers manage the lifecycle of EJBs.
J2EE web application
Any web application that runs in the servlet container is called a J2EE web application. The servlet container implements the Servlet and JSP specifications. It provides various entry points for handling the requests originating from a web browser. There are three entry points for the browser into the J2EE web application: Servlet, JSP and Filter. You can create your own Servlets by extending the javax.servlet.http.HttpServlet class and implementing the doGet() and doPost() methods. You can create JSPs simply by creating a text file containing JSP markup tags. You can create Filters by implementing the javax.servlet.Filter interface. The servlet container becomes aware of Servlets and Filters when they are declared in a special file called web.xml. A J2EE web application has exactly one web.xml file.
A servlet is the most basic J2EE web component. It is managed by the servlet container. All servlets implement the Servlet interface directly or indirectly. In general terms, a servlet is the endpoint for requests adhering to a protocol. However, the Servlet specification mandates implementation for servlets that handle HTTP requests only. But you should know that it is possible to implement the servlet and the container to handle other protocols such as FTP too. When writing Servlets for handling HTTP requests, you generally subclass the HttpServlet class. HTTP defines several methods of request submission, including GET, POST, PUT, HEAD and DELETE. Of these, GET and POST are the only forms of request submission relevant to most application developers. Hence your subclass of HttpServlet should implement two methods, doGet() and doPost(), to handle GET and POST respectively.
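The way a request is routed to a per-method handler can be illustrated with a language-neutral sketch. The code below is Python, not servlet code: `MiniServlet`, `ArticleServlet`, `service`, `do_get` and `do_post` are hypothetical names that only mimic the roles of a base servlet class and its doGet() and doPost() methods, and the status codes returned are choices made for this sketch.

```python
class MiniServlet:
    """Hypothetical base class: routes a request to a handler by HTTP method."""

    def service(self, method, params):
        # Look up do_get, do_post, ... on the subclass; reject anything else.
        handler = getattr(self, "do_" + method.lower(), None)
        if handler is None:
            return 405, "Method Not Allowed"
        return handler(params)


class ArticleServlet(MiniServlet):
    """Hypothetical subclass that handles only GET and POST."""

    def __init__(self):
        self.articles = {}  # article name -> description

    def do_get(self, params):
        name = params.get("name")
        if name in self.articles:
            return 200, self.articles[name]
        return 404, "Not Found"

    def do_post(self, params):
        self.articles[params["name"]] = params["description"]
        return 201, "Created"
```

A subclass that implements only the two handlers it cares about leaves every other request method to the base class, which is the division of labour the paragraph above describes for HttpServlet subclasses.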
Presentation Tier Strategies
Technologies used for the presentation tier can be roughly classified into three categories:
1. Markup based Rendering (e.g. JSPs)
2. Template based Transformation (e.g. Velocity, XSLT)
3. Rich content (e.g. Macromedia Flash, Flex, Laszlo)
Markup based Rendering
JSPs are perfect examples of markup based presentation tiers. In markup based presentation, a variety of tags are defined (just like HTML tags). The tag definitions may be purely for presentation or they can contain business logic. They are mostly client tier specific, e.g. JSP tags producing HTML content. A typical JSP is interpreted in the web container, which generates HTML as a result. This HTML is then rendered in the web browser.
In the last section, you saw how Servlets produced output HTML in addition to executing business logic. So why aren't Servlets used for the presentation tier? The answer lies in the separation of concerns essential in real world J2EE projects. Back in the days when JSPs didn't exist, servlets were all that you had to build J2EE web applications. They handled requests from the browser, invoked middle tier business logic and rendered responses in HTML to the browser. Now that's a problem. A Servlet is a Java class coded by Java programmers. It is okay to handle browser requests and have business and presentation logic in the servlets since that is where they belong. HTML formatting and rendering is the concern of the page author, who most likely does not know Java. So the question arises: how to separate these two concerns intermingled in Servlets? JSPs are the answer to this dilemma. JSPs are servlets in disguise!
The philosophy behind JSP is that the page authors know HTML. HTML is a markup language. Hence learning a few more markup tags will not cause a paradigm shift for the page authors. At least it is much easier than learning Java and OO! JSP provides some standard tags and Java programmers can provide custom tags. Page authors can write server side pages by mixing HTML markup and JSP tags. Such server side pages are called JSPs. JSPs are called server side pages because it is the servlet container that interprets them to generate HTML. The generated HTML is sent to the client browser.
JSPs are server side pages. Server side pages in other languages are parsed every time they are accessed and are hence expensive. In J2EE, the expensive parsing is replaced by generating a Java class from the JSP. The first time a JSP is accessed, its contents are parsed and an equivalent Java class is generated, so subsequent accesses are fast. Here is a twist to the story: the Java classes that are generated by parsing JSPs are nothing but Servlets! In other words, every JSP is parsed at runtime (or precompiled) to generate Servlet classes.
Presentation Logic and Business Logic – Whats the difference!
The term Lusiness !ogic refers to the middle tier logic = the core of the system usually
implemented as core J%&%. The code that controls the J$A na"igation, handles user inputs and
in"o+es appropriate business logic is referred to as Aresentation !ogic. The actual J$A = the front
end to the user contains html and custom tags to render the page and as less logic as possible. %
rule of thumb is the dumber the J$A gets, the easier it is to maintain. n reality howe"er, some of
the presentation logic percolates to the actual J$A ma+ing it tough to draw a line between the
two.
>odel 1 architecture is the easiest way of de"eloping J$A based web applications. t
cannot get any easier. n >odel 1, the browser directly accesses J$A pages. n other words, user
re*uests are handled directly by the J$A. -onsider a 4T>! page with a hyperlin+ to a J$A. When
user clic+s on the hyperlin+, the J$A is directly in"o+ed. This is shown in 'igure
The servlet container parses the JSP and executes the resulting Java servlet. The JSP
contains embedded code and tags to access the Model JavaBeans. The Model JavaBeans contain
attributes for holding the HTTP request parameters from the query string. In addition, they contain
logic to connect to the middle tier or directly to the database using JDBC to get the additional
data needed to display the page. The JSP is then rendered as HTML using the data in the Model
JavaBeans and other helper classes and tags.
Problems with Model 1 Architecture
Model 1 architecture is easy. There is some separation between content (Model JavaBeans) and
presentation (JSP). This separation is good enough for smaller applications. Larger applications
have a lot of presentation logic. In Model 1 architecture, the presentation logic usually leads to a
significant amount of Java code embedded in the JSP in the form of scriptlets. This is ugly and a
maintenance nightmare even for experienced Java developers. In large applications, JSPs are
developed and maintained by page authors. The intermingled scriptlets and markup result in an
unclear definition of roles, which is very problematic.
Application control is decentralized in Model 1 architecture, since the next page to be displayed
is determined by the logic embedded in the current page. Decentralized navigation control can
cause headaches. All this leads us to the Model 2 architecture for designing JSP pages.
Model 2 Architecture – MVC
The Model 2 architecture for designing JSP pages is, in reality, Model View Controller (MVC)
applied to web applications. Hence the two terms can be used interchangeably in the web world.
MVC originated in Smalltalk and has since made its way into the Java community. Model 2
architecture and its derivatives are the cornerstones of all serious, industrial-strength web
applications designed in the real world. Hence it is essential for you to understand this
paradigm thoroughly. The figure shows the Model 2 (MVC) architecture.
The main difference between Model 1 and Model 2 is that in Model 2, a controller handles the
user request instead of another JSP. The controller is implemented as a Servlet. The following
steps are executed when the user submits the request.
1. The Controller Servlet handles the user's request. (This means the hyperlink
in the JSP should point to the controller servlet.)
2. The Controller Servlet then instantiates appropriate JavaBeans based on the
request parameters (and optionally also based on session attributes).
3. The Controller Servlet then, by itself or through a controller helper,
communicates with the middle tier or directly with the database to fetch the required data.
4. The Controller sets the resultant JavaBeans (either the same or a new one) in one
of the following contexts: request, session or application.
5. The controller then dispatches the request to the next view based on the
request URL.
6. The View uses the resultant JavaBeans from Step 4 to display data.
The sole function of the JSP in Model 2 architecture is to display the data from the JavaBeans
set in the request, session or application scopes.
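The six steps above can be sketched in plain Java without the Servlet API. This is only an illustration under stated assumptions: `ArticleBean`, `ArticleView`, `Model2Controller` and the `"article"` attribute key are hypothetical names, a `Map` stands in for the request scope, and a static method stands in for the JSP.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical JavaBean that carries data from the controller to the view. */
class ArticleBean {
    private String title;
    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }
}

/** Stand-in for a JSP: it only reads the bean and renders markup. */
class ArticleView {
    static String render(Map<String, Object> requestScope) {
        ArticleBean bean = (ArticleBean) requestScope.get("article");
        // No HTML escaping here; a real view would escape user-supplied text.
        return "<h1>" + bean.getTitle() + "</h1>";
    }
}

/** Stand-in for the controller servlet. */
class Model2Controller {
    static String handle(Map<String, String> requestParams) {
        // Steps 1-2: handle the request and instantiate a bean for it.
        ArticleBean bean = new ArticleBean();
        // Step 3: a real controller would call the middle tier or database here.
        bean.setTitle("Article " + requestParams.get("id"));
        // Step 4: place the resulting bean in the request scope.
        Map<String, Object> requestScope = new HashMap<>();
        requestScope.put("article", bean);
        // Steps 5-6: dispatch to the view, which displays the bean's data.
        return ArticleView.render(requestScope);
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("id", "7");
        System.out.println(handle(params));
    }
}
```

The view never decides what happens next; it only reads what the controller placed in scope, which is the property the text attributes to Model 2.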
Advantages of Model 2 Architecture
Since there is no presentation logic in the JSP, there are no scriptlets. This means fewer
nightmares. [Note that although Model 2 is directed towards the elimination of scriptlets, it does not
architecturally prevent you from adding scriptlets. This has led to widespread misuse of Model 2
architecture.]
With MVC you can have as many controller servlets as you like in your web application. In fact, you
can have one Controller Servlet per module. However, there are several advantages to having a
single controller servlet for the entire web application.
In a typical web application, there are several tasks that you want to do for every
incoming request. For instance, you have to check whether the user requesting an operation is
authorized to do so. You also want to log the user's entry to and exit from the web application for
every request. You might like to centralize the logic for dispatching requests to other views. The
list goes on. If you have several controller servlets, chances are that you have to duplicate the
logic for all the above tasks in all those places. A single controller servlet for the web application
lets you centralize all the tasks in a single place. The result is elegant code that is easier to maintain.
Web applications based on Model 2 architecture are easier to maintain and extend, since
the views do not refer to each other and there is no presentation logic in the views. It also allows
you to clearly define the roles and responsibilities in large projects, thus allowing better
coordination among team members.
Controller gone bad – Fat Controller
If MVC is all that great, why do we need Struts after all? The answer lies in the difficulties
associated with applying bare-bones MVC to real-world complexities. In medium to large
applications, centralized control and processing logic in the servlet, the greatest plus of MVC, is
also its weakness. Consider a mediocre application with 1[...] JSPs. Assume that each page has five
hyperlinks (or five form submissions). The total number of user requests to be handled in the
application is [...] MVC framework, a centralized controller
servlet handles every user request. For each type of incoming request there is an "if" block in the
doGet method of the controller Servlet to process the request and dispatch to the next view. For
this mediocre application of ours, the controller Servlet has [...]
MVC with configurable controller
When an application gets large, you cannot stick to bare-bones MVC. You have to extend it somehow
to deal with these complexities. One mechanism of extending MVC that has found
widespread adoption is based on a configurable controller Servlet. The MVC with configurable
controller servlet is shown in the figure.
When the HTTP request arrives from the client, the Controller Servlet looks up in a
properties file to decide on the right Handler class for the HTTP request. This Handler class is
referred to as the Request Handler. The Request Handler contains the presentation logic for that
HTTP request, including business logic invocation. In other words, the Request Handler does
everything that is needed to handle the HTTP request. The only difference so far from
bare-bones MVC is that the controller servlet looks up in a properties file to instantiate the Handler
instead of calling it directly.
At this point you might be wondering how the controller servlet would know to
instantiate the appropriate Handler. The answer is simple. Two different HTTP requests cannot
have the same URL. Hence you can be certain that the URL uniquely identifies each HTTP
request on the server side, and hence each URL needs a unique Handler. In simpler terms, there is
a one-to-one mapping between the URL and the Handler class. This information is stored as
key-value pairs in the properties file. The Controller Servlet loads the properties file on startup to find
the appropriate Request Handler for each incoming URL request.
The controller servlet uses Java Reflection to instantiate the Request Handler. However,
there must be some sort of commonality between the Request Handlers for the servlet to
instantiate them generically. The commonality is that all Request Handler classes
implement a common interface. Let us call this common interface the Handler Interface. In its
simplest form, the Handler Interface has one method, say, execute(). The controller servlet reads
the properties file to instantiate the Request Handler.
The Controller Servlet instantiates the Request Handler in the doGet() method and
invokes the execute() method on it using Java Reflection. The execute() method invokes
appropriate business logic from the middle tier and then selects the next view to be presented to
the user. The controller servlet forwards the request to the selected JSP view. All this happens in
the doGet() method of the controller servlet. The doGet() method lifecycle never changes.
What changes is the Request Handler's execute() method. You may not have realized it,
but you just saw how Struts works in a nutshell! Struts is a controller-servlet-based configurable
MVC framework that executes predefined methods in the handler objects. Instead of using a
properties file, Struts uses XML to store more useful information.
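The configurable controller described above can be sketched in plain Java. This is a minimal illustration, not Struts code: `HandlerInterface`, `ListArticlesHandler`, `ConfigurableController` and the `/listArticles` path are hypothetical names, the properties text is passed in as a string rather than read from a file, and the handler is instantiated by reflection but then called through the common interface.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical common interface that every request handler implements. */
interface HandlerInterface {
    /** Runs the handler and returns the logical name of the next view. */
    String execute(Map<String, String> requestParams);
}

/** Hypothetical handler for the /listArticles URL. */
class ListArticlesHandler implements HandlerInterface {
    @Override
    public String execute(Map<String, String> requestParams) {
        // Business logic would be invoked here; this sketch only picks a view.
        return "articleList.jsp";
    }
}

/** Stand-in for the controller servlet: maps a URL path to a handler class. */
class ConfigurableController {
    private final Properties urlToHandler = new Properties();

    ConfigurableController(String propertiesText) throws IOException {
        // A real servlet would load this once at startup from a properties file.
        urlToHandler.load(new StringReader(propertiesText));
    }

    /** Mirrors what doGet() would do: look up, instantiate, execute. */
    String handle(String urlPath, Map<String, String> params)
            throws ReflectiveOperationException {
        String className = urlToHandler.getProperty(urlPath);
        if (className == null) {
            throw new IllegalArgumentException("No handler configured for " + urlPath);
        }
        // Reflection: the controller never names a concrete handler class in code.
        HandlerInterface handler = (HandlerInterface) Class.forName(className)
                .getDeclaredConstructor()
                .newInstance();
        return handler.execute(params);
    }

    public static void main(String[] args) throws Exception {
        String config = "/listArticles=" + ListArticlesHandler.class.getName() + "\n";
        ConfigurableController controller = new ConfigurableController(config);
        System.out.println(controller.handle("/listArticles", new HashMap<>()));
    }
}
```

Adding a new request type then means adding one handler class and one key-value line, with no new "if" block in the controller.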
Struts
In Struts, there is only one controller servlet for the entire web application. This
controller servlet is called ActionServlet and resides in the package org.apache.struts.action.
It intercepts every client request and populates an ActionForm from the HTTP request
parameters. ActionForm is a normal JavaBeans class. It has several attributes corresponding to
the HTTP request parameters, and getter and setter methods for those attributes. You have to create
your own ActionForm for every HTTP request handled through the Struts framework by
extending the org.apache.struts.action.ActionForm class.
For lack of better terminology, let us coin a term to describe classes such as ActionForm:
View Data Transfer Object. A View Data Transfer Object is an object that holds the data from the HTML
page and transfers it around in the web tier framework and application classes.
The ActionServlet then instantiates a Handler. The Handler class name is obtained from
an XML file based on the URL path information. This XML file is referred to as the Struts
configuration file and is by default named struts-config.xml.
The Handler is called Action in Struts terminology. This class is created by extending the
Action class in the org.apache.struts.action package. The Action class defines a
method called execute(). You override this method in your own Actions and invoke the business
logic in this method. The execute() method returns an ActionForward that identifies the next view (JSP) to be shown to
the user. The ActionServlet forwards to the selected view.
Now, that was Struts in a nutshell. Struts is of course more than just this. It is a full-fledged
presentation framework. Throughout the development of the application, both the page author
and the developer need to coordinate and ensure that any changes to one area are appropriately
handled in the other. Struts aids in rapid development of web applications by separating the concerns
in projects. For instance, it has custom tags for JSPs. The page author can concentrate on
developing the JSPs using custom tags that are specified by the framework. The application
developer works on creating the server-side representation of the data and its interaction with a
back-end data repository. Further, it offers a consistent way of handling user input and processing
it.
2.2. Operating Environment
2.2.1. Hardware Requirements
The hardware requirements of the project are summarized in the following table.
Sl No   Parameter                       Description
1       RAM                             [...]00MB-1GB
2       Hard Disk                       120GB-1[...]0GB
3       Java Development Kit Version    JDK 1.[...]
4       Database                        MySQL
5       Database Front End              HeidiSQL, Toad for MySQL
6       Tool for Java Development       Eclipse
7       Front End Technology            JSP
8       Framework                       Spring Framework
9       Server                          Tomcat 8.0
2.2.2. Software Requirements
The software requirements are summarized in the following table.
Sl No   Parameter Name                                  Parameter Value
1       Development Language                            Java
2       Java Development Kit Version                    JDK 1.[...]
3       Java Run Time Environment                       JRE
4       Database for Routing Tables (Backend)           MySQL
5       Database Front End for Routing Tables           HeidiSQL
6       Database Front End for exporting Excel Sheets   Toad for MySQL
7       Development Tool                                Eclipse
8       Server Type                                     Web Server
9       Web Server                                      Tomcat [...].0
11      Framework Used                                  Struts Framework
12      View Technology Used                            Java Server Pages
13      Designing                                       Cascading Style Sheets
2.3. Functional Requirements
The following are the functional requirements of the project:
1. Article Submission - This module is responsible for storage of articles.
2. Data Cleaning - This module is used to remove stop words from the Article.
3. Tokenization - This process is used to obtain all the keywords of the Article and assign
each a unique id as well as the web site id.
4. Syntactic Relation - This module is responsible for finding the syntactic relations of
articles, i.e. the verbs, adverbs and adjectives of articles.
5. Semantic Relation - This module is used to find the various semantic relations, e.g.
hypernymy.
6. Score Computation - This is used to measure the score with respect to the syntactic and
semantic relations.
2.4. Non functional requirements
Interface requirements
• How will the new system interface with its environment?
• User interfaces and "user-friendliness"
• Interfaces with other systems
Performance requirements
• time/space bounds: workloads, response time, throughput and available storage space, e.g. "the system must handle 1,000 transactions per second"
• reliability: the availability of components and the integrity of information maintained and supplied to the system, e.g. "system must have less than 1hr downtime per three months"
• security: e.g. permissible information flows, or who can do what
• survivability: e.g. system will need to survive fire, natural catastrophes, etc.
Operating requirements
• physical constraints (size, weight)
• personnel availability & skill level
• accessibility for maintenance
• environmental conditions
8. Summary
This chapter describes the Software Requirements Specification: the Operating
Environment (Hardware Requirements and Software Requirements), Functional Requirements,
Non-functional Requirements, User Characteristics, Applications of the Project and Advantages of the System.
Chapter 3
High Level Design
3.1. High Level Design
Design is one of the most important phases of software development. Design is a
creative process in which a system organization is established that will satisfy the functional
and non-functional system requirements. Large systems are always decomposed into
sub-systems that provide some related set of services. The output of the design process is a
description of the software architecture.
Data Flow Diagram - Level 0
Level 0 is the initial-level data flow diagram and is generally called the
context-level diagram. It is common practice for a designer to draw a context-level
DFD first, which shows the interaction between the system and outside entities. This
context-level DFD is then exploded to show more detail of the system being
modeled.
Fig: DFD Level 0 (Articles -> Identify Syntax and Semantic Relation -> Relation Matrix and Similarity)
Fig: DFD Level 1 (Article Submission -> Data Cleaning -> Tokenization -> Syntactic Relation -> Semantic Relation -> Score and Similarity Obtained)
3.4.1 Data Flow Diagram - Level 2
Fig. 3.4 Level 2 (elements shown: Articles; Read the List of Articles; Stopwords; Data Cleaning; Tokenization; Find all Syntax Relations: verb, adverb and adjective; Find all Semantic Relations; Score and Similarity Obtained)
3.[...] Activity diagram
Fig: Activity Diagram (elements shown: JSP, web.xml, servlet.xml, Model, Controller, Delegate, Service, Data Access, Database)
The above figure describes the system architecture that is followed in industry
for the development of any routing software.
The figure shows that the user interface is designed in HTML/JSP pages. A request goes
to the web container, which verifies the request against the web.xml file: it first looks at
the URL pattern, then takes the corresponding servlet name, searches for that servlet name
in the servlet tag, looks up the servlet class and creates an object of the ActionServlet.
The ActionServlet then delegates its job to the Request Processor.
The Request Processor looks up the action to be called in struts-config.xml; the
corresponding action form is called, and then the action is called. The action class then
calls the delegate, the delegate calls the service, and the service calls the Data Access
layer. The results travel back in exactly the opposite direction, and the resultant JSP page
is loaded.
Model
This is the Plain Old Java Object (POJO) which has the getters and setters. The setters get
called automatically, so the data the user has entered becomes available.
Controller
This is the class which fetches the user-entered data, processes it, calls the delegate
layer and obtains the results.
Delegate
The delegate is the layer which contains nothing but a call to an appropriate service.
Service
This is the layer which is responsible for the entire algorithmic implementation; it
contains the heavyweight implementation of all the algorithms. Further, the
service requires the help of the Data Access Layer for some operations, along with many other
helper classes.
Data Access Layer
This is the layer which deals only with the CRUD operations, namely Create, Retrieve,
Update and Delete. It has no other usage. This layer has been used to fetch the data
from the routing tables.
Database
This is the place where all the tables are stored.
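The layering described above (Delegate, Service, Data Access Layer) can be sketched as follows. This is an assumption-laden illustration only: the class names `ArticleDao`, `ArticleService` and `ArticleDelegate` are hypothetical, an in-memory map stands in for the MySQL tables, and word counting stands in for the project's real algorithms.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Data Access Layer: CRUD only. Backed by a map here instead of a database. */
class ArticleDao {
    private final Map<String, String> table = new LinkedHashMap<>();

    void create(String name, String description) { table.put(name, description); }
    String retrieve(String name) { return table.get(name); }
    void update(String name, String description) { table.replace(name, description); }
    void delete(String name) { table.remove(name); }
    List<String> retrieveAllNames() { return new ArrayList<>(table.keySet()); }
}

/** Service layer: holds the algorithmic work and uses the DAO for storage. */
class ArticleService {
    private final ArticleDao dao;

    ArticleService(ArticleDao dao) { this.dao = dao; }

    /** Toy "algorithm": counts whitespace-separated words in a stored article. */
    int wordCount(String articleName) {
        String text = dao.retrieve(articleName);
        if (text == null || text.trim().isEmpty()) {
            return 0;
        }
        return text.trim().split("\\s+").length;
    }
}

/** Delegate layer: nothing but a call to the appropriate service. */
class ArticleDelegate {
    private final ArticleService service;

    ArticleDelegate(ArticleService service) { this.service = service; }

    int wordCount(String articleName) { return service.wordCount(articleName); }
}

class LayerDemo {
    public static void main(String[] args) {
        ArticleDao dao = new ArticleDao();
        dao.create("a1", "the web has grown fast");
        ArticleDelegate delegate = new ArticleDelegate(new ArticleService(dao));
        System.out.println(delegate.wordCount("a1"));
    }
}
```

Because the controller would only ever see the delegate, the storage behind `ArticleDao` could be swapped for JDBC calls without touching the upper layers.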
3.[...]. Use case diagrams
The Use Case Diagram is described in the following figure.
Fig: Use Case Diagram (actor: User; use cases: Article Submission, Data Cleaning, Stop Words, Tokenizing, Syntactic Relation, Semantic Relation, Score Computation)
Chapter 4
Detailed design
4.1. Purpose
Detailed Design is a phase wherein the internal logic of each of the modules specified in
the high-level design is determined. In this phase, the details and algorithmic design of each of the
modules are specified. Other low-level components and subcomponents are also described. Each
subsection of this section refers to or contains a detailed description of a system software
component. This chapter also discusses the control flow in the software in more
detail by explaining each module's functionality.
This chapter presents the following:
• Life Cycle of Generic Flow
• Flowchart for each module
4.1.1 Life Cycle of Generic Flow
This section deals with the life cycle of Security Vulnerability Detection, Analysis and
Remediation in Enterprise Applications, and with the state diagrams and possible transitions between the
states.
Fig: Life Cycle of the Process (View JSP -> Action Form -> Action)
The following are the stages for any user action:
1. View - this is the location in which the user enters the data and performs some action.
2. Action Form - this is the POJO which contains the variables as defined in the view, along
with their setter and getter methods. The data gets bound automatically.
3. Action - this is a class which contains the execute method, which is responsible for
handling the logic of the project and for delegating the result to an
appropriate view. It also makes use of helper methods to perform business logic.
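The automatic binding mentioned in stage 2 can be approximated with plain reflection. The sketch below is not how Struts itself populates a form; `ArticleForm`, `FormBinder` and the parameter names are hypothetical, and only `String` setters are handled.

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical form bean for the article submission view. */
class ArticleForm {
    private String articleName;
    private String articleDescription;

    public String getArticleName() { return articleName; }
    public void setArticleName(String value) { this.articleName = value; }
    public String getArticleDescription() { return articleDescription; }
    public void setArticleDescription(String value) { this.articleDescription = value; }
}

class FormBinder {
    /** Calls setXxx(String) on the bean for every request parameter named xxx. */
    static void populate(Object bean, Map<String, String> params)
            throws ReflectiveOperationException {
        for (Map.Entry<String, String> entry : params.entrySet()) {
            String name = entry.getKey();
            if (name.isEmpty()) {
                continue;
            }
            String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
            try {
                Method method = bean.getClass().getMethod(setter, String.class);
                method.invoke(bean, entry.getValue());
            } catch (NoSuchMethodException ignored) {
                // Parameters without a matching setter are skipped.
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("articleName", "Web growth");
        params.put("articleDescription", "Near-duplicate documents on the Web");
        ArticleForm form = new ArticleForm();
        populate(form, params);
        System.out.println(form.getArticleName());
    }
}
```

The action then works with a typed bean instead of raw request parameters, which is the role the Action Form plays in the life cycle above.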
4.2. User Interface Design
4.2.1. Hardware Interfaces
There are no specific hardware interfaces used in the system.
4.2.2. Software Graphical User Interfaces
Abbreviated GUI (pronounced GOO-ee): a program interface that takes advantage of
the computer's graphics capabilities to make the program easier to use. Well-designed graphical
user interfaces can free the user from learning complex command languages. On the other hand,
many users find that they work more effectively with a command-driven interface, especially if
they already know the command language.
Graphical user interfaces, such as Microsoft Windows and the one used by the Apple
Macintosh, feature the following basic components:
• pointer - A symbol that appears on the display screen and that you move
to select objects and commands. Usually, the pointer appears as a small angled
arrow. Text-processing applications, however, use an I-beam pointer that is shaped like a
capital I.
• pointing device - A device, such as a mouse or trackball, that enables you to select
objects on the display screen.
• icons - Small pictures that represent commands, files, or windows. By moving the
pointer to the icon and pressing a mouse button, you can execute a command
or convert the icon into a window. You can also move the icons around the display screen
as if they were real objects on your desk.
• desktop - The area on the display screen where icons are grouped is often referred to
as the desktop because the icons are intended to represent real objects on a real desktop.
• windows - You can divide the screen into different areas. In each window, you
can run a different program or display a different file. You can move windows around the
display screen, and change their shape and size at will.
• menus - Most graphical user interfaces let you execute commands by selecting a
choice from a menu.
The graphical user interface is developed using HTML and JSP.
JavaServer Pages Technology
JavaServer Pages (JSP) technology allows you to easily create web content that has both
static and dynamic components. JSP technology makes available all the dynamic capabilities of
Java Servlet technology but provides a more natural approach to creating static content. The
main features of JSP technology are as follows:
1. A language for developing JSP pages, which are text-based documents that describe how
to process a request and construct a response
2. An expression language for accessing server-side objects
3. Mechanisms for defining extensions to the JSP language
A JSP page is a text document that contains two types of text: static data, which
can be expressed in any text-based format (such as HTML, SVG, WML, and XML), and
JSP elements, which construct dynamic content.
Fig: Life Cycle of the Process (View JSP -> Model -> Controller)
The following are the stages for any user action:
1. View - this is the location in which the user enters the data and performs some action.
2. Model - this is the POJO which contains the variables as defined in the view, along with
their setter and getter methods. The data gets bound automatically.
3. Controller - this is a class which contains the execute method, which is responsible
for handling the logic of the project and for delegating the result to an
appropriate view. It also makes use of helper methods to perform business logic.
The detailed design of each module is described below using flowcharts.
Article Module
The article module is responsible for storage of articles. The article name and article
description act as the input.
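The check performed by the article module can be sketched as below. This is a hypothetical in-memory version: `ArticleStore` and `submit` are illustrative names, a map replaces the database table, and rejecting a blank name is an added assumption rather than something the report states.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for the article table. */
class ArticleStore {
    private final Map<String, String> articles = new LinkedHashMap<>();

    /**
     * Stores the article if its name is not already present.
     * Returns true on successful storage and false on a validation error
     * (blank name, which is an assumption of this sketch, or duplicate name).
     */
    boolean submit(String name, String description) {
        if (name == null || name.trim().isEmpty()) {
            return false;
        }
        String key = name.trim();
        // Check the new name against the list of existing article names.
        if (articles.containsKey(key)) {
            return false;
        }
        articles.put(key, description == null ? "" : description);
        return true;
    }

    int size() { return articles.size(); }

    public static void main(String[] args) {
        ArticleStore store = new ArticleStore();
        System.out.println(store.submit("Web growth", "first article"));
        System.out.println(store.submit("Web growth", "same name again"));
    }
}
```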
Fig: Article Module (flowchart: Start -> input Article Name and Article Description -> retrieve the list of article names in the application -> check whether the article is in the article name list; if yes, validation error; otherwise, storage of the article is successful)
4.3 Data Cleaning
This is the process in which the data is cleaned of unwanted symbols and stop
words. The data first undergoes a delimitation process and then a cleaning process.
The set of stop words used in the case of data mining is given in the list below.
List of Stop Words
Stop Word
a
about
above
across
after
afterwards
again
against
all
almost
alone
along
already
also
although
always
am
among
amongst
amoungst
amount
an
and
another
any
anyhow
anyone
anything
anyway
anywhere
are
around
as
at
back
be
became
because
become
becomes
becoming
been
before
beforehand
behind
being
below
beside
besides
between
beyond
bill
both
bottom
but
by
call
can
cannot
cant
co
computer
con
could
couldnt
cry
de
describe
detail
do
done
down
due
during
each
eg
eight
either
eleven
else
elsewhere
empty
enough
etc
even
ever
every
everyone
everything
everywhere
except
few
fifteen
fifty
fill
find
fire
first
five
for
former
formerly
forty
found
four
from
front
full
further
get
give
go
had
has
hasnt
have
he
hence
her
here
hereafter
hereby
herein
hereupon
hers
herself
him
himself
his
how
however
hundred
i
ie
if
in
inc
indeed
interest
into
is
it
its
itself
keep
last
latter
latterly
least
less
ltd
made
many
may
me
meanwhile
might
mill
mine
more
moreover
most
mostly
move
much
must
my
myself
name
namely
neither
never
nevertheless
next
nine
no
nobody
none
noone
nor
not
nothing
now
nowhere
of
off
often
on
once
one
only
onto
or
other
others
otherwise
our
ours
ourselves
out
over
own
part
per
perhaps
please
put
rather
re
same
see
seem
seemed
seeming
seems
serious
several
she
should
show
side
since
sincere
six
sixty
so
some
somehow
someone
something
sometime
sometimes
somewhere
still
such
system
take
ten
than
that
the
their
them
themselves
then
thence
there
thereafter
thereby
therefore
therein
thereupon
these
they
thick
thin
third
this
those
though
three
through
throughout
thru
thus
to
together
too
top
toward
towards
twelve
twenty
two
un
under
until
up
upon
us
very
via
was
we
well
were
what
whatever
when
whence
whenever
where
whereafter
whereas
whereby
wherein
whereupon
wherever
whether
which
while
whither
who
whoever
whole
whom
whose
why
will
with
within
without
would
yet
you
your
yours
yourself
yourselves
Module 2 Data Cleaning
The flowchart for the Data Cleaning process is given below.
[Flowchart: Data Extraction Using Delimiter and Data Cleaning using Stop Words; Repository; Unclean Data; Stop Words; Repository]
Fig: Data Cleaning Process Flowchart
Module 3 Tokenization
After the data is cleaned, the token extraction process begins, in which all the words
in the articles are treated as tokens and extracted. Token extraction is again done with the
help of delimiters. The flowchart for token extraction is given below.
[Flowchart: Start -> Website URL -> Extract the individual words with the help of a delimiter such as a comma or a space -> Clean the symbols and remove the word if it is a stop word -> Clean data is stored in the repository -> Stop]
Fig: Keyword Extraction Process
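The cleaning steps described above (split the text on a delimiter such as a comma or a space, clean the symbols, remove stop words) can be sketched in Python. This is only a minimal illustration under stated assumptions, not the project's implementation: the function name clean_text, the regular expressions, and the small STOP_WORDS subset are hypothetical choices made for the example.

```python
import re

# Illustrative subset of the stop-word list given earlier in this section;
# a real run would load the complete list from the stop words repository.
STOP_WORDS = {"the", "is", "of", "in", "to", "by", "can", "for", "with"}


def clean_text(raw_text, stop_words=STOP_WORDS):
    """Split on commas/whitespace, strip symbols, and drop stop words."""
    cleaned = []
    for word in re.split(r"[,\s]+", raw_text):
        # Remove every character that is not a letter or digit, then lowercase.
        word = re.sub(r"[^A-Za-z0-9]", "", word).lower()
        # Skip empty strings (left by leading delimiters or pure symbols)
        # and words that belong to the stop-word list.
        if word and word not in stop_words:
            cleaned.append(word)
    return cleaned


if __name__ == "__main__":
    print(clean_text("The Web, is growing: fast!"))
```

Holding the stop words in a set keeps each membership check constant-time, which matters when every word of every article is tested against the list.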
[Flowchart: Data Extraction Using Delimiter; Keywords Repository; Clean Data]
The flowchart for the Text Extraction process is given below.
Fig: Token Extraction Process Flowchart
Fig shows the token extraction process, where the clean data is scanned to obtain
[Flowchart: Start; Clean data; Extract the individual words with the help of a delimiter such as a comma or a space; Stop; i <= no. of tokens; Store token in repository; i = 1]
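The token extraction loop can be sketched in Python as follows. This is a hedged reading of the flowchart, assuming a counter i that starts at 1 and runs while i is at most the number of tokens, storing one token per step; the function name extract_tokens and the use of a plain list as the repository are hypothetical stand-ins introduced for illustration.

```python
import re


def extract_tokens(clean_data):
    """Split clean data on commas or whitespace and store each token in turn."""
    # Extract the individual words with the help of a delimiter
    # (comma or whitespace), discarding empty strings.
    words = [w for w in re.split(r"[,\s]+", clean_data) if w]

    repository = []         # hypothetical in-memory stand-in for the token repository
    i = 1                   # assumed: the counter starts at 1
    while i <= len(words):  # assumed: loop while i <= number of tokens
        repository.append(words[i - 1])  # store the i-th token
        i += 1
    return repository


if __name__ == "__main__":
    print(extract_tokens("web,growth documents"))
```

The explicit counter mirrors the flowchart; in everyday Python the same result is obtained by returning the filtered list directly.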