TEXT MINING BASED RETRIEVE SIMILARITY
CONTENT WEBPAGE IN WEB MINING TECHNIQUES
S. Amudha*, Dr. I. Elizabeth Shanthi**
*(Ph.D Research Scholar, Department of Computer Science, Avinashilingam Institute for Home Science and Higher Education for Women, Coimbatore, India, [email protected])
**(Professor, Department of Computer Science, Avinashilingam Institute for Home Science and Higher Education for Women, Coimbatore, India)
International Journal of Pure and Applied Mathematics, Volume 119 No. 12 2018, 13571-13583. ISSN: 1314-3395 (on-line version). url: http://www.ijpam.eu. Special Issue.
Abstract:
Search engines index a huge amount of information on the web. A search engine retrieves content based on the user's query, and the user views some of the pages in the search results. Users' views of web information produce the ranking values that are assigned to web pages for retrieval. Often the results for a user's query do not contain documents relevant to the user's search, and relevant documents do not receive the highest ranking values. The proposed work overcomes this drawback in order to retrieve more relevant documents. The proposed framework consists of three parts: (i) web crawler: retrieves web page content from the search engine based on the user's query; (ii) preprocessing: tokenization (extracting nonempty sequences of characters excluding spaces and punctuation), stop word removal (removing function words and connective words) and stemming (removing inflections that convey part of speech, tense and number); (iii) similarity: finding similar web content among the documents retrieved by the web crawler using clustering techniques.
Keywords: web crawler, search engine, information retrieval.
I. INTRODUCTION
Nowadays the World Wide Web (WWW) is considered to be the best source of information. Its importance is mainly due to easy access, low cost and responsiveness to users' needs in the shortest time [10]. Because of the vast number of web pages that exist, analyzing and clustering the results is still the most important challenge in the design of search engines, and more than half of all web pages retrieved by any search engine have been reported to be irrelevant. Search engines are the major tools for finding and accessing content on the web. Whenever users seek information, they enter their query in a search engine. The search engine searches through web pages and returns a list of relevant ones [6].
Current web search techniques are not directly suited to indexing and retrieving semantic mark-up. A document is treated as a bag of words, where words or word variants are recognized as indexing terms. Existing semantic mark-up is either simply ignored by many search engines for indexing purposes or not processed in a way that allows the mark-up to be distinguished from other text during the search. The upcoming web search is no longer limited to matching keywords of the query against documents; instead, complex information needs can be expressed in a structured way, with precise and structured answers as results. The kind of search in which a user's information needs are addressed by considering the meaning of the user's query as well as the available resources is referred to as semantic search [12].
One of the most challenging issues in any web search engine is finding high quality web pages. The quality of a page is defined by user preferences. The problem of ranking is then to sort web pages based on users' requests or preferences. To make the web more interesting and productive, we need a good and efficient ranking algorithm for crawling and searching [2].
The reason search results are ranked in an
information retrieval (IR) system derives from
the assumption that information-seeking users
should get all the information relevant to their
search query and only that information. Although
mathematical and statistical methods of varying
complexity do exist to determine the relevance of
a search result, such methods use algorithms to
integrate assumptions of relevance. But it is the
subjective relevance of a result that matters to the
user in the end, "because an information-retrieval
system exists only to serve its users" [4].
The workflow of a web crawler can be described
roughly as follows [13]:
(1) A search engine assigns some URLs as the
initial URLs for every web crawler. Then,
the web crawler pushes them into a URL
queue (queued URLs) in which each one
instructs the web crawler where to travel in
the Web.
(2) The web crawler starts working with the
initial URLs.
(3) When the web crawler retrieves web pages, it
extracts all of the URLs (current URLs) in
the web pages.
(4) The web crawler adds them to the queued
URLs.
(5) Thereafter, to continue crawling, the web
crawler chooses URLs from the queued
URLs and deletes these crawled
URLs.
(6) The web crawler repeats (2) to (5) until no
URLs remain in the queue of URLs.
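The queue-driven loop in steps (1) to (6) can be sketched as follows. This is a minimal illustration in Python rather than the authors' Java implementation: `TOY_WEB` is a hypothetical in-memory link graph standing in for real HTTP fetches, and the `seen` set is an added safeguard (not part of the listed steps) that keeps a URL from being queued twice.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the URLs found on that page.
# A real crawler would fetch pages over HTTP instead of reading this dict.
TOY_WEB = {
    "a.example/": ["a.example/x", "b.example/"],
    "a.example/x": ["a.example/"],
    "b.example/": ["b.example/y", "a.example/x"],
    "b.example/y": [],
}

def crawl(seed_urls, web):
    """Return URLs in the order they are crawled, following steps (1)-(6)."""
    queue = deque(seed_urls)           # (1) seed URLs go into the URL queue
    seen = set(seed_urls)              # added safeguard: never queue a URL twice
    crawled = []
    while queue:                       # (6) repeat until the queue is empty
        url = queue.popleft()          # (5) choose a queued URL and remove it
        crawled.append(url)            # (2)/(3) "retrieve" the page
        for link in web.get(url, []):  # (3) extract the URLs on the page
            if link not in seen:
                seen.add(link)
                queue.append(link)     # (4) add them to the queued URLs
    return crawled

order = crawl(["a.example/"], TOY_WEB)
```

Using a FIFO queue gives a breadth-first crawl; a priority queue keyed on a ranking score would be the natural substitution for a focused crawler.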
Currently, the most classic web structure algorithm is the PageRank algorithm, which Sergey Brin and Larry Page proposed at Stanford University. To verify the performance of the algorithm, they successfully applied it to the Google search engine prototype, and Google has since become the world's most well-known search engine.
Many of the existing page ranking algorithms are based on connectivity. Graph theory applied to networks plays an important role in page ranking, and many algorithms use the in-links and out-links of a page to rank it. One more important aspect of graph theory involves the concept of eccentricity. The smaller the eccentricity of a page, the greater its reachability. That is, the rank will be high for those pages which have a small eccentricity value [8].
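To make the eccentricity idea concrete, the sketch below computes the eccentricity of a page as the largest shortest-path distance (in links) from that page to any page it can reach, using breadth-first search. The graph `LINKS` is a hypothetical five-page chain, and ordering pages by ascending eccentricity is only an illustration of the statement above, not the ranking method of [8].

```python
from collections import deque

def eccentricity(graph, start):
    """Largest shortest-path distance (in links) from start to any reachable page."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return max(dist.values())

# Hypothetical link graph: a chain A - B - C - D - E with links in both directions.
LINKS = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}

# Pages with smaller eccentricity come first (the most central page leads).
ranked = sorted(LINKS, key=lambda page: eccentricity(LINKS, page))
```

In this chain the middle page C reaches every other page within two links, so it has the smallest eccentricity and is listed first.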
Web information retrieval may be defined as the application of information retrieval theories and methodologies to the World Wide Web. The web information retrieval task faces several challenges compared with classic information retrieval, for the following reasons [11]:
• The difference in size between the document collections used for classic information retrieval and the web makes the task of web information retrieval a tedious one.
• The structure of the web is another important factor. On the web, the links between documents exhibit unique patterns.
• The web exhibits dynamic behavior.
• The information on the web is heterogeneous in nature, with multiple types of document formats coexisting.
• Most of the content on the web is duplicated.
• The task of web information retrieval needs to deal with several types of users, from professionals to naive users.
II. RELATED WORK
Iman Rasekh (2015) proposed a new type of web page search based on competitive intelligence and used link-based ranking to identify user preferences. The system takes keywords from the user and retrieves information based on those keywords. The retrieved information is then analyzed and stored in the system. Next, the system finds the relationships between users and their behavior, and the web pages are classified. Finally, the ICA semantic algorithm produces the final result for the user.
Suruchi Chawla (2016) developed a method for optimal ranking of clicked URLs using a genetic algorithm based on clustered web page query sessions for personalised web search. The system uses a dataset of query sessions collected from the web in three domains: academics, entertainment and sports. It improves the average precision of personalised web search through cluster-based optimal ranking of clicked URLs in the selected domains and places more relevant documents among the top URLs. Personalised web search (PWS) using optimally ranked clicked URLs is more effective at producing relevant documents.
Vali Derhami, Elahe Khodadadian, Mohammad Ghasemzadeh and Ali Mohammad Zareh Bidoki (2013) developed two new algorithms using the reinforcement learning concept from artificial intelligence. The experiments use benchmark datasets such as LETOR and the dotIR data collection, with three common evaluation measures: precision, mean average precision and normalized discounted cumulative gain. The system calculates a score for every web page, treating each page as a state with an associated value. The RL Rank algorithm iteratively computes the score of a web page based on the connectivity of the out-links from the current page. The second algorithm combines the content-based BM25 algorithm with the RL Rank algorithm. Both algorithms improve on the results of existing ranking systems.
Vikas Jindal, Seema Bawa and Shalini Batra (2014) describe semantic search on the web using different ranking algorithms. They consider semantics-based relevancy ranking approaches that are appropriate for retrieving relevant information. The review examines these approaches according to their methodologies and the unique characteristics of their ranking processes. It covers the classical IR-based search model and semantic search models, in which ranking involves three stages: entity ranking, relationship ranking and semantic document ranking. The review identifies a number of parameters of semantic search on the web that are used directly or indirectly in the ranking process.
Ali Mohammad Zareh Bidoki and Nasser Yazdani proposed DistanceRank, an intelligent ranking algorithm for web pages. DistanceRank is a recursive method based on reinforcement learning that considers the distance between web pages. The system computes the rank of web pages from the average number of clicks between two pages. It was evaluated on the University of California at Berkeley's web site, with five million web pages, in two scenarios: crawl scheduling and rank ordering. Finally, the rank ordering of DistanceRank was compared with that of the PageRank algorithm and the Google ranking, with and without user queries. The DistanceRank algorithm produced 5% more throughput than the other algorithms.
Ahmet Selman Bozkir and Ebru Akcapinar Sezer (2018) proposed a layout-based calculation of web page similarity ranks that considers both structural and vision-based features. The system considers two categories. In the first category, structural similarities are analysed through visual inspection of DOM trees, using five types of structural layout components together with whitespace. In the second category, a computer vision method, the histogram of oriented gradients (HOG), is employed to capture edge orientation. The feature extraction phase uses spatial pyramid matching. The paper achieved three goals: the visual layouts of web pages were mapped and compared in a multi-resolution scheme, the intermediate process of visual segmentation was removed, and efficient, easily comparable web page layout signatures were generated.
Gabrielle Demange (2017) studied settings in which evaluations between two groups abound, for instance between buyers and sellers. The system uses a ranking method that assigns scores to the members of each side based on these evaluations, and the mutual centrality method is characterized by two properties. Finally, the mutual centrality and congruence methods coincide for affiliation networks. The characterization applies to any pair of evaluation matrices and to error minimization in affiliation networks.
Bo Yang, Hechang Chen, Xuehua Zhao, Masato Naka and Jing Huang (2015) developed a probabilistic counting-based method to quantitatively and efficiently compute the diversity of inbound hyperlinks, and the Drank algorithm to rank pages by simultaneously analysing the quantity, quality and diversity of their inbound hyperlinks. The Drank algorithm computes the diversity of each pair of pages, adjusts hyperlink weights based on diversity, and computes page authority according to the updated hyperlink weights.
Christiane Behnert and Dirk Lewandowski (2015) proposed that library information systems consider ranking approaches adapted from web search engines. They arrange ranking factors into six groups: text statistics, popularity, freshness, locality and availability, content properties, and user background. The first group determines the relevance of content using relevancy ranking, and the popularity group is based on citation analysis. The remaining groups also play a major role in relevancy ranking.
S Hariharan, S Dhanasekar and Kalyani Desikan (2015) developed reachability-based web page ranking using Haar wavelets with multi-resolution. The system represents page ranking as a structured signal built from the in-link, out-link and reachability values of the web pages in network graphs. The ranking of web pages uses the averages and coefficients of the input signal and a down-sampling process. Finally, they compare the original page rank with category-based page rank and find that category-based page rank produces better results.
YaJun Du and YuFeng Hai (2013) proposed a new similarity measure based on formal concept analysis (FCA) for ranking web pages using a user's web log. Their algorithm finds the intension and extension similarity, analyzes a user's browsing pattern through hyperlinks, and finds the information similarity between two nouns using the user's web log. The system computes the semantic similarity between two concepts and the similarity ranking of web pages in their own focused web crawler. They showed that the semantic ranks of web pages are useful and efficient for a web crawler's choice of which web pages to continue working on.
Michael Scholz, Jella Pfeiffer and Franz Rothlauf (2017) proposed using a page ranking algorithm for non-personalized default product rankings on the landing pages of online stores. Their product centrality ranking algorithm (PCRA) uses the PageRank centrality of products in a product domination graph to find their rank values. The graph consists of nodes and edges: nodes represent products and edges represent dominance relationships between products. The PCRA algorithm achieves more accurate rankings than existing algorithms.
Vidya P V, Reghu Raj P C and Jayan V (2016) proposed a multilingual information search algorithm with web page ranking based on the user's query. The system performs five major tasks: preprocessing, searching, processing web page contents, retrieval and ranking. It uses cross-lingual information retrieval among English, Hindi and Malayalam and performs pre- and post-processing of user queries in different languages. Finally, it improves the quality of the results obtained from Google search.
III. PROPOSED METHODOLOGY
The proposed work was developed in the Java language to find similar content in retrieved documents. The first step in this framework passes the user's query to the search engine to retrieve content from the web. The search engine then processes the user's query and returns the search results, and a window size (WZ) sets the number of result pages to retrieve. For example, with WZ=2, if the first results page has 10 links and the next page has 8, then 18 web page links are retrieved in total.
The web crawler uses the HTTP protocol to retrieve the documents behind the links, which are identified through the href attribute. The href attribute is used to extract the web information at a particular link. Finally, the content similarity of the retrieved documents is matched.
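The window-size example and the href-based link extraction can be sketched as below. This is an assumption-laden illustration in Python, not the paper's Java code: `LinkExtractor` and `collect_links` are hypothetical names, the result pages are synthetic HTML strings rather than real search engine output, and only the standard library `html.parser` module is used.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def collect_links(result_pages, window_size):
    """Gather links from the first `window_size` result pages."""
    links = []
    for html in result_pages[:window_size]:
        parser = LinkExtractor()
        parser.feed(html)
        links.extend(parser.links)
    return links

# Synthetic result pages with 10 and 8 links, mirroring the WZ=2 example.
page1 = "".join(f'<h3><a href="https://site.example/p1/{i}">r</a></h3>' for i in range(10))
page2 = "".join(f'<h3><a href="https://site.example/p2/{i}">r</a></h3>' for i in range(8))
links = collect_links([page1, page2], window_size=2)
```

With both pages inside the window, `links` holds the 10 + 8 = 18 URLs from the example; shrinking `window_size` to 1 would keep only the first page's 10.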
STOPWORDS
Stop words are a part of natural language. The reason stop words should be removed from a text is that they make the text heavier while being less important for analysis. Removing stop words reduces the dimensionality of the term space.
The most common words in text documents are articles, prepositions, pronouns, etc., which do not convey the meaning of the documents. These words are treated as stop words. Examples of stop words are: the, in, a, an, with, etc. Stop words are removed from documents because they are not considered keywords in text mining applications.
Figure 1. Workflow for retrieving similar content
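As a minimal sketch of stop word removal, the snippet below filters a token list against a small set. The set contains only the sample stop words named above; a production list would be far longer, and the function name `remove_stop_words` is hypothetical.

```python
# Small illustrative stop word list built from the samples named in the text.
STOP_WORDS = {"the", "in", "a", "an", "with"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop word list (case-insensitive)."""
    return [t for t in tokens if t.lower() not in stop_words]

tokens = "The crawler stores a page in an index with the query".split()
filtered = remove_stop_words(tokens)
```

Six of the eleven tokens are stop words here, which illustrates how quickly the term space shrinks.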
STEMMING
This technique is used to find the root/stem of a word. For example, the words select, selected, selecting and selections can all be stemmed to the word "select" [6]. The purpose of this method is to remove the various suffixes, to reduce the number of words, to obtain exactly matching stems, and to save time and memory space.
[Figure 1 workflow stages: Query, Search Engine, Web Crawler, Retrieve Webpages, Preprocessing, Find the Similarity Content]
PORTER'S STEMMER
Porter's stemming algorithm, proposed in 1980, is one of the best-known stemming algorithms. Many modifications and enhancements to the basic algorithm have been made and suggested.
It is based on the idea that suffixes in the English language are mostly made up of combinations of smaller and simpler suffixes. It has five steps, and within each step rules are applied until one of them passes its conditions. If a rule is accepted, the suffix is removed accordingly and the next step is performed. The resulting stem at the end of the fifth step is returned. A rule has the form "(condition) suffix → new suffix". For example, the rule (m>0) EED → EE means "if the word has at least one vowel and consonant sequence before the EED ending, change the ending to EE". Therefore "agreed" becomes "agree" whereas "feed" remains unchanged.
Porter also designed a detailed framework for stemming known as 'Snowball'. The main purpose of the framework is to allow programmers to develop their own stemmers for other character sets or languages. It has been noted, however, that the Lovins stemmer is a heavier stemmer that produces a higher data reduction [13]. The Lovins algorithm is noticeably larger than the Porter algorithm because of its very extensive endings list. In one way, however, that is used to advantage: it is faster. It effectively trades space for time, and with its large suffix set it needs just two major steps to remove a suffix, compared with the five of the Porter algorithm.
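The single rule (m>0) EED → EE discussed above can be sketched in isolation. The code below is not a full Porter stemmer: it implements only Porter's measure m (the number of vowel-consonant sequences in the stem) and this one rule, and the function names are hypothetical.

```python
VOWELS = set("aeiou")

def is_consonant(word, i):
    """Porter's definition: 'y' counts as a vowel when it follows a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True

def measure(stem):
    """Number of vowel-to-consonant transitions, i.e. m in [C](VC)^m[V]."""
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_vowel:
            m += 1
        prev_vowel = not cons
    return m

def apply_eed_rule(word):
    """Apply only the rule (m>0) EED -> EE; all other Porter rules are omitted."""
    if word.endswith("eed"):
        stem = word[:-3]
        if measure(stem) > 0:
            return stem + "ee"
    return word
```

For "agreed" the stem before EED is "agr", which has m = 1, so the rule fires; for "feed" the stem is "f" with m = 0, so the word is left unchanged.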
K-MEANS CLUSTERING ALGORITHM
K-means is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem. The procedure follows a simple and easy way to classify a given data set through a certain number of clusters fixed a priori. The main idea is to define k centers, one for each cluster. These centers should be placed carefully, because different locations cause different results. So, the better choice is to place them as far away from one another as possible. The next step is to take each point belonging to the given data set and associate it with the nearest center. When no point is pending, the first step is complete and every point is associated with its nearest cluster. At this point we need to re-calculate k new centroids as the barycenters of the clusters resulting from the previous step. Once we have these k new centroids, a new binding must be done between the same data set points and the nearest new center. A loop has been generated. As a result of this loop we may notice that the k centers change their location step by step until no more changes are made, or in other words the centers do not move any further. Finally, this algorithm aims at minimizing an objective function, the squared error function, given by:

$$J(V) = \sum_{j=1}^{c} \sum_{i=1}^{c_j} \left( \lVert x_i - v_j \rVert \right)^2$$

where,
'||x_i - v_j||' is the Euclidean distance between x_i and v_j, and the inner sum runs over the data points x_i assigned to the jth cluster.
'c_j' is the number of data points in the jth cluster.
'c' is the number of cluster centers (the k of the description above).
Algorithmic steps for k-means clustering
Let X = {x1, x2, x3, ..., xn} be the set of data points and V = {v1, v2, ..., vc} be the set of centers.
1) Randomly select 'c' cluster centers.
2) Measure the distance between each data point and the cluster centers.
3) Assign each data point to the cluster center whose distance from the data point is the minimum over all the cluster centers.
4) Recalculate the new cluster center using:

$$v_j = \frac{1}{c_j} \sum_{i=1}^{c_j} x_i$$

where 'c_j' signifies the number of data points in the jth cluster and the sum runs over the data points assigned to that cluster.
5) Recalculate the distance between every data point and the newly found cluster centers.
6) If no data point was reassigned then stop, otherwise repeat from step 3).
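The algorithmic steps above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the paper's Java implementation: the initial centers are passed in explicitly instead of being chosen at random (so the run is reproducible), the data are hypothetical 2-D points, and `kmeans` and `squared_error` are made-up names.

```python
import math

def kmeans(points, initial_centers, max_iter=100):
    """Plain k-means on tuples of numbers; returns (assignments, centers)."""
    centers = list(initial_centers)   # step 1 (here: given, not random)
    assignments = None
    for _ in range(max_iter):
        # Steps 2-3: assign each point to the nearest center (Euclidean distance).
        new_assignments = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Step 6: stop when no data point was reassigned.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 4: move each center to the mean of its assigned points.
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignments, centers

def squared_error(points, assignments, centers):
    """Squared error objective: sum of squared distances to assigned centers."""
    return sum(math.dist(p, centers[a]) ** 2 for p, a in zip(points, assignments))

# Hypothetical data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
assignments, centers = kmeans(points, [(0, 0), (10, 10)])
error = squared_error(points, assignments, centers)
```

Because the result depends on the starting centers, practical runs usually repeat the procedure from several random initializations and keep the solution with the smallest squared error.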
IV. EXPERIMENTAL RESULT AND
EVALUATION
This experiment was done on a dataset based on single-keyword user queries. The system captures the search results obtained from the Google, Yahoo, Bing and Ask search engines. To generate the dataset, the user enters a single keyword as the input query, which is passed to the Google, Yahoo, Bing and Ask search engines. Figure 1 shows the personalized search engine.
The search results are retrieved and stored in the system using the href attribute and the h3 HTML tag. The system was evaluated on retrieving the most similar content in the web pages, and the experiment was developed using Java (NetBeans) and MySQL. The first step collects data from the various search engines based on the user's input query and retrieves the search results. The second step preprocesses the dataset as follows:
Tokenization:
Tokenization is the process of splitting a stream of text into words, phrases, symbols, or other meaningful elements called tokens. The list of tokens becomes the input for further processing such as parsing or text mining. Tokenization is useful both in linguistics and in computer science, where it forms part of lexical analysis. Figure 2 shows the tokenization results in preprocessing.
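A minimal tokenizer in this spirit might look like the following. It assumes a token is any run of ASCII letters or digits and lowercases the text first; the paper does not state its exact tokenization rules, so this is only one plausible choice, and `tokenize` is a hypothetical name.

```python
import re

def tokenize(text):
    """Split text into lowercase runs of letters/digits, dropping spaces and punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

tokens = tokenize("Apple's iPhone: battery, and performance!")
```

Note that the apostrophe splits "Apple's" into two tokens, which is a typical side effect of punctuation-based splitting.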
Stopwords:
Stop words are terms that occur many times in a collection and are therefore not discriminating, for example to, a, the, of, from, etc. Stop terms are assessed for a domain and stop word lists are maintained. Removing stop words decreases the index size. The trend in information retrieval has been to reduce the size of the stop word list or to stop using one, relying instead on better index compression and on weighting stop terms at query processing time (query-based). Figure 3 shows the stop word results in preprocessing.
Stemming:
Stemming is also known as conflation. It reduces the variants of a word that arise from inflection or derivation to a common stem. Stemming improves effectiveness by providing a better match between a query and a relevant document. A user who is searching for "swimming" might be interested in documents with "swim".
It decreases the term index by ~17% and acts as a form of lossy compression. Our system uses the Porter stemmer to remove suffixes such as ing, ion, ious, etc. In Porter stemming, an incoming word is cleaned up in the initialization phase, one prefix trimming phase then takes place, and then five suffix trimming phases occur. Figure 4 shows the stemming results in preprocessing.
Figure 1: Personalised Search Engine
Figure 2: Tokenization Result
Figure 3: Stopwords Result
Figure 4: Stemming Result
Table 1 presents the grouping of similar content in the web pages using the k-means clustering algorithm, and Table 2 lists the unique links with similar content, with their cluster values, across the search engines. Figure 5 charts the similar content retrieved from web pages across the search engines, and Table 3 lists the links with similar content retrieved across the search engines.
FILE CLUSTER VALUE
apple//google//https___en.wikipedia.org_wiki_Apple_Inc..txt 1 0
apple//yahoo//https___en.wikipedia.org_wiki_Apple_Inc..txt 1 0
apple//bing//https___en.wikipedia.org_wiki_Apple_Inc..txt 1 0
apple//ask//https___en.wikipedia.org_wiki_Apple_Inc..txt 1 0
apple//google//https___support.apple.com_en_in.txt 2 1.75890591
apple//yahoo//https___support.apple.com_en_in.txt 2 1.75890591
apple//bing//https___support.apple.com_en_in.txt 2 1.75890591
apple//ask//https___support.apple.com_.txt 2 1.75890591
apple//google//https___www.apple.com_in_.txt 3 5.140819694
apple//google//https___www.apple.com_in_buy_shop_.txt 3 5.140819694
apple//google//https___www.apple.com_in_iphone_.txt 3 5.140819694
apple//yahoo//https___www.apple.com_.txt 3 5.140819694
apple//yahoo//https___www.apple.com_in_.txt 3 5.140819694
apple//yahoo//https___www.apple.com_in_ipad_.txt 3 5.140819694
apple//yahoo//https___www.apple.com_in_iphone_.txt 3 5.140819694
apple//bing//https___www.apple.com_in_.txt 3 5.140819694
apple//bing//https___www.apple.com_in_buy_.txt 3 5.140819694
apple//bing//https___www.apple.com_in_contact_.txt 3 5.140819694
apple//bing//https___www.apple.com_in_iphone_.txt 3 5.140819694
apple//bing//http___www.myimaginestore.com_.txt 3 5.140819694
apple//ask//https___www.apple.com_.txt 3 5.140819694
apple//ask//https___www.apple.com_in_buy_shop_.txt 3 5.140819694
apple//bing//https___simple.wikipedia.org_wiki_Apple.txt 4 0
apple//yahoo//https___www.apple.com_iphone_.txt 5 5.490529511
apple//ask//https___www.apple.com_ipad_.txt 5 5.490529511
apple//ask//https___www.apple.com_iphone_.txt 5 5.490529511
apple//ask//https___www.apple.com_watch_.txt 5 5.490529511
apple//google//https___www.apple.com_in_iphone_battery_and_performance_.txt 6 0
apple//google//https___www.apple.com_in_macbook_.txt 7 0
apple//google//https___www.apple.com_in_mac_.txt 8 13.01161655
apple//yahoo//https___www.apple.com_in_ios_ios_11_.txt 8 13.01161655
apple//yahoo//https___www.apple.com_in_mac_.txt 8 13.01161655
apple//yahoo//https___www.apple.com_iphone_se_.txt 8 13.01161655
apple//bing//https___www.apple.com_in_mac_.txt 8 13.01161655
apple//ask//https___www.apple.com_mac_.txt 8 13.01161655
apple//google//https___www.engadget.com_2018_01_26_apple_homepod_2018_release_.txt 9 0
apple//google//https___www.youtube.com_channel_UCE_M8A5yxnLfW0KghEeajjw.txt 10 1.414213562
apple//ask//https___www.youtube.com_user_Apple.txt 10 1.414213562
apple//google//http___imaginestore.org_.txt 11 0
Table 1: Group the similarity content using K-means Clustering Algorithm
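One way to read Table 1 is to group its rows by cluster and see which search engines returned each page. The sketch below does this for a handful of rows copied from the table. The `query//engine//page` naming scheme is inferred from the file names and is an assumption, as is the helper name `group_by_cluster`.

```python
from collections import defaultdict

# A few rows copied from Table 1: (file, cluster, value).
ROWS = [
    ("apple//google//https___en.wikipedia.org_wiki_Apple_Inc..txt", 1, 0.0),
    ("apple//yahoo//https___en.wikipedia.org_wiki_Apple_Inc..txt", 1, 0.0),
    ("apple//google//https___support.apple.com_en_in.txt", 2, 1.75890591),
    ("apple//ask//https___support.apple.com_.txt", 2, 1.75890591),
    ("apple//google//https___www.youtube.com_channel_UCE_M8A5yxnLfW0KghEeajjw.txt", 10, 1.414213562),
    ("apple//ask//https___www.youtube.com_user_Apple.txt", 10, 1.414213562),
]

def group_by_cluster(rows):
    """Map cluster id -> {page file name: set of engines that returned it}."""
    clusters = defaultdict(lambda: defaultdict(set))
    for path, cluster, _value in rows:
        # Assumed layout of the file name: query//engine//page
        _query, engine, page = path.split("//")
        clusters[cluster][page].add(engine)
    return clusters

clusters = group_by_cluster(ROWS)
```

Collapsing the engine dimension in this way yields one entry per distinct page, which is the kind of de-duplicated view that Table 2 presents.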
Link VALUES
http://www.myimaginestore.com/ 5.14082
https://en.wikipedia.org/wiki/Apple_Inc. 0
https://support.apple.com/ 1.758906
https://support.apple.com/en-in 1.758906
https://www.apple.com/ 5.14082
https://www.apple.com/in/ 5.14082
https://www.apple.com/in/buy/ 5.14082
https://www.apple.com/in/buy/shop/ 5.14082
https://www.apple.com/in/iphone-battery-and-performance/ 0
https://www.apple.com/in/iphone/ 5.14082
https://www.apple.com/in/mac/ 13.01162
https://www.apple.com/iphone/ 5.14082
Table 2: Similar content retrieved from web pages across search engines, with cluster values
Figure 5: Chart of similar content retrieved from web pages across search engines
LINK
http://www.myimaginestore.com/
https://en.wikipedia.org/wiki/Apple_Inc.
https://support.apple.com/
https://support.apple.com/en-in
https://www.apple.com/
https://www.apple.com/in/
https://www.apple.com/in/buy/
https://www.apple.com/in/buy/shop/
https://www.apple.com/in/iphone-battery-and-performance/
https://www.apple.com/in/iphone/
https://www.apple.com/in/mac/
https://www.apple.com/iphone/
Table 3: Similar content retrieved from web pages across search engines
[Figure 5 chart residue: value axis from 0.0 to 14.0, with one category label per link listed in Table 3.]
V. CONCLUSION AND FUTURE WORK
This paper has proposed an approach that uses the k-means clustering algorithm to retrieve similar content from web pages across various search engines. The system removes noisy values from the dataset using preprocessing techniques such as tokenization, stop word removal and stemming with the Porter stemmer algorithm. The performance of the proposed work is assessed on similar content in web pages. Our future work is to improve the similarity of links and also to improve the page ranking values for web pages.
REFERENCES
[1] Ahmet Selman Bozkir, Ebru Akcapinar Sezer, "Layout-based computation of web page similarity ranks", Int. J. Human-Computer Studies 110 (2018) 95–114.
[2] Ali Mohammad Zareh Bidoki, Nasser Yazdani, "DistanceRank: An intelligent ranking algorithm for web pages", Information Processing and Management 44 (2008) 877–892.
[3] Bo Yang, Hechang Chen, Xuehua Zhao, Masato Naka, Jing Huang, "On characterizing and computing the diversity of hyperlinks for anti-spamming page ranking", Knowledge-Based Systems 77 (2015) 56–67.
[4] Christiane Behnert, Dirk Lewandowski, "Ranking Search Results in Library Information Systems — Considering Ranking Approaches Adapted From Web Search Engines", The Journal of Academic Librarianship 41 (2015) 725–735.
[5] Gabrielle Demange, "Mutual rankings", Mathematical Social Sciences 90 (2017) 35–42.
[6] Iman Rasekh, "A New Competitive Intelligence-Based Strategy for Web Page Search", The 2015 International Conference on Soft Computing and Software Engineering (SCSE 2015), Procedia Computer Science 62 (2015) 450–456.
[7] Michael Scholz, Jella Pfeiffer, Franz Rothlauf, "Using PageRank for non-personalized default rankings in dynamic markets", European Journal of Operational Research 260 (2017) 388–401.
[8] S Hariharan, S Dhanasekar, Kalyani Desikan, "Reachability Based Web Page Ranking Using Wavelets", 2nd International Symposium on Big Data and Cloud Computing (ISBCC'15), Procedia Computer Science 50 (2015) 157–162.
[9] Suruchi Chawla, "A novel approach of cluster based optimal ranking of clicked URLs using genetic algorithm for effective personalized web search", Applied Soft Computing 46 (2016) 90–103.
[10] Vali Derhami, Elahe Khodadadian, Mohammad Ghasemzadeh, Ali Mohammad Zareh Bidoki, "Applying reinforcement learning for web pages ranking algorithms", Applied Soft Computing 13 (2013) 1686–1692.
[11] Vidya P V, Reghu Raj P C, Jayan V, "Web Page Ranking Using Multilingual Information Search Algorithm - A Novel Approach", International Conference on Emerging Trends in Engineering, Science and Technology (ICETEST - 2015), Procedia Technology 24 (2016) 1240–1247.
[12] Vikas Jindal, Seema Bawa, Shalini Batra, "A review of ranking approaches for semantic search on Web", Information Processing and Management 50 (2014) 416–425.
[13] YaJun Du, YuFeng Hai, "Semantic ranking of web pages based on formal concept analysis", The Journal of Systems and Software 86 (2013) 187–197.