Building an Intelligent Web: Theory and Practice Pawan Lingras Saint Mary’s University Rajendra Akerkar American University of Armenia and SIBER, India


Building an Intelligent Web: Theory and Practice

Pawan Lingras

Saint Mary’s University

Rajendra Akerkar

American University of Armenia and SIBER, India


Discipline

Computer Science Mathematics and Statistics Management

Research Graduate Research Graduate

Chapters 1 – 8 excluding shaded portion related to

mathematics and implementation.

Complete Book

Information Retrieval

Web Mining

Chapters 2, 4 – 8 excluding

shaded portion related to implementation.

Chapters 1, 2, 3, 7 and 8 Chapters 4 - 8

Chapters 1 – 8 excluding shaded portion related to

implementation.


Information Retrieval


Create a list of words

Remove stop words

Stem words

Calculate frequency of each stemmed word

Figure 2.1 Transforming text document to a weighted list of keywords
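The four steps in Figure 2.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop-word list is a tiny hypothetical sample, and `crude_stem` is a simplistic suffix stripper standing in for a real stemming algorithm, not the method used in the book.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

def crude_stem(word):
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def keyword_frequencies(text):
    words = re.findall(r"[a-z]+", text.lower())        # create a list of words
    words = [w for w in words if w not in STOP_WORDS]   # remove stop words
    stems = [crude_stem(w) for w in words]              # stem words
    return Counter(stems)                               # frequency of each stem

freqs = keyword_frequencies("Mining the web: web miners mined web logs.")
print(freqs.most_common())
```

The resulting counter is the weighted keyword list: each stem is a keyword and its count is the weight.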


Data mining has emerged as one of the most exciting and dynamic fields in computing science. Its driving force is the presence of petabyte-scale online archives that potentially contain valuable information hidden in them. Commercial enterprises have been quick to recognize this value; consequently, within the span of a few years, the software market for data mining is expected to exceed $10 billion. Data mining refers to a family of techniques used to detect interesting relationships and knowledge in data. While the theoretical underpinnings of the field have been around for quite some time (in the form of pattern recognition, statistics, data analysis and machine learning), the practice and use of these techniques have been largely ad hoc. With the availability of large databases to store, manage and assimilate data, the new thrust of data mining lies at the intersection of database systems, artificial intelligence and algorithms that efficiently analyze data. The distributed nature of several databases, their size and the high complexity of many techniques present interesting computational challenges.


Figure 2.43 Relationship between precision and recall (plot of Precision, vertical axis, against Recall, horizontal axis; both axes run from 0 to 1 in steps of 0.25)
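To make the precision/recall relationship concrete, the sketch below computes both measures at several cutoffs of a ranked result list, using the standard definitions (precision = relevant retrieved / retrieved, recall = relevant retrieved / relevant). The document identifiers and relevance judgments are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical ranked result list and relevance judgments.
ranked = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]
relevant = {"d1", "d2", "d5", "d8"}

for k in (2, 5, 8):
    p, r = precision_recall(ranked[:k], relevant)
    print(f"top {k}: precision={p:.2f} recall={r:.2f}")
```

In this toy ranking, retrieving more documents raises recall while precision falls, which is the trade-off the figure depicts.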


Semantic Web


Semantic Web

The layer language model

(Berners-Lee, 2001; Broekstra et al, 2001)


<h1>Student Service Centre</h1>

Welcome to the home page of the Student Service Centre.

The centre is located in the main building of the University.

You may visit us for assistance during working days.

<h2>Office hours</h2>

Mon to Thu 8am - 6pm<br>

Fri 8am - 2pm<p>

But note that centre is not open during the weeks of the

<a href=". . .">State Of Origin</a>.

Figure 3.2 Example of a Web page of a Student Service Centre


<organization>

<serviceOffered>Admission</serviceOffered>

<organizationName>Student Service Centre</organizationName>

<staff>

<director>John Roth</director>

<secretary>Penny Brenner</secretary>

</staff>

</organization>

Figure 3.3 Example of a Web page of a Student Service Centre
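Because the markup in Figure 3.3 names its content (`organizationName`, `director`, and so on), a program can pull out specific facts without guessing from layout. A minimal sketch using Python's standard `xml.etree.ElementTree` module, with the figure's markup copied into a string:

```python
import xml.etree.ElementTree as ET

doc = """<organization>
  <serviceOffered>Admission</serviceOffered>
  <organizationName>Student Service Centre</organizationName>
  <staff>
    <director>John Roth</director>
    <secretary>Penny Brenner</secretary>
  </staff>
</organization>"""

root = ET.fromstring(doc)

# Element names act as machine-readable labels for the content.
name = root.findtext("organizationName")
director = root.findtext("staff/director")
staff = {member.tag: member.text for member in root.find("staff")}

print(name, director, staff)
```

The same lookup against the HTML page of Figure 3.2 would have to rely on headings and free text, since its tags describe presentation rather than meaning.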


Figure 3.4 Representing classes and instances (Noy et al., 2001)


[Figure: tree representation of the college document. Nodes: root, college, a location node (Innsbruck), three lecturer nodes with @name values SamHoofer, DanielaFrost and EdwardBunker, and six course nodes with @title values NonlinearAnalysis, ModernAlgebra, DiscreteStructures, NonlinearAnalysis, ComputationalAlgebra and Algorithms.]


Queries 1 and 2

[Figure: tree representation of the college document (root, college, location, lecturer and course nodes with @name and @title values), shown for Queries 1 and 2.]


Queries 3 and 4

[Figure: tree representation of the college document (root, college, location, lecturer and course nodes with @name and @title values), shown for Queries 3 and 4.]


<?xml version="1.0"?>

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"

xmlns:dc="http://purl.org/dc/elements/1.1/">

<rdf:Description rdf:about="">

<dc:title>

Building an Intelligent Web: Theory and Practice

</dc:title>

<dc:creator> Rajendra Akerkar and Pawan Lingras </dc:creator>

</rdf:Description>

</rdf:RDF>

Figure 3.26 Fragment of RDF
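An RDF description can be read as a set of (subject, predicate, object) statements. The sketch below extracts them from the fragment in Figure 3.26 with Python's standard `xml.etree.ElementTree`; it is a simplification that handles only this flat `rdf:Description` form, not RDF/XML in general, and a dedicated RDF library would be the usual choice in practice.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

rdf_xml = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="">
    <dc:title>
      Building an Intelligent Web: Theory and Practice
    </dc:title>
    <dc:creator> Rajendra Akerkar and Pawan Lingras </dc:creator>
  </rdf:Description>
</rdf:RDF>"""

root = ET.fromstring(rdf_xml)
triples = []
for desc in root.findall("rdf:Description", NS):
    # rdf:about names the subject; an empty value refers to the document itself.
    subject = desc.get("{%s}about" % NS["rdf"])
    for prop in desc:
        # ElementTree expands prefixes, so prop.tag is "{namespace}localname".
        triples.append((subject, prop.tag, prop.text.strip()))

for triple in triples:
    print(triple)
```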


An RDF model for automobiles


<?xml version="1.0"?>

<rdf:RDF

xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"

xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"

xmlns:my="http://www.myvehicle.com/vehicle-schema/">

<rdfs:Class rdf:about="#Vehicle"/>

<rdfs:Class rdf:about="#Car">

<rdfs:subClassOf rdf:resource="#Vehicle"/>

</rdfs:Class>

<rdf:Property rdf:about="#name">

<rdfs:domain rdf:resource="#Vehicle"/>

</rdf:Property>

<rdf:Description rdf:about="#Ford">

<rdf:type rdf:resource="#Car"/>

<my:name>Ford Icon</my:name>

</rdf:Description>

<my:Truck rdf:about="#Mitsubishi">

<my:name>Mitsubishi</my:name>

<my:carry rdf:resource="#Mitsubishi"/>

</my:Truck>

</rdf:RDF>

Figure 3.29 RDF/XML file for the automobile example


<?xml version="1.0"?>

<topicMap id="tmrf"

xmlns = 'http://www.topicmaps.org/xtm/1.0/'

xmlns:xlink = 'http://www.w3.org/1999/xlink'>

<!--

The map contains information about Technomathematics Research Foundation.

We can include comment and narrative here…

-->

.... here my topics and my associations go ...

</topicMap>

Figure 3.30 A Topic Map document (Adapted from http://topicmaps.bond.edu.au/docs/6/1)


Classification and Association


Data Preparation

• Database Theory

• SQL

• Data Transformation

• http://www.ecn.purdue.edu/KDDCUP/data/


Classification

• Find a rule, a formula, or black box classifier for organizing data into classes.
  – Classify clients requesting loans into categories based on the likelihood of repayment
  – Classify customers into Big or Moderate Spenders based on what they buy
  – Classify the customers into loyal, semi-loyal, infrequent based on the products they buy

• The classifier is developed from the data in the training set

• The reliability of the classifier is evaluated using the test set of data


Classification

• ID3 Algorithm
  – Numerical Illustration
  – Application to a Small E-commerce Dataset

• C4.5 for Experimentation

• Other approaches
  – Neural Networks
  – Fuzzy Classification
  – Rough Set Theory
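The core step of ID3 is choosing, at each node of the decision tree, the attribute with the highest information gain. The sketch below shows only that step on a hypothetical four-row e-commerce dataset; the attribute names (`returning`, `device`, `buys`) and values are invented for illustration and are not the dataset used in the book.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` after splitting `rows` on `attribute`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

def best_attribute(rows, attributes, target):
    """ID3 splits on the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))

# Hypothetical training set: does a visitor buy?
rows = [
    {"returning": "yes", "device": "mobile", "buys": "yes"},
    {"returning": "yes", "device": "desktop", "buys": "yes"},
    {"returning": "no", "device": "mobile", "buys": "no"},
    {"returning": "no", "device": "desktop", "buys": "no"},
]
print(best_attribute(rows, ["returning", "device"], "buys"))
```

A full ID3 implementation would apply this choice recursively to each subset until the subsets are pure or no attributes remain.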


Association

• Market basket analysis – determine which things go together

• Transactions might reveal that
  – customers who buy bananas also buy candles
  – cheese and pickled onions seem to occur frequently in a shopping cart

• Information can be used for
  – arranging a physical shop or structuring the Web site
  – targeted advertising campaigns


Association

• Apriori Algorithm

• Demonstration for an E-commerce Application
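A compact sketch of the Apriori idea follows: count candidate itemsets level by level, keep those that meet a minimum support count, and build larger candidates only from frequent smaller ones. The baskets are hypothetical and merely reuse the item names from the market basket slide; `min_support` is an absolute count chosen for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

# Hypothetical shopping baskets.
baskets = [
    {"banana", "candles", "cheese"},
    {"banana", "candles"},
    {"cheese", "pickled onions"},
    {"banana", "candles", "pickled onions"},
    {"cheese", "pickled onions", "banana"},
]
frequent = apriori(baskets, min_support=3)
pair = frozenset({"banana", "candles"})
# Confidence of the rule candles -> banana: support(pair) / support(candles).
confidence = frequent[pair] / frequent[frozenset({"candles"})]
print(sorted(map(sorted, frequent)), confidence)
```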


Clustering


Clustering

• Breaks a large database into different subgroups or clusters

• Unlike classification there are no predefined classes

• The clusters are put together on the basis of similarity to each other

• The data miners determine whether the clusters offer any useful insight


Statistical Methods

• k-means
  – Numerical Example
  – Implementation
    • Data Preparation
    • Clustering

• Other Methods
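The k-means procedure alternates two steps until the centroids stop moving: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. A minimal pure-Python sketch follows; the points are hypothetical, and seeding the centroids with the first k points is a simplifying assumption made so the example is deterministic (random seeding is more common).

```python
from math import dist  # Euclidean distance, available in Python 3.8+

def k_means(points, k, iterations=100):
    # Naive initialization (an assumption): use the first k points as centroids.
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical two-dimensional data with two visible groups.
points = [(1, 1), (5, 5), (1, 2), (2, 1), (4, 5), (5, 4)]
centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```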


Neural Network Based Approaches

• Kohonen Self Organising Maps
  – Numerical Demonstration
  – Application to Web Data Collection

• Other Neural Network Based Approaches
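One training step of a Kohonen self-organising map can be sketched as follows: find the unit whose weight vector is closest to the input (the winner), then pull the winner and its neighbours toward the input. This sketch assumes a one-dimensional row of units and a simple fixed-width neighbourhood; the learning rate, radius and weights are illustrative values, and practical implementations usually shrink both the rate and the radius over time.

```python
def som_step(weights, x, learning_rate=0.5, radius=1):
    """Apply one Kohonen update in place and return the winning unit's index."""
    # Winner: the unit with the smallest squared Euclidean distance to x.
    winner = min(
        range(len(weights)),
        key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)),
    )
    # Move the winner and its neighbours (within `radius` positions) toward x.
    for i in range(len(weights)):
        if abs(i - winner) <= radius:
            weights[i] = [w + learning_rate * (v - w) for w, v in zip(weights[i], x)]
    return winner

# Three units in a row, each with a two-dimensional weight vector.
weights = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
winner = som_step(weights, x=[1.0, 1.0])
print(winner, weights)
```

Repeating this step over many inputs makes neighbouring units respond to similar inputs, which is what allows the map to be read as a clustering.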


Clustering of customers


Web Mining
  – Web Content Mining
    • Web Page Content Mining
    • Search Result Mining
  – Web Structure Mining
  – Web Usage Mining
    • General Access Pattern Tracking
    • Customized Usage Tracking


Web Usage Mining


High-level web usage mining process (Srivastava et al., 2000)


Applications of web usage mining

(Romanko, 2006; Srivastava et al., 2000)


140.14.6.11 - pawan [06/Sep/2001:10:46:07 -0300] "GET /s.htm HTTP/1.0" 200 2267

140.14.7.18 - raj [06/Sep/2001:11:23:53 -0300] "POST /s.cgi HTTP/1.0" 200 499
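The two entries above follow the widely used Common Log Format (host, identity, user, timestamp, request line, status code, response size). Before any usage mining, each line has to be split into these fields. A minimal parsing sketch, assuming that format; the field names in the regular expression are my own labels:

```python
import re
from datetime import datetime

LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry

lines = [
    '140.14.6.11 - pawan [06/Sep/2001:10:46:07 -0300] "GET /s.htm HTTP/1.0" 200 2267',
    '140.14.7.18 - raj [06/Sep/2001:11:23:53 -0300] "POST /s.cgi HTTP/1.0" 200 499',
]
entries = [parse_line(line) for line in lines]
for e in entries:
    print(e["user"], e["method"], e["path"], e["status"], e["size"])
```

Once parsed, entries can be grouped by host or user and ordered by time to reconstruct sessions.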


Clustering exercise


Classification exercise

Channel     Recall    Precision
Finance     44.3%     98.27%
Health      52.3%     89.66%
Market      49.1%     83.34%
News        44.1%     89.27%
Shopping    31.5%     91.31%
Specials    60.2%     92.86%
Sport       50.0%     91.93%
Surveys     21.9%     92.66%
Theatre     54.8%     94.63%

Table 6.8 Precision and recall for predicting user’s interest in channels

(Baglioni, et al., 2003)


Association exercise

News Section     Minimum Requests   Maximum Requests   Mean Requests   Standard Deviation
Science          1                  97                 2.3034          2.8184
Culture          1                  208                3.7878          5.9742
Sports           1                  318                5.6985          10.8360
Economics        1                  258                3.9335          7.2341
International    1                  208                3.3823          5.5540
Local Lisbon     1                  460                5.6883          11.5650
Local Port       1                  256                7.5984          13.2351
Politics         1                  208                3.3577          5.4101
Society          1                  367                4.2673          7.9853
Education        1                  90                 2.6496          3.29090

Table 6.9 Summary statistics of requests to the Publico on-line newspaper (Batista and Silva, 2002)


The association mining showed strong associations between the following pairs:

Politics and Society

Politics and International News

Politics and Sports

Society and International News

Society and Local Lisbon

Society and Sports

Society and Culture

Sports and International News


Sequence Pattern Analysis of Web Logs


Web Content Mining


Data Collection

• Web Crawlers

• Public Domain Web Crawlers

• An Implementation of a Web Crawler


Architecture of a search engine (Romanko, 2006)


Other topics in Web Content Mining

• Search Engines
  – How to prepare for and set up a search engine
  – Types and listings of search engines (freeware, remote hosting services, commercial)

• Multimedia Information Retrieval


Web Structure Mining


0/10: The site or page is probably new.

3/10: The site is perhaps new, small in size and has very little or no worthwhile

arriving links. The page gets very little traffic.

5/10: The site has a fair amount of worthwhile arriving links and traffic volume. The

site might be larger in size and gets a good amount of steady traffic with some

return visitors.

8/10: The site has many arriving links, probably from other high PageRank pages.

The site perhaps contains a lot of information and has a higher traffic flow and

return visitor rate.

10/10: The Web site is large, popular and has an extremely high number of links

pointing to it.
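The 0 to 10 scores above are a coarse summary; how they map onto an underlying link-based score is not specified here. The underlying idea, that a page is important when important pages link to it, is commonly computed by repeated redistribution of rank along links. The sketch below is a textbook-style power iteration on a hypothetical four-page graph, with the customary damping factor of 0.85 assumed; it is not a description of any search engine's production system.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to.

    Every link target must also appear as a key of `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            # A page without outgoing links spreads its rank over all pages.
            targets = outlinks or pages
            share = damping * rank[page] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank

# Hypothetical link graph: C has the most arriving links, D has none.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
rank = pagerank(links)
for page, score in sorted(rank.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph the page with the most arriving links (C) ends up with the highest score and the page with none (D) with the lowest, mirroring the qualitative descriptions above.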


http://www.iprcom.com/papers/pagerank/


Index quality for different search engines

(Henzinger, et al., 1999)


Index quality per page for different search engines

(Henzinger, et al., 1999)


Page                                             Freq. Walk 2   Freq. Walk 1   Rank Walk 1
www.microsoft.com/                               3172           1600           1
www.microsoft.com/windows/ie/default.htm         2064           1045           3
www.netscape.com/                                1991           876            6
www.microsoft.com/ie/                            1982           1017           4
www.microsoft.com/windows/ie/download/           1915           943            5
www.microsoft.com/windows/ie/download/all.htm    1696           830            7
www.adobe.com/prodindex/acrobat/readstep.html    1634           780            8
home.netscape.com/                               1581           695            10
www.linkexchange.com/                            1574           763            9
www.yahoo.com/                                   1527           1132           2

Table 8.2 Most frequently visited pages (Henzinger, et al., 1999)


Site                    Frequency Walk 2   Frequency Walk 1   Rank Walk 1
www.microsoft.com       32452              16917              1
home.netscape.com       23329              11084              2
www.adobe.com           10884              5539               3
www.amazon.com          10146              5182               4
www.netscape.com        4862               2307               10
excite.netscape.com     4714               2372               9
www.real.com            4494               2777               5
www.lycos.com           4448               2645               6
www.zdnet.com           4038               2562               8
www.linkexchange.com    3738               1940               12
www.yahoo.com           3461               2595               7

Table 8.3 Most frequently visited hosts (Henzinger, et al., 1999)