Information Retrieval Techniques of Google

PRESENTATION ON GOOGLE

PRESENTED BY

Sehrish Akram

Google is the leading search engine worldwide. It was founded in 1998 by Stanford University graduate students Larry Page and Sergey Brin.

WHAT IS A QUERY

SEARCHING TECHNIQUES

The Google search engine uses these techniques:

It is a full-text search engine.

When we do a Google search, we are actually searching Google’s index of the web, not the live web.

That index is built by software programs called “spiders”.

Spiders start by fetching a few web pages, then follow the links on those pages and fetch the pages they point to.
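The fetch-and-follow behaviour of a spider can be sketched as a breadth-first traversal. This is only an illustration: TOY_WEB, crawl and the example URLs are hypothetical names, and an in-memory dictionary stands in for real HTTP fetching.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the URLs it links to.
TOY_WEB = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example", "d.example"],
    "c.example": ["a.example"],
    "d.example": [],
}

def crawl(seeds, fetch_links, max_pages=100):
    """Breadth-first crawl: fetch the seed pages, then the pages they link to."""
    queue = deque(seeds)
    seen = set(seeds)          # URLs already queued, so no page is fetched twice
    order = []                 # pages in the order they were fetched
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order

visited = crawl(["a.example"], lambda url: TOY_WEB.get(url, []))
```

Starting from a single seed page, the crawl reaches every page that is linked, directly or indirectly, from that seed.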

Case folding technique: all letters are converted to a single case.

Normalization technique: equivalent forms are mapped to one term, e.g. U.S.A. becomes USA.
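A minimal sketch of case folding and normalization, assuming that normalization here means nothing more than dropping periods. The function name normalize_term is hypothetical, and a real engine would apply many more rules.

```python
def normalize_term(term):
    """Fold case, then drop periods so that U.S.A. and USA become the same term."""
    folded = term.lower()            # case folding
    return folded.replace(".", "")   # normalization (assumed: only periods are removed)

same_term = normalize_term("U.S.A.") == normalize_term("USA")
```

Applying the same function to both the indexed words and the query words is what makes the two forms match.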

Case-sensitive matching is not used in Google either: whether the user searches for seven, SEVEN, Seven or even 7, the results are the same.

Singular is different from plural: searches for apple and apples turn up different pages.

The order of words matters: Google considers the first word most important, the second word next, and so on.

Google ignores most little words, including “I”, “an”, “how”, “the” and “of”.
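Dropping the little words from a query can be sketched as a simple filter. The STOP_WORDS set below contains only the words listed on the slide, and strip_stop_words is a hypothetical helper, not Google's actual list or code.

```python
# Only the words named on the slide; a real stop-word list would be longer.
STOP_WORDS = {"i", "an", "how", "the", "of"}

def strip_stop_words(query):
    """Return the query words, lower-cased, with stop words removed and order kept."""
    return [word for word in query.lower().split() if word not in STOP_WORDS]

kept = strip_stop_words("How the history of Google began")
```

The order of the remaining words is preserved, since the slide states that word order matters.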

A Google search query is limited to 32 words.

Wildcard searching generally places the symbol "*" after a word stem. It tells the database to look for variations of that word. For example, investigat* might pull sites with words such as investigation, investigator, and investigative.
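A trailing wildcard of this kind amounts to prefix matching against the words known to the database. The sketch below assumes a small hypothetical vocabulary and a hypothetical helper expand_wildcard; the stem investigat is used because it is the longest prefix shared by all three example words.

```python
def expand_wildcard(pattern, vocabulary):
    """Expand a pattern ending in '*' to every vocabulary word with that prefix."""
    if not pattern.endswith("*"):
        # No wildcard: only an exact (case-insensitive) match counts.
        return sorted(w for w in vocabulary if w.lower() == pattern.lower())
    prefix = pattern[:-1].lower()
    return sorted(w for w in vocabulary if w.lower().startswith(prefix))

# Hypothetical vocabulary of indexed words.
VOCABULARY = ["investigation", "investigator", "investigative", "invest", "google"]
variants = expand_wildcard("investigat*", VOCABULARY)
```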

INFORMATION RETRIEVAL AND THE WEB

What We Do

Google wanted to organize the web into something searchable. Its early prototype was based upon a few basic principles, including:

The best pages tend to be the ones that people link to the most.

The best description of a page is often derived from the anchor text associated with the links to a page.
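Both principles can be illustrated with one pass over a list of links: count how many links point to each page, and collect the anchor text of those links as a description of the target. The LINKS data and the summarize_links helper are hypothetical and only sketch the idea.

```python
from collections import defaultdict

# Hypothetical links: (source page, target page, anchor text of the link).
LINKS = [
    ("a.example", "c.example", "search engine"),
    ("b.example", "c.example", "web search"),
    ("c.example", "a.example", "home page"),
]

def summarize_links(links):
    """Count in-links per page and gather the anchor texts that describe it."""
    inlink_count = defaultdict(int)
    anchor_texts = defaultdict(list)
    for source, target, text in links:
        inlink_count[target] += 1
        anchor_texts[target].append(text)
    return dict(inlink_count), dict(anchor_texts)

inlink_count, anchor_texts = summarize_links(LINKS)
best_page = max(inlink_count, key=inlink_count.get)  # most linked-to page
```

Note that the description of c.example comes entirely from other pages, not from the page's own content.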

Anchor text

DOCUMENT ACQUISITION AND STORAGE:

Google searches more than 3 billion Web documents, which include Web pages, images and Usenet postings.

Google uses a standalone Web crawler, distributed across several machines, to create indexes and copies of the documents.

Besides standard .html files, Google also indexes other file types.

A copy of each crawled page is stored in Google’s repository.

Indexes are created from the words of the stored pages, pointing to an inverted index file.
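An inverted index maps each word to the documents that contain it. The sketch below builds one from a tiny hypothetical repository; REPOSITORY and build_inverted_index are illustrative names, and a real index would also store word positions and be written to disk.

```python
from collections import defaultdict

# Hypothetical repository: document id -> stored page text.
REPOSITORY = {
    1: "Google indexes the web",
    2: "Spiders crawl the web",
    3: "Google stores crawled pages",
}

def build_inverted_index(repository):
    """Map every lower-cased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in repository.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

index = build_inverted_index(REPOSITORY)
docs_with_google = sorted(index["google"])
```

Answering a one-word query then becomes a single dictionary lookup instead of a scan over every stored page.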

QUERY INTRODUCTION AND USER OPTIONS:

Since its founding, Google has been steadily introducing new features.

Google uses Boolean search, with some variations and without support for nested expressions.

By default, it automatically applies the AND operator between terms. The minus symbol can be used to perform a NOT function, and the OR operation is supported (using OR in upper case).
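These query rules can be sketched over a small inverted index. The INDEX data and the search function are hypothetical; the sketch assumes that OR joins only the two terms on either side of it, which is one way to read "without nested expressions".

```python
# Hypothetical inverted index: word -> set of document ids.
INDEX = {"google": {1, 3}, "web": {1, 2}, "spiders": {2}, "pages": {3}}
ALL_DOCS = {1, 2, 3}

def search(query, index, all_docs):
    """Evaluate a flat query: implicit AND, '-term' for NOT, upper-case OR."""
    required = []      # list of OR-groups; the groups are ANDed together
    excluded = []      # terms introduced with a minus sign
    join_next = False  # True right after an OR token
    for token in query.split():
        if token == "OR":
            join_next = True
            continue
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:].lower())
        elif join_next and required:
            required[-1].append(token.lower())
        else:
            required.append([token.lower()])
        join_next = False
    results = set(all_docs)
    for group in required:
        matches = set()
        for term in group:
            matches |= index.get(term, set())
        results &= matches
    for term in excluded:
        results -= index.get(term, set())
    return sorted(results)

both = search("google web", INDEX, ALL_DOCS)
without_web = search("google -web", INDEX, ALL_DOCS)
either = search("spiders OR pages", INDEX, ALL_DOCS)
```

A lower-case "or" is treated as an ordinary search term here, matching the slide's note that the operator must be written in upper case.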

Google does not use stemming or truncation, but it allows the use of ‘*’ as a wildcard in the middle of a phrase. For example, searching for “Search Engine” yields quite different results from “Search * Engine”.
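Why the two phrases return different results can be seen in a small phrase matcher. This sketch assumes that each ‘*’ stands for exactly one arbitrary word, which is a simplification; phrase_matches is a hypothetical helper.

```python
def phrase_matches(phrase, text):
    """True if the phrase words occur consecutively in text; '*' matches any one word."""
    pattern = phrase.lower().split()
    words = text.lower().split()
    # Slide a window of the phrase length over the text.
    for start in range(len(words) - len(pattern) + 1):
        window = words[start:start + len(pattern)]
        if all(p == "*" or p == w for p, w in zip(pattern, window)):
            return True
    return False

exact = phrase_matches("Search Engine", "a search engine for the web")
with_gap = phrase_matches("Search * Engine", "the search optimization engine")
```

Under this assumption the wildcard phrase requires a word between the two terms, so it matches a different set of pages from the exact phrase.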


RESULTS SELECTION AND PRESENTATION

To select which documents are presented, Google combines a document’s PageRank value, its anchor text, and the proximity of the query terms.
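The slides do not say how a PageRank value is computed or how the three signals are weighted. As background only, the textbook power-iteration form of PageRank can be sketched as follows; the damping factor of 0.85, the iteration count, the toy graph and the function name are all assumptions rather than details from the presentation.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps every page to the list of pages it links to; every link
    target must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no out-links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical link graph.
GRAPH = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example"],
}
ranks = pagerank(GRAPH)
top_page = max(ranks, key=ranks.get)
```

In this toy graph the page with the most in-links ends up with the highest rank, and the page nobody links to ends up with the lowest.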

Results are clustered by server, with two visible results per server and a link to “More results from server”.
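Clustering by server can be sketched by grouping an already ranked list of URLs by host name. The cluster_by_server helper and the example URLs are hypothetical; the limit of two visible results per server comes from the slide.

```python
from urllib.parse import urlparse

def cluster_by_server(ranked_urls, visible=2):
    """Group ranked URLs by host, showing `visible` per host and counting the rest."""
    clusters = {}
    for url in ranked_urls:
        host = urlparse(url).netloc
        entry = clusters.setdefault(host, {"shown": [], "more": 0})
        if len(entry["shown"]) < visible:
            entry["shown"].append(url)
        else:
            entry["more"] += 1   # would sit behind the "More results" link
    return clusters

# Hypothetical ranked result list.
RESULTS = [
    "http://a.example/1",
    "http://b.example/x",
    "http://a.example/2",
    "http://a.example/3",
]
clusters = cluster_by_server(RESULTS)
```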

Google helps users by correcting misspelled words in their search queries using not a predetermined dictionary but its own index of the entire web.
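The idea of correcting against the index instead of a fixed dictionary can be sketched with the standard-library difflib module. The term counts, the similarity cutoff and the suggest helper are assumptions for illustration; the slides do not describe the actual method.

```python
import difflib

# Hypothetical term counts taken from an index of crawled pages.
TERM_COUNTS = {"google": 120, "search": 95, "engine": 80, "spider": 12}

def suggest(word, term_counts, cutoff=0.75):
    """Return the most frequent indexed term that closely resembles word."""
    word = word.lower()
    if word in term_counts:
        return word                      # already a known term
    candidates = difflib.get_close_matches(word, list(term_counts), n=3, cutoff=cutoff)
    if not candidates:
        return word                      # nothing similar enough, leave unchanged
    return max(candidates, key=term_counts.get)

corrected = suggest("gogle", TERM_COUNTS)
```

Because the candidate words come from the indexed pages themselves, new names and terms become correctable as soon as they appear on the web.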

Google’s visual interface is one of the simplest and, according to many, one of the reasons for Google’s success: “it’s simple and it works”.

LOGICAL DIAGRAM

Web Crawling, Extraction, and Indexing