1
2
PRESENTATION ON
PRESENTED BY
Sehrish Akram
3
4
Google, the leading search engine worldwide
Founded in 1998 by Stanford University graduate students Larry Page and Sergey Brin.
5
6
WHAT IS A QUERY
7
SEARCHING TECHNIQUES
The Google search engine uses these techniques:
It is a full-text search engine.
When we do a Google search, we are not searching the live web; we are searching Google's index of the web.
The index is built by software programs called "spiders".
8
SEARCHING TECHNIQUES
Spiders start by fetching a few web pages, then follow the links on those pages and fetch the pages they point to.
Case folding technique
Normalization technique, e.g. U.S.A → USA.
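The spider behaviour described on this slide can be sketched as a breadth-first traversal. This is a minimal sketch, not Google's crawler: `TOY_WEB`, `crawl`, and the `.example` page names are hypothetical, and the "web" is an in-memory dictionary so the example runs without network access.

```python
from collections import deque

# Hypothetical toy "web": page -> pages it links to (an assumption for illustration).
TOY_WEB = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example", "d.example"],
    "c.example": ["a.example"],
    "d.example": [],
}

def crawl(seeds, web, max_pages=100):
    """Fetch the seed pages first, then the pages they link to, and so on."""
    seen = set(seeds)
    queue = deque(seeds)
    fetched = []
    while queue and len(fetched) < max_pages:
        url = queue.popleft()
        fetched.append(url)             # stands in for downloading the page
        for link in web.get(url, []):   # follow each outgoing link once
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched

print(crawl(["a.example"], TOY_WEB))
```

The `seen` set keeps the spider from fetching the same page twice when pages link back to each other.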
9
SEARCHING TECHNIQUES
Google searches are not case sensitive: if the user searches for seven, SEVEN, Seven, or even 7, the results are the same.
Singular is different from plural: searches for apple and apples turn up different pages.
The order of words matters: Google considers the first word most important, the second word next, and so on.
Google ignores most little words, including "I", "an", "how", "the", and "of".
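The case folding, normalization, and little-word rules above can be combined in one small query-cleaning step. This is a sketch under stated assumptions: `normalize_query` is a hypothetical helper, `STOP_WORDS` contains only the words listed on the slide, and dropping periods is just one simple way to turn U.S.A into USA.

```python
# Little words taken from the slide; any real list is unknown (assumption).
STOP_WORDS = {"i", "an", "how", "the", "of"}

def normalize_query(query):
    """Return the query terms, case-folded, without periods or stop words."""
    terms = []
    for raw in query.split():
        term = raw.replace(".", "").casefold()   # U.S.A -> USA -> usa
        if term and term not in STOP_WORDS:
            terms.append(term)                   # original word order is kept
    return terms

print(normalize_query("The history of U.S.A"))
print(normalize_query("SEVEN") == normalize_query("seven"))
```

Word order is preserved on purpose, since the slide says the first word is treated as the most important. Singular and plural forms stay distinct because no stemming is applied.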
10
SEARCHING TECHNIQUES
Google's search query limit is 32 words.
Wildcard searching generally places the symbol "*" after a word stem. It tells the database to look for variations of that word.
For example, investigat* might pull up sites with words such as investigation, investigator, and investigative.
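Trailing-wildcard (truncation) matching of this kind amounts to a prefix test. The sketch below is a generic illustration, not a description of any particular database: `matches_truncation` is a hypothetical helper, and the stem `investigat` is used so that all three example words share the prefix.

```python
def matches_truncation(pattern, word):
    """True if word matches a pattern with an optional trailing '*'."""
    if pattern.endswith("*"):
        # Everything before the '*' must be a prefix of the word.
        return word.lower().startswith(pattern[:-1].lower())
    return word.lower() == pattern.lower()

words = ["investigation", "investigator", "investigative", "invest"]
print([w for w in words if matches_truncation("investigat*", w)])
```

Note that a pattern such as `investigation*` would miss `investigator`, because the full word is not a prefix of it; the wildcard has to follow the shared stem.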
11
INFORMATION RETRIEVAL AND THE WEB
What We Do
Google wanted to organize the web into something searchable. Its early prototype was based upon a few basic principles, including:
The best pages tend to be the ones that people link to the most.
The best description of a page is often derived from the anchor text associated with the links to a page.
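Both principles can be illustrated by tallying links per target page. This is a minimal sketch with made-up data: `LINKS`, `summarize_links`, and the `.example` pages are hypothetical, and a plain in-link count is a simplification, not the PageRank computation.

```python
from collections import defaultdict

# Hypothetical links: (source page, target page, anchor text).
LINKS = [
    ("a.example", "c.example", "search engine history"),
    ("b.example", "c.example", "history of search"),
    ("c.example", "a.example", "home"),
]

def summarize_links(links):
    """Count in-links per page and collect the anchor text pointing at it."""
    inlinks = defaultdict(int)
    anchors = defaultdict(list)
    for _source, target, text in links:
        inlinks[target] += 1           # principle 1: more links, better page
        anchors[target].append(text)   # principle 2: anchor text describes target
    return dict(inlinks), dict(anchors)

inlinks, anchors = summarize_links(LINKS)
print(inlinks)
print(anchors["c.example"])
```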
12
Anchor text
13
DOCUMENT ACQUISITION AND STORAGE:
Google searches more than 3 billion Web documents, which include Web pages, images and Usenet postings.
Google uses a standalone Web crawler, distributed across several machines, to create indexes and copies of the documents.
Besides standard .html files, Google also indexes other file types, including
_____________________________________
14
DOCUMENT ACQUISITION AND STORAGE:
A copy of each crawled page is stored in Google’s repository.
Indexes are created using the stored words, pointing to an inverted index file.
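An inverted index maps each word to the documents that contain it. The sketch below shows the idea on a tiny repository; `REPOSITORY` and `build_inverted_index` are hypothetical names, and splitting on whitespace is a simplifying assumption about tokenization.

```python
from collections import defaultdict

# Hypothetical repository: document id -> stored page text.
REPOSITORY = {
    1: "google indexes the web",
    2: "spiders crawl the web",
    3: "google ranks pages",
}

def build_inverted_index(repository):
    """Map every word to the set of document ids in which it occurs."""
    index = defaultdict(set)
    for doc_id, text in repository.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

index = build_inverted_index(REPOSITORY)
print(sorted(index["google"]))
print(sorted(index["web"]))
```

Looking up a query term is then a single dictionary access instead of a scan over every stored page.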
15
QUERY INTRODUCTION AND USER OPTIONS:
Since its founding, Google has been steadily introducing new features.
Google uses Boolean search with some variations and without support for nested expressions.
By default, it automatically applies the AND operator between terms. The minus symbol can be used to perform a NOT function, and the OR operation is supported (using OR in upper case).
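The three operators above can be evaluated over an inverted index with plain set operations. This is a sketch of a flat (non-nested) evaluator under stated assumptions: `INDEX`, `ALL_DOCS`, and `evaluate` are hypothetical, and OR is assumed to join only the two terms on either side of it.

```python
# Hypothetical inverted index: term -> ids of documents containing it.
INDEX = {
    "google": {1, 3},
    "search": {1, 2},
    "yahoo": {2},
    "engine": {1, 2, 3},
}
ALL_DOCS = {1, 2, 3}

def evaluate(query, index, all_docs):
    """Implicit AND between terms, upper-case OR, and -term for NOT."""
    groups = []          # each group is a set of ids; groups are ANDed together
    excluded = set()     # ids removed by -term
    pending_or = False
    for token in query.split():
        if token == "OR":
            pending_or = True
            continue
        if token.startswith("-") and len(token) > 1:
            excluded |= index.get(token[1:].lower(), set())
            continue
        docs = index.get(token.lower(), set())
        if pending_or and groups:
            groups[-1] = groups[-1] | docs   # widen the previous group
        else:
            groups.append(set(docs))
        pending_or = False
    result = set(all_docs)
    for group in groups:
        result &= group
    return result - excluded

print(evaluate("google search", INDEX, ALL_DOCS))
print(evaluate("google OR yahoo", INDEX, ALL_DOCS))
print(evaluate("engine -yahoo", INDEX, ALL_DOCS))
```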
16
QUERY INTRODUCTION AND USER OPTIONS:
Google does not use stemming or truncation, but allows the use of "*" as a wildcard in the middle of a phrase. For example, searching for "Search Engine" yields quite different results from "Search * Engine".
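The difference between the two phrase queries can be shown with a regular expression. This is only an illustration: `phrase_to_regex` is a hypothetical helper, and treating "*" as exactly one word is an assumption made for the sketch, not a statement of how Google interprets it.

```python
import re

def phrase_to_regex(phrase):
    """Build a regex for a phrase where '*' stands for one whole word."""
    parts = []
    for token in phrase.split():
        parts.append(r"\w+" if token == "*" else re.escape(token))
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

text = "a meta search optimization engine"
print(bool(phrase_to_regex("search * engine").search(text)))
print(bool(phrase_to_regex("search engine").search(text)))
```

Under this assumption the wildcard phrase matches text with a word between "search" and "engine", while the exact phrase does not, which is why the two queries return different pages.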
17
RESULTS SELECTION AND PRESENTATION
To select which documents are presented, Google combines a document's PageRank value, anchor text, and term proximity.
Results are clustered by server, with two visible results and a link to "More results from server".
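Clustering by server can be sketched as grouping an already ranked result list by host name. This is a minimal sketch: `cluster_by_server` and the example URLs are hypothetical, and the ranking itself is taken as given because the slide does not say how the three signals are weighted.

```python
from collections import defaultdict
from urllib.parse import urlparse

def cluster_by_server(ranked_urls, visible=2):
    """Keep at most `visible` results per host; count the rest as 'more'."""
    counts = defaultdict(int)
    shown = []
    for url in ranked_urls:
        host = urlparse(url).netloc
        counts[host] += 1
        if counts[host] <= visible:
            shown.append(url)           # ranking order is preserved
    more = {host: n - visible for host, n in counts.items() if n > visible}
    return shown, more

ranked = [
    "http://a.example/1",
    "http://a.example/2",
    "http://b.example/x",
    "http://a.example/3",
]
shown, more = cluster_by_server(ranked)
print(shown)
print(more)
```

The `more` dictionary holds the number of hidden results per server, which is what a "More results from server" link would lead to.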
18
RESULTS SELECTION AND PRESENTATION
Google helps users by correcting misspelled words in their search queries using not a predetermined dictionary but its own index of the entire web.
Google's visual interface is one of the simplest and, according to many, one of the reasons for Google's success: "it's simple and it works".
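The idea of correcting spelling from the index rather than from a fixed dictionary can be sketched by suggesting the closest term that actually occurs in the indexed pages. This is an illustration only: `TERM_COUNTS` and `suggest` are hypothetical, and the standard-library `difflib` similarity is a stand-in, not Google's method.

```python
import difflib

# Hypothetical vocabulary taken from an index: term -> number of occurrences.
TERM_COUNTS = {"google": 120, "search": 95, "engine": 80, "spider": 12}

def suggest(word, term_counts, cutoff=0.8):
    """Return the word itself if indexed, else the most frequent close match."""
    word = word.lower()
    if word in term_counts:
        return word
    candidates = difflib.get_close_matches(word, list(term_counts), n=3, cutoff=cutoff)
    if not candidates:
        return None
    # Prefer the candidate that appears most often in the index.
    return max(candidates, key=lambda term: term_counts[term])

print(suggest("gogle", TERM_COUNTS))
print(suggest("serch", TERM_COUNTS))
```

Because the vocabulary comes from crawled pages, new names and terms become available for correction as soon as they are indexed.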
19
LOGICAL DIAGRAM
Web Crawling, Extraction, and Indexing