rodney-watkins
Why Can’t I Just Use Google?
Uncovering the Invisible Web
Back in the day…
Students used to do research using resources hand-picked by librarians and teachers. These materials were selected because:
• They were written by expert authors
• They presented information without bias
• They were written at your reading level
Now, with access to the Web, you have to learn to find and evaluate resources too!
Today you will learn:
How much information can search engines find on the Web?
What kind of information can you not typically find through a search engine?
How does information in a library database compare to information found through a Google search?
Can your favorite search engine find all there is to find on the Web?
It finds all we need to find, right guys?
Think again.
Search engines access a relatively small part of the Web, known as “The Free Web”.
The large part of the Web that search engines can’t access is known as “The Invisible Web” or “The Deep Web”.
• Subscription databases
• Archives
• E-Books
The Invisible Web is estimated to be 400-550 times the size of the visible Web!
What else is not on the Free Web?
Print books still under the protection of copyright. This means around 90% of the books on our library’s shelves!
Full-text, searchable archives of journal, magazine, and newspaper articles
The “Invisible Web” is information you cannot retrieve from search results.
Information from this part of the Web is not “crawled” by a search engine and is thus “invisible” to a searcher who does not know it exists.
What’s the Invisible Web?
Search engines “crawl” the web like a spider, indexing some of the information found on a web page.
Indexing is the process of recording that information along with its location so that people can search for it and find it in a database.
A database is really just a collection of information that is indexed or organized for people to search.
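To make the idea of indexing concrete, here is a minimal sketch in Python. It is only an illustration: the three pages, their URLs, and the function names `build_index` and `search` are made up for this example, and a real search engine's index is far more elaborate.

```python
from collections import defaultdict

# Hypothetical pages (URL -> page text), invented for illustration only.
PAGES = {
    "http://example.com/spiders": "spiders crawl the web",
    "http://example.com/library": "library databases index articles",
    "http://example.com/search": "search engines crawl and index the web",
}

def build_index(pages):
    """Record each word along with the locations (URLs) where it appears."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, word):
    """Look a word up in the index and return the matching URLs, sorted."""
    return sorted(index.get(word.lower(), set()))

index = build_index(PAGES)
print(search(index, "crawl"))
# ['http://example.com/search', 'http://example.com/spiders']
```

Notice that the search never reads the pages themselves. It only looks in the index, so a page that was never indexed can never show up in the results.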
Spiders crawl the web?
How do search engines find web pages?
Search engines find some sites through links on existing web pages, and other sites are submitted directly to the search engine by the person who created the website.
It can take 2-6 months for a search engine to crawl a new web site.
Each search engine crawls the web and sorts the information found in a different way.
Why search engines can’t access every page on the Web:
The material is in a database that you have to pay to access (library databases)
The page is available only after registration
A security firewall prevents access to the page
The information is private and owned by a company or organization
Some more reasons…
The page is indexed by some search engines but not others. No two search engines are the same.
The search engine crawler does not read a particular file format or non-text interface (for example, Flash files)
The site may be new and hasn’t been crawled yet.
Some pages just aren’t linked. You must know the URL.
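Two of the reasons above, pages behind a login and pages that nothing links to, can be shown with a toy crawler. This is a rough sketch under stated assumptions: the six-page `WEB` below and the `crawl` function are hypothetical, and real crawlers handle many more cases.

```python
from collections import deque

# Hypothetical toy web: each page lists its links and whether it needs a login.
WEB = {
    "home": {"links": ["news", "database"], "login": False},
    "news": {"links": ["home", "archive"], "login": False},
    "archive": {"links": [], "login": False},
    "database": {"links": ["article"], "login": True},   # subscription page
    "article": {"links": [], "login": False},            # only linked from the database
    "orphan": {"links": [], "login": False},             # no page links here
}

def crawl(web, start):
    """Follow links breadth-first from start; return the set of pages reached."""
    seen = set()
    queue = deque([start])
    while queue:
        page = queue.popleft()
        # Skip pages already visited and pages the crawler cannot log in to.
        if page in seen or web[page]["login"]:
            continue
        seen.add(page)
        queue.extend(web[page]["links"])
    return seen

visible = crawl(WEB, "home")
invisible = set(WEB) - visible
print(sorted(visible))    # ['archive', 'home', 'news']
print(sorted(invisible))  # ['article', 'database', 'orphan']
```

The crawler stops at the login page, so everything that can only be reached through it stays hidden too. The "orphan" page is public, but with no link pointing to it you would have to know its URL.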
Smarter search engines
The larger search engines like Google already index information contained within media such as PDF files, word processor documents, spreadsheets, etc.
But some information will never be available for free on the Internet.
Why can’t we get this stuff for free?
Publishing is a business!
Professional authors are in the business of earning money for their hard work.
Publishers also expect to earn money from publishing and selling an author’s work.
Copyright laws protect the work of authors, artists, and anyone who creates information or other media.
Your teachers expect quality!
Because material from library books and subscription databases is written by experts, your teachers may prefer that you use these resources instead of searching the Free Web.
You may even have to defend some of the sources you find on the Free Web!
Let’s try searching for information in Google and in a library database.
You be the judge!