THU Agasang Evaluating Search Engines


    Evaluating Search Engines

    Introduction

    What this research paper is all about

    As a student and future employee, I see research as one of the best ways to learn about the latest trends and news that are useful in making decisions for an organization. The emergence of Information Technology has made research much easier: people can now communicate and exchange information faster.

    This research paper is about search engines. It covers three different kinds of search engines that users rely on to access the Web. I was assigned to evaluate these search engines against twelve criteria to determine their similarities and differences, and the findings are compared with one another to weigh which search engine is better. My conclusion at the end of this paper reflects my own opinions, based on my observations at the time of evaluation.

    Furthermore, the outcome of this research paper may help me choose the right online search tool for research, not only in academe but also in the workplace in the near future.

    Who I am

    Name

    John Carlo G. Agasang

    Academic Information

    DEGREE PROGRAM, YEAR IN COLLEGE

    BS Business Management, IV

    COURSES WHERE RESEARCH IS USED

    BM 183: Computer Methods and Applications; BM 126: Introduction to Information Systems; Other BM Major subjects; General Education (GE) subjects

    How I conduct research

    Discussion of the research methods I use and how often I use them

    LIBRARY RESEARCH

    The library contains large collections of books, journals, periodicals, and other educational materials. Library research is arguably the oldest way to acquire information for a study. In our English 10 class, for example, we were required to do library research on a topic of our choosing. Most of the research studies we've done so far have involved library research.

    FACE-TO-FACE INTERVIEWS

    An in-person interview is a common way to obtain first-hand information, since there is a direct connection between the researcher and the interviewee. The interviewer asks questions, and the interviewee is expected to answer them. Interviews are helpful in drawing out information that is not publicly available, e.g., the private practices of a company or the common norms inside a department. This method is used extensively in BM courses such as BM 99.1, BM 99.2, BM 115, BM 141, and BM 142.

    SURVEYS

    Surveys are useful in obtaining specific first-hand data. Their basic purpose is to collect measurable data, and they are typically conducted in person or online. Online survey tools such as Google Forms and SurveyMonkey make surveying faster and easier: they provide templates and can automatically calculate statistics. This kind of research is used mainly in Marketing classes to learn customers' preferences for a product or service based on different factors.

    ONLINE SEARCH TOOLS

    Online search tools are the research method used in this study. I used Google, Dogpile, and Internet Archive to explore the World Wide Web. I documented my observations with screenshots to support them visually, and I used these engines to look up definitions needed in this study. The evaluation was done during allotted class meetings and in my free time at home.

    Currently, I'm using online tools to look for recent articles for our term paper in Money and Banking.


    Search Engines

    What is a search engine?

    Definition

    A web search engine looks for data that match the criteria in a query. Specifically, search engines gather information and organize it in a variety of ways. At a basic level, a search engine is one of two things: a Robot or a Directory. A Robot uses a software program to search, catalog, and then organize information on the Internet. A Directory, by contrast, does not search the Internet for information. It obtains information from individuals who enter it into the search engine's database.
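    A Robot's "catalog, then organize" step can be made concrete with a toy sketch of an inverted index, a data structure commonly used for this purpose. It maps each word to the pages that contain it, so a query only needs to look up its own words. The page addresses below are hypothetical. This illustrates the general idea only and is not how Google or any specific engine is actually built.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Catalog every word on every page: word -> set of page addresses containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the pages that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())  # keep only pages that also have this word
    return results


# Hypothetical pages that a Robot might have crawled.
pages = {
    "a.example": "Money and banking term paper",
    "b.example": "Online entrepreneurship courses",
    "c.example": "Banking courses online",
}
index = build_index(pages)
print(sorted(search(index, "online courses")))  # ['b.example', 'c.example']
```

    Building the index once means each query is answered by a few dictionary lookups instead of rescanning every page.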

    My Evaluation of my favorite search engine - Google

    Screen Capture of this search engine's Home page

    History: The people/organization behind the search engine

    Google Inc. was founded by Larry Page and Sergey Brin in 1998. The search engine Google (initially called BackRub) uses links to determine the importance of individual web pages. The name Google is a play on the word googol, the number written as 1 followed by 100 zeros.
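    Google's founders published this link-based idea as PageRank. A page is important if important pages link to it, and the score is computed by repeatedly passing each page's score along its outgoing links. The sketch below is a simplified textbook version. The damping factor of 0.85 is the value commonly cited for the original formulation, and the page names are hypothetical. Google's present-day ranking combines many other signals that are not modeled here.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Simplified PageRank computed by power iteration.

    `links` maps every page to the list of pages it links to; every link
    target must also be a key. A page with no outgoing links spreads its
    score evenly over all pages.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal scores
    for _ in range(iterations):
        # Each page receives a small "random jump" share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share  # pass score along each link
        rank = new_rank
    return rank


# Hypothetical three-page site: both inner pages link back to the home page.
links = {
    "home": ["about", "news"],
    "about": ["home"],
    "news": ["home"],
}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # home
```

    The scores always sum to 1. The home page ends up highest because every other page links to it.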

    Google started as a private company and went public on August 19, 2004. Its initial public offering of 19,605,052 shares of Class A common stock took place on the NASDAQ in New York. Today, Google (the company) continues to expand its offerings to the public, from Gmail to Google+ and many more.



    Features of this search engine:

    EASE OF USE AND USER-FRIENDLINESS

    Google is known for its ease of use and user-friendliness. Google's homepage is very simple and uncluttered. Users can have results displayed according to their preferences by clicking the Search Tools button, where they can tick search filters, set the number of results per page, and so on.

    Google can also predict searches as the user types.

    If the user misspells a word, Google shows results for the corrected spelling.

    Users can also search by voice. To use this feature, the user should first set up the browser's microphone settings.


    AVAILABILITY OF HELP

    Google offers help for users who run into search problems. Search Help is organized into Popular Articles, Troubleshoot and Request Removals, Settings, Image search, Mobile, Type of Search results, and Filter and Refine your results.

    Interactive help videos are also offered in the Search Help.


    AVAILABILITY OF ADVANCED SEARCH FEATURES

    Google offers advanced search features, with a separate Advanced Search page for web pages and images. Users can find pages containing certain words (exact words or phrases, words to exclude, etc.) and narrow the results (by language, domain, file type, etc.).


    NUMBER OF RESULTS RETURNED

    Google is known for displaying millions of results for every search a user makes. By default, Google looks for the query words in web pages in no specific order. It looks for the words in that particular order only when the user puts them in quotation marks.


    SPEED THAT RESULTS ARE RETURNED AND DISPLAYED

    Google can display millions of search results in a matter of seconds. However, the actual speed also depends on the user's Internet connection and computer.


    RELEVANCE OF RESULTS RETURNED

    Like all search engines, Google is designed to display all web pages that contain the words or phrases the user has typed in the search box. As a result, some results are not relevant. However, if the user uses the Advanced Search features or Boolean operators, Google displays much more relevant results. Google keeps looking for ways to make search results relevant and useful to users, and it uses different search algorithms to deliver better results.

    DIVERSITY OF SOURCES TAPPED

    As for diversity of sources, Google can tap as many sources as possible. However, Google is a conventional search engine, so it has a limited capability to search the Deep Web.

    CLASSIFICATION OF RESULTS BY SOURCE, RELEVANCE, CLOSENESS TO MATCH

    Google classifies search results based on different factors. Its algorithms make this possible, making search results more accurate and useful.


    ABILITY TO AUTOMATICALLY FILTER RESULTS FOR USER-SPECIFIED TERMS

    Google accepts symbols like quotation marks for filtering user-specified terms: it treats text in quotation marks as an exact word or phrase. The results are refined and filtered, cutting the number returned from millions to thousands (or even fewer).
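    A minimal sketch can show the difference between an unquoted query (all words, in any order) and a quoted one (the exact phrase, in order). It is a simplification that assumes matching is done on lowercase words only. Real engines also handle stemming, synonyms, and punctuation in ways not modeled here.

```python
import re


def words(text: str) -> list[str]:
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_all_words(text: str, query: str) -> bool:
    """Unquoted query: every word must appear, in any order."""
    present = set(words(text))
    return all(w in present for w in words(query))


def matches_exact_phrase(text: str, phrase: str) -> bool:
    """Quoted query: the words must appear consecutively and in order."""
    tokens, target = words(text), words(phrase)
    k = len(target)
    return any(tokens[i:i + k] == target for i in range(len(tokens) - k + 1))


doc = "Courses in entrepreneurship are now offered online"
print(matches_all_words(doc, "online entrepreneurship courses"))     # True
print(matches_exact_phrase(doc, "online entrepreneurship courses"))  # False
```

    The same page passes the unquoted test but fails the quoted one. This is why quoting a phrase can shrink the result count so sharply.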


    ABILITY TO REMOVE DUPLICATE LINKS FROM MULTIPLE SITES

    Google has its own rules for removing duplicate links or content from multiple sites. Its Search Help page states clearly that Google tries to index and show pages with distinct information. When several links are duplicates, Google chooses only one of them, and only the selected link is shown in the results.
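    The Search Help page does not say how Google detects duplicates. The following is therefore only a minimal sketch of one common approach, assumed here for illustration: normalize each page's text, hash it, and keep the first result for each hash. It catches exact copies only; near-duplicates need fuzzier techniques such as shingling. The URLs are hypothetical.

```python
import hashlib
import re


def fingerprint(content: str) -> str:
    """Hash of the page text after lowercasing and collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", content.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def remove_duplicates(results: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep only the first (url, content) result for each distinct page content."""
    seen: set[str] = set()
    unique = []
    for url, content in results:
        fp = fingerprint(content)
        if fp not in seen:
            seen.add(fp)
            unique.append((url, content))
    return unique


results = [
    ("https://site-a.example/news", "Central bank raises rates"),
    ("https://site-b.example/copy", "central bank   RAISES rates"),
    ("https://site-c.example/other", "Stock market closes higher"),
]
print([url for url, _ in remove_duplicates(results)])
# ['https://site-a.example/news', 'https://site-c.example/other']
```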


    ABILITY TO CONSTRUCT COMPLEX SEARCH PARAMETERS USING BOOLEAN OPERATORS

    Google accepts complex search parameters using Boolean operators. To do so, the user must follow a certain format, e.g. ONLINE ENTREPRENEURSHIP COURSES +IDEAS.


    On its Search Help page, Google lists the punctuation marks and search operators that users can use to build complex search parameters that filter results further.
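    As a rough illustration of how such operators narrow results, the sketch below treats +word as "must appear" and -word as "must not appear". Plain words are treated as optional terms that only affect ranking. These semantics are an assumption made for this toy example. Google's actual handling of operators is more complex and has changed over time, and the sample pages are made up.

```python
import re


def parse_query(query: str) -> tuple[set[str], set[str], set[str]]:
    """Split a query into required (+word), excluded (-word) and plain words."""
    required: set[str] = set()
    excluded: set[str] = set()
    plain: set[str] = set()
    for token in query.lower().split():
        word = token.lstrip("+-")
        if not word:
            continue  # ignore a lone "+" or "-"
        if token.startswith("+"):
            required.add(word)
        elif token.startswith("-"):
            excluded.add(word)
        else:
            plain.add(word)
    return required, excluded, plain


def search(pages: list[str], query: str) -> list[str]:
    """Keep pages that satisfy the + and - operators, best plain-word match first."""
    required, excluded, plain = parse_query(query)
    hits = []
    for text in pages:
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        if required <= words and not (excluded & words):
            hits.append((len(plain & words), text))
    hits.sort(key=lambda hit: hit[0], reverse=True)  # stable: ties keep input order
    return [text for _, text in hits]


pages = [
    "Startup ideas for students",
    "Free online entrepreneurship courses and startup ideas",
    "Paid online entrepreneurship courses with ideas",
    "Online entrepreneurship courses for beginners",
]
print(search(pages, "online entrepreneurship courses +ideas -paid"))
# ['Free online entrepreneurship courses and startup ideas', 'Startup ideas for students']
```

    The third page is dropped by -paid and the fourth by +ideas. The remaining two are ordered by how many of the plain words they contain.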


    ABILITY TO LOCATE IMAGES, AUDIO AND VIDEO CLIPS, PDFS AND OTHER FILE FORMATS

    Google can locate other file formats like audio, images, video clips, and PDFs. In the search bar, users can select the kind of file they are looking for. To search for PDF files, users can include the word PDF in their search or, more precisely, use the filetype:pdf operator.


    Metasearch Engines

    What is a metasearch engine?

    Definition

    A metasearch engine is a search engine that queries other search engines and then combines the results it receives from all of them. When a search is made in a metasearch engine, it gets results from other search engines (such as Google, Yahoo!, and Lycos) and displays them to the user.
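    How a metasearch engine combines results varies and is not described here. The sketch below therefore swaps in one published technique, reciprocal rank fusion. Each page earns 1/(k + rank) from every engine that returned it, so pages ranked highly by several engines rise to the top. URLs that differ only in scheme, a leading "www.", host capitalization, or a trailing slash are merged into one entry. The engine names, the URLs, and k = 60 (a commonly used default) are assumptions for illustration.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Key used to spot the same page returned by different engines."""
    parts = urlsplit(url)
    host = parts.netloc.lower().removeprefix("www.")
    key = host + parts.path.rstrip("/")
    return key + ("?" + parts.query if parts.query else "")


def merge_results(ranked_lists: dict[str, list[str]], k: int = 60) -> list[str]:
    """Merge ranked result lists from several engines by reciprocal rank fusion."""
    scores: dict[str, float] = {}
    first_seen: dict[str, str] = {}  # keep the first spelling of each URL
    for results in ranked_lists.values():
        for rank, url in enumerate(results, start=1):
            key = normalize_url(url)
            first_seen.setdefault(key, url)
            scores[key] = scores.get(key, 0.0) + 1.0 / (k + rank)
    ordered = sorted(scores, key=lambda item: scores[item], reverse=True)
    return [first_seen[item] for item in ordered]


# Hypothetical ranked results from two underlying engines.
engine_results = {
    "engine_a": ["https://www.example.com/rates/", "https://news.example.org/banking"],
    "engine_b": ["http://example.com/rates", "https://blog.example.net/money"],
}
print(merge_results(engine_results))
# ['https://www.example.com/rates/', 'https://news.example.org/banking', 'https://blog.example.net/money']
```

    The rates page is returned by both engines, so it is merged into one entry and ranked first. This also shows one way a metasearch engine can remove duplicate links.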

    Why use a metasearch engine

    ADVANTAGES AND BENEFITS

    Using metasearch engines in making web searches has advantages and benefits:

    Allows more information to be found since it retrieves information from different search engines

    Saves time in making searches

    More information can be retrieved in one single search, giving access to different sources in just one click

    DISADVANTAGES AND RISKS

    The following are the disadvantages and risks of using metasearch engines:

    Metasearch engines have web partners, and they prioritize sponsored links

    Sources are not always filtered by relevance

    Prone to spam

    The decoding of search syntax and fields is not exact

    My Evaluation of my assigned metasearch engine Dogpile

    Screen Capture of my assigned metasearch engine's Home Page


    History: The people/organization behind the metasearch engine

    Dogpile uses metasearch technology, which displays and filters relevant search results from leading search engines (like Google and Yahoo!). It is the flagship metasearch engine of InfoSpace, which describes itself as the leading provider of white-label search and monetization solutions. WebCrawler and Zoo.com are also metasearch engines run by InfoSpace.

    InfoSpace's metasearch technology searches top marketplaces, then aggregates and filters the results to provide a complete, partner-branded search. Google and Yahoo! supply most of InfoSpace's search offerings.

    Features of this metasearch engine:

    EASE OF USE AND USER-FRIENDLINESS

    Dogpile has a very user-friendly GUI. On its homepage, the user can click the kind of document (web page, image, video, etc.) he or she is looking for. The homepage is also well organized and resembles Google's. Dogpile also updates its homepage from time to time for special occasions. On the left side of the page, a Recent Searches feature helps users return to their search history.


    However, its major drawback is that it shows advertisements before the search results.


    AVAILABILITY OF HELP

    Dogpile's help is found under the About link at the bottom of the homepage. It contains Frequently Asked Questions about metasearch, search preferences, and other features of Dogpile.


    Under Using Dogpile Search, it explains the different sections of the search results page, how to make results more accurate, and Dogpile's distinctive features.


    AVAILABILITY OF ADVANCED SEARCH FEATURES

    Dogpile doesn't have a separate page for Advanced Search features. However, it has a feature called IntelliFind, whose purpose is to help users find answers and results more quickly.

    NUMBER OF RESULTS RETURNED

    Because Dogpile is a metasearch engine, it is expected to return more results than Google, since metasearch engines retrieve results from several other search engines. However, Dogpile has no indicator of how many results are returned and displayed.

    SPEED THAT RESULTS ARE RETURNED AND DISPLAYED

    Like Google, Dogpile returns and displays results in a short time. However, it has no meter showing how long it takes to retrieve and combine results from the other search engines.

    RELEVANCE OF RESULTS RETURNED

    As for relevance of results, Dogpile's metasearch technology performs reasonably well. As mentioned, its IntelliFind feature compares the search term to data from top media content providers. Dogpile also bases the results on the popularity of the search term entered and recommends relevant matching content.

    DIVERSITY OF SOURCES TAPPED

    Because Dogpile is a metasearch engine, it gathers relevant and related information from different content providers, giving the user more result options. However, as mentioned above, Dogpile prioritizes sponsored links in web searches.


    CLASSIFICATION OF RESULTS BY SOURCE, RELEVANCE, CLOSENESS TO MATCH

    Dogpile does not categorize or classify results by source, relevance, or closeness to match. It simply displays results based on the popularity of the search term from top content providers.

    ABILITY TO AUTOMATICALLY FILTER RESULTS FOR USER-SPECIFIED TERMS

    Dogpile supports user-specified terms: search results containing the specified words or phrases are highlighted.


    However, it still highlights every individual word of the phrase entered, not just the exact phrase.

    ABILITY TO REMOVE DUPLICATE LINKS FROM MULTIPLE SITES

    Dogpile states clearly on its About page that it removes duplicate links so that only relevant results matching the search term are displayed.


    ABILITY TO CONSTRUCT COMPLEX SEARCH PARAMETERS USING BOOLEAN OPERATORS

    Dogpile recognizes the use of Boolean operators in making searches.


    However, it does not actually exclude words or phrases marked with the minus operator, so the search is not refined as expected.


    ABILITY TO LOCATE IMAGES, AUDIO AND VIDEO CLIPS, PDFS AND OTHER FILE FORMATS

    Dogpile offers separate links for the kind of file the user needs (images and videos, in particular).


    However, Dogpile has a limitation in locating other file types like PDF.

    What other features would be nice to have?

    Dogpile would be better if it had the following features:

    Web results shown before advertisements

    Ability to predict searches

    A meter showing the number of results

    A meter showing retrieval speed

    An Advanced Search feature

    How it compares with my favorite search engine

    In terms of GUI, Google and Dogpile are much alike, and both update their homepages for special occasions. The help on search tips offered by both engines is highly useful, especially for refining results. Despite these similarities, Google's capabilities are far more advanced than Dogpile's. Google keeps improving its search algorithms, and it can access more files in different formats than Dogpile can.


    Did I like the metasearch engine I tried out?

    Yes. Dogpile is also easy to use. Its capability as a metasearch engine is commendable, because it uses a different technology to make search results relevant to users. However, I am not happy that Dogpile prioritizes advertisements over the retrieved search results.

    Will I shift to this metasearch engine?

    I will still use Google as my primary online search tool, since Google's features are far more advanced than Dogpile's. Still, the metasearch engine I evaluated has its own characteristics and features that will be very helpful in my future research. I will probably use Dogpile when I need to save time and find more reliable sources.


    Deep Web Search Engines

    What is a deep web search engine?

    Definition

    A Deep Web search engine is a special kind of engine that allows users to explore the invisible part of the Web. The Deep Web is the part of the Internet that is inaccessible to conventional search engines. Deep Web searching means accessing the deepest parts of the Internet, including private databases and dead links. One example of a Deep Web search engine is Internet Archive.

    Why use a deep web search engine

    ADVANTAGES AND BENEFITS

    Tons of search results can be retrieved in one search

    Can locate thousands (or even millions) of pieces of information useful to individuals and organizations

    Students can access different scholarly journals and educational resources

    DISADVANTAGES AND RISKS

    Retrieving search results may take some time

    Accessing private databases may result in intrusion of privacy

    Sorting out only the relevant results may take some time, especially if the user wants to read and scan all the results

    My Evaluation of my assigned Deep Web search engine Internet Archive

    Screen Capture of my assigned Deep Web search engine's Home Page


    History: The people/organization behind this Deep Web search engine

    The Internet Archive is a Deep Web search engine run by a charitable non-profit organization. The organization was founded to build an Internet library, with the purpose of offering researchers, historians, and scholars permanent access to historical collections that exist in digital format. Founded in 1996, it is located in San Francisco, California, and receives large amounts of donated data every day.

    Internet Archive contains billions of web pages (including dead links), thousands of movies, audio recordings, and many other file formats, and it continues to grow every day.

    Features of this Deep Web search engine:

    EASE OF USE AND USER-FRIENDLINESS

    Internet Archive's GUI is very pleasing to the eye. It is organized, neat, simple, and user-friendly. Its homepage contains all the necessary links for the user, and its vast collection of web pages makes Internet Archive even more attractive. It has two search bars: Universal Access to Knowledge and the Wayback Machine. The Universal Access to Knowledge search bar is used for deep web searching in the Archive.

    The Wayback Machine search bar, on the other hand, is mainly used to look up web links. Billions of links, including dead links, can be accessed through it.
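    Wayback Machine lookups can also be done programmatically through the Internet Archive's Wayback Availability JSON API. The API is assumed here to take a url parameter and an optional timestamp and to return the closest archived snapshot. The sketch below only builds the request URL and reads a response. The endpoint and response field names are assumptions to check against the current documentation, and the sample response is made up for illustration rather than a real result.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint of the Wayback Availability JSON API.
WAYBACK_AVAILABLE = "https://archive.org/wayback/available"


def availability_url(page_url: str, timestamp: Optional[str] = None) -> str:
    """Build a lookup URL; timestamp (YYYYMMDD...) asks for the nearest snapshot."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return WAYBACK_AVAILABLE + "?" + urlencode(params)


def closest_snapshot(response_body: str) -> Optional[str]:
    """Return the archived copy's URL, or None if no snapshot is available."""
    data = json.loads(response_body)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


# To query the live service, the body could be fetched with e.g.
# urllib.request.urlopen(availability_url("example.com")).read().decode("utf-8")
sample = """{"url": "example.com", "archived_snapshots": {"closest": {
  "available": true, "status": "200", "timestamp": "20140101000000",
  "url": "http://web.archive.org/web/20140101000000/http://example.com/"}}}"""
print(availability_url("example.com", "20140101"))
# https://archive.org/wayback/available?url=example.com&timestamp=20140101
print(closest_snapshot(sample))
```

    Returning None for an empty archived_snapshots object covers dead or never-archived links, which the Wayback Machine cannot restore.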


    AVAILABILITY OF HELP

    Internet Archive's Help page contains FAQs about the site, search tips, and answers to many other popular questions about the Archive.


    AVAILABILITY OF ADVANCED SEARCH FEATURES

    Internet Archive has a separate Advanced Search page for refining and filtering searches of the deep web. Only one field needs to be filled in.

    The Advanced Search page also offers an advanced search that returns results in formats such as XML and JSON.


    The lower part of the page contains notes on how to use the feature.
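    Because results can be returned as JSON, they can also be retrieved by a program. The following is a minimal sketch based on several assumptions. It assumes the advanced search endpoint at archive.org/advancedsearch.php accepts the q, fl[], rows, page, and output parameters and returns a Solr-style response (a response object holding numFound and a docs list). These names should be checked against the page's own notes, and the sample response is invented for illustration. The query writes the Boolean operator in uppercase, as Lucene-style syntax usually expects; this is also an assumption about the Archive's parser.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint of Internet Archive's advanced search.
ADVANCED_SEARCH = "https://archive.org/advancedsearch.php"


def advanced_search_url(query: str, fields: list[str], rows: int = 50, page: int = 1) -> str:
    """Build an advanced search URL that asks for JSON output."""
    params = [("q", query)]
    params += [("fl[]", field) for field in fields]  # one fl[] per returned field
    params += [("rows", str(rows)), ("page", str(page)), ("output", "json")]
    return ADVANCED_SEARCH + "?" + urlencode(params)


def parse_results(response_body: str) -> tuple[int, list[dict]]:
    """Return (number of matches, list of result records) from a JSON response."""
    response = json.loads(response_body)["response"]
    return response["numFound"], response["docs"]


url = advanced_search_url("money AND banking", ["identifier", "title"], rows=10)
print(url)

# Invented sample shaped like the assumed response format.
sample = """{"responseHeader": {"status": 0},
 "response": {"numFound": 2, "start": 0, "docs": [
   {"identifier": "hypothetical-item-1", "title": "Money and Banking Notes"},
   {"identifier": "hypothetical-item-2", "title": "Banking History"}]}}"""
count, docs = parse_results(sample)
print(count, [doc["title"] for doc in docs])
# 2 ['Money and Banking Notes', 'Banking History']
```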


    NUMBER OF RESULTS RETURNED

    Like Google, Internet Archive has an indicator of how many results are returned and displayed. Unlike Google, however, it returns fewer results.


    Results in Internet Archive can be refined by relevance, views, date archived, and creator. The number of results in each file type, collection, and topic is also shown.

    SPEED THAT RESULTS ARE RETURNED AND DISPLAYED

    Internet Archive is much slower than Google and Dogpile to display search results. Considering the number of web pages it has to look through, a deep web search takes time. Speed also depends on the Internet connection and the user's computer.

    RELEVANCE OF RESULTS RETURNED

    Based on my observation, the results returned by Internet Archive are relevant to the search term. They can be sorted further using the Sort by bar at the top of the search results. However, Internet Archive doesn't state how relevant the search hits are.


    DIVERSITY OF SOURCES TAPPED

    Internet Archive is a collection of different libraries that contain millions of web pages, files, and other formats for scholars, librarians, and public users.

    CLASSIFICATION OF RESULTS BY SOURCE, RELEVANCE, CLOSENESS TO MATCH

    At the right side of the search result page, results are categorized based on collections (where the pages come from), file types, and topics related.


    ABILITY TO AUTOMATICALLY FILTER RESULTS FOR USER-SPECIFIED TERMS

    Internet Archive supports automatic filtering for user-specified terms. Even though the Archive already has technology to refine searches, using quotation marks makes this filtering even more effective.


    ABILITY TO REMOVE DUPLICATE LINKS FROM MULTIPLE SITES

    Since refining results is a priority for Internet Archive, duplicate links are easily removed. Also, Internet Archive is a collection of different scholarly libraries, and the data donated to the Archive are selected by the donors themselves.

    ABILITY TO CONSTRUCT COMPLEX SEARCH PARAMETERS USING BOOLEAN OPERATORS

    Internet Archive does not support the +, -, and = signs. Instead, it uses the words "and" and "or" for Boolean searches. Internet Archive also has its own Complex Search language, which is only available for Collections.


    ABILITY TO LOCATE IMAGES, AUDIO AND VIDEO CLIPS, PDFS AND OTHER FILE FORMATS

    In every search made, Internet Archive can locate file formats such as audio, video, PDF, and others.


    What other features would be nice to have?

    Besides its existing features, Internet Archive would be even better if it had the following:

    A "Most searched links" list (for the Wayback Machine search bar) on the homepage

    A retrieval speed meter

    Support for Boolean operators in searches

    A Peer Reviewed option for scholarly articles and journals

    How this Deep Web search engine compares with my favorite search engine

    Google and Internet Archive have many similarities. Both have a clean homepage, sort search results by category and relevance, show the number of results retrieved, and categorize files by format. Despite these shared strengths, based on my observation, Google can retrieve more information than Internet Archive. I was surprised that Google showed millions more results than the Archive. I suspect this is because the search topics provided were all recent business topics. The Archive's developers are continuously building its libraries, and I think that is one reason why it shows fewer results than Google can retrieve.

    Did I like the Deep Web search engine I tried out?

    Yes. I love the simplicity and features of Internet Archive. I also admire the goal of the people behind Internet Archive: to provide open and free access to literature that is essential to education and learning.

    Will I shift to this Deep Web search engine?

    I will not totally shift to Internet Archive, but I will definitely use it when doing thorough research. Internet Archive is still building its libraries, a process that will never really be finished. Also, it is not conventional to use Internet Archive for everyday searches.


    Summary and Conclusions

    Summary

    Differences between a search engine, metasearch engine and a Deep Web search engine

    Search Engine

    A web search engine looks for data that match the criteria in a query. Specifically, search engines gather information and organize it in a variety of ways. At a basic level, a search engine is one of two things: a Robot or a Directory. A Robot uses a software program to search, catalog, and then organize information on the Internet. A Directory, by contrast, does not search the Internet for information. It obtains information from individuals who enter it into the search engine's database.

    Metasearch Engine

    A metasearch engine is a search engine that queries other search engines and then combines the results it receives from all of them. When a search is made in a metasearch engine, it gets results from other search engines (such as Google, Yahoo!, and Lycos) and displays them to the user.

    Deep Web Search Engine

    A Deep Web search engine is a special kind of engine that allows users to explore the invisible part of the Web. The Deep Web is the part of the Internet that is inaccessible to conventional search engines. Deep Web searching means accessing the deepest parts of the Internet, including private databases and dead links.

    Features that made my assigned metasearch engine and deep web search engine perform better than my favorite search engine

    Dogpile's features that made it perform better than Google:

    Recent Searches

    IntelliFind technology

    Internet Archive's features that made it perform better than Google:

    Wayback Machine

    Sorting results by collections

    Number of results based on file formats

    Features that made my assigned metasearch engine and deep web search engine perform worse than my favorite search engine

    Dogpile's features that made it perform worse than Google:

    Prioritizing advertisements before web results

    Lack of an Advanced Search feature

    Limited use of Boolean operators in making searches

    Inability to easily locate PDF files and other file formats

    Internet Archive's features that made it perform worse than Google:

    Low number of search hits

    Limited use of Boolean operators

    Lack of a statement regarding the relevance of results

    Conclusion

    Will I begin using the metasearch engine or the deep web search engine I evaluated instead of my favorite search engine for research?

    No. I will definitely stick to Google as my primary online search tool. Based on my evaluation, the people behind Google conduct a lot of studies and test different algorithms to make search even more reliable. However, I am not closing the door on using Dogpile and Internet Archive. Both are useful in research: Dogpile for saving time, and Internet Archive for accessing large collections of scholarly articles and journals.