Comp II....Unit 1

  • 7/27/2019 Comp II....Unit 1

    BBA IV Sem/CAII/Unit-1

    The Louvre's website also has links to the sites of other museums, such as the Vatican Museum. When you click on that link, you access the web server for the Vatican Museum. In this way, information scattered across the globe can be linked together.

    The "glue" that holds the Web together is hypertext and hyperlinks. This feature allows electronic files on the Web to be linked so you can jump easily between them. On the Web, you navigate through pages of information (commonly known as browsing or surfing) based on what interests you at that particular moment.

    To access the Web you need a web browser, such as Netscape Navigator or Microsoft Internet Explorer. How does your web browser distinguish between web pages and other types of data on the Internet? Web pages are written in a computer language called Hypertext Markup Language, or HTML.

    Some Web History

    The World Wide Web was originally developed in 1990 at CERN, the European

    Laboratory for Particle Physics. The original idea came from a young computer scientist, Tim

    Berners-Lee. It is now managed by The World Wide Web Consortium.

    The WWW Consortium is funded by a large number of corporate members, including AT&T, Adobe Systems, Inc., Microsoft Corporation and Sun Microsystems, Inc. Its purpose is to promote the growth of the Web by developing technical specifications and reference software that will be freely available to everyone. The Consortium is run by MIT with INRIA (The French National Institute for Research in Computer Science) acting as European host, in collaboration with CERN.

    The National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign was instrumental in the development of early graphical software utilizing the World Wide Web features created by CERN. NCSA focuses on improving the productivity of researchers by providing software for scientific modeling, analysis, and visualization. The World Wide Web was an obvious way to fulfill that mission. NCSA Mosaic, one of the earliest web browsers, was distributed free to the public. It led directly to the phenomenal growth of the World Wide Web.

    World Wide Web Vs Internet

    We often fail to distinguish between the Internet and the World Wide Web. Though they are related to each other, they are not the same. The Internet is a massive network that connects millions of computers across the globe, whereas the Web is one way of accessing information over the Internet. Information on the Internet travels from computer to computer via protocols. For sending electronic mail the Internet uses SMTP, for sharing files (text, images, video or MP3) it uses FTP, and for exchanging web-related information (i.e. hypertext information) it uses HTTP. The Web uses only HTTP to transmit data, share web pages (hyperlinked documents) and exchange business logic. It relies on a browser, such as Internet Explorer or Netscape Navigator, to display the hypertext documents. The Web can therefore be said to be a portion of the Internet.

    Domain Name System (DNS)

    Format. Numeric IP addresses are difficult to remember. People are better at remembering names and mnemonics (symbols, letters etc.). Therefore, numeric addresses have been mapped into names, each consisting of a host name and a domain (the group to which the computer belongs). The general format of a domain name is given below:

    Host Name. Second Level Domain Name. First Level Domain Name

    Where,

    (a) Host Name is the name of the service provider or network, e.g., VSNL.
    (b) Domain Name signifies the kind of organisation. Some of the organisational and geographic domain names are given in the table.

    Rules: The rules that are followed for mapping numbered IP addresses into DNS scheme are: -

    (a) The DNS is a distributed, hierarchical naming system.
    (b) A node on the DNS can be named by traversing the tree from itself to the root. At each node, the name is added and a period (.) is appended to it until the root is reached.
    (c) Each node can have any number of child nodes but only one parent node. Child nodes of the same parent must have different names to ensure a unique naming system.
    (d) By convention, the letters in the name of a node are written in lower case, with no spaces between the dots (periods).
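Rule (b) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PARENT` dictionary is a hypothetical toy tree built from the example name apj.vsnl.net.in, and the function name `domain_name` is invented for this sketch. A real DNS tree allows the same label under different parents, which this simple dictionary does not model.

```python
# Hypothetical toy tree: each node maps to its parent; the root has parent None.
# Labels are unique here only because the tree is tiny.
PARENT = {"apj": "vsnl", "vsnl": "net", "net": "in", "in": None}


def domain_name(node, parent=PARENT):
    """Walk from a node up to the root, joining the labels with periods."""
    labels = []
    while node is not None:
        labels.append(node)
        node = parent[node]
    return ".".join(labels)


print(domain_name("apj"))  # apj.vsnl.net.in
```

Reading the result from left to right goes from the most specific node (the host) to the most general one (the first level domain).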

    The figure shows the domain name (address) of a node named APJ. A domain name server on the Internet keeps a directory of all the nodes on it.

    FIG: DOMAIN NAME

    Organizational & Geographical Domain Names

    Organizational domains
    com   Commercial Organization
    edu   Educational
    gov   Government Agencies
    mil   Military Organisation
    net   Sites which perform some administrative functions for the Net
    org   Non Profit Organization

    Geographical domains
    au    Australia
    ca    Canada
    es    Spain
    fr    France
    hk    Hong Kong
    in    India
    jp    Japan
    uk    United Kingdom
    us    United States

    [Figure: domain name tree. Nodes shown: com, in, mil, edu, gov, net, vsnl, apj, yahoo. Example domain name: apj.vsnl.net.in]

    IP Addressing

    Every host and router on the Internet has a unique IP address, which encodes its network number and host number. No two machines or routers can have the same IP address. The addressing scheme on the Internet uses IPv4 (Internet Protocol version four), which is a 32-bit addressing scheme. In this scheme, the 32 bits are divided into four groups of 8 bits each, joined by periods (i.e., 8 bits.8 bits.8 bits.8 bits). With eight bits, 256 (2^8) numbers can be represented. Thus, each eight-bit group can represent numbers from 0 to 255. A typical IP address looks like 137.00.2.11. Based on this addressing scheme, networks connected to the Internet have been classified into five types, as shown in the figure.

    FIG: IP ADDRESSING SYSTEM
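The class of an address follows from the leading bits of its first 8-bit group. The sketch below is an illustration only: the function name `ip_class` is invented here, and it assumes that every address whose first group starts with the bits 1111 (240 and above) is treated as class E.

```python
OCTET_VALUES = 2 ** 8  # eight bits give 256 values, i.e. 0 to 255


def ip_class(address):
    """Return the class letter (A to E) of a dotted IPv4 address."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= o < OCTET_VALUES for o in octets):
        raise ValueError(f"not a valid IPv4 address: {address!r}")
    first = octets[0]
    if first & 0b10000000 == 0:            # leading bit 0
        return "A"
    if first & 0b11000000 == 0b10000000:   # leading bits 10
        return "B"
    if first & 0b11100000 == 0b11000000:   # leading bits 110
        return "C"
    if first & 0b11110000 == 0b11100000:   # leading bits 1110
        return "D"
    return "E"                             # assumed: everything from 240 upward


print(ip_class("137.00.2.11"))  # B
```

The sample address from the text falls in class B because 137 lies between 128 and 191.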

    URL (Uniform Resource Locator)

    A string of characters that specifies the address of a Web page.

    What the browser displays is hypertext that contains pointers to other documents. The pointers are implemented using a concept that is central to Web browsers, called the Uniform Resource Locator. A URL can be thought of as a network extension of the standard file name concept, except that in this case the file and its directory can exist on any computer on the network. Typing a URL in the location area and hitting the return key will cause the browser to attempt to retrieve that page. If the browser is successful in finding the page, the browser will display it. This high-level explanation does not, however, convey any of the details of what is happening. To go from a URL to having the Web page displayed, the browser needs to be able to answer such questions as:

    How can the page be accessed?
    Where can the page be found?
    What is the file name corresponding to the page?

    IP address classes (the figure marks field boundaries at bits 8, 16, 24 and 32):

    Class  Leading bits  Fields                   Range of hosts                 Capacity
    A      0             Network, Host            1.0.0.0 to 127.255.255.255     126 networks with 16 mil hosts
    B      10            Network, Host            128.0.0.0 to 191.255.255.255   16,382 networks with 64 K hosts
    C      110           Network, Host            192.0.0.0 to 223.255.255.255   2 mil networks with 254 hosts
    D      1110          Multicast address        224.0.0.0 to 239.255.255.255
    E      11110         Reserved for future use  240.0.0.0 to 247.255.255.255

    The URL is designed to incorporate sufficient information to resolve these questions. Quite naturally, then, the URL has three parts. We can view the format of a URL as follows:

    how://where/what

    In other words, a URL contains three parts: the first describes the type of resource (protocol), the second gives the name of the server housing the resource, and the third gives the full file name of the resource, i.e. directory, subdirectory and file name. The format is:

    protocol://domain name of server/directory name/sub-directory name/file name

    At this point, it is helpful to consider a sample URL to illustrate the three parts:

    http://pubpages.uminn.edu/index.html

    Let us break this example down into its components.

    1. http-: Defines the protocol or scheme by which to access the page. In this case, the protocol is Hyper Text Transfer Protocol. This protocol is the set of rules by which an HTML document is transferred over the Web.

    2. pubpages.uminn.edu-: Identifies the domain name of the computer where the page resides. The computer is a Web server capable of satisfying page requests. Just as a waiter serves food, a Web server serves Web pages. The name pubpages.uminn.edu tells the browser on which computer to find the Web page. In this case, the computer is located at the University of Minnesota.

    3. index.html-: Provides the local name (usually a filename) uniquely identifying the specific page. If no name is specified, the Web server where the page is located may supply a default file. On many systems, the default file is named index.html or index.htm.

    This example demonstrates that the URL consists of a protocol, a Web server's domain name, and a file name.
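The three-part breakdown above can be reproduced with Python's standard `urllib.parse` module. This is a minimal sketch; the helper name `split_url` is invented for illustration, and the sample URL is the one used in the text.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Return the (how, where, what) parts of a URL as a tuple."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.path


how, where, what = split_url("http://pubpages.uminn.edu/index.html")
print(how)    # http
print(where)  # pubpages.uminn.edu
print(what)   # /index.html
```

Note that the path keeps its leading slash; an empty path means no file name was given, in which case the server may supply its default file.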

    Entering a URL in the location field of the browser will bring up the designated Web page, barring any problems. For example, if the Web page has moved to another machine or has been removed, or if you type an invalid URL, or if the server you are trying to access is unavailable, an error message will be displayed. Another way to retrieve a Web page is to mouse over and click on a hyperlink in the Web page that is currently being displayed.

    In the URL example presented earlier, the protocol used to access the page was http, which is used for transferring an HTML document. Much of the power of browsers is that they are multiprotocol. That is, they can retrieve and render information from a variety of servers and sources. The following table provides a summary of other common protocols:

    Protocol Name  Use             Example
    ftp            File Transfer   ftp://ftp.bio.umaine.edu
    gopher         Gopher          gopher://gopher.tc.umn.edu/11/libraries
    http           Hypertext       http://www.chem.uab.edu/pauling/argon.html
    telnet         Remote Login    telnet://www.amnesty.org
    mailto         Sending E-mail  mailto:[email protected]

    Concept of Protocol

    For any network to exist, there must be connections between computers and agreements (protocols) about the communication language. However, setting up connections and agreements between disparate computers (PCs to mainframes) is complicated by the fact that over the last decade, systems have become increasingly heterogeneous in their software and hardware as well as their intended functionality. A range of standards for networking, called protocol stacks, has been developed.

    A protocol standard allows heterogeneous computers to talk to each other. Protocol stacks are software that performs the variety of actions necessary for data transmission between computers. They implement sets of rules for inter-computer communication that have been agreed upon and implemented by many vendors, users and standards bodies. The protocol stack resides either in a computer's memory or in the memory of a transmission device such as a network interface card. When data is ready for transmission, the stack puts the data on the wire. At the receiving end, it takes the data off the wire and prepares the data for the application, taking off the error control information that was added at the transmission end. The Internet uses TCP/IP (Transmission Control Protocol/Internet Protocol) as its protocol.

    Web Caching

    Web caching is the storage of Web objects near the user to allow fast access, thus improving the user experience of the Web surfer. Examples of Web objects are Web pages (the HTML itself), images in Web pages, etc. Web objects can be cached locally on the user's computer or on a server on the Web.

    Browser cache: Browsers cache Web objects on the user's machine. A browser first looks for objects in its cache before requesting them from the website. Caching frequently used Web objects speeds up Web surfing. For example, I often use google.com and yahoo.com. If their logos and navigation bars are stored in my browser's cache, then the browser will pick them up from the cache and will not have to get them from the respective websites. Getting the objects from the cache is much faster than getting them from the websites.

    Web objects can have an expiry time associated with them, after which the object is considered to be stale. A stale object is not used. If the object in the cache is stale, then it is equivalent to the object not being in the cache. The expiry date is specified in the HTTP headers of a Web object, using the Expires and Cache-Control headers.
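The staleness check can be sketched as follows. This is a simplified illustration, not how any particular browser implements it: the function names `max_age_from` and `is_stale` are invented here, only the `max-age` directive of Cache-Control is handled, and an object is assumed to become stale as soon as its age reaches `max-age`.

```python
from datetime import datetime, timedelta, timezone


def max_age_from(cache_control):
    """Extract max-age (in seconds) from a Cache-Control value, or None."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def is_stale(stored_at, max_age_seconds, now):
    """True once the cached object's age has reached its allowed lifetime."""
    return now - stored_at >= timedelta(seconds=max_age_seconds)


stored = datetime(2019, 7, 27, 12, 0, tzinfo=timezone.utc)
lifetime = max_age_from("public, max-age=3600")
print(is_stale(stored, lifetime, stored + timedelta(minutes=30)))  # False
print(is_stale(stored, lifetime, stored + timedelta(hours=2)))     # True
```

A cache that finds a stale object behaves as if the object were missing and requests it again from the website.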

    What are the Advantages of Web Caching?

    Web caching has the following advantages:

    Faster delivery of Web objects to the end user.
    Reduces bandwidth needs and cost. It benefits the user, the service provider and the website owner.
    Reduces load on the website servers.

    Web Server

    Web servers are computers that deliver (serve up) Web pages. In other words, a web server is a computer that stores web pages and gives them to the client whenever asked. When a client (the browser) sends a request message, the request goes to the server identified by the domain name in the URL. Every Web server has an IP address and possibly a domain name. For example, if you enter the URL http://www.pcwebopedia.com/index.html in your browser, this sends a request to the Web server whose domain name is pcwebopedia.com. The server then fetches the page named index.html and sends it to your browser.

    Any computer can be turned into a Web server by installing server software and connecting the machine to the Internet. There are many Web server software applications, including public domain software from NCSA and Apache, and commercial packages from Microsoft, Netscape and others.

    Proxy Server

    A server that sits between a client application, such as a Web browser, and a real server. It intercepts all requests to the real server to see if it can fulfill the requests itself. If not, it forwards the request to the real server.

    In computer networks, a proxy server is a server (a computer system or an application) that acts as an intermediary for requests from clients seeking resources from other servers. A client connects to the proxy server, requesting some service, such as a file, connection, web page, or other resource available from a different server. The proxy server evaluates the request according to its filtering rules. For example, it may filter traffic by IP address or protocol. If the request is validated by the filter, the proxy provides the resource by connecting to the relevant server and requesting the service on behalf of the client. A proxy server may optionally alter the client's request or the server's response, and sometimes it may serve the request without contacting the specified server. In this case, it 'caches' responses from the remote server, and answers subsequent requests for the same content directly.

    Proxy servers have two main purposes:

    Improve Performance: Proxy servers can dramatically improve performance for groups of users. This is because a proxy server saves the results of all requests for a certain amount of time. Consider the case where both user X and user Y access the World Wide Web through a proxy server. First user X requests a certain Web page, which we'll call Page 1. Sometime later, user Y requests the same page. Instead of forwarding the request to the Web server where Page 1 resides, which can be a time-consuming operation, the proxy server simply returns the Page 1 that it already fetched for user X. Since the proxy server is often on the same network as the user, this is a much faster operation. Real proxy servers support hundreds or thousands of users. The major online services such as America Online, MSN and Yahoo, for example, employ an array of proxy servers.

    Filter Requests: Proxy servers can also be used to filter requests. For example, a company might use a proxy server to prevent its employees from accessing a specific set of Web sites.
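Both purposes can be illustrated with a toy proxy. This is a sketch under stated assumptions: the class `CachingProxy`, the stand-in `fake_origin` function and the host names are all hypothetical, no real network traffic is involved, and cached entries never expire here (a real proxy keeps them only for a certain amount of time).

```python
class CachingProxy:
    """Toy proxy: blocks listed hosts and remembers pages it has fetched."""

    def __init__(self, fetch, blocked_hosts=()):
        self.fetch = fetch                 # function that contacts the real server
        self.blocked = set(blocked_hosts)  # filter rule: hosts users may not reach
        self.cache = {}                    # (host, path) -> page content

    def request(self, host, path):
        if host in self.blocked:
            return "403 blocked by proxy"
        key = (host, path)
        if key not in self.cache:          # first request: go to the real server
            self.cache[key] = self.fetch(host, path)
        return self.cache[key]             # later requests: answered from cache


calls = []  # records every trip to the (pretend) real server


def fake_origin(host, path):
    calls.append((host, path))
    return f"<html>{host}{path}</html>"


proxy = CachingProxy(fake_origin, blocked_hosts={"blocked.example"})
page_for_x = proxy.request("www.example.com", "/page1")  # user X
page_for_y = proxy.request("www.example.com", "/page1")  # user Y, same page
print(len(calls))  # 1
```

User Y receives the same Page 1 that was fetched for user X, and the real server is contacted only once.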

    Firewall

    A system designed to prevent unauthorized access to or from a private network.

    Firewalls can be implemented in hardware, in software, or in a combination of both. Firewalls are frequently used to prevent unauthorized Internet users from accessing private networks connected to the Internet, especially intranets. All messages entering or leaving the intranet pass through the firewall, which examines each message and blocks those that do not meet the specified security criteria.

    There are several types of firewall techniques:

    Packet filter: Looks at each packet entering or leaving the network and accepts or rejects it based on user-defined rules. Packet filtering is fairly effective and transparent to users, but it is difficult to configure. In addition, it is susceptible to IP spoofing.

    Application gateway: Applies security mechanisms to specific applications, such as FTP and Telnet servers. This is very effective, but can impose a performance degradation.

    Circuit-level gateway: Applies security mechanisms when a TCP or UDP connection is established. Once the connection has been made, packets can flow between the hosts without further checking.

    Proxy server: Intercepts all messages entering and leaving the network. The proxy server effectively hides the true network addresses.

    In practice, many firewalls use two or more of these techniques in concert. A firewall is

    considered a first line of defense in protecting private information. For greater security, data can

    be encrypted.
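The packet filter technique can be sketched as a list of user-defined rules checked in order. Everything specific here is an assumption made for illustration: the rule format, the function name `filter_packet`, the documentation-range addresses, and the choice to reject any packet that matches no rule.

```python
from ipaddress import ip_address, ip_network

# Hypothetical rules: (action, source network, destination port or None for any).
RULES = [
    ("deny", ip_network("203.0.113.0/24"), None),  # block one source network
    ("allow", ip_network("0.0.0.0/0"), 80),        # web traffic from anywhere
    ("allow", ip_network("0.0.0.0/0"), 443),
]


def filter_packet(src, dst_port, rules=RULES):
    """Return 'allow' or 'deny' for a packet; the first matching rule wins."""
    source = ip_address(src)
    for action, network, port in rules:
        if source in network and (port is None or port == dst_port):
            return action
    return "deny"  # assumed default: reject anything not explicitly allowed


print(filter_packet("198.51.100.4", 443))  # allow
print(filter_packet("203.0.113.7", 80))    # deny
print(filter_packet("198.51.100.4", 23))   # deny
```

Because the decision rests only on the source address the packet claims to have, a forged address can slip through, which is the IP spoofing weakness mentioned above.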

    Web Portal

    A Web portal or public portal refers to a Web site or service that offers a broad array of resources and services, such as e-mail, forums, search engines, and online shopping malls. The first Web portals were online services, such as AOL, that provided access to the Web, but by now most of the traditional search engines have transformed themselves into Web portals to attract and keep a larger audience.

    An enterprise portal is a Web-based interface for users of enterprise applications. Enterprise portals also provide access to enterprise information such as corporate databases, applications (including Web applications), and systems.

    Home Page

    This is the starting point or front page of a Web site. This page usually has some sort of table of contents on it and often describes the purpose of the site. For example, http://www.apple.com/index.html is the home page of Apple.com. When you type in a basic URL, such as "http://www.cnet.com," you are typically directed to the home page of the Web site. Many people have a "personal home page," which is another way the term "home page" can be used.

    Web Page and Web Site

    Web pages are what make up the World Wide Web. These documents are written in HTML (hypertext markup language) and are translated by your Web browser. Web pages can be either static or dynamic. Static pages show the same content each time they are viewed. Dynamic pages have content that can change each time they are accessed. These pages are typically written in scripting languages such as PHP, Perl, ASP, or JSP. The scripts in the pages run functions on the server that return things like the date and time, and database information. All the information is returned as HTML code, so when the page gets to your browser, all the browser has to do is translate the HTML.
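The idea of a dynamic page can be sketched in Python, although the text names PHP, Perl, ASP and JSP as the usual languages. The function `render_page` and its parameters are invented for this illustration; it only shows that the server-side script produces plain HTML whose content depends on when, and for whom, it runs.

```python
from datetime import datetime
from html import escape


def render_page(visitor, now):
    """Build the HTML for one request; the content changes with each call."""
    return (
        "<html><body>"
        f"<p>Hello, {escape(visitor)}.</p>"
        f"<p>Server time: {now:%Y-%m-%d %H:%M}</p>"
        "</body></html>"
    )


print(render_page("Ann", datetime(2019, 7, 27, 10, 30)))
```

The browser never sees the script itself, only the HTML string it returns.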

    Please note that a Web page is not the same thing as a Web site. A Web site is a collection of pages. A Web page is an individual HTML document. This is a good distinction to know, as most techies have little tolerance for people who mix up the two terms.

    Cookies

    A cookie, also known as an HTTP cookie, web cookie, or browser cookie, is used by an origin website to send state information to a user's browser and by the browser to return the state information to the origin site. The state information can be used for authentication, identification of a user session, user's preferences, shopping cart contents, or anything else that can be accomplished through storing text data on the user's computer.

    Cookies cannot be programmed, cannot carry viruses, and cannot install malware on the host computer. However, they can be used by spyware to track a user's browsing activities, a major privacy concern that prompted European and US lawmakers to take action. Cookies can also be stolen by hackers to gain access to a victim's web account.
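The round trip of state information can be sketched with Python's standard `http.cookies` module. The helper names `make_set_cookie` and `read_cookie`, and the cookie name `session_id`, are assumptions chosen for illustration; no real site or browser is involved.

```python
from http.cookies import SimpleCookie


def make_set_cookie(name, value):
    """Site side: build the text sent to the browser in a Set-Cookie header."""
    cookie = SimpleCookie()
    cookie[name] = value
    return cookie[name].OutputString()


def read_cookie(header_value, name):
    """Site side: read one value from the Cookie header the browser sends back."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return cookie[name].value if name in cookie else None


sent = make_set_cookie("session_id", "abc123")  # origin site -> browser
returned = read_cookie(sent, "session_id")      # browser -> origin site
print(returned)  # abc123
```

The cookie is plain text throughout, which is why it cannot run code but can be read by anyone who manages to steal it.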

    Browsers

    A Web browser is a program that your computer runs to communicate with the Web

    servers on the Internet, which enables it to download and display the Web pages that you request.

    A Web browser is an interface between the user and the internal workings of the Internet. Browsers are referred to as Web clients or universal clients, as they follow the principle of client-server technology, where the browser is the client.

    On typing a URL in the address window or following a hyperlink, the browser contacts the server by sending a request for the required information. After receiving this information, the browser displays it as a Web page in the user's window.

    At a minimum, a Web browser must understand HTML and display text. In recent years, however, Internet users have come to expect a lot more. A state-of-the-art Web browser provides a full multimedia experience, complete with pictures, sound, video, and even 3-D imaging.

    Because a Web browser can interpret or display so many types of files, you may often use a Web browser even when you are not connected to the Internet. Windows 98, for example, uses Internet Explorer to open most image files.

    There are many types of browsers; you can obtain a comprehensive list from the web site www.browsers.com. The most popular browsers, by far, are Netscape Navigator and Microsoft Internet Explorer. Both are state-of-the-art browsers, and the competition between them is fierce.

    Both Navigator and Internet Explorer are available over the Internet at no charge.

    Microsoft designed Internet Explorer for the Windows operating system, but it is now available

    for Macintosh and some UNIX systems as well. Navigator is available for the Windows, Macintosh, UNIX, and Linux operating systems.

    Features of a Good Browser

    1. The most important feature of a web browser is the presentation of web pages without distortion.
    2. The browser should support multimedia features like sound, video, etc.
    3. It should also support forms and frames. Frames divide web pages into sections, thus improving readability.
    4. A good browser should have the ability to open multiple windows.
    5. The latest browsers support ActiveX technology, Java, VRML and other plug-ins.
    6. E-mail, News, and FTP support should also be provided.
    7. Last but not least, a certain number of security features, like the ability to block access to certain Web pages, should also exist.

    Internet

    The Internet (Interconnected Networks) is the best-known component of the Information Super Highway (I-Way) infrastructure. Today, the Internet is an information distribution system spanning several continents. Its general infrastructure targets not just one electronic commerce application, such as video-on-demand or home shopping, but a wide range of computer-based services, such as e-mail, EDI, information publishing, information retrieval and video conferencing. Simply put, the Internet environment is a unique combination of postal service, telephone system, research library, supermarket and talk show center that enables people to share and purchase information. The Internet is viewed as a prototype for the emerging I-Way, of which it will become one component.

    The Internet began around 1965, when the US Department of Defense (DoD) financed the design of a computer network, called the Advanced Research Projects Agency Network (ARPANET), to link a handful of universities and military research laboratories. In the mid-1980s the National Science Foundation (NSF) took over control, when defence traffic moved from ARPANET to MILNET. In 1987, the NSF created NSFNET. In 1991, the commercial Internet started using the NSF backbone. In 1995, NSFNET was decommissioned and the modern Internet came into existence.

    Internet Administration

    The Internet, with its roots primarily in the research domain, has evolved and gained a broader user base with significant commercial activity. Various groups that coordinate Internet issues have guided this growth and development. The figure shows the general organization of Internet administration.

    Internet Society (ISOC)

    The Internet Society (ISOC) is an international, non-profit organization formed in 1992 to provide support for the Internet standards process. ISOC accomplishes this by maintaining and supporting other Internet administrative bodies such as IAB, IETF, IRTF and IANA. ISOC also promotes research and other scholarly activities relating to the Internet.

    Internet Architecture Board (IAB)

    The Internet Architecture Board (IAB) is the technical advisor to the ISOC. The main purpose of the IAB is to oversee the continuing development of the TCP/IP protocol suite and to serve in an advisory capacity to research members of the Internet community. IAB accomplishes this through its two primary components, the Internet Engineering Task Force (IETF) and the Internet Research Task Force (IRTF). Another responsibility of the IAB is the editorial management of the RFCs. IAB is also the external liaison between the Internet and other standards organizations and forums.

    Internet Engineering Task Force (IETF)

    The Internet Engineering Task Force (IETF) is a forum of working groups managed by the Internet Engineering Steering Group (IESG). IETF is responsible for identifying operational problems and proposing solutions to these problems. IETF also develops and reviews specifications intended as Internet standards. The working groups are collected into areas, and each area concentrates on a specific topic. Currently nine areas have been defined, although this is by no means a hard and fast number. The areas are:

    Applications
    Internet Protocols
    Routing
    Operations
    User Services
    Network Management
    Transport
    Internet protocol next generation (IPng)
    Security

    Internet Research Task Force

    The Internet Research Task Force (IRTF) is a forum of working groups managed by the

    Internet Research Steering Group (IRSG). IRTF focuses on long term research topics related to

    Internet protocols, applications, architecture, and technology.

    Internet Assigned Numbers Authority (IANA) and Internet Corporation for Assigned Names and Numbers (ICANN)

    The Internet Assigned Numbers Authority (IANA), supported by the US government, was responsible for the management of Internet domain names and addresses until October 1998. At that time the Internet Corporation for Assigned Names and Numbers (ICANN), a private non-profit corporation managed by an international board, assumed IANA operations.

    Network Information Center (NIC)

    The Network Information Center (NIC) is responsible for collecting and distributing

    information about TCP/IP protocols.

    History of Internet

    1960s Telecommunications:- Essential to the early Internet concept was packet switching, in which data to be transmitted is divided into small packets of information and labeled to identify the sender and recipient. The packets were sent over a network and then reassembled at their destination. If any packet did not arrive or was not intact, the original sender was requested to resend the packet.
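Packet switching as described above can be sketched in Python. The packet layout (the `from`, `to`, `seq`, `total` and `data` fields), the function names and the packet size are assumptions made for illustration; they do not reproduce any historical packet format.

```python
import random


def to_packets(message, sender, recipient, size=8):
    """Split a message into small labeled packets."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    return [
        {"from": sender, "to": recipient, "seq": n, "total": len(chunks), "data": chunk}
        for n, chunk in enumerate(chunks)
    ]


def reassemble(packets):
    """Return (message, []) if complete, else (None, missing sequence numbers)."""
    if not packets:
        return None, []
    total = packets[0]["total"]
    received = {p["seq"]: p["data"] for p in packets}
    missing = [n for n in range(total) if n not in received]
    if missing:
        return None, missing  # the sender would be asked to resend these
    return "".join(received[n] for n in range(total)), []


packets = to_packets("Packets may arrive out of order.", "host-a", "host-b")
random.shuffle(packets)  # the network may deliver packets in any order
message, missing = reassemble(packets)
print(message)  # Packets may arrive out of order.
```

The sequence number in each label is what lets the destination restore the original order and notice which packets never arrived.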

    ARPANET, 1969:- In 1969, Bolt, Beranek, and Newman, Inc. (BBN) designed a network called the ARPANET for the United States Department of Defense. The military created ARPA to enable researchers to share super-computing power. It was rumored that the military developed ARPANET in response to the threat of a nuclear attack destroying the country's communication system.

    1970s Telecommunications:- In this decade, the ARPANET was used primarily by the military, some of the larger companies, such as IBM, and universities. The general population was not yet connected to the system and very few people were online at work.

    The use of Local Area Networks (LANs) became more prevalent during the 1970s. Also, the idea of an open architecture was promoted; that is, networks making up the ARPANET could have any design. In later years, this concept had a tremendous impact on the growth of the ARPANET.

    Twenty Three Nodes, 1972:- By 1972, the ARPANET was international, with nodes in Europe at the University College in London, England, and the Royal Radar Establishment in Norway. The number of nodes on the network was up to 23, and the trend would be for that number to double every year from then on. Ray Tomlinson, who worked at BBN, invented e-mail.

    UUCP, 1976:- AT&T Bell Labs developed UNIX-to-UNIX Copy (UUCP). In 1977, UUCP was distributed with UNIX.

    USENET, 1979:- User Network (USENET) was started by using UUCP to connect Duke University and the University of North Carolina at Chapel Hill. Newsgroups emerged from this early development.

    1980s Telecommunications:- In this decade, Transmission Control Protocol/Internet Protocol (TCP/IP), a set of rules governing how networks making up the ARPANET communicate, was established. For the first time, the term Internet was being used to describe the ARPANET. Security became a concern, as viruses appeared. As the Internet became larger, the Domain Name System was developed to allow the network to expand more easily by assigning names to host computers in a distributed fashion.

    CSNET, 1980:- The Computer Science Network (CSNET) connected all university computer science departments in the United States. Computer science departments were relatively new and only a limited number existed in 1980. CSNET joined the ARPANET in 1981.

    BITNET, 1981:- The Because It's Time Network (BITNET) formed at the City University of New York and connected to Yale University. Many mailing lists originated with BITNET.

    TCP/IP, 1983:- The United States Defense Communication Agency required that TCP/IP be used for all ARPANET hosts. Since TCP/IP was distributed at no charge, the Internet became what is called an open system. This allowed the Internet to grow quickly, as all connected computers were now speaking the same language. Central administration was no longer necessary to run the network.

    NSFNET, 1985:- The National Science Foundation Network (NSFNET) was formed to connect the National Science Foundation's (NSF's) five super-computing centers. This allowed researchers to access the most powerful computers in the world, at a time when large, powerful, and expensive computers were a rarity and generally inaccessible.

    The Internet Worm and IRC, 1988:- The virus called the Internet Worm (created by Robert Morris while he was a computer science graduate student at Cornell University) was released. It infected 10 percent of all Internet hosts. Also in this year, Internet Relay Chat (IRC) was written by Jarkko Oikarinen.

    NSF Assumes Control of the ARPANET, 1989:- NSF took over control of the ARPANET in 1989. This changeover went unnoticed by nearly all users. Also, the number of hosts on the Internet exceeded the 100,000 mark.

    1990s Telecommunications:- During the 1990s, many commercial organizations started getting on-line. This stimulated the growth of the Internet like never before. URLs appeared in television advertisements and, for the first time, young children went on-line in significant numbers.

    Graphical browsing tools were developed, and the markup language HTML allowed users all over the world to publish on what was called the World Wide Web. Millions of people went on-line to work, shop, bank, and be entertained. The Internet played a much more significant role in society, as many nontechnical users from all walks of life got involved with computers. Computer literacy and Internet courses sprang up all over the world.


    Gopher, 1991:- Gopher was developed at the University of Minnesota, whose sports teams' mascot is the Golden Gopher. Gopher allowed you to "go for" or fetch files on the Internet using a menu-based system. Many Gophers sprang up all over the country, and all types of information could be located on Gopher servers. Gopher is still available and accessible through Web browsers, but its popularity has faded; for the most part, it is only of historical interest. (gopher://gopher.well.sf.ca.us/)

    World Wide Web, 1991:- The World Wide Web (WWW) was created by Tim Berners-Lee at CERN (a French acronym for the European Laboratory for Particle Physics), as a simple way to publish information and make it available on the Internet.

    WWW Publicly Available, 1992:- The interesting nature of the Web caused it to spread, and it became available to the public in 1992. Those who first used the system were immediately impressed.

    Netscape Communications, 1994:- The company called Netscape Communications, formed by Marc Andreessen and Jim Clark, released Netscape Navigator, a Web browser that captured the imagination of everyone who used it. The number of users of this software grew at a phenomenal rate. Netscape made its money largely through advertising on its Web pages.

    Yahoo, 1994:- Stanford graduate students David Filo and Jerry Yang developed their Internet search engine and directory called Yahoo, which is now world famous.

    Java, 1995:- The Internet programming environment, Java, was released by Sun Microsystems, Inc. This language, originally called Oak, allowed programmers to develop Web pages that were more interactive.

    Microsoft Discovers the Internet, 1995:- The software giant committed many of its resources to developing its browser, Microsoft Internet Explorer, and Internet applications.

    Netscape Releases Source Code, 1998:- Netscape Communications released the source code for its Web browser.

    Internet Services

    The Internet provides a mechanism for millions of computers to communicate, but what kind of information is transmitted? Many services are available over the Internet, and the following are the most popular ones.

    1) E-Mail:- Enables people to send private messages, as well as files, to one or more other people.

    2) Mailing Lists:- Enable groups of people to conduct group conversations by e-mail, and provide a way of distributing newsletters by e-mail.

    3) On-line Chat:- Provides a way for real-time online chatting to occur, whereby participants read each other's messages within seconds of when they are sent.

    4) Voice and video conferencing:- Enable two or more people to hear and see each other and share other applications.

    5) The World Wide Web:- A distributed system of interlinked pages that include text, pictures, sound, and other information.

    6) File Transfer:- Lets people download files from public file servers, including a wide variety of programs.


    7) Remote Login:- There are two programs that allow you to log in to another computer from an account on which you are already logged in; they let you use and interact with software on the remote machine. To do this, you will need a second computer account and password that is accessible to you.

    8) Internet Telephony:- As the name suggests, Internet Telephony involves using the Internet to transmit real-time audio from one personal computer to another (or, in some instances, to a telephone itself).

    9) USENET:- It is a bulletin board service featuring a large number of discussion groups involving millions of people around the world.

    10) Archie:- It is an indexing service, like a library catalogue. The contents of a large number of FTP servers are indexed on a number of Archie servers on the Internet.

    11) Gopher:- Before the Web came into existence, the University of Minnesota developed a system called Gopher connecting universities, colleges and government authorities. The Gopher system is based on a set of related menus. All the interconnected Gopher servers are collectively known as Gopher Space.

    12) Veronica:- It provides Archie-like services for Gopher. Veronica searches are not necessarily always easy or fast, as Gopher servers are widely distributed.

    13) WAIS:- It is an Internet service which looks for specific information in Internet databases. Searching is done by keywords, and source documents are indexed for fast retrieval.
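
    The idea behind keyword searching over indexed documents (item 13) can be illustrated with a small inverted index. This is a generic sketch, not the actual WAIS protocol or index format; the sample documents and the function names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical source documents, keyed by a document identifier.
documents = {
    "doc1": "packet switching divides data into packets",
    "doc2": "gopher menus organise files on servers",
    "doc3": "routers direct packets of data along a network",
}


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map every keyword to the set of documents that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the documents that contain every keyword in the query."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


index = build_index(documents)
results = search(index, "packets data")
```

    Building the index once means each query only looks up its keywords, rather than rereading every document, which is what makes retrieval fast.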

    Basic Structure of Internet

    The Internet is a network of networks. Its basic elements and associated components are shown schematically in the figure. Various terms have the following meanings: -

    (a) Internet Service Provider (ISP):- An ISP acts as an interface between end-users (which could be a stand-alone PC or LANs) and the Internet. The ISP acts like the main crossing of a town, which allows traffic to come out of the town and join the national highway. An ISP has routers and servers, through which it connects end-users to the Internet backbone. For all problems and management at end-user level, an end-user interacts with the ISP only.

    (b) Router:- A special purpose computer that directs the packets of data along a network.

    (c) Gateway:- An ISP gets connected to the Internet's backbone through a Gateway. A Gateway functions as a door to enter the Internet backbone. It connects a number of ISPs to the Internet backbone. In India, VSNL has been the sole Gateway service provider until recently. However, private operators are now permitted to provide Gateway services.

    (d) Internet Backbone:- The Internet backbone is a high-bandwidth (high-speed) fiber optic cable, on which a number of routers are in place, and it is managed through the Network Operations Center of the Internet. The Internet backbone is of different bandwidth in various segments.

    The basic elements of the Internet are a user (standalone PC or a LAN), ISP, routers, gateways and Internet backbone. Thus, an end-user who wishes to establish a link with another user on a LAN goes through his LAN - ISP - Gateway and gets connected to the distant end user through Gateway - ISP - LAN (refer figure).

    FIG: BASIC ELEMENTS OF INTERNET
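
    The LAN - ISP - Gateway - backbone - Gateway - ISP - LAN chain described above can be modelled as a small graph. The sketch below is only illustrative: the node names are hypothetical, and a breadth-first search stands in for the routing that real routers perform.

```python
from collections import deque

# Hypothetical topology mirroring the elements named in the text.
links = {
    "LAN 1": ["ISP 1"],
    "ISP 1": ["LAN 1", "Gateway 1"],
    "Gateway 1": ["ISP 1", "Backbone"],
    "Backbone": ["Gateway 1", "Gateway 2"],
    "Gateway 2": ["Backbone", "ISP 2"],
    "ISP 2": ["Gateway 2", "LAN 2"],
    "LAN 2": ["ISP 2"],
}


def find_path(links: dict[str, list[str]], start: str, goal: str):
    """Breadth-first search for the shortest chain of hops, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in links.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


path = find_path(links, "LAN 1", "LAN 2")
print(" - ".join(path))
```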

    Intranet

    An intranet is a private network (usually a LAN, but may be larger) that uses TCP/IP and other Internet standard protocols. Because it uses TCP/IP, the standard Internet communications protocol, an intranet supports TCP/IP-based protocols, such as HTTP (the protocol that web browsers use to talk to web servers), and SMTP and POP (the protocols that e-mail programmes use to send and receive mail). In other words, an intranet can run web servers, web clients, mail servers, and mail clients. An intranet is a network for a single organization with the following features: -

    • It uses Internet technology: browser & TCP/IP

    • All services available on Internet can be implemented on intranet

    • It could be implemented on a single LAN or a combination of LANs

    • It could be implemented on a MAN or WAN

    • Intranet need not be connected with Internet (for outside connectivity it can be through the Internet)


    • It is a private Internet of an organisation

    Architecture of Intranet

    The architecture of an intranet is shown in the figure. A simplified intranet consists of the following components: -

    (a) Workstations & Client Software. A PC with any operating system (Win 95, 98, Mac, Unix) that supports networking can be connected to the intranet as a workstation. In addition to other application programmes, workstations run client software that provides the user with access to network servers. On an intranet, client software will typically include (depending upon the services provided) a browser (MS Internet Explorer, Netscape Navigator), an e-mail client (Outlook Express), newsreaders, and chat or FTP clients. These clients may be integrated with the OS or be add-ons.

    (b) Servers, NOS & Server Software. This is an important area of the intranet in respect of hardware and software requirements, viz.,

    (i) The servers provide services to the workstations connected to the intranet. A network server is required to manage the LAN. Besides this, depending on the services to be provided, further servers would be required, e.g., Web server, mail server, FTP server, application servers and print server.

    (ii) A Network Operating System (Windows NT, Unix, or Linux) is required to run on the network server. The client part of the NOS is required to run on the workstations.

    (iii) Server software includes web server, mail server etc. (depending on the server & services required). Many intranet server programmes run on Unix and some on NT. Lots of freeware and shareware server programmes are available for Unix. Windows NT Server comes with a Web server (MS Internet Information Server).

    (iv) An intranet also needs middleware, the software that provides access to a database from a web browser, e.g., calls to the database programme to read and write records.

    (c) Network Cards, Cabling, Switches/Hubs. These are the components that are required to set up a LAN. The commonly used network adapter card is Ethernet, the most common LAN configuration is star topology, and the commonly used cables are CAT-5 or CAT-6 UTP cables.

    (d) Security Systems (Firewall). If the intranet is connected to the Internet, we need to control the kind of information that can pass between the intranet and the Internet. The hardware, software and procedures that provide access control make up a firewall. Firewall systems are of two categories, viz.,


    FIG: ARCHITECTURE OF INTRANET

    (i) Network-Level Firewalls. These firewalls examine only the headers of each packet of information passing to or from the Internet. The firewall accepts or rejects packets based on the packet's sender, receiver and port number (each Internet service, such as e-mail or WWW, has a different port number). For example, a firewall might allow e-mail/Web packets to and from any computer on the intranet, but allow remote login packets to and from only selected computers.

    (ii) Application-Level Firewalls. These firewalls handle packets for each Internet service separately, usually by running a programme called a proxy server, which accepts e-mail, Web, chat, newsgroup and other packets from computers on the intranet, strips off the information that identifies the source of the packet and passes it along to the Internet, or vice versa. When the replies return, the proxy server passes the replies back to the computer that sent the original message. To the rest of the Internet, all packets appear to be from the proxy server, so no information leaks out about the individual computers on your intranet. A proxy server can keep a log of all packets that pass by. The proxy server can be configured to allow one-way login and disallow the other way.
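
    The two firewall categories can be contrasted in a short sketch. Everything here is an assumption made for illustration: the addresses, the set of "selected computers", the class and function names, and the dictionary used to stand in for a packet. The port numbers are the well-known ones for SMTP (25), Telnet (23) and HTTP (80). Real firewalls and proxy servers are considerably more involved.

```python
from dataclasses import dataclass

SMTP_PORT, TELNET_PORT, HTTP_PORT = 25, 23, 80   # well-known port numbers
REMOTE_LOGIN_HOSTS = {"10.0.0.5"}                # hypothetical "selected computers"


@dataclass(frozen=True)
class Header:
    sender: str
    receiver: str
    port: int


def network_level_allows(header: Header) -> bool:
    """Decide from the header alone, as a network-level firewall does."""
    if header.port in (SMTP_PORT, HTTP_PORT):
        return True            # e-mail/Web allowed for any computer
    if header.port == TELNET_PORT:
        # Remote login only to or from the selected computers.
        return bool({header.sender, header.receiver} & REMOTE_LOGIN_HOSTS)
    return False               # everything else is rejected


class ProxyServer:
    """Application-level relay that hides intranet addresses."""

    def __init__(self, public_address: str) -> None:
        self.public_address = public_address
        self.log: list[tuple[str, str, str]] = []   # record of relayed packets
        self._pending: dict[int, str] = {}          # request id -> internal sender
        self._next_id = 0

    def forward_out(self, internal_sender: str, destination: str, body: str) -> dict:
        """Replace the internal sender with the proxy's own address."""
        request_id = self._next_id
        self._next_id += 1
        self._pending[request_id] = internal_sender
        self.log.append(("out", internal_sender, destination))
        return {"id": request_id, "sender": self.public_address,
                "receiver": destination, "body": body}

    def forward_reply(self, request_id: int, body: str) -> dict:
        """Pass a reply back to the computer that sent the original message."""
        internal_sender = self._pending.pop(request_id)
        self.log.append(("in", self.public_address, internal_sender))
        return {"sender": self.public_address,
                "receiver": internal_sender, "body": body}


proxy = ProxyServer("203.0.113.1")
outgoing = proxy.forward_out("10.0.0.9", "198.51.100.20", "GET /")
reply = proxy.forward_reply(outgoing["id"], "200 OK")
```

    The network-level check never looks past the header, while the proxy rewrites each packet so that the outside world only ever sees the proxy's address.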


    Advantages and Disadvantages of an Intranet

    LANs and intranets both let you share hardware, software, and information by connecting computers together. You don't need an intranet to share files and printers, or to send e-mail among the people on your network: a LAN can do those jobs. The following are some reasons to convert a LAN to an intranet, or to connect your computers together into an intranet: -

    (a) Intranets Use Standard Protocols. Internet protocols such as TCP/IP are used on a huge number of diverse computers. More development is happening for Internet-based communication than for other types of communication. For example, intranet users can choose from a wide variety of e-mail programmes, because so many have been written for the Internet.

    (b) Intranets are Scalable. TCP/IP works fine on the Internet, which has millions of host computers. So you don't have to worry about your network outgrowing its communications protocol.

    (c) Intranet Components are relatively Cheap and some are free. Because the Internet started as an academic and military network (rather than a commercial one), there is a long tradition of free, cheap, and cooperative software development. Some of the best Internet software is free, including Apache (the most widely used web server) and Pegasus and Eudora Lite (two excellent e-mail client programmes).

    (d) Intranets enable you to set up Internet-style Information Services. You can have your own private web, using web servers on your intranet to serve web pages to members of your organisation only. You can also support chat, Usenet, telnet, FTP, or other Internet services privately on your network. Push technology (web channels) can deliver assignments, job status, and group schedules to the user's desktop via his or her browser.

    (e) Intranets let People Share their Information. Everyone in your organisation can make their information available to other employees by creating web pages for the intranet. Because many word processing programmes can now save documents as web pages, creating pages for an intranet does not require a lot of training. Rather than printing and distributing reports, people can put them on the intranet and send e-mail to tell everyone where the report is stored.

    Of course, intranets have some disadvantages too, including these: -

    (a) Intranets Cost Money. You may need to upgrade computers, buy new software,

    run new cabling, and teach people to use the new systems.

    (b) People in your organization may waste time. If you connect your intranet to the Internet, people may spend hours a week watching sports results or checking their stock options. Even if you don't connect to the Internet, people can use the intranet to build web sites about the company softball team and send e-mail about upcoming baby showers. You'll need policies in place to determine how the intranet may be used.


    What can you do with an Intranet?

    Many organisations, especially those with large existing computer systems, have lots of information that is hard to get at. The intranet can change all that, by using Internet tools. Here are some ways that your organisation, large or small, can use an intranet.

    (a) E-mail within the organisation and to and from the Internet. People can use one e-mail programme to exchange mail both with other intranet users and with the Internet.

    (b) Private Discussion Groups. Using a mailing list manager or a news server accessible only to people in your organisation, you can set up mailing lists or newsgroups to encourage people to share information within departments or across the organisation.

    (c) Private Websites. Each department in your organisation can create a website that is accessible only to people on the intranet. Instead of circulating memos and handbooks, information can go on these web sites. For example, the human resource department can post all employee policies, job postings, and upcoming training opportunities. The marketing department can post information about products, including upcoming release dates, how products are targeted, and other information that is not appropriate for a public site on the Internet-based web. Every department can post web pages to share its information with the other departments in the organisation. By using the intranet instead of printing on paper, it is economical to publish large documents and documents that change frequently.

    (d) Access to Legacy Databases. If your organisation has information that is locked away in an inaccessible database, you can convert the information to web pages so that everyone on the intranet can see it. (Legacy systems are those considered outdated by whoever is describing the system.) For example, a non-profit organisation might have a proprietary database containing all of its fundraising and membership information. By using a programme that can display database information as web pages and enter information from web page forms into the database, all the people at the non-profit organisation can see, and even update, selected information from the database by using only a web browser. Naturally, the programme would need to limit who could see and change particular information in the database.

    (e) Teleconferencing. Rather than spending huge amounts on video teleconferencing systems, think about using your intranet (and the Internet) instead. If your organisation has offices in several locations, you can use the Internet for online chats with text, voice, and even limited video.
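
    As an illustration of item (d), a programme that displays database records as a web page and restricts who may change them might look roughly like the sketch below. It assumes an in-memory SQLite database in place of the legacy system, and the table name, column names, sample rows and the "fundraising" role are all hypothetical.

```python
import html
import sqlite3

# Stand-in for the legacy database (hypothetical table and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (name TEXT, pledged INTEGER)")
conn.executemany("INSERT INTO members VALUES (?, ?)",
                 [("A. Rao", 500), ("B. <Smith>", 250)])


def render_members_page(conn: sqlite3.Connection) -> str:
    """Display database information as a simple web page."""
    rows = conn.execute(
        "SELECT name, pledged FROM members ORDER BY name").fetchall()
    cells = "".join(
        f"<tr><td>{html.escape(name)}</td><td>{pledged}</td></tr>"
        for name, pledged in rows)
    return ("<html><body><table>"
            "<tr><th>Name</th><th>Pledged</th></tr>"
            f"{cells}</table></body></html>")


def update_pledge(conn: sqlite3.Connection, role: str,
                  name: str, amount: int) -> None:
    """Enter form data into the database, but only for permitted roles."""
    if role != "fundraising":          # hypothetical access rule
        raise PermissionError("this role may not change pledges")
    conn.execute("UPDATE members SET pledged = ? WHERE name = ?",
                 (amount, name))


page = render_members_page(conn)
update_pledge(conn, "fundraising", "A. Rao", 900)
updated_page = render_members_page(conn)
```

    Escaping the values before placing them in the page and using query parameters for updates keeps stray characters in the data from breaking either the HTML or the SQL.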

    Security Policies

    In addition to a firewall, you need to take steps to make sure that the intranet is used appropriately in your organisation: -

    (a) Establish acceptable-use Policies. Post rules for using the intranet, including the use of e-mail, the web, and discussion groups both within the intranet and on the Internet.


    (b) Monitor usage. This does not mean that you should look over everyone's shoulders while they use the intranet, but make sure that someone monitors the content of the intranet's web sites and discussion groups. Look for copyright infringements, personnel issues, and security lapses.

    (c) Close the door behind Departing Employees. When someone leaves the organisation, make sure that a system is in place to close the person's accounts, change passwords, and deny other access to the intranet.

    (d) Be Vigilant about Data in general, not just about the intranet. The intranet's connection to the Internet can certainly be a security hazard, but important data can also walk out of your organisation's door on a diskette in someone's pocket, in a fax, or in many other ways.

    Extranet

    An extranet is a network that links selected resources of the intranet of a company with

    its customers, suppliers and other business partners. Main features of extranet are: -

    (a) The link between the intranet and its business partners is achieved through TCP/IP, the standard Internet protocol.

    (b) The extranet is an extended intranet, which isolates business communication from the open Internet through secure solutions.

    (c) Extranets provide the privacy and security of an intranet while retaining the global reach of the Internet.

    (d) Extranets use cryptography and authorization procedures for securing data flows between intranets through the Internet.

    An extranet connects the intranets of business partners, suppliers, financial services, distributors, customers etc. by an agreement between collaborating partners. The emphasis is on allowing access to authorized groups through strictly controlled mechanisms.
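
    One way to picture the "cryptography and authorization procedures" mentioned above is a request that a partner signs with a shared secret and that the extranet checks before granting access. The sketch below uses HMAC-SHA256 from the Python standard library; the partner names, keys and resource names are hypothetical, and a real extranet would also encrypt the traffic itself, which this example does not do.

```python
import hashlib
import hmac

# Hypothetical shared secrets and access rights agreed between partners.
SHARED_KEYS = {"company-b": b"demo-key-b"}
AUTHORIZED = {"company-b": {"price-list"}}


def sign(partner: str, resource: str, key: bytes) -> str:
    """Partner side: sign the request with the shared key."""
    message = f"{partner}:{resource}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def authorize(partner: str, resource: str, signature: str) -> bool:
    """Extranet side: verify the signature, then check access rights."""
    key = SHARED_KEYS.get(partner)
    if key is None:
        return False                                  # unknown partner
    expected = sign(partner, resource, key)
    if not hmac.compare_digest(expected, signature):
        return False                                  # forged or altered request
    return resource in AUTHORIZED.get(partner, set())


signature = sign("company-b", "price-list", SHARED_KEYS["company-b"])
granted = authorize("company-b", "price-list", signature)
```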

    Extranets have led to the true proliferation of e-commerce and act as an engine for B2B collaboration. It is the combination of intranets and extranets which has established the virtual corporation paradigm. This new virtual paradigm of e-commerce allows corporations to take advantage of any market opportunity anywhere, anytime, and to offer customized services and products. It is this combination that provides the technological backbone for strategic advantage to organizations in terms of reach, intensity, response time and innovative skills.

    Architecture of Extranet

    The figure shows the basic architecture of an intranet with its extension to one LAN or a single user. This makes it an extranet. Similar logic can be extended to make it a general infrastructure of extranet plus intranets, as shown in figure-2.


    FIG: ARCHITECTURE OF EXTRANET

    Components of Extranet

    Since an extranet is an extension of an intranet, the additional hardware and software needed to extend an intranet are: -

    (a) Firewall servers and their software

    (b) Router

    (c) Internet connection (at least ISDN)

    Basic Level Applications of Extranet

    The basic level applications of extranet are given below: -

    S No  Service                Applications
    1     Secure e-mail          For B2B communications
    2     Usenet Services        Bulletin board services, one-to-many info exchange, EDI messages, floating tenders
    3     Mailing List           Private one-to-many e-mail, online newsletter, discussion group
    4     File Transfer (FTP)    Exchange of data between supply chains, between Corp HQ & various companies, customer support & sales data
    5     Conferencing & Chat    Electronic meetings
    6     Remote login (Telnet)  Access to databases & ERP software
    7     Calendar               Scheduling tasks

    ISP

    An ISP is a company that supplies Internet connectivity to home and business customers. ISPs support one or more forms of Internet access, ranging from traditional modem dial-up to DSL and cable modem broadband service to dedicated T1/T3 lines.

    More recently, wireless Internet service providers or WISPs have emerged that offer Internet access through wireless LAN or wireless broadband networks.

    In addition to basic connectivity, many ISPs also offer related Internet services like email, Web hosting and access to software tools.

    A few companies also offer free ISP service to those who need occasional Internet connectivity. These free offerings feature limited connect time and are often bundled with some other product or service.

    ISP Architecture

    As stated earlier, to avail of Internet services, each user must be connected to an ISP. For each modem at the user end, there is a corresponding modem at the ISP. The ISP has a number of servers, one for each service that it provides. The versatility of an ISP can be measured by the number and type of services (in terms of value addition) it provides to its customers. The figure shows a typical ISP architecture.

    FIG: ARCHITECTURE OF AN ISP


    Searching

    Searching the World Wide Web

    With the advent of the World Wide Web came the widespread availability of on-line information. It is no longer necessary to travel to the library to find the answer to a question or engage in research on a specialized topic. Much of what you might want to know is available through the Web. Since anyone can publish on the Web, the range of topics that can be found is nearly all-encompassing. However, while a lot of information is available on-line, not all of it is completely accurate.

    In all likelihood, the answers to your questions are somewhere on the Web, but how do you locate them? In the early days of the Web, unless you knew exactly where to look, you had trouble finding what you wanted. Unlike books on the shelves of a library, the pages on the Web are not neatly organized, nor are Web pages completely cataloged in one central location. Even knowing where to look for information is not a guarantee that you will find it, since Web page addresses are constantly changing. Usually, a forwarding address is provided for a page that has moved, but it may only be available for a short time.

    The rapid growth of the Web, as well as its huge size, has ruled out trying to keep track manually of "what is what" and "what is where". As people were spending their time trying to find things on the Web, rather than actually reading the material they were after, the first directories and search engines were developed. These tools allow you to find information more quickly and easily. You have probably already been using these tools, but perhaps not as effectively as possible.

    Methods of Searching

    1. Directories:- The first method of finding and organizing Web information is the directoryapproach. A Web directory or Web guide is a hierarchical representation of hyperlinks. The top

    level of the directory typically provides a wide range of very general topics, such as arts,

    automobiles, education, entertainment, news, science, sports, and so on. Each of these topics is ahyperlink that leads to more specialized subtopics. They in turn have a number of subtopics, andso on until you reach a specific web page.

    In addition to being very easy to use, another benefit of a directory structure is you need

    not know exactly what you are looking for in order to find something worthwhile. You select thecategory for the topic in which you are interested. You continue to move down through

    hierarchy, selecting subcategories and narrowing the search at each level, until you are presented

    with a list of hyperlinks that pertain to your topic.

    As you begin to zero in on your topic, you may find other interesting items of which you were previously unaware. On the other hand, you may reach the bottom of the directory

    without finding the information you were after. In such a case, you may need to backtrack, going

    up several levels and then proceeding down again. Of course, it is possible that the directory you are searching does not contain the information you want; in this case you may decide to try either

    a different directory or a search engine.

    When traversing a directory downward, you are moving toward more specific topics. When going upward, you are heading back to more general topics. Directories are useful if you

    want to explore a topic and its related areas, or if you want to research a subject, but not at a very

    detailed level.

    If you are interested in a very specific topic, you may want to start off by using a search engine or a meta search engine. Arriving at a very specific topic in a directory structure involves

    traversing between five and ten hyperlink levels.

    Note that while the directory structure is logically organized as a hierarchy, a specific Web page may occur in many different parts of the hierarchy. There is usually more than one

    way to reach a given page.
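
    The directory idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the category names, the example URLs, and the function name paths_to are all made up, and a real directory is far larger. The sketch shows a hierarchy of categories in which the same page is reachable by more than one path.

```python
# Hypothetical miniature directory: nested dicts are categories,
# lists at the bottom hold page URLs (all names are invented).
DIRECTORY = {
    "Science": {
        "Biology": {"Insects": ["bees.example/honey", "ants.example/colony"]},
    },
    "Recreation": {
        "Hobbies": {"Beekeeping": ["bees.example/honey"]},
    },
}


def paths_to(page, node=DIRECTORY, trail=()):
    """Return every category path that leads from the top level to `page`."""
    if isinstance(node, list):  # bottom of the hierarchy: a list of pages
        return [trail] if page in node else []
    found = []
    for category, child in node.items():
        # Moving down one level narrows the topic.
        found.extend(paths_to(page, child, trail + (category,)))
    return found
```

    Calling paths_to("bees.example/honey") returns two paths, one through Science and one through Recreation, which mirrors the remark that there is usually more than one way to reach a given page.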

    Popular Directories

    AOL NetFind - www.aol.com/netfind
    CNET Search.com - www.search.com
    Excite - www.excite.com
    Infoseek - www.infoseek.com
    Looksmart - www.looksmart.com
    Lycos - www.lycos.com
    Magellan - www.mckinley.com
    Yahoo - www.yahoo.com
    Rediff - www.rediff.com

    2. Search Engine:- The second approach to organizing and locating information on

    the Web is a search engine, which is a computer program that does the following:

    (a) Allows you to submit a form containing a query that consists of a word or phrase describing the specific information you are trying to locate on the Web.
    (b) Searches its database to try to match your query.
    (c) Collates and returns a list of clickable URLs containing presentations that match your query; the list is usually ordered, with the better matches appearing at the top.
    (d) Permits you to revise and resubmit a query.

    A number of search engines also provide URLs for related or suggested topics.

    Many people find that search engines are not as easy to use as directories. To use a search

    engine, you supply a query by entering information into a field on the screen. To be effective, that is, to have the search engine return a small list of URLs on your topic of interest, you often need to be very specific. To pose such queries, you must learn the query syntax of the search

    engine with which you are working. Learning the syntax so that you can phrase effective and

    legal queries often requires that you read and understand the documentation accompanying the search engine. A hyperlink to the documentation is usually provided next to the query field, and

    example queries are often given.

    Once you learn to use a specific search engine's query language effectively, you can

    quickly zoom in on very narrow topics; this is the advantage of a search engine. The disadvantages are that you have to learn the query language and you have to learn a search

    strategy.

    The user-friendliness and power of query languages vary from search engine to search engine. We recommend you try several of them and then learn the syntax of one search engine's

    query language. Since each search engine searches a different database, you would be best off

    learning about a search engine that has indexed a large part of the Web. You can gauge this by posing similar queries to a number of search engines and seeing which one finds the best matches.

    Popular Search Engines

    AOL NetFind - www.aol.com/netfind
    Excite - www.excite.com
    Infoseek - www.infoseek.com
    Looksmart - www.looksmart.com
    Lycos - www.lycos.com
    Magellan - www.mckinley.com
    Yahoo - www.yahoo.com
    Rediff - www.rediff.com
    AltaVista - altavista.digital.com
    Hot Bot - www.hotbot.com
    Google - www.google.com
    Web Crawler - www.webcrawler.com

    3. Meta Search Engine:- A meta search engine or all-in-one search engine performs a search by

    calling on more than one other search engine to do the actual work. The results are collated,

    duplicate retrievals are eliminated, and the results are ranked according to how well they match your query. You are then presented with a list of URLs.
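
    The collate, de-duplicate and rank steps described above can be sketched as follows. This is a minimal illustration, not how any real meta search engine is implemented: the two "engines" are stand-in functions returning invented (URL, score) pairs, and keeping the highest score for a duplicate URL is an assumption made for the example.

```python
def meta_search(query, engines):
    """Query every engine, merge the hits, drop duplicates, rank by best score."""
    best = {}
    for engine in engines:
        for url, score in engine(query):
            # Keep the highest score seen for a URL returned by several engines.
            best[url] = max(score, best.get(url, 0))
    # Highest score first; ties are broken alphabetically by URL.
    return sorted(best, key=lambda url: (-best[url], url))


# Two stand-in "engines" (hypothetical); each returns (url, relevancy score) pairs.
def engine_a(query):
    return [("a.example/bees", 90), ("shared.example/honey", 60)]


def engine_b(query):
    return [("shared.example/honey", 75), ("b.example/hives", 40)]
```

    With these stand-ins, meta_search("honey bees", [engine_a, engine_b]) lists the shared page once, even though both engines returned it.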

    The advantage of a meta search engine is that you can access a number of different search

    engines with a single query. The disadvantage is that you will often have a high noise-to-signal ratio; that is, a lot of matches will not be of interest to you. This means you will need to spend

    more time evaluating the results and deciding which hyperlinks to follow.

    For very specific, hard-to-locate topics, meta search engines can often be a good starting

    point. For example, if you try to locate a topic using your favorite search engine, but fail to turn

    up anything useful, you may want to query a meta search engine.

    Popular Meta Search Engines

    Meta Search - www.metasearch.com
    Meta Crawler - www.metacrawler.com
    Meta Find - www.metafind.com
    Savvy Search - guaraldi.cs.colostate.edu:2000

    4. Web Ring:- A web ring is a community of related Web pages that are organized into a circular

    ring. Each page in a ring has links that enable visitors to move to an adjacent site on the ring,

    access a ring index, or jump to a random site. Web sites are added continuously to the web rings. Each ring is managed from one of the sites. Web rings are fun to visit, but they do not contain the

    volume of information of the other search tools. Currently, web rings are available on many

    topics, including acrobatics, religion, Spanish hotels, Disneyland, and medieval studies. Most web rings are devoted to games. The Web Ring home page at www.webring.com contains more

    information on web rings and how to search them. Another site devoted to web rings is the

    RingSurf site, located at www.ringsurf.com.
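
    The navigation a web ring offers (adjacent site, ring index, random site) can be modelled as a circular list. The class below is a hypothetical sketch with invented names; it is not the mechanism used by any actual web ring service.

```python
import random


class WebRing:
    """A ring of sites; each site links to its neighbours, an index, and a random site."""

    def __init__(self, sites):
        self.sites = list(sites)

    def next_site(self, site):
        i = self.sites.index(site)
        return self.sites[(i + 1) % len(self.sites)]  # wrap around at the end

    def previous_site(self, site):
        i = self.sites.index(site)
        return self.sites[(i - 1) % len(self.sites)]  # wrap around at the start

    def index(self):
        return list(self.sites)

    def random_site(self):
        return random.choice(self.sites)
```

    Because of the wrap-around, following the "next" link from the last site in the ring leads back to the first one, which is what makes the structure a ring rather than a list.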

    Search Terminology

    Here are a few common search-related terms we should know about.

    Search Tool:- Any mechanism for locating information on the Web; the term usually refers to a search engine, a meta search engine, or a directory.
    Query:- Information entered into a form on a search engine's Web page that describes the information being sought. A query need not be a question; invariably a word or a phrase is used. A phrase is put within quotes, e.g. "Indian Tigers".
    Query Syntax:- A set of rules describing what constitutes a legal query. On some search engines, special symbols may be used in a query. The syntax defines the grammar of query writing. Each search engine may have different syntax rules, which are available in the Help menu of the search engine.
    Query Semantics:- A set of rules that defines the meaning of a query.
    Page View:- The viewing of one specific HTML file, without counting any graphics or other items on the page.
    Hit/Match:- A URL that a search engine returns in response to a query. In Web traffic measurement, a hit is commonly thought of as the number of times a page on a web site is requested by a browser, but this is not accurate: hits also include the number of times all other files, such as graphics and images, are requested. For example, if your home page has nine graphics on it, each time someone views your home page, the log file registers one hit for the HTML file and nine hits for the graphics, for a total of ten hits. Because the term "hits" has such an ambiguous meaning, most people now measure traffic in terms of page views.
    Visit:- All the pages viewed by a user within one continuous session; a visit can consist of a single HTML file or can last for a given duration.
    Relevancy Score:- A value that indicates how close a match a URL was to a query; usually expressed as a value from 1 to 100, with a higher score meaning more relevant.
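
    The difference between hits and page views in the Hit/Match entry above comes down to simple counting, sketched here. The file names are made up, and treating only .html and .htm files as pages is an assumption for the example.

```python
def count_traffic(requested_files):
    """Return (hits, page_views) for a list of requested file names."""
    hits = len(requested_files)  # every requested file is one hit
    page_views = sum(1 for name in requested_files
                     if name.endswith((".html", ".htm")))
    return hits, page_views


# One visitor loads a home page that embeds nine graphics (names are invented).
log = ["index.html"] + [f"img{i}.gif" for i in range(1, 10)]
```

    For this log, count_traffic(log) gives ten hits but only one page view, matching the home page example in the definition.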

    Search Engine Components

    If you understand how a search tool works, there is a good chance you will be able to use it more effectively. For the most part, these same ideas apply to directories; the main difference is that the hierarchical organizational structure and categorizations for directories need to be in

    place and displayed. The references include additional information about how directories are put

    together.

    To describe how a search engine works, we split up its functions into a number of

    components: user interface, searcher, and evaluator.

    User Interface:- The screen in which you type a query and which displays the search results.
    Searcher:- The part that searches a database for information to match your query.
    Evaluator:- The function that assigns relevancy scores to the information.

    In addition, a search engine's database is created using the following.

    Gatherer:- The component that traverses the Web, collecting information about pages.
    Indexer:- The function that categorizes the data obtained by the gatherer.

    For comparison, think of the different facets of a typical library, such as acquisitions, cataloging, indexing, and on-line searching.

    User Interface:- The user interface must provide a mechanism by which a user can submit queries to the search engine. This is universally done using forms. In addition, the user interface

    needs to display the results of the search in a convenient way. The user should be presented with

    a list of hits from their search, a relevancy score for each hit, and a summary of each page that was matched. This way, the user can make an informed choice as to which hyperlinks to follow.

    Searcher:- The searcher is a program that uses the search engine's index and database to see if

    any matches can be found for the query. Your query must first be transformed into a syntax that the searcher can process. Since the databases associated with search engines are extremely large

    (with perhaps 25,000,000 to 50,000,000 indexed pages), a highly efficient search strategy must

    be applied.

    Evaluator:- The searcher locates any URLs that match your query. The hits retrieved by your

    query are called the result set of the search. Not all of the hits will match your query equally

    well. For example, a query about "Honey Bees" might be matched by a page containing the phrase "Honey Bees" in the following sentence:

    Ants, honey bees, and crickets are all insects.

    Or by the page title

    Everything You Ever Wanted To Know About Honey Bees.

    Clearly, in most cases, it would be better to rank this second page much higher, as it

    probably contains many more references to Honey Bees.

    The ranking process is carried out by the evaluator, a program that assigns a relevancy

    score to each page in the result set. The relevancy score is an indication of how well a given page

    matched your query.

    How is the relevancy score computed by the evaluator? This varies from search engine to

    search engine. A number of different factors are involved, and each one contributes a different

    percentage towards the overall ranking of a page. Some of the factors typically considered are:

    a) How many times the words in the query appear in the page.
    b) Whether or not the query words appear in the title.
    c) The proximity of the query words to the beginning of the page.
    d) Whether the query words appear in the CONTENT attribute of the meta tag.
    e) How many of the query words appear in the document.

    Some search engines also consider other factors in computing a relevancy score. Each

    factor is weighted, and a value is computed that rates the page. The values are usually normalized and are assigned numbers between 1 and 100, with 100 representing the best possible

    match. As part of the user interface, the result set and relevancy scores computed by the

    evaluator are displayed for the user, with the best matches appearing first. Hyperlinks to each hit are provided and a short description of the page is usually given.
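
    To make the weighting idea concrete, here is a minimal sketch of an evaluator that combines the five factors listed above into a score from 1 to 100. Everything specific in it is an assumption: the weights, the way each factor is scaled to the range 0 to 1, and the function names. Real search engines use their own, usually unpublished, formulas.

```python
import re

# Assumed weights, chosen only for illustration; they sum to 1.0.
WEIGHTS = {"frequency": 0.3, "title": 0.3, "position": 0.1,
           "meta": 0.1, "coverage": 0.2}


def tokens(text):
    """Split text into lowercase words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relevancy(query, title, body, meta=""):
    """Score a page from 1 to 100 using five weighted factors."""
    words, text = tokens(query), tokens(body)
    if not words or not text:
        return 1
    title_words, meta_words = set(tokens(title)), set(tokens(meta))
    occurrences = sum(text.count(w) for w in words)
    # Position of the earliest query word; len(text) if none appears.
    first = min((text.index(w) for w in words if w in text), default=len(text))
    factors = {
        "frequency": min(occurrences / 10, 1.0),  # saturates at ten occurrences
        "title": sum(w in title_words for w in words) / len(words),
        "position": 1 - first / len(text),        # earlier is better
        "meta": sum(w in meta_words for w in words) / len(words),
        "coverage": sum(w in text for w in words) / len(words),
    }
    raw = sum(WEIGHTS[name] * value for name, value in factors.items())
    return max(1, min(100, round(raw * 100)))
```

    Applied to the two "Honey Bees" pages discussed earlier (with an invented body text for the second page), the page whose title contains the phrase receives the higher score, because the title factor and the word frequency both contribute.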

    Gatherer:- A search engine obtains its information by using a gatherer, a program that traverses the Web and collects information about web documents. The gatherer does not collect the

    information every time a query is made. Rather, the gatherer is run at regular intervals, and it

    returns information that is incorporated into the search engine's database and is indexed. Alternate names for gatherer are bot, crawler, robot, spider, and worm.
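
    A gatherer's traversal can be sketched as a breadth-first walk over hyperlinks. To keep the example self-contained and runnable, the "Web" here is a small in-memory dictionary with invented URLs rather than real network requests; a real gatherer would fetch pages over HTTP and respect site policies.

```python
from collections import deque

# A stand-in for the Web (hypothetical URLs): each page has text and outgoing links.
WEB = {
    "a.example": {"text": "honey bees and hives", "links": ["b.example", "c.example"]},
    "b.example": {"text": "ants and crickets", "links": ["a.example"]},
    "c.example": {"text": "bee keeping basics", "links": ["gone.example"]},
    "d.example": {"text": "not linked from anywhere", "links": []},
}


def gather(start, web=WEB):
    """Follow hyperlinks breadth-first from `start`; return {url: page text}."""
    collected = {}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in collected or url not in web:
            continue  # already visited, or a dead link
        collected[url] = web[url]["text"]
        queue.extend(web[url]["links"])
    return collected
```

    Note that gather("a.example") never reaches d.example, because no visited page links to it. This is one reason different search engines end up with different databases.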

    Indexer:- Once the gatherer retrieves information about Web pages, the information is put into a database and indexed. The indexer function creates a set of keys (an index) that organizes the data, so that high-speed electronic searches can be conducted and the desired information can be

    located and retrieved quickly.
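
    One common way to build such a set of keys is an inverted index, which maps each word to the pages that contain it. The sketch below assumes that technique for illustration; the text does not say which indexing scheme a given search engine uses, and the page data is invented.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def lookup(index, word):
    """Return the URLs for one keyword without scanning any page text."""
    return sorted(index.get(word.lower(), set()))


pages = {  # hypothetical data a gatherer might have collected
    "a.example": "Honey bees and hives",
    "b.example": "Ants and crickets",
}
index = build_index(pages)
```

    A lookup such as lookup(index, "honey") is a single dictionary access, however many pages were indexed, which is what makes high-speed searching possible.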

    Types of Queries

    Two types of queries are generally used for searching:

    (a) Pattern Matching Queries:- This is the most basic type of query. To formulate a pattern-matching query, a keyword or a group of keywords is typed into the query submission form. The search engine returns the URL of any page that contains these keywords. The result set varies from one search engine to another. The search result may vary depending on whether singular or plural words are used. Words separated by a space are treated as separate words. We can also use the (+) and (-) signs to include or exclude a word from the query words, e.g. the query "+Indian +Lion -Tiger" will search for the words Indian and Lion but not Tiger. Any words within quotes are taken as one word or phrase. These syntax rules may vary with different search engines. For details, one must go through the Help support of that search engine.
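
    The (+) and (-) convention can be sketched as a small matching function. Several things here are assumptions rather than the rules of any particular search engine: terms are separated by spaces, matching ignores case, and unsigned keywords are satisfied when at least one of them appears.

```python
import re


def matches(query, text):
    """True if `text` satisfies a query of +required and -excluded keywords."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    plain_terms = []
    for term in query.lower().split():
        if term.startswith("+"):
            if term[1:] not in words:
                return False  # a required word is missing
        elif term.startswith("-"):
            if term[1:] in words:
                return False  # an excluded word is present
        else:
            plain_terms.append(term)
    # Unsigned keywords: assume at least one must appear (engines differ on this).
    return not plain_terms or any(t in words for t in plain_terms)
```

    For instance, the query "+Indian +Lion -Tiger" matches a page about the Indian lion but rejects any page that also mentions a tiger.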

    (b) Boolean Queries:- Boolean queries involve the Boolean operations AND, OR and NOT. Most search engines permit you to enter Boolean queries. Some examples of Boolean queries are given below:

    (i) Lion AND Tiger: shows all pages that contain both Lion and Tiger.
    (ii) Lion OR Tiger: shows all pages that contain either Lion or Tiger or both, i.e. at least one of the words.
    (iii) Lion NOT Tiger: shows all pages that contain information about Lion but not Tiger. Thus, the Boolean NOT operation is used to exclude a word.
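
    The three Boolean operations correspond directly to set intersection, union and difference. The sketch below assumes a tiny, made-up index in which each keyword maps to the set of pages containing it (p1 to p4 are placeholder page names).

```python
# Hypothetical index: keyword -> set of pages containing it.
INDEX = {
    "lion": {"p1", "p2", "p3"},
    "tiger": {"p2", "p4"},
}

lion, tiger = INDEX["lion"], INDEX["tiger"]

both = lion & tiger        # Lion AND Tiger: pages containing both words
either = lion | tiger      # Lion OR Tiger: pages containing at least one word
lion_only = lion - tiger   # Lion NOT Tiger: pages with Lion but without Tiger
```

    Note how AND can only shrink a result set and OR can only enlarge it, which is why they are the natural tools for narrowing and widening a search.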

    Search Strategies

    Determining which search engine to use can be challenging. You can begin by testing a

    number of different search engines, trying to find one that you believe meets the following

    conditions:

    Possesses a user-friendly interface.
    Has easy-to-understand, comprehensive documentation.
    Is convenient to access; that is, you do not have to wait several minutes before being able to submit a query.
    Contains a large database, so that it knows a lot about the information for which you are searching.
    Does a good job in assigning relevancy scores.

    If you can find a search engine that meets most of these criteria, you should concentrate on learning it well, rather than learning a little bit about several different search engines.

    Once you have learned the query syntax of that search engine, you can begin to formulate

    your search strategy. When you post queries to the search engine, two common situations can

    occur: either your query does not turn up a sufficient number of hits, or your query turns up too many hits. In the next sections, you will learn strategies for dealing with these situations.

    1. Too Few Hits: Search Generalization

    Suppose your query returns no hits or only a couple of hits, none of which is very useful to you. In this case, you need to generalize your search. The ways to do this include:

    If you used a pattern matching query, eliminate one of the more specific keywords from your query.
    If you used a Boolean query, remove one of the keywords or phrases with which you used AND, or delete a NOT item you specified.
    If you restricted your search domain, enlarge it.
    If you are still having no luck, try keywords that are more general, or exchange a couple of the keywords with synonyms.

    If this fails, you may decide to use a directory and work your way down to the topic of interest. Another alternative would be to use a meta search engine.

    2. Too Many Hits: Search Specialization

    Suppose your query returns more URLs than you could possibly look through. In this case, you need to specialize your search.

    If you started with a pattern matching query, you may want to add more keywords.
    If you began with a Boolean query, you might want to AND another keyword, or use the NOT operator to exclude some pages.
    If you are still retrieving too many hits, try capitalizing proper nouns or names.
    If nothing seems to work, try reviewing the first 20 or so URLs, since search engines list the best matches near the top. If they do not contain what you are looking for, the information they do contain may help you refine your search.

    If this fails, you could resort to a directory and work your way down to the topic of interest.
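
    Both strategies can be tried out on a toy example. The three pages, their keywords, and the search function below are invented for illustration; the point is only to show how removing a keyword widens the result set while adding a keyword or a NOT term narrows it.

```python
PAGES = {  # hypothetical pages and the keywords they contain
    "p1": {"indian", "lion", "gir", "forest"},
    "p2": {"indian", "lion", "zoo"},
    "p3": {"african", "lion", "safari"},
}


def search(required, excluded=()):
    """Return pages containing every required keyword and no excluded one."""
    return sorted(url for url, words in PAGES.items()
                  if set(required) <= words and not set(excluded) & words)


# Generalization: an over-specific query finds nothing, so drop a keyword.
too_few = search(["indian", "lion", "gir", "sanctuary"])
generalized = search(["indian", "lion", "gir"])

# Specialization: a broad query finds everything, so AND a keyword, then add a NOT.
too_many = search(["lion"])
narrowed = search(["lion", "indian"])
narrowest = search(["lion", "indian"], excluded=["zoo"])
```

    Here the over-specific query returns no pages until "sanctuary" is dropped, while the broad query for "lion" shrinks from three pages to two and then to one as terms are added.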