
CHAPTER 2

An Overview of Directories and the Internet

A DIRECTORY FOR THE DOGS

As an example of a typical directory request that illustrates property-based information retrieval, consider a directory that contains information about dogs. Imagine that a directory client wants to know about breeds of dogs that are black and weigh less than 20 pounds. In this situation, the directory client would present two facts that it knows about objects that are in the directory:

• Weight < 20 pounds

• Color = Black

The directory server that the client contacts is expected to return information about every dog breed it knows of that has a weight property less than 20 pounds and a color property equal to black. The precise information that is returned depends upon the type of directory service and what information is available, but it would include other properties of the breeds that matched the request, such as breed name, height, and weight. Figure 2.1 shows this interaction between the directory client and the directory server.
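The request above can be sketched as a filter over objects that carry properties. This is a minimal illustration, not how any particular directory server stores its data: the `DogBreed` class, the `search` function, and the directory contents are hypothetical. The three matching breeds are the ones returned in Fig. 2.1 (their colors are assumed), and the other rows are invented non-matches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DogBreed:
    breed: str
    color: str
    weight_lb: int


# Hypothetical directory contents. The first three rows are the breeds returned
# in Fig. 2.1 (colors assumed); the last three are invented non-matches.
DIRECTORY = [
    DogBreed("Pug", "black", 18),
    DogBreed("Miniature Poodle", "black", 10),
    DogBreed("Miniature Dachshund", "black", 8),
    DogBreed("Labrador Retriever", "black", 70),  # too heavy
    DogBreed("Chihuahua", "tan", 5),              # wrong color
    DogBreed("Beagle", "tricolor", 25),           # neither property matches
]


def search(entries, max_weight_lb, color):
    """Return every entry whose properties satisfy both facts in the request."""
    return [e for e in entries
            if e.weight_lb < max_weight_lb and e.color == color]


for entry in search(DIRECTORY, max_weight_lb=20, color="black"):
    print(f"{entry.breed}, {entry.weight_lb} pounds")
```

The client supplies property names and values; the server returns whole entries, so the breed name comes back even though it was not part of the request.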

There are numerous kinds of directories that have been defined for the Internet. Most directory services are identified by the protocol that is used to communicate between directory clients and directory servers, even though the service may define other protocols that allow for communication among the directory servers. These server-to-server communication protocols are often termed back end protocols.

Compare this notion of directories to that of the widely used Internet search engines. The directory allows the client to supply some properties and proposed values, and the server returns matching entries and actual values from those entries. The search engine is specifically designed to visit pages on web sites that want to be searchable and read them, using the hypertext links on each page to discover and read a site’s other pages. Using this information, the search engine runs a program that creates a huge index (sometimes called a “catalog”) from the pages that have been read. Users can then search this index or catalog to find web sites of interest. The typical search engine allows the user to supply some words and returns web pages it has indexed that match the supplied words in some way. Notice that in the directory the client request has a definite property-value structure, while the search engine request is relatively unstructured.
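For contrast with the property-based directory query, here is a minimal sketch of the search-engine side: an inverted index that maps each word to the pages containing it, queried with an unstructured list of words. The page URLs and texts are hypothetical, and ranking pages by how many query words they contain is an assumption made for brevity; real search engines rank results very differently.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of page URLs that contain it (an inverted index)."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def keyword_search(index, query):
    """Return pages containing any query word, most matching words first."""
    scores = defaultdict(int)
    for word in set(tokenize(query)):
        for url in index.get(word, ()):
            scores[url] += 1
    return sorted(scores, key=lambda url: (-scores[url], url))


pages = {  # hypothetical pages a crawler has read
    "http://example.com/pugs": "Pugs are small dogs, often black or fawn.",
    "http://example.com/labs": "Labrador retrievers are large black dogs.",
    "http://example.com/cats": "Cats are not dogs.",
}
index = build_index(pages)
print(keyword_search(index, "small black dogs"))
```

The query has no notion of "color" or "weight"; any page that shares words with it is a candidate, which is exactly the unstructured behavior the text describes.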

A SECURITY PRIMER

This section introduces network security. Network security provides safeguards against various threats that may be targeted against computer networks. As directories are a network application service, they could be prime targets for these threats. Typical attacks against a directory involve the theft of information that is stored in the directory and the denial of service to a directory client. If a rogue user gains access to the directory, then that user may be able to change the directory information and cause the directory to provide incorrect answers to the clients’ queries.

Fig. 2.1 Directory/client server interaction. The directory client sends the request "Weight < 20 pounds, Color = Black"; the directory server responds with "Pug, 18 pounds; Miniature Poodle, 10 pounds; Miniature Dachshund, 8 pounds".

In order to provide a complete directory service, some level of network security must be provided. But what parts of security are relevant to directories? Users need to prove who they are, since some data that the directory stores is marked with access control information indicating who is allowed to read and write it. In some systems, proof of users’ identities is accepted by just trusting the IP address from which the request comes, or by sending a password across the network; neither approach is secure. A better way is through cryptography, in which you prove knowledge of a secret without divulging it. Using cryptographic techniques allows a directory client to prove to the directory server that it knows the password without ever sending the password across the network. Thus, even if there is an eavesdropper on the connection between the client and the server, the user’s password is still safe.
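One way to prove knowledge of a password without sending it is a challenge-response exchange. The sketch below is an illustration under assumptions, not the mechanism of any specific directory protocol: HMAC-SHA256 is our choice of primitive, and the server stores the password in the clear only to keep the example short (real mechanisms store salted, derived values instead).

```python
import hashlib
import hmac
import secrets


def make_challenge() -> bytes:
    """Server side: a fresh random nonce for each login attempt."""
    return secrets.token_bytes(16)


def respond(password: str, challenge: bytes) -> bytes:
    """Client side: key an HMAC with the password and apply it to the challenge."""
    return hmac.new(password.encode(), challenge, hashlib.sha256).digest()


def verify(stored_password: str, challenge: bytes, response: bytes) -> bool:
    """Server side: recompute the expected response and compare in constant time."""
    expected = hmac.new(stored_password.encode(), challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


challenge = make_challenge()
response = respond("s3cret", challenge)       # only this value crosses the network
print(verify("s3cret", challenge, response))  # True
print(verify("wrong", challenge, response))   # False
```

Because the challenge changes every time, an eavesdropper who records one response cannot replay it later, and the response itself does not reveal the password.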

Authentication is the process of determining whether someone or something is, in fact, who or what it claims to be. A directory client must often be authenticated to the directory server before being granted access to the directory information. Secure forms of authentication involve proving knowledge of a secret without divulging it. There are two types of secrets. In the first type, both sides (i.e., the client and the server) know the same secret. The second type is used by public key schemes, in which each user has a pair of keys, one public and one private, which are mathematically related such that anyone holding the public key can verify whether someone knows the private key. Authentication is the primary use of security in directory systems.

Various security mechanisms that are used by directories are introduced here. A complete discussion of network security can be found in Kaufman, Perlman, and Speciner’s book, Network Security: Private Communication in a Public World.1 The most basic tool used in network security is encryption. Encryption transforms data into a form that hides it from unauthorized people. The encrypted form of the data is known as the cipher data. Decryption is the process that authorized people use to transform the cipher data back into the original data. Authorized people encrypt and decrypt data by making use of keys that are used by the encryption or decryption process. A key is a short piece of data that is known only to the authorized people. Keeping the key secret allows the encryption algorithms themselves to be published. The typical use of encryption in directories is to keep certain data private as it is transferred across the network between the directory client and server. A large number of encryption algorithms can be used to provide this privacy; the most widely used algorithms fall into two categories:

• Public-key algorithms use a pair of keys that are created in a special process that allows one key to be used in encryption and the other key to be used in decryption. One of the keys is kept secret (the private key), and the other key is made widely available (the public key).

• Secret-key algorithms use a single key for both encryption and decryption.

1 Published by Prentice Hall PTR, 1995.

A few notes about these two types of encryption algorithms:

• Public-key algorithms are substantially slower than secret-key algorithms. In fact, public-key algorithms are so much slower that they are rarely used for encrypting data that is larger than a few kilobytes.

• The keys used in public-key algorithms are much larger than the keys used in secret-key algorithms. The key used in the popular RSA public-key algorithm defined in RFC 2437 is normally 256 bytes (2,048 bits) to provide adequate security. The key used in the popular RC5 secret-key algorithm defined in RFC 2040 is normally only 8 or 16 bytes.

• Great care must be taken in transmitting the key used in a secret-key algorithm among authorized users. If any unauthorized users gain access to the key, then any data encrypted using that key has been compromised.

Secret-Key Encryption

The most common secret-key encryption algorithms operate on a fixed-length segment of data at a time, usually 8 or 16 bytes. The algorithms take this data segment in combination with the secret key as input in order to produce the encrypted data. The encrypted data is almost always the same size as the input plain text data. When the data to be encrypted is longer than the segment that the algorithm is designed to accept, the plain text data is broken into several blocks, which are encrypted one at a time. For example, if a document to be encrypted is 100 bytes long, and the encryption algorithm operates on data blocks that are 8 bytes long, then the document would be divided into 13 blocks. Note that the last block of data would not be 8 full bytes, but only 4 bytes long. In order to decrypt the data, the algorithm uses the encrypted data along with the same secret key that was used in the encryption.
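The block arithmetic above can be checked with a few lines. The `pad` helper is an assumption: it follows the common PKCS#7 convention, which the text does not name, to show one way a short final block is filled out. Padding is also why the encrypted data is only *almost* always the same size as the input.

```python
def split_into_blocks(data: bytes, block_size: int = 8) -> list:
    """Cut data into block_size pieces; the final piece may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def pad(data: bytes, block_size: int = 8) -> bytes:
    """PKCS#7-style padding (an assumed scheme): append n copies of the byte
    value n so the length becomes a multiple of block_size. A full block of
    padding is added when the data already fits exactly."""
    n = block_size - len(data) % block_size
    return data + bytes([n]) * n


document = bytes(100)                  # a 100-byte document of zero bytes
blocks = split_into_blocks(document)
print(len(blocks), len(blocks[-1]))    # 13 4
print(len(pad(document)))              # 104
```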

Some algorithms encrypt each block independently of all the other blocks of data. Alternatively, the algorithm can use the result of encrypting the previous block as part of the input to the encryption of the next block. The details of any particular encryption algorithm are beyond the scope of this book. However, it can be assumed that if a document has been encrypted with a strong secret-key encryption algorithm, then the encrypted data may be safely transmitted across the Internet. As long as only the originator and the intended recipient know the key used in the encryption process, any malicious intruders that intercept the document cannot decrypt it.
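The two approaches are commonly called electronic codebook (ECB) mode, where each block is encrypted independently, and cipher block chaining (CBC) mode, where each plaintext block is combined with the previous ciphertext block before encryption. The sketch below shows only the chaining structure. The block function is a deliberately trivial, insecure stand-in for a real cipher such as RC5, and the fixed IV is used only so the demo is repeatable.

```python
BLOCK = 8


def toy_encrypt_block(block: bytes, key: bytes) -> bytes:
    # NOT a real cipher: add key bytes mod 256, then rotate left by one byte.
    mixed = bytes((b + k) % 256 for b, k in zip(block, key))
    return mixed[1:] + mixed[:1]


def toy_decrypt_block(block: bytes, key: bytes) -> bytes:
    # Exact inverse of toy_encrypt_block: rotate right, then subtract the key.
    unrotated = block[-1:] + block[:-1]
    return bytes((b - k) % 256 for b, k in zip(unrotated, key))


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def blocks_of(data: bytes) -> list:
    if len(data) % BLOCK:
        raise ValueError("toy example expects a multiple of the block size")
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_encrypt(data: bytes, key: bytes) -> bytes:
    # Each block is encrypted independently of all the others.
    return b"".join(toy_encrypt_block(p, key) for p in blocks_of(data))


def cbc_encrypt(data: bytes, key: bytes, iv: bytes) -> bytes:
    # Each plaintext block is XORed with the previous ciphertext block first.
    out, prev = [], iv
    for p in blocks_of(data):
        prev = toy_encrypt_block(xor(p, prev), key)
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(data: bytes, key: bytes, iv: bytes) -> bytes:
    out, prev = [], iv
    for c in blocks_of(data):
        out.append(xor(toy_decrypt_block(c, key), prev))
        prev = c
    return b"".join(out)


key = bytes(range(1, 9))
iv = b"\x10" * BLOCK             # fixed only to make the demo repeatable
message = b"ABCDEFGH" * 2        # two identical plaintext blocks

ecb = ecb_encrypt(message, key)
cbc = cbc_encrypt(message, key, iv)
print(ecb[:8] == ecb[8:])                     # True: repetition is visible
print(cbc[:8] == cbc[8:])                     # False: chaining hides it
print(cbc_decrypt(cbc, key, iv) == message)   # True
```

The design point is visible in the output: with independent blocks, identical plaintext blocks produce identical ciphertext blocks, which leaks structure even when the cipher itself is strong; chaining removes that pattern.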

Public-Key Encryption

Public-key encryption algorithms require two keys. One key is used in the encryption process, and another key is used in the decryption process. In public-key technology, the two keys must have a special relationship to each other and be generated together by a special mathematical process. Each pair of keys belongs to a user. The user publishes one of the keys (i.e., the public key) in order to make it available to other users. The second key (i.e., the private key) is kept confidential and not made available to anyone else. For example, if Alice wants to send Bob a secret message, she would retrieve Bob’s public key (perhaps from a known directory) and use it to encrypt the message. Once the message has been encrypted, only Bob can decrypt it, using his private key. Even though Alice knows the public key, the plain text data, and the encrypted data, she still cannot derive Bob’s private key. This is due to the special mathematical relationship between the two keys. In one popular encryption algorithm, deriving the private key would require the potential attacker to factor a large number that is about the same size as the key. If the key is 1024 bits, then the attacker would have to factor a number in the range of 2^1024, which has more than 300 decimal digits and is therefore practically impossible to factor with known methods.
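The "special mathematical relationship" can be illustrated with textbook RSA, the factoring-based algorithm the paragraph alludes to. This is a toy under stated assumptions: the primes p = 61 and q = 53 are classic illustration values, not from the text, and there is no padding. Anyone can factor n = 3233 instantly, so the sketch only shows how one key undoes the other.

```python
def make_toy_keypair(p: int = 61, q: int = 53, e: int = 17):
    """Textbook RSA key generation with tiny primes (insecure, illustration only)."""
    n = p * q                    # public modulus, shared by both keys
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)          # private exponent: inverse of e mod phi (Python 3.8+)
    return (n, e), (n, d)


def encrypt(m: int, public_key) -> int:
    n, e = public_key
    return pow(m, e, n)


def decrypt(c: int, private_key) -> int:
    n, d = private_key
    return pow(c, d, n)


public_key, private_key = make_toy_keypair()
c = encrypt(65, public_key)
print(public_key, private_key)       # (3233, 17) (3233, 2753)
print(c, decrypt(c, private_key))    # 2790 65
print(len(str(2 ** 1024)))           # 309 decimal digits in 2^1024
```

Recovering d from the public key requires phi, which in turn requires the factors p and q; with a 1024-bit modulus, that factoring step is what makes the attack impractical.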

Public keys should be widely published. If Alice publishes her public key so that it is widely available, anybody who needs to send her encrypted data can easily retrieve Alice’s key and securely send her information. A good way of publishing public keys is by storing them in a directory. A directory client can provide Alice’s e-mail address, and the directory server will be able to perform the lookup to find the entry in the directory that contains Alice’s public key and return it to the end user.

Message Digests, Digital Signatures, and Authentication

Another security algorithm of special interest is the message digest algorithm. A message digest algorithm takes a document of any size as input (i.e., the message) and produces a fixed-size data block as output. This fixed-size data block is called the message digest. For example, the popular MD5 message digest algorithm is described in RFC 1321: The MD5 Message-Digest Algorithm. MD5 produces a 16-byte message digest of its input. The message digest is also called a fingerprint because of the analogy to a person’s fingerprint. Just as it is extremely difficult to find two people with the same fingerprint, it is also extremely difficult to find two documents that produce the same MD5 message digest. A good message digest algorithm has the property that it is computationally infeasible to produce two messages having the same message digest. Similarly, it is also computationally infeasible to produce any message having a given prespecified target message digest.
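The fixed output size is easy to observe with Python's standard `hashlib` module. MD5 is used here only because the text names it; practical collision attacks against MD5 have since been published, so new designs generally use a SHA-2 function such as SHA-256 instead.

```python
import hashlib

messages = [b"", b"Alice", b"Alice!", b"A" * 1_000_000]
for message in messages:
    digest = hashlib.md5(message).digest()
    # Input length varies from 0 bytes to 1 MB; the digest is always 16 bytes.
    print(len(message), len(digest), digest.hex())

# For comparison, SHA-256 produces a fixed 32-byte digest.
print(len(hashlib.sha256(b"Alice").digest()))   # 32
```

Changing a single character ("Alice" versus "Alice!") yields an unrelated-looking digest, which is what makes the digest useful as a fingerprint.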

A message digest algorithm must have these properties to be useful in the creation of digital signatures. If Alice wants to create a digital signature for a document, she must first create the message digest of the plain text document. Then Alice creates the digital signature by transforming (in effect, encrypting) the digest with her private key. She can then send the plain text document along with the document’s signature to Bob. Bob can verify the digital signature by first creating the message digest of the plain text document. He then decrypts the digital signature using Alice’s public key. If the decrypted digital signature and the message digest that Bob created are identical, then Bob has verified Alice’s signature. In verifying the digital signature, Bob is assured of two facts:

• The document that Bob received is precisely the document that Alice sent, and it has not been altered en route.

• The document was actually sent by Alice and no one else. This is because no other person could have created the digital signature, since it required the use of Alice’s private key, and only Alice has access to her private key.
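The sign-then-verify steps can be sketched with textbook RSA. This is a toy under loud assumptions: the key pair is the classic tiny example n = 61 × 53 = 3233 with e = 17 and d = 2753 (hard-coded so the block stands alone), and the MD5 digest is reduced modulo n to fit that key. Real signature schemes use large keys and a standardized padding of the digest instead.

```python
import hashlib

# Toy key pair (n = 61 * 53); far too small for real use.
N, E, D = 3233, 17, 2753


def toy_digest(document: bytes) -> int:
    """MD5 digest reduced modulo N so it fits the toy key (real schemes pad instead)."""
    return int.from_bytes(hashlib.md5(document).digest(), "big") % N


def sign(document: bytes) -> int:
    """Alice: transform the digest with her private exponent D."""
    return pow(toy_digest(document), D, N)


def verify(document: bytes, signature: int) -> bool:
    """Bob: recover the digest with Alice's public exponent E and compare."""
    return pow(signature, E, N) == toy_digest(document)


doc = b"Meet at noon."
sig = sign(doc)
print(verify(doc, sig))   # True
```

With a modulus this small, many different documents share the same reduced digest, so a forger could get lucky; that weakness disappears only with realistically large keys.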



Digital signatures are especially useful in directories for the purpose of authentication. For example, once the client connects to the server, the server provides the client with a piece of data that the client must sign. Once the server verifies the signed data, the server is assured of the identity of the client, and the client can continue operating on the directory. Digital signatures are also useful because they allow the information in the directory itself to be signed. This allows directory clients to trust the information in the directory (especially if the information has been signed by a trusted agent).

THE INTERNET

Now that the basic concepts of directories have been introduced, the concepts of the Internet that are important to directories can be discussed. The Internet has its origins in the network funded by the United States Department of Defense Advanced Research Projects Agency (DARPA), known as the ARPANET. Since its inception in the late 1960s, the ARPANET has evolved into a network that connects millions of hosts across the world. These hosts all communicate using protocols that are known as the TCP/IP suite of protocols. When an Internet-like network is contained inside an enterprise, it can also be called an intranet.

This section will not attempt to explain the entire Internet protocol suite or stack. Instead, it views the Internet stack from the perspective of a directory and discusses the parts of the Internet suite that directories use. For example, the services offered by the Internet Protocol (IP) and the Internet Control Message Protocol (ICMP) are not directly used by directory services, and so won’t be discussed. However, the Transmission Control Protocol (TCP), Transport Layer Security (TLS, the successor to the Secure Sockets Layer, or SSL), and the User Datagram Protocol (UDP) are directly used by directory services, and thus will be introduced. Before discussing these services specifically, it is important to understand how TCP/IP works so that the features that are available to the directory are known.

The suite of protocols used by the Internet is known as the TCP/IP suite because the principal protocols used by Internet applications to communicate are the Transmission Control Protocol (TCP) and the Internet Protocol (IP). The TCP/IP suite of protocols is typically viewed as a stack, in which one layer is piled upon another. This indicates that the layers at the top of the stack make use of services at the layers toward the bottom of the stack. A view of a portion of the TCP/IP stack is shown in Figure 2.2.

The Internet 13


In this figure, each layer in the stack provides a set of services that are available to the layers that reside logically on top of it. TLS directly makes use of the services provided by TCP, but does not use any of the services that are offered by UDP or ICMP. Furthermore, TLS is generally ignorant of the services that are offered by the IP layer and does not directly make use of those services. Internet applications, such as web browsers, connect to Internet servers, such as web servers, by creating a connection known as a socket between the application and the server. A socket can be viewed as a pipeline between the application and the server through which data may be exchanged once it is created.

The creation of a socket is very analogous to dialing a telephone call. Once the telephone number is entered, the destination line rings, and when the receiving party answers the ringing telephone on their end, a telephone connection is established. In order to create a socket connection between two entities on the Internet, the calling party (known as the client) supplies an Internet address (the format of which will be discussed shortly). A process running on the client’s machine then attempts to make a connection to some process that is running on the machine named by the given Internet address. If a process is listening for connections on the destination machine, then the connection can be established. All of the Internet transports that are used in Internet directories (i.e., TCP, UDP, and TLS) make use of sockets for communication between clients and servers, but they use different types of sockets. However, the sockets for all of the transport types have similar behavior. Thus, when they are used for the simple transport of data between the client and the server, the different socket types can be used by the directory entities in virtually the same way.
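The "telephone call" can be sketched with Python's standard `socket` module on the local machine: a server process listens, a client dials the server's address and port, and data flows through the resulting pipeline. The echo behavior and the message text are illustrative choices; a thread stands in for the separate server machine.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Server: answer one caller, echo everything it sends, then hang up."""
    conn, _addr = listener.accept()
    with conn:
        chunks = []
        while chunk := conn.recv(1024):   # b"" means the client finished sending
            chunks.append(chunk)
        conn.sendall(b"".join(chunks))


# The server waits by the phone: bind to an address and listen for calls.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))           # port 0 asks the OS for any free port
listener.listen(1)
host, port = listener.getsockname()
server = threading.Thread(target=serve_once, args=(listener,))
server.start()

# The client "dials" the server's address and port to create the socket.
with socket.create_connection((host, port), timeout=5) as client:
    client.sendall(b"hello directory")
    client.shutdown(socket.SHUT_WR)       # no more data from this side
    reply = b""
    while chunk := client.recv(1024):
        reply += chunk

server.join()
listener.close()
print(reply)   # b'hello directory'
```

If nothing were listening at that address and port, `create_connection` would raise an error instead, the equivalent of a call that nobody answers.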


Fig. 2.2 Upper layers of the Internet Protocol Stack. (Layers shown: Transport Layer Security; Transmission Control Protocol; User Datagram Protocol; Internet Control Message Protocol; Internet Protocol; Address Resolution Protocol (ARP); Reverse ARP (RARP); …)


During the attempt to create a socket, the client and server go through a process known as handshaking. During the handshaking process, the client and server each exchange some information before the socket can be created. If either side is not satisfied with the information that is provided by the other side (known as its peer), then the attempt to create the socket is broken off, and no socket connection is created. For example, a socket server may be configured in such a way that it only allows sockets to be opened by clients from a specified set of hosts. During the handshaking process the client and server exchange their address information. If the client’s address is not one of those that the server is configured to accept, then the server will reject the client’s attempt to create the socket.
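A minimal sketch of the host check described above, assuming the server's policy is a hypothetical list of allowed networks (the 192.0.2.0/24 range is reserved for documentation). In many TCP implementations this check is made by the server application right after the connection is accepted, so the sketch refuses a disallowed client by closing the socket immediately.

```python
import ipaddress
import socket

# Hypothetical policy: the networks whose hosts may open sockets to this server.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),   # documentation range, used as an example
    ipaddress.ip_network("127.0.0.0/8"),    # the local host
]


def is_allowed(peer_ip: str) -> bool:
    """True if the peer's address falls inside one of the allowed networks."""
    addr = ipaddress.ip_address(peer_ip)
    return any(addr in net for net in ALLOWED_NETWORKS)


def accept_allowed(listener: socket.socket) -> socket.socket:
    """Accept connections, closing any that come from a host not on the list."""
    while True:
        conn, (peer_ip, _port, *_rest) = listener.accept()
        if is_allowed(peer_ip):
            return conn
        conn.close()   # refuse: hang up on a disallowed client immediately
```

For example, `is_allowed("192.0.2.7")` is true while `is_allowed("198.51.100.1")` is false; an IPv6 peer never matches these IPv4-only networks.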

During any handshaking process at the Internet transport layer, the client and server exchange information so that (among other things) they can identify each other. In the real world, people identify each other by any number of means: names, telephone numbers, electronic mail addresses, etc. On the Internet, peer entities can identify each other by several different means, but by far the two most common mechanisms are Internet addresses and Internet host names. The following sections present an overview of the upper layers of the TCP/IP stack, starting from the top of the stack, since those layers are directly used by the directory.

THE TLS LAYER

A very simplified view of the handshaking that occurs in the TLS layer is shown in Figure 2.3.

In this view of the handshaking that occurs when a TLS client attempts to open a socket with a TLS server, both the client and the server send two pieces of information across the network before the TLS socket is successfully created. The client initiates the handshaking when it sends a special message defined by TLS, known as a client hello. This message contains various parameters that define the kinds of TLS services that are being requested. For example, the TLS client can request that the connection be encrypted by any of several different means. It can also request that any data passed across the socket be compressed. The client hello message also includes some randomly generated data that aids in the creation of the encrypted connection. The server responds to a client hello with another special message defined by TLS, known as a server hello. The main piece of information that the server provides at this stage is information that is unique to the server, known as a certificate. The server presents its certificate in such a way that the client can verify that it really does belong to the server. This verification is known as authentication. The precise details of how this authentication works are not particularly important to the functions of Internet directories, but a short overview of the authentication process will be discussed in Chapter 4, “Directory Management.” The important feature of this first stage in the handshaking process is that it has allowed the TLS client to verify that the TLS server to which it has connected is indeed the one that it intended to contact.

Once the client has authenticated the server, the second phase of the TLS handshaking can begin. The objectives of the second phase are to allow the server to authenticate the client and to create an encryption key that allows all data passed across the TLS session to be kept confidential. If the server requested client authentication in its server hello message, then the client is obligated to provide its certificate information in the subsequent message. At this time the client uses the random data that it provided in the client hello message, along with information that is in the server certificate, to generate the encryption key. Simultaneously, the server performs the same process, thus guaranteeing that the client and server have generated the same key, often known as a shared secret. If the server is satisfied with the client certificate information that the client has provided, then it provides a response indicating that the connection has been opened successfully. (Keep in mind that this explanation of a TLS connection creation has been greatly simplified and, while accurate, omits many details.)

Fig. 2.3 TLS handshaking. The TLS client sends a TLS Hello; the TLS server returns its Server ID Information; the client sends its Client ID Information, Encrypted; the result is a TLS Open Encrypted Channel.
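From an application's point of view, the whole handshake is usually a single library call. The sketch below uses Python's standard `ssl` module; the function names `make_client_context` and `open_tls` are hypothetical, port 636 is the LDAP-over-TLS assignment from Table 2.1, and the certificate file arguments are only needed when a server requests client authentication.

```python
import socket
import ssl
from typing import Optional


def make_client_context(cert_file: Optional[str] = None,
                        key_file: Optional[str] = None) -> ssl.SSLContext:
    # The default context verifies the server's certificate chain and host name.
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    if cert_file:
        # Present a client certificate if the server asks the client to authenticate.
        context.load_cert_chain(cert_file, key_file)
    return context


def open_tls(host: str, port: int = 636, cert_file: Optional[str] = None,
             key_file: Optional[str] = None) -> ssl.SSLSocket:
    """Open a TCP connection, then run the TLS handshake over it."""
    context = make_client_context(cert_file, key_file)
    raw = socket.create_connection((host, port), timeout=10)
    try:
        # wrap_socket performs the hello exchange, certificate checks, and key setup.
        return context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
```

A caller might write `with open_tls("ldap.example.com") as tls: print(tls.version())`, where the host name is a placeholder; if the server's certificate does not verify, `wrap_socket` raises an error and no encrypted channel is opened.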

THE TCP LAYER

TCP is the Transmission Control Protocol. It provides a reliable connection between two Internet nodes (e.g., hosts). For example, TCP nodes synchronize with each other and number the data in each packet that is sent between them. For each packet that is sent from one host to the other, an acknowledgment is returned. If a host does not receive an acknowledgment for a sent packet, that packet is presumed to have been lost and is retransmitted. The TCP handshaking and socket setup process is much simpler than that of TLS. When a TCP client wishes to open a connection to a TCP server, the client transmits a special message indicating that it wishes to synchronize sequence numbers with the server. This message, known as a SYN message, is the first step in the TCP handshaking. The steps in the TCP handshaking are illustrated in Figure 2.4 (taken from RFC 793, which defines TCP).

RFC 793 describes this process, known as the three-way handshake, as follows: The TCP client begins by sending a SYN segment indicating that it will use sequence numbers starting with 100. Next, the TCP server sends a SYN and acknowledges the SYN it received from the TCP client. Note that the acknowledgment field indicates that the TCP server is now expecting to hear sequence 101, acknowledging the SYN, which occupied sequence 100. Finally, the TCP client responds with an empty segment containing an ACK for the TCP server’s SYN. With this third message, the TCP client and server have successfully negotiated a socket connection.
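The sequence-number bookkeeping of the three-way handshake can be modeled in a few lines. This is a model of the segment fields only, not a network implementation; the `Segment` type is hypothetical, and the initial sequence numbers 100 and 300 are those of the RFC 793 example shown in Figure 2.4.

```python
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    sender: str
    seq: int             # SEQ field
    ack: Optional[int]   # ACK field (None when the ACK flag is not set)
    ctl: str             # control flags


def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the three segments of a TCP open, as in RFC 793's example."""
    syn = Segment("client", client_isn, None, "SYN")
    # A SYN occupies one sequence number, so each side acknowledges ISN + 1.
    syn_ack = Segment("server", server_isn, syn.seq + 1, "SYN,ACK")
    ack = Segment("client", client_isn + 1, syn_ack.seq + 1, "ACK")
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=100, server_isn=300):
    print(segment)
```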

The synchronizing of the TCP sequence numbers is particularly important to the reliability of the TCP layer. As RFC 793 indicates, TCP must recover from data that is damaged, lost, duplicated, or delivered out of order by the IP layer. This is achieved by assigning a sequence number to each octet transmitted and requiring a positive acknowledgment (ACK) from the receiving TCP. If the ACK is not received within a timeout interval, the data is retransmitted. At the receiver, the sequence numbers are used to correctly order segments that may be received out of order and to eliminate duplicates. Damage is handled by adding a checksum to each segment transmitted, checking it at the receiver, and discarding damaged segments.
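The receiver-side rules above (order by sequence number, drop duplicates, discard damaged segments) can be sketched as follows. This is a simplified model under stated assumptions: `zlib.crc32` stands in for TCP's actual 16-bit ones'-complement checksum, segments are plain tuples, and the returned next sequence number plays the role of the cumulative acknowledgment.

```python
import zlib


def make_segment(seq: int, data: bytes) -> tuple:
    # zlib.crc32 stands in for TCP's 16-bit ones'-complement checksum.
    return (seq, data, zlib.crc32(data))


def reassemble(segments, initial_seq: int):
    """Order segments by sequence number, dropping duplicates and damaged ones.

    Returns the in-order byte stream and the next sequence number expected,
    which is what the receiver would acknowledge."""
    received = {}
    for seq, data, checksum in segments:
        if zlib.crc32(data) != checksum:
            continue                      # damaged: discard; sender will retransmit
        received.setdefault(seq, data)    # duplicate: keep the first copy
    stream, next_seq = b"", initial_seq
    while next_seq in received:           # deliver only the contiguous prefix
        data = received[next_seq]
        stream += data
        next_seq += len(data)             # one sequence number per octet
    return stream, next_seq


first = make_segment(101, b"Hello, ")
second = make_segment(108, b"directory")
damaged = (117, b"!", zlib.crc32(b"?"))   # checksum does not match the data
arrivals = [second, first, second, damaged]   # out of order, with a duplicate
print(reassemble(arrivals, initial_seq=101))  # (b'Hello, directory', 117)
```

The receiver keeps acknowledging 117 until a good copy of the damaged segment arrives, which is how the retransmission timeout on the sending side eventually repairs the stream.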

THE UDP LAYER

TCP is an inherently reliable protocol (using the above notion of reliability), because packets are sent again when they are lost or damaged in transit. Its companion protocol, the User Datagram Protocol (UDP), is considered an inherently unreliable protocol, because UDP does not make the guarantees that TCP does. UDP does not recover from data that is damaged, lost, duplicated, or delivered out of order by the IP layer. UDP has no handshaking process. UDP is designed for situations in which all that is needed is a single message and a single response; thus, no ongoing connection between a UDP client and server is maintained. UDP is defined in RFC 768, which indicates that UDP provides a procedure for application programs to send messages to other programs with a minimum of protocol mechanism, and that applications requiring ordered reliable delivery of streams of data should use TCP. A UDP client simply sends a datagram from its socket to the address and port of a UDP server. The UDP server replies to this request with whatever data is appropriate, and no connection needs to be torn down afterward.

Fig. 2.4 TCP handshaking. TCP Client to TCP Server: SEQ=100, CTL=SYN. TCP Server to TCP Client: SEQ=300, ACK=101, CTL=SYN,ACK. TCP Client to TCP Server: SEQ=101, ACK=301, CTL=ACK.
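The UDP request/response pattern described above can be sketched with datagram sockets on the local machine. The request text, including the host name inside it, is a made-up payload; the point is that no handshake precedes the first datagram, and the server learns where to reply from the datagram itself.

```python
import socket

# Server side: bind a datagram socket; port 0 lets the OS choose a free port.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
server.settimeout(2)

# Client side: no connection setup, just a socket to send from.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(2)   # UDP promises nothing, so never wait forever for a reply

# One request datagram, with no handshake beforehand.
client.sendto(b"who is ldap.example.com?", server.getsockname())

# The server learns the client's address from the datagram and replies to it.
request, client_addr = server.recvfrom(1024)
server.sendto(b"reply to: " + request, client_addr)

reply, _ = client.recvfrom(1024)
client.close()
server.close()
print(reply)
```

On a real network either datagram could be lost, so a UDP client typically retries after its timeout expires rather than relying on the transport to do so.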

TYING THE LAYERS TOGETHER

Most hosts that are reachable on the Internet have been assigned one or more Internet Protocol (IP) addresses, and usually have been assigned a host name as well.2 The IP address is the numeric form, while the host name is the textual form. For convenience, IPv4 addresses are normally represented textually in dotted decimal notation, e.g., 127.0.0.1, rather than as the raw 32-bit number; IPv6 addresses, which are 128 bits long, use a colon-separated hexadecimal notation, e.g., ::1. Humans prefer the host name, as the names generally have some intrinsic meaning. For example, the web site for the Prentice Hall publishing company is located on the Internet host www.prenhall.com. Host names are the text strings that appear on the right side of the “@” in an electronic mail address. For example, the author can be reached via e-mail at: [email protected].

Version 4 of IP, which is the ubiquitously deployed version, allows for 32-bit IP addresses, which is enough for approximately four billion different addresses. The recently approved version 6 of IP allows for 128-bit addresses, which is enough for about 3.4 × 10^38 addresses.
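Python's standard `ipaddress` module makes the relationship between the dotted notation and the raw number concrete, and the address-space sizes are simple powers of two.

```python
import ipaddress

addr = ipaddress.IPv4Address("127.0.0.1")
print(int(addr))                              # 2130706433: the raw 32-bit number
print(ipaddress.IPv4Address(2130706433))      # 127.0.0.1 again
print(2 ** 32)                                # 4294967296 possible IPv4 addresses
print(f"{2 ** 128:.2e}")                      # 3.40e+38 possible IPv6 addresses
print(ipaddress.IPv6Address("::1").max_prefixlen)   # 128 bits per IPv6 address
```

The dotted form is just the four bytes of the number written separately: 127 × 2^24 + 0 + 0 + 1 = 2130706433.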

In addition to knowing the name or address of the server on which it wants to open a socket, the client must also know the number of the port on which the server is listening. To allow for many processes within a single host to use TCP communication facilities simultaneously, the TCP provides a set of addresses or ports within each host. Concatenated with the network and host addresses from the IP layer, this forms a socket. This definition, taken directly from RFC 793, indicates that the purpose of the TCP port is to allow for many servers to operate on the same Internet host at the same time. A TCP port is identified by an integer. To make this happen, each server must listen on a different incoming port. If a server attempts to listen on a port on which another server is already listening, TCP will return an error to the second server indicating that the port is busy. UDP ports follow the same rule, and TLS, which runs over TCP, uses TCP ports; in every case only one server is allowed to listen on a port at a time. Historically, Internet application services have had ports reserved for them when they were defined. The Internet Assigned Numbers Authority (IANA) keeps track of all port numbers that have been assigned. Some of the more notable assignments for service contact ports are named in Table 2.1.

2 Modern techniques allow hosts to participate on the Internet without having an IP address (or host name) assigned to them.
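The "port is busy" error can be reproduced locally with the standard `socket` module: one socket binds and listens on a port, and a second attempt to bind the same address and port fails with an `OSError` (on most systems the error code is `EADDRINUSE`, "address already in use").

```python
import socket

# First server: bind to any free port on the local host and start listening.
first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
first.bind(("127.0.0.1", 0))
first.listen()
port = first.getsockname()[1]

# Second server: try to claim the same port.
second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    second.bind(("127.0.0.1", port))
except OSError as exc:
    busy = True
    print("port busy:", exc.strerror)
else:
    busy = False
finally:
    second.close()
    first.close()
```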

Table 2.1 Port Number Assignments for Some Internet Protocols (a "-" means no TLS port is assigned)

Protocol Name   TCP Port   UDP Port   TLS Port

Telnet   23   23   992

SMTP   25   25   465

DNS   53   53   -

Whois   43   43   -

Whois++   63   63   -

Finger   79   79   -

HTTP   80   80   443

LDAP   389   389   636

Rwhois   4321   4321   -

Note in the above table that not all Internet application protocols have been assigned TLS ports. Port numbers are divided into three separate ranges:

• Well Known Ports: numbered from 0 through 1023

• Registered Ports: numbered from 1024 through 49151

• Dynamic or Private Ports: numbered from 49152 through 65535
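The three ranges above translate directly into a small classifier. The function name `port_range` and the returned labels are illustrative.

```python
def port_range(port: int) -> str:
    """Classify a port number into the three IANA ranges listed above."""
    if not 0 <= port <= 65535:
        raise ValueError("port numbers are 16-bit: 0 through 65535")
    if port <= 1023:
        return "well known"
    if port <= 49151:
        return "registered"
    return "dynamic/private"


for port in (53, 389, 4321, 50000):
    print(port, port_range(port))
```

Using the assignments in Table 2.1, DNS (53) and LDAP (389) land in the well-known range, while Rwhois (4321) is registered.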

Thus, DNS uses a well-known port, while Rwhois uses a registered port. The distinction between these two port types is minor: typically only processes or programs that are run by the most privileged users can listen on a well-known port number. If a service has a well-known or registered port assignment, then clients of that service can assume that the default configuration of the server has the server listening on the assigned port. Thus, finger clients normally assume that a finger server is listening on port number 79 on most Internet hosts, and that no server that doesn’t understand the finger protocol is listening on that port. This means that when the finger client attempts to open a socket to a finger server, it will try port 79, and it expects either to succeed and talk to a finger server or to find that no finger server is active on that host. By convention, the finger client does not expect to find a server listening on port 79 that does not understand the finger protocol (for example, a web server).
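A client's "try the assigned port" behavior can be sketched as a lookup in Table 2.1. The dictionary is transcribed from the table; the function name `default_port` is hypothetical, and a real client would also let the user override the port.

```python
# Default service ports transcribed from Table 2.1; a missing key means the
# table lists no assignment for that transport.
DEFAULT_PORTS = {
    "telnet": {"tcp": 23, "udp": 23, "tls": 992},
    "smtp": {"tcp": 25, "udp": 25, "tls": 465},
    "dns": {"tcp": 53, "udp": 53},
    "whois": {"tcp": 43, "udp": 43},
    "whois++": {"tcp": 63, "udp": 63},
    "finger": {"tcp": 79, "udp": 79},
    "http": {"tcp": 80, "udp": 80, "tls": 443},
    "ldap": {"tcp": 389, "udp": 389, "tls": 636},
    "rwhois": {"tcp": 4321, "udp": 4321},
}


def default_port(protocol: str, transport: str = "tcp") -> int:
    """Port a client tries when it has not been told otherwise."""
    ports = DEFAULT_PORTS[protocol.lower()]   # KeyError for unknown protocols
    if transport not in ports:
        raise LookupError(f"Table 2.1 lists no {transport} port for {protocol}")
    return ports[transport]


print(default_port("finger"))        # 79
print(default_port("LDAP", "tls"))   # 636
```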

More information on the definition of the Internet may be found in the Internet document FYI 20, entitled “What is the Internet?”, as well as in any number of other published references.

INTERNET DIRECTORIES

Recall from our earlier discussion that a directory is an application service that primarily performs property-based information retrieval. Directories store objects of various types. Each object that is stored has properties. For example, dog objects have properties such as color, breed, height, weight, and age. A directory that contains objects of this type would allow clients to retrieve information about dogs based on the properties that have been defined.

As they relate to the Internet, directories perform various necessaryand useful functions. For example:

• They allow for the resolution of host names to underlying IP addresses.

• They allow for the creation of an Internet Public Key Infrastructure (PKI) in order to allow for the secure exchange of information across an insecure network.

• They allow for controlled access to resources across a distributed network.

• They allow for location of a server based upon the type of the service (e.g., electronic mail) rather than upon the name of the server.

• They allow for the exchange of index information among themselves in order to facilitate the routing of queries to the appropriate server. This function allows each server to have some knowledge about the data that is contained on many other servers.

Directories provide many useful services for the Internet. But the Internet also provides many useful functions for directories. TCP provides a reliable means of transporting data from client to server. TLS allows the directory to provide a secure means of transporting data from client to server. TLS also allows directory peers to identify each other strongly, rather than simply passing user IDs and passwords across the network.

The previous section provided an overview of the Internet, and the notion of directories as a service that allows for property-based information retrieval has already been introduced. Putting the Internet and directories together yields the concept of the Internet directory. An Internet directory is a service that has property-based information retrieval as its primary function. It uses one or more of the Internet transports (TLS, TCP, or UDP) as its native means of communication between the client and server. The two most prominent Internet directories are the Domain Name System (DNS) and the Lightweight Directory Access Protocol (LDAP).

DNS

DNS is an Internet standard that is defined in RFCs 1034 and 1035. The primary goal of the DNS directory service is to provide for the mapping of Internet host names to IP addresses. In terms of the property-based information retrieval concept, the objects that are stored in the DNS directory are Internet hosts. The properties that are available for retrieval are host names and IP addresses. DNS clients, known as resolvers, send requests to DNS servers. In the DNS, resolvers typically provide a server with a host name, and the server provides the resolver with the IP address that corresponds to the provided host name. The growth in the Internet was the impetus for the development of DNS. As the number of hosts attached to the Internet grew beyond several hundred in the early 1980s, scalability problems with the previous mechanism for providing the name-to-address mapping were exposed.
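From a program's point of view, the resolver's job of turning a host name into addresses is usually a single library call. This sketch uses Python's standard `socket.getaddrinfo`, which asks the operating system's stub resolver. Depending on the system configuration, that resolver may consult a local hosts file before sending queries to DNS servers. The helper name `resolve` is hypothetical.

```python
import socket
from typing import List


def resolve(host_name: str) -> List[str]:
    """Ask the system's stub resolver for the IPv4 addresses of host_name."""
    infos = socket.getaddrinfo(host_name, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); for AF_INET
    # the sockaddr is an (address, port) pair. Deduplicate, keeping order.
    addresses: List[str] = []
    for *_, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses
```

For example, `resolve("localhost")` typically returns `['127.0.0.1']`, although the exact result depends on the local configuration.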

As described in RFC 1034, before DNS was implemented, host name to address mappings were maintained by the Network Information Center (NIC) in a single file (HOSTS.TXT), which was copied by all hosts [RFC-952, RFC-953]. The total network bandwidth consumed in distributing a new version by this scheme was proportional to the square of the number of hosts in the network, and the outgoing FTP load on the NIC host was considerable. Explosive growth in the number of hosts didn’t bode well for the future.

Furthermore, the network population was also changing in character. The timeshared hosts that made up the original ARPANET were being replaced with local networks of workstations. Local organizations were administering their own names and addresses, but had to wait for the NIC to change HOSTS.TXT to make alterations visible to the Internet at large.

The proposals for the replacement of this mechanism varied, but a common thread was the idea of a hierarchical name space. The hierarchy would roughly correspond to organizational structure, and names would use “.” to mark the boundary between hierarchy levels. The implementation of DNS met these requirements with a distributed database of information rather than a centrally administered hosts file. The distributed administration devised by DNS was to become a hallmark of virtually all of the Internet directory services that followed. The implementation and protocols involved in DNS will be discussed in significant detail in a subsequent chapter.
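The hierarchical name space can be illustrated by listing the enclosing names of a host, from the top of the hierarchy down. This is a sketch that assumes a simple dotted name; the host name `www.cs.example.edu` is hypothetical. In DNS, any of these levels may be delegated to a different organization for administration, which is what made distributed administration possible. The split below shows only the naming structure, not where delegations actually occur.

```python
from typing import List


def hierarchy(host_name: str) -> List[str]:
    """List the names enclosing host_name, from the top level downward.

    A trailing "." (the root) is ignored; labels are separated by ".".
    """
    labels = host_name.rstrip(".").split(".")
    # Build suffixes starting with the right-most label (the top level).
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


print(hierarchy("www.cs.example.edu"))
```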

LDAP

The Lightweight Directory Access Protocol (LDAP) grew out of the Defense Advanced Research Projects Agency’s (DARPA’s) desire to implement the X.500 series of recommendations of the International Telecommunication Union (ITU). X.500 defines several different models and protocols that are used in the implementation of directories. The most notable protocol defined by X.500 is the Directory Access Protocol (DAP). DARPA wanted to deploy directories based on the X.500 series, but implementations were slow in coming. DARPA decided to fund a research project at the University of Michigan to define a different version of DAP. The new version would be significantly easier to implement but would still retain the core features of the X.500 model. The end product of this research project was LDAP.

An early definition of LDAP was experimental in nature, and the first widely implemented definition of LDAP was LDAP version 2, as defined in RFC 1777. Due to deficiencies in the areas of security, internationalization, and extensibility, a third version of LDAP was defined in RFC 2251. RFC 2251 lists the key aspects of this version of LDAP:

• All features of LDAPv2 (RFC 1777) are supported. The protocol is carried directly over TCP or other transport, bypassing much of the session/presentation overhead of X.500 DAP (which is defined on top of the OSI protocol stack).

• Most of the data that is passed between LDAP clients and servers can be encoded as ordinary strings (X.500 uses various binary data types to encode its information).

• Referrals to other servers may be returned when the server initially contacted by the LDAP client does not have enough information to completely fulfill the client request.

• Simple Authentication and Security Layer (SASL) mechanisms may be used with LDAP to provide security services, such as authentication, between the client and the server.

• Attribute values and distinguished names have been internationalized through the use of the ISO 10646 character set. This allows any character (e.g., from the Chinese character set) to be used in LDAP strings.

• The protocol can be extended to support new operations, and controls may be used to extend existing operations.

• Schemas are published in the directory for use by clients. The types of information available for retrieval are therefore themselves part and parcel of the normal information that LDAP servers publish.

The last point is especially interesting. In the context of Internet directories, the schema defines the types of objects, and the properties of those objects, that are available for retrieval by clients. Since LDAP clients are the ones that are capable of publishing the information that appears in directories, it is only natural that clients are allowed to publish (and retrieve) the schema for that information. While it appears natural, this innovation is new in version 3 of LDAP. It allows clients to find out about new types of objects that are stored in the directory, and to determine the precise syntax that is used in the properties of these new types of objects. This notion of dynamic schema discovery is a marked difference between LDAP and earlier directories. In DNS, resolvers are expected to have full knowledge of the various types of records that are maintained by DNS servers. There is no defined way in the DNS scheme of things for resolvers to understand new record types on the fly.
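To make dynamic schema discovery concrete, consider how an LDAPv3 server publishes object class definitions. Each definition is a string in the format defined in RFC 2252. A client that retrieves one can learn which properties an entry of that class must or may have. The sketch below parses one such definition, the standard `person` object class as given in RFC 2256. It handles only the NAME, SUP, MUST, and MAY fields and assumes well-formed input. The function name `parse_object_class` is hypothetical, and retrieving definitions from a live server is not shown.

```python
import re
from typing import Dict, List

# A token is a parenthesis, a single-quoted string, or a bare word
# (an OID, a keyword such as MUST, an attribute name, or the "$" separator).
_TOKEN = re.compile(r"\(|\)|'[^']*'|[^\s()]+")
_LIST_FIELDS = ("NAME", "SUP", "MUST", "MAY")


def parse_object_class(description: str) -> Dict[str, List[str]]:
    """Extract OID, NAME, SUP, MUST and MAY from an objectclass description.

    Assumes well-formed input; other fields (DESC, STRUCTURAL, ...) are skipped.
    """
    tokens = _TOKEN.findall(description)
    if len(tokens) < 3 or tokens[0] != "(" or tokens[-1] != ")":
        raise ValueError("description must be a parenthesized list")
    body = tokens[1:-1]
    result: Dict[str, List[str]] = {"OID": [body[0]]}
    i = 1
    while i < len(body):
        keyword = body[i]
        i += 1
        if keyword not in _LIST_FIELDS:
            continue  # flag keywords and unsupported values are skipped
        values: List[str] = []
        if body[i] == "(":  # a list such as ( sn $ cn )
            i += 1
            while body[i] != ")":
                if body[i] != "$":
                    values.append(body[i].strip("'"))
                i += 1
            i += 1  # step past the closing ")"
        else:  # a single value such as top or 'person'
            values.append(body[i].strip("'"))
            i += 1
        result[keyword] = values
    return result


PERSON = ("( 2.5.6.6 NAME 'person' SUP top STRUCTURAL MUST ( sn $ cn ) "
          "MAY ( userPassword $ telephoneNumber $ seeAlso $ description ) )")

print(parse_object_class(PERSON)["MUST"])  # prints ['sn', 'cn']
```

A client that had never seen the `person` class could use this information to know that `sn` and `cn` will always be present in such entries.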

INTERNET DIRECTORY REQUIREMENTS

In the previous sections of this chapter, various functions of Internet directories have been discussed, but the fundamental requirements for a directory service have only been touched on. In traditional software development, the requirements gathering phase is the first part of developing software, and it defines the external behavior of the software system to be built. In terms of Internet directories, the requirements indicate the features of the clients that are made available to their users. Note that not all directory services implement all of these requirements. For example, as we mentioned previously, DNS does not implement schema discovery. This section is meant to describe, in a general way, the types of features that are available in many Internet directory services, in order to distinguish them from other types of application services.

The foremost requirement of an Internet directory service is to allow for property-based information retrieval. Regardless of the type of information that is stored in the directory, each object that is stored has various properties, and the directory service must allow clients to retrieve this information based on these properties.

Data Storage

Internet directories store their data in such a way that the properties and protocols that are used fit in naturally with the rest of the Internet. For example, multimedia objects are typically represented in the Internet by making use of the MIME (Multipurpose Internet Mail Extensions) structure. MIME was first defined as a way to transport various types of binary data across the Internet in electronic mail messages. However, it has come to be used in numerous protocols that need to define ways to transport multipart, possibly binary, data. For example, MIME is used not only in electronic mail, but also in LDAP, the Common Indexing Protocol (CIP), the Hypertext Transfer Protocol (HTTP) used in the World Wide Web, and many other Internet application protocols.
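As a small illustration of MIME's multipart structure, the sketch below uses Python's standard `email` package to combine a short text part with a binary part. The bytes are a placeholder for real content such as an image, and the file name is hypothetical.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Build a two-part MIME body: a text description plus arbitrary binary data.
message = MIMEMultipart()  # defaults to multipart/mixed
message.attach(MIMEText("Photo of a miniature poodle.", "plain", "utf-8"))
photo_bytes = bytes(range(256))  # stand-in for real binary content
message.attach(MIMEApplication(photo_bytes, Name="poodle.bin"))

wire_form = message.as_string()  # the text that would actually be transmitted
print(message.get_content_type())  # prints multipart/mixed
```

Because the binary part is base64-encoded, the resulting message consists only of ASCII text and can pass through transports that expect text.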

In the early days of the Internet, data was assumed to be stored in the United States (U.S.) version of the ASCII character set. U.S. ASCII represents each character as a single byte, the high-order bit of which is always zero. The resulting 128 different characters that are defined by U.S. ASCII include the 26 uppercase and 26 lowercase letters, the 10 digits, and various other punctuation and control characters. Of course, information that is transported across the Internet needs to include characters from many different cultures outside the U.S. For example, many European and Asian characters are not represented in U.S. ASCII. The French word “garçon” can’t be represented using U.S. ASCII, since the character ‘ç’ is not one of the 128 characters defined by ASCII.

In order to represent such kinds of information, the Universal Character Set (UCS) repertoire was devised. UCS character sequences are normally represented on the Internet by using the specification known as UTF-8. UTF-8 is defined in RFC 2279, which is titled “UTF-8, a transformation format of ISO 10646.” The International Organization for Standardization (ISO) standard numbered 10646 defines the multibyte character set known as UCS. UCS characters are either two bytes long (UCS-2) or four bytes long (UCS-4). Their individual bytes may take any value, including zero and other values that ASCII-oriented software treats specially.

The point of UTF-8 is that it encodes UCS characters as a sequence of one or more octets (8-bit bytes). UTF-8 preserves the full U.S. ASCII range: each ASCII character is encoded as itself, in a single octet. Characters outside this range are encoded in more than one octet. Table 2.2 (taken directly from RFC 2279) summarizes the format of these different octet sequences. The letter “x” indicates bits available for encoding bits of the UCS-4 character value.

Table 2.2 UTF-8 Character Encoding

UCS-4 range (hex.)     UTF-8 octet sequence (binary)                     Octets
0000 0000-0000 007F    0xxxxxxx                                          1
0000 0080-0000 07FF    110xxxxx 10xxxxxx                                 2
0000 0800-0000 FFFF    1110xxxx 10xxxxxx 10xxxxxx                        3
0001 0000-001F FFFF    11110xxx 10xxxxxx 10xxxxxx 10xxxxxx               4
0020 0000-03FF FFFF    111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx      5
0400 0000-7FFF FFFF    1111110x 10xxxxxx ... 10xxxxxx                    6

Thus, the UTF-8 encoding of the word “Directory” is precisely the same as the original ASCII encoding of that same word. Some other examples of character encoding from RFC 2279 are:

• The UCS-2 sequence “A<NOT IDENTICAL TO><ALPHA>.” (0041, 2262, 0391, 002E) may be encoded in UTF-8 as follows:

41 E2 89 A2 CE 91 2E

• The UCS-2 sequence representing the Hangul characters for the Korean word “hangugo” (D55C, AD6D, C5B4) may be encoded as follows:

ED 95 9C EA B5 AD EC 96 B4

• The UCS-2 sequence representing the Han characters for the Japanese word “nihongo” (65E5, 672C, 8A9E) may be encoded as follows:

E6 97 A5 E6 9C AC E8 AA 9E

• Since UTF-8 encoding is the Internet standards-track definition for international character representations, Internet directories should use UTF-8 encoding, as specified in RFC 2279, when representing data outside of the U.S. ASCII character set.
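The rules in Table 2.2 translate directly into code. First choose the row whose range contains the character value. Then put the high-order bits into the leading octet after its marker bits, and spread the remaining bits six at a time across the 10xxxxxx continuation octets. This is an illustrative sketch of the RFC 2279 rules, and the function names are hypothetical. RFC 3629 later restricted UTF-8 to at most four octets (values up to 10FFFF hex), so the five- and six-octet rows are no longer valid in current UTF-8.

```python
from typing import List

# One entry per row of Table 2.2: (largest value in the row, leading-octet marker).
_ROWS = [
    (0x7F, 0x00),        # 1 octet:  0xxxxxxx
    (0x7FF, 0xC0),       # 2 octets: 110xxxxx 10xxxxxx
    (0xFFFF, 0xE0),      # 3 octets: 1110xxxx + 2 continuation octets
    (0x1FFFFF, 0xF0),    # 4 octets: 11110xxx + 3 continuation octets
    (0x3FFFFFF, 0xF8),   # 5 octets: 111110xx + 4 continuation octets
    (0x7FFFFFFF, 0xFC),  # 6 octets: 1111110x + 5 continuation octets
]


def encode_ucs4(code_point: int) -> bytes:
    """Encode one UCS-4 value following the RFC 2279 rules in Table 2.2."""
    if code_point < 0:
        raise ValueError("negative character value")
    for count, (limit, marker) in enumerate(_ROWS, start=1):
        if code_point <= limit:
            break
    else:
        raise ValueError(f"outside the UCS-4 range: {code_point:#x}")
    if count == 1:
        return bytes([code_point])  # ASCII range: encoded as itself
    # Peel off 6 bits at a time into continuation octets (10xxxxxx),
    # starting with the lowest-order bits.
    continuation: List[int] = []
    for _ in range(count - 1):
        continuation.append(0x80 | (code_point & 0x3F))
        code_point >>= 6
    # The remaining high-order bits go into the leading octet.
    return bytes([marker | code_point] + continuation[::-1])


def encode(code_points: List[int]) -> bytes:
    """Encode a sequence of UCS values as one UTF-8 octet string."""
    return b"".join(encode_ucs4(cp) for cp in code_points)


# First example from RFC 2279: "A<NOT IDENTICAL TO><ALPHA>."
print(encode([0x0041, 0x2262, 0x0391, 0x002E]).hex(" ").upper())
# prints 41 E2 89 A2 CE 91 2E
```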

Protocol Usage

The protocols that are used in Internet directories should be carried directly on top of an Internet transport, i.e., TCP, UDP, or TLS. Native integration of applications with the TCP/IP stack makes it more likely that future enhancements to this stack can be adopted smoothly. For example, TLS has been designed so that TLS sockets are created and deleted in virtually the same way as TCP sockets. Consequently, even though TLS had not yet been invented when LDAP v2 was released, LDAP v2 clients have no problem switching from TCP to TLS to access the same directory servers. LDAP v2 clients thereby gained the advantage of encrypted sessions, server authentication, and other TLS-provided services without any change at all in the LDAP protocol definition.
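The claim that TLS connections are created in nearly the same way as TCP connections can be sketched with Python's standard `socket` and `ssl` modules. The TLS case simply wraps an already-connected TCP socket, and the code that sends requests and reads replies does not need to change. The function name is hypothetical. LDAP servers conventionally listen on port 389, and LDAP over TLS is commonly offered on port 636, but a given deployment may differ.

```python
import socket
import ssl


def open_directory_connection(host: str, port: int, use_tls: bool,
                              timeout: float = 5.0) -> socket.socket:
    """Open a TCP connection, optionally layering TLS on top of it.

    Code that sends requests and reads replies can use the returned
    object the same way in both cases.
    """
    raw = socket.create_connection((host, port), timeout=timeout)
    if not use_tls:
        return raw
    context = ssl.create_default_context()  # verifies the server certificate
    try:
        return context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()  # do not leak the TCP socket if the handshake fails
        raise
```

For example, `open_directory_connection("ldap.example.com", 636, use_tls=True)` would return a TLS-protected socket to a hypothetical server. Passing port 389 with `use_tls=False` would return a plain TCP socket.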

Distributed Operation

Internet directories should operate in a distributed manner in such a way as to allow consistent access to their information throughout the Internet. Not all Internet hosts are uniformly available from any site on the Internet. This is due to a wide variety of factors, not the least of which are geographic considerations and other bandwidth-related concerns. However, access to directory information should not suffer from these problems. It must be possible to allow multiple directory servers to provide services for the same set of objects. This allows directory clients to access whichever directory server is most conveniently located. In order to illustrate this requirement, consider Figure 2.5.

The point here is that any of the clients can present their query (“What breeds of black dogs weigh under 20 pounds?”) to any of the servers and expect to get the same answer back. Due to geographical and other bandwidth considerations, the response time will differ for each combination of client and server, but the answer should not. Therefore, Internet directory servers are required to cooperate in order to present a uniform view of the data they manage to Internet directory clients. Historically, directory servers that do support the notion of information sharing or replication have defined their own information-sharing protocols. These protocols are specific to one type of directory (e.g., DNS, X.500, etc.). RFC 2651 defines the Common Indexing Protocol (CIP). CIP allows directory servers to share and publish much of the information that they contain, and CIP is not limited to any particular directory protocol.
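In an LDAP-based directory, the query that each client in Figure 2.5 presents would be written as a search filter. The sketch below builds the string representation of LDAP filters defined in RFC 2254, including its escaping rules for special characters. It assumes hypothetical `color` and `weight` attributes, with `weight` having an integer ordering rule. LDAP filters provide `>=` and `<=` but no strict `<`, so the condition "weight < 20" is expressed as NOT (weight >= 20).

```python
from typing import Dict

# Characters that RFC 2254 requires to be escaped inside a filter value.
_ESCAPES: Dict[str, str] = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\0": r"\00",
}


def escape_value(value: str) -> str:
    """Escape a string for use as an assertion value in a filter."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def equals(attr: str, value: str) -> str:
    return f"({attr}={escape_value(value)})"


def greater_or_equal(attr: str, value: str) -> str:
    return f"({attr}>={escape_value(value)})"


def not_(filt: str) -> str:
    return f"(!{filt})"


def and_(*filters: str) -> str:
    return "(&" + "".join(filters) + ")"


# LDAP has no strict "<" match, so "weight < 20" becomes NOT(weight >= 20).
dog_filter = and_(not_(greater_or_equal("weight", "20")), equals("color", "black"))
print(dog_filter)  # prints (&(!(weight>=20))(color=black))
```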

Because directory servers cooperate to provide a service, any single directory server will often be unable to fulfill a client request by itself. In this situation, the directory server that is being queried may be able to refer the client to another directory server that can be consulted with this query. This notion of referrals is illustrated in Figure 2.6.

In this figure, the directory client first submits its query about small black dogs to the Chicago directory server. The Chicago directory server is unable to fulfill this request, and refers the directory client to the Paris directory server. The directory client chases the referral that is provided, and presents the same query to the Paris directory server. In this instance, the Paris directory server is able to fulfill the client’s request, and furnishes an appropriate response. Thus, a referral is an indication by a directory server that it does not have the information needed to return the desired result. The referral contains the information the directory client needs to contact some other directory server. Directory servers can gather knowledge of the information that is held by other directory servers. The directory administrator can gather this knowledge manually, or the gathering can be automated by the use of back end directory protocols.
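The referral exchange in Figure 2.6 can be sketched as a client loop that keeps asking the next server until one answers. Everything here is hypothetical and in-memory. `DirectoryServer` stands in for a real server, and `Referral` stands in for whatever referral format a real protocol defines. The loop also refuses to revisit a server, since two misconfigured servers could otherwise refer a client back and forth forever.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class Referral:
    server: str  # the server the client should ask next


Response = Union[List[str], Referral]


class DirectoryServer:
    """Hypothetical in-memory server: answers what it knows, else refers."""

    def __init__(self, answers: Dict[str, List[str]],
                 refer_to: Optional[str] = None):
        self.answers = answers
        self.refer_to = refer_to

    def query(self, question: str) -> Response:
        if question in self.answers:
            return self.answers[question]
        if self.refer_to is not None:
            return Referral(self.refer_to)
        return []  # no answer and nobody to refer to


def chase(servers: Dict[str, DirectoryServer], start: str, question: str,
          max_hops: int = 5) -> List[str]:
    """Follow referrals until some server answers; guard against loops."""
    visited = set()
    name = start
    for _ in range(max_hops):
        if name in visited:
            raise RuntimeError(f"referral loop at {name}")
        visited.add(name)
        response = servers[name].query(question)
        if not isinstance(response, Referral):
            return response
        name = response.server  # chase the referral
    raise RuntimeError("too many referrals")


QUERY = "Weight < 20, Color = Black"
servers = {
    "Chicago": DirectoryServer({}, refer_to="Paris"),
    "Paris": DirectoryServer({QUERY: ["Pug, 18 pounds",
                                      "Miniature Poodle, 10 pounds",
                                      "Miniature Dachshund, 8 pounds"]}),
}
print(chase(servers, "Chicago", QUERY))
```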


[Fig. 2.5 Directory replication: directory clients in California, Paris, Moscow, and Chicago present the query “Weight < 20, Color = Black” to directory servers in Chicago, Paris, and Nairobi.]


WHITE PAGES SERVICE

One of the main uses for a directory on the Internet is the provision of an Internet White Pages Service (IWPS). In an IWPS, the client furnishes some properties of a user that it knows about, and the service responds with various types of address information about this user. The telephone system offers a familiar analogy: a white pages book, in which a list of names is arranged alphabetically. If the white pages client knows the name of a listed telephone subscriber, then the white pages service will supply the client with a telephone number and possibly a street address. An IWPS, as a special type of white pages service that is located on the Internet, is intended to provide Internet-related addressing information. In the IWPS, the principal addressing information that is to be returned by the IWPS server is the electronic mail address of a user. Other types of related information can also be returned. RFC 2148, “Deployment of the Internet White Pages Service,” describes in great detail the requirements for an IWPS. RFC 2218, “A Common Schema for the Internet White Pages Service,” defines the data that is to be maintained by an IWPS server. A small subset of the user properties that are defined by RFC 2218 is:

E-mail
Certificate
Home Page
Given Name
Surname
Organization
Country
Personal Phone
Personal Fax

[Fig. 2.6 Directory Referrals: (1) the directory client sends the query “Weight < 20, Color = Black” to the Chicago directory server; (2) Chicago replies “Don’t know, ask Paris”; (3) the client sends the same query to the Paris directory server; (4) Paris replies with Pug, 18 pounds; Miniature Poodle, 10 pounds; Miniature Dachshund, 8 pounds.]

From the list above, it can be seen that some of the properties are those that are generally provided by the IWPS client in a request (e.g., Given Name, Surname, Country). Others are generally provided by the IWPS server in its response (e.g., E-mail, Certificate, Personal Phone). For example, an IWPS client searching for someone’s electronic mail address could provide:

Given Name = “John”
Surname = “Smith”
Organization = “Prentice Hall”

while the IWPS server would respond with John Smith’s electronic mail address. An IWPS was historically connected with electronic mail packages, and was often called an address book service. Many people have confused an IWPS with the directory service itself, but it is important to distinguish between a service and an application of that service. The directory service is a general-purpose property-based information retrieval service, while an IWPS is a special-purpose service optimized for retrieval of users’ addressing properties.

A SIMPLE DIRECTORY

As an example of a directory service, consider a system that converts IP addresses into host names. This directory fits the definition that is being used for a directory (a property-based information retrieval service). The service will operate as a client-server application protocol. To retrieve the host name for a known IP address, the client opens a socket to the Internet host that corresponds to that IP address. The service operates equally well on TCP, UDP, or TLS. Once the socket has been successfully opened, the client uses the protocol to request the host name of the server that accepted the incoming socket. The server responds to this request by sending its host name and closing the socket connection.



The property that the client knows about is the IP address of the server, and the property that it is attempting to retrieve is the host name of the server. The question for the protocol designers was: “What information does the client need to present to the server in order to relay the information request?” It turns out that once the client opens the socket, no further information needs to be transmitted by the client. So, the client only needs to indicate that it is ready for the server to transmit the data. This can be represented in the protocol as the two ASCII characters carriage return and line feed. Their hexadecimal representation, as transmitted over the socket, is “0D0A.” Once the server receives this data, it is expected to respond with its host name. While this example may seem a bit contrived, it turns out (as will be seen later) that this protocol really exists as a subset of the Internet finger protocol. Try using a telnet application to connect to port 79 (the finger port) on an Internet host, and pressing the Enter key. If a finger server is listening, it should tell you the host name.
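The whole exchange described above fits in a few lines of socket code. The sketch below implements only the host-name exchange from this section, not the full finger protocol (RFC 1288), and the server name `dogs.example.net` is hypothetical. To keep the demonstration self-contained, the server runs locally on an ephemeral port rather than on port 79, which would normally require administrative privileges.

```python
import socket
import threading


def serve_host_name(listener: socket.socket, host_name: str) -> None:
    """Answer one request: wait for CR LF, send the host name, close."""
    conn, _ = listener.accept()
    with conn:
        request = b""
        while not request.endswith(b"\r\n"):
            chunk = conn.recv(1)
            if not chunk:  # client went away without a full request
                return
            request += chunk
        conn.sendall(host_name.encode("ascii") + b"\r\n")


def ask_host_name(address: str, port: int = 79, timeout: float = 5.0) -> str:
    """Send the empty request (CR LF) and read until the server closes."""
    with socket.create_connection((address, port), timeout=timeout) as sock:
        sock.sendall(b"\r\n")  # hex 0D 0A: the entire request
        reply = b""
        while chunk := sock.recv(1024):
            reply += chunk
    return reply.decode("ascii", errors="replace").strip()


# Demonstration against a local stand-in server on an ephemeral port.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
worker = threading.Thread(target=serve_host_name,
                          args=(listener, "dogs.example.net"))
worker.start()
answer = ask_host_name("127.0.0.1", listener.getsockname()[1])
worker.join()
listener.close()
print(answer)  # prints dogs.example.net
```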

CHAPTER SUMMARY

This chapter provided a general-purpose definition of a directory as a property-based information retrieval system. This definition will be used throughout the book. A simple example of a directory was given. This chapter defined computer and network security as they are used by directories. Additionally, the various upper layers of the Internet protocol stack were defined. Finally, a quick overview of some of the important directories was given.

With this foundation of computer networking and security, as well as a quick overview of some directories and their applications, we can progress to more detailed definitions in the chapters to come. The following chapters will give detailed descriptions of DNS, LDAP, and many of their applications, as well as shorter overviews of some other Internet directories.
