Lecture 1 – Characterization of Distributed Operating Systems

Part 1 – Operating Systems: We will begin with a brief review of what an operating system is and what purpose it serves in the context of conventional or non-distributed operating systems. From there, we will extend the discussion to see how the concept of a distributed system is merely an extension of this. The idea of multiple processes sharing the resources of a single machine is essentially extended to encompass multiple processes on multiple machines sharing resources provided by one another.

Operating System

Traditional Definition:

“A piece of software that controls the execution of programs on a processor and that manages the processor’s resources.”

What resources?

- memory

- disk

- I/O

Why “manage?”

- resources must be shared

multiple processes need to use them

Other issues include:

- device abstraction

- predefined functionality

(convenience and ease of use)

Issues related to resource sharing by multiple processes are complicated by multiple users.

- different privileges

- concurrency

- time allocation (scheduling)

- quotas

Part 2 – Consequences of Distribution: When we attempt to apply the principles studied in traditional operating systems to a distributed environment, three issues immediately become significant: concurrency, the lack of a global clock, and the possibility of independent failures. Concurrency is a familiar issue from non-distributed systems, but it becomes more complicated in a distributed environment. We will elaborate on each of these issues.

Concurrency

- processes and users are distributed

- resources are distributed

- sharing occurs on a regular basis

- concurrency issues become more complex

- co-ordination becomes more complex

Concurrency is no longer an occasional issue that we must deal with when it arises; it becomes the normal, expected behavior.

“The exception becomes the rule!”

No Global Clock

Non-distributed:

- one clock

shared among processes

and even multiple CPUs

Distributed:

- no common clock

- no common perception of the current time

- no common conception of how fast time passes

How do we measure things like:

“When” and “How long?”

Independent Failures

Non-distributed:

- a failure is generally catastrophic

(memory, disk, CPU)

Distributed:

- a failure of one computer or even an entire isolated network might be transparent to a user!

How do we deal with failures differently in a distributed system to allow this to happen?
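One common answer is to keep more than one copy of a service and to retry elsewhere when a machine is down. The following is only a minimal sketch of that idea under stated assumptions: the helper `call_with_failover`, the two stand-in servers, and the use of `ConnectionError` to model a crashed machine are all hypothetical and are not taken from the lecture.

```python
def call_with_failover(replicas, request):
    """Try each replica in turn and return the first successful reply.

    `replicas` is a list of callables standing in for remote servers
    (an assumption; a real system would send network messages).
    """
    last_error = None
    for replica in replicas:
        try:
            return replica(request)
        except ConnectionError as err:
            # This machine failed independently; the others may be fine.
            last_error = err
    raise RuntimeError("all replicas failed") from last_error


def crashed_server(request):
    # Models a machine that has failed.
    raise ConnectionError("server is down")


def healthy_server(request):
    return "reply to " + request


# The caller gets a reply even though the first machine has failed.
reply = call_with_failover([crashed_server, healthy_server], "read file")
print(reply)
```

The failure is hidden from the caller only while at least one replica survives; when every replica is down, the sketch reports the failure instead of masking it.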

Part 3 – Example Systems: We will now look at a number of examples of distributed systems. Specifically, we will discuss the internet, intranets, mobile computing, and ubiquitous computing. In each case we will discuss what makes the system an example of distributed computing.

Internet/Intranets

Internet - a collection of intranets.

- a particular intranet might not be part of the internet

- could be logically or physically isolated

Intranet - a network separately administered with its own local configuration rules and regulations.

Intranet

- typically higher bandwidth than “internet”

- typically “protected” by a firewall

- typically smaller geographical area

- sharing of resources is more private

- certain users can use certain printers

- certain files have limited access

- may be physically isolated

Internet

- typically lower bandwidth than an intranet

- worldwide geographical coverage

- sharing of resources is more public

- web pages, anonymous ftp, etc

- still has the capability to restrict access, protect privacy, and enforce policies

Mobile Computing

- improvements in technology in power consumption and size reduction allow very portable devices

- we have become used to our computing resources being connected together … i.e. “wired”

- as they become portable, we desire to maintain this “wired” functionality, but need to do it “wireless.”

Does this just mean the media changes?

NO!

EVERYTHING changes … portable devices need to seamlessly migrate from location to location, appearing in locations that they have never been before and might never return to! This is much more complicated than just replacing a wired network with a wireless network.

Ubiquitous Computing

- Similar improvements in technology in power consumption and size reduction allow computing to be embedded into any electrical appliance or electronic device.

- toasters

- radios

- televisions

- portable electronics

- ovens, lights, etc.

Ubiquitous computing has issues similar to those found in mobile computing:

- such devices will probably not have wired network connections (although they could)

- such devices may migrate, certainly within a household, but even from house to house and city to city.

Part 4 – WWW: The World Wide Web is actually a rich example of distributed services and resources. We will explain how and why this is so, and will examine in detail some examples of the web which illustrate the use of distributed computing. Our discussion will include a review of the basic terms and concepts related to the web.

Resource Sharing

services

- printers

- disks -> files -> data

- computing power (distributed computing)

What sort of “resource sharing” does the web provide?

WWW resources

simplest example: Web page

- text, images, sound, video

expanded by “plug-ins”

other services?

- currency converter

- search engines

- database operations

WWW terms:

HTML - HyperText Markup Language

- html documents are pure TEXT

- non-text material (pictures, sound, etc.) may still be conveyed

- uses three characters, <, >, and /, to form “tags”

- open tag, e.g.: <B>

- close tag: </B>

- pictures: <IMG SRC="pix.jpg">
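Because an HTML document is plain text, a program can recover its structure just by scanning for tags. The sketch below uses Python's standard `html.parser` module to list the tags in a tiny document; the class name `TagLister` and the sample markup are assumptions made for illustration.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Record every open tag, close tag, and run of text that is seen."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lower-cased.
        self.events.append(("open", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("close", tag, {}))

    def handle_data(self, data):
        self.events.append(("text", data, {}))


page = '<B>Hello</B><IMG SRC="pix.jpg">'
lister = TagLister()
lister.feed(page)
lister.close()
for event in lister.events:
    print(event)
```

Note that the picture itself is not in the document: the text only names the file, and the image is conveyed by a separate transfer.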

URL - Uniform Resource Locator

- uniquely identifies a particular web resource

General form of a URL:

scheme : scheme_specific_location

- scheme: the protocol to use

- scheme_specific_location: the server to consult

HTTP - HyperText Transfer Protocol

- protocol used for transferring HTML documents

- request/reply in nature

- one resource is requested per request
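To make the request/reply pattern concrete, the sketch below builds the text of a minimal HTTP/1.1 GET request. The helper name `build_get_request`, the host `www.example.com`, and the two paths are assumptions chosen for illustration; nothing is actually sent over a network.

```python
def build_get_request(host, path):
    """Return the text of a minimal HTTP/1.1 GET request for one resource."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, resource, version
        f"Host: {host}",          # HTTP/1.1 requires a Host header
        "Connection: close",      # ask the server to close after replying
        "",                       # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines)


# A page that shows one picture needs two requests: one per resource.
requests = [
    build_get_request("www.example.com", path)
    for path in ["/index.html", "/pix.jpg"]
]
for request in requests:
    print(request)
```

Each request names exactly one resource, so the client issues a separate request for every picture or other file that a page refers to.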

General form of an http URL:

http://server_name[:port_number][/path][?arguments]

Only the http://server_name portion is required; the port number, path, and arguments are optional.
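The parts of an http URL can be pulled apart with Python's standard `urllib.parse` module, as sketched below. The example URLs are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# A URL that uses every optional part (hypothetical address).
url = "http://www.example.com:8080/search/results.html?q=clock&lang=en"
parts = urlsplit(url)

print(parts.scheme)            # the protocol to use
print(parts.hostname)          # the server to consult
print(parts.port)              # optional port number
print(parts.path)              # optional path on that server
print(parse_qs(parts.query))   # optional arguments, decoded into a dict

# A URL with only the required portion: the optional parts come back empty.
minimal = urlsplit("http://www.example.com")
print(minimal.port, repr(minimal.path), repr(minimal.query))
```

When the port is omitted, `urlsplit` reports it as `None`; it is then up to the client to fall back to the default port for the scheme, which is 80 for http.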

Part 5 – Design Objectives: In a distributed system, there are a number of objectives that designers strive to satisfy. These objectives include heterogeneity, openness, security, scalability, failure handling, concurrency, and transparency. We will explain what each of these terms refers to and discuss some of the issues that arise in our pursuit of them. These form the core issues that we will be discussing throughout the remainder of the course.

What are the costs of an H, anyway?

Heterogeneity

- variety and difference

- hardware, software, programming languages, operating systems, networks, devices, companies, developers, policies, practices

Openness

- Open systems are characterized by the fact that their key interfaces are published.

- Open distributed systems are based on the provision of a uniform interprocess communication mechanism and published interfaces for access to shared resources.

- Open distributed systems can be constructed from heterogeneous hardware and software, possibly from different vendors. But the conformance of each component to the published standard must be carefully tested and certified if users are to be protected from responsibility for resolving system integration problems.

Security

- restricting access of resources to approved clients

- prevention of unauthorized access of data in transit

- guarding against malicious code being executed

- guarding against DoS attacks (Denial of Service)

Scalability

How can the design of a distributed system ensure that it will be scalable? As the demand for a resource grows, it should be possible to extend the system to meet it. For example, the frequency with which files are accessed is likely to grow as the number of users and workstations in a distributed system increases. It must be possible to add server computers to avoid the performance bottleneck that would arise if a single file server had to handle all file access requests. Some files may be accessed so frequently that the processing of requests for access to a single file or a small group of files becomes a performance bottleneck. In that case, those files must be replicated in several servers and the system must be designed so that when replicated files are updated the updates are applied to all of the replicas.
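The replication requirement can be sketched in a few lines. This is an illustrative model only: the class `ReplicatedFile` is hypothetical, each dictionary stands in for one server, and the sketch ignores what happens if a server fails partway through an update.

```python
class ReplicatedFile:
    """A file kept on several servers; each dict stands in for one server."""

    def __init__(self, server_count):
        self.replicas = [
            {"content": "", "version": 0} for _ in range(server_count)
        ]
        self._next_reader = 0

    def write(self, content):
        # An update must reach every replica, or readers could see stale data.
        new_version = self.replicas[0]["version"] + 1
        for replica in self.replicas:
            replica["content"] = content
            replica["version"] = new_version

    def read(self):
        # Spread reads round-robin so no single server becomes a bottleneck.
        replica = self.replicas[self._next_reader]
        self._next_reader = (self._next_reader + 1) % len(self.replicas)
        return replica["content"]


popular_file = ReplicatedFile(server_count=3)
popular_file.write("v1")
reads = [popular_file.read() for _ in range(3)]
print(reads)
```

Reads become cheaper as servers are added, while each write becomes more expensive, because it must be applied to every replica.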

Fault Tolerance

A fault-tolerant system is one that allows users and application programs to complete their tasks despite the failure of hardware or software components. Methods for achieving fault tolerance include hardware redundancy (e.g., hot standby machines and peer testing) and software recovery mechanisms (e.g., check-pointing, transaction logs, and “roll back” mechanisms).
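As a small illustration of check-pointing and roll back, the sketch below saves a copy of some state before a multi-step update and restores it when the update fails halfway. The class `CheckpointedState`, the `transfer` function, and the account data are assumptions made for this example; a real system would write its checkpoints to stable storage rather than keep them in memory.

```python
import copy


class CheckpointedState:
    """Keep a saved copy of the state so a failed task can be undone."""

    def __init__(self, state):
        self.state = state
        self._checkpoint = copy.deepcopy(state)

    def checkpoint(self):
        self._checkpoint = copy.deepcopy(self.state)

    def roll_back(self):
        self.state = copy.deepcopy(self._checkpoint)


def transfer(accounts, source, target, amount):
    """Move money in two steps; fail between them if funds are short."""
    accounts.state[target] += amount
    if accounts.state[source] < amount:
        # The target has already been credited: the state is now inconsistent.
        raise ValueError("insufficient funds")
    accounts.state[source] -= amount


accounts = CheckpointedState({"a": 10, "b": 0})
accounts.checkpoint()
try:
    transfer(accounts, "a", "b", 50)
except ValueError:
    accounts.roll_back()   # undo the half-finished update
print(accounts.state)
```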

Concurrency

- basic issues are the same as in traditional non-distributed operating systems

- complicated by the distribution of processes and resources

- all processes responsible for maintaining an object in a distributed environment are responsible for ensuring that operations are appropriately synchronized to ensure the correct behavior/status of the object.

- familiar techniques from non-distributed systems such as semaphores are extended to provide distributed solutions.
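As a reminder of the non-distributed starting point, the sketch below uses a counting semaphore from Python's `threading` module to limit how many threads use a shared resource at once. The "printer" scenario, the limit of two, and all of the names are assumptions for illustration. This works only because the threads share one machine's memory; a distributed version would have to achieve the same effect by exchanging messages.

```python
import threading

printer_slots = threading.Semaphore(2)   # at most two jobs print at once
state_lock = threading.Lock()            # protects the two counters below
active_jobs = 0
max_active_jobs = 0


def print_job():
    global active_jobs, max_active_jobs
    with printer_slots:                  # wait here until a slot is free
        with state_lock:
            active_jobs += 1
            max_active_jobs = max(max_active_jobs, active_jobs)
        # ... the job would use the shared printer here ...
        with state_lock:
            active_jobs -= 1


threads = [threading.Thread(target=print_job) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The semaphore guarantees this never exceeds 2.
print(max_active_jobs)
```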

Transparency

The concealment from the user and the application programmer of the separation of components, so that no special procedures need to be built to deal with it. The eight forms of transparency are:

1. Access

2. Location

3. Concurrency

4. Replication

5. Failure

6. Migration

7. Performance

8. Scaling

Part 6 – Applications: We will now look at how the principles that we have studied may be applied to several example scenarios. Our discussions will range from the concrete to the hypothetical.

Applications of Sharing Hardware

- CPU: compute server

- Memory: cache server

- Disk: file server

- screen: X11 server

- printer: networked printer

- network bandwidth: packet transmissions

Applications of Sharing Data/Software

- Web page: web server

- File: file server

- object: whiteboard, scheduler, etc.

- newsgroup content: net-news system

- video/audio stream: video servers

- exclusive lock: system level object for resource sharing

Application Concerning Lack of Global Clock

Q: Suppose we wish to attempt to “synchronize” the clocks on two network connected computers. How might we attempt this, what would the problems be, and how accurate would it be?

A: Call the two machines M1 and M2. Have M1 and M2 repeatedly exchange request-reply messages to determine the average time taken for a round trip from one to the other; call this time Tround. Then have M1 send its current time, Tnow, to M2. Have M2 set its clock to Tnow + Tround/2 immediately upon receipt of M1’s message. This would probably be accurate to within 1 ms on a typical LAN. The problems are that there is no guarantee that the delays are consistent from one message to the next, and that the “/2” factor assumes the delay is the same in both directions, an assumption that no experiment can verify.
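The scheme in this answer can be simulated to show where the error comes from. The sketch below is an illustration under stated assumptions: the function names, the delay values, and the starting time of 1000.0 seconds are all made up, and the simulation knows the true one-way delays only because it invents them.

```python
def average_round_trip(round_trip_samples):
    """Tround: mean of the measured request-reply times (seconds)."""
    return sum(round_trip_samples) / len(round_trip_samples)


def time_to_set_on_m2(t_now, t_round):
    """M2's new clock value: M1's reading plus the assumed one-way delay."""
    return t_now + t_round / 2


def sync_error(forward_delay, return_delay, t_now=1000.0):
    """Error of M2's clock after syncing, given the true one-way delays.

    The true delays are known only inside this simulation; real machines
    can measure their sum (the round trip) but not each leg separately.
    """
    t_round = average_round_trip([forward_delay + return_delay] * 5)
    m2_clock = time_to_set_on_m2(t_now, t_round)
    m1_clock_at_receipt = t_now + forward_delay
    return m2_clock - m1_clock_at_receipt


print(sync_error(0.004, 0.004))  # symmetric path
print(sync_error(0.002, 0.006))  # asymmetric path, same round trip
```

Both cases have the same 8 ms round trip, so M2 cannot tell them apart. In the symmetric case the error is essentially zero; in the asymmetric case M2 ends up about 2 ms ahead, which is half the difference between the two one-way delays.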

Application of HTML, URLs, and HTTP

Q: What are the pros and cons of using these technologies as the basis for client server computing?

Pros:

- HTML is easy to parse

- URLs are efficient resource locators

- HTTP is a simple protocol that can transfer many types of media

Cons:

- HTML confuses presentation with the underlying data

- URLs may point at a web page that no longer exists

- URLs point to an entire web page (i.e., not granular enough)

- HTTP is verbose and therefore not suitable for small messages

Using the technologies of HTML, URLs, and HTTP as the basis for client-server computing could work, but there are the inefficiencies mentioned above, and no strong type checking exists because this scheme has no compiler support.
