Lecture 8 Epidemic communication, Server implementation

Lecture 8

Epidemic communication, Server implementation

EECE 411: Design of Distributed Software Applications

Logistics / reminders Subscribe to the mailing list! Assignment1 marks

You should receive them today Assignment2: due this Friday 11:59pm Quizzes:

Q1: Thursday next week (10/14) Q2: 11/16


Roadmap

A distributed system is: a collection of independent computers that

appears to its users as a single coherent system

Components need to: Communicate

point-to-point communictaion Sockets, RPC, RMI

point-to-multipoint [last time] Multicast Epidemic communication

Cooperate [next]


Quiz question

A distributed service operates on a large cluster. The service has one component running on each cluster node.

Cluster nodes might be shut down for maintenance, they might simply fail, or they might come back online.

To function correctly each service component needs an accurate list of all other nodes/service components that are active.

TO DO: Design a mechanism that provides this list

Describe the mechanism in natural language Provide the pseudocode.


Detour: processes and threads


Threads & distributed systems: Client side issues

Essence: Client side often tailored to provide transparency

access transparency: client-side stubs for RPCs location/migration transparency: let client-side

software keep track of actual location replication transparency: multiple invocations

handled by client stub: failure transparency: can often be placed only at

client (we’re trying to mask server and communication failures).


Threads & distributed systems: Client side issues

Goal: hiding network latency.

Example I: Multi-threaded Web client: Web browser scans an incoming HTML page, and finds that

more files need to be fetched. Each file is fetched by a separate thread, each doing a

(blocking) HTTP request. As files come in, the browser displays them.

Example II: Multiple request-response calls to other machines (RPC):

A client does several calls at the same time, each one by a different thread.

It then waits until all results have been returned.

Q: Why exactly does multi-threaded processing help?


Server side issues: How to handle incoming requests?

iteratively vs. concurrently Why multiple threads can be a good idea?

Multithreaded File Server Example


How to handle incoming requests? Iterative vs. concurrent

If concurrent one more choice: Blocking vs. non-blocking I/O

Concurrent with blocking I/O: Choice: Processes vs. threads.

Concurrent with non-blocking I/O Finite state machine based design Event driven programming


Threads & distributed systems: Server side issues

Main issues are performance and structure.

Improve performance: Having an iterative server prohibits scaling in a

multiprocessor system. As with clients: concurrent servers

reduce latency (by reacting to next request while previous one is being processed) and

increase throughput Starting a thread to handle an incoming request is much

cheaper than starting a new process.

Better structure: Most servers have high I/O load. Using simple, well-

understood blocking calls may simplify the overall structure. Multithreaded programs tend to be smaller and easier to

understand due to simplified flow of control. Better isolation may sometimes be an argument for process

based servers


Other issues in Server Design

How do clients find server’s location? Stateless vs. stateful server design Server clusters


Servers: General organization

Basic server model: A server is a process that waits for incoming requests at a specific transport address.

In practice, there is a one-to-one mapping between a port and a service:

superserver (inetd) vs. ‘normal’ server.


Servers and State:

File system server example


Servers and State (I): Stateless servers

Stateless servers: Never keep accurate information about the status of a client after having handled a request:

Don’t record whether a file has been opened (simply close it again after access)

Don’t promise to invalidate a client’s cache Don’t keep track of your clients

Consequences: Clients and servers are completely independent State inconsistencies resulted from client or server crashes are

reduced. Possible loss of performance because, e.g., a server cannot

anticipate client behavior (think of prefetching file blocks)


Servers and State (II): Stateful servers

Stateful servers: Keep track of client status: E.g., record that a file has been opened, so that

prefetching can be done E.g., Knows which data a client has cached, and allows

clients to keep local copies of shared data and my promise to invalidate them

Observation: The performance of stateful servers can be extremely high, provided clients are allowed to keep local copies.

[Depending on application] reliability may not a major problem.


Server clusters

Observation: Many server clusters are organized along multiple tiers

Important: The first tier is generally responsible for passing requests to an appropriate server.


Request Handoff

Observation: Having the first tier handle all communication from/to the cluster may lead to a bottleneck.

Solution: Various, but a popular one is TCP-handoff


Summary so far

Client and server design: processes focus Sequential vs. concurrent,

Processes vs. threads blocking vs. non-blocking Concurrent & non-blocking: Finite state machine

design Design issues for clients and servers Cluster servers


Next

A distributed system is: a collection of independent computers that

appears to its users as a single coherent system

Components need to: Communicate Cooperate => support needed

Naming Synchronization


Naming systems

Functionality Map: names access points (addresses)

Names are used to denote entities in a distributed system.

To operate on an entity, we need to access it at an access point (address).

Note: A location-independent name for an entity E, is independent from the addresses of the access points offered by E.


Names are valuable!

NYT, August’00


Naming systems

Functionality Map: names access points (addresses)

One challenge: scaling #of names, #clients geographical distribution, Management!

Documents

Lecture 8 Epidemic communication, Server implementation