22
Lecture 8 Epidemic communication, Server implementation

Lecture 8 Epidemic communication, Server implementation

  • View
    218

  • Download
    0

Embed Size (px)

Citation preview

Lecture 8

Epidemic communication, Server implementation

EECE 411: Design of Distributed Software Applications

Logistics / reminders Subscribe to the mailing list! Assignment1 marks

You should receive them today Assignment2: due this Friday 11:59pm Quizzes:

Q1: Thursday next week (10/14) Q2: 11/16

EECE 411: Design of Distributed Software Applications

Roadmap

A distributed system is: a collection of independent computers that

appears to its users as a single coherent system

Components need to: Communicate

point-to-point communictaion Sockets, RPC, RMI

point-to-multipoint [last time] Multicast Epidemic communication

Cooperate [next]

EECE 411: Design of Distributed Software Applications

Quiz question

A distributed service operates on a large cluster. The service has one component running on each cluster node.

Cluster nodes might be shut down for maintenance, they might simply fail, or they might come back online.

To function correctly each service component needs an accurate list of all other nodes/service components that are active.

TO DO: Design a mechanism that provides this list

Describe the mechanism in natural language Provide the pseudocode.

EECE 411: Design of Distributed Software Applications

Detour: processes and threads

EECE 411: Design of Distributed Software Applications

Threads & distributed systems: Client side issues

Essence: Client side often tailored to provide transparency

access transparency: client-side stubs for RPCs location/migration transparency: let client-side

software keep track of actual location replication transparency: multiple invocations

handled by client stub: failure transparency: can often be placed only at

client (we’re trying to mask server and communication failures).

EECE 411: Design of Distributed Software Applications

Threads & distributed systems: Client side issues

Goal: hiding network latency.

Example I: Multi-threaded Web client: Web browser scans an incoming HTML page, and finds that

more files need to be fetched. Each file is fetched by a separate thread, each doing a

(blocking) HTTP request. As files come in, the browser displays them.

Example II: Multiple request-response calls to other machines (RPC):

A client does several calls at the same time, each one by a different thread.

It then waits until all results have been returned.

Q: Why exactly does multi-threaded processing help?

EECE 411: Design of Distributed Software Applications

Server side issues: How to handle incoming requests?

iteratively vs. concurrently Why multiple threads can be a good idea?

Multithreaded File Server Example

EECE 411: Design of Distributed Software Applications

How to handle incoming requests? Iterative vs. concurrent

If concurrent one more choice: Blocking vs. non-blocking I/O

Concurrent with blocking I/O: Choice: Processes vs. threads.

Concurrent with non-blocking I/O Finite state machine based design Event driven programming

EECE 411: Design of Distributed Software Applications

Threads & distributed systems: Server side issues

Main issues are performance and structure.

Improve performance: Having an iterative server prohibits scaling in a

multiprocessor system. As with clients: concurrent servers

reduce latency (by reacting to next request while previous one is being processed) and

increase throughput Starting a thread to handle an incoming request is much

cheaper than starting a new process.

Better structure: Most servers have high I/O load. Using simple, well-

understood blocking calls may simplify the overall structure. Multithreaded programs tend to be smaller and easier to

understand due to simplified flow of control. Better isolation may sometimes be an argument for process

based servers

EECE 411: Design of Distributed Software Applications

Other issues in Server Design

How do clients find server’s location? Stateless vs. stateful server design Server clusters

EECE 411: Design of Distributed Software Applications

Servers: General organization

Basic server model: A server is a process that waits for incoming requests at a specific transport address.

In practice, there is a one-to-one mapping between a port and a service:

superserver (inetd) vs. ‘normal’ server.

EECE 411: Design of Distributed Software Applications

Servers and State:

File system server example

EECE 411: Design of Distributed Software Applications

Servers and State (I): Stateless servers

Stateless servers: Never keep accurate information about the status of a client after having handled a request:

Don’t record whether a file has been opened (simply close it again after access)

Don’t promise to invalidate a client’s cache Don’t keep track of your clients

Consequences: Clients and servers are completely independent State inconsistencies resulted from client or server crashes are

reduced. Possible loss of performance because, e.g., a server cannot

anticipate client behavior (think of prefetching file blocks)

EECE 411: Design of Distributed Software Applications

Servers and State (II): Stateful servers

Stateful servers: Keep track of client status: E.g., record that a file has been opened, so that

prefetching can be done E.g., Knows which data a client has cached, and allows

clients to keep local copies of shared data and my promise to invalidate them

Observation: The performance of stateful servers can be extremely high, provided clients are allowed to keep local copies.

[Depending on application] reliability may not a major problem.

EECE 411: Design of Distributed Software Applications

Server clusters

Observation: Many server clusters are organized along multiple tiers

Important: The first tier is generally responsible for passing requests to an appropriate server.

EECE 411: Design of Distributed Software Applications

Request Handoff

Observation: Having the first tier handle all communication from/to the cluster may lead to a bottleneck.

Solution: Various, but a popular one is TCP-handoff

EECE 411: Design of Distributed Software Applications

Summary so far

Client and server design: processes focus Sequential vs. concurrent,

Processes vs. threads blocking vs. non-blocking Concurrent & non-blocking: Finite state machine

design Design issues for clients and servers Cluster servers

EECE 411: Design of Distributed Software Applications

Next

A distributed system is: a collection of independent computers that

appears to its users as a single coherent system

Components need to: Communicate Cooperate => support needed

Naming Synchronization

EECE 411: Design of Distributed Software Applications

Naming systems

Functionality Map: names access points (addresses)

Names are used to denote entities in a distributed system.

To operate on an entity, we need to access it at an access point (address).

Note: A location-independent name for an entity E, is independent from the addresses of the access points offered by E.

EECE 411: Design of Distributed Software Applications

Names are valuable!

NYT, August’00

EECE 411: Design of Distributed Software Applications

Naming systems

Functionality Map: names access points (addresses)

One challenge: scaling #of names, #clients geographical distribution, Management!