View
218
Download
0
Tags:
Embed Size (px)
Citation preview
EECE 411: Design of Distributed Software Applications
Logistics / reminders Subscribe to the mailing list! Assignment1 marks
You should receive them today Assignment2: due this Friday 11:59pm Quizzes:
Q1: Thursday next week (10/14) Q2: 11/16
EECE 411: Design of Distributed Software Applications
Roadmap
A distributed system is: a collection of independent computers that
appears to its users as a single coherent system
Components need to: Communicate
point-to-point communictaion Sockets, RPC, RMI
point-to-multipoint [last time] Multicast Epidemic communication
Cooperate [next]
EECE 411: Design of Distributed Software Applications
Quiz question
A distributed service operates on a large cluster. The service has one component running on each cluster node.
Cluster nodes might be shut down for maintenance, they might simply fail, or they might come back online.
To function correctly each service component needs an accurate list of all other nodes/service components that are active.
TO DO: Design a mechanism that provides this list
Describe the mechanism in natural language Provide the pseudocode.
EECE 411: Design of Distributed Software Applications
Threads & distributed systems: Client side issues
Essence: Client side often tailored to provide transparency
access transparency: client-side stubs for RPCs location/migration transparency: let client-side
software keep track of actual location replication transparency: multiple invocations
handled by client stub: failure transparency: can often be placed only at
client (we’re trying to mask server and communication failures).
EECE 411: Design of Distributed Software Applications
Threads & distributed systems: Client side issues
Goal: hiding network latency.
Example I: Multi-threaded Web client: Web browser scans an incoming HTML page, and finds that
more files need to be fetched. Each file is fetched by a separate thread, each doing a
(blocking) HTTP request. As files come in, the browser displays them.
Example II: Multiple request-response calls to other machines (RPC):
A client does several calls at the same time, each one by a different thread.
It then waits until all results have been returned.
Q: Why exactly does multi-threaded processing help?
EECE 411: Design of Distributed Software Applications
Server side issues: How to handle incoming requests?
iteratively vs. concurrently Why multiple threads can be a good idea?
Multithreaded File Server Example
EECE 411: Design of Distributed Software Applications
How to handle incoming requests? Iterative vs. concurrent
If concurrent one more choice: Blocking vs. non-blocking I/O
Concurrent with blocking I/O: Choice: Processes vs. threads.
Concurrent with non-blocking I/O Finite state machine based design Event driven programming
EECE 411: Design of Distributed Software Applications
Threads & distributed systems: Server side issues
Main issues are performance and structure.
Improve performance: Having an iterative server prohibits scaling in a
multiprocessor system. As with clients: concurrent servers
reduce latency (by reacting to next request while previous one is being processed) and
increase throughput Starting a thread to handle an incoming request is much
cheaper than starting a new process.
Better structure: Most servers have high I/O load. Using simple, well-
understood blocking calls may simplify the overall structure. Multithreaded programs tend to be smaller and easier to
understand due to simplified flow of control. Better isolation may sometimes be an argument for process
based servers
EECE 411: Design of Distributed Software Applications
Other issues in Server Design
How do clients find server’s location? Stateless vs. stateful server design Server clusters
EECE 411: Design of Distributed Software Applications
Servers: General organization
Basic server model: A server is a process that waits for incoming requests at a specific transport address.
In practice, there is a one-to-one mapping between a port and a service:
superserver (inetd) vs. ‘normal’ server.
EECE 411: Design of Distributed Software Applications
Servers and State (I): Stateless servers
Stateless servers: Never keep accurate information about the status of a client after having handled a request:
Don’t record whether a file has been opened (simply close it again after access)
Don’t promise to invalidate a client’s cache Don’t keep track of your clients
Consequences: Clients and servers are completely independent State inconsistencies resulted from client or server crashes are
reduced. Possible loss of performance because, e.g., a server cannot
anticipate client behavior (think of prefetching file blocks)
EECE 411: Design of Distributed Software Applications
Servers and State (II): Stateful servers
Stateful servers: Keep track of client status: E.g., record that a file has been opened, so that
prefetching can be done E.g., Knows which data a client has cached, and allows
clients to keep local copies of shared data and my promise to invalidate them
Observation: The performance of stateful servers can be extremely high, provided clients are allowed to keep local copies.
[Depending on application] reliability may not a major problem.
EECE 411: Design of Distributed Software Applications
Server clusters
Observation: Many server clusters are organized along multiple tiers
Important: The first tier is generally responsible for passing requests to an appropriate server.
EECE 411: Design of Distributed Software Applications
Request Handoff
Observation: Having the first tier handle all communication from/to the cluster may lead to a bottleneck.
Solution: Various, but a popular one is TCP-handoff
EECE 411: Design of Distributed Software Applications
Summary so far
Client and server design: processes focus Sequential vs. concurrent,
Processes vs. threads blocking vs. non-blocking Concurrent & non-blocking: Finite state machine
design Design issues for clients and servers Cluster servers
EECE 411: Design of Distributed Software Applications
Next
A distributed system is: a collection of independent computers that
appears to its users as a single coherent system
Components need to: Communicate Cooperate => support needed
Naming Synchronization
EECE 411: Design of Distributed Software Applications
Naming systems
Functionality Map: names access points (addresses)
Names are used to denote entities in a distributed system.
To operate on an entity, we need to access it at an access point (address).
Note: A location-independent name for an entity E, is independent from the addresses of the access points offered by E.