
CSS490 Fundamentals
Textbook Ch1

Instructor: Munehiro Fukuda

These slides were compiled from the course textbook and the reference books.


Parallel vs. Distributed Systems

Memory
- Parallel systems: tightly coupled shared memory (UMA, NUMA)
- Distributed systems: distributed memory; message passing, RPC, and/or use of distributed shared memory

Control
- Parallel systems: global clock control (SIMD, MIMD)
- Distributed systems: no global clock control; synchronization algorithms needed

Processor interconnection
- Parallel systems: order of Tbps; bus, mesh, tree, mesh of tree, and hypercube(-related) networks
- Distributed systems: order of Gbps; Ethernet (bus), token ring and SCI (ring), Myrinet (switching network)

Main focus
- Parallel systems: performance; scientific computing
- Distributed systems: performance (cost and scalability), reliability/availability, information/resource sharing


Milestones in Distributed Computing Systems
- 1945-1950s: Loading monitor
- 1950s-1960s: Batch systems
- 1960s: Multiprogramming
- 1960s-1970s: Time-sharing systems (Multics, IBM 360)
- 1969-1973: WAN and LAN (ARPAnet, Ethernet)
- 1960s-early 1980s: Minicomputers (PDP, VAX)
- Early 1980s: Workstations (Alto)
- 1980s-present: Workstation/server models (Sprite, V-system)
- 1990s: Clusters (Beowulf)
- Late 1990s: Grid computing (Globus, Legion)


System Models
- Minicomputer model
- Workstation model
- Workstation-server model
- Processor-pool model
- Cluster model
- Grid computing


Minicomputer Model

- Extension of the time-sharing system: a user must first log on to his/her home minicomputer; thereafter, he/she can log on to a remote machine by telnet.
- Resource sharing: databases and high-performance devices

[Diagram: minicomputers interconnected through the ARPAnet]


Workstation Model

Process migration
- Users first log on to their personal workstations.
- If there are idle remote workstations, a heavy job may migrate to one of them.
- Problems:
  - How to find an idle workstation
  - How to migrate a job
  - What if a user logs on to the remote machine

[Diagram: workstations connected by a 100 Gbps LAN]


Workstation-Server Model
Client workstations
- Diskless
- Graphic/interactive applications are processed locally.
- All file, print, http, and even cycle-computation requests are sent to servers.
Server minicomputers
- Each minicomputer is dedicated to one or more different types of services.
Client-server model of communication
- RPC (Remote Procedure Call)
- RMI (Remote Method Invocation); a minimal RMI sketch follows the diagram below.
- A client process calls a server process's function.
- No process migration is invoked.
- Example: NFS

[Diagram: diskless workstations on a 100 Gbps LAN connected to minicomputers acting as file, http, and cycle servers]
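To make the client-server call concrete, here is a minimal Java RMI sketch. Both roles run in one JVM purely for illustration, and the CycleService interface with its smallestFactor operation is a hypothetical example, not something defined in the course material; the point is that the client obtains a stub from the registry and invokes the server's method as if it were a local call.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

public class RmiSketch {

    // Remote interface: the contract a client process can invoke on the server.
    public interface CycleService extends Remote {
        long smallestFactor(long n) throws RemoteException;
    }

    // Server-side implementation exported as a remote object.
    static class CycleServer implements CycleService {
        public long smallestFactor(long n) {
            long f = 2;
            while (f * f <= n && n % f != 0) f++;
            return (f * f <= n) ? f : n;   // smallest prime factor, or n itself if prime
        }
    }

    public static void main(String[] args) throws Exception {
        // Server side: export the object and register it under a well-known name.
        CycleService stub =
            (CycleService) UnicastRemoteObject.exportObject(new CycleServer(), 0);
        Registry registry = LocateRegistry.createRegistry(1099);
        registry.rebind("CycleService", stub);

        // Client side (normally a different JVM on another workstation): look up the
        // stub and call it like a local method; RMI marshals arguments and results.
        Registry remote = LocateRegistry.getRegistry("localhost", 1099);
        CycleService service = (CycleService) remote.lookup("CycleService");
        System.out.println("smallest factor of 1234567: " + service.smallestFactor(1234567));
    }
}
```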


Processor-Pool Model
Clients
- They log in to one of the terminals (diskless workstations or X terminals).
- All services are dispatched to servers.
Servers
- The necessary number of processors is allocated to each user from the pool.

Better utilization but less interactivity

[Diagram: terminals connected through a 100 Gbps LAN to a pool of servers 1 through N]


Cluster Model
Client
- Takes the client-server model.
Server
- Consists of many PCs/workstations connected to a high-speed network.
- Puts more focus on performance: serves requests in parallel (a dispatch sketch follows the diagram below).

[Diagram: client workstations on a 100 Gbps LAN send requests to a master node, which distributes them over a 1 Gbps SAN to slave nodes 1 through N running http servers]
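As a rough illustration of serving requests in parallel, here is a hedged Java sketch of a master node that round-robins incoming requests across slave workers using a thread pool. MasterNode, Slave, and the request strings are invented for this example; a real cluster would replace Slave.handle with a remote call to an http server node.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class MasterNode {

    // Stand-in for a slave node; in a real cluster this would be a remote call.
    static class Slave {
        final String name;
        Slave(String name) { this.name = name; }
        String handle(String request) { return name + " served " + request; }
    }

    private final List<Slave> slaves;
    private final ExecutorService pool;
    private final AtomicInteger next = new AtomicInteger(0);

    MasterNode(List<Slave> slaves) {
        this.slaves = slaves;
        this.pool = Executors.newFixedThreadPool(slaves.size());
    }

    // Dispatch one request to the next slave without blocking the master.
    void dispatch(String request) {
        Slave target = slaves.get(next.getAndIncrement() % slaves.size());
        pool.submit(() -> System.out.println(target.handle(request)));
    }

    public static void main(String[] args) {
        MasterNode master = new MasterNode(
            List.of(new Slave("slave1"), new Slave("slave2"), new Slave("slave3")));
        for (int i = 1; i <= 6; i++) {
            master.dispatch("GET /page" + i);   // requests are served concurrently
        }
        master.pool.shutdown();
    }
}
```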


Grid Computing
Goal
- Collect the computing power of supercomputers and clusters sparsely located over the nation and make it available as if it were the electric power grid.
- Distributed supercomputing: very large problems needing lots of CPU, memory, etc.
- High-throughput computing: harnessing many idle resources
- On-demand computing: remote resources integrated with local computation
- Data-intensive computing: using distributed data
- Collaborative computing: supporting communication among multiple parties

[Diagram: supercomputers, clusters, minicomputers, and workstations connected by a high-speed information highway]


Reasons for Distributed Computing Systems
- Inherently distributed applications: distributed DBs, worldwide airline reservation, banking systems
- Information sharing among distributed users: CSCW or groupware
- Resource sharing: sharing DBs/expensive hardware and controlling remote lab devices
- Better cost-performance ratio / performance: emergence of Gbit networks and high-speed, cheap MPUs; effective for coarse-grained or embarrassingly parallel applications
- Reliability: non-stop operation (availability) and voting features
- Scalability: loosely coupled connections and hot plug-in
- Flexibility: reconfigure the system to meet users' requirements


Network vs. Distributed Operating Systems

SSI (Single System Image)
- Network OS: No; ssh, sftp, no view of remote memory
- Distributed OS: Yes; process migration, NFS, DSM (distributed shared memory)

Autonomy
- Network OS: high; a local OS at each computer, no global job coordination
- Distributed OS: low; a single system-wide OS, global job coordination

Fault tolerance
- Network OS: unavailability grows as faulty machines increase.
- Distributed OS: unavailability remains small even if faulty machines increase.


Issues in Distributed Computing Systems: Transparency (= SSI)
- Access transparency: memory access (DSM); function calls (RPC and RMI)
- Location transparency: file naming (NFS); domain naming (DNS), which is still concerned with location
- Migration transparency: automatic state capturing and migration
- Concurrency transparency: event ordering for message delivery and memory consistency (a logical-clock sketch follows this list)
- Other transparency: failure, replication, performance, and scaling
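The slides do not prescribe a particular event-ordering mechanism; as one hedged illustration of ordering events without a global clock, here is a minimal Lamport logical clock sketch in Java. The class and method names are invented for this example.

```java
import java.util.concurrent.atomic.AtomicLong;

public class ClockDemo {

    // A minimal Lamport logical clock: a classic way to order events
    // when there is no global clock.
    static class LamportClock {
        private final AtomicLong time = new AtomicLong(0);

        // Local event or message send: advance the clock and timestamp the event.
        long tick() {
            return time.incrementAndGet();
        }

        // Message receive: merge the sender's timestamp so that
        // send(m) is ordered before receive(m).
        long receive(long senderTimestamp) {
            return time.updateAndGet(local -> Math.max(local, senderTimestamp) + 1);
        }
    }

    public static void main(String[] args) {
        LamportClock p = new LamportClock();   // process P
        LamportClock q = new LamportClock();   // process Q

        long send = p.tick();                  // P sends a message stamped with its clock
        q.tick();                              // Q does some unrelated local work
        long recv = q.receive(send);           // Q receives: its clock jumps past the stamp

        System.out.println("P send=" + send + "  Q receive=" + recv); // send < receive
    }
}
```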


Issues in Distributed Computing Systems: Reliability
- Faults: fail-stop and Byzantine failures
- Fault avoidance: the more machines involved, the less avoidance capability
- Fault tolerance
  - Redundancy techniques: K-fault tolerance needs K + 1 replicas; K Byzantine failures need 2K + 1 replicas (a voting sketch follows this list).
  - Distributed control: avoiding a complete fail-stop
- Fault detection and recovery: atomic transactions, stateless servers
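As a hedged sketch of why 2K + 1 replicas can mask K Byzantine failures, the fragment below takes a majority vote over replica replies: the K + 1 correct replies always form a strict majority, so up to K arbitrarily wrong replies cannot change the outcome. The MajorityVote class and its vote method are illustrative names, not from the course material.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MajorityVote {

    // Returns the value reported by a strict majority of replicas, or null if none.
    static <T> T vote(List<T> replies) {
        Map<T, Integer> counts = new HashMap<>();
        for (T reply : replies) {
            counts.merge(reply, 1, Integer::sum);   // tally identical answers
        }
        for (Map.Entry<T, Integer> e : counts.entrySet()) {
            if (e.getValue() > replies.size() / 2) {
                return e.getKey();                  // strict majority found
            }
        }
        return null;                                // too many faulty replies to decide
    }

    public static void main(String[] args) {
        // K = 1 Byzantine replica among 2K + 1 = 3: the correct value still wins.
        System.out.println(vote(List.of(42, 42, 99)));   // prints 42
    }
}
```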


Flexibility
- Ease of modification
- Ease of enhancement

[Diagram: monolithic-kernel approach, where each networked machine runs user applications on a monolithic kernel (Unix), versus microkernel approach, where each machine runs user applications and daemons (file, name, paging) on top of a microkernel (Mach)]


Performance/Scalability
Unlike parallel systems, distributed systems involve OS intervention and a slow network medium for data transfer.
- Send messages in a batch: avoid OS intervention for every message transfer (a batching sketch follows this list).
- Cache data: avoid repeating the same data transfer.
- Minimize data copying: avoid OS intervention (= zero-copy messaging).
- Avoid centralized entities and algorithms: avoid network saturation.
- Perform post operations on the client side: avoid heavy traffic between clients and servers.
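A minimal sketch of message batching, assuming a hypothetical BatchingSender wrapper: many small send() calls accumulate in a user-level buffer, and a single flush() pushes the whole batch to the kernel, so the per-message OS intervention is amortized.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class BatchingSender {
    private final OutputStream out;

    BatchingSender(OutputStream raw, int batchBytes) {
        // BufferedOutputStream defers the underlying write() until the buffer fills
        // or flush() is called, so many send() calls share one trip into the kernel.
        this.out = new BufferedOutputStream(raw, batchBytes);
    }

    void send(String message) throws IOException {
        out.write(message.getBytes(StandardCharsets.UTF_8));  // stays in the user-level buffer
        out.write('\n');                                       // simple record delimiter
    }

    void flush() throws IOException {
        out.flush();                                           // one crossing for the whole batch
    }

    public static void main(String[] args) throws IOException {
        // Demo with an in-memory sink; in practice 'raw' would be a socket's output stream.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BatchingSender sender = new BatchingSender(sink, 8192);
        for (int i = 0; i < 100; i++) {
            sender.send("request " + i);   // no kernel crossing yet
        }
        sender.flush();                    // the whole batch goes out at once
        System.out.println("bytes sent in one batch: " + sink.size());
    }
}
```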


Heterogeneity
- Data and instruction formats depend on each machine architecture.
- If a system consists of K different machine types, each machine type needs K - 1 pieces of translation software (one per other type).
- If we have an architecture-independent standard data/instruction format, each machine type prepares only one piece of translation software, to and from that standard format.
- Example: Java and the Java virtual machine
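To illustrate the standard-format idea with data (rather than instructions), here is a hedged Java sketch: DataOutputStream always encodes values in big-endian network byte order, so every machine converts only between its native representation and this single standard format. The WireFormat class name and the sample values are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class WireFormat {
    public static void main(String[] args) throws IOException {
        // Sender: marshal a record into the standard format.
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(wire);
        out.writeInt(490);              // 4 bytes, big-endian on every JVM
        out.writeDouble(3.14);          // IEEE 754, big-endian
        out.writeUTF("fundamentals");   // length-prefixed modified UTF-8

        // Receiver (possibly a different architecture): unmarshal the same bytes.
        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(wire.toByteArray()));
        System.out.println(in.readInt() + " " + in.readDouble() + " " + in.readUTF());
    }
}
```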


Security
- Lack of a single point of control
- Security concerns:
  - Messages may be stolen by an intruder.
  - Messages may be plagiarized by an intruder.
  - Messages may be changed by an intruder.
- Cryptography is the only known practical method.
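As a toy illustration of protecting message contents with cryptography, here is a hedged sketch using the standard javax.crypto API with AES. Key distribution, authentication, and replay protection are real concerns that this sketch deliberately ignores; the message text is invented.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.nio.charset.StandardCharsets;

public class CryptoSketch {
    public static void main(String[] args) throws Exception {
        // Shared secret key (in practice distributed by a key-exchange protocol).
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(128);
        SecretKey key = keyGen.generateKey();

        // Sender: encrypt the message so a stolen packet reveals nothing.
        Cipher enc = Cipher.getInstance("AES");
        enc.init(Cipher.ENCRYPT_MODE, key);
        byte[] ciphertext = enc.doFinal("transfer $100".getBytes(StandardCharsets.UTF_8));

        // Receiver: decrypt with the same shared key.
        Cipher dec = Cipher.getInstance("AES");
        dec.init(Cipher.DECRYPT_MODE, key);
        String plaintext = new String(dec.doFinal(ciphertext), StandardCharsets.UTF_8);

        System.out.println(plaintext);
    }
}
```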


Distributed Computing Environment

[Diagram: DCE layering, with DCE applications on top; DCE services (threads, RPC, distributed file service, security service, name (directory) service, and distributed time service) in the middle; and various operating systems and networking at the bottom]


Exercises (No turn-in)
1. In what respect are distributed computing systems superior to parallel systems?
2. In what respect are parallel systems superior to distributed computing systems?
3. Discuss the difference between the workstation-server and the processor-pool models from the availability viewpoint.
4. Discuss the difference between the processor-pool and the cluster models from the performance viewpoint.
5. What is a Byzantine failure? Why do we need 2K + 1 replicas for this type of failure?
6. Discuss the pros and cons of microkernels.
7. Why can we avoid OS intervention by zero-copy?