
Page 1: Networking

Networking

Ethan Kao, CS 6410, Oct. 18th 2011

Page 2: Networking

Papers

Active Messages: A Mechanism for Integrated Communication and Control, Thorsten von Eicken, David E. Culler, Seth Copen Goldstein, and Klaus Erik Schauser. In Proceedings of the 19th Annual International Symposium on Computer Architecture, 1992.

U-Net: A User-Level Network Interface for Parallel and Distributed Computing, Thorsten von Eicken, Anindya Basu, Vineet Buch, and Werner Vogels. In Proceedings of the 15th ACM Symposium on Operating Systems Principles (SOSP), December 1995.

Page 3: Networking

Parallel vs. Distributed Systems

Parallel System:
  • Multiple processors in one machine
  • Shared memory
  • Supercomputing

http://en.wikipedia.org/wiki/File:Distributed-parallel.svg

Page 4: Networking

Parallel vs. Distributed Systems

Distributed System:
  • Multiple machines linked together
  • Distributed memory
  • Cloud computing

http://en.wikipedia.org/wiki/File:Distributed-parallel.svg

Page 5: Networking

Challenges

How to communicate efficiently?
  • Between processors -> Active Messages
  • Between machines -> U-Net

http://en.wikipedia.org/wiki/File:Distributed-parallel.svg

Page 6: Networking

Active Messages: Authors

Thorsten von Eicken
  • Berkeley Ph.D. -> Assistant professor at Cornell -> UCSB
  • Founded RightScale, Chief Architect at Expertcity.com

David E. Culler
  • Professor at Berkeley

Seth Copen Goldstein
  • Berkeley Ph.D. -> Associate professor at CMU

Klaus Erik Schauser
  • Berkeley Ph.D. -> Associate professor at UCSB

Page 7: Networking

Active Messages: Motivation

  • Existing message-passing multiprocessors had high communication costs
  • Message-passing machines made inefficient use of underlying hardware capabilities
    - nCUBE/2, CM-5: thousands of nodes interconnected
  • Poor overlap between computation and communication

Page 8: Networking

Active Messages: Goals

Improve overlap between computation & communication

Aim for 100% utilization of resources

Low start-up costs for network usage

Page 9: Networking

Active Messages: Takeaways

  • Asynchronous communication
  • Minimal buffering
  • Handler interface

Weaknesses:
  • Address of the message handler must be known
  • Design needs to be hardware-specific?

Page 10: Networking

Active Messages: Design

Asynchronous communication mechanism

Messages contain user-level handler address

Handler executed on message arrival:
  • Takes the message off the network
  • Message body is the argument
  • Does not block
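
The mechanism is small enough to sketch. Below is a minimal, self-contained C illustration of the idea, not the paper's machine-specific interface; names such as am_deliver, am_handler_t, and put_handler are invented for illustration. The message carries the address of a user-level handler, and on arrival that handler runs with the message body as its argument, without blocking or buffering.

    /* Illustrative sketch of the Active Messages idea; the real interface
     * is machine-specific (nCUBE/2, CM-5).  All names are hypothetical. */
    #include <stdio.h>

    #define AM_PAYLOAD_WORDS 4

    /* A handler runs to completion on arrival, pulls the message off the
     * "network", and must not block. */
    typedef void (*am_handler_t)(int src_node, const int *payload);

    typedef struct {
        am_handler_t handler;            /* user-level handler address  */
        int payload[AM_PAYLOAD_WORDS];   /* message body = handler args */
    } am_message_t;

    /* Stand-in for the network: delivery simply dispatches to the handler,
     * which is what the receiving node does on message arrival. */
    static void am_deliver(int src_node, const am_message_t *msg)
    {
        msg->handler(src_node, msg->payload);
    }

    /* Example handler: deposits a remote value into pre-allocated storage. */
    static int remote_value;

    static void put_handler(int src_node, const int *payload)
    {
        remote_value = payload[0];       /* no buffering, no scheduling */
        printf("node %d wrote %d\n", src_node, remote_value);
    }

    int main(void)
    {
        am_message_t msg = { put_handler, { 42, 0, 0, 0 } };
        am_deliver(1, &msg);             /* simulate arrival at receiver */
        return 0;
    }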

Page 11: Networking

Active Messages: Design

  • Sender blocks until the message can be injected into the network
  • Receiver is interrupted on message arrival and runs the handler
  • User-level program pre-allocates receiving structures, eliminating buffering

Page 12: Networking

Traditional Message Passing

• Traditional send/receive models

Page 13: Networking

Active Messages: Performance

Key optimization in AM vs. send/receive is the reduction of buffering.

AM achieves close to an order-of-magnitude reduction in overhead:
  • nCUBE/2: AM send/handle 11 µs / 15 µs vs. 160 µs for asynchronous send/receive
  • CM-5: AM < 2 µs vs. 86 µs for blocking send/receive
  • Prototype of blocking send/receive built on top of AM: 23 µs

Page 14: Networking

Active Messages: Split-C

Non-blocking implementations of PUT and GET

Implementations consist of a message formatter and a message handler
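
As a rough illustration of that formatter/handler split, a PUT might look like the following C sketch. All names are hypothetical, and the "network" is simulated inside a single process; in real Split-C the completion counter on the sender is decremented by a reply active message rather than directly.

    #include <stdio.h>

    typedef struct {
        int          *dest;   /* remote address to write         */
        int           value;  /* data being PUT                  */
        volatile int *ack;    /* sender-side completion counter  */
    } put_msg_t;

    /* Message formatter: runs on the sender, builds the active message. */
    static put_msg_t format_put(int *dest, int value, volatile int *ack)
    {
        put_msg_t m = { dest, value, ack };
        (*ack)++;                        /* one more PUT outstanding */
        return m;
    }

    /* Message handler: runs on the receiver when the message arrives. */
    static void put_handler(const put_msg_t *m)
    {
        *m->dest = m->value;             /* deposit the data         */
        (*m->ack)--;                     /* via a reply in reality   */
    }

    int main(void)
    {
        int remote = 0;
        volatile int outstanding = 0;

        put_msg_t m = format_put(&remote, 7, &outstanding);
        put_handler(&m);                 /* simulate message arrival */

        printf("remote=%d outstanding=%d\n", remote, outstanding);
        return 0;
    }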

Page 15: Networking

Active Messages: Matrix Multiply

Multiplication C = A x B: each processor GETs one column of A after another and performs a rank-1 update with its own columns of B.

Achieves 95% of peak performance
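
A sketch of that computation pattern in plain C, with a placeholder fetch_column() standing in for the split-phase Split-C GET; matrix sizes and names are toy values for illustration, and everything runs in one process.

    #include <stdio.h>
    #include <string.h>

    #define N        4   /* global matrix dimension (toy size)         */
    #define MY_COLS  2   /* columns of B and C owned by this processor */

    static double A[N][N];        /* "remote" matrix, fetched column by column */
    static double B[N][MY_COLS];  /* this processor's columns of B             */
    static double C[N][MY_COLS];  /* this processor's columns of C = A x B     */

    /* Placeholder for a split-phase Split-C GET of column k of A. */
    static void fetch_column(int k, double buf[N])
    {
        for (int i = 0; i < N; i++)
            buf[i] = A[i][k];
    }

    int main(void)
    {
        double a_col[N];
        memset(C, 0, sizeof C);

        for (int k = 0; k < N; k++) {
            fetch_column(k, a_col);      /* in Split-C this GET overlaps the previous update */
            for (int i = 0; i < N; i++)  /* rank-1 update: C += a_col * (row k of B) */
                for (int j = 0; j < MY_COLS; j++)
                    C[i][j] += a_col[i] * B[k][j];
        }

        printf("C[0][0] = %f\n", C[0][0]);
        return 0;
    }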

Page 16: Networking

Message-Driven Architectures

  • Computation occurs in the message handler
  • Specialized hardware -> Monsoon, J-Machine
  • Memory allocation and scheduling required upon message arrival
    - Tricky to implement in hardware
    - Expensive
  • In Active Messages, the handler only removes messages from the network

Threaded Abstract Machine (TAM)
  • Parallel execution model based on Active Messages
  • Typically no memory allocation upon message arrival
  • No test results

Page 17: Networking

Active Messages: Recap

  • Good performance
  • Not a new parallel programming paradigm: "Evolutionary, not Revolutionary"
  • AM systems?
  • Multiprocessor vs. Cluster

Page 18: Networking

U-Net: Authors

Thorsten von Eicken

Anindya Basu
  • Advised by von Eicken

Vineet Buch
  • M.S. from Cornell
  • Co-founded Like.com -> Google

Werner Vogels
  • Research Scientist at Cornell -> CTO of Amazon

Page 19: Networking

U-Net: Motivation

  • Local-area communication bottlenecked at the kernel
    - Several copies of each message are made
    - Processing overhead dominates for small messages
  • Low round-trip latencies growing in importance, especially for small messages
  • Traditional networking architecture is inflexible
    - Cannot easily support new protocols or send/receive interfaces

Page 20: Networking

U-Net: Goals

  • Remove the kernel from the critical path of communication
  • Provide low-latency communication in local-area settings
  • Exploit the full network bandwidth even with small messages
  • Facilitate the use of novel communication protocols

Page 21: Networking

U-Net: Takeaways

  • Flexible
  • Low latency for smaller messages
  • Good performance on off-the-shelf hardware

Weaknesses:
  • Multiplexing of resources between processes is not done in the kernel
  • Specialized NI needed?

Page 22: Networking

U-Net: Design

  • User-level communication architecture, independent of the underlying network interface
  • Virtualizes network devices
  • Kernel controls channel set-up and tear-down

Page 23: Networking

U-Net: Design

Remove kernel from critical path: send/recv

Page 24: Networking

U-Net: Control

U-Net:
  • Multiplexes the NI among all processes accessing the network
  • Enforces protection boundaries and resource limits

Process:
  • Contents of each message and management of send/recv resources (i.e., buffers)

Page 25: Networking

U-Net: Architecture

Main building blocks of U-Net:
  • Endpoints
  • Communication segments
  • Message queues

Each process that wishes to access the network:
  • Creates one or more endpoints
  • Associates a communication segment with each endpoint
  • Associates a set of send, receive, and free message queues with each endpoint
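
A hypothetical C sketch of how those building blocks fit together; field names, sizes, and layout are invented for illustration, since the real structures are defined by the U-Net implementation and its NI firmware, and are allocated and pinned through kernel calls at channel set-up time.

    #include <stdint.h>

    #define SEG_SIZE  (64 * 1024)   /* communication segment size */
    #define QUEUE_LEN 64            /* descriptors per queue      */

    /* Descriptor pointing at a buffer inside the communication segment. */
    typedef struct {
        uint32_t offset;   /* buffer location within the segment */
        uint32_t length;   /* message length in bytes            */
        uint32_t channel;  /* communication channel tag          */
    } unet_desc_t;

    /* Ring of descriptors shared between the process and the NI. */
    typedef struct {
        unet_desc_t entries[QUEUE_LEN];
        volatile uint32_t head, tail;
    } unet_queue_t;

    /* One endpoint: a comm segment plus send, receive, and free queues. */
    typedef struct {
        uint8_t      segment[SEG_SIZE];  /* buffers visible to NI and process */
        unet_queue_t send_q;             /* descriptors of messages to send   */
        unet_queue_t recv_q;             /* descriptors of arrived messages   */
        unet_queue_t free_q;             /* free receive buffers              */
    } unet_endpoint_t;

    int main(void)
    {
        /* In real U-Net the kernel allocates and pins this and ties it to a
         * channel during set-up. */
        static unet_endpoint_t ep;
        (void)ep;
        return 0;
    }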

Page 26: Networking

U-Net: Send & Receive

Page 27: Networking

U-Net: Send

  • Prepare the packet and place it in the communication segment
  • Place a descriptor on the send queue
  • U-Net takes the descriptor from the queue
  • U-Net transfers the packet from memory to the network

[Figure: packet travels from the process through the U-Net NI onto the network; from Itamar Sagi]
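
The user-level side of that send path can be sketched as follows; the descriptor format and ring layout are invented for illustration, and in real U-Net the NI firmware defines them and performs the actual transfer to the network.

    #include <stdint.h>
    #include <string.h>

    #define SEG_SIZE 4096
    #define QLEN     8

    typedef struct { uint32_t off, len, chan; } desc_t;

    static uint8_t  comm_seg[SEG_SIZE];   /* communication segment   */
    static desc_t   send_q[QLEN];         /* send descriptor ring    */
    static uint32_t send_tail;            /* advanced by the process */

    /* Place the packet in the comm segment, then post a descriptor on the
     * send queue; the NI takes the descriptor and moves the packet from
     * memory to the network. */
    static void unet_send(uint32_t chan, const void *buf, uint32_t len)
    {
        uint32_t off = (send_tail % QLEN) * (SEG_SIZE / QLEN);
        memcpy(&comm_seg[off], buf, len);                 /* packet into segment */
        send_q[send_tail % QLEN] = (desc_t){ off, len, chan };
        send_tail++;                                      /* NI polls this ring  */
    }

    int main(void)
    {
        const char msg[] = "hello";
        unet_send(3, msg, sizeof msg);
        return 0;
    }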

Page 28: Networking

U-Net: Receive

  • U-Net receives the message and identifies the endpoint
  • Takes free space from the free queue
  • Places the message in the communication segment
  • Places a descriptor in the receive queue
  • Process takes the descriptor from the receive queue and reads the message

[Figure: packet travels from the network through the U-Net NI to the process; from Itamar Sagi]
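
A corresponding sketch of the receive path, with the NI's part simulated in main(); again, the names, descriptor format, and layout are illustrative only.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define SEG_SIZE 4096
    #define QLEN     8

    typedef struct { uint32_t off, len, chan; } desc_t;

    static uint8_t  comm_seg[SEG_SIZE];
    static desc_t   recv_q[QLEN], free_q[QLEN];
    static volatile uint32_t recv_tail;    /* advanced by the NI      */
    static uint32_t recv_head, free_tail;  /* advanced by the process */

    /* Poll the receive queue; copy the next message into buf and recycle
     * its buffer through the free queue.  Returns length, or -1 if empty. */
    static int unet_recv(void *buf, uint32_t cap)
    {
        if (recv_head == recv_tail)
            return -1;                          /* nothing has arrived     */
        desc_t d = recv_q[recv_head % QLEN];
        uint32_t n = d.len < cap ? d.len : cap;
        memcpy(buf, &comm_seg[d.off], n);       /* read the message body   */
        free_q[free_tail++ % QLEN] = d;         /* return buffer to the NI */
        recv_head++;
        return (int)n;
    }

    int main(void)
    {
        /* Simulate the NI: identify the endpoint, take a free buffer,
         * deposit the packet, and post a descriptor. */
        memcpy(&comm_seg[0], "ping", 5);
        recv_q[0] = (desc_t){ 0, 5, 3 };
        recv_tail = 1;

        char buf[16];
        printf("received %d bytes: %s\n", unet_recv(buf, sizeof buf), buf);
        return 0;
    }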

Page 29: Networking

U-Net: Protection Boundaries

Only the owning process can access its:
  • Endpoints
  • Communication segments
  • Message queues

Outgoing messages are tagged with the originating endpoint; incoming messages are demultiplexed by U-Net.

Page 30: Networking

U-Net: “zero-copy”

Base-level: “zero-copy”
  • Comm segments not regarded as general memory regions
  • One copy between the application data structure and a buffer in the comm segment
  • Small messages held entirely in the queue

Direct-access: “true zero-copy”
  • Comm segments can span the entire process address space
  • Sender can specify the offset within the destination comm segment for the data
  • Difficult to implement on existing workstation hardware

Page 31: Networking

U-Net: “zero-copy”

  • U-Net implementations support the base-level architecture
    - Hardware for direct access not available
    - Copy overhead not a dominant cost
  • Kernel-emulated endpoints

Page 32: Networking

U-Net: Implementation

Implemented on SPARCstations running SunOS 4.1.3:
  • Fore SBA-100 interface
    - Lack of hardware for CRC computation adds overhead
  • Fore SBA-200 interface
    - Uses custom firmware to implement the base-level architecture
    - i960 processor reprogrammed to implement U-Net directly

  • Small messages: 65 µs RTT vs. 12 µs on the CM-5
  • Fiber saturated with packet sizes of 800 bytes

Page 33: Networking

UAM (U-Net Active Messages): Performance

Page 34: Networking

U-Net: Split-C Benchmarks

Page 35: Networking

U-Net: TCP/IP and UDP/IP

  • Traditional UDP and TCP over ATM perform disappointingly
    - < 55% of max bandwidth for TCP
  • Better performance with UDP and TCP over U-Net
    - Not bounded by kernel resources
    - More state awareness allows better application-network relationships

Page 36: Networking

U-Net: TCP/IP and UDP/IP

Page 37: Networking

U-Net: Discussion

  • Main goals were to achieve low-latency communication and flexibility
  • NetBump