RPC and Data Representation + Wireless Primer Theophilus Benson


RPC and Data Representation+

Wireless Primer

Theophilus Benson

Assignment #3

• 3A: Reliability
  – No flow control or congestion control
  – Just a sliding window
  – You can use the same window value for the receiver and sender windows
• 3B: TCP-like
  – Must implement at least slow start and congestion avoidance
  – You can add more to make your implementation more efficient or fair
• Implementation details
  – You can ignore the demux function
    • It only matters in 3B, and we won’t be testing it
  – When testing 3B, the propagation delay between any two hosts will be 100 ms

Today

• RPC
  – Session management + Language
  – Semantic challenges
  – Example RPC (not on exam)
• Wireless
  – Collision detection

Why Should You Care?

• RPC is used to build distributed systems
• Used by … everyone
  – Google
  – Facebook
  – Twitter
  – Yahoo

RPC – Remote Procedure Call

• Procedure calls are a well-understood mechanism
  – Transfer control and data on a single computer
• Idea: make distributed programming look the same
  – Have servers export interfaces that are accessible through local APIs
  – Perform the illusion behind the scenes
• 2 major components
  – Protocol to manage messages sent between client and server
  – Language and compiler support
    • Packing, unpacking, calling function, returning value

Problem

• Suppose you want a call between two servers to look like a call between two methods
• What are some key problems?
  – Semantics of the communication
    • APIs, how to cope with failure
  – Data representation
    • (Think of parsing information in Assignments #2 and #3)
  – Scope: should the scheme work across
    • Architectures
    • Languages
    • Compilers…?

RPC Session Management

Stub Functions

• Local stub functions at the client and server give the appearance of a local function call
• Client stub
  – Marshalls parameters -> sends to server -> waits
  – Unmarshalls results -> returns to client
• Server stub
  – Creates socket/ports and accepts connections
  – Receives message from client stub -> unmarshalls parameters -> calls server function
  – Marshalls results -> sends results to client stub
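The stub round trip above can be sketched in a few lines. This is a minimal illustration, not any real RPC library: the procedure `add`, the wire format (two 32-bit integers in network byte order), and the in-process `transport` argument that stands in for a socket are all assumptions made for the example.

```python
import struct

def add(a, b):
    """The 'remote' procedure that lives on the server."""
    return a + b

def server_stub(request: bytes) -> bytes:
    # Unmarshall two signed 32-bit ints sent in network byte order.
    a, b = struct.unpack("!ii", request)
    result = add(a, b)                 # call the real server function
    return struct.pack("!i", result)   # marshall the result for the reply

def client_stub_add(a, b, transport=server_stub):
    # Marshall parameters, "send" them, and wait for the reply.
    request = struct.pack("!ii", a, b)
    reply = transport(request)         # hypothetical stand-in for a socket round trip
    (result,) = struct.unpack("!i", reply)
    return result                      # looks like a local return value to the caller
```

The caller only ever sees `client_stub_add(2, 3)`; everything about bytes and transport stays inside the two stubs.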

Stub Generation

• Many systems generate stub code from an independent specification: an IDL
  – IDL: Interface Description Language
    • Describes an interface in a language-neutral way
• Separates the logical description of data from
  – Dispatching code
  – Marshalling/unmarshalling code
  – Data wire format

RPC Components

• Stub compiler
  – Creates stub methods
  – Creates functions for marshalling and unmarshalling
• Dispatcher
  – Demultiplexes programs running on a machine
  – Calls the stub server function
• Protocol
  – At-most-once semantics (or not)
  – Reliability, replay caching, version matching
  – Fragmentation, framing (depending on underlying protocols)

RPC Language Support

Presentation Formatting

• How to represent data?
• Several questions:
  – Which data types do you want to support?
    • Base types, flat types, complex types
  – How to encode data onto the wire?
  – How to decode the data?
    • Self-describing (tags)
    • Implicit description (the ends know)
• Several answers:
  – Many frameworks do these things automatically

Which data types?

• Basic types
  – Integers, floating point, characters
  – Some issues: endianness (ntohs, htons), character encoding, IEEE 754
• Flat types
  – Strings, structures, arrays
  – Some issues: packing of structures, order, variable length
• Complex types
  – Pointers! Must flatten, or serialize, data structures
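Two of the issues above, byte order and pointer-based structures, can be illustrated with Python's `struct` module. This is only a sketch: the `Node` class and the "count followed by values" wire layout are assumptions for the example, not a format from the lecture.

```python
import struct

# The same 16-bit value in network (big-endian) and little-endian order.
big = struct.pack("!H", 0x1234)      # what htons() produces on the wire
little = struct.pack("<H", 0x1234)   # how a little-endian host stores it

class Node:
    """A linked-list cell; `next` is a pointer that cannot be sent as-is."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def flatten(head):
    """Serialize the list as a uint32 count followed by int32 values."""
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return struct.pack("!I%di" % len(values), len(values), *values)

def unflatten(data):
    """Rebuild the list; pointers are recreated on the receiving side."""
    (count,) = struct.unpack_from("!I", data, 0)
    values = struct.unpack_from("!%di" % count, data, 4)
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head
```

The receiver never sees the sender's addresses: only the values and their order cross the wire.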

Data Schema

• How to parse the encoded data?
• Two extremes:
  – Self-describing data: tags
    • Additional information added to the message to help in decoding
    • Examples: field name, type, length
  – Implicit: the code at both ends “knows” how to decode the message
    • E.g., your code for Assignments #2 and #3 knows what format packets should be in
    • Interoperability depends on a well-defined protocol specification!
    • Very difficult to change
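The two extremes can be contrasted with a small sketch. The field layout, the tag numbers, and the one-byte tag and length fields below are hypothetical choices made only for illustration.

```python
import struct

# Implicit: both ends simply agree on "uint16 id, then uint32 credits".
def encode_implicit(student_id, credits):
    return struct.pack("!HI", student_id, credits)

def decode_implicit(data):
    return struct.unpack("!HI", data)

# Self-describing: every field carries tag (1 byte), length (1 byte), value.
TAG_ID, TAG_CREDITS = 1, 2   # hypothetical tag numbers

def encode_tlv(fields):
    out = b""
    for tag, value in fields.items():
        out += struct.pack("!BB", tag, len(value)) + value
    return out

def decode_tlv(data):
    fields, pos = {}, 0
    while pos < len(data):
        tag, length = struct.unpack_from("!BB", data, pos)
        fields[tag] = data[pos + 2 : pos + 2 + length]
        pos += 2 + length   # a decoder can skip tags it does not recognize
    return fields
```

The implicit form is smaller (6 bytes here versus 10), but the tagged form lets a field be added later without breaking old decoders, which is the tradeoff the slide describes.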

Examples of RPC Systems

• SunRPC (now ONC RPC)
  – The first popular system
  – Used by NFS
  – Not popular for the wide area (security, convenience)
• Java RMI
  – Popular with Java
  – Only works among JVMs
• DCE
  – Used in ActiveX and DCOM, CORBA
  – Stronger semantics than SunRPC, much more complex

…even more examples

• XML-RPC, SOAP
• JSON-RPC
• Apache Thrift

RPC Semantic Challenges

Can we maintain the same semantics?

• Mostly…
• Why not?
  – New failure modes: nodes, network
• Possible outcomes of failure
  – Procedure did not execute
  – Procedure executed once
  – Procedure executed multiple times
  – Procedure partially executed
• Desired: at-most-once semantics

Implementing at-most-once semantics

• Problem: request message lost
  – Client must retransmit requests when it gets no reply
• Problem: reply message lost
  – Client may retransmit a previously executed request
  – OK if the operation is idempotent
  – Otherwise, server must keep a “replay cache” to reply to already executed requests
• Problem: server takes too long executing
  – Client will retransmit a request already in progress
  – Server must recognize the duplicate; it could reply “in progress”
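A replay cache can be sketched as a table keyed by client and request identifiers. The class below is an illustration under assumptions: the `(client_id, request_id)` key, the in-memory dictionary, and the fetch-and-add operation (chosen because it is not idempotent) are not taken from any particular RPC system.

```python
class ReplayCacheServer:
    def __init__(self):
        self.counter = 0
        self.replay_cache = {}   # (client_id, request_id) -> saved reply

    def fetch_and_add(self, amount):
        """Not idempotent: running it twice changes the result."""
        old = self.counter
        self.counter += amount
        return old

    def handle(self, client_id, request_id, amount):
        key = (client_id, request_id)
        if key in self.replay_cache:
            # Duplicate of an executed request: resend the old reply,
            # do NOT execute the procedure again.
            return self.replay_cache[key]
        reply = self.fetch_and_add(amount)
        self.replay_cache[key] = reply
        return reply
```

A retransmitted request with the same identifiers gets the same answer and the counter moves only once. A real server would also need a policy for evicting old entries, which this sketch omits.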

Server Crashes

• Problem: server crashes and reply lost
  – Can make replay cache persistent: slow
  – Can hope reboot takes long enough for all clients to fail
• Problem: server crashes during execution
  – Can log enough to restart partial execution: slow and hard
  – Can hope reboot takes long enough for all clients to fail
• Can use “cookies” to inform clients of crashes
  – Server gives client a cookie, which is f(time of boot)
  – Client includes cookie with each RPC
  – After a server crash, the server will reject the invalid cookie
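The cookie scheme can be sketched as follows. The cookie format, the `hello` handshake, and the `StaleCookie` exception are hypothetical names chosen for the example; the only idea taken from the slide is that the cookie is a function of boot time and is checked on every call.

```python
class StaleCookie(Exception):
    """Raised when a client presents a cookie from before a reboot."""

class CookieServer:
    def __init__(self, boot_time):
        # f(time of boot); the string format is an arbitrary choice.
        self.cookie = "boot-%d" % boot_time

    def hello(self):
        """Hand the current cookie to a new client."""
        return self.cookie

    def call(self, cookie, x):
        if cookie != self.cookie:
            # The server restarted since the client obtained its cookie,
            # so earlier requests may have been lost or half executed.
            raise StaleCookie("server rebooted; client state is stale")
        return x * 2   # placeholder for the real procedure
```

After a crash, a new `CookieServer` instance with a later boot time rejects the old cookie, which tells the client explicitly that a reboot happened.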

Examples of RPC

Example: Sun XDR (RFC 4506)

• External Data Representation for SunRPC
• Types: most of the C types
• No tags (except for array lengths)
  – Code needs to know the structure of the message
• Usage:
  – Create a program description file (.x)
  – Run the rpcgen program
  – Include generated .h files, use stub functions
• Very C/C++ oriented
  – Although encoders/decoders exist for other languages

Example: fetch and add server

• In fadd_prot.x:

RPC Program Definition

• rpcgen generates the marshalling/unmarshalling code and the stub functions; you fill in the actual code

XML

• Other extreme
• Markup language
  – Text based, semi-human readable
  – Heavily tagged (field names)
  – Depends on external schema for parsing
  – Hard to parse efficiently

<person>
  <name>John Doe</name>
  <email>[email protected]</email>
</person>

Google Protocol Buffers

• Defined by Google, released to the public
  – Widely used internally and externally
  – Supports common types, service definitions
  – Natively generates C++/Java/Python code
    • Over 20 other languages supported by third parties
  – Efficient binary encoding, readable text encoding
• Performance
  – 3 to 10 times smaller than XML
  – 20 to 100 times faster to process

Protocol Buffers Example

message Student {
  required string name = 1;
  required int32 credits = 2;
}

(…compile with protoc)
Student s;
s.set_name("Jane");
s.set_credits(20);
fstream output("students.txt", ios::out | ios::binary);
s.SerializeToOstream(&output);

(…somebody else reading the file)
Student s;
fstream input("students.txt", ios::in | ios::binary);
s.ParseFromIstream(&input);

Binary Encoding

• Integers: varints
  – 7 bits out of 8 to encode integers
  – Msb: more bits to come
  – Multi-byte integers: least significant group first
• Signed integers: zig-zag encoding, then varint
  – 0:0, -1:1, 1:2, -2:3, 2:4, …
  – Advantage: smaller when encoded with varint
• General:
  – Field number, field type (tag), value
• Strings:
  – Varint length, Unicode representation
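The varint and zig-zag rules above are short enough to write out. This is a sketch of the encoding as described on the slide, not the Protocol Buffers library itself; the function names are made up for the example.

```python
def encode_varint(n):
    """Encode a non-negative int, 7 bits per byte, least significant group first."""
    out = bytearray()
    while True:
        group = n & 0x7F
        n >>= 7
        if n:
            out.append(group | 0x80)   # msb set: more groups follow
        else:
            out.append(group)          # msb clear: last byte
            return bytes(out)

def decode_varint(data):
    result, shift = 0, 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")

def zigzag(n):
    """Map signed to unsigned: 0->0, -1->1, 1->2, -2->3, 2->4, ..."""
    return 2 * n if n >= 0 else -2 * n - 1

def unzigzag(z):
    return z // 2 if z % 2 == 0 else -(z + 1) // 2
```

For example, 300 encodes to the two bytes `0xAC 0x02`, and -1 zig-zags to 1, so it fits in a single byte instead of a full-width two's-complement value.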

Apache Thrift

• Originally developed by Facebook

• Used heavily internally

• Full RPC system
  – Support for C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, Smalltalk, and OCaml
• Many types
  – Base types, list, set, map, exceptions
• Versioning support
• Many encodings (protocols) supported
  – Efficient binary, JSON encodings

Apache Avro

• Yet another newcomer

• Likely to be used for Hadoop data representation

• Encoding:
  – Compact binary with schema included in file
  – Amortized self-descriptive
• Why not just create a new encoding for Thrift?
  – I don’t know…

Conclusions

• RPC is a good way to structure many distributed programs
  – Have to pay attention to different semantics, though!
• Data: tradeoff between self-description, portability, and efficiency
• Unless you really want to bit-pack your protocol, and it won’t change much, use one of the IDLs
• Parsing code is easy to get (slightly) wrong, hard to get fast
  – Should only do this once, for all protocols

• Which one should you use?

Wireless

Wireless

• Today: wireless networking is truly ubiquitous
  – 802.11, 3G, (4G), WiMAX, Bluetooth, RFID, …
  – Sensor networks, Internet of Things
  – Some new computers have no wired networking
  – 4B cellphone subscribers vs. 1B computers

• What’s behind the scenes?

Wireless is different

• Signals sent by the sender don’t always reach the receiver intact
  – Varies with space: attenuation, multipath
  – Varies with time: conditions change, interference, mobility
• Distributed: sender doesn’t know what happens at receiver
• Wireless medium is inherently shared
  – No easy way out with switches

Implications

• Different mechanisms needed

• Physical layer
  – Different knobs: antennas, transmission power, encodings
• Link layer
  – Distributed medium access protocols
  – Topology awareness
• Network, transport layers
  – Routing, forwarding

• Most advances do not abstract away the physical and link layers

Physical Layer

• Specifies physical medium
  – Ethernet: Category 5 cable, 8 wires, twisted pair, RJ45 jack
  – WiFi wireless: 2.4GHz
• Specifies the signal
  – 100BASE-TX: NRZI + MLT-3 encoding
  – 802.11b: binary and quadrature phase shift keying (BPSK/QPSK)
• Specifies the bits
  – 100BASE-TX: 4B5B encoding
  – 802.11b @ 1-2Mbps: Barker code (1 bit -> 11 chips)

What can happen to signals?

• Attenuation
  – Signal power attenuates by a factor of ~r^2 for omnidirectional antennas in free space
  – Exponent depends on type and placement of antennas
    • < 2 for directional antennas
    • > 2 if antennas are close to the ground
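To get a feel for the exponent, the sketch below treats received power as proportional to transmit power divided by r raised to the path-loss exponent. All constants (antenna gains, wavelength) are dropped, so only ratios between distances are meaningful; the function names are made up for the example.

```python
import math

def received_power(p_tx, r, exponent=2.0):
    """Relative received power at distance r (constant factors omitted)."""
    return p_tx / (r ** exponent)

def extra_loss_db(r_near, r_far, exponent=2.0):
    """Additional loss in dB when moving from r_near to r_far."""
    return 10 * exponent * math.log10(r_far / r_near)
```

With exponent 2, doubling the distance leaves a quarter of the power, about 6 dB of extra loss. With exponent 4, the same doubling leaves one sixteenth, about 12 dB.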

Interference

• External sources
  – E.g., 2.4GHz unlicensed ISM band
  – 802.11
  – 802.15.4 (ZigBee), 802.15.1 (Bluetooth)
  – 2.4GHz phones
  – Microwave ovens
• Internal sources
  – Nodes in the same network/protocol can (and do) interfere
• Multipath
  – Self-interference (destructive)

Multipath

Picture from Cisco, Inc.

• May cause attenuation, destructive interference

Implications of Attenuation and Interference

• Reduces the signal-to-noise ratio
  – Makes it hard to decode bits
  – Increases bit error rates
• Could make the signal stronger (transmit with higher power)
  – Uses more energy
  – Increases interference to other nodes

Link Layer

• Medium Access Control
  – Should give 100% if one user
  – Should be efficient and fair if more users
• Ethernet uses CSMA/CD
  – Can we use CD here?
    • No! Collision happens at the receiver

• Protocols try to avoid collision in the first place

Collision Detection in Wireless

Carrier Sensing

• The initial collision detection of wireless
  – Listen on the channel (medium)
    • If someone is transmitting, then you sleep and try again later
    • Else you start sending

Hidden Terminals

• A can hear B and C
• B and C can’t hear each other
• They both interfere at A. COLLISION!
• B is a hidden terminal to C, and vice versa
• Carrier sense at the sender is useless

A CB
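The hidden-terminal situation can be checked mechanically with a "who hears whom" table. The table and helper names below are assumptions for the sketch; the topology is the one from the slide, where A hears B and C but B and C do not hear each other.

```python
# Symmetric hearing relation: A is in range of both B and C.
hears = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}

def carrier_sense_busy(node, transmitting):
    """True if `node` hears at least one current transmitter."""
    return any(t in hears[node] for t in transmitting)

def collides_at(receiver, transmitting):
    """A collision occurs when the receiver hears two or more transmitters."""
    return sum(1 for t in transmitting if t in hears[receiver]) >= 2
```

While B is transmitting, `carrier_sense_busy("C", {"B"})` is false, so C believes the channel is free; if C then transmits too, `collides_at("A", {"B", "C"})` is true. The sender's view and the receiver's view disagree, which is why carrier sense at the sender does not help here.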

Exposed Terminals

• A transmits to B

• C hears the transmission and backs off, even though D would hear C
  – C could still send to D, but it won’t!

• C is an exposed terminal to A’s transmission

• Why is it still useful for C to do CS?

A CB D

Key points

• No global view of collision
  – Different receivers hear different senders
  – Different senders reach different receivers
• Collisions happen at the receiver
• Goals of a MAC protocol
  – Detect if receiver can hear sender
  – Tell senders who might interfere with receiver to shut up

Simple MAC: CSMA/CA

• Maintain a waiting counter c

• Each time slot the channel is free, c--
• Transmit when c = 0
• When a collision is inferred, retransmit with exponential backoff
  – Use lack of ACK from receiver to infer collision
  – Collisions are expensive: only full packet transmissions

• How would we get ACKs if we didn’t do carrier sense?
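The counter-and-backoff rule can be sketched as a small state machine. The contention-window bounds (16 and 1024 slots), the uniform random draw, and the method names are assumptions for illustration; they are not taken from the slide or quoted from the 802.11 standard.

```python
import random

class CsmaCaSender:
    def __init__(self, cw_min=16, cw_max=1024, rng=None):
        self.cw_min, self.cw_max = cw_min, cw_max   # hypothetical window bounds
        self.cw = cw_min
        self.rng = rng or random.Random()
        self.c = self.rng.randrange(self.cw)        # waiting counter c

    def tick(self, channel_free):
        """Advance one slot; return True if the sender transmits in this slot."""
        if not channel_free:
            return False          # counter is frozen while the channel is busy
        if self.c > 0:
            self.c -= 1           # c-- for each free slot
        return self.c == 0        # transmit when c reaches 0

    def on_ack(self):
        """Success: reset the window and draw a fresh counter."""
        self.cw = self.cw_min
        self.c = self.rng.randrange(self.cw)

    def on_missing_ack(self):
        """No ACK: infer a collision and back off exponentially."""
        self.cw = min(2 * self.cw, self.cw_max)
        self.c = self.rng.randrange(self.cw)
```

Each missing ACK doubles the range from which the next counter is drawn, so senders that keep colliding spread themselves over more and more slots.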

RTS/CTS

• Idea: transmitter can check availability of channel at receiver

• Before every transmission
  – Sender sends an RTS (Request-to-Send)
  – Contains length of data (in time units)
  – Receiver sends a CTS (Clear-to-Send)
  – Sender sends data
  – Receiver sends ACK after transmission

• If you don’t hear a CTS, assume collision

• If you hear a CTS for someone else, shut up
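The two rules "no CTS means assume collision" and "a CTS for someone else means stay quiet" can be sketched on the same kind of hearing table. The topology, the function name, and the return convention are assumptions for the example, and frame loss is not modelled.

```python
# A hears B and C; B and C cannot hear each other.
hears = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}

def rts_cts(sender, receiver, duration):
    """Return {node: slots to stay silent}, or None if no CTS comes back."""
    if sender not in hears[receiver]:
        return None   # receiver never heard the RTS, so it sends no CTS
    silent = {}
    for node, neighbours in hears.items():
        # Everyone in range of the receiver overhears its CTS and defers
        # for the advertised duration of the data transfer.
        if node not in (sender, receiver) and receiver in neighbours:
            silent[node] = duration
    return silent
```

When B asks to send to A for 5 slots, `rts_cts("B", "A", 5)` tells C to stay silent for 5 slots even though C never heard B, which is how the CTS reaches the hidden terminal.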

Benefits of RTS/CTS

• Solves hidden terminal problem
• Also solves exposed terminal
• Does it?
  – Control frames can still collide
    • If A & C send RTS at the same time
  – In practice: reduces hidden terminal problem on data packets

RTS/CTS

• B sends to A
• A responds with CTS
• C knows not to send!
• Hidden terminal solved!

[Figure, three build steps: nodes A, B, C with RTS, then CTS, then Data]


Issues with RTS/CTS

• A sends to B
• C hears RTS but not CTS
  – So, C can send!
• C sends to D
• Exposed terminal solved!

[Figure, three build steps: nodes A, B, C, D with RTS and CTS labels]

Benefits of RTS/CTS

• Solves hidden terminal problem
• Also solves exposed terminal
• Does it?
  – Control frames can still collide
    • If A & C send RTS at the same time
    • Also, CTS can get lost!
  – In practice: reduces hidden terminal problem on data packets

RTS Loss

[Figure: nodes A, B, C with two RTS transmissions]


CTS Loss

• T0: B sends RTS to A
• T1: A responds with CTS WHILE D also sends RTS to E
• T1: This CTS is lost
  – C doesn’t know about B->A

[Figure, three build steps: nodes A, B, C, D with RTS and CTS labels]

Drawbacks of RTS/CTS

• Overhead is too large for small packets
  – 3 packets per packet: RTS/CTS/Data (4-22% for 802.11b)
• RTS still goes through CSMA: can be lost
• CTS loss causes lengthy retries
• 33% of IP packets are TCP ACKs
• In practice, WiFi doesn’t use RTS/CTS

Other MAC Strategies

• Time Division Multiplexing (TDMA)
  – Central controller allocates a time slot for each sender
  – May be inefficient when not everyone is sending
• Frequency Division
  – Multiplexing two networks on the same space
  – Nodes with two radios (think graph coloring)
  – Different frequency for upload and download

ISM Band Channels

Sometimes you can’t (or shouldn’t) hide that you are on wireless!

• Three examples of relaxing the layering abstraction

Examples of Breaking Abstractions

• TCP over wireless
  – Packet losses have a strong impact on TCP performance
  – Snoop TCP: hide retransmissions from TCP endpoints
  – Distinguish congestion from wireless losses

Summary

• Wireless presents many challenges
  – Across all layers
  – Encoding/Modulation (we’re doing pretty well here)