Fast Communication
  Firefly RPC
  Lightweight RPC
  CS 614
  Tuesday, March 13, 2001
  Jeff Hoy
Why Remote Procedure Call?
  Simplify building distributed systems and applications
  Looks like local procedure call
  Transparent to user
  Balance between semantics and efficiency
  Universal programming tool
  Secure inter-process communication
RPC Model
  [Diagram labels: Client Application, Client Stub, Client Runtime, Server Application, Server Stub, Server Runtime, Network, Call, Return]
RPC In Modern Computing
  CORBA and Internet Inter-ORB Protocol (IIOP)
    Each CORBA server object exposes a set of methods
  DCOM and Object RPC
    Built on top of RPC
  Java and Java Remote Method Protocol (JRMP)
    Interface exposes a set of methods
  XML-RPC, SOAP
    RPC over HTTP and XML
Goals
  Firefly RPC
    Inter-machine Communication
    Maintain Security and Functionality
    Speed
  Lightweight RPC
    Intra-machine Communication
    Maintain Security and Functionality
    Speed
Firefly RPC
  Hardware
    DEC Firefly multiprocessor
    1 to 5 MicroVAX CPUs per node
    Concurrency considerations
    10 megabit Ethernet
  Takes advantage of 5 CPUs
Fast Path in an RPC
  Transport Mechanisms
    IP / UDP
    DECNet byte stream
    Shared Memory (intra-machine only)
  Determined at bind time
  The fast path lies inside the transport procedures: Starter, Transporter, and Ender for the caller, and Receiver for the server
Caller Stub
  Gets control from calling program
  Calls Starter for packet buffer
  Copies arguments into the buffer
  Calls Transporter and waits for reply
  Copies result data into the caller's result variables
  Calls Ender and frees result packet
Server Stub
  Receives incoming packet
  Copies data onto the stack or into a new data block, or leaves it in the packet
  Calls server procedure
  Copies result into the call packet and transmits it
Transport Mechanism
  Transporter procedure
    Completes RPC header
    Calls Sender to complete UDP, IP, and Ethernet headers (Ethernet is the chosen means of communication)
    Invokes Ethernet driver via kernel trap and queues the packet
Transport Mechanism
  Receiver procedure
    Server thread awakens in Receiver
    Receiver calls the interface stub identified in the received packet, and the interface stub calls the procedure stub
    Reply is similar
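The caller stub, server stub, and transport steps above can be sketched in a few lines of Python. This is a minimal in-process illustration, not the Firefly implementation: the names Starter, Transporter, Ender, and Receiver come from the slides, while the `add` procedure, the packet layout, the 64-byte buffer size, and the loopback "network" are all assumptions made up for the example.

```python
import struct

PACKET_SIZE = 64  # assumed buffer size, chosen only for this example


def starter():
    """Starter: hand the caller stub an empty packet buffer."""
    return bytearray(PACKET_SIZE)


def ender(packet):
    """Ender: release the result packet (here we simply empty it)."""
    packet.clear()


def add(a, b):
    """Hypothetical server procedure exported over RPC."""
    return a + b


def add_server_stub(packet):
    """Server stub: unmarshal arguments, call the procedure,
    and write the result back into the same call packet."""
    a, b = struct.unpack_from("!ii", packet, 0)
    struct.pack_into("!i", packet, 0, add(a, b))
    return packet


def receiver(packet, procedure_stub):
    """Receiver: the server thread wakes up here and calls the stub."""
    return procedure_stub(packet)


def transporter(packet, procedure_stub):
    """Transporter: a real one completes headers and traps to the kernel;
    this loopback stand-in hands the packet straight to Receiver and
    returns the reply."""
    return receiver(packet, procedure_stub)


def add_caller_stub(a, b):
    """Caller stub: follows the six steps listed on the Caller Stub slide."""
    packet = starter()                            # get a packet buffer
    struct.pack_into("!ii", packet, 0, a, b)      # copy arguments in
    reply = transporter(packet, add_server_stub)  # send and wait for reply
    (result,) = struct.unpack_from("!i", reply, 0)  # copy result out
    ender(reply)                                  # free the result packet
    return result


print(add_caller_stub(2, 3))  # prints 5
```

Note that the stub does its own marshaling with `struct` rather than calling a generic library routine, which mirrors the design choice listed under the performance enhancements.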
Threading
  Client Application creates RPC thread
  Server Application creates call thread
    Threads operate in server application's address space
    No need to spawn entire process
    Threads need to consider locking resources
Performance Enhancements
  Over traditional RPC
    Stubs marshal arguments rather than library functions handling arguments
    RPC procedures called through procedure variables rather than by lookup table
    Server retains call packet for results
    Buffers reside in shared memory
  Sacrifices abstract structure
Performance Analysis
  Null() Procedure
    No arguments or return value
    Measures base latency of RPC mechanism
Multi-threaded caller and server
Time for 10,000 RPCs
  Base latency: 2.66 ms
  MaxResult latency (1500 bytes): 6.35 ms
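As a quick check on what these figures imply, the arithmetic below converts the per-call latencies into a total time for 10,000 calls and a call rate. It assumes the two latencies are per-call values and that a single caller thread issues the calls back to back; the function names are made up for the example.

```python
NULL_LATENCY_MS = 2.66        # base latency from the slide
MAX_RESULT_LATENCY_MS = 6.35  # MaxResult latency (1500 bytes) from the slide
CALLS = 10_000


def total_seconds(latency_ms, calls=CALLS):
    """Total elapsed time if calls run one after another."""
    return latency_ms * calls / 1000.0


def calls_per_second(latency_ms):
    """Call rate for one caller thread issuing calls back to back."""
    return 1000.0 / latency_ms


for name, latency in [("Null()", NULL_LATENCY_MS),
                      ("MaxResult", MAX_RESULT_LATENCY_MS)]:
    print(f"{name}: {total_seconds(latency):.1f} s for {CALLS} calls, "
          f"{calls_per_second(latency):.0f} calls/s")
```

Under these assumptions, 10,000 Null() calls take about 26.6 seconds and 10,000 MaxResult calls about 63.5 seconds.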
Send and Receive Latency
  With larger packets, transmission time dominates
    Overhead becomes less of an issue
    Good for Firefly RPC, assuming large transmission over network
    Is overhead acceptable for intra-machine communication?
Stub Latency
  Significant overhead for small packets
Fewer Processors
  Seconds for 1,000 Null() calls
Fewer Processors
  Why the slowdown with one processor?
    Fast path can be followed only in multiprocessor environment
    Lock conflicts, scheduling problems
  Why little speedup past two processors?
Future Improvements
  Hardware
    Faster network will help larger packets
    Triple CPU speed will reduce Null() time by 52% and MaxResult by 36%
  Software
    Omit IP and UDP headers for Ethernet datagrams, 2~4% gain
    Redesign RPC protocol, ~5% gain
    Busy thread wait, 10~15% gain
    Write more in assembler, 5~10% gain
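To make the hardware estimate concrete, the sketch below applies the quoted reductions to the measured latencies (2.66 ms for Null() and 6.35 ms for MaxResult). The second function is an assumption, not something the slides state: it uses a simple Amdahl-style model in which only the CPU-bound share of a call speeds up, by exactly the factor of three, to back out what share that would have to be.

```python
def projected_latency_ms(latency_ms, reduction):
    """Latency after applying a fractional reduction (0.52 means 52%)."""
    return latency_ms * (1.0 - reduction)


def implied_cpu_fraction(reduction, speedup=3.0):
    """Assumed model: new = old * ((1 - f) + f / speedup).
    Solving for f gives f = reduction / (1 - 1 / speedup)."""
    return reduction / (1.0 - 1.0 / speedup)


null_after = projected_latency_ms(2.66, 0.52)
max_result_after = projected_latency_ms(6.35, 0.36)

print(f"Null(): {null_after:.2f} ms, "
      f"CPU-bound share {implied_cpu_fraction(0.52):.0%}")
print(f"MaxResult: {max_result_after:.2f} ms, "
      f"CPU-bound share {implied_cpu_fraction(0.36):.0%}")
```

Under this model roughly 78% of a Null() call and 54% of a MaxResult call would be CPU-bound, which is consistent with transmission time dominating for larger packets.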
Other Improvements
  Firefly RPC handles intra-machine communication through the same mechanisms as inter-machine communication
  Firefly RPC also has very high overhead for small packets
  Does this matter?
RPC Size Distribution
  Majority of RPC transfers under 200 bytes
Frequency of Remote Activity
  Most calls are to the same machine
Traditional RPC
  Most calls are small messages that take place between domains of the same machine
  Traditional RPC contains unnecessary overhead, like
    Scheduling
    Copying
    Access validation
Lightweight RPC (LRPC)
  Also written for the DEC Firefly system
  Mechanism for communication between different protection domains on the same system
  Significant performance improvements over traditional RPC
Overhead Analysis
  Theoretical minimum to invoke Null() across domains: kernel trap + context change to call and a trap + context change to return
  Theoretical minimum on the Firefly: 109 us
  Actual cost with Firefly RPC: 464 us
Sources of Overhead
  355 us added
    Stub overhead
    Message buffer overhead
      Not so much in Firefly RPC
    Message transfer and flow control
    Scheduling and abstract threads
    Context switch
Implementation of LRPC
  Similar to RPC
  Call to server is done through kernel trap
    Kernel validates the caller
  Servers export interfaces
  Clients bind to server interfaces before making a call
Binding
  Servers export interfaces through a clerk
    The clerk registers the interface
  Clients bind to the interface through a call to the kernel
  Server replies with an entry address and size of its A-stack
  Client gets a Binding Object from the kernel
Calling
  Each procedure is represented by a stub
  Client makes a call through the stub
    Manages A-stacks
    Traps to the kernel
  Kernel switches context to the server
  Server returns by its own stub
    No verification needed
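The binding and calling sequence above can be modeled with a toy kernel. This is a sketch under stated assumptions rather than the LRPC implementation: the `Kernel` class, the `Arith` interface, the `add` procedure, the integer binding key, and the 16-byte A-stack are all hypothetical, and the "context switch" is just a Python function call.

```python
import struct
from dataclasses import dataclass


@dataclass
class BindingObject:
    key: int            # capability the kernel checks on every call
    interface: str
    a_stack: bytearray  # argument stack shared by client and server


class Kernel:
    def __init__(self):
        self._exports = {}       # interface name -> (entry stubs, A-stack size)
        self._valid_keys = set()
        self._next_key = 1

    def register(self, interface, entry_stubs, a_stack_size):
        """Called by the server's clerk to export an interface."""
        self._exports[interface] = (entry_stubs, a_stack_size)

    def bind(self, interface):
        """Called by a client; returns a Binding Object with an A-stack."""
        _, size = self._exports[interface]
        key = self._next_key
        self._next_key += 1
        self._valid_keys.add(key)
        return BindingObject(key, interface, bytearray(size))

    def trap(self, binding, procedure):
        """Kernel trap on call: validate the caller, then run the server's
        entry stub (standing in for the context switch)."""
        if binding.key not in self._valid_keys:
            raise PermissionError("invalid Binding Object")
        entry_stubs, _ = self._exports[binding.interface]
        entry_stubs[procedure](binding.a_stack)
        # The return path needs no further verification.


def add_entry_stub(a_stack):
    """Server entry stub: read arguments from the A-stack, write the result."""
    a, b = struct.unpack_from("!ii", a_stack, 0)
    struct.pack_into("!i", a_stack, 0, a + b)


def add_call_stub(kernel, binding, a, b):
    """Client call stub: place arguments on the A-stack and trap."""
    struct.pack_into("!ii", binding.a_stack, 0, a, b)
    kernel.trap(binding, "add")
    return struct.unpack_from("!i", binding.a_stack, 0)[0]


kernel = Kernel()
kernel.register("Arith", {"add": add_entry_stub}, a_stack_size=16)
binding = kernel.bind("Arith")
print(add_call_stub(kernel, binding, 20, 22))  # prints 42
```

A client that fabricates its own `BindingObject` instead of calling `bind` is rejected in `trap`, which is the validation step the slides attribute to the kernel.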
Stub Generation
  Procedure representation
    Call stub for client
    Entry stub for server
  LRPC merges protocol layers
  Stub generator creates run-time stubs in assembly language
    Portability sacrificed for performance
    Falls back on Modula-2+ for complex calls
Multiple Processors
  LRPC caches domains on idle processors
  Kernel checks for an idling processor in the server domain
  If a processor is found, caller thread can execute on the idle processor without switching context
Argument Copying
  Traditional RPC copies arguments four times for intra-machine calls
    Client stub to RPC message to kernel's message to server's message to server's stack
  In many cases, LRPC needs to copy the arguments only once
    Client stub to A-stack
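The difference in copy counts can be made visible by routing every copy through a counter. The sketch below only mirrors the two paths named on the slide; the `CopyCounter` class and the function names are made up for illustration, and no real kernel or message buffers are involved.

```python
class CopyCounter:
    """Counts how many times argument bytes are copied."""

    def __init__(self):
        self.copies = 0

    def copy(self, data):
        self.copies += 1
        return bytearray(data)  # bytearray() always makes a fresh copy


def traditional_rpc_path(args, counter):
    """Client stub -> RPC message -> kernel's message
    -> server's message -> server's stack."""
    rpc_message = counter.copy(args)
    kernel_message = counter.copy(rpc_message)
    server_message = counter.copy(kernel_message)
    server_stack = counter.copy(server_message)
    return server_stack


def lrpc_path(args, counter):
    """Client stub -> A-stack, which the server reads in place."""
    a_stack = counter.copy(args)
    return a_stack


args = b"\x00\x00\x00\x2a"
traditional, lrpc = CopyCounter(), CopyCounter()
traditional_rpc_path(args, traditional)
lrpc_path(args, lrpc)
print(traditional.copies, lrpc.copies)  # prints 4 1
```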
Performance Analysis
  LRPC is roughly three times faster than traditional RPC
  Null() LRPC cost: 157 us, close to the 109 us theoretical minimum
  Additional overhead from stub generation and kernel execution
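The figures quoted in the overhead and performance slides fit together as follows. The three constants are taken from the slides; the variable names are only for this example.

```python
THEORETICAL_MIN_US = 109   # trap + context change each way
FIREFLY_RPC_NULL_US = 464  # actual Null() cost with traditional RPC
LRPC_NULL_US = 157         # Null() cost with LRPC

rpc_overhead_us = FIREFLY_RPC_NULL_US - THEORETICAL_MIN_US
lrpc_overhead_us = LRPC_NULL_US - THEORETICAL_MIN_US
speedup = FIREFLY_RPC_NULL_US / LRPC_NULL_US

print(f"Traditional RPC overhead: {rpc_overhead_us} us")
print(f"LRPC overhead: {lrpc_overhead_us} us")
print(f"LRPC speedup on Null(): {speedup:.2f}x")
```

The 355 us of added overhead in traditional RPC shrinks to 48 us in LRPC, and the ratio of 464 to 157 is about 2.96, which is the "roughly three times faster" claim.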
Single-Processor Null() LRPC
Performance Comparison
  LRPC versus traditional RPC (in us)
Multiprocessor Speedup
Inter-machine Communication
  LRPC is best for messages between domains on the same machine
  The first instruction of the LRPC stub checks if the call is cross-machine
    If so, stub branches to conventional RPC
  Larger messages are handled well; LRPC cost scales linearly with packet size, like traditional RPC
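The cross-machine check can be pictured as a single test at the top of the stub. In this sketch the `remote` flag on the binding, and both transport functions, are assumptions invented for illustration; the slides only say that the check comes first and that remote calls branch to conventional RPC.

```python
from dataclasses import dataclass


@dataclass
class Binding:
    remote: bool  # assumed flag: True when the server is on another machine


def conventional_rpc(procedure, *args):
    """Stand-in for the network RPC path."""
    return ("rpc", procedure(*args))


def lrpc(procedure, *args):
    """Stand-in for the same-machine LRPC path."""
    return ("lrpc", procedure(*args))


def call_stub(binding, procedure, *args):
    # First thing the stub does: decide whether the call leaves the machine.
    if binding.remote:
        return conventional_rpc(procedure, *args)
    return lrpc(procedure, *args)


def add(a, b):
    return a + b


print(call_stub(Binding(remote=False), add, 1, 2))  # ('lrpc', 3)
print(call_stub(Binding(remote=True), add, 1, 2))   # ('rpc', 3)
```

Putting the test first keeps the cost of supporting remote calls to one branch on the local path.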
Cost
  LRPC avoids needless scheduling, copying, and locking by integrating the client, kernel, server, and message protocols
    Abstraction is sacrificed for functionality
  RPC is built into operating systems (Linux DCE RPC, MS RPC)
Conclusion
  Firefly RPC is fast compared to most RPC implementations. LRPC is even faster. Are they fast enough?
  The performance of Firefly RPC is now good enough that programmers accept it as the standard way to communicate (1990)
  Is speed still an issue?