Improving IPC by Kernel Design By Jochen Liedtke German National Research Center for Computer Science Presented By Srinivas Sundaravaradan


Page 1: Presented By  Srinivas Sundaravaradan

Improving IPC by Kernel Design By

Jochen Liedtke

German National Research Center for Computer Science

Presented By Srinivas Sundaravaradan

Page 2: Presented By  Srinivas Sundaravaradan

MACH
  µ-kernel system based on message passing
  Over 5000 cycles to transfer a short message
  Buffering IPC

L3
  Similar to MACH
  Hardware interrupts delivered through messages
  No ports

Page 3: Presented By  Srinivas Sundaravaradan

Design Philosophy

Focus on IPC
  Any feature that will increase cost must be closely evaluated.
  When in doubt, design in favor of IPC.

Design for Performance
  A poorly performing technique is unacceptable.
  Evaluate feature cost compared to a concrete baseline.
  Aim for a concrete performance goal.

Comprehensive Design
  Consider synergistic effects of all methods and techniques.
  Cover all levels of implementation, from design to code.

Page 4: Presented By  Srinivas Sundaravaradan

Making IPC faster

Fewer
  Call / Reply & Receive Next
  Combining messages

Faster
  15 other optimizations

Architectural level
  Use redesign of L3 as opportunity to change kernel design

Page 5: Presented By  Srinivas Sundaravaradan

Methodology

Theoretical minimum
  Null message between address spaces
  Receiver is ready to receive it
  107 cycles to enter & leave kernel
  45 cycles for TLB misses
  172 cycles

Goal
  350 cycles
  Achieved 250 cycles = T

Page 6: Presented By  Srinivas Sundaravaradan

Minimize system calls

Why minimize system calls?
  60% of T

Traditional IPC
  4 system calls

Solution
  Call
  Reply & Receive next
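The saving on this slide can be sketched as a count of kernel entries per client/server round trip. This is a toy model, not L3 code: `CountingKernel`, `traditional_rpc` and `combined_rpc` are hypothetical names, and the split of the four traditional calls between client and server is an assumption based on the slide's Send/Receive labels.

```python
from collections import Counter


class CountingKernel:
    """Toy kernel that only records how often each primitive is entered."""

    def __init__(self):
        self.entries = Counter()

    def enter(self, primitive):
        self.entries[primitive] += 1

    def total(self):
        return sum(self.entries.values())


def traditional_rpc(kernel):
    # Assumed split of the four system calls in one round trip.
    kernel.enter("send")      # client: send request
    kernel.enter("receive")   # client: wait for reply
    kernel.enter("send")      # server: send reply
    kernel.enter("receive")   # server: wait for next request


def combined_rpc(kernel):
    kernel.enter("call")                    # client: send + wait for reply
    kernel.enter("reply_and_receive_next")  # server: reply + wait for next


def kernel_entries(rpc):
    """Run one round trip on a fresh kernel and return the entry count."""
    kernel = CountingKernel()
    rpc(kernel)
    return kernel.total()


print(kernel_entries(traditional_rpc), kernel_entries(combined_rpc))  # prints: 4 2
```

Each kernel entry and exit carries a fixed cost, so halving the number of entries removes that cost twice per round trip.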

Page 7: Presented By  Srinivas Sundaravaradan

Minimize system calls

[Figure: Client and Server timelines showing Blocked and Unblocked periods. Labels: Send, Receive (reply), Send (reply), Receive (next); Call, Reply and receive next, Receive.]

Page 8: Presented By  Srinivas Sundaravaradan

Complex Message

Direct String
  Data to be transferred directly from send buffer to receive buffer

Indirect String
  Location and size of data to be transferred by reference

Memory Object
  Description of a region of memory to be mapped in receiver address space (shared memory)

A Complex Message
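One possible way to picture the three parts of a complex message is as a small record type. The sketch below is illustrative only: the class and field names are hypothetical, and the assumption that a message carries exactly one direct string plus any number of indirect strings and memory objects is not stated on the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DirectString:
    data: bytes    # copied straight from send buffer to receive buffer


@dataclass
class IndirectString:
    address: int   # location of the data in the sender's address space
    size: int      # number of bytes to transfer by reference


@dataclass
class MemoryObject:
    base: int      # start of the region to map into the receiver
    size: int      # length of the region in bytes


@dataclass
class ComplexMessage:
    direct: DirectString
    indirect: List[IndirectString] = field(default_factory=list)
    objects: List[MemoryObject] = field(default_factory=list)

    def bytes_to_copy(self) -> int:
        """Bytes the kernel must copy; memory objects are mapped, not copied."""
        return len(self.direct.data) + sum(s.size for s in self.indirect)


msg = ComplexMessage(
    direct=DirectString(b"hdr"),
    indirect=[IndirectString(address=0x1000, size=100)],
    objects=[MemoryObject(base=0x4000, size=4096)],
)
print(msg.bytes_to_copy())  # prints: 103
```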

Page 9: Presented By  Srinivas Sundaravaradan

Ways of Message Transfer

Twofold Message Copy
  user space A -> kernel space -> user space B

LRPC mechanism
  share user-level memory
  secure?
  does not support variable-to-variable transfer

Page 10: Presented By  Srinivas Sundaravaradan

Temporary Mapping…

Two-copy message transfer costs 20 + 0.75n cycles

L3 copies data once to a special communication window in kernel space

Window is mapped to the receiver for the duration of the call (page directory entry)

[Figure: kernel regions of the two address spaces. Labels: copy; mapped with kernel-only permission; add mapping to space B.]
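The difference between the twofold copy and the temporary mapping can be sketched with a toy model in which a page directory is a dictionary and a mapped region is a `bytearray`. Everything here is an assumption made for illustration: the `WINDOW` slot number, the class names and the idea of aliasing a region by sharing a Python object stand in for copying a page directory entry on real hardware.

```python
WINDOW = 0xFF  # hypothetical page-directory slot reserved for the window


class AddressSpace:
    def __init__(self, name):
        self.name = name
        self.page_directory = {}  # slot -> region (bytearray stands in for frames)

    def map_region(self, slot, size):
        self.page_directory[slot] = bytearray(size)
        return self.page_directory[slot]


class Kernel:
    def __init__(self):
        self.copies = 0

    def copy(self, src, dst):
        dst[0:len(src)] = src  # same-length slice assignment, dst keeps its size
        self.copies += 1

    def two_copy_transfer(self, msg, receiver, slot):
        kernel_buffer = bytearray(len(msg))
        self.copy(msg, kernel_buffer)                            # user A -> kernel
        self.copy(kernel_buffer, receiver.page_directory[slot])  # kernel -> user B

    def temp_mapping_transfer(self, msg, sender, receiver, slot):
        # Alias the receiver's region through the window slot of the sender.
        sender.page_directory[WINDOW] = receiver.page_directory[slot]
        try:
            self.copy(msg, sender.page_directory[WINDOW])  # one copy, lands in B
        finally:
            del sender.page_directory[WINDOW]  # mapping lasts only for the call


a, b = AddressSpace("A"), AddressSpace("B")
receive_buffer = b.map_region(slot=3, size=16)
kernel = Kernel()
kernel.temp_mapping_transfer(b"hello", a, b, slot=3)
print(bytes(receive_buffer[:5]), kernel.copies)  # prints: b'hello' 1
```

In this model the message reaches the receive buffer after a single copy, and the window entry is removed again once the transfer is done.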

Page 11: Presented By  Srinivas Sundaravaradan

Temporary Mapping…

[Figure: two-level page table. Labels: top-level page table, 2nd-level tables, frames in memory.]

Page 12: Presented By  Srinivas Sundaravaradan

Temporary Mapping

Page 13: Presented By  Srinivas Sundaravaradan

Lazy Scheduling

Scheduler overhead is a significant component of IPC cost

Threads doing IPC are often moved to the wait queue only to be inserted back onto the ready queue.

Lazy scheduling
  avoids locking of queues
  avoids queue manipulation
    instruction execution
    TLB misses
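A minimal sketch of the idea, assuming the scheduler tolerates stale entries in the ready queue and removes them only when it next scans the queue. The names `Thread`, `LazyScheduler` and `queue_ops` are hypothetical; the counter exists only to show that blocking and unblocking during IPC leaves the queue untouched.

```python
from collections import deque


class Thread:
    def __init__(self, name):
        self.name = name
        self.ready = True     # state flag kept in the thread control block
        self.queued = False   # whether the thread is linked into the ready queue


class LazyScheduler:
    def __init__(self):
        self.ready_queue = deque()
        self.queue_ops = 0    # counts real queue insertions and removals

    def enqueue(self, thread):
        if not thread.queued:
            self.ready_queue.append(thread)
            thread.queued = True
            self.queue_ops += 1

    def block(self, thread):
        thread.ready = False  # lazy: flip the flag, leave the queue alone

    def unblock(self, thread):
        thread.ready = True
        self.enqueue(thread)  # no-op while the stale entry is still queued

    def pick_next(self):
        # Stale entries are dropped only here, when the scheduler scans.
        while self.ready_queue:
            thread = self.ready_queue[0]
            if thread.ready:
                return thread
            self.ready_queue.popleft()
            thread.queued = False
            self.queue_ops += 1
        return None


scheduler = LazyScheduler()
client = Thread("client")
scheduler.enqueue(client)
for _ in range(1000):         # 1000 IPC round trips: block, then unblock
    scheduler.block(client)
    scheduler.unblock(client)
print(scheduler.queue_ops)    # prints: 1
```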

Page 14: Presented By  Srinivas Sundaravaradan

Use registers for short messages

Messages are usually short!
  ack/error replies from drivers
  hardware interrupt messages

Intel 486 processor
  7 general purpose registers
  sender info, data

May not work for CPUs with fewer registers
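The choice between the register path and the memory path might look like the sketch below. The slide gives only the total of 7 general purpose registers; the number reserved for data (`DATA_REGISTERS = 2`) is a hypothetical value chosen for illustration, as are the function names. The 4-byte word size follows from the 486 being a 32-bit processor.

```python
WORD_BYTES = 4       # the 486 has 32-bit general purpose registers
DATA_REGISTERS = 2   # hypothetical: registers left over for message data


def transfer_path(message: bytes, data_registers: int = DATA_REGISTERS) -> str:
    """Pick the register path when the whole message fits in the data registers."""
    if len(message) <= data_registers * WORD_BYTES:
        return "registers"
    return "memory copy"


def pack_into_registers(message: bytes, data_registers: int = DATA_REGISTERS):
    """Pad a short message with zeros and split it into 32-bit register values."""
    capacity = data_registers * WORD_BYTES
    if len(message) > capacity:
        raise ValueError("message does not fit into the data registers")
    padded = message.ljust(capacity, b"\0")
    return [
        int.from_bytes(padded[i:i + WORD_BYTES], "little")
        for i in range(0, capacity, WORD_BYTES)
    ]


print(transfer_path(b"ok"), transfer_path(b"x" * 9))  # prints: registers memory copy
print(pack_into_registers(b"\x01\x00\x00\x00"))       # prints: [1, 0]
```

With fewer registers available for data, `data_registers` shrinks and more messages fall back to the memory path, which is the caveat the slide raises for other CPUs.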

Page 15: Presented By  Srinivas Sundaravaradan

Summary of Optimizations

Architectural
  System Calls, Messages, Direct Transfer, Strict Process Orientation, Thread Control Blocks

Algorithmic
  Thread Identifier, Virtual Queues, Timeouts/Wakeups, Lazy Scheduling, Direct Process Switch, Short messages

Interface
  Unnecessary Copies, Parameter passing

Coding
  Cache misses, TLB misses, Segment registers, General registers, Jumps and Checks, Process Switch

Page 16: Presented By  Srinivas Sundaravaradan

Results…

Page 17: Presented By  Srinivas Sundaravaradan

Results

Page 18: Presented By  Srinivas Sundaravaradan

Conclusions

L3’s message passing was 22 times faster than that of MACH

Kernel redesign focused mainly on IPC

Caveats
  Ports and Buffering
  Specific to the architecture

Page 19: Presented By  Srinivas Sundaravaradan

Thank You !