Rust as a Language for High Performance Garbage Collector Implementation
Yi Lin, Stephen M. Blackburn, Antony L. Hosking, Michael Norrish

Implementation Garbage Collector for High Performance Rust ...tau/lecture/... · Rust: Introduction Open source, sponsored by Mozilla Research Designed to be safe, concurrent and

Rust as a Language for High Performance Garbage Collector Implementation

Yi Lin, Stephen M. Blackburn, Antony L. Hosking, Michael Norrish

1 Introduction

Motivation for Thesis

▪ A fast yet robust garbage collector is key to garbage-collected language runtimes

▫ Manipulates raw memory with optimized code

▫ Rich in concurrency and thread parallelism

▫ Prone to memory bugs and race conditions

▪ The importance of high performance encourages the use of languages such as C / C++

▫ Weak type system, lack of memory safety, and lack of integrated support for concurrency

▫ Developers are solely responsible for memory and thread safety

▪ Rust is a systems programming language that runs blazingly fast, prevents segfaults, and guarantees thread safety

Overview of Thesis

▪ Overview of Rust

▪ Implementing an Immix Garbage Collector in Rust

▫ Overview of an Immix Garbage Collector

▫ Distinct Elements of Rust

▫ Abusing Rust

▪ Evaluation of Garbage Collector using Rust

▫ Extent of utilizing Rust’s safety

▫ Comparing Immix in Rust and Immix in C

▫ Comparing Immix in Rust and BDW in C

▪ Conclusion

2 Overview of Rust

Rust: Introduction

▪ Open source, sponsored by Mozilla Research

▪ Designed to be safe, concurrent and practical

▪ Syntactically similar to C++, but designed for better memory safety while maintaining performance

▪ First place for “Most loved programming language” in the Stack Overflow Developer Survey in 2016 and 2017

Rust: Ownership

▪ Grants a variable unique ownership of the value it is bound to

▪ Unbound variables are not allowed

▪ Rebinding involves move semantics

▪ Ownership expires and resources are reclaimed when the variable goes out of scope

▪ Key concept for memory safety
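
The ownership rules above can be illustrated with a minimal sketch. The function names `consume` and `move_demo` are hypothetical and chosen only for this example.

```rust
/// Takes ownership of the vector; the caller can no longer use it afterwards.
fn consume(v: Vec<u32>) -> usize {
    v.len()
} // `v` goes out of scope here, so its heap buffer is reclaimed

/// Rebinding moves ownership: after `let b = a`, only `b` may be used.
fn move_demo() -> usize {
    let a = vec![1, 2, 3];
    let b = a; // ownership moves from `a` to `b`
    // println!("{:?}", a); // would not compile: use of moved value `a`
    consume(b) // ownership moves again, into `consume`
}
```

Calling `move_demo()` returns the length of the vector; the commented-out line shows the kind of use-after-move the compiler rejects.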

Rust: References

▪ Borrow a reference to access objects

▫ Less expensive: the compiler does not need extra code for destruction on expiry

▪ Immutable vs mutable

▫ Either one or more coexisting immutable references

▫ Or exactly one mutable reference and no immutable references

▪ Ownership cannot be moved while borrowed

▫ Eliminates data races due to mutual exclusivity of mutable (write) and immutable (read) references
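
A small sketch of the borrowing rules above, with hypothetical helper names (`sum`, `push_twice`, `borrow_demo`):

```rust
/// Reads through an immutable (shared) borrow.
fn sum(values: &[i32]) -> i32 {
    values.iter().sum()
}

/// Writes through a mutable (exclusive) borrow.
fn push_twice(values: &mut Vec<i32>, x: i32) {
    values.push(x);
    values.push(x);
}

fn borrow_demo() -> i32 {
    let mut data = vec![1, 2, 3];

    // Any number of immutable references may coexist.
    let r1 = &data;
    let r2 = &data;
    // data.push(4); // would not compile: `data` is still immutably borrowed below
    let twice = sum(r1) + sum(r2);
    assert_eq!(twice, 12);

    // Once the shared borrows are no longer used, one mutable borrow is allowed.
    push_twice(&mut data, 10);
    sum(&data)
}
```

The value is never moved or destroyed by either helper; only the owner `data` is responsible for freeing the vector.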

Rust: Data Guarantees

▪ Provides various wrapper types with different guarantees and trade-offs

▪ Box<T>: Pointer which owns a piece of heap-allocated data

▪ Arc<T>: Provides an atomically reference-counted shared pointer to data

▫ All data stays accessible until every Arc<T> goes out of scope

▪ Arc<Mutex<T>>: Provides a mutually exclusive lock and allows the lock to be shared across threads
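
As a rough illustration of `Arc<Mutex<T>>`, the sketch below shares one counter among several threads. The function name `shared_counter` is an assumption made for the example.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns `threads` workers that each add `increments` to one shared counter.
fn shared_counter(threads: usize, increments: usize) -> usize {
    // Arc gives shared ownership across threads; Mutex gives exclusive access.
    let counter = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter); // bumps the reference count
            thread::spawn(move || {
                for _ in 0..increments {
                    *counter.lock().unwrap() += 1; // lock is released at end of statement
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}
```

The counter stays alive until the last `Arc` clone is dropped, which matches the "accessible until every Arc<T> goes out of scope" guarantee above.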

Rust: Unsafe Code

▪ Rust’s safety sometimes becomes too restrictive or expensive

▪ Allows for unsafe code

▫ Raw pointers (e.g. *mut T)

▫ Forcefully allowing sharing data across threads (e.g. unsafe impl Sync for T {})

▫ Intrinsic functions (e.g. mem::transmute())

▫ External functions from other languages (e.g. libc::malloc())

▪ Alerts programmers by requiring unsafe code to be marked with unsafe
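
A minimal sketch of the raw-pointer case follows. The function name `raw_pointer_demo` is hypothetical; only the dereferences need the `unsafe` marker, not the creation of the pointer.

```rust
/// Writes to two array slots through a raw pointer.
fn raw_pointer_demo() -> [u32; 2] {
    let mut cells = [1u32, 2u32];
    let p: *mut u32 = cells.as_mut_ptr(); // creating a raw pointer is safe

    // Dereferencing and offsetting a raw pointer must be marked `unsafe`:
    // the programmer, not the compiler, vouches that both slots are in bounds.
    unsafe {
        *p = 10;
        *p.add(1) = 20;
    }
    cells
}
```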

3 Implementing an Immix Garbage Collector in Rust

Overview of Immix

▪ High performance garbage collector

▪ Mark-region Collector (Naive)

▫ Memory divided into fixed size regions

▫ Each region is either “free” or “unavailable”

▫ Allocates into free regions until all are used

▫ Triggers collection and marks free regions

▪ Optimizes Mark-region by operating at 2 levels

▫ Coarse grained blocks

▫ Fine grained lines

▪ Uses Opportunistic Defragmentation

▫ Identify source and target blocks

▫ Evacuate objects from source to target

▪ http://www.cs.utexas.edu/users/speedway/DaCapo/papers/immix-pldi-2008.pdf

Immix: Optimized Mark-Region

▪ Initial Allocation: All blocks are initially empty. The allocator obtains an empty block and allocates objects into it. When the block is exhausted, it requests a new block. This repeats until the heap is exhausted, which triggers a collection.

▪ Identification: The collector traces the object graph and marks objects and lines in a line map

▪ Reclamation: Performs a coarse-grained sweep, identifying free blocks and lines. Returns free blocks to the global pool and recycles semi-free blocks

▪ Steady State Allocation: Resumes allocation into recycled blocks, skipping over full and empty blocks. The allocator scans the line map to find holes in a recycled block and allocates objects into them until the recycled block is exhausted.
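
The hole search in steady-state allocation can be sketched as a scan over a per-block line map. This is a simplified illustration, not the deck's implementation: `LineMark`, `next_hole`, and `hole_bytes` are hypothetical names, and the 256-byte line size is taken from the Line Mark Table slide.

```rust
const LINE_SIZE: usize = 256; // bytes per line

#[derive(Clone, Copy, PartialEq, Debug)]
enum LineMark {
    Free,
    Live,
}

/// Finds the first hole (a run of free lines) at or after line `from`.
/// Returns the half-open line range `(start, end)`, or `None` if no free line remains.
fn next_hole(lines: &[LineMark], from: usize) -> Option<(usize, usize)> {
    let start = (from..lines.len()).find(|&i| lines[i] == LineMark::Free)?;
    let end = (start..lines.len())
        .find(|&i| lines[i] != LineMark::Free)
        .unwrap_or(lines.len());
    Some((start, end))
}

/// Number of bytes a bump allocator could use inside a hole.
fn hole_bytes(hole: (usize, usize)) -> usize {
    (hole.1 - hole.0) * LINE_SIZE
}
```

An allocator would bump-allocate inside the returned range and call `next_hole` again from `end` once the hole is full; when it returns `None`, the recycled block is exhausted.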

Immix: Opportunistic Defragmentation

▪ Identify source and target blocks

▪ Evacuate objects from source to target

[Figure: target block and source block]

▪ Single pass to mark and copy

Utilizing Rust: Goals for Immix

▪ Use Immix as proof of concept implementation

▫ High-performance garbage collector

▫ Interesting characteristics beyond a simple mark-sweep or copying collector

▫ Well-documented publicly available reference

▪ Three key principles:

▫ Collector must be high performance

▫ Do not use unsafe code unless unavoidable

▫ Do not modify the Rust language in any way

▪ Four distinct elements in using Rust

▫ Encapsulating Address Types

▫ Ownership of Memory Blocks

▫ Globally Accessible Per-Thread State

▫ Library-Supported Parallelism

Encapsulating Address Types (1)

▪ Address: Arbitrary location in the memory space managed by the GC (address arithmetic is necessary)

▪ Object Reference: Maps directly to a language-level object in raw memory

▪ Important to abstract over both raw addresses and references to user-level objects

▫ Offers type safety and disambiguation

▫ Differentiate addresses and object references

■ Object reference → Address : Valid

■ Address → Object reference : Unsafe

▪ Must be efficient in space and time

▫ Used pervasively in GC implementation

Encapsulating Address Types (2)

▪ Abstract over the word-width integer, usize

▫ Disables operations on the inner type

▫ New operations on the abstract type

▫ No overhead in type size

▫ Static methods can be marked with #[inline(always)] to remove call overhead

▪ Restrict creation of an Address to construction from a raw pointer or from an existing Address

▪ Exception for Address::zero()

▫ Initial value for fields of type Address within other structs

▫ Safer Alternative: Option<Address>

■ Performance overhead (4%)

▪ ObjectReference is similar to Address

▫ Access to per-object memory manager metadata

▫ An ObjectReference can always be safely cast to an Address, but not vice versa
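
A minimal sketch of such a newtype over `usize` is shown below. Only `Address`, `ObjectReference`, and `Address::zero()` come from the slides; the remaining method names (`plus`, `diff`, `is_zero`, `from_ptr`, `to_address`, `to_object_reference`) are assumptions made for illustration.

```rust
/// An arbitrary location in managed memory; same size as a machine word.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct Address(usize);

impl Address {
    /// The only constructors: from a raw pointer, from another Address, or zero.
    #[inline(always)]
    fn from_ptr<T>(ptr: *const T) -> Address {
        Address(ptr as usize)
    }
    #[inline(always)]
    fn zero() -> Address {
        Address(0)
    }
    #[inline(always)]
    fn is_zero(self) -> bool {
        self.0 == 0
    }
    /// Address arithmetic is exposed only through explicit methods.
    #[inline(always)]
    fn plus(self, bytes: usize) -> Address {
        Address(self.0 + bytes)
    }
    #[inline(always)]
    fn diff(self, other: Address) -> usize {
        self.0 - other.0
    }
    /// Not every address holds an object, so this direction is `unsafe`.
    #[inline(always)]
    unsafe fn to_object_reference(self) -> ObjectReference {
        ObjectReference(self.0)
    }
}

/// A reference to a language-level object.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct ObjectReference(usize);

impl ObjectReference {
    /// Every object lives at some address, so this direction is always safe.
    #[inline(always)]
    fn to_address(self) -> Address {
        Address(self.0)
    }
}
```

Because the wrapped `usize` is private to the type's methods, ordinary integer operators such as `+` or `*` are unavailable on `Address` unless explicitly provided.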

Ownership of Memory Blocks

▪ The memory manager must correctly hand out raw memory blocks to thread-local allocators, ensuring exclusive ownership of any given raw block

▪ Guarantee each block is either usable, used, or being allocated into by a unique thread

▫ Create Block objects

▪ The global memory pool owns the Blocks

▫ Arranges them into a list of usable Blocks and a list of used Blocks

▪ When an allocator attempts to allocate

▫ Acquires ownership of a Block from the usable list

▫ Gets the memory address and allocation context from the Block

▫ Allocates into the corresponding memory

▪ When a thread-local memory Block is full

▫ Block is returned to the used list

▫ Waits there for collection

▫ Moved to the usable list if freed
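
The ownership hand-off can be sketched with move semantics: a `Block` that is not `Clone` can be held by only one party at a time. All names other than `Block` (`BlockPool`, `acquire`, `retire`, `recycle_all`) are hypothetical, and the sketch treats every used block as freed at collection time, which a real collector would not do.

```rust
use std::sync::Mutex;

/// A raw memory block, modelled as an address range with a bump cursor.
/// Not `Clone`: whoever holds a `Block` owns it exclusively.
struct Block {
    start: usize,
    cursor: usize,
    limit: usize,
}

impl Block {
    /// Bump-allocates `size` bytes, or returns `None` when the block is full.
    fn alloc(&mut self, size: usize) -> Option<usize> {
        if self.cursor + size > self.limit {
            return None;
        }
        let addr = self.cursor;
        self.cursor += size;
        Some(addr)
    }
}

/// Global pool owning every block that is not currently held by an allocator.
struct BlockPool {
    usable: Mutex<Vec<Block>>,
    used: Mutex<Vec<Block>>,
}

impl BlockPool {
    fn new(heap_start: usize, block_size: usize, blocks: usize) -> BlockPool {
        let usable: Vec<Block> = (0..blocks)
            .map(|i| {
                let start = heap_start + i * block_size;
                Block { start, cursor: start, limit: start + block_size }
            })
            .collect();
        BlockPool { usable: Mutex::new(usable), used: Mutex::new(Vec::new()) }
    }

    /// Moves a block out of the usable list; the caller becomes its sole owner.
    fn acquire(&self) -> Option<Block> {
        self.usable.lock().unwrap().pop()
    }

    /// Moves a full block back into the pool, where it waits for collection.
    fn retire(&self, block: Block) {
        self.used.lock().unwrap().push(block);
    }

    /// Simplification: treats every used block as freed and makes it usable again.
    fn recycle_all(&self) {
        let mut used = self.used.lock().unwrap();
        let mut usable = self.usable.lock().unwrap();
        for mut block in used.drain(..) {
            block.cursor = block.start;
            usable.push(block);
        }
    }
}
```

After `retire(block)` the allocator can no longer touch `block`, because the value has been moved; the compiler enforces the "usable, used, or held by one thread" invariant without runtime checks.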

Globally Accessible Per-Thread State

▪ A thread-local allocator avoids costly synchronization due to mutual exclusion among allocators

▪ However, some parts of the thread-local data structure may be shared at collection time

▫ e.g. Allocators are told to yield by the collector thread

▫ Rust does not allow for mixed ownership

▪ Break the per-thread Allocator into 2 parts

▫ Thread-local Part:

■ Data that is accessible strictly within the current thread

■ Arc reference to its global part

▫ Global Part:

■ Safe wrapper for mutable fields

■ Allows shared per-thread data to be safely accessed globally
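
One way to picture the split is sketched below. The type and field names (`AllocatorLocal`, `AllocatorGlobal`, `should_yield`) are assumptions, and an `AtomicBool` stands in for the "safe wrapper for mutable fields".

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Global part: the only state other threads (e.g. the collector) may touch.
struct AllocatorGlobal {
    should_yield: AtomicBool,
}

/// Thread-local part: bump-pointer state plus an Arc to its global part.
struct AllocatorLocal {
    cursor: usize,
    limit: usize,
    global: Arc<AllocatorGlobal>,
}

impl AllocatorLocal {
    /// Returns the local part and a second handle to the shared global part.
    fn new(start: usize, limit: usize) -> (AllocatorLocal, Arc<AllocatorGlobal>) {
        let global = Arc::new(AllocatorGlobal { should_yield: AtomicBool::new(false) });
        let local = AllocatorLocal { cursor: start, limit, global: Arc::clone(&global) };
        (local, global)
    }

    /// Fast path touches only thread-local fields, apart from one atomic load.
    /// Returns `None` when asked to yield (a real allocator would block instead)
    /// or when the region is exhausted.
    fn alloc(&mut self, size: usize) -> Option<usize> {
        if self.global.should_yield.load(Ordering::Acquire) {
            return None;
        }
        if self.cursor + size > self.limit {
            return None;
        }
        let addr = self.cursor;
        self.cursor += size;
        Some(addr)
    }
}
```

The collector keeps the `Arc<AllocatorGlobal>` handle and sets `should_yield`; it never needs, and cannot obtain, access to `cursor` or `limit`.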

Library-Supported Parallelism

▪ Efficiency depends on the implementation of fast, correct, parallel work queues (mark stack) for pending work

▫ Thread finds new marking work → adds object reference to work queue

▫ Thread needs work → takes from work queue

▪ Safe abstractions from standard and external libraries in Rust

▫ std::sync::mpsc

■ Multiple-producer single-consumer FIFO queue

▫ crossbeam::sync::chase_lev

■ Work-stealing deque, multiple stealers & single worker

▪ Parallel collector starts single-threaded and works on a local queue

▪ It then turns into a controller with multiple stealer collectors

▪ Controller creates an asynchronous mpsc channel and a shared deque

▫ Controller keeps the channel’s receiver end and the deque’s worker end

▪ Controller is responsible for receiving object references from stealer threads and pushing them onto the shared deque

▪ Stealers steal work from the deque, perform mark and trace, then push references to the local queue (for thread-local tracing) or the global deque (if the local queue is full)
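
The controller/stealer arrangement can be sketched with the standard library alone. This is an illustration under several assumptions, not the deck's code: a `Mutex<VecDeque>` stands in for the crossbeam Chase-Lev deque, the object graph is a plain adjacency list, and the names `parallel_mark`, `LOCAL_CAPACITY`, and `pending` are hypothetical. The `pending` counter (objects marked but not yet traced) is one simple way to detect termination.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

const LOCAL_CAPACITY: usize = 4; // local queue size before overflowing to the controller

/// Marks every object reachable from `roots`; `graph[i]` lists the children of object `i`.
fn parallel_mark(graph: Arc<Vec<Vec<usize>>>, roots: &[usize], stealers: usize) -> Vec<bool> {
    assert!(stealers > 0);
    let marks: Arc<Vec<AtomicBool>> =
        Arc::new((0..graph.len()).map(|_| AtomicBool::new(false)).collect());
    let deque: Arc<Mutex<VecDeque<usize>>> = Arc::new(Mutex::new(VecDeque::new()));
    let pending = Arc::new(AtomicUsize::new(0)); // marked but not yet traced
    let (tx, rx) = mpsc::channel::<usize>();

    // The controller starts single-threaded: mark the roots and seed the shared deque.
    for &root in roots {
        if !marks[root].swap(true, Ordering::AcqRel) {
            pending.fetch_add(1, Ordering::SeqCst);
            deque.lock().unwrap().push_back(root);
        }
    }

    let handles: Vec<_> = (0..stealers)
        .map(|_| {
            let (graph, marks, deque, pending, tx) =
                (graph.clone(), marks.clone(), deque.clone(), pending.clone(), tx.clone());
            thread::spawn(move || {
                let mut local: Vec<usize> = Vec::new();
                loop {
                    // Prefer local work; otherwise steal from the shared deque.
                    let next = local.pop().or_else(|| deque.lock().unwrap().pop_front());
                    match next {
                        Some(obj) => {
                            for &child in &graph[obj] {
                                if !marks[child].swap(true, Ordering::AcqRel) {
                                    pending.fetch_add(1, Ordering::SeqCst);
                                    if local.len() < LOCAL_CAPACITY {
                                        local.push(child);
                                    } else {
                                        tx.send(child).unwrap(); // overflow goes to the controller
                                    }
                                }
                            }
                            pending.fetch_sub(1, Ordering::SeqCst);
                        }
                        None if pending.load(Ordering::SeqCst) == 0 => break,
                        None => thread::yield_now(),
                    }
                }
            })
        })
        .collect();
    drop(tx); // the receive loop ends once every stealer has exited

    // Controller: forward references received from stealers onto the shared deque.
    for obj in rx {
        deque.lock().unwrap().push_back(obj);
    }
    for handle in handles {
        handle.join().unwrap();
    }
    marks.iter().map(|m| m.load(Ordering::SeqCst)).collect()
}
```

A mutex-protected queue serializes all stealers, so this sketch shows the data flow only; a lock-free work-stealing deque is what makes the approach scale.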

Abusing Rust

▪ The safety model is too restrictive at times, so unsafe code must be used

▪ Implementing the Line Mark Table

▫ Remembers the state of every line in memory

▫ One unsigned byte (u8) for every 256 bytes of memory

▪ Allocation: Multiple allocators may access the line mark table

▫ A Rust array of u8 disallows concurrent writing

▪ Collection: Sets lines to live by atomically storing to a byte

▫ At the time, Rust did not support atomic unsigned bytes

▪ Workaround: Generalize the Line Mark Table as AddressMapTable

▪ Wrap the unsafe code into the impl of AddressMapTable

▪ Rely on the compiler to generate an x86 byte store, which is atomic
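
For comparison, newer Rust releases (1.34 onward) ship `std::sync::atomic::AtomicU8`, which the slides' implementation could not rely on. The sketch below uses it instead of the unsafe byte store described above, so it is a different technique from the deck's workaround. Only the name `AddressMapTable` comes from the slides; the fields and methods are assumptions.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

const LOG_BYTES_PER_LINE: usize = 8; // 2^8 = 256 bytes of memory per table entry

/// Maps every 256-byte line of a memory range to one atomically accessed byte.
struct AddressMapTable {
    start: usize,
    marks: Vec<AtomicU8>,
}

impl AddressMapTable {
    fn new(start: usize, len_bytes: usize) -> AddressMapTable {
        // Round up so a partially covered trailing line still gets an entry.
        let lines = (len_bytes + (1 << LOG_BYTES_PER_LINE) - 1) >> LOG_BYTES_PER_LINE;
        AddressMapTable { start, marks: (0..lines).map(|_| AtomicU8::new(0)).collect() }
    }

    fn index(&self, addr: usize) -> usize {
        (addr - self.start) >> LOG_BYTES_PER_LINE
    }

    /// Takes `&self`, so several threads may mark lines concurrently.
    fn set(&self, addr: usize, value: u8) {
        self.marks[self.index(addr)].store(value, Ordering::Relaxed);
    }

    fn get(&self, addr: usize) -> u8 {
        self.marks[self.index(addr)].load(Ordering::Relaxed)
    }
}
```

With relaxed ordering, an atomic byte store typically compiles to a plain byte store on x86, which is the same instruction the slide relies on, but here the atomicity is guaranteed by the type rather than by the target architecture.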

4 Evaluation of Garbage Collector in Rust

Utilization of Rust’s Safety

▪ 58 lines of unsafe code out of 1449 lines

▪ Mainly from:

▫ Required access to raw memory

▫ Work around Rust’s restricted semantics

▪ 96% of the implementation is safe

▪ “While Garbage Collectors are usually considered low-level modules that operate heavily on raw memory, the vast majority of its code can in fact be safe and can benefit from the implementation language if the language offers safety”

Immix in Rust vs Immix in C

▪ Three performance-critical paths of the collector are run single-threaded as micro benchmarks

▫ Allocation, Object Marking, Object Tracing

▪ 50 million objects of 24 bytes each (1200 MB of heap memory)

▪ 2000 MB of memory, so that tracing and collection occur only when triggered (no spontaneous collection)

▪ The Rust implementation matches the performance of the C implementation across all micro benchmarks

Immix in Rust vs Immix in C

▪ Library-based Parallel Mark and Trace

▫ High-level approach is also performant

▪ 1 Worker: Very large overhead (716 ms vs 605 ms)

▫ Sends object references to the global deque through an asynchronous channel and steals from the shared deque when the local queue is empty

▪ 2-3 Workers: Satisfactory scaling

▪ 4+ Workers: Scaling falls off slightly

▫ Workers share the resources of the same core once every core hosts 1 worker

▫ One central controller becomes a performance bottleneck

Immix in Rust vs BDW in C

▪ Using gcbench & mt-gcbench micro benchmarks

▫ Thread-local allocators and parallel marking with 8 GC threads on BDW

▪ Bump pointer allocators (Immix) generally outperform free list allocators (BDW)

▫ The Immix implementation is conservative with stacks but precise with the heap, while BDW is conservative with both

▫ The Immix implementation presumes a specified heap size with contiguous memory space, while BDW allows dynamic growing of a discontiguous heap

5 Conclusion

References

▪ http://users.cecs.anu.edu.au/~steveb/downloads/pdf/rust-ismm-2016.pdf

▪ http://www.cs.utexas.edu/users/speedway/DaCapo/papers/immix-pldi-2008.pdf

▪ http://slideplayer.com/slide/8827713/