DTHREADS: Efficient Deterministic Multithreading
Tongping Liu, Charlie Curtsinger, and Emery D. Berger
Dept. of Computer Science, University of Massachusetts, Amherst
Presented by: Lokesh Gidra
Concurrent Programming is hard!
• Prone to deadlocks and race conditions.
• Thread interleavings are non-deterministic, which makes programs hard to debug!
• A Deterministic Multithreaded System (DMT) eliminates this non-determinism:
– Same program with the same input gives the same result.
– Simplifies debugging.
– Simplifies record and replay (eliminates the need to track memory operations).
– Enables multiple replicated executions for fault tolerance.
Contributions
• DTHREADS guarantees deterministic execution.
• Straightforward deployment: replaces libpthread. No recompilation required.
• Eliminates cache-line false sharing (as a side effect).
• Makes printf debugging practical!
Basic Idea
• Isolate memory accesses between different threads.
• Replace threads with processes.
– Replace pthread_create() with the clone system call.
– Memory-mapped files are used to share memory (globals and the heap).
[Figure: Heap; Thread 1; Thread 2]
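The isolation idea can be sketched in Python with two mappings of one file. This is a minimal sketch, assuming (the slide does not say so explicitly) that each process writes through a private copy-on-write mapping while a shared mapping holds the committed state. The function name `demo_isolation` and the single-page layout are illustrative, not part of Dthreads.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def demo_isolation():
    """Write through a private mapping, then publish the bytes explicitly."""
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, PAGE)  # backing file: one zero-filled page
        # Shared view: writes go through to the file (the committed state).
        shared = mmap.mmap(fd, PAGE, access=mmap.ACCESS_WRITE)
        # Private view: copy-on-write, writes stay local to this mapping.
        private = mmap.mmap(fd, PAGE, access=mmap.ACCESS_COPY)

        private[0:5] = b"hello"      # local write, not yet visible
        before = bytes(shared[0:5])  # shared view still holds zeros

        shared[0:5] = private[0:5]   # explicit "commit" of the local bytes
        after = bytes(shared[0:5])

        private.close()
        shared.close()
        return before, after
    finally:
        os.close(fd)
        os.unlink(path)
```

Until the explicit copy, the write is invisible through the shared view, which is the property that lets a runtime decide when updates become visible to other processes.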
Fence and Global Token
Commit Protocol
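The summary slide of this deck says Dthreads uses twin pages and diff-ing to commit changes. The following is a minimal sketch of that idea, assuming byte-granularity diffs and commits applied one thread at a time in token order. The helper names (`make_twin`, `diff`, `commit`, `run_demo`) and the 8-byte "page" are hypothetical.

```python
def make_twin(page):
    """Snapshot a page before it is first written (the 'twin')."""
    return bytes(page)


def diff(twin, working):
    """Return {offset: new_byte} for every byte that differs from the twin."""
    return {i: working[i] for i in range(len(twin)) if working[i] != twin[i]}


def commit(shared, twin, working):
    """Copy only the bytes this thread changed into the shared page."""
    for offset, value in diff(twin, working).items():
        shared[offset] = value


def run_demo():
    shared = bytearray(8)  # toy page of 8 bytes, all zero

    # Each thread works on a private copy and keeps an unmodified twin.
    work1 = bytearray(shared)
    twin1 = make_twin(work1)
    work2 = bytearray(shared)
    twin2 = make_twin(work2)

    work1[0] = 0xAA  # thread 1 modifies the first byte
    work2[7] = 0xBB  # thread 2 modifies the last byte of the same page

    # Commits happen one after another, in a fixed order.
    commit(shared, twin1, work1)
    commit(shared, twin2, work2)
    return bytes(shared)
```

Because each commit writes only the bytes that differ from the twin, the two threads' updates to the same page merge without overwriting each other.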
Deterministic Synchronization (Global token is the key!)
• Locks
– If held by someone else, pass the token.
– Release the token only when lock count is 0.
• Condition Variables
– pthread_cond_wait: remove the thread from the token's Q and add it to the variable's Q.
– pthread_cond_signal: remove the first thread in the variable's Q and add it to the token's Q.
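The lock and condition-variable rules above can be modeled as queue operations. This is a simplified, single-process sketch, assuming the thread at the head of the token queue holds the token. The class `TokenScheduler` and its method names are hypothetical, and a real pthread_cond_wait also releases the associated mutex, which is omitted here.

```python
from collections import deque


class TokenScheduler:
    """Toy model: the thread at the head of token_q holds the global token."""

    def __init__(self, thread_ids):
        self.token_q = deque(thread_ids)
        self.lock_owner = {}                        # lock name -> owner
        self.locks_held = {t: 0 for t in thread_ids}
        self.cond_q = {}                            # cond name -> waiters

    def holder(self):
        return self.token_q[0]

    def pass_token(self):
        # Move the current holder to the tail; the next thread gets the token.
        self.token_q.rotate(-1)

    def lock(self, tid, name):
        assert tid == self.holder(), "only the token holder may lock"
        owner = self.lock_owner.get(name)
        if owner is not None and owner != tid:
            self.pass_token()        # held by someone else: pass the token
            return False
        self.lock_owner[name] = tid
        self.locks_held[tid] += 1
        return True

    def unlock(self, tid, name):
        assert self.lock_owner.get(name) == tid
        del self.lock_owner[name]
        self.locks_held[tid] -= 1
        if self.locks_held[tid] == 0:
            self.pass_token()        # release the token only at lock count 0

    def cond_wait(self, tid, cond):
        # Remove from the token's Q and add to the variable's Q.
        self.token_q.remove(tid)
        self.cond_q.setdefault(cond, deque()).append(tid)

    def cond_signal(self, cond):
        # Move the first waiter from the variable's Q to the token's Q.
        waiters = self.cond_q.get(cond)
        if waiters:
            self.token_q.append(waiters.popleft())
```

Since every transition depends only on queue order, replaying the same sequence of calls always yields the same token order.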
Contd…
• Barriers (similar to condition variables)
– If not last to enter: move self from token Q to barrier Q.
– Otherwise, move all from barrier Q to token Q.
• Thread Creation
– Child: place on token Q; wait for the parallel phase.
• Thread Exit/Cancellation
– Remove from Q, call pthread_exit()/kill().
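The barrier, creation, and exit rules can be sketched as queue moves in the same spirit. This is a toy model under stated assumptions: the function names are hypothetical, the number of barrier participants is passed in as `parties`, and the parallel-phase wait and the actual pthread_exit()/kill() calls are not modeled.

```python
from collections import deque


def barrier_wait(tid, token_q, barrier_q, parties):
    """Return True if tid was the last thread to reach the barrier."""
    if len(barrier_q) + 1 < parties:
        # Not last: move self from the token Q to the barrier Q.
        token_q.remove(tid)
        barrier_q.append(tid)
        return False
    # Last to enter: move every waiter from the barrier Q to the token Q.
    while barrier_q:
        token_q.append(barrier_q.popleft())
    return True


def thread_create(token_q, child):
    """Place the new child at the tail of the token Q."""
    token_q.append(child)


def thread_exit(token_q, tid):
    """Remove an exiting or cancelled thread from the token Q."""
    token_q.remove(tid)
```

For example, with three threads queued as 1, 2, 3 and each entering the barrier in turn, the token queue ends up as 3, 1, 2 on every run.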
Memory Allocation and OS Support
• Assign sub-heap to each thread using deterministic thread index.
• Superblocks are allocated under locks, which makes their allocation deterministic.
• Intercepts system calls which affect program execution (like sigwait).
• Intercepts read/write system calls: touch pages for COW, to avoid segfault.
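The per-thread sub-heap idea can be illustrated with a toy bump allocator. This is only a sketch: the base address, sub-heap size, 16-byte alignment, and the names `DeterministicHeap` and `allocate_in_order` are all assumptions made for illustration, not details taken from Dthreads.

```python
SUBHEAP_SIZE = 1 << 20  # hypothetical size of each thread's sub-heap


class DeterministicHeap:
    """Toy allocator: addresses depend only on thread index and call order."""

    def __init__(self, base=0x10000000):  # hypothetical base address
        self.base = base
        self.next_index = 0
        self.offsets = {}

    def register_thread(self):
        # Indices are handed out in thread-creation order.
        index = self.next_index
        self.next_index += 1
        self.offsets[index] = 0
        return index

    def malloc(self, index, size):
        size = (size + 15) & ~15  # round up to 16 bytes (assumed alignment)
        offset = self.offsets[index]
        if offset + size > SUBHEAP_SIZE:
            raise MemoryError("sub-heap exhausted")
        self.offsets[index] = offset + size
        return self.base + index * SUBHEAP_SIZE + offset


def allocate_in_order(order):
    """Run (thread_index, size) requests in the given interleaving."""
    heap = DeterministicHeap()
    heap.register_thread()
    heap.register_thread()
    addresses = {0: [], 1: []}
    for index, size in order:
        addresses[index].append(heap.malloc(index, size))
    return addresses
```

Two different interleavings of the same per-thread requests, such as `[(0, 24), (1, 100), (0, 8)]` and `[(1, 100), (0, 24), (0, 8)]`, return identical addresses for each thread, because no thread's allocations depend on what the other thread did.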
Performance
• On an 8-core machine with 16GB RAM, 4MB L2.
• Benchmarks from the PARSEC and Phoenix suites.
For 9 of 14 benchmarks, Dthreads runs nearly as fast as or faster than pthreads, while providing determinism.
Scalability
• Scales nearly as well as or better than pthreads.
• Scales almost always as well as or better than CoreDet.
Limitations
• Incurs substantial overhead for apps with a large number of:
– short-lived transactions.
– modified pages per transaction.
• No control over external non-determinism.
• Apps using ad hoc synchronization are not supported.
• Sharing of stack variables is not supported.
• Increases the program's memory footprint.
• Will perform poorly if #threads > #cores.
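One plausible reading of the ad hoc synchronization limitation, assuming that a thread's writes become visible to others only at synchronization points, is that a thread spinning on a plain flag never observes the update. The toy model below illustrates this. The class `IsolatedThread` and the function `spin_until_ready` are hypothetical, and the spin is bounded so the example terminates.

```python
class IsolatedThread:
    """Toy thread with a private view of shared state."""

    def __init__(self, shared):
        self.shared = shared
        self.private = dict(shared)  # private copy of the shared state
        self.dirty = set()

    def write(self, key, value):
        self.private[key] = value    # local only until a sync point
        self.dirty.add(key)

    def read(self, key):
        return self.private[key]

    def sync_point(self):
        # Commit local writes, then refresh the private view.
        for key in self.dirty:
            self.shared[key] = self.private[key]
        self.dirty.clear()
        self.private = dict(self.shared)


def spin_until_ready(thread, max_spins):
    """Ad hoc synchronization: poll a flag with no synchronization call."""
    for _ in range(max_spins):
        if thread.read("ready"):
            return True
    return False
```

If a writer sets `ready` without reaching a sync point, the reader's spin loop times out. Only after both threads pass through `sync_point()` does the reader see the flag, and a spinning thread never gets there on its own.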
Personal Observations (side-effects on NUMA systems)
• Substantially reduces TLB miss cost:
– For 64-bit apps, one TLB miss costs ~1500 cycles with pthreads and ~500 cycles with Dthreads.
• Diff-ing will be too expensive:
– 4K as compared to just a few cache lines.
Take Away
• Deterministic Multithreaded Systems are good.
• Dthreads: an easy-to-deploy DMT system.
• Supports all pthread APIs.
• Replaces threads with processes for memory isolation.
• Uses twin pages and diff-ing to commit changes.
• Avoids cache-line false sharing.
• Good for apps with fewer transactions.
– Or, can we say for scalable apps?
• Doesn't support ad hoc synchronization.
Optimizations
• Lazy commit
• Lazy twin creation and diff elimination
• Single-threaded execution
• Lock ownership
• Parallelization