University of Michigan, Electrical Engineering and Computer Science
Parallelizing Sequential Applications on Commodity Hardware Using a Low-Cost Software Transactional Memory
Mojtaba Mehrara, Jeff Hao, Po-Chun Hsu, Scott Mahlke
Advanced Computer Architecture Lab, University of Michigan
Multicore Architectures
• Industry-wide move to multicore
– Higher throughput
– More power efficient
• Great for parallel programs
• Sequential programs see little benefit
[Figure: example multicore chips: Intel 4-core Nehalem, AMD 4-core Shanghai, Sun Niagara 2, IBM Cell]
Loop Parallelization
[Figure: a parallelizable loop over i = 0-39 is split into two chunks, i = 0-19 and i = 20-39, that run on Core 0 and Core 1 [Zhong ‘08].]
No cross-iteration register or memory dependences
Bad news: limited number of parallel loops in general-purpose applications
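A minimal sketch of the chunking shown on this slide, assuming a loop body with no cross-iteration dependences. The names `body`, `run_chunk`, and `parallel_loop` are illustrative and do not come from the talk, and Python threads are used only to show the partitioning, not to measure speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def body(i):
    # Hypothetical loop body: iteration i depends on nothing but i itself.
    return i * i

def run_chunk(bounds):
    lo, hi = bounds
    return [body(i) for i in range(lo, hi)]

def parallel_loop(n, num_cores):
    # Split [0, n) into contiguous chunks, one per core
    # (i = 0-19 and i = 20-39 for n = 40 and two cores).
    step = (n + num_cores - 1) // num_cores
    chunks = [(lo, min(lo + step, n)) for lo in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=num_cores) as pool:
        parts = list(pool.map(run_chunk, chunks))  # map keeps chunk order
    return chunks, [x for part in parts for x in part]

chunks, result = parallel_loop(40, 2)
```

Because no iteration reads what another writes, the chunks can run in any order and the concatenated result matches the sequential loop.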
Loop Parallelization
[Figure: SPECfp chart [Zhong ‘08]]
Speculative Loop Parallelization
[Figure: a speculatively parallelizable loop over i = 0-39 is divided into loop chunks i = 0-9, 10-19, 20-29, and 30-39, distributed across Core 0 and Core 1. Each chunk accesses memory through a pointer whose address is unresolvable statically.]
Speculative Loop Parallelization
Supporting Thread Level Speculation
• Execution of speculative loops requires
– Conflict detection
– Rollback mechanism
• Speculation can be supported by transactional memory
– Software is slow
– Hardware needs complex structures
• Previous TLS work requires hardware support
– Hydra [Hammond ‘98], Stampede [Steffan ‘98], POSH [Liu ‘06]
Objectives
• Challenge
– Can we get speedup while supporting speculative loop parallelization without additional hardware?
• Build a specialized software system
– Provide functionality needed for speculation with software transactional memory
– Leverage existing loop parallelization framework from [Zhong ‘08]
– Tightly couple STM with compiler to ensure low overhead
Traditional STM Execution Flow
[Figure: traditional STM execution flow. Start TX, Execute TX (building RdSet and WrSet), End TX, then TX Commit: Consistency Check, followed by either Abort or Commit with Writeback of WrSet to Memory.]
High overhead: validating RdSet
High overhead: global locking
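The flow on this slide can be sketched as follows. This is a simplified illustration, not the TL2 algorithm or any real library: the `Memory` and `Transaction` classes, the per-address version numbers, and the single commit lock are all assumptions made for the example. It shows the two overheads named above: every RdSet entry is validated at commit, and the whole commit runs under one global lock.

```python
import threading

class Memory:
    """Shared memory where each address has a value and a version (assumed layout)."""
    def __init__(self, size):
        self.values = [0] * size
        self.versions = [0] * size
        self.lock = threading.Lock()   # single global commit lock

class Transaction:
    def __init__(self, mem):
        self.mem = mem
        self.rd_set = {}   # address -> version seen at first read
        self.wr_set = {}   # address -> buffered new value

    def read(self, addr):
        if addr in self.wr_set:                      # read our own buffered write
            return self.wr_set[addr]
        self.rd_set.setdefault(addr, self.mem.versions[addr])
        return self.mem.values[addr]

    def write(self, addr, value):
        self.wr_set[addr] = value                    # not visible until commit

    def commit(self):
        with self.mem.lock:                          # global locking
            for addr, seen in self.rd_set.items():   # validate the whole RdSet
                if self.mem.versions[addr] != seen:
                    return False                     # abort: drop buffered writes
            for addr, value in self.wr_set.items():  # write back WrSet to memory
                self.mem.values[addr] = value
                self.mem.versions[addr] += 1
            return True

mem = Memory(4)
t1, t2 = Transaction(mem), Transaction(mem)
t1.write(0, t1.read(0) + 1)
t2.write(0, t2.read(0) + 1)
ok1 = t1.commit()   # commits first
ok2 = t2.commit()   # its read of address 0 is now stale, so it aborts
```

An aborted transaction would normally be re-executed from its start; that retry loop is left out to keep the sketch short.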
Ordering Transaction Commit
• TMs typically have no way of controlling commit order
• Loop iterations must commit in original order– Ensures proper rollback
• Requires centralized control to enforce ordering
[Figure: transactions TX 1 to TX 4 execute the loop chunks i = 0-9, 10-19, 20-29, and 30-39, with TX 1 and TX 3 on Core 0 and TX 2 and TX 4 on Core 1.]
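One way to picture the centralized ordering control is a shared counter that admits transactions strictly in loop order. The sketch below is an assumption for illustration only: `OrderedCommitter` and `core` are made-up names, and the actual commit work is replaced by recording the order.

```python
import threading

class OrderedCommitter:
    """Central point that lets transactions commit only in original loop order."""

    def __init__(self, first_tx=1):
        self.next_tx = first_tx
        self.cond = threading.Condition()
        self.order = []                    # commit order actually observed

    def commit(self, tx_id):
        with self.cond:
            # Block until every earlier transaction has committed.
            self.cond.wait_for(lambda: self.next_tx == tx_id)
            self.order.append(tx_id)       # real commit work would go here
            self.next_tx += 1
            self.cond.notify_all()

def core(tx_ids, committer):
    for tx_id in tx_ids:
        # Speculative execution of the loop chunk would happen here.
        committer.commit(tx_id)

committer = OrderedCommitter()
threads = [
    threading.Thread(target=core, args=([2, 4], committer)),  # "Core 1"
    threading.Thread(target=core, args=([1, 3], committer)),  # "Core 0"
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(committer.order)   # [1, 2, 3, 4]
```

Even though the thread holding TX 2 starts first, it waits until TX 1 has committed, so a later rollback never has to undo an iteration that ran ahead of an earlier one.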
STMlite
• Dedicated thread to control commits
– Called the Transaction Commit Manager (TCM)
– Performs consistency checks for all transactions
– Provides a point to easily enforce in-order commit
• Bloom-filter based signatures
– Hash read and write sets
– Similar technique used by HTMs like Bulk [Ceze ‘06]
– Low-cost consistency checks during commit
Bloom-Filter Based Signatures
• Constant-time insertion and find
• Linear-time intersection (bitwise AND)
[Figure: bit fields of an address (e.g. 101 010, 101 100) pass through a decoder that sets the corresponding bits in the signature (bit array).]
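A small sketch of a Bloom-filter signature, assuming a 256-bit array and two made-up hash functions; the `Signature` class and its hash constants are illustrative and are not the encoding used by STMlite or Bulk. Insertion and lookup touch a fixed number of bits, and two signatures are intersected with a bitwise AND over the whole array (union would be a bitwise OR).

```python
class Signature:
    """Fixed-size Bloom-filter signature over a set of addresses."""

    def __init__(self, num_bits=256, num_hashes=2):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0                      # Python int used as the bit array

    def _positions(self, addr):
        # Hypothetical hash family, chosen only for illustration.
        return [((addr * (2 * k + 1) * 2654435761) >> 16) % self.num_bits
                for k in range(self.num_hashes)]

    def insert(self, addr):
        for p in self._positions(addr):
            self.bits |= 1 << p

    def contains(self, addr):
        # May report false positives, never false negatives.
        return all((self.bits >> p) & 1 for p in self._positions(addr))

    def intersects(self, other):
        # Conservative overlap test: bitwise AND of the two bit arrays.
        return (self.bits & other.bits) != 0

    def union(self, other):
        out = Signature(self.num_bits, self.num_hashes)
        out.bits = self.bits | other.bits  # bitwise OR merges two sets
        return out

rd_sig, wr_sig = Signature(), Signature()
rd_sig.insert(0x1000)
rd_sig.insert(0x1008)
wr_sig.insert(0x1008)
conflict = rd_sig.intersects(wr_sig)       # both contain 0x1008
```

A non-empty intersection only means the sets may overlap; a false positive costs an unnecessary abort but never hides a real conflict.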
STMlite Execution Flow
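[Figure: STMlite execution flow. The transaction goes through Start TX, Execute TX (building RdSet and WrSet plus the RdSig and WrSig signatures), and End TX, and a Ready flag hands it to the Transaction Commit Manager (TCM). The TCM waits for the Ready flag, runs the Consistency Check, and answers Abort or Commit. On Commit, the WrSet is written back to memory.]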
Experimental Setup
• Implemented framework in the LLVM compiler
• Benchmarks
– Stanford STAMP transactional benchmarks
– SPECfp benchmarks
• Run on Sun Fire T2000
– 8-core UltraSPARC T1 processor
• Baseline STM is Sun’s TL2 [Dice ‘06]
STAMP Benchmarks
SPECfp Benchmarks
Conclusion
• STMlite
– Customized for speculative loop parallelization
– Transaction commit ordering
– Centralized consistency checks
– Hashing read/write sets with signatures
• Parallelization of sequential applications is feasible on commodity hardware
– Removes much of the slowdown traditionally associated with STM
Thank You!
Questions?
Transaction Execution and Commit
• Stale entries periodically removed from commit log
[Figure: timeline of a transaction and the Transaction Commit Manager (TCM). The transaction moves through Executing (from Start to End, producing RdSig and WrSig), Waiting, and Writeback. When the transaction is Ready, the TCM moves from Waiting to Checking and tests it for consistency against each WrSig entry in the commit log ("Consistent?") before Commit.]
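The commit-log check in this figure might look roughly like the sketch below. Everything here is an assumption made for illustration: the `CommitManager` class, the logical clock used for Start and End times, and the toy `sig` hash that maps an address to one of 64 bits. The talk does not give these details.

```python
def sig(addrs, bits=64):
    # Toy signature: one bit per address, position = address mod 64.
    s = 0
    for a in addrs:
        s |= 1 << (a % bits)
    return s

class CommitManager:
    """Validates each transaction's RdSig against WrSigs in a commit log."""

    def __init__(self):
        self.clock = 0     # number of commits so far
        self.log = []      # (commit_time, wr_sig) entries

    def start(self):
        return self.clock  # start time of a new transaction

    def try_commit(self, start_time, rd_sig, wr_sig):
        # Inconsistent if a transaction that committed after our start
        # may have written something we read.
        for commit_time, logged_wr in self.log:
            if commit_time > start_time and (logged_wr & rd_sig):
                return False
        self.clock += 1
        self.log.append((self.clock, wr_sig))
        return True

    def prune(self, oldest_active_start):
        # Entries no running transaction can conflict with are stale.
        self.log = [(t, s) for t, s in self.log if t > oldest_active_start]

tcm = CommitManager()
s1, s2 = tcm.start(), tcm.start()
ok1 = tcm.try_commit(s1, sig([8]), sig([16]))    # nothing in the log yet
ok2 = tcm.try_commit(s2, sig([16]), sig([24]))   # read 16, written after s2
s3 = tcm.start()
ok3 = tcm.try_commit(s3, sig([16]), sig([24]))   # retry sees the new value
tcm.prune(oldest_active_start=1)
```

Only log entries newer than a transaction's start need checking, which is why entries older than every running transaction can be dropped.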