Read-Write Lock Allocation in Software Transactional Memory

Amir Ghanbari Bavarsad and Ehsan Atoofian

Lakehead University

Transactional Memory

Software transactional memory (STM) exploits a global clock to validate transactional data
  Pros: reduces validation overhead
  Cons: contention

Alternate: Read Write Lock Allocation (RWLA)
  Pros: no central clock
  Cons: overhead if a TX aborts

Speculative RWLA: changes validation policy dynamically → Speedup: up to 66%

Outline

Background

Speculative RWLA

Conclusion

Counter in STM

TM_BEGIN();
local_counter = TM_READ(counter);
local_counter++;
TM_WRITE(counter, local_counter);
TM_END();

Transactional data are validated using:

Global clock
  Shared variable
  Timestamp for transactions

Lock
  Memory is mapped to Lock Table
  Each entry of the table: Version #
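The mapping from memory to the lock table can be sketched as follows. This is a minimal illustration, not the layout used by TL2 or by the authors: the table size `LOCK_TABLE_SIZE`, the shift-and-mask hash in `lock_index`, and the names `lock_entry_t` and `lock_for` are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table size; a power of two so a mask can replace a modulo. */
#define LOCK_TABLE_SIZE 1024u

/* One entry per lock; with the global-clock scheme it holds a version number. */
typedef struct {
    uint64_t version;
} lock_entry_t;

static lock_entry_t lock_table[LOCK_TABLE_SIZE];

/* Assumed hash: drop the low 3 address bits, then mask to the table size. */
static size_t lock_index(const void *addr) {
    return (size_t)(((uintptr_t)addr >> 3) & (LOCK_TABLE_SIZE - 1u));
}

/* Return the lock table entry that covers the given memory address. */
static lock_entry_t *lock_for(const void *addr) {
    return &lock_table[lock_index(addr)];
}
```

With a mapping like this, several addresses can share one entry, so a write to one variable may also invalidate reads of another variable that hashes to the same lock.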

Validation in STM

[Figure: Global Clock, Memory, and Lock Table (Version #)]

Updating Global Clock & Lock
  Increment Global Clock
  Version # = global_clock

[Figure: Global Clock, Memory (counter), and Lock Table (Version #)]

Validation in STM

rv (read version) is set to global_clock

[Figure: Metadata for TX1, Global Clock]

Successful Read Validation

rv >= version#
The most recent write to counter occurred before TM_BEGIN()

[Figure: Metadata for TX1, Global Clock]

Failed Read Validation

rv < version#
The most recent write to counter occurred after TM_BEGIN()

[Figure: Metadata for TX1, Global Clock]
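The validation rule on these slides can be sketched in a few lines of C. This is a single-threaded illustration under stated assumptions, not TL2's implementation: the names `versioned_lock_t`, `tx_t`, `tx_begin`, `tx_validate_read`, and `tx_commit_write` are hypothetical, and a real STM would use atomic operations on the shared clock and locks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Shared global clock (plain variable here; a real STM updates it atomically). */
static uint64_t global_clock = 0;

/* Lock table entry holding the version number of the last committed write. */
typedef struct {
    uint64_t version;
} versioned_lock_t;

/* Per-transaction metadata: the read version sampled at TM_BEGIN(). */
typedef struct {
    uint64_t rv;
} tx_t;

/* TM_BEGIN(): rv is set to global_clock. */
static void tx_begin(tx_t *tx) {
    tx->rv = global_clock;
}

/* Read validation: succeeds when rv >= version#, i.e. the most recent
 * write to the location happened before this transaction began. */
static bool tx_validate_read(const tx_t *tx, const versioned_lock_t *lock) {
    return tx->rv >= lock->version;
}

/* Commit of a write: increment the global clock, then stamp the lock. */
static void tx_commit_write(versioned_lock_t *lock) {
    global_clock++;
    lock->version = global_clock;
}
```

In this sketch a transaction that began before another transaction's commit fails validation on the written location, while a transaction that begins afterwards passes.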

Overhead of Validation

This method, called GV4, results in many cache coherence misses if transactions commit frequently

Read Write Lock Allocation (RWLA)

Lock
  Memory is mapped to Lock Table
  Each entry of the table: Lock bit, Read bits

[Figure: Memory and Lock Table; each entry has a lock bit and read bits labeled P0, P1, …, Pn-1]

TM_READ

TM_READ()
  Lock bit is free?
  Set read bit in the corresponding lock

[Figure: lock table entry bit patterns 000000 ….., 000000 ….. 1 (lock bit), and 100000 …..]

TM_WRITE

TM_WRITE()
  All read bits are clear?
  Acquire lock, otherwise: failed

[Figure: lock table entry bit patterns 000100 ….., 100000 ….., and 00000 …..]
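The read and write checks above can be sketched with a single word per lock table entry. The bit layout (bit 0 as the lock bit, bit 1 + p as the read bit of thread p), the names `rw_lock_t`, `rw_read`, `rw_write`, and `rw_release`, and the choice to ignore the writer's own read bit are assumptions made for this example; the slides do not specify them. The sketch is single-threaded, whereas a real implementation would update the word with atomic compare-and-swap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: bit 0 = lock bit, bit (1 + p) = read bit of thread p (p < 31). */
typedef struct {
    uint32_t bits;
} rw_lock_t;

#define RW_LOCK_BIT 1u

static uint32_t rw_read_bit(unsigned p) {
    return 1u << (1u + p);
}

/* TM_READ: if the lock bit is free, set this thread's read bit. */
static bool rw_read(rw_lock_t *l, unsigned p) {
    if (l->bits & RW_LOCK_BIT)
        return false;               /* a writer holds the lock */
    l->bits |= rw_read_bit(p);
    return true;
}

/* TM_WRITE: acquire the lock only if it is free and no other thread has a
 * read bit set. Ignoring the writer's own read bit is an assumption that
 * lets a transaction read and then write the same location. */
static bool rw_write(rw_lock_t *l, unsigned p) {
    uint32_t other_readers = l->bits & ~(RW_LOCK_BIT | rw_read_bit(p));
    if ((l->bits & RW_LOCK_BIT) || other_readers != 0)
        return false;               /* acquire lock failed */
    l->bits |= RW_LOCK_BIT;
    return true;
}

/* On commit or abort: clear this thread's read bit and, if held, the lock bit. */
static void rw_release(rw_lock_t *l, unsigned p, bool holds_lock) {
    l->bits &= ~rw_read_bit(p);
    if (holds_lock)
        l->bits &= ~RW_LOCK_BIT;
}
```

Because conflicts are detected through the per-lock bits, no transaction in this sketch touches a shared clock, which is the property the RWLA slides emphasize.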

Experimental Framework

Benchmarks: STAMP v0.9.7
  Run up to completion
  Measured statistics over 10 runs

TL2 as an STM framework

Two Intel Xeon E5660, 6-way CMP

Performance of RWLA

[Figure: performance of RWLA for Bayes, Kmeans, Labyrinth, Ssca2, Vacation, and Genome with 2, 4, 8, and 16 threads, and the average]

Speculative RWLA

Conflict occurs frequently → select GV4
Conflict occurs rarely → select RWLA
How to predict conflict?

Contention Predictor

Prediction:
  y ≥ 0 → predict commit
  y < 0 → predict abort

Update:
  If outcome of current TX and TXi agree/disagree → increment/decrement wi

$y = w_0 + \sum_{i=1}^{n} x_i w_i$

xi: global transaction history, bipolar value

wi: weight vector

[Figure: inputs 1, x1, …, xn weighted by w0, w1, …, wn]
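A predictor of this shape can be sketched as a small perceptron-style structure. Several details below are assumptions, since the slides do not state them: the history length `HISTORY_LEN`, initializing the history to "commit", updating the weights on every outcome, treating the constant input 1 as always agreeing with a commit, and the absence of weight saturation. The mapping from prediction to policy follows the Speculative RWLA slide (frequent conflicts select GV4, rare conflicts select RWLA).

```c
#include <stdbool.h>

#define HISTORY_LEN 8   /* hypothetical n: number of past outcomes tracked */

typedef struct {
    int w[HISTORY_LEN + 1];   /* w[0] pairs with the constant input 1 */
    int x[HISTORY_LEN];       /* bipolar history: +1 = commit, -1 = abort */
} predictor_t;

typedef enum { POLICY_GV4, POLICY_RWLA } policy_t;

static void predictor_init(predictor_t *p) {
    for (int i = 0; i <= HISTORY_LEN; i++) p->w[i] = 0;
    for (int i = 0; i < HISTORY_LEN; i++) p->x[i] = 1;  /* assume commits */
}

/* y = w0 + sum over i of x_i * w_i */
static int predictor_output(const predictor_t *p) {
    int y = p->w[0];
    for (int i = 0; i < HISTORY_LEN; i++)
        y += p->x[i] * p->w[i + 1];
    return y;
}

/* y >= 0 predicts commit, y < 0 predicts abort. */
static bool predict_commit(const predictor_t *p) {
    return predictor_output(p) >= 0;
}

/* Predicted abort (frequent conflict) selects GV4, otherwise RWLA. */
static policy_t select_policy(const predictor_t *p) {
    return predict_commit(p) ? POLICY_RWLA : POLICY_GV4;
}

/* Increment w_i when the current outcome agrees with history entry x_i,
 * decrement it when they disagree, then shift the outcome into the history. */
static void predictor_update(predictor_t *p, bool committed) {
    int t = committed ? 1 : -1;
    p->w[0] += t;
    for (int i = 0; i < HISTORY_LEN; i++)
        p->w[i + 1] += (p->x[i] == t) ? 1 : -1;
    for (int i = HISTORY_LEN - 1; i > 0; i--)
        p->x[i] = p->x[i - 1];
    p->x[0] = t;
}
```

A runtime built on this sketch would call `select_policy` before starting a transaction and `predictor_update` once its outcome is known.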

Performance of Speculative RWLA

# of threads changes between 2 and 16
On average, performance changes from 21% in Bayes to 47% in Labyrinth

[Figure: performance of Speculative RWLA for Bayes, Kmeans, Labyrinth, Ssca2, Vacation, and Genome with 2, 4, 8, and 16 threads, and the average]

Conclusion

RWLA overcomes contention over the global clock

Applications react differently to GV4 and RWLA

Speculative RWLA changes validation policy dynamically

Speculative RWLA improves performance of STMs by up to 66%

Thank You!

Questions?
