Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program

Duan Yuelu, Feng Xiaobing, Pen-chung Yew

Outline

Motivation
Approach & Implementation
Results
Related Work
Conclusion

Motivation

Programmers develop “low-lock” code for better performance
- locks are expensive
- data races are deliberately employed
- such code requires the sequential consistency (SC) model

Such code might fail under relaxed consistency (RC) models
- e.g. Double Checked Locking (DCL) for a lazily initialized singleton

Example 1 (a): Lazy initialized singleton

Object::Object() {
    this->field = 100;
}

// Version 1: works only for a single thread
Object* Object::getInstance() {
    if (!_instance)
        _instance = new Object();
    return _instance;
}

// Version 2: works for multiple threads, but is expensive
Object* Object::getInstance() {
    lock(l);
    if (!_instance)
        _instance = new Object();
    unlock(l);
    return _instance;
}

void Object::useInstance() {
    Object* ins = Object::getInstance();
    int f = ins->getField();
}

(b): Double Checked Locking for lazy initialized singleton

Object* Object::getInstance() {
    if (!_instance) {
        lock(l);
        if (!_instance)
            _instance = new Object();
        unlock(l);
    }
    return _instance;
}

If the architecture is SC, then it works correctly, with better performance than (a).

But what about running on RC models that allow write-write reordering?

A possible execution interleaving... correct!

Initializer Thread (T1):
    if (!_instance) {
        lock(l);
        if (!_instance) {
            temp = malloc(..);
    A1:     temp->field = 100;
    A2:     _instance = temp;
        }
        unlock(l);
    }
    return _instance;

Reader Thread (T2):
    B1: if (!_instance) {…}
    B2: read _instance->field;

Data races are employed, since these accesses are not properly synchronized.

But what if the two writes are reordered?

Initializer Thread (T1):
    if (!_instance) {
        lock(l);
        if (!_instance) {
            temp = malloc(..);
    A2:     _instance = temp;
    A1:     temp->field = 100;    (originally issued before A2)

Reader Thread (T2):
    B1: if (!_instance) {…}
    B2: read _instance->field;

If B1 and B2 execute between A2 and A1, T2 gets the uninitialized value of _instance->field.

This violates sequential consistency.

Bug pattern: Potential Violation of Sequential Consistency (PVSC)
- so named because these defects might cause SC violations.

How to detect and eliminate PVSC bugs?
- Basically, we combine Shasha and Snir's conflict graph and delay set theory with existing data race detection schemes.

Our scheme

(1) Construct the race graph
(2) Find cycles in it
    - A cycle in the race graph corresponds to a PVSC bug
(3) Compute the delay set
(4) Insert memory ordering fences

Constructing Race Graph

For all the instructions executed in a particular execution of a program P:
- Add a program order edge between the instructions in each thread.
- Add a race edge for each data race.

Example 1.

Thread 1        Thread 2
A: wr a         C: rd b
B: wr b         D: rd a

Program order edges: A->B, C->D
Race edges: B-C, D-A
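The construction rules above can be sketched in C++ roughly as follows. This is only an illustration under simplifying assumptions: the names `Access`, `Edge`, `EdgeKind` and `buildRaceGraph` are hypothetical, and every conflicting cross-thread pair is treated as a race, whereas the scheme itself takes its race edges from an existing data race detector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration; none of these names come from the slides.
enum class EdgeKind { ProgramOrder, Race };

struct Access {
    int thread;    // id of the issuing thread
    char var;      // shared variable that is accessed
    bool isWrite;  // true for a write, false for a read
};

struct Edge {
    std::size_t from, to;  // indices into the trace
    EdgeKind kind;
};

// Builds the race graph for a trace in which each thread's accesses appear in
// program order. Assumption: the trace contains no synchronization, so every
// conflicting pair of accesses from different threads counts as a data race.
std::vector<Edge> buildRaceGraph(const std::vector<Access>& trace) {
    std::vector<Edge> edges;
    for (std::size_t i = 0; i < trace.size(); ++i) {
        // Program order edge: link i to the next access of the same thread.
        for (std::size_t j = i + 1; j < trace.size(); ++j) {
            if (trace[j].thread == trace[i].thread) {
                edges.push_back({i, j, EdgeKind::ProgramOrder});
                break;
            }
        }
        // Race edges: same variable, different threads, at least one write.
        for (std::size_t j = i + 1; j < trace.size(); ++j) {
            bool conflict = trace[j].var == trace[i].var &&
                            trace[j].thread != trace[i].thread &&
                            (trace[i].isWrite || trace[j].isWrite);
            if (conflict) {
                // A race can resolve either way, so store both directions.
                edges.push_back({i, j, EdgeKind::Race});
                edges.push_back({j, i, EdgeKind::Race});
            }
        }
    }
    return edges;
}
```

For the trace of Example 1 (A: wr a and B: wr b in thread 1, C: rd b and D: rd a in thread 2), this yields the two program order edges A->B and C->D plus the race edges B-C and D-A, each stored in both directions.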

Race Graph for DCL…

Initializer Thread (T1):
    lock(l);
    if (!_instance) {
        temp = malloc(..);
        temp->field = 100;
        _instance = temp;
    }
    unlock(l);

Reader Thread (T2):
    if (!_instance) {…}
    read _instance->field;

Find cycles in race graph

Theorem 1. A cycle in the race graph corresponds to a PVSC bug.

Proof: If a cycle is found in the race graph, then it is possible to get a non-sequentially-consistent execution by letting the race order be consistent with the cycle. E.g., we can get a non-SC execution E = {B->C, D->A} from the cycle A->B->C->D->A in the previous example.
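One way to look for such a cycle is a breadth-first search that starts at the head of a program order edge and tries to get back to its tail. The sketch below is an assumption about how this could be done, not the authors' implementation. The `Graph` alias and `cycleThrough` are hypothetical names, and nodes are plain integers (A=0, B=1, C=2, D=3 for the previous example).

```cpp
#include <algorithm>
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical adjacency-list graph: adj[u] lists the successors of node u.
// Program order edges are one-directional; each race edge is stored in both
// directions because the race can resolve either way.
using Graph = std::vector<std::vector<int>>;

// Returns a shortest cycle that uses the program order edge u -> v, as a node
// sequence starting at u, or an empty vector if no such cycle exists.
// Assumes u != v.
std::vector<int> cycleThrough(const Graph& adj, int u, int v) {
    std::vector<int> parent(adj.size(), -1);
    std::vector<bool> seen(adj.size(), false);
    std::queue<int> work;
    seen[v] = true;
    work.push(v);
    while (!work.empty()) {
        int n = work.front();
        work.pop();
        if (n == u) {
            // Walk the BFS parents back from u to v ...
            std::vector<int> cycle;
            for (int x = u; x != v; x = parent[x]) cycle.push_back(x);
            cycle.push_back(v);
            // ... then flip everything after u so the order is u, v, ..., last.
            std::reverse(cycle.begin() + 1, cycle.end());
            return cycle;
        }
        for (int next : adj[n]) {
            if (!seen[next]) {
                seen[next] = true;
                parent[next] = n;
                work.push(next);
            }
        }
    }
    return {};
}
```

With the graph of the previous example, `cycleThrough(adj, 0, 1)` returns the sequence 0, 1, 2, 3, which is the cycle A->B->C->D->A. If the race edge D-A is removed, the result is empty and no PVSC is reported for that edge.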

Compute delay set

Delay lemma: Any execution should be consistent with a delay set D. [Shasha/Snir]

Theorem 2. Let D be the delay set that contains all the program order edges of the race cycles in the race graph. Then D enforces sequential consistency for the executions that generate D.

Proof: Omitted.

Insert memory ordering fences

A fence instruction delays the issue of an instruction until all previous instructions have completed.

Insert a fence for each delay in D. Then D is enforced, and the detected PVSC bugs are eliminated.
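Computing the delay set and the resulting fence positions can be sketched as below. This is a minimal illustration under assumptions: a program order edge u -> v is taken to lie on a race cycle exactly when u is reachable from v, and the names `Graph`, `Edge`, `reaches` and `delaySet` are hypothetical. Each pair (u, v) in the result stands for one fence to be placed between instructions u and v.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical representation: adj[u] lists the successors of node u, with
// program order edges one-directional and race edges stored in both directions.
using Graph = std::vector<std::vector<int>>;
using Edge = std::pair<int, int>;

// True if node `to` is reachable from node `from` (iterative depth-first search).
bool reaches(const Graph& adj, int from, int to) {
    std::vector<bool> seen(adj.size(), false);
    std::vector<int> stack{from};
    seen[from] = true;
    while (!stack.empty()) {
        int n = stack.back();
        stack.pop_back();
        if (n == to) return true;
        for (int next : adj[n]) {
            if (!seen[next]) {
                seen[next] = true;
                stack.push_back(next);
            }
        }
    }
    return false;
}

// Collects every program order edge u -> v that lies on some cycle, i.e. for
// which u is reachable again from v. One fence is needed per returned edge.
std::set<Edge> delaySet(const Graph& adj, const std::vector<Edge>& programOrder) {
    std::set<Edge> delays;
    for (const Edge& e : programOrder) {
        if (reaches(adj, e.second, e.first)) delays.insert(e);
    }
    return delays;
}
```

As a usage example, take the two-thread graph with A: wr a, B: wr b in one thread and C: rd b, D: rd a in the other (A=0, B=1, C=2, D=3), and add a hypothetical extra access E=4 after D that races with nothing. The delay set then contains (A, B) and (C, D) but not (D, E), so only two fences are inserted.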

Examples for the above 3 steps…

Fig. 1: No cycles, no PVSC, so no fence is needed. (This implies that any execution on RC is sequentially consistent, so we don't need fences.)

Fig. 2: Initially a = b = 0

Thread 1        Thread 2        Thread 3
A: a = 1        B: if (a)       D: if (b)
                C:   b = 1      E:   R1 = a

The graph contains a cycle A->B->C->D->E->A, i.e. a PVSC. It is possible to get the execution {A->B, C->D, E->A}, which violates SC and results in {a=1, b=1, R1=0}. If we insert fences on the program order edges of the cycle, between B and C and between D and E, then the PVSC is eliminated.

Fig. 3: Corrected version of DCL for lazy initialized singleton.

Object* getInstance() {
    Object* tmp = _instance;
    Fence();                  // keep later reads after the read of _instance
    if (!tmp) {
        lock(l);
        tmp = _instance;
        if (!tmp)
            tmp = new Object();
        Fence();              // keep the constructor's writes before the publish
        _instance = tmp;
        unlock(l);
    }
    return _instance;
}
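In standard C++11, the two `Fence()` calls of Fig. 3 could be expressed with `std::atomic_thread_fence`. The following is a sketch under assumptions rather than the authors' code: `std::mutex` stands in for `lock(l)`/`unlock(l)`, the shared pointer becomes a `std::atomic<Object*>` accessed with relaxed operations, and the member names `instance` and `mtx` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

class Object {
public:
    static Object* getInstance();
    int getField() const { return field; }

private:
    Object() : field(100) {}
    int field;
    static std::atomic<Object*> instance;  // plays the role of _instance
    static std::mutex mtx;                 // plays the role of lock l
};

std::atomic<Object*> Object::instance{nullptr};
std::mutex Object::mtx;

Object* Object::getInstance() {
    Object* tmp = instance.load(std::memory_order_relaxed);
    // First fence: reads that follow cannot be ordered before the load above.
    std::atomic_thread_fence(std::memory_order_acquire);
    if (!tmp) {
        std::lock_guard<std::mutex> guard(mtx);
        tmp = instance.load(std::memory_order_relaxed);
        if (!tmp) {
            tmp = new Object();
            // Second fence: the constructor's writes cannot be ordered after
            // the store that publishes the pointer.
            std::atomic_thread_fence(std::memory_order_release);
            instance.store(tmp, std::memory_order_relaxed);
        }
    }
    return tmp;
}
```

A reader that sees a non-null pointer then also sees `field == 100`, because the release fence before the store pairs with the acquire fence after the load. The sketch returns `tmp` rather than re-reading the shared pointer, which avoids a second access to the atomic.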

Optimization

To handle real-world applications with
- long execution times
- many threads

we convert the race graph into a PC race graph:
- Combine nodes with the same PC (program counter) into one node.
- The graph contains N nodes, where N equals the number of racing access instructions.

Apply an SCC (strongly connected component) algorithm to the PC race graph. Each SCC corresponds to a PVSC bug.

This can introduce false negatives.
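The SCC step could look like the sketch below, which uses Tarjan's algorithm on a graph whose nodes are PCs already mapped to dense integer ids. This is an assumption about one possible realization: the slides do not say which SCC algorithm is used, and the names `stronglyConnectedComponents` and `countPvscCandidates` are hypothetical. The sketch also assumes that an SCC is only reported when it contains at least one program order edge, so that a lone pair of racing accesses is not flagged.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <set>
#include <utility>
#include <vector>

// adj[u] lists the successors of PC node u; race edges appear in both directions.
using Graph = std::vector<std::vector<int>>;

// Tarjan's algorithm. Returns, for each node, the id of its component.
std::vector<int> stronglyConnectedComponents(const Graph& adj) {
    const int n = static_cast<int>(adj.size());
    std::vector<int> index(n, -1), low(n, 0), comp(n, -1), stack;
    std::vector<bool> onStack(n, false);
    int counter = 0, compCount = 0;
    std::function<void(int)> visit = [&](int u) {
        index[u] = low[u] = counter++;
        stack.push_back(u);
        onStack[u] = true;
        for (int v : adj[u]) {
            if (index[v] < 0) {
                visit(v);
                low[u] = std::min(low[u], low[v]);
            } else if (onStack[v]) {
                low[u] = std::min(low[u], index[v]);
            }
        }
        if (low[u] == index[u]) {  // u is the root of a component
            int v;
            do {
                v = stack.back();
                stack.pop_back();
                onStack[v] = false;
                comp[v] = compCount;
            } while (v != u);
            ++compCount;
        }
    };
    for (int u = 0; u < n; ++u) {
        if (index[u] < 0) visit(u);
    }
    return comp;
}

// Counts the components that contain at least one program order edge.
int countPvscCandidates(const Graph& adj,
                        const std::vector<std::pair<int, int>>& programOrder) {
    std::vector<int> comp = stronglyConnectedComponents(adj);
    std::set<int> flagged;
    for (const auto& e : programOrder) {
        if (comp[e.first] == comp[e.second]) flagged.insert(comp[e.first]);
    }
    return static_cast<int>(flagged.size());
}
```

For a graph that holds the four-node cycle A->B->C->D->A (ids 0 to 3) plus an unrelated racing pair (ids 4 and 5) with no program order edge between them, `countPvscCandidates` reports one candidate. Recursion depth grows with the graph size, so an iterative variant may be preferable for very large PC race graphs.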

Results

Detected PVSC bugs
Performance loss after fence insertion
Cost of PVSC detection over race detection

Part of detected bugs

MySQL 5.0.x
    Location: sql/slave.c, handle_slave_io()
    Bug: Assertion in slave shutdown. mi->slave_running=0 could be visible to other threads before the cleanup is completed, which causes an assertion failure during slave shutdown.

httpd 2.2.x
    Location: modules/cache/mod_cache.c, cache_store_content()
    Bug: store_header() might be visible to other threads before store_body(), so mod_cache might serve old content even though new content has been fetched.

httpd 2.2.x
    Location: prefork/prefork.c, ap_mpm_run()
    Bug: restart_pending = shutdown_pending = 0; might be visible to child threads after set_signals(), so if httpd receives SIGTERM, it will be ignored while child processes are being spawned.

Performance loss of SPLASH-2

Figure 10: Performance on Intel Itanium SMP

[Bar chart: normalized execution time for SPLASH-2 benchmarks, including raytrace and radix. Series: Non_Fence, Compiler Analysis, Lock-set, Hybrid, Happens-before.]

Cost over data race detection

Figure 13: Cost of PVSC detection over different race detecting algorithm

[Bar chart: y-axis ticks from 0.94 to 1.1; benchmarks include cholesky and lu. Series: Non_PVSC Detection, Lock-set, Hybrid.]

Related Work

Compiler analysis: conservative for C/C++ programs; inserts many redundant fences, which hurts performance severely. [K.Yelick@ucb, S.Midkiff@purdue]

Verification: enumerates all possible executions that fit an RC model; does not scale to large applications. [S.Burckhardt@msr]

Data race detection: does not address the problem of SC violation. [many]

Other concurrency bugs: atomicity [AVIO, yyzhou] and correlation [MUVI, yyzhou] detection do not consider the PVSC problem.

Conclusion

An effective and efficient scheme to detect Potential Violations of Sequential Consistency in concurrent C/C++ programs
- Easy to port to mature data race detection tools.
- Retains performance after PVSC elimination.
- Scalable and low-cost.

Current limitations
- Dynamic data race detection limitations: false positives and false negatives. These can be addressed with progress in data race detection.
- Loop

Thanks!

Suggestions?
