Ditto: Speeding Up Runtime Data Structure Invariant Checks AJ Shankar and Ras Bodik UC Berkeley


Motivation: A Debugging Scenario

Buggy program: a large-scale web application in Java

Primary data structure: hashMap of shopping carts

Carts are modified throughout code
Bug: hashMap acting weird: carts disappearing, etc.
Hypothesis: cart modification violates hashCode() invariance


How to Check the Hypothesis?

Debugger facilities inadequate
Idea: write a runtime check

Iterates over buckets, checks hashCode() of each cart in bucket

Run check frequently to pinpoint error
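A minimal sketch of what such a check could look like. The classes `Cart` and `CartTable` are hypothetical stand-ins for the application's carts and a chained hash table; they are not from the talk and do not reflect the internals of `java.util.HashMap`. This is the plain, non-incremental check that walks every bucket.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cart whose hash code depends on a mutable field.
final class Cart {
    int id;
    Cart(int id) { this.id = id; }
    @Override public int hashCode() { return Integer.hashCode(id); }
    @Override public boolean equals(Object o) {
        return o instanceof Cart && ((Cart) o).id == id;
    }
}

// Hypothetical chained hash table (an assumption for illustration).
final class CartTable {
    final List<List<Cart>> buckets = new ArrayList<>();

    CartTable(int nBuckets) {
        for (int i = 0; i < nBuckets; i++) buckets.add(new ArrayList<>());
    }

    int indexFor(Cart c) { return Math.floorMod(c.hashCode(), buckets.size()); }

    void add(Cart c) { buckets.get(indexFor(c)).add(c); }

    // The invariant: every cart sits in the bucket its current hashCode() selects.
    boolean checkHashInvariant() {
        for (int i = 0; i < buckets.size(); i++) {
            for (Cart c : buckets.get(i)) {
                if (indexFor(c) != i) return false;
            }
        }
        return true;
    }
}
```

Mutating `id` after a cart has been added leaves the cart in its old bucket, which is exactly the kind of violation the check is meant to catch.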


Problem

The check is slow! (100x slowdown)
Rerunning the program is now a problem
Furthermore, what if bug isn’t reproducible?
Run the program with the check on entire test suite? Infeasible.


Our Tool: Ditto

Ditto speeds up data structure invariant checks
  Usually asymptotically in size of data structure
  Hash table: 10x speedup at 1600 elements
What invariant checks can Ditto handle?
  Side-effect-free: cannot return fresh mutable objects
  Recursive: not an inherent limitation of algorithm


Basic Observation: Incrementalize

Invariant checks the entire data structure…
…but once checked, a local change can be (re)checked locally!
So, first establish invariant, then incrementally check changes
“Hash code of each cart in table corresponds to containing bucket.”


A New Domain

Existing incrementalizers: general purpose but not automatic [Acar PLDI 2006]
  User must annotate the program
  For functional programs
  Other caveats (conversion to CPS, etc.)
Ditto is automatic in this domain
  Functional invariant checks in an imperative Java setting
  No user annotations
  Allows arbitrary heap updates outside the invariant
  A simple bytecode-to-bytecode implementation


Ditto Algorithm Overview

1. First run of check: construct graph of the computation

Stores function calls, concrete inputs

2. Track changes to computation inputs
3. Subsequent runs of check: rerun only subcomputations with changed inputs
  Incrementally update computation graph = incrementally compute invariant check


Example Invariant Check

Ensures a tree is locally ordered

    boolean isOrdered(Tree t) {
        if (t == null) return true;
        if (t.left != null && t.left.value >= t.value) return false;
        if (t.right != null && t.right.value <= t.value) return false;
        return isOrdered(t.left) && isOrdered(t.right);
    }
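To make "locally ordered" concrete, here is a self-contained sketch of the same check. The `Tree` constructor and the wrapper class `LocalOrder` are assumptions added for illustration; the talk shows only the fields `left`, `right`, and `value`.

```java
// Hypothetical node class; only the three fields appear in the talk.
final class Tree {
    int value;
    Tree left, right;
    Tree(int value, Tree left, Tree right) {
        this.value = value;
        this.left = left;
        this.right = right;
    }
}

final class LocalOrder {
    // Each node is compared only with its immediate children.
    static boolean isOrdered(Tree t) {
        if (t == null) return true;
        if (t.left != null && t.left.value >= t.value) return false;
        if (t.right != null && t.right.value <= t.value) return false;
        return isOrdered(t.left) && isOrdered(t.right);
    }
}
```

Because each invocation looks only at a node and its children, a tree with root 5, left child 3, and 7 as the right child of 3 passes this check even though 7 sits in the left subtree of 5. Global ordering is a separate, stronger invariant.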


1. Constructing a Computation Graph

Purpose of computation graph:
1. For unchanged parts of data structure, reuse existing results
2. For changed parts, identify parts of check that need to be rerun
Graph stores the initial check run:
  Node = function invocation, along with its
    Concrete formal arguments (inputs)
    Concrete heap accesses (inputs)
    Return value
  Same inputs = can reuse return val
  Changed inputs = must rerun
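One way to picture such a graph node is sketched below. All names here (`HeapRead`, `HeapView`, `InvocationNode`, `inputsUnchanged`) are hypothetical and chosen for illustration. The sketch compares remembered heap values against the current heap directly; Ditto itself learns about changed heap inputs through write barriers rather than by re-reading.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// One remembered heap read: which object, which field, and the value observed.
final class HeapRead {
    final Object target;
    final String field;
    final Object seen;
    HeapRead(Object target, String field, Object seen) {
        this.target = target;
        this.field = field;
        this.seen = seen;
    }
}

// Hypothetical view of the current heap, used only to compare against old reads.
interface HeapView {
    Object read(Object target, String field);
}

// Hypothetical graph node for one invocation of the check.
final class InvocationNode {
    final Object arg;                                        // concrete formal argument
    final List<HeapRead> reads = new ArrayList<>();          // concrete heap accesses
    final List<InvocationNode> callees = new ArrayList<>();  // invocations made by this one
    boolean result;                                          // cached return value

    InvocationNode(Object arg) { this.arg = arg; }

    void recordRead(Object target, String field, Object seen) {
        reads.add(new HeapRead(target, field, seen));
    }

    // Same inputs = the cached return value can be reused; otherwise rerun.
    boolean inputsUnchanged(Object newArg, HeapView heap) {
        if (newArg != arg) return false;
        for (HeapRead r : reads) {
            if (!Objects.equals(heap.read(r.target, r.field), r.seen)) return false;
        }
        return true;
    }
}
```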


1. Constructing a Computation Graph

During first check run, by instrumentation:
  Node created with concrete formal arg A
  Calls children
  Heap reads from a.value, a.left, a.right, a.left.value, a.right.value are remembered
  Returns true
[Figure: the heap (tree nodes P, A, B, C) alongside the computation graph (isOrdered(P), isOrdered(A), isOrdered(B), isOrdered(C))]

2. Detecting Changed Inputs

Inputs to check that could change between runs:
  Arguments – easy to detect (passed to the check)
  Heap values – harder (could be modified anywhere in code)
Selective write barriers
  Statically determine which fields are read in the check
  Barriers collect changed heap inputs used by check
In example: add write barriers for all writes into fields: Tree.left, Tree.right, Tree.value

    if (t == null) return true;
    if (t.left != null && t.left.value >= t.value) return false;
    if (t.right != null && t.right.value <= t.value) return false;
    return isOrdered(t.left) && isOrdered(t.right);
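The effect of a write barrier can be sketched by hand. Ditto inserts its barriers by rewriting bytecode; the setter methods below are only a manual stand-in, and the names `ChangeLog` and `BarrierTree` are hypothetical.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Collects objects whose monitored fields were written since the last check.
final class ChangeLog {
    // Identity-based set, so distinct objects are never merged by equals().
    static final Set<Object> dirty =
            Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());

    static void recordWrite(Object target) { dirty.add(target); }
}

// Tree node whose three monitored fields are written only through barriers.
final class BarrierTree {
    private int value;
    private BarrierTree left, right;

    BarrierTree(int value) { this.value = value; }

    int value() { return value; }
    BarrierTree left() { return left; }
    BarrierTree right() { return right; }

    // Each setter plays the role of a write barrier on one field.
    void setValue(int v) { ChangeLog.recordWrite(this); value = v; }
    void setLeft(BarrierTree t) { ChangeLog.recordWrite(this); left = t; }
    void setRight(BarrierTree t) { ChangeLog.recordWrite(this); right = t; }
}
```

The sketch records only which object was written. Mapping a written object back to the invocations that read it (for example, a parent's invocation also reads its children's `value`) is left out here.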

3. Rerunning the Invariant

Data structure modification (checked by isOrdered()): add node N, remove node F
[Figure: the tree before and after the modification; nodes A, B, C, D, E, F, G and new node N]

3. Rerunning the Invariant

Goal: Incrementally update computation graph
  Graph must look as if check was run afresh
[Figure: "Tree With New Modifications" beside "Computation Graph From Last Run" (isOrdered(A)…, result true); callout: Write barriers say…]

3. Rerunning the Invariant

isOrdered(A) is first node that needs to be rerun
  Parent inputs haven’t changed (functions are side-effect-free)
  Rerunning exposes new node N
What happens at isOrdered(B)?
[Figure: the modified tree beside the computation graph, which now includes N]

3. Rerunning the Invariant

isOrdered(B) has same formal args, heap inputs
  We’d like to reuse its previous result
  And end this subcomputation
Problem: isOrdered(B) also depends on return values of its callees
  Which might change, since isOrdered(D) will be rerun
  So we can’t be sure isOrdered(B)’s result will be the same!
[Figure: the modified tree beside the computation graph, which now includes N]

Optimistic Memoization

Don’t want to rerun all nodes between B and D

Solution: we optimistically assume that isOrdered(B) will return the same result
  Invariant checks generally do! (e.g. “success”)

Check assumption when we rerun isOrdered(D)

For now, reuse previous result, finish up A
  A returns previous result (true), so finished here
[Figure: computation graph with nodes A, B, C, D, E, F, G, N]

3. Rerunning the Invariant

Now we rerun isOrdered(D)
  Reuse previous result of isOrdered(E), (G)
    No further changes so no need for optimism
  isOrdered(F) pruned from graph
isOrdered(D) returns previous result (true)
  So optimistic assumption was correct
  Computations around isOrdered(A) all correct
[Figure: computation graph with nodes A, B, C, D, E, F, G, N]

What If isOrdered(D) Returned false?

Result propagated up graph
  Continues as long as return val differs
In this case, root node of graph is reached
  Result for entire computation is changed
Automatically corrects optimistic assumptions
[Figure: computation graph (nodes A, B, D, E, G, N) with the result false propagating up toward the root]

Result of Algorithm

We’ve incrementally updated computation graph to reflect updated data structure
  Even with circular dependencies throughout graph, only reran 3 nodes
Result of computation is result of root node (true)
Graph is ready for next incremental update
[Figure: the modified tree beside the updated computation graph (nodes A, B, C, D, E, G, N; F pruned), result true]
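The rerun, optimistic reuse, and upward propagation steps can be sketched in a few dozen lines. This is a simplified illustration under stated assumptions, not Ditto's implementation: the names `Node`, `IncrementalOrderCheck`, and `Call` are hypothetical, the setters are hand-written stand-ins for instrumented write barriers, both children are always visited (no short-circuit) so caller links stay complete, unreachable invocations are not pruned, and the structure is assumed to be an acyclic tree.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class Node {
    int value;
    Node left, right;
    Node(int value) { this.value = value; }
}

final class IncrementalOrderCheck {
    // One memoized invocation isOrdered(arg) in the computation graph.
    private static final class Call {
        final Node arg;
        Call parent;     // caller, used to propagate a changed result upward
        boolean result;  // cached return value
        Call(Node arg) { this.arg = arg; }
    }

    private final Map<Node, Call> memo = new IdentityHashMap<>();
    private final Set<Call> dirty =
            Collections.newSetFromMap(new IdentityHashMap<Call, Boolean>());
    int reruns = 0;  // invocations executed; kept only for demonstration

    // Stand-ins for write barriers on value, left, and right.
    void setValue(Node n, int v) {
        n.value = v;
        Call c = memo.get(n);
        if (c != null) {
            dirty.add(c);
            if (c.parent != null) dirty.add(c.parent);  // the caller reads n.value too
        }
    }
    void setLeft(Node n, Node child) { n.left = child; markDirty(n); }
    void setRight(Node n, Node child) { n.right = child; markDirty(n); }
    private void markDirty(Node n) {
        Call c = memo.get(n);
        if (c != null) dirty.add(c);
    }

    // Executes one invocation; callee results come from the memo table when present.
    private boolean recompute(Call c) {
        reruns++;
        Node t = c.arg;
        boolean local = (t.left == null || t.left.value < t.value)
                && (t.right == null || t.right.value > t.value);
        boolean l = childResult(c, t.left);
        boolean r = childResult(c, t.right);
        c.result = local && l && r;
        return c.result;
    }

    // Optimistic reuse: a memoized callee's old result is trusted even if some
    // descendant is still dirty; propagation in check() corrects it if needed.
    private boolean childResult(Call caller, Node child) {
        if (child == null) return true;
        Call cc = memo.get(child);
        if (cc == null) {
            cc = new Call(child);
            memo.put(child, cc);
            recompute(cc);  // unseen node: run afresh
        }
        cc.parent = caller;
        return cc.result;
    }

    boolean check(Node root) {
        if (root == null) return true;
        Call rc = memo.get(root);
        if (rc == null) {
            rc = new Call(root);
            memo.put(root, rc);
            recompute(rc);  // first run builds the whole graph
        }
        List<Call> work = new ArrayList<>(dirty);
        dirty.clear();
        for (Call c : work) {
            Call cur = c;
            while (cur != null) {
                boolean before = cur.result;
                if (recompute(cur) == before) break;  // same result: callers stay valid
                cur = cur.parent;                     // changed result: rerun the caller
            }
        }
        return rc.result;
    }
}
```

On a hypothetical six-node tree, the first `check` executes six invocations. Attaching a new node under the root and splicing out one interior node then costs three invocations on the next `check`: the two written nodes plus the new node. A later write that breaks the ordering reruns only the affected invocation and its chain of callers up to the root.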


Evaluation

Ran on a number of common data structure invariants, two real-world examples

Most complex invariant: red-black trees
  Tree is globally ordered
  Same # of black nodes to leaf
  Other RB properties (Black follows Red, etc.)
We were unable to incrementalize this check by hand!
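For a sense of what such a check involves, below is a sketch of two of the listed properties written as one recursive, side-effect-free function: equal black counts on every path to a leaf, and black children under every red node. It is not the check used in the evaluation; `RbNode` and `RbCheck` are hypothetical names, and the global ordering property is omitted.

```java
// Hypothetical red-black tree node; keys are omitted because ordering is not checked here.
final class RbNode {
    final boolean red;
    final RbNode left, right;
    RbNode(boolean red, RbNode left, RbNode right) {
        this.red = red;
        this.left = left;
        this.right = right;
    }
}

final class RbCheck {
    static boolean isRed(RbNode t) { return t != null && t.red; }

    // Returns the black height of the subtree, or -1 if a property is violated.
    static int blackHeight(RbNode t) {
        if (t == null) return 1;  // null leaves count as black
        if (t.red && (isRed(t.left) || isRed(t.right))) return -1;  // red needs black children
        int l = blackHeight(t.left);
        int r = blackHeight(t.right);
        if (l < 0 || r < 0 || l != r) return -1;  // black counts must agree on both sides
        return l + (t.red ? 0 : 1);
    }

    static boolean check(RbNode root) { return blackHeight(root) >= 0; }
}
```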


Kernel Results

[Figure: Ordered list performance. Time (ms) vs. data structure size (0 to 3000). Series: No invariants, With Ditto, Invariants.]
[Figure: Hash table performance. Time (ms) vs. data structure size (0 to 3000). Series: No invariants, With Ditto, Invariants.]
[Figure: Red-black tree performance. Time (ms) vs. data structure size (0 to 3000). Series: No invariants, With Ditto, Invariants.]

Real-world Examples

Tetris-like game Netcols
  Invariant: no “floating” jewels in grid
  With check, main event loop ran at 80ms, noticeably laggy
  Result: event loop to 15ms with Ditto
JavaScript obfuscator
  Invariant: no excluded keywords (based on a set of criteria) in renaming map
[Figure: JSO performance. Time (ms) vs. lines of JavaScript (0 to 15000). Series: No invariants, With Ditto, Invariants.]

Summary

Results:
  Automatic incrementalization made practical
    For checks in Java programs
  Data structure checks viable for development environment
Made possible by
  Selection of an interesting domain
  Optimistic memoization

Web: http://www.cs.berkeley.edu/~aj/cs/ditto/