Optimizing dynamic dispatch with fine-grained state tracking
Salikh Zakirov, Shigeru Chiba and Etsuya Shibayama
Tokyo Institute of Technology
Dept. of Mathematical and Computing Sciences
2010-10-18
Mixin
•Code composition technique
[Figure: mixin use declaration (Server < BaseServer, with AdditionalSecurity mixed in) and mixin semantics (AdditionalSecurity placed between Server and BaseServer)]
Dynamic mixin
•Temporary change in class hierarchy
•Available in Ruby, Python, JavaScript
[Figure: AdditionalSecurity temporarily inserted between Server and BaseServer]
Dynamic mixin (2)
•Powerful technique of dynamic languages
•Enables
  ▫dynamic patching
  ▫dynamic monitoring
•Can be used to implement
  ▫Aspect-oriented programming
  ▫Context-oriented programming
•Widely used in Ruby, Python
  ▫e.g. Object-Relational Mapping
Dynamic mixin in Ruby
•Ruby has dynamic mixin
  ▫but only an “install” operation, no “remove” operation
•“remove” can be implemented easily
  ▫23 lines
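The install-only behaviour can be seen directly in current Ruby. The following is a simplified, runnable sketch that reuses the class names from the mixin figure. It assumes Ruby 2.1 or later, where `Module#include` is public. The string return values are placeholders for real request handling.

```ruby
class BaseServer
  def process
    "base"
  end
end

class Server < BaseServer
  def process
    super # delegate to the next module in the ancestor chain
  end
end

module AdditionalSecurity
  def process
    "checked+" + super # stand-in for a security check, then delegate
  end
end

before = Server.new.process
Server.include(AdditionalSecurity) # install the mixin at run time
after = Server.new.process

puts before                # → base
puts after                 # → checked+base
p Server.ancestors.take(3) # → [Server, AdditionalSecurity, BaseServer]
# Plain Ruby has no public method that takes AdditionalSecurity back out
# of Server's ancestor chain; it stays there for the rest of the program.
```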
Target application
•Mixin is installed and removed frequently
•Application server with dynamic features

    class BaseServer
      def process()
        …
      end
    end

    class Server < BaseServer
      def process()
        if request.isSensitive()
          Server.class_eval { include AdditionalSecurity }
        end
        super # delegate to superclass
        … # remove mixin
      end
    end

    module AdditionalSecurity
      def process()
        … # security check
        super # delegate to superclass
      end
    end
Overhead is high
Reasons
•Invalidation granularity
  ▫clearing the whole method cache
  ▫invalidating all inline caches
  ▫next calls require a full method lookup
•Inline caching stores just 1 target
  ▫which changes with mixin operations
  ▫even though mixin operations are mostly repeated
Our research problem
•Improve performance of applications that frequently use dynamic mixin
  ▫Make invalidation granularity smaller
  ▫Make the dynamic dispatch target cacheable in the presence of dynamic mixin operations
Proposal
•Reduce granularity of inline cache invalidation
  ▫Fine-grained state tracking
•Cache multiple dispatch targets
  ▫Polymorphic inline caching
•Enable cache reuse on repeated mixin installation and removal
  ▫Alternate caching
Basics: Inline caching
[Figure: a call site cat.speak() in executable code, with an inline cache (ic) storing a class and a method; cat is an instance of Cat, a subclass of Animal, which provides the method implementation speak() { … }]
Dynamic dispatch implementation:
    method = lookup(cat, "speak")
    method(cat)
Expensive! But the result is mostly the same.
Inline caching:
    if (cat has type ic.class) {
      ic.method(cat)
    } else {
      ic.method = lookup(cat, "speak")
      ic.class = cat.class
      ic.method(cat)
    }
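The inline caching pseudocode above can be sketched in Ruby. This is a minimal sketch, not how a VM implements it. `InlineCache` is a hypothetical class standing for one call site's cache, and the slow path uses `Module#instance_method` in place of the VM's `lookup`.

```ruby
# Hypothetical monomorphic inline cache for a single call site.
class InlineCache
  attr_reader :lookups

  def initialize(name)
    @name = name   # selector called at this site, e.g. :speak
    @klass = nil   # ic.class: receiver class seen on the last miss
    @method = nil  # ic.method: target found for that class
    @lookups = 0   # number of full (slow-path) lookups performed
  end

  def call(receiver)
    unless receiver.class.equal?(@klass)
      # Miss: do the expensive lookup and remember the result.
      @method = receiver.class.instance_method(@name)
      @klass = receiver.class
      @lookups += 1
    end
    @method.bind(receiver).call
  end
end

class Animal
  def speak
    "*1*"
  end
end

class Cat < Animal; end

cat = Cat.new
site = InlineCache.new(:speak)
3.times { site.call(cat) }
puts site.call(cat) # → *1*
puts site.lookups   # → 1
```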
Inline caching: problem
[Figure: the same call site cat.speak() and inline cache (class Cat, method speak); a mixin Training with its own speak() { … } now sits between Cat and Animal, which also defines speak() { … }]
•What if the method has been overridden?
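A cache keyed only on the receiver class cannot notice the override. This sketch uses a hypothetical `NaiveInlineCache` with the same class check and shows the stale target after `Cat.include(Training)`. It assumes Ruby 2.1 or later.

```ruby
# Hypothetical class-keyed inline cache with no invalidation at all.
class NaiveInlineCache
  def initialize(name)
    @name = name
    @klass = nil
    @method = nil
  end

  def call(receiver)
    unless receiver.class.equal?(@klass)
      @method = receiver.class.instance_method(@name)
      @klass = receiver.class
    end
    @method.bind(receiver).call
  end
end

class Animal
  def speak
    "*1*"
  end
end

class Cat < Animal; end

module Training
  def speak
    "*2*"
  end
end

cat = Cat.new
site = NaiveInlineCache.new(:speak)
before = site.call(cat) # fills the cache with Animal#speak
Cat.include(Training)   # Training now sits between Cat and Animal
stale = site.call(cat)  # cache hit: the receiver class is still Cat
fresh = cat.speak       # Ruby's own dispatch finds Training#speak
p [before, stale, fresh] # → ["*1*", "*1*", "*2*"]
```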
Inline caching: invalidation
[Figure: Training, defining speak() { … }, installed between Cat and Animal; the single global state changes from 1 to 2, so the inline cache entry recorded with state 1 no longer matches]
    if (cat has type ic.class && state == ic.state) {
      ic.method(cat)
    } else {
      ic.method = lookup(cat, "speak")
      ic.class = cat.class; ic.state = state
      ic.method(cat)
    }
•Single global state object
  ▫too coarse invalidation granularity
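Here is a minimal sketch of the global-state check. It assumes a hypothetical `GlobalState` counter that a hypothetical `install_mixin` helper bumps. The sketch also shows the granularity problem: a call site on an unrelated class is invalidated too (`Fish#swim` is invented for the example).

```ruby
# Hypothetical single global state value guarding every inline cache.
module GlobalState
  @value = 1

  class << self
    attr_reader :value

    def bump!
      @value += 1
    end
  end
end

# Any mixin operation bumps the global state, invalidating every cache.
def install_mixin(klass, mod)
  klass.include(mod)
  GlobalState.bump!
end

class StateCheckedCache
  attr_reader :lookups

  def initialize(name)
    @name = name
    @klass = @method = @state = nil
    @lookups = 0
  end

  def call(receiver)
    unless receiver.class.equal?(@klass) && GlobalState.value == @state
      @method = receiver.class.instance_method(@name)
      @klass = receiver.class
      @state = GlobalState.value
      @lookups += 1
    end
    @method.bind(receiver).call
  end
end

class Animal
  def speak
    "*1*"
  end
end

class Cat < Animal; end

module Training
  def speak
    "*2*"
  end
end

class Fish
  def swim
    "swim"
  end
end

cat = Cat.new
fish = Fish.new
speak_site = StateCheckedCache.new(:speak)
swim_site = StateCheckedCache.new(:swim)
speak_site.call(cat)
swim_site.call(fish)
install_mixin(Cat, Training)
p speak_site.call(cat) # → "*2*"
p swim_site.call(fish) # → "swim" (after a needless full lookup)
p [speak_site.lookups, swim_site.lookups] # → [2, 2]
```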
Fine-grained state tracking
•Many state objects
  ▫small invalidation extent
  ▫share as much as possible
•One state object for each family of methods called from the same call site
•State objects associated with the lookup path
  ▫links updated during method lookups
•Invariant
  ▫any change that may affect method dispatch must also trigger a change of the associated state object
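The invariant can be illustrated with a toy object model rather than Ruby's own classes, because Ruby does not expose its internal state objects. Everything here is hypothetical: `PState`, `ToyClass`, `lookup` and `ToyCache`. Mixin install and removal conservatively bump every state object linked at the insertion point. The state object merge is reduced to relinking plus a one-time bump of the replaced object.

```ruby
# Toy object model (not Ruby internals); all names are hypothetical.
class PState
  attr_reader :value
  def initialize
    @value = 1
  end
  def bump!
    @value += 1
  end
end

class ToyClass
  attr_accessor :superclass
  attr_reader :mtable, :pstates
  def initialize(superclass = nil)
    @superclass = superclass
    @mtable = {}  # method name => Proc
    @pstates = {} # method name => PState linked by earlier lookups
  end
  # Invariant: a change that may affect dispatch bumps the linked state object.
  def define(meth, &body)
    @mtable[meth] = body
    @pstates[meth]&.bump!
  end
  # Splice a mixin in directly above self, or take it out again.
  # Conservative: bump every state object linked at this class.
  def install(mixin)
    mixin.superclass = @superclass
    @superclass = mixin
    @pstates.each_value(&:bump!)
  end
  def remove(mixin)
    @superclass = mixin.superclass
    @pstates.each_value(&:bump!)
  end
end

# Lookup that links every class on the path to one shared state object.
def lookup(klass, meth, hint)
  path = []
  k = klass
  until k.nil? || k.mtable.key?(meth)
    path << k
    k = k.superclass
  end
  raise NoMethodError, "undefined method #{meth}" if k.nil?
  path << k
  pstate = path.map { |c| c.pstates[meth] }.compact.first || hint || PState.new
  path.each do |c|
    old = c.pstates[meth]
    next if old.equal?(pstate)
    old&.bump! # relinking invalidates the old object's users once
    c.pstates[meth] = pstate
  end
  [k.mtable[meth], pstate]
end

# Per-call-site cache: a hit needs both the class and the state value to match.
class ToyCache
  attr_reader :misses
  def initialize(meth)
    @meth = meth
    @klass = @target = @pstate = @state = nil
    @misses = 0
  end
  def call(klass)
    unless klass.equal?(@klass) && @pstate&.value == @state
      @target, @pstate = lookup(klass, @meth, @pstate)
      @klass = klass
      @state = @pstate.value
      @misses += 1
    end
    @target.call
  end
end

animal = ToyClass.new
animal.define(:speak) { "*1*" }
cat = ToyClass.new(animal)
training = ToyClass.new
training.define(:speak) { "*2*" }
fish = ToyClass.new
fish.define(:swim) { "swim" }

speak_site = ToyCache.new(:speak)
swim_site = ToyCache.new(:swim)
speak_site.call(cat)
swim_site.call(fish)
cat.install(training)
p speak_site.call(cat)                  # → "*2*"
p swim_site.call(fish)                  # → "swim"
p [speak_site.misses, swim_site.misses] # → [2, 1]
```

After the mixin is installed, only the `speak` call site misses. The `swim` site's lookup path never touched `cat`, so it keeps hitting.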
State object allocation
[Figure: Animal defines speak() { *1* }; Cat has no implementation of speak; the lookup for cat.speak() links a state object (state 1) to the speak entries on its lookup path, and the inline cache records method speak *1*, class Cat, pstate and state 1]
    if (cat has type ic.class && ic.pstate.state == ic.state) {
      ic.method(cat)
    } else {
      ic.method, ic.pstate = lookup(cat, "speak", ic.pstate)
      ic.class = cat.class; ic.state = ic.pstate.state
      ic.method(cat)
    }
Mixin installation
[Figure: Training, defining speak() { *2* }, installed between Cat and Animal (which defines speak() { *1* }); the state object linked to speak changes from 1 to 2, and the inline cache at cat.speak() moves from speak *1* to speak *2*]
Mixin removal
[Figure: Training removed again; the state object changes from 2 to 3, and the inline cache at cat.speak() returns to speak *1*]
Alternate caching
•Detect repetition
•Conflicts detected by state check
[Figure: Training keeps an alternate cache (super Animal, speak, …); with repeated installation and removal of Training the state object alternates between 3 and 4; annotation: inline cache contents oscillate]
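One simple way to realize "detect repetition" is to remember the state value used for each mixin configuration and reuse it when that configuration comes back. This is a hypothetical sketch (`AlternatingState` and its methods are invented names), not the authors' data structure. Its values settle on 2 and 3, while the figure shows 3 and 4. Any change other than a mixin operation makes the sketch forget every remembered value, which is a simple stand-in for conflict detection.

```ruby
# Hypothetical state object that reuses values for repeated configurations.
class AlternatingState
  attr_reader :value

  def initialize
    @value = 1
    @counter = 1
    @seen = {} # mixin configuration => state value used for it before
  end

  # Mixin install/removal: reuse the old value if this configuration recurs.
  def mixin_changed(config)
    @value = (@seen[config] ||= fresh)
  end

  # Any other change may make remembered values wrong: forget them all.
  def other_change
    @seen.clear
    @value = fresh
  end

  private

  def fresh
    @counter += 1
  end
end

s = AlternatingState.new
values = [s.value]
4.times do |i|
  config = i.even? ? [:Training] : [] # install, remove, install, remove
  values << s.mixin_changed(config)
end
p values  # → [1, 2, 3, 2, 3]
s.other_change
p s.value # → 4
```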
Polymorphic caching
•Use multiple entries in inline cache
[Figure: the inline cache at cat.speak() holds two entries, (Cat, state 3) → *1* and (Cat, state 4) → *2*; Training still keeps its alternate cache (super Animal, speak, …)]
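Alternating state values only help if the inline cache can hold both targets. The following is a minimal sketch of a polymorphic inline cache keyed by (receiver class, state value). `PolymorphicInlineCache`, the limit of 4 entries and the oldest-first eviction are assumptions for illustration. The receiver class is represented by a plain symbol, and the state values 3 and 4 follow the figure.

```ruby
# Hypothetical polymorphic inline cache keyed by [receiver class, state value].
class PolymorphicInlineCache
  attr_reader :misses

  def initialize(limit = 4, &lookup)
    @entries = {}    # [klass, state value] => target
    @limit = limit   # assumed maximum number of entries
    @lookup = lookup # slow path supplied by the caller
    @misses = 0
  end

  def target_for(klass, state_value)
    key = [klass, state_value]
    @entries.fetch(key) do
      @misses += 1
      @entries.shift if @entries.size >= @limit # evict the oldest entry
      @entries[key] = @lookup.call(klass)
    end
  end
end

# Simulated hierarchy: state 3 = Training absent, state 4 = installed.
installed = false
pic = PolymorphicInlineCache.new { |_klass| installed ? "*2*" : "*1*" }

results = []
6.times do
  installed = !installed # install or remove the mixin
  state = installed ? 4 : 3
  results << pic.target_for(:Cat, state)
end
p results    # → ["*2*", "*1*", "*2*", "*1*", "*2*", "*1*"]
p pic.misses # → 2
```

After one miss for each configuration, every later install/remove cycle hits. A single-entry cache would miss on every switch.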
State object merge
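[Figure: call sites cat.speak() and animal.speak() in executable code (receivers: a cat instance and an animal instance) share state object S; labels "Overridden by" and "One-time invalidation"; code fragment: while(true) { remove mixin }]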
Overheads of proposed scheme
•Increased memory use
  ▫1 state object per polymorphic method family
  ▫additional method entries
  ▫alternate cache
  ▫polymorphic inline cache entries
•Some operations become slower
  ▫lookup needs to track and update state objects
  ▫explicit state object checks on method dispatch
Generalizations (beyond Ruby)
•Delegation object model
  ▫track arbitrary delegation pointer change
•Thread-local delegation
  ▫allow for thread-local modification of the delegation pointer
  ▫by having thread-local state object values
•Details in the article…
Evaluation
•Implementation based on Ruby 1.9.2
•Hardware
  ▫Intel Core i7 860, 2.8 GHz
Evaluation: microbenchmarks
•Single method call overhead
  ▫Inline cache hit: state checks 1%, polymorphic inline caching 49% overhead
  ▫Full lookup: 2x slowdown
Dynamic mixin-heavy microbenchmark
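[Chart: normalized execution time (smaller is better) for configurations base, method cache, state checks, fgst (fine-grained state tracking), fgst+PIC+altern (plus polymorphic inline caching and alternate caching); values shown on the chart: 100%, 23%, 17%, 15%]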
Evaluation: application
•Application server with dynamic mixin on each request
[Chart: normalized execution time (smaller is better) for configurations baseline, method cache, state checks, fgst, fgst + PIC, fgst + PIC + altern; values shown on the chart: 100%, 70%, 58%, 60%, 52%]
Evaluation
•Fine-grained state tracking considerably reduces overhead
•Alternate caching brings only a small improvement
  ▫the number of call sites affected by mixin is low
  ▫the ratio of lookup cost to inline cache hit cost is low, about 1.6x on Ruby
Related work
•Dependency tracking in Self
  ▫focused on reducing recompilation rather than reducing method lookups
•Inline caching for Objective-C
  ▫state object associated with method, no dynamic mixin support
Conclusion
•We proposed a combination of techniques
  ▫Fine-grained state tracking
  ▫Alternate caching
  ▫Polymorphic inline caching
•To increase the efficiency of inline caching
  ▫with frequent dynamic mixin installation and removal
Thank you for your attention
Method caching in Ruby
•Global hashtable
  ▫indexed by method name and class
•On method lookup
  ▫gives the answer in 1 hash lookup
•On miss
  ▫answer obtained by recursive lookup
  ▫result stored in method cache
•On method redefinition or mixin operation
  ▫method cache cleared completely
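The global method cache described above can be modeled in a few lines. This is a hypothetical Ruby model (`GlobalMethodCache` is an invented name), not MRI's C implementation. Its slow path walks `Module#ancestors` in place of the VM's recursive lookup and only considers public and protected methods.

```ruby
# Hypothetical model of a global method cache keyed by [class, method name].
class GlobalMethodCache
  attr_reader :hits, :misses

  def initialize
    @table = {} # [class, method name] => UnboundMethod
    @hits = @misses = 0
  end

  def lookup(klass, name)
    key = [klass, name]
    if @table.key?(key)
      @hits += 1
    else
      @misses += 1
      @table[key] = slow_lookup(klass, name)
    end
    @table[key]
  end

  # Any method redefinition or mixin operation empties the whole table.
  def clear!
    @table.clear
  end

  private

  # Walk the ancestor chain until some module defines the method itself.
  def slow_lookup(klass, name)
    klass.ancestors.each do |mod|
      return mod.instance_method(name) if mod.instance_methods(false).include?(name)
    end
    raise NameError, "undefined method #{name} for #{klass}"
  end
end

class Animal
  def speak
    "*1*"
  end
end

class Cat < Animal; end

cache = GlobalMethodCache.new
2.times { cache.lookup(Cat, :speak) }
cache.clear! # e.g. after a mixin is installed anywhere in the program
cache.lookup(Cat, :speak)
p [cache.hits, cache.misses] # → [1, 2]
```

Because `clear!` drops every entry, one mixin operation anywhere forces a full lookup at every call site afterwards. This is the coarse granularity that the fine-grained scheme avoids.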