Upload
others
View
39
Download
0
Embed Size (px)
Citation preview
Horcrux:Automatic JavaScript Parallelism for
Resource-Efficient Web Computations
Shaghayegh Mardani1, Ayush Goel2, Ronny Ko3, Harsha Madhyastha2, Ravi Netravali41UCLA, 2University of Michigan, 3Harvard University, 4Princeton University
1
Modern Web Browsing
2
Performance MattersWeb Traffic is Dominated by Mobile
Users
53% of visits are abandoned if a
mobile site takes longer than 3
seconds to load.Source: thinkwithgoogle.com
Content Providers
Source: bluecorona.com
If your site makes $100,000/day, 1 sec
improvement in page load increases revenue
by $7,000 daily.
The Problem: Computation Delays
• Evaluation setup
• With NO network delays:
3
Developed Region Emerging Region
• 582 pages in US• Google Pixel 3
• 2.0 GHz octa-core
• 91 pages in Pakistan• Redmi 6A
• 2.0 GHz quad-core
Performance metrics:• Page Load Time (PLT)• Speed Index (SI)
% of pages with load times > 3
Why Are Computation Delays Significant?
4
• 80% more JavaScript over the last 5 years
Mobile DevicesJavaScript
primary compute improvements
=more cores
52%
More cores != Better performance
5
Reason: Single-thread execution Solution: Parallelizing JavaScript
Parallelization Opportunities
• Web workers are widely supported by browsers• Constrained APIs:• No access to DOM APIs• No access to the main global state
• Legacy pages are highly amenable to safe parallelization
6
Pass-by-value
DOM APIs
Main Thread
JS Heap
JavaScript Engine
# of cores Speedup in JS Runtimes
2 cores 49%
4 cores 75%
8 cores 87%
Web Worker
Challenges
7
Conservative Signatures
C1: Ensuring CorrectnessThe exact state accessed by parallelized JavaScript- Despite non-determinism and across all possible control paths
Signature Generation
8
Dynamic Instrumentation
Instrumented
f(x)
Conservative Signatures
Original Page
Headless Browser
Concolic Engine
concrete input
JS codeto execute
f(x)
g(x)
h(x)
f´(x)
g´(x)
h´(x)
g(x) h(x)
if (x == 5) {y = x + 5;
} else {z = 9;
}
{"reads":[x],"writes":[y, z]
}Server-side and Offline
Conservative signaturesDynamic analysis: track read/writes to page state
Concolic execution: explore all possible control paths
Challenges
9
Client-sideParallelization
Scheduler
Conservative Signatures
C1: Ensuring CorrectnessThe exact state accessed by parallelized JavaScript- Despite non-determinism and across all possible control paths
C2: Web Workers Constrained APIs- How to parallelize execution with constrained APIs?
Horcrux Dynamic Scheduler
• Runs in the main browser thread• Only task: manage offloads in event-driven mode
10Main Thread
Scheduler
Page State
JS Heap
DOMQueue
f(x) +
g(x) +
h(x) +
Web Worker
Horcrux Dynamic Scheduler
• Runs in the main browser thread• Only task: manage offloads in event-driven mode
11Main Thread
Scheduler
Page State
JS Heap
DOMQueue
g(x) +
h(x) +
Web Worker
no state dependenciesx = "foo"
y = 5
f(x) reads & writes to x, yf(x) +
Horcrux Dynamic Scheduler
• Runs in the main browser thread• Only task: manage offloads in event-driven mode
12Main Thread
Scheduler
Page State
JS Heap
DOMQueue
g(x) +
h(x) +
Web Worker
f(x) +
x = "foo"y = 5
Horcrux Dynamic Scheduler
• Runs in the main browser thread• Only task: manage offloads in event-driven mode
13Main Thread
Scheduler
Page State
JS Heap
DOMQueue
g(x) +
h(x) +
Web Worker
Initializestate
x = "foo"y = 5
Executef(x)
Horcrux Dynamic Scheduler
• Runs in the main browser thread• Only task: manage offloads in event-driven mode
14Main Thread
Scheduler
Page State
JS Heap
DOMQueue
h(x) +
Web Worker
Initializestate
x = "foo"y = 5
Executef(x)
g(x) reads y
f(x) reads & writes to x, y
state dependency
g(x) +
Horcrux Dynamic Scheduler
• Runs in the main browser thread• Only task: manage offloads in event-driven mode
15Main Thread
Scheduler
Page State
JS Heap
DOMQueue
h(x) +
Web Worker
Initializestate
x = "foo"y = 5
Executef(x)
g(x)+
Horcrux Dynamic Scheduler
• Runs in the main browser thread• Only task: manage offloads in event-driven mode
16Main Thread
Scheduler
Page State
JS Heap
DOMQueue
h(x) +
Web Worker
Initializestate
x = "foo"y = 5
Executef(x)
h(x) reads z
g(x)+
no state dependenciesz = "bar"
Horcrux Dynamic Scheduler
• Runs in the main browser thread• Only task: manage offloads in event-driven mode
17Main Thread
Scheduler
Page State
JS Heap
DOMQueue
Web Worker
Initializestate
x = "foo"y = 5
Executef(x)
g(x)+
h(x) +
z = "bar"
Horcrux Dynamic Scheduler
• Runs in the main browser thread• Only task: manage offloads in event-driven mode
18Main Thread
Scheduler
Page State
JS Heap
DOMQueue
Web Worker
Initializestate
x = "foo"y = 5
Executef(x)
g(x)+
Initializestate
Executeh(x)
z = "bar"
Horcrux Dynamic Scheduler
• Runs in the main browser thread• Only task: manage offloads in event-driven mode
19Main Thread
Scheduler
Page State
JS Heap
DOMQueue
Web Worker
Initializestate
x = "foo"y = 5
Executef(x)
g(x)+
Initializestate
Executeh(x)
z = "bar"
Pauses the execution
getEle
mentBy
Id()
Challenges
20
Client-sideParallelization
Scheduler
Conservative Signatures
C1: Ensuring CorrectnessThe exact state accessed by parallelized JavaScript- Despite non-determinism and across all possible control paths
C2: Web Workers Constrained APIs- How to parallelize execution with constrained APIs?
Root-function Granularity
C3: Offloading overheads- Pass-by-value I/O can take ~0-10 ms
Offloading Granularity
• Trade-off: parallelization benefits vs. offloading overheads• Solution: offloading root-function invocations
21
A()
B()
C()
D()
<script>function A() {
…B(); // invokes C()D();…
}A();
</script>
Original Page Root-function
4x fewer offloads and 73% of parallelization
benefits
Evaluation Questions
• Impact on browser computation delays?• Impact on end-to-end performance?• Horcrux comparison to prior compute optimizations?• What do conservative signatures forgo?• How much are the server-side overheads?
22
Computation Delay Reductions
• Total Computation Time (TCT)
23
0
20
40
60
80
Developed Emerging
% Im
prov
emen
t in
TC
T
Network: WiFi LTEMedian Improvements:
Developed: WiFi (41%), LTE (34%)
Emerging: WiFi (44%), LTE (31%)
End-to-end Performance Improvements
• Page Load Time (PLT) and Speed Index (SI)
24
0.00
0.25
0.50
0.75
1.00
0 25 50 75% Improvement
CD
F
LTE, PLTLTE, SIWiFi, PLTWiFi, SI
100
27%
Developed Region
0.00
0.25
0.50
0.75
1.00
0 25 50 75% Improvement
CD
F
LTE, PLTLTE, SIWiFi, PLTWiFi, SI
100
29%
Emerging Region
Conclusion
• More cores != Better performance• Horcrux automatically parallelizes JavaScript execution using concolic
execution to take advantage of phones multi-core CPUs
25
github.com/ShaghayeghMrdn/horcrux-osdi21