Speculative Region-based Memory Management for Big Data Systems
Khanh Nguyen, Lu Fang, Harry Xu, Brian Demsky
Donald Bren School of Information and Computer Sciences




BIG DATA

Scalability: JVM crashes due to an OutOfMemory error at an early stage


A moderate-size application on Giraph with 1 GB of input data can easily run out of memory on a 12 GB heap [Bu et al, ISMM’13]


BIG DATA

Scalability: JVM crashes due to an OutOfMemory error at an early stage

Management cost: GC time accounts for up to 50% of the execution time

[Bu et al, ISMM’13]

High cost of the managed runtime is a fundamental problem!


Existing Work

• Facade [Nguyen et al, ASPLOS’15]

• Broom [Gog et al, HotOS’15]

Existing work: huge manual effort from developers

This work: a purely dynamic technique


Control Path vs. Data Path

Control path:
• Pipeline construction
• Job state management
• Perform optimization

Data path:
• Processes the actual data
• Code size is small (36%)
• Creates most of the runtime objects (95%)

[Bu et al, ISMM’13]


Execution Pattern

• Data-processing functions are iteration-based

• Each iteration processes a distinct data partition

• Iterations are well-defined


public interface GraphChiProgram<VertexDataType, EdgeDataType> {
    public void update(ChiVertex<VertexDataType, EdgeDataType> vertex, GraphChiContext context);

    public void beginIteration(GraphChiContext ctx);
    public void endIteration(GraphChiContext ctx);

    public void beginInterval(GraphChiContext ctx, VertexInterval interval);
    public void endInterval(GraphChiContext ctx, VertexInterval interval);

    public void beginSubInterval(GraphChiContext ctx, VertexInterval interval);
    public void endSubInterval(GraphChiContext ctx, VertexInterval interval);
}

GraphChi [Kyrola et al, OSDI’12]


Weak Iteration Hypothesis

• Data objects do not escape iteration boundaries
– A GC run in the middle of an iteration is wasted

• Control objects do escape iteration boundaries

[Figure: PageRank on the Twitter graph; labels: 5%, 181 million objects]


Region-based Memory Management

• Region definition
• Management:
– Allocation
– Deallocation
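As a rough illustration of region allocation and deallocation, here is a minimal sketch assuming a bump-pointer arena backed by a `ByteBuffer`. The names `Region`, `allocate`, `release`, and `usedBytes` are hypothetical and chosen for this example; this is not the authors' implementation.

```java
import java.nio.ByteBuffer;

// Hypothetical bump-pointer region: blocks are carved out of one buffer
// and all of them are released together in constant time.
final class Region {
    private final ByteBuffer memory; // backing storage of the region
    private int top = 0;             // offset of the next free byte

    Region(int capacityBytes) {
        this.memory = ByteBuffer.allocate(capacityBytes);
    }

    // Allocation: bump the pointer and return the offset of the new block.
    int allocate(int sizeBytes) {
        if (sizeBytes < 0 || sizeBytes > memory.capacity() - top) {
            throw new IllegalStateException("region exhausted");
        }
        int offset = top;
        top += sizeBytes;
        return offset;
    }

    // Deallocation: reset the pointer; every block in the region dies at once.
    void release() {
        top = 0;
    }

    int usedBytes() {
        return top;
    }
}

final class RegionDemo {
    public static void main(String[] args) {
        Region region = new Region(1024);
        int first = region.allocate(16);
        int second = region.allocate(32);
        System.out.println(first + " " + second + " " + region.usedBytes());
        region.release();
        System.out.println(region.usedBytes());
    }
}
```

Individual objects are never freed one by one in this sketch: allocation is a pointer bump, and a single `release` call reclaims the whole region.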


Advantages

• Low overheads
• Improved data locality
• More flexible than stack allocation
• No GC burden


Challenges

• Escaping control objects
• Developers are responsible for semantic correctness

Precise object lifetimes required!

Facade: annotation & refactoring

Broom: specialized API

Static analyses?


Proposed Solution

Speculative Region Allocation

Annotate iteration boundaries:
– iteration_start
– iteration_end

Algorithms to guarantee program’s correctness automatically
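To make the annotation step concrete, here is a minimal sketch assuming the two markers are exposed as static calls. `IterationMarkers`, `iterationStart`, `iterationEnd`, `depth`, and `AnnotatedLoop` are hypothetical names for illustration. A real implementation would sit inside the JVM and manage regions, whereas this sketch only tracks nesting per thread.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-ins for the iteration_start / iteration_end annotations.
final class IterationMarkers {
    // Open iterations of the current thread, innermost on top.
    private static final ThreadLocal<Deque<Integer>> OPEN =
            ThreadLocal.withInitial(ArrayDeque::new);

    static void iterationStart() {
        Deque<Integer> open = OPEN.get();
        open.push(open.size() + 1); // nesting level of the new iteration
    }

    static void iterationEnd() {
        Deque<Integer> open = OPEN.get();
        if (open.isEmpty()) {
            throw new IllegalStateException("iterationEnd without iterationStart");
        }
        open.pop(); // a runtime would recycle the iteration's region here
    }

    static int depth() {
        return OPEN.get().size();
    }
}

final class AnnotatedLoop {
    // Sums each partition inside its own annotated iteration.
    static long sumPartitions(int[][] partitions) {
        long total = 0;
        for (int[] partition : partitions) {
            IterationMarkers.iterationStart();
            try {
                for (int value : partition) {
                    total += value;
                }
            } finally {
                IterationMarkers.iterationEnd(); // always close the iteration
            }
        }
        return total;
    }
}
```

The only change to user code in this sketch is the pair of marker calls around the data-processing loop; everything else is left to the runtime.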


Observations

Iterations are:
• nested
• executed by multiple threads

Identifier: (iteration_ID, thread_ID)


Region Semi-lattice

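[Diagram: region semi-lattice. The heap, T,*, is the top element; below it are the regions 1,t1, 2,t1, 3,t1 (thread t1) and 1,t2, 2,t2, 3,t2 (thread t2). Regions are combined with the JOIN operator.]

GC never touches regions

void main() {
    iteration_start
    for ( ) {
        iteration_start
        for ( ) {
            iteration_start
            for ( ) {
            }
            iteration_end
        }
        iteration_end
    }
    iteration_end
} // end of main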


Speculative Region Allocation

[Diagram: two nested iteration_start / iteration_end pairs; the outer pair is labeled Parent and the inner pair Child]


Components of Our Approach

• Speculative region allocation
• Track inter-region references
– Update boundary set
• Recycle regions
– Boundary set promotion


Remember Inter-Region References: Case 1

a.f = b

[Diagram: objects a and b live in two regions of the same thread, x,ti and y,ti; b is recorded in the boundary set]


Remember Inter-Region References: Case 2

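a = b.f

[Diagram: regions x,ti and y,tj belong to different threads; field f of object b points to object c; c is recorded in the boundary set]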
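The two cases can be sketched as bookkeeping hooks on reference stores and loads. This is a minimal sketch under stated assumptions: `RegionId`, `BoundarySets`, `onStore`, `onLoad`, and `referencingRegions` are hypothetical names, and the conditions used (different region for a store, different thread for a load) are a simplified reading of the two slides, not the authors' barrier code.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical region name: the (iteration_ID, thread_ID) pair.
final class RegionId {
    final int iteration;
    final String thread;

    RegionId(int iteration, String thread) {
        this.iteration = iteration;
        this.thread = Objects.requireNonNull(thread);
    }

    @Override public boolean equals(Object o) {
        return o instanceof RegionId
                && ((RegionId) o).iteration == iteration
                && ((RegionId) o).thread.equals(thread);
    }

    @Override public int hashCode() {
        return Objects.hash(iteration, thread);
    }
}

final class BoundarySets {
    // home region -> (escaping object -> regions that reference or use it)
    private final Map<RegionId, Map<Object, Set<RegionId>>> sets = new HashMap<>();

    // Case 1: a.f = b, where a lives in 'holder' and b lives in 'home'.
    void onStore(RegionId holder, Object b, RegionId home) {
        if (!holder.equals(home)) {
            remember(home, b, holder);
        }
    }

    // Case 2: a = b.f run by a thread whose current region is 'reader',
    // where the loaded object c lives in 'home', a region of another thread.
    void onLoad(RegionId reader, Object c, RegionId home) {
        if (!reader.thread.equals(home.thread)) {
            remember(home, c, reader);
        }
    }

    private void remember(RegionId home, Object obj, RegionId from) {
        sets.computeIfAbsent(home, r -> new IdentityHashMap<>())
            .computeIfAbsent(obj, o -> new LinkedHashSet<>())
            .add(from);
    }

    // Regions recorded for 'obj' in the boundary set of 'home' (empty if none).
    Set<RegionId> referencingRegions(RegionId home, Object obj) {
        Map<Object, Set<RegionId>> set = sets.get(home);
        if (set == null || !set.containsKey(obj)) {
            return Set.of();
        }
        return set.get(obj);
    }
}
```

Objects are keyed by identity rather than by `equals`, since the boundary set tracks specific heap objects.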


Region Recycling Algorithm

[Diagram: the region semi-lattice (T,* above 1,t1, 2,t1, 3,t1 and 1,t2, 2,t2, 3,t2), shown with region 3,t1 and its boundary set]

JOIN(1,t1 ; 2,t1) = 1,t1


Region Recycling Algorithm

[Diagram: the region semi-lattice (T,* above 1,t1, 2,t1, 3,t1 and 1,t2, 2,t2, 3,t2), shown with region 3,t1 and its boundary set]

JOIN(2,t1 ; 2,t2) = T,*


Region Recycling Algorithm

[Diagram: the region semi-lattice (T,* above 1,t1, 2,t1, 3,t1 and 1,t2, 2,t2, 3,t2), shown with region 3,t1 and its boundary set]
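The JOIN examples in the deck can be reproduced with a small sketch. It assumes that the regions of one thread form a chain ordered by nesting (a smaller iteration_ID is the outer region), that regions of different threads meet only at the heap T,*, and that an escaping object is promoted to the JOIN of the regions referencing it. `RegionNode`, `Recycler`, and `promotionTarget` are hypothetical names; this is an illustration, not the authors' algorithm.

```java
import java.util.List;

// Hypothetical node of the region semi-lattice, named by (iteration_ID, thread_ID).
final class RegionNode {
    // The heap is the top element, written T,* on the slides.
    static final RegionNode HEAP = new RegionNode(0, "*");

    final int iteration;
    final String thread;

    private RegionNode(int iteration, String thread) {
        this.iteration = iteration;
        this.thread = thread;
    }

    static RegionNode of(int iteration, String thread) {
        return new RegionNode(iteration, thread);
    }

    // ASSUMPTION: same-thread regions form a chain (outer region = smaller ID);
    // anything involving the heap or two different threads joins at the heap.
    static RegionNode join(RegionNode a, RegionNode b) {
        if (a == HEAP || b == HEAP || !a.thread.equals(b.thread)) {
            return HEAP;
        }
        return a.iteration <= b.iteration ? a : b;
    }

    @Override public String toString() {
        return this == HEAP ? "<T,*>" : "<" + iteration + "," + thread + ">";
    }
}

final class Recycler {
    // ASSUMPTION: an escaping object moves to the JOIN of all regions referencing it.
    static RegionNode promotionTarget(List<RegionNode> referencingRegions) {
        if (referencingRegions.isEmpty()) {
            throw new IllegalArgumentException("no referencing regions");
        }
        RegionNode target = referencingRegions.get(0);
        for (RegionNode region : referencingRegions) {
            target = RegionNode.join(target, region);
        }
        return target;
    }
}

final class RecyclerDemo {
    public static void main(String[] args) {
        RegionNode r1t1 = RegionNode.of(1, "t1");
        RegionNode r2t1 = RegionNode.of(2, "t1");
        RegionNode r2t2 = RegionNode.of(2, "t2");
        System.out.println(Recycler.promotionTarget(List.of(r1t1, r2t1))); // <1,t1>
        System.out.println(Recycler.promotionTarget(List.of(r2t1, r2t2))); // <T,*>
    }
}
```

Under these assumptions an object referenced only from its own thread stays in a region, while any cross-thread reference sends it to the garbage-collected heap.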


Handling of Intricacies

• Escape via the stack
• Data-race-free object relocation

Details are in the paper


Conclusions

• Goal: Reduce user’s effort
• Solution: Speculative region allocation
– The cost of object promotion is considerable
  • Can be reduced by adaptively allocating objects: feedback-directed allocation policy
• Status: In the process of implementing and evaluating in OpenJDK


Thank you!