Faculty of Computer Science
José Nelson Amaral © 2008
MPADS: Memory-Pooling-Assisted Data Splitting
Stephen Curial - Xymbiant Systems Inc.
Peng Zhao - Intel Corporation
J. Nelson Amaral - University of Alberta
Yaoqing Gao, Shimin Cui, Raul Silvera, Roch Archambault - IBM Toronto Software Laboratory
FROM SUN MICROSYSTEMS
© 2006
Department of Computing Science
ISMM 2008
Goal
What:
– Improve spatial locality
Where:
– Linked (pointer-based) data structures
How:
– Pooling similar structures together
– Grouping same fields from multiple objects together
Goal (cont.)
Why:
– Because we can
– Allow easy-to-write, easy-to-read, easy-to-maintain code to improve performance
Which compiler:
– IBM XL compiler suite
Limitation:
– Needs more precise pointer analysis to benefit from more opportunities
Most Relevant Earlier Work
Pool Allocation
– Lattner and Adve (CGO 04, PLDI 05)
Reference Affinity
– Zhong, Orlovich, Shen, Ding (PLDI 04)
– Rabbah and Palem (TECS 03)
Array Reshaping
– Zhao, Cui, Gao, Silvera, Amaral (TOPLAS 07)
A refreshing outcome
“MPADS is not the first implementation of the
combination of memory pools and splitting of
pointer-based data structures.”
“MPADS is still not delivering its full
potential on standard benchmarks in the
IBM XL compiler.”
Reviewer’s Comment:
“The technique only worked for Olden, and did nothing for
SPECcpu2000 (but the authors get bonus points for being honest
about that.)”
The Cost of Programming Productivity
Easy-to-read and easy-to-maintain code often
results in lower runtime performance.
[Figure: Student, Class, and University objects]
The Cost of Programming Productivity
Abstraction
Inheritance
[Figure: Person, with the subclasses Student, Professor, and Support Staff]
The Cost of Programming Productivity
Data Encapsulation
Person: Date of Birth, Address, Driver Lic., Citizenship, Name, Gender
Student: Faculty, Date of Adm., Department, Program, Univ. ID, Classes Enr., Grades
A possible data layout
Student (24 bytes):
Faculty: 1 byte; Date of Adm.: 4 bytes
Department: 1 byte; Program: 2 bytes
Univ. ID: 4 bytes
Classes Enr., Grades: 4 bytes, 4 bytes, 4 bytes
Person (88 bytes):
Date of Birth: 4 bytes; Address: 32 bytes
Driver Lic.: 3 bytes; Gender: 1 byte
Name: 32 bytes
Citizenship: 16 bytes
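The field sizes above can be checked against a concrete C declaration. This is a sketch under assumptions: the field order follows the memory picture that comes next, the element types are guesses that match the stated sizes, and the slide's third 4-byte slot in the Student row is assumed to be a 32-bit link to the Person record (the hypothetical `person_ref`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Person: sizes from the slide, order and types assumed. */
typedef struct {
    char    name[32];        /* 32 bytes */
    int32_t date_of_birth;   /*  4 bytes */
    char    gender;          /*  1 byte  */
    char    driver_lic[3];   /*  3 bytes */
    char    citizenship[16]; /* 16 bytes */
    char    address[32];     /* 32 bytes */
} Person;

/* Student: sizes from the slide; person_ref is a hypothetical 32-bit
   reference standing in for the unlabeled third 4-byte slot. */
typedef struct {
    int32_t  univ_id;          /* 4 bytes */
    int32_t  date_of_adm;      /* 4 bytes */
    uint8_t  faculty;          /* 1 byte  */
    uint8_t  department;       /* 1 byte  */
    uint16_t program;          /* 2 bytes */
    int32_t  classes_enrolled; /* 4 bytes */
    int32_t  grades;           /* 4 bytes */
    uint32_t person_ref;       /* 4 bytes, hypothetical */
} Student;

/* With these types no padding is inserted on common ABIs. */
_Static_assert(sizeof(Person) == 88, "Person should occupy 88 bytes");
_Static_assert(sizeof(Student) == 24, "Student should occupy 24 bytes");
```

Every field is naturally aligned in this order, so the compiler adds no padding and the totals match the 24 and 88 bytes used in the search example.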
Data in Memory
[Figure: memory drawn in 8-byte rows (byte columns 0 to 7). Starting at address 0 (rows 0 to 48), consecutive Student objects occupy rows holding Univ. ID and Date of Adm.; Fa., De, Progr., and Classes Enr.; Grades. A Person object occupies addresses 8000 to 8080: Name, Date of Birth, Address, Dr. Lic. and Ge, Citizenship.]
Assume a Cache Organization
POWER5 Cache Organization
– L1 Data Cache: 32 Kbytes, 128-byte cache lines
– L2 Cache: 1.44 Mbytes, 128-byte cache lines
– L3 Cache: 32 Mbytes, 512-byte cache lines
Cache Organization
[Figure: the L1 data cache drawn as 256 cache lines (0 to 255), each 128 bytes wide (bytes 0 to 127)]
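The geometry in the figure follows from the cache parameters: 32 Kbytes / 128 bytes per line = 256 lines. The small C sketch below (helper names are hypothetical) computes that count and how many line-sized memory blocks an object spans, which is what determines how much data a read brings into the cache.

```c
#include <assert.h>
#include <stddef.h>

/* Number of lines in a cache of cache_bytes with line_bytes per line. */
static size_t cache_lines(size_t cache_bytes, size_t line_bytes) {
    return cache_bytes / line_bytes;
}

/* Number of line-sized memory blocks covered by an object of size bytes
   (size > 0) starting at address addr. Reading the whole object brings
   every one of these blocks into the cache. */
static size_t blocks_spanned(size_t addr, size_t size, size_t line_bytes) {
    return (addr + size - 1) / line_bytes - addr / line_bytes + 1;
}
```

For example, a 24-byte Student at address 120 straddles a line boundary and costs two lines, and the 88-byte Person at address 8000 also spans two 128-byte blocks (8000 and 8087 fall in blocks 62 and 63).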
Example: A search through the data structures
How many Computing Science students are younger than 23 years old?
[Figure: the 256-line, 128-byte-per-line cache; line 0 fills with consecutive Student objects (Univ.ID, Adm., F., D., Prg, Class., Grades, Univ.ID, ...)]
Example: A search through the data structures
[Figure: cache line 0 filled with consecutive Student objects (Univ.ID, Adm., F., D., Prg, Class., Grades, ...)]
Student structure: for every 24 bytes loaded, the search reads only 1 or 5 bytes.
Example: A search through the data structures
[Figure: cache line 0 holds Student objects; another cache line holds a Person object (Name, DofB, G, Citizens., Address, DL.), with byte offsets 0, 32, 64, 68, and 72 marked]
Example: A search through the data structures
[Figure: Student objects in one cache line and a Person object (Name, DofB, G, Citizens., Address, DL.) in another]
Person structure: for every 88 bytes loaded, the search reads only 4 bytes.
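To make the waste concrete, the sketch below counts the distinct 128-byte cache lines touched when a query reads one field from every record. It assumes records are packed contiguously from address 0, that Department sits at byte offset 9 of a 24-byte Student, and that Date of Birth sits at offset 32 of an 88-byte Person; the sizes come from the slides, the offsets are assumptions, and `lines_touched` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Counts distinct cache lines touched when reading a field_size-byte field
   from n records placed stride bytes apart, starting at address base.
   Addresses only increase, so a line is never revisited once left. */
static size_t lines_touched(size_t base, size_t stride, size_t field_offset,
                            size_t field_size, size_t n, size_t line_size) {
    size_t count = 0;
    size_t last = (size_t)-1; /* no line seen yet */
    for (size_t i = 0; i < n; i++) {
        size_t first_byte = base + i * stride + field_offset;
        size_t last_byte = first_byte + field_size - 1;
        for (size_t l = first_byte / line_size; l <= last_byte / line_size; l++) {
            if (l != last) {
                count++;
                last = l;
            }
        }
    }
    return count;
}
```

For 128 students, reading Department from the 24-byte records touches 24 cache lines, whereas a contiguous array of 1-byte departments fits in a single line. For 16 Person records, reading Date of Birth touches 11 lines, compared with one line for a packed array of 4-byte dates.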
Data Reshaping for Arrays of Structures
Student *ListOfStudents;
….
ListOfStudents = (Student*)malloc(….);
[Figure: the array of Student structures, each element holding Univ. ID, Date of Adm., Fa., De, Progr., ..., Classes Enr., Grades, is reshaped into separate arrays, one per field: all Univ. IDs together, all Dates of Adm. together, all Fa. together, all De together, all Progr. together, and so on]
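A minimal C sketch of the reshaping, with field types assumed for the slide's Student fields: `reshape` copies an array of structures into one array per field, after which a query over departments reads only the department array. All names (`StudentArrays`, `reshape`, `count_in_department`) are hypothetical, and a compiler performing this transformation would rewrite the allocation and the accesses rather than copy data at run time.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { /* original element; types assumed */
    int32_t univ_id, date_of_adm;
    uint8_t faculty, department;
    uint16_t program;
    int32_t classes_enrolled, grades;
} Student;

typedef struct { /* reshaped layout: one contiguous array per field */
    size_t n;
    int32_t *univ_id, *date_of_adm;
    uint8_t *faculty, *department;
    uint16_t *program;
    int32_t *classes_enrolled, *grades;
} StudentArrays;

static void free_arrays(StudentArrays *s) {
    free(s->univ_id); free(s->date_of_adm); free(s->faculty);
    free(s->department); free(s->program);
    free(s->classes_enrolled); free(s->grades);
}

/* Copies n students into per-field arrays; returns 0, or -1 if out of memory. */
static int reshape(const Student *aos, size_t n, StudentArrays *out) {
    out->n = n;
    out->univ_id = malloc(n * sizeof *out->univ_id);
    out->date_of_adm = malloc(n * sizeof *out->date_of_adm);
    out->faculty = malloc(n * sizeof *out->faculty);
    out->department = malloc(n * sizeof *out->department);
    out->program = malloc(n * sizeof *out->program);
    out->classes_enrolled = malloc(n * sizeof *out->classes_enrolled);
    out->grades = malloc(n * sizeof *out->grades);
    if (n > 0 && (!out->univ_id || !out->date_of_adm || !out->faculty ||
                  !out->department || !out->program ||
                  !out->classes_enrolled || !out->grades)) {
        free_arrays(out);
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        out->univ_id[i] = aos[i].univ_id;
        out->date_of_adm[i] = aos[i].date_of_adm;
        out->faculty[i] = aos[i].faculty;
        out->department[i] = aos[i].department;
        out->program[i] = aos[i].program;
        out->classes_enrolled[i] = aos[i].classes_enrolled;
        out->grades[i] = aos[i].grades;
    }
    return 0;
}

/* This query streams through the department bytes only. */
static size_t count_in_department(const StudentArrays *s, uint8_t dept) {
    size_t count = 0;
    for (size_t i = 0; i < s->n; i++)
        if (s->department[i] == dept) count++;
    return count;
}
```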
Maximal Structure Splitting
Before (one structure per object):
ID1 Adm1 Fac1 Dep1 Clas1 Grad1 1
ID2 Adm2 Fac2 Dep2 Clas2 Grad2 2
ID3 Adm3 Fac3 Dep3 Clas3 Grad3 3
After (one array per field):
ID1 ID2 ID3
Adm1 Adm2 Adm3
Fac1 Fac2 Fac3
Dep1 Dep2 Dep3
Clas1 Clas2 Clas3
Grad1 Grad2 Grad3
1 2 3
Implementation of Pool Allocation
Intercept mallocs and replace them with pool allocation: each structure layout gets its own pool.
If a pool is full, another pool can be allocated.
[Figure: successive objects, each with the fields ID, Adm, Fac, Dep, Clas, Grad, and a final numbered field, placed in the pool for their structure layout]
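The pool scheme above can be sketched in C as follows. For simplicity this sketch returns a (pool, index) handle instead of a raw pointer, unlike the address-arithmetic scheme used later for non-uniform splitting. The capacity, the three fields kept, and every name (`Pool`, `StudentRef`, `pool_alloc`) are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define POOL_CAPACITY 4 /* deliberately tiny so a pool fills up quickly */

/* One pool for one structure layout; each field lives in its own array. */
typedef struct Pool {
    struct Pool *next;
    size_t used;
    int32_t univ_id[POOL_CAPACITY];
    int32_t date_of_adm[POOL_CAPACITY];
    int32_t grades[POOL_CAPACITY];
} Pool;

/* Hypothetical handle that replaces the Student * returned by malloc. */
typedef struct { Pool *pool; size_t index; } StudentRef;

typedef struct { Pool *head; size_t num_pools; } PoolList;

/* Stand-in for malloc(sizeof(Student)): bump-allocate in the current pool,
   allocating a new pool when the current one is full. Returns 0 or -1. */
static int pool_alloc(PoolList *list, StudentRef *out) {
    if (list->head == NULL || list->head->used == POOL_CAPACITY) {
        Pool *p = calloc(1, sizeof *p);
        if (p == NULL) return -1;
        p->next = list->head;
        list->head = p;
        list->num_pools++;
    }
    out->pool = list->head;
    out->index = list->head->used++;
    return 0;
}

static void pool_free_all(PoolList *list) {
    Pool *p = list->head;
    while (p != NULL) {
        Pool *next = p->next;
        free(p);
        p = next;
    }
    list->head = NULL;
    list->num_pools = 0;
}
```

Allocating five objects with a capacity of four fills the first pool and opens a second one, whose first slot (index 0) receives the fifth object.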
Implementing Pool Allocation
The following types of statements need to be
transformed:
– Memory allocation statements
– Memory reference statements
Transforming Memory Allocation Statements
Extended pointer analysis to maintain a set of
allocation sites associated with each alias set.
When an alias set is selected for transformation:
– Replace each associated allocation with a call to the pool
allocation function.
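The bookkeeping described above, with allocation sites recorded per alias set and all sites of a selected set rewritten, can be sketched as follows. The `AllocSite` record, the alias-set numbers, and the `pool_alloc` callee name are hypothetical and are not the XL compiler's data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical compiler-side record of one allocation site. */
typedef struct {
    int site_id;
    int alias_set;      /* alias set the allocated object belongs to */
    const char *callee; /* allocation function called at this site */
} AllocSite;

/* Redirects every allocation site of the selected alias set to pool_fn.
   Returns the number of sites rewritten. */
static size_t rewrite_alloc_sites(AllocSite *sites, size_t n, int selected_set,
                                  const char *pool_fn) {
    size_t rewritten = 0;
    for (size_t i = 0; i < n; i++) {
        if (sites[i].alias_set == selected_set) {
            sites[i].callee = pool_fn;
            rewritten++;
        }
    }
    return rewritten;
}
```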
Transforming Memory References
Update address calculation for loads and stores:
– Uniform splitting --- all fields are the same size
• Address calculation is simpler
• Restricts application of technique or
• Requires memory padding
– Non-uniform splitting --- fields of different size
• Address calculation is more involved
• Can be applied more generally
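Uniform splitting keeps address calculation simple. Suppose every field occupies S bytes, each pool stores field k of all N objects contiguously, and a pointer s refers to field 0 of object i, so s = pool_base + i*S. Then field k of that object lives at pool_base + k*S*N + i*S = s + k*S*N. Neither the pool base nor the object index needs to be recovered. The sketch below assumes this layout, and its function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address of field k for the object whose field 0 is at address s,
   assuming uniform field_size bytes and n_per_pool objects per pool. */
static uintptr_t uniform_field_addr(uintptr_t s, size_t k, size_t field_size,
                                    size_t n_per_pool) {
    return s + (uintptr_t)(k * field_size * n_per_pool);
}
```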
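Non-Uniform Example
struct example {
    type_3 a; /* 3 bytes */
    type_7 b; /* 7 bytes */
    type_5 c; /* 5 bytes */
};
How can the compiler find the address to access s->c?
pool_base = s & 0xF…F000
index = (s - pool_base) / 3
field_base = (3+7)*num_structs_per_pool
Since s = pool_base + 3*index and the c fields start field_base bytes into the pool:
s->c = *(s + field_base - 3*index + 5*index)
s->c = *(s + field_base + (5-3)*index)
[Figure: a pool starting at pool_base; s points at an object's a field; field_base marks where the c fields begin]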
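The derivation can be checked with a small C sketch. It assumes hypothetical 4096-byte pools aligned on 4096-byte boundaries, so in this sketch the mask is `~(POOL_SIZE - 1)`. Each pool stores all a fields, then all b fields, then all c fields, and a pointer to an object points at its a field. `addr_of_c` and the constants are illustrative names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define POOL_SIZE 4096u /* assumed pool size and alignment */
#define SIZE_A 3u
#define SIZE_B 7u
#define SIZE_C 5u
#define NUM_STRUCTS_PER_POOL (POOL_SIZE / (SIZE_A + SIZE_B + SIZE_C)) /* 273 */

/* s points at the a field of an object; returns the address of its c field. */
static unsigned char *addr_of_c(unsigned char *s) {
    uintptr_t p = (uintptr_t)s;
    uintptr_t pool_base = p & ~(uintptr_t)(POOL_SIZE - 1);
    uintptr_t index = (p - pool_base) / SIZE_A;
    uintptr_t field_base = (uintptr_t)(SIZE_A + SIZE_B) * NUM_STRUCTS_PER_POOL;
    /* s - 3*index + field_base + 5*index == s + field_base + (5-3)*index */
    return s + field_base + (SIZE_C - SIZE_A) * index;
}
```

A pool can be obtained with C11 `aligned_alloc(POOL_SIZE, POOL_SIZE)`. For every index i, `addr_of_c(pool + 3*i)` then equals `pool + (3+7)*273 + 5*i`, and the last c field ends at byte 4095.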
Data Transformation Safety
How does the compiler decide whether it is safe to
transform a given structure?
– Based on the results of the pointer analysis.
Is it safe to transform a given data structure?
Structure layout: two structures have the same layout if
each field has the same offset and the same length.
Build alias set
– If a pointer P may point to the structure
• Then all the objects in the points-to set of the alias set of P
must have the same layout.
[Figure: pointers P and Q form an alias set whose points-to set contains Data Struct 1 and Data Struct 2]
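The layout rule above translates directly into a check. In this sketch a layout is assumed to be an ordered list of (offset, length) pairs, and an alias set is accepted only if every object in its points-to set has the same layout. The types and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { size_t offset; size_t length; } Field;
typedef struct { const Field *fields; size_t num_fields; } Layout;

/* Same layout: every field has the same offset and the same length. */
static bool same_layout(const Layout *x, const Layout *y) {
    if (x->num_fields != y->num_fields) return false;
    for (size_t i = 0; i < x->num_fields; i++)
        if (x->fields[i].offset != y->fields[i].offset ||
            x->fields[i].length != y->fields[i].length)
            return false;
    return true;
}

/* Safe to transform only if all n objects in the points-to set share a layout. */
static bool points_to_set_is_uniform(const Layout *const *objs, size_t n) {
    for (size_t i = 1; i < n; i++)
        if (!same_layout(objs[0], objs[i])) return false;
    return true;
}
```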
Experimental Results - Micro Benchmarks (Speedup)
[Figure: speedup panels for POWER4 and POWER5]
Experimental Results - Micro Benchmarks (Instruction Count)
[Figure: instruction-count panels for POWER4 and POWER5]
Experimental Results - Micro Benchmarks (L2 Cache Misses)
[Figure: L2 cache-miss panels for POWER4 and POWER5]
Experimental Study - Olden & LLU (Speedup)
[Figure: speedup on POWER4 and POWER5 for bh, em3d, health, power, tsp, and llu]
Active Hardware Prefetch Streams
[Figure: Active Prefetching Streams from Memory to L2 (in POWER4). Prefetches to L2 (in millions, axis 0 to 45) for bh, em3d, health, power, tsp, and llu, comparing Baseline, Pool Alloc, and MPADS]
Related Work
Pool Allocation
– Lattner & Adve - PLDI 2005
• Data Structure Analysis
Array Based Structure Splitting
– Zhong et al. - PLDI 2004
• Reference affinity / affinity based splitting
• Memory Trace
Safe Pointer Based Structure Splitting
– Jeon, Shin and Han - CC 2007
• Similar to non-uniform splitting
• Affinity based splitting uses static analysis
– Regular expression framework
– Guarantee Safety with regular expressions
Final Remarks
Our Compiler-Research Guiding Principles
– Programming productivity
• Enables programmers to be efficient
• Enables easy-to-write/easy-to-maintain programs
– Execution Time Performance
• Recover runtime efficiency (time, storage or energy) through
– Code analysis
– Improved code generation
– Knowledge of computer architecture and memory hierarchy
Pointer Analysis Primer
The following statement:
int *a = malloc(…);
Creates:
• a memory object (A),
• a pointer (a),
• and a points-to relation (a,A):
a → A
Alias Analysis Primer: Andersen’s vs. Steensgaard’s
Program:
a = &b;
Andersen: S = {(a,b)}    a → b
Steensgaard (unification-based): S = {(a,b)}    a → b
(Shapiro/Horwitz, POPL 97)
Alias Analysis Primer: Andersen’s vs. Steensgaard’s
Program:
a = &b; b = &c;
Andersen: S = {(a,b); (b,c)}    a → b → c
Steensgaard (unification-based): S = {(a,b); (b,c)}    a → b → c
(Shapiro/Horwitz, POPL 97)
Alias Analysis Primer: Andersen’s vs. Steensgaard’s
Program:
a = &b; b = &c; a = &d;
Andersen: S = {(a,b); (b,c); (a,d)}    a → b → c, a → d
Steensgaard (unification-based): S = {(a,b); (b,c)}    a → b → c
What should happen in the Steensgaard analysis?
(Shapiro/Horwitz, POPL 97)
Alias Analysis Primer: Andersen’s vs. Steensgaard’s
Program:
a = &b; b = &c; a = &d;
Andersen: S = {(a,b); (b,c); (a,d)}    a → b → c, a → d
Steensgaard (unification-based): S = {(a,b); (b,c); (a,d); (d,c)}    a → (b,d) → c
(Shapiro/Horwitz, POPL 97)
Alias Analysis Primer: Andersen’s vs. Steensgaard’s
Program:
a = &b; b = &c; a = &d; d = &e;
Andersen: S = {(a,b); (b,c); (a,d)}    a → b → c, a → d
Steensgaard (unification-based): S = {(a,b); (b,c); (a,d); (d,c)}    a → (b,d) → c
And now?
(Shapiro/Horwitz, POPL 97)
Alias Analysis Primer: Andersen’s vs. Steensgaard’s
Program:
a = &b; b = &c; a = &d; d = &e;
Andersen: S = {(a,b); (b,c); (a,d); (d,e)}    a → b → c, a → d → e
Steensgaard (unification-based): S = {(a,b); (b,c); (a,d); (d,c); (d,e); (b,e)}    a → (b,d) → (c,e)
(Shapiro/Horwitz, POPL 97)
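The two analyses can be contrasted with a small C sketch restricted to statements of the form `x = &y`, as in the programs above. The Andersen part records each inclusion directly. The Steensgaard part uses union-find so that each equivalence class points to at most one class, unifying pointee classes whenever a second target appears. Single-letter variables and all function names are assumptions of the sketch, and statements such as `x = y` or `*x = y` are not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NVARS 26 /* variables are the single letters 'a'..'z' (assumption) */

typedef struct { uint32_t pts[NVARS]; } Andersen; /* bit y of pts[x]: x may point to y */
typedef struct { int parent[NVARS]; int target[NVARS]; } Steensgaard; /* target: pointee class or -1 */

static int idx(char v) { return v - 'a'; }

static void andersen_init(Andersen *an) {
    for (int i = 0; i < NVARS; i++) an->pts[i] = 0;
}

/* x = &y adds y to pts(x); with address-of statements only, nothing propagates. */
static void andersen_addr_of(Andersen *an, char x, char y) {
    an->pts[idx(x)] |= UINT32_C(1) << idx(y);
}

static bool andersen_points_to(const Andersen *an, char x, char y) {
    return (an->pts[idx(x)] >> idx(y)) & 1u;
}

static void steens_init(Steensgaard *st) {
    for (int i = 0; i < NVARS; i++) { st->parent[i] = i; st->target[i] = -1; }
}

static int find(Steensgaard *st, int v) {
    while (st->parent[v] != v) {
        st->parent[v] = st->parent[st->parent[v]]; /* path halving */
        v = st->parent[v];
    }
    return v;
}

/* Unify two classes; their pointee classes must then be unified as well. */
static void join(Steensgaard *st, int p, int q) {
    p = find(st, p);
    q = find(st, q);
    if (p == q) return;
    int tp = st->target[p], tq = st->target[q];
    st->parent[q] = p;
    if (tp == -1) st->target[p] = tq;
    else if (tq != -1) join(st, tp, tq);
}

/* x = &y: the class of x may point to only one class, so y is unified into it. */
static void steens_addr_of(Steensgaard *st, char x, char y) {
    int rx = find(st, idx(x));
    if (st->target[rx] == -1) st->target[rx] = find(st, idx(y));
    else join(st, st->target[rx], idx(y));
}

static bool steens_points_to(Steensgaard *st, char x, char y) {
    int t = st->target[find(st, idx(x))];
    return t != -1 && find(st, t) == find(st, idx(y));
}
```

Feeding it `a = &b; b = &c; a = &d; d = &e;` gives Andersen's 4 pairs and Steensgaard's 6 pairs, including the spurious (b,e) and (d,c), because b and d end up in one class and so do c and e.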