CML
Dynamic Code Mapping Techniques for Limited Local Memory Systems
Seungchul Jung
Compiler Microarchitecture Lab
Department of Computer Science and Engineering
Arizona State University
04/22/23

Multicore Landscape
[Figure: from single-core to multicore architecture; labels: power, temperature, heat, reliability; examples: NVidia, IBM, Intel]

Multi-core Memory
• Critical issues with caches in DMS
  – Scalability
  – Coherency protocols
[Figure: one PPU and eight SPUs; each SPU has its own local store (LS) and DMA engine; all are connected by the Element Interconnect Bus to the memory controller and the bus interface controller]

Core Memory Management
• Local memory size is the limiting factor
• Automatic management is needed
  – Application developers are already busy
• Code, variables, heap, and stack must all be managed
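Before and after inserting local memory management calls:
int global;
F1() { int var1, var2; global = var1 + var2; F2(); }

int global;
F1() {
    int var1, var2;
    DLM.fetch(global);      /* bring global into local memory */
    global = var1 + var2;
    DLM.writeback(global);  /* write the updated value back */
    ILM.fetch(F2);          /* make F2's code resident before the call */
    F2();
}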

Code Management Mechanism
(a) Application call graph: F1, F2, F3
(b) Linker script:
SECTIONS { OVERLAY { F1.o F3.o } OVERLAY { F2.o } }
(c) Local memory: one region shared by F1 and F3, one region for F2
(d) Main memory: code (F1, F2, F3), variables, heap, stack

Code Management Problem
[Figure: the local memory code section divided into regions (REGION, REGION, REGION, ...)]
• Choose the number of regions and the function-to-region mapping
  – Two extreme cases: one function per region, or all functions in one region
• Finding the best mapping is NP-complete, so code must be managed wisely
  – Goal: minimum data transfer within the given space

Related Work
• Cluster functions into regions
  – Minimize intra-cluster interference
• ILP formulations [2,3,4,5]
  – Intractable for large applications
• Heuristics have been proposed
  – Best-fit [6], First-fit [4], SDRM [5]
1. Egger et al., "Scratchpad memory management..", EMSOFT '06
2. Steinke et al., "Reducing energy consumption..", ISSS '02
3. Egger et al., "A dynamic code placement..", CASES '06
4. Verma et al., "Overlay techniques..", VLSI '06
5. Pabalkar et al., "SDRM: Simultaneous..", HIPC '08
6. Udayakumaran et al., "Dynamic allocation for..", ACM Trans. '06

Limitations of Previous Works 1
(a) Example application
F1() { F2(); F3(); }
F2() { for (int i = 0; i < 10; i++) { F4(); } for (int i = 0; i < 100; i++) { F5(); } }
F3() { for (int i = 0; i < 10; i++) { F6(); F7(); } }
(b) Call graph: F1 calls F2 and F3 once each; F2 calls F4 10 times and F5 100 times; F3 calls F6 and F7 10 times each. Every function is 1KB.
(c) GCCFG: the same functions plus loop nodes: L1 (10 iterations, calls F4) and L2 (100 iterations, calls F5) in F2, and L3 (10 iterations, calls F6 and F7) in F3. Unlike the call graph, the GCCFG makes the execution order clear.

Limitations of Previous Works 2
(a) Example and call graph
F1() { F2(); }
F2() { F3(); }
F3() { F4(); }
F1 (2KB) calls F2 (1.5KB), which calls F3 (0.4KB), which calls F4 (0.2KB); each call happens once.
[Figure: (b) intermediate mapping with F1 in Region 0 (2KB) and F2, F3 in Region 1 (1.5KB); (c) placing F4 without considering the other functions already in each region puts it with F2 and F3 in Region 1; (d) considering them puts it with F1 in Region 0; each panel is annotated with per-region transfer costs, and the figure reports a 21% difference]

Our Approach
[Figure: GCCFG of the example: F1 calls F2 once and enters loop L1, which calls F5 200 times; F2 calls F3 once and enters loop L2, which calls F4 100 times]
F1() { F2(); for (int i = 0; i < 200; i++) { F5(); } }
F2() { F3(); for (int i = 0; i < 100; i++) { F4(); } }

FMUM Heuristic
• Maximum: 7.5KB; given size: 5.5KB
[Figure: FMUM on F1 (1KB), F2 (1.5KB), F3 (0.5KB), F4 (2KB), F5 (1.5KB), F6 (1KB). (a) Start: one region per function, 7.5KB in total. (b) Next step: F1 and F5 share a 1.5KB region. (c) Final: F1/F5 (1.5KB), F2/F6 (1.5KB), F3 (0.5KB), and F4 (2KB), fitting the given 5.5KB]

FMUP Heuristic
• Minimum: 2KB; given size: 5KB
[Figure: FMUP on F1 to F6 in five steps, (a) START, (b) STEP1, (c) STEP2, (d) STEP3, (e) FINAL; new regions are added step by step, with region sizes of 2KB and 1.5KB shown]

Interference Cost Calculation
[Figure: the GCCFG from "Our Approach" (F1 to F5, loops L1 and L2) with edge weights 1, 3, 1, 100, 1, and 200 used in the interference cost calculation]

Experimental Setup
[Figure: experimental setup comparing FMUM, FMUP, and SDRM]

Typical Performance Result
[Figure: typical performance result; FMUP performs better in some cases and FMUM in others]

Number of Improvements
• Pick the better of FMUM and FMUP
• FMUM + FMUP gives a better result 82% of the time

Average 12% Reduction in Runtime
• FMUM + FMUP improves performance by 12% on average

Utilizing the Given Code Space
• The given code space is fully utilized

Efficient Mapping of In-Loop Functions
• In-loop functions are mapped separately

Increased Mappability
• 100% mappability is guaranteed

Impact of Different GCCFG Weight Assignments
[Figure: results for different GCCFG weight assignments, with values of 1.04 and 0.96 shown]
• A different weight assignment can reduce compile-time overhead

Performance with Increased SPU Threads
• Performance scales as the number of cores increases

Conclusion
• Trend in computer architecture
  – Multicore with limited local memory systems
• Memory management is required
  – Code, variables, heap, and stack
  – Better performance with limited resources
• Limitations of previous works
  – Call graph and fixed interference cost
• Two new heuristics (FMUM, FMUP)
  – Overall performance increase of 12%
  – Tolerable compile-time overhead

Contributions and Outcomes
• Contributions
  – Problem formulation using the GCCFG
  – Updating interference costs between functions
• Outcomes
  – Software release (www.public.asu.edu/~sjung)
  – Paper submission to ASAP 2010
• Plans
  – Journal submission prepared