Instrumenting PARSEC's raytrace

Instrumenting a benchmark application
Tools and Measurements Techniques
Project by Mário Almeida (EMDC)

Barcelona, 25 April 2012

Index (1/2)
Tools and configuration
● PARSEC
  ○ Overview
  ○ Benchmark programs
● Extrae
● Paraver
● Configuration

Index (2/2)
Measurements
● Raytrace
  ○ Overview
  ○ Code
  ○ Inputs
  ○ Traces
  ○ Load balancing
  ○ Cache misses and instructions
  ○ Execution time
  ○ Configuration comparison
  ○ Extrae overhead

Conclusions

Tools and configuration

PARSEC: Overview
● Benchmark suite with the following characteristics:
  ○ Multithreaded
  ○ Emerging workloads
  ○ Diverse
  ○ Not HPC-focused
  ○ Intended for research

PARSEC: Benchmark programs
● blackscholes
● bodytrack
● canneal
● dedup
● facesim
● ferret
● fluidanimate
● freqmine
● raytrace
● ...

Extrae
● Instrumentation package for tracing programs that use the shared-memory model, message passing, or both.

Paraver
● Detailed quantitative analysis of a program's performance.
● Concurrent comparative analysis of several traces.
● Support for mixed message passing and shared memory.
● Building of derived metrics.

Configuration (1/4)
Boada server:
● Two six-core CPUs with Hyper-Threading.
● Kills applications after a few minutes.
● 24 GB of RAM.
● Used cpulimit to limit CPU usage to at most four cores.

Configuration (2/4)
Installed and/or configured:
● PARSEC 2.1 with the raytrace package only.
● Extrae 2.2.1.
● Paraver 4.3.0 (on my laptop).
● cpulimit.
● Minor changes to .bashrc.
● Multiple scripts to clean, build, and run.

Configuration (3/4)

Configuration (4/4)

Measurements

Raytrace: Overview
● Physical simulation for visualization.
● Computer animation.
● Input is a complex object made of many triangles.

Raytrace: Code
for every pixel in the image
    calculate the trajectory of the ray striking the pixel
    find the closest intersection point of the ray with the scene geometry
    calculate the contribution of all lights at the intersection point
    recursively trace the specularly reflected ray
end for
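The loop above can be sketched as a minimal, single-threaded C++ ray tracer. This is not PARSEC's implementation; as simplifying assumptions, spheres stand in for the triangle meshes, the closest hit is found by brute force instead of with an acceleration structure, there are no shadow rays, and all names (`Scene`, `trace`, `render`, `kMaxDepth`) are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Minimal 3-D vector type for the sketch.
struct Vec {
    double x, y, z;
};

static Vec add(Vec a, Vec b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec sub(Vec a, Vec b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec scale(Vec a, double s) { return {a.x * s, a.y * s, a.z * s}; }
static double dot(Vec a, Vec b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec normalize(Vec a) { return scale(a, 1.0 / std::sqrt(dot(a, a))); }

// Hypothetical scene: spheres stand in for PARSEC's triangle meshes.
struct Sphere {
    Vec center;
    double radius;
    double reflectivity;  // 0 = purely diffuse, 1 = perfect mirror
};

struct Scene {
    std::vector<Sphere> spheres;
    std::vector<Vec> lights;  // point lights
};

// Distance along a normalized ray to the sphere, or -1 if there is no hit ahead.
static double intersect(const Sphere& s, Vec origin, Vec dir) {
    Vec oc = sub(origin, s.center);
    double b = dot(oc, dir);
    double c = dot(oc, oc) - s.radius * s.radius;
    double disc = b * b - c;
    if (disc < 0) return -1;
    double root = std::sqrt(disc);
    const double eps = 1e-6;  // avoids re-hitting the surface the ray starts on
    if (-b - root > eps) return -b - root;
    if (-b + root > eps) return -b + root;
    return -1;
}

// Brightness seen along one ray.
static double trace(const Scene& scene, Vec origin, Vec dir, int depth) {
    const Sphere* hit = nullptr;
    double closest = std::numeric_limits<double>::infinity();
    for (const Sphere& s : scene.spheres) {  // find the closest intersection point
        double t = intersect(s, origin, dir);
        if (t > 0 && t < closest) { closest = t; hit = &s; }
    }
    if (!hit) return 0.0;  // ray leaves the scene: background

    Vec p = add(origin, scale(dir, closest));
    Vec n = normalize(sub(p, hit->center));
    double diffuse = 0.0;
    for (Vec light : scene.lights) {  // contribution of all lights (Lambert term)
        double lambert = dot(n, normalize(sub(light, p)));
        if (lambert > 0) diffuse += lambert;
    }
    double color = (1.0 - hit->reflectivity) * diffuse;

    const int kMaxDepth = 4;  // hypothetical recursion limit
    if (hit->reflectivity > 0 && depth < kMaxDepth) {  // recursive specular reflection
        Vec r = sub(dir, scale(n, 2.0 * dot(dir, n)));
        color += hit->reflectivity * trace(scene, p, r, depth + 1);
    }
    return color;
}

// For every pixel: build the primary ray from a pinhole camera at the origin
// looking down -z, then trace it.
std::vector<double> render(const Scene& scene, int width, int height) {
    std::vector<double> image(static_cast<std::size_t>(width) * height);
    double aspect = static_cast<double>(width) / height;
    for (int j = 0; j < height; ++j) {
        for (int i = 0; i < width; ++i) {
            double x = (2.0 * (i + 0.5) / width - 1.0) * aspect;
            double y = 1.0 - 2.0 * (j + 0.5) / height;
            Vec dir = normalize({x, y, -1.0});
            image[static_cast<std::size_t>(j) * width + i] = trace(scene, {0, 0, 0}, dir, 0);
        }
    }
    return image;
}
```

For example, a single sphere at (0, 0, -3) lit from (2, 2, 0) and rendered at 8x8 gives a lit disc in the middle of the image and 0.0 in the corners. Every pixel is computed independently, which is what makes the render loop the natural place to parallelize.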

Raytrace: Inputs
● simsmall: 1 million polygons, 480x270 pixels
● simmedium: 1 million polygons, 960x540 pixels
● simlarge: 1 million polygons, 1920x1080 pixels
● native: 10 million polygons, 1920x1080 pixels

Raytrace: Trace (1/2)
Only 10% of the execution time is parallel!
[Paraver trace; thread states: Not created, Running]
Render time is proportional to the number of frames!
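A parallel fraction of only 10% caps what extra threads can achieve for the whole run. Amdahl's law, which the slide does not state, bounds the overall speedup at S = 1 / ((1 - p) + p / s) when a fraction p of the single-threaded time is sped up by a factor s. A small C++ sketch, assuming the serial 90% does not shrink at all:

```cpp
// Amdahl's law: whole-program speedup when a fraction p of the single-threaded
// execution time is accelerated by a factor s (s = number of threads under
// perfect scaling) and the remaining 1 - p stays serial.
double amdahl_speedup(double p, double s) {
    return 1.0 / ((1.0 - p) + p / s);
}

// Upper bound as s grows without limit: only the serial part remains.
double amdahl_limit(double p) {
    return 1.0 / (1.0 - p);
}
```

With p = 0.10, even perfect scaling on 64 threads gives at most about 1.11x, and the limit for any thread count is 1 / 0.9, roughly 1.11x. If the parallel part only runs about three times faster, as the execution-time slides report for 64 threads, and the 10% figure held for that input, the whole run would be only about 1.07x faster.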

Raytrace: Trace (2/2)
[Paraver trace; phases: Render, Init and adding objects, Build context]

Raytrace: Load balancing (1/2)
[Paraver trace; labels: Not created, Barrier, Create threads, Task, Wait for all threads]

Good load balancing between the slave threads.
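The slides do not show how raytrace hands out work. One common way to obtain an even per-thread load like the one in the trace is dynamic scheduling: the main thread creates the workers, each worker repeatedly claims the next image tile from a shared atomic counter, and the main thread then waits for all of them. A minimal C++ sketch under that assumption; `render_tiles` and the tile granularity are hypothetical, the actual rendering is replaced by a placeholder, and the start barrier seen in the trace is omitted.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical fork/join tile scheduler. Returns how many tiles each worker
// handled; owner[t] records which worker "rendered" tile t.
std::vector<int> render_tiles(int num_tiles, int num_threads, std::vector<int>& owner) {
    owner.assign(num_tiles, -1);
    std::vector<int> tiles_done(num_threads, 0);
    std::atomic<int> next_tile{0};

    auto worker = [&](int id) {
        for (;;) {
            int tile = next_tile.fetch_add(1);  // claim the next unrendered tile
            if (tile >= num_tiles) break;       // no work left
            owner[tile] = id;                   // placeholder for rendering the tile
            ++tiles_done[id];                   // each worker writes only its own slot
        }
    };

    std::vector<std::thread> threads;  // "Create threads"
    for (int id = 0; id < num_threads; ++id) threads.emplace_back(worker, id);
    for (std::thread& t : threads) t.join();  // "Wait for all threads"
    return tiles_done;
}
```

Because a thread that finishes early simply claims another tile, no thread sits idle while others still have a long backlog, which is the property the trace shows. Smaller tiles balance better but cost more counter traffic.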

Raytrace: Load balancing (2/2)

Raytrace: Cache misses and instructions
[Figure panels: high number of cache misses; very low number of cache misses]

There were no significant differences in IPC between threads.
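IPC is the number of instructions retired divided by the number of cycles. One way to make "no significant differences" concrete is to compute IPC per thread from the hardware counters recorded in the trace and compare the fastest and slowest threads. A minimal C++ sketch; the `ThreadCounters` struct is hypothetical, standing in for whatever per-thread instruction and cycle totals are exported from the trace.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical per-thread counter totals taken from a trace.
struct ThreadCounters {
    double instructions;
    double cycles;
};

// Instructions per cycle for each thread.
std::vector<double> ipc_per_thread(const std::vector<ThreadCounters>& threads) {
    std::vector<double> ipc;
    ipc.reserve(threads.size());
    for (const ThreadCounters& t : threads) ipc.push_back(t.instructions / t.cycles);
    return ipc;
}

// Gap between the best and worst thread, relative to the best one:
// 0 means identical IPC, 0.1 means the slowest thread is 10% below the fastest.
// Precondition: ipc is not empty.
double relative_ipc_spread(const std::vector<double>& ipc) {
    auto range = std::minmax_element(ipc.begin(), ipc.end());
    double lo = *range.first;
    double hi = *range.second;
    return (hi - lo) / hi;
}
```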

Raytrace: Execution time (1/3)

These are average times over multiple executions, measuring only the parallel code and without Extrae overhead. The average deviation across the experiments was high, 0.3 seconds. Bigger inputs gave more accurate measurements.

There was a smaller average deviation of 0.03 seconds. With 64 threads the parallel code runs almost three times faster!

There was an even smaller average deviation of 0.02 seconds. With 64 threads the parallel code runs almost three times faster!
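The figures on these slides come from repeated runs summarized by a mean, an average deviation, and a speedup. A minimal C++ sketch of those summaries, assuming "average deviation" means the mean absolute deviation from the mean and that speedup is measured against the single-thread run; the function names are hypothetical.

```cpp
#include <cmath>
#include <numeric>
#include <vector>

// Mean of repeated timing runs, in seconds. Precondition: xs is not empty.
double mean(const std::vector<double>& xs) {
    return std::accumulate(xs.begin(), xs.end(), 0.0) / xs.size();
}

// Average (mean absolute) deviation from the mean, in the same unit as the samples.
double average_deviation(const std::vector<double>& xs) {
    double m = mean(xs);
    double sum = 0.0;
    for (double x : xs) sum += std::fabs(x - m);
    return sum / xs.size();
}

// Speedup of an N-thread run relative to the single-thread run.
double speedup(double t_one_thread, double t_n_threads) {
    return t_one_thread / t_n_threads;
}
```

For instance, hypothetical runs of 2.9, 3.0 and 3.1 seconds give a mean of 3.0 seconds and an average deviation of about 0.067 seconds. A deviation that is small compared with the differences between thread counts is what makes the larger inputs more trustworthy.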

Raytrace: Configuration comparison

With the limited configuration, performance does not seem to degrade, but the execution time seems to level off above 8 threads.

Raytrace: Extrae overhead

Conclusions

Conclusions (1/3)
● The system seemed to perform worse when the number of threads was a multiple of the total number of physical cores.
● The program has good load balancing.
● Fine-grained parallelism.

Conclusions (2/3)
● Although it was not possible to verify, larger inputs should cause more cache misses, because their big working sets will not fit in the cache.
● Memory bandwidth is likely the main limit on achieving good speedups.
● Boada killed almost all executions with the native input.

Conclusions (3/3)
● Paraver simplifies the process of analyzing an application's performance.
● Better knowledge of the system's architecture would be needed to further analyze the application's performance.

Questions
