Bratislava, January 2005. Tomas Plachetka, University of Bratislava / Paderborn / Bristol
Algorithms for Independent Task Placement and Their Tuning in the Context of Demand-Driven Parallel Ray Tracing
Overview
• Demand-driven process farm
• Abstraction from the assignment mechanism, examples of trade-offs
• Formal problem definition
• Analysis of chunking and factoring strategies
• Experiments with chunking and factoring, with manual tuning of parameters
• Tool for the prediction of efficiency for process farms (more precisely, "post-prediction")
• If the parameters are tuned, then the choice of the assignment strategy does not significantly influence the efficiency of the parallel computation in the context of parallel ray tracing on contemporary machines with an "everyday" input, "everyday" quality settings etc.
Demand-Driven Process Farm
[Figure: the MASTER holds the eye (camera) and the output image, split into tiles 1-4. WORKERS send job requests ("job req") to the LOADBALANCER, which answers with jobs; finished results go back to the MASTER.]
How many tasks should the LOADBALANCER process assign in one job? (Badouel et al. [1994] suggest 9 pixels in one job, Freisleben et al. [1998] suggest 4096 pixels in one job… Where does this difference come from?)
Abstraction from assignment mechanism
[Figure: two equivalent assignment mechanisms. Top: an LB process and WORKER1-WORKER4 connected by a message passing network (send, receive). Bottom: WORKER1-WORKER4 accessing a central queue in shared memory (with locking).]
The assignment mechanism is not important. Assigning one job costs a constant time L, no matter how many tasks are in the job which is assigned to a worker.
Trade-offs for chunking (which assigns fixed-size jobs)

[Figure: two extreme splittings of the image between 2 workers. Largest jobs: one half of the image per worker (halves labelled 1 and 2); problem: imbalance. Smallest jobs (1 job = 1 task); problem: many messages.]

How large should the jobs be (so that the parallel time is minimal)?
The job "shape" is irrelevant, only the size is important.
Problem definition

Given:
N  nr. of worker processes, all equally fast
W  nr. of tasks, independent of each other (not even spatially coherent)
L  latency, i.e. the penalty for assigning 1 job to a worker (a constant time which does not depend on the number of tasks in the job or anything else)

Unknown: task time complexities.

Goal: minimise the makespan (the parallel time required for assigning and processing all tasks). The LOADBALANCER must decide how many tasks to pack into a job immediately after receiving a job request. (This is not quite online… note that W is constant!)

Probabilistic model: μ is the average task time complexity, σ is the standard deviation of the task complexities. Goal: minimise the expected makespan.

Deterministic model: Tmax is the maximal task time complexity, Tmin is the minimal task time complexity. Goal: minimise the maximal makespan (for the worst possible task arrangement).
Chunking strategy (fixed-size chunks)
LB_CHUNKING(float Tmax, int W, int N, float L)
{
    int work = W;
    int K = ???;    /* the chunk size -- the central question of this talk */
    while (work > 0) {
        wait for a job request;
        if (K > work)
            K = work;    /* the last job may be smaller */
        assign job of size K to the idle WORKER;
        work = work - K;
    }
}
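The LB_CHUNKING loop above can also be sketched as runnable code. This is a hypothetical Python simulation of the job-size sequence only (the function name is my addition; no real message passing or waiting is modelled):

```python
def lb_chunking(W, K):
    """Return the sizes of the jobs handed out for W tasks and chunk size K."""
    work = W
    jobs = []
    while work > 0:
        k = min(K, work)    # the last job may be smaller (integer arithmetic)
        jobs.append(k)
        work -= k
    return jobs

jobs = lb_chunking(W=100, K=30)
# jobs == [30, 30, 30, 10]; every task is assigned exactly once
```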
Chunking, analysis

N nr. of workers, W nr. of tasks, L latency, Tmax max. task complexity. Unknown: Kopt (the chunk size).

The worst case (maximal makespan) has the following structure: one of the workers always gets the tasks of time complexity Tmax, so it processes about W/(N·K) jobs, each costing L + K·Tmax. (The last extra round is the result of integer arithmetic.)

[Time diagram: N workers processing jobs of K tasks each; the slowest worker's timeline consists of W/(N·K) + 1 segments of length L + K·Tmax.]

M_high = (W/(N·K) + 1)·(L + K·Tmax)

Setting the derivative to zero, M'(K) = Tmax − W·L/(N·K^2) = 0, yields

Kopt = sqrt(W·L/(N·Tmax))

and substituting Kopt back gives the bounds

M_high = (W/N)·Tmax + 2·sqrt(W·L·Tmax/N) + L
M_low = (W/N)·Tmin + sqrt(W·L·Tmin/N)
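The optimisation above, minimising the worst-case makespan M(K) = (W/(N·K) + 1)·(L + K·Tmax) at Kopt = sqrt(W·L/(N·Tmax)), can be checked numerically. A minimal sketch with made-up parameter values (not the experimental ones):

```python
import math

def m_high(K, W, N, L, Tmax):
    """Worst-case makespan of chunking with chunk size K."""
    return (W / (N * K) + 1.0) * (L + K * Tmax)

def k_opt(W, N, L, Tmax):
    """Analytic minimiser of m_high over K."""
    return math.sqrt(W * L / (N * Tmax))

W, N, L, Tmax = 10000, 10, 0.01, 0.001
K = k_opt(W, N, L, Tmax)              # = sqrt(100 / 0.01) = 100
# The analytic optimum beats neighbouring chunk sizes:
assert m_high(K, W, N, L, Tmax) <= m_high(K / 2, W, N, L, Tmax)
assert m_high(K, W, N, L, Tmax) <= m_high(2 * K, W, N, L, Tmax)
```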
Chunking, probabilistic model

Chunking (Kruskal and Weiss):

E[M] = (W/N)·μ + (W/(N·K))·L + σ·sqrt(2·K·ln N)

for large W and K, and K >> log N. Minimising E[M] over K yields

Kopt = (sqrt(2)·W·L/(σ·N·sqrt(ln N)))^(2/3)

for K << W/N and small σ·sqrt(K)/N. For K << W/N and large σ·sqrt(K)/N:

Kopt = ?
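The Kruskal-Weiss expressions can be evaluated directly. A sketch with made-up values for μ and σ (the mean and standard deviation of the task complexities), checking that the closed-form Kopt indeed minimises the expected makespan:

```python
import math

def expected_makespan(K, W, N, L, mu, sigma):
    """Kruskal-Weiss approximation of the expected makespan of chunking."""
    return (W / N) * mu + (W / (N * K)) * L + sigma * math.sqrt(2 * K * math.log(N))

def k_opt_kw(W, N, L, sigma):
    """Closed-form optimal chunk size (small sigma*sqrt(K)/N regime)."""
    return (math.sqrt(2) * W * L / (sigma * N * math.sqrt(math.log(N)))) ** (2.0 / 3.0)

W, N, L, mu, sigma = 100000, 16, 0.005, 0.001, 0.0005
K = k_opt_kw(W, N, L, sigma)
# The closed form beats neighbouring chunk sizes:
assert expected_makespan(K, W, N, L, mu, sigma) <= expected_makespan(K / 2, W, N, L, mu, sigma)
assert expected_makespan(K, W, N, L, mu, sigma) <= expected_makespan(2 * K, W, N, L, mu, sigma)
```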
Factoring strategy, example

Parameterisation: N nr. of workers, W nr. of tasks, T max. ratio of task complexities (T = Tmax/Tmin). Unknown: the job size K used for the next round.

K = max(1, w_rest/(1 + T·(N−1)))

where w_rest is the number of not-yet-assigned tasks (w_rest = W in the first round). The job size K is chosen so that even in the worst case the job does not outlast the rest of the work: if the fastest task takes t seconds and every task of the job takes at most T·t seconds, then

T·t·K ≤ t·(W − K)/(N − 1)

Example (N = 2, T = 3): K = W/(1 + 3·1) = W/4. Even if each of the W/4 tasks of the first job takes T = 3 times longer than each of the remaining 3W/4 tasks, the first worker finishes no later than the second (3·(W/4) = 3W/4).
Factoring, analysis

The job size K_i for round i must satisfy

max_seq_time(K_i) ≤ min_par_time(w_i − K_i, N−1)

where w_i denotes the number of yet unassigned tasks after round i. Obviously, the larger the assigned job sizes K_i are, the smaller is the assignment overhead. Hence, we want equality:

max_seq_time(K_i) = min_par_time(w_i − K_i, N−1)

Left-hand side:

max_seq_time(K_i) = L + K_i·Tmax

Right-hand side, simplified (the assignment latency is only counted once):

min_par_time(w_i − K_i, N−1) = L + (w_i − K_i)·Tmin/(N−1)

Note that this simplification ignores the further assignment latencies. Solving the simplified equation above yields (we denote T = Tmax/Tmin)

K_i^simple = w_i/(1 + T·(N−1))
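The simplified rule K_i = w_i/(1 + T·(N−1)) can be sketched as code. A hypothetical Python sketch (function name and the minimum-chunk guard of 1 task are my additions) that emits the sequence of job sizes:

```python
def factoring_chunks(W, N, T, min_chunk=1):
    """Job sizes produced by the simplified deterministic factoring rule."""
    w = W                                  # remaining (unassigned) tasks
    chunks = []
    while w > 0:
        k = max(min_chunk, int(w / (1 + T * (N - 1))))
        k = min(k, w)                      # never assign more than remains
        chunks.append(k)
        w -= k
    return chunks

# Example from the slides: N = 2 workers, T = 3 -> the first job is W/4.
chunks = factoring_chunks(W=1024, N=2, T=3)
# chunks[0] == 256; the job sizes then shrink geometrically
```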
Factoring (simplified), analysis
This work remaining after round i
1
)1(11
i
i NT
NWw
yields WNr NTN /log1 ))1(1/(1
Factoring (simplified), analysis

Makespan, upper bound: M_high = Tmax·W/(N−1) + (r+1)·L

Makespan, lower bound: M_low = Tmin·W/N + r·L
Factoring, probabilistic model

Factoring (Flynn-Hummel):

w_0 = W
K_i = w_i/(x_i·N)
w_{i+1} = w_i − N·K_i

b_i = (N/(2·sqrt(w_i)))·(σ/μ)
x_0 = 1 + b_0^2 + b_0·sqrt(b_0^2 + 2)
x_i = 2 + b_i^2 + b_i·sqrt(b_i^2 + 4)   (for i > 0)

w_i is the rest of the work at the beginning of round i, 1/(N·x_i) is the division factor, and K_i is the chunk size for round i.

In their experiments [1991, 1995], the authors did not attempt to estimate the ratio σ/μ. They used the constant division factor 1/(2N) (this means x_i = 2).
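The practical variant with the constant division factor 1/(2N), i.e. x_i = 2, can be sketched as follows (a hypothetical sketch; the function name and the minimum-chunk guard are my additions, not from the slides). Each round, every one of the N workers takes one chunk of size w_i/(2N), so the remaining work roughly halves per round:

```python
def factoring_rounds(W, N, x=2, min_chunk=1):
    """Chunk size used in each round of Flynn-Hummel factoring with fixed x."""
    w = W
    rounds = []                      # chunk size per round
    while w > 0:
        k = max(min_chunk, w // (x * N))
        k = min(k, w)
        rounds.append(k)
        w -= N * k                   # N workers each take one such chunk
        w = max(w, 0)                # the last round may need fewer chunks
    return rounds

rounds = factoring_rounds(W=1000, N=5, x=2)
# rounds[0] == 100 (= 1000 // (2*5)); rounds[1] == 50 (half the work remains)
```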
Experiments: data
Experiments: setting
(Machine: hpcLine in PC2)
(Application: parallel ray tracing with “everyday setting”)
Given:
N = 1…128  nr. of worker processes, all equally fast
W = 720·576  nr. of tasks, independent of each other
L = 0.007  latency, i.e. the penalty for assigning 1 job to a worker (a constant time which does not depend on the number of tasks in the job or anything else)

Estimated from measured data:
Tmax = 0.00226, Tmin = 0.00075  times measured on atomic jobs of size 360 pixels; the factor between them is T ≈ 3
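As a sanity check, the deterministic chunking formula Kopt = sqrt(W·L/(N·Tmax)) can be evaluated with these measured parameters. A sketch, under the assumption (mine, not stated on the slides) that Tmax is the time of one 360-pixel atomic job, so W is counted in atomic jobs, and with N = 90 workers:

```python
import math

W_pixels = 720 * 576
A = 360                            # atomic job size in pixels
W = W_pixels // A                  # 1152 atomic jobs
N, L, Tmax = 90, 0.007, 0.00226    # measured parameters from the slides

K_opt = math.sqrt(W * L / (N * Tmax))   # optimal chunk size, in atomic jobs
```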
Experiments: empirical optimal chunk size (90 workers)
K̂opt(CHUNKING) ≈ K̂opt(FACTORING) ≈ 45 atomic jobs of A = 360 pixels
Experiments: chunking efficiency (K=360)
Experiments: factoring efficiency (atomic_job_size=360)
Tuning of assignment strategies: estimation of the future

The optimal chunk size for the chunking and factoring strategies depends on parameters which are unknown:

K = f(W, N, L, Tmin, Tmax)

(However, all these parameters are known when the computation finishes; this is what we call "post-prediction"!)

Suggestion: the unknown parameters Tmin and Tmax can be initially estimated, and this estimation is then continually adjusted according to measured run-time statistics. (The estimation of the remaining time needed for copying files in Windows uses a similar approach.)
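The suggested run-time tuning can be sketched as code. A hypothetical sketch (the class and method names are my own) that refines initial guesses for Tmin/Tmax from measured per-task times and recomputes the chunk size, here using the deterministic chunking formula K = sqrt(W·L/(N·Tmax)) as f:

```python
import math

class RunningEstimator:
    """Maintains running estimates of Tmin/Tmax from measured task times."""

    def __init__(self, tmin_guess, tmax_guess):
        self.tmin = tmin_guess
        self.tmax = tmax_guess

    def update(self, measured_task_time):
        # Adjust the initial estimates with every measured task time.
        self.tmin = min(self.tmin, measured_task_time)
        self.tmax = max(self.tmax, measured_task_time)

    def chunk_size(self, W_remaining, N, L):
        # Deterministic chunking formula with the current Tmax estimate.
        return max(1, int(math.sqrt(W_remaining * L / (N * self.tmax))))

est = RunningEstimator(tmin_guess=0.001, tmax_guess=0.001)
for t in (0.0008, 0.0023, 0.0012):
    est.update(t)
# est.tmin == 0.0008, est.tmax == 0.0023; chunk_size now reflects them
```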
Conclusions
• Farming yields an almost linear speedup (efficiency 95% with 128 workers) for parallel ray tracing (POV||Ray) on a fairly complex "everyday" scene.
• The trivial chunking algorithm with the optimal chunk size does not perform worse than the theoretically better factoring algorithm with the optimal chunk size; for the particular machine, nr. of processors, input, quality settings, room temperature etc. used during the experiments.
• The efficiency of chunking/factoring can be predicted (or at least "post-predicted") for a particular machine, nr. of processors, input, room temperature etc.
• In experiments with process farming, the parameters W (nr. of tasks), N (nr. of workers), L (latency), Tmin and Tmax (min/max tasks' or jobs' time complexities) must be reported. Reporting only some of these parameters is insufficient for drawing conclusions from experiments with process farming (e.g. chunking).
• The parameters specific to chunking/factoring can (and must) be tuned automatically at run-time.