Speedup
ThoaiNam
Khoa Công Nghệ Thông Tin – Đại Học Bách Khoa Tp.HCM (Faculty of Computer Science and Engineering, HCMC University of Technology)
Speedup & Efficiency
Amdahl's Law
Gustafson's Law
Sun & Ni's Law
Speedup:
S = Time(most efficient sequential algorithm) / Time(parallel algorithm)
Efficiency:
E = S / N, where N is the number of processors
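As a minimal sketch of these two definitions (the timing values are assumed, purely for illustration):

```python
def speedup(t_seq, t_par):
    """Speedup: best sequential time divided by parallel time."""
    return t_seq / t_par

def efficiency(t_seq, t_par, n_procs):
    """Efficiency: speedup per processor, ideally 1.0."""
    return speedup(t_seq, t_par) / n_procs

# Assumed timings: 100 s sequentially, 14 s on 10 processors.
print(speedup(100.0, 14.0))         # ~7.14
print(efficiency(100.0, 14.0, 10))  # ~0.71
```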
Amdahl's Law – Fixed Problem Size (1)
The main objective is to produce the results as soon as possible
– Examples: video compression, computer graphics, VLSI routing, etc.
Implications
– The upper bound on speedup is 1/α (see the derivation below)
– Make the sequential bottleneck as small as possible
– Optimize the common case
Modified Amdahl's law for fixed problem size, including the overhead
Amdahl's Law – Fixed Problem Size (2)
[Figure: execution time decomposition. T(1) consists of a sequential part Ts followed by a parallel part Tp; on N processors (P0…P9) the parallel part is divided among the processors while Ts is unchanged.]
Ts = αT(1) ⇒ Tp = (1−α)T(1)
T(N) = αT(1) + (1−α)T(1)/N
Amdahl's Law – Fixed Problem Size (3)

$$\text{Speedup} = \frac{\text{Time}(1)}{\text{Time}(N)} = \frac{T(1)}{\alpha T(1) + (1-\alpha)\,T(1)/N} = \frac{1}{\alpha + \dfrac{1-\alpha}{N}} \;\longrightarrow\; \frac{1}{\alpha} \quad \text{as } N \to \infty$$
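A minimal numeric sketch of this limit (plain Python; the function name is illustrative):

```python
def amdahl_speedup(alpha, n):
    """Amdahl's law: fixed problem size, alpha = sequential fraction."""
    return 1.0 / (alpha + (1.0 - alpha) / n)

# With a 10% sequential fraction, speedup saturates near 1/alpha = 10.
for n in (1, 10, 100, 1000, 10**6):
    print(n, amdahl_speedup(0.1, n))
```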
Modified Amdahl's law with overhead:

$$\text{Speedup} = \frac{T(1)}{\alpha T(1) + (1-\alpha)\,T(1)/N + T_{\text{overhead}}} \;\longrightarrow\; \frac{1}{\alpha + T_{\text{overhead}}/T(1)} \quad \text{as } N \to \infty$$

The overhead includes parallelism and interaction overheads.
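A sketch extending the previous function with a fixed overhead term (expressing T_overhead as an assumed fraction of T(1), purely for illustration):

```python
def amdahl_speedup_overhead(alpha, n, overhead_frac):
    """Amdahl's law with overhead, where overhead_frac = T_overhead / T(1)."""
    return 1.0 / (alpha + (1.0 - alpha) / n + overhead_frac)

# A 10% sequential fraction plus 5% overhead caps speedup near 1/0.15 ≈ 6.7.
for n in (10, 100, 10**6):
    print(n, amdahl_speedup_overhead(0.1, n, 0.05))
```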
Gustafson's Law – Fixed Time (1)
The user wants more accurate results within a time limit
– Execution time is fixed as the system scales
– Examples: FEM for structural analysis, FDM for fluid dynamics
Properties of a work metric
– Easy to measure
– Architecture-independent
– Easy to model with an analytical expression
– No additional experiment needed to measure the work
– The measure of work should scale linearly with the sequential time complexity of the algorithm
The time-constrained model seems to be the most generally viable one!
Gustafson's Law – Fixed Time (2)
[Figure: fixed-time workload scaling. The sequential run executes the sequential part Ws plus the parallel work W0 on one processor; on N processors (P0…P9) the scaled parallel workload W(N) is processed in the same fixed time.]
α = Ws / W(N)
W(N) = αW(N) + (1−α)W(N)
⇒ W(1) = αW(N) + (1−α)W(N)·N
Gustafson's Law – Fixed Time, without overhead

With Time = Work · k and the scaled workload W(N) = W:

$$\text{Speedup} = \frac{T(1)}{T(N)} = \frac{W(1)\cdot k}{W(N)\cdot k} = \frac{\alpha W + (1-\alpha)\,N\,W}{W} = \alpha + (1-\alpha)\,N$$
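A minimal sketch of this scaled (fixed-time) speedup:

```python
def gustafson_speedup(alpha, n):
    """Gustafson's law: fixed execution time, workload scales with n."""
    return alpha + (1.0 - alpha) * n

# Unlike Amdahl's law, speedup keeps growing with n for alpha < 1.
for n in (10, 100, 1000):
    print(n, gustafson_speedup(0.1, n))
```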
Gustafson's Law – Fixed Time, with overhead

With W(N) = W + W0, where W0 is the overhead workload:

$$\text{Speedup} = \frac{T(1)}{T(N)} = \frac{W(1)\cdot k}{W(N)\cdot k} = \frac{\alpha W + (1-\alpha)\,N\,W}{W + W_0} = \frac{\alpha + (1-\alpha)\,N}{1 + W_0/W}$$
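And the overhead-adjusted variant as a sketch (the W0/W ratio is an assumed value):

```python
def gustafson_speedup_overhead(alpha, n, overhead_ratio):
    """Fixed-time speedup with overhead, overhead_ratio = W0 / W."""
    return (alpha + (1.0 - alpha) * n) / (1.0 + overhead_ratio)

print(gustafson_speedup_overhead(0.1, 100, 0.05))  # ~85.8 instead of 90.1
```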
Sun and Ni's Law – Fixed Memory (1)
Scale to the largest possible problem limited by the memory space, or fix the memory usage per processor
Speedup:
– Time(1)/Time(N) for the scaled-up problem is not appropriate
– For a simple profile, G(N) is the increase of parallel workload as the memory capacity increases N times
Sun and Ni's Law – Fixed Memory (2)
W = αW + (1−α)W
Let M be the memory capacity of a single node.
With N nodes:
– the memory increases to NM
– the scaled work: W* = αW + (1−α)G(N)W

$$\text{Speedup}_{MC} = \frac{\alpha + (1-\alpha)\,G(N)}{\alpha + (1-\alpha)\,G(N)/N}$$
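A sketch of this memory-constrained speedup; the workload-growth function G is passed in as a parameter:

```python
def sun_ni_speedup(alpha, n, G):
    """Memory-bounded speedup; G(n) is the parallel-workload growth factor."""
    g = G(n)
    return (alpha + (1.0 - alpha) * g) / (alpha + (1.0 - alpha) * g / n)

# G(n) = 1 recovers Amdahl's law, G(n) = n recovers Gustafson's law.
print(sun_ni_speedup(0.1, 100, lambda n: 1))  # ~9.17
print(sun_ni_speedup(0.1, 100, lambda n: n))  # 90.1, same as Gustafson
```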
Sun and Ni's Law – Fixed Memory (3)
Definition:
A function g is a homomorphism if there exists a function ḡ such that for any real number c and variable x, g(cx) = ḡ(c)·g(x).
Theorem:
If W = g(M) for some homomorphism function g, g(cx) = ḡ(c)·g(x), then, with all data being shared by all available processors, the simplified memory-bounded speedup is

$$S_N^{*} = \frac{W_1 + \bar{g}(N)\,W_N}{W_1 + \bar{g}(N)\,W_N/N} = \frac{\alpha + (1-\alpha)\,G(N)}{\alpha + (1-\alpha)\,G(N)/N}$$

where W_1 is the sequential workload, W_N the parallel workload, and G(N) = ḡ(N).
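For example, power functions are homomorphisms in this sense: if g(x) = x^a, then g(cx) = c^a·g(x), so ḡ(c) = c^a. A quick numeric check (the exponent 1.5, as when work grows like memory^1.5 in dense matrix computation, is an illustrative assumption):

```python
a = 1.5                       # illustrative exponent: work ~ memory**1.5
g = lambda x: x ** a          # workload as a function of memory
g_bar = lambda c: c ** a      # companion function in g(c*x) = g_bar(c)*g(x)

c, x = 4.0, 100.0
print(g(c * x), g_bar(c) * g(x))  # both 8000.0 -> homomorphism holds
```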
Sun and Ni's Law – Fixed Memory (4)
Proof:
Let the memory requirement of W_N be M, i.e. W_N = g(M). M is the memory requirement when one node is available.
With N nodes available, the memory capacity increases to NM.
Using all of the available memory, the scaled parallel portion becomes:

$$W_N^{*} = g(NM) = \bar{g}(N)\,g(M) = \bar{g}(N)\,W_N$$

$$S_N^{*} = \frac{W_1 + W_N^{*}}{W_1 + W_N^{*}/N} = \frac{W_1 + \bar{g}(N)\,W_N}{W_1 + \bar{g}(N)\,W_N/N}$$
– When the problem size is independent of the system, the problem size is fixed: G(N) = 1 ⇒ Amdahl's law.
– When memory is increased N times, the workload also increases N times: G(N) = N ⇒ Gustafson's law.
– For most scientific and engineering applications, the computation requirement increases faster than the memory requirement: G(N) > N.

$$S_N^{*} = \frac{W_1 + G(N)\,W_N}{W_1 + G(N)\,W_N/N}$$
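A sketch comparing the three regimes using the earlier sun_ni_speedup helper (the G(n) = n**1.5 profile is an assumed super-linear example):

```python
cases = {
    "Amdahl    G(n)=1     ": lambda n: 1,
    "Gustafson G(n)=n     ": lambda n: n,
    "Sun-Ni    G(n)=n**1.5": lambda n: n ** 1.5,
}
for name, G in cases.items():
    print(name, sun_ni_speedup(0.1, 100, G))
```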
[Figure: speedup S vs. number of processors, comparing ideal linear speedup S(Linear) with typical observed speedup S(Normal).]
Parallelizing a code does not always result in a speedup; sometimes it actually slows the code down! This can be due to a poor choice of algorithm or to poor coding.
The best possible speedup is linear, i.e. it is proportional to the number of processors: T(N) = T(1)/N, where N = number of processors and T(1) = time for the serial run.
A code that continues to speed up reasonably close to linearly as the number of processors increases is said to be scalable. Many codes scale up to some number of processors, but adding more processors then brings no improvement. Very few, if any, codes are indefinitely scalable.
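As an illustrative sketch with assumed timing data, one can check how close measured speedups stay to linear:

```python
# Hypothetical measured runtimes (seconds) at various processor counts.
timings = {1: 100.0, 2: 52.0, 4: 27.0, 8: 15.0, 16: 11.0}

t1 = timings[1]
for n, t in timings.items():
    s = t1 / t  # measured speedup
    print(f"N={n:3d}  speedup={s:5.2f}  ideal={n:3d}  efficiency={s/n:.2f}")
# Efficiency drifting well below 1.0 signals the code has stopped scaling.
```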
Software overhead
Even with a completely equivalent algorithm, software overhead arises in the concurrent implementation (e.g. there may be additional index calculations necessitated by the manner in which data are "split up" among processors); i.e. there are generally more lines of code to be executed in the parallel program than in the sequential program.
Load balancing
Communication overhead
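As a toy illustration of such software overhead (the chunking scheme below is an assumption, not from the source), the parallel version must compute per-processor index ranges that the sequential loop never needs:

```python
def chunk_bounds(n_items, n_procs, rank):
    """Extra index arithmetic a parallel version needs to split the data."""
    base, rem = divmod(n_items, n_procs)
    lo = rank * base + min(rank, rem)
    hi = lo + base + (1 if rank < rem else 0)
    return lo, hi

# Each of 4 processors computes its own slice of 10 items.
print([chunk_bounds(10, 4, r) for r in range(4)])
# [(0, 3), (3, 6), (6, 8), (8, 10)]
```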