Computation of Mutual Information Metric for Image Registration on Multiple GPUs

Andrew V. Adinetz (1), Markus Axer (2), Marcel Huysegoms (2), Stefan Köhnen (2), Jiri Kraus (3), Dirk Pleiter (1)

(1) JSC, Forschungszentrum Jülich
(2) INM-1, Forschungszentrum Jülich
(3) NVIDIA GmbH

March 26, 2014, GPU Technology Conference 2014

Presented at HeteroPar’13 workshop of EuroPar’13

Member of the Helmholtz Association



Outline

•  Brain Image Registration
•  Multi-GPU Implementation
    •  system memory
    •  listupdate
•  Performance Evaluation
•  Conclusion


Preparation of the brain


BigBrain – first high-resolution brain model at microscopical scale

•  7404 histological sections stained for cell bodies
•  scanned with a flatbed scanner
•  original resolution 10 × 10 × 20 μm³ (11,000 × 13,000 pixels)
•  downscaling to 20 μm isotropic
•  removal of artifacts
•  1 terabyte

in cooperation with Alan Evans, McGill, Montreal

Amunts et al. (2013) Science

Pushing the limits for a cellular brain model


Image Registration

•  Registration = process of image alignment

ITK Workflow


Mutual Information Metric

$$\mathrm{MI}(I_f, I_m) = \sum_{i,j} p(i,j) \log_2 \frac{p(i,j)}{p_f(i)\, p_m(j)}$$

$$p_f(i) = \sum_{j} p(i,j), \qquad p_m(j) = \sum_{i} p(i,j)$$

•  i, j – pixel values (0 .. 255)
•  successful for multi-modal registration


Two Image Cross-Histogram

•  main computational kernel
•  transform can be complex (1000+ parameters)
•  GPU implementation: 1 pixel/thread, atomics

    // fixed, moving: 2D pixel arrays; histogram: 2D array of bin counts
    for (int y = 0; y < fixed_sz_y; y++)
      for (int x = 0; x < fixed_sz_x; x++) {
        int i = bin(fixed[y][x]);                  // bin of the fixed-image pixel
        float x1 = transform_x(x, y);              // position in the moving image
        float y1 = transform_y(x, y);
        int j = bin(interpolate(moving, x1, y1));  // bin of the interpolated moving pixel
        histogram[i][j]++;                         // atomic on GPU
      }


Large Data Size

Large-area Polarimeter
•  size: 3,000 × 3,000 px
•  pixel size: 60 × 60 µm
•  file size: 30 MB

Polarizing Microscope
•  size: 100,000 × 100,000 px
•  pixel size: 1.6 × 1.6 µm
•  file size: 40 GB

Need multiple GPUs!


Multi-GPU Mutual Information

•  Domain decomposition
    •  distribute fixed and moving images
    •  histogram contributions summed up
•  Moving image: how to handle?
    •  irregular access pattern
•  Approaches
    •  System memory replication (sysmem)
    •  Listupdate (listupdate)
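The domain decomposition can be illustrated with a sequential sketch in which each "rank" stands in for one GPU. The block distribution by rows, the identity transform, and the names `row_band`, `partial_histogram` and `accumulate` are all assumptions for illustration; the slides do not specify how the images are split or how the summation is carried out.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BINS 256

/* Rows [*y0, *y1) of the fixed image owned by `rank` (block distribution). */
void row_band(int rank, int nranks, int height, int *y0, int *y1)
{
    assert(nranks > 0 && rank >= 0 && rank < nranks);
    *y0 = (int)((long long)height * rank / nranks);
    *y1 = (int)((long long)height * (rank + 1) / nranks);
}

/* Adds the contributions of rows [y0, y1) to hist.
   The identity transform is used for brevity. */
void partial_histogram(const unsigned char *fixed, const unsigned char *moving,
                       int width, int y0, int y1, unsigned long long *hist)
{
    for (int y = y0; y < y1; y++)
        for (int x = 0; x < width; x++) {
            size_t p = (size_t)y * width + x;
            hist[(size_t)fixed[p] * NUM_BINS + moving[p]]++;
        }
}

/* Reduction step: add one rank's partial histogram into the global one. */
void accumulate(unsigned long long *global, const unsigned long long *partial)
{
    for (size_t k = 0; k < (size_t)NUM_BINS * NUM_BINS; k++)
        global[k] += partial[k];
}
```

Because histogram counts are additive, the sum of the per-band histograms equals the histogram of the whole image regardless of how the rows are split.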


System Memory Replication

•  Replicate entire moving image in pinned host RAM
    •  accessible to GPU

+ easy to implement
– system memory accesses are slower
– cannot use texture interpolation

•  Optimizations
    •  moving image halo in GPU RAM


Listupdate

•  On remote access
    •  "send message"
•  On receiving message
    •  compute contributions
•  Active messaging variant
    •  buffering
    •  relies on undocumented features
•  Listupdate
    •  chunking
    •  buffer size bounded
    •  communication-computation overlap

    typedef struct {
      float movingCoords[2];   // position in the moving image
      short destRank;          // rank that owns this part of the moving image
      unsigned char fixedBin;  // bin of the fixed-image pixel (0 .. 255)
    } message_t;
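The send and receive halves of this scheme can be sketched sequentially. This is a simplified illustration rather than the authors' code: rows of the moving image are assumed to be block-distributed, the lookup is nearest-neighbour, and `moving` is indexed with global coordinates for brevity. The names `owner_of_row`, `post_message` and `handle_messages` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BINS 256

typedef struct {
    float movingCoords[2];   /* x, y in the moving image */
    short destRank;          /* rank that owns that part of the moving image */
    unsigned char fixedBin;  /* bin of the fixed-image pixel */
} message_t;

/* Owner of moving-image row y under a block distribution (assumption). */
short owner_of_row(int y, int height, int nranks)
{
    assert(y >= 0 && y < height && nranks > 0);
    int rows_per_rank = (height + nranks - 1) / nranks;
    return (short)(y / rows_per_rank);
}

/* "Send message": record a remote access instead of performing it.
   Returns the new number of messages in the list. */
size_t post_message(message_t *list, size_t count, size_t capacity,
                    float x, float y, short dest, unsigned char fixed_bin)
{
    assert(count < capacity);  /* the buffer size is bounded */
    list[count].movingCoords[0] = x;
    list[count].movingCoords[1] = y;
    list[count].destRank = dest;
    list[count].fixedBin = fixed_bin;
    return count + 1;
}

/* "On receiving message": look up the moving pixel and add the
   histogram contribution on behalf of the sender. */
void handle_messages(const message_t *list, size_t count, short my_rank,
                     const unsigned char *moving, int width,
                     unsigned long long *hist)
{
    for (size_t k = 0; k < count; k++) {
        if (list[k].destRank != my_rank)
            continue;  /* addressed to another rank */
        int x = (int)(list[k].movingCoords[0] + 0.5f);  /* nearest neighbour */
        int y = (int)(list[k].movingCoords[1] + 0.5f);
        size_t j = moving[(size_t)y * width + x];
        hist[(size_t)list[k].fixedBin * NUM_BINS + j]++;
    }
}
```

The receiver only needs the coordinates and the fixed-image bin, which is why the message carries no pixel data from the fixed image itself.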


Writeout: Atomics vs Grouping

•  Atomics
    •  determine write position using atomics
    •  warp-aggregated increment
•  Grouping
    •  write to per-pixel buffer
    •  group (compress)
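The two writeout strategies can be compared with a sequential simulation, in which the loop index plays the role of a GPU thread index. This is a sketch under assumptions: a warp size of 32, integer payloads instead of full messages, and hypothetical function names. On a real GPU the counter update in the first variant would be a single atomic add per warp, and the second variant's scan would run in parallel.

```c
#include <assert.h>
#include <stddef.h>

#define WARP_SIZE 32  /* assumed warp width */

/* Variant 1: shared write counter, incremented once per warp. */
size_t writeout_warp_aggregated(const int *values, const unsigned char *is_remote,
                                size_t n, int *out)
{
    assert(values && is_remote && out);
    size_t counter = 0;  /* global write position */
    for (size_t w = 0; w < n; w += WARP_SIZE) {
        size_t end = (w + WARP_SIZE < n) ? w + WARP_SIZE : n;
        size_t votes = 0;  /* lanes in this warp that have something to write */
        for (size_t t = w; t < end; t++)
            votes += is_remote[t] ? 1 : 0;
        size_t base = counter;  /* one atomic add per warp on a GPU */
        counter += votes;
        size_t lane_rank = 0;   /* position of the lane among the writing lanes */
        for (size_t t = w; t < end; t++)
            if (is_remote[t])
                out[base + lane_rank++] = values[t];
    }
    return counter;
}

/* Variant 2: every pixel keeps its own slot (values/is_remote), then the
   valid entries are grouped via an exclusive prefix sum over the flags. */
size_t writeout_grouping(const int *values, const unsigned char *is_remote,
                         size_t n, size_t *offsets, int *out)
{
    assert(values && is_remote && offsets && out);
    size_t running = 0;
    for (size_t t = 0; t < n; t++) {  /* exclusive scan */
        offsets[t] = running;
        running += is_remote[t] ? 1 : 0;
    }
    for (size_t t = 0; t < n; t++)    /* scatter to compacted positions */
        if (is_remote[t])
            out[offsets[t]] = values[t];
    return running;
}
```

In this sequential form both variants yield the same packed list; on a GPU the atomic variant does not guarantee any particular order between warps, which is harmless for histogram contributions.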


Chunk Processing and Overlap

[Figure: per-chunk pipeline stages (process chunk, group, exchange, handle messages) overlapped across successive chunks; fixed image divided into chunks 1 and 2, with x and y axes and origin (0,0)]
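One way to keep the message buffer bounded is to size chunks for the worst case, in which every pixel of a chunk triggers a remote access. The sketch below shows only that arithmetic; splitting by whole rows and the worst-case sizing rule are assumptions, and the names `rows_per_chunk`, `num_chunks` and `chunk_rows` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Largest number of rows per chunk such that the buffer cannot overflow
   even if every pixel of the chunk generates a message. */
int rows_per_chunk(size_t buffer_capacity, int width)
{
    assert(width > 0 && buffer_capacity >= (size_t)width);
    return (int)(buffer_capacity / (size_t)width);
}

/* Number of chunks needed to cover rows [0, height). */
int num_chunks(int height, int rows)
{
    assert(height >= 0 && rows > 0);
    return (height + rows - 1) / rows;
}

/* Row range [*y0, *y1) of chunk c; the last chunk may be shorter. */
void chunk_rows(int c, int rows, int height, int *y0, int *y1)
{
    assert(c >= 0 && c < num_chunks(height, rows));
    *y0 = c * rows;
    *y1 = (*y0 + rows < height) ? *y0 + rows : height;
}
```

For example, a buffer of 1000 messages and an image 300 pixels wide allow 3 rows per chunk, so a 10-row band is processed in 4 chunks, the last of which holds a single row.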


Listupdate

+ computation-communication overlap
– hard to implement
– chunk processing (or won't fit into buffer)

•  Optimizations
    •  buffers: AoS vs. SoA
    •  atomics vs. grouping
    •  using multiple streams
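The difference between the two buffer layouts can be made concrete with a small sketch. The AoS type mirrors the message struct from the slides; the SoA type, its field names, and the `aos_to_soa` helper are hypothetical and only illustrate the layout, not the authors' buffers.

```c
#include <assert.h>
#include <stddef.h>

/* Array of structures (AoS): one record per message. */
typedef struct {
    float movingCoords[2];
    short destRank;
    unsigned char fixedBin;
} message_t;

/* Structure of arrays (SoA): one contiguous array per field, so that
   consecutive threads touch consecutive addresses of the same field. */
typedef struct {
    float *x;
    float *y;
    short *destRank;
    unsigned char *fixedBin;
    size_t count;
    size_t capacity;
} message_soa_t;

/* Copies n AoS messages into an SoA buffer with preallocated arrays. */
void aos_to_soa(const message_t *aos, size_t n, message_soa_t *soa)
{
    assert(n <= soa->capacity);
    for (size_t k = 0; k < n; k++) {
        soa->x[k] = aos[k].movingCoords[0];
        soa->y[k] = aos[k].movingCoords[1];
        soa->destRank[k] = aos[k].destRank;
        soa->fixedBin[k] = aos[k].fixedBin;
    }
    soa->count = n;
}
```

In the AoS layout neighbouring threads write records that are a full struct apart, including padding, whereas in the SoA layout their writes to each field are adjacent in memory.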


Benchmark setup

[Figure: fixed image with x and y axes and origin (0,0); the mask and the region requiring remote access are marked]


Test Hardware

•  JUDGE
    •  256-node GPU cluster
    •  Each M2070 node:
        •  2x M2070 (Fermi) GPU, each 6 GB RAM
        •  12-core X5650 CPU @ 2.67 GHz, 96 GB RAM
•  JuHydra
    •  single-node Kepler machine
        •  2x K20X (Kepler) GPU, each 6 GB RAM
        •  16-core E5-2650 CPU @ 2 GHz, 64 GB RAM


Baseline: Full Replication (M2070)

[Plot: runtime in seconds (0 to 1) vs. rotation angle (0 to 178.2, step 5.4); series: 1 GPU, 2 GPUs, 4 GPUs, ideal scalability]


Sysmem on Fermi

[Plot: runtime in seconds (0 to 1.2) vs. rotation angle (0 to 178.2, step 5.4); series: 1 GPU, 2 GPUs baseline, 2 GPUs]


Sysmem on Fermi: Explanation

•  No sysmem accesses, good coalescing
•  Few sysmem accesses, bad coalescing
•  Many sysmem accesses, bad coalescing
•  Most sysmem accesses, good coalescing


Sysmem on Fermi: PCI-E Queries

[Plot with two vertical axes: runtime in seconds (0 to 1.2) and Sysmem_queries (0 to 120,000,000) vs. rotation angle (0 to 178.2, step 5.4); series: 2 GPUs baseline, 2 GPUs, total Sysmem_queries]


Sysmem: Halo Sizes

[Plot: time in seconds (0 to 0.8) vs. angle in degrees (0 to 36, step 1.8); series: 2 K20X baseline, 2 K20X sysmem, 2 K20X with 5%, 10%, 15%, 20%, and 25% halo]

mostly quantitative, not qualitative difference


Listupdate: Multiple Streams

[Plot: time in seconds (0 to 0.9) vs. angle in degrees (0 to 178.2, step 5.4); series: 2 K20X with 1, 2, 3, and 4 streams]

4 streams look the best


Listupdate: AoS vs SoA, Atomics vs Group

[Plot: time in seconds (0 to 1.2) vs. angle in degrees (0 to 178.2, step 5.4); series: 2 K20X SoA, 2 K20X AoS, 2 K20X compress]

SoA + atomics looks best

    typedef struct {
      float movingCoords[2];   // position in the moving image
      unsigned char fixedBin;  // bin of the fixed-image pixel (0 .. 255)
    } message_t;


Sysmem vs. Listupdate: Fermi

[Plot: time in seconds (0 to 2.5) vs. angle in degrees (0 to 178.2, step 5.4); series: 4 M2070 SoA, 4 M2070 baseline, 4 M2070 sysmem, 4 M2070 25% halo]

on Fermi, sysmem is better


Sysmem vs. Listupdate: Kepler (Closeup)

[Plot: time in seconds (0 to 0.8) vs. angle in degrees (0 to 36, step 1.8); series: 2 K20X SoA, 2 K20X baseline, 2 K20X sysmem, 2 K20X 25% halo]

on Kepler, listupdate is better


Conclusions

•  Fermi
    •  performance limited by atomics
    •  system memory replication is better
•  Kepler
    •  10x faster than Fermi
    •  no longer dominated by atomics
    •  listupdate (atomic, SoA, 4 streams) is better
•  Future work
    •  Compression
    •  Trials on real images


Questions?

•  INM-1 at FZJ: http://www.fz-juelich.de/inm/inm-1/EN/Home/home_node.html
•  NVIDIA Application Lab at FZJ: http://www.fz-juelich.de/ias/jsc/nvlab
•  Andrew V. Adinetz: [email protected]
•  Jiri Kraus: [email protected]
•  Dirk Pleiter: [email protected]