Upload
others
View
6
Download
0
Embed Size (px)
Citation preview
Large Scale Matrix-Free
Topology Optimization on the GPU
Krishnan Suresh
Associate Professor
Mechanical Engineering
Topology Optimization (2D)
Reduce weight, but keep it stiff
Structure problem Optimal topologies
10% reduction in weight
2% loss in stiffness
D
50% reduction in weight
12% loss in stiffness
Topology optimization is the systematic generation of such structures
3
Topology Optimization (3D)
40% reduction in weight
10% loss in stiffness
50% reduction in weight
30% loss in stiffness
5
Topology Optimization
Design Space
Finite Element Analysis (FEA)
Optimal?
Change Topology
No
10^5 ~ 10^7 dof
Solve Ku = f
K: Sparse SPD
100’s of iterations!
- Multi-load
- Nonlinear FEA
- Constraints
- … 8
Topology Optimization Cost
Size DOF [Wang]
Medium (84,28,14) 107,184 2.4 hours
Large (180,60,30) 1,010,160 45.7 hours
AMD Opteron TM252 2.6GHz
64-bit processor, with 8GB RAM
9
Iterative Solve Ku = f
Two Goals:
1. Minimize #iterations
2. Minimize cost of K*u (SpMv)
1 ( )
: Preconditioner of K
i i iu u B f Ku
B
12
13
SpMV on GPU
Double precision real world SpMv
100,000 to 1 Million degrees of freedom
GPU (GTX 280): 2~6 increase in flop-rate
14
Iterative Ku = f (GTC 2012)
Fine-grained Parallel Preconditioners …(Wed)
CULA (Wed)
MAGMA (Wed)
Accelerating Iterative Linear Solvers (Wed)
Efficient AMG on Hybrid GPU Clusters (Wed)
Preconditioning for Large-Scale Linear Solvers (Thu)
…
TopOpt (SIMP)
0 1 : ' Density' e
r< £
TopOpt (SIMP)
- Ill-conditioned K
- Poor convergence
SIMP: Most popular TopOpt method
( ) 0
n
e eK Kr=
Large #iterations… bottleneck
16
1 ( ) i i iu u B f Ku
PareTO
Don’t Assign Densities!!
Compute T-Field!!
18
3 1 1 510 : ( ) ( )
2 7 5 1 2tr tr
n ns e s e
n n
é ù- -ê ú= - -ê ú- -ë û
T=
T
0.00004 A
1.36 B
T(x,y) = Stiffness-change
when hole is added @(x,y)
Approach-1: Classic
GPU thread per node ?
~100 non-zeros
1. Assemble K (CPU/GPU)
2. Push to GPU
3. Execute K*u in parallel
Coalesced memory access
challenging!
Exploit Element Congruency
20 distinct elements
120 distinct elements 1 distinct element
1.48 MB storage (vs. 384 MB)
Algorithm
Detect
Congruency
Hex-mesh
Compute Ke of templates
Push data to GPU
Assembly-free
PCG Solve
Ku = f
on GPU
30
Element: Present
or Absent
TopOpt
Typical FEA Computing Time
34
DOF PareTO
CPU GPU
Cantilever
Beam; Edge
110K 8.3 secs 1.9 secs
Stool 2.7M 186 secs 32 secs
Point Load
Cantilever
15M 51 mins 5.73 mins
92M 12 hr -
1.5 hr on 44 core (ONR)
TopOpt Comparison
Size DOF [Wang]*
Medium (84,28,14) 107,184 2.4 hours
Large (180,60,30) 1,010,160 45.7 hours
AMD Opteron TM252 2.6GHz
64-bit processor, with 8GB RAM GTX 480
Identical results
35
TopOpt Computing Time
40
Name of part &
volume fraction
DOF Pub.
Data
PareTO
CPU GPU
Cantilever
Beam; Edge (0.50)
110K 2.4 hr 200 secs 45 secs
Knuckle (0.55) 20K -- 111 secs 44 secs
Bridge (0.35) 113K -- 2 mins 36.2 secs
Stool; N=96 (0.20) 2.7M 21.8 hr 1 hr, 24
mins
14 mins
Point Load
Cantilever (0.50)
783K 3.9 hr 16 mins 125 s
15M - 19hr, 28
mins
2hr, 12 mins
92M - 12 days,
2hr
-