
The Plastic Domain Decomposition

Boris Jeremić
Department of Civil and Environmental Engineering

University of California, Davis

MRCCS/NSF Summer School

High Performance Computing in Finite Element Analysis

1st - 5th September 2003,

University of Manchester

Supported in part by the NSF, PEER, Caltrans, and Cal–EPA.

Collaborators: Professors Zhaohui Yang (UAA), Sashi Kunnath (UCD), Gregory Fenves (UCB), Jacobo Bielak (CMU), Zhaojun Bai (UCD), George Karypis (UMN), Drs. Francis McKenna (UCB), Ingrid Hotz (UCD), and graduate students Xiaoyan Wu (UCD), Ritu Jain (UCD), Zhao Cheng (UCD), Kallol Sett (UCD), Qing Liu (UCD), Jinxiu Liao (UCD), Guanzhou Jie (UCD).


Motivation

• Create high fidelity models of constructed facilities (bridges, buildings, port structures, dams...).
• Models will live concurrently with the physical system they represent.
• Models to provide owners and operators with the capabilities to assess operations and future performance.
• Use observed performance to update and validate models through simulations.


Large Scale Numerical Simulation: Goal

• Scalable parallel finite element method for inelastic computations (solid and structural elements)
• Based on state of the art computational mechanics theories and implementation
• Available for a range of sequential and parallel machines, including clusters, grids of machines and clusters (DMPs), and multiprocessors (SMPs)
• Public domain, portable platform.


Incomplete Historical Background

• Substructuring to achieve partitions (Noor et al. [5], Utku et al. [7], and Storaasli and Bergan [6])
• Techniques that account for different types of elements but use substructures of a single element type, leading to non-balanced computations (Fulton and Su [2])
• Dynamic analysis of framed structures with the objective of minimizing communications (Hajjar and Abel [3])
• Parallel computational techniques for elastic–plastic problems, but with the algorithm tied to the specific multiprocessor computers used (and their specific network connectivity architecture) (Klaas et al. [4])
• Greedy domain partitioning algorithm, good for topological DD, but one that does not redistribute the domains as a function of developed nonlinearities (Farhat [1])


Brief Methodology Background

• Static domain decomposition will lead to an optimal parallel run for many problems:
  – assign an equal number of mesh elements to each processor; this balances the computational load (CL) on the parallel machine,
  – minimize the size of subdomain boundaries; this minimizes the inter-processor communication overhead.
• This can be done in many ways; a set of good graph partitioners has recently been developed (the METIS family by Karypis et al.; see the sketch after this list).
• This is indeed good for problems where the computational domain or discretization does not change.
• For inelastic computations using the finite element method, the change will occur in internal state determination.
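To make the static decomposition concrete, here is a minimal sketch of one such graph partitioner call, using the METIS 5 C API (METIS 5 postdates this talk, and the toy element graph, weights, and part count are illustrative assumptions):

```cpp
// Minimal static domain decomposition sketch using the METIS 5 C API.
// Assumptions: a toy element-connectivity (dual) graph in CSR form and
// uniform vertex weights, i.e. equal computational load per element.
#include <metis.h>
#include <cstdio>

int main() {
    // Four elements in a row: 0-1-2-3, as a CSR adjacency structure.
    idx_t nvtxs = 4, ncon = 1, nparts = 2;
    idx_t xadj[]   = {0, 1, 3, 5, 6};
    idx_t adjncy[] = {1, 0, 2, 1, 3, 2};
    idx_t part[4], objval;

    // NULL weights mean unit weights: the "equal CL per element" assumption
    // of topological DD. PDD would pass measured per-element costs instead.
    int status = METIS_PartGraphKway(&nvtxs, &ncon, xadj, adjncy,
                                     /*vwgt=*/NULL, /*vsize=*/NULL,
                                     /*adjwgt=*/NULL, &nparts,
                                     /*tpwgts=*/NULL, /*ubvec=*/NULL,
                                     /*options=*/NULL, &objval, part);
    if (status != METIS_OK) return 1;

    std::printf("edge-cut = %d\n", (int)objval);
    for (idx_t e = 0; e < nvtxs; ++e)
        std::printf("element %d -> subdomain %d\n", (int)e, (int)part[e]);
    return 0;
}
```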


Inelastic Parallel Problem

• Presence of elastic and inelastic (plasticity, damage) computational domains.
• The difference in computational load between elastic and inelastic state determination leads to computational load imbalance.
• This leads to imbalanced computations: very inefficient, with little gain from parallelization.
• Internal state determination can take as much as 80% of the CL.

[Figure: nested computation loops (stage, increment, iteration)]


Inelastic Computational Load Balancing Challenge: Adaptive Computations

• Dynamic computations: the structure of the elastic and elastic–plastic domains changes dynamically and unpredictably
• Periodic computational load balancing is required during the course of the computation
• The computational load balancing cost is similar to that of static DD (balance the mesh elements and minimize the inter-processor communications)
• It also requires that the cost of redistributing data in order to balance the computational load is minimized


Inelastic Computational Load Balancing Challenge: Multi-phase Computation

• Elastic–plastic computations follow the elastic computations; there is a synchronization step in between
• Each phase needs to be computational-load balanced separately:
  – Elastic phase computations (topological DD might be OK)
  – Elastic–plastic phase computations (PDD)
• The system of equations needs to be CL balanced as well


Elastic Computations

$$ {}^{(m)}K_{IacJ} = \int_{V^m} H_{I,b}\, E_{abcd}\, H_{J,d}\; dV^m $$

$$ E_{ijkl} = \lambda\, \delta_{ij}\delta_{kl} + \mu\, (\delta_{ik}\delta_{jl} + \delta_{il}\delta_{jk}) $$

Lamé (elastic) coefficients:

$$ \lambda = \frac{\nu E}{(1+\nu)(1-2\nu)} \,; \qquad \mu = \frac{E}{2(1+\nu)} $$

Applies to linear and nonlinear elasticity
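As a concrete reading of the formulas above, a minimal sketch (the material values and strain state are hypothetical) that builds E_ijkl from E and ν and applies it to a strain tensor:

```cpp
// Minimal sketch: build the isotropic elastic stiffness tensor E_ijkl
// from Young's modulus E and Poisson's ratio nu, then compute
// sigma_ij = E_ijkl * eps_kl for a toy strain state.
#include <cstdio>

int main() {
    const double E = 2.0e8, nu = 0.3;                          // hypothetical values (Pa)
    const double lambda = nu * E / ((1 + nu) * (1 - 2 * nu));  // Lame lambda
    const double mu     = E / (2 * (1 + nu));                  // Lame mu (shear modulus)

    auto d = [](int a, int b) { return a == b ? 1.0 : 0.0; };  // Kronecker delta

    double Eijkl[3][3][3][3];
    for (int i = 0; i < 3; ++i)
      for (int j = 0; j < 3; ++j)
        for (int k = 0; k < 3; ++k)
          for (int l = 0; l < 3; ++l)
            Eijkl[i][j][k][l] = lambda * d(i, j) * d(k, l)
                              + mu * (d(i, k) * d(j, l) + d(i, l) * d(j, k));

    // Toy strain: uniaxial extension eps_00 = 1e-4.
    double eps[3][3] = {{1e-4, 0, 0}, {0, 0, 0}, {0, 0, 0}};
    double sig[3][3] = {};
    for (int i = 0; i < 3; ++i)
      for (int j = 0; j < 3; ++j)
        for (int k = 0; k < 3; ++k)
          for (int l = 0; l < 3; ++l)
            sig[i][j] += Eijkl[i][j][k][l] * eps[k][l];

    // Expect sigma_00 = (lambda + 2 mu) * 1e-4 and sigma_11 = lambda * 1e-4.
    std::printf("sigma_00 = %g, sigma_11 = %g\n", sig[0][0], sig[1][1]);
    return 0;
}
```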


Elastic–Plastic Computations

• Integration of the constitutive equations at each integration point:
  – Explicit integration (Forward Euler (FE) algorithm)
  – Implicit integration (Backward Euler (BE) algorithm)
  – Mid-point integration (Crank–Nicolson (CN) algorithm)
• Single-step (FE) and iterative (BE, CN) algorithms
• For elastic computations, the computational load per element (integration point) is known a priori
• For elastic–plastic computations, the computational load is much larger
• The computational load is not known a priori (the extent of the plastic zone is not known prior to the analysis)


Implicit Constitutive Integration

Iterative procedure, unknown number of steps

$$ \Delta\sigma_{mn} = -\left( {}^{old}r_{ij} + \frac{{}^{n+1}F^{old} - {}^{n+1}n_{mn}\,{}^{old}r_{ij}\,{}^{n+1}T^{-1}_{ijmn}}{{}^{n+1}n_{mn}\,E_{ijkl}\,{}^{n+1}H_{kl}\,{}^{n+1}T^{-1}_{ijmn} - {}^{n+1}\xi_{*}\,h_{*}}\; E_{ijkl}\,{}^{n+1}H_{kl} \right) {}^{n+1}T^{-1}_{ijmn} $$

$$ \Delta q_{*} = \frac{{}^{n+1}F^{old} - {}^{n+1}n_{mn}\,{}^{old}r_{ij}\,{}^{n+1}T^{-1}_{ijmn}}{{}^{n+1}n_{mn}\,E_{ijkl}\,{}^{n+1}H_{kl}\,{}^{n+1}T^{-1}_{ijmn} - {}^{n+1}\xi_{*}\,h_{*}}\; h_{*} $$

$$ {}^{n+1}T_{ijmn} = \delta_{im}\delta_{nj} + \lambda\, E_{ijkl} \left.\frac{\partial m_{kl}}{\partial \sigma_{mn}}\right|_{n+1} ; \qquad {}^{n+1}H_{kl} = {}^{n+1}m_{kl} + \lambda \left.\frac{\partial m_{kl}}{\partial q_{*}}\right|_{n+1} h_{*} $$

$$ n_{ij} = \frac{\partial F}{\partial \sigma_{ij}} \,; \qquad m_{ij} = \frac{\partial Q}{\partial \sigma_{ij}} \,; \qquad \xi_{*} = \frac{\partial F}{\partial q_{*}} $$


Computational Load Varies

• Load per integration point, for each iteration/increment (4 CPU cycles (CPUc) for an addition, 6 CPUc for a multiplication, 13 CPUc for a division...):
  – Elastic computation: approx. 3600 CPUc
  – Forward Euler, single step: approx. 7000 CPUc
  – Backward Euler, n iterations: approx. 12000 + n × 15000 CPUc
• Example #1: say 5 iteration steps → 87000 CPUc (24 times more CPU load)
• Example #2: say 25 iteration steps → 387000 CPUc (108 times more CPU load)
• This is all load per one integration point!
• A standard 20-node brick element should have at least 27 integration points (3 × 3 × 3); the per-element cost is worked out below.
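Compounding the per-point figures over one 20-node brick (a worked example using the numbers above, with 5 backward Euler iterations assumed):

```latex
% Per-element cost for one 20-node brick with 27 Gauss points,
% using the per-point estimates above (5 BE iterations assumed):
\[
\text{elastic: } 27 \times 3600 \approx 9.7 \times 10^{4}\ \text{CPUc};
\qquad
\text{elastic--plastic: } 27 \times (12000 + 5 \times 15000)
  = 27 \times 87000 \approx 2.3 \times 10^{6}\ \text{CPUc}
\]
% A fully plastified element thus costs roughly 24 times its elastic
% neighbor, which is the imbalance PDD must rebalance at run time.
```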


Computational Load Imbalance

• Existing parallel computational algorithms: topological (pre-processing) DD into substructures with equal numbers of elements
• Topological DD assumes an equal computational load per element
• Topological DD is performed during a pre-processing phase
• No provisions for variation of the computational load per element, or for variation in connectivity speed (latency and bandwidth)
• Internal state determination (elastic–plastic computations at the element level) and the solution of the system of equations are separate problems


Possible Solutions

• Perform a pre-processing stage DD, i.e. topological DD (equal computational load per element; pre-processing phase only; no provisions for variation of computational load per element, connectivity speed, latency, or bandwidth)
• Distribute small subdomains to processors as they become idle: feed hungry CPUs (dynamic CL balancing by design, but expensive, as it sends many "small" data packets; the internal state determination and system solution phases are totally separate; a lot of data moves around)
• Element-by-element family of methods, in conjunction with iterative equation solvers
• Dynamic CL balancing → Plastic Domain Decomposition (currently being implemented in OpenSees)


Feed Hungry CPUs

• Distribute computations of element matrices to parallel nodes in small groups
• Solve the system in parallel using MP SOLVE
• Both phases are kept separate

[Figure: the two separate phases, stiffness matrix formation and system solution]


Development of the Plastic Domain Decomposition

• Based on the work of Karypis et al.
• Multilevel graph partitioning:
  – graph coarsening
  – initial partitioning
  – multilevel refinement
• Par-METIS system
• Weighted graph edges and graph nodes
• Zoltan framework used as an interface layer for extensibility (a sketch follows the figure below)

[Figure: multilevel k-way partitioning: a coarsening phase (G0 down to G4), an initial partitioning phase on the coarsest graph, and an uncoarsening/refinement phase (G4 back up to G0)]
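A minimal sketch of Zoltan as that interface layer (the query callbacks are elided, and the wiring into a finite element code is an assumption; parameter and function names follow the Zoltan C API):

```cpp
// Sketch: Zoltan as the interface layer between the FE code and a
// ParMETIS-style graph partitioner. The query callbacks (object counts,
// object lists, edge lists) are elided; a real driver must register them
// before calling Zoltan_LB_Partition. Zoltan_Initialize() is assumed to
// have been called once at program startup.
#include <mpi.h>
#include <zoltan.h>

void partitionMesh(MPI_Comm comm /*, mesh and element weights ... */) {
    struct Zoltan_Struct *zz = Zoltan_Create(comm);

    // Graph partitioning through ParMETIS; one weight per element carries
    // the measured computational load (the PDD idea).
    Zoltan_Set_Param(zz, "LB_METHOD", "GRAPH");
    Zoltan_Set_Param(zz, "GRAPH_PACKAGE", "PARMETIS");
    Zoltan_Set_Param(zz, "OBJ_WEIGHT_DIM", "1");

    // ... register Zoltan_Set_Num_Obj_Fn, Zoltan_Set_Obj_List_Fn,
    //     Zoltan_Set_Num_Edges_Multi_Fn, Zoltan_Set_Edge_List_Multi_Fn ...

    int changes, nGid, nLid, nImp, nExp;
    ZOLTAN_ID_PTR impGid, impLid, expGid, expLid;
    int *impProc, *impPart, *expProc, *expPart;
    Zoltan_LB_Partition(zz, &changes, &nGid, &nLid,
                        &nImp, &impGid, &impLid, &impProc, &impPart,
                        &nExp, &expGid, &expLid, &expProc, &expPart);

    // The export lists say which elements this process should migrate
    // (sendSelf/recvSelf in OpenSees terms) to rebalance the load.

    Zoltan_LB_Free_Part(&impGid, &impLid, &impProc, &impPart);
    Zoltan_LB_Free_Part(&expGid, &expLid, &expProc, &expPart);
    Zoltan_Destroy(&zz);
}
```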


Adaptive Graph Partitioning

• Dynamic computational load-balancing algorithms, based on the multilevel graph partitioning paradigm
• Attempt to minimize the data redistribution costs using one of two methods:
  – compute a new partitioning from scratch and then intelligently map it back to the original partitioning
  – perturb the original partitioning just enough to balance the computational load


Adaptive Graph Partitioning I

Compute a new partitioning from scratch and then intelligently map it back to the original partitioning:

• results in partitionings that do very well at minimizing the inter-processor communication costs,
• because when a new partitioning is computed from scratch, a state-of-the-art partitioning method can be used to do so,
• but results in higher data redistribution costs compared to other methods,
• especially when the partitioning is only slightly imbalanced,
• because it is often the case that the newly computed partitioning is substantially different from the original partitioning,
• so even a good remapping of the new partitioning can still incur large data redistribution costs.


Adaptive Graph Partitioning II

Perturb the original partitioning just enough to balance the computational load:

• extremely low data redistribution costs,
• but higher inter-processor communication costs for a highly imbalanced partitioning,
• because during the process of balancing the partitioning, locally optimal selections are made, and the more the partitioning needs to be perturbed in order to balance it, the greater the likelihood that those locally optimal decisions will be globally sub-optimal.


Multi-phase Graph Partitioning

• Traditional graph partitioners CL-balance only a single phase
• Other phases might be seriously imbalanced
• A generalized formulation of the graph partitioning problem can balance multiple phases simultaneously, while also minimizing the inter-processor communication costs
• Think of adaptive partitioning as a multi-objective optimization problem: minimize the inter-processor communications and the data redistribution costs, and create good partitions (CL balanced, with minimal boundary)
• Unified adaptive partitioning–repartitioning algorithms (see the sketch after this list)
• Compute decompositions that take into account the relative costs of performing inter-processor communications and data redistributions
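ParMETIS exposes this trade-off directly; a minimal sketch of the unified adaptive repartitioning call (ParMETIS 3 style API; the distributed graph arrays are assumed to be already set up, and multiple phases would enter as multiple weights per vertex, ncon > 1):

```cpp
// Sketch: unified adaptive repartitioning via ParMETIS_V3_AdaptiveRepart.
// vtxdist/xadj/adjncy hold the distributed element graph in CSR form;
// vwgt holds the measured per-element computational load.
#include <mpi.h>
#include <parmetis.h>
#include <vector>

void rebalance(idx_t *vtxdist, idx_t *xadj, idx_t *adjncy, idx_t *vwgt,
               idx_t nparts, idx_t *part, MPI_Comm comm) {
    idx_t wgtflag = 2;          // weights on vertices only
    idx_t numflag = 0;          // C-style numbering
    idx_t ncon    = 1;          // one balance constraint; >1 for multi-phase
    idx_t options[4] = {0};     // default options
    idx_t edgecut;

    // Target part fractions and allowed imbalance per constraint.
    std::vector<real_t> tpwgts(nparts * ncon, real_t(1.0) / nparts);
    real_t ubvec[1] = {1.05};

    // itr weighs inter-processor communication against data redistribution:
    // a large value emphasizes a low edge-cut (scratch-remap-like behavior),
    // a small value emphasizes minimal data movement (perturbation-like).
    real_t itr = 1000.0;

    ParMETIS_V3_AdaptiveRepart(vtxdist, xadj, adjncy, vwgt,
                               /*vsize=*/NULL, /*adjwgt=*/NULL,
                               &wgtflag, &numflag, &ncon, &nparts,
                               tpwgts.data(), ubvec, &itr,
                               options, &edgecut, part, &comm);
    // part[] now holds the new subdomain assignment of each local element.
}
```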


Plastic Domain Decomposition

• Takes into account:
  – heterogeneous element loads that change in each iteration
  – heterogeneous processor performance (multiple generations of nodes)
  – inter-processor communications
  – data redistribution costs
• Performs global optimization for both the internal state determination and system solution phases
• Available for all elements (solid, structural) that provide the interface (sendSelf, recvSelf, timer or CL weight estimate)
• Able to handle SMPs, local clusters, and grids of computers


PDD Implementation

• Internal state determination:
  – Use lightweight measurement of each element's performance (or provided information on CL)
  – Use lightweight measurement of network latency and bandwidth (including local area and wide area networks)
  – Use information from the previous incremental step to make PDD decisions
  – Question: do we perform PDD for each iteration (or after a few iterations), or for each increment (or after a few increments)? (a timing sketch follows this list)
• System of equations solution (SES):
  – Use the previous decomposition as much as possible
  – It might be somewhat imbalanced, but that is OK if SES takes only 20% of the time
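A minimal sketch of the lightweight per-element measurement (the Element interface is a hypothetical stand-in for the OpenSees element API):

```cpp
// Sketch: measure per-element state-determination cost and convert it
// into an integer vertex weight for the repartitioner. The Element
// interface here is a hypothetical stand-in, not the OpenSees class.
#include <chrono>
#include <cstdint>
#include <vector>

struct Element { void update() { /* constitutive integration ... */ } };

std::vector<std::int64_t> measureLoads(std::vector<Element> &elems) {
    using clock = std::chrono::steady_clock;
    std::vector<std::int64_t> weight(elems.size());
    for (std::size_t e = 0; e < elems.size(); ++e) {
        const auto t0 = clock::now();
        elems[e].update();                   // internal state determination
        const auto t1 = clock::now();
        // Nanoseconds as integer weights: elastic elements stay cheap,
        // plastified elements (many BE iterations) become heavy vertices.
        weight[e] = std::chrono::duration_cast<std::chrono::nanoseconds>(
                        t1 - t0).count();
    }
    return weight;  // pass as vwgt to the (Par)METIS calls sketched earlier
}

int main() { std::vector<Element> elems(8); measureLoads(elems); }
```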


PDD Goals: Soil–Foundation–Structure Interaction (SFSI)


PDD Working Example

Pile, column and a mass on top

[Figure: two displacement time-history plots (0 to 40 s), stiff soil vs. soft soil, at depths Z = -38.0, -30.0, -28.0, -20.0, -16.0, -12.0, -8.0, -4.0, and 0.0 m. Maximum displacements grow from 0.0000 m at Z = -38.0 m to 0.1946 m at the surface in stiff soil, and from 0.0000 m to 0.6928 m in soft soil]


PDD: Elastic Decomposition

• 16 subdomains
• approx. equal number of elastic elements per domain
• minimized inter-domain boundary
• OpenSees ⇒ Zoltan ⇒ Par-METIS


PDD: Mild Plasticity

• Small number of plastic elements close to the pile
• Note the smaller domains close to the pile
• Note also the somewhat increased inter-domain boundary


PDD: Severe Plasticity

• Note the small number of elements per CL-heavy domain
• Note also the two large, mostly elastic, subdomains
• Good DD for internal state determination, but bad for SES
• Use the graph partitioner to optimize the SES phase, using info on imbalance and network speed


Current Grid Resources

[Diagram: virtual organization of current grid resources, linking:
• Application service providers: GeoWulf cluster (UCD, DMP parallel), Koyaanisqatsi SMP (UCD), Millenium Cluster (UCB), alpha-8 SMP (UCB), Seaborg (NERSC-LBL)
• Simulation framework: OpenSees-core, ZOLTAN, Par-METIS, SuperLU, UMFPACK, LAPACK, CBLAS, Blitz++
• Experimental data providers: PEER experimental projects, NEES projects, Caltrans test database, public data on the WWW
• Application testing providers: Geomechanics SML (UCD)
• Visualization providers: CIPIC (UCD)
• Storage service provider: UCB SeismoLab
• Strong motion input: Quake project (CMU)
• Hosts: sokocalo.engr.ucdavis.edu, mali.engr.ucdavis.edu, dowitcher.ce.berkeley.edu, bechtel.colorado.edu, cml00.engr.ucdavis.edu, Mil-4.ce.berkeley.edu, GeoWulf-dev.engr.ucdavis.edu, GeoWulf-BigDisk.engr.ucdavis.edu]


Stress Field Visualization


Pile Group Visualization


Constitutive Integration Visual Debugging

[Figure: constitutive integration paths visualized in stress space (p, q, θ; deviatoric direction ρ, angle φ) relative to the model (yield) surface, with TE and TC meridians. Legend: E: elastic predictor; F: forward Euler elastoplastic corrector; B: backward Euler; I: iterative steps. The view contrasts erratic iterations with correct iterations around the start point and the predictor point]


Conclusions

• Plastic Domain Decomposition for tightly and loosely connected processors
• Handles computational load imbalance, heterogeneous processors, and heterogeneous connectivity speeds
• Uses Par-METIS, Zoltan (and all that interfaces with Zoltan), SuperLU, MP SOLVE
• Public domain, within the OpenSees framework


References

[1] Farhat, C. Multiprocessors in Computational Mechanics. PhD thesis, University of California, Berkeley, 1987.

[2] Fulton, R. E., and Su, P. S. Parallel substructure approach for massively parallel computers. Computers in Engineering 2 (1992), 75–82.

[3] Hajjar, J. F., and Abel, J. F. Parallel processing for transient nonlinear structural dynamics of three-dimensional framed structures using domain decomposition. Computers & Structures 30, 6 (1988), 1237–1254.

[4] Klaas, O., Kreienmeyer, M., and Stein, E. Elastoplastic finite element analysis on a MIMD parallel computer. Engineering Computations 11 (1994), 347–355.

[5] Noor, A. K., Kamel, A., and Fulton, R. E. Substructuring techniques – status and projections. Computers & Structures 8 (1978), 628–632.

[6] Storaasli, O. O., and Bergan, P. Nonlinear substructuring method for concurrent processing computers. AIAA Journal 25, 6 (1987), 871–876.

[7] Utku, S., Melosh, R., Islam, M., and Salama, M. On nonlinear finite element analysis in single-, multi- and parallel processors. Computers & Structures 15, 1 (1982), 39–47.
