Algorithms of Scientific Computing
Hierarchical Methods and Sparse Grids

Technische Universität München

Tobias Neckel, Dirk Pflüger

2010-07-19


Summer Term 2010



Overview




Topics

Non-hierarchical and hierarchical quadrature and interpolation
Curse of dimensionality
Hierarchical basis and subspace decompositions
High-dimensional function representations: sparse grids
Hierarchical finite elements
Multi-recursive and hierarchical algorithms on sparse grids
Applications
Multi-grid methods
More hierarchical bases: wavelets, ...




Part I

Archimedes’ Quadrature, One-Dimensional




Numerical Quadrature

Why quadrature?
Integration is an integral part of many applications:
  Determining volumes (e.g. of beer or wine barrels)
  Option pricing (expectation values)
  Defuzzification for fuzzy controllers
  Optimization
  Radiosity (accumulating light)
  ...

Often no analytical solution is available
⇒ Approximate solution: numerical quadrature

Core problem: representation of functions in several variables
In higher-dimensional settings, only stochastic or hierarchical methods are available
Here: focus on hierarchical methods




Quadrature One-Dimensional

Approximations for the definite integral

$$F_1(f, a, b) := \int_a^b f(x)\,dx$$

for $f : [a,b] \to \mathbb{R}$
First example of a hierarchical method
We first consider classical methods
Then the hierarchical approach
Assumption in the following: $f$ is sufficiently often continuously differentiable




Trapezoidal Rule, Simpson Rule

Classical methods for numerical quadrature: Newton-Cotes formulas

Evaluate $f(x_i)$ at equally spaced points $x_i = a + ih$
Integrate: $\int_a^b f(x)\,dx \approx \sum_i w_i f(x_i)$




Trapezoidal Rule, Simpson Rule

Trapezoidal rule
Interpolate at the interval boundaries with a linear function

$$F_1 \approx T := (b-a)\,\frac{f(a)+f(b)}{2}$$

Simpson rule
Interpolate at the interval boundaries and the midpoint with a quadratic function

$$F_1 \approx S := (b-a)\,\frac{f(a) + 4f\left(\frac{a+b}{2}\right) + f(b)}{6}$$




Quadrature Error

The error terms of the two methods satisfy

$$|T - F_1| \le \frac{M_2}{12}(b-a)^3, \qquad |S - F_1| \le \frac{M_4}{2880}(b-a)^5$$

$M_2$ and $M_4$ are bounds for the second and fourth derivative, respectively:

$$M_2 := \sup_{x\in[a,b]} |f''(x)|, \qquad M_4 := \sup_{x\in[a,b]} |f^{(4)}(x)|.$$




Composite Quadrature Rules

Error bounds imply:
Split the interval $[a,b]$ into smaller subintervals
Apply a simple quadrature rule in each of them

Simplest case: take a uniform grid with $n$ intervals and mesh-width $h = (b-a)/n$

Composite trapezoidal rule

$$CT := h \cdot \left[ \frac{f(a)}{2} + \sum_{i=1}^{n-1} f(a + ih) + \frac{f(b)}{2} \right]$$

Composite Simpson's rule

$$CS := \frac{h}{6}\left[ f(a) + 4f\left(a + \tfrac{h}{2}\right) + 2f(a+h) + 4f\left(a + \tfrac{3h}{2}\right) + \dots + 4f\left(b - \tfrac{h}{2}\right) + f(b) \right]$$
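The two composite rules translate almost literally into loops. The following is a minimal sketch, not taken from the lecture; the function names `composite_trapezoid` and `composite_simpson` are illustrative, and `n` counts subintervals of width h = (b - a)/n, with Simpson's rule additionally evaluating each subinterval's midpoint as in the formula for CS.

```python
import math


def composite_trapezoid(f, a, b, n):
    """Composite trapezoidal rule CT with n subintervals of width h."""
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * f(a) + interior + 0.5 * f(b))


def composite_simpson(f, a, b, n):
    """Composite Simpson's rule CS: weights 1, 4, 2, 4, ..., 4, 1 times h/6."""
    h = (b - a) / n
    midpoints = sum(f(a + (i + 0.5) * h) for i in range(n))
    interior = sum(f(a + i * h) for i in range(1, n))
    return h / 6.0 * (f(a) + 4.0 * midpoints + 2.0 * interior + f(b))


if __name__ == "__main__":
    # Integral of sin over [0, pi] is exactly 2.
    print(composite_trapezoid(math.sin, 0.0, math.pi, 16))
    print(composite_simpson(math.sin, 0.0, math.pi, 16))
```

To increase accuracy with these routines, one calls them again with a larger `n`; nothing from the previous call is reused, which is the non-hierarchical behaviour the slides contrast with Archimedes' approach.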



Composite Quadrature Rules – Error

To measure the error: sum up $n = (b-a)/h$ terms
Terms are in $O(h^3)$ and $O(h^5)$, respectively

$$|CT - F_1| \le \frac{M_2}{12}(b-a)\cdot h^2, \qquad |CS - F_1| \le \frac{M_4}{2880}(b-a)\cdot h^4.$$

Accuracy increases with $n$
Doubling the computational effort ($h \to h/2$) reduces the error bound to 1/4 (CT) and 1/16 (CS), if $f$ is sufficiently smooth




Composite Quadrature Rules – Summary

Typical non-hierarchical methods
Summands have (more or less) the same weight
To store: use an array
To implement: use a for-loop
To increase accuracy: discard the old result and start all over again




Archimedes’ Hierarchical Approach

We now decompose the area $F_1$ in a hierarchical manner
Start with the trapezoid as for the trapezoidal rule:

$$T_1(f,a,b) = \frac{b-a}{2}\,(f(a) + f(b)).$$

Let the remaining error term (area between trapezoid and curve) be $S_1$:

$$F_1(f,a,b) = T_1(f,a,b) + S_1(f,a,b).$$

Hierarchical approach if the current approximation is too inaccurate:
  Take the trapezoid (intermediate solution)
  Add an approximation for $S_1$

[Figure: f(x) on [a, b] with trapezoid T1 and remainder S1]




Decomposition of Remainder Term S1

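Decompose the remainder term $S_1$ into a triangle $D_1$ with (projected) base $(b-a)$ and height

$$f\left(\frac{a+b}{2}\right) - \frac{f(a)+f(b)}{2}:$$

[Figure: f(x) on [a, b] with triangle D1 over the midpoint (a+b)/2 inside the remainder S1]

$$D_1(f,a,b) = \frac{b-a}{2}\left( f\left(\frac{a+b}{2}\right) - \frac{f(a)+f(b)}{2} \right)$$

We obtain two remainder terms of similar type:

$$S_1(f,a,b) = D_1(f,a,b) + S_1\left(f,a,\frac{a+b}{2}\right) + S_1\left(f,\frac{a+b}{2},b\right)$$

Both are typically much smaller!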




Recursive Computation of F1

Interpret the formulas for $F_1$ (area below the curve), $T_1$ (trapezoid), and $S_1$ (remainder) as function definitions

⇒ Obtain a recursive method to compute $F_1$

Stopping criterion
Note: the recursion does not terminate so far
As we are only interested in an approximation: implement a termination criterion in function $S_1$, for example:
  Count the recursion depth ($t = 0$ for the whole interval $[a,b]$, $t = 1$ for the first two subintervals, ...)
  Stop the recursion for a certain $t = l$
  Then we compute exactly the composite trapezoidal quadrature for $n = 2^l$
  Alternatively, we could have used $b - a \le h$ for some $h = 2^{-l}$ as stopping criterion
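Read as function definitions, the three formulas give the following recursive sketch. It is an illustration under assumptions rather than the lecture's code: the names `T1`, `D1`, `S1`, `F1` mirror the slide notation, and the parameter `max_depth` (playing the role of l) is an assumed way of passing the depth-based stopping criterion.

```python
def T1(f, a, b):
    """Area of the trapezoid through (a, f(a)) and (b, f(b))."""
    return 0.5 * (b - a) * (f(a) + f(b))


def D1(f, a, b):
    """Area of the triangle with base (b - a) and the midpoint surplus as height."""
    m = 0.5 * (a + b)
    return 0.5 * (b - a) * (f(m) - 0.5 * (f(a) + f(b)))


def S1(f, a, b, t, max_depth):
    """Approximate the remainder; t is the current recursion depth."""
    if t >= max_depth:
        return 0.0  # stopping criterion: neglect what is left
    m = 0.5 * (a + b)
    return (D1(f, a, b)
            + S1(f, a, m, t + 1, max_depth)
            + S1(f, m, b, t + 1, max_depth))


def F1(f, a, b, max_depth):
    """Trapezoid plus max_depth levels of triangles."""
    return T1(f, a, b) + S1(f, a, b, 0, max_depth)


if __name__ == "__main__":
    print(F1(lambda x: 4 * x * (1 - x), 0.0, 1.0, 3))
```

With `max_depth = l` the result coincides with the composite trapezoidal rule for n = 2^l subintervals, but each additional level only adds new triangles to the previous result instead of starting over.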




Adaptive Stopping Criterion

Intuitive assumption (look at the drawings): the triangle $D_1$ comprises most of $S_1$
  Later, we will see that it is 3/4 of the area for sufficiently smooth functions and asymptotically for small $h$
We can hope (but not be sure!):
  The error in the computation of $S_1$ is about $D_1/3$ when stopping the recursion

The hierarchical approach provides a stopping criterion for free
⇒ We can control the error of the quadrature!

Even better:
Take the height of the triangle (hierarchical surplus) instead of its area
Stop if it is smaller than some $\varepsilon$
⇒ We can even hope to bound the global error (w.r.t. $F_1$) by $\varepsilon(b-a)$




Some Remarks

For polynomials $f$ of degree 2, it holds exactly

$$D_1 = \frac{3}{4} S_1.$$

When stopping the recursion, we can take $\frac{4}{3} D_1$ rather than $D_1$
⇒ We obtain the integral exactly
In total, we just compute the composite Simpson's rule

Currently, we have to evaluate $f$ three times to compute the hierarchical surplus
When calling function $S_1$, we have already computed $f$ at the interval boundaries
⇒ Extend $S(f,a,b)$ to $S(f,a,b,f(a),f(b))$ at no extra cost
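Combining the surplus-based stopping criterion, the 4/3 correction, and the reuse of boundary values gives a sketch like the one below. The names `S1_adaptive`, `F1_adaptive` and the parameter `eps` are assumptions made for this illustration. Note that the criterion is a heuristic: if the surplus happens to vanish at a midpoint (for example sin on [0, 2π] at the first step), the recursion stops too early.

```python
import math


def S1_adaptive(f, a, b, fa, fb, eps):
    """Remainder term; fa = f(a) and fb = f(b) are passed in to avoid re-evaluation."""
    m = 0.5 * (a + b)
    fm = f(m)  # the only new function evaluation per call
    surplus = fm - 0.5 * (fa + fb)  # height of the triangle D1
    d1 = 0.5 * (b - a) * surplus
    if abs(surplus) < eps:
        # Stop: estimate the whole remainder as 4/3 * D1 (exact for quadratics).
        return 4.0 / 3.0 * d1
    return (d1
            + S1_adaptive(f, a, m, fa, fm, eps)
            + S1_adaptive(f, m, b, fm, fb, eps))


def F1_adaptive(f, a, b, eps):
    fa, fb = f(a), f(b)
    return 0.5 * (b - a) * (fa + fb) + S1_adaptive(f, a, b, fa, fb, eps)


if __name__ == "__main__":
    print(F1_adaptive(math.sin, 0.0, math.pi, 1e-4))
```

Passing `fa` and `fb` down the recursion means every grid point is evaluated exactly once, which matters as soon as evaluating f is the dominant cost.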




Part II

Cost and Accuracy




So far. . .

Hierarchical and non-hierarchical one-dimensional quadrature
Aim: dealing with high-dimensional functions
Quadrature as an example: well-studied, relatively simple

On the way to high dimensionalities, we have to consider whether the effort (measured in function evaluations, computations, ...) is well invested

⇒ Consider the ratio of cost vs. accuracy




ε-Complexity

Numerical methods: usually approximate solution with error $\varepsilon$
Error can be due to discretization, rounding, truncation, ...

To measure cost $W$: count operations
Relate cost $W$ to error $\varepsilon$
  How many operations $W(\varepsilon)$ are required to obtain an error of at most $\varepsilon$?
To this end: assumptions about the solution again (differentiability, bounds for derivatives, ...)
  Often do not hold in real-world settings
  But a good indication for comparing different methods

Composite trapezoidal rule with $n$ subintervals:
  $n+1$ evaluations
  Error $O(n^{-2})$ (sufficiently smooth)
  ε-complexity $W(\varepsilon) = O(\sqrt{1/\varepsilon})$ (unit: number of evaluations)
Composite Simpson's rule correspondingly: $W(\varepsilon) = O(\sqrt[4]{1/\varepsilon})$




CT and CS: Example

Cost-error diagram for $F_1 := \int_0^\pi \sin(x)\,dx$: $|CT - F_1|$ and $|CS - F_1|$
No function evaluations on the boundary

[Figure: error (1e-14 to 1) over number of function evaluations (1 to 10000), double-logarithmic scale; curves: Composite Trapezoidal, Composite Simpson's]

ε-complexities $O(\sqrt{1/\varepsilon})$ and $O(\sqrt[4]{1/\varepsilon})$
Different gradients of the curves (asymptotically for large $n$; double-logarithmic scale)



Multi-Dimensional Quadrature

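Now on to multi-dimensional functions:

Area of integration $\Omega := \prod_{k=1}^{d} [a_k, b_k]$, function $f : \Omega \to \mathbb{R}$

Compute an approximation for

$$F_d(f,\Omega) := \int_\Omega f(x_1, \dots, x_d)\,d\vec{x}.$$

[Figure: f(x1, x2) over the rectangle [a1, b1] × [a2, b2]]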




Decomposition into One-Dimensional Integrals

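Decompose the $d$-dimensional integral into a sequence of one-dimensional ones (cf. Fubini's Theorem)

$$F_d(f,\Omega) = \int_{a_d}^{b_d} \cdots \int_{a_2}^{b_2} \left( \int_{a_1}^{b_1} f(x_1, \dots, x_d)\,dx_1 \right) dx_2 \dots dx_d.$$

[Figure: f(x1, x2) over [a1, b1] × [a2, b2], and the one-dimensional function $\int_{a_1}^{b_1} f(x_1, x_2)\,dx_1$ over $x_2 \in [a_2, b_2]$]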




Decomposition: Implementation

Consider this decomposition using the function $F_1$ (one-dimensional integration) and functions $G_k$:

$$G_0(x_1, x_2, x_3, \dots, x_d) := f(x_1, x_2, x_3, \dots, x_d)$$
$$G_1(x_2, x_3, \dots, x_d) := F_1(G_0(\bullet, x_2, x_3, \dots, x_d), a_1, b_1)$$
$$G_2(x_3, \dots, x_d) := F_1(G_1(\bullet, x_3, \dots, x_d), a_2, b_2)$$
$$\vdots$$
$$G_d() := F_1(G_{d-1}(\bullet), a_d, b_d)$$

$G_k$ integrates over $x_1, \dots, x_k$; the remaining variables are free

Numerical quadrature
Just replace $F_1$ by a quadrature formula, e.g. CT, CS
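The chain of functions G_k maps directly onto a recursion over the dimension. The sketch below assumes the composite trapezoidal rule as the one-dimensional formula; `integrate_nd`, `bounds`, and the inner helper `G` are names chosen for this illustration.

```python
def composite_trapezoid(g, a, b, n):
    """1d composite trapezoidal rule with n subintervals."""
    h = (b - a) / n
    return h * (0.5 * g(a) + sum(g(a + i * h) for i in range(1, n)) + 0.5 * g(b))


def integrate_nd(f, bounds, n):
    """Approximate the integral of f over the box given by bounds = [(a_1, b_1), ...]."""

    def G(k, tail):
        # G_k(x_{k+1}, ..., x_d): f integrated over x_1, ..., x_k; tail holds the free variables.
        if k == 0:
            return f(*tail)
        a, b = bounds[k - 1]
        # G_k := F_1(G_{k-1}(., x_{k+1}, ..., x_d), a_k, b_k)
        return composite_trapezoid(lambda x: G(k - 1, (x,) + tail), a, b, n)

    return G(len(bounds), ())


if __name__ == "__main__":
    f = lambda x1, x2: 16 * x1 * (1 - x1) * x2 * (1 - x2)
    print(integrate_nd(f, [(0.0, 1.0), (0.0, 1.0)], 8))
```

Each of the d nested one-dimensional rules visits n + 1 points, so f is evaluated (n + 1)^d times in total.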




Cost and Accuracy

Cost
Uniform grid with $n$ subintervals for 1d quadrature
$d$ dimensions: Cartesian product of 1d grids
Indices

$$(i_1, \dots, i_d) \in \{0, 1, 2, \dots, n\}^d$$

with corresponding grid points

$$(x_1, \dots, x_d) \quad\text{with}\quad x_k = a_k + i_k\,\frac{b_k - a_k}{n}$$

Total cost:
  $(n+1)^d$ (with grid points on the domain's boundary $\partial\Omega$)
  $(n-1)^d$ (if $f$ is zero on $\partial\Omega$)




Cost and Accuracy (2)

Accuracy
Still $O(n^{-2})$ for CT, $O(n^{-4})$ for CS
Remark: starting with $G_2$, the current function values are erroneous by $O(n^{-2})$ and $O(n^{-4})$, respectively; this does not alter the overall accuracy

⇒ Thus everything is fine...?




Multidimensional Quadrature: Example

Integration of

$$f(x_1, \dots, x_d) := \prod_{k=1}^{d} 4x_k(1 - x_k)$$

on $\Omega = [0,1]^d$ with the composite trapezoidal rule
Error:

[Figure: error (1e-05 to 1, logarithmic) over level l = 1, ..., 7 (mesh-width h = 2^(-l)); one curve each for d = 1, ..., 10]




Multidimensional Quadrature: Example (2)

Having ε-complexity in mind:
Use cost (number of function evaluations) as abscissa

[Figure: error (1e-05 to 1) over number of function evaluations (1 to 1e+25), double-logarithmic; one curve each for d = 1, ..., 10]

Does not look that good any more...




Multidimensional Quadrature: Example (3)

$10^{21}$

Large number...
1 ZByte (Zetta) = 1,000,000,000 TByte = 1,000,000,000,000,000,000,000 Byte to store the grid (one Byte per grid point)
Compare the national supercomputer HLRB II (Altix) @ LRZ:
  Peak performance: 62.3 TFlop/s
  Memory: 39 TByte
It would take about 6 months to compute the quadrature, assuming that one integration operation can be performed in one clock cycle...




Curse of Dimensionality

ε-complexity

CT: $O(\varepsilon^{-d/2})$, CS: $O(\varepsilon^{-d/4})$

Curse of dimensionality
Exponential dependency on the dimensionality $d$
Higher-dimensional problems are infeasible to tackle ($d = 10$ is still moderate...)
Property of the problem, or just of the algorithm?
It is the algorithm ⇒ hierarchical methods can mitigate the curse of dimensionality to some extent




Monte-Carlo Integration

To motivate the search for better methods for numerical quadrature:

Consider the Monte-Carlo method: simple approach, simple to implement

Approach
Let $X$ be a random variable, uniformly distributed on $\Omega$
Then it holds for the expectation value

$$E(f(X)) = \int_\Omega \frac{f(x)}{\mathrm{Vol}(\Omega)}\,dx = \frac{1}{\mathrm{Vol}(\Omega)}\,F_d(f,\Omega)$$

On the other hand: if the $x_k$ are realizations of $X$, we obtain

$$\lim_{M\to\infty} \frac{1}{M}\sum_{k=1}^{M} f(x_k) = E(f(X))$$

with probability 1: strong law of large numbers



Monte-Carlo Integration (2)

Simple to implement
Cost completely independent of $d$ (counting function evaluations)
Accuracy?
  Estimate stochastically: compute the standard deviation (use additivity of variances)

$$\sqrt{\mathrm{Var}\left(\frac{1}{M}\sum_{k=1}^{M} f(x_k)\right)} = \sqrt{\frac{1}{M^2}\sum_{k=1}^{M} \mathrm{Var}(f)} = \sqrt{\frac{\mathrm{Var}(f)}{M}}$$

  Independent of $d$, too
  Dependencies on $d$ are possible only in $\mathrm{Var}(f)$ and $\mathrm{Vol}(\Omega)$; this does not affect the exponent of $M$

Thus (stochastically) an ε-complexity of $O(\varepsilon^{-2})$
Very slow convergence
Independence of $d$: very helpful for tackling high-dimensional problems!
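A minimal Monte-Carlo sketch, under the assumption of a box-shaped domain and Python's standard pseudo-random generator, shows both the estimate and the stochastic error estimate sqrt(Var(f)/M). The names `monte_carlo`, `bounds`, and `seed` are illustrative.

```python
import math
import random


def monte_carlo(f, bounds, M, seed=0):
    """Return (estimate, estimated standard error) for the integral of f over a box."""
    rng = random.Random(seed)
    volume = math.prod(b - a for a, b in bounds)
    samples = [f(*[rng.uniform(a, b) for a, b in bounds]) for _ in range(M)]
    mean = sum(samples) / M
    # Sample variance of f(X); the variance of the mean is Var(f) / M.
    variance = sum((s - mean) ** 2 for s in samples) / (M - 1)
    return volume * mean, volume * math.sqrt(variance / M)


if __name__ == "__main__":
    d = 5
    f = lambda *x: math.prod(4 * xk * (1 - xk) for xk in x)
    estimate, std_err = monte_carlo(f, [(0.0, 1.0)] * d, 20000)
    print(estimate, std_err, (2.0 / 3.0) ** d)
```

The number of samples M is chosen independently of d; halving the error estimate requires roughly four times as many samples, in line with the ε-complexity of O(ε^-2).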




What next?

We know that the curse of dimensionality can be overcome
Search for alternative (better?) methods

... which can be used for other applications apart from integration as well, for example




Part III

Hierarchical Decomposition, 1d




Archimedes’ Quadrature

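Compute an approximation of $F_1 := \int_0^1 4\cdot x\cdot(1-x)\,dx = \frac{2}{3}$

[Figure: 4x(1 - x) on [0, 1] with the triangle of height 1 at x = ½ (t = 1) and the two triangles of height ¼ (t = 2)]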




Archimedes’ Quadrature (2)

Integrating $4x(1-x)$, we have to consider several quantities
Ordered by (recursive) level $t$:

Level-depth            1     2     3      4       ...  t
Mesh-width h           1/2   1/4   1/8    1/16    ...  2^(-t)
# triangles            1     2     4      8       ...  (1/2) · 2^t
Surplus v              1     1/4   1/16   1/64    ...  4 · 2^(-2t)
Area of triangle D1    1/2   1/16  1/128  1/1024  ...  4 · 2^(-3t)
Sum (current t)        1/2   1/8   1/32   1/128   ...  2 · 2^(-2t)
Sum (≤ t)              1/2   5/8   21/32  85/128  ...  (2/3) · (1 - 2^(-2t))
Error                  1/6   1/24  1/96   1/384   ...  (2/3) · 2^(-2t)
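The rows of this table can be reproduced with exact rational arithmetic. The sketch below is illustrative; the function name `archimedes_table` and its return layout are assumptions. It relies on the integrand vanishing at 0 and 1, so the initial trapezoid contributes nothing.

```python
from fractions import Fraction


def archimedes_table(f, exact, max_level):
    """Per level t: (t, h, sum of triangle areas on level t, cumulative sum, error)."""
    rows = []
    total = Fraction(0)
    for t in range(1, max_level + 1):
        h = Fraction(1, 2 ** t)
        level_sum = Fraction(0)
        for i in range(1, 2 ** t, 2):  # new (odd-indexed) points on level t
            x = i * h
            surplus = f(x) - (f(x - h) + f(x + h)) / 2
            level_sum += h * surplus  # triangle with base 2h and height surplus
        total += level_sum
        rows.append((t, h, level_sum, total, exact - total))
    return rows


if __name__ == "__main__":
    f = lambda x: 4 * x * (1 - x)
    for row in archimedes_table(f, Fraction(2, 3), 4):
        print(*row)
```

For 4x(1 - x) all triangles of one level have the same surplus, which is why the table lists a single surplus value per level.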




Approximation of Functions

To analyze Archimedes' quadrature rule, we consider functions
We need a representation of the (approximating) function $u(x)$ which we are integrating:

$u$ as a linear combination of ansatz functions $\phi_i$:

$$u(x) = \sum_{i=1}^{n} \alpha_i \cdot \phi_i(x)$$

Integrating $u(x)$:

$$\int_a^b u(x)\,dx = \sum_{i=1}^{n} \alpha_i \int_a^b \phi_i(x)\,dx,$$

Weighted sum of the $\alpha_i$
Remember: Newton-Cotes formulas are weighted sums of function evaluations




Composite Trapezoidal Rule: Function

Interpolant
Continuous, piecewise linear function
Represent $u$ in the nodal point (hat) basis

[Figure: piecewise linear interpolant and the nodal hat basis functions]

Coefficients $\alpha_i$ are function values at the grid points
Ansatz functions have area $h$ ($h/2$ at the boundaries)




Piecewise Linear Functions

Ansatz space
Only consider $u : [0,1] \to \mathbb{R}$
Consider discretization level $n \in \mathbb{N}$
Obtain
  Mesh-width $h_n = 2^{-n}$
  Grid points $x_{n,i} = i \cdot h_n$
Define the "mother of all hat functions"

$$\phi(x) := \max\{1 - |x|, 0\}$$

⇒ Ansatz functions

$$\phi_{n,i}(x) := \phi\left(\frac{x - x_{n,i}}{h_n}\right)$$

Nodal point basis $\Phi_n := \{\phi_{n,i},\ 0 \le i \le 2^n\}$




Piecewise Linear Functions (2)

Space of continuous piecewise linear functions

$$V_n = \mathrm{span}(\Phi_n)$$

Interpolants $u_n \in V_n$

$$u_n(x) = \sum_{i=0}^{2^n} \alpha_{n,i}\,\phi_{n,i}(x)$$
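As a small illustration of the nodal point basis, the following sketch builds the interpolant u_n from function values at the grid points. The names `hat`, `phi`, and `interpolant` are chosen for this example and are not part of the lecture material.

```python
def hat(x):
    """Mother hat function: max(1 - |x|, 0)."""
    return max(1.0 - abs(x), 0.0)


def phi(n, i, x):
    """Nodal basis function of level n centred at x_{n,i} = i * 2^-n."""
    h = 2.0 ** (-n)
    return hat((x - i * h) / h)


def interpolant(f, n):
    """Piecewise linear interpolant u_n; the coefficients are the nodal values of f."""
    h = 2.0 ** (-n)
    alpha = [f(i * h) for i in range(2 ** n + 1)]

    def u(x):
        return sum(alpha[i] * phi(n, i, x) for i in range(2 ** n + 1))

    return u


if __name__ == "__main__":
    u3 = interpolant(lambda x: 4 * x * (1 - x), 3)
    print(u3(0.5), u3(1.0 / 16.0))
```

Evaluating `u` as a full sum is deliberately naive: at any point at most two hat functions are non-zero, which an efficient implementation would exploit.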




Composite Simpson’s Rule: Function

Interpolant
Continuous, piecewise quadratic function
More complicated basis:

[Figure: piecewise quadratic basis functions in blue, red, and green]

Ansatz functions: Lagrangian polynomials, glued together
$\alpha_i$: function values at the grid points
Ansatz functions have area $h/6$ (blue), $4h/6$ (red), $2h/6$ (green)
We will not formally define the basis functions here...




From Composite Trapezoidal to Archimedes

Piecewise linear functions
We restrict our functions $u$ to $u(0) = u(1) = 0$
Nodal point basis for discretization level $n$:

$$\Phi_n := \{\phi_{n,i},\ 1 \le i \le 2^n - 1\}$$

Function space

$$V := \bigcup_{l=1}^{\infty} V_l$$

contains all functions which are in $V_l$ for sufficiently large $l$

Generating system of $V$:

$$\Phi := \bigcup_{l=1}^{\infty} \Phi_l$$

Note: not minimal, thus not a basis (not linearly independent)



Hierarchical Basis

We are interested in a hierarchical decomposition of $V_l$

Define the hierarchical increment $W_l$ such that $V_l$ is the direct sum of $V_{l-1}$ and $W_l$:

$$V_l = V_{l-1} \oplus W_l$$

Side-note: direct sum
Every $u_l \in V_l$ can be uniquely decomposed as $u_l = u_{l-1} + w_l$, with $u_{l-1} \in V_{l-1}$ and $w_l \in W_l$

$W_l$ has to contain $2^{l-1}$ ansatz functions: $\dim V_l = 2^l - 1 = \dim V_{l-1} + \dim W_l$

This holds (introducing index sets $I_l$) for

$$I_l := \{i : 1 \le i < 2^l,\ i \text{ odd}\}, \qquad W_l := \mathrm{span}\{\phi_{l,i} : i \in I_l\}$$




Hierarchical Increments

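Set of hierarchical increments $W_l$

For $l = 1$: $W_1 = V_1$

Example for $l = 1, 2, 3$:

[Figure: hierarchical basis functions $\phi_{1,1}$ at $x_{1,1}$; $\phi_{2,1}, \phi_{2,3}$ at $x_{2,1}, x_{2,3}$; $\phi_{3,1}, \phi_{3,3}, \phi_{3,5}, \phi_{3,7}$ at $x_{3,1}, x_{3,3}, x_{3,5}, x_{3,7}$]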




Hierarchical Basis (cont.)

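Then

$$V_n = \bigoplus_{l=1}^{n} W_l$$

is a direct sum, too:
$u \in V_n$ can be decomposed uniquely into $w_l \in W_l$:

$$u = \sum_{l=1}^{n} w_l = \sum_{l=1}^{n} \sum_{i\in I_l} v_{l,i}\,\phi_{l,i}$$

The coefficients $v_{l,i}$ are the hierarchical surpluses
Corresponding basis of $V_n$ (or, with $\infty$ instead of $n$, of $V$):

$$\Psi_n := \bigcup_{l=1}^{n} \{\phi_{l,i} : i \in I_l\}.$$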
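Changing from nodal values to hierarchical surpluses needs only the stencil "value minus mean of the two hierarchical neighbours". The sketch below assumes nodal values on the level-n grid with zero boundary values; `hierarchize` and `evaluate` are illustrative names.

```python
from fractions import Fraction


def hierarchize(values, n):
    """values[k] = u(k * 2^-n) for k = 0..2^n; returns {(l, i): v_{l,i}} for odd i."""
    surplus = {}
    for l in range(1, n + 1):
        step = 2 ** (n - l)  # distance to the hierarchical neighbours in fine-grid indices
        for i in range(1, 2 ** l, 2):
            k = i * step
            surplus[(l, i)] = values[k] - (values[k - step] + values[k + step]) / 2
    return surplus


def evaluate(surplus, x):
    """Evaluate u(x) = sum of v_{l,i} * phi_{l,i}(x) in the hierarchical basis."""
    total = 0
    for (l, i), v in surplus.items():
        total += v * max(1 - abs(x * 2 ** l - i), 0)  # hat function phi_{l,i}
    return total


if __name__ == "__main__":
    n = 3
    nodal = [4 * Fraction(k, 2 ** n) * (1 - Fraction(k, 2 ** n)) for k in range(2 ** n + 1)]
    v = hierarchize(nodal, n)
    print(v[(1, 1)], v[(2, 1)], v[(3, 5)])
```

For 4x(1 - x) the surpluses depend only on the level, matching the row "Surplus v" of the table for Archimedes' quadrature.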




Comparison

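[Figure: hierarchical basis (increments $W_1$, $W_2$, $W_3$ on levels l = 1, 2, 3 with grid points $x_{1,1}$; $x_{2,1}, x_{2,3}$; $x_{3,1}, x_{3,3}, x_{3,5}, x_{3,7}$, together spanning $V_3$) compared with the nodal point bases of $V_1$, $V_2$, $V_3$ (grid points $x_{l,i}$ for all $i$, including even indices such as $x_{2,2}$, $x_{3,2}$, $x_{3,4}$, $x_{3,6}$)]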




Comparison (2)

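[Figure: interpolation of f(x) on [0, 1] by $u(x) = \sum_i \alpha_i \phi_i(x)$ with mesh-width $h_3 = 2^{-3}$, shown once in the nodal point basis and once in the hierarchical basis, together with the scaled basis functions $\alpha_i \phi_i(x)$]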




Analysis of Hierarchical Decomposition

Contribution of the summands in the hierarchical decomposition

$$u = \sum_{l=1}^{n} w_l = \sum_{l=1}^{n} \sum_{i\in I_l} v_{l,i}\,\phi_{l,i}.$$

Interesting in the univariate setting
Will be crucial in the multivariate setting
  Cost/benefit analysis will help to significantly reduce the effort
Need several norms to measure $w_l$ (cf. worksheet 5)




Norms of Functions

Again, we assume sufficiently smooth functions $u : [0,1] \to \mathbb{R}$

Norms
Maximum-norm

$$\|u\|_\infty := \max_{x\in[0,1]} |u(x)|$$

$L_2$-norm

$$\|u\|_2 := \sqrt{\int_0^1 u(x)^2\,dx},$$

for the $L_2$ scalar product

$$(u, v)_2 := \int_0^1 u(x)\,v(x)\,dx$$

Energy-norm

$$\|u\|_E := \|u'\|_2$$




Norms of Basis Functions

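For the basis functions $\phi_{l,i}$, we obtain

$$\|\phi_{l,i}\|_\infty = 1, \qquad \|\phi_{l,i}\|_2 = \sqrt{\frac{2h_l}{3}}, \qquad \|\phi_{l,i}\|_E = \sqrt{\frac{2}{h_l}}$$

[Figure: $\phi$, $\phi^2$, and $(\phi')^2$ on the support $[x_{l,i-1}, x_{l,i+1}]$, with maximum values 1, 1, and $h^{-2}$]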




Estimation of Surpluses

Let $\psi_{l,i} := -\frac{h_l}{2}\,\phi_{l,i}$

Surplus $v_{l,i}$ of basis function $\phi_{l,i}$

$u$ twice differentiable
⇒ We can then write $v_{l,i}$ as (compare worksheet 4)

$$v_{l,i} = \int_0^1 \psi_{l,i}(x)\,u''(x)\,dx.$$

$v_{l,i}$ depends on $u''$, thus we define for future use

$$\mu_2(u) := \|u''\|_2 \quad\text{and}\quad \mu_\infty(u) := \|u''\|_\infty.$$




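Estimation of Surpluses (2)

Starting from the integral representation of $v_{l,i}$, we can bound

$$|v_{l,i}| \le \frac{h_l}{2}\cdot\left(\int_0^1 \phi_{l,i}\,dx\right)\cdot\mu_\infty(u) = \frac{h_l^2}{2}\cdot\mu_\infty(u)$$

and (via the Cauchy-Schwarz inequality $|(u,v)| \le \|u\|\cdot\|v\|$)

$$|v_{l,i}| \le \frac{h_l}{2}\,\|\phi_{l,i}\|_2\cdot\mu_2(u|_{T_i}) = \sqrt{\frac{h_l^3}{6}}\cdot\mu_2(u|_{T_i}),$$

where $u|_{T_i}$ restricts $u$ to the support $T_i = [x_{l,i-1}, x_{l,i+1}]$ of $\phi_{l,i}$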
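As a quick plausibility check of the first bound (a worked example added here, not part of the original slides), take the parabola u(x) = 4x(1 - x) used throughout, for which u'' is constant:

```latex
\begin{align*}
u(x) &= 4x(1-x), \qquad u''(x) = -8, \qquad \mu_\infty(u) = 8,\\
v_{l,i} &= \int_0^1 \psi_{l,i}(x)\,u''(x)\,dx
         = \left(-\frac{h_l}{2}\right)(-8)\int_0^1 \phi_{l,i}(x)\,dx
         = 4h_l \cdot h_l = 4\cdot 2^{-2l},\\
\frac{h_l^2}{2}\,\mu_\infty(u) &= \frac{h_l^2}{2}\cdot 8 = 4\cdot 2^{-2l}.
\end{align*}
```

The surplus attains the bound exactly, which is expected whenever u'' does not change over the support of the basis function, and it agrees with the surplus values 1, 1/4, 1/16, ... observed for Archimedes' quadrature of this parabola.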




Estimation of wl

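Estimate the contribution of

$$w_l = \sum_{i\in I_l} v_{l,i}\,\phi_{l,i}$$

in the hierarchical decomposition of $u$
Use that the supports of the $\phi_{l,i}$ are pairwise disjoint

Maximum-norm

$$\|w_l\|_\infty \le \frac{h_l^2}{2}\cdot\mu_\infty(u),$$

$L_2$-norm

$$\|w_l\|_2^2 = \sum_{i\in I_l} |v_{l,i}|^2\cdot\|\phi_{l,i}\|_2^2 \le \frac{h_l^3}{6}\cdot\frac{2h_l}{3}\cdot\sum_{i\in I_l}\mu_2(u|_{T_i})^2 = \frac{h_l^4}{9}\,\mu_2(u)^2,$$

⇒ $\|w_l\|_2 \in O(h_l^2)$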




Estimation of wl (2)

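Energy-norm

$$\|w_l\|_E^2 = \sum_{i\in I_l} |v_{l,i}|^2\cdot\|\phi_{l,i}\|_E^2 = \sum_{i\in I_l} |v_{l,i}|^2\,\frac{2}{h_l} \le \frac{2}{h_l}\cdot\frac{h_l^4}{4}\cdot\frac{1}{2h_l}\,\mu_\infty(u)^2 = \frac{h_l^2}{4}\,\mu_\infty(u)^2$$

($2^{l-1} = 1/(2h_l)$ summands)
⇒ $\|w_l\|_E \in O(h_l)$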




Estimation of wl (3)

We can write $u$ (twice differentiable) as an infinite series

$$u = \sum_{l=1}^{\infty} w_l$$

Convergent in all three norms
With

$$u - u_n := u - \sum_{l=1}^{n} w_l = \sum_{l=n+1}^{\infty} w_l$$

in maximum- and $L_2$-norm $O(h_n^2)$, in energy-norm $O(h_n)$




Part IV

Archimedes, d-Dimensional




Current State

One-dimensional quadrature
One-dimensional functions $f$, interval $[a,b]$

Compute an approximation $F_1(f,a,b)$ of the area:

$$F_1(f,a,b) \approx \int_a^b f(x)\,dx$$

Notation for the approximation of the exact integral value in the following: $F_d(\cdot)$

One-dimensional quadrature rules:
  Composite trapezoidal rule
  Composite Simpson's rule
  Archimedes' quadrature




Multi-Dimensional Quadrature

Consider the multi-dimensional setting

$$F_d(f,\Omega) \approx \int_\Omega f(x_1, \dots, x_d)\,d\vec{x}, \qquad \Omega := \prod_{k=1}^{d} [a_k, b_k]$$

[Figure: f(x1, x2) over the rectangle [a1, b1] × [a2, b2]]




First Attempt

Use the full-grid approach as before:

$$G_0(x_1, x_2, x_3, \dots, x_d) := f(x_1, x_2, x_3, \dots, x_d)$$
$$G_1(x_2, x_3, \dots, x_d) := F_1(G_0(\bullet, x_2, x_3, \dots, x_d), a_1, b_1)$$
$$G_2(x_3, \dots, x_d) := F_1(G_1(\bullet, x_3, \dots, x_d), a_2, b_2)$$
$$\vdots$$
$$G_d() := F_1(G_{d-1}(\bullet), a_d, b_d)$$

We now consider the effect of using Archimedes' quadrature as the one-dimensional quadrature method for $F_1$




First Attempt: Employing Archimedes

$d$ nested loops ($x_1, x_2, \dots$)
Summation of weighted function values
No real advantages apart from adaptivity (which is not very useful this way)

Interplay of hierarchization and summation (integration)
Consider the setting with $d = 2$
First, compute integrals in $x_1$-direction
  Involves hierarchization in $x_1$-direction
  But no impact on $G_1(x_2)$
$G_1(x_2)$: no hierarchical values, thus all $G_1(x_2)$ are of the same order
After summation (integration) in $x_1$-direction:
  Hierarchization in $x_2$-direction
  Finally summation in $x_2$-direction




Improved Version

Consider computing $G_1(x_2)$
  We are only interested in the hierarchical surplus
  The hierarchical surplus is typically much smaller than the function value
  ⇒ Could be computed with far fewer grid points in $x_1$-direction
We change the order of "integration in $x_1$-direction" and "hierarchization in $x_2$-direction"
  Write the hierarchical area elements of the quadrature in $x_2$-direction (trapezoid, segments, triangles) as functions of $x_1$
  Integrate those in $x_1$-direction

Now the interplay of dimensions for integration is much more complicated... but this will lead to a much more efficient method




Example, 2d

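Consider

$$f(x_1, x_2) := \left(x_1 + \tfrac{1}{2}\right)\left(x_1 - \tfrac{3}{2}\right)\left(x_2 + \tfrac{1}{2}\right)\left(x_2 - \tfrac{3}{2}\right)$$

on $\Omega = [0,1] \times [0,2]$

[Figure: surface plot of f over $x_1 \in [0,1]$, $x_2 \in [0,2]$; values between -1 and 1]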




Trapezoidal Volume and Remainder Segment

Decompose the volume into a trapezoidal (for constant $x_1$) cross-section with area

$$T_2(x_1) := \frac{b_2 - a_2}{2}\,\bigl(f(x_1, a_2) + f(x_1, b_2)\bigr),$$

which can be integrated using the quadrature rule $F_1$, and a remainder segment

$$S_2(f,\Omega) := F_2(f,\Omega) - F_1(T_2, a_1, b_1) = \int_{a_2}^{b_2}\int_{a_1}^{b_1} \left( f(x_1, x_2) - \frac{f(x_1, a_2)(b_2 - x_2) + f(x_1, b_2)(x_2 - a_2)}{b_2 - a_2} \right) dx_1\,dx_2$$




Trapezoidal Volume and Remainder Segment (2)

The first step of the hierarchical decomposition

$$F_2(f,\Omega) = F_1(T_2, a_1, b_1) + S_2(f,\Omega)$$

[Figure: two surface plots over $x_1 \in [0,1]$, $x_2 \in [0,2]$: the trapezoidal volume (values between -1 and 1) and the remainder segment (values between 0 and 1)]




Triangular Volumes and Remainder Segments

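Decompose the remainder segment $S_2(f,\Omega)$ into a triangular (for constant $x_1$) cross-section with area

$$D_2(x_1) := \frac{b_2 - a_2}{2}\left( f\left(x_1, \frac{a_2 + b_2}{2}\right) - \frac{f(x_1, a_2) + f(x_1, b_2)}{2} \right)$$

and two remainder segments

$$S_2(f, [a_1,b_1]\times[a_2,b_2]) = F_1(D_2, a_1, b_1) + S_2\left(f, [a_1,b_1]\times\left[a_2, \tfrac{a_2+b_2}{2}\right]\right) + S_2\left(f, [a_1,b_1]\times\left[\tfrac{a_2+b_2}{2}, b_2\right]\right)$$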
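The two-dimensional recursion can be sketched by letting a one-dimensional Archimedes rule integrate the cross-section functions T2 and D2 in x1-direction. This is a simplified illustration with one fixed recursion depth in both directions (so it reproduces the full-grid trapezoidal result and is not yet the improved variant); the names `archimedes_1d`, `S2`, `F2`, and `depth` are assumptions.

```python
def archimedes_1d(g, a, b, depth):
    """1d Archimedes quadrature: trapezoid plus `depth` levels of triangles."""

    def s1(a, b, ga, gb, t):
        if t >= depth:
            return 0.0
        m = 0.5 * (a + b)
        gm = g(m)
        d1 = 0.5 * (b - a) * (gm - 0.5 * (ga + gb))
        return d1 + s1(a, m, ga, gm, t + 1) + s1(m, b, gm, gb, t + 1)

    ga, gb = g(a), g(b)
    return 0.5 * (b - a) * (ga + gb) + s1(a, b, ga, gb, 0)


def S2(f, a1, b1, a2, b2, depth, t=0):
    """Remainder segment: integrated triangular cross-section plus two sub-segments."""
    if t >= depth:
        return 0.0
    m2 = 0.5 * (a2 + b2)
    # Triangular cross-section area D2 as a function of x1.
    D2 = lambda x1: 0.5 * (b2 - a2) * (f(x1, m2) - 0.5 * (f(x1, a2) + f(x1, b2)))
    return (archimedes_1d(D2, a1, b1, depth)
            + S2(f, a1, b1, a2, m2, depth, t + 1)
            + S2(f, a1, b1, m2, b2, depth, t + 1))


def F2(f, a1, b1, a2, b2, depth):
    # Trapezoidal cross-section area T2 as a function of x1.
    T2 = lambda x1: 0.5 * (b2 - a2) * (f(x1, a2) + f(x1, b2))
    return archimedes_1d(T2, a1, b1, depth) + S2(f, a1, b1, a2, b2, depth)


if __name__ == "__main__":
    f = lambda x1, x2: 16 * x1 * (1 - x1) * x2 * (1 - x2)
    print(F2(f, 0.0, 1.0, 0.0, 1.0, 3))
```

Because every cross-section is integrated to the same depth, no work is saved here; the savings come from integrating the small triangular cross-sections with fewer points in x1-direction.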




Triangular Volumes and Remainder Segments (2)

The second step of the hierarchical decomposition

$$S_2(f,\Omega) = F_1(D_2, a_1, b_1) + S_2(f, \dots) + S_2(f, \dots)$$

[Figure: two surface plots over $x_1 \in [0,1]$, $x_2 \in [0,2]$: the triangular volume (values between 0 and 1) and the two remaining segments (values between 0 and 0.25)]




Triangular Volumes and Remainder Segments (3)

Recursive decomposition
Repeat the last step for both remainder segments
Decompose each into a triangular sub-volume and two remainder segments
Example for one of the two segments, and the sum of the trapezoidal and the first three triangular sub-volumes:

[Figure: two surface plots over $x_1 \in [0,1]$, $x_2 \in [0,2]$: one decomposed segment (values between 0 and 0.25) and the sum of the trapezoidal and first three triangular sub-volumes (values between -1 and 1)]




Recursive Structure of Function Calls

Nested recursive structure of function calls
For higher-dimensional problems: one more level ($F_d$ and $S_d$) for each additional dimension

[Figure: call graph. F2 calls the trapezoidal volume and S2 (segment); S2 calls the triangular volume and 2 segments; F1 calls the trapezoid and S1 (segment); S1 calls the triangle and 2 segments; the innermost level evaluates f]

Consider the number of function evaluations for a grid point inside of $\Omega$
  Straightforward: $3^d$ evaluations to compute the surplus
  All but one have already been computed!




Subvolumes

$F_1$: the subvolumes (hierarchized in $x_2$-direction) are decomposed (in $x_1$-direction) into a trapezoid and many triangles
The integrand itself is an area (one slice of the trapezoidal/triangular subareas)
Subvolumes which are added in the quadrature are pagodas (neglecting the trapezoids)
  Height of the pagodas: $d$-dimensional hierarchical surplus
  Volume of the pagodas: $2^{-d}$ times size of support times surplus (more in the next part)
Taking a stopping criterion depending on the surplus ($d$ criteria: one in each $S_i$)
  Find those grid points for which a function evaluation is worthwhile
  In general far fewer than in the naive implementation
Extend from the composite trapezoidal rule to Simpson's rule as in the one-dimensional setting




Part V

Hierarchical Decomposition, d-Dimensional




Intermezzo/"Big Picture": Archimedes' Quadrature

Start with a 2d example (compare worksheet 6):

$$f := 16\,x_1(x_1 - 1)\,x_2(x_2 - 1), \qquad \Omega = [0,1]^2 \;\Rightarrow\; f|_{\partial\Omega} = 0$$

Consider the hierarchical surplus at the grid points with $n = 3$, $h_3 = 2^{-3}$ (both axes run from 0 to 1):

1/256  1/64  1/256  1/16  1/256  1/64  1/256
1/64   1/16  1/64   1/4   1/64   1/16  1/64
1/256  1/64  1/256  1/16  1/256  1/64  1/256
1/16   1/4   1/16   1     1/16   1/4   1/16
1/256  1/64  1/256  1/16  1/256  1/64  1/256
1/64   1/16  1/64   1/4   1/64   1/16  1/64
1/256  1/64  1/256  1/16  1/256  1/64  1/256




"Big Picture": Archimedes' Quadrature (2)

$$\int_\Omega f\,d\vec{x} = \frac{4}{9} = 0.\overline{4}, \qquad \sum = \frac{441}{1024} = 0.4306640625$$

Consider the volume of the subvolumes (pagodas) for the quadrature (both axes run from 0 to 1):

1/16384  1/2048  1/16384  1/256  1/16384  1/2048  1/16384
1/2048   1/256   1/2048   1/32   1/2048   1/256   1/2048
1/16384  1/2048  1/16384  1/256  1/16384  1/2048  1/16384
1/256    1/32    1/256    1/4    1/256    1/32    1/256
1/16384  1/2048  1/16384  1/256  1/16384  1/2048  1/16384
1/2048   1/256   1/2048   1/32   1/2048   1/256   1/2048
1/16384  1/2048  1/16384  1/256  1/16384  1/2048  1/16384




“Big Picture”: Archimedes’ Quadrature (3)

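What if we leave out (adaptively) all subvolumes with volume $< \varepsilon = \frac{1}{256}$?
49 grid points (full grid) ⇒ 17 grid points (sparse grid)

[Figure: the full grid and the resulting sparse grid on the unit square]

Approximation of the volume:

$$\frac{441}{1024} = 0.4306640625 \;\Rightarrow\; \frac{27}{64} = 0.421875$$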




Hierarchical Decomposition – Step by Step

Now back (more formally), starting with $d$-dimensional hierarchical decompositions...

Transfer from $d = 1$ to $d > 1$
Functions in multiple variables $\vec{x} = (x_1, \dots, x_d)$
Domain $\Omega := [0,1]^d$
We consider only functions $u$ which are 0 on $\partial\Omega$ (on the edges of the square, the sides of the cube, ...)
Each hierarchical grid is described by a multi-index

$$\vec{l} = (l_1, \dots, l_d) \in \mathbb{N}^d$$

Grids can have different mesh-widths in different dimensions:

$$\vec{h}_{\vec{l}} := (h_{l_1}, \dots, h_{l_d}) := (2^{-l_1}, \dots, 2^{-l_d}) =: 2^{-\vec{l}}$$




Hierarchical Decomposition, d > 1

Two norms for multi-indices $\vec{l}$ (which we will need later on)

$$|\vec{l}|_1 := |l_1| + \dots + |l_d|, \qquad |\vec{l}|_\infty := \max\{|l_1|, \dots, |l_d|\}$$

(we would not need the absolute value bars for $l_k \in \mathbb{N}$ here)
Comparisons of multi-indices are component-wise:

$$\vec{l} \le \vec{i} \iff l_k \le i_k,\quad k = 1, \dots, d$$

We obtain the grid points

$$\vec{x}_{\vec{l},\vec{i}} = (i_1 \cdot h_{l_1}, \dots, i_d \cdot h_{l_d})$$



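Practicing Identifiers $\vec{l}$, $\vec{h}_{\vec{l}}$, $\vec{x}_{\vec{l},\vec{i}}$

[Figure: tableau of two-dimensional grids for $l_1 = 1, 2, 3$ (horizontal) and $l_2 = 1, 2, 3$ (vertical)]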




Piecewise d-linear Functions

Suitable generalization of piecewise linear functions

Piecewise $d$-linear functions w.r.t. the $\vec{h}_{\vec{l}}$ grid
If you fix $d - 1$ coordinates, they are piecewise linear in the remaining $x_j$
Space of all such functions for given $\vec{l}$ denoted as $V_{\vec{l}}$

Alternative point of view
Define a suitable basis $\Phi_{\vec{l}}$
Regard $V_{\vec{l}}$ as the span of $\Phi_{\vec{l}}$

$d$-dimensional basis functions: products of one-dimensional hat functions

$$\phi_{\vec{l},\vec{i}}(\vec{x}) = \prod_{j=1}^{d} \phi_{l_j,i_j}(x_j)$$
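The product structure makes evaluating a d-dimensional basis function a one-liner. A minimal sketch, with illustrative names `phi_1d` and `phi_nd`, and with the multi-indices passed as tuples:

```python
import math


def phi_1d(l, i, x):
    """1d hat function of level l centred at i * 2^-l."""
    return max(1.0 - abs(x * 2 ** l - i), 0.0)


def phi_nd(levels, indices, x):
    """d-dimensional basis function: product of 1d hat functions, one per coordinate."""
    return math.prod(phi_1d(l, i, xj) for l, i, xj in zip(levels, indices, x))


if __name__ == "__main__":
    # Peak of the basis function with l = (2, 3), i = (3, 5) at x = (3/4, 5/8).
    print(phi_nd((2, 3), (3, 5), (0.75, 0.625)))
    # Halfway towards the edge of the support in x2-direction.
    print(phi_nd((2, 3), (3, 5), (0.75, 0.5625)))
```

Along any axis-parallel line the function is piecewise linear, but along a diagonal it is not, which is why the shape is a pagoda rather than a pyramid.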




d-dimensional Basis Functions

Basis functions are pagodas (not pyramids!)
Examples: $\phi_{(1,1),(1,1)}$ and $\phi_{(2,3),(3,5)}$:

[Figure: surface plots of the two basis functions over $x_1, x_2 \in [0,1]$, values between 0 and 1]




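Function Spaces $V_{\vec{l}}$ and $V_n$

Basis for the space of piecewise $d$-linear functions w.r.t. the $\vec{h}_{\vec{l}}$ grid

$$\Phi_{\vec{l}} := \{\phi_{\vec{l},\vec{i}},\ \vec{1} \le \vec{i} < 2^{\vec{l}}\}$$

Function space

$$V_{\vec{l}} := \mathrm{span}\,\Phi_{\vec{l}}$$

with

$$\dim V_{\vec{l}} = (2^{l_1} - 1)\cdot\ldots\cdot(2^{l_d} - 1) \in O(2^{|\vec{l}|_1})$$

Special case $l_1 = \dots = l_d$:
Function space denoted as $V_n$:

$$V_n := V_{(n,\dots,n)}$$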




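Hierarchical Increments $W_{\vec{l}}$

As before in 1d:
Omit grid points with even index
Now in all directions

$$I_{\vec{l}} := \{\vec{i} : \vec{1} \le \vec{i} < 2^{\vec{l}},\ \text{all } i_j \text{ odd}\}$$

⇒ Hierarchical increments

$$W_{\vec{l}} := \mathrm{span}\{\phi_{\vec{l},\vec{i}}\}_{\vec{i}\in I_{\vec{l}}}$$

Contain all those functions of $V_{\vec{l}}$ which vanish at the grid points of coarser grids

[Figure: tableau of hierarchical increments for $l_1 = 1, 2, 3$ (horizontal) and $l_2 = 1, 2, 3$ (vertical)]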




Hierarchical Subspace Decomposition

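We obtain for $\vec{l}' \in \mathbb{N}^d$ a unique representation of each $u \in V_{\vec{l}'}$ as

$$u = \sum_{\vec{l}\le\vec{l}'} w_{\vec{l}} \quad\text{with}\quad w_{\vec{l}} \in W_{\vec{l}}$$

⇒ Representation

$$u = \sum_{\vec{l}\le\vec{l}'} w_{\vec{l}} = \sum_{\vec{l}\le\vec{l}'} \sum_{\vec{i}\in I_{\vec{l}}} v_{\vec{l},\vec{i}}\,\phi_{\vec{l},\vec{i}}$$

in the hierarchical basis with $d$-dimensional hierarchical surpluses $v_{\vec{l},\vec{i}}$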




Determining the Hierarchical Surpluses

We now compute the hierarchical surpluses $v_{\vec{l},\vec{i}}$ for some

$$V_n \ni u = \sum_{\phi_{\vec{l},\vec{i}}\in\Phi_{(n,\dots,n)}} u(x_{\vec{l},\vec{i}})\cdot\phi_{\vec{l},\vec{i}}$$

First step
Hierarchization in $x_1$-direction (fix $x_2, \dots, x_d$ and employ 1d hierarchization):

$$u = \sum_{l_1=1}^{n} \sum_{i_1\in I_{l_1}} v_{l_1,i_1}(x_2, \dots, x_d)\,\phi_{l_1,i_1}(x_1)$$

with 1d surplus

$$v_{l_1,i_1}(x_2, \dots, x_d) = u(x_{l_1,i_1}, x_2, \dots, x_d) - \frac{u(x_{l_1,i_1-1}, x_2, \dots, x_d) + u(x_{l_1,i_1+1}, x_2, \dots, x_d)}{2} = \int_0^1 \psi_{l_1,i_1}(x_1)\cdot\frac{\partial^2}{\partial x_1^2}\,u(x_1, x_2, \dots, x_d)\,dx_1$$

(For the last step, see the 1d decomposition and worksheet 5)



Determining the Hierarchical Surpluses (2)

A bit more intuitive:
We mark the grid points of the corresponding ansatz functions we use (before and after)

[Figure: two tableaus of grids for $l_1 = 1, 2, 3$ and $l_2 = 1, 2, 3$, showing the grid points in use before and after hierarchization in $x_1$-direction]




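Determining the Hierarchical Surpluses (3)

Second step
Hierarchize every $v_{l_1,i_1} : \mathbb{R}^{d-1} \to \mathbb{R}$ (separately) in its first argument:

$$u = \sum_{l_1=1}^{n} \sum_{i_1\in I_{l_1}} \sum_{l_2=1}^{n} \sum_{i_2\in I_{l_2}} v_{(l_1,l_2),(i_1,i_2)}(x_3, \dots, x_d)\,\phi_{l_1,i_1}(x_1)\,\phi_{l_2,i_2}(x_2)$$

[Figure: two tableaus of grids for $l_1 = 1, 2, 3$ and $l_2 = 1, 2, 3$, showing the grid points in use before and after hierarchization in $x_2$-direction]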




Determining the Hierarchical Surpluses (4)

Steps 3 to $d$
All steps correspondingly
Afterwards we have computed the surpluses $v_{\vec{l},\vec{i}}$ (functions in zero parameters, i.e. scalar values)
Representation

$$u = \sum_{\vec{l}} \sum_{\vec{i}\in I_{\vec{l}}} v_{\vec{l},\vec{i}}\,\phi_{l_1,i_1}(x_1)\,\phi_{l_2,i_2}(x_2)\cdot\ldots\cdot\phi_{l_d,i_d}(x_d) = \sum_{\vec{l}} \sum_{\vec{i}\in I_{\vec{l}}} v_{\vec{l},\vec{i}}\,\phi_{\vec{l},\vec{i}}(\vec{x}) = \sum_{\vec{l}} w_{\vec{l}}.$$

What if we would like to work in another subspace than $V_n$ (e.g. $V_{(1,3)}$)?
  Take a sufficiently large $n$; then $V_n$ contains the subspace
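For d = 2, the dimension-by-dimension procedure can be sketched as two sweeps of a one-dimensional hierarchization over a table of nodal values. The code below is an illustration only; `hierarchize_1d` and `hierarchize_2d` are assumed names, the grid is stored as nested lists indexed `[i1][i2]`, and boundary values are included (and left unchanged).

```python
from fractions import Fraction


def hierarchize_1d(values):
    """Nodal values on 2^n + 1 equidistant points -> 1d hierarchical surpluses."""
    size = len(values) - 1
    out = list(values)
    step = 1  # finest level first; step is the distance to the hierarchical neighbours
    while step < size:
        for k in range(step, size, 2 * step):
            out[k] = values[k] - (values[k - step] + values[k + step]) / 2
        step *= 2
    return out


def hierarchize_2d(grid):
    """grid[i1][i2] holds nodal values; returns surpluses with the same layout."""
    size = len(grid)
    # Step 1: hierarchize in x1-direction for every fixed i2.
    step1 = [hierarchize_1d([grid[i1][i2] for i1 in range(size)]) for i2 in range(size)]
    # Step 2: hierarchize the result in x2-direction for every fixed i1.
    return [hierarchize_1d([step1[i2][i1] for i2 in range(size)]) for i1 in range(size)]


if __name__ == "__main__":
    n = 3
    x = [Fraction(k, 2 ** n) for k in range(2 ** n + 1)]
    nodal = [[16 * a * (1 - a) * b * (1 - b) for b in x] for a in x]
    v = hierarchize_2d(nodal)
    print(v[4][4], v[2][4], v[1][1])
```

For general d the same one-dimensional sweep is applied once per coordinate direction; the order of the directions does not change the result.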



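Norms of $\phi_{\vec{l},\vec{i}}$

Estimating the $w_{\vec{l}}$ will enable us to select those subspaces that contribute most to the overall solution (best cost-benefit ratios)
Same procedure as for $d = 1$; only slightly more complicated functions

Start with the norms
Maximum-norm:

$$\|\phi_{\vec{l},\vec{i}}\|_\infty := \max_{\vec{x}\in[0,1]^d} |\phi_{\vec{l},\vec{i}}(\vec{x})| = 1$$

(follows from the definition)
$L_2$-norm:

$$\|\phi_{\vec{l},\vec{i}}\|_2 := \sqrt{\int_{[0,1]^d} \phi_{\vec{l},\vec{i}}(\vec{x})^2\,d\vec{x}} = \prod_{j=1}^{d} \|\phi_{l_j,i_j}\|_2 = \sqrt{\left(\frac{2}{3}\right)^d \prod_{j=1}^{d} h_j} = \sqrt{\left(\frac{2}{3}\right)^d 2^{-|\vec{l}|_1}}$$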




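Norms of $\phi_{\vec{l},\vec{i}}$ (2)

Energy-norm (defined via the integral of the squared Euclidean norm of the gradient $\nabla\phi_{\vec{l},\vec{i}}$):

$$\|\phi_{\vec{l},\vec{i}}\|_E := \sqrt{\int_{[0,1]^d} \nabla\phi_{\vec{l},\vec{i}}(\vec{x})\cdot\nabla\phi_{\vec{l},\vec{i}}(\vec{x})\,d\vec{x}} = \dots = \sqrt{2\left(\frac{2}{3}\right)^{d-1} \sum_{j=1}^{d} \frac{h_1\cdot\ldots\cdot h_d}{h_j^2}} = \sqrt{2\left(\frac{2}{3}\right)^{d-1} 2^{-|\vec{l}|_1} \sum_{j=1}^{d} 2^{2l_j}}$$

As usual in multi-dimensional settings, we look at $d = 2$, obtaining

$$\|\phi_{\vec{l},\vec{i}}\|_E = \sqrt{\frac{4}{3}\left(\frac{h_1}{h_2} + \frac{h_2}{h_1}\right)}$$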



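Estimation of Surpluses

Hierarchical surpluses now depend on mixed 2nd derivatives

$$\partial^{2d}u := \frac{\partial^{2d}u}{\partial x_1^2\cdot\ldots\cdot\partial x_d^2}$$

If we define

$$\psi_{\vec{l},\vec{i}} := \prod_{j=1}^{d} \psi_{l_j,i_j} = \left(\prod_{j=1}^{d} -\frac{h_j}{2}\right)\phi_{\vec{l},\vec{i}} = (-1)^d\,2^{-|\vec{l}|_1 - d}\,\phi_{\vec{l},\vec{i}}$$

we obtain the integral representation

$$v_{\vec{l},\vec{i}} = \int_{[0,1]^d} \psi_{\vec{l},\vec{i}}\cdot\partial^{2d}u\,d\vec{x}$$

(Proof: Fubini's theorem and the 1d integral representation)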




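Estimation of Surpluses (2)

We define (corresponding to 1d)

$$\mu_2(u) := \|\partial^{2d}u\|_2 \quad\text{and}\quad \mu_\infty(u) := \|\partial^{2d}u\|_\infty$$

We can thus bound $v_{\vec{l},\vec{i}}$ as

$$|v_{\vec{l},\vec{i}}| \le \left(\prod_{j=1}^{d}\frac{h_j}{2}\right)\cdot\left(\int_{[0,1]^d} \phi_{\vec{l},\vec{i}}\,d\vec{x}\right)\cdot\mu_\infty(u) = \left(\prod_{j=1}^{d}\frac{h_j^2}{2}\right)\cdot\mu_\infty(u) = 2^{-2|\vec{l}|_1 - d}\,\mu_\infty(u)$$

and

$$|v_{\vec{l},\vec{i}}| \le \left(\prod_{j=1}^{d}\frac{h_j}{2}\right)\|\phi_{\vec{l},\vec{i}}\|_2\cdot\mu_2(u|_{T_{\vec{i}}}) = \sqrt{\frac{h_1^3\cdot\ldots\cdot h_d^3}{6^d}}\cdot\mu_2(u|_{T_{\vec{i}}}) = \left(\frac{1}{6}\right)^{d/2} 2^{-3|\vec{l}|_1/2}\,\mu_2(u|_{T_{\vec{i}}}).$$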




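Estimation of $w_{\vec{l}}$

Obtain estimates for $w_{\vec{l}}$ in subspace $W_{\vec{l}}$ analogously to 1d:
  Make use of the fact that the supports of the basis functions of a grid are disjoint (apart from the boundaries)

Maximum-norm

$$\|w_{\vec{l}}\|_\infty \le \left(\prod_{j=1}^{d}\frac{h_j^2}{2}\right)\cdot\mu_\infty(u) = 2^{-2|\vec{l}|_1 - d}\,\mu_\infty(u),$$

$L_2$-norm

$$\|w_{\vec{l}}\|_2 \le \left(\prod_{j=1}^{d}\frac{h_j^2}{3}\right)\cdot\mu_2(u) = 3^{-d}\cdot 2^{-2|\vec{l}|_1}\,\mu_2(u),$$

Energy-norm

$$\|w_{\vec{l}}\|_E \le \sqrt{\frac{1}{4}\left(\frac{1}{12}\right)^{d-1} \sum_{j=1}^{d} \frac{h_1^4\cdot\ldots\cdot h_d^4}{h_j^2}}\cdot\mu_\infty(u) = \sqrt{\frac{1}{4}\left(\frac{1}{12}\right)^{d-1} 2^{-4|\vec{l}|_1} \sum_{j=1}^{d} 2^{2l_j}}\cdot\mu_\infty(u)$$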




Analysis of Cost-Benefit Ratio

Consider not single basis functions, but whole hierarchical increments
Select those subspaces out of the tableau of subspaces which minimize the cost, or maximize the benefit, respectively, for (sufficiently often differentiable) $u : [0,1]^d \to \mathbb{R}$

Cost
Measure the cost in the number of grid points (degrees of freedom)

$$c(\vec{l}) = |I_{\vec{l}}| = 2^{|\vec{l}|_1 - d}.$$

Benefit
How to measure the benefit?
First, let $L \subset \mathbb{N}^d$ be the set of indices of all selected grids.
We obtain

$$u_L := \sum_{\vec{l}\in L} w_{\vec{l}}, \qquad u - u_L = \sum_{\vec{l}\notin L} w_{\vec{l}}.$$




Analysis of Cost-Benefit Ratio (2)

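For each component $w_{\vec{l}}$, we have derived bounds of the type

$$\|w_{\vec{l}}\| \le s(\vec{l})\,\mu(u)$$

(with appropriate indices for norm and $\mu$)
We obtain

$$\|u - u_L\| \le \sum_{\vec{l}\notin L} \|w_{\vec{l}}\| \le \left(\sum_{\vec{l}\notin L} s(\vec{l})\right)\mu(u) = \left(\sum_{\vec{l}\in\mathbb{N}^d} s(\vec{l}) - \sum_{\vec{l}\in L} s(\vec{l})\right)\mu(u)$$

This justifies interpreting $s(\vec{l})$ as the benefit/contribution of subspace $W_{\vec{l}}$.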




Quality of Approximation of Full Grid Vn

Examine $c(\vec{l})$ and $s(\vec{l})$ for the full grid
Regular grid with mesh-width $2^{-n}$ in each direction (full grid) for function space $V_n$

Bounds in $L_2$- and maximum-norm are of order

$$s(\vec{l}) = 2^{-2|\vec{l}|_1}$$

The remaining, $\vec{l}$-independent factors are left out and can be appended again after the estimation

Subset of hierarchical increments under consideration

$$L_n := \{\vec{l} : |\vec{l}|_\infty \le n\}.$$




Quality of Approximation of Full Grid Vn (2)

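We can estimate

$$\sum_{\vec{l}\in L_n} s(\vec{l}) = \sum_{\vec{l}\in L_n} 2^{-2|\vec{l}|_1} = \left(\sum_{k=1}^{n} 2^{-2k}\right)^d = \left(\frac{1}{4}\cdot\frac{1 - \frac{1}{4^n}}{1 - \frac{1}{4}}\right)^d = \left(\frac{1}{3}\right)^d \left(1 - 2^{-2n}\right)^d \ge \left(\frac{1}{3}\right)^d \left(1 - d\cdot 2^{-2n}\right)$$

(with $(1-\varepsilon)^d \ge 1 - d\varepsilon$ for $0 \le \varepsilon \le 1$ and $d \in \mathbb{N}$)
⇒ For $n \to \infty$ we obtain

$$\sum_{\vec{l}\in\mathbb{N}^d} s(\vec{l}) = \left(\frac{1}{3}\right)^d.$$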




Quality of Approximation of Full Grid Vn (3)

Leads to bounds for the approximation error in $L_2$- and maximum-norm

$$\|u - u_{L_n}\| \le C\cdot\sum_{\vec{l}\notin L_n} s(\vec{l}) \le C\cdot\frac{d}{3^d}\,2^{-2n} \in O(h_n^2)$$

with constant $C$ (independent of $n$)
Correspondingly in energy-norm

$$\|u - u_{L_n}\|_E \in O(h_n)$$




Sparse Grids

Final steps to high-dimensional numerics
Consider the sum of benefits/contributions (L2- and maximum-norm)

$$\sum_{\vec l \in L_n} 2^{-2|\vec l|_1}$$

⇒ Same benefit of hierarchical increments $W_{\vec l}$ for constant $|\vec l|_1$
For the cost $c(\vec l) = 2^{|\vec l|_1 - d}$ (number of grid points of $W_{\vec l}$) this holds, too

⇒ . . . and for the cost-benefit ratio $c(\vec l)/s(\vec l)$

Full grids?
Taking a square section of the tableau of subspaces is not economical:
We take large subgrids with low contribution
We could have taken others with much higher contribution




Sparse Grids!

Best choice: Cut diagonally in the tableau of subspaces:

$$L_n^1 := \{\vec l : |\vec l|_1 \le n + d - 1\}$$

⇒ Resulting sparse grid space

$$V_n^1 := \bigoplus_{|\vec l|_1 \le n+d-1} W_{\vec l}$$

Sparse grid for $d = 2$ and overall level $n = 5$

Grid points $x_{\vec l,\vec i}$ of the same importance in the same color




Sparse Grids – Cost

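Number of grid points?
For $d = 2$:

$$\dim V_n^1 = \sum_{|\vec l|_1 \le n+1} \dim W_{\vec l} = \sum_{|\vec l|_1 \le n+1} 2^{|\vec l|_1 - 2} = \sum_{k=1}^{n} k \cdot 2^{k-1} = 2^n (n-1) + 1,$$

For $d = 3$:

$$\dim V_n^1 = \sum_{k=1}^{n} \frac{k(k+1)}{2} \cdot 2^{k-1} = 2^n \Big(\frac{n^2}{2} - \frac{n}{2} + 1\Big) - 1,$$

⇒ Both in $O(2^n \cdot n^{d-1})$

Holds for general $d$ as well (proof with some combinatorics)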




Sparse Grids – Cost (2)

In numbers. . .
Compare cost for full grid $V_n$ and sparse grid $V_n^1$:

$d = 2$:

n                                 1    2    3     4     5   . . .   10
dim V_n   = (2^n − 1)^2           1    9   49   225   961   . . .   1,046,529
dim V_n^1 = 2^n (n − 1) + 1       1    5   17    49   129   . . .   9,217

Even more distinct for $d = 3$:

n                                         1    2     3       4   . . .   10
dim V_n   = (2^n − 1)^3                   1   27   343   3,375   . . .   1,070,590,167
dim V_n^1 = 2^n (n^2/2 − n/2 + 1) − 1     1    7    31     111   . . .   47,103




Sparse Grids – Cost (3)

. . . and for overall level n = 5 in different dimensions

 d    V_5                        V_5^1
 1    31                         31
 2    961                        129
 3    29,791                     351
 4    923,521                    769
 5    28,629,151                 1,471
 6    887,503,681                2,561
 7    27,512,614,111             4,159
 8    852,891,037,441            6,401
 9    26,439,622,160,671         9,439
10    819,628,286,980,801        13,441

The higher the dimension, the more sparse grids pay off!
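The numbers in these tables can be reproduced with a few lines. The following is a minimal sketch, assuming grids without boundary points as above; the function names are chosen here for illustration only. It uses the fact that there are $\binom{m-1}{d-1}$ level vectors $\vec l$ (all $l_j \ge 1$) with $|\vec l|_1 = m$, each contributing $2^{m-d}$ grid points.

```python
from math import comb

def full_grid_points(n: int, d: int) -> int:
    """Number of inner points of the full grid V_n: (2^n - 1)^d."""
    return (2**n - 1) ** d

def sparse_grid_points(n: int, d: int) -> int:
    """Number of points of the sparse grid V_n^1 (no boundary points).

    Sums 2^(|l|_1 - d) over all level vectors with d <= |l|_1 <= n + d - 1;
    comb(m - 1, d - 1) counts the level vectors with |l|_1 = m.
    """
    return sum(comb(m - 1, d - 1) * 2 ** (m - d) for m in range(d, n + d))

# Overall level n = 5 in different dimensions
for d in range(1, 11):
    print(d, full_grid_points(5, d), sparse_grid_points(5, d))
```

The combinatorial count avoids enumerating all level vectors, so the loop stays cheap even for larger $d$.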




Sparse Grids – Examples

Sparse Grids of overall level n = 6 in d = 2 and d = 3




Sparse Grids – Accuracy

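Far fewer grid points ⇒ much lower accuracy?
That would force us to choose a larger $n$ to obtain similar accuracy, spoiling everything

Error in L2- and maximum-norm:
Compute the sum for $d = 2$ (with $|\vec l|_1 = k + 1$):

$$\sum_{\vec l \notin L_n^1} s(\vec l) = \sum_{k=n+1}^{\infty} k \cdot 2^{-2(k+1)} = \Big(\frac{n}{12} + \frac19\Big)\, 2^{-2n}$$

And for $d = 3$ (with $|\vec l|_1 = k + 2$):

$$\sum_{\vec l \notin L_n^1} s(\vec l) = \sum_{k=n+1}^{\infty} \frac{k(k+1)}{2} \cdot 2^{-2(k+2)} = \Big(\frac{n^2}{96} + \frac{11n}{288} + \frac1{27}\Big)\, 2^{-2n}$$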
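The two closed forms can be checked numerically. The sketch below sums $s(\vec l) = 2^{-2|\vec l|_1}$ over all level vectors with $|\vec l|_1 > n + d - 1$, truncated after a fixed number of terms (an assumption of this illustration; the neglected tail is far below floating-point precision), and compares the result with the formulas for $d = 2$ and $d = 3$. The function names are hypothetical.

```python
from math import comb

def tail_sum(n: int, d: int, terms: int = 200) -> float:
    """Truncated sum of 2^(-2|l|_1) over all l with |l|_1 > n + d - 1.

    comb(m - 1, d - 1) is the number of level vectors with |l|_1 = m.
    """
    return sum(comb(m - 1, d - 1) * 4.0 ** (-m)
               for m in range(n + d, n + d + terms))

def closed_form(n: int, d: int) -> float:
    """Closed forms for the tail sum in d = 2 and d = 3."""
    if d == 2:
        return (n / 12 + 1 / 9) * 4.0 ** (-n)
    if d == 3:
        return (n * n / 96 + 11 * n / 288 + 1 / 27) * 4.0 ** (-n)
    raise ValueError("closed form only available for d = 2 and d = 3")

for d in (2, 3):
    for n in (1, 3, 5):
        print(d, n, tail_sum(n, d), closed_form(n, d))
```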




Sparse Grids – Accuracy (2)

In general, it can be shown:
Error of interpolation in L2- and maximum-norm is $O(2^{-2n} n^{d-1})$

Only a factor polynomial in $n$ worse than the full grid with $O(2^{-2n})$

Analysis is more complicated for the energy-norm (the lines through subspaces with similar $s(\vec l)$, and thus similar $c(\vec l)/s(\vec l)$, are more complicated)

Result even better:
With only $O(2^n)$ grid points we obtain an accuracy of $O(2^{-n})$ – no polynomial terms left!




Part VI

Finite Elements: An Introduction to the Most Common Prejudices




Solving Differential Equations

Solution of differential equations (DEs) as another application for sparse grids (apart from integration)
Algorithmically much more interesting than quadrature
First, we have to introduce the method of finite elements (FE) to (discretize and) numerically solve DEs

There, we can directly plug in our hierarchical basis
As an example, we consider a simple linear ordinary differential equation (ODE):

$$u(x) - u''(x) = f(x) \quad \text{for } x \in (0,1); \qquad u(0) = u(1) = 0$$




Finite-Dimensional Function Space

To represent a function in a computer, only a finite number of coefficients is possible

⇒ Choose function space $V_h$ with finite dimension $N$
Think of $V_n$:
Continuous, piecewise linear functions $u$ w.r.t. a grid with mesh-width $h = 2^{-n}$
$u(0) = u(1) = 0$, $N = 1/h - 1 = 2^n - 1$
Define basis $\{\varphi_j\}_{1 \le j \le N}$ (think of hat functions $\varphi_j := \varphi_{n,j}$)

Task: determine $N$ coefficients $u_j$ in

$$u_h = \sum_{j=1}^{N} u_j \varphi_j$$

such that $u_h$ approximates the exact solution $u$ well




Conditions for uh

Derive $N$ conditions for $u_h$ from the ODE ⇒ determine $N$ coefficients
Straightforward approach:

Demand that the ODE is fulfilled at the grid points $x_i$

$$u_h(x_i) - u_h''(x_i) = f(x_i), \quad 1 \le i \le N$$

Fails – $u_h''$ does not make sense for functions in $V_h$ with bends at the grid points




Conditions for uh (2)

More reasonable conditions:

Multiply the ODE with $\varphi_i$ (so-called test functions)
Demand that the ODE is fulfilled in the integral sense over $[0,1]$:

$$\int_0^1 [u_h(x) - u_h''(x)]\,\varphi_i(x)\,dx = \int_0^1 f(x)\,\varphi_i(x)\,dx$$

Replace the critical term $u_h''$ by integration by parts:

$$\int_0^1 -u_h''(x)\,\varphi_i(x)\,dx \;\longrightarrow\; \int_0^1 u_h'(x)\,\varphi_i'(x)\,dx$$

For sufficiently smooth $u$ this is the same ($\varphi_i(0) = \varphi_i(1) = 0$, as $\varphi_i \in V_h$)
We just take the form on the right, without further considerations

We obtain $N$ equations

$$\int_0^1 u_h(x)\,\varphi_i(x)\,dx + \int_0^1 u_h'(x)\,\varphi_i'(x)\,dx = \int_0^1 f(x)\,\varphi_i(x)\,dx$$




Conditions for uh (3)

Note: if $u_h$ fulfills these $N$ conditions, then for arbitrary $v_h \in V_h$:

$$\int_0^1 u_h(x)\,v_h(x)\,dx + \int_0^1 u_h'(x)\,v_h'(x)\,dx = \int_0^1 f(x)\,v_h(x)\,dx$$

With $v_h = \sum_{j=1}^{N} v_j \varphi_j$, this equation is a linear combination of the equations with test functions $\varphi_i$

⇒ No matter which basis of $V_h$ is used for the test functions: the equations for $u_h$ are equivalent.
The solution $u_h$ depends only on the ansatz space, not on the basis used
We'll always use the same basis for the test functions as for $u_h$




Determining the Coefficients

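Obtain a system of linear equations for the coefficients $u_j$ by substituting $u_h(x) = \sum_{j=1}^{N} u_j \varphi_j(x)$ in each equation

$$\int_0^1 \underbrace{\sum_{j=1}^{N} u_j \varphi_j(x)}_{u_h(x)}\,\varphi_i(x)\,dx + \int_0^1 \underbrace{\sum_{j=1}^{N} u_j \varphi_j'(x)}_{u_h'(x)}\,\varphi_i'(x)\,dx = \int_0^1 f(x)\,\varphi_i(x)\,dx$$

Looks bad, but is good – a linear equation in the $u_j$:

$$\sum_{j=1}^{N} \Big( \underbrace{\int_0^1 \varphi_j(x)\,\varphi_i(x)\,dx}_{=:\,b_{i,j}} + \underbrace{\int_0^1 \varphi_j'(x)\,\varphi_i'(x)\,dx}_{=:\,a_{i,j}} \Big)\, u_j = \underbrace{\int_0^1 f(x)\,\varphi_i(x)\,dx}_{=:\,f_i}$$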




Determining the Coefficients (2)

Integral-free slide!
We obtained an $N \times N$ system of linear equations
Assemble the coefficients in two $N \times N$ matrices

$$A := (a_{i,j})_{1 \le i,j \le N}, \qquad B := (b_{i,j})_{1 \le i,j \le N}$$

and a vector of length $N$

$$\vec f := (f_i)_{1 \le i \le N}$$

⇒ System of linear equations

$$(B + A)\,\vec u = \vec f$$

The solution $\vec u$ contains the coefficients of $u_h$ in our basis




Determining the Coefficients – Side Note

The mathematical background of this technique is only of minor interest for us (we're just users!)

Does the linear system have a unique solution? (in our example: yes)
Is $u_h$ a reasonable approximation of the exact solution? (yes; one can even show that it's the best possible approximation in $V_h$, measured in a suitable norm)
And much more that we're not interested in. . .




Finite Elements in a Nutshell

Steps to solve the DE using FE
Transform the equation to its integral representation (“weak form”)
Choose ansatz space $V_h$ (typically: choose grid, select ansatz functions)
Now $u_h$ is determined; we only have to compute the coefficients
Choose basis $\{\varphi_i\}_{1 \le i \le N}$
Assemble matrix (here $B + A$) and right-hand side $\vec f$
Solve the system of linear equations
Construct the function $u_h$ using $\vec u$, and plot a colorful picture




Example: ODE

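Previous example
$u(x) - u''(x) = f(x)$ for $x \in (0,1)$; $u(0) = u(1) = 0$
$V_h$: continuous, piecewise linear functions defined on a grid with mesh-width $h$ and with $u(0) = u(1) = 0$
Nodal point basis: $\varphi_{n,i}$, $1 \le i \le 2^n - 1$ ($h = 2^{-n}$)

$$b_{i,j} := \int_0^1 \varphi_j(x)\,\varphi_i(x)\,dx = \begin{cases} \frac23 h & \text{if } i = j \\ \frac16 h & \text{if } |i - j| = 1 \\ 0 & \text{else} \end{cases}$$

$$a_{i,j} := \int_0^1 \varphi_j'(x)\,\varphi_i'(x)\,dx = \begin{cases} \frac2h & \text{if } i = j \\ -\frac1h & \text{if } |i - j| = 1 \\ 0 & \text{else} \end{cases}$$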




Stencil

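More intuitive: write as stencil
Notate the coefficients of one equation, ordered corresponding to the grid points:

$$B: \; \Big[\tfrac16 h \quad \tfrac23 h \quad \tfrac16 h\Big] \quad\text{or}\quad \frac{h}{6}\,[1 \quad 4 \quad 1]$$

and

$$A: \; \Big[-\tfrac1h \quad \tfrac2h \quad -\tfrac1h\Big] \quad\text{or}\quad \frac1h\,[-1 \quad 2 \quad -1]$$

Make sure you know what the matrices look like!
Order the grid points in their natural order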
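The finite element steps for the example ODE can be sketched end to end. The code below is an illustration under stated assumptions, not the lecture's implementation: the right-hand side is fixed to $f \equiv 1$ (so that $f_i = \int_0^1 \varphi_i\,dx = h$ holds exactly and the exact solution $u(x) = 1 - \cosh(x - 1/2)/\cosh(1/2)$ is available for comparison), and the tridiagonal system $(B + A)\vec u = \vec f$ is solved with the Thomas algorithm. The names `solve_fe_1d` and `exact` are hypothetical.

```python
import math

def solve_fe_1d(n):
    """Solve u - u'' = 1, u(0) = u(1) = 0 with linear finite elements, h = 2^-n."""
    N = 2**n - 1
    h = 1.0 / (N + 1)
    diag = 2.0 * h / 3.0 + 2.0 / h   # b_ii + a_ii
    off = h / 6.0 - 1.0 / h          # b_ij + a_ij for |i - j| = 1
    rhs = [h] * N                    # f_i = integral of 1 * phi_i = h

    # Thomas algorithm: forward elimination ...
    c = [0.0] * N
    g = [0.0] * N
    c[0] = off / diag
    g[0] = rhs[0] / diag
    for i in range(1, N):
        denom = diag - off * c[i - 1]
        c[i] = off / denom
        g[i] = (rhs[i] - off * g[i - 1]) / denom
    # ... and back substitution
    u = [0.0] * N
    u[-1] = g[-1]
    for i in range(N - 2, -1, -1):
        u[i] = g[i] - c[i] * u[i + 1]
    return h, u

def exact(x):
    """Exact solution of u - u'' = 1 with zero boundary values."""
    return 1.0 - math.cosh(x - 0.5) / math.cosh(0.5)

h, u = solve_fe_1d(5)
max_error = max(abs(uj - exact((j + 1) * h)) for j, uj in enumerate(u))
print(max_error)
```

For a general $f$, the entries $f_i$ would have to be computed by quadrature; that step is left out here on purpose.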




Partial Differential Equations

Now: transition to partial differential equations (PDEs, more than one variable)

Notation a bit more complicated, but nothing substantially new for the (elliptic) PDEs under consideration

Domain $\Omega := [0,1]^d$

Again, we consider only functions which are 0 on $\partial\Omega$

Our model problem transferred to $d$ dimensions contains the Laplace operator

$$\Delta u := \frac{\partial^2 u}{\partial x_1^2} + \frac{\partial^2 u}{\partial x_2^2} + \ldots + \frac{\partial^2 u}{\partial x_d^2},$$

Can be “partially integrated” as well (Green's first identity, $\nabla$: gradient):

$$-\int_\Omega \Delta u(\vec x) \cdot \varphi(\vec x)\,d\vec x = \int_\Omega \nabla u(\vec x) \cdot \nabla \varphi(\vec x)\,d\vec x.$$




Model Problem

Back to our previous example, but $d$-dimensional
We now dare to solve the PDE

$$u(\vec x) - \Delta u(\vec x) = f(\vec x).$$

With a grid with mesh-width $h = 2^{-n}$: function space $V_n$ with nodal point basis $\Psi_{\vec n}$

To assemble the matrices: compute $d$-dimensional integrals for all pairs of basis functions $(\varphi_i, \varphi_j)$:

$$b_{i,j} = \int_\Omega \varphi_i(\vec x)\,\varphi_j(\vec x)\,d\vec x, \qquad a_{i,j} = \int_\Omega \nabla\varphi_i(\vec x) \cdot \nabla\varphi_j(\vec x)\,d\vec x$$

Nice property: in each row of the matrix, at most $3^d$ coefficients are $\ne 0$

Corresponds to the grid point and all its neighbors




Stencil (d = 2)

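For $d = 2$, they can still be written as stencils

$$B: \; \frac{h^2}{36} \begin{bmatrix} 1 & 4 & 1 \\ 4 & 16 & 4 \\ 1 & 4 & 1 \end{bmatrix} \quad\text{and}\quad A: \; \frac13 \begin{bmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{bmatrix}$$

More important than the calculations leading to those entries:

What do the matrices $A$ and $B$ look like?
Best to order the grid points lexicographically (e.g. row-wise)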




Part VII

Algorithms and Data Structures for Sparse Grids




Algorithms and Data Structures

We will now look at typical sparse grid algorithms
They can, e.g., be used for the solution of the PDE in the previous part
Algorithms depend on the data structure:

Efficient traversal of the sparse grid is necessary
Thus, we deal with data structures for sparse grids, too




Data Structures (d = 1)

How to store a function $u : [0,1] \to \mathbb{R}$ in hierarchical representation (i.e. the surpluses $v_{\vec l,\vec i}$)?

Order and store grid points and associated values in a binary tree
Root is node $x_{1,1} = 1/2$
Children of node $x_{l,i}$ are – if they exist – the grid points $x_{l+1,2i-1}$ and $x_{l+1,2i+1}$ of level $l + 1$
Alternative point of view if a child does not exist:
Complete subtree of the binary tree starting from that child, with all surpluses set to 0




Data Structures (d = 1) (2)

                 x1,1
         x2,1            x2,3
     x3,1    x3,3    x3,5    x3,7




Typical Algorithms (d = 1)

Hierarchization and Dehierarchization
Prototype for a typical algorithm (cf. worksheet 5)

Our data structure has to allow
1. Iteration over all grid points, considering the hierarchical relations
   E.g. for hierarchization: first handle all grid points in the support of $\varphi_{l,i}$, then compute $v_{l,i}$
2. Access to hierarchical neighbors: grid points at the interval boundaries of the support of $\varphi_{l,i}$ (if possible – exception for the points 0 and 1, as they are not in the tree), e.g. to compute

$$v_{l,i} = u_{l,i} - \frac12 (u_l + u_r).$$




Typical Algorithms (d = 1) (2)

Hierarchical neighbors are easy to find geometrically

$$x_{l,i-1}, \quad x_{l,i+1}$$

But they have even indices ⇒ they really belong to another level ($< l$)
In the binary tree structure:

They can be found on the way from the root to the node
One of them is the parent node

For hierarchization/dehierarchization: pass hierarchical neighbors as additional parameters

Developing algorithms:
Try to store all information needed to process one node at the node and its hierarchical neighbors
Access to other nodes is typically expensive
Tree traversal with “supply of hierarchical neighbors” is only linear in the number of nodes
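Passing the hierarchical neighbors down the tree as parameters can be sketched as follows. This is a minimal illustration, assuming a complete tree up to level $n$, zero boundary values, and a dictionary keyed by $(l, i)$ in place of an explicit tree; the function names are hypothetical and not taken from the worksheets.

```python
def hierarchize(u, n):
    """Turn nodal values u[(l, i)] at x = i * 2^-l into hierarchical surpluses."""
    v = {}

    def visit(l, i, left, right):
        # left/right: nodal values at the two hierarchical neighbors
        if l > n:
            return
        v[(l, i)] = u[(l, i)] - 0.5 * (left + right)
        visit(l + 1, 2 * i - 1, left, u[(l, i)])
        visit(l + 1, 2 * i + 1, u[(l, i)], right)

    visit(1, 1, 0.0, 0.0)
    return v

def dehierarchize(v, n):
    """Inverse operation: reconstruct nodal values from the surpluses."""
    u = {}

    def visit(l, i, left, right):
        if l > n:
            return
        u[(l, i)] = v[(l, i)] + 0.5 * (left + right)
        visit(l + 1, 2 * i - 1, left, u[(l, i)])
        visit(l + 1, 2 * i + 1, u[(l, i)], right)

    visit(1, 1, 0.0, 0.0)
    return u

# Example: the parabola f(x) = 4 x (1 - x) on the grid of level n = 3
n = 3
nodal = {(l, i): 4.0 * (i / 2**l) * (1.0 - i / 2**l)
         for l in range(1, n + 1) for i in range(1, 2**l, 2)}
surplus = hierarchize(nodal, n)
print(surplus)
```

Each node is visited exactly once and only touches its own value and the two values handed down, which is the linear-time traversal described above.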




Data Structures and Typical Algorithms (d > 1)

What data structure to use in more than one dimension?
Algorithmically: use the construction of basis functions as products of one-dimensional hats. Ideally:

Use a loop $1, \ldots, d$ over the dimensions
Apply the 1d algorithm on the one-dimensional structures in each dimension (see also worksheet 7)

⇒ Need access to hierarchical neighbors in each spatial direction; this implies creating a binary tree structure in each dimension

Disadvantages:
Storage requirements ($2d$ pointers)
High effort to keep the structure consistent when inserting or deleting points




Data Structures and Typical Algorithms (d > 1) (2)

If you could recognize anything, it would be binary tree structures for rows (black) and columns (magenta)





Often better:
Store in a node only two pointers for one direction (e.g. $x_1$)
A binary tree of nodes is a row (a 1d structure parallel to the $x_1$ axis)
For the next spatial direction $x_2$, only one binary tree in $x_2$ direction is required
It stores one plane parallel to the $x_1$–$x_2$ coordinate plane; its nodes are the binary trees of the 1d structures
For each additional spatial direction $x_d$, build a binary tree with $(d-1)$-dimensional structures as nodes

Disadvantage: Access to hierarchical neighbors is not that easy any more (except for the $x_1$-direction)
But it can be achieved without much more computational effort by suitable reordering of loops and tree traversals





Already clearer: one plane (two-dimensional structure) consists of one binary tree (magenta) whose nodes are binary trees (black), one for each row





Hash table
Much more comfortable (and not that inefficient) alternative

Store the quantities as values, with, e.g., $(\vec l, \vec i)$ as keys
No need to care about tree structures
Only have to compute the indices of the designated node (hierarchical neighbor, . . . )

⇒ Best solution for your own sparse grid experiments

Further assumptions on data structures
Algorithms will assume that all hierarchical neighbors exist for each grid point

⇒ When creating grid points adaptively, create the hierarchical neighbors if necessary
No further assumptions
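A hash-table storage can be sketched with a plain Python dictionary. The following is an illustration under the assumptions of a regular sparse grid without boundary points and keys of the form $(\vec l, \vec i)$ stored as tuples; `sparse_grid_keys` and `hierarchical_neighbors_1d` are hypothetical names. The hierarchical neighbors are found purely by index arithmetic: the geometric neighbor has an even index, which is halved until it is odd.

```python
from itertools import product

def sparse_grid_keys(n, d):
    """All keys (l, i) of the regular sparse grid V_n^1 in d dimensions."""
    keys = []
    for l in product(range(1, n + 1), repeat=d):
        if sum(l) <= n + d - 1:
            odd_indices = [range(1, 2**lj, 2) for lj in l]
            keys.extend((l, i) for i in product(*odd_indices))
    return keys

def hierarchical_neighbors_1d(l, i):
    """Left and right hierarchical neighbor of x_{l,i} as (level, index).

    None stands for a boundary point (0 or 1), which is not stored.
    """
    def normalize(level, index):
        if index == 0 or index == 2**level:
            return None
        while index % 2 == 0:          # even index: point lives on a coarser level
            level, index = level - 1, index // 2
        return (level, index)

    return normalize(l, i - 1), normalize(l, i + 1)

# Store one surplus per grid point, keyed by (level vector, index vector)
grid = {key: 0.0 for key in sparse_grid_keys(5, 2)}
print(len(grid))
print(hierarchical_neighbors_1d(3, 3))
```

For a regular sparse grid the neighbor keys computed this way are always present, since moving to a hierarchical neighbor only lowers one component of $\vec l$.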




Solving Differential Equations on Sparse Grids

Preliminary Considerations
Finite elements: method to transform a DE into a system of linear equations

$$M \vec v = \vec f$$

for the hierarchical surpluses (collected in the vector $\vec v$)
The linear system has to be solved

Problem: the matrix in the hierarchical basis is not sparse, but densely populated
So many non-zero entries that explicit assembly can be too expensive
Even worse: this prohibits a direct solution, e.g. via Cholesky decomposition




Preliminary Considerations (2)

Less problematic than it seems:
The matrix can be applied to a vector in time linear in the length of the vector (algorithmically tricky!)
Use iterative solvers:

Only have to implement the application of the matrix to a given vector

$$\vec v \mapsto M \vec v$$

(algorithmically challenging and interesting)
The algorithmically less interesting part is done by someone else (conjugate gradients (CG), . . . )

We consider it using the example of matrix $B$ (L2 scalar product)
Matrix $A$ for the energy scalar product is less clear, but can be handled similarly




Matrix-Vector multiplication

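Computing $M\vec v$, with $M = B$, $b_{i,j} := \int_0^1 \varphi_j(x)\,\varphi_i(x)\,dx$
Given: hierarchical coefficients $v_{\vec l,\vec i}$
Compute in each node $\vec l, \vec i$ the corresponding component of $M\vec v$:

$$\int_\Omega \varphi_{\vec l,\vec i} \sum_{\vec l',\vec j} v_{\vec l',\vec j}\,\varphi_{\vec l',\vec j}\,d\vec x = \sum_{\vec l',\vec j} \Big(\int_\Omega \varphi_{\vec l,\vec i}\,\varphi_{\vec l',\vec j}\,d\vec x\Big)\, v_{\vec l',\vec j} = \sum_{\vec l',\vec j} \big(\varphi_{\vec l,\vec i}, \varphi_{\vec l',\vec j}\big)_2\, v_{\vec l',\vec j}$$

Think of transport of contributions:
Transport the surplus at $\vec l', \vec j$ with weight $(\varphi_{\vec l,\vec i}, \varphi_{\vec l',\vec j})_2$ to position $\vec l, \vec i$
Everything that arrives at a node is summed up

This is possible with effort proportional to the number of unknowns (squared would be too easy)!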




Multiplication with B, d = 1

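Order the 1d unknowns by level $l$ (and within a level by index $i$, e.g.) – important for the mathematics, not for the implementation
Example for $n = 3$:

$$B\vec v = \begin{pmatrix}
1/3 & 1/8 & 1/8 & 1/32 & 3/32 & 3/32 & 1/32 \\
1/8 & 1/6 & 0 & 1/16 & 1/16 & 0 & 0 \\
1/8 & 0 & 1/6 & 0 & 0 & 1/16 & 1/16 \\
1/32 & 1/16 & 0 & 1/12 & 0 & 0 & 0 \\
3/32 & 1/16 & 0 & 0 & 1/12 & 0 & 0 \\
3/32 & 0 & 1/16 & 0 & 0 & 1/12 & 0 \\
1/32 & 0 & 1/16 & 0 & 0 & 0 & 1/12
\end{pmatrix}
\begin{pmatrix} v_{1,1} \\ v_{2,1} \\ v_{2,3} \\ v_{3,1} \\ v_{3,3} \\ v_{3,5} \\ v_{3,7} \end{pmatrix}$$

It follows from the hierarchical structure:
Surpluses always have to be propagated up or down the tree, never sideways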
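The entries of this matrix can be recomputed exactly. The sketch below, with hypothetical function names, evaluates the L2 scalar products of the hierarchical hat functions with rational arithmetic. It relies on the observation that on every cell of the level-$n$ grid both hats are linear, so their product is quadratic and Simpson's rule integrates it exactly.

```python
from fractions import Fraction

def hat(l, i, x):
    """Hat function phi_{l,i} with peak at x = i * 2^-l and support width 2 * 2^-l."""
    return max(Fraction(0), 1 - abs(x * 2**l - i))

def l2_product(l1, i1, l2, i2, n):
    """Exact integral of phi_{l1,i1} * phi_{l2,i2} over [0, 1] (levels <= n)."""
    h = Fraction(1, 2**n)
    total = Fraction(0)
    for cell in range(2**n):
        a, b = cell * h, (cell + 1) * h
        mid = (a + b) / 2
        f = lambda x: hat(l1, i1, x) * hat(l2, i2, x)
        total += h / 6 * (f(a) + 4 * f(mid) + f(b))   # Simpson, exact here
    return total

def hierarchical_mass_matrix(n):
    """Nodes ordered by level, then index, and the dense matrix B."""
    nodes = [(l, i) for l in range(1, n + 1) for i in range(1, 2**l, 2)]
    B = [[l2_product(l, i, k, j, n) for (k, j) in nodes] for (l, i) in nodes]
    return nodes, B

nodes, B = hierarchical_mass_matrix(3)
for row in B:
    print("  ".join(str(entry) for entry in row))
```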




Splitting of B

We now split the transports depending on their direction in the tree structure into

procedure down, which does the transport towards the leaves,
procedure up, which does the transport towards the root

Note: the splitting is not necessary for $d = 1$, but helpful to derive the operations and necessary for $d > 1$

In matrix notation this corresponds to

$$B =: B_D + B_U$$

$B_D$ corresponds to down, contains the entries of $B$ below the diagonal (strictly lower triangular matrix)
$B_U$ corresponds to up, contains the entries on and above the diagonal (upper triangular matrix)




Multiplication with B, d = 1: down

Down computes in node $l, i$

$$\int_\Omega \varphi_{l,i} \sum_{l' < l,\, j} v_{l',j}\,\varphi_{l',j}\,dx,$$

The value depends only on the linear interpolant between the hierarchical neighbors of $x_{l,i}$!

⇒ Take the procedure for dehierarchization
Modify it to compute the integral as

$$h_l\, \frac{u_l + u_r}{2}$$

before computing the function value $u_{l,i}$




Multiplication with B, d = 1: up

Up is more difficult. . .
The contribution of the other basis functions is not linear on the support

To understand the up operations, we consider the following:
We can neglect the diagonal in the following considerations (unproblematic, as no communication between different nodes)
$B$ is symmetric, thus $B_U$ (without the diagonal) and $B_D$ are transposes of each other
Multiplication with $B_D$ consisted mainly of operations

$$u_{l,i} := u_{l,i} + \frac{u_l + u_r}{2}$$

Each of them can be described by a matrix $B_D^{l,i}$ (what does it look like?)




Multiplication with B, d = 1: up (2)

Multiplication with the transpose of matrix $B_D^{l,i}$ works as follows:

$$u_r := u_r + \frac{u_{l,i}}{2}; \qquad u_l := u_l + \frac{u_{l,i}}{2}$$

Multiplication with $h_l$ corresponds to a diagonal matrix (which transposition leaves unchanged)
Now just apply those building blocks in reverse order (due to $(CD)^T = D^T C^T$) ⇒ bottom-up tree traversal

⇒ Multiplication with $B$ at constant cost per node (even though far more coefficients in $B$ are non-zero): $O(N)$
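One possible way to organize the two traversals for $d = 1$ is sketched below. It assumes a complete tree up to level $n$, surpluses stored in a dictionary keyed by $(l, i)$, and zero boundary values; the names `apply_mass_matrix`, `down` and `up` are chosen for this illustration. The down pass is a dehierarchization that also records $h_l (u_l + u_r)/2$ (the diagonal term $\frac23 h_l v_{l,i}$ is added there as well); the up pass returns from every subtree the contributions that arrive at the left and right end of its interval.

```python
def apply_mass_matrix(v, n):
    """Return B v for hierarchical surpluses v[(l, i)], levels 1..n, in O(N)."""
    result = {}

    def down(l, i, left, right):
        # left/right: values of the interpolant at the hierarchical neighbors
        if l > n:
            return
        h = 2.0 ** -l
        result[(l, i)] = h * 0.5 * (left + right) + 2.0 / 3.0 * h * v[(l, i)]
        here = v[(l, i)] + 0.5 * (left + right)
        down(l + 1, 2 * i - 1, left, here)
        down(l + 1, 2 * i + 1, here, right)

    def up(l, i):
        # returns the contributions of this subtree to its left and right boundary
        if l > n:
            return 0.0, 0.0
        h = 2.0 ** -l
        left_l, left_r = up(l + 1, 2 * i - 1)
        right_l, right_r = up(l + 1, 2 * i + 1)
        from_below = left_r + right_l        # everything arriving at x_{l,i}
        result[(l, i)] += from_below
        carry = 0.5 * (from_below + h * v[(l, i)])
        return left_l + carry, right_r + carry

    down(1, 1, 0.0, 0.0)
    up(1, 1)
    return result

n = 3
nodes = [(l, i) for l in range(1, n + 1) for i in range(1, 2**l, 2)]
v = dict(zip(nodes, [1.0, -2.0, 3.0, 0.5, 4.0, -1.0, 2.0]))
print(apply_mass_matrix(v, n))
```

Both traversals visit every node once and do a constant amount of work there, so the cost is linear in the number of unknowns although $B$ itself is not sparse.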




Multiplication with B, d > 1

In the multi-dimensional case, we can write the weights as products of 1d weights:

$$\big(\varphi_{\vec l,\vec i}, \varphi_{\vec l',\vec j}\big)_2 = \prod_{k=1}^{d} \big(\varphi_{l_k,i_k}, \varphi_{l'_k,j_k}\big)_2$$

This implies a general strategy for $d$-dimensional problems:
Loop over the dimensions
Loop over all 1d structures in the corresponding direction
Apply the 1d algorithm (up and down) there




Multiplication with B, d > 1 (2)

Example: transport of contributions

Transport from grid point $\vec l', \vec j = (4,1), (7,1)$ to grid point $\vec l, \vec i = (1,3), (1,3)$
Up along the row (black arrows) computes the weight $(\varphi_{1,1}, \varphi_{4,7})$, down along the column (magenta) computes the weight $(\varphi_{3,3}, \varphi_{1,1})$




Multiplication with B, d > 1 (3)

Troubles with that. . .
Nice algorithm, but it does not work on sparse grids
Consider the reverse direction – it would look like this:

Three grid points are missing!
Creating all missing grid points on the fly ⇒ full grid
It works if we reorder the up and down processes

Execute all ups before any down




All together. . .

We have considered suitable data structures. . .
. . . and efficient algorithms working on them
We could now start

solving PDEs (iteratively)
integrating (and interpolating) multi-dimensional functions
and much more. . .




Part VIII

More on Sparse Grids: Numerical Classification in Data Mining




Classification in Data Mining

Now for something completely different?
We consider one more application: classification in data mining
The aim is the extraction of new and (hopefully) useful information out of databases

problem identification → data acquisition, processing → data mining → evaluation, interpretation of results

We consider predictive modelling in data mining:
Forecast values on new, previously unseen data
Prediction based on a given set of data points (training data)




Binary Classification

Classification problem
Classification aims to

assign a “correct” class label $k \in K$
to all data points $\vec x$ in some $d$-dimensional feature space $\Omega$
based on a set $S$ of pre-classified data points for training

$$S := \{(\vec x_i, y_i) \in \Omega \times K\}_{i=1}^{m}$$

Here: binary classification, for us $K := \{+1, -1\}$
Tasks:

Is a person male or female (dimensions: shoe size and body height)?
Is a bank customer credit-worthy (dimensions: income, type of house, . . . )?
Will a direct mailing pay off (dimensions: interests, . . . )?
. . .




Classification

Classical approaches
Decision trees
Rule-based classifiers (decision rules)
Instance-based classifiers (k-NN, . . . )
Probabilistic (Bayes) classifiers
Based on function representation (ANN, SVM, . . . )

Problem
All of them depend at least quadratically on the size of the training set (think of classification based on comparisons of data points)
An approach based on a discretization of $\Omega$ would allow linear training time
But: curse of dimensionality ⇒ sparse grids!




Sparse Grid Classification

Training set (normalized)

$$S := \{(\vec x_i, y_i) \in [0,1]^d \times \{+1, -1\}\}_{i=1}^{m}$$

Assume the training data was obtained by randomly sampling an unknown function $f$, disturbed by noise
Reconstruct a piecewise $d$-linear sparse grid approximation $f_N$ of $f$:

$$f_N(\vec x) = \sum_{i=1}^{N} v_i\,\varphi_i(\vec x)$$

To determine the class at a new data location $\vec x$:
Compute $f_N(\vec x)$
Predict class $+1$ if $f_N(\vec x) \ge 0$; otherwise $-1$




Sparse Grid Classification

Solve the regularized least squares problem

$$f_N \overset{!}{=} \arg\min_{f_N \in V_N} \Big( \frac1m \sum_{i=1}^{m} \big(y_i - f_N(\vec x_i)\big)^2 + \lambda \|\nabla f_N\|_{L_2}^2 \Big)$$

Aims:
Be close to the training data: minimize the quadratic error
Prevent overfitting: minimize the gradient to avoid oscillations due to noise in the training data
Parameter $\lambda$ to steer the trade-off

Derive a system of linear equations:
We plug in $f_N$
And minimize by setting each first derivative $\partial/\partial v_i$ to zero




From Minimization to System of Linear Equations

⇒ $N$ linear equations for $N$ unknowns

$$\Big(\frac1m B B^T + \lambda C\Big)\,\vec v = \frac1m B \vec y,$$

With matrices $C$ and $B$

$$(C)_{ij} = \langle \nabla\varphi_i(\vec x), \nabla\varphi_j(\vec x) \rangle_{L_2}, \qquad (B)_{ij} = \varphi_i(\vec x_j)$$

$B$ here is simple, can be applied to a vector in $O(Nm)$ (and even better)
We already know $C$ (matrix $A$ from the PDE part): $O(N)$

⇒ Solve the linear system iteratively to determine the hierarchical surpluses $v_i$ and thus the classifier $f_N$




Example 1 – Ripley Data Set

Artificial, 2d data set
250 points for training, 1000 to test on

[Figure: two plots of the data set on the unit square, both axes from 0 to 1]

Constructed to contain 8% of noise




Example 1 – Ripley Data Set (2)

Compute an adaptive sparse grid classifier
The result can look as follows

[Figure: two plots over the unit square with axes x1 and x2; one shows f(x) with values from –1 to 1]

Best accuracy: 91.5% on test data (max. 92%)
Suitable treatment of the boundary needed




Example 2 – Optidigits

Now for something really high-dimensional. . .
Optical recognition of handwritten digits:
Classify images of handwritten digits
64-dimensional data set of gray-values (0, 1, . . . , 16)




Example 2 – Optidigits (2)

Construct ten different binary classifiers (one class ($+1$) against the others ($-1$))
Take the one with the highest prediction (function value)

⇒ Best accuracy: 97.7% correctly classified

Summary
Even high-dimensional problems (the “real problem” is not that high-dimensional) can be solved successfully
Typically requires adapting the sparse grids to the problem

What to do with the boundary ($3^d$ would be really large for $d = 64$)?
Adaptive refinement!
Consider the dependency of the algorithms on $d$, $N$, and $m$ (not only exponential parts can hurt!)
. . .




Part IX

Multigrid Methods




Multigrid Methods

Up to now we have considered
Hierarchical bases (and algorithms) to represent functions as well as possible at low cost for

interpolation and quadrature,
the solution of a system of linear equations stemming from a PDE discretization

What we haven't considered so far
The solution of the system of linear equations
There, hierarchical methods play a very important role
Especially multigrid methods allow an efficient iterative solution of the linear systems for important classes of discretized problems
We will now consider multigrid (MG) methods
But before that, we have to look at classical iterative methods (esp. at their rates of convergence) to be able to evaluate multigrid methods




System of Linear Equations

Aim: solve the linear system

$$Ax = b \quad\text{with } A \in \mathbb{R}^{n \times n} \text{ and } x, b \in \mathbb{R}^n$$

We assume that $n$ is so large that a direct solution (with Gaussian elimination, e.g.) is too expensive regarding time or space
As usual, we assume that $A$ is sufficiently well-behaved

Typically desired properties: invertible, symmetric, with non-zero entries on the diagonal, . . .




Iterative Solvers

Iterative solvers compute a sequence of approximations

$$x^0, x^1, x^2, \ldots$$

which converges to the solution $x$ of $Ax = b$
In practice: stop the method after a finite number of steps; take the last iterate as approximation of $x$
An iterative method will compromise as well as possible between two requirements

$x^{i+1}$ should be cheap to compute from $x^0, \ldots, x^i$
⇒ in the majority of cases: $x^{i+1}$ is a function only of $x^i$ (and, of course, of $A$ and $b$) – just consider storage space
Convergence should be as fast as possible: accurate results after as few steps as possible




Residual and Error

For further considerations, the following notions are helpful:
The residual after $i$ steps is defined as $r^i := b - Ax^i$
The error is $e^i := x^i - x$
The residual is easy to compute, but we would like to know the error (it would directly provide access to the exact solution)
Both quantities are related via the equation $r^i = -Ae^i$

This suggests computing an estimate of the error from the residual:
Apply a (cheap) approximation of $-A^{-1}$ to the residual

(Outlook for hierarchical methods: the approximation can be obtained cheaply on a coarser grid)




Linear Iterative Methods

Linear method to solve linear systems
Can be written as

$$x^{i+1} = Mx^i + Nb \qquad (1)$$

$M$ and $N$ are $n \times n$ matrices
$M$ and $N$ depend only on $A$, not on $b$ or $x^i$

Linear iterative methods are popular for several reasons (their analysis is relatively easy, e.g.)




Linear Iterative Methods: Consistency

Minimal requirement for an iterative method:
The exact solution has to be a fixed point of the iteration
If we provide the solution as $x^0$, it should not be destroyed
Thus, for all $b$ and $x$ with $Ax = b$ it has to hold

$$x \overset{!}{=} Mx + Nb = (M + NA)x$$

$$\Rightarrow \quad M + NA = I \qquad (2)$$

(as we can choose $b$ and thus $x$ arbitrarily)
This requirement is called consistency




Linear Iterative Methods: Convergence

With the consistency requirement $M + NA = I$ (2), we rewrite the iteration scheme $x^{i+1} = Mx^i + Nb$ (1) as

$$x^{i+1} = (I - NA)x^i + Nb = x^i - N(Ax^i - b) = x^i + Nr^i$$

For the error, this results in

$$e^{i+1} = x^{i+1} - x = x^i - x + Nr^i = e^i - NAe^i = Me^i$$




Linear Iterative Methods: Convergence (2)

Therefore, the speed of convergence of the iteration depends on $M$, more precisely on the norm of $M$:

$$\|e^{i+1}\| \le \|M\| \cdot \|e^i\|$$

⇒ $\|M\|$ should be smaller than 1; the closer to 0 the better

First try
We therefore choose $M := 0$
The iterative method solves the linear system in the first step:

$$x^1 = x$$

As expected, there is no free lunch, as

$$N = (I - M)A^{-1} = A^{-1}$$

We would have to solve our linear system to compute $x^1$. . .




Jacobi Method

We obtain feasible methods by the decomposition

$$A =: D - E - F$$

where
$D$ contains the diagonal of $A$,
$-E$ the (strictly) lower triangular part, and
$-F$ the (strictly) upper triangular part

As an example, we look at the Jacobi method

A short remark:
For sparse grids, we have mentioned the conjugate gradient method (CG)
It is not a linear iterative method, but behaves rather similarly




Jacobi Method (2)

The Jacobi method chooses

$$N := D^{-1},$$

thus

$$M = I - D^{-1}A$$

This results in the following algorithm:
Compute $x^{i+1}$ out of $x^i$:

Compute the residual $r^i := b - Ax^i$
for $k = 1, \ldots, n$:
    $x_k^{i+1} = x_k^i + \frac{1}{a_{k,k}}\, r_k^i$
endfor

Computing $r^i$ is, of course, a loop over all components, too
But the matrix could just as well be provided as a procedure that computes matrix-vector products $x \mapsto Ax$
Additionally, we only need to know the diagonal entries $a_{k,k}$




Jacobi Method (3)

Unfortunately, we do not have convergence for arbitrary $A$
In practice, one often introduces a damping factor $0 < \alpha \le 1$ for the modification $D^{-1}r^i$ to obtain convergence
If $\alpha$ is too small, this comes at the expense of speed
In the algorithm this looks like

. . .
    $x_k^{i+1} = x_k^i + \alpha\, \frac{1}{a_{k,k}}\, r_k^i$
. . .

For the matrices we obtain

$$N(\alpha) = \alpha D^{-1}$$

and

$$M(\alpha) = I - \alpha D^{-1}A$$



Speed of Convergence

How many iterations do we need to perform to obtain a sufficiently small error?

We can obtain statements about the speed of convergence from the equation

$$e^{i+1} = Me^i$$

To simplify things, we assume
$M$ has a full set of eigenvectors $\eta_1, \ldots, \eta_n$ with real-valued eigenvalues $\lambda_1, \ldots, \lambda_n$
In reality, this does not always hold; but the concepts from the simplified case remain mainly the same




Speed of Convergence (2)

We can then write $e^0$ as a linear combination of the eigenvectors

$$e^0 := \sum_{k=1}^{n} \beta_k \eta_k.$$

Iterating (applying $M$) multiplies each component with the corresponding $\lambda_k$

$$e^i = \sum_{k=1}^{n} \lambda_k^i\, \beta_k \eta_k.$$

Typically, some of the eigenvalues are close to 0
⇒ The corresponding components decay very fast: at first, convergence makes a lot of progress





Unfortunately, there are eigenvalues with absolute value just slightly below 1

The corresponding components of the error are hardly reduced
After a few iterations they dominate the whole process, which becomes very slow after the initial progress

This effect is largely independent of $x^0$:
The error almost always contains components of all eigenvectors
If not: they are introduced at the latest by rounding errors





Let $\lambda_n$ be the eigenvalue with absolute value closest to 1, thus

$$\delta := 1 - |\lambda_n|$$

is (unfortunately) close to 0
Then, after sufficiently many iterations,

$$\frac{\|e^{i+1}\|}{\|e^i\|} \approx 1 - \delta.$$

Number of iterations to obtain

$$\|e^{i+n_{it}}\| < \varepsilon \|e^i\|$$

for given $0 < \varepsilon < 1$:

$$n_{it} \approx \frac{\ln \varepsilon}{\ln(1 - \delta)} \approx -\frac{\ln \varepsilon}{\delta}$$

(Expansion at $\delta = 0$ leads to $\ln(1 - \delta) \doteq -\delta$)




Unfortunately, for the discretization of PDEs on a grid with mesh-width $h$ we typically obtain $\delta \sim h^\gamma$ for some $\gamma > 0$

⇒ The number of steps grows with $h^{-\gamma}$

Even if the cost per step is proportional to the number of unknowns, the overall effort grows disproportionately




Speed of Convergence, Example

Example: damped Jacobi
We solve the linear system for the discretized one-dimensional Poisson equation

Poisson equation $-u'' = f$
Stencil $\frac{1}{h^2}\,[-1 \quad 2 \quad -1]$ on a grid with $m - 1$ inner grid points (zero on the boundary)
We solve the linear system with damped Jacobi

$$x^{i+1} := x^i + \alpha D^{-1} r^i \quad\text{and}\quad 0 < \alpha \le 1$$




Speed of Convergence, Example (2)

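The iteration matrix $M_{Jac} := I - \alpha D^{-1}A$ has as eigenvectors the discrete sine oscillations

$$\eta_k := \Big(\sin\Big(\frac{ik\pi}{m}\Big)\Big)_{1 \le i < m} \in \mathbb{R}^{m-1}$$

The corresponding eigenvalues are

$$\lambda_k := \alpha \cos\Big(\frac{k\pi}{m}\Big) + 1 - \alpha = 1 - 2\alpha \sin^2\Big(\frac{k\pi}{2m}\Big)$$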





[Figure: eigenvalues of $M_{Jac}$ (vertical axis, from –1 to 1) plotted over $k/m$ (horizontal axis, from 0 to 1) for different values of $\alpha \in [0.5, 1]$]





In the diagram, we can observe the eigenvalues for low-frequency error components ($k/m$ small)
They are practically independent of $\alpha$ and very close to 1
Closest to one is

$$\lambda_1 := 1 - 2\alpha \sin^2\Big(\frac{\pi}{2m}\Big) \doteq 1 - \frac{\alpha \pi^2}{2m^2}.$$

At the right end of the diagram: eigenvalues for high-frequency error components ($k/m \approx 1$)
We can adjust the damping via $\alpha$

$\alpha > 0.5$ results in oscillations (negative eigenvalues)
Convergence of these components deteriorates with $\alpha \to 1$

For all $0 < \alpha \le 1$, the convergence rate is determined by $\lambda_1$

⇒ $\delta \in O(m^{-2})$: halving the mesh-width results in four times as many iterations for a given error reduction
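The effect of halving the mesh-width can be checked with a few lines. The sketch below evaluates the eigenvalue formula for damped Jacobi and the iteration estimate $n_{it} \approx \ln \varepsilon / \ln(1 - \delta)$ with $\delta = 1 - |\lambda_1|$; the function names and the sample values of $m$, $\alpha$ and $\varepsilon$ are assumptions of this illustration.

```python
import math

def jacobi_eigenvalue(k, m, alpha):
    """Eigenvalue lambda_k of I - alpha * D^-1 * A for the 1d Poisson stencil."""
    return 1.0 - 2.0 * alpha * math.sin(k * math.pi / (2 * m)) ** 2

def iterations_needed(m, alpha, eps):
    """Estimated number of iterations to reduce the error by the factor eps."""
    delta = 1.0 - abs(jacobi_eigenvalue(1, m, alpha))
    return math.log(eps) / math.log(1.0 - delta)

# Doubling m (halving h) roughly quadruples the iteration count
for m in (16, 32, 64, 128):
    print(m, round(iterations_needed(m, alpha=0.5, eps=1e-6)))
```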




Why is Jacobi so Slow?

Summarizing the main observations
For suitably chosen $\alpha$, we can damp high-frequency error components ($k/m \to 1$) very well
For all values of $\alpha$, the low-frequency error components remain almost undamped

We could have expected this, as low-frequency error terms produce only very small residuals
Thus, the residual is not very well suited to construct an estimate of the error
(Remark: Related methods, such as Gauss-Seidel, therefore behave similarly)




Why is Jacobi so Slow? (2)

A further search for the reasons of this problem leads to the observation
Different error components (frequencies) have completely different relations between error $e^i$ and residual $r^i = -Ae^i$

This can be expressed as
$A$ has a large condition number

$$\kappa(A) = \frac{\max_{\|x\|=1} \|Ax\|}{\min_{\|x\|=1} \|Ax\|}$$

(not so large that we get problems with the accuracy of the solution, but large enough to make the iterative solution annoyingly slow)




Why is Jacobi so Slow? (3)

Where does the large condition number come from?
It is not a property of our problem (solve a DE), but of the discretization!

For the same problem, discretizations can be provided that lead to arbitrarily well-conditioned coefficient matrices
Taking a hierarchical basis (with a suitable scaling) leads to matrices with a significantly better condition number

In the following, we proceed differently
We try, so to speak, to remedy the clumsiness of the discretization with as little computational effort as possible
The original problem with its properties will play an important role:

(Geometric) multigrid methods as treated here cannot be considered independently of the problem




Multiple Grids

$A_h$: coefficient matrix of the example problem for mesh-width $h = 1/m$
The condition number of $A_h$ is in $O(h^{-2})$ ⇒ it gets worse with $h \to 0$
But we need a small $h$ to keep the discretization error low

⇒ Use solutions on coarser grids for the solution on the current grid
We will have to deal with a family of linear systems on the interval $[0,1]$:

$$A_h x_h = b_h$$

for mesh-width $h = 2^{-l}$, with $l$ as index of the discretization level, $l = l_{min}, \ldots, l_{max}$




Multiple Grids (2)

Solutions on different grids are represented by vectors of different length
To be able to compare them, we decompose the $h$-grid into

coarse grid points, which also exist on the $2h$-grid
fine grid points, which don't

The prolongation operator

$$I_{2h}^h : \mathbb{R}^{1/(2h)-1} \to \mathbb{R}^{1/h-1},$$

maps a $u_{2h}$ on the $2h$-grid to a $u_h$ on the $h$-grid
The $u_h$

takes the values of $u_{2h}$ at the coarse grid points and
interpolates the values at the fine grid points linearly from the neighboring coarse grid points (or boundary values, resp.)



Prolongation Operator

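As a matrix and as a picture it looks as follows

$$I_{2h}^{h} = \frac{1}{2}\begin{pmatrix} 1 & & & \\ 2 & & & \\ 1 & 1 & & \\ & 2 & & \\ & 1 & \ddots & \\ & & \ddots & 1 \\ & & & 2 \\ & & & 1 \end{pmatrix}$$

(Figure: each coarse grid value is passed with weight 1 to the coinciding fine grid point and with weight ½ to each of its two fine grid neighbours)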
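The matrix form of the prolongation can be sketched in a few lines. This is an illustrative construction, assuming zero boundary values and m = 1/h a power of two; the function name `prolongation` is hypothetical.

```python
import numpy as np

def prolongation(m):
    """Linear interpolation from the 2h-grid (m/2 - 1 interior points)
    to the h-grid (m - 1 interior points), with h = 1/m."""
    n_fine, n_coarse = m - 1, m // 2 - 1
    P = np.zeros((n_fine, n_coarse))
    for j in range(n_coarse):
        i = 2 * j + 1            # 0-based fine index of coarse grid point j
        P[i - 1, j] = 0.5        # left fine grid neighbour
        P[i, j] = 1.0            # coinciding grid point
        P[i + 1, j] = 0.5        # right fine grid neighbour
    return P

u_coarse = np.array([1.0, 2.0, 3.0])      # values on the 2h-grid for m = 8
u_fine = prolongation(8) @ u_coarse
print(u_fine)
```

The last fine grid value is interpolated between the last coarse value and the zero boundary value.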




Multiple Grids (cont.)

Compare $x_h$ with $I_{2h}^{h} x_{2h}$:

  The difference will be small and mainly high-frequency
  The low-frequency parts of the exact solution can be represented well on the $2h$-grid

Therefore it is straightforward to use $I_{2h}^{h} x_{2h}$ as initial solution for the iteration on the $h$-grid, so that the contributions of the resistant error modes are small:

Solve the linear system $A_h x_h = b_h$ for $h = 2^{-l_{\min}}$.
for $l = l_{\min} + 1, \ldots, l_{\max}$:
    Iterate the linear system $A_h x_h = b_h$ with $h = 2^{-l}$ and $x_h^{0} := I_{2h}^{h} x_{2h}$ sufficiently often,
    call the result (afflicted with the remaining error) $x_h$
endfor
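The loop above can be sketched as follows. This is only an illustration under assumptions not fixed by the slides: the model problem is taken to be −u'' = f on (0, 1) with zero boundary values, the iteration is a damped Jacobi method with α = 1/2, and the right-hand side f(x) = π² sin(πx) is an arbitrary smooth example. All function names are hypothetical.

```python
import numpy as np

def poisson_matrix(m):
    """Assumed model matrix (1/h^2) tridiag(-1, 2, -1) with h = 1/m."""
    n = m - 1
    return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) * m**2

def rhs(m, f):
    """Right-hand side b_h: f sampled at the interior grid points."""
    return f(np.arange(1, m) / m)

def prolong(u_coarse):
    """Linear interpolation from the 2h-grid to the h-grid (zero boundary values)."""
    padded = np.concatenate(([0.0], u_coarse, [0.0]))
    u_fine = np.zeros(2 * len(u_coarse) + 1)
    u_fine[1::2] = u_coarse                          # coarse grid points
    u_fine[0::2] = 0.5 * (padded[:-1] + padded[1:])  # fine grid points
    return u_fine

def damped_jacobi(A, x, b, steps, alpha=0.5):
    d = np.diag(A)
    for _ in range(steps):
        x = x + alpha * (b - A @ x) / d
    return x

def nested_iteration(f, l_min, l_max, steps):
    m = 2**l_min
    x = np.linalg.solve(poisson_matrix(m), rhs(m, f))   # coarsest level: direct solve
    for l in range(l_min + 1, l_max + 1):
        m = 2**l
        # start the iteration on level l from the interpolated coarser solution
        x = damped_jacobi(poisson_matrix(m), prolong(x), rhs(m, f), steps)
    return x

f = lambda x: np.pi**2 * np.sin(np.pi * x)
l_max, steps = 6, 5
A, b = poisson_matrix(2**l_max), rhs(2**l_max, f)
x_direct = np.linalg.solve(A, b)                     # reference: discrete solution
err_nested = np.linalg.norm(nested_iteration(f, 2, l_max, steps) - x_direct)
err_zero = np.linalg.norm(damped_jacobi(A, np.zeros_like(b), b, steps) - x_direct)
print(err_nested, err_zero)
```

With the same number of Jacobi steps on the finest grid, the interpolated start value leaves a much smaller error than the zero start value in this example.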




Multigrids

The idea goes in the right direction, but not far enough for most problems:

  $I_{2h}^{h} x_{2h}$ contains low-frequency error components, too, even if $A_{2h} x_{2h} = b_{2h}$ is solved exactly
  ⇒ Still many iterations on the fine grid are necessary if a very accurate solution is desired

Thus, modify the scheme so that coarse grids can be used multiple times to compute a suitable correction
Main idea:

  After some iterations on the fine grid, solve the auxiliary equation which combines error and residual,

  $$r_i = -A e_i,$$

  approximately for $e_i$ on the coarse grid




Multigrids (2)

We need another operator, the restriction

$$R_{h}^{2h} : \mathbb{R}^{1/h-1} \to \mathbb{R}^{1/(2h)-1}$$

Maps the right-hand side of the linear system on the $h$-grid onto a right-hand side on the $2h$-grid
(We haven't needed that so far, as we assumed that $b_h$ is known for all $h$)




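Multigrids (3)

Restriction can be obtained by simply omitting every second grid point
Other choice: weighted restriction

$$R_{h}^{2h} := \frac{1}{4}\begin{pmatrix} 1 & 2 & 1 & & & & \\ & & 1 & 2 & 1 & & \\ & & & \ddots & \ddots & \ddots & \\ & & & & 1 & 2 & 1 \end{pmatrix}$$

(Figure: each coarse grid value is the weighted sum of the coinciding fine grid value, with weight ½, and its two fine grid neighbours, with weight ¼ each)

Remark: The property $R_{h}^{2h} = c \cdot (I_{2h}^{h})^{T}$ (provided by weighted restriction) is sometimes helpful, especially concerning the analysis of such schemes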
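Both restriction variants can be written down as small matrices. The sketch below assumes zero boundary values and m = 1/h a power of two; the names `injection` and `weighted_restriction` are illustrative. For these particular matrices, twice the transpose of the weighted restriction has columns (½, 1, ½), i.e. it is the linear interpolation matrix, so the constant in the remark is c = 1/2 here.

```python
import numpy as np

def injection(m):
    """Restriction by omitting every second grid point (h = 1/m)."""
    n_fine, n_coarse = m - 1, m // 2 - 1
    R = np.zeros((n_coarse, n_fine))
    R[np.arange(n_coarse), 2 * np.arange(n_coarse) + 1] = 1.0
    return R

def weighted_restriction(m):
    """Weighted restriction with stencil (1/4) [1 2 1]."""
    n_fine, n_coarse = m - 1, m // 2 - 1
    R = np.zeros((n_coarse, n_fine))
    for j in range(n_coarse):
        R[j, 2 * j : 2 * j + 3] = [0.25, 0.5, 0.25]
    return R

r_fine = np.arange(1.0, 8.0)                 # a linear fine grid vector for m = 8
r_injected = injection(8) @ r_fine
r_weighted = weighted_restriction(8) @ r_fine
print(r_injected, r_weighted)                # both variants reproduce linear data
print(2 * weighted_restriction(8).T)         # columns (1/2, 1, 1/2): interpolation
```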




Multigrids (4)

We now have assembled everything for the coarse grid correction:
  Compute the residual $r_h = b_h - A_h x_h$ of the current iterate $x_h$ on the fine grid
  Transport it to the coarse grid: $r_{2h} = R_{h}^{2h} r_h$
  Solve (approximately) $A_{2h}(-e_{2h}) = r_{2h}$
  Transport the correction to the fine grid and apply it to the current iterate: $x_h^{\text{new}} := x_h + I_{2h}^{h}(-e_{2h})$

Additionally, we have to treat the high frequencies
  For example, by some steps of a damped Jacobi method, which we call smoother
  Doesn't necessarily reduce the error, but smooths it, as the high frequencies are eliminated




Multigrids (5)

In the example problem, the eigenvalues of the iteration matrix suggest damping, as the high-frequency error components oscillate so much that they are practically not reduced
With, e.g., $\alpha = 1/2$ we can overcome this
On the $2h$-grid we typically don't solve the linear system exactly
Instead: apply the idea of coarse grid correction recursively
The recursion stops if $h$ is so large that obtaining the exact solution is cheap or (for $h = 1/2$, e.g.) trivial
We thus assume that $m$ is a power of two
As we compute $-e_{2h}$ iteratively, we need an initial value; the zero vector is the best choice for several reasons . . .
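The effect of the damping can be observed on single error modes. The sketch below assumes the one-dimensional model matrix (1/h²)·tridiag(−1, 2, −1), whose eigenvectors are the sine modes sin(kπx), and a damped Jacobi step of the form x ← x + α D⁻¹(b − Ax); this form of the damping and the names `jacobi_step` and `damping_factor` are assumptions made for illustration.

```python
import numpy as np

def jacobi_step(x, b, h, alpha):
    """One damped Jacobi step for the assumed matrix (1/h^2) tridiag(-1, 2, -1)."""
    padded = np.concatenate(([0.0], x, [0.0]))           # zero boundary values
    Ax = (2.0 * padded[1:-1] - padded[:-2] - padded[2:]) / h**2
    return x + alpha * (h**2 / 2.0) * (b - Ax)           # diagonal of A is 2 / h^2

def damping_factor(m, k, alpha):
    """Factor by which one step multiplies the error mode sin(k pi x)."""
    h = 1.0 / m
    e = np.sin(k * np.pi * np.arange(1, m) * h)
    e_new = jacobi_step(e, np.zeros(m - 1), h, alpha)    # b = 0: x is the error itself
    return (e_new @ e) / (e @ e)

m = 32
for k in (1, m // 2, m - 1):
    print(f"k = {k:2d}   alpha = 1: {damping_factor(m, k, 1.0):+.4f}"
          f"   alpha = 1/2: {damping_factor(m, k, 0.5):+.4f}")
```

Without damping, the highest frequency only changes its sign in each step and is hardly reduced; with α = 1/2 it is almost eliminated, while the lowest frequency stays nearly untouched in both cases.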




Multigrid Algorithm

We have three yet unspecified parameters in our algorithm:
The numbers $\nu_1, \nu_2, \mu \in \mathbb{N}$ denote the number of smoothing steps before and after the coarse grid correction, and the number of recursive calls within the coarse grid correction:

mg($x_h$, $b_h$, $\nu_1$, $\nu_2$, $\mu$):
    if $h = 2^{-l_{\min}}$
        solve $A_h x_h = b_h$ exactly
    else
        Pre-smoothing:
            Apply $\nu_1$ smoothing steps to $A_h x_h = b_h$
        Coarse grid correction:
            for $k = 1, \ldots, \mu$:
                $x_h := x_h + I_{2h}^{h}(\text{mg}(0_{2h}, R_{h}^{2h}(b_h - A_h x_h), \nu_1, \nu_2, \mu))$
        Post-smoothing:
            Apply $\nu_2$ smoothing steps to $A_h x_h = b_h$
    end if
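A compact sketch of this recursion, under assumptions that go beyond the slide: the model problem is taken to be the one-dimensional Poisson problem with matrix (1/h²)·tridiag(−1, 2, −1) and zero boundary values, the smoother is damped Jacobi with α = 1/2, the grid transfer uses linear interpolation and weighted restriction, and the recursion ends at h = 1/2 (a single unknown). The helper names and the test setup with a random exact solution are illustrative.

```python
import numpy as np

def apply_A(x, h):
    """Matrix-free product with the assumed matrix (1/h^2) tridiag(-1, 2, -1)."""
    padded = np.concatenate(([0.0], x, [0.0]))           # zero boundary values
    return (2.0 * padded[1:-1] - padded[:-2] - padded[2:]) / h**2

def smooth(x, b, h, steps, alpha=0.5):
    """Damped Jacobi smoother; the diagonal of A_h is 2 / h^2."""
    for _ in range(steps):
        x = x + alpha * (h**2 / 2.0) * (b - apply_A(x, h))
    return x

def restrict(r):
    """Weighted restriction with stencil (1/4) [1 2 1]."""
    return 0.25 * r[0:-2:2] + 0.5 * r[1:-1:2] + 0.25 * r[2::2]

def prolong(e):
    """Linear interpolation from the 2h-grid to the h-grid."""
    padded = np.concatenate(([0.0], e, [0.0]))
    fine = np.zeros(2 * len(e) + 1)
    fine[1::2] = e
    fine[0::2] = 0.5 * (padded[:-1] + padded[1:])
    return fine

def mg(x, b, nu1, nu2, mu):
    m = len(x) + 1
    h = 1.0 / m
    if m == 2:                                   # h = 1/2: one unknown, solve exactly
        return b * h**2 / 2.0
    x = smooth(x, b, h, nu1)                     # pre-smoothing
    for _ in range(mu):                          # coarse grid correction
        residual = b - apply_A(x, h)
        x = x + prolong(mg(np.zeros(m // 2 - 1), restrict(residual), nu1, nu2, mu))
    return smooth(x, b, h, nu2)                  # post-smoothing

def error_history(m, cycles, mu):
    """Errors of repeated mg cycles for a manufactured right-hand side."""
    rng = np.random.default_rng(0)
    x_exact = rng.standard_normal(m - 1)
    b = apply_A(x_exact, 1.0 / m)
    x = np.zeros(m - 1)
    errors = [np.linalg.norm(x - x_exact)]
    for _ in range(cycles):
        x = mg(x, b, 1, 1, mu)
        errors.append(np.linalg.norm(x - x_exact))
    return errors

v_errors = error_history(64, 20, mu=1)           # mu = 1
w_errors = error_history(64, 20, mu=2)           # mu = 2
print(v_errors[-1] / v_errors[0], w_errors[-1] / w_errors[0])
```

The recursive call always starts from the zero vector on the coarse grid and receives the restricted residual as its right-hand side, exactly as in the pseudocode; everything else (smoother, transfer operators, coarsest level) could be exchanged.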




Effort/Cost

What about the effort?
Consider the application of $I_{2h}^{h}$ and $R_{h}^{2h}$ to be about as expensive as one smoothing step
This is realistic: we have to evaluate a local stencil for the fine grid

⇒ The effort with $M$ grid points is about

$$C \cdot M \cdot (\nu_1 + \nu_2 + 2\mu)$$

plus the effort for the $\mu$ coarse grid corrections
$C$ is the cost per grid point (small, constant number of operations)




Effort/Cost (2)

Cost for coarse grid points

One-dimensional model problem (as considered):
  Smoothing and grid transfer costs for the $2h$-grid are half as high as for the $h$-grid
  For $\mu = 1$ the total cost is about

  $$1 + 1/2 + 1/4 + \ldots = 2$$

  times the cost on the fine grid, i.e., proportional to the number of grid points (on the finest grid)
  For $\mu = 2$ the cost per grid point grows logarithmically in $M$

Problem transferred to two or three spatial dimensions:
  The calculation looks even much better:
  On all coarse grids together there are just 1/3 (2D), or 1/7 (3D) respectively, as many grid points as on the finest grid
  There, for $\mu < 4$ (2D) and $\mu < 8$ (3D), respectively, we still obtain an overall effort proportional to the number of unknowns on the fine grid
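The sums behind these statements can be evaluated directly. The sketch below uses a simplified cost model that is an assumption, not part of the slides: each coarser grid has exactly 1/2^d as many points as the next finer one, and each level is visited µ times as often as the next finer one, so the work per level changes by the factor µ/2^d. The function names are illustrative.

```python
def relative_cycle_cost(levels, mu, dim):
    """Work of one cycle over `levels` grids, in units of the finest-grid work."""
    ratio = mu / 2**dim              # work on level k+1 relative to level k
    return sum(ratio**k for k in range(levels))

def coarse_point_fraction(dim, levels=60):
    """Number of points on all coarse grids together, relative to the finest grid."""
    return sum(2.0 ** (-dim * k) for k in range(1, levels))

print(relative_cycle_cost(60, mu=1, dim=1))    # 1 + 1/2 + 1/4 + ... (approaches 2)
print(relative_cycle_cost(10, mu=2, dim=1))    # every level costs the same: grows with the number of levels
print(coarse_point_fraction(2), coarse_point_fraction(3))
print(relative_cycle_cost(60, mu=3, dim=2))    # mu < 4 in 2D: still bounded
```

In this model the series is geometric with ratio µ/2^d, so the total work stays proportional to the finest-grid work as long as µ < 2^d.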




Effort/Cost (3)

Typical choices: µ = 1 (V-cycle), and µ = 2 (W-cycle)




Effort/Cost (4)

We do not derive it here, but just note (and verify experimentally?) the other part of the efficiency considerations:

Convergence rate (for $\nu_1 = \nu_2 = \mu = 1$, e.g.) is bounded by a constant smaller than 1, independent of the fine grid mesh width $h$

Typical multigrid convergence rates for well-behaved problems are about 1/2 and smaller

⇒ We need only few steps to obtain a small error




Hierarchical Methods. . .

Summary

Hierarchical methods are beneficial in many settings
Can reduce cost
Can "compress" functions (represent them with few degrees of freedom)
Can help to estimate errors
Can provide "level of detail" (coarse partial solutions)
Can provide coarse approximations
Can be used to define refinement criteria
Can speed up the solution of linear systems
. . .


