An algorithm for the efficient selection of higher order derivatives

Max Sagebaum and Nicolas R. Gauger

AG Scientific Computing, TU Kaiserslautern

20th AD Workshop, INRIA Sophia-Antipolis

M. Sagebaum Higher Order Derivatives 20-th AD Workshop 1/ 27

Overview

- Higher order notation
- Selection algorithm
- Implementation
- Examples

Motivation - Higher order AD

Let $F : \mathbb{R} \to \mathbb{R}$ be given by:

$$y = F(x)$$

Then the forward mode of F is:

$$\dot{y} = \frac{dF}{dx} \dot{x}$$

and the reverse mode of F is:

$$\bar{x} = \frac{dF}{dx}^T \bar{y}$$

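Motivation - Higher order AD

AD yields a new code that computes:

$$(y, \dot{y}) = F_t(x, \dot{x}) = \left( F(x), \frac{dF}{dx} \dot{x} \right)$$

and

$$(y, \bar{x}) = F_a(x, \bar{y}) = \left( F(x), \frac{dF}{dx}^T \bar{y} \right).$$

Apply AD again! How should the new variables, e.g. the derivative of $\dot{x}$, be named?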
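To make $F_t$ and $F_a$ concrete, here is a minimal hand-written sketch for the scalar function $F(x) = 3x^7$ (the function used in the CoDiPack example of this talk). The names `F`, `dF`, `F_t` and `F_a` are illustrative assumptions; an AD tool would generate the derivative code instead of relying on a hand-coded `dF`.

```cpp
#include <cassert>
#include <utility>

// Example primal function F(x) = 3 * x^7, written with plain multiplications.
double F(double x) {
  const double x3 = x * x * x;
  return 3.0 * x3 * x3 * x;
}

// Hand-written first derivative dF/dx = 21 * x^6 (stands in for AD-generated code).
double dF(double x) {
  const double x3 = x * x * x;
  return 21.0 * x3 * x3;
}

// Tangent (forward mode) code: (y, y_dot) = F_t(x, x_dot).
std::pair<double, double> F_t(double x, double x_dot) {
  return {F(x), dF(x) * x_dot};
}

// Adjoint (reverse mode) code: (y, x_bar) = F_a(x, y_bar).
// For a scalar function the transpose is trivial.
std::pair<double, double> F_a(double x, double y_bar) {
  return {F(x), dF(x) * y_bar};
}
```

Both codes return the primal result together with one derivative value, which is why every further application of AD doubles the number of values.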

Higher order AD notation

Notation taken from Naumann [1] with modifications.

Add the application index to the variable:

$$\dot{a} = a^{(1)}$$

The second application of AD on $a$ and $a^{(1)}$ then becomes:

$$a^{(2)} \quad \text{and} \quad [a^{(1)}]^{(2)}$$

Multiple indices are merged:

$$[a^{(1)}]^{(2)} = a^{(1,2)}$$

This allows for notations of arbitrarily high order.

[1] U. Naumann. The Art of Differentiating Computer Programs. SIAM, 2012.

Higher order AD notation

The same is done for the reverse mode. Add the application index to the variable:

$$\bar{a} = a_{(1)}$$

The second application of AD on $a$ and $a_{(1)}$ then becomes:

$$a_{(2)} \quad \text{and} \quad [a_{(1)}]_{(2)}$$

Multiple indices are merged:

$$[a_{(1)}]_{(2)} = a_{(1,2)}$$

Higher order AD notation

Mixed notations are:

Reverse AD (third application) on forward AD $a^{(1)}$ and $a^{(1,2)}$:

$$a^{(1)}_{(3)} \quad \text{and} \quad a^{(1,2)}_{(3)}$$

Forward AD (third application) on reverse AD $a_{(1)}$ and $a_{(1,2)}$:

$$a^{(3)}_{(1)} \quad \text{and} \quad a^{(3)}_{(1,2)}$$

Example - Higher order AD notation

Forward AD on $F_t$:

$$(y^{(2)}, y^{(1,2)}) = \left( \frac{dF}{dx} x^{(2)},\ \frac{d^2F}{dx^2} x^{(1)} x^{(2)} + \frac{dF}{dx} x^{(1,2)} \right)$$

which can be combined to

$$(y, y^{(1)}, y^{(2)}, y^{(1,2)}) = F_{tt}(x, x^{(1)}, x^{(2)}, x^{(1,2)}) = \left( F(x),\ \frac{dF}{dx} x^{(1)},\ \frac{dF}{dx} x^{(2)},\ \frac{d^2F}{dx^2} x^{(1)} x^{(2)} + \frac{dF}{dx} x^{(1,2)} \right).$$
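The four components of $F_{tt}$ can be reproduced with a nested forward type. The following is a minimal sketch, not CoDiPack code: `Dual`, `cube` and `F_tt` are hypothetical names, only `+` and `*` are overloaded, and $F(x) = x^3$ is chosen purely as an example function.

```cpp
#include <cassert>

// Minimal forward-mode type: a primal part and one tangent part, both of type T.
template <typename T>
struct Dual {
  T val;  // primal part
  T dot;  // tangent part
};

template <typename T>
Dual<T> operator+(const Dual<T>& a, const Dual<T>& b) {
  return {a.val + b.val, a.dot + b.dot};
}

template <typename T>
Dual<T> operator*(const Dual<T>& a, const Dual<T>& b) {
  // Product rule; T may itself be a Dual, which produces the higher order terms.
  return {a.val * b.val, a.val * b.dot + a.dot * b.val};
}

using T1 = Dual<double>;  // first application of forward AD (index 1)
using T2 = Dual<T1>;      // second application of forward AD (index 2)

// Example function F(x) = x^3, generic over the nesting depth.
template <typename T>
T cube(const T& x) {
  return x * x * x;
}

// Evaluates F_tt. Layout of the result r:
//   r.val.val = y, r.val.dot = y^(1), r.dot.val = y^(2), r.dot.dot = y^(1,2)
T2 F_tt(double x, double x1, double x2, double x12) {
  T2 in{{x, x1}, {x2, x12}};
  return cube(in);
}
```

For `F_tt(2.0, 1.0, 1.0, 0.5)` the last component is $\frac{d^2F}{dx^2} x^{(1)} x^{(2)} + \frac{dF}{dx} x^{(1,2)} = 12 \cdot 1 \cdot 1 + 12 \cdot 0.5 = 18$, matching the formula above.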

Example - Higher order AD notation

Forward AD on $F_{tt}$:

$$y^{(3)} = \frac{dF}{dx} x^{(3)}$$

$$y^{(1,3)} = \frac{d^2F}{dx^2} x^{(1)} x^{(3)} + \frac{dF}{dx} x^{(1,3)}$$

$$y^{(2,3)} = \frac{d^2F}{dx^2} x^{(2)} x^{(3)} + \frac{dF}{dx} x^{(2,3)}$$

$$y^{(1,2,3)} = \frac{d^3F}{dx^3} x^{(1)} x^{(2)} x^{(3)} + \frac{d^2F}{dx^2} x^{(2)} x^{(1,3)} + \frac{d^2F}{dx^2} x^{(1)} x^{(2,3)} + \frac{d^2F}{dx^2} x^{(1,2)} x^{(3)} + \frac{dF}{dx} x^{(1,2,3)}.$$
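A worked special case may help to read the last formula. It is a sanity check under an assumed seeding, not part of the original slides: set all first order inputs to one and all higher order inputs to zero,

$$x^{(1)} = x^{(2)} = x^{(3)} = 1, \qquad x^{(1,2)} = x^{(1,3)} = x^{(2,3)} = x^{(1,2,3)} = 0.$$

Every term except the first then vanishes, so

$$y^{(1,2,3)} = \frac{d^3F}{dx^3}.$$

For example, $F(x) = x^3$ gives $y^{(1,2,3)} = 6$ for every $x$.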

Example - Higher order AD notation

Reverse AD on $F_a$:

$$x_{(2)} = \frac{d^2F}{dx^2} y_{(1)} x_{(1,2)} + \frac{dF}{dx} y_{(2)}$$

$$y_{(1,2)} = \frac{dF}{dx} x_{(1,2)}$$

which can be combined to

$$(y, x_{(1)}, y_{(1,2)}, x_{(2)}) = F_{aa}(x, y_{(1)}, x_{(1,2)}, y_{(2)}) = \left( F(x),\ \frac{dF}{dx}^T y_{(1)},\ \frac{dF}{dx}^T x_{(1,2)},\ \frac{d^2F}{dx^2} y_{(1)} x_{(1,2)} + \frac{dF}{dx} y_{(2)} \right).$$

Example - Higher order AD notation

Reverse AD on $F_{aa}$:

$$y_{(2,3)} = \frac{dF}{dx}^T x_{(2,3)}$$

$$y_{(1,3)} = \frac{d^2F}{dx^2} x_{(1,2)} x_{(2,3)} + \frac{dF}{dx}^T x_{(1,3)}$$

$$x_{(1,2,3)} = \frac{d^2F}{dx^2} y_{(1)} x_{(2,3)} + \frac{dF}{dx}^T y_{(1,2,3)}$$

$$x_{(3)} = \frac{d^3F}{dx^3}^T y_{(1)} x_{(1,2)} x_{(2,3)} + \frac{d^2F}{dx^2} y_{(1)} x_{(1,3)} + \frac{d^2F}{dx^2} y_{(2)} x_{(2,3)} + \frac{d^2F}{dx^2} x_{(1,2)} y_{(1,2,3)} + \frac{dF}{dx}^T y_{(3)}$$

Example - Higher order AD notation

Reverse AD on $F_t$ and forward AD on $F_a$:

$$(y, y^{(1)}, x^{(1)}_{(2)}, x_{(2)}) = F_{ta}(x, x^{(1)}, y^{(1)}_{(2)}, y_{(2)}) = \left( F(x),\ \frac{dF}{dx} x^{(1)},\ \frac{dF}{dx}^T y^{(1)}_{(2)},\ \frac{d^2F}{dx^2} x^{(1)} y^{(1)}_{(2)} + \frac{dF}{dx} y_{(2)} \right)$$

$$(y, y^{(2)}, x_{(1)}, x^{(2)}_{(1)}) = F_{at}(x, x^{(2)}, y_{(1)}, y^{(2)}_{(1)}) = \left( F(x),\ \frac{dF}{dx} x^{(2)},\ \frac{dF}{dx}^T y_{(1)},\ \frac{d^2F}{dx^2} y_{(1)} x^{(2)} + \frac{dF}{dx} y^{(2)}_{(1)} \right).$$

Higher order AD notation

- Higher order AD notation is possible
- Already quite complicated for small orders
- Even more involved for vector valued functions

How to enable the user to access higher order derivatives efficiently?

Selection algorithm

AD can be applied n times on a code.

This yields $2^n$ values:
- One primal value (e.g. a)
- $2^n - 1$ derivative values

This can be split into the binomial coefficients:

$$2^n = \sum_{l=0}^{n} \binom{n}{l}$$

Each term defines how many derivatives of order l exist:

          1
         1 1
        1 2 1
       1 3 3 1
      1 4 6 4 1
    1 5 10 10 5 1
   1 6 15 20 15 6 1

Create an algorithm where the user can specify:
- The derivative order l
- Which derivative of that order they want: e.g. $1 \le k \le \binom{n}{l}$
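The counting argument can be checked numerically. The sketch below uses two hypothetical helper names, `binomial` and `totalValues`; returning 0 for l > n is a convention assumed here so that empty branches count as zero derivatives.

```cpp
#include <cassert>
#include <cstdint>

// Binomial coefficient C(n, l); returns 0 for l > n.
std::uint64_t binomial(unsigned n, unsigned l) {
  if (l > n) return 0;
  std::uint64_t result = 1;
  for (unsigned j = 1; j <= l; ++j) {
    // After this step result equals C(n - l + j, j), so the division is exact.
    result = result * (n - l + j) / j;
  }
  return result;
}

// Number of values after n applications of AD, summed over all orders l.
std::uint64_t totalValues(unsigned n) {
  assert(n <= 30);  // keeps the intermediate products far away from overflow
  std::uint64_t sum = 0;
  for (unsigned l = 0; l <= n; ++l) {
    sum += binomial(n, l);
  }
  return sum;
}
```

For n = 6 the sum runs over the last row of the triangle above, 1 + 6 + 15 + 20 + 15 + 6 + 1 = 64 = $2^6$.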

Selection algorithm

Each coefficient can be split into a recursive formulation:

$$\binom{n}{l} = \binom{n-1}{l} + \binom{n-1}{l-1}$$

The application of AD can be viewed as a graph.

[Figure: binary tree for n = 2; every node splits into a primal branch p and a derivative branch d, with the levels numbered 2, 1, 0.]

- The first term $\binom{n-1}{l}$ describes the number of derivatives in the upper branch p (primal)
- The second term $\binom{n-1}{l-1}$ describes the number of derivatives in the lower branch d (derivative)
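As a worked instance of this split, take n = 4 and l = 3:

$$\binom{4}{3} = \binom{3}{3} + \binom{3}{2} = 1 + 3 = 4.$$

Under the reading that the branch taken at level 4 decides whether index 4 appears in the multi-index, the upper (primal) branch holds the single third order derivative without index 4, $a^{(1,2,3)}$, and the lower (derivative) branch holds the three that contain it: $a^{(1,2,4)}$, $a^{(1,3,4)}$ and $a^{(2,3,4)}$.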

Selection algorithm

Require: n : The maximum derivative order
Require: l : The derivative order the user wants to select
Require: k : The derivative the user wants to select, $1 \le k \le \binom{n}{l}$

    l_n := l
    k_n := k
    for i = n, ..., 1 do
      if k_i <= binom(i-1, l_i) then
        select upper branch (select primal part)
        l_{i-1} := l_i
        k_{i-1} := k_i
      else
        select lower branch (select derivative part)
        l_{i-1} := l_i - 1
        k_{i-1} := k_i - binom(i-1, l_i)
      end if
    end for
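The loop above can be sketched as a plain runtime function that records at which levels the derivative branch is taken. This is an illustration only: `binomial` and `selectDerivative` are hypothetical names, and returning the multi-index as a vector is an assumption made for this sketch (the implementation presented in this talk walks nested types instead).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Binomial coefficient C(n, l); returns 0 for l > n.
std::uint64_t binomial(unsigned n, unsigned l) {
  if (l > n) return 0;
  std::uint64_t result = 1;
  for (unsigned j = 1; j <= l; ++j) {
    result = result * (n - l + j) / j;  // exact: equals C(n - l + j, j)
  }
  return result;
}

// Returns the AD application indices (ascending) of the k-th derivative of
// order l after n applications of AD, with 1 <= k <= C(n, l).
std::vector<unsigned> selectDerivative(unsigned n, unsigned l, std::uint64_t k) {
  assert(n <= 30 && l <= n && k >= 1 && k <= binomial(n, l));
  std::vector<unsigned> indices;
  for (unsigned i = n; i >= 1; --i) {
    const std::uint64_t upper = binomial(i - 1, l);
    if (k > upper) {
      // Lower branch: application i contributes a derivative direction.
      indices.insert(indices.begin(), i);
      l -= 1;
      k -= upper;
    }
    // Otherwise the upper (primal) branch is taken and l, k stay unchanged.
  }
  return indices;
}
```

With n = 4, l = 3 and k = 2 the function takes the lower branch at levels 4, 2 and 1 and returns the indices 1, 2, 4.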

Example - Selection algorithm

User input: Select the second derivative of third order ($a^{(1,2,4)}$)

n = 4, l = 3, k = 2

Algorithm step i = 4, with $l_4 = 3$, $k_4 = 2$:

- Upper: $\binom{3}{3} = 1$
- Lower: $\binom{3}{2} = 3$
- Select lower: $l_3 = 3 - 1 = 2$
- Select lower: $k_3 = 2 - 1 = 1$

[Figure: binary selection tree for n = 4 with branches labeled d (derivative) and p (primal).]

Algorithm step i = 3, with $l_3 = 2$, $k_3 = 1$:

- Upper: $\binom{2}{2} = 1$
- Lower: $\binom{2}{1} = 2$
- Select upper: $l_2 = 2$
- Select upper: $k_2 = 1$

Algorithm step i = 2, with $l_2 = 2$, $k_2 = 1$:

- Upper: $\binom{1}{2} = 0$
- Lower: $\binom{1}{1} = 1$
- Select lower: $l_1 = 2 - 1 = 1$
- Select lower: $k_1 = 1 - 0 = 1$

Algorithm step i = 1, with $l_1 = 1$, $k_1 = 1$:

- Upper: $\binom{0}{1} = 0$
- Lower: $\binom{0}{0} = 1$
- Select lower: $l_0 = 1 - 1 = 0$
- Select lower: $k_0 = 1 - 0 = 1$

Example - Selection algorithm

What are you selecting?

[Figure: binary selection tree for n = 3 with branches labeled d (derivative) and p (primal).]

- $a$ = (l = 0, k = 1)
- $a^{(1)}$ = (l = 1, k = 1)
- $a^{(2)}$ = (l = 1, k = 2)
- $a^{(3)}$ = (l = 1, k = 3)
- $a^{(1,2)}$ = (l = 2, k = 1)
- $a^{(1,3)}$ = (l = 2, k = 2)
- $a^{(2,3)}$ = (l = 2, k = 3)
- $a^{(1,2,3)}$ = (l = 3, k = 1)

k accesses the multi-indices in a sorted order.
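The ordering can be made explicit by brute force. Working through the branch rule of the selection algorithm, all multi-indices without the largest index n come before those containing it, which corresponds to comparing multi-indices from their largest entry downwards. The sketch below enumerates the multi-indices of one order in that order; `MultiIndex` and `derivativesOfOrder` are hypothetical names, and the comparison rule is a derivation from the algorithm, not a statement taken from CoDiPack.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using MultiIndex = std::vector<unsigned>;

// Lists all multi-indices of order l for n applications of AD.
// Entry k - 1 of the result is the multi-index addressed by (l, k).
std::vector<MultiIndex> derivativesOfOrder(unsigned n, unsigned l) {
  assert(l <= n && n < 32);
  std::vector<MultiIndex> result;
  // Each bit mask encodes one subset of the application indices 1..n.
  for (unsigned mask = 0; mask < (1u << n); ++mask) {
    MultiIndex idx;
    for (unsigned i = 1; i <= n; ++i) {
      if (mask & (1u << (i - 1))) idx.push_back(i);
    }
    if (idx.size() == l) result.push_back(idx);
  }
  // Compare from the largest index downwards.
  std::sort(result.begin(), result.end(),
            [](const MultiIndex& a, const MultiIndex& b) {
              return std::lexicographical_compare(a.rbegin(), a.rend(),
                                                  b.rbegin(), b.rend());
            });
  return result;
}
```

For n = 3 this reproduces the list above, e.g. (1,2), (1,3), (2,3) for l = 2. The difference to a plain lexicographic sort only shows up for larger n: with n = 4 and l = 2, the third entry is (2,3) and the fourth is (1,4).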

Implementation - Selection algorithm

- Strongly simplified
- Type safe implementation makes it quite involved

    template<typename Real, size_t n>
    struct DerivativeSelector {

      static double select(Real& value, int l, int k) {
        // Number of derivatives of order l in the upper (primal) branch.
        size_t upperDerivatives = binomial(n - 1, l);

        if(k <= upperDerivatives) {
          return DerivativeSelector<typename Real::Real, n - 1>
              ::select(value.value(), l, k);
        } else {
          return DerivativeSelector<typename Real::GradientValue, n - 1>
              ::select(value.gradient(), l - 1, k - upperDerivatives);
        }
      }

    };
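The simplified template above leaves out the end of the recursion. A self-contained sketch that compiles could look as follows. Everything here is an assumption for illustration: `Dual` is a toy stand-in for a CoDiPack forward type, `binomial` is a hypothetical helper, the specialization for n = 0 is one possible way to stop the recursion, and `select` returns a reference so that the selected entry can also be assigned.

```cpp
#include <cassert>
#include <cstddef>

// Toy forward type standing in for a CoDiPack type (illustration only).
template <typename Inner>
struct Dual {
  using Real = Inner;           // type of the primal part
  using GradientValue = Inner;  // type of the derivative part
  Inner v{};
  Inner g{};
  Inner& value() { return v; }
  Inner& gradient() { return g; }
};

// Binomial coefficient C(n, l); returns 0 for l > n.
inline std::size_t binomial(std::size_t n, std::size_t l) {
  if (l > n) return 0;
  std::size_t result = 1;
  for (std::size_t j = 1; j <= l; ++j) {
    result = result * (n - l + j) / j;
  }
  return result;
}

template <typename Real, std::size_t n>
struct DerivativeSelector {
  // Selects derivative k (1-based) of order l from an n times nested type.
  static double& select(Real& value, std::size_t l, std::size_t k) {
    assert(k >= 1 && k <= binomial(n, l));
    const std::size_t upperDerivatives = binomial(n - 1, l);
    if (k <= upperDerivatives) {
      // Upper branch: continue in the primal part.
      return DerivativeSelector<typename Real::Real, n - 1>::select(
          value.value(), l, k);
    } else {
      // Lower branch: continue in the derivative part.
      return DerivativeSelector<typename Real::GradientValue, n - 1>::select(
          value.gradient(), l - 1, k - upperDerivatives);
    }
  }
};

// End of the recursion: no AD application is left, the value is a plain double.
template <typename Real>
struct DerivativeSelector<Real, 0> {
  static double& select(Real& value, std::size_t, std::size_t) {
    return value;
  }
};
```

For a twice nested type, `select(a, 1, 1)` reaches `a.value().gradient()` and `select(a, 1, 2)` reaches `a.gradient().value()`, so the innermost application carries index 1.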

Implementation - Selection algorithm

This is used for several helper functions:

- Select a specific derivative: derivative(value, l, k)
- Set all derivatives of a specific order l: setDerivatives(value, l, derivative)

Example - CoDiPack

Create higher order types:

    typedef codi::RealForwardGen<double> t1s;
    typedef codi::RealForwardGen<t1s> t2s;
    typedef codi::RealForwardGen<t2s> t3s;
    typedef codi::RealForwardGen<t3s> t4s;
    typedef codi::RealForwardGen<t4s> t5s;
    typedef codi::RealForwardGen<t5s> t6s;

    typedef codi::RealReverseGen<t5s> r6s;

Compute 2nd order derivatives:

    {
      typedef codi::DerivativeHelper<t2s> DH;

      t2s aFor = 2.0;
      DH::setDerivatives(aFor, 1, 1.0);

      t2s cFor = func(aFor);

      std::cout << "t0s: " << DH::derivative(cFor, 0, 0) << std::endl;
      std::cout << "t1_1s: " << DH::derivative(cFor, 1, 0) << std::endl;
      std::cout << "t1_2s: " << DH::derivative(cFor, 1, 1) << std::endl;
      std::cout << "t2s: " << DH::derivative(cFor, 2, 0) << std::endl;
    }

Example - CoDiPack

With func defined as:

$$f(x) = 3 x^7$$

The result is:

    t0s: 384
    t1_1s: 1344
    t1_2s: 1344
    t2s: 4032
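These numbers can be checked by hand at $x = 2$ with both first order directions seeded with 1:

$$f(2) = 3 \cdot 2^7 = 384, \qquad f'(x) = 21 x^6 \Rightarrow f'(2) = 21 \cdot 64 = 1344, \qquad f''(x) = 126 x^5 \Rightarrow f''(2) = 126 \cdot 32 = 4032.$$

The two first order entries agree because each is $f'(2)$ multiplied by a seed of 1.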

Example - CoDiPack

Create 6th order derivatives:

With setDerivatives:

    t6s aFor = 2.0;
    DH::setDerivatives(aFor, 1, 1.0);

With derivative:

    t6s aFor = 2.0;
    DH::derivative(aFor, 1, 0) = 1.0;
    DH::derivative(aFor, 1, 1) = 1.0;
    DH::derivative(aFor, 1, 2) = 1.0;
    DH::derivative(aFor, 1, 3) = 1.0;
    DH::derivative(aFor, 1, 4) = 1.0;
    DH::derivative(aFor, 1, 5) = 1.0;

Using the old way:

    t6s aFor = 2.0;
    aFor.value().value().value().value().value().gradient() = 1.0;
    aFor.value().value().value().value().gradient().value() = 1.0;
    aFor.value().value().value().gradient().value().value() = 1.0;
    aFor.value().value().gradient().value().value().value() = 1.0;
    aFor.value().gradient().value().value().value().value() = 1.0;
    aFor.gradient().value().value().value().value().value() = 1.0;

Example - CoDiPack

Get the result:

With derivative:

    t6s cFor = func(aFor);

    std::cout << "t0s: " << cFor << std::endl;
    std::cout << "t6s: " << DH::derivative(cFor, 6, 0) << std::endl;

Using the old way:

    t6s cFor = func(aFor);

    std::cout << "t0s: " << cFor << std::endl;
    std::cout << "t6s: " << cFor.gradient().gradient().gradient()
                                .gradient().gradient().gradient() << std::endl;

Which produces:

    t0s: 384
    t6s: 30240
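The sixth order value follows from differentiating $f(x) = 3x^7$ six times:

$$f^{(6)}(x) = 3 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot x = 15120\, x, \qquad f^{(6)}(2) = 30240.$$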

Example - CoDiPack

Using the reverse mode:

    r6s aRev = 2.0;
    // set all first order directions on the primal value
    DH::setDerivativesForward(aRev, 1, 1.0);
    tape.registerInput(aRev);

    r6s cRev = func(aRev);

    tape.registerOutput(cRev);
    // set all first order directions on the adjoint value
    DH::setDerivativesReverse(cRev, 1, 1.0);

    tape.evaluate();

    std::cout << "r0s: " << cRev << std::endl;
    std::cout << "r6s: " << DH::derivative(aRev, 6, 0) << std::endl;

Uses two specialized functions:

- setDerivativesForward: Sets all derivatives of the forward types (dot values)
- setDerivativesReverse: Sets all derivatives of the reverse types (bar values)

Conclusion & Outlook

Conclusion:

- Higher order notation
- Algorithm for efficient higher order derivatives selection
- Implementation in CoDiPack

Outlook:

- Useful applications: e.g. One-Shot Optimization
- Performance testing
- Taylor implementation
- Provide a multi-index specification

CoDiPack release 1.4

- Generalized types: RealForwardGen, RealReverseGen, etc.
- Removed Float types (use generalized types)
- Higher order derivative helper
  - Select single derivatives by their order and number
  - Set all derivatives of a specific order
  - See tutorials 7, 7.1 and 7.2
- Write/read tapes to and from files
- Management of multiple tapes
- External functions interface changed
  - User functions now have a pointer to the calling tape.

Thank you for your attention!
