
An algorithm for the efficient selection of higher order derivatives

Max Sagebaum and Nicolas R. Gauger

AG Scientific Computing, TU Kaiserslautern

20th AD Workshop, INRIA Sophia-Antipolis

M. Sagebaum Higher Order Derivatives 20-th AD Workshop 1/ 27

Overview

- Higher order notation
- Selection algorithm
- Implementation
- Examples


Motivation - Higher order AD

Let $F : \mathbb{R} \to \mathbb{R}$ be given by:

\[ y = F(x) \]

Then the forward mode of $F$ is:

\[ \dot{y} = \frac{dF}{dx} \dot{x} \]

and the reverse mode of $F$ is:

\[ \bar{x} = \frac{dF}{dx}^T \bar{y} \]
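The forward mode can be illustrated with a small hand-written tangent type. The following is only a sketch: the `Dual` struct and its `operator*` are hypothetical helpers written for this illustration and are not part of CoDiPack. As test function it uses $f(x) = 3 x^7$, the function from the CoDiPack example in these slides.

```cpp
#include <cassert>

// Hypothetical minimal tangent type (illustration only, not CoDiPack):
// a value y together with its directional derivative \dot y.
struct Dual {
  double value;  // primal value
  double dot;    // tangent value
};

// Product rule: (a * b)' = a' * b + a * b'
Dual operator*(const Dual& a, const Dual& b) {
  return {a.value * b.value, a.dot * b.value + a.value * b.dot};
}

// f(x) = 3 * x^7, evaluated on the tangent type.
Dual func(const Dual& x) {
  Dual result{3.0, 0.0};  // the constant 3 has a zero tangent
  for (int i = 0; i < 7; ++i) {
    result = result * x;
  }
  return result;
}
```

Seeding the input with $\dot{x} = 1$, i.e. `func(Dual{2.0, 1.0})`, returns the primal value in `value` and $\frac{dF}{dx}$ at $x = 2$ in `dot`.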


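Motivation - Higher order AD

AD yields a new code that computes:

\[ (y, \dot{y}) = F_t(x, \dot{x}) = \left( F(x), \frac{dF}{dx} \dot{x} \right) \]

and

\[ (y, \bar{x}) = F_a(x, \bar{y}) = \left( F(x), \frac{dF}{dx}^T \bar{y} \right). \]

Apply AD again! What should the new variables, e.g. the derivative of $\dot{x}$, be called?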




Higher order AD notation

Notation taken from Naumann et al. [1] with modifications.

Add the application index to the variable:

\[ \dot{a} = a^{(1)} \]

The second application of AD on $a$ and $a^{(1)}$ then becomes:

\[ a^{(2)} \quad \text{and} \quad \dot{[a^{(1)}]}^{(2)} \]

Multiple indices are merged:

\[ \dot{[a^{(1)}]}^{(2)} = a^{(1,2)} \]

This allows for notations of arbitrarily high order.

[1] U. Naumann. The Art of Differentiating Computer Programs. SIAM, 2012.


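Higher order AD notation

The same is done for the reverse mode. Add the application index to the variable:

\[ \bar{a} = a_{(1)} \]

The second application of AD on $a$ and $a_{(1)}$ then becomes:

\[ a_{(2)} \quad \text{and} \quad \overline{[a_{(1)}]}_{(2)} \]

Multiple indices are merged:

\[ \overline{[a_{(1)}]}_{(2)} = a_{(1,2)} \]

Mixed notations are:

Reverse AD (third application) on the forward AD values $\dot{a}$ and $a^{(1,2)}$:

\[ \dot{a}^{(1)}_{(3)} \quad \text{and} \quad \dot{a}^{(1,2)}_{(3)} \]

Forward AD (third application) on the reverse AD values $\bar{a}$ and $a_{(1,2)}$:

\[ \dot{a}^{(3)}_{(1)} \quad \text{and} \quad \dot{a}^{(3)}_{(1,2)} \]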

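Example - Higher order AD notation

Forward AD on $F_t$:

\[ (y^{(2)}, y^{(1,2)}) = \left( \frac{dF}{dx} x^{(2)},\; \frac{d^2F}{dx^2} x^{(1)} x^{(2)} + \frac{dF}{dx} x^{(1,2)} \right) \]

which can be combined to

\[
\begin{aligned}
(y, y^{(1)}, y^{(2)}, y^{(1,2)}) &= F_{tt}(x, x^{(1)}, x^{(2)}, x^{(1,2)}) \\
&= \left( F(x),\; \frac{dF}{dx} x^{(1)},\; \frac{dF}{dx} x^{(2)},\; \frac{d^2F}{dx^2} x^{(1)} x^{(2)} + \frac{dF}{dx} x^{(1,2)} \right).
\end{aligned}
\]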
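The four outputs of $F_{tt}$ can be reproduced by nesting a tangent type inside itself. The sketch below is an illustration under assumptions: the `Dual` template, `func`, and `evalFtt` are hypothetical names chosen here, not CoDiPack classes. The inner `Dual` carries application index 1, the outer one application index 2.

```cpp
#include <cassert>

// Hypothetical nestable tangent type (illustration only, not CoDiPack).
template <typename T>
struct Dual {
  T value;  // primal part
  T dot;    // tangent part
};

template <typename T>
Dual<T> operator+(const Dual<T>& a, const Dual<T>& b) {
  return Dual<T>{a.value + b.value, a.dot + b.dot};
}

// Product rule applied on every nesting level.
template <typename T>
Dual<T> operator*(const Dual<T>& a, const Dual<T>& b) {
  return Dual<T>{a.value * b.value, a.dot * b.value + a.value * b.dot};
}

// f(x) = 3 * x^7, written with * and + only so it works for any nesting.
template <typename T>
T func(const T& x) {
  T power = x;
  for (int i = 1; i < 7; ++i) {
    power = power * x;  // x^7 after the loop
  }
  return power + power + power;
}

struct SecondOrderResult {
  double y, y1, y2, y12;  // y, y^(1), y^(2), y^(1,2)
};

// Evaluates F_tt(x, x^(1), x^(2), x^(1,2)) for f(x) = 3 * x^7.
SecondOrderResult evalFtt(double x, double x1, double x2, double x12) {
  Dual<Dual<double>> in{{x, x1}, {x2, x12}};
  Dual<Dual<double>> out = func(in);
  return {out.value.value, out.value.dot, out.dot.value, out.dot.dot};
}
```

With $x^{(1)} = x^{(2)} = 1$ and $x^{(1,2)} = 0$ the last component reduces to $\frac{d^2F}{dx^2}$; a non-zero $x^{(1,2)}$ adds the term $\frac{dF}{dx} x^{(1,2)}$.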



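Forward AD on $F_{tt}$:

\[
\begin{aligned}
y^{(3)} &= \frac{dF}{dx} x^{(3)} \\
y^{(1,3)} &= \frac{d^2F}{dx^2} x^{(1)} x^{(3)} + \frac{dF}{dx} x^{(1,3)} \\
y^{(2,3)} &= \frac{d^2F}{dx^2} x^{(2)} x^{(3)} + \frac{dF}{dx} x^{(2,3)} \\
y^{(1,2,3)} &= \frac{d^3F}{dx^3} x^{(1)} x^{(2)} x^{(3)} + \frac{d^2F}{dx^2} x^{(2)} x^{(1,3)} + \frac{d^2F}{dx^2} x^{(1)} x^{(2,3)} + \frac{d^2F}{dx^2} x^{(1,2)} x^{(3)} + \frac{dF}{dx} x^{(1,2,3)}.
\end{aligned}
\]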
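As a short worked case of the last equation, one possible seeding (chosen here for illustration) sets all first order inputs to one and all higher order inputs to zero. Every mixed term then vanishes and only the third derivative remains:

```latex
x^{(1)} = x^{(2)} = x^{(3)} = 1, \qquad
x^{(1,2)} = x^{(1,3)} = x^{(2,3)} = x^{(1,2,3)} = 0
\quad \Longrightarrow \quad
y^{(1,2,3)} = \frac{d^3F}{dx^3}
```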



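Reverse AD on $F_a$:

\[
\begin{aligned}
x_{(2)} &= \frac{d^2F}{dx^2} y_{(1)} x_{(1,2)} + \frac{dF}{dx} y_{(2)} \\
y_{(1,2)} &= \frac{dF}{dx} x_{(1,2)}
\end{aligned}
\]

which can be combined to

\[
\begin{aligned}
(y, x_{(1)}, y_{(1,2)}, x_{(2)}) &= F_{aa}(x, y_{(1)}, x_{(1,2)}, y_{(2)}) \\
&= \left( F(x),\; \frac{dF}{dx}^T y_{(1)},\; \frac{dF}{dx} x_{(1,2)},\; \frac{d^2F}{dx^2} y_{(1)} x_{(1,2)} + \frac{dF}{dx} y_{(2)} \right).
\end{aligned}
\]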
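To see how a pure second derivative is read off from $F_{aa}$, consider the following seeding, which is an illustrative choice and not prescribed by the slides:

```latex
y_{(1)} = 1, \qquad x_{(1,2)} = 1, \qquad y_{(2)} = 0
\quad \Longrightarrow \quad
x_{(2)} = \frac{d^2F}{dx^2}, \qquad
x_{(1)} = \frac{dF}{dx}, \qquad
y_{(1,2)} = \frac{dF}{dx}
```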


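Example - Higher order AD notation

Reverse AD on $F_{aa}$:

\[
\begin{aligned}
y_{(2,3)} &= \frac{dF}{dx}^T x_{(2,3)} \\
y_{(1,3)} &= \frac{d^2F}{dx^2} x_{(1,2)} x_{(2,3)} + \frac{dF}{dx}^T x_{(1,3)} \\
x_{(1,2,3)} &= \frac{d^2F}{dx^2} y_{(1)} x_{(2,3)} + \frac{dF}{dx}^T y_{(1,2,3)} \\
x_{(3)} &= \frac{d^3F}{dx^3}^T y_{(1)} x_{(1,2)} x_{(2,3)} + \frac{d^2F}{dx^2} y_{(1)} x_{(1,3)} + \frac{d^2F}{dx^2} y_{(2)} x_{(2,3)} + \frac{d^2F}{dx^2} x_{(1,2)} y_{(1,2,3)} + \frac{dF}{dx}^T y_{(3)}
\end{aligned}
\]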



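Reverse AD on $F_t$ and forward AD on $F_a$:

\[
\begin{aligned}
(y, y^{(1)}, \dot{x}^{(1)}_{(2)}, x_{(2)}) &= F_{ta}(x, x^{(1)}, \dot{y}^{(1)}_{(2)}, y_{(2)}) \\
&= \left( F(x),\; \frac{dF}{dx} x^{(1)},\; \frac{dF}{dx}^T \dot{y}^{(1)}_{(2)},\; \frac{d^2F}{dx^2} x^{(1)} \dot{y}^{(1)}_{(2)} + \frac{dF}{dx} y_{(2)} \right)
\end{aligned}
\]

\[
\begin{aligned}
(y, y^{(2)}, x_{(1)}, \dot{x}^{(2)}_{(1)}) &= F_{at}(x, x^{(2)}, y_{(1)}, \dot{y}^{(2)}_{(1)}) \\
&= \left( F(x),\; \frac{dF}{dx} x^{(2)},\; \frac{dF}{dx}^T y_{(1)},\; \frac{d^2F}{dx^2} y_{(1)} x^{(2)} + \frac{dF}{dx} \dot{y}^{(2)}_{(1)} \right).
\end{aligned}
\]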



- Higher order AD notation is possible
- Already quite complicated for small orders
- Even more involved for vector valued functions

How to enable the user to access higher order derivatives efficiently?





Selection algorithm

AD can be applied n times on a code. This yields $2^n$ values:
- One primal value (e.g. a)
- $2^n - 1$ derivative values

This can be split into the binomial coefficients:

\[ 2^n = \sum_{l=0}^{n} \binom{n}{l} \]

Each term defines how many derivatives of order l exist:

    1
    1 1
    1 2 1
    1 3 3 1
    1 4 6 4 1
    1 5 10 10 5 1
    1 6 15 20 15 6 1

Create an algorithm where the user can specify:
- The derivative order l
- Which derivative they want, e.g. $1 \le k \le \binom{n}{l}$
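The counting argument can be checked with a few lines of C++. This is a minimal sketch; `binomial` and `derivativesPerOrder` are helper names introduced here for illustration only.

```cpp
#include <cassert>
#include <vector>

// Binomial coefficient C(n, l); returns 0 outside 0 <= l <= n.
long binomial(int n, int l) {
  if (l < 0 || l > n) return 0;
  long result = 1;
  for (int j = 1; j <= l; ++j) {
    result = result * (n - l + j) / j;  // exact: equals C(n - l + j, j)
  }
  return result;
}

// Number of derivatives of each order l = 0, ..., n after n applications
// of AD. Entry 0 counts the single primal value.
std::vector<long> derivativesPerOrder(int n) {
  assert(n >= 0);
  std::vector<long> counts;
  for (int l = 0; l <= n; ++l) {
    counts.push_back(binomial(n, l));
  }
  return counts;
}
```

For n = 4 this yields the row 1 4 6 4 1 of the triangle, whose entries sum to $2^4 = 16$.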




Selection algorithm

Each coefficient can be split into a recursive formulation:

\[ \binom{n}{l} = \binom{n-1}{l} + \binom{n-1}{l-1} \]

The application of AD can be viewed as a graph:

[Figure: binary tree for n = 2. The root node 2 splits into the nodes "1 d" and "1 p", and each of these splits into the leaves "0 d" and "0 p".]

The first term $\binom{n-1}{l}$ describes the number of derivatives in the upper branch p (primal).
The second term $\binom{n-1}{l-1}$ describes the number of derivatives in the lower branch d (derivative).


Selection algorithm

Require: n : The maximum derivative order
Require: l : The derivative order the user wants to select
Require: k : The derivative the user wants to select, $1 \le k \le \binom{n}{l}$

    l_n := l
    k_n := k
    for i = n, ..., 1 do
        if k_i ≤ $\binom{i-1}{l_i}$ then
            select upper branch (select primal part)
            l_{i-1} := l_i
            k_{i-1} := k_i
        else
            select lower branch (select derivative part)
            l_{i-1} := l_i - 1
            k_{i-1} := k_i - $\binom{i-1}{l_i}$
        end if
    end for
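The loop above can be written down directly as a runtime function. The following is a sketch, not the CoDiPack implementation: `selectPath` and `binomial` are hypothetical names, and the function only records which branch is taken at every level instead of accessing a real AD type.

```cpp
#include <cassert>
#include <string>

// Binomial coefficient C(n, l); returns 0 outside 0 <= l <= n.
long binomial(int n, int l) {
  if (l < 0 || l > n) return 0;
  long result = 1;
  for (int j = 1; j <= l; ++j) {
    result = result * (n - l + j) / j;  // exact: equals C(n - l + j, j)
  }
  return result;
}

// Walks from level n down to level 1 and records the branch taken at each
// level: 'p' for the upper (primal) branch, 'd' for the lower (derivative)
// branch. k is 1-based, as in the algorithm on the slide.
std::string selectPath(int n, int l, long k) {
  assert(0 <= l && l <= n);
  assert(1 <= k && k <= binomial(n, l));
  std::string path;
  for (int i = n; i >= 1; --i) {
    long upper = binomial(i - 1, l);  // derivatives of order l in branch p
    if (k <= upper) {
      path += 'p';  // l and k stay unchanged
    } else {
      path += 'd';
      l -= 1;      // one derivative order is consumed
      k -= upper;  // skip the derivatives of the upper branch
    }
  }
  return path;
}
```

For the input n = 4, l = 3, k = 2 the function returns the branch sequence d, p, d, d, read from level 4 down to level 1.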


Example - Selection algorithm

User input: Select the second derivative of third order ($a^{(1,2,4)}$):

    n = 4
    l = 3
    k = 2

Algorithm:

    i = 4:  l_4 = 3, k_4 = 2
            Upper: $\binom{3}{3} = 1$    Lower: $\binom{3}{2} = 3$
            Select lower: l_3 = 3 - 1 = 2, k_3 = 2 - 1 = 1

    i = 3:  l_3 = 2, k_3 = 1
            Upper: $\binom{2}{2} = 1$    Lower: $\binom{2}{1} = 2$
            Select upper: l_2 = 2, k_2 = 1

    i = 2:  l_2 = 2, k_2 = 1
            Upper: $\binom{1}{2} = 0$    Lower: $\binom{1}{1} = 1$
            Select lower: l_1 = 2 - 1 = 1, k_1 = 1 - 0 = 1

    i = 1:  l_1 = 1, k_1 = 1
            Upper: $\binom{0}{1} = 0$    Lower: $\binom{0}{0} = 1$
            Select lower: l_0 = 1 - 1 = 0, k_0 = 1 - 0 = 1

[Figure: binary tree for n = 4. Every node at level i splits into a derivative branch d and a primal branch p at level i - 1. The selected path d, p, d, d is highlighted step by step.]

The derivative branch is taken at the levels i = 4, 2 and 1, which matches the multi-index of $a^{(1,2,4)}$.

What are you selecting?

[Figure: binary tree for n = 3 with a derivative branch d and a primal branch p at every level. Each leaf corresponds to one of the values below.]

    $a$           = (l = 0, k = 1)
    $a^{(1)}$     = (l = 1, k = 1)
    $a^{(2)}$     = (l = 1, k = 2)
    $a^{(3)}$     = (l = 1, k = 3)
    $a^{(1,2)}$   = (l = 2, k = 1)
    $a^{(1,3)}$   = (l = 2, k = 2)
    $a^{(2,3)}$   = (l = 2, k = 3)
    $a^{(1,2,3)}$ = (l = 3, k = 1)

k accesses the multi-indices in sorted order.
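The mapping from (l, k) to a multi-index can be sketched by collecting the levels at which the selection algorithm takes the derivative branch. The sketch assumes that the derivative branch at level i corresponds to application index i, which is consistent with the table above; `multiIndex` and `binomial` are hypothetical helper names.

```cpp
#include <cassert>
#include <vector>

// Binomial coefficient C(n, l); returns 0 outside 0 <= l <= n.
long binomial(int n, int l) {
  if (l < 0 || l > n) return 0;
  long result = 1;
  for (int j = 1; j <= l; ++j) {
    result = result * (n - l + j) / j;  // exact: equals C(n - l + j, j)
  }
  return result;
}

// Returns the multi-index (ascending application indices) selected by the
// derivative order l and the 1-based number k for n applications of AD.
std::vector<int> multiIndex(int n, int l, long k) {
  assert(0 <= l && l <= n);
  assert(1 <= k && k <= binomial(n, l));
  std::vector<int> index;
  for (int i = n; i >= 1; --i) {
    long upper = binomial(i - 1, l);
    if (k > upper) {
      // Derivative branch at level i: application index i is part of the
      // multi-index. Levels are visited in descending order, so prepend.
      index.insert(index.begin(), i);
      l -= 1;
      k -= upper;
    }
  }
  return index;
}
```

An empty result stands for the primal value a, which is selected by l = 0 and k = 1.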


Implementation - Selection algorithm

- Strongly simplified
- Type safe implementation makes it quite involved

    template<typename Real, size_t n>
    struct DerivativeSelector {

      static double select(Real& value, int l, int k) {
        // Number of derivatives of order l in the upper (primal) branch.
        size_t upperDerivatives = binomial(n - 1, l);

        if(k <= upperDerivatives) {
          // Upper branch: descend into the primal part.
          return DerivativeSelector<typename Real::Real, n - 1>
              ::select(value.value(), l, k);
        } else {
          // Lower branch: descend into the derivative part.
          return DerivativeSelector<typename Real::GradientValue, n - 1>
              ::select(value.gradient(), l - 1, k - upperDerivatives);
        }
      }

    };
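The simplified template above omits the end of the recursion. The following self-contained sketch shows one way to close it with a specialization for n = 0. It is an assumption-laden illustration: the `Dual` type is a hypothetical stand-in that only mimics the `value()` / `gradient()` interface and the `Real` / `GradientValue` typedefs used on the slide; it is not a CoDiPack class, and k is taken as 1-based as in the algorithm.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a nestable forward AD type (not CoDiPack).
template <typename T>
struct Dual {
  typedef T Real;           // type of the primal part
  typedef T GradientValue;  // type of the derivative part
  T primal;
  T tangent;
  T& value() { return primal; }
  T& gradient() { return tangent; }
};

// Binomial coefficient C(n, l); returns 0 outside 0 <= l <= n.
long binomial(int n, int l) {
  if (l < 0 || l > n) return 0;
  long result = 1;
  for (int j = 1; j <= l; ++j) {
    result = result * (n - l + j) / j;
  }
  return result;
}

// General case: peel off one nesting level per recursion step.
template <typename Real, std::size_t n>
struct DerivativeSelector {
  static double& select(Real& value, int l, long k) {
    long upper = binomial(static_cast<int>(n) - 1, l);
    if (k <= upper) {
      return DerivativeSelector<typename Real::Real, n - 1>::select(
          value.value(), l, k);
    } else {
      return DerivativeSelector<typename Real::GradientValue, n - 1>::select(
          value.gradient(), l - 1, k - upper);
    }
  }
};

// Base case: no nesting level left, the value itself is the result.
template <typename Real>
struct DerivativeSelector<Real, 0> {
  static double& select(Real& value, int, long) { return value; }
};
```

Returning a reference lets the same selector serve for reading and for seeding, for example `DerivativeSelector<Dual<Dual<double> >, 2>::select(a, 1, 1) = 1.0;`.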


Implementation - Selection algorithm

This is used for several helper functions:
- Select a specific derivative: derivative(value, l, k)
- Set all derivatives of a specific order l: setDerivatives(value, l, derivative)


Example - CoDiPack

Create higher order types:

    typedef codi::RealForwardGen<double> t1s;
    typedef codi::RealForwardGen<t1s> t2s;
    typedef codi::RealForwardGen<t2s> t3s;
    typedef codi::RealForwardGen<t3s> t4s;
    typedef codi::RealForwardGen<t4s> t5s;
    typedef codi::RealForwardGen<t5s> t6s;

    typedef codi::RealReverseGen<t5s> r6s;

Compute 2nd order derivatives:

    {
      typedef codi::DerivativeHelper<t2s> DH;

      t2s aFor = 2.0;
      DH::setDerivatives(aFor, 1, 1.0);

      t2s cFor = func(aFor);

      cout << "t0s: " << DH::derivative(cFor, 0, 0) << std::endl;
      cout << "t1_1s: " << DH::derivative(cFor, 1, 0) << std::endl;
      cout << "t1_2s: " << DH::derivative(cFor, 1, 1) << std::endl;
      cout << "t2s: " << DH::derivative(cFor, 2, 0) << std::endl;
    }


Example - CoDiPack

With func defined as:

\[ f(x) = 3 x^7 \]

The result is:

    t0s: 384
    t1_1s: 1344
    t1_2s: 1344
    t2s: 4032
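These numbers can be verified by hand. Differentiating $f(x) = 3 x^7$ analytically and evaluating at the input value $x = 2$ from the example gives:

```latex
\begin{aligned}
f(2)   &= 3 \cdot 2^7 = 384 \\
f'(x)  &= 21 x^6, \qquad f'(2) = 21 \cdot 64 = 1344 \\
f''(x) &= 126 x^5, \qquad f''(2) = 126 \cdot 32 = 4032
\end{aligned}
```

Both first order entries agree because each of them is seeded with the same direction 1.0.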


Example - CoDiPack

Create 6th order derivatives:

With setDerivatives:

    t6s aFor = 2.0;
    DH::setDerivatives(aFor, 1, 1.0);

With derivative:

    t6s aFor = 2.0;
    DH::derivative(aFor, 1, 0) = 1.0;
    DH::derivative(aFor, 1, 1) = 1.0;
    DH::derivative(aFor, 1, 2) = 1.0;
    DH::derivative(aFor, 1, 3) = 1.0;
    DH::derivative(aFor, 1, 4) = 1.0;
    DH::derivative(aFor, 1, 5) = 1.0;

Using the old way:

    t6s aFor = 2.0;
    aFor.value().value().value().value().value().gradient() = 1.0;
    aFor.value().value().value().value().gradient().value() = 1.0;
    aFor.value().value().value().gradient().value().value() = 1.0;
    aFor.value().value().gradient().value().value().value() = 1.0;
    aFor.value().gradient().value().value().value().value() = 1.0;
    aFor.gradient().value().value().value().value().value() = 1.0;


Example - CoDiPack

Get the result:

With derivative:

    t6s cFor = func(aFor);

    cout << "t0s: " << cFor << std::endl;
    cout << "t6s: " << DH::derivative(cFor, 6, 0) << std::endl;

Using the old way:

    t6s cFor = func(aFor);

    cout << "t0s: " << cFor << std::endl;
    cout << "t6s: " << cFor.gradient().gradient().gradient()
                          .gradient().gradient().gradient() << std::endl;

Which produces:

    t0s: 384
    t6s: 30240
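The sixth order value can again be checked analytically for $f(x) = 3 x^7$ at $x = 2$:

```latex
f^{(6)}(x) = 3 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot x = 15120\, x,
\qquad
f^{(6)}(2) = 30240
```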


Example - CoDiPack

Using the reverse mode:

    r6s aRev = 2.0;
    // set all first order directions on the primal value
    DH::setDerivativesForward(aRev, 1, 1.0);
    tape.registerInput(aRev);

    r6s cRev = func(aRev);

    tape.registerOutput(cRev);
    // set all first order directions on the adjoint value
    DH::setDerivativesReverse(cRev, 1, 1.0);

    tape.evaluate();

    cout << "r0s: " << cRev << std::endl;
    cout << "r6s: " << DH::derivative(aRev, 6, 0) << std::endl;

Uses two specialized functions:
- setDerivativesForward: Sets all derivatives of the forward types (dot values)
- setDerivativesReverse: Sets all derivatives of the reverse types (bar values)


Conclusion & Outlook

Conclusion:
- Higher order notation
- Algorithm for efficient higher order derivatives selection
- Implementation in CoDiPack

Outlook:
- Useful applications: e.g. One-Shot Optimization
- Performance testing
- Taylor implementation
- Provide a multi-index specification


CoDiPack Release 1.4

- Generalized types: RealForwardGen, RealReverseGen, etc.
- Removed Float types (use generalized types)
- Higher order derivative helper
  - Select single derivatives by their order and number
  - Set all derivatives of a specific order
  - See tutorials 7, 7.1 and 7.2
- Write/read tapes to and from files
- Management of multiple tapes
- External functions interface changed
  - User functions now have a pointer to the calling tape.

Thank you for your attention!

