An algorithm for the efficient selection of higher order derivatives
Max Sagebaum and Nicolas R. Gauger
AG Scientific Computing, TU Kaiserslautern
20th AD Workshop, INRIA Sophia-Antipolis
M. Sagebaum Higher Order Derivatives 20-th AD Workshop 1/ 27
Overview
  Higher order notation
  Selection algorithm
  Implementation
  Examples

Motivation - Higher order AD
Let $F : \mathbb{R} \to \mathbb{R}$ be given by:
\[ y = F(x) \]
Then the forward mode of F is:
\[ \dot{y} = \frac{dF}{dx} \dot{x} \]
and the reverse mode of F is:
\[ \bar{x} = \frac{dF}{dx}^T \bar{y} \]
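
Motivation - Higher order AD
AD yields a new code that computes:
\[ (y, \dot{y}) = F_t(x, \dot{x}) = \left( F(x), \frac{dF}{dx} \dot{x} \right) \]
and
\[ (y, \bar{x}) = F_a(x, \bar{y}) = \left( F(x), \frac{dF}{dx}^T \bar{y} \right). \]
Apply AD again! What should the new variables, e.g. $\dot{\bar{x}}$, be called?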

Higher order AD notation
Notation taken from Naumann [1], with modifications.
Add the application index to the variable:
\[ \dot{a} = a^{(1)} \]
The second application of AD to $a$ and $a^{(1)}$ then yields:
\[ a^{(2)} \quad \text{and} \quad [a^{(1)}]^{(2)} \]
Multiple indices are merged:
\[ [a^{(1)}]^{(2)} = a^{(1,2)} \]
This allows notation for arbitrarily high orders.
[1] U. Naumann. The Art of Differentiating Computer Programs. SIAM, 2012.

Higher order AD notation
The same is done for the reverse mode. Add the application index to the variable:
\[ \bar{a} = a_{(1)} \]
The second application of AD to $a$ and $a_{(1)}$ then yields:
\[ a_{(2)} \quad \text{and} \quad [a_{(1)}]_{(2)} \]
Multiple indices are merged:
\[ [a_{(1)}]_{(2)} = a_{(1,2)} \]

Higher order AD notation
Mixed notations are:
Reverse AD (third application) on the forward AD variables $\dot{a}$ and $a^{(1,2)}$:
\[ a^{(1)}_{(3)} \quad \text{and} \quad a^{(1,2)}_{(3)} \]
Forward AD (third application) on the reverse AD variables $\bar{a}$ and $a_{(1,2)}$:
\[ a^{(3)}_{(1)} \quad \text{and} \quad a^{(3)}_{(1,2)} \]
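
Example - Higher order AD notation
Forward AD on $F_t$:
\[ (y^{(2)}, y^{(1,2)}) = \left( \frac{dF}{dx} x^{(2)}, \; \frac{d^2F}{dx^2} x^{(1)} x^{(2)} + \frac{dF}{dx} x^{(1,2)} \right) \]
which can be combined to
\[ (y, y^{(1)}, y^{(2)}, y^{(1,2)}) = F_{tt}(x, x^{(1)}, x^{(2)}, x^{(1,2)}) = \left( F(x), \; \frac{dF}{dx} x^{(1)}, \; \frac{dF}{dx} x^{(2)}, \; \frac{d^2F}{dx^2} x^{(1)} x^{(2)} + \frac{dF}{dx} x^{(1,2)} \right). \]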

Example - Higher order AD notation
Forward AD on $F_{tt}$:
\[ y^{(3)} = \frac{dF}{dx} x^{(3)} \]
\[ y^{(1,3)} = \frac{d^2F}{dx^2} x^{(1)} x^{(3)} + \frac{dF}{dx} x^{(1,3)} \]
\[ y^{(2,3)} = \frac{d^2F}{dx^2} x^{(2)} x^{(3)} + \frac{dF}{dx} x^{(2,3)} \]
\[ y^{(1,2,3)} = \frac{d^3F}{dx^3} x^{(1)} x^{(2)} x^{(3)} + \frac{d^2F}{dx^2} x^{(2)} x^{(1,3)} + \frac{d^2F}{dx^2} x^{(1)} x^{(2,3)} + \frac{d^2F}{dx^2} x^{(1,2)} x^{(3)} + \frac{dF}{dx} x^{(1,2,3)}. \]
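
Example - Higher order AD notation
Reverse AD on $F_a$:
\[ x_{(2)} = \frac{d^2F}{dx^2} y_{(1)} x_{(1,2)} + \frac{dF}{dx} y_{(2)} \]
\[ y_{(1,2)} = \frac{dF}{dx} x_{(1,2)} \]
which can be combined to
\[ (y, x_{(1)}, y_{(1,2)}, x_{(2)}) = F_{aa}(x, y_{(1)}, x_{(1,2)}, y_{(2)}) = \left( F(x), \; \frac{dF}{dx}^T y_{(1)}, \; \frac{dF}{dx} x_{(1,2)}, \; \frac{d^2F}{dx^2} y_{(1)} x_{(1,2)} + \frac{dF}{dx} y_{(2)} \right). \]
Since $F : \mathbb{R} \to \mathbb{R}$, the transpose has no effect on the values here.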

Example - Higher order AD notation
Reverse AD on $F_{aa}$:
\[ y_{(2,3)} = \frac{dF}{dx}^T x_{(2,3)} \]
\[ y_{(1,3)} = \frac{d^2F}{dx^2} x_{(1,2)} x_{(2,3)} + \frac{dF}{dx}^T x_{(1,3)} \]
\[ x_{(1,2,3)} = \frac{d^2F}{dx^2} y_{(1)} x_{(2,3)} + \frac{dF}{dx}^T y_{(1,2,3)} \]
\[ x_{(3)} = \frac{d^3F}{dx^3}^T y_{(1)} x_{(1,2)} x_{(2,3)} + \frac{d^2F}{dx^2} y_{(1)} x_{(1,3)} + \frac{d^2F}{dx^2} y_{(2)} x_{(2,3)} + \frac{d^2F}{dx^2} x_{(1,2)} y_{(1,2,3)} + \frac{dF}{dx}^T y_{(3)} \]

Example - Higher order AD notation
Reverse AD on $F_t$ and forward AD on $F_a$:
\[ (y, y^{(1)}, x^{(1)}_{(2)}, x_{(2)}) = F_{ta}(x, x^{(1)}, y^{(1)}_{(2)}, y_{(2)}) = \left( F(x), \; \frac{dF}{dx} x^{(1)}, \; \frac{dF}{dx}^T y^{(1)}_{(2)}, \; \frac{d^2F}{dx^2} x^{(1)} y^{(1)}_{(2)} + \frac{dF}{dx} y_{(2)} \right) \]
\[ (y, y^{(2)}, x_{(1)}, x^{(2)}_{(1)}) = F_{at}(x, x^{(2)}, y_{(1)}, y^{(2)}_{(1)}) = \left( F(x), \; \frac{dF}{dx} x^{(2)}, \; \frac{dF}{dx}^T y_{(1)}, \; \frac{d^2F}{dx^2} y_{(1)} x^{(2)} + \frac{dF}{dx} y^{(2)}_{(1)} \right). \]

Higher order AD notation
  Higher order AD notation is possible
  Already quite complicated for small orders
  Even more involved for vector-valued functions
How can the user be enabled to access higher order derivatives efficiently?

Selection algorithm
AD can be applied n times to a code.
This yields $2^n$ values:
  One primal value (e.g. a)
  $2^n - 1$ derivative values
This can be split into the binomial coefficients:
\[ 2^n = \sum_{l=0}^{n} \binom{n}{l} \]
Each term defines how many derivatives of order l exist:
              1
             1 1
            1 2 1
           1 3 3 1
          1 4 6 4 1
        1 5 10 10 5 1
      1 6 15 20 15 6 1
Create an algorithm where the user can specify:
  The derivative order l
  Which derivative of that order is wanted, e.g. $1 \le k \le \binom{n}{l}$

Selection algorithm
Each coefficient can be split into a recursive formulation:
\[ \binom{n}{l} = \binom{n-1}{l} + \binom{n-1}{l-1} \]
The application of AD can be viewed as a graph:
[Figure: binary tree for n = 2. Each node at level i has a p (primal) child and a d (derivative) child at level i - 1, down to level 0.]
The first term $\binom{n-1}{l}$ describes the number of derivatives in the upper branch p (primal).
The second term $\binom{n-1}{l-1}$ describes the number of derivatives in the lower branch d (derivative).

Selection algorithm
Require: n: the maximum derivative order
Require: l: the derivative order the user wants to select
Require: k: the derivative the user wants to select, $1 \le k \le \binom{n}{l}$
$l_n := l$
$k_n := k$
for i = n, ..., 1 do
    if $k_i \le \binom{i-1}{l_i}$ then
        select upper branch (select primal part)
        $l_{i-1} := l_i$
        $k_{i-1} := k_i$
    else
        select lower branch (select derivative part)
        $l_{i-1} := l_i - 1$
        $k_{i-1} := k_i - \binom{i-1}{l_i}$
    end if
end for

Example - Selection algorithm
User input: select the second derivative of third order ($a^{(1,2,4)}$):
  n = 4, l = 3, k = 2
Algorithm, i = 4:
  $l_4 = 3$, $k_4 = 2$
  Upper: $\binom{3}{3} = 1$
  Lower: $\binom{3}{2} = 3$
  Select lower: $l_3 = 3 - 1 = 2$, $k_3 = 2 - 1 = 1$
[Figure: binary selection tree for n = 4 with p (primal) and d (derivative) branches at every level; the branch chosen by the algorithm is marked step by step.]
Algorithm, i = 3:
  $l_3 = 2$, $k_3 = 1$
  Upper: $\binom{2}{2} = 1$
  Lower: $\binom{2}{1} = 2$
  Select upper: $l_2 = 2$, $k_2 = 1$
Algorithm, i = 2:
  $l_2 = 2$, $k_2 = 1$
  Upper: $\binom{1}{2} = 0$
  Lower: $\binom{1}{1} = 1$
  Select lower: $l_1 = 2 - 1 = 1$, $k_1 = 1 - 0 = 1$
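Algorithm, i = 1:
  $l_1 = 1$, $k_1 = 1$
  Upper: $\binom{0}{1} = 0$
  Lower: $\binom{0}{0} = 1$
  Select lower: $l_0 = 1 - 1 = 0$, $k_0 = 1 - 0 = 1$
The derivative branch is taken at levels 4, 2 and 1, which selects $a^{(1,2,4)}$.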

Example - Selection algorithm
What are you selecting?
[Figure: binary selection tree for n = 3 with p (primal) and d (derivative) branches.]
  $a$ = (l = 0, k = 1)
  $a^{(1)}$ = (l = 1, k = 1)
  $a^{(2)}$ = (l = 1, k = 2)
  $a^{(3)}$ = (l = 1, k = 3)
  $a^{(1,2)}$ = (l = 2, k = 1)
  $a^{(1,3)}$ = (l = 2, k = 2)
  $a^{(2,3)}$ = (l = 2, k = 3)
  $a^{(1,2,3)}$ = (l = 3, k = 1)
k accesses the multi-indices in sorted order.

Implementation - Selection algorithm
Strongly simplified; a type-safe implementation makes it quite involved.

    template<typename Real, size_t n>
    struct DerivativeSelector {
      static double select(Real& value, int l, int k) {
        // Number of derivatives of order l in the upper (primal) branch.
        size_t upperDerivatives = binomial(n - 1, l);

        if(k <= upperDerivatives) {
          // Upper branch: continue in the primal part.
          return DerivativeSelector<typename Real::Real, n - 1>
              ::select(value.value(), l, k);
        } else {
          // Lower branch: continue in the derivative part.
          return DerivativeSelector<typename Real::GradientValue, n - 1>
              ::select(value.gradient(), l - 1, k - upperDerivatives);
        }
      }
    };

Implementation - Selection algorithm
This is used for several helper functions:
  Select a specific derivative:
    derivative(value, l, k)
  Set all derivatives of a specific order l:
    setDerivatives(value, l, derivative)

Example - CoDiPack
Create higher order types:

    typedef codi::RealForwardGen<double> t1s;
    typedef codi::RealForwardGen<t1s> t2s;
    typedef codi::RealForwardGen<t2s> t3s;
    typedef codi::RealForwardGen<t3s> t4s;
    typedef codi::RealForwardGen<t4s> t5s;
    typedef codi::RealForwardGen<t5s> t6s;

    typedef codi::RealReverseGen<t5s> r6s;

Compute 2nd order derivatives:

    {
      typedef codi::DerivativeHelper<t2s> DH;

      t2s aFor = 2.0;
      DH::setDerivatives(aFor, 1, 1.0);

      t2s cFor = func(aFor);

      std::cout << "t0s: " << DH::derivative(cFor, 0, 0) << std::endl;
      std::cout << "t1_1s: " << DH::derivative(cFor, 1, 0) << std::endl;
      std::cout << "t1_2s: " << DH::derivative(cFor, 1, 1) << std::endl;
      std::cout << "t2s: " << DH::derivative(cFor, 2, 0) << std::endl;
    }

Example - CoDiPack
With func defined as:
\[ f(x) = 3 x^7 \]
The result is:

    t0s: 384
    t1_1s: 1344
    t1_2s: 1344
    t2s: 4032

Example - CoDiPack
Create 6th order derivatives:

With setDerivatives:

    t6s aFor = 2.0;
    DH::setDerivatives(aFor, 1, 1.0);

With derivative:

    t6s aFor = 2.0;
    DH::derivative(aFor, 1, 0) = 1.0;
    DH::derivative(aFor, 1, 1) = 1.0;
    DH::derivative(aFor, 1, 2) = 1.0;
    DH::derivative(aFor, 1, 3) = 1.0;
    DH::derivative(aFor, 1, 4) = 1.0;
    DH::derivative(aFor, 1, 5) = 1.0;

Using the old way:

    t6s aFor = 2.0;
    aFor.value().value().value().value().value().gradient() = 1.0;
    aFor.value().value().value().value().gradient().value() = 1.0;
    aFor.value().value().value().gradient().value().value() = 1.0;
    aFor.value().value().gradient().value().value().value() = 1.0;
    aFor.value().gradient().value().value().value().value() = 1.0;
    aFor.gradient().value().value().value().value().value() = 1.0;

Example - CoDiPack
Get the result:

With derivative:

    t6s cFor = func(aFor);

    std::cout << "t0s: " << cFor << std::endl;
    std::cout << "t6s: " << DH::derivative(cFor, 6, 0) << std::endl;

Using the old way:

    t6s cFor = func(aFor);

    std::cout << "t0s: " << cFor << std::endl;
    std::cout << "t6s: " << cFor.gradient().gradient().gradient()
                                .gradient().gradient().gradient() << std::endl;

Which produces:

    t0s: 384
    t6s: 30240

Example - CoDiPack
Using the reverse mode:

    r6s aRev = 2.0;
    // set all first order directions on the primal value
    DH::setDerivativesForward(aRev, 1, 1.0);
    tape.registerInput(aRev);

    r6s cRev = func(aRev);

    tape.registerOutput(cRev);
    // set all first order directions on the adjoint value
    DH::setDerivativesReverse(cRev, 1, 1.0);

    tape.evaluate();

    std::cout << "r0s: " << cRev << std::endl;
    std::cout << "r6s: " << DH::derivative(aRev, 6, 0) << std::endl;

Uses two specialized functions:
  setDerivativesForward: sets all derivatives of the forward types (dot values)
  setDerivativesReverse: sets all derivatives of the reverse types (bar values)

Conclusion & Outlook
Conclusion:
  Higher order notation
  Algorithm for efficient higher order derivative selection
  Implementation in CoDiPack
Outlook:
  Useful applications, e.g. One-Shot Optimization
  Performance testing
  Taylor implementation
  Provide a multi-index specification

CoDiPack release 1.4
  Generalized types: RealForwardGen, RealReverseGen, etc.
  Removed Float types (use generalized types)
  Higher order derivative helper
    Select single derivatives by their order and number
    Set all derivatives of a specific order
    See tutorials 7, 7.1 and 7.2
  Write/read tapes to and from files
  Management of multiple tapes
  External functions interface changed
    User functions now have a pointer to the calling tape.

Thank you for your attention!