Upload
ngokiet
View
245
Download
0
Embed Size (px)
Citation preview
High Order Reverse Mode of ADTheory and Implementation
Mu Wang and Alex Pothen
Department of Computer SciencePurdue University
September 30, 2016
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 1 / 1
Research Overview
I Second order reverse mode :I More efficient in evaluating Hessian in both complexity and memory
usage in many applications.I Proved to be equivalent to an variance of vertex elimination on the
computational graph of the gradient
I High order reverse mode :I High order reverse mode : evaluating derivative tensor ∇d f up to any
order in reverse modeI Implementation : ReverseAD
I Applications:I Uncertainty quantificationI Chemistry : exchange-correlation (XC) energy functional
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 2 / 1
Research Overview
I Second order reverse mode :I More efficient in evaluating Hessian in both complexity and memory
usage in many applications.I Proved to be equivalent to an variance of vertex elimination on the
computational graph of the gradient
I High order reverse mode :I High order reverse mode : evaluating derivative tensor ∇d f up to any
order in reverse modeI Implementation : ReverseAD
I Applications:I Uncertainty quantificationI Chemistry : exchange-correlation (XC) energy functional
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 2 / 1
Research Overview
I Second order reverse mode :I More efficient in evaluating Hessian in both complexity and memory
usage in many applications.I Proved to be equivalent to an variance of vertex elimination on the
computational graph of the gradient
I High order reverse mode :I High order reverse mode : evaluating derivative tensor ∇d f up to any
order in reverse modeI Implementation : ReverseAD
I Applications:I Uncertainty quantificationI Chemistry : exchange-correlation (XC) energy functional
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 2 / 1
Research Overview
I Second order reverse mode :I More efficient in evaluating Hessian in both complexity and memory
usage in many applications.I Proved to be equivalent to an variance of vertex elimination on the
computational graph of the gradient
I High order reverse mode :I High order reverse mode : evaluating derivative tensor ∇d f up to any
order in reverse modeI Implementation : ReverseAD
I Applications:I Uncertainty quantificationI Chemistry : exchange-correlation (XC) energy functional
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 2 / 1
Research Overview
I Second order reverse mode :I More efficient in evaluating Hessian in both complexity and memory
usage in many applications.I Proved to be equivalent to an variance of vertex elimination on the
computational graph of the gradient
I High order reverse mode : ← (this talk)I High order reverse mode : evaluating derivative tensor ∇d f up to any
order in reverse modeI Implementation : ReverseAD
I Applications:I Uncertainty quantificationI Chemistry : exchange-correlation (XC) energy functional
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 2 / 1
Background
I For a scalar objective function f : Rn → RI First order AD:
I Forward : [F1 ◦ f ](x, x) =∑i
∂f∂xi
xi = ∇f T · x
I Reverse : [R1 ◦ f ](x) = ( ∂f∂x1
, · · · , ∂f∂x1
) = ∇f
I Second order AD:I (Pure) Forward : [F2 ◦ f ](x, x) = 1
2 xT · ∇2f · x
I Mixed : [R1 ◦ F1 ◦ f ](x, x) = ∇2f · xI (Pure) Reverse : [R2 ◦ f ](x) = ∇2f
I High order AD:I (Pure) Forward : High order taylor coefficientsI (Pure) Reverse : High order reverse modeI Mixed modes then can be generated
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 3 / 1
Background
I For a scalar objective function f : Rn → RI First order AD:
I Forward : [F1 ◦ f ](x, x) =∑i
∂f∂xi
xi = ∇f T · x
I Reverse : [R1 ◦ f ](x) = ( ∂f∂x1
, · · · , ∂f∂x1
) = ∇f
I Second order AD:I (Pure) Forward : [F2 ◦ f ](x, x) = 1
2 xT · ∇2f · x
I Mixed : [R1 ◦ F1 ◦ f ](x, x) = ∇2f · xI (Pure) Reverse : [R2 ◦ f ](x) = ∇2f
I High order AD:I (Pure) Forward : High order taylor coefficientsI (Pure) Reverse : High order reverse modeI Mixed modes then can be generated
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 3 / 1
Background
I For a scalar objective function f : Rn → RI First order AD:
I Forward : [F1 ◦ f ](x, x) =∑i
∂f∂xi
xi = ∇f T · x
I Reverse : [R1 ◦ f ](x) = ( ∂f∂x1
, · · · , ∂f∂x1
) = ∇f
I Second order AD:I (Pure) Forward : [F2 ◦ f ](x, x) = 1
2 xT · ∇2f · x
I Mixed : [R1 ◦ F1 ◦ f ](x, x) = ∇2f · xI (Pure) Reverse : [R2 ◦ f ](x) = ∇2f
I High order AD:I (Pure) Forward : High order taylor coefficientsI (Pure) Reverse : High order reverse modeI Mixed modes then can be generated
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 3 / 1
Background
I For a scalar objective function f : Rn → RI First order AD:
I Forward : [F1 ◦ f ](x, x) =∑i
∂f∂xi
xi = ∇f T · x
I Reverse : [R1 ◦ f ](x) = ( ∂f∂x1
, · · · , ∂f∂x1
) = ∇f
I Second order AD:I (Pure) Forward : [F2 ◦ f ](x, x) = 1
2 xT · ∇2f · x
I Mixed : [R1 ◦ F1 ◦ f ](x, x) = ∇2f · xI (Pure) Reverse : [R2 ◦ f ](x) = ∇2f
I High order AD:I (Pure) Forward : High order taylor coefficientsI (Pure) Reverse : High order reverse modeI Mixed modes then can be generated
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 3 / 1
Background
I For a scalar objective function f : Rn → RI First order AD:
I Forward : [F1 ◦ f ](x, x) =∑i
∂f∂xi
xi = ∇f T · x
I Reverse : [R1 ◦ f ](x) = ( ∂f∂x1
, · · · , ∂f∂x1
) = ∇f
I Second order AD:I (Pure) Forward : [F2 ◦ f ](x, x) = 1
2 xT · ∇2f · x
I Mixed : [R1 ◦ F1 ◦ f ](x, x) = ∇2f · xI (Pure) Reverse : [R2 ◦ f ](x) = ∇2f
I High order AD:I (Pure) Forward : High order taylor coefficientsI (Pure) Reverse : High order reverse modeI Mixed modes then can be generated
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 3 / 1
Background
I For a scalar objective function f : Rn → RI First order AD:
I Forward : [F1 ◦ f ](x, x) =∑i
∂f∂xi
xi = ∇f T · x
I Reverse : [R1 ◦ f ](x) = ( ∂f∂x1
, · · · , ∂f∂x1
) = ∇f
I Second order AD:I (Pure) Forward : [F2 ◦ f ](x, x) = 1
2 xT · ∇2f · x
I Mixed : [R1 ◦ F1 ◦ f ](x, x) = ∇2f · xI (Pure) Reverse : [R2 ◦ f ](x) = ∇2f
I High order AD:I (Pure) Forward : High order taylor coefficientsI (Pure) Reverse : High order reverse modeI Mixed modes then can be generated
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 3 / 1
Background
I For a scalar objective function f : Rn → RI First order AD:
I Forward : [F1 ◦ f ](x, x) =∑i
∂f∂xi
xi = ∇f T · x
I Reverse : [R1 ◦ f ](x) = ( ∂f∂x1
, · · · , ∂f∂x1
) = ∇f
I Second order AD:I (Pure) Forward : [F2 ◦ f ](x, x) = 1
2 xT · ∇2f · x
I Mixed : [R1 ◦ F1 ◦ f ](x, x) = ∇2f · xI (Pure) Reverse : [R2 ◦ f ](x) = ∇2f
I High order AD:I (Pure) Forward : High order taylor coefficientsI (Pure) Reverse : High order reverse modeI Mixed modes then can be generated
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 3 / 1
High Order Forward Mode
I Accumulate high order taylor coefficients1 :I ∇d f : d-th order derivative tensor (symmetric).I ∇d f · x : A tensor-vector product, (d − 1)-th order symmetric tensor
I...
I
[[[∇d f · x] · x
]· · ·]· x : A scalar, the d-th order taylor coefficients.
1Griewank, Andreas, Jean Utke, and Andrea Walther. ”Evaluating higher derivativetensors by forward propagation of univariate Taylor series.” Mathematics ofComputation of the American Mathematical Society 69.231 (2000): 1117-1130.
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 4 / 1
High Order Forward Mode
I Accumulate high order taylor coefficients1 :I ∇d f : d-th order derivative tensor (symmetric).I ∇d f · x : A tensor-vector product, (d − 1)-th order symmetric tensor
I...
I
[[[∇d f · x] · x
]· · ·]· x : A scalar, the d-th order taylor coefficients.
I d = 2:I [F2 ◦ f ](x, x) = 1
2 xT · ∇2f · x
I [∇2f ]ij = [F2 ◦ f ](x, ei + ej)− [F2 ◦ f ](x, ei )− [F2 ◦ f ](x, ej)
1Griewank, Andreas, Jean Utke, and Andrea Walther. ”Evaluating higher derivativetensors by forward propagation of univariate Taylor series.” Mathematics ofComputation of the American Mathematical Society 69.231 (2000): 1117-1130.
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 4 / 1
High Order Forward Mode
I Accumulate high order taylor coefficients1 :I ∇d f : d-th order derivative tensor (symmetric).I ∇d f · x : A tensor-vector product, (d − 1)-th order symmetric tensor
I...
I
[[[∇d f · x] · x
]· · ·]· x : A scalar, the d-th order taylor coefficients.
I General case:I [Fd ◦ f ](x, x) = 1
d!
[[[∇d f · x] · x
]· · ·]· x
I [∇d f ]i1···id : a linear combination of
{[Fd ◦ f ](x, e) : e ∈ Span{ei1 , · · · , eid}}}
1Griewank, Andreas, Jean Utke, and Andrea Walther. ”Evaluating higher derivativetensors by forward propagation of univariate Taylor series.” Mathematics ofComputation of the American Mathematical Society 69.231 (2000): 1117-1130.
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 4 / 1
High Order Forward Mode
I Accumulate high order taylor coefficients1 :I ∇d f : d-th order derivative tensor (symmetric).I ∇d f · x : A tensor-vector product, (d − 1)-th order symmetric tensor
I...
I
[[[∇d f · x] · x
]· · ·]· x : A scalar, the d-th order taylor coefficients.
I General case:I [Fd ◦ f ](x, x) = 1
d!
[[[∇d f · x] · x
]· · ·]· x
I [∇d f ]i1···id : a linear combination of
{[Fd ◦ f ](x, e) : e ∈ Span{ei1 , · · · , eid}}}I Complexity : O(
((n+d−1)d
)· l)
1Griewank, Andreas, Jean Utke, and Andrea Walther. ”Evaluating higher derivativetensors by forward propagation of univariate Taylor series.” Mathematics ofComputation of the American Mathematical Society 69.231 (2000): 1117-1130.
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 4 / 1
Reverse Mode Revisit
Definition
After process the SAC vi = ϕi (vj){vj :vj≺vi} in reverse mode,the process SACs define an equivalent function fi (Si ).The objective function is the composition of fi and theremaining SACs and Si is the current live variable set.
Observation
reverse mode computes the derivatives of fi (Si ) in eachstep by following the order chain rule.
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 5 / 1
Reverse Mode Revisit
Definition
After process the SAC vi = ϕi (vj){vj :vj≺vi} in reverse mode,the process SACs define an equivalent function fi (Si ).The objective function is the composition of fi and theremaining SACs and Si is the current live variable set.
Observation
Second order reverse mode computes the first and thesecond order derivatives of fi (Si ) in each step by followingthe first and second order chain rule.
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 5 / 1
Reverse Mode Revisit
Definition
After process the SAC vi = ϕi (vj){vj :vj≺vi} in reverse mode,the process SACs define an equivalent function fi (Si ).The objective function is the composition of fi and theremaining SACs and Si is the current live variable set.
Observation
Second order High order reverse mode computes the firstand the second order derivatives up to order d of fi (Si ) ineach step by following the first and second high order chainrule.
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 5 / 1
High Order Chain Rule
Observation
High order reverse mode computes the derivatives up toorder d of fi in each step by following the high order chainrule.
I When process vi = ϕi (vj){vj :vj≺vi}:
I Si = Si+1 \ {vi} ∪ {vj : vj ≺ vi}I fi (Si ) = fi+1(Si+1 \ {vi}, vi = ϕi (vj){vj :vj≺vi})
I High order chain rule:
derivatives of fi+1(Si+1) → derivatives of fi (Si )
I General case of Faa di Bruno equationI Special case of the equation in Ma, 20092
2Ma, Tsoy-Wo. ”Higher chain formula proved by combinatorics.” the electronicjournal of combinatorics 16.1 (2009): N21.
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 6 / 1
High Order Chain Rule
Observation
High order reverse mode computes the derivatives up toorder d of fi in each step by following the high order chainrule.
I When process vi = ϕi (vj){vj :vj≺vi}:
I Si = Si+1 \ {vi} ∪ {vj : vj ≺ vi}I fi (Si ) = fi+1(Si+1 \ {vi}, vi = ϕi (vj){vj :vj≺vi})
I High order chain rule:
derivatives of fi+1(Si+1) → derivatives of fi (Si )
I General case of Faa di Bruno equationI Special case of the equation in Ma, 20092
2Ma, Tsoy-Wo. ”Higher chain formula proved by combinatorics.” the electronicjournal of combinatorics 16.1 (2009): N21.
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 6 / 1
High Order Chain Rule
Multiset
A multiset D is a generalization of the notion of a set inwhich members are allowed to appear more than once.
We use DS to represent the family of all multisets over S.That is: DS = {D : D = {e1, e2, · · · , ed}, ei ∈ S , 1 ≤ i ≤ d}
Derivative Mapping
For a function f (S), its order d derivative tensor can berepresented as a mapping from D ∈ DS , |D| = d to R as:
Tf (D) =∂|D|f
∂D=
∂|D|f
∂vi1∂vi2 · · · ∂vi|D|
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 7 / 1
High Order Chain Rule
Multiset
A multiset D is a generalization of the notion of a set inwhich members are allowed to appear more than once.
We use DS to represent the family of all multisets over S.That is: DS = {D : D = {e1, e2, · · · , ed}, ei ∈ S , 1 ≤ i ≤ d}
Derivative Mapping
For a function f (S), its order d derivative tensor can berepresented as a mapping from D ∈ DS , |D| = d to R as:
Tf (D) =∂|D|f
∂D=
∂|D|f
∂vi1∂vi2 · · · ∂vi|D|
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 7 / 1
High Order Chain Rule
Multiset
A multiset D is a generalization of the notion of a set inwhich members are allowed to appear more than once.
We use DS to represent the family of all multisets over S.That is: DS = {D : D = {e1, e2, · · · , ed}, ei ∈ S , 1 ≤ i ≤ d}
Derivative Mapping
For a function f (S), its order d derivative tensor can berepresented as a mapping from D ∈ DS , |D| = d to R as:
Tf (D) =∂|D|f
∂D=
∂|D|f
∂vi1∂vi2 · · · ∂vi|D|
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 7 / 1
High Order Chain Rule
Tfi (D) = Tfi+1(D) +
∑DL*D
[ ∑Di∩Dj=∅
D1∪···∪Dr=D\DL
∂ϕi
∂D1· · · ∂ϕi
∂DrTfi+1
(DL ∪ {v ri })]
I DL,D1, · · · ,Dr is a partition of D.I First order : D = {v}
I ai (v) = ai+1(v) + ∂ϕi
∂v ai+1(vi )
I ∂ϕi
∂v ai+1(vi ) : DL = ∅, D1 = {v}
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 8 / 1
High Order Chain Rule
Tfi (D) = Tfi+1(D) +
∑DL*D
[ ∑Di∩Dj=∅
D1∪···∪Dr=D\DL
∂ϕi
∂D1· · · ∂ϕi
∂DrTfi+1
(DL ∪ {v ri })]
I DL,D1, · · · ,Dr is a partition of D.I First order : D = {v}
I ai (v) = ai+1(v) + ∂ϕi
∂v ai+1(vi )
I ∂ϕi
∂v ai+1(vi ) : DL = ∅, D1 = {v}
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 8 / 1
High Order Chain Rule
Tfi (D) = Tfi+1(D) +
∑DL*D
[ ∑Di∩Dj=∅
D1∪···∪Dr=D\DL
∂ϕi
∂D1· · · ∂ϕi
∂DrTfi+1
(DL ∪ {v ri })]
I DL,D1, · · · ,Dr is a partition of D.I First order : D = {v}
I ai (v) = ai+1(v) + ∂ϕi
∂v ai+1(vi )
I ∂ϕi
∂v ai+1(vi ) : DL = ∅, D1 = {v}
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 8 / 1
High Order Chain Rule
Tfi (D) = Tfi+1(D) +
∑DL*D
[ ∑Di∩Dj=∅
D1∪···∪Dr=D\DL
∂ϕi
∂D1· · · ∂ϕi
∂DrTfi+1
(DL ∪ {v ri })]
I DL,D1, · · · ,Dr is a partition of D.I First order : D = {v}
I ai (v) = ai+1(v) + ∂ϕi
∂v ai+1(vi )
I ∂ϕi
∂v ai+1(vi ) : DL = ∅, D1 = {v}
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 8 / 1
High Order Chain Rule
Tfi (D) = Tfi+1(D) +
∑DL*D
[ ∑Di∩Dj=∅
D1∪···∪Dr=D\DL
∂ϕi
∂D1· · · ∂ϕi
∂DrTfi+1
(DL ∪ {v ri })]
I DL,D1, · · · ,Dr is a partition of D.I First order : D = {v}
I ai (v) = ai+1(v) + ∂ϕi
∂v ai+1(vi )
I ∂ϕi
∂v ai+1(vi ) : DL = ∅, D1 = {v}
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 8 / 1
High Order Chain Rule
Tfi (D) = Tfi+1(D) +
∑DL*D
[ ∑Di∩Dj=∅
D1∪···∪Dr=D\DL
∂ϕi
∂D1· · · ∂ϕi
∂DrTfi+1
(DL ∪ {v ri })]
I DL,D1, · · · ,Dr is a partition of D.
I Second order : D = {v , u}
hi (v , u) = hi+1(v , u) +∂ϕi
∂vhi+1(vi , u) +
∂ϕi
∂uhi+1(v , vi )
+∂ϕi
∂v
∂ϕi
∂uhi+1(vi , vi ) +
∂2ϕi
∂v∂uai+1(vi )
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 9 / 1
High Order Chain Rule
Tfi (D) = Tfi+1(D) +
∑DL*D
[ ∑Di∩Dj=∅
D1∪···∪Dr=D\DL
∂ϕi
∂D1· · · ∂ϕi
∂DrTfi+1
(DL ∪ {v ri })]
I DL,D1, · · · ,Dr is a partition of D.
I Second order : D = {v , u}
hi (v , u) = hi+1(v , u) +∂ϕi
∂vhi+1(vi , u) +
∂ϕi
∂uhi+1(v , vi )
+∂ϕi
∂v
∂ϕi
∂uhi+1(vi , vi ) +
∂2ϕi
∂v∂uai+1(vi )
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 9 / 1
High Order Chain Rule
Tfi (D) = Tfi+1(D) +
∑DL*D
[ ∑Di∩Dj=∅
D1∪···∪Dr=D\DL
∂ϕi
∂D1· · · ∂ϕi
∂DrTfi+1
(DL ∪ {v ri })]
I DL,D1, · · · ,Dr is a partition of D.
I Second order : D = {v , u}
hi (v , u) = hi+1(v , u) +∂ϕi
∂vhi+1(vi , u) +
∂ϕi
∂uhi+1(v , vi )
+∂ϕi
∂v
∂ϕi
∂uhi+1(vi , vi ) +
∂2ϕi
∂v∂uai+1(vi )
I ∂ϕi
∂v hi+1(vi , u) : DL = {u}, D1 = {v}
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 9 / 1
High Order Chain Rule
Tfi (D) = Tfi+1(D) +
∑DL*D
[ ∑Di∩Dj=∅
D1∪···∪Dr=D\DL
∂ϕi
∂D1· · · ∂ϕi
∂DrTfi+1
(DL ∪ {v ri })]
I DL,D1, · · · ,Dr is a partition of D.
I Second order : D = {v , u}
hi (v , u) = hi+1(v , u) +∂ϕi
∂vhi+1(vi , u) +
∂ϕi
∂uhi+1(v , vi )
+∂ϕi
∂v
∂ϕi
∂uhi+1(vi , vi ) +
∂2ϕi
∂v∂uai+1(vi )
I ∂ϕi
∂u hi+1(v , vi ) : DL = {v}, D1 = {u}
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 9 / 1
High Order Chain Rule
Tfi (D) = Tfi+1(D) +
∑DL*D
[ ∑Di∩Dj=∅
D1∪···∪Dr=D\DL
∂ϕi
∂D1· · · ∂ϕi
∂DrTfi+1
(DL ∪ {v ri })]
I DL,D1, · · · ,Dr is a partition of D.
I Second order : D = {v , u}
hi (v , u) = hi+1(v , u) +∂ϕi
∂vhi+1(vi , u) +
∂ϕi
∂uhi+1(v , vi )
+∂ϕi
∂v
∂ϕi
∂uhi+1(vi , vi ) +
∂2ϕi
∂v∂uai+1(vi )
I ∂ϕi
∂v∂ϕi
∂u hi+1(vi , vi ) : DL = ∅, D1 = {v}, D2 = {u}
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 9 / 1
High Order Chain Rule
Tfi (D) = Tfi+1(D) +
∑DL*D
[ ∑Di∩Dj=∅
D1∪···∪Dr=D\DL
∂ϕi
∂D1· · · ∂ϕi
∂DrTfi+1
(DL ∪ {v ri })]
I DL,D1, · · · ,Dr is a partition of D.
I Second order : D = {v , u}
hi (v , u) = hi+1(v , u) +∂ϕi
∂vhi+1(vi , u) +
∂ϕi
∂uhi+1(v , vi )
+∂ϕi
∂v
∂ϕi
∂uhi+1(vi , vi ) +
∂2ϕi
∂v∂uai+1(vi )
I ∂2ϕi
∂v∂u ai+1(vi ) : DL = ∅, D1 = {v , u}
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 9 / 1
High Order Reverse Mode : Complexity
Tfi (D) = Tfi+1(D) +
∑DL*D
[ ∑Di∩Dj=∅
D1∪···∪Dr=D\DL
∂ϕi
∂D1· · · ∂ϕi
∂DrTfi+1
(DL ∪ {v ri })]
I DL,D1, · · · ,Dr is a partition of D.
I Bd+1 summations : (d + 1) th Bell number.
I O(Bd+1 · sd−1i ) updates for each SAC.
I Overall complexity : O(Bd+1 · sd−1 · l), s = max{si}I When d = 1 : O(l) Baur-Strassen theorem.
I When d = 2 : O(l · s) second order reverse mode
I When d = 3 : O(l · s2) third order reverse mode
I...
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 10 / 1
High Order Reverse Mode : Complexity
Tfi (D) = Tfi+1(D) +
∑DL*D
[ ∑Di∩Dj=∅
D1∪···∪Dr=D\DL
∂ϕi
∂D1· · · ∂ϕi
∂DrTfi+1
(DL ∪ {v ri })]
I DL,D1, · · · ,Dr is a partition of D.
I Bd+1 summations : (d + 1) th Bell number.
I O(Bd+1 · sd−1i ) updates for each SAC.
I Overall complexity : O(Bd+1 · sd−1 · l), s = max{si}I When d = 1 : O(l) Baur-Strassen theorem.
I When d = 2 : O(l · s) second order reverse mode
I When d = 3 : O(l · s2) third order reverse mode
I...
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 10 / 1
High Order Reverse Mode : Complexity
Tfi (D) = Tfi+1(D) +
∑DL*D
[ ∑Di∩Dj=∅
D1∪···∪Dr=D\DL
∂ϕi
∂D1· · · ∂ϕi
∂DrTfi+1
(DL ∪ {v ri })]
I DL,D1, · · · ,Dr is a partition of D.
I Bd+1 summations : (d + 1) th Bell number.
I O(Bd+1 · sd−1i ) updates for each SAC.
I Overall complexity : O(Bd+1 · sd−1 · l), s = max{si}I When d = 1 : O(l) Baur-Strassen theorem.
I When d = 2 : O(l · s) second order reverse mode
I When d = 3 : O(l · s2) third order reverse mode
I...
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 10 / 1
High Order Reverse Mode : Complexity
Tfi (D) = Tfi+1(D) +
∑DL*D
[ ∑Di∩Dj=∅
D1∪···∪Dr=D\DL
∂ϕi
∂D1· · · ∂ϕi
∂DrTfi+1
(DL ∪ {v ri })]
I DL,D1, · · · ,Dr is a partition of D.
I Bd+1 summations : (d + 1) th Bell number.
I O(Bd+1 · sd−1i ) updates for each SAC.
I Overall complexity : O(Bd+1 · sd−1 · l), s = max{si}I When d = 1 : O(l) Baur-Strassen theorem.
I When d = 2 : O(l · s) second order reverse mode
I When d = 3 : O(l · s2) third order reverse mode
I...
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 10 / 1
High Order Reverse Mode : Complexity
Tfi (D) = Tfi+1(D) +
∑DL*D
[ ∑Di∩Dj=∅
D1∪···∪Dr=D\DL
∂ϕi
∂D1· · · ∂ϕi
∂DrTfi+1
(DL ∪ {v ri })]
I DL,D1, · · · ,Dr is a partition of D.
I Bd+1 summations : (d + 1) th Bell number.
I O(Bd+1 · sd−1i ) updates for each SAC.
I Overall complexity : O(Bd+1 · sd−1 · l), s = max{si}I When d = 1 : O(l) Baur-Strassen theorem.
I When d = 2 : O(l · s) second order reverse mode
I When d = 3 : O(l · s2) third order reverse mode
I...
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 10 / 1
High Order Reverse Mode : Complexity
Tfi (D) = Tfi+1(D) +
∑DL*D
[ ∑Di∩Dj=∅
D1∪···∪Dr=D\DL
∂ϕi
∂D1· · · ∂ϕi
∂DrTfi+1
(DL ∪ {v ri })]
I DL,D1, · · · ,Dr is a partition of D.
I Bd+1 summations : (d + 1) th Bell number.
I O(Bd+1 · sd−1i ) updates for each SAC.
I Overall complexity : O(Bd+1 · sd−1 · l), s = max{si}I When d = 1 : O(l) Baur-Strassen theorem.
I When d = 2 : O(l · s) second order reverse mode
I When d = 3 : O(l · s2) third order reverse mode
I...
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 10 / 1
High Order Reverse Mode : Complexity
Tfi (D) = Tfi+1(D) +
∑DL*D
[ ∑Di∩Dj=∅
D1∪···∪Dr=D\DL
∂ϕi
∂D1· · · ∂ϕi
∂DrTfi+1
(DL ∪ {v ri })]
I DL,D1, · · · ,Dr is a partition of D.
I Bd+1 summations : (d + 1) th Bell number.
I O(Bd+1 · sd−1i ) updates for each SAC.
I Overall complexity : O(Bd+1 · sd−1 · l), s = max{si}I When d = 1 : O(l) Baur-Strassen theorem.
I When d = 2 : O(l · s) second order reverse mode
I When d = 3 : O(l · s2) third order reverse mode
I...
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 10 / 1
High Order Reverse Mode : Complexity
Tfi (D) = Tfi+1(D) +
∑DL*D
[ ∑Di∩Dj=∅
D1∪···∪Dr=D\DL
∂ϕi
∂D1· · · ∂ϕi
∂DrTfi+1
(DL ∪ {v ri })]
I DL,D1, · · · ,Dr is a partition of D.
I Bd+1 summations : (d + 1) th Bell number.
I O(Bd+1 · sd−1i ) updates for each SAC.
I Overall complexity : O(Bd+1 · sd−1 · l), s = max{si}I When d = 1 : O(l) Baur-Strassen theorem.
I When d = 2 : O(l · s) second order reverse mode
I When d = 3 : O(l · s2) third order reverse mode
I...
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 10 / 1
High Order Reverse Mode : Implementation
Tfi (D) = Tfi+1(D) +
∑DL*D
[ ∑Di∩Dj=∅
D1∪···∪Dr=D\DL
∂ϕi
∂D1· · · ∂ϕi
∂DrTfi+1
(DL ∪ {v ri })]
I DL,D1, · · · ,Dr is a partition of D
I Generate all DL, s.t, Tfi+1(DL ∪ {v r
i }) 6= 0 and D1, · · · ,Dr , s.t,∂ϕi∂Di6= 0, 1 ≤ i ≤ r
I Then perform incremental updates on D = DL ∪ D1 ∪ · · · ∪ Dr
I More than one way to partition D into DL,D1, · · · ,Dr .I SymCoeff (DL,D1, · · · ,Dr ) : Multiplicity that partition D into
DL,D1, · · · ,Dr .I Flat code for pre-computed symmetric coefficientsI ∼ 5k lines of generated code for up to sixth order
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 11 / 1
High Order Reverse Mode : Implementation
Tfi (D) = Tfi+1(D) +
∑DL*D
[ ∑Di∩Dj=∅
D1∪···∪Dr=D\DL
∂ϕi
∂D1· · · ∂ϕi
∂DrTfi+1
(DL ∪ {v ri })]
I DL,D1, · · · ,Dr is a partition of D
I Generate all DL, s.t, Tfi+1(DL ∪ {v r
i }) 6= 0 and D1, · · · ,Dr , s.t,∂ϕi∂Di6= 0, 1 ≤ i ≤ r
I Then perform incremental updates on D = DL ∪ D1 ∪ · · · ∪ Dr
I More than one way to partition D into DL,D1, · · · ,Dr .I SymCoeff (DL,D1, · · · ,Dr ) : Multiplicity that partition D into
DL,D1, · · · ,Dr .I Flat code for pre-computed symmetric coefficientsI ∼ 5k lines of generated code for up to sixth order
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 11 / 1
High Order Reverse Mode : Implementation
Tfi (D) = Tfi+1(D) +
∑DL*D
[ ∑Di∩Dj=∅
D1∪···∪Dr=D\DL
∂ϕi
∂D1· · · ∂ϕi
∂DrTfi+1
(DL ∪ {v ri })]
I DL,D1, · · · ,Dr is a partition of D
I Generate all DL, s.t, Tfi+1(DL ∪ {v r
i }) 6= 0 and D1, · · · ,Dr , s.t,∂ϕi∂Di6= 0, 1 ≤ i ≤ r
I Then perform incremental updates on D = DL ∪ D1 ∪ · · · ∪ Dr
I More than one way to partition D into DL,D1, · · · ,Dr .I SymCoeff (DL,D1, · · · ,Dr ) : Multiplicity that partition D into
DL,D1, · · · ,Dr .I Flat code for pre-computed symmetric coefficientsI ∼ 5k lines of generated code for up to sixth order
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 11 / 1
High Order Reverse Mode : Implementation
Tfi (D) = Tfi+1(D) +
∑DL*D
[ ∑Di∩Dj=∅
D1∪···∪Dr=D\DL
∂ϕi
∂D1· · · ∂ϕi
∂DrTfi+1
(DL ∪ {v ri })]
I DL,D1, · · · ,Dr is a partition of D
I Generate all DL, s.t, Tfi+1(DL ∪ {v r
i }) 6= 0 and D1, · · · ,Dr , s.t,∂ϕi∂Di6= 0, 1 ≤ i ≤ r
I Then perform incremental updates on D = DL ∪ D1 ∪ · · · ∪ Dr
I More than one way to partition D into DL,D1, · · · ,Dr .I SymCoeff (DL,D1, · · · ,Dr ) : Multiplicity that partition D into
DL,D1, · · · ,Dr .I Flat code for pre-computed symmetric coefficientsI ∼ 5k lines of generated code for up to sixth order
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 11 / 1
High Order Reverse Mode : Implementation
Tfi (D) = Tfi+1(D) +
∑DL*D
[ ∑Di∩Dj=∅
D1∪···∪Dr=D\DL
∂ϕi
∂D1· · · ∂ϕi
∂DrTfi+1
(DL ∪ {v ri })]
I DL,D1, · · · ,Dr is a partition of D
I Generate all DL, s.t, Tfi+1(DL ∪ {v r
i }) 6= 0 and D1, · · · ,Dr , s.t,∂ϕi∂Di6= 0, 1 ≤ i ≤ r
I Then perform incremental updates on D = DL ∪ D1 ∪ · · · ∪ Dr
I More than one way to partition D into DL,D1, · · · ,Dr .I SymCoeff (DL,D1, · · · ,Dr ) : Multiplicity that partition D into
DL,D1, · · · ,Dr .I Flat code for pre-computed symmetric coefficientsI ∼ 5k lines of generated code for up to sixth order
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 11 / 1
High Order Reverse Mode : Implementation
Tfi (D) = Tfi+1(D) +
∑DL*D
[ ∑Di∩Dj=∅
D1∪···∪Dr=D\DL
∂ϕi
∂D1· · · ∂ϕi
∂DrTfi+1
(DL ∪ {v ri })]
I DL,D1, · · · ,Dr is a partition of D
I Generate all DL, s.t, Tfi+1(DL ∪ {v r
i }) 6= 0 and D1, · · · ,Dr , s.t,∂ϕi∂Di6= 0, 1 ≤ i ≤ r
I Then perform incremental updates on D = DL ∪ D1 ∪ · · · ∪ Dr
I More than one way to partition D into DL,D1, · · · ,Dr .I SymCoeff (DL,D1, · · · ,Dr ) : Multiplicity that partition D into
DL,D1, · · · ,Dr .I Flat code for pre-computed symmetric coefficientsI ∼ 5k lines of generated code for up to sixth order
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 11 / 1
High Order Reverse Mode : Implementation
Tfi (D) = Tfi+1(D) +
∑DL*D
[ ∑Di∩Dj=∅
D1∪···∪Dr=D\DL
∂ϕi
∂D1· · · ∂ϕi
∂DrTfi+1
(DL ∪ {v ri })]
I DL,D1, · · · ,Dr is a partition of D
I Generate all DL, s.t, Tfi+1(DL ∪ {v r
i }) 6= 0 and D1, · · · ,Dr , s.t,∂ϕi∂Di6= 0, 1 ≤ i ≤ r
I Then perform incremental updates on D = DL ∪ D1 ∪ · · · ∪ Dr
I More than one way to partition D into DL,D1, · · · ,Dr .I SymCoeff (DL,D1, · · · ,Dr ) : Multiplicity that partition D into
DL,D1, · · · ,Dr .I Flat code for pre-computed symmetric coefficientsI ∼ 5k lines of generated code for up to sixth order
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 11 / 1
High Order Reverse Mode : Implementation
Tfi (D) = Tfi+1(D) +
∑DL*D
[ ∑Di∩Dj=∅
D1∪···∪Dr=D\DL
∂ϕi
∂D1· · · ∂ϕi
∂DrTfi+1
(DL ∪ {v ri })]
I ReverseAD : an operator overloading implementation of the highorder reverse mode in C++11.
I Available at https://github.com/wangmu0701/ReverseAD.
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 11 / 1
High Order Reverse Mode : Implementation
Tfi (D) = Tfi+1(D) +
∑DL*D
[ ∑Di∩Dj=∅
D1∪···∪Dr=D\DL
∂ϕi
∂D1· · · ∂ϕi
∂DrTfi+1
(DL ∪ {v ri })]
I ReverseAD : an operator overloading implementation of the highorder reverse mode in C++11.
I Available at https://github.com/wangmu0701/ReverseAD.
I Monotonic indexing for variables on the tracevj ≺ vi =⇒ index(vj) < index(vi )
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 11 / 1
Performance : Synthetic Function
I A synthetic function designed with parameters:I n : number of independent variablesI s : size of live variables during the function evaluationI l : the complexity of the functionI Dense derivatives
ID(z) =
√z ∗ z ,
2.0 + z − 2.0,
z ∗ 2.0 ∗ 0.5,
log(exp(z)),
1.0/(1.0/z),
sin(asin(z)).
y =
√s∏
i=1
ti ,
ti = IDk ◦ · · · ◦ ID1(zi ),zi = t,
t =n∑
i=1
xi .
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 12 / 1
Performance : Synthetic Function
I A synthetic function designed with parameters:I n : number of independent variablesI s : size of live variables during the function evaluationI l : the complexity of the functionI Dense derivatives
ID(z) =
√z ∗ z ,
2.0 + z − 2.0,
z ∗ 2.0 ∗ 0.5,
log(exp(z)),
1.0/(1.0/z),
sin(asin(z)).
y =
√s∏
i=1
ti ,
ti = IDk ◦ · · · ◦ ID1(zi ),zi = t,
t =n∑
i=1
xi .
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 12 / 1
Performance : Synthetic Function
I Fixed l , let n and s change simultaneously
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 13 / 1
Performance : Synthetic Function
I Fixed l , let n and s change simultaneously
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 13 / 1
Performance : Synthetic Function
I Fixed l and n, changed s
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 14 / 1
Performance : Synthetic Function
I Fixed l and n, changed s
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 14 / 1
Performance : Synthetic Function
I Fixed l and s, changed n
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 15 / 1
Performance : Synthetic Function
I Fixed l and s, changed n
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 15 / 1
Application : XCFUN (on going)
I Arbitrary order Exchange-Correlation functional library
I https://github.com/dftlibs/xcfun
I Using libtaylor to evaluate derivatives of functionals
I Up to third order in current implementation
I Small number of independents : 20 at most
I Not so-complex functionals
I On a collection of functionals:
I Third order Libtaylor : 81ms
I Third order ReverseAD : 20ms
I Fourth order ReverseAD : 83ms
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 16 / 1
Application : XCFUN (on going)
I Arbitrary order Exchange-Correlation functional library
I https://github.com/dftlibs/xcfun
I Using libtaylor to evaluate derivatives of functionals
I Up to third order in current implementation
I Small number of independents : 20 at most
I Not so-complex functionals
I On a collection of functionals:
I Third order Libtaylor : 81ms
I Third order ReverseAD : 20ms
I Fourth order ReverseAD : 83ms
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 16 / 1
Conclusion and Future Work
I High order derivative tensors could (and probably should) be directlyevaluated via reverse mode.
I A series of algorithms to evaluate derivatives Tf up to order d :
Fd → · · · → F1 ◦ Rd−1 → Rd
I Rd : symmetric derivative tensor ∇d fI F1 ◦ Rd−1 : tensor-vector ∇d f · xI Fd :
[[[∇d f · x] · x
]· · ·]· x
I The structural (and sparsity) properties of Tf determines the optimalmethod.
I General compression and recovery using F1 ◦ Rd−1.
perfectly parallelizable
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 17 / 1
Conclusion and Future Work
I High order derivative tensors could (and probably should) be directlyevaluated via reverse mode.
I A series of algorithms to evaluate derivatives Tf up to order d :
Fd → · · · → F1 ◦ Rd−1 → Rd
I Rd : symmetric derivative tensor ∇d fI F1 ◦ Rd−1 : tensor-vector ∇d f · xI Fd :
[[[∇d f · x] · x
]· · ·]· x
I The structural (and sparsity) properties of Tf determines the optimalmethod.
I General compression and recovery using F1 ◦ Rd−1.
perfectly parallelizable
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 17 / 1
Conclusion and Future Work
I High order derivative tensors could (and probably should) be directlyevaluated via reverse mode.
I A series of algorithms to evaluate derivatives Tf up to order d :
Fd → · · · → F1 ◦ Rd−1 → Rd
I Rd : symmetric derivative tensor ∇d fI F1 ◦ Rd−1 : tensor-vector ∇d f · xI Fd :
[[[∇d f · x] · x
]· · ·]· x
I The structural (and sparsity) properties of Tf determines the optimalmethod.
I General compression and recovery using F1 ◦ Rd−1.
perfectly parallelizable
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 17 / 1
Conclusion and Future Work
I High order derivative tensors could (and probably should) be directlyevaluated via reverse mode.
I A series of algorithms to evaluate derivatives Tf up to order d :
Fd → · · · → F1 ◦ Rd−1 → Rd
I Rd : symmetric derivative tensor ∇d fI F1 ◦ Rd−1 : tensor-vector ∇d f · xI Fd :
[[[∇d f · x] · x
]· · ·]· x
I The structural (and sparsity) properties of Tf determines the optimalmethod.
I General compression and recovery using F1 ◦ Rd−1.
perfectly parallelizable
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 17 / 1
Conclusion and Future Work
I High order derivative tensors could (and probably should) be directlyevaluated via reverse mode.
I A series of algorithms to evaluate derivatives Tf up to order d :
Fd → · · · → F1 ◦ Rd−1 → Rd
I Rd : symmetric derivative tensor ∇d fI F1 ◦ Rd−1 : tensor-vector ∇d f · xI Fd :
[[[∇d f · x] · x
]· · ·]· x
I The structural (and sparsity) properties of Tf determines the optimalmethod.
I General compression and recovery using F1 ◦ Rd−1.
perfectly parallelizable
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 17 / 1
Conclusion and Future Work
I High order derivative tensors could (and probably should) be directlyevaluated via reverse mode.
I A series of algorithms to evaluate derivatives Tf up to order d :
Fd → · · · → F1 ◦ Rd−1 → Rd
I Rd : symmetric derivative tensor ∇d fI F1 ◦ Rd−1 : tensor-vector ∇d f · xI Fd :
[[[∇d f · x] · x
]· · ·]· x
I The structural (and sparsity) properties of Tf determines the optimalmethod.
I General compression and recovery using F1 ◦ Rd−1.
perfectly parallelizable
Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 17 / 1