69
High Order Reverse Mode of AD Theory and Implementation Mu Wang and Alex Pothen Department of Computer Science Purdue University September 30, 2016 Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 1/1

High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

  • Upload
    ngokiet

  • View
    245

  • Download
    0

Embed Size (px)

Citation preview

Page 1: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Reverse Mode of ADTheory and Implementation

Mu Wang and Alex Pothen

Department of Computer SciencePurdue University

September 30, 2016

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 1 / 1

Page 2: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

Research Overview

I Second order reverse mode :I More efficient in evaluating Hessian in both complexity and memory

usage in many applications.I Proved to be equivalent to an variance of vertex elimination on the

computational graph of the gradient

I High order reverse mode :I High order reverse mode : evaluating derivative tensor ∇d f up to any

order in reverse modeI Implementation : ReverseAD

I Applications:I Uncertainty quantificationI Chemistry : exchange-correlation (XC) energy functional

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 2 / 1

Page 3: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

Research Overview

I Second order reverse mode :I More efficient in evaluating Hessian in both complexity and memory

usage in many applications.I Proved to be equivalent to an variance of vertex elimination on the

computational graph of the gradient

I High order reverse mode :I High order reverse mode : evaluating derivative tensor ∇d f up to any

order in reverse modeI Implementation : ReverseAD

I Applications:I Uncertainty quantificationI Chemistry : exchange-correlation (XC) energy functional

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 2 / 1

Page 4: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

Research Overview

I Second order reverse mode :I More efficient in evaluating Hessian in both complexity and memory

usage in many applications.I Proved to be equivalent to an variance of vertex elimination on the

computational graph of the gradient

I High order reverse mode :I High order reverse mode : evaluating derivative tensor ∇d f up to any

order in reverse modeI Implementation : ReverseAD

I Applications:I Uncertainty quantificationI Chemistry : exchange-correlation (XC) energy functional

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 2 / 1

Page 5: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

Research Overview

I Second order reverse mode :I More efficient in evaluating Hessian in both complexity and memory

usage in many applications.I Proved to be equivalent to an variance of vertex elimination on the

computational graph of the gradient

I High order reverse mode :I High order reverse mode : evaluating derivative tensor ∇d f up to any

order in reverse modeI Implementation : ReverseAD

I Applications:I Uncertainty quantificationI Chemistry : exchange-correlation (XC) energy functional

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 2 / 1

Page 6: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

Research Overview

I Second order reverse mode :I More efficient in evaluating Hessian in both complexity and memory

usage in many applications.I Proved to be equivalent to an variance of vertex elimination on the

computational graph of the gradient

I High order reverse mode : ← (this talk)I High order reverse mode : evaluating derivative tensor ∇d f up to any

order in reverse modeI Implementation : ReverseAD

I Applications:I Uncertainty quantificationI Chemistry : exchange-correlation (XC) energy functional

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 2 / 1

Page 7: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

Background

I For a scalar objective function f : Rn → RI First order AD:

I Forward : [F1 ◦ f ](x, x) =∑i

∂f∂xi

xi = ∇f T · x

I Reverse : [R1 ◦ f ](x) = ( ∂f∂x1

, · · · , ∂f∂x1

) = ∇f

I Second order AD:I (Pure) Forward : [F2 ◦ f ](x, x) = 1

2 xT · ∇2f · x

I Mixed : [R1 ◦ F1 ◦ f ](x, x) = ∇2f · xI (Pure) Reverse : [R2 ◦ f ](x) = ∇2f

I High order AD:I (Pure) Forward : High order taylor coefficientsI (Pure) Reverse : High order reverse modeI Mixed modes then can be generated

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 3 / 1

Page 8: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

Background

I For a scalar objective function f : Rn → RI First order AD:

I Forward : [F1 ◦ f ](x, x) =∑i

∂f∂xi

xi = ∇f T · x

I Reverse : [R1 ◦ f ](x) = ( ∂f∂x1

, · · · , ∂f∂x1

) = ∇f

I Second order AD:I (Pure) Forward : [F2 ◦ f ](x, x) = 1

2 xT · ∇2f · x

I Mixed : [R1 ◦ F1 ◦ f ](x, x) = ∇2f · xI (Pure) Reverse : [R2 ◦ f ](x) = ∇2f

I High order AD:I (Pure) Forward : High order taylor coefficientsI (Pure) Reverse : High order reverse modeI Mixed modes then can be generated

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 3 / 1

Page 9: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

Background

I For a scalar objective function f : Rn → RI First order AD:

I Forward : [F1 ◦ f ](x, x) =∑i

∂f∂xi

xi = ∇f T · x

I Reverse : [R1 ◦ f ](x) = ( ∂f∂x1

, · · · , ∂f∂x1

) = ∇f

I Second order AD:I (Pure) Forward : [F2 ◦ f ](x, x) = 1

2 xT · ∇2f · x

I Mixed : [R1 ◦ F1 ◦ f ](x, x) = ∇2f · xI (Pure) Reverse : [R2 ◦ f ](x) = ∇2f

I High order AD:I (Pure) Forward : High order taylor coefficientsI (Pure) Reverse : High order reverse modeI Mixed modes then can be generated

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 3 / 1

Page 10: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

Background

I For a scalar objective function f : Rn → RI First order AD:

I Forward : [F1 ◦ f ](x, x) =∑i

∂f∂xi

xi = ∇f T · x

I Reverse : [R1 ◦ f ](x) = ( ∂f∂x1

, · · · , ∂f∂x1

) = ∇f

I Second order AD:I (Pure) Forward : [F2 ◦ f ](x, x) = 1

2 xT · ∇2f · x

I Mixed : [R1 ◦ F1 ◦ f ](x, x) = ∇2f · xI (Pure) Reverse : [R2 ◦ f ](x) = ∇2f

I High order AD:I (Pure) Forward : High order taylor coefficientsI (Pure) Reverse : High order reverse modeI Mixed modes then can be generated

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 3 / 1

Page 11: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

Background

I For a scalar objective function f : Rn → RI First order AD:

I Forward : [F1 ◦ f ](x, x) =∑i

∂f∂xi

xi = ∇f T · x

I Reverse : [R1 ◦ f ](x) = ( ∂f∂x1

, · · · , ∂f∂x1

) = ∇f

I Second order AD:I (Pure) Forward : [F2 ◦ f ](x, x) = 1

2 xT · ∇2f · x

I Mixed : [R1 ◦ F1 ◦ f ](x, x) = ∇2f · xI (Pure) Reverse : [R2 ◦ f ](x) = ∇2f

I High order AD:I (Pure) Forward : High order taylor coefficientsI (Pure) Reverse : High order reverse modeI Mixed modes then can be generated

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 3 / 1

Page 12: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

Background

I For a scalar objective function f : Rn → RI First order AD:

I Forward : [F1 ◦ f ](x, x) =∑i

∂f∂xi

xi = ∇f T · x

I Reverse : [R1 ◦ f ](x) = ( ∂f∂x1

, · · · , ∂f∂x1

) = ∇f

I Second order AD:I (Pure) Forward : [F2 ◦ f ](x, x) = 1

2 xT · ∇2f · x

I Mixed : [R1 ◦ F1 ◦ f ](x, x) = ∇2f · xI (Pure) Reverse : [R2 ◦ f ](x) = ∇2f

I High order AD:I (Pure) Forward : High order taylor coefficientsI (Pure) Reverse : High order reverse modeI Mixed modes then can be generated

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 3 / 1

Page 13: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

Background

I For a scalar objective function f : Rn → RI First order AD:

I Forward : [F1 ◦ f ](x, x) =∑i

∂f∂xi

xi = ∇f T · x

I Reverse : [R1 ◦ f ](x) = ( ∂f∂x1

, · · · , ∂f∂x1

) = ∇f

I Second order AD:I (Pure) Forward : [F2 ◦ f ](x, x) = 1

2 xT · ∇2f · x

I Mixed : [R1 ◦ F1 ◦ f ](x, x) = ∇2f · xI (Pure) Reverse : [R2 ◦ f ](x) = ∇2f

I High order AD:I (Pure) Forward : High order taylor coefficientsI (Pure) Reverse : High order reverse modeI Mixed modes then can be generated

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 3 / 1

Page 14: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Forward Mode

I Accumulate high order taylor coefficients1 :I ∇d f : d-th order derivative tensor (symmetric).I ∇d f · x : A tensor-vector product, (d − 1)-th order symmetric tensor

I...

I

[[[∇d f · x] · x

]· · ·]· x : A scalar, the d-th order taylor coefficients.

1Griewank, Andreas, Jean Utke, and Andrea Walther. ”Evaluating higher derivativetensors by forward propagation of univariate Taylor series.” Mathematics ofComputation of the American Mathematical Society 69.231 (2000): 1117-1130.

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 4 / 1

Page 15: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Forward Mode

I Accumulate high order taylor coefficients1 :I ∇d f : d-th order derivative tensor (symmetric).I ∇d f · x : A tensor-vector product, (d − 1)-th order symmetric tensor

I...

I

[[[∇d f · x] · x

]· · ·]· x : A scalar, the d-th order taylor coefficients.

I d = 2:I [F2 ◦ f ](x, x) = 1

2 xT · ∇2f · x

I [∇2f ]ij = [F2 ◦ f ](x, ei + ej)− [F2 ◦ f ](x, ei )− [F2 ◦ f ](x, ej)

1Griewank, Andreas, Jean Utke, and Andrea Walther. ”Evaluating higher derivativetensors by forward propagation of univariate Taylor series.” Mathematics ofComputation of the American Mathematical Society 69.231 (2000): 1117-1130.

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 4 / 1

Page 16: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Forward Mode

I Accumulate high order taylor coefficients1 :I ∇d f : d-th order derivative tensor (symmetric).I ∇d f · x : A tensor-vector product, (d − 1)-th order symmetric tensor

I...

I

[[[∇d f · x] · x

]· · ·]· x : A scalar, the d-th order taylor coefficients.

I General case:I [Fd ◦ f ](x, x) = 1

d!

[[[∇d f · x] · x

]· · ·]· x

I [∇d f ]i1···id : a linear combination of

{[Fd ◦ f ](x, e) : e ∈ Span{ei1 , · · · , eid}}}

1Griewank, Andreas, Jean Utke, and Andrea Walther. ”Evaluating higher derivativetensors by forward propagation of univariate Taylor series.” Mathematics ofComputation of the American Mathematical Society 69.231 (2000): 1117-1130.

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 4 / 1

Page 17: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Forward Mode

I Accumulate high order taylor coefficients1 :I ∇d f : d-th order derivative tensor (symmetric).I ∇d f · x : A tensor-vector product, (d − 1)-th order symmetric tensor

I...

I

[[[∇d f · x] · x

]· · ·]· x : A scalar, the d-th order taylor coefficients.

I General case:I [Fd ◦ f ](x, x) = 1

d!

[[[∇d f · x] · x

]· · ·]· x

I [∇d f ]i1···id : a linear combination of

{[Fd ◦ f ](x, e) : e ∈ Span{ei1 , · · · , eid}}}I Complexity : O(

((n+d−1)d

)· l)

1Griewank, Andreas, Jean Utke, and Andrea Walther. ”Evaluating higher derivativetensors by forward propagation of univariate Taylor series.” Mathematics ofComputation of the American Mathematical Society 69.231 (2000): 1117-1130.

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 4 / 1

Page 18: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

Reverse Mode Revisit

Definition

After process the SAC vi = ϕi (vj){vj :vj≺vi} in reverse mode,the process SACs define an equivalent function fi (Si ).The objective function is the composition of fi and theremaining SACs and Si is the current live variable set.

Observation

reverse mode computes the derivatives of fi (Si ) in eachstep by following the order chain rule.

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 5 / 1

Page 19: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

Reverse Mode Revisit

Definition

After process the SAC vi = ϕi (vj){vj :vj≺vi} in reverse mode,the process SACs define an equivalent function fi (Si ).The objective function is the composition of fi and theremaining SACs and Si is the current live variable set.

Observation

Second order reverse mode computes the first and thesecond order derivatives of fi (Si ) in each step by followingthe first and second order chain rule.

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 5 / 1

Page 20: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

Reverse Mode Revisit

Definition

After process the SAC vi = ϕi (vj){vj :vj≺vi} in reverse mode,the process SACs define an equivalent function fi (Si ).The objective function is the composition of fi and theremaining SACs and Si is the current live variable set.

Observation

Second order High order reverse mode computes the firstand the second order derivatives up to order d of fi (Si ) ineach step by following the first and second high order chainrule.

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 5 / 1

Page 21: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Chain Rule

Observation

High order reverse mode computes the derivatives up toorder d of fi in each step by following the high order chainrule.

I When process vi = ϕi (vj){vj :vj≺vi}:

I Si = Si+1 \ {vi} ∪ {vj : vj ≺ vi}I fi (Si ) = fi+1(Si+1 \ {vi}, vi = ϕi (vj){vj :vj≺vi})

I High order chain rule:

derivatives of fi+1(Si+1) → derivatives of fi (Si )

I General case of Faa di Bruno equationI Special case of the equation in Ma, 20092

2Ma, Tsoy-Wo. ”Higher chain formula proved by combinatorics.” the electronicjournal of combinatorics 16.1 (2009): N21.

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 6 / 1

Page 22: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Chain Rule

Observation

High order reverse mode computes the derivatives up toorder d of fi in each step by following the high order chainrule.

I When process vi = ϕi (vj){vj :vj≺vi}:

I Si = Si+1 \ {vi} ∪ {vj : vj ≺ vi}I fi (Si ) = fi+1(Si+1 \ {vi}, vi = ϕi (vj){vj :vj≺vi})

I High order chain rule:

derivatives of fi+1(Si+1) → derivatives of fi (Si )

I General case of Faa di Bruno equationI Special case of the equation in Ma, 20092

2Ma, Tsoy-Wo. ”Higher chain formula proved by combinatorics.” the electronicjournal of combinatorics 16.1 (2009): N21.

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 6 / 1

Page 23: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Chain Rule

Multiset

A multiset D is a generalization of the notion of a set inwhich members are allowed to appear more than once.

We use DS to represent the family of all multisets over S.That is: DS = {D : D = {e1, e2, · · · , ed}, ei ∈ S , 1 ≤ i ≤ d}

Derivative Mapping

For a function f (S), its order d derivative tensor can berepresented as a mapping from D ∈ DS , |D| = d to R as:

Tf (D) =∂|D|f

∂D=

∂|D|f

∂vi1∂vi2 · · · ∂vi|D|

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 7 / 1

Page 24: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Chain Rule

Multiset

A multiset D is a generalization of the notion of a set inwhich members are allowed to appear more than once.

We use DS to represent the family of all multisets over S.That is: DS = {D : D = {e1, e2, · · · , ed}, ei ∈ S , 1 ≤ i ≤ d}

Derivative Mapping

For a function f (S), its order d derivative tensor can berepresented as a mapping from D ∈ DS , |D| = d to R as:

Tf (D) =∂|D|f

∂D=

∂|D|f

∂vi1∂vi2 · · · ∂vi|D|

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 7 / 1

Page 25: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Chain Rule

Multiset

A multiset D is a generalization of the notion of a set inwhich members are allowed to appear more than once.

We use DS to represent the family of all multisets over S.That is: DS = {D : D = {e1, e2, · · · , ed}, ei ∈ S , 1 ≤ i ≤ d}

Derivative Mapping

For a function f (S), its order d derivative tensor can berepresented as a mapping from D ∈ DS , |D| = d to R as:

Tf (D) =∂|D|f

∂D=

∂|D|f

∂vi1∂vi2 · · · ∂vi|D|

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 7 / 1

Page 26: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Chain Rule

Tfi (D) = Tfi+1(D) +

∑DL*D

[ ∑Di∩Dj=∅

D1∪···∪Dr=D\DL

∂ϕi

∂D1· · · ∂ϕi

∂DrTfi+1

(DL ∪ {v ri })]

I DL,D1, · · · ,Dr is a partition of D.I First order : D = {v}

I ai (v) = ai+1(v) + ∂ϕi

∂v ai+1(vi )

I ∂ϕi

∂v ai+1(vi ) : DL = ∅, D1 = {v}

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 8 / 1

Page 27: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Chain Rule

Tfi (D) = Tfi+1(D) +

∑DL*D

[ ∑Di∩Dj=∅

D1∪···∪Dr=D\DL

∂ϕi

∂D1· · · ∂ϕi

∂DrTfi+1

(DL ∪ {v ri })]

I DL,D1, · · · ,Dr is a partition of D.I First order : D = {v}

I ai (v) = ai+1(v) + ∂ϕi

∂v ai+1(vi )

I ∂ϕi

∂v ai+1(vi ) : DL = ∅, D1 = {v}

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 8 / 1

Page 28: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Chain Rule

Tfi (D) = Tfi+1(D) +

∑DL*D

[ ∑Di∩Dj=∅

D1∪···∪Dr=D\DL

∂ϕi

∂D1· · · ∂ϕi

∂DrTfi+1

(DL ∪ {v ri })]

I DL,D1, · · · ,Dr is a partition of D.I First order : D = {v}

I ai (v) = ai+1(v) + ∂ϕi

∂v ai+1(vi )

I ∂ϕi

∂v ai+1(vi ) : DL = ∅, D1 = {v}

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 8 / 1

Page 29: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Chain Rule

Tfi (D) = Tfi+1(D) +

∑DL*D

[ ∑Di∩Dj=∅

D1∪···∪Dr=D\DL

∂ϕi

∂D1· · · ∂ϕi

∂DrTfi+1

(DL ∪ {v ri })]

I DL,D1, · · · ,Dr is a partition of D.I First order : D = {v}

I ai (v) = ai+1(v) + ∂ϕi

∂v ai+1(vi )

I ∂ϕi

∂v ai+1(vi ) : DL = ∅, D1 = {v}

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 8 / 1

Page 30: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Chain Rule

Tfi (D) = Tfi+1(D) +

∑DL*D

[ ∑Di∩Dj=∅

D1∪···∪Dr=D\DL

∂ϕi

∂D1· · · ∂ϕi

∂DrTfi+1

(DL ∪ {v ri })]

I DL,D1, · · · ,Dr is a partition of D.I First order : D = {v}

I ai (v) = ai+1(v) + ∂ϕi

∂v ai+1(vi )

I ∂ϕi

∂v ai+1(vi ) : DL = ∅, D1 = {v}

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 8 / 1

Page 31: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Chain Rule

Tfi (D) = Tfi+1(D) +

∑DL*D

[ ∑Di∩Dj=∅

D1∪···∪Dr=D\DL

∂ϕi

∂D1· · · ∂ϕi

∂DrTfi+1

(DL ∪ {v ri })]

I DL,D1, · · · ,Dr is a partition of D.

I Second order : D = {v , u}

hi (v , u) = hi+1(v , u) +∂ϕi

∂vhi+1(vi , u) +

∂ϕi

∂uhi+1(v , vi )

+∂ϕi

∂v

∂ϕi

∂uhi+1(vi , vi ) +

∂2ϕi

∂v∂uai+1(vi )

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 9 / 1

Page 32: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Chain Rule

Tfi (D) = Tfi+1(D) +

∑DL*D

[ ∑Di∩Dj=∅

D1∪···∪Dr=D\DL

∂ϕi

∂D1· · · ∂ϕi

∂DrTfi+1

(DL ∪ {v ri })]

I DL,D1, · · · ,Dr is a partition of D.

I Second order : D = {v , u}

hi (v , u) = hi+1(v , u) +∂ϕi

∂vhi+1(vi , u) +

∂ϕi

∂uhi+1(v , vi )

+∂ϕi

∂v

∂ϕi

∂uhi+1(vi , vi ) +

∂2ϕi

∂v∂uai+1(vi )

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 9 / 1

Page 33: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Chain Rule

Tfi (D) = Tfi+1(D) +

∑DL*D

[ ∑Di∩Dj=∅

D1∪···∪Dr=D\DL

∂ϕi

∂D1· · · ∂ϕi

∂DrTfi+1

(DL ∪ {v ri })]

I DL,D1, · · · ,Dr is a partition of D.

I Second order : D = {v , u}

hi (v , u) = hi+1(v , u) +∂ϕi

∂vhi+1(vi , u) +

∂ϕi

∂uhi+1(v , vi )

+∂ϕi

∂v

∂ϕi

∂uhi+1(vi , vi ) +

∂2ϕi

∂v∂uai+1(vi )

I ∂ϕi

∂v hi+1(vi , u) : DL = {u}, D1 = {v}

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 9 / 1

Page 34: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Chain Rule

Tfi (D) = Tfi+1(D) +

∑DL*D

[ ∑Di∩Dj=∅

D1∪···∪Dr=D\DL

∂ϕi

∂D1· · · ∂ϕi

∂DrTfi+1

(DL ∪ {v ri })]

I DL,D1, · · · ,Dr is a partition of D.

I Second order : D = {v , u}

hi (v , u) = hi+1(v , u) +∂ϕi

∂vhi+1(vi , u) +

∂ϕi

∂uhi+1(v , vi )

+∂ϕi

∂v

∂ϕi

∂uhi+1(vi , vi ) +

∂2ϕi

∂v∂uai+1(vi )

I ∂ϕi

∂u hi+1(v , vi ) : DL = {v}, D1 = {u}

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 9 / 1

Page 35: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Chain Rule

Tfi (D) = Tfi+1(D) +

∑DL*D

[ ∑Di∩Dj=∅

D1∪···∪Dr=D\DL

∂ϕi

∂D1· · · ∂ϕi

∂DrTfi+1

(DL ∪ {v ri })]

I DL,D1, · · · ,Dr is a partition of D.

I Second order : D = {v , u}

hi (v , u) = hi+1(v , u) +∂ϕi

∂vhi+1(vi , u) +

∂ϕi

∂uhi+1(v , vi )

+∂ϕi

∂v

∂ϕi

∂uhi+1(vi , vi ) +

∂2ϕi

∂v∂uai+1(vi )

I ∂ϕi

∂v∂ϕi

∂u hi+1(vi , vi ) : DL = ∅, D1 = {v}, D2 = {u}

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 9 / 1

Page 36: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Chain Rule

Tfi (D) = Tfi+1(D) +

∑DL*D

[ ∑Di∩Dj=∅

D1∪···∪Dr=D\DL

∂ϕi

∂D1· · · ∂ϕi

∂DrTfi+1

(DL ∪ {v ri })]

I DL,D1, · · · ,Dr is a partition of D.

I Second order : D = {v , u}

hi (v , u) = hi+1(v , u) +∂ϕi

∂vhi+1(vi , u) +

∂ϕi

∂uhi+1(v , vi )

+∂ϕi

∂v

∂ϕi

∂uhi+1(vi , vi ) +

∂2ϕi

∂v∂uai+1(vi )

I ∂2ϕi

∂v∂u ai+1(vi ) : DL = ∅, D1 = {v , u}

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 9 / 1

Page 37: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Reverse Mode : Complexity

Tfi (D) = Tfi+1(D) +

∑DL*D

[ ∑Di∩Dj=∅

D1∪···∪Dr=D\DL

∂ϕi

∂D1· · · ∂ϕi

∂DrTfi+1

(DL ∪ {v ri })]

I DL,D1, · · · ,Dr is a partition of D.

I Bd+1 summations : (d + 1) th Bell number.

I O(Bd+1 · sd−1i ) updates for each SAC.

I Overall complexity : O(Bd+1 · sd−1 · l), s = max{si}I When d = 1 : O(l) Baur-Strassen theorem.

I When d = 2 : O(l · s) second order reverse mode

I When d = 3 : O(l · s2) third order reverse mode

I...

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 10 / 1

Page 38: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Reverse Mode : Complexity

Tfi (D) = Tfi+1(D) +

∑DL*D

[ ∑Di∩Dj=∅

D1∪···∪Dr=D\DL

∂ϕi

∂D1· · · ∂ϕi

∂DrTfi+1

(DL ∪ {v ri })]

I DL,D1, · · · ,Dr is a partition of D.

I Bd+1 summations : (d + 1) th Bell number.

I O(Bd+1 · sd−1i ) updates for each SAC.

I Overall complexity : O(Bd+1 · sd−1 · l), s = max{si}I When d = 1 : O(l) Baur-Strassen theorem.

I When d = 2 : O(l · s) second order reverse mode

I When d = 3 : O(l · s2) third order reverse mode

I...

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 10 / 1

Page 39: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Reverse Mode : Complexity

Tfi (D) = Tfi+1(D) +

∑DL*D

[ ∑Di∩Dj=∅

D1∪···∪Dr=D\DL

∂ϕi

∂D1· · · ∂ϕi

∂DrTfi+1

(DL ∪ {v ri })]

I DL,D1, · · · ,Dr is a partition of D.

I Bd+1 summations : (d + 1) th Bell number.

I O(Bd+1 · sd−1i ) updates for each SAC.

I Overall complexity : O(Bd+1 · sd−1 · l), s = max{si}I When d = 1 : O(l) Baur-Strassen theorem.

I When d = 2 : O(l · s) second order reverse mode

I When d = 3 : O(l · s2) third order reverse mode

I...

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 10 / 1

Page 40: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Reverse Mode : Complexity

Tfi (D) = Tfi+1(D) +

∑DL*D

[ ∑Di∩Dj=∅

D1∪···∪Dr=D\DL

∂ϕi

∂D1· · · ∂ϕi

∂DrTfi+1

(DL ∪ {v ri })]

I DL,D1, · · · ,Dr is a partition of D.

I Bd+1 summations : (d + 1) th Bell number.

I O(Bd+1 · sd−1i ) updates for each SAC.

I Overall complexity : O(Bd+1 · sd−1 · l), s = max{si}I When d = 1 : O(l) Baur-Strassen theorem.

I When d = 2 : O(l · s) second order reverse mode

I When d = 3 : O(l · s2) third order reverse mode

I...

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 10 / 1

Page 41: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Reverse Mode : Complexity

Tfi (D) = Tfi+1(D) +

∑DL*D

[ ∑Di∩Dj=∅

D1∪···∪Dr=D\DL

∂ϕi

∂D1· · · ∂ϕi

∂DrTfi+1

(DL ∪ {v ri })]

I DL,D1, · · · ,Dr is a partition of D.

I Bd+1 summations : (d + 1) th Bell number.

I O(Bd+1 · sd−1i ) updates for each SAC.

I Overall complexity : O(Bd+1 · sd−1 · l), s = max{si}I When d = 1 : O(l) Baur-Strassen theorem.

I When d = 2 : O(l · s) second order reverse mode

I When d = 3 : O(l · s2) third order reverse mode

I...

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 10 / 1

Page 42: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Reverse Mode : Complexity

Tfi (D) = Tfi+1(D) +

∑DL*D

[ ∑Di∩Dj=∅

D1∪···∪Dr=D\DL

∂ϕi

∂D1· · · ∂ϕi

∂DrTfi+1

(DL ∪ {v ri })]

I DL,D1, · · · ,Dr is a partition of D.

I Bd+1 summations : (d + 1) th Bell number.

I O(Bd+1 · sd−1i ) updates for each SAC.

I Overall complexity : O(Bd+1 · sd−1 · l), s = max{si}I When d = 1 : O(l) Baur-Strassen theorem.

I When d = 2 : O(l · s) second order reverse mode

I When d = 3 : O(l · s2) third order reverse mode

I...

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 10 / 1

Page 43: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Reverse Mode : Complexity

Tfi (D) = Tfi+1(D) +

∑DL*D

[ ∑Di∩Dj=∅

D1∪···∪Dr=D\DL

∂ϕi

∂D1· · · ∂ϕi

∂DrTfi+1

(DL ∪ {v ri })]

I DL,D1, · · · ,Dr is a partition of D.

I Bd+1 summations : (d + 1) th Bell number.

I O(Bd+1 · sd−1i ) updates for each SAC.

I Overall complexity : O(Bd+1 · sd−1 · l), s = max{si}I When d = 1 : O(l) Baur-Strassen theorem.

I When d = 2 : O(l · s) second order reverse mode

I When d = 3 : O(l · s2) third order reverse mode

I...

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 10 / 1

Page 44: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Reverse Mode : Complexity

Tfi (D) = Tfi+1(D) +

∑DL*D

[ ∑Di∩Dj=∅

D1∪···∪Dr=D\DL

∂ϕi

∂D1· · · ∂ϕi

∂DrTfi+1

(DL ∪ {v ri })]

I DL,D1, · · · ,Dr is a partition of D.

I Bd+1 summations : (d + 1) th Bell number.

I O(Bd+1 · sd−1i ) updates for each SAC.

I Overall complexity : O(Bd+1 · sd−1 · l), s = max{si}I When d = 1 : O(l) Baur-Strassen theorem.

I When d = 2 : O(l · s) second order reverse mode

I When d = 3 : O(l · s2) third order reverse mode

I...

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 10 / 1

Page 45: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Reverse Mode : Implementation

Tfi (D) = Tfi+1(D) +

∑DL*D

[ ∑Di∩Dj=∅

D1∪···∪Dr=D\DL

∂ϕi

∂D1· · · ∂ϕi

∂DrTfi+1

(DL ∪ {v ri })]

I DL,D1, · · · ,Dr is a partition of D

I Generate all DL, s.t, Tfi+1(DL ∪ {v r

i }) 6= 0 and D1, · · · ,Dr , s.t,∂ϕi∂Di6= 0, 1 ≤ i ≤ r

I Then perform incremental updates on D = DL ∪ D1 ∪ · · · ∪ Dr

I More than one way to partition D into DL,D1, · · · ,Dr .I SymCoeff (DL,D1, · · · ,Dr ) : Multiplicity that partition D into

DL,D1, · · · ,Dr .I Flat code for pre-computed symmetric coefficientsI ∼ 5k lines of generated code for up to sixth order

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 11 / 1

Page 46: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Reverse Mode : Implementation

Tfi (D) = Tfi+1(D) +

∑DL*D

[ ∑Di∩Dj=∅

D1∪···∪Dr=D\DL

∂ϕi

∂D1· · · ∂ϕi

∂DrTfi+1

(DL ∪ {v ri })]

I DL,D1, · · · ,Dr is a partition of D

I Generate all DL, s.t, Tfi+1(DL ∪ {v r

i }) 6= 0 and D1, · · · ,Dr , s.t,∂ϕi∂Di6= 0, 1 ≤ i ≤ r

I Then perform incremental updates on D = DL ∪ D1 ∪ · · · ∪ Dr

I More than one way to partition D into DL,D1, · · · ,Dr .I SymCoeff (DL,D1, · · · ,Dr ) : Multiplicity that partition D into

DL,D1, · · · ,Dr .I Flat code for pre-computed symmetric coefficientsI ∼ 5k lines of generated code for up to sixth order

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 11 / 1

Page 47: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Reverse Mode : Implementation

Tfi (D) = Tfi+1(D) +

∑DL*D

[ ∑Di∩Dj=∅

D1∪···∪Dr=D\DL

∂ϕi

∂D1· · · ∂ϕi

∂DrTfi+1

(DL ∪ {v ri })]

I DL,D1, · · · ,Dr is a partition of D

I Generate all DL, s.t, Tfi+1(DL ∪ {v r

i }) 6= 0 and D1, · · · ,Dr , s.t,∂ϕi∂Di6= 0, 1 ≤ i ≤ r

I Then perform incremental updates on D = DL ∪ D1 ∪ · · · ∪ Dr

I More than one way to partition D into DL,D1, · · · ,Dr .I SymCoeff (DL,D1, · · · ,Dr ) : Multiplicity that partition D into

DL,D1, · · · ,Dr .I Flat code for pre-computed symmetric coefficientsI ∼ 5k lines of generated code for up to sixth order

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 11 / 1

Page 48: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Reverse Mode : Implementation

Tfi (D) = Tfi+1(D) +

∑DL*D

[ ∑Di∩Dj=∅

D1∪···∪Dr=D\DL

∂ϕi

∂D1· · · ∂ϕi

∂DrTfi+1

(DL ∪ {v ri })]

I DL,D1, · · · ,Dr is a partition of D

I Generate all DL, s.t, Tfi+1(DL ∪ {v r

i }) 6= 0 and D1, · · · ,Dr , s.t,∂ϕi∂Di6= 0, 1 ≤ i ≤ r

I Then perform incremental updates on D = DL ∪ D1 ∪ · · · ∪ Dr

I More than one way to partition D into DL,D1, · · · ,Dr .I SymCoeff (DL,D1, · · · ,Dr ) : Multiplicity that partition D into

DL,D1, · · · ,Dr .I Flat code for pre-computed symmetric coefficientsI ∼ 5k lines of generated code for up to sixth order

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 11 / 1

Page 49: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Reverse Mode : Implementation

Tfi (D) = Tfi+1(D) +

∑DL*D

[ ∑Di∩Dj=∅

D1∪···∪Dr=D\DL

∂ϕi

∂D1· · · ∂ϕi

∂DrTfi+1

(DL ∪ {v ri })]

I DL,D1, · · · ,Dr is a partition of D

I Generate all DL, s.t, Tfi+1(DL ∪ {v r

i }) 6= 0 and D1, · · · ,Dr , s.t,∂ϕi∂Di6= 0, 1 ≤ i ≤ r

I Then perform incremental updates on D = DL ∪ D1 ∪ · · · ∪ Dr

I More than one way to partition D into DL,D1, · · · ,Dr .I SymCoeff (DL,D1, · · · ,Dr ) : Multiplicity that partition D into

DL,D1, · · · ,Dr .I Flat code for pre-computed symmetric coefficientsI ∼ 5k lines of generated code for up to sixth order

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 11 / 1

Page 50: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Reverse Mode : Implementation

Tfi (D) = Tfi+1(D) +

∑DL*D

[ ∑Di∩Dj=∅

D1∪···∪Dr=D\DL

∂ϕi

∂D1· · · ∂ϕi

∂DrTfi+1

(DL ∪ {v ri })]

I DL,D1, · · · ,Dr is a partition of D

I Generate all DL, s.t, Tfi+1(DL ∪ {v r

i }) 6= 0 and D1, · · · ,Dr , s.t,∂ϕi∂Di6= 0, 1 ≤ i ≤ r

I Then perform incremental updates on D = DL ∪ D1 ∪ · · · ∪ Dr

I More than one way to partition D into DL,D1, · · · ,Dr .I SymCoeff (DL,D1, · · · ,Dr ) : Multiplicity that partition D into

DL,D1, · · · ,Dr .I Flat code for pre-computed symmetric coefficientsI ∼ 5k lines of generated code for up to sixth order

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 11 / 1

Page 51: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Reverse Mode : Implementation

Tfi (D) = Tfi+1(D) +

∑DL*D

[ ∑Di∩Dj=∅

D1∪···∪Dr=D\DL

∂ϕi

∂D1· · · ∂ϕi

∂DrTfi+1

(DL ∪ {v ri })]

I DL,D1, · · · ,Dr is a partition of D

I Generate all DL, s.t, Tfi+1(DL ∪ {v r

i }) 6= 0 and D1, · · · ,Dr , s.t,∂ϕi∂Di6= 0, 1 ≤ i ≤ r

I Then perform incremental updates on D = DL ∪ D1 ∪ · · · ∪ Dr

I More than one way to partition D into DL,D1, · · · ,Dr .I SymCoeff (DL,D1, · · · ,Dr ) : Multiplicity that partition D into

DL,D1, · · · ,Dr .I Flat code for pre-computed symmetric coefficientsI ∼ 5k lines of generated code for up to sixth order

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 11 / 1

Page 52: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Reverse Mode : Implementation

Tfi (D) = Tfi+1(D) +

∑DL*D

[ ∑Di∩Dj=∅

D1∪···∪Dr=D\DL

∂ϕi

∂D1· · · ∂ϕi

∂DrTfi+1

(DL ∪ {v ri })]

I ReverseAD : an operator overloading implementation of the highorder reverse mode in C++11.

I Available at https://github.com/wangmu0701/ReverseAD.

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 11 / 1

Page 53: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

High Order Reverse Mode : Implementation

Tfi (D) = Tfi+1(D) +

∑DL*D

[ ∑Di∩Dj=∅

D1∪···∪Dr=D\DL

∂ϕi

∂D1· · · ∂ϕi

∂DrTfi+1

(DL ∪ {v ri })]

I ReverseAD : an operator overloading implementation of the highorder reverse mode in C++11.

I Available at https://github.com/wangmu0701/ReverseAD.

I Monotonic indexing for variables on the tracevj ≺ vi =⇒ index(vj) < index(vi )

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 11 / 1

Page 54: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

Performance : Synthetic Function

I A synthetic function designed with parameters:I n : number of independent variablesI s : size of live variables during the function evaluationI l : the complexity of the functionI Dense derivatives

ID(z) =

√z ∗ z ,

2.0 + z − 2.0,

z ∗ 2.0 ∗ 0.5,

log(exp(z)),

1.0/(1.0/z),

sin(asin(z)).

y =

√s∏

i=1

ti ,

ti = IDk ◦ · · · ◦ ID1(zi ),zi = t,

t =n∑

i=1

xi .

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 12 / 1

Page 55: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

Performance : Synthetic Function

I A synthetic function designed with parameters:I n : number of independent variablesI s : size of live variables during the function evaluationI l : the complexity of the functionI Dense derivatives

ID(z) =

√z ∗ z ,

2.0 + z − 2.0,

z ∗ 2.0 ∗ 0.5,

log(exp(z)),

1.0/(1.0/z),

sin(asin(z)).

y =

√s∏

i=1

ti ,

ti = IDk ◦ · · · ◦ ID1(zi ),zi = t,

t =n∑

i=1

xi .

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 12 / 1

Page 56: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

Performance : Synthetic Function

I Fixed l , let n and s change simultaneously

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 13 / 1

Page 57: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

Performance : Synthetic Function

I Fixed l , let n and s change simultaneously

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 13 / 1

Page 58: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

Performance : Synthetic Function

I Fixed l and n, changed s

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 14 / 1

Page 59: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

Performance : Synthetic Function

I Fixed l and n, changed s

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 14 / 1

Page 60: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

Performance : Synthetic Function

I Fixed l and s, changed n

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 15 / 1

Page 61: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

Performance : Synthetic Function

I Fixed l and s, changed n

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 15 / 1

Page 62: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

Application : XCFUN (on going)

I Arbitrary order Exchange-Correlation functional library

I https://github.com/dftlibs/xcfun

I Using libtaylor to evaluate derivatives of functionals

I Up to third order in current implementation

I Small number of independents : 20 at most

I Not so-complex functionals

I On a collection of functionals:

I Third order Libtaylor : 81ms

I Third order ReverseAD : 20ms

I Fourth order ReverseAD : 83ms

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 16 / 1

Page 63: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

Application : XCFUN (on going)

I Arbitrary order Exchange-Correlation functional library

I https://github.com/dftlibs/xcfun

I Using libtaylor to evaluate derivatives of functionals

I Up to third order in current implementation

I Small number of independents : 20 at most

I Not so-complex functionals

I On a collection of functionals:

I Third order Libtaylor : 81ms

I Third order ReverseAD : 20ms

I Fourth order ReverseAD : 83ms

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 16 / 1

Page 64: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

Conclusion and Future Work

I High order derivative tensors could (and probably should) be directlyevaluated via reverse mode.

I A series of algorithms to evaluate derivatives Tf up to order d :

Fd → · · · → F1 ◦ Rd−1 → Rd

I Rd : symmetric derivative tensor ∇d fI F1 ◦ Rd−1 : tensor-vector ∇d f · xI Fd :

[[[∇d f · x] · x

]· · ·]· x

I The structural (and sparsity) properties of Tf determines the optimalmethod.

I General compression and recovery using F1 ◦ Rd−1.

perfectly parallelizable

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 17 / 1

Page 65: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

Conclusion and Future Work

I High order derivative tensors could (and probably should) be directlyevaluated via reverse mode.

I A series of algorithms to evaluate derivatives Tf up to order d :

Fd → · · · → F1 ◦ Rd−1 → Rd

I Rd : symmetric derivative tensor ∇d fI F1 ◦ Rd−1 : tensor-vector ∇d f · xI Fd :

[[[∇d f · x] · x

]· · ·]· x

I The structural (and sparsity) properties of Tf determines the optimalmethod.

I General compression and recovery using F1 ◦ Rd−1.

perfectly parallelizable

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 17 / 1

Page 66: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

Conclusion and Future Work

I High order derivative tensors could (and probably should) be directlyevaluated via reverse mode.

I A series of algorithms to evaluate derivatives Tf up to order d :

Fd → · · · → F1 ◦ Rd−1 → Rd

I Rd : symmetric derivative tensor ∇d fI F1 ◦ Rd−1 : tensor-vector ∇d f · xI Fd :

[[[∇d f · x] · x

]· · ·]· x

I The structural (and sparsity) properties of Tf determines the optimalmethod.

I General compression and recovery using F1 ◦ Rd−1.

perfectly parallelizable

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 17 / 1

Page 67: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

Conclusion and Future Work

I High order derivative tensors could (and probably should) be directlyevaluated via reverse mode.

I A series of algorithms to evaluate derivatives Tf up to order d :

Fd → · · · → F1 ◦ Rd−1 → Rd

I Rd : symmetric derivative tensor ∇d fI F1 ◦ Rd−1 : tensor-vector ∇d f · xI Fd :

[[[∇d f · x] · x

]· · ·]· x

I The structural (and sparsity) properties of Tf determines the optimalmethod.

I General compression and recovery using F1 ◦ Rd−1.

perfectly parallelizable

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 17 / 1

Page 68: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

Conclusion and Future Work

I High order derivative tensors could (and probably should) be directlyevaluated via reverse mode.

I A series of algorithms to evaluate derivatives Tf up to order d :

Fd → · · · → F1 ◦ Rd−1 → Rd

I Rd : symmetric derivative tensor ∇d fI F1 ◦ Rd−1 : tensor-vector ∇d f · xI Fd :

[[[∇d f · x] · x

]· · ·]· x

I The structural (and sparsity) properties of Tf determines the optimalmethod.

I General compression and recovery using F1 ◦ Rd−1.

perfectly parallelizable

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 17 / 1

Page 69: High Order Reverse Mode of AD Theory and Implementation · High Order Reverse Mode of AD Theory and Implementation ... usage in many applications. ... tensors by forward propagation

Conclusion and Future Work

I High order derivative tensors could (and probably should) be directlyevaluated via reverse mode.

I A series of algorithms to evaluate derivatives Tf up to order d :

Fd → · · · → F1 ◦ Rd−1 → Rd

I Rd : symmetric derivative tensor ∇d fI F1 ◦ Rd−1 : tensor-vector ∇d f · xI Fd :

[[[∇d f · x] · x

]· · ·]· x

I The structural (and sparsity) properties of Tf determines the optimalmethod.

I General compression and recovery using F1 ◦ Rd−1.

perfectly parallelizable

Mu Wang and Alex Pothen High Order Reverse AD September 30, 2016 17 / 1