
Page 1

Escaping the curse of dimensionality with a tree-based regressor

Samory Kpotufe

UCSD CSE

Page 2

Curse of dimensionality

• In general: computational and/or prediction performance deteriorate as the dimension D increases.

• For nonparametric regression: worst-case bounds on the excess risk $\|f_n - f\|^2$ are of the form $n^{-2/(2+D)}$. Here $\|f_n - f\|^2 = \mathbb{E}_X \|f_n(X) - f(X)\|^2$.

Reasons for hope: data often has low intrinsic complexity

Figures: d-dimensional manifold; d-sparse data.
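To get a feel for this rate, a back-of-the-envelope illustration (not from the slides): inverting $n^{-2/(2+D)}$, driving the excess risk below a target eps requires roughly $n \approx \mathrm{eps}^{-(2+D)/2}$ samples.

eps = 0.1
for D in (2, 10, 20):
    # samples needed so that n^(-2/(2+D)) <= eps, i.e. n >= eps^(-(2+D)/2)
    n_needed = eps ** (-(2 + D) / 2)
    print(f"D={D:2d}: need n on the order of {n_needed:.0e}")
# prints 1e+02, 1e+06, 1e+11 -- the curse in action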


Page 5

How do we take advantage of such situations?

• Manifold learning (e.g. LLE, Isomap): embed the data in a lower-dimensional space where traditional learners might perform well.

• Adaptivity: can we design learners which run in $\mathbb{R}^D$ but whose performance depends just on the intrinsic complexity of the data?


Page 7

Recent adaptivity results

• Classification with dyadic trees (Scott and Nowak, 2006). Manifold data.

• Kernel regression (Bickel and Li, 2006). Manifold data.

• Vector quantization with RPtree partitioning (Dasgupta and Freund, 2008). Data with low Assouad dimension.

Tree-based regressors are computationally inexpensive relative to kernel regressors. Is there an adaptive tree-based regressor?



Page 15

Tree-based regression

Build a hierarchy of nested partitions of X, then somehow pick a partition A: $f_{n,A}(x) \doteq$ the average Y value in $A(x)$, the cell of A in which x falls.

Figures: dyadic tree; k-d tree; RPtree.
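To make the estimator concrete, a minimal sketch (mine, not from the slides) of a partition-average regressor; here cell_of stands in for whatever rule maps a point x to its cell of A:

import numpy as np

def fit_partition_regressor(X, Y, cell_of):
    # Precompute the average Y value in each cell; cell_of(x) returns a cell id.
    sums, counts = {}, {}
    for x, y in zip(X, Y):
        c = cell_of(x)
        sums[c] = sums.get(c, 0.0) + y
        counts[c] = counts.get(c, 0) + 1
    means = {c: sums[c] / counts[c] for c in sums}
    fallback = float(np.mean(Y))  # prediction for cells with no training data
    return lambda x: means.get(cell_of(x), fallback)

For a k-d tree or RPtree, cell_of(x) would be the index of the leaf (or of the cell at some chosen level) that x falls into.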

Page 16

Random Partition tree (RPtree)

Recursively bisect the data near the median along a random direction.

Figures: first and second levels of the partition.

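A minimal sketch of this splitting rule (my own rendering, with illustrative names; the published RPtree of Dasgupta and Freund also includes a second split type, based on distance from the cell mean, which is omitted here):

import numpy as np

def rp_tree(X, depth, rng, jitter=0.05):
    # Recursively bisect the data near the median along a random direction.
    n, D = X.shape
    if depth == 0 or n <= 1:
        return {"leaf": True, "points": X}
    u = rng.standard_normal(D)
    u /= np.linalg.norm(u)                 # random unit direction
    proj = X @ u
    spread = proj.max() - proj.min()
    # split "near" the median: jitter the threshold by a small random offset
    t = np.median(proj) + rng.uniform(-jitter, jitter) * spread
    left = proj <= t
    return {"leaf": False, "dir": u, "thresh": t,
            "left": rp_tree(X[left], depth - 1, rng, jitter),
            "right": rp_tree(X[~left], depth - 1, rng, jitter)}

rng = np.random.default_rng(0)
tree = rp_tree(rng.standard_normal((200, 10)), depth=4, rng=rng)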

Page 18

Our results: We show how to use the RPtree for regression and obtain rates that depend just on the intrinsic complexity of the data, namely its Assouad dimension.

This is the first such adaptivity result for tree-based regression.


Page 20

What we’ll see next:

• Preliminaries: (1) Assouad dimension. (2) Algorithmic issues regarding the choice of a partition.

• How we choose an RPtree partition for regression.

• Analysis overview.

Page 21

Assouad dimension.

Definition

The Assouad dimension (or doubling dimension) of X is the smallest d such that any ball $B \subset X$ can be covered by $2^d$ balls of half its radius.

Figures: a ball covered by $2^2$ balls; by $2^3$ balls.
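The definition also suggests a crude empirical sanity check (my own heuristic, not from the slides): pick a ball of radius r around a random data point, greedily build an r/2-net of the points inside it, and read off log2 of the net size.

import numpy as np

def doubling_dim_estimate(X, r, rng, trials=20):
    # Average, over random balls of radius r, of log2(size of a greedy
    # r/2-net of the points in the ball) -- a rough proxy for d.
    ests = []
    for _ in range(trials):
        c = X[rng.integers(len(X))]
        ball = X[np.linalg.norm(X - c, axis=1) <= r]
        net = []
        for p in ball:
            if all(np.linalg.norm(p - q) > r / 2 for q in net):
                net.append(p)
        ests.append(np.log2(max(len(net), 1)))
    return float(np.mean(ests))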

Page 22

Examples of data with low Assouad dimension

Figures: d-dimensional manifold; d-sparse data.

Page 23

Choosing a good partition: bias-variance tradeoff

Partition A with small cell diameters?
Low bias: x is near all the data in $A(x)$ ⟹ similar Y values.
High variance: fewer data in $A(x)$ ⟹ unstable estimates.

So choose a partition with mid-range cell diameters.




Page 29

Problem with RPtree cells:

• Cell diameters are hard to assess.

• Cell diameters may not decrease at all.

Figure: cell diameter vs. data diameter.

Page 30

Choosing a partition based on data diameter: this is of general interest for algorithm design and risk analysis, since many trees have misbehaved cell diameters like RPtree does.

Data diameters aren't stable ⟹ hard to generalize from.

RPtree quickly decreases data diameters from the root down: data diameters are halved every $d \log d$ levels [DF08].

A fast data diameter decrease rate implies: we reach cells with small data diameters without decreasing the number of points per cell too much.


Page 35

What we’ll see next:

• Preliminaries: (1) Assouad dimension. (2) Algorithmic issues regarding the choice of a partition.

• How we choose an RPtree partition for regression.

• Analysis overview.

Page 36

How we choose an RPtree partition for regression

Identify candidate partitions $A_i$ such that data diameters are halved from $A_i$ to $A_{i+1}$.

Diameter decrease rate k: the max number of levels between $A_i$ and $A_{i+1}$.

Two stopping options:
I: Stop after $O(\log n)$ levels, and test all $f_{n,A_i}$ on a new sample.
II: Stop when data diameters are small enough. Requires no testing.
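Schematically, the candidate identification and stopping option I might look like this (my own sketch; the predictors $f_{n,A_i}$ are assumed to be built as in the partition-average code earlier):

import numpy as np

def candidate_levels(diams_by_level):
    # Levels at which the max data diameter over cells has halved relative
    # to the previous candidate: these levels define A_1, A_2, ...
    cands, ref = [0], diams_by_level[0]
    for lvl, diam in enumerate(diams_by_level):
        if diam <= ref / 2:
            cands.append(lvl)
            ref = diam
    return cands

def select_by_validation(predictors, X_val, Y_val):
    # Stopping option I: test each candidate f_{n,A_i} on a held-out sample
    # and keep the one with the smallest empirical risk.
    errs = [np.mean([(f(x) - y) ** 2 for x, y in zip(X_val, Y_val)])
            for f in predictors]
    return predictors[int(np.argmin(errs))]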


Page 41

Theorem

With probability at least $1 - \delta$, under either stopping option,

$$\|f_n - f\|^2 \le C \left( \frac{\log^2 n + \log(1/\delta)}{n} \right)^{2/(2+k)},$$

where $k \le C' d \log d$ is the observed diameter decrease rate.

Assumptions:

The regression function $f(x) = \mathbb{E}[Y \mid X = x]$ is Lipschitz, and X and Y are bounded. No distributional assumption.
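To see what the adaptive exponent buys, a back-of-the-envelope comparison (taking $C' = 1$ purely for illustration):

$$D = 100,\; d = 5,\; k \approx d \log_2 d \approx 12: \qquad n^{-2/(2+k)} \approx n^{-0.14} \quad \text{vs.} \quad n^{-2/(2+D)} \approx n^{-0.02}.$$

To halve the error, the adaptive rate needs about $2^{7}$ times more data, while the ambient-dimension rate needs about $2^{51}$ times more.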

Page 42

What we’ll see next:

• Preliminaries: (1) Assouad dimension. (2) Algorithmic issues regarding the choice of a partition.

• How we choose an RPtree partition for regression.

• Analysis overview.


Page 45

Risk analysis: handling data diameters

Remember: data diameters don't generalize well to the distribution.
Solution: for every $A \in \{A_i\}$, replace A with an alternate partition $A'$.

Figures: cover B; partition A; partition $A'$.

• Dense cells of $A'$ have manageable diameters: cell diameters approximate data diameters.

• $\mathrm{Risk}(f_{n,A'}) \approx \mathrm{Risk}(f_{n,A})$.


Page 47

So we can just analyze the risk of $f_{n,A'}$, for every $A \in \{A_i\}$. Handle the empty cells and the dense cells separately.

Figure: partition $A'$.


Page 49

Relative VC bounds for $A' \in \mathcal{A}'$

• $\mu_n(A') \approx 0 \implies \mu(A') \lesssim \frac{1}{n}$: empty cells don't affect risk.

• $\mu_n(A') \gg 0 \implies \mu_n(A') \approx \mu(A')$: dense cells will be stable.

Need to exhibit a VC class containing all $A' \in \mathcal{A}'$

• Cells $A'$ have random shapes.

• Define a VC class $\mathcal{C} \supset \mathcal{A}'$ by conditioning on the randomness in the algorithm and n.
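For context, both bullets follow from a standard relative VC deviation bound (my paraphrase, up to constants and log factors; $\mathcal{V}$ denotes the VC dimension of $\mathcal{C}$): with probability at least $1 - \delta$, uniformly over $A' \in \mathcal{C}$,

$$\big|\mu(A') - \mu_n(A')\big| \lesssim \sqrt{\mu_n(A') \cdot \frac{\mathcal{V} \log n + \log(1/\delta)}{n}} + \frac{\mathcal{V} \log n + \log(1/\delta)}{n}.$$

So $\mu_n(A') \approx 0$ forces $\mu(A') \lesssim \frac{\mathcal{V} \log n + \log(1/\delta)}{n}$, while $\mu_n(A') \gg \frac{\mathcal{V} \log n + \log(1/\delta)}{n}$ gives $\mu_n(A') \approx \mu(A')$.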


Page 53

Risk bound for $f_{n,A'}$, for every $A \in \{A_i\}$:

$$\|f_{n,A'} - f\|^2 \lesssim \frac{|A'|}{n} + \mathrm{diam}_n^2(A').$$

The bound is minimized when $\mathrm{diam}_n^2(A') \approx \frac{|A'|}{n}$, in which case

$$\|f_{n,A'} - f\|^2 \lesssim n^{-2/(2+k)}.$$

Show that we can find $A \in \{A_i\}$ such that $A'$ minimizes the bound, and conclude.
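Filling in the balancing step (my own back-of-the-envelope, under the assumption that each halving of the data diameter multiplies the number of cells by about $2^k$, i.e. $|A'| \approx \mathrm{diam}_n(A')^{-k}$):

$$\mathrm{diam}_n^2(A') \approx \frac{|A'|}{n} \approx \frac{\mathrm{diam}_n(A')^{-k}}{n} \;\Longrightarrow\; \mathrm{diam}_n(A')^{2+k} \approx \frac{1}{n} \;\Longrightarrow\; \|f_{n,A'} - f\|^2 \lesssim \mathrm{diam}_n^2(A') \approx n^{-2/(2+k)}.$$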


Page 56

Recap: if the data space has low Assouad dimension d, the excess risk of an RPtree regressor depends just on d.

Page 57

Extending to a more general setting

What if the data has a high Assouad dimension overall, but the intrinsic dimension is low in smaller regions?