
Stability of Kernel–Based Interpolation

Stefano De Marchi
Department of Computer Science, University of Verona (Italy)

Robert Schaback
Institut für Numerische und Angewandte Mathematik, University of Göttingen (Germany)

Abstract

It is often observed that interpolation based on translates of radial basis functions or non-radial kernels is numerically unstable due to an exceedingly large condition number of the kernel matrix. But if stability is assessed in function space without considering special bases, this paper proves that kernel-based interpolation is stable. Provided that the data are not too wildly scattered, the L2 or L∞ norms of interpolants can be bounded above by discrete ℓ2 and ℓ∞ norms of the data. Furthermore, Lagrange basis functions are uniformly bounded, and Lebesgue constants grow at most like the square root of the number of data points. However, this analysis applies only to kernels of limited smoothness. Numerical examples support our bounds, but also show that the case of infinitely smooth kernels must be expected to lead to worse bounds in future work, while the observed Lebesgue constants for kernels with limited smoothness even seem to be independent of the sample size and the fill distance.

1 Introduction

We consider the recovery of a real-valued function f : Ω → R on some compact domain Ω ⊆ R^d from its function values f(x_j) on a scattered set X = {x_1, ..., x_N} ⊂ Ω ⊆ R^d. Independent of how the reconstruction is done in detail, we denote the result as s_{f,X} and assume that it is a linear function of the data, i.e. it takes the form

    s_{f,X} = ∑_{j=1}^{N} f(x_j) u_j                                    (1)

with certain continuous functions u_j : Ω → R. To assert the stability of the recovery process f ↦ s_{f,X}, we look for bounds of the form

    ‖s_{f,X}‖_{L∞(Ω)} ≤ C(X) ‖f‖_{ℓ∞(X)}                               (2)

which imply that the map taking the data into the interpolant is continuous in the L∞(Ω) and ℓ∞(X) norms. Of course, one can also use L2(Ω) and ℓ2(X) norms above.

An upper bound for the stability constant C(X) is supplied by putting (1) into (2) to get

    C(X) ≤ ‖ ∑_{j=1}^{N} |u_j| ‖_{L∞(Ω)} =: Λ_X.

This involves the Lebesgue function

    λ_X(x) := ∑_{j=1}^{N} |u_j(x)|.

Its maximum value Λ_X := max_{x∈Ω} λ_X(x) is called the associated Lebesgue constant.
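As an illustration of these definitions (our addition, not part of the original paper), the following minimal Python sketch evaluates λ_X on a fine grid for the simplest conceivable cardinal functions, the piecewise linear hat functions on [0, 1], for which λ_X ≡ 1 and hence Λ_X = 1:

```python
import numpy as np

# Data sites x_1, ..., x_N and a fine evaluation grid discretizing Omega = [0, 1].
X = np.linspace(0.0, 1.0, 11)
x_eval = np.linspace(0.0, 1.0, 2001)

def hat(j, x):
    """Cardinal function u_j of piecewise linear interpolation: u_j(x_k) = delta_jk."""
    u = np.zeros_like(x)
    if j > 0:                      # rising edge on [x_{j-1}, x_j]
        m = (x >= X[j-1]) & (x <= X[j])
        u[m] = (x[m] - X[j-1]) / (X[j] - X[j-1])
    if j < len(X) - 1:             # falling edge on [x_j, x_{j+1}]
        m = (x >= X[j]) & (x <= X[j+1])
        u[m] = (X[j+1] - x[m]) / (X[j+1] - X[j])
    return u

lebesgue_fn = sum(np.abs(hat(j, x_eval)) for j in range(len(X)))  # lambda_X on the grid
print("Lambda_X =", lebesgue_fn.max())                            # 1.0 for hat functions
```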

It is a classical problem to derive upper bounds for the stability constant in (2) and for its upper bound, the Lebesgue constant Λ_X. As is well known for recovery by polynomials, upper bounds for the Lebesgue function exist in both the univariate and the bivariate case. Moreover, many authors have faced the problem of finding near-optimal points for polynomial interpolation. All these near-optimal sets of N points have a Lebesgue function that behaves like log(N) in the one-dimensional case and like log²(N) in the two-dimensional one (cf. [2] and references therein). An important example of points suitable for polynomial interpolation on the square whose Lebesgue constant grows as O(log²(N)) are the so-called Padua points (see [1]).

However, stability bounds for multivariate kernel-based recovery processes are missing. We shall derive them as follows. Given a positive definite kernel Φ : Ω × Ω → R, the recovery of functions from function values f(x_j) on the set X = {x_1, ..., x_N} ⊂ Ω ⊆ R^d of N different data sites can be done via interpolants of the form

    s_{f,X} := ∑_{j=1}^{N} α_j Φ(·, x_j)                               (3)

from the finite-dimensional space

    V_X := span { Φ(·, x) : x ∈ X }

of translates of the kernel, satisfying the linear system

    f(x_k) = s_{f,X}(x_k) = ∑_{j=1}^{N} α_j Φ(x_k, x_j),    1 ≤ k ≤ N,

based on the kernel matrix with entries Φ(x_k, x_j), 1 ≤ j, k ≤ N. The case of conditionally positive definite kernels is similar, and we suppress details here.

The interpolant of (3), as in classical polynomial interpolation, can also be written in terms of cardinal functions u_j ∈ V_X such that u_j(x_k) = δ_{jk}. Then the interpolant (3) takes the usual Lagrangian form (1).
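For readers who want to experiment, here is a brief Python sketch of both representations (our addition; the Gaussian kernel and the random point set are arbitrary stand-ins). The coefficients α of (3) solve the kernel system, and since the kernel matrix A is symmetric, the vector of cardinal values u(x) = (u_1(x), ..., u_N(x))^T solves A u(x) = k(x) with k(x) = (Φ(x, x_1), ..., Φ(x, x_N))^T:

```python
import numpy as np

def kernel(x, y, c=0.5):
    """Gaussian kernel Phi(x, y) = exp(-||x - y||^2 / c^2), a stand-in choice."""
    d2 = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / c**2)

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 2))   # scattered data sites in [-1, 1]^2
f = np.sin(X[:, 0]) * np.cos(X[:, 1])      # data values f(x_j)

A = kernel(X, X)                           # kernel matrix (Phi(x_k, x_j))
alpha = np.linalg.solve(A, f)              # coefficients of the representation (3)

x = np.array([[0.3, -0.2]])                # one evaluation point
k = kernel(X, x)[:, 0]                     # k(x) = (Phi(x, x_1), ..., Phi(x, x_N))
u = np.linalg.solve(A, k)                  # cardinal values (u_1(x), ..., u_N(x))
print(k @ alpha, f @ u)                    # both forms of s_{f,X}(x); values agree
```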

The reproduction quality of kernel-based methods is governed by the fill distance or mesh norm

    h_{X,Ω} = sup_{x∈Ω} min_{x_j∈X} ‖x − x_j‖_2                        (4)

describing the geometric relation of X to the domain Ω. In particular, the reproduction error is small if h_{X,Ω} is small.

But unfortunately the kernel matrix is badly conditioned if the data locations come close to each other, i.e. if the separation distance

    q_X = (1/2) min_{x_i ≠ x_j ∈ X} ‖x_i − x_j‖                        (5)

is small. Then the coefficients of the representation (3) get very large even if the data values f(x_k) are small, and simple linear solvers will fail. However, users often report that the final function s_{f,X} of (3) behaves nicely in spite of the large coefficients, and using stable solvers leads to useful results even in cases of unreasonably large condition numbers. This means that the interpolant can be stably calculated in the sense of (2), while the coefficients in the basis supplied by the Φ(·, x_j) are unstable. This calls for the construction of new and more stable bases, but here we shall focus only on stability bounds.

The fill distance (4) and the separation distance (5) are used for standard error and stability estimates for multivariate interpolants, and they will also be of importance here. The inequality q_X ≤ h_{X,Ω} will hold in most cases, but if points of X nearly coalesce, q_X can be much smaller than h_{X,Ω}, causing instability of the standard solution process. Point sets X are called quasi-uniform with uniformity constant γ > 1 if the inequality

    (1/γ) q_X ≤ h_{X,Ω} ≤ γ q_X

holds. Later, we shall consider arbitrary sets with different cardinalities, but with uniformity constants bounded above by a fixed number. Note that h_{X,Ω} and q_X play an important role in finding good points for radial basis function interpolation, as recently studied in [6, 3, 4].
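Both quantities are easily computed, or approximated in the case of h_{X,Ω}, which needs a discretization of Ω. A short Python sketch (our addition; the point set and grid resolution are arbitrary choices):

```python
import numpy as np
from scipy.spatial.distance import cdist

def separation_distance(X):
    """q_X = (1/2) min over distinct pairs of ||x_i - x_j||, cf. (5)."""
    D = cdist(X, X)
    np.fill_diagonal(D, np.inf)
    return 0.5 * D.min()

def fill_distance(X, omega_grid):
    """h_{X,Omega} = sup_x min_j ||x - x_j||, cf. (4), with the sup over Omega
    approximated by a maximum over a dense grid."""
    return cdist(omega_grid, X).min(axis=1).max()

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(100, 2))            # 100 sites in Omega = [-1, 1]^2
g = np.linspace(-1.0, 1.0, 200)
G = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)
h, q = fill_distance(X, G), separation_distance(X)
print("h =", h, " q =", q, " h/q =", h / q)          # h/q measures uniformity
```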


To generate interpolants, we allow conditionally positive definite translation-invariant kernels

    Φ(x, y) = K(x − y) for all x, y ∈ R^d,    K : R^d → R,

which are reproducing in their "native" Hilbert space N, which we assume to be norm-equivalent to some Sobolev space W^τ_2(Ω) with τ > d/2. The kernel will then have a Fourier transform satisfying

    0 < c (1 + ‖ω‖_2²)^{−τ} ≤ K̂(ω) ≤ C (1 + ‖ω‖_2²)^{−τ}              (6)

at infinity. This includes polyharmonic splines, thin-plate splines, the Sobolev/Matérn kernel, and Wendland's compactly supported kernels. It is well known that under the above assumptions the interpolation problem is uniquely solvable, and the space V_X is a subspace of the Sobolev space W^τ_2(Ω).
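As one concrete member of this class (our addition, with the usual bookkeeping for τ): Wendland's function φ_{3,1}(r) = (1 − r)_+^4 (4r + 1) is compactly supported, C², and positive definite for d ≤ 3, and its Fourier transform decays like (1 + ‖ω‖_2²)^{−τ} with τ = d/2 + 3/2, matching (6). A one-line sketch:

```python
import numpy as np

def wendland_31(r):
    """Wendland's phi_{3,1}(r) = (1 - r)_+^4 (4 r + 1): compactly supported on
    [0, 1], C^2, positive definite for d <= 3; its Fourier transform decays
    like (1 + ||w||^2)^(-tau) with tau = d/2 + 3/2, as required in (6)."""
    return np.maximum(1.0 - r, 0.0) ** 4 * (4.0 * r + 1.0)
```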

In the following, the constants depend on the space dimension, the domain, and the kernel, and the assertions hold for all sets X of scattered data locations with sufficiently small fill distance h_{X,Ω}.

Our central result is

Theorem 1 The classical Lebesgue constant for interpolation with Φ on N = |X| data locations X = {x_1, . . . , x_N} in a bounded domain Ω ⊆ R^d satisfying an outer cone condition has a bound of the form

    Λ_X ≤ C √N (h_{X,Ω} / q_X)^{τ−d/2}.

For quasi-uniform sets with bounded uniformity γ, this simplifies to

    Λ_X ≤ C √N.

Each single cardinal function is bounded by

    ‖u_j‖_{L∞(Ω)} ≤ C (h_{X,Ω} / q_X)^{τ−d/2},

which in the quasi-uniform case simplifies to

    ‖u_j‖_{L∞(Ω)} ≤ C.

There also is an L2 analog of this:

Theorem 2 Under the above assumptions, the cardinal functions have a bound

    ‖u_j‖_{L2(Ω)} ≤ C (h_{X,Ω} / q_X)^{τ−d/2} h_{X,Ω}^{d/2},

and for quasi-uniform data locations they behave like

    ‖u_j‖_{L2(Ω)} ≤ C h_{X,Ω}^{d/2}.

But the Lebesgue constants are only upper bounds for the stability constant in function space. In fact, we can do better:

Theorem 3 Interpolation on sufficiently many quasi-uniformly distributed data is stable in the sense of

    ‖s_{f,X}‖_{L∞(Ω)} ≤ C ( ‖f‖_{ℓ∞(X)} + ‖f‖_{ℓ2(X)} )

and

    ‖s_{f,X}‖_{L2(Ω)} ≤ C h^{d/2} ‖f‖_{ℓ2(X)}

with a constant C independent of X.

Note that the right-hand side of the final inequality is a properly scaled discrete version of the L2 norm: for quasi-uniform data, each of the N data sites accounts for a volume of order h^d, so that h^{d/2} ‖f‖_{ℓ2(X)} = ( h^d ∑_{j=1}^N f(x_j)² )^{1/2} behaves like ‖f‖_{L2(Ω)}.

We shall prove these results in the next section. However, it will turn out that the assumption (6) is crucial, and we were not able to extend the results to kernels with infinite smoothness. The final section provides examples showing that similar results are not possible for kernels with infinite smoothness.

2 Bounds for Lagrange Bases

Our most important tool for the proof of Theorem 1 is a sampling inequality (cf. [12, Th. 2.6]). For any bounded Lipschitz domain Ω ⊂ R^d satisfying an inner cone condition and for the Sobolev space W^τ_2(Ω) with τ > d/2, there are positive constants C and h_0 such that

    ‖u‖_{L∞(Ω)} ≤ C ( h_{X,Ω}^{τ−d/2} ‖u‖_{W^τ_2(Ω)} + ‖u‖_{ℓ∞(X)} )   (7)

holds for all u ∈ W^τ_2(Ω) and all finite subsets X ⊂ Ω with h_{X,Ω} ≤ h_0. This is independent of kernels.

We can apply the sampling inequality in two ways:

    ‖s_{f,X}‖_{L∞(Ω)} ≤ C ( h_{X,Ω}^{τ−d/2} ‖s_{f,X}‖_{W^τ_2(Ω)} + ‖s_{f,X}‖_{ℓ∞(X)} )
                      ≤ C ( h_{X,Ω}^{τ−d/2} ‖s_{f,X}‖_{W^τ_2(Ω)} + ‖f‖_{ℓ∞(X)} )
                      ≤ C ( h_{X,Ω}^{τ−d/2} ‖s_{f,X}‖_N + ‖f‖_{ℓ∞(X)} ),

    ‖u_j‖_{L∞(Ω)} ≤ C ( h_{X,Ω}^{τ−d/2} ‖u_j‖_{W^τ_2(Ω)} + ‖u_j‖_{ℓ∞(X)} )
                  ≤ C ( h_{X,Ω}^{τ−d/2} ‖u_j‖_{W^τ_2(Ω)} + 1 )
                  ≤ C ( h_{X,Ω}^{τ−d/2} ‖u_j‖_N + 1 )

since we know that the space V_X is contained in W^τ_2(Ω), whose norm is equivalent to the native space norm. To get a bound on the norm in native space, we need bounds of the form

    ‖s‖_N ≤ C(X, Ω, Φ) ‖s‖_{ℓ∞(X)}

for arbitrary elements s ∈ V_X. Such bounds are available from [10], but we repeat the basic notation here. Let Φ satisfy (6). Then [10] gives

    ‖s‖²_{W^τ_2(Ω)} ≤ C q_X^{−2τ+d} ‖s‖²_{ℓ2(X)} ≤ C N q_X^{−2τ+d} ‖s‖²_{ℓ∞(X)}    for all s ∈ V_X

with a different generic constant. If we apply this to u_j, we get

    ‖u_j‖_{L∞(Ω)} ≤ C ( (h_{X,Ω} / q_X)^{τ−d/2} + 1 ),

while application to s_{f,X} yields

    ‖s_{f,X}‖_{L∞(Ω)} ≤ C ( (h_{X,Ω} / q_X)^{τ−d/2} ‖f‖_{ℓ2(X)} + ‖f‖_{ℓ∞(X)} )
                      ≤ C ( √N (h_{X,Ω} / q_X)^{τ−d/2} + 1 ) ‖f‖_{ℓ∞(X)}.

Then the assertions of Theorem 1 and the first part of Theorem 3 follow.

For the L2 case covered by Theorem 2, we take the sampling inequality

    ‖f‖_{L2(Ω)} ≤ C ( h_{X,Ω}^τ ‖f‖_{W^τ_2(Ω)} + h_{X,Ω}^{d/2} ‖f‖_{ℓ2(X)} )    for all f ∈ W^τ_2(Ω)    (8)

of [7, Thm. 3.5]. We can apply the sampling inequality as

    ‖s_{f,X}‖_{L2(Ω)} ≤ C ( h_{X,Ω}^τ ‖s_{f,X}‖_{W^τ_2(Ω)} + h_{X,Ω}^{d/2} ‖s_{f,X}‖_{ℓ2(X)} )
                      ≤ C ( h_{X,Ω}^τ ‖s_{f,X}‖_{W^τ_2(Ω)} + h_{X,Ω}^{d/2} ‖f‖_{ℓ2(X)} )
                      ≤ C (h_{X,Ω} / q_X)^{τ−d/2} h_{X,Ω}^{d/2} ‖f‖_{ℓ2(X)},

    ‖u_j‖_{L2(Ω)} ≤ C ( h_{X,Ω}^τ ‖u_j‖_{W^τ_2(Ω)} + h_{X,Ω}^{d/2} ‖u_j‖_{ℓ2(X)} )
                  ≤ C ( h_{X,Ω}^{τ−d/2} ‖u_j‖_{W^τ_2(Ω)} + 1 ) h_{X,Ω}^{d/2}
                  ≤ C ( (h_{X,Ω} / q_X)^{τ−d/2} + 1 ) h_{X,Ω}^{d/2}.

This proves Theorem 2 and the second part of Theorem 3.

3 Examples

We ran a series of examples for uniform grids on [−1, 1]² and increasing numbers N of data locations. Figure 1 shows the values Λ_X of the Lebesgue constants for the Sobolev/Matérn kernel (r/c)^ν K_ν(r/c) for ν = 1.5 at scale c = 20. In this and other examples for kernels with finite smoothness, one can see that our bounds on the Lebesgue constants are valid, but the experimental Lebesgue constants seem to be uniformly bounded. In all cases, the maximum of the Lebesgue function is attained in the interior of the domain.
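A sketch of this experiment in Python (our reconstruction, not the authors' code; we keep the grids small because the kernel matrix becomes severely ill-conditioned as N grows):

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.special import kv, gamma

def matern(r, nu=1.5, c=20.0):
    """Sobolev/Matern kernel (r/c)^nu K_nu(r/c); its value at r = 0 is the
    limit 2^(nu - 1) Gamma(nu)."""
    s = r / c
    out = np.full_like(s, 2.0 ** (nu - 1.0) * gamma(nu))
    nz = s > 0
    out[nz] = s[nz] ** nu * kv(nu, s[nz])
    return out

def grid(n):
    g = np.linspace(-1.0, 1.0, n)
    return np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)

E = grid(80)                                # evaluation grid for the maximum
for n in [5, 9, 13]:                        # n x n uniform data grids on [-1, 1]^2
    X = grid(n)
    A = matern(cdist(X, X))                 # kernel matrix
    U = np.linalg.solve(A, matern(cdist(E, X)).T).T   # cardinal values on E
    print(n * n, np.abs(U).sum(axis=1).max())         # estimate of Lambda_X
```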

Things are different for infinitely smooth kernels. Figure 2 shows the behavior for the Gaussian. The maximum of the Lebesgue function is attained near the corners for large scales, while the behavior in the interior is as stable as for kernels with limited smoothness. The Lebesgue constants do not seem to be uniformly bounded.

Figure 1: Lebesgue constants Λ_X plotted against N for the Sobolev/Matérn kernel, with separate curves for the maximum of the Lebesgue function over the interior and near the corners; the values stay between roughly 10^0.28 and 10^0.36.

A second series of examples was run on 225 regular points in [−1, 1]² for different kernels at different scales, using a scale parameter c via Φ_c(x) = Φ(x/c).

Figures 3 to 5 show how the scaling of the Gaussian kernel influences the shape of the associated Lagrange basis functions. The limit for large scales is called the flat limit [5]; there, the Lagrange basis function tends to a Lagrange basis function of the de Boor/Ron polynomial interpolation [9]. It cannot be expected that such Lagrange basis functions are uniformly bounded.
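The scale dependence of Figures 3 to 5 can be reproduced with a few lines (our addition; the evaluation resolution is arbitrary, and for scales much beyond c = 0.4 the Gaussian kernel matrix becomes numerically singular in double precision):

```python
import numpy as np
from scipy.spatial.distance import cdist

def grid(n):
    g = np.linspace(-1.0, 1.0, n)
    return np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)

X = grid(15)                                 # 225 regular data points
E = grid(121)                                # evaluation grid
j = len(X) // 2                              # index of the central data point

for c in [0.1, 0.2, 0.4]:                    # scales as in Figures 3 to 5
    A = np.exp(-cdist(X, X) ** 2 / c ** 2)   # Gaussian kernel matrix at scale c
    K = np.exp(-cdist(E, X) ** 2 / c ** 2)
    uj = np.linalg.solve(A, K.T)[j]          # central Lagrange basis function on E
    print("c =", c, " max |u_j| =", np.abs(uj).max())
```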

In contrast to this, Figure 6 shows the corresponding Lagrange basis function for the Sobolev/Matérn kernel at scale 320. The scales were chosen such that the condition numbers of the kernel matrices became infeasible at twice the scale. Figure 7 shows the Lebesgue function in the situation of Figure 5, while Figure 8 shows the Sobolev/Matérn case in the situation of Figure 6.

Acknowledgments. This work has been supported by the CNR-DFG exchange programme 2006 and by the GNCS-INdAM funds 2007.

Figure 2: Lebesgue constants Λ_X plotted against N for the Gauss kernel, with separate curves for the maximum over the interior and near the corners; the values grow from about 10^0 to 10^2.

Figure 3: Lagrange basis function on 225 data points, Gaussian kernel with scale 0.1


Figure 4: Lagrange basis function on 225 data points, Gaussian kernel with scale 0.2

Figure 5: Lagrange basis function on 225 data points, Gaussian kernel with scale 0.4


Figure 6: Lagrange basis function on 225 data points, Sobolev/Matérn kernel with scale 320

Figure 7: Lebesgue function on 225 regular data points, Gaussian kernel with scale 0.4


Figure 8: Lebesgue function on 225 regular data points, Sobolev/Matérn kernel with scale 320

References

[1] L. Bos, M. Caliari, S. De Marchi, M. Vianello and Y. Xu, Bivariate Lagrange interpolation at the Padua points: the generating curve approach, J. Approx. Theory 143 (2006), 15–25.

[2] M. Caliari, S. De Marchi and M. Vianello, Bivariate polynomial interpolation on the square at new nodal sets, Appl. Math. Comput. 165(2) (2005), 261–274.

[3] S. De Marchi, On optimal center locations for radial basis interpolation: computational aspects, Rend. Sem. Mat. Torino 61(3) (2003), 343–358.

[4] S. De Marchi, R. Schaback and H. Wendland, Near-optimal data-independent point locations for radial basis function interpolation, Adv. Comput. Math. 23(3) (2005), 317–330.

[5] T. A. Driscoll and B. Fornberg, Interpolation in the limit of increasingly flat radial basis functions, Comput. Math. Appl. 43 (2002), 413–422.

[6] A. Iske, Optimal distribution of centers for radial basis function methods, Tech. Rep. M0004, Technische Universität München, 2000.

[7] W. R. Madych, An estimate for multivariate interpolation II, J. Approx. Theory 142(2) (2006), 116–128.

[8] R. Schaback, Lower bounds for norms of inverses of interpolation matrices for radial basis functions, J. Approx. Theory 79 (1994), 287–306.

[9] R. Schaback, Multivariate interpolation by polynomials and radial basis functions, Constr. Approx. 21 (2005), 293–317.

[10] R. Schaback and H. Wendland, Inverse and saturation theorems for radial basis function interpolation, Math. Comp. 71 (2002), 669–681.

[11] H. Wendland, Scattered Data Approximation, Cambridge Monographs on Applied and Computational Mathematics, Vol. 17, Cambridge University Press, 2005.

[12] H. Wendland and C. Rieger, Approximate interpolation with applications to selecting smoothing parameters, Numer. Math. 101 (2005), 643–662.
