

Non-parametric Sequential Estimation of a Regression Function Based on Dependent Observations

Dimitris N. Politis and Vyacheslav A. Vasiliev

Department of Mathematics, University of California, San Diego, California, USA

Department of Applied Mathematics and Cybernetics, Tomsk State University, Tomsk, Russia

Abstract: This paper presents a sequential estimation procedure for an unknown regression function. The observed regressors and noises of the model are supposed to be dependent and to form sequences of dependent random variables. Two types of estimators are considered. Both estimators are constructed on the basis of Nadaraya–Watson kernel estimators.

First, sequential estimators with given bias and mean square error are defined. According to the sequential approach, the duration of observations is a special stopping time. Then, on the basis of these estimators of a regression function, truncated sequential estimators on a time interval of a fixed length are constructed. At the same time, the variance of these estimators is controlled by a (non-asymptotic) bound.

In addition to non-asymptotic properties, the limiting behavior of the presented estimators is investigated. It is shown, in particular, that for appropriately chosen bandwidths both estimators attain the optimal (as compared to the case of independent data) rates of convergence of Nadaraya–Watson kernel estimators.

Keywords: Non-parametric kernel regression estimation; Dependent observations; Sequential approach; Guaranteed mean square accuracy; Finite sample size.

Subject Classifications: 62G05; 62G08; 62G20; 62J02; 62L12.

1. INTRODUCTION

This paper considers the problem of non-parametric kernel estimation of a regression function f(x), x ∈ R¹, at a point x = x_0 satisfying the equation

Y_i = f(x_i) + δ_i, i ≥ 1, (1.1)

in the case of dependent regressors X = (x_i)_{i≥1} and noises ∆ = (δ_i)_{i≥1}. It is supposed that the pairs (Y_i, x_i), i = 1, 2, . . . are observed but the errors δ_i are unobservable. We allow in the paper for general dependence conditions on the random variables (x_i) and (δ_i), supposing mutual dependence of the processes X and ∆. In particular, the model inputs can be unbounded with a positive probability, and the estimation problem for a non-linear autoregressive function can be solved (see Section 5.2).

There are many results and publications dedicated to this problem for independent and dependent observations, in asymptotic and non-asymptotic problem statements (see, e.g., Bai and Liu (2007), Dharmasena et al. (2008)-Honda (1998), Leonov (1999), Nazin and Ljung (2002), Politis et al. (1999)-Scott (1992), etc.). It should be noted that the regressors X are often supposed to be non-random or bounded and the noises ∆ of the model independent; see Bai and Liu (2007); Dharmasena et al. (2008); Györfi et al. (2002); Nazin and Ljung (2002); Politis et al. (1999), Roll et al. (2002)-Rosenblatt (1991), etc.

Address correspondence to V. A. Vasiliev, Department of Applied Mathematics and Cybernetics, Tomsk State University, Lenin Ave. 36, Tomsk, 634050, Russia; Fax: (382) 252-9895; Email: [email protected]


The main goal of the paper is twofold. We construct sequential and so-called truncated sequential estimators of a regression function. Estimators of both types have known mean square errors (MSEs). To obtain sequential estimators with an arbitrary accuracy one needs a sample of unbounded size. This approach was first proposed by A. Wald for various statistical problems in a scheme of independent observations, see Wald (1947, 1950). It proposes the construction of a special stopping time, serving as the duration of observations, to obtain statistics with desired properties. This idea was then applied to the parameter estimation problem for one-parameter stochastic differential equations in Liptser and Shiryaev (1978); Novikov (1971), and for multiparameter continuous and discrete-time dynamic systems in many papers and books (see Borisov and Konev (1977); Dobrovidov et al. (2012); Konev (1985); Kuchler and Vasiliev (2010) among others). The sequential approach has also been applied to non-parametric regression and density function estimation problems (see, e.g., Carroll (1976); Davies and Wegman (1975), Dharmasena et al. (2008)-Efroimovich (2007), Honda (1998); Liski et al. (2002); Martinsek (1992); Mukhopadhyay and Solanky (1994); Rao (1983); Stute (1983); Vasiliev (1997); Vasiliev and Koshkin (1998); Xu and Martinsek (1995)). Peculiarities and difficulties of the sequential approach to non-parametric problems are discussed in detail in Efroimovich (2007).

However, in practice the observation time of a system is usually not only finite but fixed, which does not allow for the use of such estimators. One possibility for finding estimators with guaranteed quality of inference using a sample of fixed size is provided by the approach of truncated sequential estimation. The truncated sequential estimation method was developed in Konev and Pergamenshchikov (1990a)-Konev and Pergamenshchikov (1992), as well as in Fourdrinier et al. (2009), for parameter estimation problems in discrete and continuous time dynamic models. Using the sequential approach, they constructed estimators of dynamic system parameters with known variance from samples of fixed size.

The non-parametric truncated sequential estimators of a regression function presented in this paper constitute Nadaraya–Watson estimators calculated at a special stopping time. These estimators have known mean square errors as well. The duration of observations is also random but bounded from above by a non-random fixed number.

Non-asymptotic and asymptotic properties of estimators of both types are investigated. It is shown that the truncated sequential estimators coincide with the afore-mentioned sequential estimators (i.e. with the Nadaraya–Watson estimators calculated at a special stopping time) for sufficiently large sample sizes.

It should be noted that non-parametric estimators of functionals are usually studied under asymptotic assumptions, even in the sequential problem statement (except for some special cases, see, e.g., Dobrovidov et al. (2012) and references therein). Nadaraya–Watson estimators of a regression function with given non-asymptotic properties are derived here for the first time.

The outline of the paper is as follows. In Section 2 basic assumptions on the regression function to be estimated, as well as on the regressors and noises of the model, are given. In Section 3 we present the construction and basic properties of the sequential estimation procedure for the regression function, which can be estimated with an arbitrary mean square accuracy. Section 4 presents the truncated sequential estimation procedure, which gives the possibility to obtain an estimator of the regression function with a known variance based on a sample of finite size. In Section 5 we consider basic examples of the dependence types of the regressors and noises of the model. In particular, a non-linear autoregressive estimation problem is considered. Section 6 is the conclusion of the paper. In Section 7 all proofs are given.

Similar results were presented in Politis and Vasiliev (2012).

2. PROBLEM SET-UP AND BASIC ASSUMPTIONS

Consider the problem of non-parametric estimation of a regression function f(x) at a point x = x_0 from the equation (1.1) based on data (Y_i, x_i), i = 1, 2, . . . .

The notation 1,m means the set of integers 1, 2, . . . ,m.


We suppose that the function to be estimated satisfies the following assumption.

Assumption (f): The function f(x) is p-times differentiable at the point x_0, p ≥ 0, and for some positive numbers C_{f,i}, i = 0, p (p ≥ 0), L_f, and γ_f ∈ (0, 1],

|f^{(i)}(x_0)| ≤ C_{f,i}, i = 0, p,

and for every x, y ∈ R¹,

|f^{(p)}(x) − f^{(p)}(y)| ≤ L_f |x − y|^{γ_f}.

For estimation of the function f(·) at the point x_0, f = f(x_0), we use so-called sequential estimators, constructed on the basis of kernel type estimators of the form

f_N = ( Σ_{i=1}^{N} K((x_0 − x_i)/h) )^+ · Σ_{i=1}^{N} Y_i K((x_0 − x_i)/h), N ≥ 1, (2.1)

where a^+ = a^{−1} for a ≠ 0 and a^+ = 0 when a = 0; K(·) is a kernel function and h is a bandwidth, satisfying Assumptions (K) and (h) below, respectively.
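As an illustration, the estimator (2.1), together with the pseudo-inverse convention a^+, can be sketched in a few lines of Python (a simulation sketch: the Gaussian kernel and the toy data below are our own illustrative choices, not prescribed by the paper):

```python
import numpy as np

def nw_estimate(x0, x, y, h):
    """Nadaraya-Watson kernel estimate of f(x0) in the form (2.1)."""
    kernel = lambda z: np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)  # Gaussian kernel
    w = kernel((x0 - x) / h)              # weights K((x0 - x_i)/h)
    s = w.sum()
    s_plus = 1.0 / s if s != 0 else 0.0   # the convention a+ = 1/a, 0+ = 0
    return s_plus * (y * w).sum()

# toy model Y_i = f(x_i) + delta_i with f(x) = x**2
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 2000)
y = x**2 + 0.1 * rng.standard_normal(2000)
est = nw_estimate(0.5, x, y, h=0.1)       # should be close to f(0.5) = 0.25
```

The `s_plus` branch reproduces the a^+ convention exactly, so the estimator is well defined even when no observation falls near x_0.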

Define for every i ≥ 1 the density functions f_{i−1}(t) of the conditional distribution functions F_{i−1}(t) = P(x_i ≤ t | F^x_{i−1}), F^x_i = σ{x_1, . . . , x_i}.

As regards the regressors X and the noises ∆, we suppose the following.

Assumption (X): Assume the conditional density functions f_{i−1}(t), i ≥ 1, are q-times differentiable at the point x_0, q ≥ 0, and for some positive numbers c_x^0, C_{x,k}, k = 0, q (q ≥ 0), L_x and γ_x ∈ (0, 1] the following relations

inf_{i≥1} f_{i−1}(x_0) ≥ c_x^0,

sup_{i≥1} |f^{(k)}_{i−1}(x_0)| ≤ C_{x,k}, k = 0, q,

and for every x, y ∈ R¹,

sup_{i≥1} |f^{(q)}_{i−1}(x) − f^{(q)}_{i−1}(y)| ≤ L_x |x − y|^{γ_x}

are fulfilled.

Assumption (∆): Let the noises ∆ be conditionally zero-mean, E(δ_i | F^x_i) = 0, i ≥ 1, and for some non-negative function ϕ(k), k ≥ 1, monotonically non-increasing as k → ∞, let the following "general mixing condition" for the sequence ∆ hold true:

|E(δ_i δ_j | F^x_{i∨j})| ≤ ϕ(|i − j|), i, j ≥ 1.

Define admissible sets of kernels K(·) and bandwidths h as follows.

Assumption (K): Let the function K(·) be uniformly bounded from above,

sup_z |K(z)| ≤ K̄ < ∞,

and such that

∫ K(z) dz = 1, ∫ |z^s K(z)| dz < ∞, 0 ≤ s ≤ q + γ_x + 2(p + γ_f).

Moreover, we suppose that the integrals

∫ z^j K(z) dz = 0, j = 1, p+q.

Assumption (h): Let (h_n)_{n≥1} be a sequence of positive numbers monotonically decreasing to zero, such that nh_n → ∞ as n → ∞ and h_1 ≤ 1.

It should be noted that, according to Assumption (K), the allowed kernels can be of order higher than two, i.e. not necessarily nonnegative everywhere. For example, the infinite-order flat-top kernels of Politis (2001) are allowed, including the infinitely differentiable flat-top kernel, see McMurry and Politis (2004).
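To make the flat-top idea concrete, the sketch below builds a trapezoidal flat-top kernel numerically from its Fourier transform. This is our own illustrative parametrization, not necessarily the exact taper used in Politis (2001); it shows the key qualitative point that such a kernel integrates to one yet takes negative values, i.e. it is of order higher than two:

```python
import numpy as np

def flat_top_kernel(x, c=0.5, grid=4001):
    """Kernel K(x) whose Fourier transform kappa(s) is 'flat-top':
    kappa(s) = 1 for |s| <= c, decaying linearly to 0 at |s| = 1,
    computed by numerical inverse Fourier transform."""
    s = np.linspace(-1.0, 1.0, grid)
    kappa = np.clip((1.0 - np.abs(s)) / (1.0 - c), 0.0, 1.0)
    ds = s[1] - s[0]
    # K(x) = (1/2pi) * integral of kappa(s) * cos(s*x) ds
    return (kappa * np.cos(s * x)).sum() * ds / (2.0 * np.pi)

# evaluate on a grid; the kernel dips below zero away from the origin
vals = np.array([flat_top_kernel(x) for x in np.linspace(-15.0, 15.0, 301)])
```

For this trapezoidal taper the closed form is K(x) = 2(cos(x/2) − cos x)/(πx²), with K(0) = 3/(4π), which the numerical inverse transform reproduces.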

3. ESTIMATION OF THE REGRESSION FUNCTION WITH GUARANTEED ACCURACY

In this section we consider sequential estimators of the regression function f at a point x_0 with a given variance, calculated at a special stopping time. These estimators are constructed on the basis of Nadaraya–Watson kernel estimators. A bandwidth giving an optimal convergence rate of the estimators (which coincides with the rate of Nadaraya–Watson estimators constructed from independent observations) is found.

3.1. Sequential Estimation Procedure

According to the sequential approach we introduce the following definition.

Definition 3.1. Let (H_n)_{n≥1} be a given unboundedly monotonically increasing sequence of positive numbers and let (h_n)_{n≥1} satisfy Assumption (h).

The sequential plans (τ_n, f*_n), n ≥ 1, of estimation of the function f = f(x_0) are defined by the formulae

τ_n = inf{ j ≥ 1 : Σ_{i=1}^{j} K((x_0 − x_i)/h_n) ≥ H_n },

f*_n = (1/Q_n) Σ_{i=1}^{τ_n} Y_i K((x_0 − x_i)/h_n),   Q_n = Σ_{i=1}^{τ_n} K((x_0 − x_i)/h_n), (3.1)

where τ_n is the duration of estimation, and f*_n is the estimator of f with given accuracy in the mean square sense.
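The plan (3.1) can be simulated directly: observations are processed one at a time and sampling stops at the first j whose accumulated kernel mass reaches H_n. In the sketch below the Epanechnikov kernel and the toy data are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def sequential_plan(x0, xs, ys, h, H, kernel):
    """Sequential plan (tau_n, f*_n) of Definition 3.1 (simulation sketch)."""
    q = 0.0    # running Q_j: accumulated kernel weights
    num = 0.0  # running sum of Y_i * K((x0 - x_i)/h)
    for j, (xi, yi) in enumerate(zip(xs, ys), start=1):
        w = kernel((x0 - xi) / h)
        q += w
        num += yi * w
        if q >= H:          # stopping time tau: first j with Q_j >= H
            return j, num / q
    raise RuntimeError("threshold H not reached within the available sample")

epan = lambda z: 0.75 * (1.0 - z**2) if abs(z) <= 1 else 0.0  # Epanechnikov kernel
rng = np.random.default_rng(1)
xs = rng.uniform(-1.0, 1.0, 100_000)
ys = np.sin(xs) + 0.05 * rng.standard_normal(100_000)
tau, f_hat = sequential_plan(0.0, xs, ys, h=0.2, H=50.0, kernel=epan)
# tau is random; f_hat estimates f(0) = sin(0) = 0
```

Note that the stopping rule depends only on the regressors, which is what makes the denominator Q_n at least H_n and yields the non-asymptotic bounds of Theorem 3.1 below.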

We have introduced in Definition 3.1 a sequence of sequential estimation plans. Theorem 3.1 below gives non-asymptotic properties of these plans for every n ≥ 1, in particular an upper bound for the MSE of the estimator f*_n. It also gives the possibility to investigate asymptotic properties of the defined plans as n → ∞ (see Section 3.2).

To formulate this theorem we need the following notation. Assume (l_n)_{n≥1} to be an arbitrary non-decreasing sequence of positive numbers.


Define γ_{x,δ} = 0 if the processes X and ∆ are independent and γ_{x,δ} = 1 otherwise; γ_δ = 0 if ϕ(k) = 0 for k > 0 and γ_δ = 1 otherwise; the numbers

ϕ_n = ϕ(l_n),   K_α = ∫ |z^α K(z)| dz, α ≥ 0,   κ = (q!)^{−1} L_x K_{q+γ_x},

C_1 = (1/q!) Σ_{j=1}^{p} (1/j!) C_{f,j} K_{q+j+γ_x} χ(p > 0),   C_2 = (1/p!) C_{x,0} K_{p+γ_f},   C_3 = C_{x,0} ∫ K²(y) dy,

C_4 = 4K̄(p+2) Σ_{j=0}^{q+1} Σ_{k=0}^{p+1} [1/(j!(k!)²)] C_{x,j} C_{f,k}² K_{β(j,k)},   C_{x,q+1} = (1/q!) L_x,   C_{f,p+1} = (1/p!) L_f,

β(j,k) = j + 2k + (γ_x − 1)χ(j = q+1) + 2(γ_f − 1)χ(k = p+1),

as well as

n_0 = inf{ n ≥ 1 : c_x^0 − κ h_n^{q+γ_x} > 0 },

t_n = (C_{x,0} + κ h_n^{q+γ_x}) √(C_3(H_n + K̄)) / [(c_x^0 − κ h_n^{q+γ_x})^{5/2} h_n]
+ √( [C_3(C_{x,0} + κ h_n^{q+γ_x})²(c_x^0 − κ h_n^{q+γ_x})^{−3} + 2K̄²(c_x^0 − κ h_n^{q+γ_x})^{−1}](H_n + K̄) + H_n² ) / [(c_x^0 − κ h_n^{q+γ_x}) h_n],

w_n = √( ϕ(0) K̄² γ_δ (l_n/H_n²)(1 + 3l_n) + C_3 ϕ(0)(c_x^0 − κ h_n^{q+γ_x})^{−1} (1/H_n)(1 + K̄/H_n)
+ 2γ_δ ϕ(0) K_0 K̄ (c_x^0 − κ h_n^{q+γ_x})^{−1} (l_n/H_n)(1 + K̄/H_n) + 2γ_δ K_0 K̄ ϕ_n h_n t_n²/H_n² ),

b_n = (c_x^0 − κ h_n^{q+γ_x})^{−1} [C_1 h_n^{q+1+γ_x} + C_2 h_n^{p+γ_f}] (1 + K̄/H_n) + w_n

and

V_n = 2{ (2C_3 C_{f,0}² + (1 + γ_{x,δ}) C_4)(c_x^0 − κ h_n^{q+γ_x})^{−1} (1/H_n)(1 + K̄/H_n) + (1 + γ_{x,δ}) w_n² + 2(C_1 h_n^{q+1+γ_x} + C_2 h_n^{p+γ_f})² h_n² t_n² / H_n² }.

Theorem 3.1. Assume model (1.1), let Assumptions (f), (K), (X), (h) and (∆) be fulfilled and let n ≥ n_0. Then the sequential estimation plans (τ_n, f*_n) are closed, i.e. τ_n < ∞ a.s., and have the following non-asymptotic properties:

1. for the expected duration of observations,

a) (C_{x,0} + κ h_n^{q+γ_x})^{−1} h_n^{−1} H_n ≤ Eτ_n ≤ (c_x^0 − κ h_n^{q+γ_x})^{−1} h_n^{−1} (H_n + K̄), (3.2)

b) Eτ_n² ≤ t_n²; (3.3)

2. for the bias, |Ef*_n − f| ≤ b_n;

3. for the MSE, E(f*_n − f)² ≤ V_n.


Proof of Theorem 3.1 is given in the Appendix.

Remark 3.1. From Definition 3.1 it follows that the presented sequential estimators coincide with the usual Nadaraya–Watson estimators calculated at a special stopping time. Therefore, at least in the case of independent inputs of the model, these estimators have the same asymptotic properties as Nadaraya–Watson estimators. However, the sequential estimators have the above exact, non-asymptotic properties, which may be important for practitioners.

3.2. Bandwidth Choice for Sequential Estimators

From Theorem 3.1 it follows that it is natural to define H_n = nh_n. In this case, and under the condition nh_n → ∞ as n → ∞ from Assumption (h), the expected duration of observations is proportional to n, i.e.,

C_{x,0}^{−1} n − C_{x,0}^{−1} κ h_n^{q+γ_x} n / (C_{x,0} + κ h_n^{q+γ_x}) ≤ Eτ_n ≤ (c_x^0)^{−1} n + [(c_x^0)^{−1} κ h_n^{q+γ_x} n + K̄ h_n^{−1}] / (c_x^0 − κ h_n^{q+γ_x}), n ≥ n_0,

as well as

Eτ_n² ≤ (c_x^0)^{−2} n² + o(n²) as n → ∞

and

|Ef*_n − f| = O( h_n^{q+1+γ_x} + h_n^{p+γ_f} + √(ϕ_n h_n^{−1}) + (1 + γ_δ √l_n)/√(nh_n) ),

E(f*_n − f)² = O( h_n^{2(q+1+γ_x)} + h_n^{2(p+γ_f)} + ϕ_n h_n^{−1} + (1 + γ_δ l_n)/(nh_n) ) as n → ∞.

From these formulae it follows that for the asymptotic unbiasedness and L_2-convergence of f*_n we have to use sequences (l_n) and (h_n) satisfying the condition

ϕ_n h_n^{−1} + l_n/(nh_n) = o(1) as n → ∞. (3.4)

Remark 3.2. If the regressors x_i form a sequence of i.i.d. random variables, then c_x^0 = C_{x,0}. Moreover, if the number c_x^0 = f_i(x_0) is known, we can put H_n = c_x^0 nh_n. In this case

n − κ h_n^{q+γ_x} n / (c_x^0 + κ h_n^{q+γ_x}) ≤ Eτ_n ≤ n + (κ h_n^{q+γ_x} n + K̄ h_n^{−1}) / (c_x^0 − κ h_n^{q+γ_x})

and

lim_{n→∞} [(h_n^{q+γ_x} n)^{−1} ∧ h_n] |Eτ_n − n| < ∞,

where a ∧ b = min{a, b}, as well as

Eτ_n² ≤ n² + o(n²) as n → ∞.

It should be noted that in the case of i.i.d. noises (δ_i) the MSE of the constructed sequential estimator has the same optimal decreasing rate

E(f*_n − f)² = O( h_n^{2(q+1+γ_x)} + h_n^{2(p+γ_f)} + 1/(nh_n) ) as n → ∞

as in the case of i.i.d. regressors (x_i). In particular, for the case p = 0, γ_f = 1 (see, for comparison, Györfi et al. (2002) among others),

E(f*_n − f)² = O( h_n² + 1/(nh_n) ) as n → ∞.


4. TRUNCATED SEQUENTIAL ESTIMATION OF THE REGRESSION FUNCTION f

In this section we consider the problem of estimation of the regression function f with a known mean square accuracy based on observations of the process (Y_i, x_i) for i = 1, . . . , N, on the time interval [1, N] for a fixed time N. These estimators are constructed on the basis of the sequential estimators considered in the previous section.

Such a possibility is provided by the so-called truncated sequential estimation method, developed in Konev and Pergamenshchikov (1990a)-Konev and Pergamenshchikov (1992), as well as in Fourdrinier et al. (2009), for parameter estimation problems in discrete and continuous time dynamic models.

A bandwidth giving an optimal convergence rate of the estimators is found.

4.1. Truncated Sequential Estimation Procedure

Definition 4.1. Let H and h be positive numbers. The truncated sequential plans (τ_N(h, H), f*_N(h, H)), N ≥ 1, of estimation of the function f = f(x_0) are defined by the formulae

τ_N(h, H) = inf{ j ∈ [1, N] : Σ_{i=1}^{j} K((x_0 − x_i)/h) ≥ H }

with the provision that inf{∅} = N,

f*_N(h, H) = f_N(h, H) · χ( Σ_{i=1}^{N} K((x_0 − x_i)/h) ≥ H ) + (C_{f,0}/2) · χ( Σ_{i=1}^{N} K((x_0 − x_i)/h) < H ), (4.1)

f_N(h, H) = (1/Q_N(h, H)) Σ_{i=1}^{τ_N(h,H)} Y_i K((x_0 − x_i)/h),   Q_N(h, H) = Σ_{i=1}^{τ_N(h,H)} K((x_0 − x_i)/h),

where τ_N(h, H) is the duration of estimation, which is bounded by construction (τ_N(h, H) ≤ N), and f*_N(h, H) is the estimator of f. The notation χ(A) means the indicator function of a set A.
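The truncation in (4.1) is a small modification of the sequential loop: the scan is capped at N and, when the total kernel mass stays below H, the known bound C_{f,0} supplies the fallback value C_{f,0}/2. A sketch under our own toy conventions (triangular kernel, made-up data):

```python
def truncated_plan(x0, xs, ys, h, H, C_f0, kernel):
    """Truncated sequential estimator (4.1) on a sample of fixed size N = len(xs)."""
    N = len(xs)
    q = num = 0.0
    tau = N                      # convention inf{empty set} = N
    for j in range(N):
        w = kernel((x0 - xs[j]) / h)
        q += w
        num += ys[j] * w
        if q >= H:
            tau = j + 1
            break
    if q >= H:                   # indicator chi(sum of kernel weights >= H)
        return tau, num / q
    return tau, C_f0 / 2.0       # degenerate case: fall back to C_f0 / 2

tri = lambda z: max(0.0, 1.0 - abs(z))   # triangular kernel (illustrative)
xs = [0.1, -0.2, 0.05, 0.5, -0.04]
ys = [1.1, 0.9, 1.05, 1.6, 0.95]
tau1, est1 = truncated_plan(0.0, xs, ys, h=0.3, H=0.9, C_f0=2.0, kernel=tri)
tau2, est2 = truncated_plan(0.0, xs, ys, h=0.3, H=50.0, C_f0=2.0, kernel=tri)
# in the second call H is never reached, so tau2 = N and est2 = C_f0/2 = 1.0
```

The early break and the final indicator agree because reaching H at some j ≤ N implies the full sum over i = 1, . . . , N also exceeds H.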

The parameters H and h will be defined in a special way as functions of N in order to optimize the convergence rate of the estimator f*_N(h, H) as N → ∞ in the mean square sense.

Assume l(H) to be an arbitrary non-decreasing function of positive numbers. Define

V_N(h, H) = 2{ (2C_3 C_{f,0}² + (1 + γ_{x,δ}) C_4) hN/H² + (1 + γ_{x,δ})[ ϕ(0) K̄² γ_δ (l(H)/H²)(1 + 3l(H))
+ C_3 ϕ(0) hN/H² + 2γ_δ ϕ(0) K_0 l(H) hN/H² + 2γ_δ K_0 K̄ ϕ(l(H)) hN²/H² ]
+ 2(C_1 h^{q+1+γ_x} + C_2 h^{p+γ_f})² h²N²/H² } + C_3 C_{f,0}² Nh / (4[Nh(c_x^0 − κ h^{q+γ_x}) − H]²).

From the proof of Theorem 4.1 below it follows that the number H satisfies the condition

Nh(c_x^0 − κ h^{q+γ_x}) − H > 0. (4.2)

Thus it is natural to put

H := H_N = N h_N C_α^{−1}, (4.3)

where

C_α = (c_x^0 − κ h_{n_0}^{q+γ_x} − α^{−1/2})^{−1}, α > (c_x^0 − κ h_{n_0}^{q+γ_x})^{−2},

and (h_N) is a sequence satisfying Assumption (h). It should be noted that, according to the assumptions of the theorem below, the number C_α and the sequence (H_N) are assumed known.

Then, denoting l_N = l(H_N) and ϕ_N = ϕ(l_N), we have

V⁰_N := V_N(h_N, H_N) = 2{ [ (2C_3 C_{f,0}² + (1 + γ_{x,δ}) C_4) C_α² + C_3 C_{f,0}² α/8 + (1 + γ_{x,δ})(C_3 C_α² ϕ(0) + 2γ_δ ϕ(0) K_0 C_α² l_N) ] · 1/(N h_N)
+ (1 + γ_{x,δ})[ ϕ(0) K̄² C_α² γ_δ (l_N/(N h_N)²)(1 + 3l_N) + 2γ_δ K_0 K̄ C_α² ϕ_N h_N^{−1} ] + 2C_α² (C_1 h_N^{q+1+γ_x} + C_2 h_N^{p+γ_f})² }.

The main result of this section is the following.

Theorem 4.1. Assume model (1.1) and let Assumptions (f), (K), (X) and (∆) be fulfilled, where the number C_{f,0} in Assumption (f) is supposed to be known. Then:

1) for all positive numbers h, H satisfying condition (4.2), the truncated sequential estimator f*_N(h, H) has the MSE bound

E(f*_N(h, H) − f)² ≤ V_N(h, H);

2) for H = (H_N) defined in (4.3) and (h_N) satisfying (4.2) it holds that

E(f*_N(h_N, H_N) − f)² ≤ V⁰_N;

3) assume there exists a number r > 2 such that the bandwidth h = (h_N) from Assumption (h) satisfies the additional condition

Σ_{N≥1} 1/(h_N^{r−1} N^{r/2}) < ∞ (4.4)

and H = (H_N) is defined in (4.3). Then

lim_{N→∞} τ_N(h_N, H_N)/N < 1 a.s.

and the estimator f*_N(h_N, H_N) is non-degenerate as N → ∞ in the following sense:

P( f*_N(h_N, H_N) = f_N(h_N, H_N) ) = 1 for N large enough.

Proof of Theorem 4.1 is given in the Appendix.

Remark 4.1. From the third assertion of Theorem 4.1 it follows that, similarly to the sequential estimators, the truncated sequential estimators have the asymptotic properties of the Nadaraya–Watson estimators (2.1). However, in distinction from the sequential estimators considered in Section 3.1, the truncated estimators have known variance for fixed (non-random) sample sizes.

Remark 4.2. If the number C_{f,0} in Assumption (f) is unknown, then Definition 4.1 of the truncated sequential estimator is modified as follows:

f*_N(h, H) = f_N(h, H) · χ( Σ_{i=1}^{N} K((x_0 − x_i)/h) ≥ H ).

All assertions of Theorem 4.1 remain fulfilled with slightly changed (but similarly structured) functions V_N(h, H) and V⁰_N. More precisely, the number 4 in the denominator of the last summand in the definition of V_N(h, H) should be omitted, and the number 8 in the denominator of the third summand on the right-hand side of the definition of V⁰_N should be changed to the number 2.


4.2. Bandwidth Choice for Truncated Sequential Estimators

From Theorem 4.1 it follows, that by the definition of V 0N and assertion 2) of Theorem 4.1, the truncated

sequential estimator (4.1) has similar to the sequential estimator (3.1) rate of convergency (as N →∞).In the case of i.i.d. noises (δi) the MSE of the constructed sequential estimator has optimal decreasing

rate

E(f∗N (hN , HN )− f)2 = O

(h2%N +

1

NhN

)as N →∞,

% = (q + 1 + γx) ∧ (p+ γf ).Then the bandwidth h with the optimal rate is proportional to

hN ∼ N− 1

(2%+1)

and the condition (4.4) is fulfilled for % > 1/2 and

r >4%

2%− 1.

In particular, for the case p = 0, γf = 1,

E(f∗N (hN , HN )− f)2 = O

(h2N +

1

NhN

)and r > 4.
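The rate bookkeeping above is simple arithmetic; the helper below (our own naming, not from the paper) returns ϱ, the optimal bandwidth exponent, and the lower bound on r from condition (4.4):

```python
def optimal_exponents(p, q, gamma_f, gamma_x):
    """Rate parameters of Section 4.2:
    rho = min(q + 1 + gamma_x, p + gamma_f),
    optimal bandwidth h_N ~ N ** (-1 / (2*rho + 1)),
    and condition (4.4) requires rho > 1/2 and r > 4*rho / (2*rho - 1)."""
    rho = min(q + 1 + gamma_x, p + gamma_f)
    bandwidth_exponent = -1.0 / (2.0 * rho + 1.0)
    r_lower = 4.0 * rho / (2.0 * rho - 1.0) if rho > 0.5 else float("inf")
    return rho, bandwidth_exponent, r_lower

# the case p = 0, gamma_f = 1 from the text (taking q = 0, gamma_x = 1 as an example):
rho, b_exp, r_lo = optimal_exponents(0, 0, 1.0, 1.0)
# rho = 1, h_N ~ N**(-1/3), r > 4
```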

Remark 4.3. Theorems 3.1 and 4.1 give known upper bounds for the bias and the MSE of the presented estimators, as well as (in Theorem 3.1) for the mean observation time, provided the parameters of the classes of functions introduced in Assumptions (f) and (X) are known (C_{f,k}, p, γ_f, etc.). In this case an 'optimal' bandwidth can be found by minimizing the upper bounds for the MSEs.

At the same time, the assertions of Theorems 3.1 and 4.1 are fulfilled even if some of these parameters are unknown. Estimators with such properties can be successfully used in various adaptive procedures as pilot estimators (see, e.g., Dobrovidov et al. (2012), Vasiliev (1997)-Vasiliev and Koshkin (1998)).

5. EXAMPLES

In this section we consider examples of dependence types of the observed regressors and model noises satisfying Assumptions (X) and (∆).

5.1. Case of Independent Regressors X and ∆

Consider examples of inputs of the model in this case.

5.1.1. Example of Regressors X

Assume that the regressors (x_i) satisfy the following equation:

x_i = Ψ(x_{i−1}, . . . , x_{i−r}) + ε_i, i ≥ 1, (5.1)

where Ψ(·) is a bounded function, |Ψ(y)| ≤ Ψ̄ < ∞, y ∈ R^r, r ≥ 1.

Assume that the input noises ε_i in the model (5.1) are i.i.d. with density function f_ε(·) and independent of the initial vector (x_0, x_{−1}, . . . , x_{1−r}). In this case f_{i−1}(t) = f_ε(t − Ψ(x_{i−1}, . . . , x_{i−r})).


Thus Assumption (X) holds true if the density function f_ε(·) is q-times differentiable, q ≥ 0, and for some positive numbers c_x^0, C_{x,i}, L_x and γ_x ∈ (0, 1] the following relations

inf_{|t−x_0|≤Ψ̄} f_ε(t) ≥ c_x^0,

sup_{|t−x_0|≤Ψ̄} |f_ε^{(i)}(t)| ≤ C_{x,i}, i = 0, q,

and for every x, y ∈ R¹,

|f_ε^{(q)}(x) − f_ε^{(q)}(y)| ≤ L_x |x − y|^{γ_x}

are fulfilled.

5.1.2. Example of Noises ∆

Assume that the noises (δ_i) satisfy the following autoregressive equation:

δ_i = λδ_{i−1} + η_i, i ≥ 1, (5.2)

which is supposed to be stable, |λ| < 1; the η_i are i.i.d. with Eη_i = 0, Eη_i² ≤ σ_η², i ≥ 0; η_0 = δ_0. Then

ϕ(k) = |λ|^k σ_η² / (1 − λ²).

In this case we can put (after optimization of the upper bound of the MSE)

l_n ∼ (1 + µ) log_{1/|λ|} n, µ > 0,

and the summands

l_n/(nh_n) ∼ log n/(nh_n),   ϕ_n h_n^{−1} ∼ 1/(n^{1+µ} h_n) = o(1/(nh_n)).

Then the condition (3.4) (which is necessary for truncated estimators as well) is fulfilled if

log n/(nh_n) = o(1) as n → ∞.
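For a numerical sanity check of the bound ϕ(k) = |λ|^k σ_η²/(1 − λ²), one can simulate the stationary AR(1) noise (5.2) and compare empirical autocovariances with ϕ(k). This is a simulation sketch; λ and σ_η below are arbitrary illustrative choices:

```python
import numpy as np

lam, sigma_eta, n = 0.6, 1.0, 200_000
rng = np.random.default_rng(2)
eta = sigma_eta * rng.standard_normal(n)
delta = np.empty(n)
delta[0] = eta[0] / np.sqrt(1.0 - lam**2)   # start in the stationary distribution
for i in range(1, n):
    delta[i] = lam * delta[i - 1] + eta[i]

phi = [abs(lam)**k * sigma_eta**2 / (1.0 - lam**2) for k in range(5)]  # bound phi(k)
emp = [float(np.mean(delta[: n - k] * delta[k:])) for k in range(5)]   # E[delta_i delta_{i+k}]
# for a stationary AR(1) the autocovariance equals phi(k), so emp[k] tracks phi[k]
```

This also illustrates why ϕ(k) decays geometrically, which is what makes the choice l_n ∼ (1 + µ) log_{1/|λ|} n drive ϕ_n below the 1/(nh_n) variance term.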

It is clear that this example can be easily generalized to a stable autoregressive process (5.2) of an arbitrary order.

5.2. Case of Dependent Inputs X and ∆

Consider the estimation problem in the nonlinear autoregressive model

Y_i = f(Y_{i−1}) + δ_i, i ≥ 1,

where the δ_i, i ≥ 1, form a sequence of zero-mean i.i.d. random variables with density function f_δ(·) and finite variance Eδ_i² = σ², i ≥ 1; Y_0 is a zero-mean random variable with finite variance, independent of the δ_i, i ≥ 1.

Then x_n = Y_{n−1}, n ≥ 1, and ϕ(k) = σ²χ(k = 0), k ≥ 0, and in this case f_i(t) = f_δ(t − f(Y_{i−1})), F^Y_i = σ{Y_0, δ_1, . . . , δ_i}.


Thus Assumption (X) holds true if the function f(·) is uniformly bounded, sup_{t∈R¹} |f(t)| ≤ C_f, the density function f_δ(·) is q-times differentiable, q ≥ 0, and for some positive numbers c_x^0, C_{x,i}, L_x and γ_x ∈ (0, 1] the following relations

inf_{|t−x_0|≤C_f} f_δ(t) ≥ c_x^0,

sup_{|t−x_0|≤C_f} |f_δ^{(i)}(t)| ≤ C_{x,i}, i = 0, q,

and for every x, y ∈ R¹,

|f_δ^{(q)}(x) − f_δ^{(q)}(y)| ≤ L_x |x − y|^{γ_x}

are fulfilled.

6. CONCLUSION

A sequential approach to the problem of estimation of a regression function from dependent observations is developed. It is supposed that the function to be estimated belongs to a Hölder class and that the input processes of the model can be unbounded with a positive probability.

Estimators of two types are presented. Sequential estimators give the possibility to obtain estimators with an arbitrary mean square accuracy at a finite stopping time. Truncated sequential estimators have known variance and are constructed from a sample of finite (fixed) size.

Both estimation procedures work under general dependency types of the model inputs. The asymptotic rate of convergence of both estimators coincides with the optimal rate of Nadaraya–Watson estimators constructed from independent observations. In this comparison we consider the mean Eτ_n, which has the rate (3.2), as the duration of observations in the sequential approach, against Nadaraya–Watson estimators calculated from a sample of size n.

The presented estimators can be used directly, and as pilot estimators, in various statistical problems.

In this paper we have assumed that the function to be estimated has a scalar argument. In most applications, in particular to dynamic systems, the regressors will have a higher dimension. The extension to this case is immediate.

7. APPENDIX

7.1. Proof of Theorem 3.1

First we note that, by Definition 3.1, for the closedness of the stopping times τ_n, n ≥ 1, it is enough to verify for every n ≥ n_0 the following equality:

Σ_{i=1}^{∞} K((x_0 − x_i)/h_n) = +∞ a.s. (7.1)

To this end we consider, for every n ≥ n_0 and N ≥ 1, the following representation:

(1/(Nh_n)) Σ_{i=1}^{N} K((x_0 − x_i)/h_n) = (1/(Nh_n)) Σ_{i=1}^{N} E[ K((x_0 − x_i)/h_n) | F_{i−1} ]
+ (1/(Nh_n)) Σ_{i=1}^{N} ( K((x_0 − x_i)/h_n) − E[ K((x_0 − x_i)/h_n) | F_{i−1} ] ).


Using Assumptions (X) and (K), the first summand on the right-hand side of this formula can be estimated from below:

(1/(Nh_n)) Σ_{i=1}^{N} E[ K((x_0 − x_i)/h_n) | F_{i−1} ] = (1/N) Σ_{i=1}^{N} ∫ K(z) f_{i−1}(x_0 − h_n z) dz

= (1/N) Σ_{i=1}^{N} f_{i−1}(x_0) + (1/N) Σ_{i=1}^{N} ∫ K(z)[ f_{i−1}(x_0 − h_n z) − f_{i−1}(x_0) ] dz

= (1/N) Σ_{i=1}^{N} f_{i−1}(x_0) + Σ_{j=1}^{q} ((−1)^j/j!)(h_n^j/N) Σ_{i=1}^{N} f^{(j)}_{i−1}(x_0) ∫ z^j K(z) dz

+ ((−1)^q/q!)(h_n^q/N) Σ_{i=1}^{N} ∫ z^q K(z)[ f^{(q)}_{i−1}(x_0 − θ_{q,n,z} h_n z) − f^{(q)}_{i−1}(x_0) ] dz

≥ c_x^0 − κ h_n^{q+γ_x} ≥ c_x^0 − κ h_{n_0}^{q+γ_x} > 0,

where the number θ_{q,n,z} ∈ [0, 1].

As regards the second summand we note that, for every fixed n ≥ n_0, according to Assumption (K) the processes (M_{n,N})_{N≥1},

M_{n,N} = Σ_{i=1}^{N} (1/i) { K((x_0 − x_i)/h_n) − E[ K((x_0 − x_i)/h_n) | F_{i−1} ] },

form square integrable martingales with uniformly bounded variances:

sup_{N≥1} EM_{n,N}² < ∞, n ≥ 1.

Thus, from the Doob theorem for square integrable martingales (see Liptser and Shiryaev (1988)) and the Kronecker lemma (see, e.g., Loeve (1963)), the second summand goes to zero a.s.

Therefore the relations (7.1) and the finiteness of the stopping times τn are proven.For the proof of the first assertion of Theorem 3.1 we note that the stopping times τn, n ≥ 1 are {Fi−1}-

adaptive, i.e. {i ≤ τn} ∈ Fi−1, i > 1, n ≥ 1. Thus, by the definition of τn and using a well-knowntechnique of sequential analysis (see, e.g., Borisov and Konev (1977); Dobrovidov at al. (2012); Konev(1985); Kuchler and Vasiliev (2010); Liptser and Shiryaev (1978); Wald (1947, 1950) among others) andAssumption (X), we have

\[
H_n > E\sum_{i=1}^{\tau_n-1} K\Big(\frac{x_0-x_i}{h_n}\Big)
= E\sum_{i=1}^{\tau_n} K\Big(\frac{x_0-x_i}{h_n}\Big) - E K\Big(\frac{x_0-x_{\tau_n}}{h_n}\Big)
\]
\[
\ge E\sum_{i=1}^{\tau_n} E\Big[K\Big(\frac{x_0-x_i}{h_n}\Big)\Big|\mathcal{F}_{i-1}\Big] - K
= h_n E\sum_{i=1}^{\tau_n}\int K(z)\, f_{i-1}(x_0-h_n z)\,dz - K
\]
\[
= h_n E\sum_{i=1}^{\tau_n} f_{i-1}(x_0) + h_n E\sum_{i=1}^{\tau_n}\int K(z)\big(f_{i-1}(x_0-h_n z)-f_{i-1}(x_0)\big)\,dz - K
\]
\[
= h_n E\sum_{i=1}^{\tau_n} f_{i-1}(x_0)
+ \sum_{j=1}^{q}\frac{(-1)^j}{j!}\, h_n^{j+1}\, E\sum_{i=1}^{\tau_n} f_{i-1}^{(j)}(x_0)\int z^j K(z)\,dz
+ \frac{(-1)^q}{q!}\, h_n^{q+1}\, E\sum_{i=1}^{\tau_n}\int z^q K(z)\big[f_{i-1}^{(q)}(x_0-\theta_{q,n,z}h_n z)-f_{i-1}^{(q)}(x_0)\big]\,dz
\]
\[
\ge c_x^0 h_n E\tau_n - \kappa h_n^{q+1+\gamma_x} E\tau_n - K.
\]
Then
\[
E\tau_n \le (c_x^0 - \kappa h_n^{q+\gamma_x})^{-1} h_n^{-1}(H_n + K), \quad n \ge n_0.
\]

On the other hand,
\[
H_n \le E\sum_{i=1}^{\tau_n} K\Big(\frac{x_0-x_i}{h_n}\Big)
= h_n E\sum_{i=1}^{\tau_n}\int K(z)\, f_{i-1}(x_0-h_n z)\,dz
\]
\[
= h_n E\sum_{i=1}^{\tau_n} f_{i-1}(x_0) + h_n E\sum_{i=1}^{\tau_n}\int K(z)\big(f_{i-1}(x_0-h_n z)-f_{i-1}(x_0)\big)\,dz
\le h_n\big(C_{x,0} + \kappa h_n^{q+\gamma_x}\big) E\tau_n,
\]
and it follows that
\[
E\tau_n \ge (C_{x,0} + \kappa h_n^{q+\gamma_x})^{-1} h_n^{-1} H_n, \quad n \ge 1.
\]

Similarly, we have
\[
H_n^2 > E\Big[\sum_{i=1}^{\tau_n-1} K\Big(\frac{x_0-x_i}{h_n}\Big)\Big]^2
\ge E\Big[\sum_{i=1}^{\tau_n} K\Big(\frac{x_0-x_i}{h_n}\Big)\Big]^2
- 2K\, E\sum_{i=1}^{\tau_n} K\Big(\frac{x_0-x_i}{h_n}\Big)
\]
\[
\ge E\Big\{\sum_{i=1}^{\tau_n} E\Big[K\Big(\frac{x_0-x_i}{h_n}\Big)\Big|\mathcal{F}_{i-1}\Big]\Big\}^2
- 2\big(C_{x,0}+\kappa h_n^{q+\gamma_x}\big) h_n \sqrt{E\tau_n^2}\cdot
\sqrt{E\sum_{i=1}^{\tau_n}\Big(K\Big(\frac{x_0-x_i}{h_n}\Big)-E\Big[K\Big(\frac{x_0-x_i}{h_n}\Big)\Big|\mathcal{F}_{i-1}\Big]\Big)^2}
- 2K h_n\, E\sum_{i=1}^{\tau_n}\int K(y)\, f_{i-1}(x_0-h_n y)\,dy
\]
\[
\ge \big(c_x^0-\kappa h_n^{q+\gamma_x}\big)^2 h_n^2\, E\tau_n^2
- 2\big(C_{x,0}+\kappa h_n^{q+\gamma_x}\big) h_n \sqrt{E\tau_n^2}
\cdot \sqrt{h_n E\sum_{i=1}^{\tau_n}\int\Big[K(y)-E\Big[K\Big(\frac{x_0-x_i}{h_n}\Big)\Big|\mathcal{F}_{i-1}\Big]\Big]^2 f_{i-1}(x_0-h_n y)\,dy}
- 2K K_0 C_{x,0} h_n\, E\tau_n
\]
\[
\ge \big(c_x^0-\kappa h_n^{q+\gamma_x}\big)^2 h_n^2\, E\tau_n^2
- 2\big(C_{x,0}+\kappa h_n^{q+\gamma_x}\big) h_n \sqrt{E\tau_n^2}\cdot\sqrt{C_3 h_n E\tau_n}
- 2K K_0 C_{x,0} h_n \big(c_x^0-\kappa h_n^{q+\gamma_x}\big)^{-1} h_n^{-1}(H_n+K)
\]
\[
\ge \big(c_x^0-\kappa h_n^{q+\gamma_x}\big)^2 h_n^2\, E\tau_n^2
- 2\big(C_{x,0}+\kappa h_n^{q+\gamma_x}\big) h_n \sqrt{C_3 \big(c_x^0-\kappa h_n^{q+\gamma_x}\big)^{-1}(H_n+K)}\cdot\sqrt{E\tau_n^2}
- 2K K_0 C_{x,0}\big(c_x^0-\kappa h_n^{q+\gamma_x}\big)^{-1}(H_n+K).
\]
Thus we have
\[
E\tau_n^2 \le \Bigg(\frac{\big(C_{x,0}+\kappa h_n^{q+\gamma_x}\big)\sqrt{C_3(H_n+K)}}{\big(c_x^0-\kappa h_n^{q+\gamma_x}\big)^{5/2} h_n}
+ \frac{\sqrt{\big[C_3\big(C_{x,0}+\kappa h_n^{q+\gamma_x}\big)^2 \big(c_x^0-\kappa h_n^{q+\gamma_x}\big)^{-3} + 2K K_0 C_{x,0}\big(c_x^0-\kappa h_n^{q+\gamma_x}\big)^{-1}\big](H_n+K) + H_n^2}}{\big(c_x^0-\kappa h_n^{q+\gamma_x}\big) h_n}\Bigg)^2.
\]

Then the first assertion of the theorem follows from the definition of the function t_n.

Further, from Definition 3.1 and (1.1) we find the formula for the deviation of the sequential estimator f*_n:
\[
f_n^* - f = I_n + J_n, \tag{7.2}
\]
where
\[
I_n = \frac{1}{Q_n}\sum_{i=1}^{\tau_n}\big(f(x_i)-f(x_0)\big)\, K\Big(\frac{x_0-x_i}{h_n}\Big), \qquad
J_n = \frac{1}{Q_n}\sum_{i=1}^{\tau_n}\delta_i\, K\Big(\frac{x_0-x_i}{h_n}\Big).
\]
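The sequential estimator whose deviation is decomposed above can be sketched numerically. The fragment below is a hedged illustration only: the Gaussian kernel, the AR(1) design with f(x) = sin x, and all names are our assumptions, not the paper's construction. Sampling continues until the accumulated kernel sum reaches the threshold H, as in Definition 3.1:

```python
import math
import random

def kernel(u):
    # Gaussian kernel; an illustrative stand-in for the kernel of Assumption (K).
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def observations(seed=3, rho=0.5, noise=0.1):
    # Model (1.1): Y_i = f(x_i) + delta_i with f(x) = sin(x) and AR(1) regressors.
    rng = random.Random(seed)
    x = 0.0
    while True:
        x = rho * x + rng.gauss(0.0, 1.0)
        yield x, math.sin(x) + noise * rng.gauss(0.0, 1.0)

def sequential_nw(obs, x0, h, H, max_steps=10**6):
    # Observe until Q = sum_i K((x0 - x_i)/h) reaches H (the stopping time tau),
    # then return the kernel-weighted average of the Y_i over the first tau pairs.
    q = num = 0.0
    for tau, (xi, yi) in enumerate(obs, start=1):
        w = kernel((x0 - xi) / h)
        q += w
        num += yi * w
        if q >= H:
            return num / q, tau
        if tau >= max_steps:
            raise RuntimeError("threshold H not reached")

f_star, tau = sequential_nw(observations(), x0=0.5, h=0.2, H=200.0)
```

For these (arbitrary) parameters the stopped estimate `f_star` lands near f(0.5) = sin(0.5), with the random duration `tau` playing the role of the stopping time in the bounds above.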

Consider the bias of the estimator f*_n, using the representation (7.2) for its deviation and the technique of conditional mathematical expectations:
\[
E I_n = E\frac{1}{Q_n}\sum_{i=1}^{\tau_n}\big(f(x_i)-f(x_0)\big)\, K\Big(\frac{x_0-x_i}{h_n}\Big)
= E\frac{h_n}{Q_n}\sum_{i=1}^{\tau_n}\int K(z)\big(f(x_0-h_n z)-f(x_0)\big)\, f_{i-1}(x_0-h_n z)\,dz.
\]

By the Taylor expansion of the functions f(x_0 − h_n z) − f(x_0) and f_{i−1}(x_0 − h_n z) at the point x_0 up to the orders p and q respectively (Assumptions (f) and (X)), and according to Assumption (K), we have
\[
\Big|\int K(z)\big(f(x_0-h_n z)-f(x_0)\big)\, f_{i-1}(x_0-h_n z)\,dz\Big|
= \Big|\frac{(-1)^q}{q!}\, h_n^q \sum_{j=1}^{p}\frac{(-1)^j}{j!}\, h_n^j f^{(j)}(x_0)\int z^{q+j} K(z)\big(f_{i-1}^{(q)}(x_0-\theta_{q,n,z}h_n z)-f_{i-1}^{(q)}(x_0)\big)\,dz
\]
\[
+ \frac{(-1)^p}{p!}\, h_n^p \int z^p K(z)\big(f^{(p)}(x_0-\theta_{q,n,z}h_n z)-f^{(p)}(x_0)\big)\, f_{i-1}(x_0-h_n z)\,dz\Big|
\le C_1 h_n^{q+1+\gamma_x} + C_2 h_n^{p+\gamma_f}, \tag{7.3}
\]

where θ_{q,n,z} ∈ [0, 1]. Thus, using the first assertion of the theorem, we obtain
\[
|E I_n| \le \frac{1}{H_n}\big[C_1 h_n^{q+2+\gamma_x} + C_2 h_n^{p+1+\gamma_f}\big]\, E\tau_n
\le \frac{1}{H_n}\big[C_1 h_n^{q+2+\gamma_x} + C_2 h_n^{p+1+\gamma_f}\big]\big(c_x^0-\kappa h_n^{q+\gamma_x}\big)^{-1} h_n^{-1}(H_n+K)
\]
\[
= \big(c_x^0-\kappa h_n^{q+\gamma_x}\big)^{-1}\big[C_1 h_n^{q+1+\gamma_x} + C_2 h_n^{p+\gamma_f}\big]\Big(1+\frac{K}{H_n}\Big). \tag{7.4}
\]

According to Assumptions (K) and (Δ) we have
\[
E J_n^2 \cdot \chi_{\{\tau_n\le l_n\}}
\le \gamma_\delta\, \frac{1}{H_n^2}\sum_{i=1}^{l_n}\sum_{j=1}^{l_n} E\Big|K\Big(\frac{x_0-x_i}{h_n}\Big)\, K\Big(\frac{x_0-x_j}{h_n}\Big)\Big|\cdot \varphi(|i-j|)
\le \varphi(0) K^2 \gamma_\delta\, \frac{l_n}{H_n^2}(1+l_n). \tag{7.5}
\]

Further, by Definition 3.1 we can estimate
\[
E J_n^2 \cdot \chi_{\{\tau_n>l_n\}}
\le \frac{\varphi(0)}{H_n^2}\, E\sum_{i=1}^{\tau_n} K^2\Big(\frac{x_0-x_i}{h_n}\Big)
+ \frac{2}{H_n^2}\, E\sum_{1\le i<j\le \tau_n}\varphi(j-i)\, K\Big(\frac{x_0-x_i}{h_n}\Big)\, K\Big(\frac{x_0-x_j}{h_n}\Big)\cdot\chi_{\{\tau_n>l_n\}}
\]
\[
\le \frac{\varphi(0)}{H_n^2}\, E\sum_{i=1}^{\tau_n} E\Big[K^2\Big(\frac{x_0-x_i}{h_n}\Big)\Big|\mathcal{F}^x_{i-1}\Big]
+ \frac{2}{H_n^2}\, E S_n
\le \frac{\varphi(0) C_3 h_n E\tau_n}{H_n^2} + \frac{2}{H_n^2}\, E S_n, \tag{7.6}
\]
where
\[
S_n = \sum_{j=2}^{\tau_n} K\Big(\frac{x_0-x_j}{h_n}\Big)\sum_{i=1}^{j-1} K\Big(\frac{x_0-x_i}{h_n}\Big)\varphi(j-i)\cdot\chi_{\{\tau_n>l_n\}}.
\]

We have
\[
S_n = \sum_{j=2}^{l_n} K\Big(\frac{x_0-x_j}{h_n}\Big)\sum_{i=1}^{j-1} K\Big(\frac{x_0-x_i}{h_n}\Big)\varphi(j-i)\cdot\chi_{\{\tau_n>l_n\}}
+ \sum_{j=l_n+1}^{\tau_n} K\Big(\frac{x_0-x_j}{h_n}\Big)\sum_{i=1}^{j-l_n} K\Big(\frac{x_0-x_i}{h_n}\Big)\varphi(j-i)\cdot\chi_{\{\tau_n>l_n\}}
+ \sum_{j=l_n+1}^{\tau_n} K\Big(\frac{x_0-x_j}{h_n}\Big)\sum_{i=j-l_n+1}^{j-1} K\Big(\frac{x_0-x_i}{h_n}\Big)\varphi(j-i)\cdot\chi_{\{\tau_n>l_n\}}
\]
\[
\le \gamma_\delta \varphi(0)\Bigg(\sum_{j=1}^{l_n}\Big|K\Big(\frac{x_0-x_j}{h_n}\Big)\Big|\Bigg)^2
+ \gamma_\delta \varphi_n \sum_{j=l_n+1}^{\tau_n}\Big|K\Big(\frac{x_0-x_j}{h_n}\Big)\Big|\sum_{i=1}^{j-l_n}\Big|K\Big(\frac{x_0-x_i}{h_n}\Big)\Big|\cdot\chi_{\{\tau_n>l_n\}}
+ \gamma_\delta \varphi(0)\sum_{j=l_n+1}^{\tau_n}\Big|K\Big(\frac{x_0-x_j}{h_n}\Big)\Big|\sum_{i=j-l_n+1}^{j-1}\Big|K\Big(\frac{x_0-x_i}{h_n}\Big)\Big|\cdot\chi_{\{\tau_n>l_n\}},
\]
and, taking expectations and using Assumptions (X) and (K), it follows that
\[
E S_n \le \gamma_\delta \varphi(0)\, l_n^2 K^2 + \gamma_\delta K_0 K h_n \varphi_n\, E\tau_n^2 + \gamma_\delta \varphi(0) K_0 K l_n h_n\, E\tau_n. \tag{7.7}
\]

Then, by the definition of w_n and from (3.2), (3.3) and (7.5)–(7.7) it follows that
\[
E J_n^2 \le \varphi(0) K^2 \gamma_\delta\, \frac{l_n(1+3l_n)}{H_n^2} + C_3\varphi(0)\, \frac{h_n E\tau_n}{H_n^2}
+ 2\gamma_\delta\varphi(0) K_0 K\, \frac{l_n h_n E\tau_n}{H_n^2} + 2\gamma_\delta K_0 K \varphi_n\, \frac{h_n E\tau_n^2}{H_n^2} \le w_n^2, \tag{7.8}
\]
and, by the definition of b_n as well as (7.4) and (7.8), we have
\[
|E f_n^* - f| \le |E I_n| + \sqrt{E J_n^2} \le b_n.
\]

The second assertion of Theorem 3.1 is proved.

For the proof of the third assertion we define the function
\[
\bar{f}_n^* = \frac{1}{Q_n}\sum_{i=1}^{\tau_n} E\Big[K\Big(\frac{x_0-x_i}{h_n}\Big) f(x_i)\Big|\mathcal{F}_{i-1}\Big]
= \frac{h_n}{Q_n}\sum_{i=1}^{\tau_n}\int K(z)\, f(x_0-h_n z)\, f_{i-1}(x_0-h_n z)\,dz.
\]

Now we estimate the second moment
\[
E(f_n^* - \bar{f}_n^*)^2 \le (1+\gamma_{x,\delta})\Bigg[E\frac{1}{Q_n^2}\Bigg(\sum_{i=1}^{\tau_n}\Big[f(x_i)\, K\Big(\frac{x_0-x_i}{h_n}\Big) - E\Big[K\Big(\frac{x_0-x_i}{h_n}\Big) f(x_i)\Big|\mathcal{F}_{i-1}\Big]\Big]\Bigg)^2 + E J_n^2\Bigg]
\]
\[
\le (1+\gamma_{x,\delta})\Bigg[\frac{1}{H_n^2}\, E\sum_{i=1}^{\tau_n}\Big[f(x_i)\, K\Big(\frac{x_0-x_i}{h_n}\Big) - E\Big[K\Big(\frac{x_0-x_i}{h_n}\Big) f(x_i)\Big|\mathcal{F}_{i-1}\Big]\Big]^2 + E J_n^2\Bigg]
\]
\[
= (1+\gamma_{x,\delta})\Bigg[\frac{h_n}{H_n^2}\, E\sum_{i=1}^{\tau_n}\int\Big[K(y)\, f(x_0-h_n y) - E\Big[K\Big(\frac{x_0-x_i}{h_n}\Big) f(x_i)\Big|\mathcal{F}_{i-1}\Big]\Big]^2 f_{i-1}(x_0-h_n y)\,dy + E J_n^2\Bigg]
\]
\[
\le (1+\gamma_{x,\delta})\Bigg[\frac{4h_n}{H_n^2}\, E\sum_{i=1}^{\tau_n}\int K^2(y)\, f^2(x_0-h_n y)\, f_{i-1}(x_0-h_n y)\,dy + w_n^2\Bigg].
\]

Thus, using the Taylor expansion of the functions f(x_0 − h_n y) and f_{i−1}(x_0 − h_n y) at the point x_0, we estimate
\[
f^2(x_0-h_n y) \le (p+2)\sum_{k=0}^{p+1}\frac{1}{(k!)^2}\, C_{f,k}^2\, y^{2k+2(\gamma_f-1)\chi(k=p+1)},
\qquad
|f_{i-1}(x_0-h_n y)| \le \sum_{j=0}^{q+1}\frac{1}{j!}\, C_{x,j}\, |y|^{j+(\gamma_x-1)\chi(j=q+1)},
\]

E(f∗n − f∗n)2 ≤ (1 + γx,δ)

[C4

hnH2n

Eτn + w2n

]

≤ (1 + γx,δ)

[C4

Hn(c0x − κhq+γxn )−1

(1 +

K

Hn

)+ w2

n

]. (7.9)

Consider the difference
\[
\bar{f}_n^* - f = \frac{h_n}{Q_n}\sum_{i=1}^{\tau_n}\int K(z)\big(f(x_0-h_n z)-f(x_0)\big)\, f_{i-1}(x_0-h_n z)\,dz
+ f\,\frac{1}{Q_n}\sum_{i=1}^{\tau_n}\Big\{E\Big[K\Big(\frac{x_0-x_i}{h_n}\Big)\Big|\mathcal{F}_{i-1}\Big] - K\Big(\frac{x_0-x_i}{h_n}\Big)\Big\}.
\]

Using (7.3), we have
\[
E(\bar{f}_n^* - f)^2 \le 2E\Bigg(\frac{h_n}{Q_n}\sum_{i=1}^{\tau_n}\int K(z)\big(f(x_0-h_n z)-f(x_0)\big)\, f_{i-1}(x_0-h_n z)\,dz\Bigg)^2
+ 2E\Bigg(f\,\frac{1}{Q_n}\sum_{i=1}^{\tau_n}\Big\{E\Big[K\Big(\frac{x_0-x_i}{h_n}\Big)\Big|\mathcal{F}_{i-1}\Big] - K\Big(\frac{x_0-x_i}{h_n}\Big)\Big\}\Bigg)^2
\]
\[
\le \frac{2h_n^2\big(C_1 h_n^{q+1+\gamma_x} + C_2 h_n^{p+\gamma_f}\big)^2}{H_n^2}\, E\tau_n^2
+ \frac{2C_{f,0}^2}{H_n^2}\, E\sum_{i=1}^{\tau_n}\Big\{E\Big[K\Big(\frac{x_0-x_i}{h_n}\Big)\Big|\mathcal{F}_{i-1}\Big] - K\Big(\frac{x_0-x_i}{h_n}\Big)\Big\}^2
\]
\[
\le \frac{2\big(C_1 h_n^{q+1+\gamma_x} + C_2 h_n^{p+\gamma_f}\big)^2 h_n^2 t_n^2}{H_n^2}
+ \frac{2C_3 C_{f,0}^2 h_n}{H_n^2}\, E\tau_n
\le \frac{2\big(C_1 h_n^{q+1+\gamma_x} + C_2 h_n^{p+\gamma_f}\big)^2 h_n^2 t_n^2}{H_n^2}
+ \frac{2C_3 C_{f,0}^2}{H_n}\big(c_x^0-\kappa h_n^{q+\gamma_x}\big)^{-1}\Big(1+\frac{K}{H_n}\Big). \tag{7.10}
\]

Then the third assertion of the theorem follows from the definition of the function V_n, (7.9), (7.10) and the inequality
\[
E(f_n^* - f)^2 \le 2\big[E(f_n^* - \bar{f}_n^*)^2 + E(\bar{f}_n^* - f)^2\big] \le V_n.
\]
Thus Theorem 3.1 is proved.

7.2. Proof of Theorem 4.1

Consider the deviation of the estimator (4.1):
\[
f_N^*(h,H) - f = \big[f_N(h,H) - f\big]\cdot \chi\Bigg(\sum_{i=1}^{N} K\Big(\frac{x_0-x_i}{h}\Big) \ge H\Bigg)
+ \Big[\frac{C_{f,0}}{2} - f\Big]\cdot \chi\Bigg(\sum_{i=1}^{N} K\Big(\frac{x_0-x_i}{h}\Big) < H\Bigg). \tag{7.11}
\]
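The two indicator branches of (7.11) mirror how the truncated estimator is computed on a sample of fixed size N: the Nadaraya–Watson ratio is used only when its denominator clears the threshold H, and the known constant C_{f,0}/2 is returned otherwise. The sketch below is a hedged illustration under our own assumptions (Gaussian kernel, AR(1) design, f(x) = sin x, and hypothetical names), not the paper's exact construction:

```python
import math
import random

def kernel(u):
    # Gaussian kernel; an illustrative stand-in for the paper's kernel K.
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def truncated_nw(pairs, x0, h, H, c_f0):
    # Truncated estimator: Nadaraya-Watson ratio if sum_i K((x0 - x_i)/h) >= H,
    # otherwise the fallback value C_{f,0}/2, matching the two indicator
    # branches of the deviation decomposition (7.11).
    weights = [kernel((x0 - xi) / h) for xi, _ in pairs]
    q = sum(weights)
    if q >= H:
        return sum(w * yi for w, (_, yi) in zip(weights, pairs)) / q
    return c_f0 / 2.0

# A fixed-length sample from model (1.1) with dependent AR(1) regressors.
rng = random.Random(7)
x, pairs = 0.0, []
for _ in range(5000):
    x = 0.5 * x + rng.gauss(0.0, 1.0)
    pairs.append((x, math.sin(x) + 0.1 * rng.gauss(0.0, 1.0)))

# Well-sampled point: the threshold is met, so the ratio branch is taken.
f_hat = truncated_nw(pairs, x0=0.5, h=0.25, H=50.0, c_f0=2.0)
# Far-out point: the kernel mass cannot reach H, so the fallback is returned.
f_far = truncated_nw(pairs, x0=50.0, h=0.25, H=50.0, c_f0=2.0)
```

The truncation is what keeps the variance of the estimator under a non-asymptotic control: on the event where the denominator is small, the estimator is replaced by a bounded constant instead of an unstable ratio.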

Then
\[
E\big(f_N^*(h,H) - f\big)^2 = E\big[f_N(h,H) - f\big]^2\cdot \chi\Bigg(\sum_{i=1}^{N} K\Big(\frac{x_0-x_i}{h}\Big) \ge H\Bigg)
+ \Big[\frac{C_{f,0}}{2} - f\Big]^2\cdot P\Bigg(\sum_{i=1}^{N} K\Big(\frac{x_0-x_i}{h}\Big) < H\Bigg)
\]
\[
\le \frac{1}{H^2}\, E\Bigg[\sum_{i=1}^{\tau_N(h,H)}\big(Y_i - f\big)\, K\Big(\frac{x_0-x_i}{h}\Big)\Bigg]^2
+ \frac{C_{f,0}^2}{4}\cdot P\Bigg(\sum_{i=1}^{N} K\Big(\frac{x_0-x_i}{h}\Big) < H\Bigg).
\]

Similarly to the proof of Theorem 3.1, using the inequalities Eτ_N(h,H) ≤ N and Eτ²_N(h,H) ≤ N² in (7.8), (7.9) and (7.10) instead of (3.2) and (3.3) for Eτ_n and Eτ_n², the following inequality can be proved:
\[
\frac{1}{H^2}\, E\Bigg[\sum_{i=1}^{\tau_N(h,H)}\big(Y_i - f\big)\, K\Big(\frac{x_0-x_i}{h}\Big)\Bigg]^2
\le 2\Bigg\{\big(2C_3 C_{f,0}^2 + C_4(1+\gamma_{x,\delta})\big)\frac{hN}{H^2}
+ (1+\gamma_{x,\delta})\Big[\varphi(0) K^2 \gamma_\delta\, \frac{l(H)\big(1+3l(H)\big)}{H^2}
+ C_3\varphi(0)\frac{hN}{H^2} + 2\gamma_\delta\varphi(0) K_0 K\frac{l(H)hN}{H^2} + 2\gamma_\delta K_0 K \varphi(l(H))\frac{hN^2}{H^2}\Big]
+ \frac{2\big(C_1 h^{q+1+\gamma_x} + C_2 h^{p+\gamma_f}\big)^2 h^2 N^2}{H^2}\Bigg\}.
\]

Now, using the Chebyshev inequality, for Nh(c_x^0 − κh^{q+γ_x}) − H > 0 we estimate the probability
\[
P\Bigg(\sum_{i=1}^{N} K\Big(\frac{x_0-x_i}{h}\Big) < H\Bigg)
= P\Bigg(\sum_{i=1}^{N}\Big(E\Big[K\Big(\frac{x_0-x_i}{h}\Big)\Big|\mathcal{F}_{i-1}\Big] - K\Big(\frac{x_0-x_i}{h}\Big)\Big)
> \sum_{i=1}^{N} E\Big[K\Big(\frac{x_0-x_i}{h}\Big)\Big|\mathcal{F}_{i-1}\Big] - H\Bigg)
\]
\[
\le P\Bigg(\sum_{i=1}^{N}\Big(E\Big[K\Big(\frac{x_0-x_i}{h}\Big)\Big|\mathcal{F}_{i-1}\Big] - K\Big(\frac{x_0-x_i}{h}\Big)\Big)
> Nh\big(c_x^0 - \kappa h^{q+\gamma_x}\big) - H\Bigg)
\]
\[
\le \frac{1}{\big(Nh(c_x^0 - \kappa h^{q+\gamma_x}) - H\big)^2}\,
E\Bigg(\sum_{i=1}^{N}\Big(E\Big[K\Big(\frac{x_0-x_i}{h}\Big)\Big|\mathcal{F}_{i-1}\Big] - K\Big(\frac{x_0-x_i}{h}\Big)\Big)\Bigg)^2
\le \frac{C_3 Nh}{\big(Nh(c_x^0 - \kappa h^{q+\gamma_x}) - H\big)^2},
\]
and the first two assertions of Theorem 4.1 are proved.

For the proof of the third assertion it is enough to establish the limit relation
\[
\varliminf_{N\to\infty}\frac{1}{H_N}\sum_{i=1}^{N} K\Big(\frac{x_0-x_i}{h_N}\Big) > 1 \quad \text{a.s.} \tag{7.12}
\]

To this end we note that, similarly to the beginning of the proof of Theorem 3.1, we have the inequalities
\[
\frac{1}{Nh_N}\sum_{i=1}^{N} E\Big[K\Big(\frac{x_0-x_i}{h_N}\Big)\Big|\mathcal{F}_{i-1}\Big] \ge c_x^0 - \kappa h_{n_0}^{q+\gamma_x} > 0, \quad N \ge n_0. \tag{7.13}
\]

Then we prove the following relations:
\[
E\Bigg|\frac{1}{Nh_N}\sum_{i=1}^{N} K_{i,N}\Bigg|^r \le \frac{(8K^2 r)^{r/2}}{N^{r/2} h_N^{r-1}}, \quad N \ge 1, \tag{7.14}
\]
\[
K_{i,N} = K\Big(\frac{x_0-x_i}{h_N}\Big) - E\Big[K\Big(\frac{x_0-x_i}{h_N}\Big)\Big|\mathcal{F}_{i-1}\Big],
\]

using the covariance (Burkholder) inequality
\[
E\Bigg|\sum_{i=1}^{N} X_i\Bigg|^m \le \Bigg(2m\sum_{i=1}^{N}\Big[E\Big(|X_i|\sum_{j=i}^{N}\big|E(X_j|\mathcal{F}_i)\big|\Big)^{m/2}\Big]^{2/m}\Bigg)^{m/2},
\]
which holds true, according to Dedecker and Doukhan (2003), for m ≥ 2 and every sequence (X_i, F_i)_{1≤i≤N} such that max_{1≤i≤N} E|X_i|^m < ∞.

Indeed, these inequalities are fulfilled for m = r and X_i = K_{i,N} in our case in view of the following relations:
\[
E(K_{j,N}|\mathcal{F}_i) = K_{j,N}\cdot\chi(j=i), \qquad
E\Big[\Big|K\Big(\frac{x_0-x_i}{h_N}\Big)\Big|^s\,\Big|\mathcal{F}_{i-1}\Big] \le K^s h_N, \quad s > 0,
\]
the C_r-inequality |a + b|^r ≤ C_r(|a|^r + |b|^r), with C_r = 1 if 0 < r < 1 and C_r = 2^{r−1} if r ≥ 1, and
\[
E|K_{i,N}|^r \le 2^{r-1} E\Big(\Big|K\Big(\frac{x_0-x_i}{h_N}\Big)\Big|^r + \Big|E\Big[K\Big(\frac{x_0-x_i}{h_N}\Big)\Big|\mathcal{F}_{i-1}\Big]\Big|^r\Big)
\le 2^{r-1} E\Big(E\Big[\Big|K\Big(\frac{x_0-x_i}{h_N}\Big)\Big|^r\,\Big|\mathcal{F}_{i-1}\Big] + \Big|E\Big[K\Big(\frac{x_0-x_i}{h_N}\Big)\Big|\mathcal{F}_{i-1}\Big]\Big|^r\Big) \le (2K)^r h_N.
\]

Thus, from the Borel–Cantelli lemma, the condition of the third assertion of Theorem 4.1 and (7.14), we have
\[
\lim_{N\to\infty}\frac{1}{Nh_N}\sum_{i=1}^{N} K_{i,N} = 0 \quad \text{a.s.}
\]

and, using (7.13), we obtain
\[
\varliminf_{N\to\infty}\frac{1}{H_N}\sum_{i=1}^{N} K\Big(\frac{x_0-x_i}{h_N}\Big)
= \varliminf_{N\to\infty}\frac{C_\alpha}{Nh_N}\sum_{i=1}^{N} E\Big[K\Big(\frac{x_0-x_i}{h_N}\Big)\Big|\mathcal{F}_{i-1}\Big]
\ge \frac{c_x^0 - \kappa h_{n_0}^{q+\gamma_x}}{c_x^0 - \kappa h_{n_0}^{q+\gamma_x} - \alpha^{-1/2}} > 1 \quad \text{a.s.}
\]
Theorem 4.1 is proved.

REFERENCES

Bai, Er-Wei, and Liu, Yun (2007). Recursive Direct Weight Optimization in Nonlinear System Identification: A Minimal Probability Approach, IEEE Transactions on Automatic Control 52(7): 1218–1231.

Borisov, V.Z., and Konev, V.V. (1977). On Sequential Estimation of Parameters in Discrete-time Processes, Automat. and Remote Control 10: 58–64 (in Russian).

Carroll, R.J. (1976). On Sequential Density Estimation, Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 36: 58–64.

Davies, H.I., and Wegman, E.J. (1975). Sequential Nonparametric Density Estimation, IEEE Transactions on Information Theory 21: 619–628.

Dedecker, J., and Doukhan, P. (2003). A New Covariance Inequality and Applications, Stochastic Processes and their Applications 106: 63–80.


Dharmasena, S., Zeephongsekul, P., and De Silva (2008). Two Stage Sequential Procedure for Nonparametric Regression Estimation, Australian and New Zealand Industrial and Applied Mathematics Journal 49: 699–716.

Dobrovidov, A.V., Koshkin, G.M., and Vasiliev, V.A. (2012). Non-parametric State Space Models, USA: Kendrick Press. (Russian original: Dobrovidov, A.V., Koshkin, G.M., and Vasiliev, V.A. (2004). Nonparametric Estimation of Functionals of Stationary Sequences Distributions, Moscow: Nauka.)

Efroimovich, S.Yu. (1999). Nonparametric Curve Estimation. Methods, Theory and Applications, Berlin, N. Y.: Springer-Verlag.

Efroimovich, Sam. (2007). Sequential Design and Estimation in Heteroscedastic Nonparametric Regression, Sequential Analysis: Design Methods and Applications 26(1): 3–25. DOI: 10.1080/07474940601109670

Gyorfi, L., Kohler, M., Krzyzak, A., and Walk, H. (2002). A Distribution-free Theory of Nonparametric Regression, Berlin, N. Y.: Springer-Verlag.

Honda, T. (1998). Sequential Estimation of the Marginal Density Function for a Strongly Mixing Process, Sequential Analysis: Design Methods and Applications 17(3–4): 239–251. DOI: 10.1080/07474949808836411

Konev, V.V. (1985). Sequential Parameter Estimation of Stochastic Dynamical Systems, Tomsk: Tomsk Univ. Press (in Russian).

Konev, V.V., and Pergamenshchikov, S.M. (1990). Truncated Sequential Estimation of the Parameters in Random Regression, Sequential Analysis 9(1): 19–41.

Konev, V.V., and Pergamenshchikov, S.M. (1990). On Truncated Sequential Estimation of the Drifting Parameter Mean in the First Order Autoregressive Model, Sequential Analysis 9(2): 193–216.

Konev, V.V., and Pergamenshchikov, S.M. (1992). On Truncated Sequential Estimation of the Parameters of Diffusion Processes, Methods of Economical Analysis, Ed.: Central Economical and Mathematical Institute of Russian Academy of Science, Moscow 3–31.

Kuchler, U., and Vasiliev, V. (2010). On Guaranteed Parameter Estimation of a Multiparameter Linear Regression Process, Automatica, Journal of IFAC, Elsevier 46: 637–646.

Fourdrinier, D., Konev, V., and Pergamenshchikov, S. (2009). Truncated Sequential Estimation of the Parameter of a First Order Autoregressive Process with Dependent Noises, Mathematical Methods of Statistics 18(1): 43–58.

Leonov, S.L. (1999). Remarks on Extremal Problems in Nonparametric Curve Estimation, Statistics and Probability Letters 43(2): 169–178.

Liptser, R.Sh., and Shiryaev, A.N. (1977, 1978). Statistics of Random Processes. I: General Theory; II: Applications, N. Y.: Springer-Verlag.

Liptser, R., and Shiryaev, A. (1988). Theory of Martingales, Berlin, Heidelberg, New York: Springer-Verlag.

Liski, E.P., Mandal, N.K., Shah, K.R., and Sinha, B.K. (2002). Topics in Optimal Design, N. Y.: Springer-Verlag.

Loeve, M. (1963). Probability Theory, 3rd ed., N. Y.: D. Van Nostrand.

Martinsek, A.T. (1992). Using Stopping Rules to Bound the Mean Integrated Squared Error in Density Estimation, Annals of Statistics 20: 797–806.

McMurry, T., and Politis, D.N. (2004). Nonparametric Regression with Infinite Order Flat-top Kernels, J. Nonparam. Statist. 16(3–4): 549–562.

Mukhopadhyay, N., and Solanky, T.K.S. (1994). Multistage Selection and Ranking Procedures, New York: Marcel Dekker.

Nazin, A.V., and Ljung, L. (2002). Asymptotically Optimal Smoothing of Averaged LMS for Regression, Proceedings of the 15th IFAC World Congress, Barcelona.


Novikov, A.A. (1971). Sequential Estimation of the Parameters of the Diffusion Processes, Theory Probab. Appl. 16(2): 394–396 (in Russian).

Politis, D.N. (2001). On Nonparametric Function Estimation with Infinite-order Flat-top Kernels, in Probability and Statistical Models with Applications, Ch. Charalambides et al. (Eds.), Chapman and Hall/CRC, Boca Raton 469–483.

Politis, D.N., and Romano, J.P. (1993). On a Family of Smoothing Kernels of Infinite Order, Proceedings of the 25th Symposium on the Interface, San Diego, California, April 14–17 (M. Tarter and M. Lock, eds.), The Interface Foundation of North America 141–145.

Politis, D.N., and Romano, J.P. (1999). Multivariate Density Estimation with General Flat-top Kernels of Infinite Order, Journal of Multivariate Analysis 68: 1–25.

Politis, D., Romano, J., and Wolf, M. (1999). Subsampling, Springer Verlag, Berlin, Heidelberg, New York.

Politis, D.N., and Vasiliev, V.A. (2012). Sequential Kernel Estimation of a Multivariate Regression Function, Proceedings of the IX International Conference 'System Identification and Control Problems', SICPRO'12, Moscow, 30 January – 2 February, 2012, V. A. Trapeznikov Institute of Control Sciences 996–1009.

Prakasa Rao, B.L.S. (1983). Nonparametric Functional Estimation, New York: Academic Press.

Roll, J., Nazin, A., and Ljung, L. (2002). A Non-Asymptotic Approach to Local Modelling, The 41st IEEE CDC, Las Vegas, Nevada, USA, 10–13 Dec. (Regular paper.) http://www.control.isy.liu.se/research/reports/2003/2482.pdf

Roll, J., Nazin, A., and Ljung, L. (2005). Non-linear System Identification Via Direct Weight Optimization, Automatica, IFAC Journal, Special Issue on Data-Based Modelling and System Identification 41(3): 475–490. http://www.control.isy.liu.se/research/reports/2005/2696.pdf

Rosenblatt, M. (1991). Stochastic Curve Estimation, NSF-CBMS Regional Conference Series, 3, Institute of Mathematical Statistics, Hayward.

Scott, D.W. (1992). Multivariate Density Estimation: Theory, Practice and Visualization, Wiley, New York.

Stute, W. (1983). Sequential Fixed-Width Confidence Intervals for a Nonparametric Density Function, Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 62: 113–123.

Vasiliev, V.A. (1997). On Identification of Dynamic Systems of Autoregressive Type, Automat. and Remote Control 58(12): 106–118.

Vasiliev, V.A. (2011). On Adaptive Control Problems of Discrete-time Stochastic Systems, Proceedings of the 18th World Congress of the Int. Fed. Autom. Contr., Milan, Italy, August 27 – September 2, 6 pages (to be published).

Vasiliev, V.A., and Konev, V.V. (1997). On Optimal Adaptive Control of a Linear Stochastic Process, Proc. 15th IMACS World Congress on Scientific Computation, Modelling and Applied Mathematics, August 24–29, Berlin, Germany 5: 87–91.

Vasiliev, V.A., and Koshkin, G.M. (1998). Nonparametric Identification of Autoregression, Probability Theory and Its Applications 43: 577–588.

Wald, A. (1947). Sequential Analysis, N. Y.: Wiley.

Wald, A. (1950). Statistical Decision Functions, N. Y.: Wiley.

Xu, Y., and Martinsek, A.T. (1995). Sequential Confidence Bands for Densities, Annals of Statistics 23: 2218–2240.
