Optimal Subsampling Approaches for Large Sample Linear Regression

Rong Zhu, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China.
Ping Ma, Department of Statistics, University of Georgia, Athens, GA, USA.
Michael W. Mahoney, International Computer Science Institute and Department of Statistics, University of California at Berkeley, Berkeley, CA, USA.
Bin Yu, Department of Statistics and EECS, University of California at Berkeley, Berkeley, CA, USA.

November 6, 2018
Abstract
A significant hurdle for analyzing large sample data is the lack of effective statistical computing and inference methods. An emerging powerful approach for analyzing large sample data is subsampling, by which one takes a random subsample from the original full sample and uses it as a surrogate for subsequent computation and estimation. In this paper, we study subsampling methods under two scenarios: approximating the full sample ordinary least-square (OLS) estimator and estimating the coefficients in linear regression. We present two algorithms, a weighted estimation algorithm and an unweighted estimation algorithm, and analyze the asymptotic behavior of their resulting subsample estimators under general conditions. For the weighted estimation algorithm, we propose a criterion for selecting the optimal sampling probability by making use of the asymptotic results. On the basis of this criterion, we provide two novel subsampling methods, the optimal subsampling and the predictor-length subsampling methods. The predictor-length subsampling method is based on the L2 norm of the predictors rather than on leverage scores, and its computational cost is scalable. For the unweighted estimation algorithm, we show that its resulting subsample estimator is not consistent to the full sample OLS estimator; however, it has better performance than the weighted estimation algorithm for estimating the coefficients. Simulation studies and a real data example demonstrate the effectiveness of our proposed subsampling methods.
arXiv:1509.05111v1 [stat.ME] 17 Sep 2015
Keywords: big data, ordinary least-square, subsampling algorithm, algorithmic leveraging, linear regression, OLS approximation.
1 Introduction
Regression analysis is probably the most popular statistical tool for modeling the relationship between a response $y_i$ and predictors $x_i = (x_{1i}, \ldots, x_{pi})^T$, $i = 1, \ldots, n$. Given a set of $n$ observations in a modern massive data set, the number of predictors $p$ and/or the sample size $n$ is large, in which case conventional methods face a high-dimension challenge (large $p$), a large-sample-size challenge (large $n$), or both. When $p$ is large, researchers typically assume a sparsity principle, that is, the response depends only on a sparse subset of the predictors. Under this assumption, using a subset of the predictors in model fitting has traditionally been employed to overcome the curse of dimensionality.
Various methods have been developed for selecting the subset of predictors in regression
analysis; see Tibshirani (1996), Meinshausen and Yu (2009), and Bühlmann and van de Geer (2011). When $n$ is large, accurate estimation in model fitting is usually taken for granted, because a large sample size has historically been considered a blessing; see Lehmann and Casella (2003). However, the advance of computing technologies still lags far behind the growth of data size. Thus, calculating the ordinary least-square (OLS) estimate might be computationally infeasible for modern large sample data.
Given fixed computing power, one popular method for handling large samples is subsampling, that is, using a small proportion of the data as a surrogate for the full sample in model fitting and statistical inference. Drineas et al. (2008) and Mahoney and Drineas (2009) developed effective subsampling methods for matrix decomposition, using the normalized statistical leverage scores of the data as non-uniform sampling probabilities. Drineas et al. (2006, 2011) applied the subsampling method to approximate the OLS estimator in linear regression. Another approach is to use random projections to obtain fast algorithms for solving OLS, as studied by Rokhlin and Tygert (2008), Avron et al. (2010), Dhillon et al. (2013), Clarkson and Woodruff (2013) and McWilliams et al. (2014).
In this paper, we consider two kinds of subsampling algorithms, i.e., the weighted estimation algorithm and the unweighted estimation algorithm, which are classified according to whether weighted or unweighted least-square is solved on the subsample. We establish the first asymptotic properties of their resulting weighted subsample estimator and unweighted subsample estimator, respectively, and propose new subsampling methods from a statistical point of view. We do so in the context of OLS approximation and coefficient estimation, respectively, in fitting linear regression models to large sample data, where by "large sample" we mean that it is computationally too expensive to calculate the full sample OLS estimator.
Our main theoretical contribution is to establish the asymptotic normality and consis-
tency of weighted/unweighted subsample estimators for approximating the full sample OLS
estimator. Unlike the worst-case analysis in Drineas et al. (2011), the asymptotic analysis
provides a statistical tool to describe the approximation error of weighted/unweighted sub-
sample estimators with respect to the full sample OLS estimator. Meanwhile, the asymp-
totic properties of weighted/unweighted subsample estimators for estimating the coefficients
are established here. These asymptotic results hold for various subsampling methods, as
long as our general conditions are satisfied. Recently, Ma et al. (2014, 2015) derived bias and variance formulas for the weighted subsample estimator by Taylor series expansion. However, in their work the Taylor expansion remainder is not precisely quantified. Unlike their results, we give the asymptotic distribution of the approximation error, and we show that the variance is approximated by an explicit expression and that the bias is negligible relative
to the variance. From these results, we provide a framework to develop novel subsampling
methods for the weighted subsample estimator. For the unweighted subsample estimator,
theoretical analysis reveals that it is NOT consistent to the full sample OLS estimator.
However it has better performance than the weighted subsample estimator for estimating
the coefficients.
Our main methodological contribution is to propose two optimal subsampling methods
for the weighted subsample estimator. On the basis of asymptotic results, we propose
an optimal criterion for the weighted subsample estimator to approximate the full sample
OLS estimator. This criterion provides a guide to construct optimal subsampling methods.
Following it, we propose an optimal subsampling method (denoted by OPT below). The
computational cost of OPT is of the same order as that of existing subsampling methods based on leverage scores. More importantly, the sampling probability of OPT can
be approximated by a normalized predictor-length, i.e., the L2 norm of predictors. Thus,
the predictor-length subsampling method (denoted by PL below) is approximately optimal
with respect to the criterion for approximating the full sample OLS estimator. Remarkably,
PL is also optimal with respect to another criterion for estimating the coefficients. In
particular, the computational cost of PL is scalable.
Our main empirical contribution is to provide a detailed evaluation of statistical prop-
erties for weighted/unweighted subsample estimators on both synthetic and real data sets.
These empirical results show that our proposed subsampling methods lead to improved performance over existing subsampling methods. They also reveal the relationship between the good performance of nonuniform subsampling methods and the degree of dispersion of the sample data.
The remainder of this article is organized as follows. We briefly review the least-square problem as well as existing subsampling methods in Section 2. In Section 3, we study the asymptotic properties of the weighted subsample estimator, and in Section 4 we propose the OPT and PL subsampling methods. In Section 5, the unweighted subsample estimator is investigated and compared with the weighted subsample estimator. Simulation studies on synthetic data and a real data example are then presented, and a few concluding remarks close the article. All technical proofs are relegated to the Appendix, and some additional content is reported in the Supplementary Material.
2 Least-square Estimate and Subsampling Methods
In this section, we provide an overview of subsampling methods for the large sample linear
regression problem.
2.1 Ordinary Least-square Estimate
In this paper, we consider the linear model
$$y_i = x_i^T\beta + \varepsilon_i, \quad i = 1, \ldots, n, \qquad (1)$$
where $y_i$ is a response variable, $x_i$ is a $p$-dimensional design vector, independent of the random error $\varepsilon_i$, and the random errors $\{\varepsilon_i\}_{i=1}^n$ are independently and identically distributed with mean zero and variance $\sigma^2$ such that $E(|\varepsilon_i|^{2+\delta}) < \infty$ for some $\delta > 0$. Note that we assume $E(|\varepsilon_i|^{2+\delta}) < \infty$ for the convenience of the theoretical study. Model (1) can be expressed in matrix form as
$$y = X\beta + \varepsilon, \qquad (2)$$
where $y = (y_1, \ldots, y_n)^T \in \mathbb{R}^n$ is the response vector, $X = (x_1, \ldots, x_n)^T$ is the $n \times p$ design matrix, and $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)^T \in \mathbb{R}^n$ is the error vector. We assume that $n$ is large and $p \ll n$, as high-dimension problems are not investigated here. The ordinary least-square (OLS) estimator $\hat\beta_{ols}$ of $\beta$ is
$$\hat\beta_{ols} = \arg\min_\beta \|y - X\beta\|^2 = (X^TX)^{-1}X^Ty, \qquad (3)$$
where $\|\cdot\|$ denotes the Euclidean norm and the second equality holds when $X$ has full rank; we assume, without loss of generality, that $X$ has full rank. The predicted response is given by $\hat{y} = Hy$, where the hat matrix $H = X(X^TX)^{-1}X^T$. The $i$th diagonal element of $H$, $h_{ii} = x_i^T(X^TX)^{-1}x_i$, is called the leverage score of the $i$th observation. It is easy to see that as $h_{ii}$ approaches 1, the predicted response of the $i$th observation, $\hat{y}_i$, tends to $y_i$. Thus, $h_{ii}$ has been regarded as an importance index indicating how influential the $i$th observation is on the full sample OLS estimator.
See Christensen (1996).
The full sample OLS estimator $\hat\beta_{ols}$ in (3) can be calculated using the singular value decomposition (SVD) algorithm in Golub and Van Loan (1996). By the SVD of $X$, $H$ can alternatively be expressed as $H = UU^T$, where $U$ is an $n \times p$ orthonormal matrix whose columns are the left singular vectors of $X$. Then the leverage score of the $i$th observation is
$$h_{ii} = \|u_i\|^2, \qquad (4)$$
where $u_i$ is the $i$th row of $U$. The exact computation of $\{h_{ii}\}_{i=1}^n$ using (4) requires $O(np^2)$ time; see Golub and Van Loan (1996). Fast algorithms for approximating $\{h_{ii}\}_{i=1}^n$ were proposed by Drineas et al. (2012), Clarkson and Woodruff (2013) and Cohen et al. (2014) to further reduce the computational cost.
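To make (4) concrete, the following minimal NumPy sketch computes exact leverage scores from the thin SVD of X; Python is our choice purely for illustration, as the paper prescribes no implementation.

```python
import numpy as np

def leverage_scores(X):
    # Exact leverage scores h_ii = ||u_i||^2 via the thin SVD of X; O(n p^2) time.
    U, _, _ = np.linalg.svd(X, full_matrices=False)  # U: n x p, orthonormal columns
    return np.sum(U**2, axis=1)

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))
h = leverage_scores(X)
# Leverage scores lie in (0, 1] and sum to rank(X) = p.
assert np.all((h > 0) & (h <= 1)) and np.isclose(h.sum(), 5.0)
```

The sum-to-$p$ identity follows from $\sum_i \|u_i\|^2 = \|U\|_F^2 = p$, a quick sanity check for any implementation.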
2.2 Subsampling Methods
When the sample size $n$ is large, the computational cost of the full sample OLS estimator becomes extremely high. An alternative strategy is the Weighted Estimation Algorithm (Algorithm 1), based on the subsampling methods presented below.
Algorithm 1 Weighted Estimation Algorithm

• Step 1. Subsample with replacement from the data. Construct sampling probabilities $\{\pi_i\}_{i=1}^n$ for all data points. Draw a random subsample of size $r \ll n$, denoted as $(X^*, y^*)$, i.e., draw $r$ rows from the original data $(X, y)$ according to the probabilities $\{\pi_i\}_{i=1}^n$. Construct the corresponding sampling probability matrix $\Phi^* = \mathrm{diag}\{\pi_k^*\}_{k=1}^r$.

• Step 2. Calculate weighted least-square using the subsample. Solve weighted least-square on the subsample to get the Weighted Subsample Estimator $\hat\beta$, i.e.,
$$\hat\beta = \arg\min_\beta \|\Phi^{*-1/2}y^* - \Phi^{*-1/2}X^*\beta\|^2. \qquad (5)$$
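The two steps of Algorithm 1 can be sketched as follows in NumPy (illustrative only; the toy data and function name are ours). Rescaling each drawn row by $\pi_i^{-1/2}$ implements $\Phi^{*-1/2}$ in (5), since a constant factor does not change the minimizer.

```python
import numpy as np

def weighted_subsample_estimator(X, y, probs, r, rng):
    # Step 1: draw r row indices with replacement according to {pi_i}.
    idx = rng.choice(X.shape[0], size=r, replace=True, p=probs)
    # Step 2: weighted least-square (5) via row rescaling by pi_i^{-1/2}.
    w = 1.0 / np.sqrt(probs[idx])
    beta, *_ = np.linalg.lstsq(X[idx] * w[:, None], y[idx] * w, rcond=None)
    return beta

rng = np.random.default_rng(1)
n, p = 20000, 5
X = rng.standard_normal((n, p))
beta_true = np.arange(1.0, p + 1)
y = X @ beta_true + rng.standard_normal(n)
# Uniform probabilities as the simplest choice of {pi_i}.
beta_w = weighted_subsample_estimator(X, y, np.full(n, 1.0 / n), r=2000, rng=rng)
assert np.allclose(beta_w, beta_true, atol=0.3)
```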
In Algorithm 1, the key component is the sampling probability $\{\pi_i\}_{i=1}^n$ in Step 1. Below are several subsampling methods that have been considered in the literature.

• Uniform Subsampling (UNIF). Let $\pi_i = 1/n$, i.e., draw the subsample uniformly at random.

• Basic Leveraging (BLEV). Let $\pi_i = h_{ii}/\sum_{j=1}^n h_{jj} = h_{ii}/p$, i.e., draw the subsample according to a sampling distribution proportional to the leverage scores of the data matrix $X$.

• Approximate Leveraging (ALEV). To reduce the computational cost of obtaining $h_{ii}$, fast algorithms were proposed by Drineas et al. (2012), Clarkson and Woodruff (2013) and Cohen et al. (2014). We refer to these as approximate leveraging.

• Shrinkage Leveraging (SLEV). Let $\pi_i = \lambda h_{ii}/p + (1 - \lambda)/n$, where the weight $\lambda$ is a constant between zero and one. This is a weighted average of the uniform sampling probability and the normalized leverage score.
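The UNIF, BLEV and SLEV probabilities above can be computed in a few lines; this NumPy sketch (function name and data ours, for illustration) returns a valid probability vector for each scheme.

```python
import numpy as np

def sampling_probs(X, method="unif", lam=0.9):
    # Sampling probabilities for UNIF, BLEV and SLEV.
    n, p = X.shape
    if method == "unif":
        return np.full(n, 1.0 / n)
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    h = np.sum(U**2, axis=1)          # leverage scores; sum(h) = p
    if method == "blev":
        return h / p
    if method == "slev":
        return lam * h / p + (1.0 - lam) / n
    raise ValueError(method)

X = np.random.default_rng(2).standard_normal((500, 4))
for m in ("unif", "blev", "slev"):
    pi = sampling_probs(X, m)
    assert np.isclose(pi.sum(), 1.0) and np.all(pi > 0)
```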
UNIF is very simple to implement but performs poorly in many cases. The first non-uniform subsampling method is BLEV, based on leverage scores, which was developed by Drineas et al. (2006), Drineas et al. (2008), Mahoney and Drineas (2009) and Drineas et al. (2011). SLEV was introduced by Ma et al. (2014, 2015).
Another important feature of Algorithm 1 is that Step 2 uses the sampling probabilities to solve weighted least-square on the subsample. This is analogous to the Hansen-Hurwitz estimate (Hansen and Hurwitz, 1943) in classic sampling techniques. Assume that a random sample of size $r$, denoted by $(y_1^*, \ldots, y_r^*)$, is drawn from given data $(y_1, \ldots, y_n)$ with sampling probabilities $\{\pi_i\}_{i=1}^n$; then the Hansen-Hurwitz estimate of $\sum_{i=1}^n y_i$ is $r^{-1}\sum_{i=1}^r y_i^*/\pi_i^*$, where $\pi_i^*$ is the sampling probability corresponding to $y_i^*$. It is well known that $r^{-1}\sum_{i=1}^r y_i^*/\pi_i^*$ is an unbiased estimate of $\sum_{i=1}^n y_i$. For an overview, see Särndal et al. (2003).
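The unbiasedness of the Hansen-Hurwitz estimate can be checked numerically; the toy population and probabilities below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.exponential(size=200)              # a small toy "full sample"
pi = (y + 0.1) / (y + 0.1).sum()           # some non-uniform probabilities
total = y.sum()

def hansen_hurwitz(y, pi, r, rng):
    # Average of y*_i / pi*_i over r with-replacement draws.
    idx = rng.choice(len(y), size=r, replace=True, p=pi)
    return np.mean(y[idx] / pi[idx])

estimates = [hansen_hurwitz(y, pi, r=50, rng=rng) for _ in range(5000)]
# Unbiasedness: the Monte Carlo mean is close to the true total sum(y).
assert abs(np.mean(estimates) - total) < 0.05 * total
```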
Unlike Algorithm 1, which solves weighted least-square on the subsample, the Unweighted Estimation Algorithm (Algorithm 2) presented below calculates ordinary least-square on the subsample.
Algorithm 2 Unweighted Estimation Algorithm

• Step 1. Subsample with replacement from the data. This step is the same as Step 1 in Algorithm 1.

• Step 2. Calculate ordinary least-square on the subsample. Solve an ordinary least-square problem (instead of weighted least-square) on the subsample to get the Unweighted Subsample Estimator $\hat\beta^u$, i.e.,
$$\hat\beta^u = \arg\min_\beta \|y^* - X^*\beta\|^2. \qquad (6)$$
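Algorithm 2 differs from Algorithm 1 only in Step 2, as this sketch makes explicit (again NumPy, with assumed toy data): the drawn rows are used as-is, without the $\pi_i^{-1/2}$ reweighting.

```python
import numpy as np

def unweighted_subsample_estimator(X, y, probs, r, rng):
    # Step 1 as in Algorithm 1; Step 2: plain OLS (6) on the subsample.
    idx = rng.choice(X.shape[0], size=r, replace=True, p=probs)
    beta_u, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    return beta_u

rng = np.random.default_rng(4)
n, p = 20000, 4
X = rng.standard_normal((n, p))
beta_true = np.ones(p)
y = X @ beta_true + rng.standard_normal(n)
beta_u = unweighted_subsample_estimator(X, y, np.full(n, 1.0 / n), r=2000, rng=rng)
assert np.allclose(beta_u, beta_true, atol=0.3)
```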
Algorithm 2 was introduced by Ma et al. (2014, 2015) for BLEV in order to estimate the coefficients $\beta$, and it was shown to have better empirical performance than Algorithm 1. We investigate the asymptotic properties of its resulting unweighted subsample estimator $\hat\beta^u$ and make a theoretical comparison between Algorithms 1 and 2 in Section 5.
3 Weighted Subsample Estimator

In this section, we theoretically investigate the weighted subsample estimator $\hat\beta$ resulting from Algorithm 1 under two scenarios: approximating the full sample OLS estimator $\hat\beta_{ols}$ and estimating the coefficients $\beta$. We also propose two subsampling methods.

3.1 Weighted Subsample Estimator for Approximating the Full Sample OLS Estimator

We now investigate the asymptotic properties of the weighted subsample estimator $\hat\beta$ for approximating the full sample OLS estimator $\hat\beta_{ols}$. Motivated by the theoretical results, we propose two novel subsampling methods that make $\hat\beta$ more efficient.
3.1.1 Asymptotic Normality and Consistency
Given data $\mathcal{F}_n = \{z_1, \ldots, z_n\}$, where $z_i = (x_i^T, y_i)^T$, $i = 1, \ldots, n$, we establish the asymptotic properties under the following conditions.

Condition C1.
$$M_1 \triangleq \frac{1}{n^2}\sum_{i=1}^n \frac{\|x_i\|^2 x_i x_i^T}{\pi_i} = O_p(1), \qquad (7)$$
$$M_2 \triangleq \frac{1}{n^2}\sum_{i=1}^n \frac{x_i x_i^T}{\pi_i} = O_p(1), \qquad (8)$$
$$M_3 \triangleq \frac{1}{n^{2+\delta}}\sum_{i=1}^n \frac{\|x_i\|^{2+2\delta}}{\pi_i^{1+\delta}} = O_p(1) \ \text{for some } \delta > 0, \qquad (9)$$
$$M_4 \triangleq \frac{1}{n^{2+\delta}}\sum_{i=1}^n \frac{\|x_i\|^{2+\delta}}{\pi_i^{1+\delta}} = O_p(1) \ \text{for some } \delta > 0. \qquad (10)$$
Theorem 1. If Condition C1 holds, then given $\mathcal{F}_n$, we have
$$V^{-1/2}(\hat\beta - \hat\beta_{ols}) \xrightarrow{L} N(0, I) \ \text{as } r \to \infty, \qquad (11)$$
where the notation $\xrightarrow{L}$ stands for convergence in distribution, $V = M_X^{-1} V_c M_X^{-1}$ with $M_X = X^TX$, and $V_c = r^{-1}\sum_{i=1}^n \frac{e_i^2}{\pi_i} x_i x_i^T$ with $e_i = y_i - x_i^T\hat\beta_{ols}$. Moreover,
$$V = O_p(r^{-1}), \qquad (12)$$
and
$$E(\hat\beta - \hat\beta_{ols} \mid \mathcal{F}_n) = O_p(r^{-1}). \qquad (13)$$
Theorem 1 states that $\hat\beta$ is consistent to $\hat\beta_{ols}$ and that the difference between $\hat\beta$ and $\hat\beta_{ols}$ converges to a normal distribution as $r$ gets large. These theoretical results hold regardless of the subsampling method, as long as our conditions are satisfied. We discuss the conditions for several subsampling methods in Section 3.1.2.
Remark. There is no requirement on whether $n$ is larger than $r$; the asymptotic results still hold even when $r/n \to \infty$. However, our aim is to overcome a computational bottleneck, so $n$ is much larger than $r$ in our empirical studies.
Remark. One contribution of the asymptotic normality (11) is that it allows us to construct confidence intervals for $\hat\beta - \hat\beta_{ols}$. However, the large sample size $n$ can make the computation of $V$ infeasible. To deal with this, we use the plug-in estimator $\hat{V}$ based on the subsample to estimate $V$, i.e.,
$$\hat{V} = \hat{M}_X^{-1} \hat{V}_c \hat{M}_X^{-1}, \qquad (14)$$
where $\hat{M}_X = r^{-1}X^{*T}\Phi^{*-1}X^*$ and $\hat{V}_c = r^{-2}\sum_{i=1}^r \frac{e_i^{*2}}{\pi_i^{*2}} x_i^* x_i^{*T}$ with $e_i^* = y_i^* - x_i^{*T}\hat\beta$.
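For illustration, the plug-in estimator (14) can be sketched in NumPy as follows; the toy setup (uniform probabilities, simulated data) and function name are our assumptions, and the subsample, its probabilities, and the weighted estimate are assumed to be retained from Algorithm 1.

```python
import numpy as np

def plug_in_variance(Xs, ys, ps, beta_w):
    # M_hat  = r^{-1} X*^T Phi*^{-1} X*
    # Vc_hat = r^{-2} sum_i (e*_i^2 / pi*_i^2) x*_i x*_i^T
    r = len(ys)
    M = (Xs.T * (1.0 / ps)) @ Xs / r
    e = ys - Xs @ beta_w
    Vc = (Xs.T * (e**2 / ps**2)) @ Xs / r**2
    Minv = np.linalg.inv(M)
    return Minv @ Vc @ Minv

rng = np.random.default_rng(5)
n, p, r = 5000, 3, 500
X = rng.standard_normal((n, p))
y = X @ np.ones(p) + rng.standard_normal(n)
probs = np.full(n, 1.0 / n)
idx = rng.choice(n, size=r, replace=True, p=probs)
Xs, ys, ps = X[idx], y[idx], probs[idx]
w = 1.0 / np.sqrt(ps)
beta_w, *_ = np.linalg.lstsq(Xs * w[:, None], ys * w, rcond=None)
V_hat = plug_in_variance(Xs, ys, ps, beta_w)
# V_hat is a symmetric p x p matrix.
assert V_hat.shape == (p, p) and np.allclose(V_hat, V_hat.T)
```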
Remark. From Theorem 1, we can obtain an asymptotic version of the relative-error approximation result (Theorem 1 of Drineas et al. (2011)). See details in the Supplementary Material.

Remark. Although we consider a random design matrix $X$ in Theorem 1, the asymptotic normality and consistency still hold in the fixed design setting, since the proof does not rely on the randomness of the design matrix $X$.
3.1.2 Condition C1
$M_1$ and $M_2$ in C1 are conditions imposed on the fourth and second moments, respectively, of the predictors weighted by the sampling probabilities. $M_3$ and $M_4$ in C1 are conditions on higher-order moments needed to satisfy the Lindeberg-Feller condition in proving the asymptotic normality (van der Vaart (1998), Proposition 2.27 in Section 2.8, page 20). Note that the moments in C1 are sample moments imposed on the full sample data rather than population moments of population distributions.
Now we focus on C1 for three subsampling methods investigated in this paper.
For UNIF, i.e., $\pi_i = \frac{1}{n}$, we have
$$M_1 = \frac{1}{n}\sum_{i=1}^n \|x_i\|^2 x_i x_i^T, \quad M_2 = \frac{1}{n}\sum_{i=1}^n x_i x_i^T, \quad M_3 = \frac{1}{n}\sum_{i=1}^n \|x_i\|^{2+2\delta}, \quad M_4 = \frac{1}{n}\sum_{i=1}^n \|x_i\|^{2+\delta}.$$
Thus, C1 can be derived from the condition that the $\{x_i\}$ are independent with bounded fourth moments.
For BLEV, i.e., $\pi_i = h_{ii}/p$, since $\|x_i\|^2/h_{ii} \le n\lambda_m$, where $\lambda_m$ is the largest eigenvalue of $\frac{1}{n}M_X$, we have
$$M_1 \le \frac{p\lambda_m}{n}\sum_{i=1}^n x_i x_i^T, \quad M_2 \le \frac{p\lambda_m}{n}\sum_{i=1}^n \frac{x_i x_i^T}{\|x_i\|^2}, \quad M_3 \le (p\lambda_m)^{1+\delta}, \quad M_4 \le \frac{(p\lambda_m)^{1+\delta}}{n}\sum_{i=1}^n \|x_i\|^{-\delta}.$$
The bounds on $M_1$, $M_2$ and $M_3$ can be derived from the bounded second moment, i.e., $\frac{1}{n}\sum_{i=1}^n x_i x_i^T = O_p(1)$, whereas $M_4 = O_p(1)$ if $\frac{1}{n}\sum_{i=1}^n \|x_i\|^{-\delta} = O_p(1)$. However, if we change the leverage score from $h_{ii}$ to $h_{ii}^{1/(1+\delta)}$, it is sufficient to obtain the equations in C1 under the condition that the $\{x_i\}$ are independent with bounded fourth moments (see details in the Supplementary Material).
For the PL subsampling method to be developed in the next section, i.e., $\pi_i = \|x_i\|/\sum_{j=1}^n \|x_j\|$, we have
$$M_1 = \frac{1}{n^2}\Big(\sum_{j=1}^n \|x_j\|\Big)\sum_{i=1}^n \|x_i\|\, x_i x_i^T, \quad M_2 = \frac{1}{n^2}\Big(\sum_{j=1}^n \|x_j\|\Big)\sum_{i=1}^n \frac{x_i x_i^T}{\|x_i\|},$$
$$M_3 = \frac{1}{n^{2+\delta}}\Big(\sum_{j=1}^n \|x_j\|\Big)^{1+\delta}\sum_{i=1}^n \|x_i\|^{1+\delta}, \quad M_4 = \frac{1}{n^{2+\delta}}\Big(\sum_{j=1}^n \|x_j\|\Big)^{1+\delta}\sum_{i=1}^n \|x_i\|.$$
We can verify that C1 can be derived from the condition that the $\{x_i\}$ are independent with bounded fourth moments.
Thus, C1 is not strong, as it suffices to assume that the $\{x_i\}$ are independent with bounded fourth moments. For example, the Gaussian, log-normal, mixture Gaussian and truncated t distributions all have bounded fourth moments. In Section 6.1, we show an empirical comparison of the sampling probabilities among these distributions, whose leverage scores exhibit different degrees of heterogeneity. Of course, bounded fourth moments exclude some distributions, such as the t1 distribution. However, the empirical analysis in Section 6 shows that good performance of various subsampling methods is also obtained for data sets generated from the t1 distribution.
4 Optimal Subsampling
By Theorem 1, given $\mathcal{F}_n$, $\mathrm{Var}(\hat\beta)$ can be approximated by $V = M_X^{-1} V_c M_X^{-1}$ as $r$ becomes large. Meanwhile, by (12) and (13) in Theorem 1, $V$ dominates the squared bias; that is, we can make the mean squared error (MSE) attain its minimum approximately by minimizing $V$. Thus, a direct statistical objective is to minimize $V$ in some sense.

Since $M_X$ does not depend on $\{\pi_i\}_{i=1}^n$, by the properties of nonnegative definite matrices, $V_c(\pi_1) \le V_c(\pi_2)$ is equivalent to $V(\pi_1) \le V(\pi_2)$ for any two sets of sampling probabilities $\pi_1 = \{\pi_i^{(1)}\}_{i=1}^n$ and $\pi_2 = \{\pi_i^{(2)}\}_{i=1}^n$. Here we write $A \le B$ if $B - A$ is nonnegative definite. From this point of view, we can minimize $V_c$ instead of $V$ in some sense.
Let the scalar version of $V_c$ be
$$\mathrm{tr}[V_c] = r^{-1}\sum_{i=1}^n \frac{e_i^2}{\pi_i}\, x_i^T x_i. \qquad (15)$$
Furthermore, we take the expectation of $\mathrm{tr}[V_c]$ under the linear model (1) to remove the effect of the model errors. Thus, we set the expectation of $\mathrm{tr}[V_c]$ as our objective function, i.e.,
$$E[\mathrm{tr}(V_c)] = \frac{1}{r}\sum_{i=1}^n \frac{E[e_i^2]}{\pi_i}\|x_i\|^2 = \frac{\sigma^2}{r}\sum_{i=1}^n \frac{1 - h_{ii}}{\pi_i}\|x_i\|^2. \qquad (16)$$
Optimal Criterion for Approximating $\hat\beta_{ols}$. Our aim is to construct sampling probabilities $\{\pi_i\}_{i=1}^n$ in Algorithm 1 that minimize the objective function $E[\mathrm{tr}(V_c)]$ in (16).
Theorem 2. When
$$\pi_i = \frac{\sqrt{1 - h_{ii}}\,\|x_i\|}{\sum_{j=1}^n \sqrt{1 - h_{jj}}\,\|x_j\|}, \qquad (17)$$
$E[\mathrm{tr}(V_c)]$ attains its minimum.
We denote the subsampling method with the sampling probability in (17) as optimal subsampling (OPT). The computational cost of OPT is of the same order as that of BLEV, i.e., $O(np^2)$.
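As an illustrative sketch (NumPy; function names and the log-normal toy data are ours), the OPT probabilities (17) and the objective (16) can be computed as follows; the final check reflects Theorem 2, which says OPT minimizes the objective.

```python
import numpy as np

def opt_probs(X):
    # OPT probabilities (17): pi_i proportional to sqrt(1 - h_ii) * ||x_i||.
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    h = np.sum(U**2, axis=1)
    w = np.sqrt(np.clip(1.0 - h, 0.0, None)) * np.linalg.norm(X, axis=1)
    return w / w.sum()

def objective(X, pi, r=100, sigma2=1.0):
    # Objective (16): (sigma^2 / r) * sum (1 - h_ii) ||x_i||^2 / pi_i.
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    h = np.sum(U**2, axis=1)
    return sigma2 / r * np.sum((1.0 - h) * np.linalg.norm(X, axis=1)**2 / pi)

rng = np.random.default_rng(6)
X = rng.lognormal(size=(800, 4))
n = X.shape[0]
assert np.isclose(opt_probs(X).sum(), 1.0)
# OPT attains a lower objective value than uniform sampling.
assert objective(X, opt_probs(X)) <= objective(X, np.full(n, 1.0 / n)) + 1e-9
```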
Remark. When the design matrix $X$ is orthogonal, i.e., $X^TX = I$, we have $h_{ii} = x_i^T(X^TX)^{-1}x_i = \|x_i\|^2$, and the sampling probability of OPT is proportional to $\sqrt{(1 - h_{ii})h_{ii}}$. Figure 1 illustrates various subsampling scores for an orthogonal design matrix $X$. We observe that the optimal score $\sqrt{(1 - h_{ii})h_{ii}}$ amplifies small $h_{ii}$ but shrinks large $h_{ii}$ toward zero (Figure 1(a)), whereas the shrinkage score $0.9h_{ii} + 0.1p/n$ is linear in the leverage score $h_{ii}$ (Figure 1(c)). Moreover, $\sqrt{(1 - h_{ii})h_{ii}}$ is nonlinear in $h_{ii}$.

[Figure 1: Comparison of the leverage score $h_{ii}$, the optimal score $\sqrt{(1 - h_{ii})h_{ii}}$, the predictor-length score $\sqrt{h_{ii}}$, and the shrinkage score $0.9h_{ii} + 0.1p/n$ with $p/n = 0.2$, for an orthogonal design matrix $X$. Panels: (a) optimal score vs leverage score; (b) length score vs leverage score; (c) shrinkage score vs leverage score. Dot-dash lines mark where the y-axis value equals the x-axis value; dotted lines mark their intersection.]
Remark. If the predictor length $\|x_i\|$ is held fixed, the sampling probability of OPT is proportional to $\sqrt{1 - h_{ii}}$. This is a surprising observation, as the sampling probability of BLEV is proportional to $h_{ii}$.
4.1 Predictor-length Subsampling
Since the computational cost of the sampling probability of OPT is $O(np^2)$, the computation does not scale well in the dimension of the predictors. We now develop a subsampling method that allows a significant reduction of the computational cost. The key idea is that when the leverage scores of all observations are small, one can simplify the expression of the sampling probability of OPT in (17).

Condition C2. The leverage scores satisfy $h_{ii} = o_p(1)$ for every $i = 1, \ldots, n$.

Remark. $h_{ii}$ can approach 1 for serious outliers, whereas $h_{ii} = p/n$ for all $i$ when the leverage scores are identical, since $\sum_{i=1}^n h_{ii} = p$. Thus, this condition means that the $h_{ii}$'s are not highly heterogeneous.
Corollary 1. Under Condition C2, the sampling probability of OPT in (17) can be approximated by
$$\pi_i = \frac{\|x_i\|}{\sum_{j=1}^n \|x_j\|}. \qquad (18)$$
We denote the subsampling method with the sampling probability in (18) as predictor-length subsampling (PL), as the sampling probability is proportional to the L2 norm of the predictors. The computational cost of PL is $O(np)$.
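The PL probabilities (18) need only the row norms of X, which is what makes the cost $O(np)$; a minimal sketch (NumPy, illustrative):

```python
import numpy as np

def pl_probs(X):
    # PL probabilities (18): pi_i = ||x_i|| / sum_j ||x_j||; single pass, O(np).
    norms = np.linalg.norm(X, axis=1)
    return norms / norms.sum()

X = np.random.default_rng(7).standard_normal((1000, 8))
pi = pl_probs(X)
assert np.isclose(pi.sum(), 1.0) and np.all(pi > 0)
```

Because only row norms are needed, the probabilities can be accumulated in one streaming pass over the data, with no SVD or leverage-score computation.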
Remark. When $X$ is orthogonal, i.e., $X^TX = I$, the predictor length $\|x_i\| = \sqrt{h_{ii}}$. In this case, the predictor length is nonlinear in the leverage score, as illustrated in Figure 1(b). Thus, compared to BLEV, PL amplifies small probabilities but shrinks large probabilities.
4.2 Weighted Subsample Estimator for Estimating the Coefficients

Besides approximating $\hat\beta_{ols}$, another aim is to estimate the coefficients $\beta$ under the linear model (1), so we are also interested in establishing the asymptotic normality and consistency of the weighted subsample estimator $\hat\beta$ with respect to $\beta$. We give the asymptotic properties of $\hat\beta$ with respect to $\beta$ in the following theorem. Based on the theorem, we provide an optimal subsampling method with respect to a statistical criterion for estimating $\beta$.
Theorem 3. If Condition C1 holds, then under the linear model (1) we have
$$V_0^{-1/2}(\hat\beta - \beta) \xrightarrow{L} N(0, I) \ \text{as } r \to \infty, \qquad (19)$$
where $V_0 = M_X^{-1} V_{c0} M_X^{-1}$ and $V_{c0} = r^{-1}\sigma^2 \sum_{i=1}^n \frac{1}{\pi_i} x_i x_i^T + (1 - r^{-1})\sigma^2 \sum_{i=1}^n x_i x_i^T$. Moreover, $V_0 = O_p(r^{-1})$.

Theorem 3 states that $\hat\beta$ is consistent to $\beta$ and that $\hat\beta - \beta$ converges to a normal distribution under the linear model (1).
Remark. Comparing $V$ for approximating $\hat\beta_{ols}$ in Theorem 1 with $V_0$ for estimating $\beta$ in Theorem 3, we observe that $V_0$ has the extra term $M_X^{-1}\big[(1 - r^{-1})\sigma^2 \sum_{i=1}^n x_i x_i^T\big] M_X^{-1}$. This can be seen from the decomposition
$$\hat\beta - \beta = (\hat\beta - \hat\beta_{ols}) + (\hat\beta_{ols} - \beta), \qquad (20)$$
where $\hat\beta - \hat\beta_{ols}$ leads to the main term of $V_0$, $M_X^{-1}\big[r^{-1}\sigma^2 \sum_{i=1}^n \frac{1}{\pi_i} x_i x_i^T\big] M_X^{-1}$, and $\hat\beta_{ols} - \beta$ leads to the extra term of $V_0$.
Remark. Analogous to the plug-in estimator $\hat{V}$ in (14), $V_0$ can be estimated by the plug-in estimator $\hat{V}_0$ based on the subsample, i.e., $\hat{V}_0 = \hat{M}_X^{-1} \hat{V}_{c0} \hat{M}_X^{-1}$, where
$$\hat{V}_{c0} = r^{-2}\hat\sigma^2 \sum_{i=1}^r \frac{1}{\pi_i^{*2}} x_i^* x_i^{*T} + (r^{-1} - r^{-2})\hat\sigma^2 \sum_{i=1}^r \frac{1}{\pi_i^*} x_i^* x_i^{*T}$$
with $\hat\sigma^2 = \frac{1}{r-p}\sum_{i=1}^r (y_i^* - x_i^{*T}\hat\beta)^2$.
Since $\hat\beta$ is an unbiased estimator of $\beta$, we consider its covariance matrix $V_0$ to construct an optimal subsampling method for estimating $\beta$. Following the optimal criterion for approximating $\hat\beta_{ols}$, we propose a criterion for estimating $\beta$.

Optimal Criterion for Estimating $\beta$. Our aim is to find the sampling probabilities $\{\pi_i\}_{i=1}^n$ that minimize $\mathrm{tr}(V_{c0})$.

Remark. Unlike the optimal criterion (16) for approximating $\hat\beta_{ols}$, we do not need to take the expectation of $\mathrm{tr}(V_{c0})$ under the linear model (1).

Corollary 2. If the PL subsampling method is chosen, i.e., $\pi_i = \|x_i\|/\sum_{j=1}^n \|x_j\|$, then $\mathrm{tr}(V_{c0})$ attains its minimum.

Corollary 2 states that PL is optimal with respect to the criterion for estimating $\beta$. This is very interesting, since PL is also an approximately optimal method for approximating $\hat\beta_{ols}$.
5 Unweighted Subsample Estimator

Unlike the weighted subsample estimator $\hat\beta$ in Algorithm 1, the unweighted subsample estimator $\hat\beta^u$ in Algorithm 2 does not solve weighted least-square on the subsample but calculates ordinary least-square directly. In this section, we investigate $\hat\beta^u$ under two scenarios: approximating the full sample OLS estimator $\hat\beta_{ols}$ and estimating the coefficients $\beta$.
5.1 Unweighted Subsample Estimator for Approximating the Full Sample OLS Estimator

In the following theorem, we establish asymptotic properties of the unweighted subsample estimator $\hat\beta^u$ for approximating $\hat\beta_{ols}$. The properties show that, given the full sample data $\mathcal{F}_n$, $\hat\beta^u$ is NOT a consistent estimator of $\hat\beta_{ols}$, but rather a consistent estimator of the full sample weighted least-square (WLS) estimate
$$\tilde\beta_{wls} = (X^T\Phi X)^{-1}X^T\Phi y, \qquad (21)$$
where $\Phi = \mathrm{diag}\{\pi_i : i = 1, \ldots, n\}$.

Condition C3.
$$\sum_{i=1}^n \pi_i \|x_i\|^2 x_i x_i^T = O_p(1), \quad \sum_{i=1}^n \pi_i x_i x_i^T = O_p(1), \qquad (22)$$
$$\sum_{i=1}^n \pi_i^{1+\delta} \|x_i\|^{2+2\delta} = O_p(1), \quad \sum_{i=1}^n \pi_i^{1+\delta} \|x_i\|^{2+\delta} = O_p(1), \ \text{for some } \delta > 0. \qquad (23)$$
Remark. Unlike C1, in which the sample moments are weighted by $\{1/\pi_i\}_{i=1}^n$, the sample moments in C3 are weighted by $\{\pi_i\}_{i=1}^n$.
Theorem 4. If Condition C3 holds, then given $\mathcal{F}_n$,
$$(V^u)^{-1/2}(\hat\beta^u - \tilde\beta_{wls}) \xrightarrow{L} N(0, I) \ \text{as } r \to \infty, \qquad (24)$$
where $V^u = (M_X^u)^{-1} V_c^u (M_X^u)^{-1}$, $M_X^u = \sum_{i=1}^n \pi_i x_i x_i^T$, and $V_c^u = r^{-1}\sum_{i=1}^n \pi_i (e_i^{wls})^2 x_i x_i^T$ with $e_i^{wls} = y_i - x_i^T\tilde\beta_{wls}$. In addition, we have
$$E(\hat\beta^u - \hat\beta_{ols} \mid \mathcal{F}_n) = \tilde\beta_{wls} - \hat\beta_{ols} + O_p(r^{-1}). \qquad (25)$$
Theorem 4 states that $\hat\beta^u$ is not a good choice for approximating $\hat\beta_{ols}$, since the main term of its bias, $\tilde\beta_{wls} - \hat\beta_{ols}$, cannot be controlled by increasing $r$.

Remark. $\tilde\beta_{wls} - \hat\beta_{ols} = 0$ for the uniform subsampling method. In this case, the unweighted subsample estimator $\hat\beta^u$ is identical to the weighted subsample estimator $\hat\beta$.
Remark. Similar to the plug-in estimator $\hat{V}$ in (14), $V^u$ can be estimated using the subsample, i.e., $\hat{V}^u = (\hat{M}_X^u)^{-1} \hat{V}_c^u (\hat{M}_X^u)^{-1}$, where $\hat{M}_X^u = r^{-1}X^{*T}X^*$ and $\hat{V}_c^u = r^{-2}\sum_{i=1}^r (\hat{e}_i^{*wls})^2 x_i^* x_i^{*T}$ with $\hat{e}_i^{*wls} = y_i^* - x_i^{*T}\hat\beta^u$.
5.2 Unweighted Subsample Estimator for Estimating the Coefficients

In Section 5.1, we have shown that the unweighted subsample estimator $\hat\beta^u$ converges to the full sample WLS estimate $\tilde\beta_{wls}$ rather than to the full sample OLS estimator $\hat\beta_{ols}$. However, it is easy to see that $\hat\beta^u$ is an unbiased estimator of $\beta$, i.e., $E(\hat\beta^u) = \beta$, where the expectation is taken under the linear model (1). We establish asymptotic properties of $\hat\beta^u$ for estimating $\beta$ in the following theorem.
Theorem 5. If Condition C3 holds, then under the linear model (1) we have
$$(V_0^u)^{-1/2}(\hat\beta^u - \beta) \xrightarrow{L} N(0, I) \ \text{as } r \to \infty, \qquad (26)$$
where $V_0^u = (M_X^u)^{-1} V_{c0}^u (M_X^u)^{-1}$ with $V_{c0}^u = r^{-1}\sigma^2\sum_{i=1}^n \pi_i x_i x_i^T + (1 - r^{-1})\sigma^2\sum_{i=1}^n \pi_i^2 x_i x_i^T$. Moreover, $V_0^u = O_p(r^{-1})$.

Theorem 5 states that $\hat\beta^u$ is consistent to $\beta$ and that $\hat\beta^u - \beta$ converges to a normal distribution under the linear model (1).
Remark. Similar to the plug-in estimator $\hat{V}$ in (14), $V_0^u$ can also be estimated from the subsample, i.e., $\hat{V}_0^u = (\hat{M}_X^u)^{-1} \hat{V}_{c0}^u (\hat{M}_X^u)^{-1}$, where
$$\hat{V}_{c0}^u = r^{-2}\hat\sigma_u^2 \sum_{i=1}^r x_i^* x_i^{*T} + (r^{-1} - r^{-2})\hat\sigma_u^2 \sum_{i=1}^r \pi_i^* x_i^* x_i^{*T}$$
with $\hat\sigma_u^2 = \frac{1}{r-p}\sum_{i=1}^r (y_i^* - x_i^{*T}\hat\beta^u)^2$.
As we know, both the weighted subsample estimator $\hat\beta$ and the unweighted subsample estimator $\hat\beta^u$ are unbiased estimators of $\beta$. Thus, we compare $\hat\beta$ and $\hat\beta^u$ in terms of efficiency, i.e., their corresponding asymptotic covariance matrices $V_0$ in (19) and $V_0^u$ in (26).

Corollary 3. As $r = o(n)$, we have
$$V_0^u - V_0 \le 0, \qquad (27)$$
where we write $V_0^u - V_0 \le 0$ if $V_0^u - V_0$ is nonpositive definite, and equality holds if and only if $\pi_i = 1/n$ for $i = 1, \ldots, n$.

Corollary 3 states that $\hat\beta^u$ is more efficient than $\hat\beta$, except under uniform subsampling, in which case the two estimators are identical.
Remark. The condition that $r = o(n)$ matches the computational bottleneck problem we study in this paper.

From the theoretical analysis in this section, we have the following recommendation about the unweighted subsample estimator $\hat\beta^u$: it is not ideal for approximating $\hat\beta_{ols}$, since it is not a consistent estimator of $\hat\beta_{ols}$; however, it is a better choice for estimating $\beta$, since it can be more efficient than $\hat\beta$.
6 Empirical Evaluation on Synthetic Data

Extensive simulation studies are conducted to examine the empirical performance of the weighted subsample estimator $\hat\beta$ and the unweighted subsample estimator $\hat\beta^u$ based on various subsampling methods. In this section, we report several representative studies.
6.1 Synthetic Data

The $n \times p$ design matrix $X$ is generated row by row from one of five multivariate distributions, introduced below. (1) We generated $X$ from the Gaussian distribution $N(0, \Sigma)$, where the $(i, j)$th element of $\Sigma$ is $\Sigma_{ij} = 2 \times 0.8^{|i-j|}$ (referred to as GA data). (2) We generated $X$ from the mixture Gaussian distribution $\frac{1}{2}N(0, \Sigma) + \frac{1}{2}N(0, 25\Sigma)$ (referred to as MG data). (3) We generated $X$ from the log-normal distribution $LN(0, \Sigma)$ (referred to as LN data). (4) We generated $X$ from the t distribution with 1 degree of freedom and covariance matrix $\Sigma$ (the t1 distribution; referred to as T1 data). (5) We generated $X$ from the truncated t1 distribution with element-wise truncation at $[-p, p]$ (referred to as TT data). We set $n = 50{,}000$ and $p = 50$.
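The Gaussian-based designs above can be sketched as follows in NumPy (illustrative; we use a much smaller $n$ and $p$ than in the paper and cover only the GA, MG and LN cases, with our own function names).

```python
import numpy as np

def make_sigma(p):
    # Covariance Sigma with Sigma_ij = 2 * 0.8^{|i-j|}.
    idx = np.arange(p)
    return 2.0 * 0.8 ** np.abs(idx[:, None] - idx[None, :])

def gen_X(n, p, kind, rng):
    # Rows of X from the GA, MG or LN designs.
    L = np.linalg.cholesky(make_sigma(p))
    Z = rng.standard_normal((n, p)) @ L.T        # rows ~ N(0, Sigma)
    if kind == "GA":
        return Z
    if kind == "MG":                             # 0.5 N(0,Sigma) + 0.5 N(0,25*Sigma)
        scale = np.where(rng.random(n) < 0.5, 1.0, 5.0)
        return Z * scale[:, None]
    if kind == "LN":                             # log-normal LN(0, Sigma)
        return np.exp(Z)
    raise ValueError(kind)

rng = np.random.default_rng(8)
X = gen_X(2000, 10, "MG", rng)
assert X.shape == (2000, 10)
```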
In Figure 2, we plot scatter diagrams comparing the sampling probabilities of the data points under BLEV, OPT and PL. For MG, LN, TT and T1, we observe that (1) the sampling probabilities of OPT and PL increase as the sampling probabilities of BLEV increase, (2) compared to BLEV, OPT and PL amplify small sampling probabilities but shrink large sampling probabilities, and (3) the sampling probabilities of OPT and PL look very similar. Additionally, as GA has the most homogeneous probabilities, the comparison for GA is less clear.
Figure 2: Sampling probabilities of the data points under BLEV, OPT and PL for different data sets. Upper panels: sampling probabilities of OPT vs BLEV. Lower panels: sampling probabilities of PL vs BLEV. From left to right: GA, MG, LN, TT, and T1 data, respectively. Dashed lines mark where the y-axis value equals the x-axis value.
Figure 3: Boxplots of the logarithm of various sampling probabilities of BLEV, OPT, and
PL. Left subfigure: sampling probabilities of BLEV. Middle subfigure: sampling probabilities
of OPT. Right subfigure: sampling probabilities of PL. From left to right: GA, MG,
LN, TT, and T1 data, respectively.
To further compare these sampling probabilities, Figure 3 gives boxplots of the logarithm
of the sampling probabilities under BLEV, OPT, and PL. For all three subsampling
methods, we observe that GA tends to have the most homogeneous sampling probabilities
among the data sets; MG, LN, and TT have less homogeneous sampling probabilities
than GA; and T1 tends to have the most heterogeneous sampling probabilities, especially under
BLEV. Comparing the subfigures in Figure 3, we see that the probabilities of OPT and
PL are more concentrated than those of BLEV, and that PL and OPT behave similarly.
6.2 Improvement from our Proposed Methods for the Weighted
Subsample Estimator
Given the X matrices in Section 6.1, we generated y from the model y = Xβ + ε, where
β = (1_{30}^T, 0.1 × 1_{20}^T)^T, ε ∼ N(0, σ²I_n), and σ = 10. Since five X matrices were generated,
we had five datasets.
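For concreteness, the response can be generated as follows. This is a NumPy sketch under stated assumptions: the seed is ours, and a Gaussian X stands in for any of the five designs above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50_000, 50
X = rng.normal(size=(n, p))   # stand-in for any of the five designs (our choice)

# beta = (1_30^T, 0.1 * 1_20^T)^T and sigma = 10, as stated in the text.
beta = np.concatenate([np.ones(30), 0.1 * np.ones(20)])
y = X @ beta + 10.0 * rng.normal(size=n)
```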
We conduct empirical studies on OLS approximation. The empirical performance for
coefficient estimation is not reported here but is shown in the Supplementary Material, since
it looks very similar to that for OLS approximation.
We calculate the full sample OLS estimator βols for each dataset. We then apply
five subsampling methods, namely UNIF, BLEV, SLEV (with shrinkage parameter λ =
0.9), OPT, and PL, with different subsample sizes r to each dataset. Specifically, we set
the subsample size r = 100, 200, 400, 800, 1600, 3200, 6400. For each subsample size r, we
repeatedly apply Algorithm 1 for B = 1000 times to get weighted subsample estimators
βb for b = 1, . . . , B. We calculate the empirical variance (V), squared bias (SB) and
mean-squared error (MSE) at each subsample size as follows:
\[
V = \frac{1}{B}\sum_{b=1}^{B}\bigl\|X(\hat{\beta}_b - \bar{\hat{\beta}})\bigr\|^2, \quad
SB = \Bigl\|\frac{1}{B}\sum_{b=1}^{B} X(\hat{\beta}_b - \hat{\beta}_{ols})\Bigr\|^2, \quad
MSE = \frac{1}{B}\sum_{b=1}^{B}\bigl\|X(\hat{\beta}_b - \hat{\beta}_{ols})\bigr\|^2, \tag{28}
\]
where \(\bar{\hat{\beta}} = \frac{1}{B}\sum_{b=1}^{B}\hat{\beta}_b\).
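The three quantities in (28) satisfy the exact decomposition MSE = V + SB, which is a handy sanity check when implementing them. A NumPy sketch (the function name is our own):

```python
import numpy as np

def subsample_metrics(X, betas, beta_ols):
    """Empirical variance (V), squared bias (SB), and MSE of X beta_b
    for predicting X beta_ols, following equation (28)."""
    preds = X @ np.asarray(betas).T            # n x B matrix; column b is X beta_b
    target = X @ beta_ols                      # X beta_ols
    mean_pred = preds.mean(axis=1)             # X beta-bar
    V = np.mean(np.sum((preds - mean_pred[:, None]) ** 2, axis=0))
    SB = np.sum((mean_pred - target) ** 2)
    MSE = np.mean(np.sum((preds - target[:, None]) ** 2, axis=0))
    return V, SB, MSE
```

Because the cross term between (X beta_b − X beta-bar) and (X beta-bar − X beta_ols) averages to zero over b, MSE returned by this function always equals V + SB up to rounding.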
We plot the logarithm of V, SB, and MSE of Xβ for MG, LN, and TT in Figure 4.
Since GA has such homogeneous data points that the various methods perform more or less
the same, and T1 does not satisfy our conditions, we do not report the results for GA and T1 here.
Figure 4: Empirical variances, squared biases, and mean-squared errors of Xβ for predicting
Xβols. Upper panels: logarithm of variances; middle panels: logarithm of squared
biases; lower panels: logarithm of mean-squared errors. From left to right: TT, MG,
and LN data, respectively.
In addition, the performance of SLEV is not reported here, as it is similar to that of BLEV.
Several things in Figure 4 are worth noting. First, the V, SB,
and MSE values decrease as the subsample size r increases, which verifies Theorem 1.
Second, V clearly dominates SB, which is consistent with the fact that the squared bias
is proved to be negligible with respect to the variance. Third, the OPT and PL methods
perform better than the other subsampling methods, while OPT and PL perform similarly
to each other.
In addition, we report MSE values relative to UNIF for the various subsampling methods
among all five datasets in Table 1. First, there is no obvious difference among the subsampling
methods for GA. This indicates that uniform subsampling performs so well for GA
that non-uniform subsampling is unnecessary there. Second, comparing Table 1
with Figure 3, as the data become more dispersed, i.e., have more heterogeneous leverage
scores, all non-uniform subsampling methods attain smaller MSE relative to UNIF. Specifically,
non-uniform subsampling methods perform nearly the same as UNIF for GA,
significantly outperform UNIF for MG, TT, and LN, and achieve their largest improvement
over UNIF for T1 among the five datasets.
6.3 Limitation of Unweighted Subsample Estimator
The squared bias of the unweighted subsample estimator βu is shown in Figure 5.
The empirical variance and mean-squared error of βu are not reported here since
they look similar to those of the weighted subsample estimator β.
For LN, TT, and MG, the squared bias does not decrease for the various non-uniform
subsampling methods as the subsample size increases. Thus, we cannot control the bias
of Xβu relative to Xβols for non-uniform subsampling methods. For UNIF, βu is identical
to β, so the squared bias decreases as the sample size increases. However, the bias of Xβu
relative to Xβ clearly decreases as r increases, which is consistent with the unbiasedness
of βu with respect to β.
Additionally, we conduct numerical comparisons between Xβ and Xβu in predicting
Xβ. We report the results of the BLEV and PL methods in Table 2 and omit those
of the other subsampling methods because of their similarity. The table suggests that (1) βu
Table 1: MSE of Xβ for predicting Xβols for various methods, relative to UNIF, among the
GA, MG, LN, TT, and T1 data.
r 100 200 400 800 1600 3200 6400
GA
BLEV 0.988 0.968 0.999 1.003 0.986 0.983 0.990
SLEV 0.981 0.971 0.998 0.971 0.992 0.969 0.992
OPT 1.001 1.017 1.010 0.996 1.016 1.021 0.994
PL 1.002 1.006 1.002 1.010 1.000 1.014 1.009
MG
BLEV 0.377 0.692 0.884 0.943 0.943 0.989 1.003
SLEV 0.336 0.567 0.691 0.725 0.722 0.747 0.774
OPT 0.346 0.557 0.642 0.659 0.668 0.704 0.708
PL 0.353 0.559 0.652 0.660 0.665 0.704 0.692
LN
BLEV 0.227 0.321 0.423 0.536 0.645 0.741 0.827
SLEV 0.281 0.263 0.314 0.396 0.471 0.520 0.582
OPT 0.374 0.358 0.336 0.340 0.363 0.360 0.384
PL 0.387 0.337 0.345 0.331 0.360 0.355 0.387
TT
BLEV 0.045 0.093 0.211 0.371 0.634 0.826 0.917
SLEV 0.040 0.050 0.098 0.169 0.273 0.341 0.390
OPT 0.081 0.062 0.080 0.116 0.172 0.201 0.223
PL 0.072 0.060 0.081 0.113 0.165 0.208 0.223
T1
BLEV 3.41e-4 2.05e-05 2.56e-05 6.96e-05 1.15e-04 1.84e-04 4.77e-04
SLEV 2.91e-4 3.90e-05 2.83e-06 5.85e-06 1.10e-05 1.82e-05 4.15e-05
OPT 5.25e-3 5.84e-04 1.98e-04 6.95e-05 2.93e-06 4.12e-06 8.67e-06
PL 1.03e-3 2.43e-05 5.31e-06 4.15e-06 5.01e-06 7.53e-06 1.64e-05
Figure 5: Empirical squared biases of Xβu
for predicting Xβols and Xβ, respectively. From
top to bottom: upper panels are results for predicting Xβols, and lower panels results for
predicting Xβ. From left to right: TT, MG and LN data, respectively.
Table 2: Ratios between MSEs of Xβu
and those of Xβ for predicting Xβ, among MG,
LN and TT, respectively.
r 100 200 400 800 1600 3200 6400
MG (all values less than 1)
BLEV 0.538 0.524 0.538 0.532 0.522 0.542 0.586
PL 0.741 0.813 0.844 0.860 0.870 0.872 0.894
LN (all values less than 1)
BLEV 0.222 0.133 0.092 0.104 0.141 0.229 0.406
PL 0.782 0.732 0.662 0.628 0.584 0.598 0.701
TT (all values less than 1)
BLEV 0.091 0.071 0.065 0.078 0.097 0.148 0.221
PL 0.533 0.507 0.507 0.547 0.602 0.667 0.749
is more efficient than β for all cases, which is consistent with Corollary 3, and (2) the
advantage of βu over β is more pronounced for BLEV than for PL.
Thus, we draw the following empirical conclusion. Although βu may not be a good
choice for approximating βols, since its bias cannot be controlled, it is better (but risky) to
choose βu for estimating β if one is confident that the dataset satisfies the linear model (1).
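The contrast between the two estimators can be sketched in code. This is a minimal Python illustration of weighted (inverse-probability) versus unweighted subsample OLS; the function name and details are our own simplification, not the paper's Algorithm 1 verbatim.

```python
import numpy as np

def subsample_ols(X, y, probs, r, weighted=True, rng=None):
    """Draw r indices with replacement using `probs`; return the weighted
    (inverse-probability) or unweighted subsample OLS estimator."""
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.choice(len(y), size=r, replace=True, p=probs)
    Xs, ys = X[idx], y[idx]
    if weighted:
        # Scale rows by sqrt of inverse-probability weights, then solve LS.
        sw = np.sqrt(1.0 / probs[idx])
        return np.linalg.lstsq(Xs * sw[:, None], ys * sw, rcond=None)[0]
    # Unweighted: plain OLS on the subsample.
    return np.linalg.lstsq(Xs, ys, rcond=None)[0]
```

Under uniform probabilities the weights are constant and the two estimators coincide, matching the remark above that βu is identical to β for UNIF.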
6.4 Computational Cost
We report the running time for the synthetic dataset TT in Table 3, where BLEV is based on
exact leverage scores computed by QR decomposition, ALEVCW and ALEVGP are based on
approximate leverage scores computed by the CW projection of Clarkson and Woodruff (2013) and the Gaussian
projection of Drineas et al. (2012), respectively, with r1 = 50 and r2 = 11 ≈ log(50,000),
and T0 denotes the time cost of computing the sampling probabilities. All values in Table 3 were
computed in R on a PC with a 3 GHz Intel i7 processor, 8 GB of memory, and the OS X
operating system.
From Table 3, first, since computing (approximate) leverage scores takes most of the
time for BLEV, ALEVCW, and ALEVGP, PL greatly reduces the running time, in both system
time and user time. Second, although ALEVCW and ALEVGP greatly reduce the
computational cost compared to BLEV, PL has a much lower computational cost than either.
Thus, PL has a notable computational advantage.
In addition, we also report the time cost for two larger design matrices X, of sizes 5M × 50
and 50K × 1,000 respectively, in Table 3. We see that PL is even more efficient at saving
computational cost over the other methods when p is large.
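The cost gap has a simple source: PL needs only the row norms of X, a single O(np) pass, whereas exact leverage scores require a QR decomposition costing O(np²). A minimal NumPy sketch of both probability computations (function names are ours):

```python
import numpy as np

def pl_probs(X):
    """Predictor-length probabilities: proportional to the L2 norm of each
    row of X. One O(np) pass over the data."""
    norms = np.linalg.norm(X, axis=1)
    return norms / norms.sum()

def blev_probs(X):
    """Exact leverage-score probabilities via a thin QR decomposition,
    costing O(n p^2); this is the expensive step PL avoids."""
    Q, _ = np.linalg.qr(X)             # Q is n x p in reduced mode
    lev = np.sum(Q ** 2, axis=1)       # h_ii equals the squared row norm of Q
    return lev / lev.sum()
```

For a full-column-rank X, the leverage scores in `blev_probs` sum to p before normalization, which is a quick correctness check.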
7 Real Data Example
RNA-seq, an ultra-high-throughput transcriptome sequencing technology, produces millions
of short reads. After mapping to the genome, it outputs read counts at each nucleotide of
every gene. To study the transcriptome of Drosophila melanogaster throughout development,
Graveley et al. (2011) conducted RNA-Seq experiments using RNA isolated from 30
whole-body samples representing distinct stages of development.
By calculating the correlation of gene expression levels, the authors showed that
gene expression at each developmental stage is highly correlated with that at its adjacent
stages. We are interested in investigating how much variation of RNA-seq read counts at
the 24th hour can be explained by those at early developmental stages, i.e., the 2nd hour,
the 4th hour, . . ., the 10th hour, via linear regression. For this linear regression problem,
there are read counts on 4,227,667 nucleotides of 542 genes. Thus, the full sample of size
n = 4,227,667 is assumed to come from the following linear model:
y_i = x_i^T β + ε_i,   i = 1, . . . , n,   (29)
where x_i = (x_{i,1}, . . . , x_{i,5})^T are the read counts for the i-th nucleotide at the developmental
embryonic stages of the 2nd hour, the 4th hour, . . ., the 10th hour, respectively, and y_i is
the read count for the i-th nucleotide at the 24th hour.
We plot boxplots of the logarithm of the sampling probabilities of BLEV, OPT, and PL in the
first subfigure of Figure 6. The sampling probabilities of OPT and PL have larger means
and look more concentrated than those of BLEV, while the sampling probabilities of OPT
and PL are close to each other. This observation is similar to that for the synthetic data.
Table 3: The running time, in CPU seconds, for computing β with BLEV, ALEV, and PL for
the TT dataset, where ALEVGP and ALEVCW denote ALEV by Gaussian and CW projections,
respectively, T0 denotes the time of computing the sampling probabilities, and “(Tols)” denotes
the time of computing the full sample OLS estimator.
50K × 50 design matrix X
System Time (Tols = 0.024) User Time (Tols = 0.311)
r T0 100 400 1600 6400 T0 100 400 1600 6400
BLEV 0.030 0.031 0.031 0.033 0.037 0.656 0.660 0.662 0.672 0.727
ALEVGP 0.020 0.021 0.021 0.027 0.061 0.690 0.694 0.696 0.706 0.755
ALEVCW 0.046 0.047 0.047 0.049 0.053 0.296 0.300 0.302 0.312 0.421
PL 0.013 0.014 0.015 0.015 0.020 0.238 0.242 0.245 0.252 0.299
5M × 50 design matrix X
System Time (Tols = 3.31) User Time (Tols = 22.97)
r T0 100 400 1600 6400 T0 100 400 1600 6400
BLEV 8.97 9.31 9.32 9.32 9.37 22.97 23.09 23.09 23.11 23.28
ALEVGP 6.53 6.87 6.88 6.88 6.93 69.28 69.40 69.40 69.42 69.47
ALEVCW 4.05 4.39 4.40 4.40 4.45 33.01 33.13 33.13 33.15 33.20
PL 2.30 2.64 2.65 2.65 2.70 19.12 19.24 19.24 19.26 19.31
50K × 1K design matrix X
System Time (Tols = 1.211) User Time (Tols = 86.08)
r T0 100 400 1600 6400 T0 100 400 1600 6400
BLEV 1.267 1.356 1.391 1.439 1.489 152.3 159.2 161.9 165.6 174.2
ALEVGP 0.665 0.754 0.789 0.837 0.887 89.25 96.15 98.85 102.6 111.2
ALEVCW 0.480 0.569 0.604 0.652 0.702 6.927 13.83 16.53 20.23 28.83
PL 0.213 0.363 0.451 0.387 0.452 1.353 8.145 10.86 14.45 23.22
[Figure 6 image: boxplot and scatter panels; plotted point data omitted. Axes: log(subsample size) versus log(variance), log(squared bias) and log(mean-squared error); methods: BLEV, UNIF, OPT, PL, UW.]
Figure 6: Performance comparison for the real data example among Xβ by various subsampling methods (UNIF, BLEV, OPT and PL) and Xβu by BLEV (denoted UW). From left to right: boxplots of the logarithm of the sampling probabilities for BLEV, OPT and PL (1st subfigure), the logarithm of variance (2nd subfigure), the logarithm of squared bias (3rd subfigure), and the logarithm of MSE (4th subfigure).
We study the performance of the weighted subsample estimator β by applying various subsampling methods to the full sample B = 1000 times. Although the unweighted subsample estimator βu is not recommended in practice, since its bias for OLS approximation cannot be controlled by increasing the subsample size, we also show the results of βu based on BLEV for this real data set. Several subsample sizes are chosen: r = 25, 50, 100, 250, 500, 1000, 2500, 5000.
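For concreteness, one replicate of this experiment can be sketched as follows. This is a minimal illustration, not the authors' exact code; the function name, the simulated data and the uniform probabilities are all assumptions for the sketch.

```python
import numpy as np

def weighted_subsample_ols(X, y, probs, r, rng=None):
    """One draw of the weighted subsample estimator: sample r rows
    with replacement with probabilities probs, then solve weighted
    least squares with inverse-probability weights 1/probs."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    idx = rng.choice(n, size=r, replace=True, p=probs)
    w = 1.0 / probs[idx]                  # inverse-probability weights
    Xw = X[idx] * np.sqrt(w)[:, None]     # rescale rows by sqrt(weights)
    yw = y[idx] * np.sqrt(w)
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    return beta

# Tiny demonstration on simulated data with uniform (UNIF) probabilities.
rng = np.random.default_rng(0)
n, p = 10_000, 5
X = rng.normal(size=(n, p))
beta_true = np.arange(1.0, p + 1.0)
y = X @ beta_true + rng.normal(size=n)
probs = np.full(n, 1.0 / n)
est = weighted_subsample_ols(X, y, probs, r=1000, rng=1)
```

Repeating the last call B times with a fresh seed each time yields the B subsample estimates whose variance, squared bias and MSE are summarized in Figure 6.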
The resulting variance, squared bias and MSE are plotted in Figure 6. Inspecting Figure 6, we obtain observations similar to those in the simulation studies. Firstly, the bias of β is small enough to be ignored compared to its variance. Secondly, the MSEs of β for OPT and PL are about 30% smaller than those for BLEV as the subsample size grows, while the MSEs of β for OPT and PL are almost identical for this real data set. Thirdly, βu has clearly smaller variance than β, but its MSE quickly flattens as r increases because of its uncontrollable bias. These observations are consistent with our theoretical and empirical results on βu.
We report the running time for computing β with BLEV and PL in Table 4. These values were computed in R on a PC with a 3 GHz Intel i7 processor, 8 GB of memory and the OS X operating system. The results show that PL has an obvious advantage over BLEV in terms of system time, and a slightly lower computational cost than BLEV in terms of user time.
Table 4: The running time (CPU seconds) for computing β with BLEV and PL. T0 is the
running time for computing the sampling probabilities.
r        T0     25     50     100    250    500    1000   2500   5000
System Time
BLEV     0.278  0.295  0.293  0.326  0.288  0.318  0.291  0.278  0.414
PL       0.235  0.251  0.252  0.245  0.245  0.245  0.253  0.265  0.344
User Time
BLEV     10.85  10.85  10.85  10.86  10.85  10.88  10.87  11.01  11.18
PL       9.83   9.83   9.83   9.83   9.83   9.84   9.86   10.00  10.06
On the other hand, we investigate the relative-error approximation for this real data set. We calculate the residual sum of squares ‖y − Xβols‖² for the full sample OLS estimator, and the empirical value of the expected residual sum of squares E‖y − Xβ‖², i.e., $\frac{1}{B}\sum_{b=1}^{B}\|y - X\beta_b\|^2$, where βb is the weighted subsample estimator on the bth subsample. We use the following quantity Re to measure the relative-error approximation:
\[
R_e=\frac{\frac{1}{B}\sum_{b=1}^{B}\|y-X\beta_b\|^2}{\|y-X\beta_{ols}\|^2}-1.
\]
The Re values of the unweighted subsample estimator βu are calculated in the same way. The results are reported in Table 5. We observe that (1) the Re values of βu do not decrease as the subsample size increases, (2) OPT and PL have the best performance in terms of the relative error, and (3) even the UNIF method achieves a very good relative-error approximation when the subsample size is sufficiently large. These observations empirically show that the performance of the relative-error approximation agrees with the asymptotic results in Theorem 1.
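The computation of Re is straightforward. The sketch below is illustrative only: the helper name and inputs are assumptions, and `subsample_estimates` stands for the B weighted subsample estimates βb obtained by repeated subsampling.

```python
import numpy as np

def relative_error(X, y, beta_ols, subsample_estimates):
    """Empirical Re: mean subsample residual sum of squares divided by
    the full-sample OLS residual sum of squares, minus one."""
    rss_ols = np.sum((y - X @ beta_ols) ** 2)
    rss_sub = np.mean([np.sum((y - X @ b) ** 2) for b in subsample_estimates])
    return rss_sub / rss_ols - 1.0

# Toy check: if every subsample estimate equals the OLS fit, Re is zero,
# and Re is nonnegative for any estimates since OLS minimizes the RSS.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=200)
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
re0 = relative_error(X, y, beta_ols, [beta_ols] * 5)
```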
Table 5: The relative-error approximation comparison among β by various subsampling methods (UNIF, BLEV, OPT and PL) and βu by BLEV (denoted UW).
r 25 50 100 250 500 1000 2500 5000
BLEV 0.210 0.105 0.0584 0.0189 0.00943 0.00444 0.00201 0.00102
UW 0.147 0.0730 0.0397 0.0270 0.0212 0.0195 0.0176 0.0170
UNIF 0.403 0.163 0.0668 0.0249 0.0133 0.00646 0.00260 0.00124
OPT 0.224 0.106 0.0487 0.0183 0.00956 0.00431 0.00175 0.000888
PL 0.277 0.113 0.0462 0.0191 0.00773 0.00402 0.00182 0.000862
8 Discussion
In this paper, we have studied two classes of subsample-based estimation algorithms, i.e., the weighted estimation algorithm and the unweighted estimation algorithm, for fitting linear models to large sample data by subsampling methods. We have established asymptotic consistency and normality of their resulting subsample estimators, the weighted subsample estimator β and the unweighted subsample estimator βu, respectively. Based on the asymptotic results for β, we have proposed two optimality criteria for the weighted estimation algorithm, one for approximating the full sample OLS estimator and one for estimating the coefficients. Furthermore, two optimal subsampling methods are constructed. In particular, PL subsampling is based on the L2 norm of the predictors rather than their leverage scores. Compared with BLEV and OPT, PL is scalable in both the sample size n and the predictor dimension p.
In addition, we have argued that the unweighted subsample estimator βu is not ideal for approximating the full sample OLS estimator, as its bias cannot be controlled. However, it is more efficient than the weighted subsample estimator β for estimating the coefficients. Synthetic data and a real data example are used to demonstrate the performance of our proposed methods.
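The scalability of PL is easy to see in code: computing π_i ∝ ‖x_i‖ touches each entry of X once, O(np), whereas exact leverage-score probabilities require a factorization of X, O(np²). A minimal sketch with illustrative names (NumPy assumed):

```python
import numpy as np

def pl_probs(X):
    """Predictor-length (PL) sampling probabilities:
    proportional to the L2 norm of each row of X; costs O(np)."""
    norms = np.linalg.norm(X, axis=1)
    return norms / norms.sum()

def leverage_probs(X):
    """Leverage-score probabilities via a reduced QR factorization,
    shown for comparison; this is the expensive O(np^2) route."""
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q ** 2, axis=1)   # h_ii = leverage score of row i
    return h / h.sum()           # h.sum() equals p for full-rank X

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
pi_pl = pl_probs(X)
pi_lev = leverage_probs(X)
```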
Appendix: Proofs
A Two Lemmas
First, we provide the asymptotic results for the weighted subsample estimator β and the unweighted subsample estimator βu without the assumed model (1) in Lemmas 1 and 2, respectively. Their proofs are given in the Supplementary Material. From Lemmas 1 and 2, we then prove Theorems 1 and 4, respectively.
Condition C1∗:
\[
\frac{1}{n^2}\sum_{i=1}^{n}\pi_i\left(\frac{z_i z_i^T}{\pi_i}-M_Z\right)^2=O_p(1),
\tag{A.1}
\]
and there exists some $\delta>0$ such that
\[
\frac{1}{n^{2+\delta}}\sum_{i=1}^{n}\frac{\|e_i x_i\|^{2+\delta}}{\pi_i^{1+\delta}}=O_p(1),
\tag{A.2}
\]
where $M_Z=\sum_{i=1}^{n}z_i z_i^T$ and $e_i=y_i-x_i^T\beta_{ols}$.
Condition C3∗:
\[
\sum_{i=1}^{n}\pi_i\left(z_i z_i^T-M_Z^u\right)^2=O_p(1),
\tag{A.3}
\]
\[
\sum_{i=1}^{n}\pi_i^{1+\delta}\|e_i^u x_i\|^{2+\delta}=O_p(1),
\tag{A.4}
\]
where $M_Z^u=\sum_{i=1}^{n}\pi_i z_i z_i^T$, and $e_i^u=y_i-x_i^T\beta_{wls}$ with $\beta_{wls}=(X^T\Phi X)^{-1}X^T\Phi y$.
Note: (A.1) and (A.3) are imposed on the predictors and the response variable, and (A.2) and (A.4) on the predictors and the residuals of the full sample OLS, whereas C1 and C3 are imposed only on the predictors.
Lemma 1. If Condition C1∗ holds, then given $\mathcal{F}_n$, we have
\[
V^{-1/2}(\beta-\beta_{ols})\xrightarrow{L}N(0,I)\quad\text{as }r\to\infty,
\tag{A.5}
\]
where $V=M_X^{-1}V_c M_X^{-1}$ and $V_c=r^{-1}\sum_{i=1}^{n}\frac{e_i^2}{\pi_i}x_i x_i^T$.
Moreover,
\[
V=O(r^{-1}),
\tag{A.6}
\]
and
\[
E(\beta-\beta_{ols}\mid\mathcal{F}_n)=O(r^{-1}).
\tag{A.7}
\]
Lemma 2. If Condition C3∗ holds, then given $\mathcal{F}_n$,
\[
(V^u)^{-1/2}(\beta^u-\beta_{wls})\xrightarrow{L}N(0,I)\quad\text{as }r\to\infty,
\tag{A.8}
\]
where $V^u=(M_X^u)^{-1}V_c^u(M_X^u)^{-1}$, $M_X^u=\sum_{i=1}^{n}\pi_i x_i x_i^T$, and $V_c^u=r^{-1}\sum_{i=1}^{n}\pi_i(e_i^u)^2 x_i x_i^T$.
In addition, we have
\[
E(\beta^u-\beta_{ols}\mid\mathcal{F}_n)=\beta_{wls}-\beta_{ols}+O_p(r^{-1}).
\tag{A.9}
\]
B Proof of Theorem 1
If we show that C1 implies C1∗ under the linear model (1), then Theorem 1 follows from Lemma 1.
First, we verify (A.1) in C1∗. It is easy to see that
\[
\frac{1}{n^2}\sum_{i=1}^{n}\pi_i\left(\frac{z_i z_i^T}{\pi_i}-M_Z\right)^2
=\frac{1}{n^2}\sum_{i=1}^{n}\frac{(z_i z_i^T)^2}{\pi_i}-\frac{1}{n^2}M_Z^2
=\begin{pmatrix} A_{11} & A_{12}\\ A_{12}^T & A_{22}\end{pmatrix},
\tag{A.10}
\]
where
\[
A_{11}=\frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{\pi_i}\left[(x_i x_i^T)^2+x_i x_i^T y_i^2\right]
-\frac{1}{n^2}\left[\Big(\sum_{i=1}^{n}x_i x_i^T\Big)^2+\Big(\sum_{i=1}^{n}x_i y_i\Big)\Big(\sum_{i=1}^{n}x_i^T y_i\Big)\right],
\]
\[
A_{12}=\frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{\pi_i}\left(x_i x_i^T x_i y_i+x_i y_i^3\right)
-\frac{1}{n^2}\left[\Big(\sum_{i=1}^{n}x_i x_i^T\Big)\Big(\sum_{i=1}^{n}x_i y_i\Big)+\Big(\sum_{i=1}^{n}x_i y_i\Big)\Big(\sum_{i=1}^{n}y_i^2\Big)\right],
\]
and
\[
A_{22}=\frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{\pi_i}\left(x_i^T x_i y_i^2+y_i^4\right)
-\frac{1}{n^2}\left[\Big(\sum_{i=1}^{n}x_i^T y_i\Big)\Big(\sum_{i=1}^{n}x_i y_i\Big)+\Big(\sum_{i=1}^{n}y_i^2\Big)^2\right].
\]
Now we investigate the order of $A_{11}$. Under the linear model (1), we have
\[
\frac{1}{n^2}\sum_{i=1}^{n}\frac{x_i x_i^T y_i^2}{\pi_i}
-\frac{1}{n^2}\sum_{i=1}^{n}\frac{x_i x_i^T(x_i^T\beta)^2+\sigma^2 x_i x_i^T}{\pi_i}
\to 0\ \text{in probability}.
\tag{A.11}
\]
By Hölder's inequality, we get that
\[
\frac{1}{n^2}\sum_{i=1}^{n}\frac{x_i x_i^T(x_i^T\beta)^2}{\pi_i}
\le\frac{1}{n^2}\sum_{i=1}^{n}\frac{\|\beta\|^2\|x_i\|^2 x_i x_i^T}{\pi_i}=O_p(1).
\tag{A.12}
\]
So combining (A.11), (A.12) and C1, i.e., $\frac{1}{n^2}\sum_{i=1}^{n}\frac{x_i x_i^T}{\pi_i}=O_p(1)$, we have
\[
\frac{1}{n^2}\sum_{i=1}^{n}\frac{x_i x_i^T y_i^2}{\pi_i}=O_p(1).
\tag{A.13}
\]
Meanwhile, $\frac{1}{n}\sum_{i=1}^{n}x_i y_i-\frac{1}{n}\sum_{i=1}^{n}x_i x_i^T\beta\to 0$ in probability under the linear model (1), so from C2 we have that
\[
\frac{1}{n^2}\Big(\sum_{i=1}^{n}x_i y_i\Big)\Big(\sum_{i=1}^{n}x_i^T y_i\Big)=O_p(1).
\tag{A.14}
\]
Combining (A.13) and (A.14), we have $A_{11}=O_p(1)$.
Analogously, we can get that both $A_{12}$ and $A_{22}$ in (A.10) are $O_p(1)$. Thus, (A.1) in C1∗ is verified.
Second, we verify (A.2) in C1∗:
\[
\begin{aligned}
\frac{1}{n^{2+\delta}}\sum_{i=1}^{n}\frac{\|e_i x_i\|^{2+\delta}}{\pi_i^{1+\delta}}
&=\frac{1}{n^{2+\delta}}\sum_{i=1}^{n}\frac{\|[\varepsilon_i+x_i^T(\beta_{ols}-\beta)]x_i\|^{2+\delta}}{\pi_i^{1+\delta}}\\
&\le\frac{1}{n^{2+\delta}}\sum_{i=1}^{n}\frac{\|\varepsilon_i x_i\|^{2+\delta}}{\pi_i^{1+\delta}}
+\frac{\|\beta_{ols}-\beta\|^{2+\delta}}{n^{2+\delta}}\sum_{i=1}^{n}\frac{\|x_i\|^{4+2\delta}}{\pi_i^{1+\delta}}\\
&\le\frac{1}{n^{2+\delta}}\sum_{i=1}^{n}\frac{\|\varepsilon_i x_i\|^{2+\delta}}{\pi_i^{1+\delta}}
+\|\beta_{ols}-\beta\|^{2+\delta}\Big(\sum_{i=1}^{n}\|x_i\|^2\Big)\frac{1}{n^{2+\delta}}\sum_{i=1}^{n}\frac{\|x_i\|^{2+2\delta}}{\pi_i^{1+\delta}}\\
&=\frac{1}{n^{2+\delta}}\sum_{i=1}^{n}\frac{|\varepsilon_i|^{2+\delta}\|x_i\|^{2+\delta}}{\pi_i^{1+\delta}}+o_p(1)=O_p(1),
\end{aligned}
\]
where the first inequality is based on the triangle inequality and Hölder's inequality; the second inequality follows from $\|x_i\|^2\le\sum_{i=1}^{n}\|x_i\|^2$; the penultimate equality holds because of (9) in C1 and the fact that $\beta_{ols}-\beta=O_p(n^{-1/2})$ under the assumed linear model; and the last equality holds since $\frac{1}{n^{2+\delta}}\sum_{i=1}^{n}\frac{|\varepsilon_i|^{2+\delta}\|x_i\|^{2+\delta}}{\pi_i^{1+\delta}}-\frac{1}{n^{2+\delta}}\sum_{i=1}^{n}\frac{\|x_i\|^{2+\delta}}{\pi_i^{1+\delta}}$ goes to 0 in probability under the linear model assumption.
Thus, Theorem 1 is proved.
C Proof of Theorem 2
By Hölder's inequality,
\[
E[\mathrm{tr}(V_c)]
=r^{-1}\sum_{i=1}^{n}\frac{1-h_{ii}}{\pi_i}\|x_i\|^2
=r^{-1}\sum_{i=1}^{n}\frac{1-h_{ii}}{\pi_i}\|x_i\|^2\sum_{i=1}^{n}\pi_i
\ge r^{-1}\Big(\sum_{i=1}^{n}\sqrt{(1-h_{ii})\|x_i\|^2}\Big)^2,
\]
where the equality holds if and only if $\sqrt{\frac{(1-h_{ii})\|x_i\|^2}{\pi_i}}\propto\sqrt{\pi_i}$, i.e., $\pi_i\propto\sqrt{1-h_{ii}}\,\|x_i\|$. Thus, the proof is completed.
D Proof of Theorem 4
The proof of Theorem 4 proceeds in the same fashion as that of Theorem 1 by showing
that condition C3 implies C3∗ under the linear model framework.
E Proof of Theorem 3
The difference between $\hat\beta$ and $\beta$ can be expressed as follows:
\[
\hat\beta-\beta=M_X^{-1}\left(r^{-1}X^{*T}\Phi^{*-1}\varepsilon^{*}\right)
+\left(\hat M_X^{-1}-M_X^{-1}\right)\left(r^{-1}X^{*T}\Phi^{*-1}\varepsilon^{*}\right),
\tag{A.15}
\]
where $\varepsilon^{*}=y^{*}-X^{*}\beta$.
We follow the steps of the proof of Lemma 1 to show that $(\hat M_X^{-1}-M_X^{-1})(r^{-1}X^{*T}\Phi^{*-1}\varepsilon^{*})=O_p(r^{-1})$ and that $M_X^{-1}(r^{-1}X^{*T}\Phi^{*-1}\varepsilon^{*})$ converges to a normal distribution. The details are omitted here because of the similarity.
Unlike $\mathrm{Var}(r^{-1}n^{-1}X^{*T}\Phi^{*-1}e^{*}\mid\mathcal{F}_n)$ in the proof of Lemma 1, one point is worth noting:
\[
\begin{aligned}
\mathrm{Var}(r^{-1}n^{-1}X^{*T}\Phi^{*-1}\varepsilon^{*}\mid X)
&=E\left[\mathrm{Var}(r^{-1}n^{-1}X^{*T}\Phi^{*-1}\varepsilon^{*}\mid\mathcal{F}_n)\mid X\right]
+\mathrm{Var}\left[E(r^{-1}n^{-1}X^{*T}\Phi^{*-1}\varepsilon^{*}\mid\mathcal{F}_n)\mid X\right]\\
&=E\left[r^{-1}\frac{1}{n^2}\left(\sum_{i=1}^{n}\frac{\varepsilon_i^2 x_i x_i^T}{\pi_i}
-\Big(\sum_{i=1}^{n}\varepsilon_i x_i\Big)\Big(\sum_{i=1}^{n}\varepsilon_i x_i\Big)^T\right)\Bigm| X\right]
+\mathrm{Var}\left[\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i x_i\Bigm| X\right]\\
&=r^{-1}\frac{1}{n^2}\sum_{i=1}^{n}\frac{\sigma^2 x_i x_i^T}{\pi_i}
-r^{-1}\frac{1}{n^2}\sigma^2\sum_{i=1}^{n}x_i x_i^T
+\frac{1}{n^2}\sigma^2\sum_{i=1}^{n}x_i x_i^T,
\end{aligned}
\]
where the second equality follows from the properties of Hansen–Hurwitz estimators, and the third by taking the expectation and variance under the linear model assumption.
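The Hansen–Hurwitz variance property invoked here can be checked by simulation. For a scalar population {u_i} sampled with replacement with probabilities {π_i}, the estimator $(1/r)\sum_{j} u_{I_j}/\pi_{I_j}$ of the total $\sum_i u_i$ has variance $r^{-1}(\sum_i u_i^2/\pi_i - (\sum_i u_i)^2)$. The sketch below uses illustrative fixed values and a fixed seed:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, reps = 20, 5, 50_000
u = np.arange(1.0, 21.0)              # fixed population values, total 210
pi = np.linspace(1.0, 2.0, n)
pi /= pi.sum()                        # sampling probabilities

# Hansen-Hurwitz estimator of sum(u): average of u_i / pi_i over r draws.
draws = rng.choice(n, size=(reps, r), replace=True, p=pi)
t_hat = np.mean(u[draws] / pi[draws], axis=1)

# Theoretical variance: (sum_i u_i^2 / pi_i - (sum_i u_i)^2) / r.
var_theory = (np.sum(u ** 2 / pi) - np.sum(u) ** 2) / r
```

Over many replicates, the empirical mean and variance of `t_hat` match the total and the theoretical variance closely.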
F Proof of Corollary 2
By Hölder's inequality,
\[
\mathrm{tr}(V_{c0})
=r^{-1}\sigma^2\sum_{i=1}^{n}\frac{1}{\pi_i}\|x_i\|^2\sum_{i=1}^{n}\pi_i
+(1-r^{-1})\sigma^2\sum_{i=1}^{n}\|x_i\|^2
\ge r^{-1}\sigma^2\Big(\sum_{i=1}^{n}\|x_i\|\Big)^2
+(1-r^{-1})\sigma^2\sum_{i=1}^{n}\|x_i\|^2,
\]
where the equality holds if and only if $\sqrt{\frac{1}{\pi_i}\|x_i\|^2}\propto\sqrt{\pi_i}$, i.e., $\pi_i\propto\|x_i\|$. Thus, the proof is completed.
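This Hölder (Cauchy–Schwarz) bound admits a quick numerical sanity check: with π_i ∝ ‖x_i‖, the π-dependent term Σ_i ‖x_i‖²/π_i attains its minimum value (Σ_i ‖x_i‖)², and any other probability vector does no better. A sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
norms2 = np.sum(X ** 2, axis=1)          # ||x_i||^2

def objective(pi):
    """The pi-dependent part of tr(Vc0): sum_i ||x_i||^2 / pi_i."""
    return np.sum(norms2 / pi)

pi_opt = np.sqrt(norms2) / np.sqrt(norms2).sum()   # pi_i proportional to ||x_i||

# Compare the optimal choice against random probability vectors.
worse = []
for _ in range(200):
    pi = rng.random(100)
    pi /= pi.sum()
    worse.append(objective(pi) >= objective(pi_opt))
```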
G Proof of Theorem 5
Analogous to the proof of Theorem 3, Theorem 5 follows immediately from the proof of Lemma 2 by replacing $\beta_{wls}$ with $\beta$. During this proof, however, we need to note that
\[
\begin{aligned}
\mathrm{Var}(r^{-1}X^{*T}\varepsilon^{*}\mid X)
&=E\left[\mathrm{Var}(r^{-1}X^{*T}\varepsilon^{*}\mid\mathcal{F}_n)\mid X\right]
+\mathrm{Var}\left[E(r^{-1}X^{*T}\varepsilon^{*}\mid\mathcal{F}_n)\mid X\right]\\
&=E\left[r^{-1}\sum_{i=1}^{n}\pi_i\varepsilon_i^2 x_i x_i^T
-r^{-1}\Big(\sum_{i=1}^{n}\pi_i\varepsilon_i x_i\Big)\Big(\sum_{i=1}^{n}\pi_i\varepsilon_i x_i\Big)^T\Bigm| X\right]
+\mathrm{Var}\left[\sum_{i=1}^{n}\pi_i\varepsilon_i x_i\Bigm| X\right]\\
&=r^{-1}\sigma^2\sum_{i=1}^{n}\pi_i x_i x_i^T
-r^{-1}\sigma^2\sum_{i=1}^{n}\pi_i^2 x_i x_i^T
+\sigma^2\sum_{i=1}^{n}\pi_i^2 x_i x_i^T.
\end{aligned}
\]
H Proof of Corollary 3
Comparing $V_0$ with $V_0^u$, we have
\[
\begin{aligned}
V_0-V_0^u
&=r^{-1}\sigma^2(X^TX)^{-1}X^T\Phi^{-1}X(X^TX)^{-1}+(1-r^{-1})\sigma^2(X^TX)^{-1}\\
&\quad-r^{-1}\sigma^2(X^T\Phi X)^{-1}-(1-r^{-1})\sigma^2(X^T\Phi X)^{-1}X^T\Phi^2X(X^T\Phi X)^{-1}\\
&=r^{-1}\sigma^2\left[(X^TX)^{-1}X^T\Phi^{-1}X(X^TX)^{-1}-(X^T\Phi X)^{-1}\right]\\
&\quad+(1-r^{-1})\sigma^2\left[(X^TX)^{-1}-(X^T\Phi X)^{-1}X^T\Phi^2X(X^T\Phi X)^{-1}\right]\\
&=r^{-1}\sigma^2\left[(X^TX)^{-1}X^T\Phi^{-1}X(X^TX)^{-1}-(X^T\Phi X)^{-1}\right]+O_p(n^{-1}),
\end{aligned}
\]
where the last equality follows from the facts that $(X^TX)^{-1}X^T\Phi^{-1}X(X^TX)^{-1}-(X^T\Phi X)^{-1}=O_p(1)$ and $(X^TX)^{-1}-(X^T\Phi X)^{-1}X^T\Phi^2X(X^T\Phi X)^{-1}=O_p(n^{-1})$. Both facts follow from the conditions $n^{-2}X^T\Phi^{-1}X=O_p(1)$ in C1 and $X^T\Phi X=O_p(1)$ in C3, respectively.
On the other hand, it is obvious that $(X^T\Phi X)^{-1}\le(X^TX)^{-1}X^T\Phi^{-1}X(X^TX)^{-1}$ by matrix operations, where the equality holds if and only if $\pi_i=\frac{1}{n}$ for $i=1,\cdots,n$. Thus, $V_0\ge V_0^u$ when $r=o(n)$, and the proof is completed.
References
Avron, H., P. Maymounkov, and S. Toledo (2010). Blendenpik: Supercharging LAPACK’s
least-squares solver. SIAM Journal on Scientific Computing 32, 1217–1236.
Bühlmann, P. and S. van de Geer (2011). Statistics for High-Dimensional Data. Springer,
New York.
Christensen, R. (1996). Plane Answers to Complex Questions: The Theory of Linear
Models. Springer, New York.
Clarkson, K. and D. Woodruff (2013). Low rank approximation and regression in input
sparsity time. In Proc. of the 45th STOC, pp. 81–90.
Cohen, M., Y. Lee, C. Musco, C. Musco, R. Peng, and A. Sidford (2014). Uniform sampling
for matrix approximation. arXiv:1408.5099.
Dhillon, P., Y. Lu, D. Foster, and L. Ungar (2013). New subsampling algorithms for
fast least squares regression. In C. Burges, L. Bottou, M. Welling, Z. Ghahramani,
and K. Weinberger (Eds.), Advances in Neural Information Processing Systems 26, pp.
360–368. Curran Associates, Inc.
Drineas, P., M. Magdon-Ismail, M. Mahoney, and D. Woodruff (2012). Fast approximation
of matrix coherence and statistical leverage. Journal of Machine Learning Research 13,
3475–3506.
Drineas, P., M. Mahoney, and S. Muthukrishnan (2006). Sampling algorithms for l2 regres-
sion and applications. In Proc. of the 17th Annual ACM-SIAM Symposium on Discrete
Algorithms, pp. 1127–1136.
Drineas, P., M. Mahoney, and S. Muthukrishnan (2008). Relative-error CUR matrix de-
composition. SIAM Journal on Matrix Analysis and Applications 30, 844–881.
Drineas, P., M. Mahoney, S. Muthukrishnan, and T. Sarlos (2011). Faster least squares
approximation. Numerische Mathematik 117, 219–249.
Golub, G. and C. Van Loan (1996). Matrix Computations. Johns Hopkins University Press,
Baltimore.
Graveley, B., A. Brooks, J. Carlson, M. Duff, et al. (2011). The developmental tran-
scriptome of Drosophila melanogaster. Nature 471, 473–479.
Hansen, M. and W. Hurwitz (1943). On the theory of sampling from a finite population.
Annals of Mathematical Statistics 14, 333–362.
Lehmann, E. and G. Casella (2003). Theory of Point Estimation, 4th ed. Springer, New
York.
Ma, P., M. Mahoney, and B. Yu (2014). A statistical perspective on algorithmic leveraging.
In Proc. of the 31th ICML Conference, pp. 91–99.
Ma, P., M. Mahoney, and B. Yu (2015). A statistical perspective on algorithmic leveraging.
Journal of Machine Learning Research 16, 861–911.
Mahoney, M. and P. Drineas (2009). CUR matrix decompositions for improved data anal-
ysis. Proceedings of the National Academy of Sciences 106, 697–702.
McWilliams, B., G. Krummenacher, M. Lucic, and J. M. Buhmann (2014, June). Fast and
Robust Least Squares Estimation in Corrupted Linear Models. ArXiv e-prints .
Meinshausen, N. and B. Yu (2009). Lasso-type recovery of sparse representations for high-
dimensional data. Annals of Statistics 37, 246–270.
Rokhlin, V. and M. Tygert (2008). A fast randomized algorithm for overdetermined linear
least-squares regression. Proceedings of the National Academy of Sciences of the United
States of America 105 (36), 13212–13217.
Sarndal, C., B. Swensson, and J. Wretman (2003). Model Assisted Survey Sampling.
Springer, New York.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B 58, 267–288.
van der Vaart, A. (1998). Asymptotic Statistics. Cambridge University Press, London.
Supplementary Material for "Optimal Subsampling Strategies for Large Sample Linear Regression"
1 The Relationship between Asymptotic Properties and Relative-error Approximations
Drineas et al. (2011) gave an $\epsilon$-dependent relative-error bound for $\tilde{\beta}$ based on BLEV. Unlike their results, we show asymptotic consistency and normality in Theorem 1. Here we investigate the relationship between the two in the following corollary.
Corollary 1. Under Condition C1, there exists an $\epsilon \in (0,1)$ such that
\[
P\left(\|\tilde{\beta}-\hat{\beta}_{ols}\| \le \sqrt{\epsilon}\,\|\hat{\beta}_{ols}\|\right) \to 1, \quad \text{as } r\to\infty. \tag{1}
\]
Moreover,
\[
P\left(\|y-X\tilde{\beta}\| \le (1+\epsilon)\,\|y-X\hat{\beta}_{ols}\|\right) \to 1, \quad \text{as } r\to\infty. \tag{2}
\]
Proof. From our Theorem 1, $\|\tilde{\beta}-\hat{\beta}_{ols}\| = O_p(r^{-1/2})$ and $\hat{\beta}_{ols} = O_p(n^{-1/2})$, so there exists an $\epsilon\in(0,1)$ such that inequality (1) holds.

On the other hand, it is easy to see that
\[
\|y-X\tilde{\beta}\|^2 \le \|y-X\hat{\beta}_{ols}\|^2 + \|X(\tilde{\beta}-\hat{\beta}_{ols})\|^2.
\]
Following our Theorem 1, $\tilde{\beta}-\hat{\beta}_{ols} = O_p(r^{-1/2})$, so $\|X(\tilde{\beta}-\hat{\beta}_{ols})\|^2 = O_p(r^{-1}n)$. Since, under the assumed linear model, $\frac{1}{n}\|y-X\hat{\beta}_{ols}\|^2 \ge c$ almost surely for some constant $c>0$, there exists an $\epsilon\in(0,1)$ such that inequality (2) is satisfied.
Corollary 1 is the asymptotic version of the relative-error approximation result (Theorem 1 of Drineas et al. (2011)). Similar to their result, the relative error $\epsilon$ in Corollary 1 can be made arbitrarily small provided $r$ is sufficiently large.
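To make Corollary 1 concrete, here is a minimal NumPy sketch. It is our own illustration, not the paper's code: it assumes synthetic Gaussian data, and it uses predictor-length probabilities $\pi_i \propto \|x_i\|$ as one valid choice of sampling distribution. The mean approximation error $\|\tilde{\beta}-\hat{\beta}_{ols}\|$ shrinks as the subsample size $r$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20000, 5
X = rng.standard_normal((n, p))
y = X @ np.ones(p) + rng.standard_normal(n)
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)  # full-sample OLS

def weighted_subsample_ols(X, y, r, pi, rng):
    """Draw r rows with replacement w.p. pi and solve the 1/pi-weighted LS."""
    idx = rng.choice(len(y), size=r, p=pi)
    w = np.sqrt(1.0 / pi[idx])            # rows reweighted by pi_i^{-1/2}
    beta, *_ = np.linalg.lstsq(X[idx] * w[:, None], y[idx] * w, rcond=None)
    return beta

# Predictor-length (PL) probabilities: pi_i proportional to ||x_i||.
pi = np.linalg.norm(X, axis=1)
pi /= pi.sum()

mean_err = {}
for r in (200, 2000, 20000):
    errs = [np.linalg.norm(weighted_subsample_ols(X, y, r, pi, rng) - beta_ols)
            for _ in range(50)]
    mean_err[r] = float(np.mean(errs))
    print(r, mean_err[r])
```

The printed mean errors should decrease as $r$ grows, consistent with the $O_p(r^{-1/2})$ rate underlying the corollary.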
arXiv:1509.05111v1 [stat.ME] 17 Sep 2015
2 Condition C1 for Revised Leverage Score
If we enlarge the leverage score from $h_{ii}$ to $h_{ii}^{1/(1+\delta)}$, then $\pi_i = h_{ii}^{1/(1+\delta)}\big/\sum_{i=1}^{n} h_{ii}^{1/(1+\delta)}$. Thus, we have
\[
M_1 = \frac{1}{n^2}\sum_{i=1}^{n} h_{ii}^{1/(1+\delta)} \sum_{i=1}^{n} \frac{\|x_i\|^2 x_i x_i^T}{(x_i^T M_X^{-1} x_i)^{1/(1+\delta)}} \le \frac{\lambda_m}{n}\sum_{i=1}^{n} h_{ii}^{1/(1+\delta)} \sum_{i=1}^{n} \|x_i\|^{2\delta/(1+\delta)} x_i x_i^T,
\]
\[
M_2 = \frac{1}{n^2}\sum_{i=1}^{n} h_{ii}^{1/(1+\delta)} \sum_{i=1}^{n} \frac{x_i x_i^T}{(x_i^T M_X^{-1} x_i)^{1/(1+\delta)}} \le \frac{\lambda_m}{n}\sum_{i=1}^{n} h_{ii}^{1/(1+\delta)} \sum_{i=1}^{n} \|x_i\|^{-2/(1+\delta)} x_i x_i^T,
\]
\[
M_3 = \frac{1}{n^{2+\delta}}\left(\sum_{i=1}^{n} h_{ii}^{1/(1+\delta)}\right)^{1+\delta} \sum_{i=1}^{n} \frac{\|x_i\|^{2+2\delta}}{x_i^T M_X^{-1} x_i} \le \lambda_m\left(\sum_{i=1}^{n} h_{ii}^{1/(1+\delta)}\right)^{1+\delta} \sum_{i=1}^{n} \|x_i\|^{2\delta},
\]
\[
M_4 = \frac{1}{n^{2+\delta}}\left(\sum_{i=1}^{n} h_{ii}^{1/(1+\delta)}\right)^{1+\delta} \sum_{i=1}^{n} \frac{\|x_i\|^{2+\delta}}{x_i^T M_X^{-1} x_i} \le \lambda_m\left(\sum_{i=1}^{n} h_{ii}^{1/(1+\delta)}\right)^{1+\delta} \frac{1}{n}\sum_{i=1}^{n} \|x_i\|^{\delta},
\]
where $\lambda_m$ is the largest eigenvalue of $\frac{1}{n}M_X$. Thus, the equations in C1 hold under the condition that $\{x_i\}$ are independent with bounded fourth moments.
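For concreteness, the revised probabilities $\pi_i \propto h_{ii}^{1/(1+\delta)}$ are cheap to form once the leverage scores are available. Below is a minimal sketch of our own (synthetic data; `shrunk_leverage_probs` is a helper name we introduce, not from the paper), computing exact leverage scores via a thin QR factorization:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 1000, 5
X = rng.standard_normal((n, p))

# Exact leverage scores via the thin QR factorization: h_ii = ||Q[i, :]||^2.
Q, _ = np.linalg.qr(X)
h = np.einsum('ij,ij->i', Q, Q)

def shrunk_leverage_probs(h, delta):
    """Sampling probabilities pi_i proportional to h_ii^{1/(1+delta)}."""
    s = h ** (1.0 / (1.0 + delta))
    return s / s.sum()

pi0 = shrunk_leverage_probs(h, 0.0)   # delta = 0: plain leverage-based sampling
pi1 = shrunk_leverage_probs(h, 1.0)   # delta > 0 pulls pi toward uniform
print(pi0.max(), pi1.max(), 1.0 / n)
```

Raising $h_{ii}$ to the power $1/(1+\delta) < 1$ flattens the distribution, so the largest sampling probability shrinks toward the uniform value $1/n$ as $\delta$ grows.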
3 Empirical Studies on Estimation of the True Coefficient
Besides the empirical studies on approximating $\hat{\beta}_{ols}$, we apply the five subsampling methods with different subsample sizes to each dataset. For each of the five $X$ matrices with sample size $n = 50{,}000$ and predictor dimension $p = 1000$, one thousand datasets $(X, y_b)$, $b = 1,\ldots,B$ with $B = 1000$, were generated from the model $y_b = X\beta + \epsilon_b$, where $\epsilon_b \sim N(0, \sigma^2 I_n)$ with $\sigma = 10$. We obtain $\tilde{\beta}_b$ from the $b$th dataset.
We calculate the empirical variance ($V_u$), squared bias ($SB_u$) and mean squared error ($MSE_u$) of $X\tilde{\beta}$ for estimating $X\beta$ as follows:
\[
V_u = \frac{1}{B}\sum_{b=1}^{B}\left\|X\big(\tilde{\beta}_b-\bar{\beta}\big)\right\|^2, \tag{3}
\]
\[
SB_u = \left\|\frac{1}{B}\sum_{b=1}^{B} X\big(\tilde{\beta}_b-\beta\big)\right\|^2, \tag{4}
\]
\[
MSE_u = \frac{1}{B}\sum_{b=1}^{B}\left\|X\big(\tilde{\beta}_b-\beta\big)\right\|^2, \tag{5}
\]
where $\bar{\beta} = \frac{1}{B}\sum_{b=1}^{B}\tilde{\beta}_b$.
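These three quantities satisfy the exact decomposition $MSE_u = V_u + SB_u$. A small NumPy sketch of our own (toy dimensions of our choosing; full-sample OLS stands in for the subsample estimator $\tilde{\beta}_b$) computes them from replicated fits:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, B = 500, 3, 200
X = rng.standard_normal((n, p))
beta = np.ones(p)
sigma = 2.0

# B replicated estimates; in the paper each beta_b would come from a
# subsampling method applied to dataset (X, y_b).
betas = np.empty((B, p))
for b in range(B):
    y_b = X @ beta + sigma * rng.standard_normal(n)
    betas[b], *_ = np.linalg.lstsq(X, y_b, rcond=None)

beta_bar = betas.mean(axis=0)
fitted = betas @ X.T                     # row b holds the fitted vector X beta_b

V_u = np.mean(np.sum((fitted - X @ beta_bar) ** 2, axis=1))   # variance
SB_u = np.sum((X @ (beta_bar - beta)) ** 2)                   # squared bias
MSE_u = np.mean(np.sum((fitted - X @ beta) ** 2, axis=1))     # mean squared error
print(V_u, SB_u, MSE_u)
```

The cross term vanishes because the deviations $X(\tilde{\beta}_b-\bar{\beta})$ average to zero over $b$, so the identity $MSE_u = V_u + SB_u$ holds exactly, not just in expectation.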
[Figure 1 appears here: a 3×3 grid of panels plotting log(variance) (top row), log(squared bias) (middle row), and log(mse) (bottom row) against subsample size for the TT, MG and LN data (columns), with curves for BLEV, UNIF, OPT and PL.]
Figure 1: Empirical variances, squared biases and mean-squared errors of $X\tilde{\beta}$ for estimating $X\beta$ for three datasets. From top to bottom: the upper panels show the logarithm of the variance, the middle panels the logarithm of the squared bias, and the lower panels the logarithm of the mean-squared error. From left to right: TT, MG and LN data, respectively.
We plot the results in Figure 1. From Figure 1, we first observe that the biases are negligible: the variances dominate the biases. Second, PL has the smallest MSEs, which is consistent with our results for $\tilde{\beta}$ in estimating $\beta$; interestingly, the performance of OPT is very close to that of PL.
4 Proofs of Lemmas 1 and 2
4.1 Proof of Lemma 1
Condition C1∗:
\[
\frac{1}{n^2}\sum_{i=1}^{n}\pi_i\left(\frac{z_i z_i^T}{\pi_i}-M_Z\right)^{2} = O_p(1), \tag{A.1}
\]
\[
\frac{1}{n^{2+\delta}}\sum_{i=1}^{n}\frac{\|e_i x_i\|^{2+\delta}}{\pi_i^{1+\delta}} = O_p(1) \quad \text{for } \delta>0, \tag{A.2}
\]
where $M_Z = \sum_{i=1}^{n} z_i z_i^T$ and $e_i = y_i - x_i^T\hat{\beta}_{ols}$.
Lemma 1. If Condition C1∗ holds, then given $\mathcal{F}_n$, we have
\[
V^{-1/2}(\tilde{\beta}-\hat{\beta}_{ols}) \xrightarrow{\;L\;} N(0, I) \quad \text{as } r\to\infty, \tag{A.3}
\]
where $V = M_X^{-1} V_c M_X^{-1}$ and $V_c = r^{-1}\sum_{i=1}^{n}\frac{e_i^2}{\pi_i}x_i x_i^T$.
Moreover,
\[
V = O(r^{-1}). \tag{A.4}
\]
In addition, we have
\[
E(\tilde{\beta}-\hat{\beta}_{ols}\mid\mathcal{F}_n) = O_p(r^{-1}). \tag{A.5}
\]
Proof. The difference between $\tilde{\beta}$ and $\hat{\beta}_{ols}$ is
\[
\begin{aligned}
\tilde{\beta}-\hat{\beta}_{ols} &= (X^{*T}\Phi^{*-1}X^{*})^{-1}X^{*T}\Phi^{*-1}y^{*}-\hat{\beta}_{ols}\\
&= (X^{*T}\Phi^{*-1}X^{*})^{-1}X^{*T}\Phi^{*-1}(y^{*}-X^{*}\hat{\beta}_{ols})\\
&= \hat{M}_X^{-1}(r^{-1}X^{*T}\Phi^{*-1}e^{*})\\
&= M_X^{-1}(r^{-1}X^{*T}\Phi^{*-1}e^{*}) + (\hat{M}_X^{-1}-M_X^{-1})(r^{-1}X^{*T}\Phi^{*-1}e^{*}),
\end{aligned} \tag{A.6}
\]
where $M_X = X^{T}X$, $\hat{M}_X = r^{-1}X^{*T}\Phi^{*-1}X^{*}$, and $e^{*} = y^{*}-X^{*}\hat{\beta}_{ols}$.

Step 1. If we show
\[
n^{-1}(\hat{M}_X-M_X) = O_p(r^{-1/2}), \qquad n^{-1}r^{-1}X^{*T}\Phi^{*-1}e^{*} = O_p(r^{-1/2}), \tag{A.7}
\]
then it follows that
\[
\begin{aligned}
(\hat{M}_X^{-1}-M_X^{-1})(r^{-1}X^{*T}\Phi^{*-1}e^{*}) &= -\hat{M}_X^{-1}(\hat{M}_X-M_X)M_X^{-1}(r^{-1}X^{*T}\Phi^{*-1}e^{*})\\
&= -(n^{-1}\hat{M}_X)^{-1}\,n^{-1}(\hat{M}_X-M_X)\,(n^{-1}M_X)^{-1}(n^{-1}r^{-1}X^{*T}\Phi^{*-1}e^{*})\\
&= O_p(r^{-1}).
\end{aligned} \tag{A.8}
\]
Combining equations (A.8) and (A.6), we get
\[
\tilde{\beta}-\hat{\beta}_{ols} = M_X^{-1}(r^{-1}X^{*T}\Phi^{*-1}e^{*}) + O_p(r^{-1}). \tag{A.9}
\]
So we shall prove (A.7). Note that, given $\mathcal{F}_n$, $\hat{M}_X$ and $r^{-1}X^{*T}\Phi^{*-1}e^{*}$ are Hansen–Hurwitz estimators (Hansen and Hurwitz, 1943); we thus have
\[
E(\hat{M}_X) = M_X, \qquad E(r^{-1}X^{*T}\Phi^{*-1}e^{*}) = X^{T}(y-X\hat{\beta}_{ols}) = 0.
\]
By C1∗, we have the following equations:
\[
\frac{1}{n^2}\sum_{i=1}^{n}\pi_i\left(\frac{x_i x_i^T}{\pi_i}-M_X\right)\left(\frac{x_i x_i^T}{\pi_i}-M_X\right) = O_p(1),
\]
\[
\frac{1}{n^2}\sum_{i=1}^{n}\pi_i\left(\frac{x_i x_i^T}{\pi_i}-M_X\right)\left(\frac{x_i y_i}{\pi_i}-X^{T}y\right) = O_p(1),
\]
\[
\frac{1}{n^2}\sum_{i=1}^{n}\pi_i\left(\frac{y_i x_i^T}{\pi_i}-y^{T}X\right)\left(\frac{x_i y_i}{\pi_i}-X^{T}y\right) = O_p(1),
\]
which imply, for any $p$-dimensional vector $l$ with finite elements,
\[
\frac{1}{n^2}\sum_{i=1}^{n}\pi_i\left(\frac{x_i x_i^T}{\pi_i}-M_X\right)ll^{T}\left(\frac{x_i x_i^T}{\pi_i}-M_X\right) = O_p(1),
\]
\[
\frac{1}{n^2}\sum_{i=1}^{n}\pi_i\left(\frac{x_i x_i^T}{\pi_i}-M_X\right)ll^{T}\left(\frac{x_i y_i}{\pi_i}-X^{T}y\right) = O_p(1),
\]
\[
\frac{1}{n^2}\sum_{i=1}^{n}\pi_i\left(\frac{y_i x_i^T}{\pi_i}-y^{T}X\right)ll^{T}\left(\frac{x_i y_i}{\pi_i}-X^{T}y\right) = O_p(1).
\]
Hence,
\[
\frac{1}{n^2}\sum_{i=1}^{n}\frac{x_i x_i^T ll^{T} x_i x_i^T}{\pi_i} = \frac{1}{n^2}(X^{T}X)ll^{T}(X^{T}X) + O_p(1), \tag{A.10}
\]
\[
\frac{1}{n^2}\sum_{i=1}^{n}\frac{x_i x_i^T ll^{T} x_i y_i}{\pi_i} = \frac{1}{n^2}(X^{T}X)ll^{T}(X^{T}y) + O_p(1), \tag{A.11}
\]
\[
\frac{1}{n^2}\sum_{i=1}^{n}\frac{y_i x_i^T ll^{T} x_i y_i}{\pi_i} = \frac{1}{n^2}(y^{T}X)ll^{T}(X^{T}y) + O_p(1). \tag{A.12}
\]
Therefore, given $\mathcal{F}_n$, we have
\[
\mathrm{Var}(n^{-1}\hat{M}_X l) = r^{-1}\frac{1}{n^2}\sum_{i=1}^{n}\pi_i\left(\frac{x_i x_i^T}{\pi_i}-M_X\right)ll^{T}\left(\frac{x_i x_i^T}{\pi_i}-M_X\right) = O_p(r^{-1}), \tag{A.13}
\]
\[
\begin{aligned}
\mathrm{Var}(r^{-1}n^{-1}l^{T}X^{*T}\Phi^{*-1}e^{*}) &= r^{-1}\frac{1}{n^2}\sum_{i=1}^{n}\pi_i\left(\frac{e_i x_i^T l}{\pi_i}\right)^{2}\\
&= r^{-1}\frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{\pi_i}\left[y_i^2 x_i^T ll^{T} x_i + \hat{\beta}_{ols}^{T}x_i x_i^T ll^{T} x_i x_i^T\hat{\beta}_{ols} - 2y_i x_i^T ll^{T} x_i x_i^T\hat{\beta}_{ols}\right]\\
&= r^{-1}\frac{1}{n^2}\left[\sum_{i=1}^{n}\frac{1}{\pi_i}y_i^2 x_i^T ll^{T} x_i + \hat{\beta}_{ols}^{T}\left(\sum_{i=1}^{n}\frac{1}{\pi_i}x_i x_i^T ll^{T} x_i x_i^T\right)\hat{\beta}_{ols} - 2\left(\sum_{i=1}^{n}\frac{1}{\pi_i}y_i x_i^T ll^{T} x_i x_i^T\right)\hat{\beta}_{ols}\right]\\
&= r^{-1}\left[\frac{1}{n^2}y^{T}Xll^{T}X^{T}y + \frac{1}{n^2}\hat{\beta}_{ols}^{T}(X^{T}X)ll^{T}(X^{T}X)\hat{\beta}_{ols} - \frac{2}{n^2}y^{T}Xll^{T}(X^{T}X)\hat{\beta}_{ols} + O_p(1)\right]\\
&= r^{-1}\left[\frac{1}{n^2}\left(l^{T}[X^{T}(y-X\hat{\beta}_{ols})]\right)^{2} + O_p(1)\right]\\
&= O_p(r^{-1}),
\end{aligned} \tag{A.14}
\]
where $e_i = y_i - x_i^T\hat{\beta}_{ols}$; the fourth equality uses (A.10), (A.11) and (A.12), the last equality uses the fact that $X^{T}(y-X\hat{\beta}_{ols}) = 0$, and the others are direct. Thus we have (A.7).
Step 2. Next we show that, given $\mathcal{F}_n$, $[\mathrm{Var}(r^{-1}n^{-1}X^{*T}\Phi^{*-1}e^{*})]^{-1/2}(r^{-1}n^{-1}X^{*T}\Phi^{*-1}e^{*})$ converges to a normal distribution.

We construct random vectors $\eta_i$ taking values in $\left\{\frac{e_1 x_1}{n\pi_1},\cdots,\frac{e_n x_n}{n\pi_n}\right\}$ with
\[
\Pr\left[\eta_i = \frac{e_j x_j}{n\pi_j}\right] = \pi_j, \quad j = 1,\ldots,n.
\]
Because the subsampling is with replacement, given $\mathcal{F}_n$ the sequence $\{\eta_1,\cdots,\eta_r\}$ is independent and identically distributed. Then it is easy to see that
\[
n^{-1}X^{*T}\Phi^{*-1}e^{*} = \sum_{i=1}^{r}\eta_i,
\]
and
\[
E(\eta_i\mid\mathcal{F}_n) = 0, \qquad \mathrm{Var}(\eta_i\mid\mathcal{F}_n) = \frac{1}{n^2}\sum_{i=1}^{n}\frac{e_i^2 x_i x_i^T}{\pi_i} = O_p(1). \tag{A.15}
\]
In fact, $\{\eta_1,\cdots,\eta_r\}$ is a double array whose distribution depends on $n$; to prove asymptotic normality we thus need to verify the Lindeberg–Feller condition, i.e., for every $\varepsilon>0$,
\[
\begin{aligned}
\sum_{i=1}^{r}E\left[\|r^{-1/2}\eta_i\|^{2}\,1\{\|r^{-1/2}\eta_i\|>\varepsilon\}\right] &= \sum_{i=1}^{n}\frac{1}{\pi_i}\left\|\frac{e_i x_i}{n}\right\|^{2}1\left\{\left\|\frac{e_i x_i}{n}\right\|>r^{1/2}\pi_i\varepsilon\right\}\\
&\le (\varepsilon r^{1/2})^{-\delta}\sum_{i=1}^{n}\frac{1}{\pi_i}\left\|\frac{e_i x_i}{n}\right\|^{2}\left\|\frac{e_i x_i}{n\pi_i}\right\|^{\delta}1\left\{\left\|\frac{e_i x_i}{n}\right\|>r^{1/2}\pi_i\varepsilon\right\}\\
&\le (\varepsilon r^{1/2})^{-\delta}\sum_{i=1}^{n}\frac{1}{\pi_i^{1+\delta}}\left\|\frac{e_i x_i}{n}\right\|^{2+\delta} = o_p(1),
\end{aligned} \tag{A.16}
\]
where $\delta>0$; the first inequality follows from the constraint in the indicator $1\{\|e_i x_i/n\|>r^{1/2}\pi_i\varepsilon\}$, the second inequality from the fact that $1\{\|e_i x_i/n\|>r^{1/2}\pi_i\varepsilon\}\le 1$, and the last equality from (A.2) in Condition C1∗.
Combining (A.15) and (A.16), given $\mathcal{F}_n$, the Lindeberg–Feller central limit theorem (CLT) (van der Vaart (1998), 2.27 in Section 2.8, page 20) gives
\[
\left[\frac{1}{n^2}\sum_{i=1}^{n}\frac{e_i^2 x_i x_i^T}{\pi_i}\right]^{-1/2}\left(r^{-1/2}\sum_{i=1}^{r}\eta_i\right) \xrightarrow{\;L\;} N(0, I). \tag{A.17}
\]
Moreover,
\[
\mathrm{Var}(r^{-1}n^{-1}X^{*T}\Phi^{*-1}e^{*}\mid\mathcal{F}_n) = r^{-1}\frac{1}{n^2}\sum_{i=1}^{n}\frac{e_i^2 x_i x_i^T}{\pi_i}. \tag{A.18}
\]
So, following (A.17) and (A.18), given $\mathcal{F}_n$, $[\mathrm{Var}(r^{-1}n^{-1}X^{*T}\Phi^{*-1}e^{*})]^{-1/2}(r^{-1}n^{-1}X^{*T}\Phi^{*-1}e^{*})$ converges to a normal distribution. Thus, given $\mathcal{F}_n$, we get
\[
\left\{\mathrm{Var}\left[M_X^{-1}(r^{-1}X^{*T}\Phi^{*-1}e^{*})\right]\right\}^{-1/2}M_X^{-1}(r^{-1}X^{*T}\Phi^{*-1}e^{*}) \xrightarrow{\;L\;} N(0, I). \tag{A.19}
\]
Combining (A.9), (A.18) and (A.19), we obtain (A.3) of Lemma 1.
Since
\[
V = M_X^{-1}V_c M_X^{-1} = \left(\sum_{i=1}^{n}x_i x_i^T\right)^{-1}\left(r^{-1}\sum_{i=1}^{n}\frac{e_i^2}{\pi_i}x_i x_i^T\right)\left(\sum_{i=1}^{n}x_i x_i^T\right)^{-1} = O(1/r), \tag{A.20}
\]
we have (A.4) of Lemma 1.
Next we shall prove (A.5) of Lemma 1. From (A.6) and (A.17), we have
\[
E(\tilde{\beta}-\hat{\beta}_{ols}\mid\mathcal{F}_n) = E\left[(\hat{M}_X^{-1}-M_X^{-1})(r^{-1}X^{*T}\Phi^{*-1}e^{*})\mid\mathcal{F}_n\right] =: B. \tag{A.21}
\]
Then, for some constant $c$, if $r$ is large enough,
\[
\begin{aligned}
\|B\|_2 &\le E\left[\|(\hat{M}_X^{-1}-M_X^{-1})(r^{-1}X^{*T}\Phi^{*-1}e^{*})\|_2\mid\mathcal{F}_n\right]\\
&\le E\left[\|n(\hat{M}_X^{-1}-M_X^{-1})\|_F\,\|n^{-1}(r^{-1}X^{*T}\Phi^{*-1}e^{*})\|_2\mid\mathcal{F}_n\right]\\
&= E\left[\|(n^{-1}\hat{M}_X)^{-1}n^{-1}(\hat{M}_X-M_X)(n^{-1}M_X)^{-1}\|_F\,\|n^{-1}(r^{-1}X^{*T}\Phi^{*-1}e^{*})\|_2\mid\mathcal{F}_n\right]\\
&\le E\left[\|(n^{-1}\hat{M}_X)^{-1}\|_F\,\|n^{-1}(\hat{M}_X-M_X)\|_F\,\|(n^{-1}M_X)^{-1}\|_F\,\|n^{-1}(r^{-1}X^{*T}\Phi^{*-1}e^{*})\|_2\mid\mathcal{F}_n\right]\\
&\le \left\{E\left[\|(n^{-1}\hat{M}_X)^{-1}\|_F^2\mid\mathcal{F}_n\right]E\left[n^{-2}\|\hat{M}_X-M_X\|_F^2\mid\mathcal{F}_n\right]E\left[\|(n^{-1}M_X)^{-1}\|_F^2\mid\mathcal{F}_n\right]\right\}^{1/2}\\
&\qquad\cdot\left\{E\left[\|n^{-1}(r^{-1}X^{*T}\Phi^{*-1}e^{*})\|_2^2\mid\mathcal{F}_n\right]\right\}^{1/2}\\
&\le c\left\{E\left[n^{-2}\|\hat{M}_X-M_X\|_F^2\mid\mathcal{F}_n\right]\right\}^{1/2}\left\{E\left[\|n^{-1}(r^{-1}X^{*T}\Phi^{*-1}e^{*})\|_2^2\mid\mathcal{F}_n\right]\right\}^{1/2},
\end{aligned} \tag{A.22}
\]
where the first inequality follows from Jensen's inequality; the second from the inequality relating the Frobenius matrix norm and the vector norm; the equality from the fact that $\hat{M}_X^{-1}-M_X^{-1} = -\hat{M}_X^{-1}(\hat{M}_X-M_X)M_X^{-1}$; the third inequality from a matrix norm inequality; the fourth inequality from Hölder's inequality; and the last inequality from the first result in (A.7) and the fact that the Frobenius norm of the inverse of a positive definite matrix can be bounded by some constant.

Following (A.13) and (A.14), we have
\[
E\left[n^{-2}\|\hat{M}_X-M_X\|_F^2\mid\mathcal{F}_n\right] = O_p(1/r) \quad \text{and} \quad E\left[\|n^{-1}(r^{-1}X^{*T}\Phi^{*-1}e^{*})\|_2^2\mid\mathcal{F}_n\right] = O_p(1/r).
\]
Combining the two formulas above with (A.22), we get
\[
\|B\|_2 = O_p(r^{-1}). \tag{A.23}
\]
We have thus proved that $E(\tilde{\beta}-\hat{\beta}_{ols}\mid\mathcal{F}_n) = O_p(r^{-1})$.
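The variance formula in Lemma 1 can be checked by simulation. The sketch below is our own illustration (synthetic data, and uniform $\pi_i$ for simplicity, which is one valid choice): it compares the empirical variance of $\tilde{\beta}-\hat{\beta}_{ols}$ over repeated subsamples with the diagonal of the plug-in $V = M_X^{-1}V_c M_X^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, r, reps = 5000, 3, 1000, 400
X = rng.standard_normal((n, p))
y = X @ np.ones(p) + rng.standard_normal(n)
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta_ols                      # full-sample residuals

pi = np.full(n, 1.0 / n)                  # uniform sampling probabilities
MX_inv = np.linalg.inv(X.T @ X)
Vc = (X.T * (e ** 2 / pi)) @ X / r        # r^{-1} sum_i (e_i^2/pi_i) x_i x_i^T
V = MX_inv @ Vc @ MX_inv                  # plug-in variance from Lemma 1

diffs = np.empty((reps, p))
for b in range(reps):
    idx = rng.choice(n, size=r, p=pi)
    w = np.sqrt(1.0 / pi[idx])            # 1/pi weighting of the subsample LS
    bt, *_ = np.linalg.lstsq(X[idx] * w[:, None], y[idx] * w, rcond=None)
    diffs[b] = bt - beta_ols

ratios = np.var(diffs, axis=0) / np.diag(V)
print(ratios)                             # entries should be near 1
```

With $r$ and the number of replicates both moderately large, each ratio of empirical to plug-in variance should be close to one, as (A.3) and (A.4) suggest.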
Condition C3∗:
\[
\sum_{i=1}^{n}\pi_i\left(z_i z_i^T - M_Z^{u}\right)^{2} = O_p(1), \tag{A.24}
\]
\[
\sum_{i=1}^{n}\pi_i^{1+\delta}\|e_i^{u}x_i\|^{2+\delta} = O_p(1), \tag{A.25}
\]
where $M_Z^{u} = \sum_{i=1}^{n}\pi_i z_i z_i^T$, and $e_i^{u} = y_i - x_i^T\hat{\beta}_{wls}$ with $\hat{\beta}_{wls} = (X^{T}\Phi X)^{-1}X^{T}\Phi y$.
4.2 Proof of Lemma 2

Lemma 2. If Condition C3∗ holds, then given $\mathcal{F}_n$,
\[
V_u^{-1/2}(\tilde{\beta}_u-\hat{\beta}_{wls}) \xrightarrow{\;L\;} N(0, I), \quad \text{as } r\to\infty, \tag{A.26}
\]
where $V_u = (M_X^{u})^{-1}V_c^{u}(M_X^{u})^{-1}$, $M_X^{u} = \sum_{i=1}^{n}\pi_i x_i x_i^T$, and $V_c^{u} = r^{-1}\sum_{i=1}^{n}\pi_i(e_i^{u})^{2}x_i x_i^T$.
In addition, we have
\[
E(\tilde{\beta}_u-\hat{\beta}_{ols}\mid\mathcal{F}_n) = \hat{\beta}_{wls}-\hat{\beta}_{ols} + O_p(r^{-1}). \tag{A.27}
\]
Proof. As $\tilde{\beta}_u = (X^{*T}X^{*})^{-1}X^{*T}y^{*}$, the difference between $\tilde{\beta}_u$ and $\hat{\beta}_{wls}$ is
\[
\tilde{\beta}_u-\hat{\beta}_{wls} = (X^{*T}X^{*})^{-1}X^{*T}(y^{*}-X^{*}\hat{\beta}_{wls}) = (\hat{M}_X^{u})^{-1}(r^{-1}X^{*T}e_u^{*}), \tag{A.28}
\]
where $\hat{M}_X^{u} = r^{-1}X^{*T}X^{*}$ and $e_u^{*} = y^{*}-X^{*}\hat{\beta}_{wls}$.

Step 1. Analogous to the proof of Lemma 1, if we show
\[
\hat{M}_X^{u}-M_X^{u} = O_p(r^{-1/2}), \qquad r^{-1}X^{*T}e_u^{*} = O_p(r^{-1/2}), \tag{A.29}
\]
then it follows that
\[
\tilde{\beta}_u-\hat{\beta}_{wls} = (M_X^{u})^{-1}(r^{-1}X^{*T}e_u^{*}) + O_p(r^{-1}). \tag{A.30}
\]
By the definition of $\hat{\beta}_{wls}$, $X^{T}\Phi y-(X^{T}\Phi X)\hat{\beta}_{wls} = 0$, so one can easily verify the following equalities:
\[
E(\hat{M}_X^{u}\mid\mathcal{F}_n) = M_X^{u}, \qquad E(r^{-1}X^{*T}e_u^{*}\mid\mathcal{F}_n) = 0.
\]
Analogous to the analysis of (A.13) and (A.14) in the proof of Lemma 1, from C3∗ we also have, given $\mathcal{F}_n$,
\[
\mathrm{Var}(\hat{M}_X^{u}) = r^{-1}\sum_{i=1}^{n}\pi_i\left(x_i x_i^T-M_X^{u}\right)^{2} = O_p(r^{-1}),
\]
\[
\mathrm{Var}(r^{-1}X^{*T}e_u^{*}) = r^{-1}\sum_{i=1}^{n}\pi_i(e_i^{u})^{2}x_i x_i^T = O_p(r^{-1}). \tag{A.31}
\]
Thus equation (A.29) is shown.

Step 2. Analogous to the proof of Lemma 1, from C3∗ and the CLT, given $\mathcal{F}_n$, we have
\[
\left\{\mathrm{Var}\left[(M_X^{u})^{-1}(r^{-1}X^{*T}e_u^{*})\right]\right\}^{-1/2}(M_X^{u})^{-1}(r^{-1}X^{*T}e_u^{*}) \xrightarrow{\;L\;} N(0, I). \tag{A.32}
\]
Thus, combining (A.30), (A.31) and (A.32), the first result (A.26) of Lemma 2 is proved. We now prove (A.27). As $\tilde{\beta}_u-\hat{\beta}_{ols} = \tilde{\beta}_u-\hat{\beta}_{wls}+\hat{\beta}_{wls}-\hat{\beta}_{ols}$, we have
\[
E(\tilde{\beta}_u-\hat{\beta}_{ols}\mid\mathcal{F}_n) = E(\tilde{\beta}_u-\hat{\beta}_{wls}\mid\mathcal{F}_n) + \hat{\beta}_{wls}-\hat{\beta}_{ols}. \tag{A.33}
\]
Similarly to (A.22), we have $\|E(\tilde{\beta}_u-\hat{\beta}_{wls}\mid\mathcal{F}_n)\|_2 = O_p(r^{-1})$. Thus (A.27) of Lemma 2 is proved.
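The content of (A.27) — that the unweighted subsample estimator centers on $\hat{\beta}_{wls}$ rather than $\hat{\beta}_{ols}$ — can be seen numerically. The sketch below is our own toy construction (a deliberately nonlinear mean and $\pi_i$ tilted by the first predictor, chosen only so that $\hat{\beta}_{wls}$ and $\hat{\beta}_{ols}$ differ visibly); the average of $\tilde{\beta}_u$ over many subsamples lands near $\hat{\beta}_{wls}$:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, r, reps = 5000, 3, 1000, 200
X = rng.standard_normal((n, p))
# Nonlinear mean so the weighted and unweighted projections differ.
y = X @ np.ones(p) + 0.5 * X[:, 0] ** 2 + rng.standard_normal(n)

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Probabilities tilted by the first predictor (shifted to stay positive).
pi = X[:, 0] - X[:, 0].min() + 0.1
pi /= pi.sum()

# beta_wls = (X' Phi X)^{-1} X' Phi y with Phi = diag(pi).
beta_wls = np.linalg.solve(X.T @ (pi[:, None] * X), X.T @ (pi * y))

avg = np.zeros(p)
for _ in range(reps):
    idx = rng.choice(n, size=r, p=pi)
    bu, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)  # unweighted OLS
    avg += bu
avg /= reps

print(np.linalg.norm(avg - beta_wls), np.linalg.norm(avg - beta_ols))
```

The averaged estimator sits much closer to $\hat{\beta}_{wls}$ than to $\hat{\beta}_{ols}$, illustrating why the unweighted algorithm is not consistent for the full-sample OLS estimator while still being useful for estimating the coefficients.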
References
Drineas, P., M. Mahoney, S. Muthukrishnan, and T. Sarlos (2011). Faster least squares approxi-
mation. Numerische Mathematik 117, 219–249.
Hansen, M. and W. Hurwitz (1943). On the theory of sampling from a finite population. Annals
of Mathematical Statistics 14, 333–362.
van der Vaart, A. (1998). Asymptotic Statistics. Cambridge University Press, London.