1Massive MIMO as a Big Data System: RandomMatrix Models and Testbed
Changchun Zhang, Student Member, IEEE, Robert C. Qiu, Fellow, IEEE
AbstractThe paper has two parts. The first one deals withhow to use large random matrices as building blocks to modelthe massive data arising from the massive (or large-scale) MIMOsystem. As a result, we apply this model for distributed spectrumsensing and network monitoring. The part boils down to thestreaming, distributed massive data, for which a new algorithmis obtained and its performance is derived using the central limittheorem that is recently obtained in the literature. The secondpart deals with the large-scale testbed using software-definedradios (particularly USRP) that takes us more than four years todevelop this 70-node network testbed. To demonstrate the powerof the software defined radio, we reconfigure our testbed quicklyinto a testbed for massive MIMO. The massive data of this testbedis of central interest in this paper. It is for the first time for usto model the experimental data arising from this testbed. To ourbest knowledge, we are not aware of other similar work.
Index TermsMassive MIMO, 5G Network, Random Matrix,Testbed, Big Data.
Massive or large-scale multiple-input, multiple output(MIMO), one of the disruptive technologies of the next gener-ation (5G) communications system, promises significant gainsin wireless data rates and link reliability  . In this paper,we deal with the massive data aspects of the massive MIMOsystem. In this paper, we use two terms (massive data and bigdata) interchangeably, following the practice from NationalResearch Council .
The benefits from massive MIMO are not only limited tothe higher data rates. Massive MIMO techniques makes greencommunications possible. By using large numbers of antennasat the base station, massive MIMO helps to focus the radi-ated energy toward the intended direction while minimizingthe intra and intercell interference. The energy efficiency isincreased dramatically as the energy can be focused withextreme sharpness into small regions in space . It is shownin  that, when the number of base station (BS) antennas Mgrows without bound, we can reduce the transmitted power ofeach user proportionally to 1/M if the BS has perfect channelstate information (CSI), and proportionally to 1
Mif CSI is
estimated from uplink pilots. Reducing the transmit power ofthe mobile users can drain the batteries slower. Reducing theRF power of downlink can cut the electricity consumption ofthe base station.
Massive MIMO also brings benefits including inexpensivelow-power components, reduced latency, simplification of
Robert C. Qiu and Changchun Zhang are with the Department ofElectrical and Computer Engineering, Center for Manufacturing Re-search, Tennessee Technological University, Cookeville, TN, 38505, e-mail:firstname.lastname@example.org; email@example.com.
MAC layer, etc . Simpler network design could bring lowercomplexity computing which save more energy of the networkto make the communications green.
Currently, most of the research of massive MIMO is focusedon the communications capabilities. In this paper, we promotean insight that, very naturally, the massive MIMO system canbe regarded as a big data system. Massive waveform datacoming in a streaming mannercan be stored and processedat the base station with a large number of antennas, whilenot impacting the communication capability. Especially, therandom matrix theory can be well mapped to the architectureof large array of antennas. The random matrix theory datamodel has ever been validated by  in a context of distributedsensing. In this paper, we extend this work to the massiveMIMO testbed. In particular, we studied the function of mul-tiple non-Hermitian random matrices and applied the variantsto the experimental data collected on the massive MIMOtestbed. The product of non-Hermitian random matrices showsencouraging potential in signal detection, that is motivated forspectrum sensing and network monitoring. We also presenttwo concrete applications that are demonstrated on our testbedusing the massive MIMO system as big data system. From thetwo applications, we foresee that, besides signal detection, therandom-matrix based big data analysis will drive more mobileapplications in the next generation wireless network.
II. MODELING FOR MASSIVE DATA
Large random matrices are used models for the massive dataarising from the monitoring of the massive MIMO system. Wegive some tutorial remarks, to facilitate the understanding ofthe experimental results.
A. Data Modeling with Large Random Matrices
Naturally, we assume n observations of p-dimensional ran-dom vectors x1, ...,xn Cp1. We form the data matrixX = (x1, ...,xn) Cpn, which naturally, is a random matrixdue to the presence of ubiquitous noise. In our context, we areinterested in the practical regime p = 100 1, 000, while nis assumed to be arbitrary. The possibility of arbitrary samplesize n makes the classical statistical tools infeasible. We areasked to consider the asymptotic regime 
p, n, p/n c (0,) , (1)while the classical regime  considers
p fixed, n, p/n 0. (2)
2Our goal is to reduce massive data to a few statistical pa-rameters. The first step often involves the covariance matrixestimation using the sample covariance estimator
xixHi Cpp, (3)
that is a sum of rank-one random matrices . The samplecovariance matrix estimator is the maximum likelihood estima-tor (so it is optimal) for the classical regime (2). However, forthe asymptotic regime (1), this estimator is far from optimal.We still use this estimator due to its special structure. See  for modern alternatives to this fundamental algorithm. Forbrevity, we use the sample covariance estimator throughoutthis paper.
B. Non-Hermitian Free Probability Theory
Once data are modeled as large random matrices, it isnatural for us to introduce the non-Hermitian random matrixtheory into our problem at hand. Qius book  gives anexhaustive account of this subject in an independent chapter,from a mathematical view. This paper is complementary toour book  in that we bridge the gap between theoryand experiments. We want to understand how accurate thistheoretical model becomes for the real-life data. See Section Vfor details.
Roughly speaking, large random matrices can be treated asfree matrix-valued random variables. Free random variablescan be understood as independent random variables. Thematrix size must be so large that the asymptotic theoreticalresults are valid. It is of central interest to understand thisfinite-size scaling in this paper.
III. DISTRIBUTED SPECTRUM SENSING
Now we are convinced that large random matrices are validfor experimental data modeling. The next natural questionis to test whether the signal or the noise is present in thedata. Both networking monitoring and spectrum sensing can beformulated as a matrix hypothesis testing problem for anomalydetection.
A. Related Work
Specifically, consider the n samples y1, ...,yn, drawn froma p-dimensional complex Gaussian distribution with covari-ance matrix . We aim to test the hypothesis:
H0 : = Ip.This test has been studied extensively in classical settings (i.e.,p fixed, n ), first in detail in . Denoting the samplecovariance by Sn = 1n
yiyHi , the LRT is based on the
linear statistic (see Anderson (2003) [11, Chapter 10])
L = Tr (Sn) ln (det Sn) p. (4)Under H0, with p fixed, as n , nL is well known
to follow a 2 distribution. However, with high-dimensionaldata for which the dimension p is large and comparable to
the sample size n, the 2 approximation is no longer valid.A correction to the LRT is done in Bai, Jiang, Yao andZheng (2009)  on large-dimensional covariance matrix byrandom matrix theory. In this case, a better approach is to useresults based on the double-asymptotic given by Assumption1. Such a study has been done first under H0 and later underthe spike alternative H1. More specifically, under H0, thiswas presented in  using a CLT framework establishedin Bai and Silverstein (2004) . Under H1 : has aspiked covariance structure as in Model A, this problem wasaddressed only very recently in the independent works, and . We point out that  (see also ) considereda generalized problem which allowed for multiple spikedeigenvalues. The result in  was again based on the CLTframework of Bai and Silverstein (2004) , with theirderivation requiring the calculation of contour integrals. Thesame result was presented in , in this case making use ofsophisticated tools of contiguity and Le Cams first and thirdlemmas .
B. Spiked Central Wishart Matrix
Our problem is formulated as
H0 : = IpH1 : Model A: Spiked central Wishart.
Model A: Spiked central Wishart: Matrices with distributionCWp (n,,0pp) (n > p) , where has multiple distinctspike eigenvalues 1 + 1 > > 1 + r, with r > 0for all 1 k r, and all other eigenvalues equal to 1.
Assumption 1. n, p such that n/p c > 1.Theorem III.1 (Passemier, McKay and Chen (2014) ).Consider Model A and define
a =(1c)2, b = (1 +c)2. (6)
Under Assumption 1, for an analytic function f : U Cwhere U is an open subset of the complex plane which contains[a, b], we have
) p L N
(b x) (x a)
f (x)(b x) (x a)