Investment in “Real Time” and “High Definition”: A Big

Investment in “Real Time” and“High Definition”: A Big Data Approach

November 2020

2020 Conference on Nontraditional Data & Statistical LearningBanca d’ Italia and Federal Reserve Board

Ali.B. Barlas, Seda Güler, Alvaro Ortiz & Tomasa Rodrigo

2Investment in ”Real Time” & ”High Definition”: A Big Data approach

The Covid-19 crisis has reinforced the potential of Big Data tools forEconomic Analysis and Policymaking

The high uncertainty triggered by the Covid-19 crisis has stressed the need to monitor the evolution ofthe economy in “real time”. These efforts have been materialized in several ways:

- Focusing on timely, alternative indicators: soft data surveys (particularly the PurchasingManager Indexes, PMIs) and other high frequency indicators like electricity production or chain

store sales released on daily or weekly basis.

- Developing higher frequency models: Some CBs have relied on this High Frequency indicatorsto develop weekly activity tracker models such as the FED´WEI (Lewis, 2020) and the

Bundesbank WAI (Eraslan, S. and T. Götz, 2020).

- Developing New Big Data Indicators*: Focusing on daily aggregate information of bankingtransactions to track consumption, employment , turnover, mobility (link to our BigData Project).

* Some of the Recent literature on Big Data analysis Andersen, Hansen, Johannesen, & Sheridan (2020a), Andersen, Hansen, Johannesen, & Sheridan (2020b), Alexander & Karger (2020), Baker, Farrokhnia, Meyer, Pagel, & Yannelis (2020a), Baker, Farrokhnia, Meyer, Pagel, & Yannelis (2020b), Bounie, Camara, & Galbraith (2020), Chetty, Friedman, Hendren, & Stepner (2020), Chronopoulos, Lukas, & Wilson (2020), Cox, Ganong, Noel, Vavra, Wong, Farrell, & Greig (2020), Surico, Kanzig, & Hacioglu (2020).

https://www.bbvaresearch.com/en/special-section/charts/


Through the analysis of the firm-to-firm transactions we extend ourproject of national accounts in real time & high definition to Investment

The investment spending is done mostly by companies and, to a lesser extent, by individualsWe track investment payments through

Firms are classified by their NACE codes to identify their business activity (in line with the European statistical classification of sectors)

firm to firm transactionsindividual to firm transactions

Construction Investment

We approximate investment demand in one type of asset taking into account the aggregate flows or transactions done from any firm or individual to the sector which produce the fixed assets

Machinery Investment*Total Investment*Machinery & Equipment, Media & ICT, Agriculture & Animals, Forestry, Durable Goods, Retail Trade, Textile & Clothing, Transportation and Shipping.


The case of Turkey: Data and Representativeness


Validation I: The Big Data investment index shows a high correlationand co-movement with the oficial data TURKEY: GBBBVA BIG DATA INVESTMENT INDICES (28-day cum. YoY real )

Correlation coefficient: 0.88

Total Investment

Correlation coefficient: 0.84 Correlation coefficient: 0.77

Maquinery & Equipment Construction

Source: Own Calculations


Validation II: The sincrony of BigData Investment with the Investment

Cycle is validated by the high correlation coefficients with HF proxies

-40%

-30%

-20%

-10%

0%

10%

20%

30%

40%

-50%

-40%

-30%

-20%

-10%

0%

10%

20%

30%

Sep

-17

No

v-1

7Ja

n-1

8M

ar-

18M

ay-

18

Jul-1

8S

ep-1

8N

ov-

18

Jan-

19

Ma

r-19

Ma

y-1

9Ju

l-19

Sep

-19

No

v-1

9Ja

n-2

0M

ar-

20M

ay-

20

Jul-2

0S

ep-2

0N

ov-

20

Garanti BBVA Big Data Construction Index 3mEmployment in ConstructionNon-Metalic Mineral (Cement)Turnover Building - rhs

BBVA BIG DATA INVESTMENT & HIGH FREQUENCY PROXIES (28-day cum. YoY real )

Maquinery & Equipment ConstructionCorrelation coefficient: 0.75





Source: Own Calculation


Investment Big Data in an Nowcasting Model (DFM): The framework

9ECB

Working Paper Series No 1189May 2010

the forecast accuracy of these specifications with that of univariate benchmarks as well as of the model ofBanbura and Runstler (2010), who adopt the methodology of Giannone, Reichlin, and Small (2008) to thecase of euro area.

Giannone, Reichlin, and Small (2008) have proposed a factor model framework, which allows to deal with“ragged edge” and exploit information from large data sets in a timely manner. They have applied it tonowcasting of US GDP from a large number of monthly indicators. While Giannone, Reichlin, and Small(2008) can handle the “ragged edge” problem, it is not straightforward to apply their methodology tomixed frequency panels with series of different lengths or, in general, to any pattern of missing data.8 Inaddition, as the estimation is based on principal components, it could be inefficient for small samples.

Other papers related to ours include Camacho and Perez-Quiros (2008) who obtain real-time estimates ofthe euro area GDP from monthly indicators from a small scale model applying the mixed frequency factormodel approach of Mariano and Murasawa (2003). Schumacher and Breitung (2008) forecast German GDPfrom large number of monthly indicators using the EM approach proposed by Stock and Watson (2002b).

and shows how to incorporate relevant accounting and temporary constraints. Angelini, Henry, andMarcellino (2006) propose methodology for backdating and interpolation based on large cross-sections. Incontrast to theirs, our method exploits the dynamics of the data and is based on maximum likelihoodwhich allows for imposing restrictions and is more efficient for smaller cross-sections.

The paper is organized as follows. Section 2 presents the model, discusses the estimation and explains howthe news content can be extracted. Section 3 provides the results of the Monte Carlo experiment. Section4 describes the empirical application. Section 5 concludes. The technical details and data description areprovided in the Appendix.

2 Econometric framework

Let yt = [y1,t, y2,t, . . . , yn,t]′ , t = 1, . . . , T denote a stationary n-dimensional vector process standardised

to mean 0 and unit variance. We assume that yt admits the following factor model representation:

yt = Λft + εt , (1)

where ft is a r× 19 vector of (unobserved) common factors and εt = [ε1,t, ε2,t, . . . , εn,t]′ is the idiosyncratic

component, uncorrelated with ft at all leads and lags. The n×r matrix Λ contains factor loadings. χt = Λft

is referred to as the common component. It is assumed that εt is normally distributed and cross-sectionallyuncorrelated, i.e. yt follows an exact factor model. We also shortly discuss validity of the approach inthe case of an approximate factor model, see below. What concerns the dynamics of the idiosyncraticcomponent we consider two cases: εt is serially uncorrelated or it follows an AR(1) process.

Further, it is assumed that the common factors ft follow a stationary VAR process of order p:

ft = A1ft−1 + A2ft−2 + · · · + Apft−p + ut , ut ∼ i.i.d. N (0, Q) , (2)

where A1, . . . , Ap are r×r matrices of autoregressive coefficients. We collect the latter into A = [A1, . . . , Ap].8Their estimation approach consists of two steps. First, the parameters of the state space representation of the factor

model are obtained using a principal components based procedure applied to a truncated data set (without missing data).Second, Kalman filter is applied on the full data set in order to obtain factor estimates and forecasts using all availableinformation.

9For identification it is required that 2r + 1 ≤ n, see e.g. Geweke and Singleton (1980).

Proietti (2008) estimates a factor model for interpolation of GDP and itsmain components

9

ECBWorking Paper Series No 1189

May 2010









yt = Λft + εt , (1)










10

ECBWorking Paper Series No 1189May 2010

2.1 Estimation

As ft are unobserved, the maximum likelihood estimators of the parameters of model (1)-(2), which wecollect in θ, are in general not available in closed form. On the other hand, a direct numerical maximisationof the likelihood is computationally demanding, in particular for large n due to the large number ofparameters.10

In this paper we adopt an approach based on the Expectation-Maximisation (EM) algorithm, which wasproposed by Dempster, Laird, and Rubin (1977) as a general solution to problems for which incomplete orlatent data yield the likelihood intractable or difficult to deal with. The essential idea of the algorithm isto write the likelihood as if the data were complete and to iterate between two steps: in the Expectationstep we “fill in” the missing data in the likelihood, while in the Maximisation step we re-optimise thisexpectation. Under some regularity conditions, the EM algorithm converges towards a local maximum ofthe likelihood (or a point in its ridge, see also below).

To derive the EM steps for the model described above, let us denote the joint log-likelihood of yt andft, t = 1, . . . , T by l(Y, F ; θ), where Y = [y1, . . . , yT ] and F = [f1, . . . , fT ]. Given the available dataΩT ⊆ Y ,11 EM algorithm proceeds in a sequence of two alternating steps:

1. E-step - the expectation of the log-likelihood conditional on the data is calculated using the estimatesfrom the previous iteration, θ(j):

L(θ, θ(j)) = Eθ(j)

[l(Y, F ; θ)|ΩT

];

2. M-step - the parameters are re-estimated through the maximisation of the expected log-likelihoodwith respect to θ:

θ(j + 1) = arg maxθ

L(θ, θ(j)) . (3)

Watson and Engle (1983) and Shumway and Stoffer (1982) show how to derive the maximisation step (3)for models similar to the one given by (1)-(2). As a result the estimation problem is reduced to a sequenceof simple steps, each of which essentially involves a pass of the Kalman smoother and two multivariateregressions. Doz, Giannone, and Reichlin (2006) show that the EM algorithm is a valid approach for themaximum likelihood estimation of factor models for large cross-sections as it is robust, easy to implementand computationally inexpensive. Watson and Engle (1983) assume that all the observations in yt areavailable (ΩT = Y ). Shumway and Stoffer (1982) derive the modifications for the missing data case butonly with known Λ. We provide the EM steps for the general case with missing data.

In the main text, we set for simplicity p = 1 (A = A1), the case of p > 1 is discussed in the Appendix. Wefirst consider the case of serially uncorrelated εt:

εt ∼ i.i.d. N (0, R) , (4)

where R is a diagonal matrix. In that case θ = Λ, A,R,Q and the maximisation of (3) results in the10Recently, Jungbacker and Koopman (2008) have shown how to reduce the computational complexity related to estimation

and smoothing if the number of observables is much larger than the number of factors.11ΩT ⊆ Y because some observations in yt can be missing.

10ECBWorking Paper Series No 1189May 2010

2.1 Estimation

As ft are unobserved, the maximum likelihood estimators of the parameters of model (1)-(2), which wecollect in θ, are in general not available in closed form. On the other hand, a direct numerical maximisationof the likelihood is computationally demanding, in particular for large n due to the large number ofparameters.10

In this paper we adopt an approach based on the Expectation-Maximisation (EM) algorithm, which wasproposed by Dempster, Laird, and Rubin (1977) as a general solution to problems for which incomplete orlatent data yield the likelihood intractable or difficult to deal with. The essential idea of the algorithm isto write the likelihood as if the data were complete and to iterate between two steps: in the Expectationstep we “fill in” the missing data in the likelihood, while in the Maximisation step we re-optimise thisexpectation. Under some regularity conditions, the EM algorithm converges towards a local maximum ofthe likelihood (or a point in its ridge, see also below).

To derive the EM steps for the model described above, let us denote the joint log-likelihood of yt andft, t = 1, . . . , T by l(Y, F ; θ), where Y = [y1, . . . , yT ] and F = [f1, . . . , fT ]. Given the available dataΩT ⊆ Y ,11 EM algorithm proceeds in a sequence of two alternating steps:

1. E-step - the expectation of the log-likelihood conditional on the data is calculated using the estimatesfrom the previous iteration, θ(j):

L(θ, θ(j)) = Eθ(j)

[l(Y, F ; θ)|ΩT

];

2. M-step - the parameters are re-estimated through the maximisation of the expected log-likelihoodwith respect to θ:

θ(j + 1) = arg maxθ

L(θ, θ(j)) . (3)

Watson and Engle (1983) and Shumway and Stoffer (1982) show how to derive the maximisation step (3)for models similar to the one given by (1)-(2). As a result the estimation problem is reduced to a sequenceof simple steps, each of which essentially involves a pass of the Kalman smoother and two multivariateregressions. Doz, Giannone, and Reichlin (2006) show that the EM algorithm is a valid approach for themaximum likelihood estimation of factor models for large cross-sections as it is robust, easy to implementand computationally inexpensive. Watson and Engle (1983) assume that all the observations in yt areavailable (ΩT = Y ). Shumway and Stoffer (1982) derive the modifications for the missing data case butonly with known Λ. We provide the EM steps for the general case with missing data.

In the main text, we set for simplicity p = 1 (A = A1), the case of p > 1 is discussed in the Appendix. Wefirst consider the case of serially uncorrelated εt:

εt ∼ i.i.d. N (0, R) , (4)

where R is a diagonal matrix. In that case θ = Λ, A, R, Q and the maximisation of (3) results in the10Recently, Jungbacker and Koopman (2008) have shown how to reduce the computational complexity related to estimation

and smoothing if the number of observables is much larger than the number of factors.11ΩT ⊆ Y because some observations in yt can be missing.

ExpectationMaximization (EM)

Algorithm

9ECB










yt = Λft + εt , (1)










NowcastingAccuracy

NowcastingAnticipation

11ECB


following expressions for Λ(j + 1) and A(j + 1):12

Λ(j + 1) =

(T∑

t=1

Eθ(j)

[ytf

′t |ΩT

])(

T∑

t=1

Eθ(j)

[ftf

′t |ΩT

])−1

, (5)

A(j + 1) =

(T∑

t=1

Eθ(j)

[ftf

′t−1|ΩT

])(

T∑

t=1

Eθ(j)

[ft−1f

′t−1|ΩT

])−1

. (6)

Note that these expressions resemble the ordinary least squares solution to the maximum likelihood estima-tion for (auto-) regressions with complete data with the difference that the sufficient statistics are replacedby their expectations.

The (j + 1)-iteration covariance matrices are computed as the expectations of sums of squared residualsconditional on the updated estimates of Λ and A:13

R(j + 1) = diag

(1T

T∑

t=1

Eθ(j)

[(yt − Λ(j + 1)ft

)(yt − Λ(j + 1)ft

)′|ΩT

])(7)

= diag

(1T

(T∑

t=1

Eθ(j)

[yty

′t|ΩT

]− Λ(j + 1)

T∑

t=1

Eθ(j)

[fty

′t|ΩT

]))

and

Q(j + 1) =1T

(T∑

t=1

Eθ(j)

[ftf

′t |ΩT

]− A(j + 1)

T∑

t=1

Eθ(j)

[ft−1f

′t |ΩT

])

. (8)

When yt does not contain missing data, we have that

Eθ(j) [yty′t|ΩT ] = yty

′t and Eθ(j) [ytf

′t |ΩT ] = ytEθ(j) [f ′

t |ΩT ] . (9)

Finally, the conditional moments of the latent factors, Eθ(j) [ft|ΩT ], Eθ(j) [ftf ′t |ΩT ], Eθ(j)

[ft−1f ′

t−1|ΩT

]

and Eθ(j)

[ftf ′

t−1|ΩT

], can be obtained through the Kalman smoother for the state space representation:

yt = Λ(j)ft + εt , εt ∼ i.i.d. N (0, R(j)) ,

ft = A(j)ft−1 + ut , ut ∼ i.i.d. N (0, Q(j)) , (10)

see Watson and Engle (1983).

However, when yt contains missing values we can no longer use (9) when developing the expressions (5)and (7). Let Wt be a diagonal matrix of size n with ith diagonal element equal to 0 if yi,t is missing andequal to 1 otherwise. As shown in the Appendix, Λ(j + 1) can be obtained as

vec(Λ(j + 1)

)=

(T∑

t=1

Eθ(j)

[ftf

′t |ΩT

]⊗ Wt

)−1

vec

(T∑

t=1

WtytEθ(j)

[f ′

t |ΩT

])

. (11)

Intuitively, Wt works as a selection matrix, so that only the available data are used in the calculations.Analogously, the expression (7) becomes

R(j + 1) = diag

(1T

T∑

t=1

(Wtyty

′tW

′t − WtytEθ(j)

[f ′

t |ΩT

]Λ(j + 1)′Wt − WtΛ(j + 1)Eθ(j)

[ft|ΩT

]y′

tWt

+ WtΛ(j + 1)Eθ(j)

[ftf

′t |ΩT

]Λ(j + 1)′Wt + (I − Wt)R(j)(I − Wt)

)). (12)

12A sketch of how these are derived is provided in the Appendix, see also e.g. Watson and Engle (1983) and Shumway andStoffer (1982).

13Note that L(θ, θ(j)) does not have to be maximised simultaneously with respect to all the parameters. The procedureremains valid if M-step is performed sequentially, i.e. L(θ, θ(j)) is maximised over a subvector of θ with other parametersheld fixed at their current values, see e.g. McLachlan and Krishnan (1996), Ch. 5.

A Dynamic factor Model(DFM)

11ECB



Λ(j + 1) =

(T∑

t=1

Eθ(j)

[ytf

′t |ΩT

])(

T∑

t=1

Eθ(j)

[ftf

′t |ΩT

])−1

, (5)

A(j + 1) =

(T∑

t=1

Eθ(j)

[ftf

′t−1|ΩT

])(

T∑

t=1

Eθ(j)

[ft−1f

′t−1|ΩT

])−1

. (6)



R(j + 1) = diag

(1T

T∑

t=1

Eθ(j)

[(yt − Λ(j + 1)ft

)(yt − Λ(j + 1)ft

)′|ΩT

])(7)

= diag

(1T

(T∑

t=1

Eθ(j)

[yty

′t|ΩT

]− Λ(j + 1)

T∑

t=1

Eθ(j)

[fty

′t|ΩT

]))

and

Q(j + 1) =1T

(T∑

t=1

Eθ(j)

[ftf

′t |ΩT

]− A(j + 1)

T∑

t=1

Eθ(j)

[ft−1f

′t |ΩT

])

. (8)




′t |ΩT ] = ytEθ(j) [f ′

t |ΩT ] . (9)


[ft−1f ′

t−1|ΩT

]

and Eθ(j)

[ftf ′

t−1|ΩT






vec(Λ(j + 1)

)=

(T∑

t=1

Eθ(j)

[ftf

′t |ΩT

]⊗ Wt

)−1

vec

(T∑

t=1

WtytEθ(j)

[f ′

t |ΩT

])

. (11)


R(j + 1) = diag

(1T

T∑

t=1

(Wtyty

′tW

′t − WtytEθ(j)

[f ′

t |ΩT

]Λ(j + 1)′Wt − WtΛ(j + 1)Eθ(j)

[ft|ΩT

]y′

tWt

+ WtΛ(j + 1)Eθ(j)

[ftf

′t |ΩT

]Λ(j + 1)′Wt + (I − Wt)R(j)(I − Wt)

)). (12)



11ECB



Λ(j + 1) =

(T∑

t=1

Eθ(j)

[ytf

′t |ΩT

])(

T∑

t=1

Eθ(j)

[ftf

′t |ΩT

])−1

, (5)

A(j + 1) =

(T∑

t=1

Eθ(j)

[ftf

′t−1|ΩT

])(

T∑

t=1

Eθ(j)

[ft−1f

′t−1|ΩT

])−1

. (6)



R(j + 1) = diag

(1T

T∑

t=1

Eθ(j)

[(yt − Λ(j + 1)ft

)(yt − Λ(j + 1)ft

)′|ΩT

])(7)

= diag

(1T

(T∑

t=1

Eθ(j)

[yty

′t|ΩT

]− Λ(j + 1)

T∑

t=1

Eθ(j)

[fty

′t|ΩT

]))

and

Q(j + 1) =1T

(T∑

t=1

Eθ(j)

[ftf

′t |ΩT

]− A(j + 1)

T∑

t=1

Eθ(j)

[ft−1f

′t |ΩT

])

. (8)




′t |ΩT ] = ytEθ(j) [f ′

t |ΩT ] . (9)


[ft−1f ′

t−1|ΩT

]

and Eθ(j)

[ftf ′

t−1|ΩT






vec(Λ(j + 1)

)=

(T∑

t=1

Eθ(j)

[ftf

′t |ΩT

]⊗ Wt

)−1

vec

(T∑

t=1

WtytEθ(j)

[f ′

t |ΩT

])

. (11)


R(j + 1) = diag

(1T

T∑

t=1

(Wtyty

′tW

′t − WtytEθ(j)

[f ′

t |ΩT

]Λ(j + 1)′Wt − WtΛ(j + 1)Eθ(j)

[ft|ΩT

]y′

tWt

+ WtΛ(j + 1)Eθ(j)

[ftf

′t |ΩT

]Λ(j + 1)′Wt + (I − Wt)R(j)(I − Wt)

)). (12)



11ECB



Λ(j + 1) =

(T∑

t=1

Eθ(j)

[ytf

′t |ΩT

])(

T∑

t=1

Eθ(j)

[ftf

′t |ΩT

])−1

, (5)

A(j + 1) =

(T∑

t=1

Eθ(j)

[ftf

′t−1|ΩT

])(

T∑

t=1

Eθ(j)

[ft−1f

′t−1|ΩT

])−1

. (6)



R(j + 1) = diag

(1T

T∑

t=1

Eθ(j)

[(yt − Λ(j + 1)ft

)(yt − Λ(j + 1)ft

)′|ΩT

])(7)

= diag

(1T

(T∑

t=1

Eθ(j)

[yty

′t|ΩT

]− Λ(j + 1)

T∑

t=1

Eθ(j)

[fty

′t|ΩT

]))

and

Q(j + 1) =1T

(T∑

t=1

Eθ(j)

[ftf

′t |ΩT

]− A(j + 1)

T∑

t=1

Eθ(j)

[ft−1f

′t |ΩT

])

. (8)




′t |ΩT ] = ytEθ(j) [f ′

t |ΩT ] . (9)


[ft−1f ′

t−1|ΩT

]

and Eθ(j)

[ftf ′

t−1|ΩT






vec(Λ(j + 1)

)=

(T∑

t=1

Eθ(j)

[ftf

′t |ΩT

]⊗ Wt

)−1

vec

(T∑

t=1

WtytEθ(j)

[f ′

t |ΩT

])

. (11)


R(j + 1) = diag

(1T

T∑

t=1

(Wtyty

′tW

′t − WtytEθ(j)

[f ′

t |ΩT

]Λ(j + 1)′Wt − WtΛ(j + 1)Eθ(j)

[ft|ΩT

]y′

tWt

+ WtΛ(j + 1)Eθ(j)

[ftf

′t |ΩT

]Λ(j + 1)′Wt + (I − Wt)R(j)(I − Wt)

)). (12)



11ECB



Λ(j + 1) =

(T∑

t=1

Eθ(j)

[ytf

′t |ΩT

])(

T∑

t=1

Eθ(j)

[ftf

′t |ΩT

])−1

, (5)

A(j + 1) =

(T∑

t=1

Eθ(j)

[ftf

′t−1|ΩT

])(

T∑

t=1

Eθ(j)

[ft−1f

′t−1|ΩT

])−1

. (6)



R(j + 1) = diag

(1T

T∑

t=1

Eθ(j)

[(yt − Λ(j + 1)ft

)(yt − Λ(j + 1)ft

)′|ΩT

])(7)

= diag

(1T

(T∑

t=1

Eθ(j)

[yty

′t|ΩT

]− Λ(j + 1)

T∑

t=1

Eθ(j)

[fty

′t|ΩT

]))

and

Q(j + 1) =1T

(T∑

t=1

Eθ(j)

[ftf

′t |ΩT

]− A(j + 1)

T∑

t=1

Eθ(j)

[ft−1f

′t |ΩT

])

. (8)




′t |ΩT ] = ytEθ(j) [f ′

t |ΩT ] . (9)


[ft−1f ′

t−1|ΩT

]

and Eθ(j)

[ftf ′

t−1|ΩT






vec(Λ(j + 1)

)=

(T∑

t=1

Eθ(j)

[ftf

′t |ΩT

]⊗ Wt

)−1

vec

(T∑

t=1

WtytEθ(j)

[f ′

t |ΩT

])

. (11)


R(j + 1) = diag

(1T

T∑

t=1

(Wtyty

′tW

′t − WtytEθ(j)

[f ′

t |ΩT

]Λ(j + 1)′Wt − WtΛ(j + 1)Eθ(j)

[ft|ΩT

]y′

tWt

+ WtΛ(j + 1)Eθ(j)

[ftf

′t |ΩT

]Λ(j + 1)′Wt + (I − Wt)R(j)(I − Wt)

)). (12)



11ECB



Λ(j + 1) =

(T∑

t=1

Eθ(j)

[ytf

′t |ΩT

])(

T∑

t=1

Eθ(j)

[ftf

′t |ΩT

])−1

, (5)

A(j + 1) =

(T∑

t=1

Eθ(j)

[ftf

′t−1|ΩT

])(

T∑

t=1

Eθ(j)

[ft−1f

′t−1|ΩT

])−1

. (6)



R(j + 1) = diag

(1T

T∑

t=1

Eθ(j)

[(yt − Λ(j + 1)ft

)(yt − Λ(j + 1)ft

)′|ΩT

])(7)

= diag

(1T

(T∑

t=1

Eθ(j)

[yty

′t|ΩT

]− Λ(j + 1)

T∑

t=1

Eθ(j)

[fty

′t|ΩT

]))

and

Q(j + 1) =1T

(T∑

t=1

Eθ(j)

[ftf

′t |ΩT

]− A(j + 1)

T∑

t=1

Eθ(j)

[ft−1f

′t |ΩT

])

. (8)




′t |ΩT ] = ytEθ(j) [f ′

t |ΩT ] . (9)


[ft−1f ′

t−1|ΩT

]

and Eθ(j)

[ftf ′

t−1|ΩT






vec(Λ(j + 1)

)=

(T∑

t=1

Eθ(j)

[ftf

′t |ΩT

]⊗ Wt

)−1

vec

(T∑

t=1

WtytEθ(j)

[f ′

t |ΩT

])

. (11)


R(j + 1) = diag

(1T

T∑

t=1

(Wtyty

′tW

′t − WtytEθ(j)

[f ′

t |ΩT

]Λ(j + 1)′Wt − WtΛ(j + 1)Eθ(j)

[ft|ΩT

]y′

tWt

+ WtΛ(j + 1)Eθ(j)

[ftf

′t |ΩT

]Λ(j + 1)′Wt + (I − Wt)R(j)(I − Wt)

)). (12)



Source: Banbura M., and Modugno, M (2010)

Outcomes

News Contribuion


TURKEY: VARIABLES IN MONTHLY GDP DFM

Har

dD

ata

(M &

D)

Soft

(M)

Fin

(W)

Big

Dat

a (D

)Source: Own elaboration

Investment Big Data in a Nowcasting Model (DFM): Variables & Releases

Jan Feb Mar Apr May Jun July Aug SepGFCF

Capital Goods Production

Non-metal M. Production

Capital Goods Imports

Commercial Vehicle Sales 1 2Real Sector Confidence

Corporate Lonas

Big Data Investment

2020

TURKEY: VARIABLES IN MONTHLY INVESTMENT DFM

Jan Feb Mar Apr May Jun July Aug SeptGDP

Industrial Production

Non-metal Mineral Production

Auto Sales

Number of Employed

Number of Unemployed

Auto Imports

Auto Exports

Electricity Production

Manufacturing PMI

Real Sector Confidence

Total Loans growth 13-week

Bog Data Total Consumption Indicator

Bog Data Total Investment Indicator

2020

Big Data Investment

Big Data Consumption


TURKEY: OUT-OF-SAMPLE ERROR GDP MODEL

Source: Own Elaboration

-4%

-2%

0%

2%

4%

6%

8%

10%

12%

14%

Mar

-16

Jun-

16

Sep

-16

Dec

-16

Mar

-17

Jun-

17

Sep

-17

Dec

-17

Mar

-18

Jun-

18

Sep

-18

Dec

-18

Mar

-19

Jun-

19

Sep

-19

Dec

-19

Mar

-20

TURKSTAT GDP

Standard Dynamic Factor Model (DFM)

DFM + Big Data Investment

-25%

-20%

-15%

-10%

-5%

0%

5%

10%

15%

20%

25%

Mar

-16

Jun-

16

Sep

-16

Dec

-16

Mar

-17

Jun-

17

Sep

-17

Dec

-17

Mar

-18

Jun-

18

Sep

-18

Dec

-18

Mar

-19

Jun-

19

Sep

-19

Dec

-19

Mar

-20

TURKSTAT Investment Expenditures

Standard Dynamic Factor Model (DFM)

DFM + BBVA Investment Index

TURKEY: OUT-OF-SAMPLE ERROR INVESTMENT MODEL

Investment Big Data in a Nowcasting Model (DFM): Forecasting Accuracy


TURKEY: NOWCASTING FINANCIAL CRISIS (SEPT 2018) & COVID CRISIS (MAR 2020)(quasi real time nowcasting with and without Big Data Indexes vs Benchmark)**

Financial Crisis (September 2018) COVID-19 (March 2020)

The model with Big Data investmentStart to signal the deceeleration moretan a motnh before

The model with Big Data Consumption Reactedinmediately signalling theDeep and the Sharp recovery

Investment Big Data in a Nowcasting Model (DFM): Anticipation


0.0

1.0

2.0

3.0

4.0

5.0

Indu

stria

l Pro

duct

ion

(M)

Big

Data

Inve

stnm

ent (

D)

Une

mpl

oym

ent /

d)

Car

Impo

rts (M

)

Car

Sal

es (M

)

PMI M

anuf

ac (M

)

Eect

ricity

Pro

duct

ion

(D)

Cem

ent P

rod

(M)

Cre

dit G

rowt

h (W

)

Empl

oym

ent (

M)

Big

Data

Con

sum

ptio

n (D

)

Busi

ness

Con

fiden

ce (M

)

Car

Exp

orts

(M)

M1/2 M2/3 M3/4 M4/5 Average

DFM News Contributions(Unbalanced Sample)

Source: Own Estimations Source: Own Elaboration

DFM News Contributions correcting Prevalece Bias(Maintaining individually all the variables for the sample of Big data information)

Investment Big Data in a Nowcasting Model (DFM): News & Prevalence Bias


-50%

-40%

-30%

-20%

-10%

0%

10%

20%

30%

40%

50%

2-M

ar12

-Mar

22-M

ar1-

Apr

11-A

pr21

-Apr

1-M

ay11

-May

21-M

ay31

-May

10-J

un20

-Jun

30-J

un10

-Jul

20-J

ul30

-Jul

9-A

ug19

-Aug

29-A

ug8-

Sep

18-S

ep28

-Sep

8-O

ct18

-Oct

28-O

ct7-

Nov

Machinery Construction Total Investment

-60%

-40%

-20%

0%

20%

40%

60%

80%

2-M

ar12

-Mar

22-M

ar1-

Apr

11-A

pr21

-Apr

1-M

ay11

-May

21-M

ay31

-May

10-J

un20

-Jun

30-J

un10

-Jul

20-J

ul30

-Jul

9-A

ug19

-Aug

29-A

ug8-

Sep

18-S

ep28

-Sep

8-O

ct18

-Oct

28-O

ct7-

Nov

Goods Services Total Consumption

Results: Investment in Real Time will help policy makers to react faster


TURKEY: BIG DATA CONSUMPTION & INVESTMENT(7-day cum. YoY nominal in Cons., 28-day cum. YoY nominal in Invest.)

BIG DATA CONSUMPTION BIG DATA INVESTMENT

2Q-20 3Q-20 4Q-20 2Q-20 3Q-20 4Q-20


TURKEY:GB-BBVA BIG DATA INVESTMENT HEAT MAP(3mm avg YoY nominal)


Results: The "High Definition" dimension will allow policymakers todifferentiate shocks and design more targeted policies

TURKEY:GB-BBVA BiG DATA INVESTMENTS GEO-MAPS RECOVERY AFTER THE COVID(Change in YoY investment before, during and after the lockdowns by Covid)

20Activities & 81Province in Real Time & High Definition


Conclusions & further Research

- We present a novel approach to estimate Investment in “Real Time & HighDefinition” from the analysis of a Bank´s Big Data (BBVA)

- The Investment index improves the properties of a Standard Nowcasting Modelin terms of forecasting accuracy, anticipation and news.

- The “High Definition” dimension can help to design targeted policies

- We cross validate the results through the high correlations with national accountsand high frequency proxies for Turkey and other countries.

- The characteristics of Big data Information (detailed but short history) advocates forthe use of non linear and/or regularization techniques (Further Research)


References

• Chetty, R., Friedman, J. N., Hendren, N., & Stepner, M. (2020). How Did COVID-19 and Stabilization Policies Affect Spending and Employment? A New Real-Time Economic Tracker Based on Private Sector Data. Working Paper.

• Chronopoulos, D. K., Lukas, M., & Wilson, J. O. S. (2020). Consumer SpendingResponses to the COVID-19 Pandemic: An Assessment of Great Britain. WorkingPaper 20-012.

• Carvalho, V,Hansen,S,Ortiz,A, Garcia,JR, Rodrigo,T, Rodriguez Mora,S P,Ruiz(2020). Tracking the Covid-19 crisis with high resolution transaction data”. CEPR DPNo 14642

• Cox, N., Ganong, P., Noel, P., Vavra, J., Wong, A., Farrell, D., & Greig, F. (2020).Initial Impacts of the Pandemic on Consumer Behavior: Evidence from Linked Income,Spending, and Saving Data.

• Eurostat (2013) Handbook on quarterly national accounts

• Gezici F & Burçin Yazgı Walsh & Sinem Metin Kacar, 2017. "Regional and structuralanalysis of the manufacturing industry in Turkey," The Annals of Regional Science,Springer;Western Regional Science Association, vol. 59(1), pages 209-230, July.

• Modugno, M., B. Soybilgen, and E. Yazgan (2016). Nowcasting Turkish GDP andNews Decomposition. International Journal of Forecasting 32 (4), 1369–1384.

• Surico, P., Kanzig, D., & Hacioglu, S. (2020). Consumption in the Time of COVID-19:Evidence from UK Transaction Data. CEPR Discussion Paper 14733.

• Watanabe, T., Omori, Y., et al. (2020). Online consumption during the Covid-19 crisis:Evidence from japan. Tech. rep., University of Tokyo, Graduate School of Economics.

• Akcigit,U & Yusuf Emre Akgunduz & Seyit Mumin Cilasun & Elif Ozcan Tok & Fatih Yilmaz, 2019. "Facts on Business Dynamism in Turkey," Working Papers 1930, Research and Monetary Policy Department, Central Bank of the Republic of Turkey.

• Andersen, A., Hansen, E. T., Johannesen, N., & Sheridan, A. (2020a). Consumer Responses tothe COVID-19 Crisis: Evidence from Bank Account Transaction Data. CEPR Discussion Paper14809.

• Andersen, A. L., Hansen, E. T., Johannesen, N., & Sheridan, A. (2020b). Pandemic, Shutdownand Consumer Spending: Lessons from Scandinavian Policy Responses to COVID-19.arXiv:2005.04630 [econ, q-fin].

• Baker, S. R., Farrokhnia, R., Meyer, S., Pagel, M., & Yannelis, C. (2020a). How DoesHousehold Spending Respond to an Epidemic? Consumption During the 2020 COVID-19Pandemic. Working Paper 26949, National Bureau of Economic Research.

• Baker, S. R., Farrokhnia, R. A., Meyer, S., Pagel, M., & Yannelis, C. (2020b). Income, Liquidity,and the Consumption Response to the 2020 Economic Stimulus Payments. Working Paper27097, National Bureau of Economic Research.

• Banbura,M and Modugno (2014), maximum likelihood estimation of factor models on datasetswith arbitrary pattern of missing data, Journal of Applied Econometrics, 29, (1), 133-160

• Bodas, D., López, J. R. G., López, T. R., de Aguirre, P. R., Ulloa, C. A., Arias, J. M., de DiosRomero Palop, J., Lapaz, H. V., & Pacce, M. J. (2019). Measuring retail trade using cardtransactional data. Working Papers 1921, Banco de España;Working Papers Homepage.

• Bounie, D., Camara, Y., & Galbraith, J. W. (2020). Consumers’ Mobility, Expenditure andOnline-Offline Substitution Response to COVID-19: Evidence from French Transaction Data.SSRN Scholarly Paper ID 3588373, Social Science Research Network, Rochester, NY.

https://ideas.repec.org/a/spr/anresc/v59y2017i1d10.1007_s00168-017-0827-4.html

https://ideas.repec.org/s/spr/anresc.html

https://ideas.repec.org/p/tcb/wpaper/1930.html

https://ideas.repec.org/s/tcb/wpaper.html

https://econpapers.repec.org/article/wlyjapmet/v_3a29_3ay_3a2014_3ai_3a1_3ap_3a133-160.htm

Investment in “Real Time” and“High Definition”: A Big Data Approach

November 2020

2020 Conference on Nontraditional Data & Statistical LearningBanca d’ Italia and Federal Reserve Board

Ali.B. Barlas, Seda Güler, Alvaro Ortiz & Tomasa Rodrigo

Documents

Investment in “Real Time” and “High Definition”: A Big