
Investment in "Real Time" and "High Definition": A Big Data Approach

November 2020

Modelling with Big Data & Machine Learning: Measuring Economic Instability
The Bank of England, the Federal Reserve Board and King's College London

Ali B. Barlas, Seda Güler, Alvaro Ortiz & Tomasa Rodrigo


Why this paper is relevant

- The first “Real Time” Investment indicator from a Bank’s Big Data

- Improves the properties of a Standard Nowcasting Model

- It’s not only “Real Time” but also “High Definition” (Sector & Geography)

- Validation with national accounts and proxies is successful

- Validation results are promising for other Countries


The Covid-19 crisis has reinforced the role of Big Data tools for economic analysis

The high uncertainty triggered by the Covid-19 crisis has stressed the need to monitor the evolution of the economy in "real time". These efforts have materialized in several ways:

- Focusing on timely, alternative indicators: soft data surveys (particularly the Purchasing Manager Indexes, PMIs) and other high-frequency indicators such as electricity production or chain store sales released on a daily or weekly basis.

- Developing higher-frequency models: some central banks have relied on these high-frequency indicators to develop weekly activity tracker models such as the Fed's WEI (Lewis, 2020) and the Bundesbank WAI (Eraslan, S. and T. Götz, 2020).

- Developing new Big Data indicators*: focusing on daily aggregate information on banking transactions to track consumption, employment, turnover, mobility…

* Some of the recent literature on Big Data analysis: Andersen, Hansen, Johannesen, & Sheridan (2020a); Andersen, Hansen, Johannesen, & Sheridan (2020b); Alexander & Karger (2020); Baker, Farrokhnia, Meyer, Pagel, & Yannelis (2020a); Baker, Farrokhnia, Meyer, Pagel, & Yannelis (2020b); Bounie, Camara, & Galbraith (2020); Chetty, Friedman, Hendren, & Stepner (2020); Chronopoulos, Lukas, & Wilson (2020); Cox, Ganong, Noel, Vavra, Wong, Farrell, & Greig (2020); Surico, Kanzig, & Hacioglu (2020).


Efforts on Big Data information have accelerated due to the importance of knowing the updated state of the economy…


We continue to enhance our "Real Time & High Definition" indicators, extending our Big Data indexes to investment (GFCF)

Investment spending is done mostly by companies and, to a lesser extent, by individuals.

We track investment payments through firm-to-firm and individual-to-firm transactions. Firms are classified by their NACE codes to identify their business activity (in line with the European statistical classification of sectors).

We approximate investment demand in one type of asset by taking into account the aggregate flows or transactions from any firm or individual to the sectors which produce the fixed assets:

Construction Investment + Machinery Investment* = Total Investment

* Machinery & Equipment, Media & ICT, Agriculture & Animals, Forestry, Durable Goods, Retail Trade, Textile & Clothing, Transportation and Shipping.
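To make the aggregation step concrete, the sketch below groups hypothetical payment records by the NACE-based activity of the receiving firm and cumulates the flows into construction and machinery buckets. The record layout and the NACE-to-asset mapping are illustrative assumptions, not the production pipeline.

```python
# Minimal sketch: aggregate transaction flows into investment buckets by the
# receiver's NACE-based activity. Field names and the mapping are illustrative.
from collections import defaultdict

# Hypothetical payment records: (payer_type, receiver_nace_code, amount)
payments = [
    ("firm", "F", 120_000.0),        # F: construction
    ("individual", "F", 35_000.0),
    ("firm", "C28", 80_000.0),       # C28: machinery & equipment
    ("firm", "J", 15_000.0),         # J: media & ICT
]

# Illustrative mapping from the receiving sector to the investment asset type
ASSET_BUCKET = {"F": "construction", "C28": "machinery", "J": "machinery"}

totals = defaultdict(float)
for _payer, nace, amount in payments:     # firm-to-firm and individual-to-firm flows are both kept
    bucket = ASSET_BUCKET.get(nace)
    if bucket is not None:                # keep only flows to fixed-asset-producing sectors
        totals[bucket] += amount
totals["total"] = totals["construction"] + totals["machinery"]
print(dict(totals))
```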


The data: representativeness of the Big Data investment data

TURKEY: INVESTMENT DATA 2019, BBVA vs COMPANY ACCOUNTS (CENTRAL BANK & TURKSTAT)
(Total / Construction / Machinery)

BBVA Big Data
- Transactions (000s): 308 / 28 / 280
- Firms (000s): 179.7 / 23.2 / 156.5
- Amount (US$ bn): 24.6 / 2.3 / 22.3

Turkey CBRT
- Firms (000s): 730.2 / 115.8 / 614.4
- Amount (US$ bn): 440 / 257 / 183

Firms (% CBRT): 24.6% / 25.5% / 19.8%

Source: CBRT, Turkstat and own calculations


A caveat: we develop Real Time & High Definition indicators … we do not fully replicate the national accounts

[Diagram: from the national accounts indicators to the Big Data index
- Gross Fixed Capital Investment (ESA, yearly): balanced (P=E=I, one GDP figure, real vs nominal); net disposal of fixed assets
- Gross Fixed Capital Investment (QNA, quarterly): balanced (P=E=I, one GDP figure, real vs nominal, SA and NSA); estimated from supply, demand and surveys; seasonality; frequent revisions (short samples)
- Investment proxies (monthly)
- Big Data index (daily): sample representativity; stability (short samples); cross-validation]


Big Data Investment in Real Time and High Definition: The case of Turkey


Big Data Investment in detail: The case of Turkey

- The volatility of investment in Turkey and the delay in the publication of official data (standard in EMs) make it an ideal case for analysis

- We compute investment indicators in real time (daily) for 22 sectors and 81 provinces

- Validation results are positive for the main aggregates (Machinery & Transport and Construction) as well as for aggregate investment

- A deeper analysis shows how the Big Data investment index can improve the properties of a standard nowcasting model in terms of accuracy, anticipation and news contribution

- The high-definition analysis shows the characteristics of the latest investment shocks and the geographical response of investment to the Covid-19 crisis


Validation I: The Big Data investment index shows a high correlation and co-movement with the official data

TURKEY: GB-BBVA BIG DATA INVESTMENT INDICES (28-day cum. YoY, real)
[Three panels, Jun-15 to Sep-20:
- Total Investment (Big Data Total Investment vs Official GFCF Investment): correlation coefficient 0.88
- Machinery & Equipment (Big Data Machinery Investment vs Official GFCF Machinery Investment): correlation coefficient 0.84
- Construction (Big Data Construction Index vs Official GFCF Construction Investment): correlation coefficient 0.77]

Source: Own elaboration
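For reference, here is a minimal sketch of the "28-day cum. YoY" transformation used in these charts and of the correlation check against an official series. All series below are synthetic placeholders, not the actual BBVA or TURKSTAT data.

```python
# Minimal sketch of the "28-day cum. YoY" transformation: sum daily flows over a
# trailing 28-day window, compare with the same window one year earlier (365 days,
# ignoring leap days), then correlate with a quarterly official series.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
idx = pd.date_range("2015-06-01", "2020-09-30", freq="D")
daily = pd.Series(100 + rng.normal(0, 5, len(idx)), index=idx)   # synthetic daily flows

cum28 = daily.rolling(28).sum()
yoy = cum28 / cum28.shift(365) - 1          # 28-day cumulative year-on-year growth

# Aggregate to quarters and correlate with a (synthetic) official quarterly series
official = pd.Series(rng.normal(0, 0.05, 22),
                     index=pd.period_range("2015Q3", periods=22, freq="Q"))
quarterly_bigdata = yoy.groupby(yoy.index.to_period("Q")).mean()
print(round(quarterly_bigdata.corr(official), 2))
```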


Validation II: The synchrony of the Big Data investment index with the investment cycle is validated by high correlation coefficients with high-frequency proxies

BBVA BIG DATA INVESTMENT & HIGH-FREQUENCY PROXIES (28-day cum. YoY, real)
[Two panels, Mar-17 to Dec-20:
- Machinery & Equipment: Garanti BBVA Big Data Total M&T Investment vs Industrial Production Machinery and Capacity Utilization Rate Investment Goods (YoY, rhs)
- Construction: Garanti BBVA Big Data Construction Index (3m) vs Employment in Construction, Non-Metallic Mineral (Cement) and Turnover Building (rhs)
Correlation coefficients with the individual proxies: 0.75, 0.79, 0.85, 0.72 and 0.90]

Source: Own elaboration and Turkstat


Big Data & Nowcasting Model (DFM): The framework

The nowcasting framework is a Dynamic Factor Model (DFM) estimated with the Expectation Maximization (EM) algorithm, following Banbura and Modugno (2010, ECB Working Paper Series No 1189). Outcomes assessed: nowcasting accuracy, nowcasting anticipation and news contribution.

Econometric framework (Banbura and Modugno, 2010). Let $y_t = [y_{1,t}, y_{2,t}, \ldots, y_{n,t}]'$, $t = 1, \ldots, T$, denote a stationary $n$-dimensional vector process standardised to mean 0 and unit variance. It is assumed that $y_t$ admits the factor model representation

$$y_t = \Lambda f_t + \varepsilon_t, \qquad (1)$$

where $f_t$ is an $r \times 1$ vector of (unobserved) common factors and $\varepsilon_t = [\varepsilon_{1,t}, \ldots, \varepsilon_{n,t}]'$ is the idiosyncratic component, uncorrelated with $f_t$ at all leads and lags. The $n \times r$ matrix $\Lambda$ contains the factor loadings and $\chi_t = \Lambda f_t$ is referred to as the common component. $\varepsilon_t$ is assumed to be normally distributed and cross-sectionally uncorrelated (an exact factor model), and either serially uncorrelated or following an AR(1) process. The common factors follow a stationary VAR of order $p$:

$$f_t = A_1 f_{t-1} + A_2 f_{t-2} + \cdots + A_p f_{t-p} + u_t, \qquad u_t \sim \mathrm{i.i.d.}\; N(0, Q), \qquad (2)$$

where $A_1, \ldots, A_p$ are $r \times r$ matrices of autoregressive coefficients, collected in $A = [A_1, \ldots, A_p]$.

Estimation. As $f_t$ is unobserved, the maximum likelihood estimators of the parameters $\theta$ are in general not available in closed form, and a direct numerical maximisation of the likelihood is computationally demanding for large $n$. The model is therefore estimated with the Expectation-Maximisation (EM) algorithm (Dempster, Laird and Rubin, 1977), which, given the available data $\Omega_T \subseteq Y$, iterates between:

1. E-step: the expectation of the log-likelihood conditional on the data is calculated using the estimates from the previous iteration, $L(\theta, \theta(j)) = E_{\theta(j)}[\, l(Y, F; \theta) \mid \Omega_T \,]$;
2. M-step: the parameters are re-estimated through the maximisation of the expected log-likelihood with respect to $\theta$, $\theta(j+1) = \arg\max_{\theta} L(\theta, \theta(j))$. (3)

For serially uncorrelated $\varepsilon_t \sim \mathrm{i.i.d.}\; N(0, R)$ with $R$ diagonal (4), the M-step yields

$$\Lambda(j+1) = \Big( \sum_{t=1}^{T} E_{\theta(j)}[\, y_t f_t' \mid \Omega_T \,] \Big) \Big( \sum_{t=1}^{T} E_{\theta(j)}[\, f_t f_t' \mid \Omega_T \,] \Big)^{-1}, \qquad (5)$$

$$A(j+1) = \Big( \sum_{t=1}^{T} E_{\theta(j)}[\, f_t f_{t-1}' \mid \Omega_T \,] \Big) \Big( \sum_{t=1}^{T} E_{\theta(j)}[\, f_{t-1} f_{t-1}' \mid \Omega_T \,] \Big)^{-1}, \qquad (6)$$

with the covariance matrices $R(j+1)$ and $Q(j+1)$ computed as the expectations of the sums of squared residuals conditional on the updated $\Lambda$ and $A$ (equations (7) and (8) in the paper). The conditional moments of the latent factors are obtained with the Kalman smoother for the state space representation $y_t = \Lambda(j) f_t + \varepsilon_t$, $f_t = A(j) f_{t-1} + u_t$ (10). When $y_t$ contains missing values, a diagonal selection matrix $W_t$ (with $i$-th diagonal element equal to 0 if $y_{i,t}$ is missing and 1 otherwise) ensures that only the available data enter the calculations, which modifies the expressions for $\Lambda(j+1)$ and $R(j+1)$ (equations (11) and (12) in the paper). This makes the framework robust to the "ragged edge", mixed frequencies, series of different lengths and, in general, any pattern of missing data.

Source: Banbura, M. and Modugno, M. (2010), ECB Working Paper Series No 1189, May 2010.
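As a complement to the equations above, here is a minimal sketch of the M-step updates (5) and (6), assuming the E-step moments from the Kalman smoother are already available. All inputs are synthetic placeholders rather than output of the authors' model, and the sketch covers the no-missing-data case where equation (9) applies.

```python
# Minimal sketch of the EM M-step updates (5)-(6) of Banbura & Modugno (2010),
# taking the E-step (Kalman smoother) moments as given. Inputs are synthetic.
import numpy as np

rng = np.random.default_rng(0)
T, n, r = 50, 8, 2                      # periods, observables, factors

# Smoothed factor moments conditional on the data (here faked for illustration)
Ef = rng.standard_normal((T, r))                                        # E[f_t | Omega_T]
Eff = np.array([np.outer(f, f) + np.eye(r) for f in Ef])                # E[f_t f_t' | Omega_T]
Eff_lag = np.array([np.outer(Ef[t], Ef[t - 1]) for t in range(1, T)])   # E[f_t f_{t-1}' | Omega_T]

y = rng.standard_normal((T, n))         # standardised observables (no missing data)

# Equation (5): Lambda(j+1) = (sum_t E[y_t f_t']) (sum_t E[f_t f_t'])^{-1}
# With no missing data, E[y_t f_t' | Omega_T] = y_t E[f_t' | Omega_T] (equation (9)).
sum_yf = sum(np.outer(y[t], Ef[t]) for t in range(T))
sum_ff = Eff.sum(axis=0)
Lambda_new = sum_yf @ np.linalg.inv(sum_ff)

# Equation (6): A(j+1) = (sum_t E[f_t f_{t-1}']) (sum_t E[f_{t-1} f_{t-1}'])^{-1}
sum_ff_lag = Eff_lag.sum(axis=0)
sum_ff_lag_lag = Eff[:-1].sum(axis=0)
A_new = sum_ff_lag @ np.linalg.inv(sum_ff_lag_lag)

print(Lambda_new.shape, A_new.shape)    # (n, r), (r, r)
```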


Big Data & Nowcasting Model (DFM): Variables & Releases

TURKEY: VARIABLES IN MONTHLY GDP DFM / TURKEY: VARIABLES IN MONTHLY INVESTMENT DFM
[Release calendar for 2020 (Jan to Sep) of the variables entering the models, grouped by type and frequency:
- Hard data (M & D): GDP, Industrial Production, Non-metal Mineral Production, Auto Sales, Number of Employed, Number of Unemployed, Auto Imports, Auto Exports, Electricity Production
- Soft (M): Manufacturing PMI, Real Sector Confidence
- Fin (W): Total Loans growth, 13-week
- Big Data (D): Big Data Total Consumption Indicator, Big Data Total Investment Indicator]

Source: Own elaboration


Big Data & Nowcasting Model (DFM): Out-of-sample errors

TURKEY: OUT-OF-SAMPLE ERROR, GDP MODEL
[Panel, Mar-16 to Mar-20: TURKSTAT GDP vs the Standard Dynamic Factor Model (DFM) vs the DFM + Big Data Investment]

TURKEY: OUT-OF-SAMPLE ERROR, INVESTMENT MODEL
[Panel, Mar-16 to Mar-20: TURKSTAT Investment Expenditures vs the Standard Dynamic Factor Model (DFM) vs the DFM + BBVA Investment Index]

Source: Own elaboration
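A minimal sketch of the out-of-sample comparison behind these charts: root mean squared errors of two nowcast tracks against the official outturns. The three series below are synthetic placeholders, not the actual model output.

```python
# Minimal sketch: compare out-of-sample accuracy of two nowcast tracks
# (standard DFM vs DFM + Big Data index) against the official outturns.
import numpy as np

rng = np.random.default_rng(2)
actual = rng.normal(0.02, 0.03, 18)                       # official YoY outturns (synthetic)
nowcast_dfm = actual + rng.normal(0, 0.020, 18)           # standard DFM nowcast errors
nowcast_dfm_bigdata = actual + rng.normal(0, 0.012, 18)   # DFM + Big Data nowcast errors

def rmse(pred, truth):
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

print("RMSE standard DFM   :", round(rmse(nowcast_dfm, actual), 4))
print("RMSE DFM + Big Data :", round(rmse(nowcast_dfm_bigdata, actual), 4))
```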


Big Data & Nowcasting Model (DFM): Anticipation

TURKEY: NOWCASTING THE FINANCIAL CRISIS (SEPT 2018) & THE COVID CRISIS (MAR 2020)
(quasi real-time nowcasting with and without Big Data indexes vs benchmark)**

- Financial Crisis (September 2018): the model with the Big Data investment index started to signal the deceleration more than a month earlier
- COVID-19 (March 2020): the model with the Big Data consumption index reacted immediately, signalling both the depth of the contraction and the sharp recovery

Source: Own elaboration


Big Data & Nowcasting Model (DFM): News & Prevalence Bias

[Two panels of news contributions to the nowcast, by variable and nowcast round (M1/2, M2/3, M3/4, M4/5, Average): Industrial Production (M), Big Data Investment (D), Unemployment (D), Car Imports (M), Car Sales (M), Manufacturing PMI (M), Electricity Production (D), Cement Production (M), Credit Growth (W), Employment (M), Big Data Consumption (D), Business Confidence (M), Car Exports (M).
- Left panel: DFM news contributions (unbalanced sample)
- Right panel: DFM news contributions correcting the bias (maintaining individually all the variables for the sample of Big Data information)]

Source: Own elaboration
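A minimal sketch of the "news" logic behind these contribution charts: when a variable is released, its contribution to the nowcast revision is the model weight on that release times the surprise (release minus model expectation), as in the Banbura-Modugno news decomposition. The weights and expectations below are made-up numbers for illustration, not model output.

```python
# Minimal sketch of news contributions to a nowcast revision:
# contribution_i = weight_i * (released_i - expected_i).
releases = {
    # name: (released value, model expectation, nowcast weight) -- illustrative
    "Industrial Production": (0.031, 0.025, 0.40),
    "Big Data Investment":   (0.050, 0.020, 0.25),
    "Manufacturing PMI":     (48.0, 50.0, 0.005),
}

contributions = {name: w * (released - expected)
                 for name, (released, expected, w) in releases.items()}
revision = sum(contributions.values())
print(contributions, "total revision:", round(revision, 4))
```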


One of the key results is that we can examine investment in real time

TURKEY: BBVA BIG DATA CONSUMPTION & INVESTMENT (7-day cum. YoY nominal for consumption, 28-day cum. YoY nominal for investment)
[Two panels, late February to October 2020:
- Big Data Consumption: Goods, Services and Total Consumption
- Big Data Investment: Total Investment, Machinery and Construction]

Source: Own elaboration


TURKEY: GB-BBVA BIG DATA INVESTMENT HEAT MAP (3-month moving average, YoY nominal)

The "High Definition" sectoral information can help us track the evolution of sectors in detail and differentiate between shocks…

Source: Own Elaboration


TURKEY: GB-BBVA BIG DATA INVESTMENT GEO-MAPS, RECOVERY AFTER COVID-19 (change in YoY investment before, during and after the Covid lockdowns)

… while the geolocation of the Big Data investment index will help us analyze shocks at the regional level to design targeted policies

Source: Own Elaboration


Investment in the rest of the countries: Validation


Validation for Spain: still working on raising the coverage of the transaction data

SPAIN: BBVA BIG DATA INVESTMENT INDEX AND GROSS FIXED CAPITAL INVESTMENT COMPONENTS (YoY real, deflated by producer prices, 3-month moving average)

Correlation coefficient: 0.78 Correlation coefficient: 0.58 Correlation coefficient: 0.73

Source: Own Elaboration and INE


MEXICO: BBVA BIG DATA INVESTMENT INDEX AND GROSS FIXED CAPITAL INVESTMENT COMPONENTS (YoY real, deflated by producer prices, 3-month moving average)

Validation for Mexico: Positive results in Correlation & Comovement

Correlation coefficient: 0.86 Correlation coefficient: 0.78 Correlation coefficient: 0.88

Source: Own Elaboration and INEGI


MEXICO: BBVA BIG DATA MACHINERY/CONSTRUCTION INVESTMENT INDEX AND PROXIES (YoY real, deflated by producer prices, 3-month moving average)

Correlation coefficient: 0.78

Correlation coefficient: 0.62

Correlation coefficient: 0.88

Correlation coefficient: 0.78

Validation for Mexico: Positive results with HF proxies

Source: Own Elaboration and INEGI


COLOMBIA: BBVA BIG DATA INVESTMENT INDEX AND GROSS FIXED CAPITAL INVESTMENT COMPONENTS (YoY real, deflated by producer prices, 3-month moving average)

Correlation coefficient: 0.86 Correlation coefficient: 0.76 Correlation coefficient: 0.64

Validation for Colombia: Positive results in Correlation & Comovement

Source: Own Elaboration & DANE


Conclusions, Further Research & References


Conclusions and Further Research

- Big Data will become an increasingly important tool for economic analysis, as it provides "real time" and "high definition" advantages

- We provide a novel and validated "real time" and "high definition" assessment of the investment cycle for Turkey. Preliminary validation results show that this analysis can be extended to other countries

- Given the characteristics of Big Data information (high-frequency and granular, but with a short history), we need to explore how to integrate it into new models. In particular, regularization techniques that exploit the high and rich dimension of the information (large P) in shorter samples (small T) could further enhance the potential of Big Data for economic analysis (see the sketch after this list)

- Big Data information can enhance standard nowcasting models, providing analysts and policymakers with an effective tool to react faster to shocks and design targeted policies
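As referenced above, here is a minimal sketch of one such regularization option: ridge shrinkage in a nowcasting regression with many granular predictors and few quarterly observations. All data and the penalty value are illustrative assumptions.

```python
# Minimal sketch of a "large P, short T" regularized regression: ridge shrinkage
# on many (synthetic) granular Big Data predictors with few quarterly observations.
import numpy as np

rng = np.random.default_rng(3)
T, P = 24, 200                             # 24 quarters, 200 granular predictors
X = rng.standard_normal((T, P))
beta_true = np.zeros(P); beta_true[:5] = 0.5
y = X @ beta_true + rng.normal(0, 0.1, T)  # quarterly investment growth (synthetic)

lam = 10.0                                 # ridge penalty (would be cross-validated in practice)
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(P), X.T @ y)
print("in-sample RMSE:", round(float(np.sqrt(np.mean((y - X @ beta_ridge) ** 2))), 3))
```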


References

• Akcigit, U., Akgunduz, Y. E., Cilasun, S. M., Ozcan Tok, E., & Yilmaz, F. (2019). Facts on Business Dynamism in Turkey. Working Papers 1930, Research and Monetary Policy Department, Central Bank of the Republic of Turkey.

• Andersen, A. L., Hansen, E. T., Johannesen, N., & Sheridan, A. (2020a). Consumer Responses to the COVID-19 Crisis: Evidence from Bank Account Transaction Data. CEPR Discussion Paper 14809.

• Andersen, A. L., Hansen, E. T., Johannesen, N., & Sheridan, A. (2020b). Pandemic, Shutdown and Consumer Spending: Lessons from Scandinavian Policy Responses to COVID-19. arXiv:2005.04630 [econ, q-fin].

• Baker, S. R., Farrokhnia, R. A., Meyer, S., Pagel, M., & Yannelis, C. (2020a). How Does Household Spending Respond to an Epidemic? Consumption During the 2020 COVID-19 Pandemic. Working Paper 26949, National Bureau of Economic Research.

• Baker, S. R., Farrokhnia, R. A., Meyer, S., Pagel, M., & Yannelis, C. (2020b). Income, Liquidity, and the Consumption Response to the 2020 Economic Stimulus Payments. Working Paper 27097, National Bureau of Economic Research.

• Banbura, M., & Modugno, M. (2014). Maximum likelihood estimation of factor models on datasets with arbitrary pattern of missing data. Journal of Applied Econometrics, 29(1), 133-160.

• Bodas, D., López, J. R. G., López, T. R., de Aguirre, P. R., Ulloa, C. A., Arias, J. M., de Dios Romero Palop, J., Lapaz, H. V., & Pacce, M. J. (2019). Measuring retail trade using card transactional data. Working Papers 1921, Banco de España.

• Bounie, D., Camara, Y., & Galbraith, J. W. (2020). Consumers' Mobility, Expenditure and Online-Offline Substitution Response to COVID-19: Evidence from French Transaction Data. SSRN Scholarly Paper ID 3588373, Social Science Research Network, Rochester, NY.

• Carvalho, V., Hansen, S., Ortiz, A., Garcia, J. R., Rodrigo, T., Rodriguez Mora, S., & Ruiz, P. (2020). Tracking the Covid-19 crisis with high resolution transaction data. CEPR Discussion Paper No 14642.

• Chetty, R., Friedman, J. N., Hendren, N., & Stepner, M. (2020). How Did COVID-19 and Stabilization Policies Affect Spending and Employment? A New Real-Time Economic Tracker Based on Private Sector Data. Working Paper.

• Chronopoulos, D. K., Lukas, M., & Wilson, J. O. S. (2020). Consumer Spending Responses to the COVID-19 Pandemic: An Assessment of Great Britain. Working Paper 20-012.

• Cox, N., Ganong, P., Noel, P., Vavra, J., Wong, A., Farrell, D., & Greig, F. (2020). Initial Impacts of the Pandemic on Consumer Behavior: Evidence from Linked Income, Spending, and Saving Data.

• Eurostat (2013). Handbook on quarterly national accounts.

• Gezici, F., Yazgı Walsh, B., & Metin Kacar, S. (2017). Regional and structural analysis of the manufacturing industry in Turkey. The Annals of Regional Science, 59(1), 209-230.

• Modugno, M., Soybilgen, B., & Yazgan, E. (2016). Nowcasting Turkish GDP and News Decomposition. International Journal of Forecasting, 32(4), 1369-1384.

• Surico, P., Kanzig, D., & Hacioglu, S. (2020). Consumption in the Time of COVID-19: Evidence from UK Transaction Data. CEPR Discussion Paper 14733.

• Watanabe, T., Omori, Y., et al. (2020). Online consumption during the Covid-19 crisis: Evidence from Japan. Tech. rep., University of Tokyo, Graduate School of Economics.