Searching for an Edge: Utilizing Google Trend Data to Forecast Changes in the U.S. Housing Market Bryan Lanich 8-3-2015

Bryan Lanich - Masters Thesis


Searching for an Edge:

Utilizing Google Trend Data to Forecast Changes in the U.S. Housing Market

Bryan Lanich

8-3-2015


I. Introduction

The outcome resulting from a decision is only as good as the quantity and quality of the

information available to the decision maker. Whether you are a consumer, a business owner, a

policy maker, an investor, or an economist, macroeconomic data are the foundation upon which

most decisions are made. Unfortunately, the release of such data is typically delayed up to a

month, resulting in lost time for the decision maker. Google Trends is a tool that compiles all Google search queries and categorizes those queries according to their related industries. The volume of these queries is reported on a weekly basis, which could be useful to the decision maker if it were found to give some insight into what the macroeconomic data would look like when released weeks later (Choi, H. and H. Varian, 2009a).

This analysis will explore the use of Google Trend data in economic time series

modeling. The focus will be on the health of the U.S. housing market by looking at the number

of seriously delinquent mortgage loans (90+ days) as the dependent variable. It is expected that

both the in-sample and the out-of-sample predictive accuracy of the model using Google Trend

data will be equal to or better than the accuracy of the model consisting of traditional

macroeconomic indicators. Additionally, it is expected that the predictive accuracy of the model

consisting of the macroeconomic indicators will improve with the addition of Google Trend data

as explanatory variables.

Before the discussion of the methods and the results of this analysis can begin it will be

necessary to look at the recent history of the U.S. housing market as a means of outlining the

advantages of utilizing Google Trend data. Because the advantages discussed here have to do

with timing, much attention will be given to when the explanatory variables used in this analysis

have peaked and bottomed out in comparison to the historical trends of the observed numbers of

delinquent mortgage loans in the U.S. This established link will offer insight as to why these

variables have been selected as the discussion moves to the analysis portion.


II. Literature Review

The HUD report given to Congress in January of 2010 outlined the causes of the

foreclosure crisis the U.S. had experienced beginning in 2007. The report indicated that prior to

2007 a rise in foreclosures was typically triggered by an economic downturn in which holders of unaffordable mortgages experienced an income shock, such as that resulting from unemployment. During these economic downturns the number of foreclosures, while increasing, did not balloon as it did after 2007, because ever-increasing housing prices allowed some holders of unaffordable mortgages to sell or refinance to get out from under their mortgage debt.

The point the HUD report makes is that, among other factors, the rapid rise in the U.S. housing

prices precipitated an increase in the number of unaffordable mortgages which in turn led to

further increases in housing prices. Therefore, when the first wave of foreclosures occurred and housing prices dropped, it triggered a larger wave of foreclosures, leading to an even larger decrease in housing prices. The damage resulted in a slowdown in the overall U.S. economy, with levels of unemployment not seen in generations (Herbert, Christopher E., 2010).

With this information, an economic decision maker would be prompted to look at variables such as unemployment, housing prices, and home sales when building a model to produce a reasonably accurate forecast of the number of residential homes entering foreclosure. However, these economic variables, like most other monthly indicators, are reported the following month, resulting in lost time. This is time that could be spent devising strategies to mitigate the effects of future changes like the ones experienced following the downturn of 2007.

Google Trend data have some distinct advantages over these economic indicators, as was alluded to previously. They are available in real time, they are compiled and reported on a weekly basis, and they are not subject to later revisions (Chen, Toro, 2015). Evidence suggests Google is the most widely used search engine in the U.S.; therefore, the volume of Google search terms is representative of the U.S. internet population, which the American Community Survey reported in 2013 as 74.4% of U.S. households. While real-time search data like the kind offered by Google Trends can be an advantage when building a forecasting model, previous studies suggest that including leading macroeconomic indicators enhances the model’s ability to accurately predict turning points in the data being forecasted (Chen, Toro, 2015).


III. Data

Historical Google Trend data are only available going back to January of 2004; therefore, to have a sufficient number of observations for this analysis, it was necessary to use only variables whose data are reported monthly. The observations of all variables span January 2004 to April 2015, giving a total of 136 observations.

Monthly U.S. foreclosure starts were intended to be the response variable for this analysis, but since these data were only available quarterly, the monthly percentage of seriously delinquent Fannie Mae (FNMA) mortgage loans (90+ days) was used instead. Data for this variable are published within FNMA’s monthly reports and are given as a percentage of their total loan portfolio. Typically a foreclosure start is not reported until after the mortgage loan has been reported as seriously delinquent, so using this variable as a proxy for foreclosure starts was not ideal with regard to the timing of the actual U.S. foreclosure peak. It could also be argued that the FNMA data are not representative of the mortgage loan delinquency numbers for the entire U.S. housing market. However, this analysis takes the position that FNMA’s mortgage portfolio is sufficiently large to refute this claim.

Macroeconomic indicators that appeared to lead the delinquent mortgage trend were chosen, and their plots have been included in Figure 1 to demonstrate visually how they have behaved in comparison to the response variable. The first is the seasonally adjusted monthly civilian unemployment numbers as reported by the “Labor Force Statistics” from the “Current Population Survey.” It is evident from the plot that this variable shows a steady increase beginning in June 2007, with the peak coming in October of 2009. As alluded to in the literature review section, housing prices were named as a contributor to the recent housing crisis, so the monthly S&P/Case-Shiller U.S. National Home Price Index© published by the Federal Reserve was included. The data plot for this variable in Figure 1 shows a trend peaking in April 2006 and again in March 2007 before dropping off dramatically. Following the price index it was necessary to also include sales, so monthly data for total U.S. sales of single family residential homes reported by the Federal Reserve were used. As demonstrated by the plot, home sales in this category peaked in July 2005 with a steady decline thereafter. After reaching a bottom in March of 2011, sales have begun a slow but steady increase. Lastly, the HUD report had mentioned an increase in mortgage-backed security holdings as a precipitating factor in the sharp rise in U.S. foreclosures, so monthly totals belonging to large domestic commercial banks were included. These data are published by the Federal Reserve, and the plot displays a large peak in July 2008, then a small decrease, before peaking again in November 2009. Following the second peak the numbers exhibited a steady decline before leveling off in December of 2012.

Just as with the macroeconomic indicators described above, the plots of the Google search term volume have also been displayed alongside the response variable to compare their behaviors. Before taking a look at these plots in Figure 2 it is necessary to first address the method by which Google reports its trend data. The monthly query volume for each search term has been normalized. This has been done by taking the total query volume of the search term and dividing it by the total number of queries entered into Google that month. The time period (month for this analysis) with the highest result is assigned 100, and the values for the remaining months are scaled relative to that maximum.
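The normalization described above can be sketched in a few lines of Python. This is only an illustration of the two steps as stated in the text (share of total queries, then scaling so the peak month equals 100); the function name `normalize_trend` and the sample counts are hypothetical, and Google's actual procedure may include details not covered here.

```python
def normalize_trend(term_queries, total_queries):
    """Scale a term's share of total queries so the peak period equals 100."""
    if len(term_queries) != len(total_queries):
        raise ValueError("both series must cover the same periods")
    # Share of all queries in each period that contained the search term.
    shares = [term / total for term, total in zip(term_queries, total_queries)]
    peak = max(shares)
    if peak == 0:
        return [0.0 for _ in shares]
    # Express every period as a percentage of the peak period.
    return [100.0 * share / peak for share in shares]


# Hypothetical monthly counts, for illustration only (not real Google data).
term_counts = [10, 20, 5]
all_counts = [100, 100, 100]
index = normalize_trend(term_counts, all_counts)
```

Because each series is scaled to its own peak, the resulting index values are comparable across months for one search term but not directly across different search terms.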

To arrive at the search terms selected for this analysis the search volume data for

“mortgage help” was obtained first and the remaining search terms were located in a section

entitled “Related searches.” The plot for “mortgage help” in Figure 2 displays a steady increase

beginning August 2007 with a peak in March 2009. It should be noted the search volume for this term

remains higher post-recession than the pre-recession levels indicating it most likely takes on more of a

mortgage shopping purpose as the housing market begins to rebound. Next, the search volume for

“loan modification” was reported as negligible up until June 2007 with a steep increase beginning in

September 2008, and a peak occurring in March 2009. The plot for the “foreclosure help” search volume

also had a period of negligible search volume, giving way to a steep increase beginning in March 2008

with a peak in February 2009. Lastly, “loss mitigation,” like the previous terms, also reported a period of

negligible search volume. The steady increase for this variable began in February 2007 with a plateau-

styled peak occurring January 2009 through April 2009. Notice the peaks in normalized volume for each

of these search terms occurred prior to the peak of FNMA delinquent mortgages. Visually this would

seem to suggest Google Trend data could increase the predictive accuracy of a model with delinquent

mortgages as a response variable. Before exploring this further it is necessary to first talk about the

methods used to conduct the empirical analysis designed to confirm what the plots have revealed.


Figure 1: Delinquent mortgage data graphed with chosen macroeconomic indicators

[Figure not reproduced. Panels: Civilian Unemployment; Housing Price Index; FNMA Delinquent Mortgages (%); New Home Sales; Mortgage-Backed Security Holdings. Horizontal axis in each panel: month, Jan-04 through Feb-15.]


Figure 2: Delinquent mortgage data graphed with the chosen Google search terms

[Figure not reproduced. Panels: FNMA Delinquent Mortgages (%); Search: Loan Modification; Search: Foreclosure Help; Search: Loss Mitigation; Search: Mortgage Help. Horizontal axis in each panel: month, Jan-04 through Feb-15. The search panels are on the normalized 0 to 100 scale.]


IV. Methodology

The statistical analysis using the monthly data described above was completed using the open-source R software package. By default the Google Trend data are reported weekly, so extracting the monthly data required additional work utilizing the embedded JavaScript. The specific details outlining this procedure will not be covered here, but the following site does offer an accurate step-by-step approach: http://www.ommax-marketing.com/blog/google-trends-how-to-extract-monthly-data/. The in-sample observations (January 2004 to February 2015) totaled 134, leaving two observations for the out-of-sample forecast.
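The observation counts quoted above follow from simple month arithmetic. The short Python sketch below, with a hypothetical helper named `months_inclusive`, reproduces the split between the in-sample window and the out-of-sample window from the dates given in the text.

```python
def months_inclusive(start_year, start_month, end_year, end_month):
    """Count monthly observations from the start month through the end month."""
    return (end_year - start_year) * 12 + (end_month - start_month) + 1


total_obs = months_inclusive(2004, 1, 2015, 4)       # full sample, Jan 2004 to Apr 2015
in_sample_obs = months_inclusive(2004, 1, 2015, 2)   # estimation window, Jan 2004 to Feb 2015
out_of_sample_obs = total_obs - in_sample_obs        # months held back for forecasting
print(total_obs, in_sample_obs, out_of_sample_obs)   # prints "136 134 2"
```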

Following the log transformation of each of the macroeconomic indicator variables, the analysis began with determining the order of integration. This was accomplished using comparisons of the autocorrelation functions and augmented Dickey-Fuller tests, the results of which are displayed in the table of orders of integration. Given that this analysis was multivariate, selecting a vector autoregressive (VAR) model was a logical choice to determine the dynamic relationships between the variables. To accurately analyze these relationships it was necessary to begin with stationary variables to address any problems of spurious regression. Therefore, each variable was differenced according to its order of integration to arrive at stationarity, and one observation was dropped from each of the I(1) variables so that all differenced series, and their respective data frames, have the same length.
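The differencing and alignment step can be illustrated with a minimal Python sketch. It assumes, as one reading of the paragraph above, that an I(2) series loses two observations and an I(1) series loses one, so the I(1) series are trimmed by one further observation at the start. The function names and the toy numbers are hypothetical; the thesis itself carried out this step in R.

```python
def difference(series, order):
    """Apply first differencing `order` times."""
    result = list(series)
    for _ in range(order):
        result = [curr - prev for prev, curr in zip(result, result[1:])]
    return result


def difference_and_align(data, orders):
    """Difference each series by its order of integration, then trim the
    start of the longer results so every series has the same length."""
    differenced = {name: difference(values, orders[name])
                   for name, values in data.items()}
    common = min(len(values) for values in differenced.values())
    return {name: values[len(values) - common:]
            for name, values in differenced.items()}


# Toy series for illustration only; the orders mirror I(2) for y and I(1) for g1.
toy_data = {"y": [1, 4, 9, 16, 25], "g1": [2, 3, 5, 8, 12]}
toy_orders = {"y": 2, "g1": 1}
aligned = difference_and_align(toy_data, toy_orders)
```

Trimming from the start keeps the most recent observations of every series lined up on the same months, which matters when the last periods are reserved for forecasting.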

Variable                                  Order of Integration
(y)  FNMA Delinquent Mortgages            I(2)
(m1) Civilian Unemployment                I(1)
(m2) Housing Price Index                  I(2)
(m3) New Home Sales                       I(1)
(m4) Mortgage-Backed Security Holdings    I(1)
(g1) Search: Mortgage Help                I(1)
(g2) Search: Loan Modification            I(1)
(g3) Search: Foreclosure Help             I(1)
(g4) Search: Loss Mitigation              I(1)

Model 0

Model 0 consisted of the Google search term variables (g1, g2, g3, and g4) and the FNMA delinquent mortgages (y) as the response variable. The Johansen approach was used to test for co-integration among the variables, and the null hypothesis of no co-integration (r = 0) was rejected at all reasonable levels of significance. These results suggested an error correction term would need to be included in the model to add robustness to the results and accuracy to the forecast. Next, the optimal number of lags to include in the VECM (VAR plus error correction term) had to be determined by comparing the Akaike Information Criterion of each lag variation of model 0. The results of this test determined that 10 lags of each variable would be optimal, and the resulting model is represented mathematically here:

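Model 0:

\[
\begin{aligned}
\Delta \log(y_t) = {} & \alpha + \beta_1 \Delta \log(y_{t-1}) + \dots + \beta_{10} \Delta \log(y_{t-10}) \\
& + \beta_{11} \Delta g1_{t} + \beta_{12} \Delta g1_{t-1} + \dots + \beta_{22} \Delta g1_{t-10} \\
& + \beta_{23} \Delta g2_{t} + \beta_{24} \Delta g2_{t-1} + \dots + \beta_{34} \Delta g2_{t-10} \\
& + \beta_{35} \Delta g3_{t} + \beta_{36} \Delta g3_{t-1} + \dots + \beta_{46} \Delta g3_{t-10} \\
& + \beta_{47} \Delta g4_{t} + \beta_{48} \Delta g4_{t-1} + \dots + \beta_{58} \Delta g4_{t-10} \\
& + \beta_{59} e_{t-1} + u_t
\end{aligned}
\]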
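To make the lag structure of this equation concrete, the following Python sketch builds the regressors for a single explanatory series: lags 1 through p of the response and lags 0 through p of the regressor. It is a simplified illustration only. The name `lagged_rows` and the toy numbers are hypothetical, the intercept and the error correction term are left out, and the thesis estimated the full model in R.

```python
def lagged_rows(response, regressor, max_lag):
    """Build (target, features) pairs for a regression of response[t] on
    response[t-1..t-max_lag] and regressor[t..t-max_lag]."""
    if len(response) != len(regressor):
        raise ValueError("series must have equal length")
    rows = []
    for t in range(max_lag, len(response)):
        # Own lags start at 1 because the current value is the target.
        own_lags = [response[t - k] for k in range(1, max_lag + 1)]
        # Regressor lags start at 0, matching the contemporaneous term.
        regressor_lags = [regressor[t - k] for k in range(0, max_lag + 1)]
        rows.append((response[t], own_lags + regressor_lags))
    return rows


# Toy differenced series, for illustration only.
dy = [0.1, 0.2, 0.3, 0.4, 0.5]
dg = [1.0, 2.0, 3.0, 4.0, 5.0]
rows = lagged_rows(dy, dg, max_lag=2)
```

Each additional lag removes one usable row from the start of the sample, so with 10 lags the first 10 differenced observations serve only as regressors.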

In the course of using this model to determine the predicted and forecasted results to be

reported in the next section it was necessary to view the plots of the model’s residuals to

determine the presence of any serial correlation. As the plots below would indicate there was no

evidence of serial correlation that could have caused skewed results with regard to the model’s

standard errors and t-tests.
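A numerical counterpart to inspecting residual plots is the sample autocorrelation of the residuals at each lag. The Python sketch below is a minimal illustration with a hypothetical function name and toy residuals; it is not the diagnostic code used in the thesis. For residuals that behave like white noise, sample autocorrelations at nonzero lags are expected to fall mostly within about ±1.96/√n.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation of `values` at the given non-negative lag."""
    n = len(values)
    if not 0 <= lag < n:
        raise ValueError("lag must be between 0 and len(values) - 1")
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        raise ValueError("series is constant")
    # Sum of products of deviations that are `lag` periods apart.
    numerator = sum(deviations[t] * deviations[t - lag] for t in range(lag, n))
    return numerator / denominator


# Toy residuals that alternate in sign, i.e. strongly negatively autocorrelated.
toy_residuals = [1.0, -1.0] * 4
lag1 = autocorrelation(toy_residuals, 1)
```

In this toy case the lag-1 value is far outside the white-noise band, which is the kind of pattern the residual plots were checked for.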

Model 1


Model 1 consisted of the macroeconomic indicator variables (m1, m2, m3, and

m4) and the FNMA delinquent mortgages (y) as the response variable. The Johansen test

revealed the null hypothesis of no co-integration was rejected at all reasonable levels of significance so

the error correction term was again added. Next, to keep the model findings consistent with regard to the

number of lags, 10 was chosen for model 1 just as it was for model 0. The mathematical representation

for model 1 can be seen here:

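Model 1:

\[
\begin{aligned}
\Delta \log(y_t) = {} & \alpha + \beta_1 \Delta \log(y_{t-1}) + \dots + \beta_{10} \Delta \log(y_{t-10}) \\
& + \beta_{11} \Delta \log(m1_{t}) + \beta_{12} \Delta \log(m1_{t-1}) + \dots + \beta_{22} \Delta \log(m1_{t-10}) \\
& + \beta_{23} \Delta \log(m2_{t}) + \beta_{24} \Delta \log(m2_{t-1}) + \dots + \beta_{34} \Delta \log(m2_{t-10}) \\
& + \beta_{35} \Delta \log(m3_{t}) + \beta_{36} \Delta \log(m3_{t-1}) + \dots + \beta_{46} \Delta \log(m3_{t-10}) \\
& + \beta_{47} \Delta \log(m4_{t}) + \beta_{48} \Delta \log(m4_{t-1}) + \dots + \beta_{58} \Delta \log(m4_{t-10}) \\
& + \beta_{59} e_{t-1} + u_t
\end{aligned}
\]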

The plots of the residuals for model 1 were analyzed to determine the presence of serial correlation, just as they had been for model 0. As the plots below would indicate, there was no evidence of serial correlation.

Model 2

Model 2 included all Google search variables from model 0 and the macroeconomic indicator variables from model 1, with the FNMA delinquent mortgages (y) as the response variable. The Johansen test again revealed the null hypothesis of no co-integration was rejected at all reasonable levels of significance, so the error correction term was added. The number of lags was again chosen to be 10, with the mathematical representation for model 2 seen here:

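Model 2:

\[
\begin{aligned}
\Delta \log(y_t) = {} & \alpha + \beta_1 \Delta \log(y_{t-1}) + \dots + \beta_{10} \Delta \log(y_{t-10}) \\
& + \beta_{11} \Delta g1_{t} + \beta_{12} \Delta g1_{t-1} + \dots + \beta_{22} \Delta g1_{t-10} \\
& + \beta_{23} \Delta g2_{t} + \beta_{24} \Delta g2_{t-1} + \dots + \beta_{34} \Delta g2_{t-10} \\
& + \beta_{35} \Delta g3_{t} + \beta_{36} \Delta g3_{t-1} + \dots + \beta_{46} \Delta g3_{t-10} \\
& + \beta_{47} \Delta g4_{t} + \beta_{48} \Delta g4_{t-1} + \dots + \beta_{58} \Delta g4_{t-10} \\
& + \beta_{59} \Delta \log(m1_{t}) + \beta_{60} \Delta \log(m1_{t-1}) + \dots + \beta_{70} \Delta \log(m1_{t-10}) \\
& + \beta_{71} \Delta \log(m2_{t}) + \beta_{72} \Delta \log(m2_{t-1}) + \dots + \beta_{82} \Delta \log(m2_{t-10}) \\
& + \beta_{83} \Delta \log(m3_{t}) + \beta_{84} \Delta \log(m3_{t-1}) + \dots + \beta_{94} \Delta \log(m3_{t-10}) \\
& + \beta_{95} \Delta \log(m4_{t}) + \beta_{96} \Delta \log(m4_{t-1}) + \dots + \beta_{106} \Delta \log(m4_{t-10}) \\
& + \beta_{107} e_{t-1} + u_t
\end{aligned}
\]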

The plots of the residuals for model 2 were analyzed to determine the presence of serial correlation, just as they had been for the previous two models. As the plots below would indicate, there was no evidence of serial correlation.

Lastly, the root mean squared forecast error (RMSFE) is being used as the measure of

predictive accuracy when analyzing the predicted in-sample values for FNMA delinquent

mortgages as well as the two period ahead out-of-sample forecasted values. This forecast error is

the difference between the actual value and the predicted or forecasted value for the

corresponding period and is the result of the following function:

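\[
\mathrm{RMSFE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
\]

where y_i is the actual value and \hat{y}_i is the predicted or forecasted value for period i, and n is the number of periods compared.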
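As a check on how this measure behaves, the following Python sketch computes the RMSFE for a pair of equal-length series. The function name `rmsfe` and the toy values are hypothetical and chosen only so the result is easy to verify by hand.

```python
import math


def rmsfe(actual, predicted):
    """Root mean squared forecast error between two equal-length series."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    # Squared forecast error for each period.
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy values for illustration only: errors of 3 and 4 over two periods.
toy_actual = [0.0, 0.0]
toy_forecast = [3.0, 4.0]
toy_error = rmsfe(toy_actual, toy_forecast)   # sqrt((9 + 16) / 2)
```

With only two out-of-sample periods, as in this analysis, a single large miss dominates the statistic, so out-of-sample RMSFE values should be read with that in mind.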


V. Results

It was expected that the predictive accuracy of both the in-sample predicted values and the 2-period-ahead out-of-sample values of model 0 would be equal to or better than the predictive accuracy of model 1. Additionally, it was expected that the predictive accuracy of model 1 would improve with the addition of the Google search variables from model 0.

Visually, the plots of the in-sample predicted values from each model compared with the actual values of the FNMA delinquent mortgages, as seen in Figure 3, demonstrate no noticeable differences in model predictive accuracy. The root mean squared forecast error calculations listed in the RMSFE table give a better picture of the predictive accuracy comparisons. As the table indicates, the in-sample predicted values of model 1 are 1.9% more accurate than those of model 0. The in-sample predicted values of model 2 are 11.4% more accurate than those of model 1 and 13.0% more accurate than those of model 0. This suggests that the model consisting of just the Google search term variables was slightly less accurate than the model consisting of the macroeconomic indicator variables. However, the accuracy of the macroeconomic indicator model was greatly enhanced by the addition of the Google search term variables (model 2).

Visually, the plots of the 2-period-ahead, out-of-sample forecasted values show greater disparity between the models, as would be expected from a small number of predicted values. From the plots in Figure 4 it is evident that the forecasted values from model 0 demonstrate the greatest predictive accuracy, followed by model 1 and then model 2. These observations are confirmed by the out-of-sample RMSFE results reported in the same table. The underperformance of model 2 with regard to predictive accuracy suggests this model could suffer from over-fitting, which would be more likely with a smaller sample size.


RMSFE: In-Sample Predicted vs. Actual

model 0    0.03897
model 1    0.03823
model 2    0.03388

RMSFE: 2 Periods Ahead Out-of-Sample vs. Actual

model 0    0.02342
model 1    0.04664
model 2    0.0588
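The percentage comparisons quoted in the text can be recovered from the in-sample RMSFE values, assuming "more accurate" means the relative reduction in RMSFE against the comparison model. The Python sketch below uses a hypothetical helper, `relative_improvement`; because the table values are rounded, the last digit of a recomputed percentage may differ slightly from the figure quoted in the text.

```python
def relative_improvement(baseline_rmsfe, candidate_rmsfe):
    """Percentage reduction in RMSFE of the candidate relative to the baseline."""
    return 100.0 * (baseline_rmsfe - candidate_rmsfe) / baseline_rmsfe


# In-sample RMSFE values as reported in the table above.
in_sample = {"model 0": 0.03897, "model 1": 0.03823, "model 2": 0.03388}

m1_vs_m0 = relative_improvement(in_sample["model 0"], in_sample["model 1"])
m2_vs_m1 = relative_improvement(in_sample["model 1"], in_sample["model 2"])
m2_vs_m0 = relative_improvement(in_sample["model 0"], in_sample["model 2"])
print(f"{m1_vs_m0:.2f}% {m2_vs_m1:.2f}% {m2_vs_m0:.2f}%")
```

Applying the same function to the out-of-sample values gives a negative result for model 2 against model 1, which is the numerical form of the underperformance discussed above.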


Taking the forecast analysis a step further, we look at the variance decomposition results from a forecast of 12 periods ahead to gain a better understanding of the dynamic relationship among the variables in both the short run and the long run. This is displayed in Table 1, where the results for each independent variable are listed relative to the delinquent mortgage response variable. These forecast error variance decomposition results measure how much of the shock applied to each of the variables (1 standard deviation) is explained by the overall shock to the model. A lower value indicates a more exogenous relationship.

Looking at models 0 and 1 first, we see that, of the Google search terms, “foreclosure help” is the most exogenous in the short run (less than or equal to 6 periods ahead), with “mortgage help” and “loan modification” the most exogenous in the long run (greater than 6 periods ahead). The values of the macro indicators suggest new home sales and mortgage-backed securities are the most exogenous variables in the short and long runs, with unemployment and housing price being the most endogenous.

When these Google variables are combined with the macro indicators in model 2, we see changes to these variance decomposition results when compared with the values from models 0 and 1. With the exception of the “loss mitigation” search variable, the values of all variables have increased in both the short and long runs. This demonstrates a greater endogenous relationship relative to the delinquent mortgage response variable, with “loan modification”, new home sales, and mortgage-backed securities demonstrating the largest increases. This would suggest that Granger causality between these variables and delinquent mortgages increases when they are included in the same model.

VI. Conclusion

This analysis has provided initial evidence of the usefulness of Google Trend data when producing a forecast of delinquent mortgages to determine the overall health of the U.S. housing market. The results from the VECMs suggest that the predictive accuracy of a model consisting of only Google search variables is close to that of a model consisting of macroeconomic indicator data. Additionally, there is evidence that in-sample predictive accuracy increases with the addition of Google Trend variables to a model consisting of macroeconomic indicators. Even though the variance decomposition results indicate low levels of Granger causality for the Google search terms relative to delinquent mortgages, this analysis does suggest the causality increases when they are combined with the macro indicators.

These initial results offer reasons to explore the use of Google Trend data even further,

especially when its greater reporting frequency is taken into account. This trend data may only

reach back to 2004 but the immense volume of search related categories and terms offers

opportunities for researchers to devise creative new modelling approaches to enhance those time

series techniques already being utilized.


Figure 3: Comparison of each Model’s Predicted Plot

[Figure not reproduced. Panels: Model 0: Google Predicted vs. Actual; Model 1: Predicted Macro Indicators vs. Actual; Model 2: Predicted Combination vs. Actual. Each panel plots the fitted and actual series; vertical axis: FNMA Delinquent Mortgages (%).]

Figure 4: 2 period ahead forecasts for each model plotted with a 95% confidence interval


Table 1: Variance decomposition results from a 12-period-ahead forecast. Results for each independent variable are relative to the delinquent mortgage response variable (y).

               Search: Mortgage Help       Search: Loan Modification   Search: Foreclosure Help    Search: Loss Mitigation
Periods Ahead  model 0       model 2       model 0       model 2       model 0       model 2       model 0       model 2
1              0.000         0.000         0.000         0.000         0.000         0.000         0.000         0.000
2              0.009         0.011         0.010         0.030         0.001         0.043         0.024         0.023
3              0.012         0.027         0.010         0.057         0.004         0.057         0.050         0.032
4              0.012         0.025         0.015         0.050         0.012         0.060         0.051         0.036
5              0.028         0.028         0.016         0.058         0.025         0.050         0.059         0.026
6              0.031         0.038         0.020         0.051         0.024         0.045         0.062         0.030
7              0.031         0.044         0.020         0.061         0.042         0.043         0.060         0.031
8              0.030         0.042         0.028         0.065         0.042         0.047         0.068         0.035
9              0.032         0.053         0.030         0.074         0.043         0.045         0.072         0.059
10             0.032         0.046         0.030         0.073         0.046         0.043         0.074         0.056
11             0.032         0.044         0.032         0.090         0.045         0.034         0.075         0.061
12             0.039         0.040         0.033         0.118         0.046         0.026         0.074         0.061

               Unemployment                Housing Price               New Home Sales              Mortgage-Backed Securities
Periods Ahead  model 1       model 2       model 1       model 2       model 1       model 2       model 1       model 2
1              0.000         0.000         0.000         0.000         0.000         0.000         0.000         0.000
2              0.014         0.000         0.001         0.023         0.000         0.008         0.004         0.000
3              0.014         0.040         0.002         0.085         0.006         0.014         0.024         0.000
4              0.069         0.137         0.005         0.078         0.006         0.019         0.025         0.003
5              0.092         0.133         0.012         0.163         0.022         0.019         0.024         0.061
6              0.089         0.116         0.058         0.167         0.021         0.055         0.023         0.074
7              0.090         0.106         0.060         0.155         0.021         0.073         0.026         0.108
8              0.094         0.112         0.060         0.148         0.022         0.085         0.026         0.115
9              0.092         0.095         0.060         0.115         0.027         0.124         0.025         0.163
10             0.090         0.087         0.063         0.093         0.036         0.105         0.029         0.277
11             0.089         0.083         0.066         0.080         0.035         0.110         0.031         0.322
12             0.088         0.073         0.065         0.062         0.034         0.115         0.031         0.370


References

Carrière-Swallow, Y. and Labbé, F., “Nowcasting with Google Trends in an Emerging Market,” J. Forecast., 32: 289–298, 2013.

Chen, Toro and So, Erin Pik Ki and Wu, Liang and Yan, Isabel Kitming, “The 2007–2008 U.S. Recession: What Did the Real-Time Google Trends Data Tell the United States?” Contemporary Economic Policy, Vol. 33, Issue 2, pp. 395–403, 2015.

Choi, H. and H. Varian, “Predicting the Present with Google Trends,” Google Technical Report, 2009a.

Choi, H. and H. Varian, “Predicting Initial Claims for Unemployment Benefits,” Google Technical Report, 2009b.

Herbert, Christopher E. and Apgar, William C., “Report to Congress on the Root Causes of the Foreclosure Crisis,” January 28, 2010.

Vosen, S. and Schmidt, T., “Forecasting Private Consumption: Survey-Based Indicators vs. Google Trends,” J. Forecast., 30: 565–578, 2011. doi: 10.1002/for.1213.
