
Housing Paper


Do Out-of-State Residents Pay More For Houses? If Yes, What If They Did Not Do So?

Manish Gupte, Mechanism Analytics

Ralph Siebert, Purdue University and CESifo

June 27, 2012

Abstract

This paper analyzes individual house sales. We develop a structural model that is based on the discrete choice approach and accounts for choice set heterogeneity. Letting the choice set vary allows locational effects to be modeled better. We use the buyer's phone number to determine whether the buyer is from the area (Lafayette, IN). We find that buyers living in the Lafayette area pay 3.04% less. Also, for every additional $30,000 of income a buyer pays 3.00% more. The structural model enables a counterfactual that simulates what would happen if there were no discrimination against non-locals. It predicts a decrease in the average price of 0.41%; because of the higher concentration of non-locals in the West Lafayette area, where Purdue University is located, the decrease in prices there would be 1.41%.

1 Introduction

Many of us have either bought a house or seen someone buy one. Understanding this process better can provide value and enable better policy or market interventions to make it more efficient. A simple average of prices between 2004 and 2007 in the West Lafayette, IN area shows that the average price of houses bought by non-locals is $165,367, while that of houses bought by locals is $126,812. This gap can arise for various reasons. First, non-locals may actually be paying a higher price. Second, the houses bought by non-locals could be different from the houses bought by locals. Using MLS and county data, we estimate this effect. To do so, we develop a variant of the discrete choice model. It has two key features: the choice set changes for every purchase, and every transaction is unique. Some of a house's characteristics are observed and the rest are modeled using an unobservable variable. We find that non-locals pay 3.04% more. Also, those with $30,000 more income pay 3.00% more.

Such willingness (or unwillingness) to pay a higher price could be due to various reasons, such as incomplete information, less time to buy (which could be due to having no fallback option), or possibly just a matter of preference. We describe the datasets, model and estimation algorithm in Section 3.

Now that we know such a difference exists, we simulate what would happen if non-locals paid the same as locals. The structural model allows us to simulate this. A decrease in the average price of 0.41% is predicted and, due to the higher concentration of non-locals in the West Lafayette area, where Purdue University is located, the decrease in prices there would be 1.41%. We describe the model, along with the assumptions that go into it and the estimation algorithm, in Section 3.4.

2 Related Literature

Rosen (1974, JPE) and Epple (1987, JPE) consider the hedonic approach, where price is modeled using various explanatory factors. Cutler, Glaeser and Vigdor (1999, JPE) and Cutler and Glaeser (1997, QJE) look at the role of neighborhood and school quality, segregation and residential sorting in house prices. We account for school quality in our analysis. Bajari and Kahn (2004, JBES) use random coefficients and allow heterogeneous preferences. Their approach is along the lines of hedonic models. Our analysis also allows for heterogeneous preferences, some observed and some unobserved. The unobserved ones are simulated. Bayer, McMillan and Rueben (2004, NBER) and Bayer, Ferreira and McMillan (2007, JPE) are the papers closest to our work in terms of the model. These papers employ discrete choice models. Our approach incorporates variation in the choice set for every purchase, and every transaction is unique.


3 Finding the facts

3.1 Datasets

Data were collected from the MLS website using a Python script. The data contain housing characteristics, addresses and prices from 2004 to 2007. Only residential entries and single-family houses were considered. We dropped houses with an area of less than 460 square feet, fewer than 1 or more than 9 bedrooms, a sale price below $10,000, a room size below 8 sqft, or a garage size greater than 4. Prices were deflated using the annual median house price from HUD. This was done to filter out variation in the housing market due to macroeconomic factors. Data from Tippecanoe County provided the addresses of the buyers and their phone numbers. GIS software was used to match the two datasets and to match the phone number with the MLS transaction. Census data are also used to obtain neighborhood characteristics (household income, family size, education, ethnicity, etc.) for 39 tracts in the Lafayette area.
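The sample filters and the deflation step can be sketched as follows. This is only an illustration: the field names (`sqft`, `bedrooms`, `sold_price`, `room_size_sqft`, `garage`), the example listings and the index values are hypothetical, and the ratio-to-base-year deflator is an assumption, since the exact formula is not stated.

```python
def keep_listing(row):
    """Return True if a listing survives the sample filters described above.

    Field names are hypothetical; the thresholds are the ones given in the text.
    """
    return (
        row["sqft"] >= 460
        and 1 <= row["bedrooms"] <= 9
        and row["sold_price"] >= 10_000
        and row["room_size_sqft"] >= 8
        and row["garage"] <= 4
    )


def deflate(price, year, median_by_year, base_year):
    """Rescale a price by the ratio of base-year to sale-year median price.

    Assumed form of the deflator; the text only says that annual median
    house prices from HUD were used.
    """
    return price * median_by_year[base_year] / median_by_year[year]


# Made-up listings, only to exercise the filter.
listings = [
    {"sqft": 1500, "bedrooms": 3, "sold_price": 120_000, "room_size_sqft": 120, "garage": 2},
    {"sqft": 400, "bedrooms": 2, "sold_price": 60_000, "room_size_sqft": 90, "garage": 1},
    {"sqft": 2000, "bedrooms": 4, "sold_price": 9_000, "room_size_sqft": 150, "garage": 2},
]
kept = [row for row in listings if keep_listing(row)]
```

Only the first made-up listing passes: the second is below the square-footage cutoff and the third is below the price cutoff.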

3.2 Descriptive Statistics

The characteristics of the houses in the Lafayette area are:

Variable        Mean          Stdev        Min          Max
House Characteristics
Origprice       136,080.46    83,098.56    23,041.48    724,299.06
Listprice       131,973.72    79,375.74    23,041.48    700,934.58
Soldprice       126,812.48    76,061.98    14,634.14    671,028.04
DOM             78.45         79.91        0            593
Sqft            1,739.69      686.91       748          4,917
Rooms           7.44          1.85         4            14
Bedrooms        3.37          0.58         2            6
Baths           1.77          0.67         1            5
Kitchensize     160.78        60.66        0            840
Age             40.95         26.42        0            146
Lotsize         9,743.52      5,922.76     0            48,860
Newconstr       0.08          0.27         0            1
Garage          1.47          0.77         0            3
Basement        0.43          0.49         0            1
Pool            0.03          0.17         0            1
Local           1             0            1            1
Neighborhood Characteristics
Income          44,137.04     9,798.66     13,965       62,003
Masters         0.06          0.05         0.01         0.25
HS WL           0.11          0.31         0            1
HS Jeff         0.59          0.49         0            1
HS Har          0.29          0.46         0            1
Observations    2,445

Prices in current US-dollars

3.3 Hedonic regression

A simple OLS regression was first done. The results are as follows.


Endogenous: Soldprice

Variable              OLS
Local                 -6.422e+03 (1.008e+03)***
Sqft                   3.425e+01 (8.379e-01)***
Baths                  1.025e+04 (7.969e+02)***
Age                   -2.038e+02 (1.392e+01)***
Newconstr             -2.385e+02 (1.311e+03)
Garage                -6.209e+03 (4.812e+02)***
Income                 1.256e-01 (3.397e-02)***
Masters               -8.180e+03 (8.322e+03)
HS WL                  7.881e+03 (1.262e+03)***
HS Har                 4.847e+03 (1.059e+03)***
Observations           2,839
R-squared adjusted     0.9385

As this regression shows, non-locals pay $5,612.67 more. This is about 4.4%. The structural model we propose next allows for heterogeneity of various kinds and enables a counterfactual.

3.4 Structural model

3.4.1 Considering local/non-local heterogeneity

We develop a discrete choice model that extends the work of Bayer, Ferreira and McMillan (2007, JPE). The extensions are as follows: every house is unique, so no aggregation over housing types is required; the choice set changes and consists of the houses on the market that day; and the contraction mapping algorithm used for estimation is modified. The utility from the purchase of a house during a transaction $h$ by customer $i$ is $U_{ih} = \delta_h + \lambda_{ih} + \varepsilon_i$. Here $\delta_h$ is the mean utility from a house, equivalent to the fixed effect in BLP (1995), $\lambda_{ih}$ captures heterogeneity of preferences, and $\varepsilon_i$ is an extreme value distributed error (logit). The mean utility is $\delta_h = \beta X_h - \alpha p_h + \zeta_h$, where $\beta$ is the marginal utility from a characteristic $X$, $\alpha$ is the mean willingness to pay, $p_h$ is the price of the house and $\zeta_h$ is the unobserved housing characteristic. The term $\lambda_{ih} = \alpha_I I_i p_h$ is the correction to the willingness to pay when a person is a local. We also consider differences in consumers' incomes. This is described in the following section.

The probability of a consumer buying a house can then be modeled using a multinomial logit. The likelihood of consumer $i$ purchasing house $h$ is

$$\mathrm{Pr}_h = \frac{\exp(\lambda_{ih} + \delta_h)}{\sum_k \exp(\lambda_{ik} + \delta_k)}.$$

The sum in the denominator runs over all houses on the market that day. This set is found using the "days on the market" variable and the close date. All houses in the Lafayette area that are on the market that day are included. This opens the possibility of irrelevant alternatives. Heterogeneity corrects for it. In this model only local/non-local heterogeneity is considered; the model in the following section also considers income heterogeneity. The market share of every house is 1 over the market size, and a consumer picks the house which generates the highest utility. As the house is actually sold, the parameters should be such that the probability of buying the bought house is maximized. As all houses are sold, the joint probability is maximized.
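As a minimal sketch of this likelihood, the code below computes logit probabilities over one day's choice set and sums the log probabilities of the houses actually bought. The function names and the data layout (one tuple per transaction) are hypothetical and chosen only for illustration; the example numbers are made up.

```python
import math


def local_term(alpha_I, is_local, price):
    """Heterogeneity term lambda_ih = alpha_I * I_i * p_h."""
    return alpha_I * is_local * price


def choice_probabilities(delta, lam):
    """Multinomial logit probabilities over the houses on the market that day.

    delta[k] is the mean utility of house k and lam[k] is the buyer-specific
    term for house k. The maximum utility is subtracted before exponentiating
    for numerical stability; this does not change the probabilities.
    """
    utils = [d + l for d, l in zip(delta, lam)]
    top = max(utils)
    expu = [math.exp(u - top) for u in utils]
    total = sum(expu)
    return [e / total for e in expu]


def log_likelihood(transactions):
    """Sum of log probabilities of the purchased houses.

    Each transaction is (delta, lam, bought): the mean utilities and
    heterogeneity terms of that day's choice set, and the index of the
    house that was bought.
    """
    return sum(
        math.log(choice_probabilities(delta, lam)[bought])
        for delta, lam, bought in transactions
    )


# Made-up example: a local buyer facing three houses on the market.
prices = [100_000.0, 150_000.0, 200_000.0]
delta = [0.0, 0.5, 1.0]
lam = [local_term(-4.91e-6, 1, p) for p in prices]
probs = choice_probabilities(delta, lam)
```

Because the choice set is passed in per transaction, it can differ from one purchase to the next, which is the feature the model relies on.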

A two-stage procedure is developed for this. The heterogeneity parameter is estimated first, which leaves out the mean utility. The mean utility is then expressed in terms of the housing characteristics and price. In the first stage, we search over the heterogeneity parameter. Matlab's fmin was used. This is a variant of the gradient search algorithm. For each choice of the heterogeneity parameter the unobservable is estimated using a contraction mapping. The contraction mapping is similar in spirit to that in BLP (Econometrica, 1995) but requires a modification as every house is unique. As the house is actually sold, the predicted probability should be 1. So, $\delta_h^{t+1} = \delta_h^{t} - \log(\mathrm{Pr}_h)$. Other than intuition, we do not have a formal proof of why this should work. The results obtained this way were tested and are robust.
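A single step of this update can be written as below. This is a sketch of the update rule only: the function name is hypothetical, and the stopping rule and number of iterations are not stated in the text, so none is assumed here.

```python
import math


def update_delta(delta, own_prob):
    """One step of delta_h^{t+1} = delta_h^t - log(Pr_h).

    delta[h] is the current mean utility of house h and own_prob[h] is the
    predicted probability that house h is bought in its own transaction.
    """
    return [d - math.log(p) for d, p in zip(delta, own_prob)]


# Made-up values: the less likely the model finds an observed purchase,
# the larger the upward adjustment to that house's mean utility.
new_delta = update_delta([0.0, 0.0], [0.5, 0.1])
```

Since every predicted probability is below 1, each step raises $\delta_h$ by $-\log(\mathrm{Pr}_h) > 0$.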

The mean utility $\delta_h$ obtained this way is then explained using housing characteristics and price. Price is higher for houses of higher quality. Not all variables which measure a house's quality are in the data. So, the endogeneity has to be corrected. We use seemingly unrelated regression for this purpose. The instruments are as follows. Average neighborhood house size and average age of houses in the neighborhood are used as instruments here. The model in the next section uses another instrument: a time indicator for whether it is a winter month or not (Nov, Dec, Jan). This is an instrument because the house stays the same but the buyer's constraints in terms of traveling in the snow (Midwest!) change. As this is work in progress, a few things like this have to be fixed for consistency.

The first-stage estimate of $\alpha_I$ is -4.91E-6 and its standard error is (1.12E-6)***. The standard error should actually be bootstrapped, as is done for the next model. Here we use asymptotic standard errors. To do that, the concentrated likelihood is used and the values of the unobservable are as estimated in the last iteration. So, locals pay less. This result is robust to various specifications.

The results from the second stage are as follows.


2nd Stage, SUR

Variable                Delta Equation                  Price Equation
Intercept                                               -2.99801e+04 (4.73687e+03)***
Soldprice               -2.06675e-04 (3.22501e-05)***
DOM                                                     -5.10679e+00 (4.57409e+00)
Sqft                     9.26933e-03 (1.82646e-03)***    3.40125e+01 (8.38939e-01)***
Baths                    1.42978e+01 (1.41187e+00)***    1.12612e+04 (8.32477e+02)
Age                      4.54273e-01 (2.46411e-02)***   -2.11796e+02 (2.11504e+01)***
Newconstr                1.36015e+01 (2.26764e+00)***   -1.36650e+02 (1.30578e+03)
Garage                   4.70400e+00 (8.56420e-01)***   -5.57629e+03 (4.90073e+02)***
Local                                                   -5.08918e+03 (1.04655e+03)***
Income                   2.14964e-03 (5.51048e-05)***    3.75893e-01 (5.72506e-02)***
Masters                  4.21076e+02 (1.37397e+01)***    4.95787e+04 (1.33553e+03)***
HS WL                    2.15782e+01 (2.19591e+00)***    8.80663e+03 (1.43545e+03)***
HS Har                   9.36538e+00 (1.83618e+00)***    5.99882e+03 (1.33553e+03)***
Age in neighborhood                                      1.25292e+02 (3.08704e+01)***
Sqft in neighborhood                                     3.84025e+00 (2.18943e+00).
Correlation in errors    0.119862
Observations             2,839
R-squared adjusted      -0.526856                        0.77316

Based on the mean willingness to pay and the adjustment to it from local/non-local heterogeneity, one finds that locals pay 2.45% less. The hedonic approach had suggested 4.4% less. This method accounts for the concentration of non-locals in an area, so spatial effects are better captured. The preferences of buyers are estimated here, and prices depend on how buyers and sellers interact. As the counterfactual simulation of what would happen if this difference in willingness to pay did not exist shows, the price difference in West Lafayette should be higher.

3.4.2 Considering income heterogeneity along with local/non-local heterogeneity

Differences in income should play an important role in consumers' willingness to pay. We do not have data on the buyers' incomes. So, we simulate them using the distribution of incomes in the 39 tracts in the Lafayette area. The differences in distributions enable estimating the income effect. The likelihood is in the spirit of Nevo (2001). The expected probability of a sale of a house is now

$$\mathrm{Pr}_h = E\left[\frac{\exp(\lambda_{1ih} + \lambda_{2ih} + \delta_h)}{\sum_k \exp(\lambda_{1ik} + \lambda_{2ik} + \delta_k)}\right],$$

where the heterogeneity term $\lambda_{1ih} + \lambda_{2ih}$ is $\lambda_1 \cdot \mathrm{Saleprice}_h \cdot \mathrm{Local}_i + \lambda_2 \cdot \mathrm{Saleprice}_h \cdot \mathrm{Income}_{ih} + \mathrm{area}_h \cdot \mathrm{Income}_{ih}$, and the expectation is computed by simulating income. Every house has an expected probability, and the joint probability is the product of these individual probabilities.
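The simulated expectation can be sketched as an average of logit probabilities over income draws. The function name and argument layout are hypothetical. The coefficient `lam3` on the area-income interaction is also an assumption made for illustration: the formula above shows no coefficient on that term, while the results table reports an Income*Sqft estimate.

```python
import math


def expected_probability(delta, price, area, bought, is_local,
                         income_draws, lam1, lam2, lam3):
    """Average logit probability of the bought house over simulated incomes.

    delta, price and area are lists over the houses in the choice set;
    income_draws are simulated incomes for the (unobserved) buyer.
    """
    total = 0.0
    for income in income_draws:
        utils = [
            delta[k]
            + lam1 * price[k] * is_local    # local/non-local term
            + lam2 * price[k] * income      # income-price interaction
            + lam3 * area[k] * income       # income-area interaction (assumed coefficient)
            for k in range(len(delta))
        ]
        top = max(utils)                    # subtract max for numerical stability
        expu = [math.exp(u - top) for u in utils]
        total += expu[bought] / sum(expu)
    return total / len(income_draws)


# Made-up check: with no heterogeneity and equal mean utilities, each of
# three houses is bought with probability one third.
p = expected_probability(
    delta=[0.0, 0.0, 0.0], price=[1.0, 2.0, 3.0], area=[1.0, 1.0, 1.0],
    bought=1, is_local=1, income_draws=[30_000.0, 45_000.0, 60_000.0],
    lam1=0.0, lam2=0.0, lam3=0.0,
)
```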

This likelihood is again maximized as in the earlier section. The results are as follows. The standard errors on the first stage were obtained by randomly picking 70% of the houses (2,000 out of 2,839) and running 49 such estimations.
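The resampling scheme can be sketched as follows. Two points are assumptions rather than details from the text: houses are drawn without replacement within each replication, and the standard error is taken as the standard deviation of the estimates across replications. The `estimator` argument stands in for the full first-stage estimation; a sample mean over made-up data is used here only so the example runs.

```python
import random
import statistics


def subsample_se(data, estimator, n_sub, n_reps, seed=0):
    """Standard deviation of an estimator across random subsamples.

    Each replication draws n_sub observations without replacement
    (an assumption) and re-runs the estimator on that subsample.
    """
    rng = random.Random(seed)
    estimates = [estimator(rng.sample(data, n_sub)) for _ in range(n_reps)]
    return statistics.stdev(estimates)


# Placeholder data and estimator, using the sample sizes from the text.
fake_data = [float(i) for i in range(2839)]
se = subsample_se(fake_data, statistics.mean, n_sub=2000, n_reps=49)
```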

Parameter           Estimate
Local*Soldprice     -4.97E-6 (1.60E-6)***
Income*Soldprice     1.64E-10 (2.54E-11)***
Income*Sqft          9.99E-10 (2.97E-10)***

Estimates obtained from all data and used for further analysis

Local*Soldprice     -4.86E-6
Income*Soldprice     1.59E-10
Income*Sqft          1.9E-9

The results from the second stage are as follows.

2nd Stage, SUR

Variable                Delta Equation                  Price Equation
Intercept                                               -2.966e+04 (4.692e+03)***
Soldprice               -1.592e-04 (2.610e-05)***
DOM                                                     -4.481e+00 (4.596e+00)
Sqft                     7.035e-03 (1.473e-03)***        3.397e+01 (8.342e-01)***
Baths                    1.149e+01 (1.136e+00)***        1.135e+04 (8.299e+02)***
Age                      3.662e-01 (1.985e-02)***       -2.032e+02 (2.080e+01)***
Newconstr                1.052e+01 (1.824e+00)***       -4.491e+02 (1.303e+03)
Garage                   3.929e+00 (6.909e-01)***       -5.747e+03 (4.888e+02)***
Local                                                   -4.707e+03 (1.042e+03)***
Income                   1.735e-03 (4.449e-05)***        3.589e-01 (5.766e-02)***
Masters                  3.363e+02 (1.106e+01)***        5.094e+04 (1.267e+04)***
HS WL                    1.723e+01 (1.767e+00)***        8.724e+03 (1.441e+03).
HS Har                   7.163e+00 (1.488e+00)***        6.283e+03 (1.354e+03)***
Age in neighborhood                                      1.166e+02 (3.032e+01)***
Sqft in neighborhood                                     4.144e+00 (2.182e+00).
Winter Months                                           -3.515e+02 (9.880e+01)***
Correlation in errors    0.129
Number of observations   2,824
R-squared adjusted       0.969 (no constant)             0.776

To find the difference in willingness to pay, we note that $(P_{\text{Non-local}} - P_{\text{Local}})/P_{\text{Local}} = \lambda_{\text{Local*Soldprice}}/\alpha$. This means locals pay 3.04% less than non-locals. Also, a person with 30,000 dollars more income would be willing to pay more by $(P_{\text{Income}+30{,}000} - P_{\text{Income}})/P_{\text{Income}} = \lambda_{\text{Income*Soldprice}} \cdot 30{,}000/\alpha$, which is about 3.00%. Higher income also means a higher preference for area.
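As a rough check of these two figures, the ratios can be recomputed from the reported point estimates. The sketch assumes that $\alpha$ is the magnitude of the Soldprice coefficient in the Delta equation (since $\delta_h = \beta X_h - \alpha p_h + \zeta_h$) and that the full-data first-stage estimates are the ones used; only magnitudes are compared, because the sign conventions are not spelled out.

```python
# Reported estimates (Section 3.4.2).
alpha = 1.592e-04        # magnitude of the Soldprice coefficient, Delta equation
lam_local = -4.86e-06    # Local*Soldprice, all data
lam_income = 1.59e-10    # Income*Soldprice, all data

# Relative price gap between non-locals and locals, in absolute value.
local_gap = abs(lam_local) / alpha

# Relative increase in willingness to pay for 30,000 dollars more income.
income_gap = lam_income * 30_000 / alpha

print(f"local gap:  {100 * local_gap:.2f}%")
print(f"income gap: {100 * income_gap:.2f}%")
```

Under these assumptions the ratios come out at roughly 3.05% and 3.00%, in line with the 3.04% and 3.00% reported above; the small difference in the first figure may reflect rounding of the published coefficients.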


4 What happens to prices if non-locals behaved like locals

Now that we know that such differences in preferences exist, we attempt to find out what would happen if non-locals behaved like locals. Our goal is to predict housing prices in the market if non-locals had the same willingness to pay as locals. Unlike considering the heterogeneity term alone, this shows the effect on the price of houses in the market as a whole. The approach here assumes that the new prices are such that the original probability of buying the house is the same as the probability after non-locals become locals. That is, the consumers in the market make the same choices.

For this simulation income heterogeneity was not considered. We use the $\delta_h$s and the willingness to pay obtained through the estimates of $\alpha$ and $\alpha_I$ for the counterfactual experiment. To do so, we find the $\lambda_{ih}$ by setting the dummy variable on locals, $I_i$, to 1 for all $i$. That is, all non-locals become locals. We then plug in these $\lambda_{ih}$ and write the probability of a sale as

$$\mathrm{Pr}_h = \frac{\exp(\lambda_{ih} + \delta_h + \alpha(p_h - p^*_h))}{\sum_k \exp(\lambda_{ik} + \delta_k + \alpha(p_k - p^*_k))},$$

where $p^*_h$ are the new predicted prices, which maximize the probability of sale. We assume houses would be sold in exactly the same manner after non-locals behave like locals. Thus, $p^*_h$ are those prices which maximize the probability of the observed sale. One way to obtain the new prices is to search for the best prices directly, but that is computationally demanding. So, we plug in $p^*_k = \beta X_k$, where $X$ are explanatory variables and $\beta$ are the parameters to be estimated. The results of this analysis are as follows.

Variable                   (1)                    (2)
Old Price                  0.9725 (0.0305)***     0.9727 (0.0322)***
Sqft                      -0.0113 (0.0841)       -0.0083 (0.0837)
Lotsize                   -0.0007 (0.0058)
Age                       -0.0008 (0.0029)
Income                     0.0886 (0.0856)        0.086 (0.091)
Masters                    1.1424 (1.7982)        0.806 (1.95)
West Lafayette high sch   -0.4607 (0.2091)***    -0.5008 (0.2291)***
Harrison high sch         -0.0837 (0.1921)

If non-locals are treated like locals, prices decrease on average by 0.41%. Prices in West Lafayette decrease by an additional $1,386, for a total of about 1.41%.

An analysis using micro data could improve estimates. Clearly, somegovernment or market intervention is required to remedy the situation.


5 References

1. D. Cutler and E. Glaeser, Are ghettos good or bad?, Quarterly Journal of Economics, 112, 827-872, 1997.

2. D. Cutler, E. Glaeser, and J. Vigdor, The rise and decline of the American ghetto, Journal of Political Economy, 107, 455-506, 1999.

3. P. Bayer, F. Ferreira, and R. McMillan, A Unified Framework for Measuring Preferences for Schools and Neighborhoods, Journal of Political Economy, 115, 588-638, 2007.

4. P. Bayer, R. McMillan, and K. Rueben, An Equilibrium Model of Sorting in an Urban Housing Market, NBER Working Paper No. 10865, 2004.

5. P. Bajari and M. Kahn, Estimating Housing Demand with an Application For Explaining Racial Segregation in Cities, Journal of Business and Economic Statistics, 2004.

6. S. Berry, J. Levinsohn, and A. Pakes, Automobile Prices in Market Equilibrium, Econometrica, 63(4), 841-90, 1995.

7. D. Epple, Hedonic prices and implicit markets: Estimating demand and supply functions of differentiated products, Journal of Political Economy, vol. 95, no. 1, 1987.

8. A. Nevo, Measuring Market Power in the Ready-to-Eat Cereal Industry, Econometrica, 69(2), 307-342, 2001.

9. S. Rosen, Hedonic prices and implicit markets, Journal of Political Economy, 82, 34-55, 1974.
