Review of Economics & Finance

Submitted on 24/Jan./2011

Article ID: 1923-7529-2011-02-69-14

An Expert System for Online Residential Properties Valuation [8]

Beatriz Larraz

Faculty of Law and Social Sciences, University of Castilla-La Mancha

Cobertizo San Pedro Martir, s/n, 45071 Toledo, SPAIN

Tel: +34-636-277-824 E-mail: [email protected]

Abstract: Recent legislation in Spain requires the periodic revision of the value of mortgaged residential properties. This requirement has gained importance since 2008 because of the fall in average housing prices in Spain. Traditionally, residential property valuation has relied on comparison with the prices of the nearest properties, which requires consulting human experts. This article presents an expert system that allows the valuation not only of individual properties (point estimates) but also of large portfolios of residential properties in the financial market framework, using spatial statistical methods (kriging). The final report provides, online and immediately, an updated market price estimate for each residential property located in Spanish territory.

JEL Classifications: G21, R31, R32, C01, C61, C13

Keywords: Expert system, House prices, Hedonic method, Kriging, Banking, Portfolios valuation

1. Introduction

In today's volatile real estate market, a precise valuation of housing is of great importance for

owners, investors, assessors, financial institutions, banks, credit agencies and insurance companies. In

the financial market, it is essential for lenders to have a valuation of the real property in order to

determine whether to provide and how to price a mortgage (Ming-Shann et al. 2009). In addition, in

the real estate market, valuations are necessary to perform any market analysis and pricing.

Banco de España, the national central bank and supervisor of the Spanish banking system,

recently amended its legislation on credit institutions (BOE, 2008), in compliance with European

Union (EU) directives (EC, 2006a; EC, 2006b). The aims of these directives are “the introduction of

rules concerning the taking up and pursuit of the business of credit institutions, and their prudential

supervision” and “the establishment of the capital adequacy requirements applying to investment firms

and credit institutions, the rules for their calculation and the rules for their prudential supervision”,

respectively. This new approach supports legislation that guarantees the entities' solvency and

stability, and attempts to make the legal requirements more sensitive to actual risks. In real estate

valuation, the Official State Gazette (BOE, 2008, p. 26518) states that the initial real property

valuation must be conducted by an approved valuation company (BOE, 2003). Then, the mortgaged

property value has to be revised each year if it is a commercial property or every three years in the

case of residential properties. This revision can be carried out through statistical methods, except in

the case of singular properties. In addition, when the market conditions indicate that general market

prices are changing, the credit risk reduction is affected, and additional valuations of the properties in

the affected loans should be conducted.

Cacdac and Warnock (2008) state that there should be mechanisms for determining the market value of a property; however, the valuation of residential properties has traditionally been based only on a comparison with houses recently sold or listed for sale and on knowledge of neighborhood trends. Although Belsky et al. (1998) stated more than ten years ago that “spatial research is critical in building next-generation mortgage finance business applications”, real estate and financial markets have not yet incorporated it into their research. In Spain, following national guidelines for producing estimates of the current market value of a property (BOE, 2003), a property assessor needs to physically visit the property; however, in the last few years, some online valuation systems have been developed. The traditional comparison method is not statistically advanced and may be biased, because the comparison procedure relies on its own valuations.

[8] The author wishes to thank Oscar D. and Susana C. for being interested in real estate research. This research was partially supported by the MICINN project CSO2009-11246.

ISSNs: 1923-7529; 1923-8401 © 2011 Academic Research Centre of Canada

Nevertheless, in the last two decades, several studies in the statistical and real estate literature

have recommended improvements to the real estate valuation procedures. Each study has improved

upon the estimation capacity of earlier ones, either increasing the number of housing characteristics

considered or developing new valuation methods. In this sense, most of the articles were based upon

hedonic models, which began with Rosen (1974). Malpezzi (2002) made a selective review of the

hedonic models applied to real estate valuation, and Goodman and Thibodeau (2003) developed an

interesting application in Dallas County (USA). Similarly, Stevenson (2004) applied hedonic pricing

models in Boston (USA) and Núñez et al. (2007) in Cordoba (Spain). Ellen and Voicu (2006) and

Ellen et al. (2007) used hedonic regression models in New York (USA) that explain the sale price of a

property.

Approximately twenty years ago, artificial intelligence was designed to replicate the human brain's learning process. Neural networks have since been applied to real estate valuation processes. Notable studies include Worzala et al. (1995) in Colorado (USA), Limsombunchai et al. (2004) in New Zealand, García Rubio (2004) in Albacete (Spain), Caridad et al. (2008) in Córdoba (Spain) and Peterson and Flanagan (2009) in Wake County, North Carolina (USA). Additionally, spatial econometric approaches have been used to estimate housing prices; e.g., Brasington and Hite (2005) developed spatial hedonic regressions in six North American cities and Anselin and Lozano-Gracia (2008) applied similar methods to Southern California (USA). The analytic network process, which combines quantitative and qualitative attributes, has also been applied, though infrequently, to property valuation (Aznar et al., 2009). Brint (2009) predicted a house's selling price by inflating its previous selling prices using the information provided by repeat sales. Finally, in a geostatistics framework, kriging methods have been applied to point estimation of real estate prices, as first used by Chica-Olmo (1995, 2007) in Granada (Spain) and more recently by, e.g., Montero and Larraz (2006, 2009) in Toledo (Spain).

An automated system that allows real-time valuations (not only of individual properties but also of large portfolios) in the financial market framework needs to be developed, and the statistical techniques also need to be updated. Average housing prices, as published quarterly in Spain, have decreased (an annual decrease of 7.8% as reported by the Housing Ministry in the third quarter of 2009; Housing Ministry, 2009), and there has also been a worrying decrease of confidence in the housing market. Therefore, a new, statistically sound and fully transparent model for estimating housing prices would help fill an important information gap and improve the efficiency of the real estate market.

The aim of this paper is to provide software that will solve this valuation problem in an automated

manner, where previously one or more human experts would need to be consulted. This expert system,

the 'Residential Properties Valuation Report' (RPVR), provides (i) an immediate, online and complete

description of the residential property that is being evaluated in Spain (including the information

provided by the land registry about the charges); (ii) real-time and updated market price estimation

calculated using the most advanced statistical methods currently available; (iii) a statistical estimation

procedure (the most accurate one given the available data, usually obtained using kriging); and (iv) the

change in the neighbourhood average quarterly prices in the last years as well as the environmental

conditions.

In particular, data describing the real estate properties have been obtained from the Virtual Office

of Cadastre, which provides a description of the property (surfaces, location, use, shape, boundaries,

cartographic representation, type and quality of constructions), legal information (identification of

holders or owners) and economic information (cadastral values of buildings, valuation criteria). This

system also considers other useful information (e.g., property condition or quality) that improves the

property valuation.

RPVR provides information on the neighbourhood in terms of communications, noise

pollution, green zones, crime, street cleanliness, and local resources (hospitals, schools,

commercial properties, filling stations). These variables were obtained from the last census conducted

by the National Statistics Institute (INE, 2009).

Lastly, but most importantly, RPVR provides a market price valuation using spatial estimation

methods (kriging when possible) to analyze data from home sales advertisements. Data will be

obtained quarterly. To provide point estimates, this study used the most convenient database and

the best estimation method (Section 2). Section 3 describes the estimation procedure. Section 4

presents the estimation results obtained in this project. Finally, Section 5 presents the conclusions and

areas for future research.

2. Selecting the sample data set and estimation method

This section analyses the options available to construct a sample data set that could be used in

RPVR in Spain, where data on transaction prices are not available. The variables examined include

housing prices, geographical location and characteristics of residential properties that were sold or

listed for sale in the previous three months. It also describes the advantages and disadvantages of the

estimation methods that could be used to value residential properties.

Spanish General Notaries Council and Land Registry (transaction data): Land registry and

regional notaries councils in Spain have prices and deeds (describing all the physical characteristics)

of the properties that have been sold. However, these databases are not suitable because estimations

may be biased downwards, because portions of the transactions may have been undeclared in order to

avoid paying taxes. Moreover, these are private data and are not available.

Real estate agencies (transaction data): These agencies have the real transaction prices of all the

properties that have been sold by an agency. However, this database includes neither geographical

locations nor data from properties that have been sold 'person to person'. Moreover, due to the large

number of real estate agencies in Spain (more than 180,000 in 2006, see ABC, 2007) it is impossible

to obtain this information for the whole country in an automated way.

Real estate advertisements (intended sales prices): Housing prices provided by these sources are

not the actual selling prices, but they best capture the 'market price' as defined by official authorities

in Spain (see Housing Ministry, 2009). Real estate websites have become open references on housing

price evolution in Spain as an alternative to the statistics generated by valuation companies (including

banks and financial entities), consultants, or even Spanish and regional governments. In fact, the

Housing Ministry did not detect the decrease in housing prices until January 2009 (data from Q4 2008

corresponded to a -3.2% annual variation rate, Housing Ministry, 2009) when the first decrease in the

last fifteen years was official. As stated in the introduction, this study chose this data source due to the

great difficulty of obtaining real information from the market and because it is possible to

automatically obtain a large and complete quarterly data set. Consequently, final estimations will

reflect the market prices.

Nevertheless, it is important to stress the great variability among housing prices even when

considering neighbouring properties with similar characteristics, which underscores the complexity of

this project and the non-existence of a single housing price.

According to Montero and Larraz (2010), housing prices not only depend on specific housing

characteristics but also on the spatial location of a house in relation to its environment. From this point

of view, in order to provide statistically sound point estimates, it is necessary to use statistical

tools that take into account the importance of space. It is essential to consider the geographical

coordinates of sample locations and the location of the focal property.

Nevertheless, the real estate literature still describes methods that do not consider space, e.g., the

most commonly used complete hedonic models or artificial neural networks. In the first case, even

suggesting a hedonic model that is as complete as possible in considering the relevant characteristics,

including local neighbourhood ones, we get a housing price explained by the independent variables

(Goodman, 1978). However, this method only considers the general averaged characteristics of the

census tract of the property and does not take into account geographical location or distance to sample

locations. Consequently, the price would be the same for every similar residential property in the same

census tract (see Caridad et al. 2008), which does not allow for accurate point valuations. The second

method, the neural network, is a non-linear, statistical data-modelling tool, and it is an abstract

simulation of the biological neural networks formed by interconnected nodes. This replicates the

human learning process, where the structure is based on external or internal information flowing

through the network during the learning phase. The basic neural network model consists of an input

layer (housing characteristics), one or more hidden layers, and an output layer, which corresponds to

the housing price estimation. After the training phase, the neural network 'identifies' certain

characteristics with a certain price. Though this method can include the geographical locations as

another property feature (García Rubio, 2004), in general, most of the real estate applications do not

take space into account (Limsombunchai et al., 2004). Finally, a comparative model based on the

analytic network process (Aznar et al. 2009) is suitable where scant information is available or when

qualitative variables are involved in the model. However, it is not suitable for the valuation of large

portfolios.

As an alternative, spatial statistics and spatial econometrics consider the data's geographical

references as the most important feature of the property due to the great amount of valid information it

contains. As Tobler (1979) states, “Everything is related to everything else, but closer things more so”.

Spatial econometrics differs from traditional econometrics in two respects: (i) it considers the spatial

dependence among sample variables; and (ii) it attends to the spatial heterogeneity in the model

parameters, which change across space. Geographical locations are added to the model through a

weighting matrix (see Anselin, 1998). Spatial statistics or geostatistics incorporate the structure of

spatial dependence among variables into classic statistics. Tobler's statement forced traditional

statisticians to consider the spatial dependence of georeferenced variables. In earth sciences, variable

values tend to be more similar as the distance between locations decreases; this is also true for

economic variables, especially real estate prices, unemployment rate, Gross Domestic Product per

capita, and educational level.

In geostatistics, kriging methods make the best use of the structure of the spatial dependence that arises in residential property prices in order to predict the value at a non-observed location. This is kriging's main advantage compared to other interpolation methods (e.g., inverse distance weighting in Johnston et al., 2001 or Li and Revesz, 2002; splines in Goodman and O'Rourke, 1997; polynomial regression in Buchanan, 1995, Zienkiewikz and Taylor, 2000 or Li and Revesz, 2004; n-D Delaunay tessellation in Watson, 1981, among others). Kriging is a minimum mean squared error statistical procedure for spatial prediction that assigns a differential weight to observations that are spatially closer to the location being predicted, and it provides the best linear unbiased point estimate. With ordinary kriging, the weights sum to one and are derived from the estimated variogram. (See Wackernagel, 2003, chapter 11, for statistical details on kriging.) Kriging estimates are more accurate than those obtained from any other linear unbiased estimator and the most accurate in the Gaussian case.
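
The ordinary kriging estimator described above can be sketched in a few lines. This is a minimal illustration and not the RPVR implementation: the function names, the exponential variogram parameters and the four sample prices are hypothetical, and the factor 3 in the exponential model is one common convention for the practical range. The weights solve the ordinary kriging system, in which a Lagrange multiplier enforces that they sum to one.

```python
import math

def exponential_variogram(h, nugget, sill, rng):
    """Exponential variogram model; gamma(0) = 0 by convention."""
    if h == 0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-3.0 * h / rng))

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ordinary_kriging(points, values, target, gamma):
    """Return (estimate, weights) at `target` from samples at `points`."""
    n = len(points)
    # Left-hand side: variogram between samples, bordered by the unbiasedness constraint.
    a = [[gamma(math.dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    # Right-hand side: variogram between each sample and the target location.
    b = [gamma(math.dist(p, target)) for p in points] + [1.0]
    weights = solve(a, b)[:n]  # the last unknown is the Lagrange multiplier
    estimate = sum(w * v for w, v in zip(weights, values))
    return estimate, weights

def gamma(h):
    # Hypothetical variogram parameters, for illustration only.
    return exponential_variogram(h, nugget=10000.0, sill=90000.0, rng=500.0)

# Hypothetical homogeneous prices (EUR/m2) at four UTM locations (metres).
points = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]
values = [2000.0, 2200.0, 2100.0, 2500.0]
estimate, weights = ordinary_kriging(points, values, (50.0, 50.0), gamma)
```

Because the target in this example is equidistant from four symmetric samples, the weights come out equal; with irregular sample layouts the weights differ according to the fitted variogram.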

3. Estimation procedure

The estimation of market prices in RPVR consists of five phases: (i) collecting sample data from

real estate sales; (ii) standardizing and mapping the postal addresses from these data; (iii) converting

the sale prices data set into prices of properties with the same characteristics; (iv) using interpolation

methods that depend on the sample size of each municipality; and finally, (v) using cross-validation to

evaluate the forecasting accuracy of the models (see Figure 1). The expert system was programmed

using the R language (R Development Core Team, 2008), while data were managed using an SQL

Server (see Figure 2). All phases were implemented using batch processing.

Figure 1. Valuation procedure

RPVR maps the postal addresses of the database using an expert system previously implemented.

Once every residential property possesses the geographical coordinates (latitude and longitude) of the

number and street where it is located, then the coordinates are transformed into Universal Transverse

Mercator coordinates (UTM). Then, we calculated the distance between each pair of sample locations

as well as the distance between sample locations and the property we were valuing. We fixed the

UTM Zone when the boundaries of the municipality were in multiple zones (i.e., 29N or 30N) in order

to calculate the distances.
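
The zone-fixing and distance steps can be sketched as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the rule of choosing the most frequent zone in a municipality is an assumption (the text does not say how the zone was fixed), and the conversion from latitude and longitude to UTM easting and northing is not shown because it would normally be delegated to a projection library.

```python
import math
from collections import Counter

def utm_zone(longitude_deg):
    """Standard 6-degree UTM zone number for a longitude in [-180, 180)."""
    return int((longitude_deg + 180.0) // 6) + 1

def fixed_zone(longitudes):
    """Pick a single zone for a municipality (assumption: the most frequent one)."""
    return Counter(utm_zone(lon) for lon in longitudes).most_common(1)[0][0]

def distance_matrix(utm_points):
    """Pairwise Euclidean distances (metres) between (easting, northing) pairs
    that have all been projected into the same UTM zone."""
    return [[math.dist(p, q) for q in utm_points] for p in utm_points]

# Hypothetical municipality straddling the 29N/30N boundary at 6 degrees West.
zone = fixed_zone([-6.1, -5.9, -5.8])

# Hypothetical UTM coordinates (metres) of two advertised properties.
distances = distance_matrix([(440000.0, 4474000.0), (440300.0, 4474400.0)])
```

Projecting every property of a municipality into one zone keeps the Euclidean distances comparable, which is the reason for fixing the zone before computing them.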

Figure 2. Residential Properties Valuation Report procedure.

Table 1. Characteristics of the residential properties data set

Variable | Description | Data source
--- | --- | ---
Price | Total price of the announced property (€) | Websites
Floor surface | Constructed surface without common zones (m²) | Virtual Office of Cadastre
Price per square meter | Price/Floor surface (€/m²) |
Geographical coordinates | Latitude-longitude and UTM coordinates | Own elaboration (*)
Age of the structure | Property age (intervals) | Virtual Office of Cadastre
Floor (**) | Floor on which the residential property is located | Virtual Office of Cadastre
Type | Big apartment, penthouse, apartment, loft, duplex apartment, house | Websites
Bedrooms | Number of rooms excluding kitchens, bathrooms and living rooms | Websites
Bathrooms | Number of bathrooms and toilets included in the property | Websites
Condition | A property can be in good condition or need renovation | Websites
Quality | We considered three categories: normal, good, and luxury | Websites
Heating | Yes/No | Websites
Basement | Yes/No | Websites
Swimming pool | Yes/No | Websites
Garden or common areas | Yes/No | Websites
Garage | Yes/No | Websites
Elevator (**) | Yes/No | Websites

Notes: (*) Own elaboration from street and number; (**) Not considered for single-family houses.

The mapped data set consists of 434,618 residential properties (Table 1) from June 2008. Of these

properties, 80.18% were apartments and 19.82% were single-family houses. The estimation procedure

grouped properties according to habitability and architectural differences, which implies a range of

prices even in neighboring locations.

In order to debug the data, we identified inaccuracies and outliers in the surface and price variables. Inaccuracies were attributed to errors in the advertisements. We excluded apartments that were smaller than 20 m² or bigger than 850 m² and houses that were smaller than 20 m² or bigger than 5,000 m². We considered properties that were reported to be less than 100 €/m² or more than 15,000 €/m² to be inaccuracies. When detecting outliers, we replaced 2.5% of the sample at the high and low ends with the most extreme remaining values. After debugging and geo-referencing, the final data set consisted of 165,872 advertisements.
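
The two cleaning rules above (range filters, then replacing the tails with the most extreme remaining values, a procedure usually called winsorizing) can be sketched as follows. The function names and the record layout are hypothetical, and the way the tail size is rounded is an assumption; the thresholds are the ones quoted in the text.

```python
# Thresholds quoted in the text; the names are illustrative.
SURFACE_LIMITS_M2 = {"apartment": (20.0, 850.0), "house": (20.0, 5000.0)}
PRICE_M2_LIMITS = (100.0, 15000.0)

def is_plausible(kind, surface_m2, price_eur):
    """Keep an advertisement only if its surface and price per m2 are within range."""
    lo, hi = SURFACE_LIMITS_M2[kind]
    if not lo <= surface_m2 <= hi:
        return False
    price_m2 = price_eur / surface_m2
    return PRICE_M2_LIMITS[0] <= price_m2 <= PRICE_M2_LIMITS[1]

def winsorize(values, tail=0.025):
    """Replace the lowest and highest `tail` share of values with the most
    extreme remaining values (assumption: tail size is rounded down)."""
    n = len(values)
    k = int(n * tail)
    if k == 0:
        return list(values)
    ordered = sorted(values)
    low, high = ordered[k], ordered[n - k - 1]
    return [min(max(v, low), high) for v in values]

# Hypothetical prices per m2: with 40 values, one value is replaced at each end.
sample = [float(v) for v in range(1, 41)]
cleaned = winsorize(sample)
```

Winsorizing keeps the sample size unchanged, unlike trimming, which would discard the extreme advertisements altogether.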

Table 2. Distribution of housing characteristics

Variable | Ranges and categories (share of the sample)
--- | ---
Price/m² (€/m²) | 100-500: 0.78%; 500-1000: 5.66%; 1000-2000: 43.05%; 2000-3000: 34.31%; 3000-4000: 12.21%; 4000-5000: 2.94%; 5000-7000: 0.90%; 7000-15000: 0.45%
Age of the structure (years) | 0-5: 18.74%; 5-10: 10.33%; 10-20: 8.13%; 20-30: 9.87%; 30-50: 8.05%; 50-70: 0.41%; >70: 0.24%; NA: 44.23%
Floor (*) | 1: 20.52%; 2: 17.56%; 3: 13.17%; 4: 8.99%; 5: 4.27%; 6: 2.25%; 7: 1.34%; NA: 19.24%
Type | Big apartment: 57.12%; Penthouse: 5.40%; Apartment: 3.78%; Loft: 0.20%; Duplex: 6.12%; House: 27.38%
Condition | Good: 34.58%; Needs renovation: 65.42%
Quality | Normal: 99.88%; Good: 0.07%; Luxury: 0.05%

Variable | No | Yes | N.A.
--- | --- | --- | ---
Heating | 39.84% | 60.16% |
Basement | 86.96% | 13.04% |
Swimming pool | 92.75% | 7.25% |
Garden (a) | 89.37% | 10.63% |
Garage | 80.79% | 19.21% |
Elevator (*) | 12.37% | 38.19% | 49.45%

Note: Own elaboration. (*) Not available for houses. (a) Garden or common areas

Table 2 reports the percentages by groupings for property characteristics computed with the initial

data set. In particular, 44.2% of data on age of structure were missing, which led us to search the

official Land Registry for these data, as we did in the floor surface case. Most of the data comprise big

apartments (the most common type of dwelling in Spain) that are not located very high (usually no higher than

seven floors up). Of these properties, 65.4% need some improvements according to the second market

framework.

Then, in order to isolate the spatial component, we transformed the original sample prices into

homogeneous prices, which are the prices of properties with the same characteristics. A homogeneous

apartment was considered to be 80 m2, less than five years old, situated on the first floor, with two

bedrooms (in addition to the kitchen, bathroom and living room), one bathroom, in need of renovation,

normal quality, without heating, a basement, a swimming pool, a garden, a garage or elevator. A

homogeneous house was 100 m2, less than five years old, with three bedrooms (in addition to the

kitchen, bathroom and living room), two bathrooms, in need of renovation, normal quality, without

heating, a basement, or a swimming pool.

Hedonic price theory assumes that a residential property can be decomposed into separate

components that determine the price. The number or presence of attributes associated with the

commodities defines a set of implicit or “hedonic” prices (Rosen, 1974). These hedonic prices are the

contribution to the total value of the specific property features. They are estimated by employing

multiple regression techniques. In particular, following Shonkwiler et al. (1986) and Limsombunchai et

al. (2004), among others, this study employed the semi-log model, which assumes a constant

percentage of partial effects because price is a very sensitive and volatile component. We created one

hedonic model for each municipality where more than 400 advertisements were available. In the

remaining municipalities, we considered the neighbouring properties for sale through a cluster

procedure joining homogeneous municipalities. To estimate the contribution of each feature, all 434,618

observations were considered, including those that were not geo-referenced.
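
For reference, the semi-log specification mentioned above can be written in a generic textbook form. The notation is assumed here and is not taken from the paper: $P_i$ is the price per square meter of property $i$, $x_{ik}$ its $k$-th characteristic, and $x_k^{\mathrm{ref}}$ the value of that characteristic for the homogeneous (reference) property.

```latex
\ln P_i = \beta_0 + \sum_{k} \beta_k \, x_{ik} + \varepsilon_i ,
\qquad
\frac{\partial \ln P_i}{\partial x_{ik}} = \beta_k .

% One possible way to express a homogeneous price under this model:
P_i^{\mathrm{hom}} = P_i \, \exp\!\Bigl( - \sum_{k} \beta_k \, \bigl( x_{ik} - x_k^{\mathrm{ref}} \bigr) \Bigr)
```

Under this form, a one-unit change in $x_{ik}$ multiplies the price by $e^{\beta_k}$, which is the constant percentage effect that the semi-log model assumes.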

Once the data set contained homogeneous prices, we estimated the prices of residential properties located in Spanish territory for 8,108 municipalities. The estimation procedure used for each municipality depended on the number of properties available. Where the number of mapped properties exceeded 100 (as in the largest cities, such as Madrid, Barcelona, Valencia and Seville), kriging estimations were computed (following Montero and Larraz, 2008). Where only 11 to 99 mapped properties were available, interpolations were conducted with the inverse distance weighting method (IDW, see Li and Revesz, 2002), which is a weighting mechanism that assigns more influence to the data points near the location where the value is being estimated. In rural areas where fewer than ten mapped properties were for sale, the averages for values from non-mapped properties were included. In municipalities with fewer than 10 observations (mapped or non-mapped), we aggregated neighbouring properties. Table 3 reports the characteristics of the estimation procedure. Finally, the homogenisation procedure was undone to obtain final estimations using the inverse coefficients resulting from the hedonic model.

Table 3. Criteria and characteristics of the estimation procedure.

Method | Criterion | Municipalities (Apt.) | Municipalities (Houses) | Sample size (Apt.) | Sample size (Houses) | Percentage (Apt.) | Percentage (Houses)
--- | --- | --- | --- | --- | --- | --- | ---
Kriging | >100 mapped data | 115 | 7 | 88,872 | 1,541 | 66.80% | 4.69%
IDW | 11 to 99 mapped data | 656 | 511 | 37,710 | 22,291 | 28.34% | 67.89%
Municipality average | >10 data (mapped and non-mapped) | 710 | 1,032 | 4,909 | 6,135 | 3.69% | 18.68%
Zone average | <10 data (mapped and non-mapped) | 6,627 | 6,558 | 1,550 | 2,864 | 1.16% | 8.72%
Total | | 8,108 | 8,108 | 133,041 | 32,831 | 100% | 100%

Note: Own elaboration. Apt.: Apartments
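
The sample-size rule and the IDW interpolator can be sketched as follows. This is a hedged illustration with hypothetical function names. The source leaves the boundary cases open (exactly 100 mapped properties, or exactly 10 observations), so the handling of those cases below is an assumption, as is the distance exponent of 2.

```python
import math

def choose_method(n_mapped, n_total):
    """Select the estimation method for a municipality from its sample sizes.
    Boundary handling (n_mapped == 100, n_total == 10) is an assumption."""
    if n_mapped > 100:
        return "kriging"
    if n_mapped >= 11:
        return "idw"
    if n_total >= 10:
        return "municipality_average"
    return "zone_average"

def idw(points, values, target, power=2.0):
    """Inverse distance weighting: nearer samples receive larger weights."""
    num = den = 0.0
    for p, v in zip(points, values):
        d = math.dist(p, target)
        if d == 0:
            return v  # the target coincides with a sample location
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den

# Hypothetical homogeneous prices (EUR/m2) at two UTM locations (metres).
midpoint_estimate = idw([(0.0, 0.0), (2.0, 0.0)], [1000.0, 3000.0], (1.0, 0.0))
```

Unlike kriging, IDW needs no variogram, which makes it a reasonable fallback where the sample is too small to estimate the spatial dependence structure reliably.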

4. Results

This section reports the main results of the project phases. In order to illustrate the results of the homogenisation procedure, statistically significant coefficients from the hedonic model for the city of Madrid are reported in Table 4. To avoid multicollinearity, each toilet was considered a bathroom. Weighted least squares was used. The signs of the coefficients of the model are in the directions hypothesised (i.e., less than or greater than one after exponentiation). Variables were removed when the signs of their coefficients were not in the direction predicted. In particular, the age of the building had a coefficient greater than one, which does not fit well with the general opinion that age makes a building less desirable; older and more expensive buildings are, in general, located in the city centre, while new and cheaper buildings are in outlying areas. The coefficients from the model for properties in Madrid indicate that the larger the property, the lower the price per square meter. Penthouses and small apartments were priced higher than big apartments, lofts and duplexes. People preferred the fifth or sixth floor and, as expected, luxury apartments were more costly than normal apartments; having an elevator also increased the price.

Table 4. Parameter estimates from the hedonic model for Madrid.

Variable            Estimate      t-value   P(>|t|)     Explanation
Intercept           3,932.38      1,924.4   <2e-16
NumM2H              0.9990        -14.6     <2e-16      Decrease of 0.1% for each m² above 80 m²
CodType P1          1.0533        10.0      <2e-16      Increase of 5.3% if penthouse
CodType P2          1.0550        8.8       <2e-16      Increase of 5.5% if small apartment
CodType P3          0.8450        -11.6     <2e-16      Decrease of 15.5% if loft
CodType P4          0.9374        -7.7      <2e-16      Decrease of 6.2% if duplex
NumRooms 01PP       1.1457        33.3      <2e-16      Increase of 14.5% if it has one room
NumRooms 03PP       0.8976        -34.2     <2e-16      Decrease of 10.2% if it has three rooms
NumBathrooms 02PP   1.1131        29.6      <2e-16      Increase of 11.3% if it has two bathrooms
CodFloor 2          1.0171        3.3       0.0008      Increase of 1.7% if it is on the second floor
CodFloor 5          1.0469        6.7       1.68e-11    Increase of 4.5% if it is on the fifth floor
CodFloor 6          1.0735        8.7       <2e-16      Increase of 7.3% if it is on the sixth floor
CodFloor 12         0.9093        -7.3      2.69e-13    Decrease of 9.0% if it is on the twelfth floor
CodQuality 2        1.0500        2.5       0.0097      Increase of 5.0% if it is good quality
CodQuality 3        1.0944        3.8       0.0001      Increase of 9.4% if it is luxury
Elevator 1          0.8883        -27.8     <2e-16      Decrease of 11.1% if it does not have an elevator
Elevator 2          1.0817        15.7      <2e-16      Increase of 8.1% if it has an elevator
R²                  0.2
F-statistic         368.9 on 29             <2.2e-16

Note: Own elaboration. The omitted (reference) categories are: surface (80 m²), age (<5 years old), floor (first), bedrooms (2), bathrooms (1), condition (renovations needed), quality (normal), heating (no), basement (no), swimming-pool (no), garden (no), garage (no), elevator (no information).
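To illustrate how the estimates in Table 4 can be read, the following minimal sketch combines them multiplicatively. It assumes, which the text does not state explicitly, that the intercept is the baseline price in €/m² of a property in the reference categories and that each estimate acts as a multiplicative factor on that baseline. The function name `price_per_m2` and the attribute labels are hypothetical and chosen only for this example.

```python
# Estimates copied from Table 4; all names below are illustrative.
INTERCEPT = 3932.38              # assumed: baseline price in EUR per m2
FACTOR_PER_M2_ABOVE_80 = 0.9990  # applied once per m2 above 80 m2
FACTORS = {
    "penthouse": 1.0533,
    "small_apartment": 1.0550,
    "loft": 0.8450,
    "duplex": 0.9374,
    "one_room": 1.1457,
    "three_rooms": 0.8976,
    "two_bathrooms": 1.1131,
    "floor_2": 1.0171,
    "floor_5": 1.0469,
    "floor_6": 1.0735,
    "floor_12": 0.9093,
    "good_quality": 1.0500,
    "luxury": 1.0944,
    "no_elevator": 0.8883,
    "elevator": 1.0817,
}


def price_per_m2(surface_m2, attributes):
    """Baseline price multiplied by the factor of every attribute that applies."""
    extra_m2 = max(surface_m2 - 80, 0)  # the table only describes surfaces above 80 m2
    price = INTERCEPT * FACTOR_PER_M2_ABOVE_80 ** extra_m2
    for name in attributes:
        price *= FACTORS[name]
    return price


reference = price_per_m2(80, [])
example = price_per_m2(100, ["penthouse", "floor_5", "elevator"])
print(round(reference, 2), round(example, 2))
```

Under these assumptions, a 100 m² penthouse on the fifth floor with an elevator is priced above the reference property, because the three premiums outweigh the surface discount.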

In the municipalities where kriging was used, the variogram was first computed to estimate the structure of spatial dependence. The software developed chooses the best theoretical variogram model by minimizing the sum of squared errors. It also determines the most suitable stationary variogram model and the most suitable neighbourhood for calculating it. Figure 3 shows the experimental and fitted variograms for the cities of Madrid and Barcelona. The high variability of housing prices is apparent at small distances. The clear structure of the spatial dependence is easily modelled in both cases (it follows an exponential model in Madrid and a spherical model in Barcelona).
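The model selection step described above can be sketched as follows. This is only an illustration, not the software used in the study: it assumes the classical (Matheron) estimator for the experimental variogram, restricts the candidates to the exponential and spherical models, and parameterises the exponential model with a practical range (factor 3). The function names are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit


def exponential(h, nugget, sill, rng):
    # Exponential model; `rng` is treated as the practical range (assumption).
    return nugget + sill * (1.0 - np.exp(-3.0 * h / rng))


def spherical(h, nugget, sill, rng):
    # Spherical model; it reaches nugget + sill exactly at distance `rng`.
    r = np.minimum(h / rng, 1.0)
    return nugget + sill * (1.5 * r - 0.5 * r ** 3)


def experimental_variogram(coords, values, n_lags=12, max_dist=None):
    """Half the mean squared difference between pairs, per distance bin."""
    i, j = np.triu_indices(len(values), k=1)
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    half_sq_diff = 0.5 * (values[i] - values[j]) ** 2
    if max_dist is None:
        max_dist = dist.max() / 2.0
    edges = np.linspace(0.0, max_dist, n_lags + 1)
    lags, gammas = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (dist > lo) & (dist <= hi)
        if mask.any():  # skip empty bins
            lags.append(dist[mask].mean())
            gammas.append(half_sq_diff[mask].mean())
    return np.array(lags), np.array(gammas)


def fit_best_model(lags, gammas):
    """Fit each candidate model and keep the one with the smallest SSE."""
    candidates = {"exponential": exponential, "spherical": spherical}
    p0 = [gammas.min(), max(gammas.max() - gammas.min(), 1e-9), lags.max() / 2.0]
    bounds = ([0.0, 0.0, 1e-9], [np.inf, np.inf, np.inf])
    best = None
    for name, model in candidates.items():
        params, _ = curve_fit(model, lags, gammas, p0=p0, bounds=bounds)
        sse = float(np.sum((model(lags, *params) - gammas) ** 2))
        if best is None or sse < best[2]:
            best = (name, params, sse)
    return best
```

Given an array of coordinates and an array of prices, `fit_best_model(*experimental_variogram(coords, prices))` returns the name of the selected model, its fitted nugget, partial sill and range, and the corresponding sum of squared errors.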

ISSNs: 1923-7529; 1923-8401 © 2011 Academic Research Centre of Canada


Figure 3. Variogram fitting: cases of Madrid and Barcelona

Goodness-of-fit statistics were calculated using cross-validation, based on a leave-one-out criterion (e.g., Sinclair and Blackwell 2002, p. 221). Specifically, each observation was removed in turn and the variograms (Figure 3) were used for model estimation. The resulting model was then used to predict the value of the property at the location of the removed point; the prediction was compared to the actual observed value, and the robustness of the prediction was assessed. Predicted values are robust when their standardised values are in the interval [-2.5, 2.5].
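The leave-one-out procedure can be sketched as below. The sketch assumes ordinary kriging with a fixed, already fitted variogram, and it interprets the standardised value as the prediction error divided by the kriging standard deviation; both points are assumptions rather than details confirmed by the text. All function names are hypothetical.

```python
import numpy as np


def exp_variogram(h, nugget=0.0, sill=1.0, rng=1.0):
    """Exponential variogram; gamma(0) is 0 by definition."""
    h = np.asarray(h, dtype=float)
    return np.where(h > 0, nugget + sill * (1.0 - np.exp(-3.0 * h / rng)), 0.0)


def ordinary_kriging(coords, values, target, variogram):
    """Ordinary kriging prediction and kriging variance at `target`."""
    n = len(values)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    # Kriging system: variogram matrix bordered by the unbiasedness constraint.
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = variogram(dists)
    a[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = variogram(np.linalg.norm(coords - target, axis=1))
    solution = np.linalg.solve(a, b)
    weights, lagrange = solution[:n], solution[n]
    prediction = float(weights @ values)
    variance = float(weights @ b[:n] + lagrange)
    return prediction, variance


def leave_one_out(coords, values, variogram):
    """Standardised errors: (observed - predicted) / kriging standard deviation."""
    z = np.empty(len(values))
    for i in range(len(values)):
        keep = np.arange(len(values)) != i  # drop observation i
        pred, var = ordinary_kriging(coords[keep], values[keep], coords[i], variogram)
        z[i] = (values[i] - pred) / np.sqrt(var)
    return z


def is_robust(z, limit=2.5):
    """Flag predictions whose standardised error lies in [-limit, limit]."""
    return np.abs(z) <= limit
```

The variogram is held fixed across all removals, as in the procedure described above; refitting it for every removed observation would be a different (and much more costly) design.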

After the three estimation procedures were conducted (kriging, inverse distance weighting or average), the distributions of different evaluation measures were compared to analyse the accuracy of the techniques. Table 5 reports three forecasting accuracy statistics. The root mean square error (RMSE) measures the square root of the mean of the squared differences between final estimations and real prices, $\sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{p}_i - p_i)^2}$; the absolute mean error (AME) measures the mean of the differences in absolute values between final estimations and real prices, $\frac{1}{n}\sum_{i=1}^{n}|\hat{p}_i - p_i|$; and the relative mean error (RME) measures the mean of the relative differences in absolute values between final estimations and real prices in percentages, $\frac{1}{n}\sum_{i=1}^{n}\frac{|\hat{p}_i - p_i|}{p_i}\times 100$. The results obtained for the ten cities with the most advertisements are reported.
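As a small worked illustration of these three statistics, the sketch below computes them for a pair of arrays of estimated and real prices. The function name `accuracy_statistics` and the toy prices are hypothetical and are not taken from the study.

```python
import numpy as np


def accuracy_statistics(estimated, real):
    """Return (RMSE, AME, RME in %) for estimated versus real prices."""
    estimated = np.asarray(estimated, dtype=float)
    real = np.asarray(real, dtype=float)
    diff = estimated - real
    rmse = float(np.sqrt(np.mean(diff ** 2)))          # root mean square error
    ame = float(np.mean(np.abs(diff)))                 # absolute mean error
    rme = float(np.mean(np.abs(diff) / real) * 100.0)  # relative mean error, %
    return rmse, ame, rme


# Toy prices in EUR per house, for illustration only.
rmse, ame, rme = accuracy_statistics([110_000, 180_000, 300_000],
                                     [100_000, 200_000, 300_000])
print(rmse, ame, rme)
```

Because the RMSE and AME are expressed in €/house while the RME is a percentage, the first two grow with the price level of a municipality and the third does not.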

According to Table 5, predictions were accurate on average, with a 13% relative mean error for the whole country. Madrid, which had the greatest number of advertisements, had an even lower RME (12%). In the tourist city of Murcia, the results were less accurate on average (RME of 19%) due to the great variability in the database, which often occurs in coastal municipalities due to real estate investment and speculation. Furthermore, the absolute mean error indicates that valuations differ from their real values by almost 41,000 €/house in Madrid (about 33,000 €/house on average in Spain), which is a small difference relative to the high prices of residential properties in the capital.

As Table 5 shows, the RME indicates that the best results were obtained when kriging was used, followed by IDW and the simple average. However, the RMSE and AME are higher for kriging than for IDW and the simple average; this is because kriging was applied in the larger cities, where prices per square meter were also higher.


Table 5. Forecast evaluation results for the cities with the most home sales: average accuracy statistics, depending on the estimation method used

Municipality        RMSE (a)     AME (a)      RME       Number of advertisements
Madrid              58,193.54    40,895.12    12.11%    19,247
Barcelona           69,435.41    49,076.91    13.10%    4,819
Valencia            65,508.89    46,171.84    16.25%    2,934
Sevilla             70,439.90    48,365.82    15.88%    2,068
Málaga              47,406.33    33,139.20    13.59%    2,052
Alicante            55,241.88    37,207.11    16.68%    2,028
Zaragoza            56,027.41    39,276.86    13.85%    1,938
Palma de Mallorca   56,561.09    37,441.14    13.94%    1,306
Terrassa            41,488.51    31,539.80    11.51%    1,235
Murcia              65,850.96    44,982.73    19.03%    981
Total Spain         48,497.26    33,058.65    13.16%    165,872
Kriging             43,131.13    33,245.12    12.74%    90,413
IDW                 34,861.97    27,409.88    13.78%    60,001
Average             31,591.21    24,159.87    14.20%    15,458

Note: Own elaboration. RMSE: root mean square error; AME: absolute mean error; RME: relative mean error. (a) Measured in €/house, on average.

To assess the estimation power of the expert system, the distribution of the relative mean error was examined. For the four cities with the most data, Table 6 reports the percentage and number of advertisements whose relative mean error lies in the cited intervals (in percentages), as well as the mean error and the number of advertisements used in the procedure.

Table 6. Distribution of the relative mean errors for the cities with the most advertisements

RME interval (%)   Madrid            Barcelona         Valencia          Sevilla
                   %       Nº        %       Nº        %       Nº        %       Nº
0-5                25.17   4,844     23.55   1,135     17.92   526       19.20   397
0-10               48.09   9,256     44.74   2,156     35.84   1,052     38.20   790
0-20               80.69   15,530    76.76   3,699     67.22   1,973     68.52   1,417
>20                19.31   3,717     23.24   1,120     32.74   961       31.48   651
No. Advert.                19,247            4,819             2,934             2,068
Mean error         12.11             13.10             16.25             15.88

Note: Own elaboration. RME: relative mean error measured as the percentage of the total real price

Table 6 shows that around 20% of the valuated residential properties have a valuation error below 5%, a share that rises to 25% in Madrid and to 24% in Barcelona. These percentages increase to almost 50% and 45%, respectively, for errors between 0 and 10% in these two cities. Finally, eight out of ten residential properties sited in Madrid are valuated by the expert system with a relative mean error below 20%, a share that falls to about 77% in Barcelona and to almost 70% in Valencia and Seville.
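The cumulative distribution reported in Table 6 can be tabulated with a few lines of code. The sketch below is illustrative only: the function name `error_distribution` is hypothetical, and treating the interval bounds as inclusive is an assumption, since the text does not say how boundary cases were assigned.

```python
import numpy as np


def error_distribution(estimated, real, thresholds=(5.0, 10.0, 20.0)):
    """Share (%) and count of valuations whose relative error is within each bound."""
    estimated = np.asarray(estimated, dtype=float)
    real = np.asarray(real, dtype=float)
    rel = np.abs(estimated - real) / real * 100.0  # relative error in %
    rows = {}
    for t in thresholds:
        count = int(np.sum(rel <= t))  # cumulative interval 0-t (bound inclusive)
        rows[f"0-{t:g}"] = (100.0 * count / rel.size, count)
    count = int(np.sum(rel > thresholds[-1]))
    rows[f">{thresholds[-1]:g}"] = (100.0 * count / rel.size, count)
    return rows


# Toy data for illustration: relative errors of 3%, 8%, 15% and 30%.
table = error_distribution([103.0, 108.0, 115.0, 130.0], [100.0] * 4)
for interval, (share, count) in table.items():
    print(interval, share, count)
```

As in Table 6, the first three rows are cumulative, so only the last cumulative row and the final row add up to 100%.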


5. Summary

The valuation of real estate properties is important in the real estate and financial markets. This project developed an expert system to valuate residential properties without consulting human experts. In a financial framework, an expert system to estimate housing prices is essential for appraising collateral properties, securing mortgage loans and valuing real estate portfolios. This research was motivated by the 2008 Spanish central bank directive, which required financial entities to evaluate their real estate portfolios in order to guarantee their solvency and stability in an environment of decreasing housing prices. This study was also motivated by advances in statistical estimation techniques that could be applied to this valuation problem.

Linking these ideas, this study presents an expert system called the 'Residential Properties Valuation Report' (RPVR). The RPVR can be used to valuate each of the residential properties in Spain, with complete knowledge of the property characteristics and environment, the neighbouring property values and their distances to the focal property. The RPVR produces maps of housing locations, including attributes on the official quarterly housing prices for the respective municipality. To valuate these properties, this study applied the most suitable techniques from geostatistics, namely kriging methods, which take into account the spatial dependence among housing prices. In the program, several steps are completed to automate the fitting of the variogram. Estimation has been a difficult task due to the great variability of housing prices (which is captured by variogram models) as well as the great amount of information that is required. Nevertheless, the results demonstrate the accuracy of the estimated valuations.

This project fills an important information gap in the real estate and financial markets with respect to the automatic, immediate and online valuation of the residential properties in a country. Future research should valuate commercial property prices. Moreover, this project has exclusively considered information at a single point in time (although the information is updated every quarter). Temporal patterns in housing prices have been ignored, even though temporal factors could affect the final estimates (the same property could have a different price in different years, even when holding age constant); this time effect will be considered when sufficient information is available.

References

[1] ABC (2007). Barcelona casi empata con Madrid en número de agencias inmobiliarias. ABC

periodico electronico S.L.U., 9 April 2007. [Online] Available: http://www.abc.es/hemeroteca/historico-09-04-2007/abc/Catalunya/barcelona-casi-empata-con-madrid-en-numero-de-agencias-inmobiliarias_1632427900792.html (November 18, 2009).

[2] Anselin, L. and Lozano-Gracia, N. (2008). Errors in variables and spatial effects in hedonic house

price models of ambient air quality. Empirical Economics. 34: 5-34.

[3] Anselin, L. (1998). Spatial econometrics: methods and models. Dordrecht: Kluwer.

[4] Aznar, J., Ferrís-Oñate, J. and Guijarro, F. (2009). An ANP framework for property pricing

combining quantitative and qualitative attributes. Journal of the Operational Research Society:

doi:10.1057/jors.2009.31.

[5] Belsky, E., Can, A. and Megbolugbe, I. (1998). A primer on Geographic Information Systems in

mortgage finance. Journal of Housing Research. 9,1: 5-31.

[6] BOE (2003). ORDEN ECO/805/2003, de 27 de marzo, sobre normas de valoración de bienes

inmuebles y de determinados derechos para ciertas finalidades financieras. Boletín Oficial del

Estado 85: 13678-13707.

[7] BOE (2008). CIRCULAR 3/2008, de 22 de mayo, del Banco de España, a entidades de crédito,


sobre determinación y control de los recursos propios mínimos. Boletín Oficial del Estado 140:

26465-26647.

[8] Brasington, D.M. and Hite, D. (2005). Demand for environmental quality: a spatial hedonic

analysis. Regional Science and Urban Economics. 35: 57-82.

[9] Brint, A. (2009). Predicting a house's selling price through inflating its previous selling price. Journal of the Operational Research Society. 60: 339-347.

[10] Buchanan, G.R. (1995). Finite element analysis. New York: McGraw-Hill.

[11] Cacdac Warnock, V. and Warnock, F.E. (2008). Markets and housing finance. Journal of

Housing Economics. 17: 239-251.

[12] Caridad, J.M., Núñez, J. and Ceular, N. (2008). Metodología de precios hedónicos vs. Redes

Neuronales Artificiales como alternativas a la valoración de inmuebles. Un caso real. Revista

CT/Catastro. 62: 27-42.

[13] Chica-Olmo, J. (1995). Spatial Estimation of Housing Prices and Locational Rents. Urban Studies,

32: 1331-1344.

[14] Chica-Olmo, J. (2007). Prediction of Housing Location Price by a Multivariate Spatial Method:

Cokriging. Journal of Real Estate Research. 29: 91-114.

[15] EC (2006a). DIRECTIVE 2006/48/EC of the European Parliament and of the Council of 14 June

2006 relating to the taking up and pursuit of the business of credit institutions (recast). Official

Journal of the European Union (30.06.2006) L177, 1-200. (Strasbourg).

[16] EC (2006b). DIRECTIVE 2006/49/EC of the European Parliament and of the Council of 14 June

2006 on the capital adequacy of investment firms and credit institutions (recast). Official Journal

of the European Union (30.06.2006), L177, 201-255. (Strasbourg).

[17] Ellen, I.G. and Voicu, I. (2006). Nonprofit housing and neighborhood spillovers. Journal of

Policy Analysis and Management. 25, 1: 31-52.

[18] Ellen, I.G., Schwartz, A.E., Voicu, I. and Schill, M.H. (2007). Does federally subsidized rental

housing depress neighborhood property values? Journal of Policy Analysis and Management. 26,

2: 257-280.

[19] García Rubio, N. (2004). Desarrollo y aplicación de redes neuronales artificiales al mercado

inmobiliario: aplicación a la ciudad de Albacete. Tesis Doctoral. Albacete: Universidad de

Castilla-La Mancha.

[20] Goodman, A.C. (1978). Hedonic Prices, Price Indices and Housing Markets. Journal of Housing

Research. 3: 25-42.

[21] Goodman, A.C. and Thibodeau, T.G. (2003). Housing market segmentation and hedonic

prediction accuracy. Journal of Housing Economics. 12: 181-201.

[22] Goodman, J.E. and O'Rourke, J. (1997). Handbook of discrete and computational geometry. New

York: CRC Press.

[23] Housing Ministry (2009). Housing Prices Statistics. General Pricing Index. Available from

http://www.mviv.es/, accessed 30 October, 2009.

[24] INE (2009). 2001 Population and Housing Census. Available from http://www.ine.es, accessed 5

November, 2009.

[25] Johnston, K., Ver Hoef, J.M., Krivoruchko, K. and Lucas, N. (2001). Using ArcGIS geostatistical

analyst. Redlands: ESRI Press.

[26] Li, L. and Revesz, P. (2002). A Comparison of Two Spatio-Temporal Interpolation Methods.

Lecture Notes in Computer Science. 2478: 145-160.

[27] Li, L. and Revesz, P. (2004). Interpolation methods for spatio-temporal geographic data.


Computers, Environment and Urban Systems. 28, 3: 201-227.

[28] Limsombunchai, V., Gan, C. and Lee, M. (2004). House price prediction: hedonic price model vs.

artificial neural network. American Journal of Applied Sciences. 1, 3:193-201.

[29] Malpezzi, S. (2002). Hedonic Pricing Models: A Selective and Applied Review. Housing

Economics: Essays in Honor of Duncan Maclennan. Edited by Kenneth Gibb and Anthony O'Sullivan.

[30] Ming-Shann, T., Szu-Lang, L. and Shu-Ling, C. (2009). Analyzing yield, duration and convexity of mortgage loans under prepayment and default risks. Journal of Housing Economics.

18: 92-103.

[31] Montero, J.M. and Larraz, B. (2006). Estimación espacial del precio de la vivienda mediante

métodos de krigeado. Revista Estadística Española, 48, 162: 62-108.

[32] Montero, J.M., and Larraz, B. (2008). Introducción a la Geoestadística Lineal. A Coruña:

Editorial Netbiblo.

[33] Montero, J.M., Larraz, B. and Paez, A. (2009). Estimating Commercial Property Prices: An

Application of Cokriging with Housing Prices as Ancillary Information. Journal of Geographical

Systems. 11: 407-425.

[34] Montero, J.M. and Larraz, B. (2010). Estimating housing prices: A proposal with spatially

correlated data. International Advances in Economic Research, 16: 39-51.

[35] Núñez, J., Ceular, N. and Millán, G. (2007). Aproximación a la valoración inmobiliaria

mediante la metodología de precios hedónicos (MPH). Conocimiento, innovación y

emprendedores: camino al futuro, coord. por Juan Carlos Ayala Calvo.

[36] Peterson, S. and Flanagan, A.B. (2009). Neural network hedonic pricing models in mass real

estate appraisal. Journal of Real Estate Research. 31,2: 147-164.

[37] R Development Core Team. (2008). R: A language and environment for statistical computing. R

Foundation for Statistical Computing. Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org

[38] Rosen, S. (1974). Hedonic Prices and Implicit Markets: Product Differentiation in Pure

Competition. Journal of Political Economy. 82: 34-55.

[39] Sinclair, A.J. and Blackwell, G.H. (2002). Applied Mineral Inventory Estimation. Cambridge:

Cambridge University Press.

[40] Shonkwiler, J.S. and Reynolds, J.E. (1986). A note on the use of hedonic price models in the

analysis of land prices at the urban fringe. Land Economics. Vol.62, no.1.

[41] Stevenson, S. (2004). New empirical evidence on heteroscedasticity in hedonic housing models.

Journal of Housing Economics. 13, 2: 136-153.

[42] Tobler, W. (1979). Cellular geography. In Gale S, Olsson G (Eds), Philosophy in Geography.

Reidel: Dordrecht. 379-386.

[43] Wackernagel, H. (2003). Multivariate Geostatistics: An introduction with applications. 3rd ed.

Berlin: Springer-Verlag.

[44] Watson, D.F. (1981). Computing the n-dimensional Delaunay tessellation with applications to

Voronoi Polytopes. The Computer Journal. 24, 2: 167-172.

[45] Worzala, E., Lenk, M. and Silva, A. (1995). An exploration of neural networks and its

application to real estate valuation. Journal of Real Estate Research. 10, 2: 185-202.

[46] Zienkiewicz, O.C. and Taylor, R.L. (2000). Finite element method. In: The basis 1. London:

Butterworth Heinemann.