Extracting Urban Land Use from Linked Open Geospatial Data Gloria Re Calegari, Emanuela Carlino, Irene Celino, Diego Peroni Paper available at: http://www.mdpi.com/2220-9964/4/4/2109 1 - International Journal of Geo-Information -


Page 1: Extracting Urban Land Use from Linked Open Geospatial

Extracting Urban Land Use from Linked Open Geospatial Data

Gloria Re Calegari, Emanuela Carlino, Irene Celino, Diego Peroni

Paper available at: http://www.mdpi.com/2220-9964/4/4/2109

1- International Journal of Geo-Information -

Page 2: Extracting Urban Land Use from Linked Open Geospatial

Urban planning and land use

Urban planning deals with the improvement of people’s and communities’ welfare by creating more convenient, sustainable and attractive places.

In terms of land use, urban planning concerns the management and modification of the environment.

In a city, monitoring the changes of land use is of utmost importance to drive and support a sustainable urbanization process.

BUT: Detecting and reporting land use modifications is not a trivial task. It requires an expensive and partially manual process to collect, integrate and make sense of urban information.

Page 3: Extracting Urban Land Use from Linked Open Geospatial

Proposed solution

An innovative solution for extracting urban land use and supporting smart cities’ planning activities.

Predict expensive land use geo-information using free linked open geospatial data related to urban environments.

Repeatability and Generality of the approach. Replication of the experiments in 4 different European cities (Milano, München, Barcelona and Brussels)

[Diagram: Points of Interest (POIs) from OpenStreetMap -> classification methods -> CORINE land use]

Page 4: Extracting Urban Land Use from Linked Open Geospatial

Datasets

The same spatial resolution for the 4 selected cities:

• area of 625 square km

• grid of 10,000 square cells, each with a side of 250 meters.

The urban grids are used as uniform spatial characterization for the input and output data.

The cell is the atomic unit for our analysis.
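The grid arithmetic can be checked with a minimal sketch. It assumes a square city extent (25 km x 25 km, which gives 625 square km) and planar coordinates in meters; the function name `build_grid` and the origin parameters are hypothetical and do not come from the paper.

```python
CITY_SIDE_M = 25_000  # assumed square extent: 25 km x 25 km = 625 square km
CELL_SIDE_M = 250     # side of one square cell, in meters


def build_grid(origin_x=0.0, origin_y=0.0):
    """Return a list of (cell_id, center_x, center_y) tuples, in meters."""
    cells_per_side = CITY_SIDE_M // CELL_SIDE_M  # 100 cells per side
    grid = []
    for row in range(cells_per_side):
        for col in range(cells_per_side):
            cell_id = row * cells_per_side + col
            center_x = origin_x + (col + 0.5) * CELL_SIDE_M
            center_y = origin_y + (row + 0.5) * CELL_SIDE_M
            grid.append((cell_id, center_x, center_y))
    return grid


grid = build_grid()
```

With these assumptions, 100 x 100 cells give the 10,000 cells stated on the slide.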

Page 5: Extracting Urban Land Use from Linked Open Geospatial

POI categories from LinkedGeoData

Select a set of 50 POI categories that can characterize the urban landscape.

Page 6: Extracting Urban Land Use from Linked Open Geospatial

Distances from cell to POI category

• Describe each grid cell in terms of its surrounding environment.

• For each POI category, compute the distance from the cell center to the closest POI of the given category.

• Each cell is described by 50 distance values, one for each POI category.

[Figure: example for cell 1 showing the distance values 10, 30 and 20 for the POI categories lgdo:Pub, lgdo:Bank and lgdo:Hotel]
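The distance features described above could be computed along the following lines. This is only a sketch: it assumes planar coordinates in meters and plain Euclidean distance, and the function name `nearest_poi_distances`, the input layout and the sample coordinates are hypothetical.

```python
import math


def nearest_poi_distances(cell_center, pois_by_category):
    """Map each POI category to the distance from the cell center to its closest POI.

    cell_center: (x, y) in meters (planar coordinates are an assumption).
    pois_by_category: dict such as {"lgdo:Pub": [(x, y), ...], ...}.
    A category with no POI at all gets math.inf.
    """
    cx, cy = cell_center
    features = {}
    for category, points in pois_by_category.items():
        features[category] = min(
            (math.hypot(px - cx, py - cy) for px, py in points),
            default=math.inf,
        )
    return features


# Hypothetical POIs, for illustration only.
pois = {
    "lgdo:Pub": [(3.0, 4.0), (100.0, 100.0)],
    "lgdo:Bank": [(0.0, 30.0)],
    "lgdo:Hotel": [],
}
features = nearest_poi_distances((0.0, 0.0), pois)
```

Applied to all 50 selected categories, this yields the 50 distance values that describe one cell.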

Page 7: Extracting Urban Land Use from Linked Open Geospatial

CORINE Land Use

Each cell is described by its composition (percentages) with respect to the CORINE taxonomy. The cell is labeled with its predominant land use (the category that covers the largest share of the cell area).

[Figure: example cell with land use shares of 30%, 20% and 50% (residential and industrial areas), labeled RESIDENTIAL]

• Existing land use classification provided by the CORINE programme

• Land use types defined in the CORINE taxonomy

• 3-level hierarchy, 40+ types of land use defined

• CORINE taxonomy: http://swa.cefriel.it/ontologies/corine.html#

Land use as raster information -> project it onto the grid cell resolution level.
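The labeling rule (predominant land use) amounts to picking the category with the largest share. A minimal sketch, with a hypothetical function name and made-up shares:

```python
def predominant_land_use(composition):
    """Return the category covering the largest share of the cell area.

    composition: dict mapping a land use category to its share of the cell.
    On a tie, the category listed first wins (behavior of max on dicts).
    """
    return max(composition, key=composition.get)


# Hypothetical composition of one cell, for illustration only.
cell = {"residential": 0.50, "industrial": 0.20, "agricultural": 0.30}
label = predominant_land_use(cell)
```

Note that this label discards the minority shares, so a cell with mixed land uses is reduced to a single class.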

Page 8: Extracting Urban Land Use from Linked Open Geospatial

CORINE Land Use composition at cell level

Reduction of the 40+ CORINE categories using clustering methods, obtaining 5 main categories:

• Dense residential

• Sparse residential

• Industrial

• Agricultural

• Nature

Clear differences in terms of land use distribution between cities (it reflects the intrinsic structure and nature of a city)

Page 9: Extracting Urban Land Use from Linked Open Geospatial

Land Use Classification Experiments

Train SVM classification models to classify the urban environment according to its land use

[Diagram: X = distances from POIs -> SVM model -> Y = one of Dense Residential, Sparse Residential, Industrial, Agricultural, Nature]

Three different experiments (from specific to general, i.e., with increasing model generalization):

1. City-Specific Model Selection

2. Cross-City Model Selection with Some Background Knowledge

3. Cross-City Model Selection without Any Background Knowledge

Page 10: Extracting Urban Land Use from Linked Open Geospatial

1) City-Specific Model Selection

Classify the land use of each single city separately, training a model for each city and predicting unseen data of the same city.

[Diagram: Milano cells -> Milano SVM model -> predicted class (e.g., industrial); Brussels cells -> Brussels SVM model -> predicted class (e.g., industrial)]

Page 11: Extracting Urban Land Use from Linked Open Geospatial

Quantitative Errors Analysis

[Figure: error analysis panels for München and Brussels]

How do the prediction errors spread across all of the classes?
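One common way to see how errors spread across classes is a confusion matrix. The sketch below is an illustration and not necessarily the analysis used in the paper; the function names and the sample labels are hypothetical.

```python
from collections import Counter

CLASSES = ["Dense residential", "Sparse residential", "Industrial",
           "Agricultural", "Nature"]


def confusion_matrix(actual, predicted, classes=CLASSES):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]


def overall_accuracy(matrix):
    """Share of cells on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Hypothetical labels for four cells, for illustration only.
actual = ["Industrial", "Industrial", "Nature", "Agricultural"]
predicted = ["Industrial", "Sparse residential", "Nature", "Nature"]
matrix = confusion_matrix(actual, predicted)
accuracy = overall_accuracy(matrix)
```

Off-diagonal entries show which class a misclassified cell was assigned to.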

Page 12: Extracting Urban Land Use from Linked Open Geospatial

Qualitative Errors Analysis

• All of the errors lie on the “boundaries” between the areas with homogeneous land use

• The mistaken class of a cell is the one of its adjacent cells -> cells are made up of mixed land uses, while we considered the predominant land use only

Page 13: Extracting Urban Land Use from Linked Open Geospatial

2) Cross-City Model Selection with Some Background Knowledge

Create a single model suitable for predicting multiple cities and trained using some previous knowledge about all of the cities involved.

Rationale: these classification models could be used to update land use maps (e.g. to identify specific areas in which the land use could have changed)

[Diagram: a model trained on a subset of the MIL.BRU.BAR.MUE training cells predicts the class (e.g., industrial) of Milano, Brussels, Muenchen and Barcelona cells]

Two sampling strategies for the training set:

• balanced on cities, balanced on classes (BCi.BCl): 200 cells for each class and for each city for the training set (4,000 cells)

• stratified on cities, stratified on classes (SCi.SCl): a third of the original 40,000 cells as the training set, respecting the original proportion of the five classes across cities
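The two sampling strategies can be sketched as follows. The cell layout (dicts with `city` and `label` keys), the function names and the toy data are assumptions made for illustration; the paper's own sampling code is not shown in the slides.

```python
import random
from collections import defaultdict


def group_by_city_and_class(cells):
    """Group cells by their (city, land use class) pair."""
    groups = defaultdict(list)
    for cell in cells:
        groups[(cell["city"], cell["label"])].append(cell)
    return groups


def balanced_sample(cells, per_group=200, seed=0):
    """BCi.BCl: the same number of cells for each class and each city."""
    rng = random.Random(seed)
    sample = []
    for group in group_by_city_and_class(cells).values():
        sample.extend(rng.sample(group, min(per_group, len(group))))
    return sample


def stratified_sample(cells, fraction=1 / 3, seed=0):
    """SCi.SCl: a fixed fraction of every group, keeping original proportions."""
    rng = random.Random(seed)
    sample = []
    for group in group_by_city_and_class(cells).values():
        sample.extend(rng.sample(group, round(len(group) * fraction)))
    return sample


# Toy data: 2 cities, 2 classes with unequal sizes (hypothetical).
cells = [
    {"city": city, "label": label, "id": i}
    for city in ("MIL", "BRU")
    for label, count in (("Industrial", 30), ("Nature", 90))
    for i in range(count)
]
balanced = balanced_sample(cells, per_group=10)
stratified = stratified_sample(cells)
```

With 200 cells per group, 5 classes and 4 cities, the balanced strategy gives the 4,000 training cells stated on the slide.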

Page 14: Extracting Urban Land Use from Linked Open Geospatial

Prediction accuracy and errors

[Figure: prediction accuracy and error panels for München and Milano]

The higher the number of training cells, the more reliable the prediction -> beware of overfitting!

Page 15: Extracting Urban Land Use from Linked Open Geospatial

3) Cross-City Model Selection without Any Background Knowledge

Predict a city using the models trained on multiple different cities, i.e., without any previous knowledge about the city to be predicted. For example, predicting Milano land use using a model built on Barcelona, Brussels and Muenchen data.

[Diagram: a model trained on a subset of the BRU.BAR.MUE training cells predicts the class (e.g., industrial) of Milano cells]

Three sampling strategies for the training set:

• balanced on cities, balanced on classes (BCi.BCl): 200 cells for each class and for each city for the training set (3,000 cells)

• balanced on cities, stratified on classes (BCi.SCl): one third of the original 30,000 observations, stratified with respect to classes and balanced with respect to cities

• stratified on cities, stratified on classes (SCi.SCl): a third of the original 30,000 cells as the training set, respecting the original proportion of the five classes across cities
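The split behind this experiment can be pictured as a leave-one-city-out partition: every cell of the target city is held out, and training cells come only from the other cities. A minimal sketch, assuming cells are dicts with a `city` key (a hypothetical layout):

```python
def leave_one_city_out(cells, target_city):
    """Split cells so that the target city never appears in the training pool."""
    train_pool = [c for c in cells if c["city"] != target_city]
    test_cells = [c for c in cells if c["city"] == target_city]
    return train_pool, test_cells


# Toy data: 3 cells per city (hypothetical).
cells = [
    {"city": city, "id": i}
    for city in ("MIL", "BRU", "BAR", "MUE")
    for i in range(3)
]
train_pool, test_cells = leave_one_city_out(cells, "MIL")
```

The sampling strategies listed above would then be applied to the training pool only; with three cities of 10,000 cells each, that pool holds the 30,000 cells mentioned on the slide.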

Page 16: Extracting Urban Land Use from Linked Open Geospatial

Selection of the predictors

Three subsets of predictors to avoid overfitting:

- all 50 predictors

- top 5 predictors according to information gain

- top 11 predictors according to information gain

Similar distribution of the top 5 predictors in the 4 cities -> a cross-city model built on these five predictors could suitably describe the patterns of different cities
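Information gain measures how much knowing a predictor reduces the entropy of the class labels. Since the predictors here are continuous distances, the sketch below assumes a single binary split at a given threshold; the slides do not say how distances were discretized, so the threshold and all names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting the cells at `threshold`."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    total = len(labels)
    remainder = sum(len(part) / total * entropy(part)
                    for part in (left, right) if part)
    return entropy(labels) - remainder


# Hypothetical distances (meters) to one POI category and cell labels.
distances = [50.0, 80.0, 900.0, 1200.0]
labels = ["Dense residential", "Dense residential", "Nature", "Nature"]
gain = information_gain(distances, labels, threshold=500.0)
```

Ranking the 50 predictors by such a score and keeping the top 5 or top 11 would give the reduced predictor sets.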

Page 17: Extracting Urban Land Use from Linked Open Geospatial

Results: overall accuracy

• no values higher than 50%

• the lower the number of predictors, the higher the overall accuracy

Overall accuracy of the predictions obtained with all the possible combinations in terms of sampling strategies and number of predictors.

Page 18: Extracting Urban Land Use from Linked Open Geospatial

Quantitative errors analysis

[Figure: error analysis panels for München and Barcelona]

• High misclassification error between the “Dense res.” and the “Sparse res.” classes

• Difficulties in predicting the “Industrial” class

• Relevant misclassification errors between the “Agricultural” and “Nature” classes

• “Dense res.” and “Nature” are the best predicted classes

• “Agricultural”, “Sparse res.” and “Industrial” are not correctly modeled

A single model could fail to be general enough to predict other unknown urban environments

Page 19: Extracting Urban Land Use from Linked Open Geospatial

Two-proportion Z-test

Verify if the difference in sensitivity and specificity values between the cities is statistically significant.

• the difference between cities is statistically significant (white cells)

• for the “Dense residential” class the difference is almost always not statistically significant -> experiments are focused on urban areas, and indeed, this land use type is more typical for cities.

The limitation for adopting a single model trained without any previous knowledge probably lies in the intrinsic peculiarities of each city
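For reference, the standard pooled two-proportion Z-test can be written in a few lines. The sketch assumes the textbook two-sided form; the exact variant used in the paper is not stated in the slides, and the counts in the example are made up.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, two-sided p-value) for the difference of two proportions.

    Uses the pooled standard error; it is undefined (division by zero)
    when the pooled proportion is exactly 0 or 1.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical example: sensitivity of one class in two cities,
# 80 of 100 cells correct in city A versus 60 of 100 in city B.
z, p_value = two_proportion_z_test(80, 100, 60, 100)
```

Here a proportion would be, for instance, the sensitivity of a class in one city (correctly predicted cells of that class over all cells of that class); a small p-value indicates that the two cities differ significantly.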

Page 20: Extracting Urban Land Use from Linked Open Geospatial

Classification with two levels

As the two-proportion Z-test suggests, we try to build a single cross-city classification model only for the residential land use typology.

Most of the errors (blue and green dots) lie again on the “boundaries”

[Figure: classification errors for München]

Page 21: Extracting Urban Land Use from Linked Open Geospatial

Final discussion

• Very good results (overall accuracy > 75%) using a classifier that takes into account some background knowledge of the city during the training phase

• Repeatability and generality of the methodology -> comparable results on the four cities

• Methodology useful for monitoring and detecting land use changes

• Limits: predicting the urban land use without any background knowledge of the city itself

• Intrinsic peculiarities of each city

• The distances to the closest POI are extracted from OpenStreetMap, which is volunteered geographic information (VGI), with varying levels of data completeness and reliability from place to place

Page 22: Extracting Urban Land Use from Linked Open Geospatial

Conclusion and Future work

• Open geospatial data can be additional and relevant input information in the urban planning field

• Linked open geospatial data can be successfully used for producing or updating other expensive spatial data (classification of the city land use using the urban POIs)

In the near future...

• Make our methodology applicable to any urban environment

• Moving from a classification with 5 classes to the full CORINE taxonomy (40+ classes)

• Deeper analysis of the “quality” of OpenStreetMap data

• Improving the POI selection to get the best possible coverage of all of the land use types

• Extending our experiments to make the resulting solution more effective and robust

• Combine and complement the geo-information from OpenStreetMap with other heterogeneous sources
