Knowledge Discovery And Data Mining

Predicting The Availability of Parking Spaces in Ljubljana

Luis Rei [email protected]

http://luisrei.com

Report, slides and code available online.


Presentation for my Jožef Stefan International Postgraduate School data mining course assignment. It predicts the availability of parking spaces in Ljubljana car parks at 30 min, 1 h, 2 h and 3 h intervals.


Parking Spaces

• City of Ljubljana (http://www.lpt.si)

• Available via http://opendata.si/

• 11 Car Parks
  • Park Name (and id)
  • Number of Free Spaces
  • Total Spaces Available
  • Price
  • Coordinates
  • Timestamp

• Updated Every 5min

• From 2011-09-12 to 2013-11-18

• Test starts: 2013-08-19

The Parks

Park                 Total Spaces*
PH Kozolec           248
Tivoli I             360
Mirje                110
Trg MDB              40
Gospodarsko raz.     550
Bežigrad             62
Trg preko. brigad    98
Kranjčeva            118
Žale II              80
Petkovskovo II       85
PH Kongresni trg     720

Buyer Beware: Cleanup

• Missing data
  • Collection failed: entire months, weeks, days missing
    • All parks
  • Sensor/communication failed: missing entries
    • Some parks

• Invalid data
  • Negative free spaces
  • (A lot) more free spaces than the total
  • Null values

• Strategies
  • Interpolating
  • Replacing with the mean (window variables)
  • Removing (target variable)
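The invalid-data checks listed above can be sketched in a few lines of Python. This is only an illustration, not the code used in the report: the function name `is_valid`, the `tolerance` parameter and the sample readings are all hypothetical, and invalid readings are simply marked as missing (`None`) so that a later step can interpolate, replace or remove them.

```python
from typing import List, Optional


def is_valid(free: Optional[int], total: int, tolerance: int = 0) -> bool:
    """Return True when a free-space reading is plausible for a park.

    `tolerance` is a hypothetical allowance for small counter drift.
    """
    if free is None:                # null value
        return False
    if free < 0:                    # negative free spaces
        return False
    if free > total + tolerance:    # more free spaces than the park holds
        return False
    return True


# Made-up readings for a park with 248 spaces (the PH Kozolec total).
readings: List[Optional[int]] = [120, -3, None, 900, 248]
total_spaces = 248

# Invalid readings become None, i.e. they are treated as missing data.
cleaned = [r if is_valid(r, total_spaces) else None for r in readings]
print(cleaned)
```

Turning invalid readings into missing ones keeps a single code path for the three strategies on the slide.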

Time Series Resampling

Input series:

2011-01-01 00:00:00 1
2011-01-01 00:45:00 2
2011-01-02 01:30:00 2
2011-01-02 02:15:00 4
2011-01-03 03:00:00 3
2011-01-03 06:00:00 1

Interval: Daily

How: Mean

2011-01-01 1.5
2011-01-02 3.0
2011-01-03 2.0

How: Min

2011-01-01 1
2011-01-02 2
2011-01-03 1

How: Last

2011-01-01 2
2011-01-02 4
2011-01-03 1
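The daily resampling example from this slide can be reproduced with a small standard-library sketch. The helper name `resample_daily` is an assumption made for illustration; a time series library such as pandas offers the same operation through its `resample` method. The "last" aggregation assumes the rows are already in chronological order.

```python
from datetime import datetime


# The six readings from the slide: (timestamp, value).
readings = [
    ("2011-01-01 00:00:00", 1),
    ("2011-01-01 00:45:00", 2),
    ("2011-01-02 01:30:00", 2),
    ("2011-01-02 02:15:00", 4),
    ("2011-01-03 03:00:00", 3),
    ("2011-01-03 06:00:00", 1),
]


def resample_daily(rows, how):
    """Group (timestamp, value) rows by calendar day, then aggregate each group."""
    buckets = {}
    for stamp, value in rows:
        day = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").date().isoformat()
        buckets.setdefault(day, []).append(value)
    return {day: how(values) for day, values in sorted(buckets.items())}


daily_mean = resample_daily(readings, lambda v: sum(v) / len(v))
daily_min = resample_daily(readings, min)
daily_last = resample_daily(readings, lambda v: v[-1])  # rows must be time-ordered

print(daily_mean)
print(daily_min)
print(daily_last)
```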

Question (Goal)

At the end of the next time period, how many free spaces will be available in this park?

How: Last

Intervals: {30, 60, 120, 180} min

Sliding Windows

               w-2          w-1          w               Target
               past state   past state   current state   future state
Interval t-2   170          180          190             200
Interval t-1   180          190          200             210
Interval t     190          200          210             220

window size = 4
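A minimal sketch of how such windows could be built from a resampled series is shown here. The function name `make_windows` is hypothetical, and it assumes that the window size of 4 counts the target, which matches the four columns on the slide (three state values plus the future state).

```python
def make_windows(series, window_size):
    """Split a series into (features, target) rows.

    Each window holds `window_size` consecutive values: the last value is
    the target (future state), the earlier ones are the features.
    """
    rows = []
    for start in range(len(series) - window_size + 1):
        window = series[start:start + window_size]
        rows.append((window[:-1], window[-1]))
    return rows


# Free-space counts taken from the slide's example.
free_spaces = [170, 180, 190, 200, 210, 220]
rows = make_windows(free_spaces, window_size=4)

for features, target in rows:
    print(features, "->", target)
```

Each row can then be fed to a regression model, with the features as inputs and the target as the value to predict.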

Baselines & Models

• Baselines
  • Mean
  • Previous Value

• Models
  • Linear Regression
  • Regression Tree
  • Random Forest

• Bonus Models
  • Global Random Forest
  • Incremental Linear Regression
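The two baselines are simple enough to sketch directly, together with the root mean squared error used to score them. The function names and the toy numbers below are made up for illustration, and the mean baseline is assumed to use the mean of the training data; the report's own code may differ.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mean_baseline(train, n):
    """Always predict the mean of the training values."""
    avg = sum(train) / len(train)
    return [avg] * n


def previous_value_baseline(train, test):
    """Predict each value as the one observed one interval earlier."""
    return [train[-1]] + list(test[:-1])


# Toy data, not taken from the parking dataset.
train = [100, 110, 120, 130]
test = [140, 150, 160]

mean_rmse = rmse(test, mean_baseline(train, len(test)))
prev_rmse = rmse(test, previous_value_baseline(train, test))
print(round(mean_rmse, 2), round(prev_rmse, 2))
```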

Results: Average Root Mean Squared Error

Method              30 min   60 min   120 min   180 min
Mean                41.2     41.4     41.6      41.3
Previous Value      10.1     16.3     26.6      33.9
Linear Regression   3.5      4.2      4.8       4.7
Regression Tree     0.5      0.8      0.4       0.5
Random Forest       0.4      0.5      0.6       0.5

Results: RMSE for each park for 120 min intervals

PH Kongresni trg, resampled at 120 min intervals

One Week at the Car Park & the Trouble with Missing Values

The Effect of Missing Values: The Sliding Window Revisited

               w-2          w-1          w                              Target
               past state   past state   current state                  future state
Interval t-2   170          160          140                            100
Interval t-1   160          140          100                            Missing (not predicted)
Interval t     140          100          Missing (replaced with mean)   ?? = 10

E.g. mean = 150, very different from the missing value (e.g. 60).
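The effect can be reproduced with a short sketch that applies the two strategies from the cleanup slide: rows with a missing target are removed, and missing window variables are replaced with a mean. The function name `windows_with_missing` is hypothetical, and the fill value of 150 is the example mean from the slide, not a value computed from the dataset.

```python
def windows_with_missing(series, window_size, fill_value):
    """Build (features, target) rows from a series that contains None.

    - A window whose target is None is dropped (it cannot be scored).
    - A None among the features is replaced with `fill_value`.
    """
    rows = []
    for start in range(len(series) - window_size + 1):
        window = series[start:start + window_size]
        features, target = window[:-1], window[-1]
        if target is None:
            continue
        features = [fill_value if v is None else v for v in features]
        rows.append((features, target))
    return rows


# The slide's series: the fifth reading (really about 60) is missing.
series = [170, 160, 140, 100, None, 10]
rows = windows_with_missing(series, window_size=4, fill_value=150)

for features, target in rows:
    print(features, "->", target)
```

The last row now claims the park went from 150 free spaces to 10 in one interval, which is the kind of input that can produce a large prediction error.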

Missing Values

Percentage of test set: 0.7%

Percentage of error (RMSE): 71%

RF Average RMSE vs Window Size

Note: for window_size = 1, RMSE = 21 (not shown, for the sake of clarity).

Future Work

• Better handling of missing values

• Time based interpolation of some of the missing data within a limited max time interval

• Use model to predict the missing data

• Crawl more data

• Test with a full year

• Evaluate “classical” autoregressive models

• with smoothing

• Predict further into the future

• Additional data: weather, holidays, soccer, social…

• Get the average error down to zero, keep maximum error small
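As an illustration of the first future-work item, time-based interpolation with a limited maximum gap could look roughly like the sketch below. It was not part of the presented work: the function name `interpolate_gaps`, the 15-minute limit and the sample readings are all assumptions.

```python
from datetime import datetime, timedelta


def interpolate_gaps(points, max_gap):
    """Fill missing values by linear interpolation in time.

    `points` is a time-sorted list of (datetime, value or None). A run of
    missing values is filled only when the known readings on either side
    are at most `max_gap` apart; longer gaps are left as None.
    """
    filled = list(points)
    known = [i for i, (_, v) in enumerate(points) if v is not None]
    for left, right in zip(known, known[1:]):
        t0, v0 = points[left]
        t1, v1 = points[right]
        if right - left < 2 or t1 - t0 > max_gap:
            continue  # nothing missing in between, or the gap is too long
        span = (t1 - t0).total_seconds()
        for i in range(left + 1, right):
            t = points[i][0]
            frac = (t - t0).total_seconds() / span
            filled[i] = (t, v0 + frac * (v1 - v0))
    return filled


# Made-up readings every 5 minutes: one short gap, one long gap.
start = datetime(2013, 8, 19, 8, 0)
step = timedelta(minutes=5)
values = [100, None, None, 70, None, None, None, None, None, None, None, 50]
points = [(start + i * step, v) for i, v in enumerate(values)]

filled = interpolate_gaps(points, max_gap=timedelta(minutes=15))
print([v for _, v in filled])
```

Only the short gap is filled; the 40-minute gap stays missing and would still need one of the other strategies.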