25
Urban Link Travel Time Estimation Using Large-scale Taxi Data with Partial Information Xianyuan Zhan * Satish V. Ukkusuri * * Civil Engineering, Purdue University 24/04/2014

Urban Link Travel Time Estimation Using Large-scale Taxi ...dimacs.rutgers.edu/Workshops/HumanEnvironments/Slides/zhan.pdfStudy Region Introduction Study region Base model Probabilistic

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

  • Urban Link Travel Time Estimation Using Large-scale Taxi Data with

    Partial Information

    Xianyuan Zhan*

    Satish V. Ukkusuri*

    *Civil Engineering, Purdue University

    24/04/2014

  • โ€ข Introduction

    โ€ข Study Region

    โ€ข Link Travel Time Estimation Model

    โ€ข Base Model

    โ€ข Probabilistic Model

    โ€ข Numerical Results

    โ€ข Conclusion

    โ€ข Questions/Comments

    Outline

    Introduction Study region Base model Probabilistic model Numerical results Conclusion

    2

    MPE 2013+

    Urban Link Travel Time Estimation Using Large-scale Taxi Data with Partial Information

    Xianyuan Zhan

  • โ€ข New York City has the largest market for

    taxis in North America:

    โˆ’ 12,779 yellow medallion (2006)

    โˆ’ Industrial revenue $1.82 billion (2005)

    โˆ’ Serving 240 million passengers per year

    โˆ’ 71% of all Manhattan residentsโ€™ trips

    โ€ข GPS devices are installed in each taxicab

    โ€ข Taxi data recorded by New York Taxi and

    Limousine Commission (NYTLC)

    Introduction

    Introduction Study region Base model Probabilistic model Numerical results Conclusion

    3

    โ€ข Massive amount of data!

    โˆ’ 450,000 to 550,000 daily trip records

    โˆ’ More than 180 million taxi trips a year

    โˆ’ Providing a lot of opportunities!

    MPE 2013+

    Urban Link Travel Time Estimation Using Large-scale Taxi Data with Partial Information

    Xianyuan Zhan

  • Taxi trips in NYC

    Introduction

    Introduction Study region Base model Probabilistic model Numerical results Conclusion

    4

    Trip Origin Trip Destination

    MPE 2013+

    Urban Link Travel Time Estimation Using Large-scale Taxi Data with Partial Information

    Xianyuan Zhan

  • Estimating urban link travel times

    โ€ข Traditional approaches:

    โˆ’ Loop detector data

    โˆ’ Automatic Vehicle Identification tags

    โˆ’ Video camera data

    โˆ’ Remote microwave traffic sensors

    โ€ข Why taxicab data?

    โˆ’ Novel large-scale data sources

    โˆ’ Ideal probes monitoring traffic condition

    โˆ’ Large coverage

    โˆ’ Do not need fixed sensors

    โˆ’ Cheap!

    Introduction

    Introduction Study region Base model Probabilistic model Numerical results Conclusion

    5

    MPE 2013+

    Urban Link Travel Time Estimation Using Large-scale Taxi Data with Partial Information

    Xianyuan Zhan

  • The data

    โ€ข NYTLC records taxi GPS trajectory data, but not public

    โ€ข Only trip basis data available

    โˆ’ Contains only OD coordinate, trip travel time and distance, etc.

    โˆ’ Path information not available

    โˆ’ Large-scale data with partial information

    The problem

    โ€ข Given large-scale taxi OD trip data, estimate urban link travel times

    โ€ข Sub-problems to solve:

    โˆ’ Map data to the network

    โˆ’ Path inference

    โˆ’ Estimate link travel time based on OD data

    Introduction

    Introduction Study region Base model Probabilistic model Numerical results Conclusion

    6

    MPE 2013+

    Urban Link Travel Time Estimation Using Large-scale Taxi Data with Partial Information

    Xianyuan Zhan

  • Study Region

    Introduction Study region Base model Probabilistic model Numerical results Conclusion

    7

    โ€ข 1370ร—1600m rectangle area in Midtown Manhattan

    โ€ข Data records fall within the region are subtracted

    MPE 2013+MPE 2013+

    Urban Link Travel Time Estimation Using Large-scale Taxi Data with Partial Information

    Xianyuan Zhan

  • Study Region

    Introduction Study region Base model Probabilistic model Numerical results Conclusion

    8

    Test network

    โ€ข Network contains:

    โˆ’ 193 nodes

    โˆ’ 381 directed links

    MPE 2013+

    Urban Link Travel Time Estimation Using Large-scale Taxi Data with Partial Information

    Xianyuan Zhan

  • Study Region

    Introduction Study region Base model Probabilistic model Numerical results Conclusion

    9

    Number of observations in the study region

    โ€ข Day 1: Weekday (2010/03/15, Monday)

    โ€ข Day 2: Weekend (2010/03/20, Saturday)

    MPE 2013+

    Urban Link Travel Time Estimation Using Large-scale Taxi Data with Partial Information

    Xianyuan Zhan

    0

    200

    400

    600

    800

    1000

    1200

    Fre

    qu

    en

    cy

    Histogram for day 1

    0

    100

    200

    300

    400

    500

    600

    Fre

    qu

    en

    cy

    Histogram for day 6

  • Base Model

    Introduction Study region Base model Probabilistic model Numerical results Conclusion

    10

    MPE 2013+

    Urban Link Travel Time Estimation Using Large-scale Taxi Data with Partial Information

    Xianyuan Zhan

    Base link travel time estimation model*

    โ€ข Hourly average link travel time estimations

    โ€ข Direct optimization approach

    โ€ข Overall framework: four phases

    * Zhan, X., Hasan, S., Ukkusuri, S. V., & Kamga, C. (2013). Urban link travel time estimation using large-scale taxi data with partial information.Transportation Research Part C: Emerging Technologies, 33, 37-49.

  • Base Model

    Introduction Study region Base model Probabilistic model Numerical results Conclusion

    11

    MPE 2013+

    Urban Link Travel Time Estimation Using Large-scale Taxi Data with Partial Information

    Xianyuan Zhan

    Data mapping

    โ€ข Mapping points to nearest links in the network

    โ€ข Mapped point (blue) are used

    โ€ข Identify intermediate origin/

    destination nodes

    โ€ข ๐›ผ1, ๐›ผ2 are defined as distance proportions from mapped

    points to the intermediate

    origin/destination node

  • Base Model

    Introduction Study region Base model Probabilistic model Numerical results Conclusion

    12

    MPE 2013+

    Urban Link Travel Time Estimation Using Large-scale Taxi Data with Partial Information

    Xianyuan Zhan

    Construct reasonable path sets

    โ€ข Number of possible paths could be huge!

    โ€ข Need to shrink the size of possible path set

    โ€ข Use trip distance to eliminate unreasonable paths

    โ€ข K-shortest path algorithm* (k=20) is used to generate initial path sets

    * Y. Yen, Finding the K shortest loopless paths in a network, Management Science 17:712โ€“716, 1971.

    โ€ข Filter out unreasonable paths (threshold:

    weekday 15%~25%, weekend 50%)

  • Base Model

    Introduction Study region Base model Probabilistic model Numerical results Conclusion

    13

    MPE 2013+

    Urban Link Travel Time Estimation Using Large-scale Taxi Data with Partial Information

    Xianyuan Zhan

    Route choice model

    โ€ข Assumption:

    โˆ’ Each driver wants to minimize both trip time and distance to make more trips

    thus make more revenue

    โ€ข A MNL model based on utility maximization scheme

    ๐‘ƒ๐‘š ๐‘ก, ๐‘‘, ๐œƒ =๐‘’โˆ’๐œƒ๐ถ๐‘š ๐‘ก,๐‘‘๐‘š

    ๐‘—โˆˆ๐‘…๐‘– ๐‘’โˆ’๐œƒ๐ถ๐‘— ๐‘ก,๐‘‘๐‘—

    โ€ข Path cost measured as a function of trip travel time and distance

    ๐ถ๐‘š ๐‘ก, ๐‘‘๐‘š = ๐›ฝ1 โˆ™ ๐‘”๐‘š ๐‘ก + ๐›ฝ2 โˆ™ ๐‘‘๐‘š

    ๐‘”๐‘š ๐‘ก = ๐›ผ1๐‘ก๐‘‚ + ๐›ผ2๐‘ก๐ท +

    ๐‘™โˆˆ๐ฟ

    ๐›ฟ๐‘š๐‘™๐‘ก๐‘™

  • Base Model

    Introduction Study region Base model Probabilistic model Numerical results Conclusion

    14

    MPE 2013+

    Urban Link Travel Time Estimation Using Large-scale Taxi Data with Partial Information

    Xianyuan Zhan

    Link travel time estimation

    โ€ข Minimizing the squared difference between expected (๐ธ ๐‘Œ๐‘–|๐‘…๐‘– ) and observed ๐‘Œ๐‘– path travel times

    ๐ธ ๐‘Œ๐‘–|๐‘…๐‘– =

    ๐‘šโˆˆ๐‘…๐‘–

    ๐‘”๐‘š( ๐‘ก)๐‘ƒ๐‘š ๐‘ก, ๐‘‘, ๐œƒ

    ๐‘ก = argmin ๐‘ก

    ๐‘–โˆˆ๐ท

    ๐‘ฆ๐‘– โˆ’ ๐ธ ๐‘Œ๐‘–|๐‘…๐‘–2

    โ€ข Solve using Levenberg-Marquardt (LM) method

    โ€ข Parallelized codes developed to estimate the model

    โ€ข Entire optimization solved within 10 minutes on an intel i7 laptop

    โ€ข Numerical results show in later section

  • Probabilistic Model

    Introduction Study region Base model Probabilistic model Numerical results Conclusion

    15

    MPE 2013+

    Urban Link Travel Time Estimation Using Large-scale Taxi Data with Partial Information

    Xianyuan Zhan

    Limitations of the base model

    โ€ข Point estimate of hourly average travel time

    โ€ข Not incorporating variability of link travel times

    โ€ข Not utilizing historical data

    โ€ข Problems of compensation effect

    โ€ข Less robust

    Solution: Adopt a probabilistic framework

    โ€ข Accounting for variability in link travel times

    โ€ข More robust

    โ€ข Historical information can be incorporated as priors

  • Probabilistic Model

    Introduction Study region Base model Probabilistic model Numerical results Conclusion

    16

    MPE 2013+

    Urban Link Travel Time Estimation Using Large-scale Taxi Data with Partial Information

    Xianyuan Zhan

    Assumptions:

    1. Link travel time: ๐‘ฅ๐‘™ โˆผ ๐’ฉ(๐œ‡๐‘™ , ๐œŽ๐‘™2)

    2. Path travel time is the summation of a set of link travel times

    ๐‘ƒ ๐‘ฆ๐‘–|๐‘˜, ๐’™ = ๐‘ƒ ๐‘ฆ๐‘–|๐‘˜, ๐, ๐œฎ = ๐‘ ๐›ผ1๐œ‡0 + ๐›ผ2๐œ‡๐ท +

    ๐‘™โˆˆ๐‘˜

    ๐œ‡๐‘™ , ๐›ผ1๐œŽ๐‘‚2 + ๐›ผ2๐œŽ๐ท

    2 +

    ๐‘™โˆˆ๐‘˜

    ๐œŽ๐‘™2

    3. Route choice based on the perceived mean link travel times and distance

    ๐œ‹๐‘˜๐‘– ๐,๐œท, ๐‘‘๐‘– =

    exp โˆ’๐ถ๐‘˜๐‘– ๐,๐œท, ๐‘‘๐‘–

    ๐‘ โˆˆ๐‘…๐‘–exp โˆ’๐ถ๐‘ 

    ๐‘– ๐,๐œท, ๐‘‘๐‘–

    โ€ข where ๐’™, ๐, ๐œฎ are the vector of link travel times, their mean and variance

  • Probabilistic Model

    Introduction Study region Base model Probabilistic model Numerical results Conclusion

    17

    MPE 2013+

    Urban Link Travel Time Estimation Using Large-scale Taxi Data with Partial Information

    Xianyuan Zhan

    Mixture model

    โ€ข A Mixture model is developed to model the posterior probability of the

    observed taxi trip travel times given link travel time parameters ๐, ๐œฎ

    ๐ป ๐’š|๐, ๐œฎ,๐‘ซ =

    ๐‘–=1

    ๐‘›

    ๐‘˜โˆˆ๐‘…๐‘–

    ๐œ‹๐‘˜๐‘– ๐,๐œท, ๐‘‘๐‘– ๐‘ƒ ๐‘ฆ๐‘–|๐‘˜, ๐, ๐œฎ

    โ€ข Introducing ๐‘ง๐‘˜๐‘– as the latent variable

    indicating if path ๐‘˜ is used by observation ๐‘–

    Plate notation

  • Probabilistic Model

    Introduction Study region Base model Probabilistic model Numerical results Conclusion

    18

    MPE 2013+

    Urban Link Travel Time Estimation Using Large-scale Taxi Data with Partial Information

    Xianyuan Zhan

    Bayesian Mixture model

    โ€ข Incorporating historical information:

    โˆ’ Prior on ๐:

    โˆ’ Priors on ๐ and variance ๐œฎ

    ๐ป ๐’š|๐, ๐œฎ,๐‘ซ =

    ๐‘–=1

    ๐‘›

    ๐‘˜โˆˆ๐‘…๐‘–

    ๐œ‹๐‘˜๐‘– ๐,๐œท, ๐‘‘๐‘– ๐‘ƒ ๐‘ฆ๐‘–|๐‘˜, ๐, ๐œฎ โˆ™

    ๐‘—โˆˆ๐ฟ

    ๐‘ ๐œ‡๐‘—

    ๐ป ๐’š|๐, ๐œฎ,๐‘ซ =

    ๐‘–=1

    ๐‘›

    ๐‘˜โˆˆ๐‘…๐‘–

    ๐œ‹๐‘˜๐‘– ๐,๐œท, ๐‘‘๐‘– ๐‘ƒ ๐‘ฆ๐‘–|๐‘˜, ๐, ๐œฎ โˆ™

    ๐‘—โˆˆ๐ฟ

    ๐‘ ๐œ‡๐‘— ๐‘ ๐œŽ๐‘—2

  • Probabilistic Model

    Introduction Study region Base model Probabilistic model Numerical results Conclusion

    19

    MPE 2013+

    Urban Link Travel Time Estimation Using Large-scale Taxi Data with Partial Information

    Xianyuan Zhan

    Solution approach

    โ€ข An EM algorithm is proposed for estimation

    โ€ข A iterative procedure of two steps:

    โˆ’ E-step:

    ๐”ผ ๐‘ง๐‘˜๐‘– = ๐‘ง๐‘˜๐‘– ๐‘ง๐‘˜๐‘– ๐œ‹๐‘˜๐‘– ๐,๐œท, ๐‘‘๐‘– ๐‘ƒ ๐‘ฆ๐‘–|๐‘˜, ๐, ๐œฎ

    ๐‘ง๐‘˜๐‘–

    ๐‘ง๐‘˜๐‘– ๐‘ โˆˆ๐‘…๐‘– ๐œ‹๐‘ 

    ๐‘– ๐,๐œท, ๐‘‘๐‘– ๐‘ƒ ๐‘ฆ๐‘–|๐‘ , ๐, ๐œฎ๐‘ง๐‘ ๐‘–= ๐›พ ๐‘ง๐‘˜

    ๐‘–

    โˆ’ M-step: Let ๐œ๐‘™ = ๐œŽ๐‘™2, ๐‰ = ๐œฎ,

    ๐‘„ ๐, ๐›• = ๐”ผ๐’› ln ๐‘ƒ ๐’š, ๐’›|๐, ๐›• =

    ๐‘–=1

    ๐‘›

    ๐‘˜โˆˆ๐‘…๐‘–

    ๐›พ ๐‘ง๐‘˜๐‘– ln ๐œ‹๐‘˜

    ๐‘– ๐,๐œท, ๐‘‘๐‘– + ln๐‘ƒ ๐‘ฆ๐‘–|๐‘˜, ๐, ๐›•

    ๐๐‘›๐‘’๐‘ค, ๐‰๐‘›๐‘’๐‘ค = ๐‘Ž๐‘Ÿ๐‘”๐‘š๐‘Ž๐‘ฅ๐,๐›•๐‘„ ๐, ๐›•

  • Probabilistic Model

    Introduction Study region Base model Probabilistic model Numerical results Conclusion

    20

    MPE 2013+

    Urban Link Travel Time Estimation Using Large-scale Taxi Data with Partial Information

    Xianyuan Zhan

    Solving for large-scale data and large networks

    โ€ข The M-step involves a large-scale optimization problem

    โ€ข Our goal:

    โˆ’ Solve for large-scale data input

    โˆ’ Solve for large network

    โˆ’ Short term link travel time estimation (say 15min)

    โ€ข Solution: parallelize the computation!

    โˆ’ Alternating Direction Method of Multiplier (ADMM) to decouple the problem

    into smaller sub-problems

    โˆ’ Solve decomposed sub-problems in parallel

    โˆ’ Deals with large size of network and data

    โˆ’ Faster model estimation

  • Numerical Results

    Introduction Study region Base model Probabilistic model Numerical results Conclusion

    21

    MPE 2013+

    Urban Link Travel Time Estimation Using Large-scale Taxi Data with Partial Information

    Xianyuan Zhan

    Model results for base model

    โ€ข Validation metrics

    โ€ข Root mean square error

    โ€ข Mean absolute percentage error

    RMSE =1

    ๐‘›

    ๐‘–=1

    ๐‘›

    ๐‘‡๐‘–๐‘ƒ๐‘Ÿ โˆ’ ๐‘‡๐‘–

    ๐‘‚๐‘ 2

    MAPE =1

    ๐‘›

    ๐‘–=1

    ๐‘›๐‘‡๐‘–๐‘ƒ๐‘Ÿ โˆ’ ๐‘‡๐‘–

    ๐‘‚๐‘

    ๐‘‡๐‘–๐‘‚๐‘ ร— 100%

  • Numerical Results

    Introduction Study region Base model Probabilistic model Numerical results Conclusion

    22

    MPE 2013+

    Urban Link Travel Time Estimation Using Large-scale Taxi Data with Partial Information

    Xianyuan Zhan

    Model results for base model

    โ€ข Test data: 3/15/2010 ~ 3/21/2010

    Day ErrorTime Period

    9:00-10:00 13:00-14:00 19:00-20:00 21:00-22:00

    MondayRMSE (min) 2.614 1.981 1.937 1.372

    MAPE 29.51% 24.22% 26.27% 21.87%

    TuesdayRMSE (min) 2.461 2.302 1.827 1.437

    MAPE 29.63% 25.59% 23.33% 22.20%

    WednesdayRMSE (min) 3.827* 3.216* 2.18 1.691

    MAPE 41.32%* 34.97%* 28.73% 24.40%

    ThursdayRMSE (min) 2.468 2.699 2.49 1.382

    MAPE 27.28% 27.92% 28.54% 21.05%

    FridayRMSE (min) 2.26 2.179 1.692 1.334

    MAPE 27.76% 27.04% 25.17% 22.26%

    SaturdayRMSE (min) 1.034 1.69 1.839 1.584

    MAPE 16.84% 24.58% 27.14% 21.61%

    SundayRMSE (min) 2.041 1.518 1.395 1.16

    MAPE 25.44% 23.70% 22.72% 19.87%

    * Traffic disturbance caused by Patrick's Day Parade.

  • Numerical Results

    Introduction Study region Base model Probabilistic model Numerical results Conclusion

    23

    MPE 2013+

    Urban Link Travel Time Estimation Using Large-scale Taxi Data with Partial Information

    Xianyuan Zhan

  • โ€ข Two new models are proposed to estimate urban link travel times

    โ€ข Utilizing data with only partial information

    โ€ข Efficiently estimation using base model with reasonable accuracy

    โ€ข Mixture models are proposed to get more robust and accurate estimations

    โ€ข Applicable to trajectory data, can provide more accurate estimations

    Future work

    โ€ข Test the mixture models for larger network

    โ€ข Efficient implementation using distributed computing technique

    โ€ข Result validation

    Conclusion

    Introduction Study region Base model Probabilistic model Numerical results Conclusion

    24

    MPE 2013+

    Urban Link Travel Time Estimation Using Large-scale Taxi Data with Partial Information

    Xianyuan Zhan

  • Q&A

    Introduction Study region Base model Probabilistic model Numerical results Conclusion

    25

    Thank you!

    Questions / Comments ?