

Perrine, Khani, and Ruiz-Juri 1

A MAP-MATCHING ALGORITHM FOR APPLICATIONS IN MULTIMODAL TRANSPORTATION NETWORK MODELING

Kenneth Perrine (CORRESPONDING AUTHOR)
Research Fellow
Network Modeling Center, Center for Transportation Research, Cockrell School of Engineering
University of Texas at Austin
1616 Guadalupe St.
Austin, TX 78701
E-Mail: [email protected]
Phone: 512-232-3123
Fax: 512-232-3070

Alireza Khani
Research Associate
Network Modeling Center, Center for Transportation Research, Cockrell School of Engineering
University of Texas at Austin
1616 Guadalupe St.
Austin, TX 78701
E-Mail: [email protected]
Phone: 512-232-3075
Fax: 512-232-3070

Natalia Ruiz-Juri
Research Associate
Network Modeling Center, Center for Transportation Research, Cockrell School of Engineering
University of Texas at Austin
1616 Guadalupe St.
Austin, TX 78701
E-Mail: [email protected]
Phone: 512-232-3099
Fax: 512-232-3070

Word count: 5593 words text + 4 tables/figures x 250 words (each) = 6593 words

ABSTRACT

General Transit Feed Specification (GTFS) files have gained wide acceptance among transit agencies, which now provide them for most major metropolitan areas. Their public availability, combined with the convenience of a standard data representation, has promoted the development of numerous applications for their use. While most of these tools focus on the analysis and utilization of public transportation systems, GTFS datasets are also extremely relevant for the development of multimodal planning models. The use of GTFS data for integrated modeling requires creating a graph of the public transportation network that is consistent with the roadway network. This task is not trivial, given the limitations of the networks often used for regional planning models and the complexity of the roadway system. This paper proposes an open-source algorithm that matches GTFS geographic information to existing planning networks, and is also relevant for real-time in-field applications. The methodology is based on maintaining a set of candidate paths connecting successive geographic points. We present examples of implementations using traditional planning networks and a network built from crowd-sourced OpenStreetMap data. We also demonstrate the versatility of the methodology by using it to match GPS points from a navigation system. Experimental results suggest that our approach is highly successful even when the underlying roadway network is not complete. The proposed methodology is a promising step toward using novel and inexpensive data sources to facilitate and eventually transform the way that transportation models are built and validated.

Keywords: map matching, transit, graph, GTFS, GPS

INTRODUCTION

For decades, multimodal transportation network modeling has been an interesting and challenging topic for transportation researchers. In the last few years, emerging advanced transportation models such as Dynamic Traffic Assignment (DTA), Schedule-based Transit Assignment (STA), and Traffic Microsimulation (TM) models have enabled more realistic multimodal network modeling. However, building a high-resolution multimodal transportation network from various data sources is still a tedious task.

General Transit Feed Specification (GTFS) files, currently provided by most transit authorities, allow for a comprehensive, geometrically accurate, and time-dependent transit network representation, which is ideal for modeling and planning (e.g. (1,2)). However, in order to utilize GTFS in multimodal applications, it is necessary to generate a graph representation of the transit system that is compatible with the roadway network, a problem known as map matching (3). The solution of this problem is not trivial, given the complexity of urban roadway systems. Further, the limitations of the networks often used for regional planning purposes make the problem more challenging: while high-resolution GIS networks are available for urban areas in the US, many planning models are built using sketch networks, with the occasional addition of minor streets in denser regions. Meanwhile, although some proprietary modeling tools generate transit networks from GTFS data, the methodologies used by such tools are typically not disclosed. Given the ambiguities often found when processing GTFS data, we believe that there is great value in proposing open-source tools that can be used by both practitioners and researchers.

This paper proposes an algorithm to automatically solve the problem of matching GTFS geographic data to regional planning networks of various resolution levels. The matched data includes both routes (defined as strings of georeferenced coordinate trackpoints) and bus stops. Our methodology is based on creating and maintaining a set of candidate paths to connect GPS points over the roadway network. This approach allows the algorithm to handle GPS data sets of varying quality and resolution. The methodology is flexible and general because it avoids heuristics that may impose restrictive assumptions. We also address the bus stop matching problem (3) by projecting GTFS bus stop locations onto the roadway network.

Our numerical experiments successfully implement the proposed algorithm for GPS data with point spacings ranging between 2 inches and more than 3000 feet. We also explore the advantages of utilizing higher-resolution roadway networks by testing the methodology on a graph created from OpenStreetMap (OSM) data. The ability to automatically combine geographic data from different sources using a reliable approach holds the promise of streamlining the development of multimodal networks. This can enable researchers to focus on the methodological challenges of multimodal models, and to test these on realistic data sets. It can also facilitate the adoption of advanced models in practice, and the utilization of more detailed data sources, such as GPS traces and other vehicle tracking techniques (4), for model validation and calibration.

Related Work

A handful of researchers have explored the problem of incorporating GTFS data into multimodal transportation models, focusing mostly on microsimulation applications with high-resolution roadway networks (5). For regional planning applications, Puchalsky et al. (6) present an example of combining OSM data and GTFS data using proprietary software, and showcase the power of utilizing web-based open data sources. However, the use of proprietary software does not lend itself to exploring the methodological challenges and ambiguities often found when fusing traffic and transit networks, which is the subject of our effort.

The more general subject of map-matching has been extensively investigated, and research continues to emerge because of ongoing technical challenges and application-specific, field-dependent requirements. Two major categories of map-matching algorithms have been proposed in the literature (7). Global algorithms (e.g. (7)) require all of the incoming trackpoints to be collected a priori. While this may yield improved accuracy, global algorithms are not amenable to live, in-field applications. On the other hand, most works referenced in this paper can be classified as local/incremental algorithms, where matched paths are built as trackpoints arrive, possibly with a known latency. Incremental algorithms are more versatile for mobile applications and can also work offline after all trackpoints have been collected.

Quddus et al. (8) distinguish between geometric, topological and probabilistic/advanced map-matching algorithms. Geometric approaches match trackpoints to underlying spatial features without considering their connectivity. These approaches have been shown to successfully match high-density trackpoints to underlying nodes (9), and they are flexible enough to allow for off-road scenarios. Topological algorithms specifically consider how the features of the underlying map are connected together, and are appropriate for addressing complex situations such as freeway interchanges or urban railway crossings. Li (3) leverages topology in using Dijkstra’s algorithm (10) to find shortest paths between bus stops. Probabilistic algorithms make path choices based on likelihood values calculated among several proposed hypotheses, each located within an “error region” around incoming trackpoints. Finally, advanced algorithms employ a variety of more refined techniques for decision-making, including Kalman filtering, Dempster-Shafer belief theory, and fuzzy logic (see (8) for examples of each).

A distinguishing factor among algorithms is the expected trackpoint density, and their ability to handle trackpoint degradation due to low sample densities, noise and bias. Many algorithms require a high trackpoint density such as second-by-second GPS data. Ordonez and Erath (11) geometrically find paths through Singapore with a trackpoint distance of ~65 m, and incorporate human interaction into their error-correcting workflow. Pyo et al. (9) pair GPS-based trackpoints with dead reckoning in order to alleviate biasing. Meng (4) applies a series of integrity checks to incoming trackpoints to discard outliers. Marchal et al. (12) can handle larger GPS point spacing by analyzing the underlying topology, but cannot maintain a continuous path if an entire link is skipped. Yuan et al. (7), on the other hand, specifically design their global algorithm to work with sparse trackpoints collected once every two minutes.

Map-matching methodologies can also be affected by the quality of the underlying map. While some algorithms work properly only with high-resolution definitions (5), others may accommodate missing links (e.g. (13)), incorrect geometry, and misrepresented features. Van Velden (14) discusses the challenge of dealing with an underlying map that contains “aggregate” bus stop locations that actually represent several physical bus stops. Velaga et al. (15) benefit from the inclusion of extra permitted turning movement data at each intersection.

Several researchers have studied the algorithmic logic used for choosing “correct” paths in map-matching algorithms, which is complicated by the presence of overpasses, sharp turns, errors in measurements, and ambiguity (8). In (16) it is noted that the selection of an incorrect link within a matched path may lead to a sequence of bad matches. In a probabilistic framework that handles multiple hypotheses, Pyo et al. (9) compare movement heading with underlying link heading, and use heuristics to check for overpasses. Other heuristics include turn and curve detection (4). Schuessler and Axhausen (5) additionally compare movement speed with link free-flow speed. Often these are paired with weighting coefficients, and Velaga et al. (15) use ground-truth data to automatically select them. Each measurement and heuristic, however, presents its own challenges, as surveyed by Quddus et al. (16).

The computational efficiency of map-matching algorithms, a key necessity for real-time environments, has also been discussed in the literature. Efficiency techniques include the use of quad trees in addressing spatially located data (17) and the A* shortest path search algorithm (11). Schuessler and Axhausen (5) compare the techniques of Pyo et al. and Marchal et al. (9,17) for guarding against excessive iterations.

The methodology proposed in this paper involves the use of “error regions” around each trackpoint, as seen in Velaga (15), for selecting underlying geometry. It follows Schuessler and Axhausen (5) in using a shortest-path algorithm for dealing with potentially large gaps in incoming trackpoints. Potentially noisy incoming geocoordinates are effectively “snapped” to (or expressed purely in terms of) “known-good” underlying topology, as seen with Blasquez and Vonderohe (18). In our current approach we avoid using heuristics such as those implemented in (9,4,5) to improve the accuracy of intermediate matches, given that these may behave unexpectedly in unforeseen circumstances. We instead rely on being tolerant of geospatial error and maintaining several candidate paths, using a simple scoring mechanism to gradually prune away infeasible branches. Our work can be run incrementally, but exhibits some global behavior: although we can build multiple paths through underlying topology as trackpoints arrive, the choice of the “most correct” path by the scoring mechanism may remain ambiguous until a sufficient number of additional trackpoints arrive, depending upon the underlying topology.

METHODOLOGY

This section presents our proposed map-matching algorithm, which incorporates GTFS transit routes and stops into an existing roadway network graph consisting of nodes and links. The process was originally designed to analyze the GTFS file that contains the definitions of the shapes of all transit routes, but it can be extended to process any ordered list of GPS points. The algorithm works one route at a time, and GPS points are considered sequentially, as if they were trackpoints from a live feed. The two main algorithmic steps are described in the following sections. Each incoming trackpoint is first associated with one or more “reference points” in the network (point matching). These points, which always lie on a network link, are considered path-end candidates. The second step, pathfinding, involves extending all previous path-ends to reach the new candidate reference points by running multiple shortest-path searches. Each path-end is assigned a single predecessor based on a score function, and “childless” ends from previous steps are pruned before analyzing a new trackpoint. Once the algorithm reaches the last trackpoint to be analyzed, the ordered set of links that defines the route can be easily retrieved.
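The pathfinding step relies on shortest-path searches between reference points that lie partway along links. The following is a minimal sketch of one way such a distance could be computed with Dijkstra's algorithm; the function names, the `(link_id, offset_ft)` point representation, and the dictionary layouts are assumptions made for illustration and are not taken from the paper's implementation.

```python
import heapq

def dijkstra(graph, source, target):
    """Shortest node-to-node distance, or None if target is unreachable.

    graph: {node: [(neighbor, length_ft), ...]} for a directed network.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, length in graph.get(node, ()):
            nd = d + length
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return None

def point_to_point_distance(graph, links, a, b):
    """Travel distance between two reference points on directed links.

    a, b: (link_id, offset_ft) pairs (hypothetical representation).
    links: {link_id: (from_node, to_node, length_ft)}.
    """
    (link_a, off_a), (link_b, off_b) = a, b
    if link_a == link_b and off_b >= off_a:
        return off_b - off_a  # both points on the same link, moving forward
    _, a_to, a_len = links[link_a]
    b_from, _, _ = links[link_b]
    middle = dijkstra(graph, a_to, b_from)
    if middle is None:
        return None  # no connecting path: a discontinuity in the paper's terms
    # Remainder of the first link + connecting path + offset into the last link.
    return (a_len - off_a) + middle + off_b
```

Returning `None` for unreachable pairs leaves it to the caller to decide how a discontinuity should be scored.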

Later, we describe the bus stop matching approach, which is a reimplementation of the original algorithm using the transit route graph as the underlying network and bus stop locations as GPS trackpoints.

Point-Matching

For each incoming trackpoint, the point-matching stage finds one or more candidate reference points (“POINT_ON_LINK” points) on the underlying network graph. Given the possibility of matching errors, which may be present in the GPS points or the underlying data, we define a circular “error region” using a preconfigured search radius kp around the incoming trackpoint. All POINT_ON_LINKs lie on links within the error region, either at the point where the line from the trackpoint to the POINT_ON_LINK is perpendicular to the link, or on the endpoints of links (as in (5)); the endpoint cases are assigned a penalty in a later stage. Reference points are ranked according to dr, the distance from the sample point to the link. Only the closest qp points are kept for the next stage. To illustrate, Figure 1 shows the resulting POINT_ON_LINK data structure and a simple point-finding algorithm. We note that more efficient algorithms, such as those that use k-d trees (19), can perform the same point-matching task much more quickly. Our algorithms have not been optimized for performance, as we focused on the methodological challenges of the problem under study.
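As an illustration of how a single POINT_ON_LINK might be derived, the sketch below projects a trackpoint onto a straight link in planar coordinates. It assumes coordinates have already been converted to feet in a local projection; the class and function names are hypothetical, and only the fields L, d, r, and dr mirror the data structure in Figure 1.

```python
import math
from dataclasses import dataclass

@dataclass
class PointOnLink:
    link_id: str  # L: reference map link
    d: float      # distance from the start of the link (ft)
    r: bool       # True when referenced to an endpoint, not a perpendicular point
    dr: float     # distance from the trackpoint to the link segment (ft)

def project_onto_link(link_id, a, b, t):
    """Project trackpoint t onto the straight link a -> b (planar x/y in feet)."""
    ax, ay = a
    bx, by = b
    tx, ty = t
    vx, vy = bx - ax, by - ay
    length_sq = vx * vx + vy * vy
    if length_sq == 0.0:
        # Degenerate link: treat it as an endpoint match.
        return PointOnLink(link_id, 0.0, True, math.hypot(tx - ax, ty - ay))
    # Fractional position of the perpendicular foot along the link.
    u = ((tx - ax) * vx + (ty - ay) * vy) / length_sq
    clamped = min(1.0, max(0.0, u))
    px, py = ax + clamped * vx, ay + clamped * vy
    return PointOnLink(
        link_id=link_id,
        d=clamped * math.sqrt(length_sq),
        r=not (0.0 <= u <= 1.0),  # foot fell outside the link, so an endpoint is used
        dr=math.hypot(tx - px, ty - py),
    )
```

A candidate would then be kept only if its `dr` does not exceed the search radius kp.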

FIGURE 1 This FINDPOINTSONLINKS algorithm locates points of potential interest within an error region on the underlying map and returns the best qp of them.

As an option, an extended search area may be defined using a secondary radius ks around previous candidate reference points. This allows paths to “stick” to their current endpoint when the tracked vehicle enters an area that is not defined within the underlying map, such as a bus entering a parking lot. In the proposed approach the path may be continued without causing a discontinuity if a suitable point is referenced at a later stage.

Pathfinding

The core of our methodology is the mechanism for creating a graph of best paths from previous path-ends (referred to in Figure 2 as “PATH_ENDs”) to current candidate PATH_ENDs. The graph is such that each new PATH_END has a single parent, while previous PATH_ENDs may branch to multiple children.

data structure POINT_ON_LINK:
    L : reference map link, which can be resolved to a geographically-referenced line segment
    d : distance from the start of the map link, which can be resolved to geographic coordinates
    r : a Boolean indicator on whether this is referenced to an endpoint, rather than a perpendicular point
    dr : reference distance, the distance that the trackpoint is away from the link line segment

global parameters:
    qp : maximum number of candidate POINT_ON_LINK points
    k : working radius for finding candidate POINT_ON_LINK points (ft)
    kp : primary radius from current trackpoint (ft)
    ks : secondary radius from the previous POINT_ON_LINKs (ft)
    S : underlying geographic map

algorithm FINDPOINTSONLINKS:
    inputs: t : trackpoint
            Pp : previous POINT_ON_LINKs
    output: P : the resulting list of POINT_ON_LINK points

    P ← [ ], an empty list.
    S ← all POINT_ON_LINK points in entire underlying map S within radius k of t
    for each POINT_ON_LINK s in S:
        if s is within radius kp of t:
            append s to P
        else if Pp is defined:
            for each POINT_ON_LINK pp in Pp:
                if s is within radius ks of pp:
                    append s to P
                    break
    for p in P, keep in P the qp POINT_ON_LINK points with the least p.dr
    sort P by ascending p.dr for p in P
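The candidate filtering in FINDPOINTSONLINKS can be sketched in Python as follows. This is an illustrative rendering rather than the authors' code: it assumes the candidate points near the trackpoint have already been computed in planar coordinates, and the `Candidate` type and the default values for kp, ks, and qp are placeholders chosen for the example.

```python
import math
from typing import NamedTuple

class Candidate(NamedTuple):
    link_id: str
    x: float          # planar position of the point on the link (ft)
    y: float
    dr: float         # distance from the current trackpoint to the link (ft)
    r: bool = False   # True for an endpoint (non-perpendicular) match

def find_points_on_links(t, candidates, prev=None, kp=100.0, ks=50.0, qp=3):
    """Keep candidates near trackpoint t or near a previous candidate.

    t: (x, y) of the trackpoint; candidates: precomputed Candidate list;
    prev: Candidates kept for the previous trackpoint, or None.
    """
    def within(p, q, radius):
        return math.hypot(p[0] - q[0], p[1] - q[1]) <= radius

    kept = []
    for s in candidates:
        pos = (s.x, s.y)
        if within(pos, t, kp):
            kept.append(s)  # inside the primary error region
        elif prev and any(within(pos, (pp.x, pp.y), ks) for pp in prev):
            kept.append(s)  # lets a path "stick" near its previous endpoint
    kept.sort(key=lambda s: s.dr)
    return kept[:qp]        # the qp closest candidates, in ascending dr
```

Sorting before truncating gives the same result as the two final pseudocode lines, which keep the qp best points and then order them.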

FIGURE 2 The pathfinding portion of the algorithm, including the scoring of all path candidates.

data structure PATH_END:
    c : the POINT_ON_LINK to which this PATH_END is referenced
    s : the total score of the path represented
    p : the previous PATH_END step in this path
    ℓ : a list of map links that have been traversed on the shortest path
    r : a Boolean signifying a discontinuity

global parameters:
    qe : maximum number of simultaneous paths
    fd : distance factor
    fr : drift factor
    fp : non-perpendicular penalty factor

algorithm WALKTRACK: (A global-style entry to the algorithm that processes an entire track.)
    input: T : list of trackpoints
    output: R : the resulting list of PATH_ENDs

    for each trackpoint t in track T:
        E ← TRACKPOINTARRIVES (t, previous list of candidate PATH_ENDs Ep)
        Ep ← E
    e ← the PATH_END in E with the minimum e.s value for e in E
    while e is defined:
        append e to R
        e ← e.p
    reverse the order of R

algorithm TRACKPOINTARRIVES: (An incremental step in path-building.)
    inputs: t : incoming trackpoint
            Ep : previous list of candidate PATH_ENDs
    output: E : the list of current new PATH_ENDs

    C ← FINDPOINTSONLINKS (t, [ep.c for each ep in Ep]), all the candidate POINT_ON_LINKs around t
    for each POINT_ON_LINK c in C:
        append to E a new PATH_END e, with e.c ← c; e.s ← ∞
    for each PATH_END ep in Ep (or for once if Ep not defined):
        for each PATH_END e in E:
            (distance d, list of links ℓ) = FINDSHORTESTPATH (ep, e) if ep is defined, else (0, [ ])
            s ← SCOREFUNCTION (d, e), the score for the path from ep to e
            if ep.s + s < e.s:
                (Reference this low-scoring path ep to e as a “winning” path:)
                e.p ← ep
                e.s ← ep.s + s; e.ℓ ← ℓ
    for e in E, if all e.p are not defined:
        (Reference this as a discontinuity and find an approximate distance:)
        for each PATH_END e in E:
            e.r ← TRUE
            e.p ← the PATH_END e in Ep with the minimum e.s value
            d ← linear distance from e.p to e
            e.s ← e.p.s + SCOREFUNCTION (d, e)
    delete from E each PATH_END e whose e.p is not defined
    delete from Ep each PATH_END that isn’t referenced from any e.p for each e in E
    keep in E only the qe PATH_ENDs with the minimum e.s values for each e in E.

algorithm SCOREFUNCTION:
    inputs: d : distance
            e : current PATH_END
    output: s : the calculated score

    (Penalizing for the reference perpendicular distance:)
    s ← e.c.dr ∙ fr
    if e.c.r is TRUE:
        (Penalizing further if this is nonperpendicular (that is, in relation to an endpoint):)
        s ← s ∙ fp
    (Finally, penalizing for travel distance:)
    s ← s + d ∙ fd
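To make the predecessor selection in Figure 2 concrete, the following Python sketch implements a simplified incremental step. It is not the authors' implementation: the discontinuity fallback is omitted, the shortest-path search is passed in as a caller-supplied `path_distance` function, and the factor values fd, fr, and fp are placeholders. Childless previous PATH_ENDs are pruned implicitly, since only ends reachable through a `p` reference survive.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class PathEnd:
    name: str                       # stands in for the referenced POINT_ON_LINK (c)
    dr: float                       # reference distance of that POINT_ON_LINK (ft)
    endpoint: bool = False          # POINT_ON_LINK.r: matched to a link endpoint
    s: float = math.inf             # cumulative score of the best path ending here
    p: Optional["PathEnd"] = None   # predecessor on that best path

def score(d, e, fd=1.0, fr=1.0, fp=2.0):
    """SCOREFUNCTION from Figure 2; the factor values are illustrative only."""
    s = e.dr * fr
    if e.endpoint:
        s *= fp
    return s + d * fd

def trackpoint_arrives(candidates, prev_ends, path_distance, qe=2):
    """Give each candidate its best predecessor and keep the qe lowest scores.

    path_distance(ep, e) returns a travel distance, or None if unreachable.
    """
    for e in candidates:
        if not prev_ends:           # first trackpoint: no travel distance yet
            e.s = score(0.0, e)
            continue
        for ep in prev_ends:
            d = path_distance(ep, e)
            if d is None:
                continue
            total = ep.s + score(d, e)
            if total < e.s:         # a new "winning" parent for e
                e.s, e.p = total, ep
    reachable = [e for e in candidates if math.isfinite(e.s)]
    reachable.sort(key=lambda e: e.s)
    return reachable[:qe]

def walk_back(ends):
    """Follow predecessor links from the lowest-scoring end, as WALKTRACK does."""
    e = min(ends, key=lambda e: e.s)
    path = []
    while e is not None:
        path.append(e.name)
        e = e.p
    return path[::-1]
```

Calling `trackpoint_arrives` once per trackpoint and then `walk_back` on the final list mirrors the WALKTRACK loop.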

A score is used to select the optimal predecessor for each candidate PATH_END. The score of a predecessor is an aggregation of penalties that combines travel distance from the origin through the predecessor, the reference distance dr in the corresponding POINT_ON_LINK, and whether the POINT_ON_LINK is nonperpendicular. Each of these is weighted according to empirically devised constant factors. The intuition is that perpendicular matches, small dr values, and shortest travel distances are the most likely to be correct and therefore should be favored.

Only the qe PATH_ENDs with the lowest (best) scores are kept, and childless PATH_ENDs from the previous trackpoint are considered infeasible and are trimmed. Figure 3 exemplifies the progression of the map-matching algorithm. It also demonstrates ambiguities that may arise when selecting a single predecessor per endpoint. Improving accuracy may involve additional decision logic, as seen in many examples in (8), or waiting until additional trackpoints arrive. While this is a current limitation of our proposed methodology, it is expected to have a relatively minor impact on performance, and the very few cases of ambiguity we have observed in experiments have been limited to the last couple of matched points.
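As a compact restatement of SCOREFUNCTION and the predecessor selection in Figure 2, the score can be written as shown below. The bracket-indicator notation and the numeric factor values in the worked example are illustrative assumptions; the paper describes the factors only as empirically devised constants.

```latex
% Step score for reaching candidate e over a path of length d (SCOREFUNCTION)
\[
  s(d, e) = f_r \, d_r(e) \, f_p^{[e.c.r]} + f_d \, d,
  \qquad
  [e.c.r] =
  \begin{cases}
    1 & \text{if } e \text{ is matched to a link endpoint} \\
    0 & \text{if } e \text{ is a perpendicular match}
  \end{cases}
\]

% Cumulative score and predecessor choice (TRACKPOINTARRIVES)
\[
  e.s = \min_{e_p \in E_p} \bigl( e_p.s + s(d(e_p, e), e) \bigr)
\]

% Worked example with hypothetical values
% f_r = 1, f_p = 2, f_d = 0.1, d_r = 20 ft, d = 500 ft
\[
  s_{\perp} = 1 \cdot 20 + 0.1 \cdot 500 = 70,
  \qquad
  s_{\text{endpoint}} = 1 \cdot 20 \cdot 2 + 0.1 \cdot 500 = 90
\]
```

With these assumed values the perpendicular match is preferred over an endpoint match at the same reference distance, in line with the intuition stated above.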

FIGURE 3 Map-matched paths that are maintained as four trackpoints arrive. Candidate POINT_ON_LINKs are labeled “linkname@d” (where d represents distance_along_link / total_link_distance), and are marked according to whether they are perpendicular matches. Judging from the lengths of the dr reference distances and distances traversed, either the “bdgk” or the “bdfp” path could be the winner.

Bus Stop Matching

GTFS data includes information to define sequences of bus stops and corresponding geographic locations. Once the GTFS tracks are map-matched, a second pass of the algorithm discussed in previous sections can be used to solve the bus stop matching problem.

For bus stop matching, each route is considered as a separate underlying map, defined as a subset of the roadway network graph. The map includes the links identified during the route map-matching step. The sequence of stops is treated as a series of trackpoints. In order to calculate cumulative scores correctly, we add “dummy” trackpoints at the beginning and end of the route. WALKTRACK (Figure 2) is used to retrieve a list of PATH_ENDs, each of which (excluding the ends) refers to a POINT_ON_LINK that corresponds to a bus stop location.
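The idea of treating stops as trackpoints on the matched route can be illustrated with a deliberately simplified sketch. It assumes the matched route is a simple chain of straight segments in planar feet, so that travel between two candidates is just the difference of their along-route distances and backward moves are disallowed. In that setting, the travel-distance term summed between dummy endpoints equals the route length for every ordering-respecting choice, so this sketch minimizes stop offsets only. The function names and the `radius` default are hypothetical, and the authors' approach reuses WALKTRACK rather than this dynamic program.

```python
import math

def project(a, b, t):
    """Return (distance along a->b, offset of t) for the closest point on the segment."""
    vx, vy = b[0] - a[0], b[1] - a[1]
    seg = math.hypot(vx, vy)
    if seg == 0.0:
        return 0.0, math.hypot(t[0] - a[0], t[1] - a[1])
    u = ((t[0] - a[0]) * vx + (t[1] - a[1]) * vy) / (seg * seg)
    u = min(1.0, max(0.0, u))
    px, py = a[0] + u * vx, a[1] + u * vy
    return u * seg, math.hypot(t[0] - px, t[1] - py)

def match_stops(route, stops, radius=100.0):
    """Pick one along-route position per stop, in order, minimizing total offset."""
    starts = [0.0]  # cumulative distance at the start of each segment
    for a, b in zip(route, route[1:]):
        starts.append(starts[-1] + math.hypot(b[0] - a[0], b[1] - a[1]))
    layers = []     # per stop: list of (along-route position, offset) candidates
    for stop in stops:
        layer = []
        for i, (a, b) in enumerate(zip(route, route[1:])):
            along, off = project(a, b, stop)
            if off <= radius:
                layer.append((starts[i] + along, off))
        if not layer:
            raise ValueError(f"no route segment within {radius} ft of stop {stop}")
        layers.append(layer)
    # Dynamic program: best[j] = (total offset, positions) ending at candidate j.
    best = [(off, [pos]) for pos, off in layers[0]]
    prev_layer = layers[0]
    for layer in layers[1:]:
        new_best = []
        for pos, off in layer:
            options = [
                (cost, path)
                for (cost, path), (ppos, _) in zip(best, prev_layer)
                if ppos <= pos and math.isfinite(cost)  # stops must not move backward
            ]
            if options:
                cost, path = min(options, key=lambda o: o[0])
                new_best.append((cost + off, path + [pos]))
            else:
                new_best.append((math.inf, []))
        best, prev_layer = new_best, layer
    cost, path = min(best, key=lambda o: o[0])
    if not math.isfinite(cost):
        raise ValueError("no ordering-consistent assignment of stops to the route")
    return path
```

On an out-and-back route, the ordering constraint is what places a return-trip stop on the return segment even when the outbound segment is slightly closer.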

APPLICATIONS AND NUMERICAL EXPERIMENTS

This section discusses two major multimodal modeling applications of the algorithm described above, and presents numerical experiments. Our numerical experiments demonstrate the use of the proposed map-matching algorithm to enable the described applications by matching GTFS bus route shapes, GTFS bus stops, and second-by-second GPS data to underlying maps derived from a GIS database as well as OpenStreetMap (20).

Integrating DTA and Transit Assignment Models

Advanced transportation models aim at capturing the complexity of the interactions and decision-making processes that ultimately determine the performance of modern transportation systems. While different models exist for demand estimation, roadway network performance evaluation, and transit system assessment, the potential of advanced modeling techniques is typically achieved when various tools are appropriately integrated. An appropriate representation of the transit system can enable the integration of dynamic traffic assignment (DTA) models and schedule-based transit assignment models, promoting the implementation of a powerful modeling framework.

Simulation-based DTA models (21) have gained wide acceptance for the modeling of roadway networks in metropolitan areas. These models capture traffic dynamics by explicitly simulating the progression of traffic through the roadway network, accounting for traffic control, information provision, and heterogeneous driving behaviors. The operation of public transit vehicles may play a significant role in traffic dynamics (e.g. (12)). In order to effectively account for such impact it is critical to map the transit routes onto roadway links, and to locate stops properly. Details such as the use of express bus lanes, inappropriate routing through minor residential streets due to network inaccuracy, and the placement of stops on or off the arterials and at the intersections can significantly impact DTA results. The methodology proposed in this paper is a step toward facilitating a more accurate modeling of transit systems in DTA models. Additionally, it enables their integration with transit assignment models that can provide dwell times (the time that a transit vehicle is stopped while passengers board and alight). Dwell times affect traffic conditions when the transit vehicle blocks a lane on the street. They are exogenous to the DTA model, but they can be adjusted by proper integration with transit assignment models.

On the transit side, most assignment models in practice are frequency-based. In recent years, schedule-based models have been developed as better alternatives to the existing models. A state-of-the-art model that is starting to be used in US metropolitan areas is FAST-TrIPs, which has multiple advantages over the models currently in practice (2). In addition to its advantages in network representation and user behavior, a main feature of FAST-TrIPs is its use of transit vehicle trajectories simulated in DTA. In fact, transit assignment results may vary drastically if more realistic transit travel times are used instead of the scheduled travel times. This is mainly because transit users make their route choice decisions based on their experience, and factors such as deviation from the schedule or unreliability in making transfers impact their desired path. By integrating the FAST-TrIPs model with a DTA model, and incorporating the impact of roadway traffic on bus travel times, we aim to improve transit route choice and ridership forecasting. Such improvement depends on the capability and quality of modeling and simulating transit vehicles in a DTA model. In other words, we are facing a map-matching (and a stop-matching) problem in building an integrated traffic and transit assignment model. The integrated traffic and transit assignment model can be extended to include intermodal (drive-to-transit) travel modeling at park-and-ride locations (22,1).

Experimental Design

We conducted three sets of experiments, the first of which matched GTFS route and stop data to an underlying graph of the Austin, TX region, used for DTA applications. The map-matching algorithm was run twice, with the first run used to identify errors and limitations in the data using convenient reporting features.

The second experiment explored the use of the algorithm to match GPS traces from numerous probe-vehicle runs to the aforementioned DTA network. The mapping of GPS trajectories is extremely valuable for model validation purposes, as they can be used to generate reference travel times along selected corridors, which can later be compared to DTA results.

A third test was conducted to evaluate the impact of improved network representations on the accuracy of the proposed methodology. A network graph of the Austin area was created based on OpenStreetMap (OSM) data (20), an online mapping service that leverages volunteer (or crowd-sourced) efforts to create a thorough database representation of transportation networks. For the purpose of our study, minimal data processing was required to produce an adequate directed graph for modeling purposes. This was used as an underlying network for matching GPS points and GTFS data. The experiment also tested the concept of taking advantage of open-source data for building and maintaining planning models. The low cost of such an approach may promote the adoption of advanced techniques among smaller planning organizations with reduced budgets. However, further research and data analysis techniques are needed to produce reliable, detailed forecasting networks that include all of the information required for planning and operation purposes.

Data Description

The first and third numerical tests use GTFS data provided by the Austin, TX-area Capital Metro Transportation Authority (CapMetro) (23). In the first test, these data are matched to an underlying DTA map maintained by the Center for Transportation Research Network Modeling Center (NMC) (24) for the Capital Area Metropolitan Planning Organization (CAMPO) (25). The CapMetro GTFS set is dated August 30, 2013, and was retrieved freely from the GTFS Data Exchange (26). It contains 84 routes over 170 shapes covering 45,476 points, representing a total of 1383 miles (the data were cleaned to remove rail routes and eliminate duplicate shapes). These points follow an extremely varied density dictated by the node-link representation of the regional GIS database maintained by CapMetro. Nonzero point spacings range from 2 in to 3472 ft, with an average of 161 ft.
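As a rough illustration of how spacing statistics of this kind could be computed, the sketch below measures the great-circle distance between consecutive points of a single shape. The function names and the spherical-Earth approximation are assumptions made for illustration; the (lat, lon) pairs are assumed to come from the shape_pt_lat and shape_pt_lon fields of a GTFS shapes.txt file, ordered by shape_pt_sequence.

```python
import math

EARTH_RADIUS_FT = 20_902_231  # mean Earth radius (6,371 km) expressed in feet


def haversine_ft(lat1, lon1, lat2, lon2):
    """Great-circle distance in feet between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_FT * math.asin(math.sqrt(a))


def spacing_stats(points):
    """Return (min, max, mean) of the nonzero spacings between consecutive points."""
    gaps = [haversine_ft(*p, *q) for p, q in zip(points, points[1:])]
    gaps = [g for g in gaps if g > 0.0]  # duplicate points (zero spacing) are excluded
    if not gaps:
        raise ValueError("need at least two distinct points")
    return min(gaps), max(gaps), sum(gaps) / len(gaps)
```

Applying spacing_stats to every shape and pooling the gaps would give network-wide figures comparable to those quoted above.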

The DTA map used in two of these experiments is derived from the graph used by CAMPO for its regional planning network, to which some local and minor streets have been added. It consists of 11,393 nodes connected by 13,353 links. While only link endpoints are relevant from a modeling perspective, most links are described with a series of GPS coordinates that define curvature for matching closely with physical roadway geometry. Our initial development efforts do not use curvature information and instead treat links as straight lines between their respective node endpoints. Although the use of curvature remains an area for future work, this simplification illustrates the versatility of our algorithm in accommodating a lower-resolution map.
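When links are treated as straight segments, a candidate point on a link can be found by projecting the trackpoint onto the segment between the two node endpoints. The following is a minimal sketch of that projection, assuming planar coordinates in feet; the function name and the returned "perpendicular" flag are hypothetical and are not taken from the published implementation.

```python
import math


def project_to_link(px, py, ax, ay, bx, by):
    """Project point P onto the straight link A-B.

    Returns (x, y, distance, is_perpendicular), where is_perpendicular is True
    only when the perpendicular foot falls inside the segment rather than
    being clamped to one of its endpoints.
    """
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate link: both endpoints coincide.
        return ax, ay, math.hypot(px - ax, py - ay), False
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    clamped = min(1.0, max(0.0, t))
    qx, qy = ax + clamped * dx, ay + clamped * dy
    return qx, qy, math.hypot(px - qx, py - qy), clamped == t
```

For a curved roadway represented this way, the projected point can lie well away from the true alignment, which is the source of the curvature-related misalignments reported in the experiments.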

For the third experiment, the graph of the underlying network was created using OSM. The minimal cleaning of the OSM data took less than an hour, spent correcting a handful of misclassified network links, and did not involve the correction of any geometry. The resulting network consists of 123,046 nodes and 300,199 links, and represents a far more complete map with regard to the number of minor streets and passageways through parking lots.


Our experimental GPS tracks, also used in the third experiment, come from a subset of a CAMPO congestion study dataset (27) and include a second-by-second collection of 44,298 points, with 44 journeys over 22 routes during different times of the day, covering a total of 283 miles. The distance between trackpoint samples ranges from 0 to 114 ft, with an average of 34 ft. Compared with the GTFS trackpoints, the overall density of the GPS tracks is far more consistent and is representative of the high-sample-rate input expected by several other works found in the literature.


Experiment 1: Matching GTFS Data to a DTA Model Network

The parameters listed below were chosen for the initial run of the Python implementation of our algorithm (refer to Figures 1 and 2 for context). Because the GTFS trackpoints are often based on physical road geometry, the radii choices are dominated by the need to compensate for the algorithm’s disregard of link curvature.

qp = 12: maximum number of candidate POINT_ON_LINK points
k = 1000: working radius for finding candidate POINT_ON_LINK points (ft)
kp = 350: primary radius from current trackpoint (ft)
ks = 200: secondary radius from the previous POINT_ON_LINKs (ft)
qe = 8: maximum number of simultaneous paths
fd = 1.0: distance factor
fr = 1.5: drift factor
fd = 1.5: non-perpendicular penalty factor
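For reference, the initial parameter set can be written as a small configuration object. This is only a sketch: the class and field names are hypothetical and are not taken from the published implementation, and the defaults are simply the values listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchParams:
    """Hypothetical container for the map-matching parameters (distances in feet)."""

    max_candidates: int = 12            # qp: maximum candidate POINT_ON_LINK points
    working_radius_ft: float = 1000.0   # k: working radius for finding candidates
    primary_radius_ft: float = 350.0    # kp: primary radius from current trackpoint
    secondary_radius_ft: float = 200.0  # ks: secondary radius from previous POINT_ON_LINKs
    max_paths: int = 8                  # qe: maximum number of simultaneous paths
    distance_factor: float = 1.0        # distance factor
    drift_factor: float = 1.5           # drift factor
    non_perpendicular_penalty: float = 1.5  # non-perpendicular penalty factor


INITIAL_PARAMS = MatchParams()
```

A frozen dataclass keeps a run's settings immutable; a variant set can be derived with dataclasses.replace without altering the original.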


After running the algorithm on the DTA network, we visually analyzed the results for discontinuities and for matches that were more than 300 ft from their respective GTFS trackpoints, a hint that an incorrect roadway may have been chosen. We discovered that our DTA map was missing over 80 links that were critical for properly modeling bus movements. These links pertained to transit centers, mall parking lots, college campuses, turnarounds, parks, residential roads, and an airport loop. Even though our algorithm could sometimes route around missing links, most of these cases were surrounded by single-direction freeway service roads, one-way loop roads, and large subdivisions that made proposed detours infeasible.
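A distance check of this kind is straightforward to automate. The sketch below, which assumes planar coordinates in feet and uses a hypothetical function name, returns the indices of matched points that lie farther than a threshold from their trackpoints so that they can be reviewed.

```python
import math


def flag_distant_matches(trackpoints, matched_points, threshold_ft=300.0):
    """Return indices where the matched point is more than threshold_ft from its trackpoint."""
    if len(trackpoints) != len(matched_points):
        raise ValueError("trackpoints and matched_points must have the same length")
    flagged = []
    for i, ((tx, ty), (mx, my)) in enumerate(zip(trackpoints, matched_points)):
        if math.hypot(tx - mx, ty - my) > threshold_ft:
            flagged.append(i)
    return flagged
```

Flagged indices are only hints: a large distance may reflect an incorrectly chosen roadway, a missing link, or simply unmodeled link curvature.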

A second model run, using the same parameters, was conducted on a repaired network and yielded 2 discontinuities. Further analysis revealed that the links closest to these two GTFS trackpoints had heavy curvature and were too far away for POINT_ON_LINK candidates to be found. (These links were curved freeway interchange flyovers.) A third refinement run, which re-evaluates results within 3000 ft of a discontinuity while keeping all else equal, was conducted using the following parameters:

qp = 25; k = 1600; kp = 1600; qe = 25
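One way to select the trackpoints that such a refinement run re-evaluates is sketched below. It assumes planar coordinates in feet and interprets "within 3000 ft" as distance measured along the track; both are assumptions made for illustration, as are the function names.

```python
import math
from itertools import accumulate


def cumulative_distance(points):
    """Along-track distance from the first point to each point."""
    steps = [math.hypot(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in zip(points, points[1:])]
    return [0.0] + list(accumulate(steps))


def refinement_window(points, discontinuity_index, radius_ft=3000.0):
    """Return (first, last) indices of trackpoints within radius_ft of a discontinuity."""
    cum = cumulative_distance(points)
    center = cum[discontinuity_index]
    first = discontinuity_index
    while first > 0 and center - cum[first - 1] <= radius_ft:
        first -= 1
    last = discontinuity_index
    while last < len(points) - 1 and cum[last + 1] - center <= radius_ft:
        last += 1
    return first, last
```

Only the trackpoints inside the returned window would then be rematched with the relaxed parameters, leaving the rest of the matched path untouched.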


After this refinement, all 170 GTFS shapes were matched with no discontinuities. We then inspected 10 of these for correct routing. Of the 3068 GTFS trackpoints inspected, covering a distance of 93 miles, 32 were incorrectly matched and reflected slight detours. These were all caused by missing topology in the underlying map, including a missing link in a transit center, a missing path through a parking lot, and a missing residential road. There were also 283 misalignments caused by lack of link curvature; Figure 4 shows the worst cases of this. In all cases, the correct path was regained relatively quickly.


FIGURE 4 The most extreme mismatches between GTFS trackpoints and map-matching to the underlying DTA map. (a) GTFS trackpoints follow the curvature of the roadways, but (b) matched points do not account for link curvature despite being connected correctly at nodes. This poses no problem for this particular scenario. (c) GTFS trackpoints follow the freeway, but because link curvature is not matched, (d) the matched points slip off for a moment to the adjacent service road.

The results from the refinement run were used to conduct the bus stop matching. For 10 manually inspected GTFS shapes, stops were observed to be closely map-matched. However, stop placement is only as accurate as the links of the underlying map, and several did not follow link curvature. Also, because of insufficient GTFS shape point resolution at bus route endpoints, 5 of the stops were moved because the closest correct links were not included in the network graph subset. These movements were less than 100 ft, often from one cross street of an intersection to another. A possible remedy is to include a strategic set of connected links in the network graph subset.


Experiment 2: Matching GPS Points to a DTA Network

When we use our map-matcher to project these GPS tracks onto our underlying DTA map, all GPS tracks match continuously using the same parameters as those of the initial run in the first experiment. Upon analysis, we find that 291 matched points across continuous segments of three of the routes are matched to underlying links that are more than 300 ft away from the GPS trackpoints. Closer inspection reveals that the lack of link curvature again causes this inaccuracy, but that it never reflects an incorrect traversal through the underlying DTA map topology.


Experiment 3: Matching GTFS Shapes to an OpenStreetMap Network

We applied the same map-matching procedure described in the first experiment, this time using the OSM network. After the first run, we discovered 22 discontinuities and acknowledged the possibility of link curvature-related inaccuracies again, as well as errors within the map. However, without investigating further, we reran the map-matcher in a refinement cycle with the relaxed set of parameters. When this completed, 3 discontinuities remained.

The investigation of the 3 discontinuities revealed 2 topology errors that presumably came from the original OSM database. For the same set of 10 GTFS shapes used in the first experiment, 6 out of 3068 trackpoints were misaligned to link curvature, and 20 were found to be matched incorrectly. This performance greatly improves upon the success rate of the original DTA model network, which encouragingly highlights advantages of the crowd-sourced network over the more expensive “in-house” DTA network. While the service road routing problem pictured in Figure 4 recurred, again because link curvature was not considered, the algorithm was able to return to the correct alignment within a small number of mismatched trackpoints. Additionally, the performance of the bus stop matching run was similar to that of the first experiment.
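To make the comparison concrete, the counts reported for the 10 inspected shapes can be converted to rates over the 3068 inspected trackpoints. The short sketch below does only this arithmetic; the variable and function names are ours.

```python
INSPECTED_TRACKPOINTS = 3068

# Counts reported for the 10 manually inspected GTFS shapes.
counts = {
    "DTA": {"incorrect": 32, "misaligned": 283},
    "OSM": {"incorrect": 20, "misaligned": 6},
}


def rates(network):
    """Fraction of inspected trackpoints in each error category for a network."""
    return {kind: n / INSPECTED_TRACKPOINTS for kind, n in counts[network].items()}


for name in counts:
    r = rates(name)
    print(f"{name}: incorrect {r['incorrect']:.2%}, misaligned {r['misaligned']:.2%}")
```

This gives roughly 1.04% incorrect and 9.22% misaligned trackpoints on the DTA network, against roughly 0.65% and 0.20% on the OSM network.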


Summary of Results

The results discussed in previous sections suggest that our map-matching algorithm can successfully create graph representations of both GPS and GTFS data that match an underlying network. Errors in the trackpoints and in the underlying map affect the outcome of the map-matcher in various ways. When the trackpoints or the underlying map coordinates are biased, the matches are likely to be biased, possibly causing selection of adjacent roads in central districts, or service roads instead of freeways. Similar errors may be caused by our current lack of attention to link curvature in the underlying map. If these errors exceed the point search radius kp, then path discontinuities may occur. We can often detect the presence of these inaccuracies when reference distances dr exceed a threshold that is set slightly greater than the maximum expected GPS noise. While a remedy for these kinds of inaccuracies may require techniques found in other literature (e.g., (9,16)), a remedy for discontinuities is to rerun map-matching operations around the problematic trackpoints using relaxed parameters. This route refinement approach worked effectively in the cases tested in this study.

Serious diversions or breaks in continuity may happen if the underlying map is missing needed links. If alternate links are near a discontinuity, the matched path may use them, which can be detected by examining dr. The remedy is to fix the underlying map and then rerun map-matching in the region around the discontinuity.


CONCLUSIONS

This paper presents a methodology to facilitate the use of GTFS data when building multimodal graphs for transportation planning applications. The proposed map-matching algorithm creates a graph representation of GTFS shape data compatible with an underlying roadway network graph. It can also be extended to map GPS trajectories, which may be used for model validation purposes by providing, among other data, a reference travel time along selected corridors. The algorithm can assume that GPS points are fed one at a time, and it maintains candidate paths between suggested GPS points. This, in combination with the use of an “error region” around matched points, enables it to successfully handle GPS and network data of varying quality.

Numerical examples were conducted to test the performance of the methodology when matching GTFS and GPS data to a real planning network, and GTFS data to a network built from minimally cleaned OpenStreetMap data (20). After two model runs, 170 GTFS shapes were matched to the DTA network with no discontinuities and minor alignment errors in a subset of manually analyzed routes. The performance was even more impressive for the GPS data set, which exhibits much higher point density, and when utilizing the more detailed OSM network. For the latter, only 6 out of 3068 trackpoints were found to be misaligned (i.e., less than 0.2% inaccuracy), with the algorithm regaining the correct route a few trackpoints after the problematic area.

There are a number of opportunities to extend this work in future efforts. First, the accuracy of our map-matcher would greatly benefit from the ability to observe link curvature. Second, some of the heuristic improvements identified in the literature, based on movement classification, comparison of headings, and observation of speeds, can also assist in better estimating the “correct” link in the face of ambiguity. This can assist in the real-time use of our algorithm, where the latency for resolving ambiguity among multiple path candidates can be shortened. Third, we desire to improve the procedure for repairing bad underlying topology. While we observe that several problems are detectable, we may be able to leverage prior work to provide capabilities for automatically suggesting or applying fixes to ensure later success. Finally, the computational efficiency of this process has not yet been addressed, as the current performance is adequate for the planning applications typically considered by the authors. However, identifying efficient approaches to accomplish the path search can considerably increase the speed of this process, making it more suitable for real-time applications and general use.

Overall, the map-matching approach presented in this effort offers a promising way to enhance modelers’ abilities to take advantage of emerging data sources. The processing of GTFS data in combination with OSM information can set the basis for a transformative approach to building and validating multimodal transportation models. Given the ambiguities often found when processing GTFS data, and the numerous ways that GPS data (and advanced model results) may be aggregated (spatially and temporally), we believe that there is great value in proposing open-source tools that can be used by both practitioners and researchers. A Python-based open-source implementation of this work is available online at http://ctr.utexas.edu/nmc/nmc-map-matcher.


ACKNOWLEDGEMENTS

This research was partially supported by the U.S. Department of Transportation through the Data-Supported Transportation Operations and Planning (D-STOP) Tier 1 University Transportation Center, as well as the Capital Area Metropolitan Planning Organization (CAMPO) and the Texas Department of Transportation.


REFERENCES

1. Khani, A., B. Bustillos, H. Noh, Y. C. Chiu, and M. Hickman. Modeling Transit and Intermodal Tours in a Dynamic Multimodal Network. Transportation Research Record: Journal of the Transportation Research Board, 2014 (in press).
2. Khani, A., E. Sall, L. Zorn, and M. Hickman. Integration of the FAST-TrIPs Person-Based Dynamic Transit Assignment Model, the SF-CHAMP Regional, Activity-Based Travel Demand Model, and San Francisco’s Citywide Dynamic Traffic Assignment Model. In Transportation Research Board 92nd Annual Meeting, No. 13-4601, 2013.
3. Li, J. Match Bus Stops To A Digital Road Network By The Shortest Path Model. Transportation Research Part C: Emerging Technologies, Vol. 22, 2012, pp. 119-131.
4. Meng, Y. Improved Positioning of Land Vehicle in ITS Using Digital Map and Other Accessory Information. Department of Land Surveying and Geoinformatics, Hong Kong Polytechnic University, PhD Thesis, 2006.
5. Schuessler, N., and K. W. Axhausen. Map-Matching of GPS Traces on High-Resolution Navigation Networks Using the Multiple Hypothesis Technique (MHT). In Arbeitsberichte Verkehrs- und Raumplanung, Vol. 568, IVT, ETH Zurich, 2009.
6. Puchalsky, C. M., D. Joshi, and W. Scherr. Development of a regional forecasting model based on Google transit feed. In Transportation Research Board 91st Annual Meeting, No. 12-0779, 2012.
7. Yuan, J., Y. Zheng, C. Zhang, X. Xie, and G. Z. Sun. An Interactive-Voting Based Map Matching Algorithm. In Proc. 2010 Eleventh International Conference on Mobile Data Management, 2010.
8. Quddus, M. A., W. Y. Ochieng, and R. B. Noland. Current Map-Matching Algorithms for Transport Applications: State-of-the Art and Future Research Directions. Transportation Research Part C: Emerging Technologies, Vol. 15, No. 5, 2007, pp. 312-328.
9. Pyo, J. S., D. H. Shin, and T. K. Sung. Development of a Map Matching Method Using the Multiple Hypothesis Technique. In Intelligent Transportation Systems, 2001, pp. 23-27.
10. Ahuja, R. K., T. L. Magnanti, and J. B. Orlin. Network flows: theory, algorithms, and applications. 1993.
11. Ordonez, S., and A. Erath. Semi-Automatic Tool for Map-Matching Bus Routes on High-Resolution Navigation Networks, 2011.
12. Melson, C. L., S. D. Boyles, and R. B. Machemehl. Modeling the Traffic Impacts of Transit Facilities Using Dynamic Traffic Assignment. In Transportation Research Board 92nd Annual Meeting, No. 13-2267, 2013.
13. Weiss, A., M. S. Mahmoud, P. Kucireck, and K. N. Habib. Issues and strategies involved in developing agent-based multimodal network simulation model for transportation planning: Lessons from a case study on the Greater Toronto and Hamilton Area. In Proc. 2013 Transportation Association of Canada (TAC) Conference, 2013.
14. Van Velden, J. A Large-Scale Multi-Modal Implementation of MATSim for the Nelson Mandela Bay Metropole, 2013.
15. Velaga, N., M. A. Quddus, and A. L. Bristow. Developing an Enhanced Weight-Based Topological Map-Matching Algorithm for Intelligent Transport Systems. Transportation Research Part C: Emerging Technologies, Vol. 17, No. 6, 2009, pp. 672-683.
16. Quddus, M. A., W. Y. Ochieng, L. Zhao, and R. B. Noland. A General Map Matching Algorithm for Transport Telematics Applications. GPS Solutions, Vol. 7, No. 3, 2003, pp. 157-167.
17. Marchal, F., J. K. Hackney, and K. W. Axhausen. Efficient Map-Matching of Large GPS Data Sets: Tests on a Speed Monitoring Experiment in Zurich. In Transportation Research Record: Journal of the Transportation Research Board, No. 1935, Transportation Research Board of the National Academies, Washington, D.C., 2005, pp. 93-100.
18. Blazquez, C. A., and A. P. Vonderohe. Simple Map-Matching Algorithm Applied to Intelligent Winter Maintenance Vehicle Data. Transportation Research Record: Journal of the Transportation Research Board, Vol. 1935, 2005, pp. 68-76.
19. Bentley, J. L. Multidimensional binary search trees used for associative searching. Communications of the ACM, Vol. 18, No. 9, 1975, pp. 509-517.
20. OpenStreetMap. OpenStreetMap. http://www.openstreetmap.org/about. Accessed July 29, 2014.
21. Chiu, Y. C. et al. Dynamic Traffic Assignment: A Primer. In Transportation Research E-Circular (E-C153), 2011.
22. Khani, A., S. Lee, M. Hickman, H. Noh, and N. Nassir. Intermodal Path Algorithm for Time-Dependent Auto Network and Scheduled Transit Service. Transportation Research Record: Journal of the Transportation Research Board, Vol. 2284, 2012, pp. 40-46.
23. Capital Metro. Capital Metro: Austin Public Transportation. https://www.capmetro.org/. Accessed July 29, 2014.
24. Network Modeling Center. Network Modeling Center, Center for Transportation Research. http://ctr.utexas.edu/nmc/. Accessed July 29, 2014.
25. CAMPO. Home: CAMPO: Capital Area Metropolitan Planning Organization. http://www.campotexas.org/. Accessed July 29, 2014.
26. GTFS Data Exchange. GTFS Data Exchange. http://www.gtfs-data-exchange.com/. Accessed July 30, 2014.
27. Jacobs Engineering Group Inc. Roadway Congestion Analysis: Performance Report and Information System, Fall 2010. Project #WFXK3800. Capital Area Metropolitan Planning Organization (CAMPO), 2010.
28. Greenfeld, J. S. Matching GPS Observations to Locations on a Digital Map. In Transportation Research Board 81st Annual Meeting, 2002.
29. Khani, A., M. Hickman, and H. Noh. Trip-Based Path Algorithms Using the Transit Network Hierarchy. Networks and Spatial Economics, 2014 (in press).