A DATA-DRIVEN SUPPORT SYSTEM FOR AIRCRAFT TRAJECTORY PREDICTION IN THE NATIONAL AIRSPACE SYSTEM
By Travis John Gonzalez
B.S. in Aerospace Engineering, December 2008,
Embry-Riddle Aeronautical University
M.S. in Systems Engineering, May 2011, George Washington University
Dissertation submitted to
The Faculty of
The School of Engineering and Applied
Science of the George Washington University
In partial fulfillment of the
requirements for the degree of Doctor
of Philosophy
May 17, 2015
Dissertation directed by
Timothy Eveleigh
Professor of Engineering Management and Systems Engineering
The School of Engineering and Applied Science of The George Washington University
certifies that Travis John Gonzalez has passed the Final Examination for the Degree of
Doctor of Philosophy as of November 12, 2014. This is the final and approved form of
the dissertation.
A DATA-DRIVEN SUPPORT SYSTEM
FOR AIRCRAFT TRAJECTORY PREDICTION IN THE
NATIONAL AIRSPACE SYSTEM
Travis John Gonzalez
Dissertation Research Committee:
Timothy Eveleigh, Professor of Engineering Management and Systems Engineering,
Dissertation Director
Thomas Holzer, Professor of Engineering Management and Systems Engineering,
Committee Member
Thomas Mazzuchi, Professor of Engineering Management and Systems Engineering and
Decision Sciences, Committee Member
Edward Murphree, Professor of Engineering Management and Systems Engineering,
Committee Member
Shahram Sarkani, Professor of Engineering Management and Systems Engineering,
Committee Member
Dedication
To my mother, father, and grandparents
For without their enduring support and investment in me,
This accomplishment would not be possible
Acknowledgements
I would first like to thank my dissertation advisors Timothy Eveleigh D.Sc.,
Thomas Holzer D.Sc., and Shahryar Sarkani Ph.D. Throughout the dissertation process,
they have given me the necessary input and aided in properly managing research that
stretches across multiple technical domains. At times, when I was at a crossroads with
research direction, they worked with me and redirected me towards a positive trajectory
for success.
I would also like to thank the MITRE Corporation for providing the data
and distributed computing platform; without these resources, this research would not
have been possible. Thanks are also due to my colleagues and friends in the aviation
industry (e.g., pilots and air traffic controllers) who provided valuable operational
expertise for model development. I am also grateful for the support and friendship of the
entire GWU Ph.D. cohort that worked alongside me the past few years.
Last but certainly not least, I would like to thank my fiancée, Melissa, for
supporting me throughout the entire process. Her help with structuring my time
management while I worked full-time and pursued this doctoral degree is one of the
reasons I have made it to the finish line!
Abstract of Dissertation
A DATA-DRIVEN SUPPORT SYSTEM FOR AIRCRAFT TRAJECTORY
PREDICTION IN THE NATIONAL AIRSPACE SYSTEM
Although a recent audit report from the U.S. Department of Transportation shows
declining flight delays over the last decade, scheduled U.S. passenger airlines still
accrued 92 million system delay minutes that were estimated to result in $7.2 billion in
direct aircraft operating costs in 2012. To address these flight delays, the Federal
Aviation Administration (FAA) is implementing the Next Generation Air Transportation
System (NextGen) which aims to transform air traffic operations to meet future growth.
A core component of NextGen is Trajectory Based Operations (TBO), with goals that
include improving throughput, flight efficiency, flight times, and schedule predictability
through better prediction and coordination of aircraft trajectories in the National Airspace
System (NAS). In this research, a novel approach is presented by constructing a Dynamic
Bayesian Network (DBN) to accurately quantify delay uncertainty for airport
origin-destination (OD) pairs. Since the size of the conditional probability tables (CPTs) grows
exponentially as the number of variables in the DBN increases, parameter learning was
developed within the Hadoop MapReduce distributed computing framework. Hadoop
mitigates these scaling concerns and significantly reduces the computational
time necessary for air traffic decision support. Experiments are performed using a fused
historical aircraft radar dataset that improves on current data limitations to dynamically
predict the probability of a delay and its causal factor(s) for the strategic prediction
horizon. The predictive performance of the model is evaluated by focusing on major OD
pairs in the NAS, and the results show flight delay time was predicted accurately
approximately 92% of the time for the two hour prediction horizon. Furthermore, the
results from the delay model are integrated into a developed real-time trajectory predictor
that recommends which route an aircraft should fly given both historical and real-time
flight delay information combined with data related to the aircraft and the external
environment. This research is the first known attempt that combines elements of systems
engineering (SE), operations research (OR), and distributed computing concepts to derive
a data-driven decision support system for air traffic decision makers under operational
uncertainty.
Table of Contents
Dedication
Acknowledgements
Abstract of Dissertation
Table of Contents
List of Figures
List of Tables
List of Acronyms
Terms and Definitions
Chapter 1: Introduction
1.1 Overview
1.2 NextGen Explained
1.3 Trajectory Predictor Technology
1.4 Statement of the Problem
1.5 Research Importance and Objectives
1.6 Research Scope
1.7 Dissertation Organization
Chapter 2: Literature Review
2.1 Overview
2.2 Trajectory Based Operations Research
2.3 Flight Delay Research
2.4 Literature Summary
Chapter 3: Development of Flight Delay Model
3.1 Overview
3.2 Background Knowledge
3.2.1 Actor Interactions in the National Airspace System
3.2.2 Delay Prediction Horizons and Classification Thresholds
3.2.3 Dynamic Bayesian Networks
3.3 Problem Domain
3.4 Data Processing
3.4.1 Data Segmentation
3.4.2 Segment Metadata
3.4.3 Data Fusion Process
3.4.4 Track Smoothing and Filtering
3.4.5 Flight Metadata
3.4.6 Data Quality
3.5 DBN Formalism Extensions
3.6 DBN Structure Derivation
3.7 DBN Parameter Learning
3.7.1 MapReduce for Massive Scale Distributed Computations
3.7.2 ALEM on MapReduce (ALEMMR)
Chapter 4: Empirical Experiments
4.1 Experimental Design
4.2 Empirical Experiments Overview
4.2.1 Experiment 1: ALEMMR Flight Delay Application
4.2.2 Experiment 2: Varying the Measurement Rate
4.2.3 Experiment 3: Causal Delay Prediction Results
4.2.3 Experiment 4: Trajectory Route Selection Decision Support System
4.2.4 Validation & Insight
Chapter 5: Conclusions & Future Work
5.1 Conclusions
5.2 Recommendations for Future Work
5.2.1 Air Traffic DBN Application
Appendix A: Data-Fused Algorithms
Appendix B: Data Schema
List of Figures
Figure 1-1: NextGen 2025 Flight Profile [1]
Figure 1-2: Trajectory Predictor Technology- Process Flow [3]
Figure 1-3: Trajectory Predictor Technology- Data Flow [3]
Figure 1-4: Research Environment Complexity
Figure 1-5: Predictability challenges for airport and enroute delay factors
Figure 2-1: Benefits of DBN for Decision Support
Figure 3-1: Background and overview of steps for modeling delay with a DBN
Figure 3-2: An abstract representation of decision making
Figure 3-3: Abstract example of DBN that represents the influences between NAS …
Figure 3-4: Problem domain system hierarchy
Figure 3-5: Threaded Track gate-to-gate flight data sources used for each phase
Figure 3-6: Data Processing Steps
Figure 3-7: Data Fusion Workflow
Figure 3-8: The five components of the DBN extended formalism
Figure 3-9: Extended formalism for a second-order DBN
Figure 3-10: ALEM application with populations of DBNs on MapReduce
Figure 4-1: Track dispersion using a subset of threaded track historical radar data
Figure 4-2: Varying the size of training samples for learning the DBN
Figure 4-3: Data-Driven Decision Support Architecture
Figure 4-4: NAS route selection based on delay prediction time
Figure A-1: Threaded Track Data Fused Algorithm Oozie Workflow
List of Tables
Table 3-1: DBN Fixed and Temporal Categories of Input Variables
Table 4-1: The Classification Threshold for Flight Delay Using a Confusion Matrix
Table 4-2: Dynamic Bayesian Network & Static Bayesian Network Results
List of Acronyms
ATC Air Traffic Control
ATDM Air Traffic Decision-Maker
BN Bayesian Network
DBN Dynamic Bayesian Network
FAA Federal Aviation Administration
FC Flight Crew
FP Flight Plan
HMM Hidden Markov Model
MC Markov Chain
MR MapReduce
NAS National Airspace System
NextGen Next Generation Air Transportation System
OR Operations Research
PBM Process-Based Management Charts
SE Systems Engineering
TBO Trajectory-Based Operations
TFM Traffic Flow Management/Manager
TFMI Traffic Flow Management Initiative
Terms and Definitions
The following are key terms and definitions that are used throughout the study:
Component – Composed of multiple parts; a clearly identified part of the product
being designed or produced.
Element – An integrated set of components that comprise a defined part of a
subsystem.
Flight Plan – A subset of the flight object information used for flight planning
prior to departure that carries basic information about the flight and route to be
followed.
Part – The lowest level of separately identifiable items within a system. Parts are not
normally subject to disassembly without destruction or impairment of designed
use.
Program – Projects of all sizes and complexity, ranging from a system to its
individual parts.
System – An integrated set of constituent parts that are combined in an operational
or support environment to accomplish a defined objective. These parts include
people, hardware, software, firmware, information, procedures, facilities,
services, and other support facets.
Subsystem – A system in and of itself (see the System definition) contained
within a higher-level system. The functionality of a subsystem contributes to the
overall functionality of the higher-level system. The scope of a subsystem’s
functionality is less than the scope of functionality contained in the higher-level
system.
Systems Engineering (SE) – A discipline that concentrates on the design and
application of the whole (system) as distinct from the parts. It involves looking at
a problem in its entirety, taking into account all the facets and all the variables and
relating the social to the technical aspects.
Traffic Flow Management Initiative (TFMI) – Techniques used to manage
demand with capacity in the NAS.
Trajectory-Based Operations (TBO) – The NextGen portfolio of research that focuses
on improving throughput, flight efficiency, flight times, and schedule
predictability through better prediction and coordination of aircraft 4-dimensional
trajectories (4DT) which consider lateral, longitudinal, time and space dimensions.
Chapter 1: Introduction
1.1 Overview
Trajectory Based Operations (TBO) is the NextGen concept of improving
throughput, flight efficiency, flight times, and schedule predictability through better
prediction and coordination of aircraft 4-dimensional trajectories (4DT) which consider
lateral, longitudinal, time and space dimensions [1]. TBO uses the 4DT to both
strategically manage and tactically control surface and airborne operations. Implementing
TBO effectively requires understanding the interactions and trade-offs between proposed
TBO decisions and sources of uncertainty. For TBO and the regional and local NAS air
traffic controllers it would serve, system impacts and relationships have
proved difficult for analysts and decision-makers to visualize. The mathematics and
concepts of stochastic optimal control are suited to detailed analyses, but they are poorly
suited to providing accessible intuition and explanations to identify TBO characteristics
and trade-offs. Currently, no analytical framework exists for an integrated understanding and
measurement of TBO uncertainty for either the strategic (2-15 hours) or tactical (less than
2 hours) prediction horizon. Filling that gap is the high-level objective of
this study.
A strategic management decision in TBO is to predict the delay time of aircraft that
are flying from an origin airport to a destination airport (a city pair) under operational and
environmental uncertainty. This study achieves this task by developing a dynamic
Bayesian network (DBN) model that infers delay time and the delay causal variables that
impact flight time, based on a fused set of historical radar track data measurements for
given city pairs. Furthermore, this study prioritizes aircraft routes in order to
aid air traffic decision makers in recommending the route that minimizes delay,
based on historical and real-time data. This is the first step towards an
application of a data-driven DBN in a dynamic system that can help govern air traffic
decision makers’ (ATDMs’) implementation of traffic management initiatives, air traffic
directives, and policies that are currently based on subjective measures. The end state of
this ongoing research provides a means of decision support in the presence of uncertainty
for air traffic operational decisions, scaling from a local focus (one airport) to a NAS
system-wide focus.
1.2 NextGen Explained
The vision of NextGen is to build on near- and mid-term (through 2018) systems
developed by the FAA and other government partners, to improve performance,
prediction, and capacity of the National Airspace System (NAS) necessary to meet 2025
requirements [1]. More specifically, NextGen will allow aircraft to safely fly in closer
proximity on more direct routes, reducing delays and providing benefits for the
environment through reductions in carbon emissions, fuel consumption and noise.
Implementation of NextGen will be accomplished through a series of Operations
Improvement (OI) Increments that provide individual benefits and combine to provide a
paradigm change in the way the NAS operates. The OI Increments are often
interchangeable with the term “capabilities.” Related OI Increments are managed in
seven implementation portfolios [2]. The FAA portfolios include:
1. Trajectory Based Operations (TBO)
2. High Density Airports (HD)
3. Flexible Terminals and Airports (FLEX)
4. Collaborative Air Traffic Management (CATM)
5. Reduce Weather Impact (RWI)
6. Safety, Security and Environment (SSE)
7. Transform Facilities (FAC)
The NAS Enterprise Architecture establishes the foundation from which the evolution of the
NAS can be explicitly understood and modeled. It provides a framework for
managing change in the NAS through a unifying approach and common language.
OIs represent distinct functional improvements to the NAS that provide direct benefits to
the user community. Figure 1-1 illustrates how the NextGen concept can create
improved capabilities for each flight phase in a typical flight profile.
Figure 1-1: NextGen 2025 Flight Profile [1]
Research activities on NextGen technology development, integration,
implementation, and safety must be accomplished to achieve the benefits mentioned
above. The interdependencies that exist between the TBO implementation portfolio and
flight delay prediction warrant analysis not only at the local level but also at the system level,
which prior research fails to achieve efficiently from a computational and accuracy
perspective [2]. Therefore, the model should be able to accurately predict
flight delays and causal variables and also prioritize aircraft routes to and from airports in
order to aid air traffic decision makers in recommending the route that
minimizes delay based on historical and real-time data.
1.3 Trajectory Predictor Technology
The FAA [3] describes Trajectory Predictor Technology (TPT) as the technology that computes the
predicted path an aircraft will follow through airspace. An aircraft trajectory can be described
mathematically by a time-ordered set of aircraft state vectors. The trajectory is
computed from input data comprising the current state and future intent of the
aircraft. The TPT uses models for aircraft performance, meteorological conditions, and
airspace adaptation data to perform this computation [3].
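As a minimal sketch of the "time-ordered set of aircraft state vectors" idea, the Python below stores a trajectory as a list of state records kept sorted by time. The class names and field names (`time_s`, `lat_deg`, `lon_deg`, `alt_ft`) are illustrative assumptions and are not structures defined in [3].

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class AircraftState:
    """One state vector; the field names are illustrative assumptions."""
    time_s: float   # seconds since an arbitrary reference epoch
    lat_deg: float  # latitude in degrees
    lon_deg: float  # longitude in degrees
    alt_ft: float   # altitude in feet


@dataclass
class Trajectory:
    """A trajectory kept as a time-ordered list of state vectors."""
    states: List[AircraftState] = field(default_factory=list)

    def add(self, state: AircraftState) -> None:
        # Append, then re-sort so the list stays ordered by time.
        self.states.append(state)
        self.states.sort(key=lambda s: s.time_s)

    def duration_s(self) -> float:
        # Elapsed time between the first and last state (0 if fewer than two).
        if len(self.states) < 2:
            return 0.0
        return self.states[-1].time_s - self.states[0].time_s
```

A real state vector would typically carry more quantities (for example speed and heading); the sketch keeps only position and time to show the ordering property.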
TPT can be incorporated into a client application to support various applications for
an air traffic decision support system. These decision support systems aid in
providing data, advisories, and recommended resolutions to the ATM system. A diagram of
the typical process flow within a common TPT structure is described in
EuroControl/FAA Action Plan 16 and is shown in Figure 1-2. The TPT client application
receives data inputs from adaptation, weather, and aircraft models. The TP application
consists of the following four component processes: Preparation, Computation, Update,
and Export.
Figure 1-2: Trajectory Predictor Technology- Process Flow [3]
1.3.1 Trajectory Predictor Processes
The preparation process in [3] constructs initial conditions and a Behavior Model
that outputs a list of aircraft movements. Specifically, the Behavior Model details how an
aircraft will meet trajectory constraints within the user-specified criteria. As described in
[3], the following are three critical processes within the preparation process that aid in the
development of a simulated aircraft trajectory:
State Processing: The State Processing generates the Initial Conditions for
trajectory generation.
Flight Intent Processing: Flight Intent Processing operates on a Behavior Model,
or if the Behavior Model is not defined, it will create one from the Initial
Conditions and Flight Intent. Flight Intent Processing evaluates the Initial
Aircraft State, both laterally and vertically, against the set of constraints defined
in the Flight Intent. The output of Flight Intent Processing comprises the Initial
Conditions and the complete set of constraints that must be adhered to during
trajectory generation.
Behavior Model Generation: The Behavior Model consists of ordered lists of
maneuvers that the aircraft will perform to meet the trajectory constraints. The
Behavior Model is internal to the TP and is built from the Initial Condition and
Flight Intent information.
The computational process calculates the predicted trajectory based on the predefined
Behavior Model. The update process monitors the conformance of the computed
predicted trajectory. The update process checks to see if the computed trajectory is in
conformance with the trajectory constraints specified in the Input Flight Intent. When the
trajectory is out of conformance, the Update process will re-compute the trajectory using
the updated Behavior Model and/or Flight Intent data.
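The conformance check performed by the update process can be illustrated with a small Python sketch. It assumes a deliberately simplified, hypothetical constraint type (an altitude target with a tolerance at a given time) and a predicted trajectory given as time-sorted points; [3] does not prescribe these names or this constraint form.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class AltitudeConstraint:
    """Hypothetical constraint: be within tolerance_ft of alt_ft at time_s."""
    time_s: float
    alt_ft: float
    tolerance_ft: float


@dataclass(frozen=True)
class PredictedPoint:
    time_s: float
    alt_ft: float


def altitude_at(points: Sequence[PredictedPoint], time_s: float) -> Optional[float]:
    """Linearly interpolate altitude; None if time_s is outside the prediction.

    Assumes points are sorted by time_s.
    """
    for a, b in zip(points, points[1:]):
        if a.time_s <= time_s <= b.time_s:
            if b.time_s == a.time_s:
                return a.alt_ft
            frac = (time_s - a.time_s) / (b.time_s - a.time_s)
            return a.alt_ft + frac * (b.alt_ft - a.alt_ft)
    return None


def violated_constraints(
    points: Sequence[PredictedPoint],
    constraints: Sequence[AltitudeConstraint],
) -> List[AltitudeConstraint]:
    """Return the constraints the predicted trajectory does not satisfy."""
    violated = []
    for c in constraints:
        alt = altitude_at(points, c.time_s)
        if alt is None or abs(alt - c.alt_ft) > c.tolerance_ft:
            violated.append(c)
    return violated
```

In this sketch, a non-empty result from `violated_constraints` plays the role of "out of conformance" and would be the trigger for re-computing the trajectory.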
Finally, the export process distributes the TP results to the output client processes. These
results include the current predicted trajectory, any error messages associated with the data,
and an updated Behavior Model when the trajectory does not match all the predefined
constraints.
1.3.2 Trajectory Predictor Data Flow
Figure 1-3 depicts a typical data flow for a TP deployment, starting from the
client inputs to the predicted trajectory (client output). Client inputs for a TP include:
Aircraft State: The Initial Aircraft State represents the aircraft state data at the
start of the trajectory computation cycle and is composed of, but not limited to,
the 3D aircraft position and associated time.
Flight Intent: Flight Intent is the element of the Flight Object that contains the
constraints and preferences applicable to the flight. It describes aircraft, airport,
and airspace constraints and operator preferences.
Behavior Model: The Behavior Model contains a list of maneuvers that describes
how the aircraft intends to satisfy the trajectory constraints and user preferences.
Processing Strategies and Configuration Control: The Processing Strategies
specify how the predictor will conform to the constraints and preferences
identified in the Flight Intent. The Configuration Control defines processing
characteristics such as aircraft performance models and the functionality of the
integration and export functions.
The research and methods proposed in this dissertation focused on enhancing the
flight intent element within the trajectory preparation process. Flight intent influences the
behavior model and provides the TP with the intended maneuvers that in turn create
the predicted trajectory. In addition, this research focused on choosing a
method that provides the functionality to iteratively learn aircraft intent and
behavior over time. The specific application for this research prioritizes aircraft
routes and predicts both the flight delay time and its causal reasons in order to aid air
traffic decision makers in recommending the route that
minimizes delay based on historical and real-time data.
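One simple way to picture route prioritization is to rank candidate routes by expected delay. The Python sketch below assumes each route already has a predicted delay distribution (delay in minutes mapped to probability); the route names, the numbers, and the choice of expected delay as the ranking score are illustrative assumptions, not the scoring rule used in this research.

```python
from typing import Dict, List, Tuple


def expected_delay_min(dist: Dict[float, float]) -> float:
    """Expected delay for a {delay_minutes: probability} distribution.

    Probabilities are renormalized so that small rounding errors in the
    input do not bias the result.
    """
    total = sum(dist.values())
    if total <= 0:
        raise ValueError("distribution must have positive total probability")
    return sum(delay * p for delay, p in dist.items()) / total


def rank_routes(route_dists: Dict[str, Dict[float, float]]) -> List[Tuple[str, float]]:
    """Return (route, expected delay) pairs, lowest expected delay first."""
    scored = [(name, expected_delay_min(d)) for name, d in route_dists.items()]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical example with two candidate routes for one city pair.
candidates = {
    "ROUTE_A": {0.0: 0.5, 30.0: 0.5},
    "ROUTE_B": {0.0: 0.8, 30.0: 0.2},
}
ranking = rank_routes(candidates)
```

Here `ranking[0]` is the route a decision maker would be shown first. A different score (for example, the probability of exceeding a delay threshold) could be swapped in without changing the ranking logic.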
Figure 1-3: Trajectory Predictor Technology- Data Flow [3]
1.4 Statement of the Problem
A challenge for flight delay prediction is the difficulty of transitioning research
concepts into systems and operations. One important aspect of this challenge ties to the
range of operational variations for which we develop our concepts and systems. Early
research concepts are conceived with too few of the real-world variations taken into
account, mostly due to limitations of either computational power or operational
knowledge [1]. In a program that must make system trade-offs for development,
promised benefits must reflect a broader range of routine and reasonable behaviors than
in research, but such trade-offs can be quite difficult to quantify, and it is difficult to
reflect the full range of operational events.
In contrast, the operational world embodies everything that the real world throws at
us. This is where complexity and unpredictability conspire to demonstrate how poorly
our concepts and systems and procedures can fare when confronted with things we didn’t
expect in research or in development. The operations world is not just reasonable and
routine: it is the entire gamut of everything that happens whether we’ve anticipated it or
not.
A big distinction between these worlds, and one that often impacts modernization and
the transition to concepts like flight delay prediction, is how predictable the world is that our
concepts, systems, and operations have to deal with. Limits to predictability and
challenges to transition are both addressed if we focus more closely on uncertainties in
operations by developing a framework that enhances understanding of the impacts of
uncertainty and the quantitative relationships between uncertainty factors. Figure 1-4
depicts the challenge researchers typically face
when attempting to model a traditionally stochastic environment.
Figure 1-4: Research Environment Complexity
A second challenge involves obtaining better-quality data from historical sources about
the aircraft and its environment, and using that information to improve ATDMs’
predictions at a more granular level, recommending which route an aircraft should fly
given both historical and real-time flight delay information. Researchers can factor in the
type of aircraft and the lateral path and make reasonably good predictions; however,
many events that occur during a flight are not very predictable, and
data quality issues compound the problem.
Figure 1-5: Predictability challenges for airport and en-route delay factors. Not all delay
factors are included.
As shown in [4], there are a number of causal delay factors that interfere with
flight predictability. Some of these can be addressed through better standards or shared
planning, and others can be predicted to some degree and compensated for. Others,
though, are simply unknowable until they occur. Examples include a Flight Management
System (FMS) issue that requires the aircraft to fly slower than expected, an
unpredicted thunderstorm en route, or a traffic flow management initiative (TFMI)
restriction (see Terms and Definitions) that is issued at the last minute due to a
temporarily blocked runway at the destination airport. Figure 1-5 depicts
some of the more common challenges in flight delay prediction. In
truth, a nearly unlimited number of events that are not very
predictable can occur, and these represent challenges to the development of any air traffic-based
model [5].
As a result of these challenges, the research performed in this dissertation
combined the best practices in trajectory and flight prediction to create a new data-driven
decision support tool. This tool combines more data (both historical and real-time) about
the aircraft’s behavior, the aircraft operator’s intent, and the external environment than
preceding research, and it provides decision support applications that
succeeding researchers can build upon.
1.5 Research Importance and Objectives
The objective of this research was to develop a big data-driven DBN
to represent and predict flight delay and the associated causal and temporal
nature of delay uncertainty based on a novel fused historical dataset. This research can be
broken down into the following sub-objectives:
1. Develop a Dynamic Bayesian network (DBN) structure for the air traffic domain
that can continuously be developed to answer complex operational questions.
2. Learn DBN parameters from a fused set of aviation data on a big data parallel
computing platform that could not be computationally achieved using
conventional approaches.
3. Determine the optimal prediction horizon and classification threshold (See
Experiment 2: Varying the Measurement Rate) for the flight delay prediction
model.
4. Provide accurate prediction results for both delay and delay causal variables
greater than 80%¹ of the time.
5. Integrate results of the flight delay model (if successful) into a developed real-time
trajectory predictor that recommends which route an aircraft should fly given
both historical and real-time flight delay information combined with data related
to the aircraft and the external environment (“other data” discussed extensively in
Section 3.4).
¹ The 80% prediction accuracy threshold was taken from a general interpretation of the FAA’s
model/simulation standards and aligns with the 95th percentile of accuracy results from related prior art.
1.6 Research Scope
Because air traffic research (specifically flight delay) integrated with big data
technologies (such as Hadoop) is still in its early stages, few models of
this nature are currently being proposed. For that reason, this study does not try to directly
compare the accuracy of the proposed model against other existing models. Rather, the
scope of this research centers on creating a new approach to scale probabilistic graphical
models to a computational scale that, based on the author’s literature review, has not
previously been attempted.
1.7 Dissertation Organization
This dissertation is organized as follows. Chapter 2 reviews relevant
prior research to set the stage for the research effort and to identify why DBNs were ultimately chosen
for this research. This chapter also takes a granular look at trajectory and flight delay
prediction in independent sub-sections in order to portray them as two different topics (as
current researchers typically do), which this research aimed to bring together. Chapter 3
covers necessary background knowledge, the development of the big data-driven DBN
methodology, and the stated sub-objectives. Chapter 4 validates the model based on empirical
experiments focused on both prediction accuracy and the intelligibility of the prediction
for flight delays, associated delay causal variables, and route trajectory. Chapter 5
provides conclusions and further research recommendations.
Chapter 2: Literature Review
2.1 Overview
Numerous journal articles have been published on methods for predicting trajectory and flight
delay under uncertainty in the NAS. In this chapter, some of the key studies related
to the author’s research are highlighted. All of the researchers used some
type of mathematical or statistical model to predict aircraft trajectory and
environmental factors in a particular phase of flight. Some of the researchers attempted to
gain insight into flight delay prediction using the computed trajectory prediction.
2.2 Trajectory Based Operations Research
2.2.1 Mathematical Models in Trajectory Prediction
As discussed in Section 1.3, there are four components for the trajectory prediction
process. Of the four, this section will focus on the computation subfunction. The
preparation process brings together all the data necessary for the execution of the
trajectory prediction. Further, it is this process that is responsible for the translation of the
intent script (which this research develops) into the mathematical code used to perform
the computations. The update process ensures compliance with the aircraft intent or flight
plan and flags potential loss of spatial/temporal separation (for example) with other
trajectories. It is within the scope of this process to alter the intent script and behavior in
an attempt to regain airspace separation compliance. The export process returns the
resulting trajectory to the ground-based computer hosting the flight object. Because of the
diversity of the modeling equations, different state variables will be exported to update
the flight object. It should be noted that the abstraction dictates that, at a minimum, the
trajectory should comprise four dimensions (lateral, longitudinal, time and space
dimensions) and the geodetic coordinates of the aircraft for the duration of the prediction
time frame. It will be seen that only one of the many papers referenced complies with this
requirement. Furthermore, some of the papers do not operate in full three-dimensional
space.
The mathematical models under study fall into one of the following classifications:
Point-Mass Models: The majority of the identified research [6-17] used point-mass flight
estimation models. This reflects a tendency toward more realistic modeling of
flight, although these models lack the complexity of the kinetic model in that rotational
moments are ignored. The range of complexity varied greatly within this subset of papers.
Point-mass models signify that aerodynamic equations are in play, with the above notable exception.
Kinematic Models: In these models [18-20], only position and its time rates of change are
modeled. The model is integrated forward with respect to time (acceleration to velocity,
etc.).
Kinetic Models: One paper [21] in the set included moments and is therefore classified
as a full kinetic model. Although this model represents the ultimate complexity of this
subset of documents, it is listed after point-mass models due to the overwhelming
number of papers that used point-mass models.
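To make the kinematic case concrete, the following is a minimal sketch of forward integration in one dimension. The function name `integrate_kinematic`, the forward-Euler scheme, and the sample values are illustrative assumptions and are not taken from the cited papers [18-20].

```python
def integrate_kinematic(position, velocity, accelerations, dt):
    """Forward-Euler integration of a 1-D kinematic state (assumed scheme).

    Returns a list of (time, position, velocity) tuples, one per step,
    starting with the initial state at time 0.
    """
    t = 0.0
    states = [(t, position, velocity)]
    for a in accelerations:
        position += velocity * dt   # velocity -> position, using start-of-step velocity
        velocity += a * dt          # acceleration -> velocity
        t += dt
        states.append((t, position, velocity))
    return states


# Hypothetical along-track example: 200 m/s initial speed, 10 s steps.
track = integrate_kinematic(position=0.0, velocity=200.0,
                            accelerations=[0.0, 1.0, 1.0], dt=10.0)
```

A point-mass model would replace the prescribed accelerations with accelerations computed from aerodynamic and thrust forces, which is outside the scope of this sketch.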
2.2.2 TBO Uncertainty Analysis
Uncertainty in aircraft trajectory prediction has been studied in Federal Aviation
Administration (FAA)/Eurocontrol Action Plan 16 [3], which describes and quantifies
major sources of variation in end-to-end timing, including departure timing, wind-field
prediction, flight intent, and flight parameters such as aircraft weight. Gaydos [4]
examined statistical uncertainty at different look-ahead times and found that uncertainty
grew more quickly or more slowly at different points along similar en-route trajectories.
Earlier work by Tino, Ren and Clarke [22] explains some of this spatial variability
as wind behavior, which also creates increasing uncertainty in timing at longer look-ahead
times. Mondoloni and Liang [23] described how variations due to wind observed
along a trajectory can be used to reduce uncertainty and improve predictability and
timing control during the remainder of the trajectory. However, as Rentas, Green, and
Cate have shown [24], the characterization of NextGen TBO uncertainty impacts is far from
mature, and more research into the causal and temporal relationships of trajectory
predictors is warranted.
2.2.3 TBO Summary
To recap, the research identified in this literature review focused first on the
mathematical models used to develop an aircraft trajectory and then on applications of
the developed trajectory with regard to uncertainty. In this research, the author has
chosen a point-mass model for the computation subfunction (see Figure 1-3) based on
results from the aforementioned research. Refer to Experiment 4 (Section 4.2.4) for a
more in-depth description of how this comes together with the rest of the research.
2.3 Flight Delay Research
2.3.1 Statistical Methods
Historical approaches to learning and predicting flight time delay and the associated causal
factors of delay can be categorized based on their use of either linear or
nonlinear statistical methods. The first approach, in [25] and [26], uses linear regression methods to
explain the influence of causal factors of delay. This approach does provide statistical
accuracy; however, it has shortcomings, which include: 1) failure to include relevant
operational and environmental factors, 2) incorrect data independence assumptions, and
3) sensitivity to outliers, which together limit its predictive power.
Vigneau [27] studied both delay and delay propagation from flight segment to
segment using conventional regression techniques. In Vigneau’s model, departure delay
depended on arrival delay from the previous segment, which then depended on the
departure delay from the previous segment. Time dimensions, airport capacity, and load-based
factors were identified as significant influences on delay. The
model, however, was not applicable in the US because it treats bad weather as an
exception. In Europe, only 1-4% of delay can be attributed to bad weather, whereas in
the United States 70-75% of delay is due to bad weather [28].
2.3.2 Neural Networks
A neural network is typically referred to as a “black box” model that can be used to
predict departure delay from a set of input factors. The parameters of a neural network
model are not easily interpretable, and thus it is difficult to use a neural network model to
gain a comprehensible understanding of how the factors interact to cause delay. Dai and
Liou [29] developed an artificial neural network model to estimate individual flight
departure delay for the application of real time air traffic flow management. The network
incorporated 70 nodes in the hidden layer and was shown to outperform linear and non-
linear regression methods with their chosen dataset. The primary factors influencing
delay in this study were airline, aircraft type, time of day, day of week, route, flight
sequence and traffic flow.
Jehlen et al. [30] developed a neural network model for predicting weather-related
aircraft delays and cancellations at the national, regional, and airport levels. The network
slightly improved on traditional linear regression methods for predicting
airspace metrics such as total aggregate delay, arrival delay, airborne delay, and flight
cancellations at different scales; however, a neural network still does not provide the
generalization needed to understand causal delay interactions for wide stakeholder
use.
2.3.3 Hidden Markov Models
A hidden Markov model (HMM) models a first-order Markov process where the observation state is a
probabilistic function of an underlying stochastic process that produces the sequence of
observations. The underlying stochastic process cannot be observed directly; it is hidden.
Both the hidden and observation states are modeled by discrete random variables, as
shown in Neogi’s work, where he and his colleagues used HMMs to detect mode changes
in aircraft flight data for conflict resolution [31].
The HMM formalism first appeared in several statistical papers in the mid-1960s,
but it took over ten years before its utility was recognized. Initially, the use of HMMs
was a great success, especially in the fields of automatic speech recognition (ASR) and
bio-sequence analysis. Because of its success, the use of HMMs in ASR is still dominant
nowadays, despite its lack of consistent performance [32].
One of the main problems of HMMs is the fact that the hidden state is represented
by a single discrete random variable. DBNs are able to break down the state of a complex
system into its constituent variables, taking advantage of the sparseness in the temporal
probability model. This can result in exponentially fewer parameters. The effect is that
using a DBN can lead to fewer space requirements for the model, less expensive
inference and easier learning.
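The parameter savings can be illustrated with a short count. The sketch below assumes n binary state variables and, for the DBN, at most k parents per variable in the previous time slice; the function names and these structural assumptions are hypothetical and chosen only to make the arithmetic explicit.

```python
def hmm_transition_params(n_binary_vars):
    """Free transition parameters when all variables are merged into one
    discrete hidden state with 2**n values (one full transition matrix)."""
    states = 2 ** n_binary_vars
    return states * (states - 1)      # each row sums to 1, so states - 1 free entries


def dbn_transition_params(n_binary_vars, max_parents):
    """Upper bound on free transition parameters for a factored DBN in which
    each binary variable has at most `max_parents` binary parents (assumption)."""
    return n_binary_vars * (2 ** max_parents)   # one free probability per parent configuration


# Hypothetical example: 10 binary variables, at most 3 parents each.
hmm_count = hmm_transition_params(10)       # 1024 * 1023 = 1,047,552
dbn_count = dbn_transition_params(10, 3)    # 10 * 8 = 80
```

The gap widens exponentially with the number of variables as long as the number of parents per variable stays bounded, which is the sparseness referred to above.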
2.3.4 Kalman Filters
A Kalman filter model (KFM) is an HMM with conditional linear Gaussian distributions [33]. It is
generally used to handle uncertainty in linear dynamic systems. The KFM formalism first
appeared in papers in the 1960s [34] and was successfully used for the first time in
NASA’s Apollo program. It is still used in a wide range of applications. The
KFM formalism assumes the dynamic system is jointly Gaussian. This means the belief
state must be unimodal, which is inappropriate for many problems. The main advantage
of using a DBN over a KFM is that the DBN can use arbitrary probability distributions
instead of a single multivariate Gaussian distribution.
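As an illustration of the unimodal Gaussian belief state, the following is a minimal one-dimensional Kalman filter step. The random-walk state model, the function name `kalman_step`, and all numeric values are assumptions made for this sketch and do not come from the cited work.

```python
def kalman_step(mean, var, measurement, process_var, measurement_var):
    """One predict/update cycle of a scalar Kalman filter.

    Assumes a random-walk state model (state unchanged apart from process
    noise) and a direct, noisy measurement of the state.
    """
    # Predict: the belief stays Gaussian, only its variance grows.
    pred_mean = mean
    pred_var = var + process_var
    # Update: blend prediction and measurement using the Kalman gain.
    gain = pred_var / (pred_var + measurement_var)
    new_mean = pred_mean + gain * (measurement - pred_mean)
    new_var = (1.0 - gain) * pred_var
    return new_mean, new_var


# Hypothetical example: prior N(0, 1), measurement 2.0 with unit noise variance.
posterior = kalman_step(mean=0.0, var=1.0, measurement=2.0,
                        process_var=0.0, measurement_var=1.0)
```

The belief is summarized by a single mean and variance at every step, which is exactly the restriction that a DBN with arbitrary distributions removes.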
In application, Reference [12] reported on real-data testing of a real-time freeway
traffic state estimator, with a particular focus on its adaptive capabilities. The
approach to real-time adaptive estimation of the complete traffic state in freeway
stretches or networks is based on stochastic macroscopic traffic flow modeling and an
extended Kalman filter. Advantages are demonstrated via suitable real-data testing. The
testing results are both acceptable and promising for succeeding applications,
but the study specifically mentions the limited generalizability of working
with Kalman filters, which DBNs compensate for. Other research efforts [36] and [37]
use Kalman filters to estimate time of arrival based on a trajectory prediction technology.
2.3.5 Bayesian Networks
Bayesian networks have been applied to various scenarios within the air traffic
domain because of their ability to provide approximate models for complex, and/or
poorly understood problems. Pepper, Mills, and Wolcik [38] presented a method of
accounting for uncertain weather information at the time of traffic flow management
(TFM) decisions, based on Bayesian decision networks. They found that the data from
past TFM events was not sufficient to distinguish between strategic TFM decisions, in
terms of metrics based on overall delays, cancellations, diversions, and departure
backlogs. However, the results did show that useful information can be extracted from
data on past TFM events by focusing on specific elements of the strategic TFM process
rather than the entire process comprehensively. From this research, it was imperative that
both tactical and strategic levels of TFM were considered in the proposed model.
Ning et al [39] used Bayesian networks to estimate delay with a focus on
investigating and quantifying how flight delays from a single airport propagate to impact
other airports. Specifically, their methodology combined multiple individual-airport
Bayesian network models into a system-level model capable of representing interactions
between airports. Their study demonstrated that integrating human judgment with
statistical analysis in structure construction and parameter estimation can improve
prediction accuracy. To simplify the calculation, their model only takes into account
weather effects and flight cancellations. It does not take into account many
factors that can affect delay, such as demand, en-route variables, and aircraft type (to
name a few), which are accounted for in this study.
Liu and Ma [40] developed a flight-delay and delay propagation model based on
Bayesian networks. They trained the network with real data using the Expectation
Maximization (EM) algorithm and analyzed the influences from delay under different
states.
2.3.6 Dynamic Bayesian Networks
A BN is useful for problem domains where the state of the world is static. In such a
world, every variable has a single and fixed value. Unfortunately, this assumption of a
static world is not always sufficient. A dynamic Bayesian network (DBN), which is a BN
extended with a time dimension, can be used to model dynamic systems [41]. While no
research on DBNs was identified within the specific scope of this research, the author
chose DBNs due to their successful applications in other fields, specifically in creating
prognostic decision support systems for medical diagnosis of diseases, as shown by [42-44].
These researchers provide the needed proof to show that DBNs have become the
representation of choice because they embody a good tradeoff between expressiveness
and tractability. Figure 2-1 depicts the benefits of DBNs from both a knowledge
representation and reasoning perspective. Through its structure and its parameters, a
DBN comprehensively describes what is known about a particular domain and aims to
establish the interactions of all the variables contained within that domain. As such, a
DBN can be referred to as a “Portable Knowledge Format” that can succinctly and
compactly communicate the state of the domain as well as its dynamics over time.
Figure 2-1: Benefits of DBN for Decision Support²
² Figure taken from: bayesia.com
2.4 Literature Summary
This literature review has examined both trajectory prediction and flight delay research
regarding the prediction of flight delay in combination with, or
independent of, trajectory-based operations uncertainty. The DBN formalism in this
research is the first development in temporal reasoning under uncertainty for the defined
scope of this research. Literature has shown that DBNs can have some significant
advantages over the aforementioned algorithms. In terms of state-space models, HMMs
and KFMs are quite limited in their expressive power. In fact, it is not even correct to
call HMMs and KFMs other techniques, because the DBN formalism can be seen as a
generalization of both HMMs and KFMs and can be iteratively updated with the
incorporation of data sources and subject matter experts in the field, as will be described
in succeeding sections.
Chapter 3: Development of Flight Delay Model
3.1 Overview
This chapter describes the steps towards the design and implementation of
an aircraft flight delay model: a DBN for aircraft flight delay prediction and the
associated causal delay factors. A general overview of the background knowledge
required and the methodology for the DBN is shown in Figure 3-1.
Figure 3-1: Background and overview of steps for modeling delay prediction with a
DBN
3.2 Background Knowledge
To understand the advantages of using DBNs as the formal basis for prediction of
flight delay, it is important to first establish a formal definition of flight delay. According
to the FAA, a flight can be considered delayed if the operation takes place 15 minutes
after scheduled pushback [45]. In this work, the author adopts the definition of [46] [47]
and defines delay as the time difference between actual and scheduled departure and arrival
times.
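The two definitions above can be expressed in a few lines. This is a minimal sketch: the function names are hypothetical, and treating a delay of exactly 15 minutes as "delayed" is an assumption about how the FAA threshold is applied.

```python
from datetime import datetime

FAA_DELAY_THRESHOLD_MIN = 15  # threshold cited in the text [45]


def delay_minutes(scheduled, actual):
    """Delay as the difference between actual and scheduled time, in minutes."""
    return (actual - scheduled).total_seconds() / 60.0


def is_delayed(scheduled, actual, threshold_min=FAA_DELAY_THRESHOLD_MIN):
    """True when the operation occurs at least `threshold_min` minutes late
    (inclusive boundary is an assumption)."""
    return delay_minutes(scheduled, actual) >= threshold_min


# Hypothetical departure: scheduled 06:00, actual pushback 06:20.
scheduled_out = datetime(2014, 1, 1, 6, 0)
actual_out = datetime(2014, 1, 1, 6, 20)
departure_delay = delay_minutes(scheduled_out, actual_out)
```

The same function applies to arrival times, so departure and arrival delay can be computed separately for each flight.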
3.2.1 Actor Interactions in the National Airspace System
To develop a robust delay model, it is imperative to first understand the actors
that interact in the National Airspace System (NAS) and the time horizons in which
decisions are required. Figure 3-2 depicts an abstract view of the model interactions
between an aircraft (flight crew) and two ATDMs (traffic flow management and air
traffic control) in terms of how ATDMs make decisions about flight planning, and a
decision model of how a flight crew responds. Specifically, as the aircraft flies from one
state to the next, the factors that typically affect where the aircraft will be in the next state
are the current flight plan said aircraft is following, current weather conditions that may
affect the lateral path, and other delay risk factors occurring either en-route or at the
arriving airport as noted in Section 1.4. The goal is to predict the duration of flight time
delay as the quantity to be minimized, in order to provide the basis to change the
aircraft’s route in real time, which is referred to in Figure 3-2 as “NAS Treatments.” For
example, if an aircraft is flying from airport A to airport B and no risk factors are
triggered, then the aircraft should get to its destination on the same flight plan route it
departed on; however, if an aircraft is flying from airport A to airport B and weather
requires the aircraft to change its route, this research recommends which route the
aircraft should fly given both historical and real-time flight information. This suggests
that the intent under which each actor operates must be known, and the DBN model is
used to quantify this intent and continuously update it based on new information.
Figure 3-2: An abstract representation of decision making
The actors who affect the way a flight is planned and executed, as defined in [1], are
listed below with their respective primary functions:
Flight Crew (FC): has ultimate control and responsibility for the safe operation of
the aircraft;
Air Traffic Control (ATC): provides a safe, orderly, and expeditious flow of traffic
on a first-come, first-served basis, often operating in the tactical decision space (< 2
hours look-ahead time);
Traffic Flow Management (TFM): balances air traffic demand with system
capacity to ensure the maximum utilization of the National Airspace System
(NAS), often operating in the strategic decision space (2-15 hour look-ahead time).
3.2.2 Delay Prediction Horizons and Classification Thresholds
For this study, four different prediction horizons were analyzed: 2, 4, 6, and 24
hours for delay prediction in the strategic planning phase, which, if accurate, benefits all
mentioned actors. In other words, the prediction horizon denotes the predicted delay
2, 4, 6, and 24 hours after the initial time³. Additionally, a classification threshold
prediction mechanism was established, where the output is a binary prediction of whether
the delay is more or less than a predefined threshold. This study tests four delay
classification thresholds: 0-30, 30-60, 60-90, and > 90 minutes.
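As a minimal sketch of how the horizons and classes above might be encoded, the snippet below computes the four horizon timestamps from an initial time and maps a delay in minutes to one of the four classes. The names, the half-open interval boundaries (a 30-minute delay falls in "30-60", a 90-minute delay in ">90"), and the handling of early flights are assumptions, since the text does not specify them.

```python
from datetime import datetime, timedelta

HORIZONS_H = (2, 4, 6, 24)                                     # prediction horizons in hours
BINS = ((0, 30, "0-30"), (30, 60, "30-60"), (60, 90, "60-90"))  # [low, high) in minutes


def horizon_times(initial_time):
    """Timestamps at which delay is predicted, relative to the initial time."""
    return [initial_time + timedelta(hours=h) for h in HORIZONS_H]


def delay_class(delay_min):
    """Map a delay in minutes to a class label; early flights return None (assumption)."""
    for low, high, label in BINS:
        if low <= delay_min < high:
            return label
    return ">90" if delay_min >= 90 else None


# Hypothetical day with a 6am initial time.
times = horizon_times(datetime(2014, 1, 1, 6, 0))
label = delay_class(45)
```

Each class can then be treated as its own binary question (for example, "is the delay in 30-60 minutes or not"), which matches the binary output described above.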
³ Initial time is established for this research to be at 6am Eastern Time since commercial traffic activity
throughout the NAS is at its lowest volume in the hours preceding.
3.2.3 Dynamic Bayesian Networks
DBNs expand on conventional Bayesian networks because they offer the ability to
represent the temporal nature of a process or system. Additionally, the DBN model
provides the ability to learn from statistical data, relevant literature, and operational
expertise, while also providing a causal approach to modeling.
According to [48], Bayesian networks represent the state of certain phenomena at an
instant in time. A Bayesian network B = (G, P) is a pair where G is a directed acyclic
graph (DAG), with nodes corresponding to a set of random variables $\mathbf{X}$, and P is a joint
probability distribution (JPD) of the variables in $\mathbf{X}$, which factorizes to:
$$P(\mathbf{X}) = \prod_{X \in \mathbf{X}} P(X \mid \pi(X)) \tag{1}$$
where $\pi(X)$ are the parents of $X$ in G. A JPD representation by a Bayesian network
typically decreases the number of parameters that are needed for estimation and
ultimately enables efficient probabilistic inference. However, in many applications, the
goal is to represent the temporal evolution of a certain process, that is, how the different
system variables evolve with time (t) or event, by reasoning over random processes
$X = \{X(t) : t \in T\}$ instead of random variables. Extensions of BNs to model these processes
are called dynamic Bayesian networks (DBNs) [15]. DBNs assume that the Markov
property holds, which states that the future is independent of the past, given the present;
therefore, the following factorization is obtained:
(2)
where $\mathbf{X}(t) = \{X(t) : X \in \mathbf{X}\}$.
Given a potentially infinite time horizon, the specification of a discrete-time DBN
may be prohibitive due to data scaling concerns. In order to allow for a compact
specification, the following assumptions regarding DBNs are generally made:
The DBN is first-order Markovian:
$$\mathbf{X}(t+1) \perp\!\!\!\perp_P \mathbf{X}(t-1) \mid \mathbf{X}(t) \tag{3}$$
such that the future is independent of the past given the present time.
The DBN is time-invariant:
$$\mathbf{U}(t) \perp\!\!\!\perp_P \mathbf{V}(u) \mid \mathbf{W}(s) \iff \mathbf{U}(t+c) \perp\!\!\!\perp_P \mathbf{V}(u+c) \mid \mathbf{W}(s+c) \tag{4}$$
such that the same independence relations hold at each point in time for
$\mathbf{U}, \mathbf{V}, \mathbf{W} \subseteq \mathbf{X}$ and $t, u, s, t + c, u + c, s + c \in T$.
The DBN is homogeneous:
$$P(\mathbf{U}(t+c) \mid \mathbf{V}(t)) = P(\mathbf{U}(t'+c) \mid \mathbf{V}(t')) \tag{5}$$
such that transition probabilities are fixed for $\mathbf{U}, \mathbf{V} \subseteq \mathbf{X}$ and
$t, t', t + c, t' + c \in T$. In other words, as the DBN goes from one state to another, the
structure of the DBN remains the same from start to end.
Given these assumptions for each temporal slice, a dependency structure between
the variables specifying the initial distribution of the joint process can be developed,
called the prior model. It is usually assumed that this structure is duplicated for all the
temporal slices (except the first slice, which can be different). Additionally, there are
edges between variables from different slices specifying how the process evolves as time
goes from t to t + 1 for t ∈ {1,2,…}, which defines the transition model. In this model,
variables at time t are depicted by dashed objects, while variables at time t + 1 are
depicted by solid objects. The temporal foundation in application is depicted by the
choice of the prior and transition model, while causal knowledge is captured as well, such
as the belief that a traffic flow management initiative (TFMI) causes an air traffic directive
(ATD) (i.e., an air traffic controller command to a pilot), and the influence of TFMIs and
ATDs on aircraft flight delay time (i.e., delayed >30 min, >60 min, etc.). Figure 3-3 depicts
an abstract example of a DBN, where the influences between aircraft flight time delay,
TFMIs, and ATDs are depicted by a prior-transition DBN model. For example, if a traffic
flow manager sets a NAS initiative to delay aircraft on the ground and/or in the air due to
en-route weather, this is forwarded to air traffic facilities, and the air traffic controller
takes the necessary action by providing directives to slow down or divert aircraft off their
intended lateral flight plan path. This in turn creates a flight delay for the aircraft going
from airport A to airport B. Figure 3-3 aims to depict this abstract process in order to
introduce how a DBN treats this scenario from one state to the next.
Figure 3-3: Abstract example of DBN that represents the influences between NAS actors
In this study, the author extends the prior-transition DBN standard, and develops an
extended formalism (see Section 3.5) that provides more modeling power and improves
performance in terms of execution time and memory usage.
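The prior and transition models described in this section can be illustrated with a toy network. The sketch below assumes three binary variables per slice (TFMI, ATD, Delay), a within-slice chain TFMI → ATD → Delay, and a single temporal edge TFMI(t) → TFMI(t+1). The structure, the function name, and every probability are hypothetical and chosen for illustration; they are not the model or the parameters learned in this research. The loop follows the usual first-order recursion in which each slice depends only on the previous one, and reuses the same tables in every slice (time-invariant, homogeneous).

```python
# Hypothetical conditional probability tables; all numbers are illustrative.
PRIOR_TFMI = 0.2                       # prior model: P(TFMI active) in the first slice
TRANS_TFMI = {True: 0.7, False: 0.1}   # transition model: P(TFMI(t+1) active | TFMI(t))
P_ATD = {True: 0.9, False: 0.05}       # P(ATD issued | TFMI) within a slice
P_DELAY = {True: 0.6, False: 0.1}      # P(delay > 30 min | ATD) within a slice


def delay_probabilities(n_slices):
    """Marginal P(delay > 30 min) for each slice, by forward propagation."""
    p_tfmi = PRIOR_TFMI
    result = []
    for _ in range(n_slices):
        # Within-slice marginalization along TFMI -> ATD -> Delay.
        p_atd = p_tfmi * P_ATD[True] + (1 - p_tfmi) * P_ATD[False]
        p_delay = p_atd * P_DELAY[True] + (1 - p_atd) * P_DELAY[False]
        result.append(p_delay)
        # Temporal edge: propagate the TFMI belief to the next slice.
        p_tfmi = p_tfmi * TRANS_TFMI[True] + (1 - p_tfmi) * TRANS_TFMI[False]
    return result


probs = delay_probabilities(3)
```

With evidence (for example, an observed TFMI in a given slice), the same tables would be used for conditioning rather than plain forward propagation; that step is omitted here to keep the sketch short.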
3.3 Problem Domain
Prior to the development of an effective DBN, it is necessary to formulate a concise
and explicit problem description. It is also essential to constrain the domain of the
problem in order to control under which conditions the model may be applicable.
The primary objective in this research is improving prediction support in the NAS as
it pertains to aircraft flight delay and route planning. A system such as the NAS may
include software, hardware, people, information, physical infrastructure, services, and
other system support items [2]. Figure 3-4 depicts the developed system hierarchy that
breaks down the NAS for the problem domain.
Figure 3-4: Problem domain system hierarchy
The following are definitions extracted from [2] for succeeding levels within the
system/subsystem hierarchy, along with the specific entity description used for the
purposes of this research. Keep in mind that these are assumptions used for the particulars of the
research at hand and can be altered based on the overall scope of the analysis. For
example, if a more granular analysis is required, the main system could in fact be the
airport, with the subsystems being all elements specific to said airport.
System. An integrated set of constituent parts that are combined in an
operational or support environment to accomplish a defined objective.
These parts include people, hardware, software, firmware,
information, procedures, facilities, services, and other support facets.
o Description: The NAS is the higher level system in our
empirical scenario.
Subsystem. A system in and of itself (reference the system definition)
contained within a higher level system. The functionality of a
subsystem contributes to the overall functionality of the higher level
system. The scope of a subsystem’s functionality is less than the scope
of functionality contained in the higher level system.
o Description: An airport or set of airports are the subsystems,
since by definition, airports are “less than” the scope of the
NAS.
Element. An integrated set of components that comprise a defined part
of a subsystem.
o Description: Since the primary focus of this objective is to
provide prediction support, our elements include the type of
elements that we are interested in predicting (e.g. flight delay
prediction, traffic flow management prediction, and airport
capacity prediction).
Component. Composed of multiple parts; a clearly identified part of
the product being designed or produced.
o Description: In order to predict the element flight delay (for
example), we will need to identify multiple nodes or attributes
that have causal relationships. In this case, since a DBN was
used, the author used this level for the multiple nodes for each
element.
Part. The lowest level of separately identifiable items within a
system; parts are not normally subject to disassembly without destruction
or impairment of designed use.
o Description: At the lowest level, the time dimension will be
used as segmentation for the NAS. Since the main actor for
prediction is based on the aircraft, time can be broken out into
parts to provide prediction at a particular phase of flight (e.g.
ground departure, ascent, cruise, descent, ground arrival).
Uncertainty in this study is characterized based on behaviors in a population of
flights with the same origin and destination. The uncertainty associated with the delay
variables is roughly a function of the data that is being used to produce the delay and
route prediction. Table 3-1 lists each category of variables that are utilized for the model
separated by fixed and temporal variables. Fixed data are categories of variables that have
only one value over the duration of the prediction horizon. Temporal data are categories
of variables having a value for each prediction horizon i.
Table 3-1: DBN Fixed and Temporal Categories of Input Variables
Fixed Categories
Code       Wording
AcChar     Aircraft Characteristics (e.g. model type, airline)
CityP      Multiple origin to single arriving or departing airport
Season     Season (day-of-week, month-of-year)
DepGD      Departure Ground Delay time
AirbD      Airborne Delay time
ArrGD      Arrival Ground Delay time
CdDepGD    Causal departure ground delay factors
CdAirbGD   Causal airborne ground delay factors
CdArrGD    Causal arrival ground delay factors
DTResult   Delay time prediction
Temporal Categories
Code       Wording
SchTra     Scheduled Traffic at time i
LatPthi    Lateral path of ATC sectors traversed at time i
DepGDi     Departure ground delay at time i
AirbGDi    Airborne delay at time i
ArrGDi     Arrival ground delay at time i
CdDepGDi   Causal ground departure delay factors at time i
CdAirbGDi  Causal airborne delay factors at time i
CdArrGDi   Causal ground arrival delay factors at time i
DTResulti  Delay classification threshold (0-30min, 30-60min, 60-90min, >90min) prediction probability at time i
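One way to hold the two kinds of input in code is sketched below. The class name `FlightEvidence`, the string-valued fields, and the sample values are hypothetical; only the category codes come from Table 3-1.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FlightEvidence:
    """Container separating fixed inputs (one value per flight) from
    temporal inputs (one value per prediction horizon i)."""
    fixed: Dict[str, str]
    temporal: List[Dict[str, str]] = field(default_factory=list)

    def value(self, code: str, i: Optional[int] = None) -> str:
        """Return a fixed value, or the temporal value at horizon index i."""
        if i is None:
            return self.fixed[code]
        return self.temporal[i][code]


# Hypothetical flight with two horizons of temporal evidence.
example = FlightEvidence(
    fixed={"AcChar": "B738", "Season": "Mon/Jan"},
    temporal=[{"SchTra": "high"}, {"SchTra": "low"}],
)
```

Keeping the two groups apart mirrors the table: fixed categories are set once, while temporal categories are indexed by horizon.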
3.4 Data Processing
Previous research into delay prediction [49] [50] used the FAA’s Aviation System
Performance Metrics (ASPM) database for input data to provide the delay picture. While
ASPM provides detailed data on flights to and from airports, it lacks robustness as it only
provides this data for 77 airports, 22 carriers, and some VFR (visual flight rules) traffic.
For this study, a more robust data source was utilized by using aircraft radar track data
from MITRE’s Center for Advanced Aviation and System Development (CAASD),
otherwise known as Threaded Track. Threaded Track fuses a range of radar position
coordinates (lat, lon) throughout the flight into a single synthetic trajectory by applying a
series of noise attenuation algorithms [51]. These sources include the National Offload
Program (NOP), Airport Surface Detection Equipment System (ASDE-X) and Enhanced
Traffic Management System (ETMS) data.
ETMS provides the lowest quality position source, updating at approximately one-minute
intervals, and is utilized only to fill gaps. NOP data used within Threaded Track has
three different formats: NOP-Center, which provides position reports during the en-route
phase of flight, and NOP-Automated Radar Terminal System (ARTS) and NOP-Standard
Terminal Automation Replacement System (STARS), which contain Terminal Radar Approach
Control (TRACON) position returns for the flights with those specific automation systems.
ASDE-X data provides positions at a one-second update rate on the airport surface and in
the immediate area around the airport.
For this study, the author leveraged and built on a MITRE-developed data analysis
project that centers on the fusion and post-processing of the threaded tracks with relevant
external data sources. Although ASPM was one of the sources used to provide the flight
delay story integrated with Threaded Track, an algorithm needed to be developed to fill in the
gaps. The author developed a ‘phase-of-flight’ post-processing algorithm that takes the time
series points of threaded tracks, partitions the phases of flight based on the radar source and
aircraft horizontal or vertical characteristics, and tags the phase of flight from beginning to
end. Figure 3-5 depicts an example of how Threaded Track stitches sources of aircraft
position data to provide an accurate single-source gate-to-gate record of the position of the
flight along with a visual depiction of how the phase of flight post-processing algorithm
would partition and tag the data for each flight segment.
Figure 3-5: Threaded Track gate-to-gate flight data sources used for each phase of flight.
The data processing steps performed to create threaded track are depicted in Figure 3-6
and are thoroughly described in the following sub-sections.
Figure 3-6: Data Processing Steps.
3.4.1 Data Segmentation
The NOP and ASDE-X data sources are stored in a text format with one row per
radar return. Although a track identification (ID) column is present in each of the data
sources, the ID values are recycled within each air traffic facility and therefore do not
uniquely identify a track. The segmentation process groups related radar returns into
segments and assigns a unique segment ID to each group of returns. This process is
designed to avoid merging two flights whenever possible and to minimize the possibility of
splitting a single flight into multiple segments.
This process uses different criteria for assigning points to a segment depending on
the data source. The process begins by grouping the returns by air traffic facility, date,
and source-assigned track ID. The groups of points are then sorted by ascending time.
After the points are grouped and sorted, the segmentation criteria are applied to each
point in turn, and points within a segment are assigned the same segment ID. The
segmentation criteria are specified by Equations 6-9.
Equation 6 is used in the segmentation logic to ensure that two successive points
in a segment are temporally close. The longer update period in NOP en-route data
requires a looser time-bound between successive points.
(6)
(7)
Equation 7 is the lateral distance check, which was developed to ensure that two
successive points are within a reasonable distance of one another. Successive points
fail the distance check most often when a track ID is recycled by a tracker.
Equation 8 is the flight information check used for NOP en-route records. This
was developed because the computer ID is commonly duplicated among tracks within an
air traffic facility. The flight information check was developed to use the beacon code
and aircraft call-sign information along with the computer ID to group points together.
Any two successive points in a segment must agree on at least two of the three fields.
Equation 9 describes the rules used to assign pairs of successive points to a
segment. For ASDE-X data records, successive points that share a track ID are
considered to belong to the same segment if they have the same Mode-S value and pass
the time check. NOP, STARS, and ARTS records must pass the lateral distance and time
checks; NOP en-route records must additionally pass the flight information check.
(8)
(9)
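The pairwise checks described for Equations 6-9 can be sketched as follows. This is a minimal illustration under stated assumptions, not the MapReduce implementation used in this research: the threshold values, the record field names, and the use of a haversine distance are hypothetical and stand in for the bounds defined by the equations.

```python
import math
from itertools import groupby

# Hypothetical thresholds; the actual bounds are defined by Equations 6 and 7.
MAX_GAP_S = {"ASDE-X": 10.0, "NOP-STARS": 30.0, "NOP-ARTS": 30.0, "NOP-Center": 60.0}
MAX_SPEED_NM_PER_S = 0.25  # assumed cap on plausible lateral movement


def distance_nm(p, q):
    """Great-circle distance between two returns in nautical miles (haversine)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p["lat"], p["lon"], q["lat"], q["lon"]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 3440.065 * 2 * math.asin(math.sqrt(a))


def same_segment(prev, cur, source):
    """Apply the time, distance, and identity checks to two successive returns."""
    dt = cur["t"] - prev["t"]
    time_ok = 0 <= dt <= MAX_GAP_S[source]
    if source == "ASDE-X":
        return time_ok and prev["mode_s"] == cur["mode_s"]
    dist_ok = distance_nm(prev, cur) <= MAX_SPEED_NM_PER_S * max(dt, 1.0)
    if source != "NOP-Center":
        return time_ok and dist_ok
    # En-route flight information check: agree on at least two of three fields.
    agree = sum(prev[k] == cur[k] for k in ("computer_id", "beacon", "callsign"))
    return time_ok and dist_ok and agree >= 2


def segment(returns, source):
    """Group by (facility, date, track ID), sort by time, and assign segment IDs."""
    def key(r):
        return (r["facility"], r["date"], r["track_id"])

    seg_id, out = 0, []
    ordered = sorted(returns, key=lambda r: (key(r), r["t"]))
    for _, group in groupby(ordered, key=key):
        prev = None
        for r in group:
            if prev is None or not same_segment(prev, r, source):
                seg_id += 1  # start a new segment
            out.append((seg_id, r))
            prev = r
    return out
```

In this sketch a new segment starts whenever a group changes or a pair of successive returns fails any applicable check, which mirrors the preference for splitting over merging two flights.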
The segmentation process is implemented as four jobs, one for each data source.
This process utilizes a distributed computing software framework called MapReduce.
The Map Phase of each job is used to perform the grouping and sorting of radar returns.
The Reduce Phase implements the segmentation criteria outlined above. The MapReduce
process will be discussed more extensively in Section 3.7.1.
3.4.2 Segment Metadata
After the raw data is segmented, the author developed a metadata collection
process that builds the segment level metadata to better understand the characteristics of a
segment. This information is subsequently used by the Fusion Process to connect the
segments to build the basic Flight Metadata. This process builds and collects information
including the flight start time and flight end time for each segment. Other metrics built
are discussed in Section 3.4.5, and the full set is detailed in the data schema in
Appendix B.
3.4.3 Data Fusion Process
The fusion process is designed to take one track recorded by two separate air
traffic facilities and merge those records into one track that crosses between multiple air
traffic facilities. This process may potentially need to examine the entire collection of
data segments in the applicable time window. To reduce the volume
of data that would need to be examined, only the per-segment metadata is used in this
process. The metadata is an incomplete view of the segments: it contains only high-level
attributes such as aircraft identifier, airline, departure and arrival airports, and the
bounding values of time, location, altitude, and speeds. The fusion process is designed to
fit between the segmentation process (which only aggregates clearly defined, time-contiguous
radar data that corresponds to a flight and radar sensor) and a smoothing
process (which examines all track data available per flight, and may therefore decide to
split a previously fused flight). Fusion is therefore designed to reduce false negatives at
the expense of false positives, because segments left unmerged will never subsequently be
reconsidered for merging by the smoothing process.
Fusion considers two primary attributes above all others: the window of time
associated with a segment, and the set of aircraft identification metrics associated with a
set of segments that create a fused track. The time window is based on the notion that
different radar sensors will generally overlap in coverage of a flight as time progresses;
overlapping time windows (within a reasonable quantum at the ends of the segment to
allow for the radar sweep rate and possibility of missing a few data points) imply that two
segments may represent the same flight. In addition, aircraft IDs, for the most part, are
highly consistent through the evolution of a flight. This means that usually, such an ID
can be used successfully to join all segments for a flight as long as the time window
constraints can be observed. There is a small percentage of flights where IDs are
inconsistent; this occurs because IDs have been abbreviated or misspelled as part of a
manual data entry process along the way. These flights with multiple IDs are
recognizable because multiple IDs or other identifying metadata appear within
single segments, which can then be used to join segments with different IDs.
This fusion process is typically difficult to run in a parallel computing
environment; however, due to the vast volume of data, a parallel computing environment
is required to complete the process in a timely manner. For this reason, the use of a
single flight ID provides an opportunity to parallelize the problem into a per-flight
process. The method the author developed to handle this processing is described by the
following algorithm:
1. Load the segment metadata, which comes from multiple sources (NOP, ASDE-X,
and ETMS)
2. Group the data by aircraft ID, for sets of segments where such IDs are
unambiguous keys for fusing flights, and create a separate group of ambiguous
cases.
3. Sort the data from each group by time and stream it into a Java program that
processes the segment metadata and emits pairs of uniquely-generated flight IDs
and segment IDs for the next step of processing.
The data fusion algorithm expects its data as a time-sorted sequence of comma-
separated value records that represent the segment metadata. By processing these records
in a temporal order, they can be fused into flights by examining only records that fit
within a time window that corresponds to the longest segment duration plus the time
quantum. As records expire from this window, they probably do not overlap any
subsequent records, and their corresponding flight data can therefore be omitted. The
records that lie within the current time window are indexed by all metadata attributes
(aircraft ID, airline code, airports, facility ID, etc.) that can be used to match records to,
or exclude records from, flights. These indices permit very fast matching of the limited
set of data in memory at any point in time.
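The streaming step for the unambiguous group can be sketched as below. This is a simplified illustration, not the Java program described above: it keys only on aircraft ID rather than on all metadata indices, and the field names and the 120-second quantum are assumptions made for the example.

```python
import itertools

QUANTUM_S = 120.0  # assumed tolerance for sweep rate and missed returns


def fuse(segments, quantum=QUANTUM_S):
    """Yield (flight_id, segment_id) pairs from time-sorted segment metadata.

    Each segment is a dict with 'segment_id', 'aircraft_id', 'start', and 'end'.
    Segments that share an aircraft ID are fused when their time windows
    overlap, allowing for the quantum at the segment ends.
    """
    next_flight = itertools.count(1)
    open_flights = {}  # aircraft_id -> (flight_id, latest end time seen)
    for seg in segments:  # assumed sorted by 'start'
        # Expire flights that can no longer overlap any later segment.
        expired = [k for k, (_, end) in open_flights.items()
                   if end + quantum < seg["start"]]
        for k in expired:
            del open_flights[k]
        current = open_flights.get(seg["aircraft_id"])
        if current is None:
            flight_id, end = next(next_flight), seg["end"]
        else:
            flight_id, end = current[0], max(current[1], seg["end"])
        open_flights[seg["aircraft_id"]] = (flight_id, end)
        yield flight_id, seg["segment_id"]
```

Because only the open flights are held in memory, the working set stays bounded by the number of flights active within the time window rather than by the size of the input.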
3.4.4 Track Smoothing and Filtering
Each facility’s surveillance data offers differing quality, availability, and
coverage. This final step creates a synthesized track by smoothing and weighting the
contributions from each data source. Specifically, the various sensors are integrated by
first computing a smoothed trajectory from each data source. Since the Threaded Track is
built from historical data sets, least squares smoothing filters have been shown to create
better trajectory estimates than those used in tracking systems, which are subject to an
inherent measurement lag from aircraft accelerations [52]. These filters also provide
derived parameters from the raw trajectory such as speed, heading, climb gradient, etc.
Each radar sensor’s continuous derived track is then integrated into a single Threaded
Track using a weighted average based on the underlying accuracies in each source’s
sensors and data quality.
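As an illustration of the two ideas in this step, the sketch below fits a local least squares line to one source (giving a smoothed value and a derived rate) and then combines per-source estimates with a weighted average. The window size and the inverse-variance weighting are assumptions; the text states only that the weights reflect each source's accuracy and data quality, and the actual Threaded Track filters are described in [52].

```python
def local_linear_fit(times, values, t0, half_window):
    """Least squares line through the samples within half_window of t0.

    Returns (smoothed value at t0, slope), e.g. a position and its rate.
    """
    pts = [(t - t0, v) for t, v in zip(times, values) if abs(t - t0) <= half_window]
    n = len(pts)
    if n == 0:
        raise ValueError("no samples inside the smoothing window")
    sx = sum(x for x, _ in pts)
    sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts)
    sxy = sum(x * y for x, y in pts)
    denom = n * sxx - sx * sx
    if denom == 0:  # a single distinct time: fall back to the mean
        return sy / n, 0.0
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n  # times are centered, so this is the fit at t0
    return intercept, slope


def fuse_sources(estimates):
    """Weighted average of per-source estimates given as (value, sigma) pairs.

    Inverse-variance weights are an assumption made for this sketch.
    """
    weights = [1.0 / (sigma * sigma) for _, sigma in estimates]
    return sum(w * v for w, (v, _) in zip(weights, estimates)) / sum(weights)
```

Because the fit uses samples on both sides of t0, it does not suffer the lag of a causal tracking filter, which is the advantage of working from historical data.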
3.4.5 Flight Metadata
The Flight Metadata process unifies the merged segments’ flight information into a
single summarized flight record. This output contains all of the relevant metrics available
from the source data in addition to providing links to external data sources such as
ASPM’s flight delay database. Figure 3-7 depicts the workflow schematic of how flight
delay information was integrated in the data automation workflow. As discussed
previously, the process feeds NOP, ASDE-X, and ETMS segmentation metadata and
smoothed track data into algorithms (TrajectoryFusion, PhasesOfFlight)
which in turn generate flight metrics (i.e. Threaded Flight, Phases of Flight). External
flight delay data sources (as shown) were integrated into the data workflow process for
model development discussed in succeeding sections. Appendix A depicts the input,
process, and output considerations for the complete list of data fusion algorithms tested in
this research. In addition, Appendix B depicts the data schema of all of the variables used
in model development testing.
Figure 3-7: Data Fusion Workflow
3.4.6 Data Quality
Due to anomalies in the data, a flight can end up reporting multiple call signs,
departure/arrival airports, and aircraft types. To solve this issue, the Flight Metadata
process ranks each type of information on the number of times it appears in a single
flight. The highest scoring information is considered as the best guess. The Flight
Metadata process also preserves low-scoring entries for later improvement and analysis.
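The frequency ranking described above can be sketched in a few lines. The field names are hypothetical and chosen only for the example; the actual fields are listed in the data schema in Appendix B.

```python
from collections import Counter


def rank_field(values):
    """Rank the observed values of one field by how often they appear in a flight."""
    counts = Counter(v for v in values if v)  # ignore empty or missing entries
    return counts.most_common()


def summarize_flight(records,
                     fields=("callsign", "dep_airport", "arr_airport", "aircraft_type")):
    """Pick the most frequent value per field and keep the rest as alternates."""
    summary = {}
    for field in fields:
        ranked = rank_field(r.get(field) for r in records)
        summary[field] = {
            "best": ranked[0][0] if ranked else None,
            "alternates": ranked[1:],  # low-scoring entries preserved for later analysis
        }
    return summary
```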
3.5 DBN Formalism Extensions
As stated in Section 3.2.3, the structure of a DBN is typically specified using a prior
network and a transition network, assuming a first-order Markov process,
time-invariance, and homogeneity. Unfortunately, robustly modeling and inferring aircraft
flight time delay in the presence of uncertainty requires extensions for a kth-order
Markov process, which is not possible in Murphy’s formalism [53]. Another issue
with the previous formalism is that, when unrolling the network for inference, every node is
copied to every time-slice, even if it has a constant value for all time-slices. Lastly,
although it is possible to introduce a different initial state using the previous method, it is
not possible to define a different ending state, which can be useful for modeling variables
that are only interesting after the end of the process. These three observations form the
basis of extensions to the DBN formalism.
To offset these constraints, the author applied a formalism extension consisting of
five components: (1) Temporal arcs, (2) Temporal plate, (3) Contemporal nodes (C), (4)
Anchor nodes (A), and (5) Terminal nodes (T), as shown in Figure 3-8. A temporal arc is
an arc between a parent node and a child node with an index that denotes the temporal
order. The benefits of temporal arcs are that they provide a more comprehensible
visualization and allow for a much easier DBN specification that requires less coding.
The temporal plate is the area of the DBN definition that holds the temporal information
of the network. Specifically, it contains the variables that develop over time (and are
going to be unrolled for inference) and it has an index that denotes the sequence length T
of the dynamic process. The benefit of a temporal plate is that, regardless of
how many time-slices the DBN is unrolled to, the nodes outside the temporal plate are
unique.
Figure 3-8: The five components of the DBN extended formalism
This is useful for the next component, contemporal nodes, which are nodes outside the
temporal plate whose values remain the same over time. For instance, if an ATDM is
seeking information on a specific aircraft type (e.g. an Airbus A320 – aircraft type does
not vary over a flight) they would specify this in the contemporal node which saves
memory and computational time. Lastly, anchor and terminal nodes are nodes located
outside the temporal plate that have one or more children inside the temporal plate, and if
unrolled for inference, these nodes are only connected to the first and last time-slice,
respectively. These nodes are useful when extra variables need to be introduced
before the start or after the end of the process without being
copied for every time-slice. These nodes are of vital importance for the DBN in this study
since they were used to extend the DBN formalism in a way that works for the NAS.
Additionally, they were used as a guideline to develop an efficient DBN structure
for the flight delay prediction model.
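To make the unrolling behavior concrete, the sketch below builds the edge list of an unrolled network from the five components. It is an illustration only: the function and node names are hypothetical, and drawing terminal arcs from the last time-slice into the terminal node is an assumption about arc direction made for this example.

```python
def unroll(plate_nodes, temporal_arcs, contemporal, anchor, terminal, T):
    """Return the edge list of a DBN unrolled for T time-slices.

    plate_nodes:   names of the nodes inside the temporal plate
    temporal_arcs: (parent, child, order) triples; order 0 is an intra-slice arc
    contemporal, anchor, terminal: (outside_node, plate_node) pairs
    Nodes are (name, slice) tuples; nodes outside the plate use slice None.
    """
    unknown = {n for p, c, _ in temporal_arcs for n in (p, c)} - set(plate_nodes)
    if unknown:
        raise ValueError(f"arcs reference nodes outside the plate: {unknown}")
    edges = []
    for t in range(T):
        for parent, child, order in temporal_arcs:
            if t - order >= 0:  # a kth-order arc reaches back k slices
                edges.append(((parent, t - order), (child, t)))
        for c_node, child in contemporal:  # a single copy, linked to every slice
            edges.append(((c_node, None), (child, t)))
    for a_node, child in anchor:  # linked to the first slice only
        edges.append(((a_node, None), (child, 0)))
    for t_node, plate_node in terminal:  # linked to the last slice only (assumed direction)
        edges.append(((plate_node, T - 1), (t_node, None)))
    return edges
```

Note that the contemporal, anchor, and terminal nodes each appear once regardless of T, which is the memory saving the formalism is designed to provide.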
3.6 DBN Structure Derivation
The development of a dynamic Bayesian network structure can be a demanding
undertaking. The initial specification of network structure is a challenging task, and the
best heuristic is to keep it concise. Concise models can incrementally be expanded to
more detailed and complex models by adding detail to the network via a node and
evaluating the functionality of that node. Starting with complex models typically makes it
unmanageable to evaluate functionality, since distant variables may interact in complex
ways [54].
Construction of the DBN structure commenced with the identification of factors that
had a direct influence on aircraft flight delay. This is driven by the fact that flight delay
has an extensive impact on how ATDMs respond to dissimilar situations of operational
and environmental uncertainty. The key causal factors that directly influence delay are
discriminated into the following categories according to their phase of flight: ground
departure causal delay factors, airborne causal delay factors, and ground arrival causal
delay factors. Using ground departure causal delays as an example, variables in this
category include: runway configuration, weather, traffic interactions, traffic restrictions,
and runway queue position. See Appendices A and B for an in-depth explanation of the
input and output considerations that went into building these variables, as well as the data
schema of the end-product variables developed from both the fused data sources
and the developed algorithms. The presented model was developed incrementally using a
combination of domain literature, expert knowledge, and regression analysis. Figure 3-9
depicts how the DBN model carries out the task of predicting delay time and causal delay
factors using the extended formalism.
Figure 3-9: Extended formalism for a second-order DBN.
The present model is an example of a second-order DBN using the extended formalism
discussed in Section 3.5. In other words, a red arrow with the number two in the box
on a variable means that the model predicts flight delay best when the previous
two instances are taken into account. The anchor and contemporal nodes are placed
outside the temporal plate (squared dashed line). The temporal plate denotes that the
DBN will be unrolled for t = 4 time-slices. In this graph, the nodes that are grey can be
fully observed and the nodes in white contain missing values.
3.7 DBN Parameter Learning
After obtaining the DBN structure, parameters were learned from the fused
threaded track dataset using the Expectation Maximization (EM) algorithm. EM is an
iterative algorithm that enables learning models from data with missing and/or latent
variables. The EM algorithm consists of an expectation step (E step) and a maximization
step (M step). In the E step, the probabilities of the missing variables are calculated given
the observed variables and the current values of the parameters (sufficient statistics are
computed). In the M step, the parameters are recomputed using the filled-in values as if
they were observed values. The process of filling-in the missing values and updating the
parameters is iterated until convergence. The different variants used for learning
parameters in Bayesian networks from both complete and incomplete data are discussed
more extensively in [55].
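The E and M steps can be illustrated on the smallest possible case: a two-node network A -> B in which A is sometimes unobserved. This is a toy sketch, not the learning code used in this research; the network, the starting values, and the function name are assumptions made for the example, and the data are assumed to contain evidence for both values of A.

```python
import math


def em_missing_parent(data, pa=0.5, pb=(0.3, 0.7), tol=1e-4, max_iter=100):
    """EM for a two-node network A -> B where A is sometimes unobserved.

    data: list of (a, b) pairs with a in {0, 1, None} and b in {0, 1}.
    Returns (pa, pb, trace) where pa = P(A=1), pb[a] = P(B=1 | A=a), and
    trace holds the log-likelihood at the start of each iteration.
    """
    pb = list(pb)
    trace = []
    for _ in range(max_iter):
        # E step: expected value of A for every record (the sufficient statistics).
        weights, loglik = [], 0.0
        for a, b in data:
            joint = [
                (1 - pa) * (pb[0] if b else 1 - pb[0]),  # P(A=0, B=b)
                pa * (pb[1] if b else 1 - pb[1]),        # P(A=1, B=b)
            ]
            if a is None:
                weights.append(joint[1] / (joint[0] + joint[1]))
                loglik += math.log(joint[0] + joint[1])
            else:
                weights.append(float(a))
                loglik += math.log(joint[a])
        trace.append(loglik)
        # M step: re-estimate the parameters as if the filled-in values were observed.
        total = sum(weights)
        pa = total / len(data)
        pb[1] = sum(w * b for w, (_, b) in zip(weights, data)) / total
        pb[0] = sum((1 - w) * b for w, (_, b) in zip(weights, data)) / (len(data) - total)
        if len(trace) > 1 and trace[-1] - trace[-2] < tol:
            break  # converged: the log-likelihood has stopped improving
    return pa, pb, trace
```

The log-likelihood trace never decreases from one iteration to the next, which is the property that makes EM converge but also what allows it to settle in a local optimum.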
While the EM algorithm generally works well in estimating missing and/or latent
parameters in probabilistic graphical models, two problems for DBN parameter learning
using EM still exist. First, applying the EM algorithm to learn DBN parameters is often
subject to local optima and prone to premature convergence which could ultimately lead
to poor solution quality. To mitigate this problem, the author applied the Age-Layered
Expectation Maximization (ALEM) method [56], which is primarily based on the genetic
algorithm concept of creating and computing with a population of randomly initialized
entities (See Section 3.7.2). Second, as data size increases, learning time of conventional
sequential learning becomes intractable. To mitigate this problem, the author applied the
ALEM algorithm on the MapReduce distributed computing framework.
3.7.1 MapReduce for Massive Scale Distributed Computations
MapReduce is a programming framework for distributed computing on massive
data sets which was introduced by Google in 2004. It is a paradigm that allows users to
create parallel applications while hiding the details of data distribution, load balancing,
and fault tolerance [57]. MapReduce requires decomposition of an algorithm into map
and reduce steps. In the map phase, the input data are split into blocks and processed as a
set of input key-value pairs in parallel by multiple mappers. Each mapper applies to each
assigned datum a user-specified map function and produces as its output a set of
intermediate key-value pairs. Then the values with the same key are grouped together
(the sort and shuffle phase) and passed on to a reducer, which merges the values
belonging to the same key according to a user-defined reduce function.
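The map, shuffle, and reduce phases can be imitated in a single process to show the data flow. The sketch below is not Hadoop code and runs sequentially; the job shown (counting radar returns per facility) and all names are hypothetical.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Run the map, shuffle/sort, and reduce phases sequentially in one process."""
    groups = defaultdict(list)
    for record in records:                 # map phase: emit key-value pairs
        for key, value in map_fn(record):
            groups[key].append(value)      # shuffle: group values by key
    # reduce phase: merge the values that belong to each key
    return {key: reduce_fn(key, groups[key]) for key in sorted(groups)}


# Hypothetical job: count radar returns per air traffic facility.
def map_returns(record):
    yield record["facility"], 1


def reduce_count(key, values):
    return sum(values)
```

In a real cluster the mappers and reducers run on different machines, but the contract is the same: the map function sees one record at a time, and the reduce function sees all values for one key.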
Hadoop, an implementation of MapReduce, provides a framework for distributing
the data and user-specified MapReduce jobs across a large number of cluster nodes. It is
based on the master/slave architecture. The single master server (jobtracker) receives a
job assignment from the user, distributes the map and reduce tasks to slave nodes
(tasktrackers), and monitors their progress. Storage and distribution of data to slave nodes
is handled by the Hadoop Distributed File System (HDFS). A Hadoop node might denote
a tasktracker or jobtracker machine. A map task describes the work executed by a mapper
on one input split. A reduce task processes records with the same intermediate key. A
mapper/reducer might be assigned multiple map/reduce tasks. To learn the parameters
needed to support this research, Hadoop was run on the MITRE cluster – a continuously
growing cluster consisting of both a North and South configuration with a total of 129
data nodes, 1,630 mappers, 732 reducers, and over 2 petabytes of storage capacity.
3.7.2 ALEM on MapReduce (ALEMMR)
The ALEM algorithm is based on the genetic algorithm concept of creating and
computing with a population of randomly initialized entities [56]. Each entity has a
fitness, which is to be optimized, as well as an age corresponding to the amount of time
the entity has been in a population [58]. Entities are separated into layers with other entities
of like ages. Lower layers hold the youngest entities, while higher
layers hold the oldest members of the population. As entities age, they ascend to higher
layers. The maximum age of each layer is determined by the age gap parameter; once
entities reach this age, they ascend to the next layer. Additionally, there are limits to the
maximum number of entities per layer. The age-layered structure reduces the possibility
of fit, old entities, stuck in local optima, overtaking the population due to their high
fitness.
In ALEM, a population of EM runs is created and updated [56]. The age of each
EM run relates to its number of iterations, and the fitness of each EM run is its likelihood.
EM runs are randomly initialized in the first layer, iterate until an age where they ascend
to the next layer, and may need to compete for a spot in the next layer. Competition
occurs when a layer is full: if an ascending EM run has greater likelihood, the non-
ascending EM run is discarded to make room. Otherwise, the ascending EM run is not
competitive enough and is discarded. ALEM terminates when a specified number of EM
runs converge according to a pre-defined convergence criterion [58]. Figure 3-10 provides
a representative
example of how the use of ALEMMR provides the distributed platform needed to run a
population of airport DBNs simultaneously.
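The layer competition rule can be sketched as follows. This is an illustration under stated assumptions rather than the ALEM implementation of [56]: the ascending run is compared with the lowest-likelihood run in the next layer, and layer age limits grow linearly with the age gap; neither detail is specified in the text above.

```python
def try_ascend(layers, layer_idx, run, capacity):
    """Move `run` from layers[layer_idx] into the next layer, or discard it.

    Each run is a dict with 'age' and 'loglik' entries.
    Returns True if the run survives the move.
    """
    layers[layer_idx].remove(run)
    upper = layers[layer_idx + 1]
    if len(upper) < capacity:
        upper.append(run)
        return True
    weakest = min(upper, key=lambda r: r["loglik"])  # assumed competitor
    if run["loglik"] > weakest["loglik"]:
        upper.remove(weakest)  # the non-ascending run is discarded to make room
        upper.append(run)
        return True
    return False  # the ascending run is not competitive enough and is discarded


def age_population(layers, age_gap, capacity):
    """Promote every run whose age has reached the maximum age of its layer."""
    # Work from the top down so that a run ascends at most one layer per call.
    for idx in range(len(layers) - 2, -1, -1):
        max_age = age_gap * (idx + 1)  # assumed linear age limits
        for run in [r for r in layers[idx] if r["age"] >= max_age]:
            try_ascend(layers, idx, run, capacity)
```

Because young runs only compete with runs of similar age, a run that converged early to a poor local optimum cannot crowd out newer runs that are still improving.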
This study adopts the ALEM MapReduce framework developed in [59], and
provides novelty by addressing two important shortfalls of that research: the amount of
evidence available, and how well ALEMMR scales when the population size grows to
thousands or millions. Using ALEMMR, multiple DBNs are processed for each
operation. To manage the population of EM runs (i.e., DBN parameters), ALEMMR
terminates and starts new DBNs and computes the likelihood for the layers that are
changing, as illustrated in Figure 3-10. More specifically, each mapper in the E-step
performs expectation calculations on a single evidence set and multiple DBN instances.
The reducer then performs maximum likelihood estimation and either begins or ends new
EM runs according to the ALEM layers. The added temporal dimensions unique to DBNs
are managed by time-indexed variables at each observed prediction horizon (2 hours).
Since ALEM operates on a dynamic population structure, the number of EM runs
performed for each MapReduce operation will vary based on pre-defined parameters.
Section 4.2.1 discusses the additional parameters required to run ALEMMR that consider
computational time and the global optimum.
Figure 3-10: Example of an ALEM application with populations of DBNs on
MapReduce.
Chapter 4: Empirical Experiments
4.1 Experimental Design
For the DBN used in this study, the author was interested in answering four research
questions:
1. By using ALEMMR, can we learn parameters that scale with an increasing data
size while addressing the EM local optima problem?
2. What is the recommended prediction horizon and classification threshold for
delay prediction?
3. Does the flight delay prediction model provide accurate prediction results for
delay time and causal variables for each phase of flight greater than 80% of the
time?
4. Can this approach integrate results of the flight delay prediction (if successful)
into a developed real-time trajectory decision support prediction system that
recommends which route an aircraft should fly given both historical and real-time
flight delay information combined with data related to the aircraft and the external
environment? (previously discussed in Section 3.4)
This experiment presents the prediction results of model runs conducted against more
than three years of fused threaded track data covering the period August 2010 to
September 2013 (the date range of the fused threaded track uploaded on HDFS). City
pairs for all aircraft arriving at and departing from the top thirty-five major airports in the
NAS were used, as depicted in Figure 4-1. To estimate accuracy, the author used millions
of flight records, the same data used for parameter learning, to obtain the
predicted beliefs of flight delay time and causal delay variables via the holdout
method. In this method, the data is randomly partitioned into two independent sets, a
training set and a test set. The training set is then used to develop the model, whose
accuracy is estimated with the test set. 80% of the fused threaded track dataset was used
for training, and 20% was used for testing.
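The holdout partition can be sketched as below. The function name and the fixed seed are assumptions made for the example; the seed is there only so the split is reproducible.

```python
import random


def holdout_split(records, train_fraction=0.8, seed=0):
    """Randomly partition records into independent training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed only for reproducibility
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]
```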
Figure 4-1: Track dispersion using a subset of threaded track historical radar data for
aircraft departing and arriving from the top thirty-five major airports in the NAS.
4.2 Empirical Experiments Overview
This section discusses the results of investigating the first three research questions (see
Section 4.1). For the first experiment, the author applied the ALEM MapReduce
algorithm to the NAS-wide airport dataset to quantify both the time (minutes) as the data
set increases and the mean number of iterations until global convergence for DBNs with
varying levels of dataset size and hidden/missing data nodes. For the second experiment,
the author developed a confusion matrix that allows visualization of model performance.
In predictive analytics, a confusion matrix reports the number of false positives, false
negatives, true positives, and true negatives. The confusion matrix, by design, allows for
a more comprehensive analysis than the proportion of correct guesses (accuracy) that can
be beneficial for ATDMs. For the third experiment, accuracy results for the delay causal
variables aggregated for each phase of flight are provided to determine the accuracy of
the model to diagnose causes of delay.
4.2.1 Experiment 1: ALEMMR Flight Delay Application
To learn the DBN parameters using ALEM, the author used the following
parameters (see footnote 5): number of layers = 4; age gap = 4; and minimum runs in
lowest layer = 4. Additionally, the convergence tolerance was set to ε = 10^-4, the
maximum number of iterations was set to 100, and ALEM was set to terminate when
15 EM runs converged.
Given the aforementioned parameters, the first objective was to improve the
solution quality by ensuring convergence to the global optimum. To achieve this, the
author implemented the airport DBNs on Hadoop using a subset of the training data
(varying size up to 10^6 track variables) and ran it over the MITRE Hadoop cluster (see
footnote 6). Figure 4-2(a) depicts the mean number of iterations until global convergence,
taking all thirty-five major NAS airports into account with the number of missing or
hidden nodes equaling two and four, respectively. Overall, these results are significant for
the application of the ALEM algorithm on MapReduce because the average number of
iterations was drastically reduced relative to standard EM algorithms, which traditionally
require hundreds of iterations until local convergence. This alone would significantly
reduce the computational time required to solve the multi-airport problem.
Footnote 5: Parameters were set based on algorithmic best practices from prior art.
Additional testing will be performed to ensure optimal parameter settings in future
applications.
Footnote 6: For brevity, discussion and analysis of mappers and other factors related to
parallel computing in MapReduce were not included. Section 3.7.1 details the
characteristics of the MITRE Hadoop Cluster, which was optimized for efficient data
scaling to process the large problem size (multiple airport DBNs) and extremely large
population (data variables).
The author’s second objective was to quantify whether DBN parameters can be learned
and scaled for the top thirty-five major airports in the NAS using just over three years of
fused threaded track flight records. Increasing the size of the
training set increases training time; Figure 4-2(b) depicts the
results of the author’s implementation scaled to a flight data record size of n = 10^8, as
the population set increases super-linearly (n log n). The results show that when
comparing the sequential with the ALEMMR approach for only one airport, only a modest
speed-up (3.2x) occurs per iteration; however, when the number of airports
increases to 35, ALEMMR provides a 13.9x computational
acceleration over the sequential implementation. Future research is recommended to
explore minimizing processing time by applying more resources and/or a new parallel
framework (See Section 5.2.2).
Figure 4-2: Varying the size of training samples for learning the DBN using the Hadoop
MapReduce computing framework. (a) Depicts the average number of iterations till
convergence using a large set of varied training sample data and hidden/missing nodes.
(b) Depicts the scalability of the ALEM algorithm on MapReduce for an increasing data
set size where the data size (which scales to over 10^8) is instead supplemented with the
approximate time per iteration for an increasing number of airports. Speed-up of
ALEMMR relative to sequential EM is shown ranging from n = 1 airport to n = 35
airports.
4.2.2 Experiment 2: Varying the Measurement Rate
Four different strategic planning prediction horizons were analyzed: 2, 4, 6, and
24 hours. One would expect the length of the prediction horizon to affect the prediction
performance negatively as the horizon grows. The author also researched the impact of
changes in the arrival delay classification thresholds: 0-30, 30-60, 60-90, and >90
minutes. In other words, for a given prediction horizon, we would like to be able to
predict the delay will be within a given delay interval of time. The distribution result of
the joint probabilities is given by (6):
$$P(\mathrm{result}_{1:T}) = \prod_{t=1}^{T} \prod_{i=1}^{N} P\left(\mathrm{result}_t^i \mid Pa(\mathrm{result}_t^i)\right) \qquad (6)$$
Where:
T is the interval of the prediction horizon
N is the total number of the variables for the extracted model.
This prediction is dynamic, in that it continuously evolves throughout the strategic
prediction horizon by new measurements. After generating many examples, the author
used a test case population which contains over one million flight records for the
performance evaluation of the model. Table 4-1 depicts the delay classification
confusion matrix for the optimal prediction horizon (two hours) where the sum of the
highlighted green boxes represents correct classification predictions. Overall, prediction
results detailing the flight delay time were reliable approximately 92% of the time (the
sum of the diagonal), which is very encouraging. The remaining 8% were incorrectly
predicted to be delayed in the 0-30 minute delay bin when in actuality 4%, 2%, and 2%
were delayed for 30-60, 60-90, and greater than 90 minutes, respectively. After review,
the majority of the false positives were related to a mixture of anomalies in relation to the
aircraft’s lateral path from airport A to airport B with the departure, airborne, and arrival
delay time.
Table 4-1: The Classification Threshold for Flight Delay Using a Confusion Matrix and
Prediction Horizon of 2 Hours
                      Predicted (minutes)
Actual (minutes)      0-30     30-60    60-90    >90
0-30                  80%      -        -        -
30-60                 4%       7%       -        -
60-90                 2%       -        3%       -
>90                   2%       -        -        2%
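The accuracy figure quoted for Table 4-1 can be reproduced from the table entries. In the sketch below the dashes in the table are read as zero, which is an assumption, and the per-bin breakdown is an additional derived quantity rather than a result reported in this research.

```python
BINS = ["0-30", "30-60", "60-90", ">90"]

# Rows are actual delay bins, columns are predicted bins. Values are the
# percentages of all test flights from Table 4-1; dashes are read as zero.
CONFUSION = [
    [80, 0, 0, 0],
    [4, 7, 0, 0],
    [2, 0, 3, 0],
    [2, 0, 0, 2],
]


def accuracy(matrix):
    """Share of correct predictions: the diagonal divided by the total."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def recall_per_bin(matrix):
    """For each actual bin, the fraction of its flights predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]
```

Here `accuracy(CONFUSION)` gives 0.92, matching the sum of the diagonal, while `recall_per_bin(CONFUSION)` shows how much of that figure comes from the 0-30 minute bin, which holds most of the test flights.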
4.2.3 Experiment 3: Causal Delay Prediction Results
After identifying the optimal prediction horizon - which is a two hour look ahead
for accurate delay prediction - the author utilized that information to model performance
for delay causal variables at each phase of flight for the two hour prediction horizon. For
brevity, the delay causes were aggregated based on predicted accuracy of causal variables
for each phase of flight. In other words, for a given phase of a flight, the results depicted
reflect the weighted predictive accuracy of all the causal variables within that phase.
Table 4-2 depicts the delay causal variable prediction results for both the DBN model and
a static BN model for the optimal prediction horizon (two hours). As a specific example,
suppose an aircraft is traveling from airport A to airport B and is in the “cruise” phase of
flight as defined previously in Section 3.4. If this aircraft were to encounter a causal
airborne (i.e. cruise) delay factor during this phase of flight, the DBN would
be able to predict the cause of this delay based on end-state flight delay with an accuracy
of 93%, as opposed to 84% for a static BN. Overall, the prediction results
for causal delays show that the DBN outperforms the static BN at every phase of flight,
and that with more data and the ongoing fusion of different data sources, the depth and accuracy
of results should improve. The reason that DBNs perform better than BNs ultimately
comes down to the added temporal dimension, which creates granularity in the data and
provides more precise results. Referring back to Figure 3-4, where the system hierarchy of
the problem domain is discussed, one can now see that the time slices (considered a
“Part” in the systems hierarchy) hold great importance. The best way to think of this
conceptually is to visualize an aircraft going from point A to point B. A BN will
aggregate over the route of flight and report that this aircraft probabilistically
encountered a weather delay. A DBN will slice time into separate parts and provide
granular detail, so for the same aircraft the DBN may
show that the principal cause of delay was congestion at the departure
airport, followed by weather en route, followed by a hold in the descent phase. The
separate time parts can then be rolled up to produce a rank-ordered list of delay causes.
Table 4-2: Dynamic Bayesian Network & Static Bayesian Network Prediction Results
for Each Phase of Flight with Prediction Horizon of Two Hours

| Variables | Bayesian Network | Dynamic Bayesian Network |
|---|---|---|
| Causal ground departure delay factors at time i | 71% | 82% |
| Causal airborne delay factors at time i | 84% | 93% |
| Causal ground arrival delay factors at time i | 76% | 85% |
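To make the comparison in Table 4-2 concrete, the sketch below computes the accuracy gain of the DBN over the static BN in percentage points and the corresponding relative reduction in error for each phase. The dictionary keys are shorthand labels introduced here; the relative error reduction is an illustrative derived quantity, not a metric reported in this study.

```python
# Accuracies (percent) copied from Table 4-2.
results = {
    "ground departure": {"bn": 71, "dbn": 82},
    "airborne": {"bn": 84, "dbn": 93},
    "ground arrival": {"bn": 76, "dbn": 85},
}

# Gain in percentage points for each phase of flight.
gain = {phase: r["dbn"] - r["bn"] for phase, r in results.items()}

# Fraction of the static BN's errors that the DBN removes.
error_reduction = {
    phase: (r["dbn"] - r["bn"]) / (100 - r["bn"]) for phase, r in results.items()
}
```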
4.2.3 Experiment 4: Trajectory Route Selection Decision Support System
The previous experiments have shown, with high confidence, that a flight
prediction model developed using DBNs can be used to predict both flight
delay and flight delay causal variables over a series of time states. In addition, the
previous experiments have shown that model development can be scaled out to more
airports than prior research has attempted. Specifically, experiment 1 scaled to the
top thirty-five major airports in the NAS. This experiment aimed to integrate the results of
the flight delay prediction into a real-time trajectory decision support
system that recommends which route an aircraft should fly given both
historical and real-time flight delay information combined with data related to the aircraft
and the external environment (previously discussed in Section 3.4).
Figure 4-3 depicts the data-driven decision support architecture that combines
both the elements of flight delay prediction with the recommended aircraft route
selection. There are five main components in the developed framework:
The Data Processing Model collects and trims both the real-time and
historical data such as threaded track weather data, delay data, flight data,
track data, and other subsets of data (defined in Appendix B). This
provides both the online parameter estimation and data update components
with the required input data.
The Online Parameter Estimation Component estimates parameters
from both historical data and real-time conditions to improve the
adaptability of the DBN model.
The DBN Model, as the kernel of the framework, calculates aircraft route
selection according to the DBN model and the results of the online
parameter estimation. The model also sends data to the Data Update
Component.
The Data Update Component updates the route and all associated
historical data (the a priori estimate) with real-time data.
The Results Computation Component outputs flight delay predictions,
recommended routes, and the computed trajectories for a given airport
arrival-departure pair input using a point-mass mathematical model.7
Figure 4-3: Recommended Data-Driven Decision Support Architecture
7 Details on the point-mass BADA aircraft performance mathematical model can be reviewed at the
following link: https://www.eurocontrol.int/sites/default/files/field_tabs/content/documents/sesar/bada-revision-atmosphere-model-2010.pdf
The structure of the DBN for route selection was devised from the
author’s domain knowledge of modeling with DBNs and from subject matter expert
feedback from primary air traffic actors such as traffic flow managers, pilots, and air
traffic controllers. The DBN model represents an aircraft flying in the NAS at a time
granularity of five minutes and is able to infer flight delay time and route selection. The
author chose a time step of five minutes because this period is short enough
to capture interesting dynamics of an aircraft, but long enough to capture values of the
variables that are sought (i.e. route selection, etc.).
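The five-minute granularity can be illustrated with a short sketch that divides a flight into DBN time slices. The function name `time_slices` and the example departure and arrival times are hypothetical; only the five-minute step comes from the text.

```python
from datetime import datetime, timedelta

STEP = timedelta(minutes=5)  # time granularity of the DBN

def time_slices(departure, arrival, step=STEP):
    """Return the start time of each slice covering [departure, arrival)."""
    if arrival <= departure:
        raise ValueError("arrival must be after departure")
    starts = []
    current = departure
    while current < arrival:
        starts.append(current)
        current += step
    return starts

# Hypothetical two-hour flight, matching the two-hour prediction horizon.
slices = time_slices(datetime(2014, 1, 1, 12, 0), datetime(2014, 1, 1, 14, 0))
```

With this step size, a two-hour prediction horizon corresponds to 24 time slices.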
Figure 4-4 depicts the lateral path of the recommended routes when scaled to all
the airports used in this study. More specifically, each delay state depicts the
recommended routes for aircraft departing from and arriving at the airports in the study
given the integration of both flight delay data and the other data mentioned in Figure 4-3 and
defined in Appendix B. For example, NAS delay state one is a scenario that was run on a
relatively calm day based on the flight delay heat map, which was ultimately derived from
the efforts of the previous experiments. Each of the model-recommended lateral paths
provided for scenario one (and for all other scenarios) is updated at the
aforementioned five-minute update cycle, with the trajectory computed by a point-mass
mathematical model8. On the opposite end of the spectrum, NAS delay state six is a
scenario that was run on a day with medium-high delay, specifically clustered in the
Atlanta Metroplex area. In application, traffic flow managers could use this decision
support system both to understand how the Atlanta delay propagates through the rest of
the NAS and to use the recommended routes provided for the NAS for strategic planning.
8 Details on the point-mass BADA aircraft performance mathematical model can be reviewed at the
following link: https://www.eurocontrol.int/sites/default/files/field_tabs/content/documents/sesar/bada-revision-atmosphere-model-2010.pdf
Figure 4-4: NAS route selection & flight delay prediction lateral trajectory export from
the “Results Computation” component of Data-Driven Decision Support Architecture
4.2.4 Validation & Insight
The Data-Driven Decision Support Architecture was ultimately validated and
verified by the same subject matter experts that aided in the development of the DBN
models. More specifically, since this is a NAS-based recommendation engine, experts
were asked to focus specifically on the model recommended routes provided for the
geographic area they have expertise in. In addition, experts were asked to verify that the
model recommended routes given certain delay predictions (e.g. delay state six example)
seemed reasonable based on their expertise. Out of the 15 total subject matter experts, 13
judged the recommended routes to be reasonable and provided
feedback that the developed routes could even be used to develop Q routes in the en-route
phase of flight, which supports another initiative in the TBO NextGen portfolio (discussed
in Section 1.2). In addition, six experts independently noted that with more testing the
approach developed by this research could be implemented in the TFM environment for
strategic planning (greater than two hours) and tactical planning (minute-by-minute decisions).
Additional insight from the experts focused on further development of the external
environment. While the experts recognize that a nearly unlimited number of environmental factors
can affect the fidelity of the model, most agreed that both the
volume and variety of data sources used for this analysis were a big step in the
right direction with regard to data-driven, model-based decision-making in the big data
era.
Chapter 5: Conclusions & Future Work
5.1 Conclusions
In this study, the use of a DBN as the foundation for the development of a
predictive flight delay model was explored. DBNs provide a powerful means for
prediction and determining trade-offs (e.g. what-if scenarios) for managing flight time
delay in the presence of uncertainty. Making these trade-offs effectively is an essential
part of establishing decision support tools for the NextGen environment. The DBN's
ability to learn, classify, and predict parameters with high accuracy using a novel
data-driven framework encourages the author to continue this research to eventually obtain
the intended end state: a data-driven decision support system that can be used by ATDMs
to provide the optimal (best) or recommended (from past history) operational decisions
for both strategic and tactical prediction horizons. The author also believes the novel
computing strategy will find utility in the investigation of NAS system management
policies to manage flight time delay and aircraft route allocation for realistically sized
flight route networks, since it is more accurate, efficient, and extensible than the
approaches used in prior research.
5.2 Recommendations for Future Work
5.2.1 Air Traffic DBN Application
One focus for future research would be to continue testing both the properties and
parameters of the current DBN model and continue identifying more applications such as
when a TFMI should be initiated by traffic flow management. Specifically, the flight
delay model structure could be expanded to recommend when a TFMI should be
implemented since currently, strategic decisions by traffic flow managers largely rely on
non-optimal methods such as operational experience and tacit knowledge when deciding
on which TFMIs to implement for delay alleviation. An optimization method should then
be developed from the observed probabilities to provide the best decision or set of
decisions to the air traffic decision maker. One way this can be achieved is by identifying
the control law by which the decision support automation will take action in response to
variation from previous historical data. For example, given that the DBN model suggests
a TFMI should be implemented, future research could identify the trade-offs to be made
between early and ongoing intervention to compensate for delay propagation effects at
airport city pairs.
5.2.2 Research Other Parallel Computing Frameworks
Additional research should focus on the integration of this model and associated
data sources to perform analysis on an increasing number of airport sites using the
computational benefits of the Hadoop MapReduce framework. For real-time
decision support, newer computational paradigms such as UC Berkeley’s Spark [60]
should be explored. Spark is an open source cluster computing system that provides
primitives for in-memory cluster computing, which can make repeated loading and querying
of data substantially faster than re-reading it from disk.
References
[1] Joint Planning and Development Office, "JPDO Trajectory-Based Operations (TBO) study team report," Washington D.C, 2011., in press.
[2] FAA, "National Airspace System: System Engineering Manual," Air Traffic Organization, Washington, D.C., 2006.
[3] B. Musialek, C. Munafo, R. Hollis and M. Paglione, "Literature Survey of Trajectory Predictor Technology," Federal Aviation Administration, Atlantic City, 2010.
[4] T. Gaydos, W. Kirkman, S. Shresta, E. Blair and J. Kuchenbrod, "Measures Variability and Uncertainty in Flight Operations," in Integrated Communication, Navigation, and Surveillance (ICNS), Herndon, VA, 2012.
[5] S. Mondoloni, "Aircraft trajectory prediction errors: including a summary of error sources and data- FAA/Eurocontrol action plan 16 common trajectory prediction capabilities," CSSI, INC, 2006., in press.
[6] Trajectory Computation Infrastructure Based on BADA Aircraft Performance Model; Gallo, Eduardo ; Lopez-Leonies, Javies ; Vilalplana, Miguel A.; Navarro, Francisco A. ; Boeing Research & Technology Europe, Technishe Universita Munchen; 39173
[7] Preliminary Results of a Robust Trajectory Prediction Method Using Advanced Flight Data; Dupuy, Dominique Marie; Porretta, Marco ; Center for Transport Studies Imperial College London; 39173
[8] Objective Function for 4D Trajectory Optimization in Trajectory Based Operations; Pleter, Octavian Thor; Constantinescu, Cristian Emil; Stefaneson, Irina Beatrice ; University Politechnica of Bucharest, Romanian Space Agency ; Aug.2009
[9] A Model to 4D Descent Trajectory Guidance; Rodriguez , Jose Miguel Canino; Deniz, Luis Gomez ; Herrero, Jesus Garcia; Portas, Juan Besada; Corredera, Jose Ramon ; Signal and Communications Department and Electronic Engineering Department, Universidad de Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain, Computer Science Department, Universidad Carlos III, Madrid, Spain , Signal, System and Radiocommunication Department, Universidad Politécnica de Madrid, Madrid, Spain; 39173
[10] Performances and Sensitivities of Optimal Trajectory Generation for Air Traffic Control Automation; Wu, Di ; Zhan, Yiyuan J. ; University of Minnesota; Aug.2009
[11] 3D Conflict Resolution of Multiple Aircraft via Dynamic Optimization; Raghunathan , Arvind U.; Gopal, Vipin ; Subramanian, Dharmashankar ; Biegler, Lorenz T.; Samad, Tariq ; Carnegie Mellon University, Pittsburgh, PA, Honeywell International, Minneapolis, MN; May 2003
[12] A Quaternion-based inverse Dynamics Model for Real-time UAV Trajectory Generation; Drury, Rick G.; Whidborne, James F. ; Cranfield University; Aug.2009
[13] Improved Ground Trajectory Prediction by Multi-Aircraft Track Fusion for Air Traffic Control; Lymperopoulos, Ioannis ; Lygeros, John ; Swiss Federal Institute of Technology Zurich; Aug.2009
[14] A Holding Function for Conflict Probe Applications; McNally, Dave ; Walton, Joe ; NASA Ames Research Center, University of California; Aug 2004
[15] Intent Inference and Strategic Path Prediction; Krozel, Jimmy Ph.D ; Andrisani II, Dominick Ph.D ; Metron Aviation Inc., Purdue University; Aug.2005
[16] On-Line Trajectory Optimization for Autonomous Air Vehicles; Twigg , Shannon ; Calise, Anthony Calise ; Johnson, Eric ; Georgia Institute of Technology ; 37834
[17] Utilizing RNAV Avionics Testing Lateral Offset Procedures; Herndon , Alert A.; Williams, Jeffrey T.; Vaughn, William ; DeArmon, James ; Duquette, Michelle ; Formosa, Jeffrey ; Jarvis, Edwin ; Spellman, Joseph ; The MITRE Corporation, Continental Airlines, FAA ; Oct. 2003
[18] Kinematics-Based model for Stochastic Simulation of Aircraft Operating in the National Airspace System; McGovern, Seamus M.; Cohen, Seth B.; Truong, Minh ; Farley, Gerard ; US DOT National Transportation Systems Center, EG&G Technical Services ; 39173
[19] Target Tracking and Estimated Time of Arrival (ETA) Prediction for Arrival Aircraft; Roy, Kaushik ; Levy, Benjamin ; Tomlin, Claire J. ; Aug.2006
[20] Flight-Mode-Based Aircraft Conflict Detection using a Residual-Mean Interacting Multiple Model Algorithm ; Hwang , Inseok ; Hwang, Jesse ; Tomlin, Claire ; Stanford University; Aug 2003
[21] Robust Nonlinear LASSO Control: A New Approach for Autonomous Trajectory Tracking; Boyle, David P.; Chamitoff, Gregory E. ; Ball Aerospace Australia, NASA; Aug 2003
[22] C. P. Tino, L. Ren and J.P. B. Clarke, "Wind Forecast Error and Trajectory Prediction for En-route Scheduling," in AIAA-GNC, Chicago, IL, 2009.
[23] S. Mondoloni and D. Liang, "Improving Trajectory Forecasting Through Adaptive Filtering Techniques," in 5th USA/Europe ATM R & D Seminar, Budapest, Hungary, 2003.
[24] T. Rentas, S. M. Green and K. Cate, "Survey and Method for Determination of Trajectory Predictor Requirements," National Aeronautics and Space Administration, 2009.
[25] M. Hansen, "Delay and flight time normalization procedures for major airports: LAX case study," National Center of Excellence for Aviation Operations Research, Berkeley, CA, 2001.
[26] M. Abdel-Atl, C. Lee and B. Y.Q., "Detecting periodic patterns of arrival delay," Journal of Air Transport Management, vol. 13, no. 6, pp. 355-361, 2007., in press.
[27] W. Vigneau, "Flight Delay Propagation, Synthesis of the Study," EUROCONTROL, EEC Note No 18/03, 2003.
[28] M. Janic, "Modeling the Large Scale Disruptions of an Airline Network," Journal of Transportation Engineering, pp. 249-260, April 2005.
[29] D. Dai and J. Liou, "Delay Prediction Models for Departure Flights," Journal of the Transportation Research Board, Vols. CR-ROM, 2006.
[30] R. Jehlen, A. Klein, B. Sridhar and Y. Wang, "Modeling Flight Delays and Cancellations at the National, Regional and Airport Levels in the United States," in Eighth USA/Europe Air Traffic Management Research and Development Seminar, Napa, California, 2009.
[31] Neogi, N.A.; Naseri, A., "Using Hidden Markov Models to Detect Mode Changes in Aircraft Flight Data for Conflict Resolution," Systems, Man and Cybernetics, 2006. SMC '06. IEEE International Conference on , vol.5, no., pp.3732,3737, 8-11 Oct. 2006
[32] M. J. Russell and J. A. Bilmes. Introduction to the special issue on new computational paradigms for acoustic modeling in speech recognition. Computer speech and language, 17:107–112, April 2003.
[33] G. Welch and G. Bishop. An introduction to the Kalman filter. Technical Report 95-041, University of North Carolina at Chapel Hill, Department of computer science, Chapel Hill, NC, USA, April 2004.
[34] R. E. Kalman. A new approach to linear filtering and prediction problems. Transactions of the ASME - journal of basic engineering, 83:35–45, 1960.
[35] Y Wang, M Papageorgiou, A Messmer, P. Coppola, A. Tzimitsi, A. Nuzzolo, “An Adaptive Freeway Traffic State Estimator”, Automatica, vol.45, no.1, pp. 10-24, 2009. doi: 10.1016/j.automatica.2008.05.019
[36] Kirubarajan, T., and Y. Bar-Shalom. "Kalman filter versus IMM estimator: when do we need the latter?." Aerospace and Electronic Systems, IEEE Transactions on 39.4 (2003): 1452-1457.
[37] Singer, Robert A., and Kenneth W. Behnke. "Real-time tracking filter evaluation and selection for tactical applications." Aerospace and Electronic Systems, IEEE Transactions on 1 (1971): 100-110.
[38] J. W. Pepper, K. R. Mills and L. A. Wojcik, "Predictability and uncertainty in air traffic flow management," in 5th USA/Europe Air Traffic Management R&D Seminar (ATM-2003), Metrics and Performance Management, Budapest Hungary, 2003., in press.
[39] N. e. a. Xu, "Estimation of Delay Propagation in the National Aviation System Using Bayesian Networks," in 6th USA-Europe ATM Seminar, 2005.
[40] L. Yu-jie and M. Song, "Flight Delay and Delay Propagation Analysis Based on Bayesian Network," in Knowledge Acquisition and Modeling, 2008, Wuhan, 2008.
[41] T. Dean and K. Kanazawa. Probabilistic temporal reasoning. In Proceedings of the 7th national conference on artificial intelligence (AAAI-88), pages 524–529, St Paul, MN, USA, August 1988. MIT Press.
[42] Kim, Sun Yong, Seiya Imoto, and Satoru Miyano. "Inferring gene networks from time series microarray data using dynamic Bayesian networks." Briefings in bioinformatics 4.3 (2003): 228-235.
[43] van Gerven, Marcel AJ, Babs G. Taal, and Peter JF Lucas. "Dynamic Bayesian networks as prognostic models for clinical patient management." Journal of biomedical informatics 41.4 (2008): 515-529.
[44] Langmead, Christopher J. "Generalized queries and Bayesian statistical model checking in dynamic Bayesian networks: Application to personalized medicine." (2009): 201.
[45] FAA, "Aviation System Performance Metrics (ASPM)," 01 2012. [Online]. Available: aspmhelp.faa.gov. [Accessed 1 November 2013].
[46] B. R., R. Hsu, L. Berry and J. Rome, "Preliminary evaluations of flight delay propagation through an airline schedule," in Proceedings of the 2nd USA/Europe air traffic management R&D seminar, Orlando, FL, 1998., in press.
[47] N. Rupp, "Further investigations into the causes of flight delays," Department of Economy, East Carolina University, East Carolina, NC, 2007., in press.
[48] Jensen, Finn V. An introduction to Bayesian networks. Vol. 210. London: UCL press, 1996.
[49] K. B. Laskey, N. Xu and C.-H. Chen, "Propagation of delays in the national airspace system," in Proceedings of the Twenty-Second Conference Annual Conference on Uncertainty in Artificial Intelligence, Arlington, VA, 2006.
[50] Wang, Paul TR, Lisa A. Schaefer, and Leonard A. Wojcik. "Flight connections and their impacts on delay propagation." Digital Avionics Systems Conference, 2003. DASC'03. The 22nd. Vol. 1. IEEE, 2003.
[51] A. C. Eckstein, C. Kurcz and M. O. Silva, "Threaded Track: geospatial data fusion for aircraft flight trajectories," The MITRE Corporation, Mclean,VA, 2012.
[52] A.C. Eckstein, J. Heidrich, “Analysis of aircraft performance data for procedural and operational performance” The MITRE Corporation, Mclean, VA 2010.
[53] K. P. Murphy, "Dynamic Bayesian networks: representation, inference and learning.," (Doctoral dissertation, University of California), Berkeley, 2002.
[54] C. Berzuini, "Representing time in causal probabilistic networks," Uncertainty in artificial intelligence, vol. 5, no. Elsevier Science Publisher B.V, pp. 15-28, 1990., in press.
[55] W. Buntine, "Operations for learning with graphical models," Journal of Artificial Intelligence Research, vol. 2, pp. 159-225, 1994.
[56] A. Saluja, P. K. Sundararajan and O. J. Mengshoel, "Age-layered expectation maximization for parameter learning in Bayesian networks.," in Proceedings of Artificial Intelligence and Statistics (AIStats), La Palma, Canary Islands, 2012.
[57] J. Dean and S. Ghemawat, "MapReduce: simplified data processing on large clusters," Communications of the ACM, vol. 51, pp. 107-113, 2008.
[58] G. S. Hornby, "ALPS: The age-layered population structure for reducing the problem of premature convergence," in 8th annual conference on Genetic and evolutionary computation, 2006.
[59] E. B. Reed and O. J. Mengshoel, "Scaling Bayesian Network Parameter Learning with Expectation Maximization using MapReduce," in Proc. of Big Learning: Algorithms, Systems and Tools., 2012.
[60] M. Zaharia, M. Chowdhury, M. Franklin, S. Shenker and I. Stoica, "Spark: cluster computing with working sets," in HotCloud 2010, Boston, MA, 2010., in press.
Appendix A: Data-Fused Algorithms
The algorithms used in this data fusion and analysis are in a variety of software
languages, but are tied together in their ability to be described in a map-reduce
framework on Hadoop and driven by Oozie workflows.
Data Sources
The data-fused algorithms were defined on a data model used to construct an Apache
Oozie workflow for automated processing. External data blocks are used to define data
sources external to the Hadoop Distributed File System or not part of this system. The
majority of the algorithms stem from the Threaded Track as the trajectory source input.
Algorithms
Each algorithm may take inputs from one or more data elements and output one or more
data elements. However, algorithms must interface through physical data objects. This is
because Oozie workflows are built around each algorithm (or set of algorithms) as black
boxes with the input/output elements shown. This allows algorithms to be coded in
MATLAB, Java, Pig, Python, etc. and driven by the Oozie workflow.
Figure A-1: Threaded Track Data Fused Algorithm Oozie Workflow
Data-Fused Algorithms Descriptions
The algorithms described below provide the detail behind Figure A-1. For example, the
description of the first fusion algorithm, Trajectory Fusion, covers the input and output
considerations used in the development process. It is the development of these fusion
algorithms that allowed for the development of the data schema shown in Appendix B,
and ultimately allowed for the research completed in this dissertation.
Trajectory Fusion
The trajectory fusion algorithm is the Threaded Track. This algorithm is used to fuse all
available radar data into a single synthetic trajectory with the highest fidelity coverage
throughout the flight envelope.
Input Considerations
Source quality may vary greatly between ASDEX, NOP Tracon, NOP Center, and ETMS
Output Considerations
Accuracy is highly source dependent. Users should check the active sensors field
which provides the contributing sensors at each point.
Points where the active sensor starts with "ETMS" are purely spliced from
ETMS_TZ messages. They have not been smoothed and will not contain several
parameters (climb gradient, accelerations, etc.).
The flight table provides traceability to the source data using the list of
nop/asdex/etms segment ids.
Flight arrival/departure fields are purely based on metadata information (no
trajectory information).
Not every flight will have an ETMS flight ID, but every ETMS flight is contained
within the threaded track.
The first 8 numbers in the flight ID correspond to the <yyyymmdd> date of the
first track point. Flights are partitioned into files based on this date.
All times are given in UTC.
Output points are synthetic and do not correspond to a single filtered radar hit.
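As a minimal sketch of the flight ID convention described above, the snippet below reads the leading `<yyyymmdd>` of a flight ID as a UTC date and groups flights by that date, mirroring the date-based file partitioning. The function names and the example IDs are hypothetical; only the first eight digits follow the documented convention, and the remainder of the ID format is not specified here.

```python
from collections import defaultdict
from datetime import datetime, timezone

def flight_date(flight_id):
    """Parse the leading <yyyymmdd> of a flight ID as a UTC date."""
    stamp = datetime.strptime(flight_id[:8], "%Y%m%d")
    return stamp.replace(tzinfo=timezone.utc).date()

def partition_by_date(flight_ids):
    """Group flight IDs by the date of their first track point."""
    groups = defaultdict(list)
    for fid in flight_ids:
        groups[flight_date(fid)].append(fid)
    return dict(groups)

# Hypothetical IDs: the digits after the date are placeholders.
ids = ["20130415000123", "20130415000456", "20130416000001"]
partitions = partition_by_date(ids)
```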
RUC Fusion
RUC fusion is the process by which weather data provided by the Rapid Update Cycle (RUC) is
fused to every track point in the Threaded Track. The key goal is to provide winds,
temperatures, etc., directly by interpolating variables from the RUC grid. This also
allows for calculating derived parameters such as airspeeds and Mach numbers.
Input Considerations
RUC data from either the isobaric model or hybrid model may be used. These RUC
models provide distinct output data variables as well as different calculations.
RUC data coverage does not include Alaska or Hawaii.
The vertical interpolation is dependent on measurements of reference pressure
below FL180 (not provided in our radar data), which are estimated from
ASOS. If ASOS data is not available, standard reference pressure is assumed.
Threaded track times, positions, and altitudes are used to interpolate values from
the RUC 4F grid.
Threaded track ground speeds are used with the associated RUC variables to
calculate all airspeeds.
Output Considerations
The RUC laterally/vertically extrapolated parameter can be used to identify when
measurements have been extrapolated outside the bounds of the RUC grid.
The ASOS snr field identifies the "signal strength" of the reference pressure
measurements on a 0-1 scale. When this field is "0", a standard reference
pressure is used. When this field is "1" it indicates that it is close to the ASOS
ground sensors.
Indicated airspeed and calibrated airspeed are taken to be equivalent.
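The derived airspeed and Mach number parameters can be sketched as follows. This is an illustrative calculation, not the fusion algorithm itself: it assumes a horizontal wind given as eastward and northward components, ignores vertical speed, and uses the ideal-gas speed of sound for dry air. The function names, units, and example values are assumptions.

```python
import math

GAMMA = 1.4      # ratio of specific heats for dry air
R_AIR = 287.05   # specific gas constant for dry air, J/(kg K)

def true_airspeed(ground_speed, track_deg, wind_u, wind_v):
    """True airspeed (m/s) from ground speed, track angle, and wind components.

    wind_u and wind_v are the eastward and northward wind components in m/s,
    i.e. the direction the air is moving toward.
    """
    track = math.radians(track_deg)
    ground_east = ground_speed * math.sin(track)
    ground_north = ground_speed * math.cos(track)
    # Air-relative velocity = ground velocity - wind velocity.
    return math.hypot(ground_east - wind_u, ground_north - wind_v)

def mach_number(tas, temperature_k):
    """Mach number from true airspeed (m/s) and static air temperature (K)."""
    return tas / math.sqrt(GAMMA * R_AIR * temperature_k)

# Hypothetical track point: eastbound at 250 m/s with a 30 m/s tailwind.
tas = true_airspeed(ground_speed=250.0, track_deg=90.0, wind_u=30.0, wind_v=0.0)
mach = mach_number(tas, temperature_k=220.0)
```

In this example the tailwind is subtracted from the ground speed, so the true airspeed is lower than the ground speed.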
Terrain Fusion
The Terrain Fusion algorithm is the process by which terrain elevation data sources are
used to identify the terrain elevation at each threaded track data point. This process also
runs after the RUC Fusion, which allows a computation of height above terrain by
examining the difference in the RUC derived geometric altitude and the terrain elevation.
Phases of Flight
The Phases of Flight algorithm is the process by which the Threaded Track is segmented
into generic flight envelope phases. One of the key components of this is distinguishing
surface points (with ASDEX) from in-flight points. The second component is to break
in-flight sections into a single sequence of ascent, cruise, and descent.
This algorithm is typically the basis of most post-analysis since algorithms are generally
focused on one or more of these phases.
Input Considerations
Air/Ground phases are identified purely from the threaded track ground speed
profile
Start of Cruise and top of descent are identified purely from the threaded track
pressure altitude profile
Output Considerations
Phases are broken into their basic components: ground-takeoff-ascent-cruise-
descent-landing-ground (GTACDLG)
The sequence of these phases can be very useful for detecting merged and split
flights. When finding idiosyncrasies on other algorithms, it is recommended to
use the phase sequence as a quick check to better characterize the data.
Takeoff and Landing points may not precisely correspond to known physical
runway locations in all instances (this is a statistical process). The runway
locations are specifically excluded to prevent biasing results (by clipping the tails
of the distributions). Also, the altitudes may not have an exact correspondence
to these points, and can be very deceiving (especially for ASDEX data).
The ascent and descent phases are defined in the context of the overall flight
envelope. Each of these phases may contain portions of flight with a positive,
negative, or neutral climb rate.
Top of descent is a fairly subjective measure, but in this case it is intended to
be identified in the ATC context (rather than the pilot context), where all
subsequent vertical maneuvers are directed toward moving the aircraft toward the
approach, and all vertical maneuvers prior to the point are merely for the purposes
of en route separation and spacing.
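The recommended quick check on the phase sequence can be sketched as below. The snippet assumes each track point carries a single-letter phase code and compares the collapsed sequence with the nominal GTACDLG order; the per-point encoding, the function names, and the example tracks are hypothetical.

```python
from itertools import groupby

EXPECTED = "GTACDLG"  # ground-takeoff-ascent-cruise-descent-landing-ground

def phase_sequence(point_phases):
    """Collapse per-point phase codes into the sequence of distinct phases."""
    return "".join(code for code, _ in groupby(point_phases))

def looks_nominal(point_phases):
    """True when the collapsed sequence matches the nominal GTACDLG order."""
    return phase_sequence(point_phases) == EXPECTED

# Hypothetical per-point phase codes.
nominal = "GGGTAAAACCCCCCDDDLGG"
merged = "GGTAACCDDLGGTAACDLG"  # two flights joined into one track
```

A repeated takeoff-to-landing pattern, as in `merged`, suggests two flights merged into one track. A track that starts or ends mid-flight would also fail this check and would need to be examined separately.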
Lateral Taxonomy
The Lateral Taxonomy algorithm is the process by which the Threaded Track is
segmented into groups of lat/lon points which can be described as straight / turn
segments. Points on the ground (identified in Phases of Flight) are assigned a "ground"
type and ETMS points (threaded track source is ETMS) are assigned an "etms" type, since
they are segmented with a Douglas-Peucker algorithm using great circle distances, whereas
the standard segmentation algorithm is a hybrid least squares algorithm using great
circles (straight) and small circles (turns). One of the main uses is that the segmented
version is a simplified representation with less general variance, so any
individual track point with moderately higher deviation won't appear in the segmentation.
Input Considerations
Based on threaded track latitude/longitude measurements
Output Considerations
Ground and ETMS segments are determined purely from a Douglas-Peucker
algorithm using great circles.
Straight, Left, and Right turn segments are determined from a combination
Douglas-Peucker / least squares algorithm designed to optimize the estimate
around turns.
A single turn maneuver may be broken into several turn segments based on
changes in the apparent radius of the turn. This can be very common in turns to
final.
Straight segments will always provide the center point to the right of the segment.
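For readers unfamiliar with the Douglas-Peucker algorithm named above, the following is a minimal planar sketch. It is a simplification: it uses Euclidean distances on x/y coordinates rather than great circle distances, and it does not include the hybrid least squares fitting of turns. The example track and tolerance are hypothetical.

```python
import math

def _point_to_segment_distance(p, a, b):
    """Planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Projection of p onto the segment, clamped to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, tolerance):
    """Simplify a polyline, keeping points that deviate by more than `tolerance`."""
    if len(points) < 3:
        return list(points)
    distances = [
        _point_to_segment_distance(p, points[0], points[-1]) for p in points[1:-1]
    ]
    worst = max(range(len(distances)), key=distances.__getitem__)
    if distances[worst] <= tolerance:
        return [points[0], points[-1]]
    # Split at the farthest point and simplify each half recursively.
    split = worst + 1
    left = douglas_peucker(points[: split + 1], tolerance)
    right = douglas_peucker(points[split:], tolerance)
    return left[:-1] + right

# Hypothetical track: a noisy straight leg followed by a second straight leg.
track = [(0, 0), (1, 0.05), (2, -0.04), (3, 0), (4, 2), (5, 4)]
simplified = douglas_peucker(track, tolerance=0.1)
```

The small deviations on the first leg fall below the tolerance and are dropped, which is the variance-reducing behavior described for the segmented version of the track.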
Vertical Taxonomy
The Vertical Taxonomy algorithm is the process by which the Threaded Track is
segmented vertically into linear segments where the aircraft is ascending / level /
descending. This is the primary source for level-off computation metrics. Segments can
be divided into either constant climb gradient or constant climb rate depending upon the
specific goals of the analyst. Furthermore, the use of pressure altitude versus geometric
altitude should be considered. The algorithm relies on the use of linear least squares
segmentation.
Input Considerations
Based on threaded track pressure altitude trajectory.
Segmentation is based on segments of constant climb gradient (not constant climb
rate).
Output Considerations
A vertical segment is assigned to be "level" ("L") based on a threshold of the
climb gradient as well as the vertical altitude change over the duration of the
segment.
Level segments may still have a small altitude change.
Because of the high levels of quantization (relative to noise), the algorithm
performs better in higher climb gradients than shallow ones. This means that the
error may be highest for near-level flight (as high as mode C quantization), but
will reduce with increased climb/descent gradient.
There is no restriction on minimum/maximum segment length.
Segments are always split when the phase of flight changes (e.g. top of descent,
touchdown, etc.).
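The level/ascending/descending labelling can be sketched as follows. This is an illustration under stated assumptions, not the algorithm that produced the data: the threshold values are hypothetical, and treating a segment as level when either the fitted climb gradient or the total altitude change is small is one possible reading of the rule above.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def classify_vertical_segment(distances_nm, altitudes_ft,
                              gradient_threshold=50.0,   # ft/NM, hypothetical
                              change_threshold=200.0):   # ft, hypothetical
    """Return 'A', 'D', or 'L' for one vertical segment."""
    gradient, _ = fit_line(distances_nm, altitudes_ft)
    altitude_change = gradient * (distances_nm[-1] - distances_nm[0])
    if abs(gradient) < gradient_threshold or abs(altitude_change) < change_threshold:
        return "L"  # level segments may still have a small altitude change
    return "A" if gradient > 0 else "D"


level = classify_vertical_segment([0, 1, 2, 3], [10000, 10010, 9990, 10000])
climb = classify_vertical_segment([0, 1, 2, 3], [1000, 1300, 1600, 1900])
descent = classify_vertical_segment([0, 1, 2, 3], [5000, 4700, 4400, 4100])
```

Fitting against along-track distance gives a climb gradient (feet per NM); fitting against time instead would give a climb rate, which is the alternative segmentation basis mentioned above.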
Speed Taxonomy
The Speed Taxonomy algorithm is the process by which the Threaded Track is
segmented temporally into linear segments where the aircraft is accelerating / constant
speed / decelerating. This process could be applied to ground speed, true airspeed,
indicated airspeed, or Mach number depending on the application. The algorithm relies
on the use of linear least squares segmentation.
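A minimal sketch of linear least squares segmentation applied to a speed profile is given below. The top-down splitting strategy, the residual tolerance, and the acceleration threshold are assumptions chosen for illustration; the source does not specify how the segmentation is actually performed.

```python
def fit_residuals(ts, vs):
    """Least squares line through (ts, vs); returns slope and absolute residuals."""
    n = len(ts)
    mean_t, mean_v = sum(ts) / n, sum(vs) / n
    stt = sum((t - mean_t) ** 2 for t in ts)
    slope = sum((t - mean_t) * (v - mean_v) for t, v in zip(ts, vs)) / stt if stt else 0.0
    intercept = mean_v - slope * mean_t
    return slope, [abs(v - (slope * t + intercept)) for t, v in zip(ts, vs)]


def segment_speed(ts, vs, tolerance, lo=0, hi=None):
    """Split [lo, hi] recursively until every fit is within tolerance.

    Returns a list of (start_index, end_index, slope); neighbours share a point.
    """
    if hi is None:
        hi = len(ts) - 1
    slope, residuals = fit_residuals(ts[lo:hi + 1], vs[lo:hi + 1])
    worst = max(range(len(residuals)), key=residuals.__getitem__)
    if hi - lo < 2 or residuals[worst] <= tolerance:
        return [(lo, hi, slope)]
    split = lo + min(max(worst, 1), hi - lo - 1)  # keep the split point interior
    return (segment_speed(ts, vs, tolerance, lo, split)
            + segment_speed(ts, vs, tolerance, split, hi))


def label(slope, threshold=1.0):
    """Map a fitted slope to accelerating / decelerating / constant."""
    if slope > threshold:
        return "A"
    if slope < -threshold:
        return "D"
    return "C"


times = [0, 1, 2, 3, 4, 5, 6, 7, 8]                      # minutes
speeds = [250, 250, 250, 250, 250, 240, 230, 220, 210]   # knots
segments = segment_speed(times, speeds, tolerance=1.0)
labels = [label(slope) for _, _, slope in segments]
```

The same routine could be run on true airspeed, indicated airspeed, or Mach number by changing the input series and the thresholds.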
Runway Assignment
The Runway Assignment algorithm attempts to assign both arrival and departure runways
and to refine the airport assignment in the Threaded Track flight table.
The algorithm generally considers the trajectory in terms of distance, heading, lateral
deviation, and altitude relative to a given airport/runway. Any available arrival and
departure fields from the flight plan are used to increase the weights toward those
assignments. If there is not enough information to make an association with an airport or
runway, the field may be null.
Input Considerations
The threaded track trajectory (latitude, longitude, altitude, and heading) is used to
compute a scoring function against a particular airport/runway based on
proximity.
The threaded track arrival and departure airports are used (when not null) as an
increased weight in the scoring function to favor these airports. The magnitude of
the increase is source dependent (ETMS, Center, Tracon, etc.).
The phases of flight are used to remove ground points, preventing points on
taxiways and other surface patterns from interfering with the scoring function.
If a merged flight occurs, the runway assignment is based on the longest
in-flight segment identified in the phases of flight.
Output Considerations
If the assigned airport/runway is null, then no reasonable assignment could be
made with any statistical confidence.
If the assigned airport is not null, but the respective airport probability is "-1",
then the assignment was based purely on the threaded track airport assignment,
and the trajectory information was not able to confirm or deny the assignment.
The airport IDs are mapped to the ICAO identifier when possible and use the
FAA identifier when no ICAO identifier exists (this may be different than the
threaded track airport ID for the same facility).
Probabilities are expected to be very source dependent (e.g. an ASDE-X arrival
might have a substantially higher score than an ETMS-only arrival).
Alternate airport/runway assignments provide a second best guess when there are
multiple reasonable scores (e.g. closely spaced parallel runways). The alternate
odds give the ratio of the alternate score to the primary score.
The selected trajectory point for the runway scoring function is also provided in
the output.
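The relationship between the primary assignment, the alternate assignment, and the alternate odds can be sketched as below. The scoring function itself is not described in enough detail to reproduce, so the scores are taken as given; the function name, the minimum score cut-off, and the example runway identifiers are hypothetical.

```python
def assign_runway(scores, min_score=0.0):
    """Pick the best and second-best runway from a {runway: score} mapping.

    Returns (primary, alternate, alternate_odds), where alternate_odds is the
    ratio of the alternate score to the primary score. Any element is None
    when it cannot be determined.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if not ranked or ranked[0][1] <= min_score:
        return None, None, None  # no reasonable assignment could be made
    primary, primary_score = ranked[0]
    if len(ranked) < 2:
        return primary, None, None
    alternate, alternate_score = ranked[1]
    return primary, alternate, alternate_score / primary_score


# Closely spaced parallel runways produce similar scores.
primary, alternate, odds = assign_runway({"27L": 0.8, "27R": 0.6, "09L": 0.1})
```

Odds close to 1 indicate that the alternate is nearly as plausible as the primary assignment, which is the situation described above for closely spaced parallel runways.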
Missed Assignment
The Missed Approach algorithm attempts to identify both go-arounds and missed
approaches for a flight (may be assigned one or multiple). The algorithm is primarily
guided by the altitude profile provided by the vertical taxonomy but will also consider the
lateral position relative to an airport. When an event is detected, the algorithm will
attempt to assign an approach runway associated with the event.
Input Considerations
Based purely on vertical profile and proximity to airport. No voice commands are
used in this estimate.
Output Considerations
Each flight may have none, one, or multiple missed segments assigned.
These segments may occur for missed approaches, go-arounds, test flights,
training flights, etc.
Consider filtering this candidate list to the desired output. Several factors to
consider might be: commercial vs general aviation, number of missed approach
segments, same departure/arrival airport, non-standard phases of flight sequence,
maximum flight altitude, etc.
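One way to apply the suggested filtering is sketched below. The record layout, field names, and threshold values are all hypothetical; they only illustrate how the listed factors might be combined and should be tuned to the analysis at hand.

```python
from dataclasses import dataclass


@dataclass
class FlightSummary:
    """Hypothetical per-flight summary joined from several output tables."""
    flight_id: int
    is_commercial: bool
    missed_segment_count: int
    departure_airport: str
    arrival_airport: str
    max_altitude_ft: float


def is_likely_missed_approach(flight, max_missed_segments=2,
                              min_max_altitude_ft=10000.0):
    """Keep commercial point-to-point flights with a few missed segments."""
    return (flight.is_commercial
            and 1 <= flight.missed_segment_count <= max_missed_segments
            and flight.departure_airport != flight.arrival_airport
            and flight.max_altitude_ft >= min_max_altitude_ft)


candidates = [
    FlightSummary(1, True, 1, "KATL", "KDCA", 35000.0),   # airline go-around
    FlightSummary(2, False, 6, "KHEF", "KHEF", 3000.0),   # training flight
    FlightSummary(3, True, 0, "KORD", "KLGA", 37000.0),   # no missed segment
]
selected = [f.flight_id for f in candidates if is_likely_missed_approach(f)]
```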
Procedure Assignment
The Procedure Assignment algorithm provides a series of metrics in which the
conformance of the Lateral Taxonomy is measured against ground track (fix to fix over
specific ground path) procedure legs from JEPPESEN. Procedures filed or amended in
the flight plan are given special consideration in this measurement of conformance.
Equipage does not affect the assignment, but may be correlated separately.
Input Considerations
The lateral trajectory segments are used to measure conformance against
individual legs. No comparison is made to individual trajectory points.
Routes from the ETMS_RT table are used to provide context in the flight plan.
Only fix-to-fix procedure legs with a single ground path are considered (since
conformance is ill-defined for other leg types).
No voice clearance information is used in these algorithms - assignments are
based purely on flight plans and observed conformance.
Vertical and speed conformance are not required for a procedure to be assigned.
Output Considerations
Leg Segments are computed when there is a minimal likeness between a lateral
track segment and a procedure leg. Not every leg segment is used to compute a
procedure assignment. This allows for users with a more relaxed definition of
conformance to still utilize this table with their own definition.
Procedure segments are computed when specific standards of conformance are
met from the leg segments. These factors include measured deviations from the
leg and whether the leg segment is an overlay of another leg segment on a flight
plan filed procedure.
Only a single procedure can be assigned for the SID, STAR, and approach. Multiple
procedures may occur for the en route procedure, but not concurrently (e.g. en
route overlay).
If no candidate procedure exists, or there is no clear best assignment from
multiple candidates, then no procedure will be assigned.
Procedure assignment has minimal requirements on distance flown, but will also
provide conformance distances for users to filter as they choose.
Holding Assignment
The Holding Assignment algorithm searches aircraft flight paths for loops and evaluates
detected looping patterns against geometric criteria to compute a metric representing the
confidence that the given pattern represents aircraft holding. Military flights, FAA check
flights, and flights that start and end at the same airport are excluded from consideration.
Input Considerations
Lateral Segments are used to determine the path of flight. Extremely short
segments are combined to avoid spurious loop detection (intersection of a path
with itself).
Threaded Track Flights are used to obtain origin and destination airports and to
obtain the call sign, which is used to identify military and FAA check flights.
Output Considerations
Lateral segments that were combined for purposes of computation are restored for
purposes of reporting the set of lateral segments defining a holding segment.
A lateral segment entering or exiting a holding pattern may be truncated, so that
only part of a lateral segment is considered part of the holding segment, if the
lateral segment extends beyond the region defined by the hold's looping pattern.
A holding segment may include lateral segments, or portions of lateral segments,
that are not part of the loop or series of loops used to detect the hold, if the
segments are close to the region enclosed by the loop.
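Loop detection as the intersection of a path with itself can be illustrated with a small sketch. It treats the path as a planar polyline, which is an assumption that is only reasonable over the short distances of a holding pattern; the function names are hypothetical, and the geometric confidence scoring described above is not reproduced.

```python
def orientation(p, q, r):
    """Signed area test: > 0 if p, q, r turn left, < 0 if they turn right."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def segments_cross(a, b, c, d):
    """True if segment a-b and segment c-d properly cross each other."""
    return (orientation(a, b, c) * orientation(a, b, d) < 0
            and orientation(c, d, a) * orientation(c, d, b) < 0)


def find_loops(path):
    """Return index pairs (i, j) where leg i crosses a later, non-adjacent leg j."""
    loops = []
    for i in range(len(path) - 1):
        for j in range(i + 2, len(path) - 1):
            if segments_cross(path[i], path[i + 1], path[j], path[j + 1]):
                loops.append((i, j))
    return loops


# The fourth leg doubles back across the first leg, closing a loop.
looping_path = [(0, 0), (4, 0), (4, 2), (2, 2), (2, -2), (6, -2)]
straight_path = [(0, 0), (1, 0), (2, 1), (3, 1)]
loops = find_loops(looping_path)
```

Adjacent legs always share an endpoint, so the search starts at `i + 2`; very short legs can still create spurious crossings, which is why the text above combines them before detection.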
Fuel Burn
The BADA 3.9 model is used to compute instantaneous fuel flow using the International
Standard Atmosphere, with the total energy model enabled and a mass bleed computation using
the BADA nominal aircraft mass as the initial mass. The following filters are applied:
The aircraft is ascending, descending, or in the cruise phase as specified in the vertical
segment phase of flight field (see Phases of Flight). For all other points no fuel flow is
computed.
The aircraft is supported by BADA. If the aircraft type is not supported, NaNs
are returned for all points that pass the above filter.
The altitude is between 0 and the BADA maximum altitude padded by an additional 20%.
If not, NaNs are returned for these points.
The true airspeed is between the BADA stall speed and Vmos (note a
conversion from calibrated airspeed is performed). If not, NaNs are returned for these
points.
Input Considerations
The following inputs (which are mapped to Threaded Track or other data) are required
for the BADA model.
Aircraft Type – Threaded Flight. If the aircraft is not supported, NaNs are
returned
Climb rate – Vertical Segment
True Airspeed – RUC Track
Acceleration – Threaded Track
Altitude – Threaded Track
Aircraft Mass – Nominal Mass Provided by BADA Model
Output Considerations
Track level fuel burn provides the current instantaneous fuel flow (in kilograms
per minute) for each threaded track point, the accumulated fuel mass (NaN fuel
flow points are linearly interpolated over), the mass of the aircraft (starting from
the initial BADA mass), and three derivative terms that will be used in later versions to
propagate error. The accumulated fuel mass is computed after all
instantaneous fuel flow computations for a particular flight are complete. If fuel
flow values are NaN, the fuel mass computation uses linear interpolation or
nearest neighbor extrapolation to estimate the value. This can result in an entire
aggregate computation (see below) being based on extrapolated data.
Aggregate (segment) level fuel burn provides the total amount of fuel burned (in
kilograms) within specified radius rings around the departure and arrival airports. The
author used the location of the airport assigned in the runway table. A radius ring
centered at the airport origin is used when an assigned airport is provided; otherwise
the first or last valid track point (that is ascending, descending, or in cruise) is used
as the origin. The distance between the origin of the ring and the
first/last track point is reported. Because instantaneous fuel flow values can be NaN
(due, for example, to an unpopulated altitude), the percentage of
interpolation/extrapolation is provided.
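The gap filling and accumulation described above can be sketched as follows. This is not the BADA computation itself: fuel flow values are taken as given, and the trapezoidal integration rule and function names are assumptions. Times are unix milliseconds and fuel flow is in kilograms per minute, as in the track level output.

```python
import math


def fill_fuel_flow(times_ms, flows):
    """Fill NaN fuel flow: linear interpolation inside, nearest value at the ends."""
    valid = [i for i, flow in enumerate(flows) if not math.isnan(flow)]
    if not valid:
        raise ValueError("no valid fuel flow values to interpolate from")
    filled = list(flows)
    first, last = valid[0], valid[-1]
    for i in range(first):                      # nearest neighbor extrapolation
        filled[i] = flows[first]
    for i in range(last + 1, len(flows)):
        filled[i] = flows[last]
    for left, right in zip(valid, valid[1:]):   # linear interpolation
        for i in range(left + 1, right):
            weight = (times_ms[i] - times_ms[left]) / (times_ms[right] - times_ms[left])
            filled[i] = flows[left] + weight * (flows[right] - flows[left])
    return filled


def accumulate_fuel_kg(times_ms, flows_kg_per_min):
    """Cumulative fuel mass at each track point (trapezoidal rule, assumed)."""
    filled = fill_fuel_flow(times_ms, flows_kg_per_min)
    total = [0.0]
    for i in range(1, len(filled)):
        minutes = (times_ms[i] - times_ms[i - 1]) / 60000.0
        total.append(total[-1] + 0.5 * (filled[i] + filled[i - 1]) * minutes)
    return total


nan = float("nan")
times = [0, 60000, 120000, 180000]
flows = [nan, 40.0, nan, 60.0]
filled = fill_fuel_flow(times, flows)
cumulative = accumulate_fuel_kg(times, flows)
```

In this example half of the points are estimated rather than computed, which is the kind of situation the reported interpolation/extrapolation percentages are meant to expose.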
Arrival Throughput
For each flight, arrival throughput is calculated for the 15 minute period ending with the
arrival of the flight, the 15 minute period centered on the arrival time of the flight, and
the 15 minute period beginning with the arrival time of the flight. Both throughput for the
arrival airport and throughput for the runway on which the flight landed are calculated. In
addition, the inter-arrival times between the flight and the one immediately preceding it,
and between the flight and the one immediately following it, are recorded for both the
arrival airport and the arrival runway, and the identities of the preceding and following
aircraft are recorded.
Input Considerations
The following data items are required:
Arrival airport - Flight Runway
Arrival runway - Flight Runway
Wheels-down time - Flight Phase
Data are required not only for the day for which arrival throughput is calculated,
but also for the preceding and following day. Throughput periods for flights near
the end of the day may extend into the preceding or following day; the
immediately preceding or following aircraft may have landed the preceding or
following day; aircraft that land on the same day may have originated on the
preceding day, with the result that their arrival data may be recorded in the
previous day's runway and phase files; and a given aircraft for this day (as
determined by its threaded track ID and its presence in this day's runway and
phases files) may have landed on the following day.
Output Considerations
All throughput and inter-arrival time computations are based on wheels-down
times. Throughputs are reported as the count of aircraft in a 15-minute period.
Aircraft are assigned to a given day's throughput data based on threaded track id,
not arrival time.
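The three throughput windows and the inter-arrival times can be sketched from a sorted list of wheels-down times. The function names are hypothetical, and two details are assumptions because the source does not state them: the window boundaries are treated as inclusive, and the flight itself is counted in each of its own windows.

```python
from bisect import bisect_left, bisect_right

WINDOW_MS = 15 * 60 * 1000  # 15 minutes in milliseconds


def count_between(sorted_times, start, end):
    """Number of arrivals with start <= time <= end."""
    return bisect_right(sorted_times, end) - bisect_left(sorted_times, start)


def throughput(sorted_times, arrival_ms):
    """Arrival counts for the preceding, centered, and trailing 15 minute windows."""
    half = WINDOW_MS // 2
    return {
        "preceding": count_between(sorted_times, arrival_ms - WINDOW_MS, arrival_ms),
        "centered": count_between(sorted_times, arrival_ms - half, arrival_ms + half),
        "trailing": count_between(sorted_times, arrival_ms, arrival_ms + WINDOW_MS),
    }


def inter_arrival_times(sorted_times, arrival_ms):
    """Milliseconds to the immediately preceding and following arrivals (or None)."""
    i = bisect_left(sorted_times, arrival_ms)
    j = bisect_right(sorted_times, arrival_ms)
    preceding = arrival_ms - sorted_times[i - 1] if i > 0 else None
    trailing = sorted_times[j] - arrival_ms if j < len(sorted_times) else None
    return preceding, trailing


# Wheels-down times for one runway, in minutes converted to milliseconds.
arrivals = [m * 60000 for m in (0, 5, 10, 14, 20, 30)]
counts = throughput(arrivals, 14 * 60000)
gaps = inter_arrival_times(arrivals, 14 * 60000)
```

The same functions apply to airport level and runway level throughput; only the list of wheels-down times changes. To cover flights near midnight, the list would also need arrivals from the preceding and following day, as noted above.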
Appendix B: Data Schema
Appendix B describes all the variables tested and analyzed for input into the DBN models
developed. Specifically, this section covers the schema of each variable, which
includes the variable type, format, values, and a general description. This ties in with
Appendix A because the data variable schemas discussed are the resultant output
(yellow-filled boxes in Figure A-1) of the algorithm fusion work.
Type and Format
The type column in each table describes the natural primitive type that the field can be
cast to such that the specified format can be used to convert the field back to the same csv
string without a loss of precision. Specific notes:
Unix times are read as <long>
threaded track ID is represented as a <long>
latitude and longitude require <double>
<Boolean> is represented as "0" and "1" in csv strings
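A short sketch of these conventions follows. The helper names are hypothetical, and reading the first eight digits of the `<yyyymmddxxxxxx>` key format used in the tables of this appendix as a calendar date is an assumption based on the format string alone.

```python
from datetime import date


def parse_boolean(field):
    """Booleans appear as the csv strings "0" and "1"."""
    if field not in ("0", "1"):
        raise ValueError("expected '0' or '1', got %r" % field)
    return field == "1"


def format_boolean(value):
    """Inverse of parse_boolean, so the csv string round-trips unchanged."""
    return "1" if value else "0"


def track_date(threaded_track_id):
    """Extract the assumed yyyymmdd prefix of a 14-digit threaded track ID."""
    digits = str(threaded_track_id)
    if len(digits) != 14:
        raise ValueError("expected 14 digits in <yyyymmddxxxxxx> form")
    return date(int(digits[:4]), int(digits[4:6]), int(digits[6:8]))


flag = parse_boolean("1")
round_trip = format_boolean(parse_boolean("0"))
day = track_date(20140315000042)
```

Python integers are unbounded, so the `<long>` requirement matters mainly when the files are read in a language with fixed-width integers, where a 14-digit ID or a millisecond unix time does not fit in 32 bits.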
Flight Data
Flight data records consist of a single csv record per flight. Each record is unique on the
threaded track id.
Threaded Flight
Column | Name | Type | Format / Units / Value | Description
1 | threadedTrackID | long | <yyyymmddxxxxxx> | primary key to threaded track
2 | dataType | string | "F" | unique data type identifier
3 | dataVersion | string | "1.1","1.1.1","1.1.2" | data version (<schema>..<update>)
4 | aircraftID | char | | aircraft ID / callsign
5 | departureAirport | char | | reported departure airport
6 | arrivalAirport | char | | reported arrival airport
7 | aircraftType | char | | aircraft type identifier
8 | firstMessageTimeUnix | long | | first synthetic track point message time, in unix time
9 | firstMessageTimeString | char | | first synthetic track point message time, in string format
10 | lastMessageTimeUnix | long | | last synthetic track point message time, in unix time
11 | lastMessageTimeString | char | | last synthetic track point message time, in string format
12 | etmsFlightID | long | | ETMS flight ID, null if not linked to ETMS data
13 | etmsDepartureDateUnix | long | | ETMS flight departure date and time, in unix time
14 | etmsDepartureDateString | char | | ETMS flight departure date and time, in string format
15 | etmsArrivalDateUnix | long | | ETMS flight arrival date and time, in unix time
16 | etmsArrivalDateString | char | | ETMS flight arrival date and time, in string format
17 | asdexDepartureFlag | boolean | 0,1 | flag to indicate when flight has ASDE-X data at its departure
18 | asdexArrivalFlag | boolean | 0,1 | flag to indicate when flight has ASDE-X data at its arrival
19 | facilities | char | | List of sensors that contributed to the synthetic track. Sensors are delimited by pipes and labeled by a facility and sensor (or just facility when appropriate) and are given in order of first occurrence.
20 | trackDate | char | yyyymmdd | first message date (no time)
21 | segmentIDs | char | | list of segmented track IDs delimited by pipes
22 | qualityMessage | char | | identifier to flag certain events in the smoothing process
23 | allAircraftID | char | | list of all reported aircraft IDs (delimited by pipes)
24 | allDepartureAirport | char | | list of all reported departure airports (delimited by pipes)
25 | allArrivalAirport | char | | list of all reported arrival airports (delimited by pipes)
26 | allAircraftType | char | | list of all reported aircraft types (delimited by pipes)
27 | modeSCode | char | | mode S code (24-bit address), from ASDE-X data
28 | allModeSCode | char | | list of all reported mode S codes
Phases Flight
Column | Name | Type | Significant Digits | Units / Value | Description
1 | threadedTrackID | long | | <yyyymmddxxxxxx> | primary key to threaded track
2 | dataType | string | | "Q" | unique data type identifier
3 | dataVersion | string | | "3.y.z" | data version (<schema>..<update>)
4 | threadedTrackVersion | string | | | data version of threaded track (pair to column 1)
5 | sequence | string | | | sequence of phases of flight identified by the PhasesTrack ordered by time
6 | throttleUpTime | long | | milliseconds | unix time; estimated start of ground roll on takeoff runway (asdex only)
7 | wheelsUpTime | long | | milliseconds | unix time; estimated rotation point on takeoff runway (asdex only)
8 | startOfCruiseTime | long | | milliseconds | unix time; estimated start of cruise
9 | topOfDescentTime | long | | milliseconds | unix time; estimated top of descent
10 | wheelsDownTime | long | | milliseconds | unix time; estimated runway touchdown time
11 | taxiToGateTime | long | | milliseconds | unix time; estimated end of landing deceleration / runway exit time
12 | multipleTakeoff | boolean | | |
13 | multipleLanding | boolean | | |
14 | multipleInFlight | boolean | | |
Runways Flight
Column | Name | Type | Significant Digits | Units / Value | Description
1 | threadedTrackID | long | | <yyyymmddxxxxxx> | primary key to threaded track
2 | dataType | string | | "A" | unique data type identifier
3 | dataVersion | string | | "3.y.z" | data version (<schema>.<code>.<update>)
4 | threadedTrackVersion | string | | | data version of threaded track (pair to column 1)
5 | departureAirport | string | | |
6 | departureRunway | string | | |
7 | arrivalAirport | string | | |
8 | arrivalRunway | string | | |
9 | departureAirportProbability | float | 2 | |
10 | departureRunwayProbability | float | 2 | |
11 | arrivalAirportProbability | float | 2 | |
12 | arrivalRunwayProbability | float | 2 | |
13 | departureAlternateRunway | string | | |
14 | departureAlternateRunwayOdds | float | 2 | |
15 | departureAlternateAirport | string | | |
16 | departureAlternateAirportOdds | float | 2 | |
17 | departurePointTime | long | | milliseconds | epoch time of track point with best score for assigned departure runway
18 | departureLatitude | double | 6 | | latitude of track point with best score for assigned departure runway
19 | departureLongitude | double | 6 | | longitude of track point with best score for assigned departure runway
20 | departurePressureAltitude | float | 0 | |
21 | departureTrackHeading | float | 2 | |
22 | arrivalAlternateRunway | string | | |
23 | arrivalAlternateRunwayOdds | float | 2 | |
24 | arrivalAlternateAirport | string | | |
25 | arrivalAlternateAirportOdds | float | 2 | |
26 | arrivalPointTime | long | | milliseconds | epoch time of track point with best score for assigned arrival runway
27 | arrivalLatitude | double | 6 | | latitude of track point with best score for assigned arrival runway
28 | arrivalLongitude | double | 6 | | longitude of track point with best score for assigned arrival runway
29 | arrivalPressureAltitude | float | 0 | |
30 | arrivalTrackHeading | float | 2 | |
31 | nfdcDatabaseDate | string | | |
Procedure Flight
Column | Name | Type | Significant Digits | Units / Value | Description
1 | threadedTrackID | long | | <yyyymmddxxxxxx> | primary key to threaded track
2 | dataType | string | | "Y" | unique data type identifier
3 | dataVersion | string | | "3.y.z" | data version (<schema>.<code>.<update>)
4 | threadedTrackVersion | string | | | data version of threaded track (pair to column 1)
5 | sid | string | | |
6 | enRoute | string | | |
7 | star | string | | |
8 | approach | string | | |
9 | sidConformingDistance | float | 3 | |
10 | enRouteConformingDistance | float | 3 | |
11 | starConformingDistance | float | 3 | |
12 | approachConformingDistance | float | 3 | |
13 | sidOverlayDistance | float | 3 | |
14 | enRouteOverlayDistance | float | 3 | |
15 | starOverlayDistance | float | 3 | |
16 | approachOverlayDistance | float | 3 | |
17 | allSid | string | | | pipe delimited list
18 | allEnRoute | string | | | pipe delimited list
19 | allStar | string | | | pipe delimited list
20 | allApproach | string | | | pipe delimited list
21 | flightPlan | string | | | pipe delimited list
Arrival Throughput
Column | Name | Type | Significant Digits | Units / Value | Description
1 | threadedTrackID | long | | <yyyymmddxxxxxx> | primary key to threaded track
2 | dataType | string | | "AT" | unique data type identifier
3 | dataVersion | string | | "1.y.z" | data version (<schema>.<code>.<update>)
4 | threadedTrackVersion | string | | | data version of threaded track (pair to column 1)
5 | aptPrecedingFlt | long | | | threadedTrackID of immediately preceding arrival at airport
6 | rwyPrecedingFlt | long | | | threadedTrackID of immediately preceding arrival on same runway
7 | aptTrailingFlt | long | | | threadedTrackID of immediately following arrival at airport
8 | rwyTrailingFlt | long | | | threadedTrackID of immediately following arrival on same runway
9 | aptPrecedingIAT | long | | milliseconds | time from immediately preceding arrival at airport to this arrival
10 | rwyPrecedingIAT | long | | milliseconds | time from immediately preceding arrival on runway to this arrival
11 | aptTrailingIAT | long | | milliseconds | time from this arrival to immediately following arrival at airport
12 | rwyTrailingIAT | long | | milliseconds | time from this arrival to immediately following arrival on runway
13 | aptPrecedingThroughput | integer | | count per 15 minutes | airport throughput for 15 minutes preceding this arrival
14 | aptCenteredThroughput | integer | | count per 15 minutes | airport throughput for 15 minutes centered on this arrival
15 | aptTrailingThroughput | integer | | count per 15 minutes | airport throughput for 15 minutes following this arrival
16 | rwyPrecedingThroughput | integer | | count per 15 minutes | runway throughput for 15 minutes preceding this arrival
17 | rwyCenteredThroughput | integer | | count per 15 minutes | runway throughput for 15 minutes centered on this arrival
18 | rwyTrailingThroughput | integer | | count per 15 minutes | runway throughput for 15 minutes following this arrival
Segment Data
Segment data records are unique on the threaded track id and segment id. There are a
variable number of records per flight, but typically a fraction of the number of track
records.
Asos Segment
Column | Name | Type | Significant Digits | Units / Value | Description
1 | threadedTrackID | long | | <yyyymmddxxxxxx> | primary key to threaded track
2 | dataType | string | | "C" | unique data type identifier
3 | dataVersion | string | | "3.y.z" | data version (<schema>..<update>)
4 | threadedTrackVersion | string | | | data version of threaded track (pair to column 1)
5 | segmentID | long | | milliseconds | unix time, phases of flight point
6 | segmentType | string | | | phase of flight identifier
7 | weatherTime | long | | milliseconds | time of reported data
8 | ceiling | float | 0 | |
9 | visibility | float | 3 | |
10 | rvr | float | 3 | |
11 | rvrRunway | string | | |
12 | vorrvr | float | 3 | | minimum of visibility and RVR
13 | temperature | float | 1 | |
14 | dewPointTemperature | float | 1 | |
15 | windChillFactor | float | 2 | |
16 | heatIndex | float | 2 | |
17 | tempAndHumidityIndex | float | 2 | |
18 | relativeHumidity | float | 2 | |
19 | windDirection | float | 2 | |
20 | windSpeed | float | 2 | |
21 | barometricPressure | float | 2 | |
22 | significantWeather | string | | |
23 | windGust | float | 1 | |
24 | peakWindDirection | float | 2 | |
25 | peakWindSpeed | float | 1 | |
26 | peakWindHour | float | 0 | |
27 | peakWindMinute | float | 0 | |
Lateral Segment
Column | Name | Type | Significant Digits | Units / Value | Description
1 | threadedTrackID | long | | <yyyymmddxxxxxx> | primary key to threaded track
2 | dataType | string | | "L" | unique data type identifier
3 | dataVersion | string | | "3.y.z" | data version (<schema>.<code>.<update>)
4 | threadedTrackVersion | string | | | data version of threaded track (pair to column 1)
5 | segmentID | int | | |
6 | segmentType | string | | | ground (G), left turn (L), right turn (R), straight (S), en route (E)
7 | phase | string | | | maps to phase of flight
8 | startTime | long | | milliseconds | unix time; does not necessarily map to threaded track record
9 | endTime | long | | milliseconds | unix time; does not necessarily map to threaded track record
10 | startDistance | float | 3 | | along track distance; does not necessarily map to threaded track record
11 | endDistance | float | 3 | | along track distance; does not necessarily map to threaded track record
12 | startLatitude | double | 6 | |
13 | startLongitude | double | 6 | |
14 | startHeading | float | 2 | |
15 | endLatitude | double | 6 | |
16 | endLongitude | double | 6 | |
17 | endHeading | float | 2 | |
18 | centerLatitude | double | 6 | |
19 | centerLongitude | double | 6 | |
20 | turnRadius | float | 3 | |
21 | turnDirection | int | | | left (-1), right or straight (1)
22 | stdResidual | float | 3 | |
23 | maxResidual | float | 3 | |
24 | leftContinuous | string | | | continuous (C), discontinuous (D), endpoint (E)
25 | rightContinuous | string | | | continuous (C), discontinuous (D), endpoint (E)
Vertical Segment
Column | Name | Type | Significant Digits | Units / Value | Description
1 | threadedTrackID | long | | <yyyymmddxxxxxx> | primary key to threaded track
2 | dataType | string | | "V" | unique data type identifier
3 | dataVersion | string | | "3.y.z" | data version (<schema>.<code>.<update>)
4 | threadedTrackVersion | string | | | data version of threaded track (pair to column 1)
5 | segmentID | int | | |
6 | segmentType | string | | | ascending (A), descending (D), level (L)
7 | phase | string | | | maps to phases of flight
8 | startTime | long | | milliseconds | unix time; does not map to threaded track record
9 | endTime | long | | milliseconds | unix time; does not map to threaded track record
10 | startDistance | float | 3 | |
11 | endDistance | float | 3 | |
12 | startPressureAltitude | float | 0 | |
13 | endPressureAltitude | float | 0 | |
14 | pressureClimbGradient | float | 0 | |
15 | geometricClimbGradient | float | 0 | |
16 | minimumClimbRate | float | 0 | ft/min |
17 | averageClimbRate | float | 0 | ft/min |
18 | maximumClimbRate | float | 0 | ft/min |
19 | stdResidual | float | 0 | |
20 | maxResidual | float | 0 | |
21 | leftContinuous | string | | | continuous (C), discontinuous (D), endpoint (E)
22 | rightContinuous | string | | | continuous (C), discontinuous (D), endpoint (E)
Speed Segment
Column | Name | Type | Significant Digits | Units / Value | Description
1 | threadedTrackID | long | | <yyyymmddxxxxxx> | primary key to threaded track
2 | dataType | string | | "S" | unique data type identifier
3 | dataVersion | string | | "3.y.z" | data version (<schema>.<code>.<update>)
4 | threadedTrackVersion | string | | | data version of threaded track (pair to column 1)
5 | segmentID | int | | |
6 | segmentType | string | | | accelerating (A), decelerating (D), constant (C)
7 | phase | string | | | maps to phases of flight
8 | startTime | long | | milliseconds | unix time; does not map to threaded track record
9 | endTime | long | | milliseconds | unix time; does not map to threaded track record
10 | startDistance | float | 3 | |
11 | endDistance | float | 3 | |
12 | startGroundSpeed | float | 1 | |
13 | endGroundSpeed | float | 1 | |
14 | groundAcceleration | float | 0 | |
15 | stdResidual | float | 3 | |
16 | maxResidual | float | 3 | |
17 | leftContinuous | string | | | continuous (C), discontinuous (D), endpoint (E)
18 | rightContinuous | string | | | continuous (C), discontinuous (D), endpoint (E)
Missed Segment
Column | Name | Type | Significant Digits | Units / Value | Description
1 | threadedTrackID | long | | <yyyymmddxxxxxx> | primary key to threaded track
2 | dataType | string | | "M" | unique data type identifier
3 | dataVersion | string | | "3.y.z" | data version (<schema>.<code>.<update>)
4 | threadedTrackVersion | string | | | data version of threaded track (pair to column 1)
5 | segmentID | int | | |
6 | segmentType | string | | |
7 | firstVerticalSegmentID | int | | | maps to vertical segment
8 | lastVerticalSegmentID | int | | | maps to vertical segment
9 | startTime | long | | milliseconds | unix time; maps to threaded track record
10 | endTime | long | | milliseconds | unix time; maps to threaded track record
11 | startDistance | float | 3 | |
12 | endDistance | float | 3 | |
13 | missedApproachHeight | float | 0 | |
14 | clearanceLimit | float | 0 | |
15 | approachAirport | string | | | airport is same as assigned arrival airport in runways flight
16 | approachRunway | string | | | not currently evaluated
17 | nfdcDatabaseDate | string | | | link to nfdc database cycle
Leg Segment
Column | Name | Type | Significant Digits | Units / Value | Description
1 | threadedTrackID | long | | <yyyymmddxxxxxx> | primary key to threaded track
2 | dataType | string | | "W" | unique data type identifier
3 | dataVersion | string | | "3.y.z" | data version (<schema>.<code>.<update>)
4 | threadedTrackVersion | string | | | data version of threaded track (pair to column 1)
5 | segmentID | int | | |
6 | lateralSegmentID | int | | | maps to lateral segment record
7 | procedureName | string | | |
8 | procedureType | string | | |
9 | regionCode | string | | |
10 | airportID | string | | | ICAO airport code
11 | transitionID | string | | |
12 | sequenceNumber | int | | | lookup value for leg fix names
13 | startLegType | string | | | ARINC leg type
14 | endLegType | string | | | ARINC leg type
15 | legLength | float | 3 | | path length from fix to fix along procedure
16 | startTrackDistance | float | 3 | | with respect to track
17 | endTrackDistance | float | 3 | | with respect to track
18 | startLegDistance | float | 3 | | with respect to procedure
19 | endLegDistance | float | 3 | | with respect to procedure
20 | startDeviation | float | 3 | |
21 | endDeviation | float | 3 | |
22 | angularDeviation | float | 2 | |
23 | radiusDeviation | float | 3 | | difference between lateral radius and leg radius
24 | lateralResidual | float | 3 | |
25 | isCandidateProcedure | boolean | | | leg is contained in its procedure candidate list [X data]
26 | isAssignedProcedure | boolean | | | column 25 is true, and its procedure was assigned [Y data]
27 | jeppesenCycle | string | | | 2-digit year followed by integer count
Procedure Segment
Column | Name | Type | Significant Digits | Units / Value | Description
1 | threadedTrackID | long | | <yyyymmddxxxxxx> | primary key to threaded track
2 | dataType | string | | "X" | unique data type identifier
3 | dataVersion | string | | "3.y.z" | data version (<schema>.<code>.<update>)
4 | threadedTrackVersion | string | | | data version of threaded track (pair to column 1)
5 | segmentID | int | | |
6 | legSegmentIDs | string | | |
7 | procedureName | string | | |
8 | procedureType | string | | |
9 | regionCode | string | | |
10 | airportID | string | | |
11 | startTrackDistance | float | 3 | |
12 | endTrackDistance | float | 3 | |
13 | conformingDistance | float | 3 | |
14 | overlayDistance | float | 3 | |
15 | maxDeviation | float | 3 | |
16 | legTransitionSequence | string | | | pipe delimited list of <transition>:<sequence>
17 | flightPlan | float | 0 | milliseconds | time of first use in flight plan
18 | isAssignedProcedure | boolean | | |
19 | jeppesenCycle | string | | | 2-digit year followed by integer count
Fuel Segment
Column | Name | Type | Format / Units / Value | Description
1 | threadedTrackID | long | <yyyymmddxxxxxx> | primary key to threaded track
2 | dataType | string | "G" | unique data type identifier
3 | dataVersion | string | "3.y.z" | data version (<schema>..<update>)
4 | threadedTrackVersion | string | | data version of threaded track (pair to column 1)
5 | segmentID | int | |
6 | radiusRing | float | nautical miles | radius ring used to compute fuel consumed; -1 is used for entire flight
7 | arrivalDepartureFlag | string | | used to identify fuel consumed for arrival or departure; 'A'=arrival, 'D'=departure, 'AD'=entire flight
8 | fuelMass | float | kilograms |
9 | startTime | long | milliseconds | epoch time associated with first point where fuel flow was computed
10 | endTime | long | milliseconds | epoch time associated with last point where fuel flow was computed
11 | interiorInterpolation | float | | percentage (between 0 and 1) of time fuel flow was interpolated to determine mass
12 | leftExtrapolation | float | | percentage (between 0 and 1) of time fuel flow was extrapolated after start time
13 | rightExtrapolation | float | | percentage (between 0 and 1) of time fuel flow was extrapolated before end time
14 | distanceFromStart | float | nautical miles | for departures: distance from first point with fuel flow to departure airport; if no departure airport is assigned, the first track point with phase is the origin of the ring and this value is 0. For arrivals: difference between radius ring and the radius of first track point inside radius ring
15 | distanceFromEnd | float | nautical miles | for departures: difference between radius ring and the radius of last track point inside radius ring. For arrivals: distance from last point with fuel flow to arrival airport; if no arrival airport is assigned, the last track point is the origin of the ring and this value is 0
Holding Segment
Column | Name | Type | Format / Units / Value | Description
1 | threadedTrackID | long | <yyyymmddxxxxxx> | primary key to threaded track
2 | dataType | string | "H" | unique data type identifier
3 | dataVersion | string | "1.y.z" | data version (<schema>..<update>)
4 | threadedTrackVersion | string | | data version of threaded track (pair to column 1)
5 | segmentID | integer | |
6 | startTime | long | milliseconds | time aircraft entered the hold
7 | endTime | long | milliseconds | time aircraft exited the hold
8 | startDist | float | nautical miles | distance along flight track at which the holding segment begins
9 | endDist | float | nautical miles | distance along flight track at which the holding segment ends
10 | startLat | float | degrees | latitude of the beginning of the holding segment
11 | startLon | float | degrees | longitude of the beginning of the holding segment
12 | endLat | float | degrees | latitude of the end of the holding segment
13 | endLon | float | degrees | longitude of the end of the holding segment
14 | confidence | float | 0.00 - 1.00 | higher values represent more confidence that the segment is a hold
15 | intialSeg | integer | | segmentID of first lateral segment included in the holding segment
16 | finalSeg | integer | | segmentID of last lateral segment included in the holding segment
17 | initSegTrunc | boolean | | true if only part of initSeg is included in the holding segment
18 | finalSegTrunc | boolean | | true if only part of finalSeg is included in the holding segment
Track Data
Track data records are unique on the threaded track ID and time. The number of track
records must be equivalent across all data types. If there are any track records for a given
data type that do not have values, track records with null fields are populated.
Threaded Track
Column Name Type Format Units / Value Description
1 threadedTrackID long <yyyymmddxxxxxx> primary key to threaded track
2 dataType string "T" unique data type identifier
3 dataVersion string "1.1","1.1.1","1.1.2" data version (<schema>.<code>.<update>)
4 time long milliseconds unix time, exact match for track point
5 latitude num degrees synthetic position
6 longitude num degrees synthetic position
7 pressureAltitude num feet synthetic position
8 rawModeCAltitude num 100s of feet raw mode C altitude report of most dominant contributing sensor using nearest neighbor temporal interpolation
9 alongTrackDistance num NM derived along track distance (cumulative along track distance normalized to first track point)
10 groundSpeed num knots derived ground speed
11 trackHeading num degrees derived track bearing (true)
12 trackCurvature num 1 / NM derived track curvature (inverse radius of curvature)
13 groundAcceleration num knots / min rate of change of ground speed with respect to time
14 climbGradient num feet / NM derived climb gradient
15 crossTrackSmoothing num NM estimate of cross track RMS error from the source data that was smoothed out in the synthetic trajectory
16 alongTrackSmoothing num NM estimate of along track RMS error from the source data that was smoothed out in the synthetic trajectory
17 verticalTrackSmoothing num feet estimate of vertical track RMS error from the source data that was smoothed out in the synthetic trajectory
18 lateralTrackBias num NM estimate of lateral bias error between the source data and the synthetic track (zero when only one contributing sensor)
19 verticalTrackBias num feet estimate of vertical bias error between the source data and the synthetic track (zero when only one contributing sensor)
20 activeSensors FAC:SEN|... List of sensors that contributed to the current synthetic track point. Sensors are delimited by pipes and labeled by a facility and sensor (or just facility when appropriate) and are given in order of first occurrence.
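The activeSensors format described in column 20 can be unpacked with a few lines of Python. The sketch below is only an illustration: the function name parse_active_sensors is hypothetical, the facility and sensor labels are invented, and the handling of empty fields is an assumption rather than documented behavior.

```python
from typing import List, Optional, Tuple


def parse_active_sensors(value: str) -> List[Tuple[str, Optional[str]]]:
    """Split 'FAC:SEN|FAC|...' into (facility, sensor) pairs, keeping order."""
    sensors: List[Tuple[str, Optional[str]]] = []
    for item in value.split("|"):
        if not item:
            continue  # assumption: tolerate an empty field or trailing pipe
        facility, sep, sensor = item.partition(":")
        # A label without a colon names a facility only.
        sensors.append((facility, sensor if sep else None))
    return sensors


# The facility and sensor labels below are made up for illustration.
parsed = parse_active_sensors("AAA:S01|BBB")
```

The list preserves the order of first occurrence given in the field, which matters if the ordering is later used to reason about which sensor began contributing first.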
Phases Track
Column Name Type Format Units / Value Description
1 threadedTrackID long <yyyymmddxxxxxx> primary key to threaded track
2 dataType string "P" unique data type identifier
3 dataVersion string "3.y.z" data version (<schema>.<code>.<update>)
4 threadedTrackVersion string data version of threaded track (pair to column 1)
5 time long milliseconds unix time, exact match for track point
6 phase string ground (G), takeoff roll (T), ascent (A), cruise (C), descent (D), landing roll (L)
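To show how the phase codes might be used, the following minimal Python sketch totals the time spent in each phase from a time-ordered list of (time, phase) pairs. The names PHASE_NAMES and time_in_phase are hypothetical, the sample records are invented, and attributing each interval to the phase of its earlier record is an assumption made for this illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Phase codes as listed in column 6 of the Phases Track table.
PHASE_NAMES = {
    "G": "ground",
    "T": "takeoff roll",
    "A": "ascent",
    "C": "cruise",
    "D": "descent",
    "L": "landing roll",
}


def time_in_phase(records: List[Tuple[int, str]]) -> Dict[str, float]:
    """Total seconds per phase from (time in ms, phase code) pairs.

    Records must be sorted by time. Each interval between consecutive
    records is attributed to the phase of the earlier record (assumption).
    """
    totals: Dict[str, float] = defaultdict(float)
    for (t0, phase), (t1, _next_phase) in zip(records, records[1:]):
        totals[PHASE_NAMES[phase]] += (t1 - t0) / 1000.0
    return dict(totals)


# Illustrative values only; they are not taken from the data set.
sample = [(0, "G"), (60_000, "T"), (90_000, "A"), (690_000, "C")]
seconds = time_in_phase(sample)
```

The final record contributes no duration of its own because no later track point closes its interval.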
Terrain Track
Column Name Type Format Units / Value Description
1 threadedTrackID long <yyyymmddxxxxxx> primary key to threaded track
2 dataType string "Z" unique data type identifier
3 dataVersion string "3.y.z" data version (<schema>.<code>.<update>)
4 threadedTrackVersion string data version of threaded track (pair to column 1)
5 time long milliseconds unix time, exact match for track point
6 terrainElevation float %0.0f
7 heightAboveTerrain float %0.0f
8 terrainSource string
Rapid Update Cycle (RUC) Track
Column Name Type Format Units / Value Description
1 threadedTrackID long <yyyymmddxxxxxx> primary key to threaded track
2 dataType string "R" unique data type identifier
3 dataVersion string "3.y.z" data version (<schema>.<code>.<update>)
4 threadedTrackVersion string data version of threaded track (pair to column 1)
5 time long milliseconds unix time, exact match for track point
6 trueAirspeed float 1 knots
7 indicatedAirspeed float 1
8 machNumber float 4
9 geometricAltitude float 0 feet
10 windMagnitude float 1
11 windDirection float 2 relative to true north
12 staticTemperature float 1
13 verticalVelocityPressure float 2
14 humidityMixingRatio float 2 relative for isobaric model, absolute for hybrid model
15 cloudMixingRatio float 2 boolean for isobaric model, float for hybrid model
16 rainMixingRatio float 2
17 snowMixingRatio float 2
18 iceMixingRatio float 2
19 turbulentKineticEnergy float 2
20 asosSNR float 2
21 rucInterpolationTime int absolute time difference from threaded track record to closest RUC data point
22 rucLaterallyExtrapolated boolean threaded track record is outside the RUC lateral grid
23 rucVerticallyExtrapolated boolean threaded track record is outside the RUC vertical grid
24 rucFile1 string RUC data hour for left side interpolant. <m> is the RUC model and can take values of "I" (isobaric) and "H" (hybrid); <f> is the forecast hour.
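The quality columns 21 through 23 lend themselves to a simple screening step before the weather values are used. The Python sketch below is a hedged illustration only: the function name usable_ruc_records and the max_interp_time threshold are hypothetical, the sample records are invented, and because the table does not state the unit of rucInterpolationTime, the threshold is assumed to be expressed in that same unspecified unit.

```python
from typing import Dict, List


def usable_ruc_records(records: List[Dict], max_interp_time: int) -> List[Dict]:
    """Keep records inside the RUC grid and close in time to a RUC data point."""
    return [
        r for r in records
        if not r["rucLaterallyExtrapolated"]
        and not r["rucVerticallyExtrapolated"]
        and r["rucInterpolationTime"] <= max_interp_time
    ]


# Illustrative values only; they are not taken from the data set.
sample_records = [
    {"time": 1000, "rucInterpolationTime": 10,
     "rucLaterallyExtrapolated": False, "rucVerticallyExtrapolated": False},
    {"time": 2000, "rucInterpolationTime": 10,
     "rucLaterallyExtrapolated": True, "rucVerticallyExtrapolated": False},
    {"time": 3000, "rucInterpolationTime": 90,
     "rucLaterallyExtrapolated": False, "rucVerticallyExtrapolated": False},
]
kept = usable_ruc_records(sample_records, max_interp_time=30)
```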
Fuel Track
Column Name Type Format Units / Value Description
1 threadedTrackID long <yyyymmddxxxxxx> primary key to threaded track
2 dataType string "B" unique data type identifier
3 dataVersion string "3.y.z" data version (<schema>.<code>.<update>)
4 threadedTrackVersion string data version of threaded track (pair to column 1)
5 time long milliseconds unix time, exact match for track point
6 instantaneousFuelFlow float kg/min
7 accumulatedFuelMass float kg total mass of fuel burned up to this point
8 aircraftMass float kg total mass of aircraft starting from BADA nominal
9 errorTerm1 float
10 errorTerm2 float
11 errorTerm3 float
12 BADASoftwareVersion string