
A DATA-DRIVEN SUPPORT SYSTEM FOR AIRCRAFT TRAJECTORY PREDICTION IN THE NATIONAL AIRSPACE SYSTEM

By Travis John Gonzalez

B.S. in Aerospace Engineering, December 2008,

Embry-Riddle Aeronautical University

M.S. in Systems Engineering, May 2011, George Washington University

Dissertation submitted to

The Faculty of

The School of Engineering and Applied

Science of the George Washington University

In partial fulfillment of the

requirements for the degree of Doctor

of Philosophy

May 17, 2015

Dissertation directed by

Timothy Eveleigh

Professor of Engineering Management and Systems Engineering


The School of Engineering and Applied Science of The George Washington University

certifies that Travis John Gonzalez has passed the Final Examination for the Degree of

Doctor of Philosophy as of November 12, 2014. This is the final and approved form of

the dissertation.

A DATA-DRIVEN SUPPORT SYSTEM

FOR AIRCRAFT TRAJECTORY PREDICTION IN THE

NATIONAL AIRSPACE SYSTEM

Travis John Gonzalez

Dissertation Research Committee:

Timothy Eveleigh, Professor of Engineering Management and Systems Engineering,

Dissertation Director

Thomas Holzer, Professor of Engineering Management and Systems Engineering,

Committee Member

Thomas Mazzuchi, Professor of Engineering Management and Systems Engineering and

Decision Sciences, Committee Member

Edward Murphree, Professor of Engineering Management and Systems Engineering,

Committee Member

Shahram Sarkani, Professor of Engineering Management and Systems Engineering,

Committee Member


Dedication

To my mother, father, and grandparents

For without their enduring support and investment in me,

This accomplishment would not be possible


Acknowledgements

I would first like to thank my dissertation advisors, Timothy Eveleigh, D.Sc., Thomas Holzer, D.Sc., and Shahryar Sarkani, Ph.D. Throughout the dissertation process, they have given me the necessary input and aided in properly managing research that stretches across multiple technical domains. At times, when I was at a crossroads with research direction, they worked with me and redirected me towards a positive trajectory for success.

I would also like to thank the MITRE Corporation for providing the data and distributed computing platform; without these resources, this research would not be possible. Thanks are also due to my colleagues and friends in the aviation industry (e.g., pilots and air traffic controllers) who provided valuable operational expertise for model development. I am also grateful for the support and friendship of the entire GWU Ph.D. cohort that worked alongside me the past few years.

Last but certainly not least, I would like to thank my fiancée, Melissa, for supporting me throughout the entire process. Her help with structuring my time management while I worked full-time and pursued this doctoral degree is one of the reasons I have made it to the finish line!


Abstract of Dissertation

A DATA-DRIVEN SUPPORT SYSTEM FOR AIRCRAFT TRAJECTORY

PREDICTION IN THE NATIONAL AIRSPACE SYSTEM

Although a recent audit report from the U.S. Department of Transportation shows declining flight delays over the last decade, scheduled U.S. passenger airlines still accrued 92 million system delay minutes that were estimated to result in $7.2 billion in direct aircraft operating costs in 2012. To address these flight delays, the Federal Aviation Administration (FAA) is implementing the Next Generation Air Transportation System (NextGen), which aims to transform air traffic operations to meet future growth. A core component of NextGen is Trajectory Based Operations (TBO), with goals that include improving throughput, flight efficiency, flight times, and schedule predictability through better prediction and coordination of aircraft trajectories in the National Airspace System (NAS). In this research, a novel approach is presented by constructing a Dynamic Bayesian Network (DBN) to accurately quantify delay uncertainty for airport origin-destination (OD) pairs. Since the size of the conditional probability tables (CPTs) grows exponentially as the number of variables in the DBN increases, parameter learning was developed within the Hadoop MapReduce distributed computing framework. Hadoop mitigates these scaling concerns, which significantly reduces the computational time necessary for air traffic decision support. Experiments are performed using a fused historical aircraft radar dataset that improves on current data limitations to dynamically predict the probability of a delay and its causal factor(s) for the strategic prediction horizon. The predictive performance of the model is evaluated by focusing on major OD pairs in the NAS, and the results show flight delay time was predicted accurately approximately 92% of the time for the two-hour prediction horizon. Furthermore, the results from the delay model are integrated into a developed real-time trajectory predictor that recommends which route an aircraft should fly given both historical and real-time flight delay information combined with data related to the aircraft and the external environment. This research is the first known attempt that combines elements of systems engineering (SE), operations research (OR), and distributed computing concepts to derive a data-driven decision support system for air traffic decision makers under operational uncertainty.


Table of Contents

Dedication
Acknowledgements
Abstract of Dissertation
Table of Contents
List of Figures
List of Tables
List of Acronyms
Terms and Definitions
Chapter 1: Introduction
1.1 Overview
1.2 NextGen Explained
1.3 Trajectory Predictor Technology
1.4 Statement of the Problem
1.5 Research Importance and Objectives
1.6 Research Scope
1.7 Dissertation Organization
Chapter 2: Literature Review
2.1 Overview
2.2 Trajectory Based Operations Research
2.3 Flight Delay Research
2.4 Literature Summary
Chapter 3: Development of Flight Delay Model
3.1 Overview
3.2 Background Knowledge
3.2.1 Actor Interactions in the National Airspace System
3.2.2 Delay Prediction Horizons and Classification Thresholds
3.2.3 Dynamic Bayesian Networks
3.3 Problem Domain
3.4 Data Processing
3.4.1 Data Segmentation
3.4.2 Segment Metadata
3.4.3 Data Fusion Process
3.4.4 Track Smoothing and Filtering
3.4.5 Flight Metadata
3.4.6 Data Quality
3.5 DBN Formalism Extensions
3.6 DBN Structure Derivation
3.7 DBN Parameter Learning
3.7.1 MapReduce for Massive Scale Distributed Computations
3.7.2 ALEM on MapReduce (ALEMMR)
Chapter 4: Empirical Experiments
4.1 Experimental Design
4.2 Empirical Experiments Overview
4.2.1 Experiment 1: ALEMMR Flight Delay Application
4.2.2 Experiment 2: Varying the Measurement Rate
4.2.3 Experiment 3: Causal Delay Prediction Results
4.2.3 Experiment 4: Trajectory Route Selection Decision Support System
4.2.4 Validation & Insight
Chapter 5: Conclusions & Future Work
5.1 Conclusions
5.2 Recommendations for Future Work
5.2.1 Air Traffic DBN Application
Appendix A: Data-Fused Algorithms
Appendix B: Data Schema


List of Figures

Figure 1-1: NextGen 2025 Flight Profile [1]
Figure 1-2: Trajectory Predictor Technology- Process Flow [3]
Figure 1-3: Trajectory Predictor Technology- Data Flow [3]
Figure 1-4: Research Environment Complexity
Figure 1-5: Predictability challenges for airport and en-route delay factors.
Figure 2-1: Benefits of DBN for Decision Support
Figure 3-1: Background and overview of steps for modeling delay with a DBN
Figure 3-2: An abstract representation of decision making
Figure 3-3: Abstract example of DBN that represents the influences between NAS …
Figure 3-4: Problem domain system hierarchy
Figure 3-5: Threaded Track gate-to-gate flight data sources used for each phase
Figure 3-6: Data Processing Steps.
Figure 3-7: Data Fusion Workflow
Figure 3-8: The five components of the DBN extended formalism
Figure 3-9: Extended formalism for a second-order DBN.
Figure 3-10: ALEM application with populations of DBNs on MapReduce.
Figure 4-1: Track dispersion using a subset of threaded track historical radar data.
Figure 4-2: Varying the size of training samples for learning the DBN.
Figure 4-3: Data-Driven Decision Support Architecture
Figure 4-4: NAS route selection based on delay prediction time
Figure A-1: Threaded Track Data Fused Algorithm Oozie Workflow.


List of Tables

Table 3-1: DBN Fixed and Temporal Categories of Input Variables
Table 4-1: The Classification Threshold for Flight Delay Using a Confusion Matrix
Table 4-2: Dynamic Bayesian Network & Static Bayesian Network Results


List of Acronyms

ATC Air Traffic Control

ATDM Air Traffic Decision-Maker

BN Bayesian Network

DBN Dynamic Bayesian Network

FAA Federal Aviation Administration

FC Flight Crew

FP Flight Plan

HMM Hidden Markov Model

MC Markov Chain

MR MapReduce

NAS National Airspace System

NextGen Next Generation Air Transportation System

OR Operations Research

PBM Process-Based Management Charts

SE Systems Engineering

TBO Trajectory-Based Operations

TFM Traffic Flow Management/Manager

TFMI Traffic Flow Management Initiative


Terms and Definitions

The following are key terms and definitions that are used throughout the study:

Component – A clearly identified part of the product being designed or produced; a component is composed of multiple parts.

Element – An integrated set of components that comprise a defined part of a subsystem.

Flight Plan – A subset of the flight object information used for flight planning prior to departure that carries basic information about the flight and route to be followed.

Part – The lowest level of separately identifiable items within a system. Parts are not normally subject to disassembly without destruction or impairment of designed use.

Program – Projects of all sizes and complexity, ranging from a system to its individual parts.

System – An integrated set of constituent parts that are combined in an operational or support environment to accomplish a defined objective. These parts include people, hardware, software, firmware, information, procedures, facilities, services, and other support facets.

Subsystem – A system in and of itself (reference the system definition) contained within a higher level system. The functionality of a subsystem contributes to the overall functionality of the higher level system. The scope of a subsystem’s functionality is less than the scope of functionality contained in the higher level system.

Systems Engineering (SE) – A discipline that concentrates on the design and application of the whole (system) as distinct from the parts. It involves looking at a problem in its entirety, taking into account all the facets and all the variables and relating the social to the technical aspects.

Traffic Flow Management Initiative (TFMI) – Techniques used to balance demand against capacity in the NAS.

Trajectory-Based Operations (TBO) – The NextGen portfolio of research that focuses on improving throughput, flight efficiency, flight times, and schedule predictability through better prediction and coordination of aircraft 4-dimensional trajectories (4DT), which consider the lateral, longitudinal, vertical, and time dimensions.


Chapter 1: Introduction

1.1 Overview

Trajectory Based Operations (TBO) is the NextGen concept of improving throughput, flight efficiency, flight times, and schedule predictability through better prediction and coordination of aircraft 4-dimensional trajectories (4DT), which consider the lateral, longitudinal, vertical, and time dimensions [1]. TBO uses the 4DT to both strategically manage and tactically control surface and airborne operations. Implementing TBO effectively requires understanding the interactions and trade-offs between proposed TBO decisions and sources of uncertainty. For TBO, and for the regional and local NAS air traffic controllers it would serve, system impacts and relationships have proved difficult for analysts and decision-makers to visualize. The mathematics and concepts of stochastic optimal control are suited to detailed analyses, but they are poorly suited to providing accessible intuition and explanations that identify TBO characteristics and trade-offs. Currently, no analytical framework exists for an integrated understanding and measurement of TBO uncertainty for either the strategic (2-15 hours) or tactical (less than 2 hours) prediction horizon. Addressing this gap is the high-level objective of this study.

A strategic management decision in TBO is to predict the delay time of aircraft flying from an origin airport to a destination airport (a city pair) under operational and environmental uncertainty. This study achieves this task by developing a dynamic Bayesian network (DBN) model that infers delay time and the delay causal variables that impact flight time, based on a fused set of historical radar track data measurements for given city pairs. Furthermore, this study prioritizes aircraft routes in order to aid air traffic decision makers (ATDMs) in recommending the route that minimizes delay based on historical and real-time data. This is the first step towards an application of a data-driven DBN in a dynamic system that can help govern ATDMs’ implementation of traffic management initiatives, air traffic directives, and policies that are currently based on subjective measures. The end state of this ongoing research provides a means of decision support in the presence of uncertainty for air traffic operational decisions, scaling from a local focus (one airport) to a NAS system-wide focus.
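As a rough illustration of how a DBN can revise a delay belief as new measurements arrive, the sketch below runs forward filtering on the simplest possible DBN: one hidden binary delay state and one observed indicator per time slice. The state names, the observation labels, and every probability are made-up assumptions for illustration; they are not the network structure or the parameters learned in this study.

```python
from typing import Dict, List

STATES = ("on_time", "delayed")

# All probabilities below are hypothetical values chosen for illustration only.
prior: Dict[str, float] = {"on_time": 0.8, "delayed": 0.2}
transition = {  # P(state at slice t | state at slice t-1)
    "on_time": {"on_time": 0.9, "delayed": 0.1},
    "delayed": {"on_time": 0.3, "delayed": 0.7},
}
emission = {  # P(observed indicator | state at slice t)
    "on_time": {"nominal": 0.8, "disrupted": 0.2},
    "delayed": {"nominal": 0.3, "disrupted": 0.7},
}


def forward_filter(observations: List[str]) -> List[Dict[str, float]]:
    """Return the filtered belief over STATES after each observation."""
    belief = dict(prior)
    history = []
    for obs in observations:
        # Predict: push the current belief through the transition model.
        predicted = {
            s: sum(belief[p] * transition[p][s] for p in STATES) for s in STATES
        }
        # Correct: weight by the observation likelihood, then normalize.
        unnormalized = {s: predicted[s] * emission[s][obs] for s in STATES}
        total = sum(unnormalized.values())
        belief = {s: unnormalized[s] / total for s in STATES}
        history.append(belief)
    return history


beliefs = forward_filter(["disrupted", "disrupted"])
for step, belief in enumerate(beliefs, start=1):
    print(step, round(belief["delayed"], 3))
```

With these assumed numbers, two consecutive "disrupted" observations raise the delay probability at each step, which is the qualitative behavior a delay model of this kind is meant to capture.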

1.2 NextGen Explained

The vision of NextGen is to build on near- and mid-term (through 2018) systems developed by the FAA and other government partners to improve the performance, prediction, and capacity of the National Airspace System (NAS) as necessary to meet 2025 requirements [1]. More specifically, NextGen will allow aircraft to safely fly in closer proximity on more direct routes, reducing delays and providing benefits for the environment through reductions in carbon emissions, fuel consumption, and noise. Implementation of NextGen will be accomplished through a series of Operations Improvement (OI) Increments that provide individual benefits and combine to provide a paradigm change in the way the NAS operates. The term “OI Increments” is often used interchangeably with “capabilities.” Related OI Increments are managed in seven implementation portfolios [2]. The FAA portfolios include:


1. Trajectory Based Operations (TBO)

2. High Density Airports (HD)

3. Flexible Terminals and Airports (FLEX)

4. Collaborative Air Traffic Management (CATM)

5. Reduce Weather Impact (RWI)

6. Safety, Security and Environment (SSE)

7. Transform Facilities (FAC)

The NAS Enterprise Architecture establishes the foundation from which the evolution of the NAS can be explicitly understood and modeled. It provides a framework for managing change in the NAS through a unifying approach and common language. OIs represent distinct functional improvements to the NAS that provide direct benefits to the user community. Figure 1-1 illustrates how the NextGen concept can create improved capabilities for each flight phase in a typical flight profile.

Figure 1-1: NextGen 2025 Flight Profile [1]


Research activities on NextGen technology development, integration, implementation, and safety must be accomplished to achieve the benefits mentioned above. The interdependencies that exist between the TBO implementation portfolio and flight delay prediction warrant analysis not only at the local level but also at the system level, which prior research fails to achieve efficiently from a computational and accuracy perspective [2]. Therefore, the model should be able to accurately predict flight delays and causal variables and also prioritize aircraft routes to and from airports, in order to aid air traffic decision makers in recommending the route that minimizes delay based on historical and real-time data.

1.3 Trajectory Predictor Technology

The FAA [3] describes Trajectory Predictor Technology (TPT) as technology that computes the predicted path an aircraft will follow through airspace. An aircraft trajectory can be described mathematically by a time-ordered set of aircraft state vectors. This computation is performed on input data comprising the current state and future intent of the aircraft. The TPT uses models for aircraft performance, meteorological conditions, and airspace adaptation data to perform this computation [3].
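The idea of a trajectory as a time-ordered set of state vectors can be sketched as a small data structure. The field names, units, and sample values below are illustrative assumptions and are not taken from [3].

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AircraftState:
    """One state vector; the fields and units are hypothetical."""
    time_s: float           # seconds since the start of the prediction
    lat_deg: float          # latitude in degrees
    lon_deg: float          # longitude in degrees
    alt_ft: float           # altitude in feet
    ground_speed_kt: float  # ground speed in knots


def make_trajectory(states: List[AircraftState]) -> List[AircraftState]:
    """Return the states ordered by time, rejecting duplicate timestamps."""
    ordered = sorted(states, key=lambda s: s.time_s)
    for earlier, later in zip(ordered, ordered[1:]):
        if later.time_s == earlier.time_s:
            raise ValueError("duplicate timestamp in trajectory")
    return ordered


# Sample points supplied out of order; the values are made up.
trajectory = make_trajectory([
    AircraftState(120.0, 38.95, -77.45, 9000.0, 280.0),
    AircraftState(0.0, 38.85, -77.04, 0.0, 0.0),
    AircraftState(60.0, 38.90, -77.20, 4000.0, 220.0),
])
print([s.time_s for s in trajectory])
```

Keeping the ordering check in one constructor function means any downstream step can assume the states are strictly increasing in time.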

TPT can be incorporated into a client application to support various functions of an air traffic decision support system. These decision support systems will aid in providing data, advisories, and recommended resolutions to the air traffic management (ATM) system. A diagram of the typical process flow within a common TPT structure is described in EuroControl/FAA Action Plan 16 and is shown in Figure 1-2. The TPT client application receives data inputs from adaptation, weather, and aircraft models. The TP application consists of the following four component processes: Preparation, Computation, Update, and Export.

Figure 1-2: Trajectory Predictor Technology- Process Flow [3]

1.3.1 Trajectory Predictor Processes

The preparation process in [3] constructs initial conditions and a Behavior Model that outputs a list of aircraft movements. Specifically, the Behavior Model details how an aircraft will meet trajectory constraints within the user-specified criteria. As described in [3], the following three critical processes within the preparation process aid in the development of a simulated aircraft trajectory:

State Processing: The State Processing generates the Initial Conditions for

trajectory generation.

Flight Intent Processing: Flight Intent processing operates on a Behavior Model or, if the Behavior Model is not defined, creates one from the Initial Conditions and Flight Intent. The Flight Intent processing evaluates the Initial Aircraft State, both laterally and vertically, against the set of constraints defined in the Flight Intent. The output of the Flight Intent processing comprises the Initial Conditions and the complete set of constraints that must be adhered to during trajectory generation.

Behavior Model Generation: The Behavior Model consists of ordered lists of

maneuvers that the aircraft will perform to meet the trajectory constraints. The

Behavior Model is internal to the TP and is built from the Initial Condition and

Flight Intent information.

The computation process calculates the predicted trajectory based on the predefined Behavior Model. The update process monitors the conformance of the computed trajectory by checking it against the trajectory constraints specified in the input Flight Intent. When the trajectory is out of conformance, the update process re-computes the trajectory using the updated Behavior Model and/or Flight Intent data.

Finally, the export process distributes the TP results to client processes. These results include the current predicted trajectory, any error messages associated with the data, and an updated Behavior Model when the trajectory does not match all the predefined constraints.
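The four processes can be sketched as a toy pipeline. This is only a schematic under strong simplifying assumptions: the Behavior Model is reduced to a single constant climb rate, the only constraint is an altitude to reach by a deadline, and all function names and numbers are hypothetical rather than drawn from [3].

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Trajectory = List[Tuple[float, float]]  # (time in s, altitude in ft)


@dataclass
class BehaviorModel:
    climb_rate_fps: float  # toy stand-in for an ordered list of maneuvers


@dataclass
class Constraint:
    time_s: float      # deadline for reaching the altitude
    min_alt_ft: float  # altitude that must be reached by the deadline


def prepare(initial_rate_fps: float) -> BehaviorModel:
    """Preparation: build the initial behavior model."""
    return BehaviorModel(climb_rate_fps=initial_rate_fps)


def compute(model: BehaviorModel, horizon_s: float, step_s: float) -> Trajectory:
    """Computation: integrate a constant climb from 0 ft over the horizon."""
    steps = int(horizon_s // step_s)
    return [(i * step_s, model.climb_rate_fps * i * step_s) for i in range(steps + 1)]


def conforms(trajectory: Trajectory, constraint: Constraint) -> bool:
    """True if the required altitude is reached at or before the deadline."""
    return any(t <= constraint.time_s and alt >= constraint.min_alt_ft
               for t, alt in trajectory)


def update(model, trajectory, constraint, horizon_s, step_s):
    """Update: if out of conformance, revise the model and re-compute."""
    if conforms(trajectory, constraint):
        return model, trajectory, []
    revised = BehaviorModel(climb_rate_fps=constraint.min_alt_ft / constraint.time_s)
    message = f"re-computed: {constraint.min_alt_ft} ft not reached by {constraint.time_s} s"
    return revised, compute(revised, horizon_s, step_s), [message]


def export(model, trajectory, errors) -> Dict[str, object]:
    """Export: bundle the results for client processes."""
    return {"trajectory": trajectory, "behavior_model": model, "errors": errors}


constraint = Constraint(time_s=300.0, min_alt_ft=9000.0)
model = prepare(initial_rate_fps=20.0)
trajectory = compute(model, horizon_s=600.0, step_s=60.0)
result = export(*update(model, trajectory, constraint, horizon_s=600.0, step_s=60.0))
print(result["behavior_model"], result["errors"])
```

In this toy run the initial climb rate misses the constraint, so the update step revises the model and the export bundle carries the re-computed trajectory together with the explanatory message.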


1.3.2 Trajectory Predictor Data Flow

Figure 1-3 depicts a typical data flow for a TP deployment, from the client inputs to the predicted trajectory (client output). Client inputs for a TP include:

Aircraft State: The Initial Aircraft State represents the aircraft state data at the

start of the trajectory computation cycle and is composed of, but not limited to,

the 3D aircraft position and associated time.

Flight Intent: Flight Intent is the element of the Flight Object that contains the

constraints and preferences applicable to the flight. It describes aircraft, airport,

and airspace constraints and operator preferences.

Behavior Model: The Behavior Model contains a list of maneuvers that describes

how the aircraft intends to satisfy the trajectory constraints and user preferences.

Processing Strategies and Configuration Control: The Processing Strategies specify how the predictor will conform to the constraints and preferences identified in the Flight Intent. The Configuration Control defines processing characteristics such as aircraft performance models and the functionality of the integration and export functions.

The research and methods proposed in this dissertation focused on enhancing the flight intent element within the trajectory preparation process, which influences the behavior model and provides the TP with the intended maneuvers that in turn create the predicted trajectory. In addition, this research focused on choosing a method that would provide the functionality to iteratively learn aircraft intent and behavior over time. The specific application for this research prioritizes aircraft routes and predicts both the flight delay time and its causal reasons, in order to aid air traffic decision makers in recommending the route that minimizes delay based on historical and real-time data.
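One simple way to picture route prioritization is to rank candidate routes by the expected value of a predicted delay distribution. The sketch below assumes that a delay model has already produced a probability for each delay bin on each route; the route names, bin midpoints, and probabilities are hypothetical, and expected delay is only one of several ranking criteria that could be used.

```python
from typing import Dict, List, Tuple

# Hypothetical model output: delay bin midpoint (minutes) -> probability.
route_delay_pmf: Dict[str, Dict[float, float]] = {
    "ROUTE_A": {0: 0.6, 15: 0.3, 45: 0.1},
    "ROUTE_B": {0: 0.4, 15: 0.4, 45: 0.2},
    "ROUTE_C": {0: 0.7, 15: 0.1, 45: 0.2},
}


def expected_delay(pmf: Dict[float, float]) -> float:
    """Expected delay in minutes for one route's delay distribution."""
    if abs(sum(pmf.values()) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(delay * prob for delay, prob in pmf.items())


def rank_routes(pmfs: Dict[str, Dict[float, float]]) -> List[Tuple[str, float]]:
    """Return (route, expected delay) pairs, lowest expected delay first."""
    scored = [(route, expected_delay(pmf)) for route, pmf in pmfs.items()]
    return sorted(scored, key=lambda pair: pair[1])


ranked = rank_routes(route_delay_pmf)
for route, minutes in ranked:
    print(route, round(minutes, 1))
```

Ranking on the mean ignores the spread of each distribution; a decision maker who is averse to long delays might instead rank on the probability of exceeding a threshold.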

Figure 1-3: Trajectory Predictor Technology- Data Flow [3]

1.4 Statement of the Problem

A challenge for flight delay prediction is the difficulty of transitioning research

concepts into systems and operations. One important aspect of this challenge ties to the


range of operational variations for which we develop our concepts and systems. Early

research concepts are conceived with too few of the real-world variations taken into

account mostly due to either limitations of computational power or operational

knowledge [1]. In a program needing to make system trade-offs for development,

promised benefits must reflect a broader range of routine and reasonable behaviors than

in research – but such trade-offs can be quite difficult to quantify, and it is difficult to

reflect the full range of operational events.

In contrast, the operational world embodies everything that the real world throws at

us. This is where complexity and unpredictability conspire to demonstrate how poorly

our concepts and systems and procedures can fare when confronted with things we didn’t

expect in research or in development. The operations world is not just reasonable and

routine: it is the entire gamut of everything that happens whether we’ve anticipated it or

not.

A big distinction between these worlds and one that often impacts modernization and

transition to concepts like flight delay prediction is how predictable the world is that our

concepts or systems or operations have to deal with. Limits to predictability and

challenges to transition are both addressed if we focus more closely on uncertainties in

operations by developing a framework that enhances understanding of the impacts of

uncertainty and the quantitative relationships between uncertainty factors. Figure 1-4

depicts a graphical representation detailing the challenge researchers typically endure

when attempting to model a traditionally stochastic environment.


Figure 1-4: Research Environment Complexity

A second challenge involves obtaining better-quality data from historical sources about the aircraft and its environment, and using that information to improve ATDMs’ predictions at a more granular level, recommending which route an aircraft should fly given both historical and real-time flight delay information. Researchers can factor in the type of aircraft and the lateral path and make reasonably good predictions; however, many events that may occur during a flight are not very predictable, and data quality issues arise along the way. These represent some of the challenges.


Figure 1-5: Predictability challenges for airport and en-route delay factors. Not all delay factors are included.

As shown in [4], there are a number of causal delay factors that interfere with flight predictability. Some of these can be addressed through better standards or shared planning, and others can be predicted to some degree and compensated for. Others, though, are simply unknowable until they occur. Examples include a Flight Management System (FMS) issue that requires the aircraft to fly slower than expected, an unpredicted thunderstorm en route, or a traffic flow management initiative (TFMI) restriction (see Terms and Definitions) that is issued at the last minute due to a temporarily blocked runway at the destination airport. Figure 1-5 depicts some of the more common challenges in flight delay prediction. In truth, a very large number of factors that might arise are not very predictable, and these represent challenges to the development of any air traffic-based model [5].

As a result of these challenges, the research performed in this dissertation combined the best practices in trajectory and flight prediction to create a new data-driven decision support tool. This tool combines more data (both historical and real-time) about the aircraft’s behavior, the aircraft operator’s intent, and the external environment than preceding research, and it provides decision support applications that succeeding researchers can build upon.

1.5 Research Importance and Objectives

The objective of this research was to develop a big data-driven DBN to represent and predict flight delay and the associated causal and temporal nature of delay uncertainty, based on a novel fused historical dataset. This research can be broken down into the following sub-objectives:

1. Develop a Dynamic Bayesian network (DBN) structure for the air traffic domain

that can continuously be developed to answer complex operational questions.

2. Learn DBN parameters from a fused set of aviation data on a big data parallel

computing platform that could not be computationally achieved using

conventional approaches.

3. Determine the optimal prediction horizon and classification threshold (See

Experiment 2: Varying the Measurement Rate) for the flight delay prediction

model.

4. Provide accurate prediction results for both delay and delay causal variables greater than 80% of the time (see Footnote 1).

5. Integrate results of the flight delay model (if successful) into a developed real-time trajectory predictor that recommends which route an aircraft should fly given both historical and real-time flight delay information combined with data related to the aircraft and the external environment (“other data” discussed extensively in Section 3.4).

Footnote 1: The 80% prediction accuracy threshold was taken from a general interpretation of the FAA’s model/simulation standards and aligns with the 95th percentile of accuracy results from related prior art.

1.6 Research Scope

Because air traffic research (specifically flight delay) integrated with big data
technologies (such as Hadoop) is still in its early stages, few models of this nature are
currently being proposed. For that reason, this study does not try to directly
compare the accuracy of the proposed model against other existing models. Rather, the
scope of this research centers on creating a new approach to scale probabilistic graphical
models (specifically DBNs) to a computational scale that, based on the author’s literature
review, has not previously been attempted.

1.7 Dissertation Organization

This dissertation is organized as follows. Chapter 2 reviews relevant prior research to set
the stage for the research effort and to identify why DBNs were ultimately chosen for this
research. That chapter also takes a granular look at trajectory prediction and flight delay
prediction in independent sub-sections in order to portray them as two different topics (as
current researchers typically do), which this research aimed to bring together. Chapter 3
covers the necessary background knowledge, the development of the big data-driven DBN
methodology, and the stated sub-objectives. Chapter 4 validates the model based on empirical
experiments focused on both prediction accuracy and the intelligibility of the predictions
for flight delays, associated delay causal variables, and route trajectory. Chapter 5
provides conclusions and further research recommendations.

Chapter 2: Literature Review

2.1 Overview

Numerous journal articles have been published on methods for predicting trajectory and
flight delay under uncertainty in the NAS. In this chapter, some of the key studies related

to the author’s research are highlighted. All of the researchers focused on using some

type of mathematical or statistical model in order to predict aircraft trajectory and

environmental factors in a particular phase of flight. Some of the researchers attempted to

gain insight into flight delay prediction using the computed trajectory prediction.

2.2 Trajectory Based Operations Research

2.2.1 Mathematical Models in Trajectory Prediction

As discussed in Section 1.3, there are four components for the trajectory prediction

process. Of the four, this section will focus on the computation subfunction. The

preparation process brings together all the data necessary for the execution of the

trajectory prediction. Further, it is this process that is responsible for the translation of the

intent script (which this research develops) into the mathematical code used to perform

the computations. The update process ensures compliance with the aircraft intent or flight

plan and flags potential loss of spatial/temporal separation (for example) with other

trajectories. It is within the scope of this process to alter the intent script and behavior in

an attempt to regain airspace separation compliance. The export process returns the

resulting trajectory to the ground-based computer hosting the flight object. Because of the
diversity of the modeling equations, different state variables will be exported to update
the flight object. It should be noted that the abstraction dictates that, at a minimum, the
trajectory should comprise four dimensions (lateral, longitudinal, vertical, and time)
and the geodetic coordinates of the aircraft for the duration of the prediction
time frame. It will be seen that only one of the many papers referenced complies with this
requirement. Furthermore, some of the papers do not operate in full three-dimensional
space.

The mathematical models under study fall into one of the following classifications:

Point-Mass Models: The majority of the identified research [6-17] used point-mass flight
estimation models. This choice reflects a tendency toward more realistic modeling of
flight, but point-mass models lack the complexity of the kinetic model in that rotational
moments are ignored. The range of complexity varied greatly within this subset of papers.
Point-mass models signify that aerodynamic equations are in play, with the above notable
exception.

Kinematic Models: In these models [18-20], only position and its time rates of change are
modeled. The model is integrated forward with respect to time (acceleration to velocity,
velocity to position, etc.).

Kinetic Models: One paper [21] in the set included moments and is therefore classified
as a full kinetic model. Although this model represents the greatest complexity in this
subset of documents, it is ranked after point-mass models because of the overwhelming
number of papers that used point-mass models.
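To illustrate the kinematic class described above, the following is a minimal sketch of forward integration in one dimension. It is not taken from any of the cited papers; the function name `integrate_kinematic`, the forward-Euler scheme, and the constant-acceleration assumption are all illustrative choices.

```python
def integrate_kinematic(position, velocity, acceleration, dt, steps):
    """Forward-Euler integration of a 1-D kinematic state.

    Illustrative assumption: acceleration is constant over the horizon.
    Returns a list of (time, position, velocity) samples.
    """
    time = 0.0
    trajectory = [(time, position, velocity)]
    for _ in range(steps):
        # Position is advanced with the current velocity ...
        position += velocity * dt
        # ... and velocity is advanced with the (constant) acceleration.
        velocity += acceleration * dt
        time += dt
        trajectory.append((time, position, velocity))
    return trajectory


if __name__ == "__main__":
    # Constant 100 m/s for ten 1-second steps.
    for sample in integrate_kinematic(0.0, 100.0, 0.0, 1.0, 10)[-2:]:
        print(sample)
```

No forces or moments appear in this sketch, which is what separates a kinematic model from the point-mass and kinetic classes.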


2.2.2 TBO Uncertainty Analysis

Uncertainty in aircraft trajectory prediction has been studied in Federal Aviation

Administration (FAA)/Eurocontrol Action Plan 16 [3], which describes and quantifies

major sources of variation in end-to-end timing including departure timing, wind-field

prediction, flight intent, and flight parameters such as aircraft weight. Gaydos [4]

examined statistical uncertainty at different look-ahead times and found that uncertainty
grew more quickly or more slowly at different points along similar en-route trajectories.
Earlier work by Tino, Ren and Clarke [22] explains some of this spatial variability
as wind behavior, which also creates increasing uncertainty in timing at longer look-ahead
times. Mondoloni and Liang [23] described how variations due to wind observed
along a trajectory can be used to reduce uncertainty and improve predictability and
timing control during the remainder of the trajectory. However, as Rentas, Green, and
Cate have shown [24], the characterization of NextGen TBO uncertainty impacts is far from
mature, and more research into the causal and temporal relationships of trajectory
predictors is warranted.

2.2.3 TBO Summary

To recap, the research identified in this literature review focused first on the

mathematical models used to develop an aircraft trajectory and ensuing applications of

the developed trajectory with regard to uncertainty. In this research, the author has
chosen a point-mass model for the computation subfunction (see Figure 1-3) based on
results from the aforementioned research. Refer to Experiment 4 (Section 4.2.4) for a
more in-depth description of how this comes together with the rest of the research.


2.3 Flight Delay Research

2.3.1 Statistical Methods

Historical approaches to learning and predicting flight delay and its associated causal
factors can be categorized by their use of either linear or nonlinear statistical
methods. The first approach, taken in [25] and [26], uses linear regression to
explain the influence of causal factors of delay. This approach does provide statistical
accuracy; however, it has shortcomings, which include 1) failure to include relevant
operational and environmental factors, 2) incorrect data independence assumptions, and
3) sensitivity to outliers, which together reduce its predictive power.

Vigneau [27] studied both delay and delay propagation from flight segment to

segment using conventional regression techniques. In Vigneau’s model, departure delay

depended on arrival delay from the previous segment, which then depended on the

departure delay from the previous segment. Time dimensions, airport capacity, and
load-based factors were identified as significantly influencing delay. The
model, however, is not applicable in the US because it treats bad weather as an
exception. In Europe, only 1-4% of delay can be attributed to bad weather, whereas in
the United States 70-75% of delay is due to bad weather [28].

2.3.2 Neural Networks

A neural network is typically referred to as a “black box” model that can be used to

predict departure delay from a set of input factors. The parameters of a neural network

model are not easily interpretable, and thus it is difficult to use a neural network model to

gain a comprehensible understanding of how the factors interact to cause delay. Dai and


Liou [29] developed an artificial neural network model to estimate individual flight

departure delay for the application of real time air traffic flow management. The network

incorporated 70 nodes in the hidden layer and was shown to outperform linear and non-

linear regression methods with their chosen dataset. The primary factors influencing

delay in this study were airline, aircraft type, time of day, day of week, route, flight

sequence and traffic flow.

Jehlen et al. [30] developed a neural network model for predicting weather-related
aircraft delays and cancellations at the national, regional, and airport levels. The network
slightly improved on traditional linear regression methods for predicting
airspace metrics such as total aggregate delay, arrival delay, airborne delay, and flight
cancellations at different scales; however, a neural network still lacks the
generalizability needed to understand causal delay interactions for wide-application
stakeholder use.

2.3.3 Hidden Markov Models

A hidden Markov model (HMM) models a first-order Markov process in which the observation
state is a probabilistic function of an underlying stochastic process that produces the
sequence of observations. The underlying stochastic process cannot be observed directly;
it is hidden. Both the hidden and observation states are modeled by discrete random
variables, as shown in Neogi’s work, where he and his colleagues used HMMs to detect mode
changes in aircraft flight data for conflict resolution [31].
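As a hedged illustration of the HMM structure just described, the sketch below computes the likelihood of an observation sequence with the standard forward algorithm. The two hidden flight modes, the two observation symbols, and every probability are made-up values for illustration only; they are not taken from [31].

```python
# Hypothetical two-mode example: hidden states 0 = "cruise", 1 = "descent";
# observations 0 = "level altitude report", 1 = "decreasing altitude report".
INITIAL = [0.5, 0.5]                      # P(state at t = 0)
TRANSITION = [[0.9, 0.1], [0.2, 0.8]]     # P(next state | current state)
EMISSION = [[0.8, 0.2], [0.3, 0.7]]       # P(observation | state)


def forward_likelihood(initial, transition, emission, observations):
    """Return P(observations) under the HMM using the forward recursion."""
    n_states = len(initial)
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = [initial[s] * emission[s][observations[0]] for s in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            emission[j][obs]
            * sum(alpha[i] * transition[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    return sum(alpha)


if __name__ == "__main__":
    print(forward_likelihood(INITIAL, TRANSITION, EMISSION, [0, 0, 1, 1]))
```

The hidden state here is a single discrete variable, which is the limitation that motivates the move to DBNs.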

The HMM formalism first appeared in several statistical papers in the mid-1960s,
but it took over ten years before its utility was recognized. Initially, the use of HMMs
was a great success, especially in the fields of automatic speech recognition (ASR) and
bio-sequence analysis. Because of this success, HMMs remain dominant in ASR today,
despite their lack of consistent performance [32].

One of the main problems of HMMs is that the hidden state is represented
by a single discrete random variable. DBNs are able to break down the state of a complex
system into its constituent variables, taking advantage of the sparseness in the temporal
probability model. This can result in exponentially fewer parameters. The effect is that
using a DBN can lead to lower space requirements for the model, less expensive
inference, and easier learning.
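The parameter savings can be made concrete with a small counting sketch. It assumes, purely for illustration, a system of `n_vars` discrete variables of equal arity, where each DBN variable has at most `max_parents` parents in the previous time slice; the function names are hypothetical.

```python
def hmm_transition_params(n_vars, arity=2):
    """Free transition parameters when the whole state is one flat variable."""
    n_states = arity ** n_vars
    # One distribution over n_states values per previous state.
    return n_states * (n_states - 1)


def dbn_transition_params(n_vars, max_parents, arity=2):
    """Upper bound on free transition parameters for a factored DBN.

    Assumes each variable depends on at most `max_parents` variables
    from the previous slice (an illustrative sparsity assumption).
    """
    return n_vars * (arity ** max_parents) * (arity - 1)


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, hmm_transition_params(n), dbn_transition_params(n, max_parents=3))
```

With ten binary variables and at most three parents each, the flat HMM needs about a million transition parameters while the factored model needs 80. The gap grows exponentially with the number of variables.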

2.3.4 Kalman Filters

A Kalman filter model (KFM) is an HMM with conditional linear Gaussian distributions [33].
It is generally used to estimate state under uncertainty in linear dynamic systems. The
KFM formalism first appeared in papers in the 1960s [34] and was successfully used for the
first time in NASA’s Apollo program. It is still used in a wide range of applications. The
KFM formalism assumes the dynamic system is jointly Gaussian. This means the belief
state must be unimodal, which is inappropriate for many problems. The main advantage
of using a DBN over a KFM is that the DBN can use arbitrary probability distributions
instead of a single multivariate Gaussian distribution.
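A minimal scalar sketch shows why the Kalman belief state is always a single Gaussian: each step carries forward only a mean and a variance. The identity dynamics, the function name `kalman_step`, and the numbers in the usage example are assumptions made for illustration.

```python
def kalman_step(mean, variance, measurement, process_var, measurement_var):
    """One predict/update cycle of a scalar Kalman filter.

    Illustrative assumption: the state follows identity dynamics
    (x_next = x + noise) and is observed directly (z = x + noise).
    """
    # Predict: the mean is unchanged, uncertainty grows by the process noise.
    predicted_mean = mean
    predicted_var = variance + process_var
    # Update: blend prediction and measurement according to the Kalman gain.
    gain = predicted_var / (predicted_var + measurement_var)
    new_mean = predicted_mean + gain * (measurement - predicted_mean)
    new_var = (1.0 - gain) * predicted_var
    return new_mean, new_var


if __name__ == "__main__":
    belief = (0.0, 1.0)  # prior mean and variance
    for z in (10.0, 9.0, 11.0):
        belief = kalman_step(*belief, z, process_var=0.1, measurement_var=1.0)
        print(belief)
```

Because the belief is summarized by two numbers, a bimodal situation (for example, "either on time or badly delayed") cannot be represented, which is the limitation noted above.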

In application, Reference [12] reported on real-data testing of a real-time freeway
traffic state estimator, with a particular focus on its adaptive capabilities. The method
pursued for real-time adaptive estimation of the complete traffic state in freeway
stretches or networks is based on stochastic macroscopic traffic flow modeling and an
extended Kalman filter. Advantages are demonstrated via suitable real-data testing. The
testing results achieved are both acceptable and promising for succeeding applications,
but the author specifically mentions the limited generalizability of Kalman filters,
which DBNs compensate for. Other research efforts [36] and [37]
use Kalman filters to estimate time of arrival based on a trajectory prediction technology.

2.3.5 Bayesian Networks

Bayesian networks have been applied to various scenarios within the air traffic

domain because of their ability to provide approximate models for complex and/or
poorly understood problems. Pepper, Mills, and Wojcik [38] presented a method of

accounting for uncertain weather information at the time of traffic flow management

(TFM) decisions, based on Bayesian decision networks. They found that the data from

past TFM events was not sufficient to distinguish between strategic TFM decisions, in

terms of metrics based on overall delays, cancellations, diversions, and departure

backlogs. However, the results did show that useful information can be extracted from

data on past TFM events by focusing on specific elements of the strategic TFM process

rather than the entire process comprehensively. Based on this research, it was imperative
that both tactical and strategic levels of TFM be considered in the proposed model.

Ning et al. [39] used Bayesian networks to estimate delay with a focus on

investigating and quantifying how flight delays from a single airport propagate to impact

other airports. Specifically, their methodology combined multiple individual-airport

Bayesian network models into a system-level model capable of representing interactions

between airports. Their study demonstrated that integrating human judgment with
statistical analysis in structure construction and parameter estimation can improve
prediction accuracy. To simplify the calculation, their model takes into account only
weather effects and flight cancellations. It does not take into account many other
factors that can affect delay, such as demand, en-route variables, and aircraft type (to
name a few), which are accounted for in this study.

Liu and Ma [40] developed a flight-delay and delay propagation model based on

Bayesian networks. They trained the network with real data using the Expectation

Maximization (EM) algorithm and analyzed the influences from delay under different

states.

2.3.6 Dynamic Bayesian Networks

A BN is useful for problem domains where the state of the world is static. In such a

world, every variable has a single and fixed value. Unfortunately, this assumption of a

static world is not always sufficient. A dynamic Bayesian network (DBN), which is a BN

extended with a time dimension, can be used to model dynamic systems [41]. While no
research on DBNs was identified within the specific scope of this research, the author
chose DBNs due to their successful applications in other fields, specifically in creating
prognostic decision support systems for the medical diagnosis of diseases, as shown by
[42-44]. These studies provide evidence that DBNs have become the
representation of choice because they embody a good tradeoff between expressiveness
and tractability. Figure 2-1 depicts the benefits of DBNs from both a knowledge
representation and a reasoning perspective. Through its structure and its parameters, a
DBN comprehensively describes what is known about a particular domain and aims to
establish the interactions of all the variables contained within that domain. As such, a
DBN can be referred to as a “Portable Knowledge Format” that can succinctly and
compactly communicate the state of the domain as well as its dynamics over time.

Figure 2-1: Benefits of DBN for Decision Support²

² Figure taken from: bayesia.com

2.4 Literature Summary

This literature review examined both trajectory prediction and flight delay research,
covering the prediction of flight delay either in combination with or independent of
trajectory-based operations uncertainty. The DBN formalism in this
research is the first development in temporal reasoning under uncertainty for the defined
scope of this research. The literature has shown that DBNs can have significant
advantages over the aforementioned algorithms. In terms of state-space models, HMMs
and KFMs are limited in their expressive power. In fact, it is not even correct to
call HMMs and KFMs other techniques, because the DBN formalism can be seen as a
generalization of both HMMs and KFMs and can be iteratively updated with the
incorporation of data sources and subject matter experts in the field, as will be described
in succeeding sections.


Chapter 3: Development of Flight Delay Model

3.1 Overview

This chapter describes the steps towards the design and implementation of
an aircraft flight delay model: a DBN for aircraft flight delay prediction and the
associated causal delay factors. A general overview of the background knowledge
required and the methodology for the DBN is shown in Figure 3-1.

Figure 3-1: Background and overview of steps for modeling delay prediction with a

DBN

3.2 Background Knowledge

To understand the advantages of using DBNs as the formal basis for prediction of
flight delay, it is important to first establish a formal definition of flight delay. According
to the FAA, a flight can be considered delayed if the operation takes place at least 15
minutes after scheduled pushback [45]. In this work, the author adopts the definition of
[46] [47] and defines delay as the time difference between actual and scheduled departure
and arrival time.


3.2.1 Actor Interactions in the National Airspace System

To develop a robust delay model, it is imperative to first understand the actors

that interact in the National Airspace System (NAS) and the time horizons in which

decisions are required. Figure 3-2 depicts an abstract view of the model interactions

between an aircraft (flight crew) and two ATDMs (traffic flow management and air

traffic control) in terms of how ATDMs make decisions about flight planning, and a

decision model of how a flight crew responds. Specifically, as the aircraft flies from one

state to the next, the factors that typically affect where the aircraft will be in the next state

are the current flight plan said aircraft is following, current weather conditions that may

affect the lateral path, and other delay risk factors occurring either en-route or at the

arriving airport as noted in Section 1.4. The goal is to predict the duration of flight
delay, the quantity to be minimized, in order to provide a basis for changing the
aircraft’s route in real time; this is referred to in Figure 3-2 as “NAS Treatments.” For
example, if an aircraft is flying from airport A to airport B and no risk factors are
triggered, then the aircraft should reach its destination on the same flight plan route on
which it departed; however, if weather requires the aircraft to change its route, this
research recommends which route the aircraft should fly given both historical and
real-time flight information. This suggests
that the intent under which each actor operates must be known, and the DBN model is
used to quantify this intent and continuously update it based on new information.


Figure 3-2: An abstract representation of decision making

The actors who affect the way a flight is planned and executed, as defined in [1], are
listed below with their respective primary functions:

Flight Crew (FC): has ultimate control and responsibility for the safe operation of
the aircraft;

Air Traffic Control (ATC): provides a safe, orderly, and expeditious flow of traffic
on a first-come, first-served basis, often operating in the tactical decision space (< 2
hours look-ahead time);

Traffic Flow Management (TFM): balances air traffic demand with system
capacity to ensure the maximum utilization of the National Airspace System
(NAS), often operating in the strategic decision space (2-15 hour look-ahead time).

3.2.2 Delay Prediction Horizons and Classification Thresholds

For this study, four different prediction horizons were analyzed for delay prediction in
the strategic planning phase: 2, 4, 6, and 24 hours. Accurate predictions at these horizons
benefit all of the actors mentioned above. In other words, the prediction horizon denotes
the predicted delay 2, 4, 6, and 24 hours after the initial time³. Additionally, a
classification threshold prediction mechanism was established, where the output is a
binary prediction of whether the delay is more or less than a predefined threshold. This
study tests four delay classification thresholds: 0-30, 30-60, 60-90, and > 90 minutes.
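The horizons and thresholds above can be sketched as follows. This is an illustrative helper, not the author's implementation: the function names, the treatment of boundary values (a delay of exactly 30 minutes is placed in the 30-60 class), and the assignment of early flights to the 0-30 class are all assumptions.

```python
from datetime import datetime, timedelta

HORIZONS_HOURS = (2, 4, 6, 24)            # prediction horizons from the text
THRESHOLD_EDGES_MIN = (30, 60, 90)        # upper edges of the first three classes
THRESHOLD_LABELS = ("0-30", "30-60", "60-90", ">90")


def delay_minutes(scheduled, actual):
    """Delay as the difference between actual and scheduled time, in minutes."""
    return (actual - scheduled).total_seconds() / 60.0


def classify_delay(minutes):
    """Map a delay in minutes to one of the four classification thresholds.

    Assumption: boundaries belong to the upper class, and negative delays
    (early operations) fall into the lowest class.
    """
    for edge, label in zip(THRESHOLD_EDGES_MIN, THRESHOLD_LABELS):
        if minutes < edge:
            return label
    return THRESHOLD_LABELS[-1]


def horizon_times(initial_time):
    """Times at which a prediction is made, relative to the initial time."""
    return [initial_time + timedelta(hours=h) for h in HORIZONS_HOURS]


if __name__ == "__main__":
    start = datetime(2014, 1, 1, 6, 0)    # hypothetical 6 am initial time
    print(horizon_times(start))
    print(classify_delay(delay_minutes(start, start + timedelta(minutes=45))))
```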

3.2.3 Dynamic Bayesian Networks

DBNs expand on conventional Bayesian networks because they offer the ability to

represent the temporal nature of a process or system well. Additionally, the DBN model

provides the ability to learn from statistical data, relevant literature, and operational

expertise, while also providing a causal approach to modeling.

According to [48], Bayesian networks represent the state of certain phenomena at an
instant in time. A Bayesian network B = (G, P) is a pair where G is a directed acyclic
graph (DAG), with nodes corresponding to a set of random variables X, and P is a joint
probability distribution (JPD) of the variables in X, which factorizes to:

$$P(\mathbf{X}) = \prod_{X \in \mathbf{X}} P(X \mid \pi(X)) \tag{1}$$

where π(X) are the parents of X in G. A JPD representation by a Bayesian network
typically decreases the number of parameters that are needed for estimation and
ultimately enables efficient probabilistic inference. However, in many applications, the
goal is to represent the temporal evolution of a certain process, that is, how the different
system variables evolve with time (t) or event, by reasoning over random processes
X = {X(t) : t ∈ T} instead of random variables. Extensions of BNs to model these processes
are called dynamic Bayesian networks (DBNs) [15]. DBNs assume that the Markov
property holds, which states that the future is independent of the past, given the present;
therefore, the following factorization is obtained:

(2)

where $\mathbf{X}(t) = \{X(t) : X \in \mathbf{X}\}$.

Given a potentially infinite time horizon, the specification of a discrete-time DBN
may be prohibitive due to data scaling concerns. In order to allow for a compact
specification, the following assumptions regarding DBNs are generally made:

The DBN is first-order Markovian:

$$\mathbf{X}(t+1) \perp\!\!\!\perp_{P} \mathbf{X}(t-1) \mid \mathbf{X}(t) \tag{3}$$

such that the future is independent of the past given the present time.

The DBN is time-invariant:

$$\mathbf{U}(t) \perp\!\!\!\perp_{P} \mathbf{V}(u) \mid \mathbf{W}(s) \iff \mathbf{U}(t+c) \perp\!\!\!\perp_{P} \mathbf{V}(u+c) \mid \mathbf{W}(s+c) \tag{4}$$

such that the same independence relations hold at each point in time for U, V, W ⊆ X and
t, u, s, t + c, u + c, s + c ∈ T.

³ Initial time is established for this research to be at 6am Eastern Time since commercial traffic activity
throughout the NAS is at its lowest volume in the hours preceding.

The DBN is homogeneous:

$$P(\mathbf{U}(t+c) \mid \mathbf{V}(t)) = P(\mathbf{U}(t'+c) \mid \mathbf{V}(t')) \tag{5}$$

such that transition probabilities are fixed for U, V ⊆ X and t, t’, t + c, t’ + c ∈ T. In
other words, as the DBN goes from one state to another, the structure of the DBN remains
the same from start to end.

Given these assumptions for each temporal slice, a dependency structure between
the variables specifying the initial distribution of the joint process can be developed,
called the prior model. It is usually assumed that this structure is duplicated for all the
temporal slices (except the first slice, which can be different). Additionally, there are
edges between variables from different slices specifying how the process evolves as time
goes from t to t + 1 for t ∈ {1,2,…}, which defines the transition model. In this model,
variables at time t are depicted by dashed objects, while variables at time t + 1 are
depicted by solid objects. The temporal foundation in application is depicted by the
choice of the prior and transition model, while causal knowledge, such as the belief that a
traffic flow management initiative (TFMI) causes an air traffic directive (ATD) (i.e. an air
traffic controller command to a pilot), and the influence of TFMIs and ATDs on aircraft
flight delay time (i.e. delayed >30min, >60min, etc.), is captured as well. Figure 3-3
depicts an abstract example of a DBN, where the
influences between aircraft flight time delay, TFMIs, and ATDs are depicted by a
prior-transition DBN model. For example, if a traffic flow manager sets a NAS initiative to
delay aircraft on the ground and/or in the air due to en-route weather, this gets forwarded
to air traffic facilities, and the air traffic controller takes the necessary action by
providing directives to slow down or divert aircraft off their intended lateral flight plan
path. This, in turn, creates a flight delay for the aircraft going from airport A to
airport B. Figure 3-3 aims to depict this abstract process in order to introduce how a DBN
treats this scenario from one state to the next.

Figure 3-3: Abstract example of DBN that represents the influences between NAS actors
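To make the factorization of a joint distribution over parent-conditioned terms concrete, the sketch below applies it to a single time slice containing the three influences named in this section: a TFMI, an ATD, and a flight delay. The network shape (TFMI → ATD, and both → Delay) and every probability are hypothetical values chosen for illustration; they are not the parameters learned in this research.

```python
from itertools import product

# Hypothetical conditional probability tables for one time slice.
P_TFMI = {True: 0.2, False: 0.8}                       # P(TFMI)
P_ATD_GIVEN_TFMI = {                                   # P(ATD | TFMI)
    True: {True: 0.7, False: 0.3},
    False: {True: 0.1, False: 0.9},
}
P_DELAY_TRUE = {                                       # P(Delay = True | TFMI, ATD)
    (True, True): 0.90,
    (True, False): 0.60,
    (False, True): 0.50,
    (False, False): 0.05,
}


def joint(tfmi, atd, delay):
    """Joint probability as the product of each node given its parents."""
    p_delay = P_DELAY_TRUE[(tfmi, atd)]
    return (
        P_TFMI[tfmi]
        * P_ATD_GIVEN_TFMI[tfmi][atd]
        * (p_delay if delay else 1.0 - p_delay)
    )


def marginal_delay():
    """P(Delay = True), obtained by summing the joint over TFMI and ATD."""
    return sum(joint(t, a, True) for t, a in product((True, False), repeat=2))


if __name__ == "__main__":
    print(marginal_delay())
```

Only 1 + 2 + 4 = 7 free parameters are stored here, the same number a full joint table over three binary variables would need; the savings appear once nodes have fewer parents than the total number of other variables.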

In this study, the author extends the prior-transition DBN standard, and develops an

extended formalism (see Section 3.5) that provides more modeling power and improves

performance in terms of execution time and memory usage.
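The standard prior-transition formalism that the extension builds on can be sketched as follows. The sketch assumes the usual first-order form, in which the joint over a finite horizon is the prior over the first slice multiplied by one transition term per step, and it reuses a single transition table for every slice (the homogeneity assumption). The single binary delay variable, the TFMI input sequence, and all numbers are illustrative assumptions, not the model developed in this research.

```python
# Prior model: P(Delay at the first slice). Hypothetical values.
PRIOR = {False: 0.9, True: 0.1}

# Transition model: P(Delay(t+1) = True | Delay(t), TFMI active at t+1).
# The same table is used for every slice. Hypothetical values.
TRANSITION_TRUE = {
    (False, False): 0.05,
    (False, True): 0.40,
    (True, False): 0.60,
    (True, True): 0.90,
}


def predict_delay(tfmi_sequence):
    """Propagate the delay belief forward, one slice per TFMI observation.

    Returns the list of P(Delay = True) values, starting with the prior.
    """
    belief = dict(PRIOR)
    history = [belief[True]]
    for tfmi_active in tfmi_sequence:
        # Sum over the previous delay state (first-order Markov assumption).
        p_true = sum(
            belief[prev] * TRANSITION_TRUE[(prev, tfmi_active)]
            for prev in (False, True)
        )
        belief = {True: p_true, False: 1.0 - p_true}
        history.append(p_true)
    return history


if __name__ == "__main__":
    # No initiative for two slices, then an initiative is issued.
    print(predict_delay([False, False, True]))
```

Because only the current belief is carried between slices, memory use does not grow with the length of the horizon.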

3.3 Problem Domain

Prior to the development of an effective DBN, it is necessary to formulate a concise

and explicit problem description. It is also essential to constrain the domain of the

problem in order to control under which conditions the model may be applicable.

The primary objective of this research is to improve prediction support in the NAS as
it pertains to aircraft flight delay and route planning. A system such as the NAS may
include software, hardware, people, information, physical infrastructure, services, and
other system support items [2]. Figure 3-4 depicts the developed system hierarchy that
breaks down the NAS for the problem domain.

Figure 3-4: Problem domain system hierarchy

The following definitions, extracted from [2], cover the succeeding levels within the
system/subsystem hierarchy, along with the specific entity description used for the
purposes of this research. Keep in mind that these are assumptions used for the particulars
of the research at hand and can be altered based on the overall scope of the analysis. For
example, if a more granular analysis is required, the main system could in fact be the
airport, with the subsystems being all of the elements specific to that airport.

System. An integrated set of constituent parts that are combined in an
operational or support environment to accomplish a defined objective.
These parts include people, hardware, software, firmware,
information, procedures, facilities, services, and other support facets.
o Description: The NAS is the higher level system in our
empirical scenario.

Subsystem. A system in and of itself (reference the system definition)
contained within a higher level system. The functionality of a
subsystem contributes to the overall functionality of the higher level
system. The scope of a subsystem’s functionality is less than the scope
of functionality contained in the higher level system.
o Description: An airport or set of airports are the subsystems,
since by definition, airports are “less than” the scope of the
NAS.

Element. An integrated set of components that comprise a defined part
of a subsystem.
o Description: Since the primary focus of this objective is to
provide prediction support, our elements include the type of
elements that we are interested in predicting (e.g. flight delay
prediction, traffic flow management prediction, and airport
capacity prediction).

Component. Composed of multiple parts; a clearly identified part of
the product being designed or produced.
o Description: In order to predict the element flight delay (for
example), we will need to identify multiple nodes or attributes
that have causal relationships. In this case, since a DBN was
used, the author used this level for the multiple nodes for each
element.

Part. The lowest level of separately identifiable items within a
system; parts are not normally subject to disassembly without destruction
or impairment of designed use.
o Description: At the lowest level, the time dimension will be
used as segmentation for the NAS. Since the main actor for
prediction is based on the aircraft, time can be broken out into
parts to provide prediction at a particular phase of flight (e.g.
ground departure, ascent, cruise, descent, ground arrival).

Uncertainty in this study is characterized based on behaviors in a population of

flights with the same origin and destination. The uncertainty associated with the delay

variables is roughly a function of the data that is being used to produce the delay and

route prediction. Table 3-1 lists each category of variables that are utilized for the model

separated by fixed and temporal variables. Fixed data are categories of variables that have

only one value over the duration of the prediction horizon. Temporal data are categories

of variables having a value for each prediction horizon i.

Table 3-1: DBN Fixed and Temporal Categories of Input Variables

Fixed Categories
Code         Wording
AcChar       Aircraft Characteristics (e.g. model type, airline)
CityP        Multiple origin to single arriving or departing airport
Season       Season (day-of-week, month-of-year)
DepGD        Departure Ground Delay time
AirbD        Airborne Delay time
ArrGD        Arrival Ground Delay time
CdDepGD      Causal departure ground delay factors
CdAirbGD     Causal airborne ground delay factors
CdArrGD      Causal arrival ground delay factors
DTResult     Delay time prediction

Temporal Categories
Code         Wording
SchTra       Scheduled Traffic at time i
LatPthi      Lateral path of ATC sectors traversed at time i
DepGDi       Departure ground delay at time i
AirbGDi      Airborne delay at time i
ArrGDi       Arrival ground delay at time i
CdDepGDi     Causal ground departure delay factors at time i
CdAirbGDi    Causal airborne delay factors at time i
CdArrGDi     Causal ground arrival delay factors at time i
DTResulti    Delay classification threshold (0-30min, 30-60min, 60-90min, >90min) prediction probability at time i

3.4 Data Processing

Previous research into delay prediction [49] [50] used the FAA’s Aviation System
Performance Metrics (ASPM) database as input data to provide the delay picture. While
ASPM provides detailed data on flights to and from airports, it lacks robustness, as it only
provides this data for 77 airports, 22 carriers, and some VFR (visual flight rules) traffic.
For this study, a more robust data source was used: aircraft radar track data
from MITRE’s Center for Advanced Aviation System Development (CAASD),
otherwise known as Threaded Track. Threaded Track fuses a range of radar position
coordinates (lat, lon) throughout the flight into a single synthetic trajectory by applying a
series of noise attenuation algorithms [51]. The fused sources include the National Offload
Program (NOP), Airport Surface Detection Equipment, Model X (ASDE-X), and Enhanced
Traffic Management System (ETMS) data.

ETMS provides the lowest quality position source, updating at approximately one-minute
intervals, and is used only to fill gaps. NOP data used within Threaded Track has
three different formats: NOP-Center, which provides position reports during the en-route
phase of flight, and NOP-Automated Radar Terminal System (ARTS) and NOP-Standard
Terminal Automation Replacement System (STARS), which contain Terminal Radar Approach
Control (TRACON) position returns for flights handled by those specific automation systems.
ASDE-X data provides positions at a one-second update rate on the airport surface and in
the immediate area around the airport.

For this study, the author leveraged and built on a MITRE-developed data analysis project that centers on the fusion and post-processing of the threaded tracks with relevant external data sources. Although ASPM was one of the sources used to provide the flight delay story integrated with threaded track, an algorithm needed to be developed to fill in the gaps. The author developed a ‘phase-of-flight’ post-processing algorithm that takes the time series points of threaded tracks, partitions them into phases of flight based on the radar source and the aircraft’s horizontal or vertical characteristics, and tags the phase of flight from beginning to end. Figure 3-5 depicts an example of how Threaded Track stitches sources of aircraft position data to provide an accurate single-source gate-to-gate record of the position of the flight, along with a visual depiction of how the phase-of-flight post-processing algorithm would partition and tag the data for each flight segment.


Figure 3-5: Threaded Track gate-to-gate flight data sources used for each phase of flight.

The data processing steps performed to create threaded track are depicted in Figure 3-6 and are thoroughly described in the following sub-sections.


Figure 3-6: Data Processing Steps.


3.4.1 Data Segmentation

The NOP and ASDE-X data sources are stored in a text format with one row per radar return. Although a track identification (ID) column is present in each of the data sources, the ID values are recycled within each air traffic facility and therefore do not uniquely identify a track. The segmentation process groups related radar returns into segments and assigns a unique segment ID to each group of returns. This process is designed to avoid merging two flights whenever possible and to minimize the possibility of splitting a single flight into multiple segments.

This process uses different criteria for assigning points to a segment depending on

the data source. The process begins by grouping the returns by air traffic facility, date,

and source-assigned track ID. The groups of points are then sorted by ascending time.

After the points are grouped and sorted, the segmentation criteria are applied to each

point in turn, and points within a segment are assigned the same segment ID. The

segmentation criteria are specified by Equations 6-9.

Equation 6 is used in the segmentation logic to ensure that two successive points

in a segment are temporally close. The longer update period in NOP en-route data

requires a looser time-bound between successive points.

(6)

(7)


Equation 7 is the lateral distance check, which was developed to ensure that two successive points are within a reasonable distance of one another. Successive points that fail the distance check occur most often when a track ID is recycled by a tracker.

Equation 8 is the flight information check used for NOP en-route records. This

was developed because the computer ID is commonly duplicated among tracks within an

air traffic facility. The flight information check was developed to use the beacon code

and aircraft call-sign information along with the computer ID to group points together.

Any two successive points in a segment must agree on at least two of the three fields.

Equation 9 describes the rules used to assign pairs of successive points to a

segment. For ASDE-X data records, successive points that share a track ID are

considered to belong to the same segment if they have the same Mode-S value, and pass

the time check. NOP, STARS, and ARTS records must pass the lateral distance and time checks; NOP en-route records must additionally pass the flight information check.

(8)

(9)


The segmentation process is implemented as four jobs, one for each data source.

This process utilizes a distributed computing software framework called MapReduce.

The Map Phase of each job is used to perform the grouping and sorting of radar returns.

The Reduce Phase implements the segmentation criteria outlined above. The MapReduce

process will be discussed more extensively in Section 3.7.1.
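The segmentation rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MITRE implementation: the thresholds in `MAX_GAP_S` and `MAX_DIST_NM`, the record field names, and the function names are all hypothetical, since the actual bounds are defined by Equations 6-9. Lateral distance uses a simple flat-earth approximation.

```python
import math
from collections import defaultdict

# Hypothetical thresholds (assumptions); the real bounds come from Equations 6 and 7.
MAX_GAP_S = {"NOP_CENTER": 60.0, "NOP_STARS": 15.0, "NOP_ARTS": 15.0, "ASDEX": 5.0}
MAX_DIST_NM = 5.0

def dist_nm(a, b):
    """Approximate lateral distance in nautical miles (equirectangular)."""
    dlat = (b["lat"] - a["lat"]) * 60.0
    mid_lat = math.radians((a["lat"] + b["lat"]) / 2.0)
    dlon = (b["lon"] - a["lon"]) * 60.0 * math.cos(mid_lat)
    return math.hypot(dlat, dlon)

def time_ok(a, b, source):
    """Time check: successive points must be temporally close."""
    return b["t"] - a["t"] <= MAX_GAP_S[source]

def flight_info_ok(a, b):
    """Flight information check: at least two of three fields must agree."""
    fields = ("track_id", "beacon", "callsign")
    return sum(a[f] == b[f] for f in fields) >= 2

def same_segment(a, b, source):
    """Decide whether two successive points belong to the same segment."""
    if source == "ASDEX":
        return a["mode_s"] == b["mode_s"] and time_ok(a, b, source)
    ok = time_ok(a, b, source) and dist_nm(a, b) <= MAX_DIST_NM
    if source == "NOP_CENTER":
        ok = ok and flight_info_ok(a, b)
    return ok

def segment(returns, source):
    """Group by (facility, date, track ID), sort by time, assign segment IDs."""
    groups = defaultdict(list)
    for r in returns:
        groups[(r["facility"], r["date"], r["track_id"])].append(r)
    tagged, next_id = [], 0
    for key in sorted(groups):
        prev = None
        for p in sorted(groups[key], key=lambda r: r["t"]):
            if prev is None or not same_segment(prev, p, source):
                next_id += 1  # start a new segment
            tagged.append((next_id, p))
            prev = p
    return tagged
```

In a MapReduce setting, the grouping and sorting in `segment` would correspond to the map and shuffle work, and the inner loop that applies `same_segment` would run in the reducer.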

3.4.2 Segment Metadata

After the raw data is segmented, the author developed a metadata collection

process that builds the segment level metadata to better understand the characteristics of a

segment. This information is subsequently used by the Fusion Process to connect the

segments to build the basic Flight Metadata. This process builds and collects information

including the flight start time and flight end time for each segment. Other metrics built

are discussed in Section 3.4.5. A fuller description of these metrics is given in the data schema in Appendix B.

3.4.3 Data Fusion Process

The fusion process is designed to take tracks of a single flight that were recorded by separate air traffic facilities and merge them into one track that crosses multiple air traffic facilities. This process may need to examine the entire collection of data segments within the applicable time window. To reduce the volume of data that must be examined, only the per-segment metadata is used. The metadata is an incomplete view of the segments; it contains only high-level attributes such as aircraft identifier, airline, departure and arrival airports, and the bounding values of time, location, altitude, and speed. The fusion process is designed to fit between the segmentation process (which only aggregates clearly defined, time-contiguous radar data that corresponds to a flight and radar sensor) and a smoothing process (which examines all track data available per flight, and may therefore decide to split a previously fused flight). Therefore, fusion is designed to reduce false negatives at the expense of false positives, because split flights will never subsequently be reconsidered for merging by the smoothing process.

Fusion considers two primary attributes above all others: the window of time associated with a segment, and the set of aircraft identification metrics associated with the set of segments that create a fused track. The time window is based on the notion that different radar sensors will generally overlap in coverage of a flight as time progresses; overlapping time windows (within a reasonable quantum at the ends of the segment to allow for the radar sweep rate and the possibility of missing a few data points) imply that two segments may represent the same flight. In addition, aircraft IDs are, for the most part, highly consistent through the evolution of a flight. This means that such an ID can usually be used to join all segments for a flight as long as the time window constraints are observed. There is a small percentage of flights where IDs are inconsistent; this occurs because IDs have been abbreviated or misspelled during manual data entry. These flights with multiple IDs are recognizable because multiple IDs or other identifying metadata appear in single segments, which can be used to join segments with different IDs.

This fusion process is typically difficult to parallelize; however, due to the vast volume of data, a parallel computing environment is required to complete the process in a timely manner. For this reason, the use of a single flight ID provides an opportunity to parallelize the problem into a per-flight process. The method the author developed to handle this processing is described by the following algorithm:

1. Load the segment metadata, which comes from multiple sources (NOP, ASDE-X,

and ETMS)

2. Group the data by aircraft ID, for sets of segments where such IDs are

unambiguous keys for fusing flights, and create a separate group of ambiguous

cases.

3. Sort the data from each group by time and stream it into a Java program that

processes the segment metadata and emits pairs of uniquely-generated flight IDs

and segment IDs for the next step of processing.

The data fusion algorithm expects its data as a time-sorted sequence of comma-

separated value records that represent the segment metadata. By processing these records

in a temporal order, they can be fused into flights by examining only records that fit

within a time window that corresponds to the longest segment duration plus the time

quantum. As records expire from this window, they probably do not overlap any

subsequent records, and their corresponding flight data can therefore be omitted. The

records that lie within the current time window are indexed by all metadata attributes

(aircraft ID, airline code, airports, facility ID, etc.) that can be used to match records to,

or exclude records from, flights. These indices permit very fast matching of the limited


set of data in memory at any point in time.
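The time-windowed matching described above can be sketched as follows. This is a simplified illustration, not the author's Java program: it indexes only on aircraft ID (the text describes indices over all metadata attributes), and the `quantum` value, the field names, and the function name `fuse` are assumptions.

```python
def fuse(segments, quantum=30.0):
    """Assign flight IDs to time-sorted segment metadata.

    segments: iterable of dicts with seg_id, aircraft_id, start, end (seconds).
    Returns a list of (flight_id, seg_id) pairs.
    """
    active = {}      # aircraft_id -> [flight_id, latest_end_time]
    pairs = []
    next_flight = 0
    for seg in sorted(segments, key=lambda s: s["start"]):
        # Expire flights whose window can no longer overlap later segments.
        expired = [ac for ac, (_, end) in active.items()
                   if end + quantum < seg["start"]]
        for ac in expired:
            del active[ac]
        entry = active.get(seg["aircraft_id"])
        if entry is None:
            next_flight += 1                     # no open flight: start one
            entry = [next_flight, seg["end"]]
            active[seg["aircraft_id"]] = entry
        else:
            entry[1] = max(entry[1], seg["end"])  # extend the open flight
        pairs.append((entry[0], seg["seg_id"]))
    return pairs
```

Because only the flights still inside the window are held in `active`, memory use depends on the number of concurrently open flights rather than on the total number of segments.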

3.4.4 Track Smoothing and Filtering

Each facility’s surveillance data offers differing quality, availability, and coverage. This final step creates a synthesized track by smoothing and weighting the contributions from each data source. Specifically, the sensors are integrated by first computing a smoothed trajectory from each data source. Since Threaded Track is built from historical data sets, least squares smoothing filters can be applied; these have been shown to create better trajectory estimates than those used in tracking systems, which are subject to an inherent measurement lag from aircraft accelerations [52]. These filters also provide derived parameters from the raw trajectory such as speed, heading, and climb gradient. Each radar sensor’s continuous derived track is then integrated into a single Threaded Track using a weighted average based on the underlying accuracies of each source’s sensors and data quality.
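A minimal sketch of the two ideas in this step, smoothing each source and then combining sources by weight, is given below. It is illustrative only: the sliding-window linear fit, the inverse-variance weights, and the names `ls_smooth` and `fuse_positions` are assumptions, not the filters or weights used in Threaded Track.

```python
def ls_smooth(times, values, half_window=2):
    """Local linear least-squares fit around each sample.

    Returns (smoothed_values, rates); the fitted slope is a derived
    rate (for example, a speed or climb rate component).
    """
    smoothed, rates = [], []
    n = len(times)
    for k in range(n):
        lo, hi = max(0, k - half_window), min(n, k + half_window + 1)
        ts, vs = times[lo:hi], values[lo:hi]
        t_mean = sum(ts) / len(ts)
        v_mean = sum(vs) / len(vs)
        sxx = sum((t - t_mean) ** 2 for t in ts)
        sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(ts, vs))
        slope = sxy / sxx if sxx else 0.0
        smoothed.append(v_mean + slope * (times[k] - t_mean))
        rates.append(slope)
    return smoothed, rates

def fuse_positions(estimates):
    """Weighted average of per-source positions at one time instant.

    estimates: list of (lat, lon, sigma), where sigma is an assumed
    position error for that source; weights are 1 / sigma**2.
    """
    weights = [1.0 / (sigma * sigma) for _, _, sigma in estimates]
    total = sum(weights)
    lat = sum(w * e[0] for w, e in zip(weights, estimates)) / total
    lon = sum(w * e[1] for w, e in zip(weights, estimates)) / total
    return lat, lon
```

With this weighting, a source with a small assumed error (such as a one-second surface feed) dominates a coarser source when both cover the same instant.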

3.4.5 Flight Metadata

The Flight Metadata process unifies the merged segments’ flight information into a single summarized flight record. This output contains all of the relevant metrics available from the source data and provides links to external data sources such as ASPM’s flight delay database. Figure 3-7 depicts the workflow schematic of how flight delay information was integrated into the data automation workflow. As discussed previously, the process feeds NOP, ASDE-X, and ETMS segmentation metadata and smoothed track data into algorithms (TrajectoryFusion, PhasesOfFlight), which in turn generate flight metrics (e.g., Threaded Flight, Phases of Flight). External flight delay data sources (as shown) were integrated into the data workflow process for the model development discussed in succeeding sections. Appendix A depicts the input, process, and output considerations for the complete list of data fusion algorithms tested in this research. In addition, Appendix B depicts the data schema of all of the variables used in model development testing.

Figure 3-7: Data Fusion Workflow

3.4.6 Data Quality

Due to anomalies in the data, a flight can end up reporting multiple call signs, departure/arrival airports, and aircraft types. To resolve this, the Flight Metadata process ranks each candidate value by the number of times it appears in a single flight. The highest-scoring value is taken as the best guess. The Flight Metadata process also preserves low-scoring entries for later improvement and analysis.
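The frequency-based ranking described above can be sketched as follows. The function name `rank_attribute` and the treatment of empty values as missing are assumptions made for illustration.

```python
from collections import Counter

def rank_attribute(values):
    """Rank the reported values of one attribute by frequency.

    Returns (best_guess, ranked), where ranked keeps the lower-scoring
    alternatives for later analysis. Empty values are ignored.
    """
    counts = Counter(v for v in values if v)
    ranked = counts.most_common()
    best = ranked[0][0] if ranked else None
    return best, ranked
```

For example, a flight that reports the call sign `UAL123` three times and `UA123` once would yield `UAL123` as the best guess while retaining `UA123` in the ranked list.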

3.5 DBN Formalism Extensions

As stated in Section 3.2.3, the structure of a DBN is typically specified using a prior and a transition network, assuming a first-order Markov process, time-invariance, and homogeneity. Unfortunately, robustly modeling and inferring aircraft flight time delay in the presence of uncertainty requires extensions for a kth-order Markov process, which is not possible in Murphy’s formalism [53]. Another issue with that formalism is that, when unrolling the network for inference, every node is copied to every time-slice, even if it has a constant value for all time-slices. Lastly, although it is possible to introduce a different initial state using that formalism, it is not possible to define a different ending state, which can be useful for modeling variables that are only of interest after the end of the process. These three observations form the basis of the extensions to the DBN formalism.

To overcome these constraints, the author applied a formalism extension consisting of five components: (1) temporal arcs, (2) a temporal plate, (3) contemporal nodes (C), (4) anchor nodes (A), and (5) terminal nodes (T), as shown in Figure 3-8. A temporal arc is an arc between a parent node and a child node with an index that denotes the temporal order. Temporal arcs provide a more comprehensible visualization and allow for a much easier DBN specification that requires less coding. The temporal plate is the area of the DBN definition that holds the temporal information of the network. Specifically, it contains the variables that develop over time (and are going to be unrolled for inference), and it has an index that denotes the sequence length T of the dynamic process. The benefit of the temporal plate is that, regardless of how many time-slices the DBN is unrolled to, the nodes outside the temporal plate remain unique.

Figure 3-8: The five components of the DBN extended formalism

This is useful for the next component, contemporal nodes, which are nodes outside the temporal plate whose values remain the same over time. For instance, if an ATDM is seeking information on a specific aircraft type (e.g., an Airbus A320; aircraft type does not vary over a flight), they would specify this in a contemporal node, which saves memory and computational time. Lastly, anchor and terminal nodes are nodes located outside the temporal plate that have one or more children inside the temporal plate; when the network is unrolled for inference, these nodes are connected only to the first and last time-slice, respectively. They are useful where extra variables need to be introduced before the start or after the end of the process without being copied for every time-slice. These nodes are of vital importance for the DBN in this study, since they were used to extend the DBN formalism in a way that works for the NAS. Additionally, they were used as a guideline to develop an efficient DBN structure for the flight delay prediction model.
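To make the unrolling behavior of the five components concrete, the sketch below builds the edge list of an unrolled network. It is an illustration only: the function `unroll`, its argument layout, and the convention of tagging nodes outside the plate with a slice index of `None` are assumptions, not part of any particular DBN library.

```python
def unroll(plate, intra, temporal, contemporal, anchor, terminal, T):
    """Return the edges of a DBN unrolled for T time-slices.

    plate:       set of node names inside the temporal plate
    intra:       (parent, child) arcs within a single slice
    temporal:    (parent, child, order) temporal arcs inside the plate
    contemporal: (parent, child) arcs with one endpoint outside the plate,
                 repeated for every slice
    anchor:      arcs with one endpoint outside the plate, first slice only
    terminal:    arcs with one endpoint outside the plate, last slice only
    """
    def at(node, t):
        # Plate nodes are copied per slice; outside nodes exist once.
        return (node, t) if node in plate else (node, None)

    edges = []
    for t in range(1, T + 1):
        edges += [(at(p, t), at(c, t)) for p, c in intra + contemporal]
        # A k-th order temporal arc links slice t-k to slice t.
        edges += [((p, t - k), (c, t)) for p, c, k in temporal if t - k >= 1]
    edges += [(at(p, 1), at(c, 1)) for p, c in anchor]
    edges += [(at(p, T), at(c, T)) for p, c in terminal]
    return edges
```

Note that a node outside the plate appears in the result exactly once, however large `T` is, which is the memory saving the text attributes to the temporal plate.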

3.6 DBN Structure Derivation

The development of a dynamic Bayesian network structure can be a demanding

undertaking. The initial specification of network structure is a challenging task, and the

best heuristic is to keep it concise. Concise models can incrementally be expanded to

more detailed and complex models by adding detail to the network via a node and

evaluating the functionality of that node. Starting with complex models typically makes it

unmanageable to evaluate functionality, since distant variables may interact in complex

ways [54].

Construction of the DBN structure commenced with the identification of factors that had a direct influence on aircraft flight delay. This is driven by the fact that flight delay has an extensive impact on how ATDMs respond to dissimilar situations of operational and environmental uncertainty. The key causal factors that directly influence delay are divided into the following categories according to their phase of flight: ground departure causal delay factors, airborne causal delay factors, and ground arrival causal delay factors. Using ground departure causal delays as an example, variables in this category include runway configuration, weather, traffic interactions, traffic restrictions, and runway queue position. See Appendices A and B for an in-depth explanation of the input and output considerations that went into building these variables, as well as the data schema, which lists the end-product variables developed from both the fused data sources and the developed algorithms. The presented model was developed incrementally using a combination of domain literature, expert knowledge, and regression analysis. Figure 3-9 depicts how the DBN model carries out the task of predicting delay time and causal delay factors using the extended formalism.

Figure 3-9: Extended formalism for a second-order DBN.

The present model is an example of a second-order DBN using the extended formalism discussed in Section 3.5. In other words, a red arrow with the number two in the box on a variable means that the model predicts flight delay best when the previous two instances are taken into account. The anchor and contemporal nodes are placed outside the temporal plate (squared dashed line). The temporal plate denotes that the DBN will be unrolled for t = 4 time-slices. In this graph, the nodes that are grey can be fully observed and the nodes in white contain missing values.

3.7 DBN Parameter Learning

After obtaining the DBN structure, parameters were learned from the fused

threaded track dataset using the Expectation Maximization (EM) algorithm. EM is an

iterative algorithm that enables learning models from data with missing and/or latent

variables. The EM algorithm consists of an expectation step (E step) and a maximization

step (M step). In the E step, the probabilities of the missing variables are calculated given

the observed variables and the current values of the parameters (sufficient statistics are

computed). In the M step, the parameters are recomputed using the filled-in values as if

they were observed values. The process of filling-in the missing values and updating the

parameters is iterated until convergence. The different variants used for learning

parameters in Bayesian networks from both complete and incomplete data are discussed

more extensively in [55].
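The E and M steps can be illustrated on the smallest possible network: a binary parent A with a binary child B, where A is sometimes unobserved. The sketch below is a toy example under those assumptions, not the learning code used in this study; the function name `em`, the starting values, and the default tolerance and iteration cap are illustrative.

```python
import math

def em(data, p_a=0.5, p_b=(0.3, 0.7), tol=1e-4, max_iter=100):
    """EM for a two-node network A -> B with A possibly missing.

    data: list of (a, b) with a in {0, 1, None} and b in {0, 1}.
    p_a = P(A=1); p_b[a] = P(B=1 | A=a).
    Assumes every state of A keeps some support in the data.
    Returns (p_a, p_b, log_likelihood, iterations).
    """
    p_b = list(p_b)
    prev_ll = -math.inf
    for it in range(1, max_iter + 1):
        n_a = [0.0, 0.0]    # expected count of A = a
        n_ab = [0.0, 0.0]   # expected count of A = a and B = 1
        ll = 0.0
        # E step: fill in A with its posterior given B and current parameters.
        for a, b in data:
            joint = []
            for k in (0, 1):
                pa = p_a if k else 1.0 - p_a
                pb = p_b[k] if b else 1.0 - p_b[k]
                joint.append(pa * pb)
            if a is None:
                z = joint[0] + joint[1]
                post = [joint[0] / z, joint[1] / z]
                ll += math.log(z)
            else:
                post = [1.0 - a, float(a)]
                ll += math.log(joint[a])
            for k in (0, 1):
                n_a[k] += post[k]
                n_ab[k] += post[k] * b
        # M step: re-estimate parameters from the expected counts.
        p_a = n_a[1] / len(data)
        p_b = [n_ab[k] / n_a[k] for k in (0, 1)]
        if ll - prev_ll < tol:   # converged: likelihood stopped improving
            break
        prev_ll = ll
    return p_a, p_b, ll, it
```

Because the result depends on the starting values `p_a` and `p_b`, different initializations can settle at different local optima, which is the weakness of plain EM discussed in the next paragraph.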

While the EM algorithm generally works well in estimating missing and/or latent

parameters in probabilistic graphical models, two problems for DBN parameter learning

using EM still exist. First, applying the EM algorithm to learn DBN parameters is often

subject to local optima and prone to premature convergence which could ultimately lead

to poor solution quality. To mitigate this problem, the author applied the Age-Layered

Expectation Maximization (ALEM) method [56], which is primarily based on the genetic

algorithm concept of creating and computing with a population of randomly initialized

entities (See Section 3.7.2). Second, as data size increases, learning time of conventional

sequential learning becomes intractable. To mitigate this problem, the author applied the


ALEM algorithm on the MapReduce distributed computing framework.

3.7.1 MapReduce for Massive Scale Distributed Computations

MapReduce is a programming framework for distributed computing on massive

data sets which was introduced by Google in 2004. It is a paradigm that allows users to

create parallel applications while hiding the details of data distribution, load balancing,

and fault tolerance [57]. MapReduce requires decomposition of an algorithm into map

and reduce steps. In the map phase, the input data are split into blocks and processed as a

set of input key-value pairs in parallel by multiple mappers. Each mapper applies to each

assigned datum a user-specified map function and produces as its output a set of

intermediate key-value pairs. Then the values with the same key are grouped together

(the sort and shuffle phase) and passed on to a reducer, which merges the values

belonging to the same key according to a user-defined reduce function.
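The map, shuffle, and reduce phases can be mimicked in memory with a few lines of Python. This sketch runs on a single machine and only illustrates the data flow; it is not Hadoop, and the function name `map_reduce` is an assumption.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Single-process imitation of the MapReduce data flow.

    mapper(record) yields (key, value) pairs; reducer(key, values)
    merges all values that share a key.
    """
    intermediate = defaultdict(list)
    for record in records:                      # map phase
        for key, value in mapper(record):
            intermediate[key].append(value)     # shuffle: group by key
    return {key: reducer(key, values)           # reduce phase
            for key, values in sorted(intermediate.items())}
```

As a usage example, counting radar returns per facility would use a mapper that emits `(record["facility"], 1)` and a reducer that sums the values for each key.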

Hadoop, an implementation of MapReduce, provides a framework for distributing the data and user-specified MapReduce jobs across a large number of cluster nodes. It is based on a master/slave architecture. The single master server (jobtracker) receives a job assignment from the user, distributes the map and reduce tasks to slave nodes (tasktrackers), and monitors their progress. Storage and distribution of data to slave nodes is handled by the Hadoop Distributed File System (HDFS). A Hadoop node might denote a tasktracker or jobtracker machine. A map task describes the work executed by a mapper on one input split. A reduce task processes records with the same intermediate key. A mapper/reducer might be assigned multiple map/reduce tasks. To learn the parameters needed to support this research, Hadoop was run on the MITRE cluster, a continuously growing cluster consisting of both a North and South configuration with a total of 129 data nodes, 1,630 mappers, 732 reducers, and over 2 petabytes of storage capacity.

3.7.2 ALEM on MapReduce (ALEMMR)

The ALEM algorithm is based on the genetic algorithm concept of creating and computing with a population of randomly initialized entities [56]. Each entity has a fitness, which is to be optimized, as well as an age corresponding to the amount of time the entity has been in the population [58]. Entities are separated into layers with other entities of similar age. Lower layers hold the youngest entities, while higher layers hold the oldest members of the population. As entities age, they ascend to higher layers. The maximum age of each layer is determined by the age gap parameter; once entities reach this age, they ascend to the next layer. Additionally, there are limits on the maximum number of entities per layer. The age-layered structure reduces the possibility that fit, old entities stuck in local optima take over the population due to their high fitness.

In ALEM, a population of EM runs is created and updated [56]. The age of each EM run relates to its number of iterations, and the fitness of each EM run is its likelihood. EM runs are randomly initialized in the first layer, iterate until they reach an age at which they ascend to the next layer, and may need to compete for a spot in that layer. Competition occurs when a layer is full: if an ascending EM run has greater likelihood, the non-ascending EM run is discarded to make room. Otherwise, the ascending EM run is not competitive enough and is discarded. ALEM terminates when a specified number of EM runs converge according to a pre-defined convergence criterion [58]. Figure 3-10 provides a representative example of how the use of ALEMMR provides the distributed platform needed to run a population of airport DBNs simultaneously.
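The competition rule for a full layer can be sketched as below. This is a simplified reading of the text, not the ALEM reference implementation: the choice to compete against the lowest-likelihood resident of the target layer, the run representation as a dict with a `loglik` field, and the function name `ascend` are all assumptions.

```python
def ascend(layers, src, run, capacity):
    """Move an EM run from layer src to layer src + 1.

    layers: list of lists of runs; each run is a dict with a "loglik" key.
    Returns True if the run survives in the next layer, False if discarded.
    """
    layers[src].remove(run)
    dst = layers[src + 1]
    if len(dst) < capacity:
        dst.append(run)          # room available: no competition needed
        return True
    # Layer is full: compare against the weakest resident (assumption).
    weakest = min(dst, key=lambda r: r["loglik"])
    if run["loglik"] > weakest["loglik"]:
        dst.remove(weakest)      # resident loses and is discarded
        dst.append(run)
        return True
    return False                 # ascending run is not competitive enough
```

Under this rule the number of live EM runs can shrink at every ascent, which is why the population size varies from one iteration to the next.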

This study adopts the ALEM MapReduce framework developed in [59] and provides novelty by addressing two important shortfalls of that research: the amount of evidence available, and how well ALEMMR scales when the population size grows to thousands or millions. Using ALEMMR, multiple DBNs are processed in each operation. For population treatment of EM runs (i.e., DBN parameters), ALEMMR terminates and starts new DBNs and evaluates the likelihood for the layers that are changing, as illustrated in Figure 3-10. More specifically, each mapper in the E-step performs expectation calculations on a single evidence set and multiple DBN instances. The reducer then performs maximum likelihood estimation and either begins or ends EM runs according to the ALEM layers. The added temporal dimensions unique to DBNs are managed by time-indexed variables at each observed prediction horizon (2 hours). Since ALEM operates on a dynamic population structure, the number of EM runs performed for each MapReduce operation will vary based on pre-defined parameters. Section 4.2.1 discusses the additional parameters required to run ALEMMR that account for computational time and convergence to a global optimum.

Figure 3-10: Example of an ALEM application with populations of DBNs on MapReduce.

Chapter 4: Empirical Experiments

4.1 Experimental Design

For the DBN used in this study, the author was interested in answering four research questions:

1. By using ALEMMR, can we learn parameters that scale with an increasing data

size while addressing the EM local optima problem?

2. What is the recommended prediction horizon and classification threshold for

delay prediction?

3. Does the flight delay prediction model provide accurate prediction results for

delay time and causal variables for each phase of flight greater than 80% of the

time?

4. Can this approach integrate results of the flight delay prediction (if successful)

into a developed real-time trajectory decision support prediction system that

recommends which route an aircraft should fly given both historical and real-time

flight delay information combined with data related to the aircraft and the external

environment? (previously discussed in Section 3.4)

This experiment presents the prediction results of model runs conducted against more than three years of fused threaded track data covering the period August 2010 to September 2013⁴. City pairs for all aircraft arriving at and departing from the top thirty-five major airports in the NAS were used (as depicted in Figure 4-1). To estimate accuracy, the author applied the holdout method to the millions of flight records used for parameter learning to obtain the predicted beliefs of flight delay time and causal delay variables. In this method, the data is randomly partitioned into two independent sets, a training set and a test set. The training set is then used to develop the model, whose accuracy is estimated with the test set. 80% of the fused threaded track dataset was used for training, and 20% was used for testing.

⁴ Date range of the fused threaded track uploaded on HDFS.
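The 80/20 holdout partition can be sketched as follows. The function name `holdout_split` and the fixed random seed are assumptions added for reproducibility; the text does not state how the random partition was drawn.

```python
import random

def holdout_split(records, train_fraction=0.8, seed=0):
    """Randomly partition records into disjoint training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # seeded for repeatability
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Shuffling before cutting matters here because the flight records are time-ordered; a cut on the unshuffled data would train on earlier years and test only on the most recent months.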

Figure 4-1: Track dispersion using a subset of threaded track historical radar data for

aircraft departing and arriving from the top thirty-five major airports in the NAS.

4.2 Empirical Experiments Overview

This section discusses the results of investigating the first three research questions (see Section 4.1). For the first experiment, the author applied the ALEM MapReduce algorithm to the NAS-wide airport dataset to quantify both the run time (minutes) as the data set grows and the mean number of iterations until global convergence for DBNs with varying dataset sizes and numbers of hidden/missing data nodes. For the second experiment, the author developed a confusion matrix that allows visualization of model performance. In predictive analytics, a confusion matrix reports the number of false positives, false negatives, true positives, and true negatives. The confusion matrix, by design, allows for a more comprehensive analysis than the proportion of correct guesses (accuracy) alone, which can be beneficial for ATDMs. For the third experiment, accuracy results for the delay causal variables aggregated for each phase of flight are provided to determine how accurately the model diagnoses causes of delay.

4.2.1 Experiment 1: ALEMMR Flight Delay Application

To learn the DBN parameters using ALEM, the author used the following parameters⁵: number of layers = 4; age gap = 4; and minimum runs in lowest layer = 4. Additionally, the convergence tolerance was set to ε = 10⁻⁴, the maximum number of iterations was set to 100, and the population was set to terminate when 15 EM runs converged.

Given the aforementioned parameters, the first objective was to improve the solution quality by ensuring convergence to a global optimum. To achieve this, the author implemented the airport DBNs on Hadoop using a subset of the training data (varying in size up to 10⁶ track variables) and ran it on the MITRE Hadoop cluster⁶. Figure 4-2(a) depicts the mean number of iterations until global convergence, taking all thirty-five major NAS airports into account, with the number of missing or hidden nodes equaling two and four, respectively. Overall, these results are significant for the application of the ALEM algorithm on MapReduce because the author drastically reduced the average number of iterations relative to standard EM algorithms, which traditionally require hundreds of iterations until local convergence. This alone would significantly reduce the computational time required to solve the multi-airport problem.

⁵ Parameters were set based on algorithmic best practices from prior art. Additional testing will be performed to ensure optimal parameter settings in future applications.

⁶ For brevity, discussion and analysis of mappers and other factors related to parallel computing in MapReduce were not included. Section 3.7.1 details the characteristics of the MITRE Hadoop cluster, which was optimized for efficient data scaling to process the large problem size (multiple airport DBNs) and extremely large population (data variables).

The author’s second objective was to determine whether DBN parameters can be learned at scale for the top thirty-five major airports in the NAS using just over three years of fused threaded track flight records. Increasing the size of the training set naturally increases training time; Figure 4-2(b) depicts the results of the author’s implementation scaled to a flight data record size of n = 10⁸, as the population set increases super-linearly (n log n). The results show that when comparing the sequential approach with the ALEMMR approach for only one airport, only a modest speed-up (3.2x) occurs per iteration; however, when the number of airports increases to 35, ALEMMR provides a 13.9x computational acceleration over the sequential implementation. Future research is recommended to explore minimizing processing time by applying more resources and/or a new parallel framework (see Section 5.2.2).


Figure 4-2: Varying the size of training samples for learning the DBN using the Hadoop MapReduce computing framework. (a) Depicts the average number of iterations till convergence using a large set of varied training sample data and hidden/missing nodes. (b) Depicts the scalability of the ALEM algorithm on MapReduce for an increasing data set size where the data size (which scales to over 10⁸) is instead supplemented with the approximate time per iteration for an increasing number of airports. Speed-up of ALEMMR relative to sequential EM is shown ranging from n = 1 airport to n = 35 airports.

4.2.2 Experiment 2: Varying the Measurement Rate

Four different strategic planning prediction horizons were analyzed: 2, 4, 6, and

24 hours. One would expect the length of the prediction horizon to affect the prediction

performance negatively as the horizon grows. The author also researched the impact of

changes in the arrival delay classification thresholds: 0-30, 30-60, 60-90, and >90

minutes. In other words, for a given prediction horizon, the goal is to predict that the delay will fall within a given delay interval. The joint probability distribution is given by Equation (6):

$$P(\mathit{result}_{1:T}) = \prod_{t=1}^{T} \prod_{i=1}^{N} P\left(\mathit{result}_{t}^{i} \mid Pa(\mathit{result}_{t}^{i})\right) \qquad (6)$$

Where:

T is the interval of the prediction horizon

N is the total number of the variables for the extracted model.

This prediction is dynamic, in that it continuously evolves throughout the strategic

prediction horizon as new measurements arrive. After generating many examples, the author

used a test case population containing over one million flight records for the

performance evaluation of the model. Table 4-1 depicts the delay classification

confusion matrix for the optimal prediction horizon (two hours), where the sum of the

highlighted green boxes represents correct classification predictions. Overall, the

predicted flight delay bin was correct approximately 92% of the time (the

sum of the diagonal), which is very encouraging. The remaining 8% were incorrectly

predicted to be delayed in the 0-30 minute delay bin when in actuality 4%, 2%, and 2%

were delayed for 30-60, 60-90, and greater than 90 minutes, respectively. After review,

the majority of these misclassifications were related to a mixture of anomalies in the

aircraft’s lateral path from airport A to airport B and in the departure, airborne, and arrival

delay times.
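
The diagonal-sum arithmetic can be checked from the percentages reported in Table 4-1. The sketch below assumes that the "-" cells in the table are zero; the variable names are illustrative only.

```python
# Rows: actual delay bin; columns: predicted delay bin.
# Values are percentages of all test flights, as reported in Table 4-1.
# Cells shown as "-" in the table are assumed to be 0.
bins = ["0-30", "30-60", "60-90", ">90"]
confusion = [
    [80, 0, 0, 0],
    [4, 7, 0, 0],
    [2, 0, 3, 0],
    [2, 0, 0, 2],
]

# Overall accuracy is the sum of the diagonal.
accuracy = sum(confusion[i][i] for i in range(len(bins)))
misclassified = sum(map(sum, confusion)) - accuracy

# Per-bin recall: share of flights in each actual bin that were predicted correctly.
recall = {b: confusion[i][i] / sum(confusion[i]) for i, b in enumerate(bins)}
```

The per-bin recall makes visible something the overall figure hides: the 0-30 minute bin dominates the population, so the accuracy within the higher delay bins is lower than 92%.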

Table 4-1: The Classification Threshold for Flight Delay Using a Confusion Matrix and Prediction Horizon of 2 Hours

                     Predicted (minutes)
Actual (minutes)     0-30     30-60    60-90    >90
0-30                 80%      -        -        -
30-60                4%       7%       -        -
60-90                2%       -        3%       -
>90                  2%       -        -        2%

4.2.3 Experiment 3: Causal Delay Prediction Results

After identifying the optimal prediction horizon (a two-hour look-ahead

for accurate delay prediction), the author used that information to model performance

for delay causal variables at each phase of flight over the same two-hour horizon. For

brevity, the delay causes were aggregated based on predicted accuracy of causal variables

for each phase of flight. In other words, for a given phase of a flight, the results depicted

reflect the weighted predictive accuracy of all the causal variables within that phase.

Table 4-2 depicts the delay causal variable prediction results for both the DBN model and

a static BN model for the optimal prediction horizon (two hours). As a specific example,

suppose an aircraft is traveling from airport A to airport B and is in the “cruise” phase of

flight as defined previously in Section 3.4. If this aircraft were to encounter a causal

airborne (i.e., cruise) delay factor during this phase of flight, the DBN would

be able to predict the cause of this delay based on end-state flight delay with an accuracy

of 93%, as opposed to 84% for a BN. Overall, prediction results

for causal delays at each phase of flight show not only that the DBN

outperforms the static BN in predicting delay at each phase of flight, but also that, with more data

and the continued fusion of different data sources, the depth and accuracy

of results should improve. The reason that DBNs perform better than BNs ultimately

comes down to the added temporal dimension, which creates granularity in the data and

provides more precise results. If one refers back to Figure 3-4, where the system hierarchy of

the problem domain is discussed, one can now see that the time slices (considered a

“Part” in the systems hierarchy) hold great importance. The best way to think of this

conceptually is to visualize an aircraft going from point A to point B. A BN will

aggregate and tell you that during this route of flight, this aircraft probabilistically

encountered a weather delay. A DBN will slice time into separate parts and provide you

with granular detail, so the same aircraft that encountered a weather delay in a BN may

really show that the principal cause of delay was due to congestion at the departure

airport followed by weather en-route, followed by a hold in the descent phase. The

separate time parts can then be rolled-up to identify the rank ordered list of causal delay.
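
The roll-up of per-slice causes into a rank-ordered list can be sketched as follows. The dissertation does not specify the aggregation rule, so this example assumes that each time slice carries a single most likely cause and an attributed delay in minutes; the tuple layout, cause names, and values are hypothetical.

```python
from collections import defaultdict

def rank_delay_causes(slices):
    """Aggregate delay minutes per cause across time slices and rank them.

    slices: iterable of (phase, cause, delay_minutes) tuples, one per time slice.
    Returns a list of (cause, total_minutes) sorted from largest to smallest.
    """
    totals = defaultdict(float)
    for _phase, cause, minutes in slices:
        totals[cause] += minutes
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical flight: congestion at departure, weather en route, hold in descent.
slices = [
    ("departure", "congestion", 25.0),
    ("cruise", "weather", 10.0),
    ("cruise", "weather", 5.0),
    ("descent", "holding", 8.0),
]
ranking = rank_delay_causes(slices)
```

In this toy case a static summary might report only "weather", while the per-slice view ranks departure congestion first.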

Table 4-2: Dynamic Bayesian Network & Static Bayesian Network Prediction Results for Each Phase of Flight with Prediction Horizon of Two Hours

Variables                                          Bayesian Network    Dynamic Bayesian Network
Causal ground departure delay factors at time i    71%                 82%
Causal airborne delay factors at time i            84%                 93%
Causal ground arrival delay factors at time i      76%                 85%

4.2.3 Experiment 4: Trajectory Route Selection Decision Support System

The previous experiments have shown, with high confidence, that a flight

prediction model developed using DBNs can be utilized for the prediction of both flight

delay and flight delay causal variables over a series of time states. In addition, the

previous experiments have shown that model development can be scaled out to more

airports than prior research has attempted. Specifically, experiment 1 scaled to the

top thirty-five major airports in the NAS. This experiment aimed to integrate results of

the flight delay prediction into a developed real-time trajectory decision support

prediction system that recommends which route an aircraft should fly given both

historical and real-time flight delay information combined with data related to the aircraft

and the external environment (previously discussed in Section 3.4).

Figure 4-3 depicts the data-driven decision support architecture that combines

both the elements of flight delay prediction with the recommended aircraft route

selection. There are five main components in the developed framework:

The Data Processing Model collects and trims both the real-time and historical data, such as threaded track weather data, delay data, flight data, track data, and other subsets of data (defined in Appendix B). This provides both the online parameter estimation and data update components with the required input data.

The Online Parameter Estimation Component estimates data parameters from both historical data and real-time conditions to improve adaptability of the DBN model.

The DBN Model, as the kernel of the framework, calculates aircraft route selection according to the DBN model and the results of the online parameter estimation. The model also sends data to the Data Update Component.

The Data Update Component updates the route and all associated historical data (a priori estimate) with real-time data.

The Results Computation Component outputs flight delay predictions, recommended routes, and the computed trajectories for a given airport arrival-departure pair input using a point-mass mathematical model (footnote 7).

Figure 4-3: Recommended Data-Driven Decision Support Architecture

The structure of the DBN for route selection was devised from the

author’s domain knowledge of modeling with DBNs and from subject matter expert

feedback from primary air traffic actors such as traffic flow managers, pilots, and air

traffic controllers. The DBN model represents an aircraft flying in the NAS at a time

7 Details on the point-mass BADA aircraft performance mathematical model can be reviewed at the

following link: https://www.eurocontrol.int/sites/default/files/field_tabs/content/documents/sesar/bada-revision-atmosphere-model-2010.pdf

granularity of five minutes and is able to infer flight delay time and route selection. The

author chose a time-step of five minutes because this period is short enough

to capture interesting dynamics of an aircraft, but long enough to capture values of the

variables that are sought (e.g., route selection).
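
As a small illustration of the five-minute granularity, the sketch below maps an observation timestamp to the index of the DBN time slice it falls in. The function name and the choice of a reference start time are assumptions made for the example, not details taken from the dissertation.

```python
from datetime import datetime, timezone

SLICE_SECONDS = 5 * 60  # five-minute time step

def slice_index(start, t):
    """Return the zero-based five-minute slice containing time t, counted from start."""
    return int((t - start).total_seconds() // SLICE_SECONDS)

# Hypothetical reference time for the first slice.
start = datetime(2013, 11, 1, 12, 0, tzinfo=timezone.utc)
first = slice_index(start, datetime(2013, 11, 1, 12, 4, 59, tzinfo=timezone.utc))
second = slice_index(start, datetime(2013, 11, 1, 12, 5, 0, tzinfo=timezone.utc))
third = slice_index(start, datetime(2013, 11, 1, 12, 14, 0, tzinfo=timezone.utc))
```

Observations that share a slice index would be treated as evidence for the same time slice of the network.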

Figure 4-4 depicts the lateral path of the recommended routes when scaled to all

the airports used in this study. More specifically, each delay state depicts the

recommended route for aircraft departing from and arriving at all the airports in the study

given the integration of both flight delay data and other data mentioned in Figure 4-3 and

defined in Appendix B. For example, NAS delay state one is a scenario that was run on a

relatively calm day based on the flight delay heat map, which was ultimately derived from

the efforts of the previous experiments. Each of the model-recommended lateral paths

provided for scenario one (and for all scenarios, for that matter) is updated at the

aforementioned five-minute update cycle, with the trajectory computed by a point-mass

mathematical model (footnote 8). On the opposite end of the spectrum, NAS delay state six is a

scenario that was run on a day with medium-high delay, specifically clustered in the

Atlanta Metroplex area. In application, traffic flow managers could use this decision

support system both to understand how the Atlanta delay propagates through the rest of

the NAS and to use the recommended routes provided for the NAS for strategic planning.

8 Details on the point-mass BADA aircraft performance mathematical model can be reviewed at the

following link: https://www.eurocontrol.int/sites/default/files/field_tabs/content/documents/sesar/bada-revision-atmosphere-model-2010.pdf



Figure 4-4: NAS route selection & flight delay prediction lateral trajectory export from

the “Results Computation” component of Data-Driven Decision Support Architecture

4.2.4 Validation & Insight

The Data-Driven Decision Support Architecture was ultimately validated and

verified by the same subject matter experts that aided in the development of the DBN

models. More specifically, since this is a NAS-based recommendation engine, experts

were asked to focus specifically on the model recommended routes provided for the

geographic area they have expertise in. In addition, experts were asked to verify that the

model recommended routes given certain delay predictions (e.g. delay state six example)

seemed reasonable based on their expertise. Of the 15 subject matter experts, 13

judged the recommended routes to be reasonable and provided

feedback that the developed routes could even be used to develop Q routes in the en-route

phase of flight which supports another initiative in the TBO NextGen portfolio (discussed

in Section 1.2). In addition, six experts independently noted that with more testing the

approach developed by this research could be implemented into the TFM environment for

strategic (greater than two hours) and tactical planning (minute-by-minute decisions).

Additional insight from the experts focused on greater development of the external

environment. While the experts recognize that a nearly unlimited number of environmental

factors can affect the fidelity of the model, most agreed that both the

volume and variety of data sources used for this analysis were a big step in the

right direction with regard to data-driven, model-based decision-making in the big data

era.

Chapter 5: Conclusions & Future Work

5.1 Conclusions

In this study, the use of a DBN as the foundation for the development of a

predictive flight delay model was explored. DBNs provide a powerful means for

prediction and determining trade-offs (e.g. what-if scenarios) for managing flight time

delay in the presence of uncertainty. Making these trade-offs effectively is an essential

part of establishing decision support tools for the NextGen environment. The DBN’s

ability to learn, classify, and predict parameters with high accuracy using a novel

data-driven framework encourages the author to continue this research to eventually obtain

the intended end state: a data-driven decision support system that can be used by ATDMs

to provide the optimal (best) or recommended (from past history) operational decisions

for both strategic and tactical prediction horizons. The author also believes the novel

computing strategy will find utility in the investigation of NAS system management

policies to manage flight time delay and aircraft route allocation for realistically sized

flight route networks, since it is more accurate, efficient, and extensible than the

approaches of prior research.

5.2 Recommendations for Future Work

5.2.1 Air Traffic DBN Application

One focus for future research would be to continue testing both the properties and

parameters of the current DBN model and continue identifying more applications such as

when a TFMI should be initiated by traffic flow management. Specifically, the flight

delay model structure could be expanded to recommend when a TFMI should be

implemented since currently, strategic decisions by traffic flow managers largely rely on

non-optimal methods such as operational experience and tacit knowledge when deciding

on which TFMIs to implement for delay alleviation. An optimization method should then

be developed from the observed probabilities to provide the best decision or set of

decisions to the air traffic decision maker. One way this can be achieved is by identifying

the control law by which the decision support automation will take action in response to

variation from previous historical data. For example, given that the DBN model suggests

a TFMI should be implemented, future research could identify the trade-offs to be made

between early and ongoing intervention to compensate for delay propagation effects at

airport city pairs.

5.2.2 Research Other Parallel Computing Frameworks

Additional research should focus on the integration of this model and associated

data sources to perform analysis on an increasing number of airport sites using the

computational benefits of the Hadoop MapReduce computational framework. For real-

time decision support, new computational paradigms such as UC Berkeley’s Spark [60]

should be explored. Spark is an open source cluster computing system that provides

primitives for in-memory cluster computing, which can make loading data into

memory and querying it substantially faster.

References

[1] Joint Planning and Development Office, "JPDO Trajectory-Based Operations (TBO) study team report," Washington D.C, 2011., in press.

[2] FAA, "National Airspace System: System Engineering Manual," Air Traffic Organization, Washington, D.C., 2006.

[3] B. Musialek, C. Munafo, R. Hollis and M. Paglione, "Literature Survey of Trajectory Predictor Technology," Federal Aviation Administration, Atlantic City, 2010.

[4] T. Gaydos, W. Kirkman, S. Shresta, E. Blair and J. Kuchenbrod, "Measures Variability and Uncertainty in Flight Operations," in Integrated Communication, Navigation, and Surveillance (ICNS), Herndon, VA, 2012.

[5] S. Mondoloni, "Aircraft trajectory prediction errors: including a summary of error sources and data- FAA/Eurocontrol action plan 16 common trajectory prediction capabilities," CSSI, INC, 2006., in press.

[6] Trajectory Computation Infrastructure Based on BADA Aircraft Performance Model; Gallo, Eduardo ; Lopez-Leonies, Javies ; Vilalplana, Miguel A.; Navarro, Francisco A. ; Boeing Research & Technology Europe, Technishe Universita Munchen; 39173

[7] Preliminary Results of a Robust Trajectory Prediction Method Using Advanced Flight Data; Dupuy, Dominique Marie; Porretta, Marco ; Center for Transport Studies Imperial College London; 39173

[8] Objective Function for 4D Trajectory Optimization in Trajectory Based Operations; Pleter, Octavian Thor; Constantinescu, Cristian Emil; Stefaneson, Irina Beatrice ; University Politechnica of Bucharest, Romanian Space Agency ; Aug.2009

[9] A Model to 4D Descent Trajectory Guidance; Rodriguez , Jose Miguel Canino; Deniz, Luis Gomez ; Herrero, Jesus Garcia; Portas, Juan Besada; Corredera, Jose Ramon ; Signal and Communications Department and Electronic Engineering Department, Universidad de Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain, Computer Science Department, Universidad Carlos III, Madrid, Spain , Signal, System and Radiocommunication Department, Universidad Politécnica de Madrid, Madrid, Spain; 39173

[10] Performances and Sensitivities of Optimal Trajectory Generation for Air Traffic Control Automation; Wu, Di ; Zhan, Yiyuan J. ; University of Minnesota; Aug.2009

[11] 3D Conflict Resolution of Multiple Aircraft via Dynamic Optimization; Raghunathan , Arvind U.; Gopal, Vipin ; Subramanian, Dharmashankar ; Biegler, Lorenz T.; Samad, Tariq ; Carnegie Mellon University, Pittsburgh, PA, Honeywell International, Minneapolis, MN; May 2003

[12] A Quaternion-based inverse Dynamics Model for Real-time UAV Trajectory Generation; Drury, Rick G.; Whidborne, James F. ; Cranfield University; Aug.2009

[13] Improved Ground Trajectory Prediction by Multi-Aircraft Track Fusion for Air Traffic Control; Lymperopoules, Loannis ; Lygeros, John ; Swiss Federal Institute of Technology Zurich; Aug.2009

[14] A Holding Function for Conflict Probe Applications; McNally, Dave ; Walton, Joe ; NASA Ames Research Center, University of California; Aug 2004

[15] Intent Inference and Strategic Path Prediction; Krozel, Jimmy Ph.D ; Andrisani II, Dominick Ph.D ; Metron Aviation Inc., Purdue University; Aug.2005

[16] On-Line Trajectory Optimization for Autonomous Air Vehicles; Twigg , Shannon ; Calise, Anthony Calise ; Johnson, Eric ; Georgia Institute of Technology ; 37834

[17] Utilizing RNAV Avionics Testing Lateral Offset Procedures; Herndon , Alert A.; Williams, Jeffrey T.; Vaughn, William ; DeArmon, James ; Duquette, Michelle ; Formosa, Jeffrey ; Jarvis, Edwin ; Spellman, Joseph ; The MITRE Corporation, Continental Airlines, FAA ; Oct. 2003

[18] Kinematics-Based model for Stochastic Simulation of Aircraft Operating in the National Airspace System; McGovern, Seamus M.; Cohen, Seth B.; Truong, Minh ; Farley, Gerard ; US DOT National Transportation Systems Center, EG&G Technical Services ; 39173

[19] Target Tracking and Estimated Time of Arrival (ETA) Prediction for Arrival Aircraft; Roy, Kaushik; Levy, Benjamin; Tomlin, Claire J.; Aug. 2006

[20] Flight-Mode-Based Aircraft Conflict Detection using a Residual-Mean Interacting Multiple Model Algorithm ; Hwang , Inseok ; Hwang, Jesse ; Tomlin, Claire ; Stanford University; Aug 2003

[21] Robust Nonlinear LASSO Control: A New Approach for Autonomous Trajectory Tracking; Boyle, David P.; Chamitoff, Gregory E. ; Ball Aerospace Australia, NASA; Aug 2003

[22] C. P. Tino, L. Ren and J.P. B. Clarke, "Wind Forecast Error and Trajectory Prediction for En-route Scheduling," in AIAA-GNC, Chicago, IL, 2009.

[23] S. Mondoloni and D. Liang, "Improving Trajectory Forecasting Through Adaptive Filtering Techniques," in 5th USA/Europe ATM R & D Seminar, Budapest, Hungary, 2003.

[24] T. Rentas, S. M. Green and K. Cate, "Survey and Method for Determination of Trajectory Predictor Requirements," National Aeronautics and Space Administration, 2009.

[25] M. Hansen, "Delay and flight time normalization procedures for major airports: LAX case study," National Center of Excellence for Aviation Operations Research, Berkeley, CA, 2001.

[26] M. Abdel-Atl, C. Lee and B. Y.Q., "Detecting periodic patterns of arrival delay," Journal of Air Transport Management, vol. 13, no. 6, pp. 355-361, 2007., in press.

[27] W. Vigneau, "Flight Delay Propagation, Synthesis of the Study," EUROCONTROL, EEC Note No 18/03, 2003.

[28] M. Janic, "Modeling the Large Scale Disruptions of an Airline Network," Journal of Transportation Engineering, pp. 249-260, April 2005.

[29] D. Dai and J. Liou, "Delay Prediction Models for Departure Flights," Journal of the Transportation Research Board, Vols. CR-ROM, 2006.

[30] R. Jehlen, A. Klein, B. Sridhar and Y. Wang, "Modeling Flight Delays and Cancellations at the National, Regional and Airport Levels in the United States," in Eighth USA/Europe Air Traffic Management Research and Development Seminar, Napa, California, 2009.

[31] Neogi, N.A.; Naseri, A., "Using Hidden Markov Models to Detect Mode Changes in Aircraft Flight Data for Conflict Resolution," Systems, Man and Cybernetics, 2006. SMC '06. IEEE International Conference on , vol.5, no., pp.3732,3737, 8-11 Oct. 2006

[32] M. J. Russell and J. A. Bilmes. Introduction to the special issue on new computational paradigms for acoustic modeling in speech recognition. Computer speech and language, 17:107–112, April 2003.

[33] G. Welch and G. Bishop. An introduction to the Kalman filter. Technical Report 95-041, University of North Carolina at Chapel Hill, Department of computer science, Chapel Hill, NC, USA, April 2004.

[34] R. E. Kalman. A new approach to linear filtering and prediction problems. Transactions of the ASME - journal of basic engineering, 83:35–45, 1960.

[35] Y Wang, M Papageorgiou, A Messmer, P. Coppola, A. Tzimitsi, A. Nuzzolo, “An Adaptive Freeway Traffic State Estimator”, Automatica, vol.45, no.1, pp. 10-24, 2009. doi: 10.1016/j.automatica.2008.05.019

[36] Kirubarajan, T., and Y. Bar-Shalom. "Kalman filter versus IMM estimator: when do we need the latter?." Aerospace and Electronic Systems, IEEE Transactions on 39.4 (2003): 1452-1457.

[37] Singer, Robert A., and Kenneth W. Behnke. "Real-time tracking filter evaluation and selection for tactical applications." Aerospace and Electronic Systems, IEEE Transactions on 1 (1971): 100-110.

[38] J. W. Pepper, K. R. Mills and L. A. Wojcik, "Predictability and uncertainty in air traffic flow management," in 5th USA/Europe Air Traffic Management R&D Seminar (ATM-2003), Metrics and Performance Management, Budapest Hungary, 2003., in press.

[39] N. e. a. Xu, "Estimation of Delay Propagation in the National Aviation System Using Bayesian Networks," in 6th USA-Europe ATM Seminar, 2005.

[40] L. Yu-jie and M. Song, "Flight Delay and Delay Propagation Analysis Based on Bayesian Network," in Knowledge Acquisition and Modeling, 2008, Wuhan, 2008.

[41] T. Dean and K. Kanazawa. Probabilistic temporal reasoning. In Proceedings of the 7th national conference on artificial intelligence (AAAI-88), pages 524–529, St Paul, MN, USA, August 1988. MIT Press.

[42] Kim, Sun Yong, Seiya Imoto, and Satoru Miyano. "Inferring gene networks from time series microarray data using dynamic Bayesian networks." Briefings in bioinformatics 4.3 (2003): 228-235.

[43] van Gerven, Marcel AJ, Babs G. Taal, and Peter JF Lucas. "Dynamic Bayesian networks as prognostic models for clinical patient management." Journal of biomedical informatics 41.4 (2008): 515-529.

[44] Langmead, Christopher J. "Generalized queries and Bayesian statistical model checking in dynamic Bayesian networks: Application to personalized medicine." (2009): 201.

[45] FAA, "Aviation System Performance Metrics (ASPM)," 01 2012. [Online]. Available: aspmhelp.faa.gov. [Accessed 1 November 2013].

[46] B. R., R. Hsu, L. Berry and J. Rome, "Preliminary evaluations of flight delay propagation through an airline schedule," in Proceedings of the 2nd USA/Europe air traffic management R&D seminar, Orlando, FL, 1998., in press.

[47] N. Rupp, "Further investigations into the causes of flight delays," Department of Economy, East Carolina University, East Carolina, NC, 2007., in press.

[48] Jensen, Finn V. An introduction to Bayesian networks. Vol. 210. London: UCL press, 1996.

[49] K. B. Laskey, N. Xu and C.-H. Chen, "Propagation of delays in the national airspace system," in Proceedings of the Twenty-Second Annual Conference on Uncertainty in Artificial Intelligence, Arlington, VA, 2006.

[50] Wang, Paul TR, Lisa A. Schaefer, and Leonard A. Wojcik. "Flight connections and their impacts on delay propagation." Digital Avionics Systems Conference, 2003. DASC'03. The 22nd. Vol. 1. IEEE, 2003.

[51] A. C. Eckstein, C. Kurcz and M. O. Silva, "Threaded Track: geospatial data fusion for aircraft flight trajectories," The MITRE Corporation, Mclean,VA, 2012.

[52] A.C. Eckstein, J. Heidrich, “Analysis of aircraft performance data for procedural and operational performance” The MITRE Corporation, Mclean, VA 2010.

[53] K. P. Murphy, "Dynamic Bayesian networks: representation, inference and learning.," (Doctoral dissertation, University of California), Berkeley, 2002.

[54] C. Berzuini, "Representing time in causal probabilistic networks," Uncertainty in artificial intelligence, vol. 5, no. Elsevier Science Publisher B.V, pp. 15-28, 1990., in press.

[55] W. Buntine, "Operations for learning with graphical models," Journal of Artificial Intelligence Research, vol. 2, pp. 159-225, 1994.

[56] A. Saluja, P. K. Sundararajan and O. J. Mengshoel, "Age-layered expectation maximization for parameter learning in Bayesian networks.," in Proceedings of Artificial Intelligence and Statistics (AIStats), La Palma, Canary Islands, 2012.

[57] J. Dean and S. Ghemawat, "MapReduce: simplified data processing on large clusters," Communications of the ACM, vol. 51, pp. 107-113, 2008.

[58] G. S. Hornby, "ALPS: The age-layered population structure for reducing the problem of premature convergence," in 8th annual conference on Genetic and evolutionary computation, 2006.

[59] E. B. Reed and O. J. Mengshoel, "Scaling Bayesian Network Parameter Learning with Expectation Maximization using MapReduce," in Proc. of Big Learning: Algorithms, Systems and Tools, 2012.

[60] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker and I. Stoica, "Spark: cluster computing with working sets," in HotCloud 2010, Boston, MA, 2010.

Appendix A: Data-Fused Algorithms

The algorithms used in this data fusion and analysis are in a variety of software

languages, but are tied together in their ability to be described in a map-reduce

framework on Hadoop and driven by Oozie workflows.

Data Sources

The data-fused algorithms were defined on a data model used to construct an Apache

Oozie workflow for automated processing. External data blocks are used to define data

sources external to the Hadoop Distributed File System or not part of this system. The

majority of the algorithms stem from the Threaded Track as the trajectory source input.

Algorithms

Each algorithm may take inputs from one or more data elements and output one or more

data elements. However, algorithms must interface through physical data objects. This is

because Oozie workflows are built around each algorithm (or set of algorithms) as black

boxes with the input/output elements shown. This allows algorithms to be coded in

MATLAB, Java, Pig, Python, etc. and driven by the Oozie workflow.

Figure A-1: Threaded Track Data Fused Algorithm Oozie Workflow

Data-Fused Algorithms Descriptions

The algorithms described below depict the detail behind Figure A-1. For example, the

first fusion algorithm discussed, Trajectory Fusion, discusses the input and output

considerations used in the development process. It is the development of these fusion

algorithms that allowed for the development of the data schema shown in Appendix B,

and ultimately allowed for the research completed in this dissertation.

Trajectory Fusion

The trajectory fusion algorithm is the Threaded Track. This algorithm is used to fuse all

available radar data into a single synthetic trajectory with the highest fidelity coverage

throughout the flight envelope.

Input Considerations

Source quality may vary greatly between ASDEX, NOP Tracon, NOP Center, and ETMS

Output Considerations

Accuracy is highly source dependent. Users should check the active sensors field

which provides the contributing sensors at each point.

Points where the active sensor starts with "ETMS" are purely spliced from

ETMS_TZ messages. They have not been smoothed and will not contain several

parameters (climb gradient, accelerations, etc.).

The flight table provides traceability to the source data using the list of

nop/asdex/etms segment ids.

Flight arrival/departure fields are purely based on metadata information (no

trajectory information).

Not every flight will have an ETMS flight ID, but every ETMS flight is contained

within the threaded track.

The first 8 numbers in the flight ID correspond to the <yyyymmdd> date of the

first track point. Flights are partitioned into files based on this date.

All times are given in UTC.

Output points are synthetic and do not correspond to a single filtered radar hit.
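
The date convention for flight IDs lends itself to a short sketch. It assumes only what is stated above (the first eight characters of the ID are the yyyymmdd date of the first track point, in UTC); the example ID and the helper names are hypothetical, and the real ID format beyond the first eight digits is not known here.

```python
from collections import defaultdict
from datetime import date, datetime

def flight_date(flight_id):
    """Parse the <yyyymmdd> prefix of a Threaded Track flight ID into a date."""
    return datetime.strptime(flight_id[:8], "%Y%m%d").date()

def partition_by_date(flight_ids):
    """Group flight IDs by the date of their first track point."""
    partitions = defaultdict(list)
    for fid in flight_ids:
        partitions[flight_date(fid)].append(fid)
    return dict(partitions)

# Hypothetical flight IDs; only the eight-digit date prefix follows the stated convention.
ids = ["2013110100012345", "2013110100067890", "2013110200000001"]
parts = partition_by_date(ids)
```

Grouping by this prefix mirrors the stated rule that flights are partitioned into files by the date of their first track point.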

RUC Fusion

RUC fusion is the process by which weather provided by the Rapid Update Cycle (RUC) data is

fused to every track point in the Threaded Track. The key goal is to provide winds,

temperatures, etc., directly by interpolating variables from the RUC grid. This also

allows for calculating derived parameters such as airspeeds and Mach numbers.


RUC data from either the isobaric model or hybrid model may be used. These RUC

models provide distinct output data variables as well as different calculations.

RUC data coverage does not include Alaska or Hawaii.

The vertical interpolation is dependent on measurements of reference pressure

below FL180 (not provided in our radar data), which are estimated from

ASOS. If ASOS data is not available, standard reference pressure is assumed.

Threaded track times, positions, and altitudes are used to interpolate values from

the RUC 4F grid.

Threaded track ground speeds are used with the associated RUC variables to

calculate all airspeeds.


The RUC laterally/vertically extrapolated parameter can be used to identify when

measurements have been extrapolated outside the bounds of the RUC grid.

The ASOS snr field identifies the "signal strength" of the reference pressure

measurements on a 0-1 scale. When this field is "0", a standard reference

pressure is used. When this field is "1" it indicates that it is close to the ASOS

ground sensors.

Indicated airspeed and calibrated airspeed are taken to be equivalent.
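
The derivation of airspeed and Mach number from ground speed and interpolated weather can be sketched as below. The dissertation does not list its formulas, so this is an assumption-laden illustration using the standard wind-triangle relation (air velocity equals ground velocity minus wind velocity) and the ideal-gas speed of sound. Function names, parameter names, and the sample values are hypothetical; vertical speed is ignored, and all speeds are in m/s.

```python
import math

GAMMA = 1.4     # ratio of specific heats for dry air
R_AIR = 287.05  # specific gas constant for dry air, J/(kg*K)

def true_airspeed(ground_speed, track_deg, wind_u, wind_v):
    """True airspeed from ground speed, ground track, and wind components.

    track_deg is the ground track in degrees clockwise from north.
    wind_u and wind_v are the eastward and northward wind components.
    """
    track = math.radians(track_deg)
    ground_east = ground_speed * math.sin(track)
    ground_north = ground_speed * math.cos(track)
    return math.hypot(ground_east - wind_u, ground_north - wind_v)

def mach_number(tas, temperature_k):
    """Mach number from true airspeed and static air temperature in kelvin."""
    return tas / math.sqrt(GAMMA * R_AIR * temperature_k)

# Hypothetical point: eastbound at 250 m/s over the ground with a 50 m/s tailwind.
tas = true_airspeed(250.0, 90.0, 50.0, 0.0)
mach = mach_number(tas, 216.65)
```

With a pure tailwind the true airspeed is the ground speed minus the wind speed, which is an easy sanity check on the sign conventions.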

Terrain Fusion

The Terrain Fusion algorithm is the process by which terrain elevation data sources are

used to identify the terrain elevation at each threaded track data point. This process also

runs after the RUC Fusion, which allows a computation of height above terrain by

examining the difference in the RUC derived geometric altitude and the terrain elevation.

Phases of Flight

The Phases of Flight algorithm is the process by which the Threaded Track is segmented

into generic flight envelope phases. One of the key components of this is distinguishing

surface points (with ASDEX) from in-flight points. The second component is to break

in-flight sections into a single sequence of ascent, cruise, and descent.

This algorithm is typically the basis of most post-analysis since algorithms are generally

focused on one or more of these phases.


Air/Ground phases are identified purely from the threaded track ground speed

profile

Start of Cruise and top of descent are identified purely from the threaded track

pressure altitude profile


Phases are broken into their basic components: ground-takeoff-ascent-cruise-

descent-landing-ground (GTACDLG)

The sequence of these phases can be very useful for detecting merged and split

flights. When finding idiosyncrasies on other algorithms, it is recommended to

use the phase sequence as a quick check to better characterize the data.

Takeoff and Landing points may not precisely correspond to known physical

runway locations in all instances (this is a statistical process). The runway

locations are specifically excluded to prevent biasing results (by clipping the tails

of the distributions). Also, the altitudes may not have an exact correspondence

to these points, and can be very deceiving (especially for ASDEX data).

The ascent and descent phases are defined in the context of the global flight

envelope. Each of these phases may contain portions of flight with a positive,

negative, or neutral climb rate.

Top of descent is a fairly subjective measure, but in this case it is intended to be

identified in the ATC context (rather than the pilot context): the point after which all

subsequent vertical maneuvers are directed toward moving the aircraft toward the

approach, and before which all vertical maneuvers are merely for the purposes

of en route separation and spacing.
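
The phase-sequence check recommended above can be sketched as follows. It assumes each track point carries a single-letter phase label drawn from the GTACDLG abbreviation; the per-point labeling scheme and the function names are hypothetical.

```python
from itertools import groupby

EXPECTED_SEQUENCE = "GTACDLG"  # ground-takeoff-ascent-cruise-descent-landing-ground

def phase_sequence(labels):
    """Collapse runs of identical per-point phase labels into one sequence string."""
    return "".join(key for key, _run in groupby(labels))

def needs_review(labels):
    """Flag a flight whose phase sequence differs from the expected one."""
    return phase_sequence(labels) != EXPECTED_SEQUENCE

# Hypothetical label streams: a normal flight and two flights merged into one record.
normal = list("GGTAAACCCDDLGG")
merged = list("GTACDLGTACDLG")
```

A flag from this check only indicates that the record deserves a closer look; partial flights (for example, coverage that starts in cruise) would also be flagged.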

Lateral Taxonomy

The Lateral Taxonomy algorithm is the process by which the Threaded Track is

segmented into groups of lat/lon points which can be described as straight / turn

segments. Points on the ground (identified in Phases of Flight) are assigned a "ground"

type and ETMS points (threaded track source is ETMS) are assigned an "etms" type, since

they are segmented with a Douglas-Peucker algorithm using great circle distances,

whereas the standard segmentation algorithm is a hybrid least squares algorithm using great

circles (straight) and small circles (turns). One of the main benefits is that segmentation

yields a simplified track that should contain less variance, so any

individual track point with moderately higher deviation won't appear in the segmentation.


Based on threaded track latitude/longitude measurements


Ground and ETMS segments are determined purely from a Douglas-Peucker algorithm using great circles.

Straight, Left, and Right turn segments are determined from a combined Douglas-Peucker / least squares algorithm designed to optimize the estimate around turns.

A single turn maneuver may be broken into several turn segments based on

changes in the apparent radius of the turn. This can be very common in turns to

final.

Straight segments will always provide the center point to the right of the segment.
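The Douglas-Peucker step used for ground and ETMS points can be illustrated with the sketch below. It is a generic textbook version under stated assumptions, not the code used to produce the data: the deviation is measured to the full great circle through the two end points (not only the arc between them), the Earth radius constant and the tolerance are assumed values, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles (assumed constant)


def _unit(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a unit vector."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))


def cross_track_nm(p, a, b):
    """Distance (NM) from point p to the great circle through a and b."""
    vp, va, vb = _unit(*p), _unit(*a), _unit(*b)
    # Normal vector of the plane containing the great circle.
    n = (va[1] * vb[2] - va[2] * vb[1],
         va[2] * vb[0] - va[0] * vb[2],
         va[0] * vb[1] - va[1] * vb[0])
    norm = math.sqrt(sum(c * c for c in n))
    if norm < 1e-12:
        # a and b coincide: fall back to the distance from p to a.
        dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(vp, va))))
        return EARTH_RADIUS_NM * math.acos(dot)
    sin_xt = sum(x * y for x, y in zip(vp, n)) / norm
    return EARTH_RADIUS_NM * abs(math.asin(max(-1.0, min(1.0, sin_xt))))


def douglas_peucker(points, tolerance_nm):
    """Simplify a list of (lat, lon) points, keeping deviations within tolerance."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    worst_index, worst_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = cross_track_nm(points[i], first, last)
        if d > worst_dist:
            worst_index, worst_dist = i, d
    if worst_dist <= tolerance_nm:
        return [first, last]
    # Split at the point of largest deviation and simplify each half.
    left = douglas_peucker(points[:worst_index + 1], tolerance_nm)
    right = douglas_peucker(points[worst_index:], tolerance_nm)
    return left[:-1] + right
```

Points that lie within the tolerance of the great circle joining their neighbors are dropped, which is why a single track point with a moderately higher deviation does not appear in the segmented output.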


Vertical Taxonomy

The Vertical Taxonomy algorithm is the process by which the Threaded Track is

segmented vertically into linear segments where the aircraft is ascending / level /

descending. This is the primary source for level-off computation metrics. Segments can

be divided into either constant climb gradient or constant climb rate depending upon the

specific goals of the analyst. Furthermore, the use of pressure altitude versus geometric

altitude should be considered. The algorithm relies on the use of linear least squares

segmentation.


Based on threaded track pressure altitude trajectory.

Segmentation is based on segments of constant climb gradient (not constant climb

rate).


A vertical segment is assigned to be "level" ("L") based on a threshold of the

climb gradient as well as the vertical altitude change over the duration of the

segment.

Level segments may still have a small altitude change.

Because of the high levels of quantization (relative to noise), the algorithm performs better at steep climb gradients than at shallow ones. This means that the error may be highest for near-level flight (as high as the Mode C quantization), but will decrease with increasing climb/descent gradient.

There is no restriction on minimum/maximum segment length.

Segments are always split when the phase of flight changes (e.g. top of descent, touchdown, etc.).
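The classification of a vertical segment can be sketched as below. This is an illustration only: the two threshold values, the rule that both thresholds must be satisfied for a segment to be "level", and the function names are assumptions and are not taken from the algorithm itself.

```python
def least_squares_slope(x, y):
    """Slope of the ordinary least squares line through the points (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def classify_vertical_segment(dist_nm, alt_ft,
                              max_level_gradient=50.0,   # ft/NM, assumed threshold
                              max_level_change=200.0):   # ft, assumed threshold
    """Return (type, gradient) where type is 'A', 'D', or 'L'."""
    gradient = least_squares_slope(dist_nm, alt_ft)        # constant climb gradient
    change = gradient * (dist_nm[-1] - dist_nm[0])         # fitted altitude change
    if abs(gradient) <= max_level_gradient and abs(change) <= max_level_change:
        return "L", gradient
    return ("A" if gradient > 0 else "D"), gradient
```

Note that a segment labeled "L" under this rule can still have a small non-zero fitted altitude change, consistent with the remark above.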

Speed Taxonomy

The Speed Taxonomy algorithm is the process by which the Threaded Track is

segmented temporally into linear segments where the aircraft is accelerating / constant

speed / decelerating. This process could be applied to ground speed, true airspeed,

indicated airspeed, or Mach number depending on the application. The algorithm relies

on the use of linear least squares segmentation.
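One common way to perform linear least squares segmentation is a top-down split at the point of largest residual, sketched below for a speed profile. The splitting strategy, the tolerance, the acceleration threshold, and all names are assumptions made for illustration; the source does not specify how its segmentation is carried out.

```python
def fit_line(t, v):
    """Least squares line v = slope * t + intercept."""
    n = len(t)
    mean_t = sum(t) / n
    mean_v = sum(v) / n
    stt = sum((ti - mean_t) ** 2 for ti in t)
    if stt == 0:
        return 0.0, mean_v
    slope = sum((ti - mean_t) * (vi - mean_v) for ti, vi in zip(t, v)) / stt
    return slope, mean_v - slope * mean_t


def segment(t, v, tol, lo=0, hi=None):
    """Split [lo, hi] into linear pieces; returns (start, end, slope) tuples."""
    if hi is None:
        hi = len(t) - 1
    slope, intercept = fit_line(t[lo:hi + 1], v[lo:hi + 1])
    residuals = [abs(v[i] - (slope * t[i] + intercept)) for i in range(lo, hi + 1)]
    worst = lo + max(range(len(residuals)), key=residuals.__getitem__)
    if residuals[worst - lo] <= tol or hi - lo < 2:
        return [(lo, hi, slope)]
    # Keep the split strictly inside the interval so both halves shrink.
    split = min(max(worst, lo + 1), hi - 1)
    return segment(t, v, tol, lo, split) + segment(t, v, tol, split, hi)


def classify_speed(slope, threshold=1.0):
    """Label a segment as accelerating (A), decelerating (D), or constant (C)."""
    if slope > threshold:
        return "A"
    if slope < -threshold:
        return "D"
    return "C"
```

For a profile that accelerates and then holds speed, such as times [0, 1, 2, 3, 4, 5, 6] with speeds [100, 110, 120, 130, 130, 130, 130] and a tolerance of 1, this sketch yields one accelerating segment followed by one constant-speed segment.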

Runway Assignment

The Runway Assignment algorithm attempts to assign both arrival and departure runways as well as refine the airport assignment in the Threaded Track flight table.

The algorithm generally considers the trajectory in terms of distance, heading, lateral

deviation, and altitude relative to a given airport/runway. Any available arrival and

departure fields from the flight plan are used to increase the weights toward those

assignments. If there is not enough information to make an association with an airport or

runway, the field may be null.


The threaded track trajectory (latitude, longitude, altitude, and heading) is used to

compute a scoring function against a particular airport/runway based on

proximity.

The threaded track arrival and departure airports are used (when not null) as an increased weight in the scoring function to favor these airports. The magnitude of the increase is source dependent (ETMS, Center, Tracon, etc.).

The phases of flight are used to remove ground points, preventing points on taxiways and other surface patterns from interfering with the scoring function.

If a merged flight occurs, the runway is assigned based on the longest in-flight segment identified in the phases of flight.


If the assigned airport/runway is null, then no reasonable assignment could be

made with any statistical confidence.

If the assigned airport is not null, but the respective airport probability is "-1",

then the assignment was based purely on the threaded track airport assignment,

and the trajectory information was not able to confirm or deny the assignment.

The airport IDs are mapped to the ICAO identifier when possible and use the

FAA identifier when no ICAO identifier exists (this may be different than the

threaded track airport ID for the same facility).

Probabilities are expected to be very source dependent (e.g. an ASDEX arrival might have a substantially higher score than an ETMS-only arrival).

Alternate airport/runway assignments provide a second best guess when there are

multiple reasonable scores (e.g. closely spaced parallel runways). The alternate

odds give the ratio of the alternate score to the primary score.

The selected trajectory point for the runway scoring function is also provided in

the output.
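The relationship between the primary assignment, the alternate assignment, and the alternate odds can be sketched as follows. The scoring function itself is not reproduced here; the sketch assumes the scores have already been computed, and the function name and the `min_score` cutoff are hypothetical.

```python
def pick_runway(scores, min_score=0.0):
    """Select primary and alternate runways from a {runway: score} mapping.

    Returns (primary, alternate, alternate_odds), where alternate_odds is the
    ratio of the alternate score to the primary score. All three are None when
    no runway scores above `min_score` (an assumed cutoff).
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if not ranked or ranked[0][1] <= min_score:
        return None, None, None          # no reasonable assignment
    primary, best_score = ranked[0]
    if len(ranked) > 1:
        alternate, alternate_score = ranked[1]
        return primary, alternate, alternate_score / best_score
    return primary, None, None
```

With closely spaced parallel runways, for instance scores of 0.8 for 28L and 0.6 for 28R, the primary would be 28L, the alternate 28R, and the alternate odds approximately 0.75.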


Missed Assignment

The Missed Approach algorithm attempts to identify both go-arounds and missed

approaches for a flight (may be assigned one or multiple). The algorithm is primarily

guided by the altitude profile provided by the vertical taxonomy but will also consider the

lateral position relative to an airport. When an event is detected, the algorithm will

attempt to assign an approach runway associated with the event.


Based purely on vertical profile and proximity to airport. No voice commands are

used in this estimate.


Each flight may have none, one, or multiple missed segments assigned.

These segments may occur for missed approaches, go arounds, test flights,

training flights, etc.

Consider filtering this candidate list to the desired output. Several factors to

consider might be: commercial vs general aviation, number of missed approach

segments, same departure/arrival airport, non-standard phases of flight sequence,

maximum flight altitude, etc.


Procedure Assignment

The Procedure Assignment algorithm provides a series of metrics in which the

conformance of the Lateral Taxonomy is measured against ground track (fix to fix over

specific ground path) procedure legs from JEPPESEN. Procedures filed or amended in

the flight plan are given special consideration in this measurement of conformance.

Equipage does not affect the assignment, but may be correlated separately.


The lateral trajectory segments are used to measure conformance against

individual legs. No comparison is made to individual trajectory points.

Routes from the ETMS_RT table are used to provide context in the flight plan.

Only fix-to-fix procedure legs with a single ground path are considered (since

conformance is ill-defined for other leg types).

No voice clearance information is used in these algorithms - assignments are

based purely on flight plans and observed conformance.

Vertical and speed conformance are not required to be assigned to a procedure.


Leg Segments are computed when there is a minimal likeness between a lateral

track segment and a procedure leg. Not every leg segment is used to compute a

procedure assignment. This allows for users with a more relaxed definition of

conformance to still utilize this table with their own definition.

Procedure segments are computed when specific standards of conformance are met from the leg segments. These factors include measured deviations from the leg and whether the leg segment is an overlay of another leg segment on a flight plan filed procedure.

Only a single procedure can be assigned for each of the SID, STAR, and approach. Multiple procedures may occur for the en route procedure, but not concurrently (e.g. en route overlay).

If no candidate procedure exists, or there is no clear best assignment from

multiple candidates, then no procedure will be assigned.

Procedure assignment has minimal requirements on distance flown, but will also

provide conformance distances for users to filter as they choose.


Holding Assignment

The Holding Assignment algorithm searches aircraft flight paths for loops and evaluates

detected looping patterns against geometric criteria to compute a metric representing the

confidence that the given pattern represents aircraft holding. Military flights, FAA check

flights, and flights that start and end at the same airport are excluded from consideration.


Lateral Segments are used to determine the path of flight. Extremely short

segments are combined to avoid spurious loop detection (intersection of a path

with itself).

Threaded Track Flights are used to obtain origin and destination airports and to

obtain the call sign, which is used to identify military and FAA check flights.


Lateral segments that were combined for purposes of computation are restored for

purposes of reporting the set of lateral segments defining a holding segment.

A lateral segment entering or exiting a holding pattern may be truncated, so that

only part of a lateral segment is considered part of the holding segment, if the

lateral segment extends beyond the region defined by the hold's looping pattern.

A holding segment may include lateral segments, or portions of lateral segments,

that are not part of the loop or series of loops used to detect the hold, if the

segments are close to the region enclosed by the loop.
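Loop detection as the intersection of a path with itself can be illustrated with a simple planar test. The sketch treats (longitude, latitude) pairs as planar coordinates, checks only proper crossings, and uses a brute-force pairwise search; these simplifications and the function names are assumptions, and the geometric criteria and confidence metric of the holding algorithm are not reproduced.

```python
def _orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_cross(p1, p2, p3, p4):
    """True if segment p1-p2 properly crosses segment p3-p4."""
    d1 = _orient(p3, p4, p1)
    d2 = _orient(p3, p4, p2)
    d3 = _orient(p1, p2, p3)
    d4 = _orient(p1, p2, p4)
    return d1 * d2 < 0 and d3 * d4 < 0


def find_loops(path):
    """Return index pairs (i, j) where path segment i crosses path segment j.

    Segment k joins path[k] and path[k + 1]. Adjacent segments share an
    end point and are skipped.
    """
    loops = []
    for i in range(len(path) - 1):
        for j in range(i + 2, len(path) - 1):
            if segments_cross(path[i], path[i + 1], path[j], path[j + 1]):
                loops.append((i, j))
    return loops
```

Each returned pair marks a candidate loop between segments i and j, which would then be evaluated against the geometric criteria described above.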


Fuel Burn

The BADA 3.9 model is used to compute instantaneous fuel flow using the International Standard Atmosphere, with the total energy model enabled and a mass bleed computation that uses the BADA nominal aircraft mass as the initial mass. The following filters are applied:

The aircraft is ascending, descending, or in the cruise phase as specified in the vertical segment phase of flight field (see Phases of Flight). For all other points no fuel flow is computed.

The aircraft is supported by BADA. If the aircraft type is not supported, NaNs are returned for all points that pass the above filter.

The altitude is between 0 and the BADA maximum altitude padded by an additional 20%. If not, NaNs are returned for these points.

The true airspeed is between the BADA stall speed and Vmos (note that a conversion from calibrated airspeed is performed). If not, NaNs are returned for these points.


The following inputs (which are mapped to Threaded Track or other data) are required

for the BADA model.

Aircraft Type – Threaded Flight. If aircraft is not supported then NaN's are

returned

Climb rate – Vertical Segment

True Airspeed – RUC Track

Acceleration - Threaded Track

Altitude – Threaded Track

Aircraft Mass – Nominal Mass Provided by BADA Model



Track level fuel burn provides the current instantaneous fuel flow (in kilograms per minute) for each threaded track point, the accumulated fuel mass (NaN fuel flow points are linearly interpolated over), the mass of the aircraft (starting from the initial BADA mass), and three derivative terms that will be used in later versions to propagate error. The accumulated fuel mass is computed after all instantaneous fuel flow computations for a particular flight are complete. If fuel flow values are NaN, the fuel mass computation uses linear interpolation or nearest neighbor extrapolation to estimate the value. This can result in an entire aggregate computation (see below) being based on extrapolated data.

Aggregate (segment) level fuel burn provides the total amount of fuel burned (in kilograms) within specified radius rings around the departure and arrival airports. The airport location is taken from the airport assigned in the runway table. A radius ring centered on the airport is used when an assigned airport is provided; when no airport is provided, the ring is centered on the first or last valid track point (one that is ascending, descending, or in cruise). The distance between the origin of the ring and the first/last track point is reported. Because instantaneous fuel flow values can be NaN (due to, e.g., unpopulated altitude), the percentage of interpolation/extrapolation is provided.
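The handling of NaN fuel flow values when accumulating fuel mass can be sketched as follows. The gap filling follows the description above (linear interpolation inside the track, nearest neighbor extrapolation at the ends); the trapezoidal integration rule and the function names are assumptions, and the BADA fuel flow computation itself is not reproduced.

```python
import math


def fill_fuel_flow(times_min, flow):
    """Replace NaN fuel flow values (kg/min) by interpolated/extrapolated ones."""
    valid = [i for i, f in enumerate(flow) if not math.isnan(f)]
    if not valid:
        raise ValueError("no valid fuel flow values to interpolate from")
    filled = list(flow)
    for i in range(len(flow)):
        if not math.isnan(flow[i]):
            continue
        prev = max((j for j in valid if j < i), default=None)
        nxt = min((j for j in valid if j > i), default=None)
        if prev is None:
            filled[i] = flow[nxt]        # nearest neighbor extrapolation (left)
        elif nxt is None:
            filled[i] = flow[prev]       # nearest neighbor extrapolation (right)
        else:
            w = (times_min[i] - times_min[prev]) / (times_min[nxt] - times_min[prev])
            filled[i] = flow[prev] + w * (flow[nxt] - flow[prev])
    return filled


def accumulate_fuel(times_min, flow):
    """Cumulative fuel mass (kg) at each point, using trapezoidal integration."""
    filled = fill_fuel_flow(times_min, flow)
    total = [0.0]
    for i in range(1, len(filled)):
        dt = times_min[i] - times_min[i - 1]
        total.append(total[-1] + 0.5 * (filled[i] + filled[i - 1]) * dt)
    return total
```

If only a few points of a flight have valid fuel flow, most of the accumulated mass comes from filled values, which is why the interpolation and extrapolation percentages are reported alongside the aggregate result.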


Arrival Throughput

For each flight, arrival throughput is calculated for the 15 minute period ending with the

arrival of the flight, the 15 minute period centered on the arrival time of flight, and the 15

minute period beginning with the arrival time of the flight. Both throughput for the

arrival airport and throughput for the runway on which the flight landed are calculated. In

addition, the inter-arrival times between the flight and the one immediately preceding it,

and between the flight and the one immediately following it, are recorded for both the

arrival airport and the arrival runway, and the identities of the preceding and following

aircraft are recorded.


The following data items are required:

Arrival airport - Flight Runway

Arrival runway - Flight Runway

Wheels-down time - Flight Phase

Data are required not only for the day for which arrival throughput is calculated,

but also for the preceding and following day. Throughput periods for flights near

the end of the day may extend into the preceding or following day; the

immediately preceding or following aircraft may have landed the preceding or

following day; aircraft that land on the same day may have originated on the

preceding day, with the result that their arrival data may be recorded in the

previous day's runway and phase files; and a given aircraft for this day (as

determined by its threaded track ID and its presence in this day's runway and

phases files) may have landed on the following day.



All throughput and inter-arrival time computations are based on wheels-down

times. Throughputs are reported as the count of aircraft in a 15-minute period.

Aircraft are assigned to a given day's throughput data based on threaded track id,

not arrival time.
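The three throughput windows and the inter-arrival times can be sketched as below. The sketch assumes a sorted list of wheels-down times (milliseconds) for one airport or one runway that includes the flight of interest, counts the flight itself in each window, and treats the windows as closed intervals; these choices and the function names are assumptions.

```python
from bisect import bisect_left, bisect_right

WINDOW_MS = 15 * 60 * 1000  # 15 minutes in milliseconds


def arrival_throughput(wheels_down_ms, t):
    """Count arrivals in the 15-minute windows ending at, centered on, and starting at t."""
    def count(start, end):
        # Number of arrival times in the closed interval [start, end].
        return bisect_right(wheels_down_ms, end) - bisect_left(wheels_down_ms, start)

    return {
        "preceding": count(t - WINDOW_MS, t),
        "centered": count(t - WINDOW_MS // 2, t + WINDOW_MS // 2),
        "trailing": count(t, t + WINDOW_MS),
    }


def inter_arrival_times(wheels_down_ms, t):
    """Return (preceding, trailing) inter-arrival times in ms, or None if absent."""
    i = bisect_left(wheels_down_ms, t)
    j = bisect_right(wheels_down_ms, t)
    preceding = t - wheels_down_ms[i - 1] if i > 0 else None
    trailing = wheels_down_ms[j] - t if j < len(wheels_down_ms) else None
    return preceding, trailing
```

Because a window may extend past midnight, the input list would need to include arrivals from the preceding and following day, as noted above.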


Appendix B: Data Schema

Appendix B describes all the variables tested and analyzed for input into the DBN models developed. Specifically, this section goes over the schema of each variable, which includes the variable type, format, values, and general description. This ties in with Appendix A because the data variable schemas discussed are the resultant output (yellow-filled boxes in Figure A-1) of the algorithm fusion work.

Type and Format

The type column in each table describes the natural primitive type that the field can be

cast to such that the specified format can be used to convert the field back to the same csv

string without a loss of precision. Specific notes:

Unix times are read as <long>

threaded track ID is represented as a <long>

latitude and longitude require <double>

<Boolean> is represented as "0" and "1" in csv strings
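As an illustration of these conventions, the sketch below converts a few csv fields to native types. It assumes the threaded track ID is a 14-digit value of the form yyyymmddxxxxxx and that list-valued fields are pipe delimited, as in the tables that follow; the function names are hypothetical.

```python
from datetime import date


def parse_track_id(track_id: int):
    """Split a threaded track ID <yyyymmddxxxxxx> into (date, sequence number)."""
    text = str(track_id)
    if len(text) != 14:
        raise ValueError("expected a 14-digit threaded track ID")
    day = date(int(text[:4]), int(text[4:6]), int(text[6:8]))
    return day, int(text[8:])


def parse_bool(field: str) -> bool:
    """Booleans are stored as the csv strings "0" and "1"."""
    if field not in ("0", "1"):
        raise ValueError("expected '0' or '1'")
    return field == "1"


def parse_pipe_list(field: str):
    """Split a pipe-delimited list field; an empty field gives an empty list."""
    return field.split("|") if field else []
```

In Python the integer type is unbounded, so the <long> requirement for Unix times and track IDs matters mainly when the files are read in languages with fixed-width integers.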

Flight Data

Flight data records consist of a single csv record per flight. Each record is unique on the

threaded track id.


Threaded Flight

Column  Name                      Type     Format / Units / Value   Description
1       threadedTrackID           long     <yyyymmddxxxxxx>         primary key to threaded track
2       dataType                  string   "F"                      unique data type identifier
3       dataVersion               string   "1.1","1.1.1","1.1.2"    data version (<schema>..<update>)
4       aircraftID                char                              aircraft ID / callsign
5       departureAirport          char                              reported departure airport
6       arrivalAirport            char                              reported arrival airport
7       aircraftType              char                              aircraft type identifier
8       firstMessageTimeUnix      long                              first synthetic track point message time, in unix time
9       firstMessageTimeString    char                              first synthetic track point message time, in string format
10      lastMessageTimeUnix       long                              last synthetic track point message time, in unix time
11      lastMessageTimeString     char                              last synthetic track point message time, in string format
12      etmsFlightID              long                              etms flight ID, null if not linked to ETMS data
13      etmsDepartureDateUnix     long                              ETMS flight departure date and time, in unix time
14      etmsDepartureDateString   char                              ETMS flight departure date and time, in string format
15      etmsArrivalDateUnix       long                              ETMS flight arrival date and time, in unix time
16      etmsArrivalDateString     char                              ETMS flight arrival date and time, in string format
17      asdexDepartureFlag        boolean  0,1                      flag to indicate when flight has ASDE-X data at its departure
18      asdexArrivalFlag          boolean  0,1                      flag to indicate when flight has ASDE-X data at its arrival
19      facilities                char                              List of sensors that contributed to the synthetic track. Sensors are delimited by pipes and labeled by a facility and sensor (or just facility when appropriate) and are given in order of first occurrence.
20      trackDate                 char     yyyymmdd                 first message date (no time)
21      segmentIDs                char                              list of segmented track IDs delimited by pipes
22      qualityMessage            char                              identifier to flag certain events in the smoothing process
23      allAircraftID             char                              list of all reported aircraft IDs (delimited by pipes)
24      allDepartureAirport       char                              list of all reported departure airports (delimited by pipes)
25      allArrivalAirport         char                              list of all reported arrival airports (delimited by pipes)
26      allAircraftType           char                              list of all reported aircraft Types (delimited by pipes)
27      modeSCode                 char                              mode S code (24 bit address) - from asdex data
28      allModeSCode              char                              list of all reported mode S code


Phases Flight

Column  Name                  Type     Significant Digits  Units / Value     Description
1       threadedTrackID       long                         <yyyymmddxxxxxx>  primary key to threaded track
2       dataType              string                       "Q"               unique data type identifier
3       dataVersion           string                       "3.y.z"           data version
4       threadedTrackVersion  string                                         data version of threaded track (pair to column 1)
5       sequence              string                                         sequence of phases of flight identified by the PhasesTrack ordered by time
6       throttleUpTime        long                         milliseconds      unix time; estimated start of ground roll on takeoff runway (asdex only)
7       wheelsUpTime          long                         milliseconds      unix time; estimated rotation point on takeoff runway (asdex only)
8       startOfCruiseTime     long                         milliseconds      unix time; estimated start of cruise
9       topOfDescentTime      long                         milliseconds      unix time; estimated top of descent
10      wheelsDownTime        long                         milliseconds      unix time; estimated runway touchdown time
11      taxiToGateTime        long                         milliseconds      unix time; estimated end of landing deceleration / runway exit time
12      multipleTakeoff       boolean
13      multipleLanding       boolean
14      multipleInFlight      boolean


Runways Flight

Column  Name                           Type    Significant Digits  Units / Value     Description
1       threadedTrackID                long                        <yyyymmddxxxxxx>  primary key to threaded track
2       dataType                       string                      "A"               unique data type identifier
3       dataVersion                    string                      "3.y.z"           data version (<schema>.<code>.<update>)
4       threadedTrackVersion           string                                        data version of threaded track (pair to column 1)
5       departureAirport               string
6       departureRunway                string
7       arrivalAirport                 string
8       arrivalRunway                  string
9       departureAirportProbability    float   2
10      departureRunwayProbability     float   2
11      arrivalAirportProbability      float   2
12      arrivalRunwayProbability       float   2
13      departureAlternateRunway       string
14      departureAlternateRunwayOdds   float   2
15      departureAlternateAirport      string
16      departureAlternateAirportOdds  float   2
17      departurePointTime             long                        milliseconds      epoch time of track point with best score for assigned departure runway
18      departureLatitude              double  6                                     latitude of track point with best score for assigned departure runway
19      departureLongitude             double  6                                     longitude of track point with best score for assigned departure runway
20      departurePressureAltitude      float   0
21      departureTrackHeading          float   2
22      arrivalAlternateRunway         string
23      arrivalAlternateRunwayOdds     float   2
24      arrivalAlternateAirport        string
25      arrivalAlternateAirportOdds    float   2
26      arrivalPointTime               long                        milliseconds      epoch time of track point with best score for assigned arrival runway
27      arrivalLatitude                double  6                                     latitude of track point with best score for assigned arrival runway
28      arrivalLongitude               double  6                                     longitude of track point with best score for assigned arrival runway
29      arrivalPressureAltitude        float   0
30      arrivalTrackHeading            float   2
31      nfdcDatabaseDate               string


Procedure Flight


Digits

Units /

Value Description



threaded track

2 dataType string "Y" unique data type identifier

3 dataVersion string "3.y.z" data version

(<schema>.<code>.<update>)



column 1)

5 sid string

6 enRoute string

7 star string

8 approach string

9 sidConformingDistance float 3

10 enRouteConformingDistance float 3

11 starConformingDistance float 3

12 approachConformingDistance float 3

13 sidOverlayDistance float 3

14 enRouteOverlayDistance float 3

15 starOverlayDistance float 3

16 approachOverlayDistance float 3

17 allSid string

pipe delimited dist

18 allEnRoute string

pipe delimited dist

19 allStar string

pipe delimited dist

20 allApproach string

pipe delimited dist

21 flightPlan string

pipe delimited dist


Arrival Throughput


Digits

Units /

Value Description


<yyyymmddxxxxxx> primary key

to threaded track

2 dataType string "AT" unique data type identifier




data version of threaded track (pair

to column 1)

5 aptPrecedingFlt long

threadedTrackID of immediately

preceding arrival at airport

6 rwyPrecedingFlt long


preceding arrival on same runway

7 aptTrailingFlt long


following arrival at airport

8 rwyTrailingFlt long


following arrival on same runway

9 aptPrecedingIAT long milliseconds Time from immediately preceding

arrival at airport to this arrival

10 rwyPrecedingIAT long

milliseconds Time from immediately preceding

arrival on runway to this arrival

11 aptTrailingIAT long

milliseconds

Time from this arrival to

immediately following arrival at

airport

12 rwyTrailingIAT long

milliseconds

Time from this arrival to

immediately following arrival on

runway

13 aptPrecedingThroughput integer count per 15

minutes

airport throughput for 15 minutes

preceding this arrival

14 aptCenteredThroughput integer count per 15

minutes


centered on this arrival

15 aptTrailingThroughput integer count per 15

minutes


following this arrival

16 rwyPrecedingThroughput integer count per 15

minutes

runway throughput for 15 minutes

preceding this arrival

17 rwyCenteredThroughput integer count per 15

minutes


centered on this arrival

18 rwyTrailingThroughput integer count per 15

minutes


following this arrival


Segment Data

Segment data records are unique on the threaded track id and segment id. There are a

variable number of records per flight, but typically a fraction of the number of track

records.

Asos Segment


Digits

Units /

Value Description



threaded track

2 dataType string

"C" unique data type identifier





column 1)

5 segmentID long milliseconds unix time, phases of flight point

6 segmentType string

phase of flight identifier

7 weatherTime long milliseconds time of reported data

8 ceiling float 0

9 visibility float 3

10 rvr float 3

11 rvrRunway string

12 vorrvr float 3

Minimum of Visibility and RVR

13 temperature float 1

14 dewPointTemperature float 1

15 windChillFactor float 2

16 heatIndex float 2

17 tempAndHumidityIndex float 2

18 relativeHumidity float 2

19 windDirection float 2

20 windSpeed float 2

21 barometricPressure float 2

22 significantWeather string

23 windGust float 1

24 peakWindDirection float 2


25 peakWindSpeed float 1

26 peakWindHour float 0

27 peakWindMinute float 0


Lateral Segment


Digits

Units /

Value Description



threaded track

2 dataType string "L" unique data type identifier






column 1)

5 segmentID int


ground (G), left turn (L), right turn (R),

straight (S), en route (E)

7 phase string

maps to phase of flight

8 startTime long milliseconds unix time; does not necessarily map to

threaded track record

9 endTime long milliseconds unix time; does not necessarily map to

threaded track record

10 startDistance float 3

along track distance; does not

necessarily map to threaded track

record

11 endDistance float 3

along track distance; does not

necessarily map to threaded track

record

12 startLatitude double 6

13 startLongitude double 6

14 startHeading float 2

15 endLatitude double 6

16 endLongitude double 6

17 endHeading float 2

18 centerLatitude double 6

19 centerLongitude double 6

20 turnRadius float 3

21 turnDirection int

left (-1), right or straight (1)

22 stdResidual float 3

23 maxResidual float 3

24 leftContinuous string

continuous (C), discontinuous (D),

endpoint (E)

25 rightContinuous string


endpoint (E)


Vertical Segment


Digits

Units /

Value Description



threaded track

2 dataType string "V" unique data type identifier





column 1)

5 segmentID int


ascending (A), descending (D), level

(L)

7 phase string

maps to phases of flight

8 startTime long milliseconds unix time; does not map to threaded

track record

9 endTime long milliseconds unix time; does not map to threaded

track record



12 startPressureAltitude float 0

13 endPressureAltitude float 0

14 pressureClimbGradient float 0

15 geometricClimbGradient float 0

16 minimumClimbRate float 0 ft/min

17 averageClimbRate float 0 ft/min

18 maximumClimbRate float 0 ft/min





endpoint (E)



endpoint (E)


Speed Segment


Digits

Units /

Value Description



threaded track

2 dataType string "S" unique data type identifier





column 1)

5 segmentID int


accelerating (A), decelerating (D),

constant (C)

7 phase string

maps to phases of flight

8 startTime long milliseconds unix time; does not map to threaded

track record

9 endTime long milliseconds unix time; does not map to threaded

track record



12 startGroundSpeed float 1

13 endGroundSpeed float 1

14 groundAcceleration float 0





endpoint (E)



endpoint (E)


Missed Segment


Digits

Units /

Value Description



threaded track

2 dataType string "M" unique data type identifier





column 1)

5 segmentID int


7 firstVerticalSegmentID int

maps to vertical segment

8 lastVerticalSegmentID int

maps to vertical segment

9 startTime long milliseconds unix time; maps to threaded track

record

10 endTime long milliseconds unix time; maps to threaded track

record



13 missedApproachHeight float 0

14 clearanceLimit float 0

15 approachAirport string

airport is same as assigned arrival

airport in runways flight

16 approachRunway string

not currently evaluated

17 nfdcDatabaseDate string

link to nfdc database cycle


Leg Segment


Digits

Units /

Value Description



threaded track

2 dataType string "W" unique data type identifier





column 1)

5 segmentID int

6 lateralSegmentID int

maps to lateral segment record

7 procedureName string

8 procedureType string

9 regionCode string

10 airportID string

ICAO airport code

11 transitionID string

12 sequenceNumber int

Lookup Value for Leg Fix names

13 startLegType string

ARINC leg type

14 endLegType string

ARINC leg type

15 legLength float 3

path length from fix to fix along

procedure

16 startTrackDistance float 3

With respect to track

17 endTrackDistance float 3

With respect to track

18 startLegDistance float 3

With respect to procedure

19 endLegDistance float 3

With respect to procedure

20 startDeviation float 3

21 endDeviation float 3

22 angularDeviation float 2

23 radiusDeviation float 3

difference between lateral radius and leg

radius

24 lateralResidual float 3

25 isCandidateProcedure boolean

leg is contained in its procedure candidate

list [X data]

26 isAssignedProcedure boolean

column 25 is true, and its procedure was

assigned [Y data]

27 jeppesenCycle string

2 digit year followed by integer count


Procedure Segment


Digits

Units /

Value Description



threaded track

2 dataType string

"X" unique data type identifier






column 1)

5 segmentID int

6 legSegmentIDs string

7 procedureName string

8 procedureType string

9 regionCode string

10 airportID string

11 startTrackDistance float 3

12 endTrackDistance float 3

13 conformingDistance float 3

14 overlayDistance float 3

15 maxDeviation float 3

16 legTransitionSequence string

pipe delimited list of

<transition>:<sequence>

17 flightPlan float 0 milliseconds time of first use in flightplan

18 isAssignedProcedure boolean

19 jeppesenCycle string

2 digit year followed by integer count


Fuel Segment

Column Name Type Format Units /

Value Description



threaded track

2 dataType string "G" unique data type identifier

3 dataVersion string "3.y.z" data version (<schema>..<update>)



column 1)

5 segmentID int

6 radiusRing float

nautical

miles

radius ring used to compute fuel consumed;

-1 is used for entire flight

7 arrivalDepartureFlag string

used to identify fuel consumed for arrival or

departure; 'A'=arrival, 'D'=departure,

'AD'=entire flight

8 fuelMass float

kilograms

9 startTime long

milliseconds epoch time associated with first point where

fuel flow was computed

10 endTime long

milliseconds epoch time associated with last point where

fuel flow was computed

11 interiorInterpolation float

percentage (between 0 and 1) of time fuel

flow was interpolated to determine mass

12 leftExtrapolation float


flow was extrapolated after start time

13 rightExtrapolation float


flow was extrapolated before end time

14 distanceFromStart float

nautical

miles

for departures - distance from first point

with fuel flow to departure airport. if no

departure airport is assigned the first track

point with phase is the origin of the ring and

this value is 0

for arrivals - difference between radius ring

and the radius of first track point inside

radius ring

15 distanceFromEnd float

nautical

miles

for departures - difference between radius

ring and the radius of last track point inside

radius ring

for arrivals - distance from last point with

fuel flow to arrival airport. if no arrival

airport is assigned the last track point is the

origin of the ring and this value is 0


Holding Segment


Value Description



threaded track

2 dataType string "H" unique data type identifier




column 1)

5 segmentID integer

6 startTime long

milliseconds time aircraft entered the hold

7 endTime long

milliseconds time aircraft exited the hold

8 startDist float

nautical

miles

distance along flight track that the holding

segment begins

9 endDist float

nautical

miles

distance along flight track that the holding

segment ends

10 startLat float

degrees latitude of the beginning of the holding

segment

11 startLon float

degrees longitude of the beginning of the holding

segment

12 endLat float

degrees latitude of the end of the holding segment

13 endLon float

degrees longitude of the end of the holding

segment

14 confidence float

0.00 - 1.00 higher values represent more confidence

that the segment is a hold

15 intialSeg integer

segmentID of first lateral segment

included in the holding segment

16 finalSeg integer

segmentID of last lateral segment

included in the holding segment

17 initSegTrunc boolean

true if only part of initSeg is included in

the holding segment

18 finalSegTrunc boolean

true if only part of finalSeg is included in

the holding segment


Track Data

Track data records are unique on the threaded track ID and time. The number of track

records must be equivalent across all data types. If there are any track records for a given

data type that do not have values, track records with null fields are populated.

Threaded Track

Column Name Type Format Units / Value Description


<yyyymmddxxxxxx> primary key to threaded track
2 dataType string "T" unique data type identifier
3 dataVersion string "1.1","1.1.1","1.1.2" data version
4 time long milliseconds unix time, exact match for track point
5 latitude num degrees synthetic position
6 longitude num degrees synthetic position
7 pressureAltitude num feet synthetic position
8 rawModeCAltitude num 100s of feet raw Mode C altitude report of most dominant contributing sensor using nearest neighbor temporal interpolation
9 alongTrackDistance num NM derived along track distance (cumulative along track distance normalized to first track point)
10 groundSpeed num knots derived ground speed
11 trackHeading num degrees derived track bearing (true)
12 trackCurvature num 1 / NM derived track curvature (inverse radius of curvature)
13 groundAcceleration num knots / min rate of change of ground speed with respect to time
14 climbGradient num feet / NM derived climb gradient
15 crossTrackSmoothing num NM estimate of cross track RMS error from the source data that was smoothed out in the synthetic trajectory
16 alongTrackSmoothing num NM estimate of along track RMS error from the source data that was smoothed out in the synthetic trajectory
17 verticalTrackSmoothing num feet estimate of vertical track RMS error from the source data that was smoothed out in the synthetic trajectory
18 lateralTrackBias num NM estimate of lateral bias error between the source data and the synthetic track (zero when only one contributing sensor)
19 verticalTrackBias num feet estimate of vertical bias error between the source data and the synthetic track (zero when only one contributing sensor)
20 activeSensors FAC:SEN|... List of sensors that contributed to the current synthetic track point. Sensors are delimited by pipes and labeled by a facility and sensor (or just facility when appropriate) and are given in order of first occurrence.
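Two of the threaded track fields need light decoding before use: activeSensors is a pipe-delimited list of facility:sensor labels, and rawModeCAltitude is reported in hundreds of feet. The sketch below shows one way to decode both. The function names and the sample labels (FAC1, SEN1, FAC2) are placeholders invented for this example.

```python
from typing import List, Optional, Tuple


def parse_active_sensors(field: str) -> List[Tuple[str, Optional[str]]]:
    """Split 'FAC:SEN|FAC|...' into (facility, sensor) pairs, in given order.

    The sensor is None when only a facility is listed.
    """
    sensors = []
    for token in field.split("|"):
        if not token:
            continue  # tolerate an empty field or a trailing pipe
        facility, _, sensor = token.partition(":")
        sensors.append((facility, sensor or None))
    return sensors


def mode_c_to_feet(raw_mode_c: Optional[float]) -> Optional[float]:
    """Convert a raw Mode C report (100s of feet) to feet; keep nulls."""
    return None if raw_mode_c is None else raw_mode_c * 100.0


sensors = parse_active_sensors("FAC1:SEN1|FAC2")
altitude_ft = mode_c_to_feet(350)
```

Because the list preserves order of first occurrence, the parser returns a list rather than a set.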


Phases Track


Column Name Type Format Units / Value Description
threaded track
2 dataType string "P" unique data type identifier
column 1)
5 time long milliseconds unix time, exact match for track point
6 phase string ground (G), takeoff roll (T), ascent (A), cruise (C), descent (D), landing roll (L)
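The single-letter phase codes map naturally onto an enumeration. The sketch below decodes them and totals the time spent in each phase. Attributing each interval to the phase of its earlier track point is an assumption made here for illustration, as are the class and function names.

```python
from enum import Enum
from typing import Dict, List, Tuple


class FlightPhase(Enum):
    GROUND = "G"
    TAKEOFF_ROLL = "T"
    ASCENT = "A"
    CRUISE = "C"
    DESCENT = "D"
    LANDING_ROLL = "L"


def phase_durations_ms(records: List[Tuple[int, str]]) -> Dict[FlightPhase, int]:
    """Sum milliseconds per phase from (time_ms, phase_code) records.

    Records must be sorted by time. Each interval between consecutive
    points is credited to the phase of the earlier point (an assumption).
    """
    totals: Dict[FlightPhase, int] = {}
    for (t0, code), (t1, _) in zip(records, records[1:]):
        phase = FlightPhase(code)  # raises ValueError on an unknown code
        totals[phase] = totals.get(phase, 0) + (t1 - t0)
    return totals


durations = phase_durations_ms([(0, "G"), (1000, "T"), (3000, "A"), (6000, "A")])
```

Decoding through the enumeration means an unexpected code fails loudly instead of being silently carried into later analysis.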


Terrain Track


Column Name Type Format Units / Value Description
threaded track
2 dataType string "Z" unique data type identifier
3 dataVersion string "3.y.z" data version (<schema>.<code>.<update>)
column 1)
5 time long milliseconds unix time, exact match for track point
6 terrainElevation float %0.0f
7 heightAboveTerrain float %0.0f
8 terrainSource string


Rapid Update Cycle (RUC) Track


Column Name Type Format Units / Value Description
threaded track
2 dataType string "R" unique data type identifier
column 1)
5 time long milliseconds unix time, exact match for track point
6 trueAirspeed float 1 knots
7 indicatedAirspeed float 1
8 machNumber float 4
9 geometricAltitude float 0 feet
10 windMagnitude float 1
11 windDirection float 2 relative to true north
12 staticTemperature float 1
13 verticalVelocityPressure float 2
14 humidityMixingRatio float 2 relative for isobaric model, absolute for hybrid model
15 cloudMixingRatio float 2 boolean for isobaric model, float for hybrid model
16 rainMixingRatio float 2
17 snowMixingRatio float 2
18 iceMixingRatio float 2
19 turbulentKineticEnergy float 2
20 asosSNR float 2
21 rucInterpolationTime int absolute time difference from threaded track record to closest RUC data point
22 rucLaterallyExtrapolated boolean threaded track record is outside the RUC lateral grid
23 rucVerticallyExtrapolated boolean threaded track record is outside the RUC vertical grid
24 rucFile1 string RUC data hour for left side interpolant. <m> is the RUC model and can take values of "I" (isobaric) and "H" (hybrid); <f> is the forecast hour.


Fuel Track


Column Name Type Format Units / Value Description
threaded track
2 dataType string "B" unique data type identifier
3 dataVersion string "3.y.z" data version (<schema>.<code>.<update>)
column 1)
5 time long milliseconds unix time, exact match for track point
6 instantaneousFuelFlow float kg/min
7 accumulatedFuelMass float kg total mass of fuel burned up to this point
8 aircraftMass float kg total mass of aircraft starting from BADA nominal
9 errorTerm1 float
10 errorTerm2 float
11 errorTerm3 float
12 BADASoftwareVersion string
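To show how instantaneousFuelFlow (kg/min) and accumulatedFuelMass (kg) could relate, the sketch below integrates fuel flow over the millisecond track times. The trapezoidal rule and the function name are assumptions made for illustration; the integration scheme actually used to produce the fuel track is not stated here.

```python
from typing import List, Sequence

MS_PER_MINUTE = 60000.0


def accumulate_fuel_kg(times_ms: Sequence[int],
                       flows_kg_per_min: Sequence[float]) -> List[float]:
    """Running fuel burn in kg at each track point (trapezoidal rule)."""
    if len(times_ms) != len(flows_kg_per_min):
        raise ValueError("times and fuel flows must have the same length")
    accumulated: List[float] = []
    total = 0.0
    for i in range(len(times_ms)):
        if i > 0:
            # Convert the millisecond step to minutes to match kg/min.
            dt_min = (times_ms[i] - times_ms[i - 1]) / MS_PER_MINUTE
            mean_flow = 0.5 * (flows_kg_per_min[i - 1] + flows_kg_per_min[i])
            total += mean_flow * dt_min
        accumulated.append(total)
    return accumulated


burn = accumulate_fuel_kg([0, 60000, 120000], [40.0, 40.0, 60.0])
```

Subtracting the running burn from an initial mass would give a mass history comparable to aircraftMass, again under the assumptions above.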
