21
Semantic Representation and Scale-up of Integrated Air Traffic Management Data Rich Keller, Ph.D. * Shubha Ranjan + Mei Wei * Michelle Eshow *Intelligent Systems Division / Aviation Systems Division + Moffett Technologies, Inc. NASA Ames Research Center Point of contact: [email protected] Work funded by NASA’s Aeronautics Research Mission Directorate International Workshop on Semantic Big Data, San Francisco, USA, July 1, 2016

Semantic Representation and Scale-up of Integrated Air ...groppe/sbd/... · Semantic Representation and Scale-up of Integrated Air Traffic Management Data ... download data and write

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Semantic Representation and Scale-up of Integrated Air ...groppe/sbd/... · Semantic Representation and Scale-up of Integrated Air Traffic Management Data ... download data and write

Semantic Representation and Scale-up of Integrated Air Traffic Management Data

Rich Keller, Ph.D.* Shubha Ranjan+

Mei Wei* Michelle Eshow

*Intelligent Systems Division / Aviation Systems Division +Moffett Technologies, Inc.

NASA Ames Research Center

Point of contact: [email protected]

Work funded by NASA’s Aeronautics Research Mission Directorate

International Workshop on Semantic Big Data, San Francisco, USA, July 1, 2016

Page 2: Semantic Representation and Scale-up of Integrated Air ...groppe/sbd/... · Semantic Representation and Scale-up of Integrated Air Traffic Management Data ... download data and write

Aviation Data is Big Data

• Volume: 30M+ flights yearly 3.6B passengers forecast for 2016

• Variety: flight tracks, weather maps, aircraft maintenance records, flight charts, baggage routing data, passenger itineraries

• Velocity: high frequency data from aircraft surveillance systems and on-board health & safety systems 24x7

Page 3: Semantic Representation and Scale-up of Integrated Air ...groppe/sbd/... · Semantic Representation and Scale-up of Integrated Air Traffic Management Data ... download data and write

New Project

Build a large queryable semantic repository of air traffic management (ATM) data using semantic integration techniques

Page 4: Semantic Representation and Scale-up of Integrated Air ...groppe/sbd/... · Semantic Representation and Scale-up of Integrated Air Traffic Management Data ... download data and write

? The Big Question ?

Can semantic representations scale up to accomplish practical tasks using Big Data?

Conduct a scale-up experiment to answer the question

Page 5: Semantic Representation and Scale-up of Integrated Air ...groppe/sbd/... · Semantic Representation and Scale-up of Integrated Air Traffic Management Data ... download data and write

Outline

• Aviation Data Integration Problem

• Semantic Integration Approach

• Design of our Scale-up Experiment

• Results

• Approaches to Improving Scale-up Performance

• Conclusions

Page 6: Semantic Representation and Scale-up of Integrated Air ...groppe/sbd/... · Semantic Representation and Scale-up of Integrated Air Traffic Management Data ... download data and write

Background: Aviation Data Integration Problem

• NASA researchers require historical ATM data for future airspace concept development & validation

• NASA Ames’ ATM Data Warehouse archives data collected from FAA, NASA, NOAA, DOT, industry

– Warehouse captures 13 sources of aviation data:

• flight tracks, advisories, weather data, delay stats

• some from live feeds and some from periodic updates

– Data holdings available back to 2009

– 30TB of data; some in a database; most in flat files

Page 7: Semantic Representation and Scale-up of Integrated Air ...groppe/sbd/... · Semantic Representation and Scale-up of Integrated Air Traffic Management Data ... download data and write

Problem: Non-integrated Data

• ATM Warehouse data is replicated & archived in

its original format

• Data sets lack standardization –data formats –nomenclature – conceptual structure

• To analyze and mine data, researchers must

download data and write special-purpose

integration code for each new task

Huge time sink!

• Possible cross-dataset mismatches: – terminology – scientific units – temporal/spatial

alignment – conceptualization

organization

Page 8: Semantic Representation and Scale-up of Integrated Air ...groppe/sbd/... · Semantic Representation and Scale-up of Integrated Air Traffic Management Data ... download data and write

Proposed Solution Relieve users of responsibility for integration

Integrate Warehouse data sources on the server side

using Semantic Integration

Page 9: Semantic Representation and Scale-up of Integrated Air ...groppe/sbd/... · Semantic Representation and Scale-up of Integrated Air Traffic Management Data ... download data and write

data sources

Semantic Integration Approach: Prototype System Diagram

SPARQL Queries

Other Data Sources

translators

Integrated ATM

Data Store

Flight Track

Airspace Advisories

Weather

FAA

ATM Warehouse(

subset)

ASPM

Airlines, Aircraft Airport Info

Common Cross-ATM Ontology

Large Triple Store

Page 10: Semantic Representation and Scale-up of Integrated Air ...groppe/sbd/... · Semantic Representation and Scale-up of Integrated Air Traffic Management Data ... download data and write

Meteorology

• 150+ classes • 150+ datatype properties • 100+ object properties

ATM Ontology Airspace

Page 11: Semantic Representation and Scale-up of Integrated Air ...groppe/sbd/... · Semantic Representation and Scale-up of Integrated Air Traffic Management Data ... download data and write

Ontology Representation of a Flight

Aircraft Fix #1

aircraft flown

model

manufacturer

has fix

Flight Track for DAL1512

Aeronautical Flight Weather Equipment

Industry

KEY

KATL Airport • airport name: Hartsfield-Jack… • FAA airport code: ATL • ICAO airport code: KATL • located in state: GA • offset from UTC: -5

Flight DAL1512 • actual arrival: 2012-09-08T20:35 • actual depart: 2012-09-08T19:03 • call sign: DAL1512 • user category: commercial • flight route string: KATL.CADIT6…

Delta Air Lines • name: Delta Air Lines • callsign: DELTA • ICAO carrier code: DAL • IATA carrier code: DL

KORD Airport • airport name: O’Hare Intnl. • FAA airport code: ORD • ICAO airport code: KORD • located in state: IL • offset from UTC: -6

Aircraft N342NB • registrant: Delta Air Lines, Inc. • serial number: 1746 • certificate issue: 2009-12-31 • manufacture year: 2002 • mode S code: 50742752 • registration number: N342NB

A319-111 • AC type designator: A319 • model ID: A391-111 • number engines: 2

AircraftTrackPoint #2 • reporting time: 2012-09-08T19:03:32 • sequence number: 2 • ground speed: 184 • altitude: 3600.0 • latitude: 33.65 • longitude: -84.48333

Aircraft Fix #1 AircraftTrackPoint #1 • reporting time: 2012-09-08T19:03:00 • sequence number: 1 • ground speed: 461 • altitude: 3700.0 • latitude: 33.6597 • longitude: -84.495555

KATL METAR @18:52 KATL Weather@18:52 • dewpoint: 19 • report time: 2012-09-08T18:52 • report string: KATL 301852Z 11004KT… • surface pressure: 1010.1 • surface temperature: 22

Rway 09R/27L • runway ID = 09R/27L

has flight Path

next fix

Airbus

Page 12: Semantic Representation and Scale-up of Integrated Air ...groppe/sbd/... · Semantic Representation and Scale-up of Integrated Air Traffic Management Data ... download data and write

Experimental Methodology 1. Develop ontology

2. Write data source translators

3. Run translators to generate data for a period covering one day of air traffic to/from a major airport (Atlanta): 1342 flights; ~2.4M triples

4. Load data into two commercial triple stores (AllegroGraph/Franz and GraphDB/Ontotext)

5. Develop a set of SPARQL performance benchmark queries and run on both triple stores

6. Replicate one day’s worth of data x 31 to approximate one month of air traffic: ~40+K flights; ~36M triples*

7. Run queries again to compare results *Estimate: 10B triples/yr. for US domestic flights

Page 13: Semantic Representation and Scale-up of Integrated Air ...groppe/sbd/... · Semantic Representation and Scale-up of Integrated Air Traffic Management Data ... download data and write

Sample Benchmark SPARQL Queries - from a set of 17 queries for evaluating performance on scale-up -

• Flight Demographics:

– F1: Find Delta flights using A319s departing Atlanta-area airports

– F3: Find flights with rainy departures from Atlanta airport

• Airspace Sector Capacity: – S6: Find the busiest US airspace sectors for each hour in the day

• Traffic Management Statistics:

– T1: Find flights that were subject to ground delays

• Weather-Impacted Traffic:

– W1: Calculate hourly impact of weather on flight delays

• Flight Delay Data:

– A3: Compare hourly airport arrival capacity with demand

Page 14: Semantic Representation and Scale-up of Integrated Air ...groppe/sbd/... · Semantic Representation and Scale-up of Integrated Air Traffic Management Data ... download data and write

Results for 17 benchmark queries

Flight Period Execution Time

Min Max Avg

1 Day 11 ms 9.6 sec 1.19 sec

1 Month 8 ms 1651.2 sec (170x increase) 96.65 sec (80x increase)

Observations: • ~30% of queries experienced no increase in execution time • ~60% of queries scaled in proportion to

increase in triples • 1 query experienced exponential increase

(350x – 700x, depending on triple store)

Conclusion: Scaling to multi-year flight periods does not appear feasible unless multi-hour or multi-day response times are acceptable

Page 15: Semantic Representation and Scale-up of Integrated Air ...groppe/sbd/... · Semantic Representation and Scale-up of Integrated Air Traffic Management Data ... download data and write

5 Potential Scale-Up Approaches

1. Hardware: triple ‘appliances’ for faster storage, retreival & processing

2. Algorithm: better graph matching algorithms 3. Software: better query planners; new indexing

approaches ----------------------------------------------------------------

4. Query reformulation: rewrite queries

5. Triple reduction: reduce graph search space

Hardware designers, researchers, triple store architects (1,2,3) Application developers, triple store users (4,5)

Page 16: Semantic Representation and Scale-up of Integrated Air ...groppe/sbd/... · Semantic Representation and Scale-up of Integrated Air Traffic Management Data ... download data and write

4. Query Reformulation

• SPARQL queries can (in theory) be rewritten to improve efficiency

• Lack of transparency regarding how SPARQL queries are translated into code and executed makes rewriting difficult

• Tools to assist with optimization are missing or poorly documented

• Wanted!: performance monitoring tools query plan inspector index formulation tools

• SQL performance analysis tools are mature; SPARQL tools are primitive (in our experience)

Page 17: Semantic Representation and Scale-up of Integrated Air ...groppe/sbd/... · Semantic Representation and Scale-up of Integrated Air Traffic Management Data ... download data and write

Current Status Update

• Have scaled up to 1 month of actual flight data from the three NY Metropolitan airports: ~257M triples considerably more than the 36M/month reported for Atlanta airport in the paper

• Will be re-testing benchmark queries against this data, but not easily comparable to existing data due to changed geographic region

Page 18: Semantic Representation and Scale-up of Integrated Air ...groppe/sbd/... · Semantic Representation and Scale-up of Integrated Air Traffic Management Data ... download data and write

Conclusion: Adequate tools not yet available to support real-world performance tuning for SPARQL queries in commercial triple stores

Caveat: Experience limited to only 2 triple stores!

Summary • Described a real-world practical application for big

semantic data: integrating heterogeneous ATM data

• Reviewed experiments performed to scale-up data and measure impact on query performance

• Discussed approaches to improving performance

Page 19: Semantic Representation and Scale-up of Integrated Air ...groppe/sbd/... · Semantic Representation and Scale-up of Integrated Air Traffic Management Data ... download data and write

In the end

Q: Can semantic representations scale to accomplish practical tasks using Big Data? A: Well, I’m still not sure!

(…to be continued)

Page 20: Semantic Representation and Scale-up of Integrated Air ...groppe/sbd/... · Semantic Representation and Scale-up of Integrated Air Traffic Management Data ... download data and write

Triple Reduction

• Reduce the underlying search space by modifying the representation

• Undesirable trade-off possible: trade representational fidelity for efficiency

Example: representation of Aircraft Track Points

Page 21: Semantic Representation and Scale-up of Integrated Air ...groppe/sbd/... · Semantic Representation and Scale-up of Integrated Air Traffic Management Data ... download data and write

TrackPoint Representation Tradeoff

Aircraft Fix #1 AircraftTrackPoint • reporting time: 2012-09-08T19:03:00 • sequence number: 31 • ground speed: 461 • altitude: 3700.0 • latitude: 33.6597 • longitude: -84.495555

Aircraft Fix #1 AircraftTrackPoint • reporting time: 2012-09-08T19:03:00 • sequence number: 31 • ground speed: 461

Aircraft Fix #1 GeographicFix • altitude: 3700.0 • latitude: 33.6597 • longitude: -84.495555

hasFix

vs. Representation #1 (2 inst. per minute: ~70% of all instances)

Representation #2 (1 inst. per minute: ~54% of all instances)