LSDE 2018 - Large Scale Data Engineering


LSDE: Large Scale Data Engineering 2018

Dataset. Commercial airplanes periodically broadcast radio messages containing their position details (plane identifier, flight number, latitude, longitude, altitude, speed, ...). These ADS-B messages are picked up by enthusiasts and collected in systems such as the OpenSky Network or Flightradar24. We have obtained ~200 GB of compressed ADS-B messages from September 2015.

P1: Airport Quality. Detect holding patterns (planes circling near an airport) and go-arounds (aborted landings). Use these to determine the worst airports or airlines. Are there correlations with date, airport size, day of the week or weather?
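The following minimal Python sketch shows one way the two events in P1 might be flagged on a single flight's decoded track. It is not taken from the project summarized on this page. The `Point` record, the function names and all thresholds (a 50 km holding radius, a 360° cumulative turn, a 500 ft descent and re-climb within 10 km) are hypothetical assumptions chosen for readability. A holding pattern is treated as a full turn in one direction near the airport. A go-around is treated as a descent to a low point followed by a climb instead of a landing.

```python
from dataclasses import dataclass
from math import radians, sin, cos, asin, sqrt


@dataclass
class Point:
    # Hypothetical record for one decoded ADS-B position report.
    t: float        # seconds since epoch
    lat: float      # degrees
    lon: float      # degrees
    alt_ft: float   # altitude in feet
    heading: float  # track angle in degrees, 0-360


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two coordinates in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def heading_delta(h1, h2):
    """Signed smallest turn from h1 to h2 in degrees, in (-180, 180]."""
    d = (h2 - h1) % 360.0
    return d - 360.0 if d > 180.0 else d


def is_holding(track, airport_lat, airport_lon, radius_km=50.0, min_turn_deg=360.0):
    """True if the signed heading change reaches min_turn_deg in either direction
    while the aircraft stays within radius_km; leaving the radius resets the sum.
    Thresholds are illustrative assumptions."""
    total = 0.0
    prev = None
    for p in track:
        if haversine_km(p.lat, p.lon, airport_lat, airport_lon) > radius_km:
            prev, total = None, 0.0
            continue
        if prev is not None:
            total += heading_delta(prev.heading, p.heading)
            if abs(total) >= min_turn_deg:
                return True
        prev = p
    return False


def is_go_around(track, airport_lat, airport_lon, radius_km=10.0, min_change_ft=500.0):
    """True if, among points within radius_km of the airport, the aircraft first
    descends by at least min_change_ft to its lowest point and then climbs again
    by at least min_change_ft. Thresholds are illustrative assumptions."""
    near = [p for p in track
            if haversine_km(p.lat, p.lon, airport_lat, airport_lon) <= radius_km]
    if len(near) < 3:
        return False
    i_min = min(range(len(near)), key=lambda i: near[i].alt_ft)
    low = near[i_min].alt_ft
    descended = near[0].alt_ft - low >= min_change_ft
    climbed = max(p.alt_ft for p in near[i_min:]) - low >= min_change_ft
    return descended and climbed
```

Summing signed heading changes rather than absolute ones keeps back-and-forth S-turns from adding up to a false holding pattern. A real pipeline would also have to cope with the sparse near-ground coverage, which makes the low point of a go-around easy to miss.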

Summary: This group used Spark with Scala and Java for large-scale data extraction and aggregation, supplemented by interactive visualization of smaller data subsets on a single computer using the Python analysis stack (Jupyter Notebooks, Pandas and Matplotlib/Basemap). The paper is an interesting read on the sparsity of the data points (data near the ground is not picked up by the sensors) and the resulting difficulty of analyzing landings. The group also consulted a pilot to validate their method for detecting landings, which is triggered by significant descents.
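A descent-based landing trigger has to cope with the gaps in coverage. Below is a minimal Python sketch, assuming a track of `(time_s, alt_ft)` pairs sorted by time. A descent span continues only while consecutive points descend at a minimum rate and lie close enough in time, so a long gap ends the span rather than being read as a steep drop. The function name and thresholds (500 ft/min, 2000 ft total drop, 300 s maximum gap) are hypothetical and not taken from the group's paper.

```python
from typing import List, Tuple


def find_descents(track: List[Tuple[float, float]],
                  min_rate_fpm: float = 500.0,
                  min_drop_ft: float = 2000.0,
                  max_gap_s: float = 300.0) -> List[Tuple[float, float]]:
    """Return (start_time, end_time) spans of sustained descent.

    track: (time_s, alt_ft) pairs sorted by time. Consecutive points extend a
    span only if they are at most max_gap_s apart and descend at least
    min_rate_fpm; spans losing less than min_drop_ft in total are discarded.
    All thresholds are illustrative assumptions.
    """
    spans = []
    start = None  # (time, alt) where the current descent began
    end = None    # last point still belonging to the current descent
    for (t0, a0), (t1, a1) in zip(track, track[1:]):
        dt = t1 - t0
        # Rate in feet per minute; pairs across a coverage gap never qualify.
        descending = 0 < dt <= max_gap_s and (a0 - a1) * 60.0 / dt >= min_rate_fpm
        if descending:
            if start is None:
                start = (t0, a0)
            end = (t1, a1)
        elif start is not None:
            if start[1] - end[1] >= min_drop_ft:
                spans.append((start[0], end[0]))
            start = end = None
    # Close a descent that lasts until the final point of the track.
    if start is not None and start[1] - end[1] >= min_drop_ft:
        spans.append((start[0], end[0]))
    return spans
```

For example, a track sampled every 60 s that levels at 10,000 ft, then loses 1,000 ft per sample down to 7,000 ft and levels off again yields a single span from the last level point at 10,000 ft to the first point at 7,000 ft. Each span is a landing candidate that still needs to be matched against an airport.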

Data curiosity: ****
Related work: **
Technical difficulties mastered: ***
Visualization coolness: ****