LSDE 2022 - Large Scale Data Engineering

Dataset. Commercial airplanes periodically broadcast radio messages containing their identity and position details (plane identifier, flight number, latitude, longitude, altitude, speed, ...). These ADS-B messages are picked up by enthusiasts and collected in systems such as the OpenSky Network or Flightradar24. We have obtained 700 GB of compressed ADS-B messages from September 2016 and 2017.
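
For illustration, a minimal sketch of how the plane identifier, flight number (callsign) and altitude can be read from one raw ADS-B frame. It assumes the data is available as raw 112-bit Mode S extended squitter (DF 17) frames in hex; the archived dataset may well be stored in an already-decoded form. All function names are hypothetical, and only callsigns (type codes 1 to 4) and the 25 ft barometric altitude encoding (type codes 9 to 18 with the Q-bit set) are handled.

```python
# Minimal sketch (hypothetical helpers): decode a few fields from one raw
# 112-bit ADS-B frame (Mode S extended squitter, DF 17) given as hex.

# 6-bit ADS-B character set; '#' marks codes that are not used.
CALLSIGN_CHARSET = (
    "#" + "ABCDEFGHIJKLMNOPQRSTUVWXYZ" + "#" * 5 + " " + "#" * 15
    + "0123456789" + "#" * 6
)


def to_bits(msg_hex: str) -> str:
    """Convert a hex frame into a string of '0'/'1' characters."""
    return bin(int(msg_hex, 16))[2:].zfill(len(msg_hex) * 4)


def downlink_format(bits: str) -> int:
    """Bits 0-4: downlink format (17 for ADS-B extended squitter)."""
    return int(bits[0:5], 2)


def icao_address(msg_hex: str) -> str:
    """Bits 8-31: the 24-bit ICAO aircraft address (the plane identifier)."""
    return msg_hex[2:8].upper()


def type_code(bits: str) -> int:
    """First 5 bits of the 56-bit ME field (bits 32-87) select the message type."""
    return int(bits[32:37], 2)


def callsign(bits: str) -> str:
    """Type codes 1-4: eight 6-bit characters after the type code and category."""
    if not 1 <= type_code(bits) <= 4:
        raise ValueError("not an aircraft identification message")
    chars = bits[40:88]
    return "".join(
        CALLSIGN_CHARSET[int(chars[i:i + 6], 2)] for i in range(0, 48, 6)
    ).strip()


def altitude_ft(bits: str) -> int:
    """Type codes 9-18: 12-bit barometric altitude field at ME bits 8-19."""
    if not 9 <= type_code(bits) <= 18:
        raise ValueError("not an airborne position message with barometric altitude")
    field = bits[40:52]
    if field[7] != "1":
        # Q-bit 0 means 100 ft Gillham coding, which this sketch skips.
        raise ValueError("Gillham-coded altitude not handled")
    n = int(field[:7] + field[8:], 2)  # drop the Q-bit; value is in 25 ft steps
    return n * 25 - 1000


if __name__ == "__main__":
    ident_hex = "8D4840D6202CC371C32CE0576098"
    ident = to_bits(ident_hex)
    print(downlink_format(ident), icao_address(ident_hex), callsign(ident))
    pos = to_bits("8D40621D58C382D690C8AC2863A7")
    print(type_code(pos), altitude_ft(pos))
```

For the identification frame above this prints downlink format 17, ICAO address 4840D6 and callsign KLM1023; the position frame has type code 11 and an altitude of 38000 ft.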

P1: Airport Quality. Detect holding patterns (planes circling near an airport) and go-arounds (aborted landings), and use these to determine the worst airports or airlines. Do they correlate with date, airport size, day of the week or weather?
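
One possible way to flag both events for a single time-ordered track is sketched below, assuming decoded points with position, barometric altitude and ground track: a holding pattern as a net turn of at least 360° within a radius of the airport, and a go-around as a climb after descending to a low altitude near it. The Point type, the function names and all thresholds are hypothetical, not values taken from the project.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Point:
    """One decoded observation of an aircraft (hypothetical schema)."""
    t: float          # seconds since epoch
    lat: float        # degrees
    lon: float        # degrees
    alt_ft: float     # barometric altitude, feet
    track_deg: float  # ground track, degrees clockwise from north


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance on a spherical Earth (mean radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def turn_delta(a: float, b: float) -> float:
    """Signed smallest heading change from a to b, in [-180, 180)."""
    return (b - a + 180.0) % 360.0 - 180.0


def near_airport(track: List[Point], apt_lat: float, apt_lon: float,
                 radius_km: float) -> List[Point]:
    """Keep only the points within radius_km of the airport."""
    return [p for p in track
            if haversine_km(p.lat, p.lon, apt_lat, apt_lon) <= radius_km]


def is_holding(track: List[Point], apt_lat: float, apt_lon: float,
               radius_km: float = 50.0, min_turn_deg: float = 360.0) -> bool:
    """Holding if the net turn near the airport is at least a full circle.

    Assumes samples are dense enough that consecutive tracks differ by < 180°.
    """
    near = near_airport(track, apt_lat, apt_lon, radius_km)
    total = sum(turn_delta(a.track_deg, b.track_deg) for a, b in zip(near, near[1:]))
    return abs(total) >= min_turn_deg


def is_go_around(track: List[Point], apt_lat: float, apt_lon: float,
                 radius_km: float = 10.0, low_alt_ft: float = 2000.0,
                 climb_ft: float = 500.0) -> bool:
    """Go-around if, near the airport, the plane gets below low_alt_ft and then climbs climb_ft."""
    lowest = math.inf
    for p in near_airport(track, apt_lat, apt_lon, radius_km):
        lowest = min(lowest, p.alt_ft)
        if lowest <= low_alt_ft and p.alt_ft - lowest >= climb_ft:
            return True
    return False
```

Summing signed turns lets left and right turns cancel, so an S-turn on approach is not counted as a hold. Ranking airports or airlines would then amount to counting flagged flights per airport or airline, normalized by traffic.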

Summary: This group used Spark with Scala and Java for large-scale data extraction and aggregation, complemented by more interactive visualization of smaller data subsets on a single machine using the Python analysis stack (Jupyter Notebooks, Pandas and Matplotlib/Basemap). The paper is an interesting read on the sparsity of the data points (near-ground positions are not picked up by the sensors) and the resulting complexity of analyzing landings. The group also consulted a pilot to confirm their method of detecting landings, which is triggered by significant descents.
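
A hedged sketch of a descent-triggered landing detector follows. It assumes that a landing shows up as a large altitude drop followed by the aircraft disappearing from coverage, since near-ground positions are missing. This is not the group's actual code; the function name and thresholds are hypothetical.

```python
from typing import List, Tuple


def detect_landings(
    samples: List[Tuple[float, float]],
    min_drop_ft: float = 3000.0,
    window_s: float = 600.0,
    max_gap_s: float = 300.0,
) -> List[int]:
    """Return indices of the last observation before each suspected landing.

    samples: time-ordered (t_seconds, altitude_ft) pairs for one aircraft.
    A contiguous segment ends at a coverage gap longer than max_gap_s or at
    the end of the data; it counts as a landing if the final altitude lies at
    least min_drop_ft below the highest altitude seen in the last window_s
    seconds of that segment.
    """
    landings = []
    start = 0  # first index of the current contiguous segment
    for i, (t_end, alt_end) in enumerate(samples):
        last = i == len(samples) - 1
        if not last and samples[i + 1][0] - t_end <= max_gap_s:
            continue  # the segment continues
        # Highest altitude seen in the final window of this segment.
        peak = max(alt for t, alt in samples[start:i + 1] if t_end - t <= window_s)
        if peak - alt_end >= min_drop_ft:
            landings.append(i)
        start = i + 1
    return landings
```

The detector returns the last observed point rather than a touchdown time, because the receivers rarely see the final part of the approach.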

Data curiosity: ****
Related work: **
Technical difficulties mastered: ***
Visualization coolness: ****