Dataset. Commercial airplanes periodically broadcast radio messages containing their position details (plane identifier, flight number, latitude, longitude, altitude, speed, ...). These ADS-B messages are picked up by enthusiasts and collected in systems such as the OpenSky Network or Flightradar24. We have obtained ~200 GB of compressed ADS-B messages from September 2015.
P1: Airport Quality. Detect holding patterns (planes circling near an airport) and go-arounds (aborted landings). Use these to determine the worst airports or airlines. Are there correlations with date, airport size, day of the week, or weather?
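The task statement leaves the detection rules open. As a minimal sketch, assuming each flight has already been reduced to a time-ordered list of `(lat, lon, altitude_ft)` tuples and that the airport's coordinates are known, a holding pattern can be flagged when the cumulative heading change near the airport reaches a full circle, and a go-around when the plane descends to a low altitude near the airport and then climbs again. The function names, radii, and altitude thresholds below are hypothetical illustrations, not values taken from the project.

```python
import math
from typing import List, Tuple

# Hypothetical per-flight track: (lat_deg, lon_deg, altitude_ft) tuples,
# one per decoded ADS-B position message, sorted by time.
Point = Tuple[float, float, float]

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def bearing_deg(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Initial great-circle bearing from point 1 to point 2, in [0, 360)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dl = math.radians(lon2 - lon1)
    x = math.sin(dl) * math.cos(p2)
    y = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dl)
    return math.degrees(math.atan2(x, y)) % 360.0


def is_holding(track: List[Point], airport: Tuple[float, float],
               radius_km: float = 50.0, min_turn_deg: float = 360.0) -> bool:
    """Hypothetical rule: the plane turns through at least one full circle
    (summed signed heading change) while staying within radius_km of the airport."""
    turned, prev_heading = 0.0, None
    for a, b in zip(track, track[1:]):
        if not all(haversine_km(p[0], p[1], *airport) <= radius_km for p in (a, b)):
            turned, prev_heading = 0.0, None  # reset when the plane leaves the area
            continue
        if (a[0], a[1]) == (b[0], b[1]):
            continue  # duplicate position: no heading information
        heading = bearing_deg(a[0], a[1], b[0], b[1])
        if prev_heading is not None:
            # Signed change in [-180, 180): positive = right turn, negative = left turn.
            turned += (heading - prev_heading + 180.0) % 360.0 - 180.0
            if abs(turned) >= min_turn_deg:
                return True
        prev_heading = heading
    return False


def is_go_around(track: List[Point], airport: Tuple[float, float],
                 radius_km: float = 15.0, low_ft: float = 2000.0,
                 climb_ft: float = 1000.0) -> bool:
    """Hypothetical rule: near the airport the altitude profile is V-shaped,
    i.e. a descent to at most low_ft followed by a climb of at least climb_ft."""
    alts = [p[2] for p in track if haversine_km(p[0], p[1], *airport) <= radius_km]
    if not alts:
        return False
    i = min(range(len(alts)), key=alts.__getitem__)  # index of the lowest point
    lowest = alts[i]
    descended = any(h >= lowest + climb_ft for h in alts[:i])      # came down first
    climbed = any(h >= lowest + climb_ft for h in alts[i + 1:])    # then went back up
    return lowest <= low_ft and descended and climbed
```

Summing signed heading changes, rather than counting individual turns, makes the holding check indifferent to the shape of the pattern (a racetrack with two 180° turns adds up to 360° just like a circle). Requiring a descent before the lowest point keeps ordinary departures from being flagged as go-arounds.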
Summary: This group used Spark with Scala and Java for large-scale data extraction and aggregation, supplemented by more interactive visualization of smaller data subsets on a single computer using the Python analysis stack (Jupyter Notebooks, Pandas and Matplotlib/Basemap). The paper is an interesting read on the sparsity of the data points (data near the ground is not picked up by the sensors) and on how this sparsity complicates their analysis of landings. They also consulted a pilot to confirm their methodology for detecting landings, which is triggered by a significant descent.
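Because messages near the ground are often missing, touchdown itself cannot be observed reliably, which is why a descent-based trigger makes sense. The review does not spell out the group's exact rule, so the sketch below is only one hypothetical reading of "triggered by a significant descent": it assumes a time-sorted series of `(timestamp_s, altitude_ft)` samples per flight, and the name `find_landing_trigger` and both thresholds are illustrative assumptions.

```python
from typing import List, Optional, Tuple

# Hypothetical per-flight altitude series: (timestamp_s, altitude_ft), time-sorted.
Sample = Tuple[float, float]


def find_landing_trigger(samples: List[Sample], min_rate_fpm: float = 500.0,
                         min_drop_ft: float = 3000.0) -> Optional[int]:
    """Return the index where a significant descent begins, or None.

    A descent run is a stretch of consecutive samples in which every step
    loses altitude at >= min_rate_fpm (feet per minute). The first run that
    loses at least min_drop_ft in total is taken as the start of a landing.
    """
    start = None
    for i in range(1, len(samples)):
        (t0, h0), (t1, h1) = samples[i - 1], samples[i]
        dt_min = (t1 - t0) / 60.0
        # Rate uses the real time gap, so sparse or irregular messages still work.
        descending = dt_min > 0 and (h0 - h1) / dt_min >= min_rate_fpm
        if descending:
            if start is None:
                start = i - 1  # the run begins at the last sample before the drop
            if samples[start][1] - h1 >= min_drop_ft:
                return start
        else:
            start = None  # level flight or climb ends the current run
    return None
```

For a flight cruising at 35000 ft and then descending at 1000 ft/min with one message per minute, the trigger fires once 3000 ft have been lost and reports the index of the last cruise sample before the descent. Short dips that recover before reaching `min_drop_ft` are ignored.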
Data curiosity: ****
Related work: **
Technical difficulties mastered: ***
Visualization coolness: ****