LSDE: Large Scale Data Engineering 2018

Dataset. Through a Freedom of Information Law (FOIL) request, records of all taxi rides in New York City were made public. The ~20 GB dataset contains ~170M rows on trips and paid fares. Each row includes the time and place of passenger pickup and dropoff, the trip time and distance, a driver and car identifier, and the amount paid for the trip.

T2: Heat Map. Create a traffic heat map visualizing busy/empty street segments per time of day. Use a route planner to determine which road segments each trip traversed. When the planner returns alternative routes, use the recorded trip distance to choose between them.
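One way to read the "consider the recorded trip distance" hint is to keep the candidate route whose length is closest to the distance recorded for the trip. The sketch below illustrates that idea only; the `Route` type, the `pick_route` helper, and the sample values are hypothetical and are not taken from the project or from any route-planner API. It assumes route length and recorded distance are expressed in the same unit.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Route:
    """Hypothetical candidate route: ordered segment ids plus total length."""
    segment_ids: Tuple[str, ...]
    length: float  # same unit as the recorded trip distance (assumption)


def pick_route(alternatives: Sequence[Route], recorded_distance: float) -> Route:
    """Return the alternative whose length is closest to the recorded distance."""
    if not alternatives:
        raise ValueError("no candidate routes")
    return min(alternatives, key=lambda r: abs(r.length - recorded_distance))


# Two made-up alternatives between the same pickup and dropoff points.
candidates = [
    Route(("a", "b", "c"), 3.1),
    Route(("a", "d", "c"), 4.0),
]
chosen = pick_route(candidates, recorded_distance=3.9)
```

Here the second candidate is chosen because its length differs from the recorded distance by 0.1, against 0.8 for the first.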

Summary. The visualization below was created by reconstructing the likely route of 220 million taxi trips with the GraphHopper route planner. The planner was wrapped in Spark to parallelize the routing, to decompose the computed routes into road segments, and to aggregate counts by segment and time. The end result is a WebGL visualization built with MapBoxGL.
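The aggregation step can be sketched in plain Python as follows. This is a minimal illustration, not the project's Spark code: the `count_segments` helper and the sample trips are hypothetical, time is bucketed by hour of day, and each trip is attributed entirely to its pickup hour as a simplifying assumption. In Spark the same shape would typically be expressed as a `flatMap` that emits one `((segment, hour), 1)` pair per traversed segment, followed by a `reduceByKey` that sums the pairs.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Sequence, Tuple

Trip = Tuple[datetime, Sequence[str]]  # (pickup time, traversed segment ids)


def count_segments(trips: Iterable[Trip]) -> Counter:
    """Count traversals per (segment id, hour of day)."""
    counts: Counter = Counter()
    for pickup_time, segment_ids in trips:
        hour = pickup_time.hour  # assumption: whole trip counted in pickup hour
        for segment_id in segment_ids:
            counts[(segment_id, hour)] += 1
    return counts


# Made-up routed trips; segment ids are placeholders.
trips = [
    (datetime(2013, 1, 1, 8, 15), ["a", "b"]),
    (datetime(2013, 1, 1, 8, 40), ["b", "c"]),
    (datetime(2013, 1, 1, 17, 5), ["b"]),
]
counts = count_segments(trips)
```

With these sample trips, segment "b" is counted twice in the 8 o'clock bucket and once in the 17 o'clock bucket; such per-bucket counts are what a heat map would color by.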

Data curiosity: *
Related work: *
Technical difficulties mastered: ***
Visualization coolness: ***

Taxi Heat Map -- Thomas Koch and Hassan Jalil (paper)