LSDE: Large Scale Data Engineering 2018

Dataset. Wikipedia publishes page view statistics for its projects. We have collected ~820GB of these statistics from 2014 (will be updated to 2016). Tip: the page names in those files are recorded before redirects etc. are resolved, so it might be a good idea to use the Wikipedia database dumps to resolve them first. Also, normalizing accesses by the sum of clicks on the observed pages might help to reduce skew.
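
The two tips above can be sketched roughly as follows. This is a minimal illustration, not the group's pipeline: it assumes the redirect table from the database dumps has already been loaded into a plain dictionary, and the function names `resolve_redirects` and `normalized_shares` as well as the sample titles and counts are made up for the example.

```python
from collections import defaultdict

def resolve_redirects(title, redirects, max_hops=10):
    """Follow a redirect chain to its target, stopping on cycles or long chains."""
    seen = set()
    while title in redirects and title not in seen and len(seen) < max_hops:
        seen.add(title)
        title = redirects[title]
    return title

def normalized_shares(counts, redirects):
    """Merge views of redirecting titles, then divide by the total of observed views."""
    merged = defaultdict(int)
    for title, views in counts.items():
        merged[resolve_redirects(title, redirects)] += views
    total = sum(merged.values())
    if total == 0:
        return {}
    return {title: views / total for title, views in merged.items()}

# Hypothetical redirect table (source title -> target title) and hourly counts.
redirects = {"Obama": "Barack_Obama", "Barack_H._Obama": "Obama"}
counts = {"Obama": 30, "Barack_Obama": 50, "Barack_H._Obama": 20, "Hillary_Clinton": 100}

shares = normalized_shares(counts, redirects)
print(shares)
```

Working with shares instead of raw counts means a page's value only rises when it gains attention relative to the other observed pages, which dampens overall traffic swings between hours or days.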

W2: Election Prediction. Correlate clicks on Wikipedia pages with movements in the US polls. Focus on the debates during the US primaries. Which Wikipedia topics spiked around (just before and after) these debates? Can we learn anything from these insights and the actual activity on Wikipedia with regard to the November presidential election?
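
One simple way to approach the correlation question is a Pearson coefficient between a candidate page's daily view share and the day-over-day change in that candidate's poll numbers. The sketch below is only an assumption about how one might start, not the method used by the group; the helper names and all numbers are invented for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / (spread_x * spread_y)

def daily_changes(series):
    """Day-over-day differences; the result is one element shorter than the input."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

# Invented toy data: daily view share of one page and poll percentages.
view_share = [0.10, 0.12, 0.30, 0.18, 0.11]
poll_pct = [40.0, 40.5, 41.0, 43.0, 42.5]

# Pair each day's view share with the poll movement into the next day.
poll_moves = daily_changes(poll_pct)
print(pearson(view_share[:-1], poll_moves))
```

A high coefficient on such a series only indicates co-movement around events like debates; it says nothing about whether the clicks predict the final outcome.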

Summary. In hindsight, most people will agree that the 2016 US presidential election was very hard to predict, so this group faced a tough challenge. Their conclusion is that the Wikipedia logs are not a sufficient basis for predicting elections, though they could very clearly show correlations between election events such as debates and Wikipedia accesses of election-related topics. What the visualization lacks in utility, it makes up for in beauty.

Data curiosity: ***
Paper writing: ***
Technical difficulties: mastered ***
Visualization coolness: ****

Wikipedia Election Prediction -- Georgiana Diana Ciocirdel and Mihai Varga (paper)