LSDE: Large Scale Data Engineering 2021

Dataset. Wikipedia publishes hourly page view statistics for its projects. The data has been available in this format since 2015. The popularity of topics on Wikipedia can indicate how people's interests vary over time and space (the latter specifically for non-English language domains).
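
As a starting point, the hourly files could be reduced to per-language, per-page view counts. The sketch below is a minimal illustration, not a reference implementation. It assumes each line has four space-separated fields (domain code, page title, view count, response size), and that Wikipedia traffic for a language appears under the bare language code (e.g. nl) or that code with a .m suffix for mobile. Check both assumptions against the dump documentation before relying on them. The function names are hypothetical.

```python
from collections import Counter


def parse_pageview_line(line):
    """Return (domain_code, page_title, views), or None if the line is malformed.

    Assumed layout: "<domain_code> <page_title> <views> <response_size>".
    """
    parts = line.rstrip("\n").split(" ")
    if len(parts) != 4:
        return None
    domain_code, page_title, views, _size = parts
    if not views.isdigit():
        return None
    return domain_code, page_title, int(views)


def views_per_language(lines, languages):
    """Sum views per (language, title), keeping only Wikipedia traffic.

    Assumption: "nl" is desktop Wikipedia and "nl.m" is mobile Wikipedia;
    any other suffix (another project) is skipped.
    """
    totals = Counter()
    for line in lines:
        record = parse_pageview_line(line)
        if record is None:
            continue  # skip malformed lines instead of failing the whole file
        domain_code, title, views = record
        lang, _, suffix = domain_code.partition(".")
        if lang in languages and suffix in ("", "m"):
            totals[(lang, title)] += views
    return totals


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = [
        "nl Coronavirus 120 0",
        "nl.m Coronavirus 80 0",
        "nl.b Coronavirus 5 0",
        "de Berlin 10 0",
        "broken line",
    ]
    print(views_per_language(sample, {"nl", "de"}))
```

At the scale of several years of hourly dumps, the same per-line logic would more likely run inside a distributed framework; the plain-Python version only shows the parsing and filtering step.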

M5: COVID-19 Attention. Analyze the Wikipedia pageview statistics of the past 4 years and compare the access patterns from before the pandemic with those during the months of the pandemic. Split this out over various language domains that can be related to countries (e.g. nl, de, fr, it, sv, es). We are interested in learning which topics are on the minds of Wikipedia users over the months, i.e. the prominent topics, and specifically topics whose attention changes significantly (upwards or downwards) during peaks of COVID-19. In particular, you could try to correlate temporal changes in attention in certain countries with the COVID stringency in that country. Consider various forms of visualizing these results over topic (clouds?), time and space.
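
One possible way to quantify the suggested correlation is to compute, per month, the relative change in a topic's views against a pre-pandemic baseline, and then take the Pearson correlation with a monthly stringency score for the matching country. The sketch below assumes this approach. The choice of Pearson correlation, the function names, and all numbers are illustrative assumptions, not part of the assignment.

```python
from math import sqrt


def relative_change(baseline, current):
    """Relative change of `current` against a non-zero `baseline` (0.5 means +50%)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical monthly numbers for one topic in one language domain.
    baseline_views = [1000, 1100, 900, 1000]   # same months, pre-pandemic
    pandemic_views = [1200, 1900, 1500, 1100]  # same months, during the pandemic
    stringency = [20.0, 80.0, 60.0, 30.0]      # hypothetical stringency score per month

    changes = [relative_change(b, c) for b, c in zip(baseline_views, pandemic_views)]
    print(pearson(changes, stringency))
```

Comparing each month with the same calendar month in earlier years, as assumed here, is one way to keep seasonal effects from being mistaken for pandemic effects.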

Data curiosity: ***
Writing: **
Technical difficulties mastered: ***
Visualization coolness: **


COVID-19 Attention -- Roman Dahm, Yannick Brunink (paper)