LSDE: Large Scale Data Engineering 2018
Data Analysis Projects: Your Task
Assignment

Each group chooses one of the topics on the projects page on a first-come, first-served (FCFS) basis, in the order of the final leaderboard ranking of practical 1c.

For each topic you are asked to identify two related papers. These papers may be taken from the technical reading material in this course, which typically covers particular systems that work in Hadoop. Often, however, you should also look at other scientific papers, e.g. papers related to the data science task you are addressing.

You will be asked to give two presentations on your project: a planning presentation 1 week in and a progress presentation 3 weeks in. Each presentation should cover both how you are approaching your project (including any early findings) and the technologies you are using.

The final deliverables of the project are your code (as a tar.gz archive), a written project report, and a simple web-based visualization.

Presentation

For both the planning and the progress presentation (see the 'Lecture Schedule' page on the website for the schedule):

  • you must submit your presentation to Canvas by the evening before the first presentation day of that week
  • all group members must be physically present and participate in the presentation

If you know you cannot attend on one of the days, please let us know via lsde@outlook.com.

Please read the description of what we expect from each presentation in the corresponding Canvas assignment. In all cases, slides should:

  • not contain too much information,
  • use a large font,
  • contain no more than 5 bullets per slide, and
  • use figures wherever possible.

One more thing: please practice your talk beforehand.

The schedule is tight, and we will mercilessly hard-stop each talk when time is up and let the next group begin. Otherwise, not all scheduled groups will be able to present.

Reporting

The final project report for assignment 2 is due October 25. It should be a 5-12 page paper in two-column format, laid out like a scientific paper (latex-style, latex-example, word-example), with the following structure:

  • Introduction: what is this project about and why is it interesting?
  • Related Work: summarize the related literature to explain what this project builds on; in particular, discuss the two research papers chosen for this project (and discussed in your presentation).
  • Research Questions: which questions does this project try to answer, and/or which hypotheses does it investigate? These should cover both the project topic and the technological side (aptness of the tools for the job, scalability of the solution).
  • Project Setup: the steps taken during the project.
  • Experiments: a description of the experiments and their results.
  • Conclusions: revisit the research questions and hypotheses and try to answer them. Did any new questions arise? What insights did you gain into the usability of the employed technology for particular tasks?
  • Bibliography

If you use LaTeX to format your paper, please use BibTeX for the bibliography.

Apart from the report and your code, we also ask you to create a web-based visualization of your results. This visualization can be basic. Due to the technical restrictions of this site (and to keep things simple), we ask you to build it as static HTML/JavaScript web pages.

Although the visualization may be basic, the cooler it is, the better. We will keep these visualizations online, so they also serve as visible proof that you passed this course and mastered large-scale data engineering techniques.