Each group chooses one of the topics on the projects page on a first-come, first-served (FCFS) basis, in the order of the final leaderboard ranking of practical 1c.
For each topic you are asked to identify two related papers. These papers may be taken from the technical reading material in this course, which typically covers particular systems that work in Hadoop. Often, however, you should also look at other scientific papers, e.g. papers relating to the data science task you are addressing.
You will be asked to give two presentations on your project: a planning presentation 1 week in and a progress presentation 3 weeks in. Each presentation should cover both how you are addressing your project (including any early findings) and the technologies used.
The final deliverables of the project are code (tar.gz), a written project report, and a simple web-based visualization.
For both the planning and the progress presentation (see website page 'Lecture Schedule' for the schedule):
If you know you cannot make it on one of the days, please let us know via lsde@outlook.com.
Please read the description of what we expect of each presentation in the corresponding Canvas assignment. In all cases, slides should:
There will be a tight schedule, and we will mercilessly hard-stop when time is up and let the next group begin. Otherwise we will not be able to let all scheduled groups present.
The final project report for assignment 2 is due October 25. The project report should be a paper of 5-12 pages, formatted in two columns like a scientific paper (latex-style, latex-example, word-example), which contains the following information & structure:
If you use LaTeX to format your paper, please use BibTeX for the bibliography.
Apart from the report and your code, we also ask you to create a web-based visualization of your results. This visualization can be basic. Due to the technical restrictions of this site (and to keep things simple), we ask you to deliver it as static HTML/JavaScript web pages.
Though the visualization may be basic, the cooler it is, the better. We will keep these online, so it also serves as visible proof that you passed this course and mastered large-scale data engineering techniques.
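As an illustration of what a static visualization can look like, here is a minimal sketch in Python that writes a single self-contained HTML page with the results embedded as JSON, so nothing has to run on the server. The function name `build_page`, the example rows, and the table layout are assumptions made for this example, not course requirements; a real project would likely use a charting library instead of a plain table.

```python
import html
import json
from pathlib import Path

# Page skeleton. Doubled braces are literal braces in the generated JavaScript;
# {title} and {data} are filled in by str.format below.
HTML_TEMPLATE = """<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>{title}</title></head>
<body>
<h1>{title}</h1>
<table id="results"></table>
<script>
// The data is embedded when the page is generated, so no server-side code is needed.
const rows = {data};
const table = document.getElementById("results");
for (const row of rows) {{
  const tr = table.insertRow();
  for (const value of Object.values(row)) {{
    tr.insertCell().textContent = value;
  }}
}}
</script>
</body>
</html>
"""


def build_page(rows, title, out_path):
    """Write a static HTML page showing `rows` (a list of dicts) as a table."""
    # Escape "</" so a value containing "</script>" cannot end the script tag early.
    data = json.dumps(rows).replace("</", "<\\/")
    page = HTML_TEMPLATE.format(title=html.escape(title), data=data)
    Path(out_path).write_text(page, encoding="utf-8")
    return page


if __name__ == "__main__":
    # Hypothetical project output: a small aggregate computed on the cluster.
    example_rows = [
        {"word": "hadoop", "count": 3},
        {"word": "spark", "count": 5},
    ]
    build_page(example_rows, "Example results", "index.html")
```

Running the script produces an `index.html` that can be opened directly in a browser or copied to a web server as is. The heavy computation stays in your cluster job, and only the small aggregated result is shipped with the page.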