Each group chooses one of the topics below on a first-come, first-served (FCFS) basis, in order of the final leaderboard ranking of practical1. The outcome is as follows, in ranking order of Assignment1:
For each topic, you are asked to identify two related papers. These papers may be taken from the technical reading material in this course, which typically relates to particular systems that work in Hadoop. However, you may also choose other scientific papers, e.g. papers relating to the data science task you are addressing. In any case, please announce which related papers you have chosen by email to lsde_course@outlook.com before your presentation. Since the presentations are on October 10 and 12, please send this email before Friday, October 7.
You will be asked to give a presentation on your project in progress. The presentation should cover both how you are addressing your project (including any early findings) and the technologies you use. You may also summarize relevant points from the two related scientific papers you chose.
The final result of the project consists of code (as a tar.gz archive), a written project report, and a simple web-based visualization.
Please send your presentation to lsde_course@outlook.com at least one full day in advance! There are two reasons for this requirement:
On the day you present (see website page 'Lecture Schedule' for the schedule):
If you know you cannot make it, please arrange a swap with a group that presents on the other day. If you cannot make either day, contact us well in advance.
In the presentation, we would like you to explain to us and the other students how you are solving your Assignment2 data analysis problem, and why.
A suggested setup of the presentation would be as follows:
Remember that all of this needs to fit in 12 minutes, leaving time for some questions. Most groups would prepare 5-8 slides, although this depends on what is on the slides and how much you say per slide. In all cases, slides should:
The schedule is tight: we will mercilessly stop you after 15 minutes and let the next group begin. Otherwise, not all scheduled groups will be able to present.
The final project report for Assignment 2 is due October 23. The report should be a paper of 5-12 pages, formatted in two columns like a scientific paper (latex-style, latex-example, word-example), and should contain the following information and structure:
If you use LaTeX to format your paper, please use BibTeX for the bibliography.
Apart from the report and your code, we also ask you to create a web-based visualization of your results. This visualization can be basic. Due to the technical restrictions of this site (and to keep things simple), we ask you to build it as static HTML/JavaScript web pages.
Although the visualization can be basic, the cooler it is, the better. We will keep these visualizations online, so yours also serves as visible proof that you passed this course and mastered large-scale data engineering techniques.