Data Analysis Projects

The goal of this assignment is to gain experience with a real Big Data challenge by attempting to solve it on a large Hadoop cluster. Computation for all projects will be performed on the SurfSara Hadoop cluster, where we will make all datasets mentioned below available.

Access to the cluster is via ssh login.hathi.surfsara.nl. We provide a proxy for the SurfSara Hadoop Status webpages to avoid Kerberos authentication problems from the VU and eduroam networks.

Assignment

Each group chooses one of the topics below on a first-come, first-served (FCFS) basis, in the order of the final leaderboard ranking of practical1. The outcome is as follows:

Each topic has related papers associated with it. The group presents these papers in class in a 20-minute presentation.

The final result of the project is code (tar.gz) and a written project report.

Presentation

The goal of the presentation is to get across the few most important points and motivate the audience to learn more about them, not to summarize every detail of the paper.

Prepare slides. Each slide should contain at most 5 lines of text and at most 25 words. Diagrams and graphs (if simple and beautiful) are recommended.

Please practice your presentation beforehand, and send the slides to us one day in advance so that you get early feedback.

Report

The project report should be a paper of 5-12 pages, formatted in two columns like a scientific paper (latex-style, latex-example, word-example), with the following information and structure:

If you use LaTeX to format your paper, please use BibTeX for the bibliography.
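
For groups that have not used it before, the following is a minimal sketch of how a BibTeX bibliography is usually hooked into a LaTeX paper. It is not the course template: the file name references.bib, the citation key examplekey, and the plain bibliography style are placeholders chosen for illustration, and the latex-style linked above may prescribe its own document class and style.

```latex
% Minimal sketch; references.bib and examplekey are hypothetical placeholders.
\documentclass[twocolumn]{article}

\begin{document}

A statement supported by a reference~\cite{examplekey}.

% Choose the bibliography style and point LaTeX at references.bib
% (the .bib extension is omitted in the command).
\bibliographystyle{plain}
\bibliography{references}

\end{document}
```

With this setup the usual build sequence is to run LaTeX once, then BibTeX, then LaTeX twice more so that all citations resolve.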
