
Large Scale Data Engineering is an MSc course taught by Peter Boncz, who holds the special chair in Large-Scale Analytical Data Management at VU University, and Hannes Mühleisen of the Database Architectures research group at CWI. It was developed specifically for the Amsterdam Data Science initiative.

Goals & Scope

The goal of the course is to gain insight into and experience with algorithms and infrastructures for managing big data.

This course confronts students with data management tasks in which the sheer size of the data makes naive solutions, and solutions that work only on a single machine, impractical.

Solving such tasks requires the computer scientist to have insight into the main factors that underlie algorithm performance (access pattern, hardware latency/bandwidth), and to possess skills and experience in managing large-scale computing infrastructure.

Results

The course has now been completed successfully. Here is a small (biased ;-) sample of the results of the second assignment, where students had to analyze datasets using cluster technology:

Course Structure

There are two lectures per week, on Monday 11:00-12:45 and Thursday 09:15-10:45 in room P647. More information about the lectures is in the schedule. This course can take a significant amount of time and effort, and involves substantial practical work. The practicals are done outside lecture hours, at the discretion of the students.

In the first assignment the students can work either on their own laptops via a prepared VM, or in the cloud using an Amazon EC2 Micro Instance.

The second assignment will be done on the SurfSARA Hadoop cluster (90 machines, 720 cores, 1.2PB storage).

Tasks

Students must work in groups of two; the VU BlackBoard is used for registering student groups and for reporting grades.

For this course, each group must deliver the following:

The final grade (1-10) is a weighted combination of these tasks (30%, 20%, and 40%, respectively); the remaining 10% is based on your individually observed participation during the lectures.
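
As an illustration of the weighting described above, the following is a minimal sketch of how such a final grade could be computed. The function name, the parameter names, and the example grades are hypothetical and chosen only for illustration; only the weights (30%, 20%, 40%, 10%) come from the description above, and any rounding rules used in practice are not covered here.

```python
# Weights in percent, taken from the grading description:
# three group tasks (30, 20, 40) plus individual participation (10).
TASK_WEIGHTS = (30, 20, 40)
PARTICIPATION_WEIGHT = 10


def final_grade(task_grades, participation_grade):
    """Return the weighted final grade on the 1-10 scale.

    task_grades: three grades (1-10), in the same order as the tasks.
    participation_grade: individual participation grade (1-10).
    Both parameter names are hypothetical, used for this sketch only.
    """
    if len(task_grades) != len(TASK_WEIGHTS):
        raise ValueError("expected exactly three task grades")
    # Multiply each grade by its percentage weight, then divide by 100.
    weighted = sum(w * g for w, g in zip(TASK_WEIGHTS, task_grades))
    weighted += PARTICIPATION_WEIGHT * participation_grade
    return weighted / 100


if __name__ == "__main__":
    # Made-up grades for one student: tasks 8, 7, 9 and participation 6.
    print(final_grade((8, 7, 9), 6))
```

Because the task grades are shared within a group while participation is individual, two students in the same group can end up with different final grades under this scheme.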

Getting Help

You can email questions to lsde2015@outlook.com, or contact that account on Skype by chat, audio, or video.

Acknowledgements

The lecture slides for LSDE2015 are based on those used in the Extreme Computing course, and were graciously provided by Dr. Stratis Viglas of the University of Edinburgh.
