
ACM SIGMOD 2011

Repeatability & Workability Evaluation

SIGMOD 2008 was the first database conference to propose testing the code associated with conference submissions against the data sets used by the authors, in order to verify the repeatability of the experiments presented in the submitted papers. A detailed report on this initiative has been published in ACM SIGMOD Record, 37(1):39-45, March 2008.

The initiative was continued in conjunction with SIGMOD 2009 (report published in ACM SIGMOD Record, 38(3):40-43, September 2009) and SIGMOD 2010, each time incorporating the lessons learnt to further improve the process.

Along these lines, SIGMOD 2011 also offers authors an experimental repeatability and workability evaluation of their accepted papers.


The Goal

The goal of the repeatability/workability effort is to ensure that SIGMOD papers stand as reliable, referenceable works for future research. The premise is that experimental papers will be most useful when their results have been tested and generalized by objective third parties.

The Process

The repeatability & workability process tests whether the experiments published at SIGMOD 2011 can be reproduced (repeatability) and possibly extended by modifying some aspects of the experiment design (workability). Authors participate on a voluntary basis, and they benefit from the evaluation as well.

Preparing the Experiments

The repeatability committee focuses on repeating the derivation of the presented data (tables or graphs) from the initial data.

Derivation involves a system (and its environment), a workload, some metrics and a set of experiments. System setup is the most challenging aspect when repeating an experiment: the system needs to be installed in a new environment, both the system and part of the environment must be configured, and experiments must be run. Obviously, installation/configuration will be easier to reproduce if it is automatic rather than manual. Similarly, experiments will be easier to reproduce if they are automatic (e.g., a script that takes a range of values for each experiment parameter as arguments) rather than manual (e.g., a script that must be edited so that a constant takes the value of a given experiment parameter).
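
For illustration, here is a minimal sketch of such an automatic experiment driver in Python. The parameter names (--buffer-mb, --clients) and the ./benchmark command are placeholders, not part of any particular submission; the point is that every experiment parameter arrives as a command-line argument, so the full sweep can be repeated without editing any code.

    #!/usr/bin/env python
    # Hypothetical parameter-sweep driver: every experiment parameter is a
    # command-line argument, so the sweep can be repeated without editing code.
    import argparse
    import csv
    import subprocess

    def run_one(buffer_mb, clients):
        # Placeholder invocation of the system under test; it is assumed to
        # print a single number (the measured metric) on standard output.
        out = subprocess.check_output(
            ["./benchmark", "--buffer-mb", str(buffer_mb), "--clients", str(clients)])
        return float(out.strip())

    def main():
        parser = argparse.ArgumentParser(description="parameter-sweep driver")
        parser.add_argument("--buffer-mb", type=int, nargs="+", default=[64, 256, 1024])
        parser.add_argument("--clients", type=int, nargs="+", default=[1, 4, 16])
        parser.add_argument("--output", default="results.csv")
        args = parser.parse_args()
        with open(args.output, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["buffer_mb", "clients", "metric"])
            for b in args.buffer_mb:
                for c in args.clients:
                    writer.writerow([b, c, run_one(b, c)])

    if __name__ == "__main__":
        main()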

Best Practices

(1) Use a VM as an environment for your experiments.
If you are working in the cloud, you might already be doing this, but it can make sense even if you are working with your own server. The latest VM engines (e.g., VirtualBox 3.2) support direct I/O, their overhead in terms of CPU usage is reasonable for most experiments (and can be calibrated if relevant), and various types of network cards are available. In most cases, using a VM will not alter your performance; it will make it easier for you to set up and execute your experiments at different points in time (just freeze your VM and reuse it next year; it will not have changed, so you can repeat your experiments immediately) or on various hardware platforms. As a side effect, it will make it much easier for the rest of us to set up and execute your experiments.
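
As a concrete (hypothetical) illustration, the following Python snippet freezes and archives a VirtualBox VM using the standard VBoxManage command line; the VM name and snapshot label are placeholders.

    # Sketch: freeze the experiment VM and export it as a portable appliance.
    # "sigmod-expt" and "camera-ready-2011" are placeholder names.
    import subprocess

    VM_NAME = "sigmod-expt"
    SNAPSHOT = "camera-ready-2011"

    # Take a snapshot, i.e., freeze the current state of the VM.
    subprocess.check_call(["VBoxManage", "snapshot", VM_NAME, "take", SNAPSHOT])

    # Export the VM as an .ova appliance that others can import elsewhere.
    subprocess.check_call(["VBoxManage", "export", VM_NAME, "--output", VM_NAME + ".ova"])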

(2) Represent separate setup and experiment tasks as executable scripts or programs with explicit pre- and post-conditions.
Clearly identifying each setup or experiment task makes it easier to define the pre-/post-conditions and to check that they are fulfilled (e.g., what is the state of your flash device after it is formatted? Can you actually perform an experiment if a given setup task failed?).
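
A minimal sketch of what such a task could look like in Python is given below. The device path, the choice of ext4 and the mkfs/fsck commands are merely illustrative assumptions; the point is that each condition is checked explicitly and the task fails loudly if one is violated.

    # Sketch of one setup task with explicit, checked pre- and post-conditions.
    # The device path and file-system choice are illustrative placeholders.
    import os
    import subprocess

    def format_flash_device(device="/dev/sdb1"):
        # Pre-conditions: the device exists and is not currently mounted.
        assert os.path.exists(device), "device not present: %s" % device
        with open("/proc/mounts") as m:
            assert device not in m.read(), "device still mounted: %s" % device

        # The task itself.
        subprocess.check_call(["mkfs.ext4", "-F", device])

        # Post-condition: a consistent file system exists on the device;
        # if not, later experiment tasks must not be run on top of it.
        subprocess.check_call(["fsck.ext4", "-n", device])

    if __name__ == "__main__":
        format_flash_device()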

(3) Use a workflow engine to automate the setup/experiment tasks.
We have prepared an extension of VisTrails and some examples based on database tuning experiments that illustrate the (large) benefits and the (modest) costs of this approach.
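
The snippet below is not that VisTrails extension; it is only a plain-Python sketch of the underlying idea: setup and experiment steps are declared as workflow nodes with explicit dependencies, so the whole chain can be re-run (or stopped at a failing step) without manual intervention. All task names and bodies are placeholders.

    # Plain-Python sketch of a tiny workflow engine (NOT the VisTrails extension):
    # tasks are registered with explicit dependencies and executed in order.
    TASKS = {}

    def task(name, deps=()):
        def register(func):
            TASKS[name] = (deps, func)
            return func
        return register

    def run(target, done=None):
        done = set() if done is None else done
        if target in done:
            return
        deps, func = TASKS[target]
        for dep in deps:
            run(dep, done)
        print("running", target)
        func()
        done.add(target)

    @task("install_system")
    def install_system():
        pass  # e.g., unpack, build and configure the system under test

    @task("load_data", deps=("install_system",))
    def load_data():
        pass  # e.g., generate and bulk-load the data set

    @task("run_experiment", deps=("load_data",))
    def run_experiment():
        pass  # e.g., call the parameter-sweep driver sketched above

    if __name__ == "__main__":
        run("run_experiment")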

Here, we present an extended case study of preparing an experiment on a distributed database system for repeatability and workability.

Repeatability Committee

Philippe Bonnet, chair (IT University of Copenhagen)
Stefan Manegold, advisor (CWI Amsterdam)

Stratos Idreos (CWI Amsterdam)
Milena Ivanova (CWI Amsterdam)
Ryan Johnson (EPF Lausanne)
Dan Olteanu (University of Oxford)
Paolo Papotti (Università Roma Tre)
Nancy Hall (University of Wisconsin)
Rene Mueller (IBM Almaden)
Christine Reilly (U.Texas Pan Am)
Tim Kraska (UC Berkeley)
Cong Yu (Yahoo)
Dimitris Tsirogiannis (Microsoft)
David Koop (University of Utah)
Wei Cao (Renmin University)
Wolfgang Gatterbauer (University of Washington)


Maintained by Stefan Manegold. RWE 2008, 2009, 2010, 2011. Hosted by CWI.