Overview
========

Due to time constraints, I was not able to finish both the repeatability and the workability evaluation. This review focuses on the workability aspect of the paper.

All the experiments were executed on a cluster provided by the authors. Users have little flexibility to change the cluster configuration (e.g., the number of computers or CPUs). On the other hand, we do not need to worry about setting up the environment.

There is not much documentation for the programs. For example, the synthetic data is in an undocumented binary format. This may be because the authors did not target the workability test at the beginning. However, the authors were generally responsive and provided help, e.g., an additional script for the workability evaluation.

The evaluation is not easy. The most difficult part for me was plotting the graphs in Tableau from the raw experiment outputs. Even when repeating an experiment from the paper (Fig. 5, 6), I was not able to produce the graphs. However, the raw outputs are delimited text files, and the progress estimated by each algorithm is recorded in Estimates.csv. I was able to generate a graph similar to Fig. 5 using Excel. Once the format of the raw output is understood, a customized graph can be plotted with other tools, which I found easier than using Tableau, as I am not familiar with it.
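As an alternative to Tableau or Excel, a short script can turn a delimited file such as Estimates.csv into one progress curve per algorithm. The sketch below is only an illustration: the column names `algorithm`, `elapsed_seconds`, and `estimated_progress`, the comma delimiter, and the sample rows are all assumptions of mine, since the real layout of the file is not documented here. Plotting is kept in a separate function that needs matplotlib.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names; the real header of Estimates.csv may differ.
ALGO_COL = "algorithm"
TIME_COL = "elapsed_seconds"
PROGRESS_COL = "estimated_progress"


def load_series(lines, delimiter=","):
    """Group (time, progress) points by algorithm from delimited text rows."""
    series = defaultdict(list)
    for row in csv.DictReader(lines, delimiter=delimiter):
        series[row[ALGO_COL]].append(
            (float(row[TIME_COL]), float(row[PROGRESS_COL]))
        )
    # Sort each curve by time so it can be drawn as a line.
    return {name: sorted(points) for name, points in series.items()}


def plot_series(series, out_path="progress.png"):
    """Draw one line per algorithm; requires matplotlib to be installed."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    for name, points in series.items():
        times, progress = zip(*points)
        plt.plot(times, progress, label=name)
    plt.xlabel("elapsed time (s)")
    plt.ylabel("estimated progress (%)")
    plt.legend()
    plt.savefig(out_path)


# Made-up rows, only to show the assumed shape of the input.
sample = io.StringIO(
    "algorithm,elapsed_seconds,estimated_progress\n"
    "A,0,0\nA,10,40\nA,20,100\n"
    "B,0,0\nB,20,100\n"
)
series = load_series(sample)
```

For a real run, the file would be opened with `open("Estimates.csv", newline="")` and passed to `load_series`, after adjusting the three column constants and the delimiter to match the actual header.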

Workability
===========

a) customized query (suggested by reviewer 1)

Reviewer 1 suggested performing an experiment with a customized query: a sum query that joins two large tables. The authors provided the scripts for this experiment. Simply by executing the script, I obtained the results of the experiment, and they look fine. The provided script also served as the base that I modified for the other experiments.

b) different source data

There are some synthetic datasets on the authors' server (from 256MB to 16GB). I tested joining the 256MB data with the 1GB data. The program ran and produced output. However, although the run takes some time to finish, only two estimated progress values are recorded (at 0\% and 100\%). The authors explained that the execution script needs to generate the appropriate histograms and 1\% data samples for the dataset. I modified the script and tried to repeat the experiment, but without success. The scripts are quite difficult to understand, especially the commands that invoke the authors' programs. I did not investigate this problem further with the authors, as the deadline had already passed.

I also ran experiments on datasets that already have 1\% data samples, for example the 16GB data. These experiments seem fine, and many more progress estimates are recorded.
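The difference between the two kinds of runs can be checked mechanically: a run that lacks histograms and samples reports only the start and end values, while a healthy run reports estimates in between. The sketch below assumes the progress estimates have already been read into a list of percentages. The function name and the example values are made up for illustration.

```python
def has_intermediate_estimates(progress_values, lo=0.0, hi=100.0):
    """Return True if any estimate lies strictly between the start and end values."""
    return any(lo < p < hi for p in progress_values)


# Made-up values: the first mimics the 256MB/1GB run, the second a run with samples.
print(has_intermediate_estimates([0.0, 100.0]))              # False
print(has_intermediate_estimates([0.0, 12.5, 60.0, 100.0]))  # True
```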

The authors also provide some data generation scripts. I did not have permission to read the scripts at first, but the authors later granted it. The scripts are quite complex, but I can see that the synthetic data is generated by org.apache.pig.test.utils.datagen.DataGenerator. This may explain the format of the synthetic data. Due to time constraints, I did not test the data generation. In any case, I still do not know how to generate the histograms and the 1\% data samples.