GENERAL OBSERVATION: The results I obtained locally support the paper's conclusions. The differences I observed were small and easily attributable to noise. The experimental setup is very complex and rather inflexible, which made a timely workability analysis impossible.

SETUP: Running the experiments went smoothly, but importing the data into Tableau was painful because it kept glitching while importing my datasets. The authors were very responsive, however, and we eventually got it working. It would also have been nice to have a way to queue up multiple scripts: having to come back every 30-50 minutes to verify each run before starting the next was tedious (a minimal sketch of the kind of runner I have in mind appears at the end of this review).

REPEATABILITY:
- Fig 5: roughly the same, except that Pig's estimator showed more spikes near the end of my plot.
- Fig 6/7: essentially identical to the paper's.
- Fig 8/9: my local graphs were a bit smoother than the paper's, but there were no meaningful differences.
- Fig 10: the jitter around the spike at 60% progress varies a fair amount, but this is to be expected given the randomness of the (skewed) input data.

WORKABILITY: The setup was too complex to modify easily. Besides the requirement to run on a properly configured cluster, the input data was supplied by the authors in an unspecified binary format, and the workflows were complex with few, if any, tunables. The paper's authors were aware of these difficulties and explained during discussions that they targeted only the repeatability aspect of the evaluation.
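
NOTE ON QUEUEING (referenced under SETUP): below is a minimal sketch of the kind of script queue I would have liked to use; it is not part of the authors' artifact, and the script names, the results directory, and the sanity check are all hypothetical placeholders.

```python
# Hypothetical queue runner; not part of the authors' artifact.
import subprocess
import sys
from pathlib import Path

# Placeholder names for the per-figure experiment scripts.
EXPERIMENTS = ["run_fig5.sh", "run_fig6_7.sh", "run_fig8_9.sh", "run_fig10.sh"]

def run_queue() -> None:
    for script in EXPERIMENTS:
        print(f"starting {script}")
        result = subprocess.run(["bash", script])
        if result.returncode != 0:
            # Stop the queue so a failed run is not silently followed by the next one.
            sys.exit(f"{script} exited with code {result.returncode}")
        # Crude stand-in for the manual per-run verification; the real check
        # would depend on what the artifact actually produces.
        if not list(Path("results").glob("*.csv")):
            sys.exit(f"{script} produced no result files; stopping the queue")

if __name__ == "__main__":
    run_queue()
```

With something along these lines, the basic verification of each run would happen automatically and the next script could start immediately instead of waiting for a manual check.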