SIGMOD 2010 Repeatability & Workability Evaluation for paper #26

Active Knowledge: Dynamically Enriching RDF Knowledge Bases by Web Services
by Nicoleta Preda, Fabian Suchanek, Gjergji Kasneci, Thomas Neumann, Wenjun Yuan

Hardware & Software environment
===============================

       | Paper                               | Review 1
-------+-------------------------------------+-----------------------------------------------------------------
class  | (not specified)                     | desktop
CPU    | Pentium(R) Dual-Core CPU 2.50GHz    | AMD Athlon(tm) 64 X2 Dual Core Processor 4600+, cpu MHz 1000.000
RAM    | 2 GB                                | 2 GB
OS     | Debian 4.3.2, Kernel version 2.6.30 | Fedora 11 Linux version 2.6.30.9-96.fc11.x86_64

Submission
==========

The authors provided
- source code for all their programs and additional software ();
- data generators;
- scripts to re-run all experiments and collect the output of the proposed methods in a text file.

Repeatability Evaluation
========================

Process
-------

The instructions provided are clear and simple to follow. The authors were very responsive in supplying additional or improved code and explanations.

Detailed Results
----------------

The queries supplied in the benchmark differ from the queries reported in the paper. It was not possible to run queries other than those fixed by the authors. Nevertheless, by the end of this RWE, the authors had produced a user-friendly interface for their system.

* Experiment 1

Runnable, but the repeated results are partly different from those reported in the paper (and also from those supplied with the code).

For query 1, the repeated results have the same number of web calls and total answers as reported.

For query 2, the repeated results (number of calls and total answers) differ for DF: in most cases, the first three results supplied with the code are much better in terms of total answers per web call, while the remaining three are worse than the repeated results.
In the repeated results, F-RDF consistently needs more time and more web calls to obtain the same total number of answers. F-RDF(R) obtains fewer answers with a comparable or larger number of web calls.

For query 3, the repeated results for DF and F-RDF are comparable with those reported. F-RDF(R) needs on average fewer calls for the same number of answers. This differs from the previous queries, where more calls were needed for the same number of answers.

For query 4, DF and F-RDF need fewer or the same number of calls for the same number of answers in the repeated results. F-RDF(R) uses about half as many calls to retrieve a third of the reported number of answers.

For query 5, DF uses more calls but retrieves far fewer answers. F-RDF(R) even obtains 20 times more answers with an increase of about 6% in the number of calls.

For query 6, DF and F-RDF obtain results comparable (almost identical) to those reported. F-RDF(R), however, obtains the same number of answers with far fewer calls.

For query 7, DF and F-RDF obtain results comparable (almost identical) to those reported. F-RDF(R), however, obtains the same number of answers with far fewer calls.

* Experiment 2

Repeatable. Figure 9(c) could not be fully followed. The plots are generated from only three data points, and it would have been more appropriate to plot the data points alone rather than a continuous line connecting them. The paper does not discuss to what extent the continuous line approximates the outcome of actual runs. This issue was raised with the authors, who agreed.

* Experiment 3

The paper does not explain how to repeat this experiment. It only states that (i) 100 queries were considered that ask for books published by an author whose name is given, and that (ii) 98% of the output books were correct answers. The authors explained how this test could be run, but it is severely limited by external factors, namely the restricted number of calls allowed per day.
Summary
-------

Some of the experiments in the paper are not repeatable because of (i) the randomness inherent in some of the algorithms, combined with the fact that the proposed algorithms do not seem to offer quality or error guarantees, (ii) changes in the content of external web sources, and (iii) various access restrictions imposed by external web sources. This was raised with the authors, who confirmed it.

The machine used in the repeated experiments consistently needed more time than reported in the paper. This is most likely due to the external RDF engine.

Workability Evaluation
======================

Workability was only evaluated to the extent that the experiments were run on a different hardware platform.