October 28, 1994

Should WWW documents be structured?

Vincent Quint

INRIA Rhône-Alpes, Grenoble, France

Documents available on the World-Wide Web can roughly be separated into two categories:

The HTML DTDs fit the needs of the first category, but they pose problems with the second category:

To solve these problems, one could use two different document representations on the Web: a simple formalism, HTML (including HTML 3, with its rich representation of tables and equations), and a more powerful one that provides high-level features for handling structured documents.

Taking the logical structure of documents into account is an effective way to support a number of manipulations, for both the writer and the reader of documents. SGML is a good candidate for the second representation. It can represent very diverse documents by using a specific DTD for each type of document, and a number of tools are (and will be) available for manipulating such documents. It also allows simple conversion to HTML. Converting a document from HTML to an arbitrary SGML DTD remains a complex problem, but solutions exist for converting some HTML documents to some DTDs. Thus, the two representations will not be incompatible, and information documents will have the same status as anchor documents.
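The asymmetry between the two conversion directions can be illustrated with a minimal sketch. It assumes a hypothetical "report" document type, written here in XML syntax as a stand-in for SGML so that Python's standard library can parse it. The element names, the TAG_MAP table, and the to_html and convert functions are illustrative assumptions, not part of any existing DTD or tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from the element types of an illustrative "report"
# document type to HTML elements. Several distinct logical types
# (para, abstract, note) collapse onto the same HTML element.
TAG_MAP = {
    "report": "body",
    "title": "h1",
    "section": "div",
    "heading": "h2",
    "para": "p",
    "abstract": "p",
    "note": "p",
}


def to_html(elem):
    """Recursively rebuild the tree, renaming each element via TAG_MAP."""
    # Unknown element types fall back to a plain paragraph.
    html = ET.Element(TAG_MAP.get(elem.tag, "p"))
    html.text = elem.text
    html.tail = elem.tail
    for child in elem:
        html.append(to_html(child))
    return html


def convert(source):
    """Parse a structured document and serialize its HTML counterpart."""
    return ET.tostring(to_html(ET.fromstring(source)), encoding="unicode")


# A small structured document (assumed example content).
SOURCE = (
    "<report><title>Structured documents</title>"
    "<abstract>A short summary.</abstract>"
    "<section><heading>Introduction</heading>"
    "<para>First paragraph.</para><note>A remark.</note></section>"
    "</report>"
)

print(convert(SOURCE))
```

In this sketch the abstract, the paragraph, and the note all become the same HTML paragraph element. Going down to HTML is a simple renaming, whereas recovering the original element types from the HTML output would require information that the conversion has discarded.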