| Extreme Markup Languages |
The legendary computer luminary Donald Knuth [Knuth WWW] once asked why nobody ever takes a computer program to bed to read. The answer was simple: most computer programs are unreadable. Comments are too few and too far between, and the simple process of reading becomes the painful process of reverse engineering. Knuth's solution was that programs should not be 90% code and 10% comments; they should be 90% descriptive text and 10% code. This was the start of Literate Programming, or “LitProg” [LP WWW].
In the early days of computer programming, documentation within the source code of a program was an unaffordable luxury. Some systems actively stripped comments from code in order to save storage. Although storage is no longer a significant constraint, programmers still spend little time documenting their code. There are several reasons for this. One is that programmers tend to be judged by whether their code works (or appears to), not by how well it is commented. Further, if project managers need to cut something from a development schedule, documentation is an item that can be removed without directly affecting the delivered functionality.
On a personal level, while a programmer is writing a particular piece of code, that code can seem so clear and obvious that documentation appears all but unnecessary. Only when the programmer returns to the code a month or more later, with that clarity long lost, does the impact become obvious. The situation is even worse when a different programmer has to work on the code. With no documentation to read, the code must be reverse engineered to understand its intent and logic, and this is difficult to do with sufficient accuracy. Too often, the code is rewritten in the belief that rewriting is easier than reverse engineering. This squanders the experience gained by the original programmer, while simultaneously introducing new bugs into the code base.
No documentation system can force an undisciplined or lazy programmer to document their code, but there are things that can be done to make the task less onerous. LitProg tools, which allow code fragments to be interspersed within the documentation, put the documentation for a piece of code right beside that code, in the same file. Compared to keeping code and documentation in separate files, this greatly increases the chance that, as code is modified, the documentation is also updated to match.
A “literate program” (or “literate document”) is a human-readable document containing short sections of code (known variously as “macros”, “chunks”, or “fragments”), written and ordered so that it can be understood easily by people. By contrast, most computer programs are ordered purely for the benefit of compilers. In a literate program, source code fragments (or any textual fragments) can appear in any suitable order. When the literate document is processed, the code fragments are assembled into the order required to produce the source files; in Knuth's terminology, the document is “tangled”. Literate documents are also “woven” to convert them into a final documentation format. Traditionally the documentation format was TeX or LaTeX, but these days it can also be (X)HTML ((Extensible) Hypertext Markup Language), XSL-FO (Extensible Stylesheet Language Formatting Objects), or PDF (Portable Document Format, aka Acrobat).
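To make “tangling” concrete, here is a minimal Python sketch. It assumes a hypothetical plain-text notation in which a line consisting only of `{Macro Name}` invokes another macro. xmLP itself marks macros up in XML, so this illustrates the idea rather than xmLP's implementation. Tangling starts from a file macro and recursively replaces each reference with the referenced text. Because fragments are found by name, they can be written in whatever order suits the reader.

```python
import re

# Hypothetical reference syntax for this sketch: a line containing only
# {Macro Name} invokes that macro. This is not xmLP's actual XML syntax.
REF = re.compile(r"^(\s*)\{([^{}]+)\}\s*$")


def tangle(name: str, macros: dict[str, str], stack: tuple[str, ...] = ()) -> str:
    """Expand macro `name`, recursively replacing every reference with its text."""
    if name in stack:
        raise ValueError("recursive macro: " + " -> ".join(stack + (name,)))
    out = []
    for line in macros[name].splitlines():
        match = REF.match(line)
        if match:
            indent, ref = match.groups()
            expanded = tangle(ref, macros, stack + (name,))
            # Indent every expanded line to the column of the reference.
            out.extend(indent + part for part in expanded.splitlines())
        else:
            out.append(line)
    return "\n".join(out)


# Fragments may be defined in any order; tangling assembles them by name.
macros = {
    "src/timeseries.dtd": "{Decimal}\n<!ELEMENT open (%Decimal;)>",
    "Decimal": '<!ENTITY % Decimal "#PCDATA">',
}
print(tangle("src/timeseries.dtd", macros))
```

The recursion guard matters in practice, because a macro that invokes itself, directly or indirectly, would otherwise never finish expanding.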
What follows is a non-exhaustive list of LitProg tools. All of these tools predate XML. References to further information can be found in the bibliography.
| WEB |
WEB was Knuth's original LitProg system for Pascal. WEB directly marks up many of the syntactic features of Pascal, so that in creating a valid WEB document, a programmer has pre-validated much of the syntax of the code fragments. Note that WEB was written before the WWW (World Wide Web) came to prominence. Knuth's choice of name relates to his ideas of tangling and weaving. [Knuth 92] |
| CWEB |
Also produced by Knuth's group, CWEB supports C rather than Pascal. It has since been extended to handle C++ and Java as well. [CWEB WWW] |
| FWEB |
FWEB is a multi-language LitProg tool similar in spirit to WEB and CWEB. It was the first LitProg tool to support Fortran. [FWEB WWW] |
| noweb |
noweb is the best known of the language-insensitive LitProg tools. These tools provide no syntactic support for any particular programming language and treat all code fragments as plain text. Language-insensitive LitProg tools can therefore be used for any textual programming language or control file, so their loss of syntactic support is compensated for by a gain in flexibility. [noweb WWW] |
| FunnelWeb |
FunnelWeb is another language-insensitive tool. Its distinguishing feature is that its macros can have parameters, providing some of the power of a language preprocessor. xmLP, the XML LitProg tool described in this paper, takes its inspiration most strongly from FunnelWeb. [FunnelWeb WWW] |
| SWEB |
SWEB is C. Michael Sperberg-McQueen's SGML LitProg tool. It was the first LitProg tool whose document format could feasibly be parsed by something other than the tool itself. [SWEB WWW] |
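The following Python sketch shows what FunnelWeb-style parameterized macros make possible. The `$1` placeholder notation and the `expand` helper are assumptions for illustration only; they are not FunnelWeb's own syntax. A single macro body generates a family of similar declarations, much as a preprocessor macro would.

```python
import re


def expand(body: str, args: list[str]) -> str:
    """Replace placeholders $1..$9 in a macro body with the supplied arguments.

    The $n placeholder syntax is an assumption made for this sketch; it is not
    FunnelWeb's actual notation.
    """
    def sub(match: re.Match) -> str:
        index = int(match.group(1)) - 1
        if index >= len(args):
            raise IndexError(f"macro expects argument ${match.group(1)}")
        return args[index]

    return re.sub(r"\$([1-9])", sub, body)


# One parameterized macro yields all four DTD declarations for the price elements.
element_decl = "<!ELEMENT $1 (%Decimal;)>"
decls = [expand(element_decl, [name]) for name in ("open", "high", "low", "close")]
print("\n".join(decls))
```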
Sun's Javadoc is a powerful tool for generating reference documentation from comments embedded in Java code, and it has inspired similar tools for other programming languages. Javadoc is ideal for documenting the methods and classes available in a Java API (Application Programming Interface). However, Javadoc is not a LitProg tool.
The documentation that Javadoc produces extends down only to the method signatures. It provides no support for documenting the workings of individual methods, and it does not allow the order in which methods and classes are presented to be controlled to improve readability. These are not criticisms, just observations: there is no such thing as one-size-fits-all documentation. Indeed, there are at least three major classes of documentation: user documentation, reference documentation, and detailed documentation of how the code works.
LitProg tools do a good job of generating detailed documentation. Javadoc does a good job of generating reference documentation. Neither provides sufficient support for generating good user documentation. So not all documentation is the same, and no documentation tool is suitable for every type of documentation. This paper focuses on LitProg tools, and hence on the problem of creating detailed documentation of the workings of program code.
This paper was written as a literate program, using an extended version of the “Extreme Markup Languages 2002” DTD. The literate document was processed twice using an XML LitProg tool, “xmLP” [xmLP WWW], which is described in this paper. The literate document was first “tangled”: the macros were expanded to produce the source files. It was then “woven”: the macros were cross-referenced and this document was generated. Both processes need to resolve the macros in the document, but for different purposes.
No source code fragments were copied into the literate document, because the literate document is the original source material from which the source code files are produced.
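Weaving uses the same macro definitions as tangling, but reads them in the opposite direction. Instead of expanding references, it records where each macro is used. That record is what produces cross-references such as the “This macro is invoked in ...” notes attached to the woven macros in this paper. The following is a minimal Python sketch, again assuming a hypothetical `{Macro Name}` reference notation rather than xmLP's XML markup; the function name is illustrative.

```python
import re
from collections import defaultdict

# Hypothetical reference syntax for this sketch: {Macro Name} inside a body.
REF = re.compile(r"\{([^{}]+)\}")


def invoked_in(definitions: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Build the reverse index behind 'This macro is invoked in ...' notes.

    `definitions` holds (name, body) pairs in document order: the same data
    tangling uses, read in the opposite direction.
    """
    index = defaultdict(list)
    for name, body in definitions:
        for ref in REF.findall(body):
            if name not in index[ref]:
                index[ref].append(name)
    return dict(index)


definitions = [
    ("DTD: decimal pseudo-definition", '<!ENTITY % Decimal "#PCDATA">'),
    ("DTD: financial elements", "{DTD: decimal pseudo-definition}\n<!ELEMENT open (%Decimal;)>"),
    ("src/timeseries.dtd", "{DTD: financial elements}"),
]
for macro, users in invoked_in(definitions).items():
    print(f"{macro} is invoked in {', '.join(users)}")
```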
The following scenario illustrates how literate programs can be a valuable tool for keeping related files synchronized. Note: if at any stage you want to jump ahead to read about the LitProg tool “xmLP”, you can go directly to Section 3. However, you are encouraged to read this section first to get a sense of what a LitProg tool needs to achieve.
This section demonstrates the documentation created by xmLP, so it shows the output format. The input format is discussed in Section 3.
Consider the problem of representing the way a particular stock market share price changes over time (a “time series”). Taking a simplified view, a single daily price summary, which is an “event” from the time series, can be written as
xmLP Macro “Time Series Event Instance” [#1] =
<event date="2002-02-20">
  <open>85.70</open>
  <high>92.10</high>
  <low>81.37</low>
  <close>86.05</close>
  <volume multiplier="1000">811786</volume>
</event>
This macro is invoked in file #2 (Figure 16)
This macro is invoked in file #4 (Figure 20)
Note: this shows how xmLP XML macro definitions are “woven” into the documentation. Note the automatically generated cross-references.
Here, “date” is the date of the event, “open” is the opening (starting) price for that day, “high” and “low” are the maximum and minimum prices for that day (respectively), and “close” is the closing (final) price for that day. The “volume” is the number of shares traded during that day, and is commonly given in terms of thousands of shares traded.
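To illustrate these field meanings, the following Python sketch (standard library only) parses the event instance above and reads the prices as exact decimals. It then checks a consistency condition implied by the definitions: “high” and “low” bound the opening and closing prices. The check is an illustration added here; neither of the schemas developed below expresses it.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

EVENT = """<event date="2002-02-20">
  <open>85.70</open><high>92.10</high><low>81.37</low><close>86.05</close>
  <volume multiplier="1000">811786</volume>
</event>"""

event = ET.fromstring(EVENT)
# Prices are read as Decimal so that values such as 85.70 are kept exactly.
prices = {tag: Decimal(event.findtext(tag)) for tag in ("open", "high", "low", "close")}

# By definition, "high" and "low" are the day's extremes, so they bound open and close.
assert prices["low"] <= min(prices["open"], prices["close"])
assert max(prices["open"], prices["close"]) <= prices["high"]
print(event.get("date"), prices)
```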
The purpose of this example is to produce both a DTD and a W3C XML Schema describing this event structure, within the limits of what each of these schema technologies can do. Familiarity with DTD and W3C XML Schema constructs is assumed.
The “open”, “high”, “low”, and “close” elements each contain a decimal number. In the DTD, decimal numbers can only be represented as unconstrained text. However, a suitably named entity can be used to suggest to human readers that decimal values should be used.
xmLP Macro “DTD: decimal pseudo-definition” [#2] =
<!ENTITY % Decimal "#PCDATA">
This macro is invoked in macro #3 (Figure 3)
Note: this shows how xmLP text macro definitions are “woven” into the documentation.
From a machine perspective, this is nothing more than a syntactic nicety. However, it makes maintenance easier for humans (by making the intent clear), and that makes it worth doing.
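The following deliberately simplified Python sketch shows why the entity is only a syntactic nicety. A DTD processor substitutes the replacement text for `%Decimal;` before interpreting the declaration, so the parser sees plain `#PCDATA`. The regular-expression approach and the function name are assumptions for illustration; the sketch ignores most of the XML rules for entities.

```python
import re

DECL = re.compile(r'<!ENTITY\s+%\s+(\w+)\s+"([^"]*)"\s*>\s*')


def expand_parameter_entities(dtd: str) -> str:
    """Textually replace %Name; with the value from <!ENTITY % Name "...">.

    A simplified model only: external entities, single-quoted values and
    nested references are not handled.
    """
    entities = dict(DECL.findall(dtd))
    body = DECL.sub("", dtd)  # drop the entity declarations themselves
    return re.sub(r"%(\w+);", lambda m: entities[m.group(1)], body)


dtd = '<!ENTITY % Decimal "#PCDATA">\n<!ELEMENT open (%Decimal;)>'
print(expand_parameter_entities(dtd))  # <!ELEMENT open (#PCDATA)>
```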
xmLP Macro “DTD: financial elements” [#3] =
{DTD: decimal pseudo-definition[2], Figure 2}
<!ELEMENT open (%Decimal;)>
<!ELEMENT high (%Decimal;)>
<!ELEMENT low (%Decimal;)>
<!ELEMENT close (%Decimal;)>
This macro is also defined in macro #6 (Figure 6)
This macro is invoked in file #1 (Figure 14)
Note: this shows how invocations (expansions) of one macro inside another are indicated and cross-referenced in the documentation.
Note: this macro is defined in multiple sections that are concatenated in document order to produce the complete content of the macro.
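The concatenation rule can be sketched in a few lines of Python. The (name, text) pairs below are assumed to arrive in document order. The sample mirrors macros #3 and #6, with an unrelated macro defined in between; the `collect` name is hypothetical.

```python
from collections import defaultdict


def collect(sections: list[tuple[str, str]]) -> dict[str, str]:
    """Group macro sections by name, concatenating the parts in document order."""
    parts = defaultdict(list)
    for name, text in sections:  # `sections` is assumed to be in document order
        parts[name].append(text)
    return {name: "\n".join(texts) for name, texts in parts.items()}


sections = [
    ("DTD: financial elements", "<!ELEMENT open (%Decimal;)>"),
    ("DTD: event", "<!ELEMENT event (open?, high?, low?, close?, volume?)>"),
    ("DTD: financial elements", "<!ELEMENT volume (%NonNegativeInteger;)>"),
]
print(collect(sections)["DTD: financial elements"])
```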
The W3C XML Schema datatypes contain a suitable decimal type, “xsd:decimal”, so the Schema equivalent is straightforward.
xmLP Macro “W3C XML Schema: financial elements” [#4] =
<xsd:element name="open" type="xsd:decimal"/>
<xsd:element name="high" type="xsd:decimal"/>
<xsd:element name="low" type="xsd:decimal"/>
<xsd:element name="close" type="xsd:decimal"/>
This macro is also defined in macro #7 (Figure 7)
This macro is invoked in file #3 (Figure 18)
The “volume” element contains a non-negative integer value (number of shares traded). It also has a positive integer “multiplier” attribute, since the volume is typically given in units of thousands of shares. As before, in the DTD the values are simply unconstrained text.
xmLP Macro “DTD: integer pseudo-definitions” [#5] =
<!ENTITY % NonNegativeInteger "#PCDATA">
<!ENTITY % PositiveInteger "CDATA">
This macro is invoked in macro #6 (Figure 6)
The DTD nonetheless allows a default value of ‘1’ to be defined for the “multiplier” attribute, so that its use with the “volume” element is optional.
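In application code, the multiplier has to be applied to recover the number of shares actually traded. The following is a minimal Python sketch assuming the standard `xml.etree.ElementTree` parser. That parser does not read the external DTD, so the DTD's default of ‘1’ is not filled in for it. The hypothetical `shares_traded` helper therefore repeats the default as a fallback.

```python
import xml.etree.ElementTree as ET


def shares_traded(volume: ET.Element) -> int:
    """Return the actual number of shares: the volume times its multiplier.

    ElementTree does not apply defaults from an external DTD, so the DTD's
    default multiplier of "1" is repeated here as a fallback.
    """
    return int(volume.text) * int(volume.get("multiplier", "1"))


print(shares_traded(ET.fromstring('<volume multiplier="1000">811786</volume>')))  # 811786000
print(shares_traded(ET.fromstring("<volume>42</volume>")))  # 42
```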
xmLP Macro “DTD: financial elements” [#6] =
{DTD: integer pseudo-definitions[5], Figure 5}
<!ELEMENT volume (%NonNegativeInteger;)>
<!ATTLIST volume
  multiplier %PositiveInteger; "1">
This macro is also defined in macro #3 (Figure 3)
This macro is invoked in file #1 (Figure 14)
The W3C XML Schema data types contain the necessary integer data types. Although the Schema version is longer, it defines the same structure for the “volume” element.
xmLP Macro “W3C XML Schema: financial elements” [#7] =
<xsd:element name="volume">
  <xsd:complexType>
    <xsd:simpleContent>
      <xsd:extension base="xsd:nonNegativeInteger">
        <xsd:attribute name="multiplier" default="1" type="xsd:positiveInteger"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:element>
This macro is also defined in macro #4 (Figure 4)
This macro is invoked in file #3 (Figure 18)
The “event” element should contain no more than one each of the elements “open”, “high”, “low”, “close”, and “volume”. The order is not important. An “event” does not need to contain all of these elements, as any of the values could be undefined or unavailable. So each of the financial elements occurs 0 or 1 times in an “event”, in any order.
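The rule can be stated precisely as a small Python predicate over the sequence of child element names. This sketches the constraint itself, not how a DTD or schema validator checks it, and the function name is hypothetical.

```python
from collections import Counter

# The five financial element names from the text.
FINANCIAL = {"open", "high", "low", "close", "volume"}


def valid_event_children(children: list[str]) -> bool:
    """Accept each financial element 0 or 1 times, in any order, and nothing else."""
    counts = Counter(children)
    return set(counts) <= FINANCIAL and all(n <= 1 for n in counts.values())


print(valid_event_children(["close", "open", "volume"]))  # True: order is free
print(valid_event_children(["open", "open"]))             # False: "open" repeated
print(valid_event_children([]))                           # True: every element is optional
```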
It is possible, but tedious, to create an XML DTD rule that enumerates all of the possible content options for “event”. Instead, it is simpler to make the DTD stricter than the W3C XML Schema, and have it enforce an (unnecessary) order on the financial elements.
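A quick count shows why the enumeration is tedious. Each of the 5 elements appears at most once and in any order, so a sequence of k elements can be formed in 5!/(5-k)! ways. Summing over k = 0..5 gives the number of child sequences that an enumerating DTD rule would have to accept. XML 1.0 also requires content models to be deterministic, so those alternatives could not simply be listed side by side; they would have to be factored by hand.

```python
from math import perm


def orderings(n: int) -> int:
    """Count sequences in which each of n optional elements appears at most once."""
    # perm(n, k) = n!/(n-k)! ordered selections of k distinct elements.
    return sum(perm(n, k) for k in range(n + 1))


print(orderings(5))  # 326 sequences for open, high, low, close, volume
```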
xmLP Macro “DTD: event” [#8] =
<!ELEMENT event (open?, high?, low?, close?, volume?)>
This macro is also defined in macro #10 (Figure 10)
This macro is invoked in file #1 (Figure 14)
The “event” element is also required to have a “date” attribute to date the values that it contains.
xmLP Macro “DTD: date pseudo-definition” [#9] =
<!ENTITY % Date "CDATA">
This macro is invoked in macro #10 (Figure 10)
xmLP Macro “DTD: event” [#10] =
{DTD: date pseudo-definition[9], Figure 9}
<!ATTLIST event
  date %Date; #REQUIRED>
This macro is also defined in macro #8 (Figure 8)
This macro is invoked in file #1 (Figure 14)
W3C XML Schema supports the “0 or 1 times each in any order” rule using “xsd:all”. Because each element in an “xsd:all” group is required by default (minOccurs="1"), every element reference in the group sets minOccurs="0" to make it optional.
xmLP Macro “W3C XML Schema: event” [#11] =
<xsd:element name="event">
  <xsd:complexType>
    <xsd:all>
      <xsd:element ref="open" minOccurs="0"/>
      <xsd:element ref="high" minOccurs="0"/>
      <xsd:element ref="low" minOccurs="0"/>
      <xsd:element ref="close" minOccurs="0"/>
      <xsd:element ref="volume" minOccurs="0"/>
    </xsd:all>
    <xsd:attribute name="date" use="required" type="xsd:date"/>
  </xsd:complexType>
</xsd:element>
This macro is invoked in file #3 (Figure 18)
To represent a time series, a number of events are contained within a “timeSeries” element. A time series can contain any number of events, even zero. The dates of the events within a time series must be unique. A DTD cannot enforce that condition. W3C XML Schema could do so with an identity constraint (“xsd:unique”), but the schema developed here does not attempt it.
xmLP Macro “DTD: timeSeries” [#12] =
<!ELEMENT timeSeries (event*)>
This macro is invoked in file #1 (Figure 14)
xmLP Macro “W3C XML Schema: timeSeries” [#13] =
<xsd:element name="timeSeries">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="event" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
This macro is invoked in file #3 (Figure 18)
With all of its required sections now explained, the DTD is assembled from the component macros as follows.
xmLP File [#1]: src/timeseries.dtd =
<?xml version="1.0" encoding="utf-8"?>
{DTD: financial elements[3,6], Figures 3, 6}
{DTD: event[8,10], Figures 8, 10}
{DTD: timeSeries[12], Figure 12}
Note: this shows how xmLP file macro definitions are “woven” into the documentation. These define the source files that are generated during “tangling”.
This produces the following source file:
<?xml version="1.0" encoding="utf-8"?>
<!ENTITY % Decimal "#PCDATA">
<!ELEMENT open (%Decimal;)>
<!ELEMENT high (%Decimal;)>
<!ELEMENT low (%Decimal;)>
<!ELEMENT close (%Decimal;)>
<!ENTITY % NonNegativeInteger "#PCDATA">
<!ENTITY % PositiveInteger "CDATA">
<!ELEMENT volume (%NonNegativeInteger;)>
<!ATTLIST volume
  multiplier %PositiveInteger; "1">
<!ELEMENT event (open?, high?, low?, close?, volume?)>
<!ENTITY % Date "CDATA">
<!ATTLIST event
  date %Date; #REQUIRED>
<!ELEMENT timeSeries (event*)>
The sample instance file using the DTD (and containing just a single event) requires an appropriate “DOCTYPE” declaration.
xmLP File [#2]: src/timeseries-dtd.xml =
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE timeSeries SYSTEM "timeseries.dtd">
<timeSeries>
  {Time Series Event Instance[1], Figure 1}
</timeSeries>
This produces the following source file:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE timeSeries SYSTEM "timeseries.dtd">
<timeSeries>
  <event date="2002-02-20">
    <open>85.70</open>
    <high>92.10</high>
    <low>81.37</low>
    <close>86.05</close>
    <volume multiplier="1000">811786</volume>
  </event>
</timeSeries>
Note: the tangled source file was inserted into this document automatically using an XSLT (Extensible Stylesheet Language Transformations) script.
The W3C XML Schema is assembled from the component macros as follows.
xmLP File [#3]: src/timeseries.xsd =
<?xml version="1.0" encoding="utf-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  {W3C XML Schema: financial elements[4,7], Figures 4, 7}
  {W3C XML Schema: event[11], Figure 11}
  {W3C XML Schema: timeSeries[13], Figure 13}
</xsd:schema>
This produces the following source file:
<?xml version="1.0" encoding="utf-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="open" type="xsd:decimal"/>
  <xsd:element name="high" type="xsd:decimal"/>
  <xsd:element name="low" type="xsd:decimal"/>
  <xsd:element name="close" type="xsd:decimal"/>
  <xsd:element name="volume">
    <xsd:complexType>
      <xsd:simpleContent>
        <xsd:extension base="xsd:nonNegativeInteger">
          <xsd:attribute name="multiplier" default="1" type="xsd:positiveInteger"/>
        </xsd:extension>
      </xsd:simpleContent>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="event">
    <xsd:complexType>
      <xsd:all>
        <xsd:element ref="open" minOccurs="0"/>
        <xsd:element ref="high" minOccurs="0"/>
        <xsd:element ref="low" minOccurs="0"/>
        <xsd:element ref="close" minOccurs="0"/>
        <xsd:element ref="volume" minOccurs="0"/>
      </xsd:all>
      <xsd:attribute name="date" use="required" type="xsd:date"/>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="timeSeries">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="event" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
The sample instance file using the W3C XML Schema (and containing just a single event) requires an appropriate “schemaLocation” declaration (in this case a “noNamespaceSchemaLocation” declaration). The “xmlns:xsi” declaration is omitted here for brevity, but it is generated in the actual source file.
xmLP File [#4]: src/timeseries-schema.xml =
<?xml version="1.0" encoding="utf-8"?>
<timeSeries xsi:noNamespaceSchemaLocation="timeseries.xsd">
  {Time Series Event Instance[1], Figure 1}
</timeSeries>
This produces the following source file:
<?xml version="1.0" encoding="utf-8"?>
<timeSeries xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="timeseries.xsd">
  <event date="2002-02-20">
    <open>85.70</open>
    <high>92.10</high>
    <low>81.37</low>
    <close>86.05</close>
    <volume multiplier="1000">811786</volume>
  </event>
</timeSeries>
What you have read in this section is a literate program that defines and describes the DTD and W3C XML Schema fragments required to handle a real-world problem. The code fragments in the macros appear within a human-readable context that quickly clarifies what those fragments do, why they are needed, and what their limitations are. Being able to view DTD fragments beside their equivalent Schema fragments makes it easy to compare the two approaches in detail.
Having established the nature of a literate program, we can now describe how the “xmLP” tool supports LitProg. Traditional LitProg tools provide the following: