Extreme Markup Languages
front
title

xmLP — a Literate Programming Tool for XML & Text

fname Anthony B. surname Coates [ affil Financial XML Specialist]

fname Zarella surname Rendon [ affil XML Factor & W3C XSL WG]
body
section
title

1 Overview of Literate Programming

para

The legendary computer luminary Donald Knuth bibref [Knuth WWW] once asked why nobody ever takes a computer program to bed to read. The answer was simple: most computer programs are unreadable. Comments are too few and too far between, and the simple process of reading becomes the painful process of reverse engineering. Knuth's solution was that programs should highlight[@style='bold'] not be 90% code and 10% comments; they should be 90% descriptive text and 10% code. This was the start of Literate Programming, or “ acronym LitProg” bibref [LP WWW].

subsec1
title

1.1 Documentation of Programs

para

In the early days of computer programming, documentation within the source code of a program was an unaffordable luxury. Some systems actively stripped comments from code in order to save storage. Although storage no longer tends to be a significant limit, programmers still spend little time on documenting their code. There are several reasons for this. One is that programmers tend to be judged by whether their code works (or appears to), not by how well it is commented. Further, if project managers need to cut something from a development schedule, documentation is an item that can be removed from the schedule without necessarily affecting the delivered functionality.

para

On a personal level, during the fleeting moments that a programmer writes a particular piece of code, that code can appear to be so clear and obvious that documentation seems all but unnecessary. It is only when the programmer returns to the code after a month or more, all moments of clarity lost in the past, that the impact becomes obvious. The situation is even worse when a different programmer has to work on the code. With no documentation to read, the code needs to be reverse engineered in order to understand its intent and logic, but this is difficult to do with sufficient accuracy. Too often, the code is rewritten in the belief that rewriting is easier than reverse engineering. This squanders any experience gained by the original programmer, while simultaneously introducing new bugs into the code base.

para

No documentation system can force an undisciplined or lazy programmer to document their code, but there are things that can be done to make the task less onerous. acronym LitProg tools, which allow code fragments to be interspersed within the documentation, put the documentation for a piece of code right beside that code, in the same file. Compared to having code and documentation in separate files, this greatly increases the chance that as code is modified, the documentation is also modified to keep it up to date.

subsec1
title

1.2 LitProg

para

A “literate program” (or “literate document”) is a human readable document containing short sections of code (known variously as “macros”, “chunks”, or “fragments”), written and ordered so that it can be understood easily by people. By contrast, most computer programs are ordered purely for the benefit of program compilers. In a literate program, source code fragments (or any textual fragments) can appear in any suitable order. When the literate document is processed, the code fragments are assembled into the order required to produce the source files by “tangling” the document, to introduce Knuth's terminology. Literate documents are also “woven” to convert them into a final documentation format. Traditionally the documentation format was TeX or LaTeX, but these days it can also be acronym.grp  acronym (X)HTML expansion  [(Extensible) Hypertext Markup Language], acronym.grp  acronym XSLFO expansion  [Extensible Stylesheet Language Formatting Objects], or acronym.grp  acronym PDF expansion  [Portable Document Format, aka Acrobat].
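
para

To make the idea of tangling concrete, the following is a minimal sketch in Python. It assumes a hypothetical “{{name}}” notation for fragment references; that notation, the fragment names, and the function “tangle” are illustrative only and are not taken from any of the tools discussed in this paper.

```python
import re

# Hypothetical fragments, written in whatever order suits the reader.
fragments = {
    "main program": "{{imports}}\n{{greeting function}}\ngreet('world')",
    "greeting function": "def greet(name):\n    print('Hello, ' + name)",
    "imports": "import sys",
}

# A reference to another fragment looks like {{fragment name}} (an assumption).
REFERENCE = re.compile(r"\{\{(.+?)\}\}")


def tangle(name, active=()):
    """Expand the named fragment, recursively replacing {{references}}."""
    if name in active:
        raise ValueError("circular fragment reference: " + name)

    def expand(match):
        return tangle(match.group(1), active + (name,))

    return REFERENCE.sub(expand, fragments[name])


# The fragments are assembled into the order a compiler or interpreter needs.
source = tangle("main program")
```

para

Weaving would start from the same fragments, but would format them for human readers rather than assembling them for a compiler.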

subsec1
title

1.3 LitProg Tools

para

What follows is a non-exhaustive list of acronym LitProg tools. All of these tools predate XML. References to further information can be found in the bibliography.

deflist
WEB
para

WEB was Knuth's original acronym LitProg system for Pascal. WEB directly marks up many of the syntactic features of Pascal, so that in creating a valid WEB document, a programmer has pre-validated much of the syntax of the code fragments. Note that WEB was written before the acronym.grp  acronym WWW expansion  [World Wide Web] came to prominence. Knuth's choice of name relates to his ideas of tangling and weaving. bibref [Knuth 92]

CWEB
para

Also produced by Knuth's group, CWEB supported C rather than Pascal. It has now been extended to handle C++ and Java as well. bibref [CWEB WWW]

FWEB
para

FWEB is a multi-language acronym LitProg tool which is similar in spirit to WEB & CWEB. It was the first acronym LitProg tool to support Fortran. bibref [FWEB WWW]

noweb
para

noweb is the best known of the language-insensitive acronym LitProg tools. These tools do not provide any syntactic support for any computer languages, and treat all code fragments as nothing more than text fragments. Language-insensitive acronym LitProg tools can be used for any (textual) programming language or control files, so their loss of syntactic support is compensated for by a gain in flexibility. bibref [noweb WWW]

FunnelWeb
para

FunnelWeb is another language-insensitive tool. Its unique feature is that its macros can have parameters, providing some of the power of a language pre-processor. xmLP, the XML acronym LitProg tool described in this paper, takes its inspiration most strongly from FunnelWeb. bibref [FunnelWeb WWW]

SWEB
para

SWEB is C. Michael Sperberg-McQueen's SGML acronym LitProg tool. It was the first acronym LitProg tool whose document format could feasibly be parsed by something other than the tool itself. bibref [SWEB WWW]

subsec1
title

1.4 Javadoc

para

Sun's Javadoc is a powerful tool for generating reference documentation from comments embedded in Java code, and has inspired similar tools for other programming languages. Javadoc is ideal for documenting the available methods & classes in a Java acronym.grp  acronym API expansion  [Application Programming Interface]. However, Javadoc is not a LitProg tool.

para

The documentation that Javadoc produces extends down only to the method signatures. It does not provide any support for documenting the workings of individual methods. It does not allow the order in which methods/classes are presented to be controlled to improve readability. These are not criticisms, just observations. There is no such thing as one-size-fits-all documentation. Indeed, there are at least 3 major classes of documentation:

seqlist
  1. para
    User (functional) documentation;
  2. para
    Detailed documentation within methods (functions) of the what, why, and how of the code;
  3. para
    Reference documentation which lists the available methods (functions) in an acronym API (library).
para

acronym LitProg tools do a good job of generating detailed documentation. Javadoc does a good job of generating reference documentation. Neither provides sufficient support for generating good user documentation. So, not all documentation is the same, and no documentation tool is suitable for every type of documentation. This paper focuses on acronym LitProg tools, and hence on the problem of creating detailed documentation of the workings of program code.

section
title

2 A LitProg Scenario

para

This paper was written as a literate program, using an extended version of the “Extreme Markup Languages 2002” DTD. The literate document was processed twice using an XML LitProg tool, “xmLP” bibref [xmLP WWW], which is described in this paper. The literate document was first “tangled”, where the macros were expanded to produce the source files. It was then “woven”, where the macros were cross-referenced and this document was generated. Both processes need to resolve the macros in the document, but for different purposes.

para

highlight[@style='bold'] No source code fragments were copied into the literate document, because the literate document is the original source material from which the source code files are produced.

para

The following scenario illustrates how literate programs can be a valuable tool for maintaining synchronized files. highlight[@style='bold'] Note: if at any stage you want to jump ahead to read about the acronym LitProg tool “xmLP”, you can go directly to Section xref 3. However, you are encouraged to read this section first to get a sense of what a acronym LitProg tool needs to achieve.

subsec1
title

2.1


para

highlight[@style='ital'] This section demonstrates the documentation created by xmLP, that is, the output format. The input format is discussed in Section xref 3.

para

Consider the problem of representing the way a particular stock market share price changes over time (a “time series”). Taking a simplified view, a single daily price summary, which is an “event” from the time series, can be written as

figure
Figure 1
 highlight[@style='bold'] xmLP Macro “Time Series Event Instance” [ highlight[@style='bold'] #1] =
<event date=" highlight[@style='bold'] 2002-02-20"> highlight[@style='bold'] 
  <open> highlight[@style='bold'] 85.70</open> highlight[@style='bold'] 
  <high> highlight[@style='bold'] 92.10</high> highlight[@style='bold'] 
  <low> highlight[@style='bold'] 81.37</low> highlight[@style='bold'] 
  <close> highlight[@style='bold'] 86.05</close> highlight[@style='bold'] 
  <volume multiplier=" highlight[@style='bold'] 1000"> highlight[@style='bold'] 811786</volume> highlight[@style='bold'] 
</event>

highlight[@style='ital'] This macro is invoked in file #2 (Figure xref 16)

highlight[@style='ital'] This macro is invoked in file #4 (Figure xref 20)

para

highlight[@style='ital'] Note: this shows how xmLP XML macro definitions are “woven” into the documentation. Note the automatically generated cross-references.

para

Here, “date” is the date of the event, “open” is the opening (starting) price for that day, “high” and “low” are the maximum and minimum prices for that day (respectively), and “close” is the closing (final) price for that day. The “volume” is the number of shares traded during that day, and is commonly given in terms of thousands of shares traded.
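
para

As an illustration of how an application might read such an event, here is a minimal sketch in Python using the standard library. The function name “read_event” and the “shares” key are hypothetical. A missing “multiplier” is treated as 1, which matches the default defined for the attribute in Section 2.3.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

EVENT = """<event date="2002-02-20">
  <open>85.70</open>
  <high>92.10</high>
  <low>81.37</low>
  <close>86.05</close>
  <volume multiplier="1000">811786</volume>
</event>"""


def read_event(xml_text):
    """Return the event as a dictionary; missing values become None."""
    event = ET.fromstring(xml_text)
    summary = {"date": event.get("date")}
    for name in ("open", "high", "low", "close"):
        child = event.find(name)
        # Decimal avoids binary floating-point rounding of prices.
        summary[name] = Decimal(child.text) if child is not None else None
    volume = event.find("volume")
    if volume is not None:
        multiplier = int(volume.get("multiplier", "1"))
        summary["shares"] = int(volume.text) * multiplier
    else:
        summary["shares"] = None
    return summary


summary = read_event(EVENT)
```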

para

The purpose of this example is to produce highlight[@style='bold'] both a DTD and a W3C XML Schema to describe this event structure, within the limits of what each of these schema technologies can do. A knowledge of DTD and W3C XML Schema constructs is assumed.

subsec1
title

2.2 open, high, low, close

para

The “open”, “high”, “low”, and “close” elements each contain a decimal number. In the DTD, decimal numbers can only be represented as unconstrained text. However, a suitably named entity can be used to suggest to human readers that decimal values should be used.

figure
Figure 2
 highlight[@style='bold'] xmLP Macro “DTD: decimal pseudo-definition” [ highlight[@style='bold'] #2] =
<!ENTITY % Decimal "#PCDATA">

highlight[@style='ital'] This macro is invoked in macro #3 (Figure xref 3)

para

highlight[@style='ital'] Note: this shows how xmLP text macro definitions are “woven” into the documentation.

para

From a machine perspective, this is nothing more than a syntactic nicety. However, it makes maintenance easier for humans (by making the intent clear), and that makes it worth doing.
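
para

The following sketch, in Python, shows why the entity is only a nicety for the machine: once the parameter entity reference is replaced, only “#PCDATA” remains. It is a deliberately simplified model of parameter entity replacement (quoted literal values only, no nested entities), and the function name “expand_parameter_entities” is hypothetical.

```python
import re

# Matches declarations such as: <!ENTITY % Decimal "#PCDATA">
ENTITY_DECL = re.compile(r'<!ENTITY\s+%\s+(\w+)\s+"([^"]*)">')
# Matches references such as: %Decimal;
ENTITY_REF = re.compile(r"%(\w+);")


def expand_parameter_entities(dtd_text):
    """Replace each %Name; reference with its declared replacement text."""
    entities = dict(ENTITY_DECL.findall(dtd_text))
    body = ENTITY_DECL.sub("", dtd_text)  # drop the declarations themselves
    return ENTITY_REF.sub(lambda m: entities[m.group(1)], body).strip()


dtd = '<!ENTITY % Decimal "#PCDATA">\n<!ELEMENT open (%Decimal;)>'
expanded = expand_parameter_entities(dtd)
```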

figure
Figure 3
 highlight[@style='bold'] xmLP Macro “DTD: financial elements” [ highlight[@style='bold'] #3] =
 highlight[@style='ital'] {DTD: decimal pseudo-definition[2], Figure  xref 2}
<!ELEMENT open  (%Decimal;)>
<!ELEMENT high  (%Decimal;)>
<!ELEMENT low   (%Decimal;)>
<!ELEMENT close (%Decimal;)>

highlight[@style='ital'] This macro is also defined in macro #6 (Figure xref 6)

highlight[@style='ital'] This macro is invoked in file #1 (Figure xref 14)

para

highlight[@style='ital'] Note: this shows how invocations (expansions) of one macro inside another are indicated and cross-referenced in the documentation.

para

highlight[@style='ital'] Note: this macro is defined in multiple sections that are concatenated in document order to produce the complete content of the macro.

para

The W3C XML Schema datatypes contain a suitable decimal type, “xsd:decimal”, so the Schema equivalent is straightforward.

figure
Figure 4
 highlight[@style='bold'] xmLP Macro “W3C XML Schema: financial elements” [ highlight[@style='bold'] #4] =
<xsd:element name=" highlight[@style='bold'] open" type=" highlight[@style='bold'] xsd:decimal"/>
<xsd:element name=" highlight[@style='bold'] high" type=" highlight[@style='bold'] xsd:decimal"/>
<xsd:element name=" highlight[@style='bold'] low" type=" highlight[@style='bold'] xsd:decimal"/>
<xsd:element name=" highlight[@style='bold'] close" type=" highlight[@style='bold'] xsd:decimal"/>

highlight[@style='ital'] This macro is also defined in macro #7 (Figure xref 7)

highlight[@style='ital'] This macro is invoked in file #3 (Figure xref 18)

subsec1
title

2.3 volume

para

The “volume” element contains a non-negative integer value (number of shares traded). It also has a positive integer “multiplier” attribute, since the volume is typically given in units of thousands of shares. As before, in the DTD the values are simply unconstrained text.

figure
Figure 5
 highlight[@style='bold'] xmLP Macro “DTD: integer pseudo-definitions” [ highlight[@style='bold'] #5] =
<!ENTITY % NonNegativeInteger "#PCDATA">
<!ENTITY % PositiveInteger "CDATA">

highlight[@style='ital'] This macro is invoked in macro #6 (Figure xref 6)

para

The DTD nonetheless allows a default value of ‘1’ to be defined for the “multiplier” attribute, so that its use with the “volume” element is optional.

figure
Figure 6
 highlight[@style='bold'] xmLP Macro “DTD: financial elements” [ highlight[@style='bold'] #6] =
 highlight[@style='ital'] {DTD: integer pseudo-definitions[5], Figure  xref 5}
<!ELEMENT volume (%NonNegativeInteger;)>
<!ATTLIST volume
  multiplier %PositiveInteger; "1">

highlight[@style='ital'] This macro is also defined in macro #3 (Figure xref 3)

highlight[@style='ital'] This macro is invoked in file #1 (Figure xref 14)

para

The W3C XML Schema data types contain the necessary integer data types. Although the Schema version is longer, it defines the same structure for the “volume” element.

figure
Figure 7
 highlight[@style='bold'] xmLP Macro “W3C XML Schema: financial elements” [ highlight[@style='bold'] #7] =
<xsd:element name=" highlight[@style='bold'] volume"> highlight[@style='bold'] 
  <xsd:complexType> highlight[@style='bold'] 
    <xsd:simpleContent> highlight[@style='bold'] 
      <xsd:extension base=" highlight[@style='bold'] xsd:nonNegativeInteger"> highlight[@style='bold'] 
        <xsd:attribute name=" highlight[@style='bold'] multiplier" default=" highlight[@style='bold'] 1" type=" highlight[@style='bold'] xsd:positiveInteger"/> highlight[@style='bold'] 
      </xsd:extension> highlight[@style='bold'] 
    </xsd:simpleContent> highlight[@style='bold'] 
  </xsd:complexType> highlight[@style='bold'] 
</xsd:element>

highlight[@style='ital'] This macro is also defined in macro #4 (Figure xref 4)

highlight[@style='ital'] This macro is invoked in file #3 (Figure xref 18)

subsec1
title

2.4 event

para

The “event” element should contain no more than one each of the elements “open”, “high”, “low”, “close”, and “volume”. The order is not important. An “event” does not need to contain all of these elements, as any of the values could be undefined or unavailable. So each of the financial elements occurs 0 or 1 times in an “event”, in any order.

para

It is possible, but tedious, to create an XML DTD rule that enumerates all of the possible content options for “event”. Instead, it is simpler to make the DTD stricter than the W3C XML Schema, and have it enforce an (unnecessary) order on the financial elements.

figure
Figure 8
 highlight[@style='bold'] xmLP Macro “DTD: event” [ highlight[@style='bold'] #8] =
<!ELEMENT event (open?, high?, low?, close?, volume?)>

highlight[@style='ital'] This macro is also defined in macro #10 (Figure xref 10)

highlight[@style='ital'] This macro is invoked in file #1 (Figure xref 14)

para

The “event” element is also required to have a “date” attribute to date the values that it contains.

figure
Figure 9
 highlight[@style='bold'] xmLP Macro “DTD: date pseudo-definition” [ highlight[@style='bold'] #9] =
<!ENTITY % Date "CDATA">

highlight[@style='ital'] This macro is invoked in macro #10 (Figure xref 10)

figure
Figure 10
 highlight[@style='bold'] xmLP Macro “DTD: event” [ highlight[@style='bold'] #10] =
 highlight[@style='ital'] {DTD: date pseudo-definition[9], Figure  xref 9}
<!ATTLIST event
  date %Date; #REQUIRED>

highlight[@style='ital'] This macro is also defined in macro #8 (Figure xref 8)

highlight[@style='ital'] This macro is invoked in file #1 (Figure xref 14)

para

W3C XML Schema supports the “0 or 1 times each in any order” rule using “xsd:all”.

figure
Figure 11
 highlight[@style='bold'] xmLP Macro “W3C XML Schema: event” [ highlight[@style='bold'] #11] =
<xsd:element name=" highlight[@style='bold'] event"> highlight[@style='bold'] 
  <xsd:complexType> highlight[@style='bold'] 
    <xsd:all> highlight[@style='bold'] 
      <xsd:element ref=" highlight[@style='bold'] open"/> highlight[@style='bold'] 
      <xsd:element ref=" highlight[@style='bold'] high"/> highlight[@style='bold'] 
      <xsd:element ref=" highlight[@style='bold'] low"/> highlight[@style='bold'] 
      <xsd:element ref=" highlight[@style='bold'] close"/> highlight[@style='bold'] 
      <xsd:element ref=" highlight[@style='bold'] volume"/> highlight[@style='bold'] 
    </xsd:all> highlight[@style='bold'] 
    <xsd:attribute name=" highlight[@style='bold'] date" use=" highlight[@style='bold'] required" type=" highlight[@style='bold'] xsd:date"/> highlight[@style='bold'] 
  </xsd:complexType> highlight[@style='bold'] 
</xsd:element>

highlight[@style='ital'] This macro is invoked in file #3 (Figure xref 18)
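
para

The intended rule (each financial element 0 or 1 times, in any order) can also be checked directly in application code. The sketch below is a Python illustration with hypothetical names, not part of xmLP. Note that in W3C XML Schema a particle without an explicit “minOccurs” is required exactly once, so a validating processor may need minOccurs="0" on each element reference before it accepts events with missing values.

```python
import xml.etree.ElementTree as ET
from collections import Counter

ALLOWED = {"open", "high", "low", "close", "volume"}


def check_event_content(event):
    """Return a list of problems; an empty list means the rule holds."""
    counts = Counter(child.tag for child in event)
    problems = []
    for name, count in sorted(counts.items()):
        if name not in ALLOWED:
            problems.append("unexpected element: " + name)
        elif count > 1:
            problems.append("repeated element: " + name)
    return problems


# Elements may be missing and may appear in any order.
partial = ET.fromstring(
    '<event date="2002-02-20"><close>86.05</close><open>85.70</open></event>'
)
# No element may appear more than once.
doubled = ET.fromstring(
    '<event date="2002-02-20"><open>1</open><open>2</open></event>'
)
```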

subsec1
title

2.5 timeSeries

para

To represent a time series, a number of events are contained within a “timeSeries” element. A time series can contain any number of events, even zero. The dates of the events within a time series must be unique. A DTD cannot enforce that condition; W3C XML Schema could express it with an “xsd:unique” identity constraint, but the schema in this example does not use one.

figure
Figure 12
 highlight[@style='bold'] xmLP Macro “DTD: timeSeries” [ highlight[@style='bold'] #12] =
<!ELEMENT timeSeries (event*)>

highlight[@style='ital'] This macro is invoked in file #1 (Figure xref 14)

figure
Figure 13
 highlight[@style='bold'] xmLP Macro “W3C XML Schema: timeSeries” [ highlight[@style='bold'] #13] =
<xsd:element name=" highlight[@style='bold'] timeSeries"> highlight[@style='bold'] 
  <xsd:complexType> highlight[@style='bold'] 
    <xsd:sequence> highlight[@style='bold'] 
      <xsd:element ref=" highlight[@style='bold'] event" minOccurs=" highlight[@style='bold'] 0" maxOccurs=" highlight[@style='bold'] unbounded"/> highlight[@style='bold'] 
    </xsd:sequence> highlight[@style='bold'] 
  </xsd:complexType> highlight[@style='bold'] 
</xsd:element>

highlight[@style='ital'] This macro is invoked in file #3 (Figure xref 18)
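
para

Whichever schema language is used, the uniqueness of event dates is easy to check in application code. The following Python sketch shows one way to do so; the function name “duplicate_dates” and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_dates(time_series):
    """Return the sorted list of dates used by more than one event."""
    counts = Counter(event.get("date") for event in time_series.iter("event"))
    return sorted(date for date, count in counts.items() if count > 1)


series = ET.fromstring(
    "<timeSeries>"
    '<event date="2002-02-20"/>'
    '<event date="2002-02-21"/>'
    '<event date="2002-02-20"/>'
    "</timeSeries>"
)
```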

subsec1
title

2.6 DTD Source Files

para

With all of its required sections now explained, the DTD is assembled from the component macros as follows.

figure
Figure 14
 highlight[@style='bold'] xmLP File [ highlight[@style='bold'] #1]: src/timeseries.dtd =
<?xml version="1.0" encoding="utf-8"?>
 highlight[@style='ital'] {DTD: financial elements[3,6], Figures  xref 3,  xref 6}
 highlight[@style='ital'] {DTD: event[8,10], Figures  xref 8,  xref 10}
 highlight[@style='ital'] {DTD: timeSeries[12], Figure  xref 12}
para

highlight[@style='ital'] Note: this shows how xmLP file macro definitions are “woven” into the documentation. These define the source files that are generated during “tangling”.

para

This produces the following source file:

figure
Figure 15
<?xml version="1.0" encoding="utf-8"?>


<!ENTITY % Decimal "#PCDATA">

<!ELEMENT open  (%Decimal;)>
<!ELEMENT high  (%Decimal;)>
<!ELEMENT low   (%Decimal;)>
<!ELEMENT close (%Decimal;)>


<!ENTITY % NonNegativeInteger "#PCDATA">
<!ENTITY % PositiveInteger "CDATA">

<!ELEMENT volume (%NonNegativeInteger;)>
<!ATTLIST volume
  multiplier %PositiveInteger; "1">


<!ELEMENT event (open?, high?, low?, close?, volume?)>


<!ENTITY % Date "CDATA">

<!ATTLIST event
  date %Date; #REQUIRED>


<!ELEMENT timeSeries (event*)>
para

The sample instance file using the DTD (and containing just a single event) requires an appropriate “DOCTYPE” declaration.

figure
Figure 16
 highlight[@style='bold'] xmLP File [ highlight[@style='bold'] #2]: src/timeseries-dtd.xml =
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE timeSeries SYSTEM "timeseries.dtd">
<timeSeries> highlight[@style='bold'] 
   highlight[@style='ital'] {Time Series Event Instance[1], Figure  xref 1} highlight[@style='bold'] 
</timeSeries>
para

This produces the following source file:

figure
Figure 17
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE timeSeries SYSTEM "timeseries.dtd">
<timeSeries>
  <event date="2002-02-20">
    <open>85.70</open>
    <high>92.10</high>
    <low>81.37</low>
    <close>86.05</close>
    <volume multiplier="1000">811786</volume>
  </event>
</timeSeries>
para

highlight[@style='ital'] Note: the tangled source file was inserted into this document automatically using an acronym.grp  acronym XSLT expansion  [Extensible Stylesheet Language Transformations] script.

subsec1
title

2.7 W3C XML Schema Source Files

para

The W3C XML Schema is assembled from the component macros as follows.

figure
Figure 18
 highlight[@style='bold'] xmLP File [ highlight[@style='bold'] #3]: src/timeseries.xsd =
<?xml version="1.0" encoding="utf-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> highlight[@style='bold'] 
   highlight[@style='ital'] {W3C XML Schema: financial elements[4,7], Figures  xref 4,  xref 7} highlight[@style='bold'] 
   highlight[@style='ital'] {W3C XML Schema: event[11], Figure  xref 11} highlight[@style='bold'] 
   highlight[@style='ital'] {W3C XML Schema: timeSeries[13], Figure  xref 13} highlight[@style='bold'] 
</xsd:schema>
para

This produces the following source file:

figure
Figure 19
<?xml version="1.0" encoding="utf-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="open" type="xsd:decimal"/>
  <xsd:element name="high" type="xsd:decimal"/>
  <xsd:element name="low" type="xsd:decimal"/>
  <xsd:element name="close" type="xsd:decimal"/>
  <xsd:element name="volume">
    <xsd:complexType>
      <xsd:simpleContent>
        <xsd:extension base="xsd:nonNegativeInteger">
          <xsd:attribute name="multiplier" default="1" type="xsd:positiveInteger"/>
        </xsd:extension>
      </xsd:simpleContent>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="event">
    <xsd:complexType>
      <xsd:all>
        <xsd:element ref="open"/>
        <xsd:element ref="high"/>
        <xsd:element ref="low"/>
        <xsd:element ref="close"/>
        <xsd:element ref="volume"/>
      </xsd:all>
      <xsd:attribute name="date" use="required" type="xsd:date"/>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="timeSeries">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="event" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
para

The sample instance file using the W3C XML Schema (and containing just a single event) requires an appropriate “schemaLocation” declaration (in this case a “noNamespaceSchemaLocation” declaration). The “xmlns:xsi” declaration is suppressed for brevity, but generated in the actual source file.

figure
Figure 20
 highlight[@style='bold'] xmLP File [ highlight[@style='bold'] #4]: src/timeseries-schema.xml =
<?xml version="1.0" encoding="utf-8"?>
<timeSeries xsi:noNamespaceSchemaLocation="timeseries.xsd"> highlight[@style='bold'] 
   highlight[@style='ital'] {Time Series Event Instance[1], Figure  xref 1} highlight[@style='bold'] 
</timeSeries>
para

This produces the following source file:

figure
Figure 21
<?xml version="1.0" encoding="utf-8"?>
<timeSeries xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="timeseries.xsd">
  <event date="2002-02-20">
    <open>85.70</open>
    <high>92.10</high>
    <low>81.37</low>
    <close>86.05</close>
    <volume multiplier="1000">811786</volume>
  </event>
</timeSeries>
subsec1
title

2.8 Summary

para

What you have read in this section is a literate program which defines and describes the DTD and W3C XML Schema fragments required to handle a real-world problem. The code fragments in the macros appear within a human-readable context that quickly clarifies what those fragments do, why they are needed, and what their limitations are. Being able to view DTD fragments beside their equivalent Schema fragments makes it easy to compare the two approaches in detail.

section
title

3 Using xmLP

subsec1
title

3.1 xmLP Goals & Approach

para

Having established the nature of a literate program, the way in which the “xmLP” tool supports acronym LitProg can be described. Traditional acronym LitProg tools provide the following:

seqlist
  1. para
    A complete (but tool-specific) markup language for literate programs, including both code and documentation sections;
  2. para
    (Optionally) One or more code section markups for specific programming languages, or for other source file formats;
  3. para
    The ability to “tangle” literate programs (assemble code sections) to generate the source files;
  4. para
    The ability to “weave” literate programs into a final documentation format.
para

The advent of XML made it less attractive and less necessary to define and support custom markup languages for literate documents. To take advantage of XML, xmLP takes the following approach:

seqlist
  1. para
    Rather than define its own complete document markup, xmLP only defines a handful of XML elements which are intended to be used in conjunction with any suitable XML document markup, e.g. acronym XHTML or DocBook;
  2. para
    In order to keep xmLP from being constrained to a specific programming language, xmLP follows the traditional acronym LitProg tools “FunnelWeb” and “noweb” (among others). These tools treat all code as text. This can be an advantage when you need to include a programming language or other file format that is not supported by your acronym LitProg tool of choice. xmLP adds one small extra: support for well-formed XML fragments. These make it easier to generate well-formed XML correctly, by removing the risk of unmatched open or close tags;
  3. para
    xmLP does not weave literate programs into a final documentation format as traditional acronym LitProg tools do. Instead, the xmLP weaver adds just the contextual information needed to allow an acronym XSLT stylesheet to format the code sections in the literate program.
para

Primarily, xmLP provides the highlight[@style='bold'] business logic to deal with code macros, both for tangling and weaving. End users can concentrate on writing stylesheets that define the look and layout of their documentation, without having to worry about the semantics of macros and macro invocation, and without having to worry about building cross-reference information for the macros. These things are handled by xmLP.
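
para

To give a feel for the cross-reference information involved, the sketch below builds “invoked in” notes from a simplified, hypothetical model of a literate document. It is written in Python purely for illustration; the data layout and the function name “cross_references” are assumptions and do not describe how xmLP is implemented.

```python
from collections import defaultdict

# Hypothetical model: each definition records the macros that it invokes.
definitions = [
    {"number": 2, "kind": "macro",
     "name": "DTD: decimal pseudo-definition", "invokes": []},
    {"number": 3, "kind": "macro",
     "name": "DTD: financial elements",
     "invokes": ["DTD: decimal pseudo-definition"]},
    {"number": 1, "kind": "file",
     "name": "src/timeseries.dtd",
     "invokes": ["DTD: financial elements"]},
]


def cross_references(definitions):
    """Map each macro name to the places where it is invoked."""
    invoked_in = defaultdict(list)
    for definition in definitions:
        label = "%s #%d" % (definition["kind"], definition["number"])
        for target in definition["invokes"]:
            invoked_in[target].append(label)
    return dict(invoked_in)


notes = cross_references(definitions)
```

para

A stylesheet can then render each entry as a note of the form “This macro is invoked in macro #3”, as seen beneath the figures in Section 2.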

subsec1
title

3.2 Defining a macro

para

So, what does the xmLP markup look like? In the literate document from which this paper was woven, the macro corresponding to Figure xref 2 is actually written as

figure
Figure 22
<lp:macro lp:usage=" highlight[@style='bold'] once" lp:final=" highlight[@style='bold'] true"> highlight[@style='bold'] 
  <lp:name> highlight[@style='bold'] DTD: decimal pseudo-definition</lp:name> highlight[@style='bold'] 
  <lp:text> highlight[@style='bold'] 
<!ENTITY % Decimal "#PCDATA">
</lp:text> highlight[@style='bold'] 
</lp:macro>
para

xmLP defines the following elements and attributes for creating macros. As mentioned previously, xmLP element and attribute definitions are added to a DTD/Schema/etc. to make them available directly in the authoring of a document. A fragment DTD containing the complete list of xmLP elements and attributes is included in the appendix in Section xref 5.

deflist
highlight[@style='bold'] lp:macro
para

element: Indicates the definition of an xmLP macro.

highlight[@style='bold'] lp:usage
para

attribute: One of “never” or “once” (the default) or “multiple”. Used to indicate how often the macro is to be invoked (used). This proves to be a valuable quality-control (sanity checking) measure, and is taken from “FunnelWeb”.

highlight[@style='bold'] lp:final
para

attribute: One of “true” (the default) or “false”. If true, only this macro definition can have the given “lp:name”. If false, all macro definitions with the same “lp:name” are concatenated in document order to fully define the macro. Once again, taken from “FunnelWeb”.

highlight[@style='bold'] lp:name
para

element: The name of the macro being defined. In principle, the name may contain XML elements, so that MathML expressions and the like can be used in macro names. In practice, xmLP currently applies the XPath “normalize-space” function to the macro name to generate a simple text name that is then used to decide whether two macro definitions have the same macro name or not. This is not ideal, but sufficiently good for most purposes.

highlight[@style='bold'] lp:text
para

element: Indicates a plain text component of the macro definition.
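
para

The sanity check behind “lp:usage”, together with the name normalization described for “lp:name”, can be sketched as follows in Python. This is an illustration under stated assumptions, not xmLP's own code: the function names are hypothetical, “multiple” is assumed to mean one or more invocations, and Python's whitespace splitting only approximates the XPath “normalize-space” function.

```python
from collections import Counter


def normalize_space(text):
    """Approximate XPath normalize-space: trim and collapse whitespace."""
    return " ".join(text.split())


def check_usage(declared, invocations):
    """declared maps a macro name to 'never', 'once' or 'multiple'."""
    counts = Counter(normalize_space(name) for name in invocations)
    problems = []
    for name, usage in declared.items():
        used = counts[normalize_space(name)]
        if usage == "never":
            ok = used == 0
        elif usage == "once":
            ok = used == 1
        else:  # "multiple": assumed here to mean one or more invocations
            ok = used >= 1
        if not ok:
            problems.append(
                "%s: declared %r but invoked %d time(s)"
                % (normalize_space(name), usage, used)
            )
    return problems


declared = {
    "DTD: decimal pseudo-definition": "once",
    "Time Series Event Instance": "multiple",
}
# Names compare equal after normalization, whatever their line breaks.
invocations = [
    "DTD:   decimal\n  pseudo-definition",
    "Time Series Event Instance",
    "Time Series Event Instance",
]
problems = check_usage(declared, invocations)
```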

subsec1
title

3.3 Invoking a macro

para

Invoking (calling or expanding) a macro is done with another xmLP element, “lp:invoke”. This element stands in place of the macro being called, and is entirely replaced by it during tangling (assembly of the generated source code files). Taking as an example the macro defined in Figures xref 3 and xref 6, this macro is written as highlight[@style='bold'] two concatenated definitions (note that “lp:final” is set to false), and uses “lp:invoke” to insert the contents of the macros defined in Figures xref 2 and xref 5:

figure
Figure 23
<lp:macro lp:usage=" highlight[@style='bold'] once" lp:final=" highlight[@style='bold'] false"> highlight[@style='bold'] 
  <lp:name> highlight[@style='bold'] DTD: financial elements</lp:name> highlight[@style='bold'] 
  <lp:text> highlight[@style='bold'] <lp:invoke>
  <lp:name> highlight[@style='bold'] DTD: decimal pseudo-definition</lp:name>
</lp:invoke> highlight[@style='bold'] 
<!ELEMENT open  (%Decimal;)>
<!ELEMENT high  (%Decimal;)>
<!ELEMENT low   (%Decimal;)>
<!ELEMENT close (%Decimal;)>

  </lp:text> highlight[@style='bold'] 
</lp:macro>
figure
Figure 24
<lp:macro lp:usage=" highlight[@style='bold'] once" lp:final=" highlight[@style='bold'] false"> highlight[@style='bold'] 
  <lp:name> highlight[@style='bold'] DTD: financial elements</lp:name> highlight[@style='bold'] 
  <lp:text> highlight[@style='bold'] <lp:invoke>
  <lp:name> highlight[@style='bold'] DTD: integer pseudo-definitions</lp:name>
</lp:invoke> highlight[@style='bold'] 
<!ELEMENT volume (%NonNegativeInteger;)>
<!ATTLIST volume
  multiplier %PositiveInteger; "1">

  </lp:text> highlight[@style='bold'] 
</lp:macro>
deflist
highlight[@style='bold'] lp:invoke
para

element: Invokes an xmLP macro by name, replacing the “lp:invoke” element completely with the macro contents.
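
para

The combination of concatenated definitions and “lp:invoke” can be sketched in a few lines of Python. This is an illustrative model only, not the xmLP implementation: the namespace URI is a placeholder invented for the example, the sample document is abbreviated, markup inside “lp:text” is escaped, and checks such as cycle detection are omitted.

```python
import xml.etree.ElementTree as ET

LP = "urn:example:xmlp"  # hypothetical namespace URI, for illustration only

DOCUMENT = """<doc xmlns:lp="urn:example:xmlp">
  <lp:macro lp:final="false">
    <lp:name>DTD: financial elements</lp:name>
    <lp:text><lp:invoke><lp:name>DTD: decimal pseudo-definition</lp:name></lp:invoke>
&lt;!ELEMENT open (%Decimal;)&gt;
</lp:text>
  </lp:macro>
  <lp:macro>
    <lp:name>DTD: decimal pseudo-definition</lp:name>
    <lp:text>&lt;!ENTITY % Decimal "#PCDATA"&gt;</lp:text>
  </lp:macro>
  <lp:macro lp:final="false">
    <lp:name>DTD: financial elements</lp:name>
    <lp:text>&lt;!ELEMENT volume (#PCDATA)&gt;
</lp:text>
  </lp:macro>
</doc>"""


def q(local):
    """Return the qualified ElementTree tag for an lp: element."""
    return "{%s}%s" % (LP, local)


def macro_name(element):
    """Read the lp:name child and collapse its whitespace."""
    return " ".join(element.find(q("name")).text.split())


def collect(root):
    """Group lp:text bodies by macro name, in document order."""
    bodies = {}
    for macro in root.iter(q("macro")):
        bodies.setdefault(macro_name(macro), []).extend(macro.findall(q("text")))
    return bodies


def expand(name, bodies):
    """Concatenate all definitions of a macro, replacing each lp:invoke."""
    parts = []
    for text in bodies[name]:
        parts.append(text.text or "")
        for child in text:
            if child.tag == q("invoke"):
                parts.append(expand(macro_name(child), bodies))
            parts.append(child.tail or "")
    return "".join(parts)


bodies = collect(ET.fromstring(DOCUMENT))
tangled = expand("DTD: financial elements", bodies)
```

para

The two definitions of “DTD: financial elements” are joined in document order, and the invocation is replaced by the text of the invoked macro, even though that macro is defined later in the document.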

subsec1
title

3.4 XML macros

para

As well as plain text, xmLP macros can contain well-formed XML using “lp:xml”. As previously mentioned, well-formed XML fragments remove the risk of unmatched open or close tags. You can use plain text fragments (“lp:text”) to generate XML if you want, and it is sometimes useful to do so, but you take the risk of having unmatched tags in your generated XML source files. The following example corresponds to the output shown in Figure xref 1.

figure
Figure 25
<lp:macro lp:usage="multiple" lp:final="true">
  <lp:name>Time Series Event Instance</lp:name>
  <lp:xml>
    <event date="2002-02-20">
      <open>85.70</open>
      <high>92.10</high>
      <low>81.37</low>
      <close>86.05</close>
      <volume multiplier="1000">811786</volume>
    </event>
  </lp:xml>
</lp:macro>
deflist
lp:xml
para

element: Indicates a well-formed XML component of the macro definition.
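The benefit of keeping XML inside “lp:xml” can be illustrated with a small Python check. The sketch below is not part of xmLP. It only shows the general point that an XML parser rejects a fragment with unmatched tags, so a fragment embedded as markup in the literate document would be caught when that document is parsed, whereas the same mistake inside “lp:text” would pass unnoticed as plain text. The helper name `is_well_formed` is an assumption.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as a single well-formed XML element."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False

# A balanced fragment, as it could appear inside lp:xml.
balanced = '<event date="2002-02-20"><open>85.70</open></event>'

# The same fragment with the closing </open> tag forgotten.
unbalanced = '<event date="2002-02-20"><open>85.70</event>'

print(is_well_formed(balanced))
print(is_well_formed(unbalanced))
```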

subsec1
title

3.5 Defining output source files

para

xmLP uses the “lp:file” element to distinguish top-level macros that define output source files; there can be only one such macro for each output source file. These file macros have a file name rather than a macro name, and they cannot be invoked by other macros. xmLP supports namespaces through “lp:namespace”, as in the following example, which corresponds to the output shown in Figure 18.

figure
Figure 26
<lp:file lp:filename="src/timeseries.xsd">
  <lp:namespace lp:value="http://www.w3.org/2001/XMLSchema" lp:prefix="xsd"/>
  <lp:text>
<?xml version="1.0" encoding="utf-8"?>

</lp:text>
  <lp:xml>
    <xsd:schema><lp:invoke>
  <lp:name>W3C XML Schema: financial elements</lp:name>
</lp:invoke><lp:invoke>
  <lp:name>W3C XML Schema: event</lp:name>
</lp:invoke><lp:invoke>
  <lp:name>W3C XML Schema: timeSeries</lp:name>
</lp:invoke>
    </xsd:schema>
  </lp:xml>
</lp:file>
deflist
lp:file
para

element: Indicates the definition of an xmLP file macro.

lp:filename
para

attribute: The file name (or path) of the file being defined.

lp:namespace
para

element: Indicates that a namespace declaration should be added to the tangled XML.

lp:prefix
para

attribute: The namespace prefix to use.

lp:value
para

attribute: The namespace identifier (typically a URI).
para

attribute: The namespace identifier (typically a URI).
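As a rough illustration of how the “lp:namespace” element relates to the tangled output, the following Python sketch reads the “lp:prefix” and “lp:value” attributes of a file macro and builds the corresponding `xmlns` declaration. It is not taken from xmLP. The namespace URI `urn:example:xmlp` and the function name are assumptions, as is the choice to place the declaration on the root element of the generated file.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, chosen for this sketch only.
LP = "urn:example:xmlp"

FILE_MACRO = """<lp:file xmlns:lp="urn:example:xmlp" lp:filename="src/timeseries.xsd">
  <lp:namespace lp:value="http://www.w3.org/2001/XMLSchema" lp:prefix="xsd"/>
</lp:file>"""

def namespace_declarations(file_elem):
    """Build one xmlns declaration per lp:namespace child of a file macro."""
    declarations = []
    for ns in file_elem.findall(f"{{{LP}}}namespace"):
        prefix = ns.get(f"{{{LP}}}prefix")
        value = ns.get(f"{{{LP}}}value")
        declarations.append(f'xmlns:{prefix}="{value}"')
    return declarations

file_elem = ET.fromstring(FILE_MACRO)
decls = namespace_declarations(file_elem)

# Assumed placement: the declarations go on the root element of the output.
root_tag = "<xsd:schema " + " ".join(decls) + ">"
print(file_elem.get(f"{{{LP}}}filename"))
print(root_tag)
```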

para

To support W3C XML Schema, it is sometimes necessary to specify a Schema location using “lp:schemaLocation”, as in the following example, which corresponds to the output shown in Figure 20.

figure
Figure 27
<lp:file lp:filename="src/timeseries-schema.xml">
  <lp:schemaLocation lp:namespace="" lp:location="timeseries.xsd"/>
  <lp:text>
<?xml version="1.0" encoding="utf-8"?>

</lp:text>
  <lp:xml>
    <timeSeries><lp:invoke>
  <lp:name>Time Series Event Instance</lp:name>
</lp:invoke>
    </timeSeries>
  </lp:xml>
</lp:file>
deflist
lp:schemaLocation
para

element: Indicates that a (W3C XML) Schema location declaration should be added to the tangled XML.

lp:namespace
para

attribute: The namespace identifier (typically a URI). Can be empty.

lp:location
para

attribute: The Schema location URI.
para

attribute: The Schema location URI.
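The following Python sketch shows one plausible mapping from the two “lp:schemaLocation” attributes to the standard W3C XML Schema instance attributes. The mapping is an assumption rather than documented xmLP behaviour: an empty namespace is taken to mean `xsi:noNamespaceSchemaLocation`, and a non-empty one `xsi:schemaLocation`. The function name and the example namespace `http://example.org/ts` are hypothetical.

```python
# Standard namespace of the W3C XML Schema instance attributes.
XSI = "http://www.w3.org/2001/XMLSchema-instance"

def schema_location_attributes(namespace: str, location: str) -> str:
    """Return the attributes that tie an instance document to its Schema."""
    if namespace:
        # Namespaced Schema: pairs of "namespace location".
        hint = f'xsi:schemaLocation="{namespace} {location}"'
    else:
        # Assumed meaning of an empty lp:namespace: no target namespace.
        hint = f'xsi:noNamespaceSchemaLocation="{location}"'
    return f'xmlns:xsi="{XSI}" {hint}'

# The values used in the figure above: empty namespace, local Schema file.
no_namespace = schema_location_attributes("", "timeseries.xsd")
print("<timeSeries " + no_namespace + ">")

# A hypothetical namespaced variant, for comparison.
with_namespace = schema_location_attributes("http://example.org/ts", "timeseries.xsd")
print("<timeSeries " + with_namespace + ">")
```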

subsec1
title

3.6 Implementation of xmLP

para

The current implementation of xmLP (version 1.1) is written as 600 lines of XSLT (plus stylesheets for particular formats like XHTML). This may change in future implementations. A potential improvement to xmLP would be to introduce parameterized macros (in the manner of “FunnelWeb”), but it is yet to be decided whether this is best done in XSLT or in a more general-purpose programming language.

section
title

4 Conclusion

para

This paper has introduced literate programming, and indeed this paper is itself a literate program. It has demonstrated how programs (and other source files) can be defined within the natural flow of a human-readable document, rather than in the flow defined by a compiler. It has also introduced a simple LitProg tool, xmLP, which can be used to turn any XML document into a literate document.

para

The source files for this paper will be available at

verbatim
http://xmLP.sourceforge.net/2002/extreme/

section
title

5 Appendix — a fragment DTD for xmLP

para

This DTD fragment is non-normative.

figure
Figure 28
<?xml version='1.0' encoding='UTF-8' ?>

<!-- PUBLIC "+//IDN xmLP.org//DTD Sample Module for xmLP//EN" -->

<!-- The name of an "xmLP" macro. -->
<!ELEMENT lp:name ANY>

<!-- An invocation of an "xmLP" macro. -->
<!ELEMENT lp:invoke (lp:name)>

<!-- Text within an "xmLP" macro. -->
<!ELEMENT lp:text (#PCDATA | lp:invoke)*>

<!-- Balanced XML within an "xmLP" macro. -->
<!ELEMENT lp:xml ANY>

<!-- An "xmLP" macro. -->
<!ELEMENT lp:macro (lp:name , lp:namespace* , (lp:text | lp:xml)*)>
<!ATTLIST lp:macro lp:usage  (never | once | multiple )  'once'
                   lp:final  (true | false )  'true' >

<!-- An "xmLP" namespace declaration. -->
<!ELEMENT lp:namespace EMPTY>
<!ATTLIST lp:namespace lp:prefix NMTOKEN  #REQUIRED
                       lp:value  CDATA    #REQUIRED >

<!-- An "xmLP" schemaLocation declaration. -->
<!ELEMENT lp:schemaLocation EMPTY>
<!ATTLIST lp:schemaLocation lp:namespace CDATA #REQUIRED
                            lp:location CDATA #REQUIRED >

<!-- An "xmLP" output file. -->
<!ELEMENT lp:file ((lp:namespace | lp:schemaLocation)* , (lp:text | lp:xml)*)>
<!ATTLIST lp:file lp:filename CDATA  #REQUIRED >

<!-- "xmLP" block elements. -->
<!ENTITY % lpBlock "lp:macro | lp:file">
rear

bibliog

Bibliography

bibitem

bib [CWEB WWW] pub  CWEB [LitProg tool], http://sunburn.stanford.edu/~knuth/cweb.html

bibitem

bib [FunnelWeb WWW] pub  FunnelWeb [LitProg tool], http://www.ross.net/funnelweb/

bibitem

bib [FWEB WWW] pub  FWEB [LitProg tool], http://w3.pppl.gov/~krommes/fweb_toc.html

bibitem

bib [Knuth 92] pub  “Literate Programming” by Donald Knuth, 1992, ISBN 0-937073-80-6, http://www-cs-faculty.stanford.edu/~knuth/lp.html

bibitem

bib [Knuth WWW] pub  Donald Knuth, http://www-cs-faculty.stanford.edu/~knuth/

bibitem

bib [LP WWW] pub  Literate Programming, http://www.literateprogramming.com/, http://www.loria.fr/services/tex/english/litte.html

bibitem

bib [noweb WWW] pub  noweb [LitProg tool], http://www.eecs.harvard.edu/~nr/noweb/#top

bibitem

bib [SWEB WWW] pub  SWEB [LitProg tool], http://tigger.uic.edu/~cmsmcq/tech/sweb/sweb.html

bibitem

bib [xml-litprog-l] pub  xml-litprog-l [mailing list], http://groups.yahoo.com/group/xml-litprog-l/

bibitem

bib [xmLP WWW] pub  xmLP [LitProg tool], http://xmLP.sourceforge.net/


