Mulberry Technologies, Inc.

Washington Technologies White Papers

Comparing HTML to SGML

Norma Haakonstadt, ArborText Inc.

Markup for HTML (or paper) focuses on presentation or format. Content markup (for example, SGML) is our preferred choice for supporting multiple delivery (and presentation) requirements.

Markup for Electronic Presentation

When you mark up data for electronic presentation in HTML, the goal is to support a single form of electronic delivery. Start-up costs are low: free or low-cost tools are available, training is minimal because the tag set is small, and author productivity changes very little.

The return on investment is quite low for three main reasons. First, because HTML itself continues to change, your markup (and almost by default, the data) becomes obsolete very quickly: HTML 1 is different from HTML 2, and so on. Second, because the markup becomes obsolete, you are actually creating legacy data. Third, and most important, HTML cannot support reuse or recycling of information.

Although start-up costs may be low, the cost to produce an alternative output type is quite high because it requires either conversion (which can cost anywhere from $1 to $50 a page) or reauthoring. Both are costly, and both should be avoided from a quality, accuracy, and efficiency standpoint. Because there is usually no easy way to get from HTML to, for example, a two-column fully formatted paper delivery, you generally end up supporting multiple sources of the same information, one for each output type. Keeping these sources in sync (because of last-minute tweaking or the time it takes to update each source) is an expensive document maintenance issue.

The value to the customer is moderate. HTML provides only limited ways to traverse the data, offers no support for unique data presentations (for example, based on reader skill or security level), and gives you little control, if any, over how much data or what combination of data appears on the screen at one time. If you have liability concerns (for example, a warning notice must appear on the screen at the same time as the step to which it applies), this can be extremely problematic and could make HTML useless as a delivery mechanism.

Markup for Content

In comparing HTML to SGML, we state our goal for SGML as creating a single source of reusable information objects that can be combined to create a variety of publications and delivered in a variety of formats.

Compared to an HTML start-up, the cost to start an SGML-based system can be high. The costs are associated with the need for an in-depth up-front document analysis, the new tools that need to be purchased, retraining for new processes and new tools, and the cost of converting into SGML any existing data you wish to use.

While start-up costs are higher for SGML than for HTML, the return on investment with SGML is very high. Improvements in author productivity of between 30% and 50% have been reported because of the shift from focusing on format to focusing on content. Redundant authoring is also eliminated, which reduces unnecessary time and costs. One company reported that its average page took 8 hours to author, but only 5 minutes to search for and retrieve for reuse in its new system. Reuse or recycling of information can be quite significant: another company reported that, on average, 80% of any given publication was information that appeared in other publications.
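To make the reuse figures concrete, the following is a rough back-of-the-envelope sketch. It combines the 8-hour authoring time, the 5-minute retrieval time, and the 80% reuse share into one estimate, which is an assumption: the figures come from two different companies. The 100-page publication and the function name `authoring_hours` are hypothetical.

```python
# Figures quoted in the text. Combining them into a single estimate is an
# illustrative assumption, since they were reported by different companies.
AUTHOR_HOURS_PER_PAGE = 8.0        # reported average time to author a page
RETRIEVE_HOURS_PER_PAGE = 5 / 60   # reported 5 minutes to find and reuse a page
REUSE_FRACTION = 0.80              # reported share of content found elsewhere


def authoring_hours(pages: int, reuse_fraction: float = 0.0) -> float:
    """Estimate the hours needed to assemble a publication.

    Reused pages cost only the retrieval time; all other pages are
    authored from scratch.
    """
    reused_pages = pages * reuse_fraction
    new_pages = pages - reused_pages
    return (new_pages * AUTHOR_HOURS_PER_PAGE
            + reused_pages * RETRIEVE_HOURS_PER_PAGE)


if __name__ == "__main__":
    pages = 100  # hypothetical publication size
    from_scratch = authoring_hours(pages)
    with_reuse = authoring_hours(pages, REUSE_FRACTION)
    print(f"from scratch: {from_scratch:.1f} hours")
    print(f"with reuse:   {with_reuse:.1f} hours")
    print(f"saving:       {1 - with_reuse / from_scratch:.0%}")
```

Under these assumptions, nearly all of the remaining effort goes into the 20% of pages that are new, so the saving is close to the reuse share itself.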

Because redundant authoring is minimized, author productivity is improved, and concurrent processing is supported, reductions in total production times of 20% to 75% have been reported. This can be quite significant if time to market is critical for your company or readiness is important to your customer. If the lifespan of your data is moderate to long, then the elimination of conversion costs as you change/upgrade hardware or software becomes a significant contributor to a high rate of return on your initial investment.

With SGML data, the cost to produce alternative output types is low because SGML is supported by a large number of delivery tool providers and because production is easy to automate. For example, a large European company reported that its CD-ROM production time went from 2 to 4 weeks down to 1 to 2 days.

Producing HTML is an on-the-fly transformation process (HTML is a very small, flat DTD), and you can automatically apply formatting and keep-together requirements for lights-out printing. For example, one site was able to eliminate its need to review printed output page by page when it implemented an SGML-based publishing system.
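The transformation described above can be sketched as follows. This is a minimal illustration, not a real SGML toolchain: well-formed XML stands in for SGML because the Python standard library has no SGML parser, and the element names (`procedure`, `title`, `warning`, `step`) are hypothetical. The same source is rendered twice, once as HTML and once as plain text, to show how one content-tagged source can feed more than one output.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical content markup. XML stands in for SGML in this sketch.
SOURCE = """
<procedure>
  <title>Replacing the filter</title>
  <warning>Disconnect power before opening the housing.</warning>
  <step>Open the housing.</step>
  <step>Remove the old filter.</step>
</procedure>
"""


def to_html(procedure: ET.Element) -> str:
    """Flatten a <procedure> element into a simple HTML fragment."""
    title = procedure.findtext("title", default="")
    parts = [f"<h2>{escape(title)}</h2>"]
    for warning in procedure.findall("warning"):
        text = escape(warning.text or "")
        parts.append(f"<p><strong>Warning:</strong> {text}</p>")
    parts.append("<ol>")
    for step in procedure.findall("step"):
        parts.append(f"  <li>{escape(step.text or '')}</li>")
    parts.append("</ol>")
    return "\n".join(parts)


def to_text(procedure: ET.Element) -> str:
    """Render the same <procedure> element as plain text."""
    title = procedure.findtext("title", default="")
    lines = [title.upper(), ""]
    for warning in procedure.findall("warning"):
        lines.append(f"WARNING: {warning.text or ''}")
    for number, step in enumerate(procedure.findall("step"), start=1):
        lines.append(f"{number}. {step.text or ''}")
    return "\n".join(lines)


if __name__ == "__main__":
    document = ET.fromstring(SOURCE)
    print(to_html(document))
    print()
    print(to_text(document))
```

Because the source records what each piece of content is rather than how it looks, adding another output type means writing another small renderer, not converting or reauthoring the data.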

Because SGML markup is metadata (information about your information) in a neutral format, your data is optimized for future output needs.

The value to your customers is extremely high. Because you can add an unlimited variety of information about your information, SGML data works with knowledge-based systems, such as those that deliver information based on skill level, security level, and past searches. In addition, this metadata is what keeps information together under specified conditions, which minimizes liability.
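As a rough sketch of metadata-driven delivery, the example below selects topics by reader skill and security level. Everything in it is an illustrative assumption: XML stands in for SGML, and the `skill` and `clearance` attributes, the element names, and the rule that an expert also sees novice material are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the "skill" and "clearance" attributes are assumed
# metadata, and XML stands in for SGML in this sketch.
SOURCE = """
<manual>
  <topic skill="novice" clearance="0"><title>Daily checks</title></topic>
  <topic skill="expert" clearance="0"><title>Calibration</title></topic>
  <topic skill="expert" clearance="2"><title>Firmware keys</title></topic>
</manual>
"""

# Assumed ordering of skill levels, lowest first.
SKILL_RANK = {"novice": 0, "expert": 1}


def select_topics(manual: ET.Element, reader_skill: str,
                  reader_clearance: int) -> list[str]:
    """Return the titles of topics the reader is qualified and cleared to see."""
    titles = []
    for topic in manual.findall("topic"):
        required_skill = SKILL_RANK[topic.get("skill", "novice")]
        required_clearance = int(topic.get("clearance", "0"))
        if (SKILL_RANK[reader_skill] >= required_skill
                and reader_clearance >= required_clearance):
            titles.append(topic.findtext("title", default=""))
    return titles


if __name__ == "__main__":
    manual = ET.fromstring(SOURCE)
    print(select_topics(manual, "novice", 0))
    print(select_topics(manual, "expert", 2))
```

The selection logic never looks at formatting. It relies only on the metadata carried in the markup, which is the property presentation-focused markup lacks.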

A single source of reusable information objects results in more consistent, accurate data being available to your customers. It also makes it easier for you to configure publications to meet your customers' unique requirements.

Comparing the Two Approaches

The following table summarizes the differences between the HTML and SGML approaches:

                                     HTML       SGML
Start-up Costs                       Low        High
Return on Investment                 Low        High
Cost to Produce Alternative Output   High       Low
Value to Customer                    Moderate   High
