DHQ to NLM/NCBI 2.3 Mapping Specification

Namespaces in the source and target formats

All elements in the source are in the DHQ namespace, http://digitalhumanities.org/DHQ/namespace, except as noted below (Creative Commons and RDF namespaces appearing in the header). In this map, no namespace prefix is used for DHQ elements (as is commonly the case in source documents); the stylesheet should use the dhq prefix for clarity (allowing the results to be in no namespace).

In the result, no MathML will appear, but the XLink namespace, http://www.w3.org/1999/xlink, may occasionally be used. It can be declared once at the top level to avoid repeating the declaration. The target format, NLM/NCBI, keeps its own elements in no namespace.
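The namespace handling can be illustrated with a minimal Python sketch using only the standard library. It is not the stylesheet itself: the helper local_name and the example.org link are hypothetical, and the XLink URI used is the one defined by the W3C.

```python
import xml.etree.ElementTree as ET

DHQ_NS = "http://digitalhumanities.org/DHQ/namespace"
XLINK_NS = "http://www.w3.org/1999/xlink"  # W3C XLink namespace URI

def local_name(elem):
    """Return an element's name without its '{namespace}' qualifier."""
    return elem.tag.rsplit("}", 1)[-1]

# Source elements are namespace-qualified once parsed.
source = ET.fromstring(f'<DHQarticle xmlns="{DHQ_NS}"><text/></DHQarticle>')

# Target elements get plain names, so they end up in no namespace.
ET.register_namespace("xlink", XLINK_NS)
article = ET.Element("article")
ET.SubElement(article, "ext-link", {f"{{{XLINK_NS}}}href": "http://example.org/"})

# ElementTree writes the xlink declaration once, on the top-level element.
serialized = ET.tostring(article, encoding="unicode")
```

Matching on qualified source names while creating unqualified result names mirrors the role the dhq prefix plays in the stylesheet.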

Article structure

DHQarticle becomes article. All articles will have at least article/front and article/body; article/back and article/floats will be created in some cases as described below.

Metadata

Metadata for the target format is all derived from the DHQheader element in the source. In all cases there will be front/journal-meta and front/article-meta elements, in that order, in the result.

Additionally, if there is a DHQheader/history/sourceDesc or DHQheader/history/revisionDesc, a front/notes element is created.

journal-meta

Inside journal-meta, create the following, in this order:

article-meta

Inside article-meta, create the following, in this order:

front/notes

As described above, a front/notes element is generated if either sourceDesc or revisionDesc appears inside DHQheader/history. Map them as follows:

Article body structures

DHQarticle/text becomes article/body.

Wherever it occurs (at any depth), div becomes sec. If a div/@type appears, it becomes sec/@sec-type.

Wherever it occurs, text/head or div/head becomes title in the corresponding body or sec.

In general, @id wherever it appears becomes @id on the corresponding target element. Report a warning where this is not possible.
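As a hedged illustration of the div, head and @id rules above, the following Python sketch converts nested div elements recursively. The function name convert_div is hypothetical, only div and head children are handled, and inline markup inside head is flattened to plain text.

```python
import xml.etree.ElementTree as ET

DHQ = "{http://digitalhumanities.org/DHQ/namespace}"

def convert_div(div):
    """Map a DHQ div (at any depth) to an NLM sec."""
    sec = ET.Element("sec")
    if div.get("type") is not None:
        sec.set("sec-type", div.get("type"))
    if div.get("id") is not None:
        sec.set("id", div.get("id"))
    for child in div:
        if child.tag == DHQ + "head":
            # Sketch only: inline markup inside head is flattened to text.
            ET.SubElement(sec, "title").text = "".join(child.itertext())
        elif child.tag == DHQ + "div":
            sec.append(convert_div(child))
        # All other children are outside the scope of this sketch.
    return sec
```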

Loose structural elements

epigraph becomes disp-quote, with @content-type='epigraph'.

xtext becomes boxed-text, with @content-type="floating-text". (Note that boxed-text is not only for texts rendered in a box.) Its contents map as elsewhere, including head becoming title.

letter also becomes boxed-text, with @content-type='letter'.

opener passes through.

dateline becomes p with @content-type='dateline'.

salute becomes p with @content-type='salute'.

closer passes through.

signed becomes p with @content-type='signed'.

ps becomes sec with @sec-type='postscript'.

Miscellaneous paragraph-level elements

p becomes p.

cit becomes disp-quote with @content-type='citation'. Its quote child is processed as described elsewhere, and is followed by an attrib, into which the ptr, ref or bibl at the end of the cit is inserted.

quote, when it has @rend='block', becomes disp-quote with @content-type='block-quote'. When the quote has text (not paragraph-level) children, a p should be included inside the disp-quote, wrapping the text.

When quote has @rend='inline', it becomes named-content with @content-type='quote'.
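The two quote rules can be sketched in Python as below. This is an illustration under assumptions: convert_quote is a hypothetical name, inline markup is flattened to plain text, and raising an error for other @rend values is a choice made for the sketch, not something this specification states.

```python
import xml.etree.ElementTree as ET

DHQ = "{http://digitalhumanities.org/DHQ/namespace}"

def convert_quote(quote):
    """Map a DHQ quote to disp-quote or named-content, depending on @rend."""
    rend = quote.get("rend")
    if rend == "inline":
        out = ET.Element("named-content", {"content-type": "quote"})
        out.text = "".join(quote.itertext())
        return out
    if rend == "block":
        out = ET.Element("disp-quote", {"content-type": "block-quote"})
        paragraphs = quote.findall(DHQ + "p")
        if paragraphs:
            for p in paragraphs:
                ET.SubElement(out, "p").text = "".join(p.itertext())
        else:
            # Text-level content: wrap it in a single p.
            ET.SubElement(out, "p").text = "".join(quote.itertext())
        return out
    # Assumption of this sketch: other @rend values are not handled.
    raise ValueError(f"unhandled quote/@rend: {rend!r}")
```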

eg becomes preformat.

example becomes statement with @content-type='example'. example/label becomes statement/label.

lg becomes verse-group. If present, lg/label becomes verse-group/title.

l becomes verse-line.

sp becomes speech. If sp contains any elements other than speaker, stage or p, wrap them in p and throw a warning.

speaker becomes speaker.

stage becomes p with @content-type='stage direction'.
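A minimal Python sketch of the sp, speaker and stage rules follows. The name convert_sp is hypothetical, child content is flattened to text rather than converted and then wrapped, and the warning is issued through Python's warnings module as one possible way of reporting it.

```python
import warnings
import xml.etree.ElementTree as ET

def convert_sp(sp):
    """Map a DHQ sp to speech, wrapping unexpected children in p."""
    speech = ET.Element("speech")
    for child in sp:
        name = child.tag.rsplit("}", 1)[-1]  # local name without namespace
        text = "".join(child.itertext())
        if name == "speaker":
            ET.SubElement(speech, "speaker").text = text
        elif name == "stage":
            ET.SubElement(speech, "p", {"content-type": "stage direction"}).text = text
        elif name == "p":
            ET.SubElement(speech, "p").text = text
        else:
            warnings.warn(f"unexpected {name} inside sp; wrapped in p")
            ET.SubElement(speech, "p").text = text
    return speech
```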

Lists, tables, figures and graphics

list becomes list. Its @type maps to @list-type as follows:

item becomes list-item.

table becomes table-wrap containing table. Assign the table/@id, if it exists, to the table-wrap not the table.

row becomes tr.

cell becomes td.
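The table rules above, including the move of @id from table to table-wrap, might look like this in Python. The function name convert_table is hypothetical and cell content is flattened to plain text for brevity.

```python
import xml.etree.ElementTree as ET

DHQ = "{http://digitalhumanities.org/DHQ/namespace}"

def convert_table(table):
    """Map a DHQ table to table-wrap/table with tr and td children."""
    wrap = ET.Element("table-wrap")
    if table.get("id") is not None:
        wrap.set("id", table.get("id"))  # the id goes on the wrapper
    out = ET.SubElement(wrap, "table")   # the inner table carries no id
    for row in table.findall(DHQ + "row"):
        tr = ET.SubElement(out, "tr")
        for cell in row.findall(DHQ + "cell"):
            ET.SubElement(tr, "td").text = "".join(cell.itertext())
    return wrap
```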

figure becomes fig. Its element contents are transformed as indicated below, but must be processed in the following order:

graphic becomes graphic. Handle its attributes as follows:

mediaObject becomes media. Handle its attributes the same as for graphic, except that no accommodation need be made for @alt-url.

caption becomes caption. If caption has text contents, wrap them in p inside the result caption. p contents become p as elsewhere. As with sp, if other elements appear inside caption, they must be wrapped in their own p elements (as caption only permits p in the target format).

figDesc becomes alt-text.

label becomes label.

Inline elements

lb is allowed only inside quote when it has text (not paragraph-level) content. If it has an @n, the lb becomes named-content with @content-type='line break', with the value of @n as its content. Otherwise, the element is dropped.
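The lb rule is small enough to sketch directly in Python. The name convert_lb is hypothetical, and returning None is simply how this sketch signals that the element is dropped.

```python
import xml.etree.ElementTree as ET

def convert_lb(lb):
    """Return named-content for a numbered lb, or None when lb is dropped."""
    n = lb.get("n")
    if n is None:
        return None
    out = ET.Element("named-content", {"content-type": "line break"})
    out.text = n  # the value of @n becomes the content
    return out
```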

q is dropped, but its content is prefixed with a literal “ and suffixed with a literal ” ("curly quotes"). Tag each literal as x[@x-type='archive'].

note generates an xref with the note/@id as its @rid. (The note itself will be picked up in the back matter.)

emph becomes named-content with @content-type='emphasis'.

hi becomes one of the following, depending on the value of its @rend:

name becomes named-content with @content-type='name'.

term becomes named-content with @content-type='term'.

code becomes named-content with @content-type='code'.

called is dropped, but its content is prefixed with a literal “ and suffixed with a literal ” ("curly quotes"). Tag each literal as x[@x-type='archive'].
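The curly-quote treatment of q and called can be sketched in Python as follows. It assumes that the suffix is meant to be the closing quote (U+201D) and that each literal is wrapped in an x element with @x-type='archive'. The function name add_quoted is hypothetical and the quoted content is treated as plain text.

```python
import xml.etree.ElementTree as ET

def add_quoted(parent, text):
    """Append text to parent, surrounded by tagged curly quotes."""
    opening = ET.SubElement(parent, "x", {"x-type": "archive"})
    opening.text = "\u201c"  # left double quotation mark
    opening.tail = text      # the content of the dropped q or called
    closing = ET.SubElement(parent, "x", {"x-type": "archive"})
    closing.text = "\u201d"  # right double quotation mark
    return parent
```

Keeping the generated punctuation in its own elements lets a later process tell it apart from the author's text.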

foreign becomes named-content with @content-type='foreign'.

ref is handled differently depending on whether its @target indicates an internal link (by starting with '#') or an external link (otherwise):
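A minimal Python sketch of this dispatch follows. It rests on assumptions: internal links become xref with @rid (the target minus its leading '#'), external links become ext-link with @xlink:href, and any other attributes the target format may call for are left out. The function name convert_ref is hypothetical.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

def convert_ref(ref):
    """Map a DHQ ref to xref (internal) or ext-link (external)."""
    target = ref.get("target", "")
    if target.startswith("#"):
        # Assumption: @rid is the target without the leading '#'.
        out = ET.Element("xref", {"rid": target[1:]})
    else:
        # Assumption: the external address is carried in @xlink:href.
        out = ET.Element("ext-link", {XLINK_HREF: target})
    out.text = "".join(ref.itertext())
    return out
```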

ptr is handled exactly the same way ref is, except in addition to mapping to a target element, contents are generated for the resulting xref or ext-link, as follows:

gi becomes monospace; additionally, the value is prefixed with a '<' and suffixed with a '>'.

att, val, tag and class all become monospace.

bibl, when it has @type='reviewTarget', should be dropped (it is accounted for inside article-meta).

When bibl appears in text content, it is treated like note: an xref is generated, whose @rid names the bibl/@id. The bibl itself will be picked up in the back matter (see below).

Back matter structures

If any of figures, notes, listBibl or appendix appears inside DHQarticle, or if any bibl or note elements are used inside DHQarticle/text, article/back is created. Inside, in this order, generate the following:

Bibliography elements

bibl becomes a ref containing a citation. The bibl/@id is attached to the ref, not the citation. The contents of the bibl, with the exception of label, become the contents of the citation.

bibl/label becomes label on the parent ref (not on the citation resulting directly from the bibl).
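The division of a bibl between ref and citation might be sketched in Python as below. The name convert_bibl is hypothetical, and child elements are flattened to text here, whereas a full conversion would map them according to the rules that follow.

```python
import xml.etree.ElementTree as ET

DHQ = "{http://digitalhumanities.org/DHQ/namespace}"

def convert_bibl(bibl):
    """Map a DHQ bibl to ref/citation, moving @id and label onto ref."""
    ref = ET.Element("ref")
    if bibl.get("id") is not None:
        ref.set("id", bibl.get("id"))
    label = bibl.find(DHQ + "label")
    if label is not None:
        ET.SubElement(ref, "label").text = "".join(label.itertext())
    citation = ET.SubElement(ref, "citation")
    # Collect everything except label content; text after a label is kept.
    parts = [bibl.text or ""]
    for child in bibl:
        if child.tag != DHQ + "label":
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    citation.text = "".join(parts)
    return ref
```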

vol becomes volume.

date is decomposed, on the basis of its @when, into day, month and year, as described above for creating pub-date. Note that at least year should be possible for any date/@when. If a date appears without a @when, instead create named-content with @content-type='date'.
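As an illustration of the date rule, this Python sketch splits @when on hyphens. It assumes @when takes the form YYYY, YYYY-MM or YYYY-MM-DD, leaves leading zeros untouched, and emits day, month and year in that order. The function name convert_date is hypothetical.

```python
import xml.etree.ElementTree as ET

def convert_date(date):
    """Return the list of target elements for a DHQ date."""
    when = date.get("when")
    if when is None:
        out = ET.Element("named-content", {"content-type": "date"})
        out.text = "".join(date.itertext())
        return [out]
    elems = []
    # zip stops at the shorter sequence, so a bare year yields only year.
    for name, value in zip(["year", "month", "day"], when.split("-")):
        elem = ET.Element(name)
        elem.text = value
        elems.append(elem)
    return list(reversed(elems))  # day, month, year
```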

author becomes name inside person-group. Contiguous authors should be gathered in a single person-group. Within name, the entire contents of author become surname.

editor becomes name inside person-group in the same way as author, except that @person-group-type='editor'.
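Gathering contiguous authors or editors can be sketched in Python with itertools.groupby. The assumptions here are that each author or editor becomes a name element, that authors receive @person-group-type='author', and that text between sibling elements does not break contiguity. The function name group_persons is hypothetical.

```python
from itertools import groupby
import xml.etree.ElementTree as ET

def local_name(elem):
    """Return an element's name without its '{namespace}' qualifier."""
    return elem.tag.rsplit("}", 1)[-1]

def group_persons(bibl):
    """Return one person-group per run of adjacent author or editor children."""
    groups = []
    for kind, run in groupby(bibl, key=local_name):
        if kind not in ("author", "editor"):
            continue
        group = ET.Element("person-group", {"person-group-type": kind})
        for person in run:
            name = ET.SubElement(group, "name")
            # The entire contents go into surname, as the rule states.
            ET.SubElement(name, "surname").text = "".join(person.itertext())
        groups.append(group)
    return groups
```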

title becomes source.

pubPlace becomes publisher-loc.

publisher becomes publisher-name.

idno becomes issn if it has @type='issn', isbn if it has @type='isbn', or object-id otherwise. In the last case, @type becomes @content-type.
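The three-way idno rule translates into a short Python sketch. The name convert_idno is hypothetical, and an idno without any @type is assumed to yield an object-id with no @content-type.

```python
import xml.etree.ElementTree as ET

def convert_idno(idno):
    """Map a DHQ idno to issn, isbn or object-id, depending on @type."""
    kind = idno.get("type")
    if kind in ("issn", "isbn"):
        out = ET.Element(kind)
    else:
        out = ET.Element("object-id")
        if kind is not None:
            out.set("content-type", kind)  # @type becomes @content-type
    out.text = "".join(idno.itertext())
    return out
```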

price becomes named-content with @content-type='price'.

extent becomes page-range.