To transform a valid XHTML file in the canonical form into a valid DTBook file.
Transformation of a general HTML document to a DTBook document is likely to be a two-stage process.
The DAISY Consortium does not supply any tools for performing the first stage, as the required process will probably differ among organizations. For the second stage, the DAISY Consortium has developed the no_hks_xhtml2dtbook transformer.
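h1 element.
The first child of the body element should be an h1 element. Any child element placed before the first h1 will be ignored during transformation and will not be present in the generated DTBook document.
All heading elements (h1 to h6) must be child elements (in the XML sense of the term) of the body element. This excludes markup such as the following, where the headings are children of div elements: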
<body>
<div class="start-of-book">
<h1>The title</h1>
:
:
<h1>Content</h1>
:
:
</div>
<div class="main-stuff">
<h1>Chapter 1 How it all began</h1>
:
:
<h1>Chapter 2 How it continued</h1>
:
:
</div>
</body>
The first heading following an h3 heading cannot be an h5 heading.
It must be an h4 heading, or one of h1, h2 or h3.
:
.... some text in a paragraph.</p>
<h3>A heading on level 3</h3>
<h3>Another heading on level 3</h3>
<p>Some more text in a paragraph ....
:
Note that the following is perfectly okay (and makes sense):
:
.... some text in a paragraph.</p>
<h3>A heading on level 3</h3>
<h4>A subheading on level 4</h4>
<p>Some more text in a paragraph ....
:
In the cases where a heading has no relevant following siblings before a heading on the same, or higher, level, a "dummy" paragraph is inserted in the generated DTBook document.
:
.... some text in a paragraph.</p>
<h3>A heading on level 3</h3>
<h3>Another heading on level 3</h3>
<p>Some more text in a paragraph ....
:
would be transformed into:
:
.... some text in a paragraph.</p>
</level3>
<level3>
<h3>A heading on level 3</h3>
<p class="dummy" />
</level3>
<level3>
<h3>Another heading on level 3</h3>
<p>Some more text in a paragraph ....
:
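The heading rules above can be checked before running the transformer. The following is a minimal sketch, not part of the transformer itself: the function name check_headings is hypothetical, the input is assumed to be a namespace-free body element, and "no relevant following siblings" is approximated as "the next sibling element is a heading on the same or a higher level".

```python
import xml.etree.ElementTree as ET

# Map heading tag names to their numeric level (h1 -> 1, ..., h6 -> 6).
HEADINGS = {f"h{n}": n for n in range(1, 7)}


def check_headings(body_xml):
    """Return (skipped, needs_dummy) for the direct children of a body element.

    skipped     -- (previous_level, level) pairs where a level is skipped,
                   e.g. an h3 followed by an h5.
    needs_dummy -- text of headings that are directly followed by a heading
                   on the same or a higher level (assumed interpretation).
    """
    body = ET.fromstring(body_xml)
    children = list(body)
    skipped, needs_dummy = [], []
    prev_level = None
    for i, child in enumerate(children):
        level = HEADINGS.get(child.tag)
        if level is None:
            continue
        if prev_level is not None and level > prev_level + 1:
            skipped.append((prev_level, level))
        nxt = children[i + 1] if i + 1 < len(children) else None
        nxt_level = HEADINGS.get(nxt.tag) if nxt is not None else None
        if nxt_level is not None and nxt_level <= level:
            needs_dummy.append(child.text)
        prev_level = level
    return skipped, needs_dummy
```

A real XHTML file declares the XHTML namespace, so the tag names would need to be namespace-qualified before this check is applied to actual input.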
br elements may not be child elements (in the XML sense of the term) of the body element.
span elements may not be child elements (in the XML sense of the term) of the body element, unless the class attribute ... or the span element is evaluated to be a part of an image group (see the section on image groups).
This excludes markup such as:
<body>
<h1>The title</h1>
<span class="sentence">This is a sentence, and also a child of the body element.</span>
<span class="sentence">And so is this.</span>
:
:
</body>
Rather, you should use:
<body>
<h1>The title</h1>
<p>
<span class="sentence">This is a sentence, and also a child of the body element.</span>
<span class="sentence">And so is this.</span>
</p>
:
:
</body>
A span element with a value for the class attribute starting with the string page- must have text content that, when normalized, is suitable to form part of an id attribute value in the DTBook file.
<span class="page-normal">4</span>
<span class="page-normal">
89
</span>
<span class="page-front">xiv</span>
<span class="page-special">B-34</span>
are perfectly okay, and will result in the id values page-4, page-89, page-xiv and page-B-34, respectively, in the DTBook file.
However,
<span class="page-normal">page 4</span>
does not comply with this requirement.
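The page number requirement can be illustrated with a small check. This is a sketch under stated assumptions: the function name page_id is hypothetical, normalization is taken to mean collapsing and trimming white space, and "suitable" is approximated as ASCII letters, digits, periods, hyphens and underscores only, which is stricter than what XML allows in an id value.

```python
import re
from typing import Optional

# Assumption: a conservative ASCII-only subset of the characters allowed in an id.
ID_SUFFIX = re.compile(r"[A-Za-z0-9._-]+")


def page_id(text: str) -> Optional[str]:
    """Return the id value for a page number span, or None if the text is unsuitable."""
    normalized = " ".join(text.split())  # trim and collapse white space
    if ID_SUFFIX.fullmatch(normalized):
        return "page-" + normalized
    return None
```

With this check, the text "page 4" is rejected because the normalized value still contains a space.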
div and blockquote elements may not have br or span as child elements (in the XML sense of the term).
div and blockquote elements may not have text content.
This excludes markup such as:
<div>
This is some text before the picture.
<img src="fig01.png" alt="Map: Norway" />
This is some text after the picture.
</div>
Instead you should use:
<div>
<p>This is some text before the picture.</p>
<img src="fig01.png" alt="Map: Norway" />
<p>This is some text after the picture.</p>
</div>
or, perhaps better, skip the div element:
<p>This is some text before the picture.</p>
<img src="fig01.png" alt="Map: Norway" />
<p>This is some text after the picture.</p>
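The two restrictions on div and blockquote elements can be tested mechanically. The sketch below is illustrative only: div_problems is a hypothetical helper, the fragment is assumed to be namespace-free, and "text content" is interpreted as non-white-space text placed directly inside the element rather than inside a child such as p.

```python
import xml.etree.ElementTree as ET


def div_problems(fragment):
    """List violations of the div/blockquote rules in a well-formed fragment."""
    root = ET.fromstring(fragment)
    problems = []
    for el in root.iter():
        if el.tag not in ("div", "blockquote"):
            continue
        # Direct text is the element's own text plus the tail of each child.
        texts = [el.text] + [child.tail for child in el]
        if any(t and t.strip() for t in texts):
            problems.append(f"{el.tag}: text content as a direct child")
        for child in el:
            if child.tag in ("br", "span"):
                problems.append(f"{el.tag}: {child.tag} as a child element")
    return problems
```

An empty list means that no violation of these two rules was found; it says nothing about the other input requirements.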
It is generally recommended to have a markup with a very "flat" structure. One should especially avoid having block elements inside block elements, as in the following example:
<p>This is some text before the list.
<ul>
<li>The first list item.</li>
<li>This is the second and last item.</li>
</ul>
And this is the text after the list.</p>
Proper markup should rather be as follows:
<p>This is some text before the list.</p>
<ul>
<li>The first list item.</li>
<li>This is the second and last item.</li>
</ul>
<p>And this is the text after the list.</p>
For correct transformation from XHTML to DTBook, the following parameters must be given to the transformation style sheet:
| Parameter name | Default value | Comments |
|---|---|---|
| uid | [UID] | The unique identifier for the publication. This parameter should be given a sensible value to ensure that the generated DTBook file has correct meta data. If using DAISY Pipeline to perform the transformation, the user will be offered the opportunity to specify the identifier. If the transformation is used by Pipeline as a part of a DAISY 2.02 to DAISY 3.0 DTB migration, DAISY Pipeline should be able to provide an identifier based on the DAISY 2.02 DTB. |
| transformationMode | standalone | Used to define how the style sheet transforms the document. If this parameter is given the value DTBmigration, transformation rules are used that are appropriate for a migration of DAISY 2.02 XHTML content to a Z39.86-2005 DTBook file. Any other value will result in the use of transformation rules suitable for a generic XHTML to DTBook conversion process. |
| title | [DTB_TITLE] | The title of the publication. This parameter should be given a sensible value to ensure that the generated DTBook file has correct meta data. If using DAISY Pipeline to perform the transformation, the user will be offered the opportunity to specify the title. If the transformation is used by Pipeline as a part of a DAISY 2.02 to DAISY 3.0 DTB migration, DAISY Pipeline should be able to provide a title based on the DAISY 2.02 DTB. |
| cssURI | [cssURI] | The URI to the Cascading Style Sheet (CSS) to be used for the DTBook file. If this parameter is not specified, no reference will be made to a style sheet. |
| transferDcMetadata | false | This parameter is only applicable if the transformation is used as a part of a DAISY 2.02 to DAISY 3.0 DTB migration. If the parameter is set to true, the transformer will try to transfer appropriate meta data from the DAISY 2.02 NCC file to the generated DTBook file. In this context, appropriate meta data is simply meta data with a name attribute value starting with dc:. Note: the dc:title meta data is not handled through this mechanism, as it is specified with the title parameter. If this parameter is set to true, the parameter nccURI must be specified. |
| nccURI | [nccURI] | This parameter is only applicable if the transformation is used as a part of a DAISY 2.02 to DAISY 3.0 DTB migration, and if the parameter transferDcMetadata is set to true. The parameter is used to specify the URI to the DAISY 2.02 NCC file, in order to facilitate the meta data transfer described above. If the transformation is used by Pipeline as a part of a DAISY 2.02 DTB to DAISY 3.0 migration, DAISY Pipeline should be able to provide a suitable value for this parameter. |
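The dependency between transferDcMetadata and nccURI is easy to overlook when the parameters are assembled by hand. The following sketch shows one way to collect the parameters and enforce that rule before the style sheet is invoked. The helper build_params is hypothetical and not part of the transformer or of DAISY Pipeline, and it assumes that a value equal to the bracketed placeholder default means "not specified".

```python
# Default values as listed in the parameter table.
DEFAULTS = {
    "uid": "[UID]",
    "transformationMode": "standalone",
    "title": "[DTB_TITLE]",
    "cssURI": "[cssURI]",
    "transferDcMetadata": "false",
    "nccURI": "[nccURI]",
}


def build_params(**overrides):
    """Merge user-supplied values with the defaults and check their consistency."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    params = {**DEFAULTS, **overrides}
    # Assumption: the placeholder default counts as "not specified".
    if params["transferDcMetadata"] == "true" and params["nccURI"] == DEFAULTS["nccURI"]:
        raise ValueError("nccURI must be specified when transferDcMetadata is true")
    return params
```

For example, build_params(uid="no-hks-0001", title="A book") returns all six parameters with the remaining defaults filled in; the uid and title values here are made up for illustration.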
A DTBook file compliant with the DTBook 2005-2 DTD. The various elements in the XHTML file are handled according to the following information.
| XHTML element | Generated DTBook element | Comments |
|---|---|---|
| head/meta | head/meta | |
| head/title | frontmatter/doctitle | |
| body | book | |
| h1 to h6 | <levelx> <hx>....</hx> : : </levelx> | For h1 elements, the following rules apply: |
| span, with a value for the class attribute equal to sentence. | sent | |
| span, where the value for the class attribute starts with the string page- | pagenum | |
| span, where the value for the class attribute ends with the string -prodnote. | prodnote | If the span element is evaluated to be a part of an image group (see the section on image groups), the following rules apply: |
| span, with a value for the class attribute equal to noteref. | noteref | |
| span, with a value for the class attribute equal to caption, and the span element evaluated to be a part of an image group (see the section on image groups). | <imggroup> <img.../> <caption>...</caption> : </imggroup> | |
| span, with no class attribute, or a value for the class attribute different from any of the ones mentioned above. | span | |
| div, with a value for the class attribute equal to notebody. | note | |
| div, with no class attribute, or a value for the class attribute different from the one mentioned above. | div | |
| img, where the element is evaluated to be a part of an image group. | <imggroup> <img.../> : : </imggroup> | See section on image groups. |
| img, where the element is evaluated not to be a part of an image group. | img | |
| ol | list | |
| ul | list | |
| table, tr, td, th and col | table, tr, td, th and col respectively | |
| p, blockquote, li, dl, dt, dd, strong, em, sub, sup and br | p, blockquote, li, dl, dt, dd, strong, em, sub, sup and br respectively | |
| a, where the value of the href attribute does not contain the string .smil#. | a | See the rule following this table for how other a elements are handled during transformation. |
Elements other than the ones listed above will result in a comment in the DTBook file.
If, and only if, the style sheet input parameter transformationMode is given the value
DTBmigration, then for all XHTML elements listed above, the following rule applies:
If the XHTML element has an a element as a child (in the XML sense of the term),
and this a element has an href attribute with a
value containing the string .smil#, the value of the href attribute
is transferred to a smilref attribute for the DTBook element that results from the transformation of the XHTML element.
The a element will not be transformed in this process.
So the following piece of XHTML code:
<h2 id="baaw_0007"><a href="baaw0004.smil#baaw_0007">Section 1.1</a></h2>
<span class="page-normal" id="baaw_0008"><a href="baaw0004.smil#baaw_0008">4</a></span>
<span class="page-normal" id="baaw_0009"><a href="baaw0004.smil#baaw_0009">5</a></span>
:
would be transformed into:
<level2>
<h2 id="baaw_0007" smilref="baaw0004.smil#baaw_0007">Section 1.1</h2>
<pagenum page="normal" id="page-4" smilref="baaw0004.smil#baaw_0008">4</pagenum>
<pagenum page="normal" id="page-5" smilref="baaw0004.smil#baaw_0009">5</pagenum>
:
</level2>
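The smilref rule can be mimicked in a few lines to see what it does to a single element. This is a rough illustration rather than the style sheet's actual implementation: transfer_smilref is a hypothetical name, only the common case of a single a child is handled, and the renaming of elements (for example span to pagenum) is left out.

```python
import xml.etree.ElementTree as ET


def transfer_smilref(element):
    """Move a .smil# href from a sole a child to a smilref attribute and unwrap the a."""
    children = list(element)
    if len(children) != 1 or children[0].tag != "a":
        return element
    a = children[0]
    href = a.get("href", "")
    if ".smil#" not in href:
        return element  # ordinary links are left untouched
    element.set("smilref", href)
    element.remove(a)
    # Re-attach the content of the a element directly to its former parent.
    element.text = (element.text or "") + (a.text or "")
    inner = list(a)
    element.extend(inner)
    if a.tail:
        if inner:
            inner[-1].tail = (inner[-1].tail or "") + a.tail
        else:
            element.text += a.tail
    return element
```

Applied to the h2 element from the example above, the result carries smilref="baaw0004.smil#baaw_0007" and has the text "Section 1.1" as its direct content.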
When an img element occurs in the input document, an imggroup element is created if the img element has one, or both, of the following elements:
A span element where the value for the class attribute ends with the string -prodnote.
A span element where the value for the class attribute is equal to the string caption.
as the first following sibling(s) (in the XML sense of the term).
The img element, and whatever results from transformation of the two elements listed above, will be placed
in the imggroup element, and
appropriate values for id and imgref attributes are created for all elements in the image group.
As an example, the following markup in the XHTML:
:
<h2><a href="smil0026.smil#0001">Town halls in Norway</a></h2>
<p><a href="smil0026.smil#0002">This is the paragraph before the image, caption and description.</a></p>
<img id="fig04" src="file04.png" alt="Picture: The Oslo Town Hall" />
<span class="caption"><a href="smil0026.smil#0003">The town hall in Oslo, located close to the harbor,
is one of the largest brick buildings in the city.</a></span>
<span class="optional-prodnote"><a href="smil0026.smil#0004">A photography showing a rather large building with two towers.</a></span>
<p><a href="smil0026.smil#0005">This is the paragraph after the image, caption and description.</a></p>
:
will result in the following markup in the generated DTBook file (id and imgref values may differ):
:
<level2>
<h2 smilref="smil0026.smil#0001">Town halls in Norway</h2>
<p smilref="smil0026.smil#0002">This is the paragraph before the image, caption and description.</p>
<imggroup id="imggrp-d1e340">
<img src="file04.png" alt="Picture: The Oslo Town Hall" id="img-d1e340"/>
<caption imgref="img-d1e340" id="caption-d1e340" smilref="smil0026.smil#0003">The town hall in Oslo,
located close to the harbor, is one of the largest brick buildings in the city.</caption>
<prodnote render="optional" imgref="img-d1e340" id="pnote-d1e340"
smilref="smil0026.smil#0004">A photography showing a rather large building with two towers.</prodnote>
</imggroup>
<p smilref="smil0026.smil#0005">This is the paragraph after the image, caption and description.</p>
:
</level2>
:
However, the following, very similar, markup in the XHTML:
:
<h2><a href="smil0026.smil#0001">Town halls in Norway</a></h2>
<p><a href="smil0026.smil#0002">This is the paragraph before the image, caption and description.</a></p>
<img id="fig04" src="file04.png" alt="Picture: The Oslo Town Hall" />
<span class="caption"><a href="smil0026.smil#0003">The town hall in Oslo, located close to the harbor,
is one of the largest brick buildings in the city.</a></span>
<span class="prodnote"><a href="smil0026.smil#0004">A photography showing a rather large building with two towers.</a></span>
<p><a href="smil0026.smil#0005">This is the paragraph after the image, caption and description.</a></p>
:
will result in the following markup in the generated DTBook file (id and imgref values may differ):
:
<level2>
<h2 smilref="smil0026.smil#0001">Town halls in Norway</h2>
<p smilref="smil0026.smil#0002">This is the paragraph before the image, caption and description.</p>
<imggroup id="imggrp-d1e340">
<img src="file04.png" alt="Picture: The Oslo Town Hall" id="img-d1e340"/>
<caption imgref="img-d1e340" id="caption-d1e340" smilref="smil0026.smil#0003">The town hall in Oslo,
located close to the harbor, is one of the largest brick buildings in the city.</caption>
</imggroup>
<span class="prodnote" smilref="smil0026.smil#0004">A photography showing a rather large building with two towers.</span>
<p smilref="smil0026.smil#0005">This is the paragraph after the image, caption and description.</p>
:
</level2>
:
Note that this DTBook markup is in fact invalid. It is left as an exercise to the reader to trace the cause of this error back to the XHTML code.
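To see which span elements are picked up as part of an image group, the sibling test described in this section can be sketched as follows. The function names are hypothetical, the input is assumed to be namespace-free, only element siblings are considered, and each role (caption, prodnote) is assumed to occur at most once per group.

```python
import xml.etree.ElementTree as ET


def group_role(el):
    """Return 'caption' or 'prodnote' if el qualifies for an image group, else None."""
    if el.tag != "span":
        return None
    cls = el.get("class", "")
    if cls == "caption":
        return "caption"
    if cls.endswith("-prodnote"):
        return "prodnote"
    return None


def image_groups(parent):
    """Return (img id, [roles]) for every img child that would start an image group."""
    children = list(parent)
    groups = []
    for i, child in enumerate(children):
        if child.tag != "img":
            continue
        roles = []
        # Only the first one or two following siblings can join the group.
        for sibling in children[i + 1 : i + 3]:
            role = group_role(sibling)
            if role is None or role in roles:
                break
            roles.append(role)
        if roles:
            groups.append((child.get("id"), roles))
    return groups
```

Running image_groups on the parent element of each of the two XHTML examples above shows which of the spans following the img element end up inside the imggroup.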
On error, this transformer will send a fatal message, then throw an exception and abort.
LGPL