This script transforms a valid XHTML file in the canonical form into a valid DTBook file.
Transforming a general HTML document into a DTBook document is likely to be a two-stage process. The first stage converts the HTML document into a valid XHTML file in the canonical form; the second stage transforms that XHTML file into a DTBook file.
The DAISY Consortium does not supply any tools for the first stage, as the required process will probably differ among organizations. For the second stage, the DAISY Consortium has developed the script described in this document.
h1 element.
The first child element of the body element should be an h1 element.
Any child element placed before the first h1 will be ignored during transformation and will not be present in the generated DTBook document.
All headings (h1 to h6) must be child elements (in the XML sense of the term) of the body element.
This excludes markup such as:
<body>
<div class="start-of-book">
<h1>The title</h1>
:
:
<h1>Content</h1>
:
:
</div>
<div class="main-stuff">
<h1>Chapter 1 How it all began</h1>
:
:
<h1>Chapter 2 How it continued</h1>
:
:
</div>
</body>
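The heading-placement rule can be checked before running the transformation. The following is a minimal sketch in Python using the standard library's xml.etree.ElementTree; it is not part of the DAISY script, and the function names misplaced_headings and local_name are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

def local_name(elem):
    """Return the tag name without any '{namespace}' prefix."""
    return elem.tag.rsplit("}", 1)[-1]

def misplaced_headings(xhtml_text):
    """Return the names of headings that are not direct children of body."""
    root = ET.fromstring(xhtml_text)
    body = next(e for e in root.iter() if local_name(e) == "body")
    direct = {id(child) for child in body}
    return [local_name(e) for e in body.iter()
            if local_name(e) in HEADINGS and id(e) not in direct]

# Hypothetical sample: the first h1 is wrapped in a div, the second is not.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<div class="start-of-book"><h1>The title</h1></div>
<h1>Chapter 1 How it all began</h1>
</body></html>"""

print(misplaced_headings(SAMPLE))  # prints ['h1']
```

An empty list means every heading is a child of the body element, as required.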
The heading following an h3 can not be an h5 heading.
It must be an h4 heading, or one of h1, h2 or h3.
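The heading-level rule can be sketched as a simple check over the headings that are children of body. The sketch below assumes the rule generalizes to all levels (a heading may be at most one level deeper than the heading before it); that generalization, and the function name skipped_levels, are assumptions made for illustration, not something the script's documentation states.

```python
import xml.etree.ElementTree as ET

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

def local_name(elem):
    """Return the tag name without any '{namespace}' prefix."""
    return elem.tag.rsplit("}", 1)[-1]

def skipped_levels(xhtml_text):
    """Return (previous, current) heading pairs that skip a level going down."""
    root = ET.fromstring(xhtml_text)
    body = next(e for e in root.iter() if local_name(e) == "body")
    # Only direct children of body are considered, as the rules require.
    levels = [int(local_name(c)[1]) for c in body if local_name(c) in HEADINGS]
    return [(f"h{a}", f"h{b}") for a, b in zip(levels, levels[1:]) if b > a + 1]

# Hypothetical sample: an h5 directly follows an h3.
SAMPLE = "<body><h1>a</h1><h2>b</h2><h3>c</h3><h5>d</h5><h2>e</h2></body>"

print(skipped_levels(SAMPLE))  # prints [('h3', 'h5')]
```

Moving back up any number of levels (here from h5 to h2) is not reported, since only downward jumps of more than one level are excluded.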
:
.... some text in a paragraph.</p>
<h3>A heading on level 3</h3>
<h3>Another heading on level 3</h3>
<p>Some more text in a paragraph ....
:
Note that the following is perfectly okay (and makes sense):
:
.... some text in a paragraph.</p>
<h3>A heading on level 3</h3>
<h4>A subheading on level 4</h4>
<p>Some more text in a paragraph ....
:
In the cases where a heading has no relevant following siblings before a heading on the same, or higher, level, a "dummy" paragraph is inserted in the generated DTBook document.
:
.... some text in a paragraph.</p>
<h3>A heading on level 3</h3>
<h3>Another heading on level 3</h3>
<p>Some more text in a paragraph ....
:
would be transformed into:
:
.... some text in a paragraph.</p>
</level3>
<level3>
<h3>A heading on level 3</h3>
<p class="dummy" />
</level3>
<level3>
<h3>Another heading on level 3</h3>
<p>Some more text in a paragraph ....
:
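To predict where a "dummy" paragraph will appear, one can look for headings that are immediately followed by a heading on the same or a higher level. This is only a sketch under assumptions: it treats every element sibling as "relevant", it does not model a heading that is the last child of body, and the function name headings_needing_dummy is hypothetical. It does not reproduce the script's actual logic.

```python
import xml.etree.ElementTree as ET

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

def local_name(elem):
    """Return the tag name without any '{namespace}' prefix."""
    return elem.tag.rsplit("}", 1)[-1]

def headings_needing_dummy(xhtml_text):
    """Return the text of headings directly followed by a same- or higher-level heading."""
    root = ET.fromstring(xhtml_text)
    body = next(e for e in root.iter() if local_name(e) == "body")
    children = list(body)
    result = []
    for current, following in zip(children, children[1:]):
        a, b = local_name(current), local_name(following)
        # A smaller number means a higher level (h2 is above h3).
        if a in HEADINGS and b in HEADINGS and int(b[1]) <= int(a[1]):
            result.append("".join(current.itertext()))
    return result

# Hypothetical sample mirroring the example above.
SAMPLE = """<body>
<p>.... some text in a paragraph.</p>
<h3>A heading on level 3</h3>
<h3>Another heading on level 3</h3>
<p>Some more text in a paragraph ....</p>
</body>"""

print(headings_needing_dummy(SAMPLE))  # prints ['A heading on level 3']
```

An h3 followed by an h4 is not reported, which matches the case described above as perfectly okay.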
br elements may not be child elements (in the XML sense of the term) of the body element.
span elements may not be child elements (in the XML sense of the term) of the body element, unless:
the class attribute ...
the span element is evaluated to be a part of an image group (more details).
This excludes markup such as:
<body>
<h1>The title</h1>
<span class="sentence">This is a sentence,
and also a child of the body element.</span>
<span class="sentence">And so is this.</span>
:
:
</body>
Rather, you should use:
<body>
<h1>The title</h1>
<p>
<span class="sentence">This is a sentence,
and also a child of the body element.</span>
<span class="sentence">And so is this.</span>
</p>
:
:
</body>
A span element whose class attribute value starts with the string page-
must have text content that, when normalized, is suitable to form part of an id attribute value in the DTBook file.
<span class="page-normal">4</span>
<span class="page-normal">
89
</span>
<span class="page-front">xiv</span>
<span class="page-special">B-34</span>
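are perfectly okay, and will result in the id values page-4, page-89, page-xiv and page-B-34, respectively, in the DTBook file. However,
<span class="page-normal">page 4</span>
does not comply with this requirement, because its normalized text content contains a space, which is not allowed in an id attribute value.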
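The page number requirement can be sketched as follows. This is a hedged illustration, not the script's implementation: it assumes normalization means trimming and collapsing whitespace, it prefixes the result with page- as in the id values listed above, and it approximates a valid id with a simplified ASCII-only pattern. The names page_id and ID_PATTERN are hypothetical.

```python
import re

# Simplified, ASCII-only approximation of a valid XML id value (an assumption;
# the real XML name rules also allow many non-ASCII characters).
ID_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

def page_id(text):
    """Return the id built from a page number span's text, or None if unsuitable."""
    normalized = " ".join(text.split())  # trim and collapse whitespace
    if not normalized:
        return None
    candidate = "page-" + normalized
    return candidate if ID_PATTERN.fullmatch(candidate) else None

print(page_id("4"))         # prints page-4
print(page_id("\n 89\n "))  # prints page-89
print(page_id("B-34"))      # prints page-B-34
print(page_id("page 4"))    # prints None
```

A return value of None indicates text content that should be corrected in the XHTML file before transformation.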
div and blockquote elements may not have br or span as child elements (in the XML sense of the term).
div and blockquote elements may not have text content.
This excludes markup such as:
<div>
This is some text before the picture.
<img src="fig01.png" alt="Map: Norway" />
This is some text after the picture.
</div>
Instead you should use:
<div>
<p>This is some text before the picture.</p>
<img src="fig01.png" alt="Map: Norway" />
<p>This is some text after the picture.</p>
</div>
or, perhaps better, skip the div element:
<p>This is some text before the picture.</p>
<img src="fig01.png" alt="Map: Norway" />
<p>This is some text after the picture.</p>
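The two rules for div and blockquote elements can be checked with a short script. The sketch below interprets "text content" as text placed directly inside the element (not text inside its child elements), which is what the examples above suggest; that interpretation and the function name block_problems are assumptions.

```python
import xml.etree.ElementTree as ET

def local_name(elem):
    """Return the tag name without any '{namespace}' prefix."""
    return elem.tag.rsplit("}", 1)[-1]

def block_problems(xhtml_text):
    """Return (element, problem) pairs for div and blockquote elements."""
    root = ET.fromstring(xhtml_text)
    problems = []
    for elem in root.iter():
        name = local_name(elem)
        if name not in {"div", "blockquote"}:
            continue
        # Direct text sits before the first child (text) or after a child (tail).
        texts = [elem.text] + [child.tail for child in elem]
        if any(t and t.strip() for t in texts):
            problems.append((name, "text content"))
        for child in elem:
            if local_name(child) in {"br", "span"}:
                problems.append((name, local_name(child) + " child"))
    return problems

# Hypothetical samples mirroring the examples above.
BAD = """<div>
This is some text before the picture.
<img src="fig01.png" alt="Map: Norway" />
This is some text after the picture.
</div>"""
GOOD = """<div>
<p>This is some text before the picture.</p>
<img src="fig01.png" alt="Map: Norway" />
<p>This is some text after the picture.</p>
</div>"""

print(block_problems(BAD))   # prints [('div', 'text content')]
print(block_problems(GOOD))  # prints []
```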
It is generally recommended to use markup with a very "flat" structure. In particular, avoid placing block elements inside other block elements, as in the following example:
<p>This is some text before the list.
<ul>
<li>The first list item.</li>
<li>This is the second and last item.</li>
</ul>
And this is the text after the list.</p>
Proper markup should rather be as follows:
<p>This is some text before the list.</p>
<ul>
<li>The first list item.</li>
<li>This is the second and last item.</li>
</ul>
<p>And this is the text after the list.</p>
The output is a DTBook document that is intended to be valid. The output is validated automatically, so watch for error reports.
The documents linked below are part of the Transformer technical documentation. They are aimed at developers and systems administrators.