Pipeline Script: XHTML to DTBook [BETA]

Overview

This script transforms a valid XHTML file in the canonical form into a valid DTBook file.

Input Requirements

A note on the two-stage process for creating a DTBook file from an HTML document

Transforming a general HTML document into a DTBook document is likely to be a two-stage process.

  1. The first stage will be to turn the HTML document into a canonical form of an XHTML document. Depending on the state of the HTML document, this will be a process consisting of both automatic and manual processing. Requirements for the canonical form are given below.
  2. The second stage is to create the DTBook document from the canonical XHTML. This will be done in a completely automatic XSLT 2.0 transformation process, normally controlled by the DAISY Pipeline.

The DAISY Consortium does not supply any tools for performing the first stage, as the process required will probably differ among organizations. For the second stage, the DAISY Consortium has developed the script described in this document.

Requirements for the canonical form of the XHTML document

It is generally recommended to use markup with a very "flat" structure. In particular, avoid placing block elements inside other block elements, as in the following example:

<p>This is some text before the list.
    <ul>
        <li>The first list item.</li>
        <li>This is the second and last item.</li>
    </ul>
And this is the text after the list.</p>

The markup should instead be as follows:

<p>This is some text before the list.</p>
<ul>
    <li>The first list item.</li>
    <li>This is the second and last item.</li>
</ul>
<p>And this is the text after the list.</p>
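
As an aid for the first (manual or semi-automatic) stage, the nesting problem shown above can be detected mechanically. The following is a minimal sketch and not part of the DAISY Pipeline: the function name find_nested_blocks and the BLOCK_ELEMENTS set are hypothetical, the set is an assumed, non-exhaustive list of block elements, and the check is deliberately limited to block elements sitting directly inside a p element, which is the case illustrated in the example. It assumes the input is well-formed XML.

```python
import xml.etree.ElementTree as ET

# Assumed, non-exhaustive set of block-level element names (hypothetical list).
BLOCK_ELEMENTS = {"p", "ul", "ol", "dl", "table", "blockquote", "div", "pre"}


def local_name(tag):
    """Reduce a namespaced tag such as '{http://...}p' to its local name 'p'."""
    return tag.rsplit("}", 1)[-1]


def find_nested_blocks(xhtml_text):
    """Return (parent, child) name pairs for block elements directly inside a <p>."""
    root = ET.fromstring(xhtml_text)
    problems = []
    for parent in root.iter():
        if local_name(parent.tag) != "p":
            continue
        # Only direct children are inspected; deeper nesting is not checked here.
        for child in parent:
            child_name = local_name(child.tag)
            if child_name in BLOCK_ELEMENTS:
                problems.append(("p", child_name))
    return problems


if __name__ == "__main__":
    sample = (
        '<body xmlns="http://www.w3.org/1999/xhtml">'
        "<p>This is some text before the list."
        "<ul><li>The first list item.</li></ul>"
        "And this is the text after the list.</p>"
        "</body>"
    )
    for parent, child in find_nested_blocks(sample):
        print(f"<{child}> found inside <{parent}>")
```

An empty result means no paragraph in the document directly contains one of the listed block elements; it does not by itself show that the document meets every requirement of the canonical form.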

Configuration

Input XHTML
Required. The input XHTML file to be converted.
Output DTBook
Required. The output DTBook file to be created.
Title
Optional. The title of the publication. If no value is supplied, the information is extracted from the source file, if possible.
dtb:uid
Optional. The publication's unique identifier. If no value is supplied, the information is extracted from the source file, if possible.
CSS
Optional. The Cascading Style Sheet (CSS) to be referenced from the generated DTBook document.
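
The fallback behaviour described for the Title parameter can be illustrated with a short sketch. This is not the script's actual implementation: the function name resolve_title is hypothetical, and since this document does not say where in the source file the title is read from, the sketch assumes it is the title element inside the XHTML head.

```python
import xml.etree.ElementTree as ET

# Namespace of XHTML elements, in ElementTree's '{uri}' notation.
XHTML = "{http://www.w3.org/1999/xhtml}"


def resolve_title(xhtml_text, supplied_title=None):
    """Return the supplied title, else the source's <title> text, else None."""
    if supplied_title:
        # An explicitly supplied parameter value takes precedence.
        return supplied_title
    root = ET.fromstring(xhtml_text)
    # Assumption: the fallback value lives in html/head/title.
    title = root.find(f"{XHTML}head/{XHTML}title")
    if title is not None and title.text and title.text.strip():
        return title.text.strip()
    # "If possible": nothing usable was found in the source file.
    return None
```

Returning None when neither source yields a value mirrors the "if possible" wording above; how the real script behaves in that case is not stated in this document.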

Output

A DTBook document. The output is validated automatically, so check for any error reports after the script has run.

Appendix: List of Transformers used

The documents linked below are part of the Transformer technical documentation. They are aimed at developers and system administrators.

  1. XHTML to DTBook
  2. Validator