Generates audio for a full-text dtbook file. Makes the input file and generated audio ready for se_tpb_filesetcreator.
This transformer can manipulate its input before it is passed to the TTS system. The text is extracted from the document as XML fragments, and XSLT can be applied at the sync point level. Arbitrary Unicode code points can be replaced by user-defined strings, and regular expressions can be used for search-and-replace.
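The two text substitutions described above can be sketched as follows. This is only an illustration under stated assumptions: the class TextFilterSketch and its nested Rule class are hypothetical names, not part of the transformer, and the order in which the real transformer applies character substitution and regex rules is not stated here.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical illustration; not the transformer's actual filter code. */
final class TextFilterSketch {

    /** One search-replace rule: a compiled regex and its replacement string. */
    static final class Rule {
        final Pattern pattern;
        final String replacement;

        Rule(String regex, String replacement) {
            this.pattern = Pattern.compile(regex);
            this.replacement = replacement;
        }
    }

    private final Map<Integer, String> codePointTable;
    private final List<Rule> rules;

    TextFilterSketch(Map<Integer, String> codePointTable, List<Rule> rules) {
        this.codePointTable = codePointTable;
        this.rules = rules;
    }

    /** Replaces listed code points first, then applies the regex rules in list order. */
    String apply(String input) {
        StringBuilder sb = new StringBuilder();
        input.codePoints().forEach(cp -> {
            String substitute = codePointTable.get(cp);
            if (substitute != null) {
                sb.append(substitute);
            } else {
                sb.appendCodePoint(cp);
            }
        });
        String text = sb.toString();
        for (Rule rule : rules) {
            text = rule.pattern.matcher(text).replaceAll(rule.replacement);
        }
        return text;
    }
}
```

Iterating over code points rather than chars keeps characters outside the Basic Multilingual Plane intact, which matters when arbitrary Unicode code points may appear in the table.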
This transformer uses a two-pass approach: it first reads through the input file, extracting XML fragments to pass to the different TTS systems, and then reads through the file once more to collect the generated audio files.
Regardless of the kind of audio, attributes will be placed on elements representing synch points. Those attributes are smil:clipBegin, smil:clipEnd and smil:src with namespace URI http://www.w3.org/2001/SMIL20/.
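As a rough illustration of what such attributes look like on a synch point element, the sketch below sets them with the standard Java DOM API. The class name SmilAttributeSketch, the audio file name and the clip time values used with it are assumptions made for the example; the exact clip value format written by the transformer is not specified here.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical illustration of the three SMIL attributes on a synch point. */
final class SmilAttributeSketch {

    /** Namespace URI given in the text above. */
    static final String SMIL_NS = "http://www.w3.org/2001/SMIL20/";

    /** Adds smil:src, smil:clipBegin and smil:clipEnd to the given element. */
    static void markSyncPoint(Element element, String src, String clipBegin, String clipEnd) {
        element.setAttributeNS(SMIL_NS, "smil:src", src);
        element.setAttributeNS(SMIL_NS, "smil:clipBegin", clipBegin);
        element.setAttributeNS(SMIL_NS, "smil:clipEnd", clipEnd);
    }

    /** Creates an empty namespace-aware DOM document to experiment with. */
    static Document newDocument() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No usable XML parser", e);
        }
    }
}
```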
This transformer is written to work with a manuscript, that is, a dtbook-2005-1 or dtbook-2005-2 document possibly enriched with elements and attributes from other namespaces. The input document must be "synch point normalized"; see se_tpb_syncPointNormalizer for such a transformation.
Some elements are supposed to be announced audibly. Those elements must have attributes holding the say-before and say-after text strings. se_tpb_annonsator can be used to add those attributes to a dtbook document. Since those attribute names are configurable, make sure they match whatever se_tpb_annonsator uses.
Given the expected input, the transformer outputs a manuscript, that is, a dtbook-2005-1 or dtbook-2005-2 document with additional attributes indicating the corresponding audio. Those attributes, clipBegin, clipEnd and src, with namespace URI http://www.w3.org/2001/SMIL20/, point out which elements should be represented by audio in the generated talking book. The output also includes the generated audio files referenced by the smil attributes.
Synchronization at the sent level should be used, although this is configurable. Other usage has not been tested.
No specific recovery scheme. On error, this transformer will send a fatal message, then throw an exception and abort.
The file pointed to by the tdf variable sgConfig makes it possible to affect the processing of the document. Things such as which elements to synch on, audio merging and so on are configured there. A description of the possibilities follows together with a short example:
An example follows:
<?xml version="1.0" encoding="utf-8"?>
<sgConfig>
<absoluteSynch>
<item>pagenum</item>
<item>noteref</item>
<item>annoref</item>
<item>linenum</item>
</absoluteSynch>
<containsSynch>
<item>sent</item>
</containsSynch>
<announceAttributes>
<item id="before" uri="http://www.daisy.org/ns/pipeline/annon" prefix="annon" local="before"/>
<item id="after" uri="http://www.daisy.org/ns/pipeline/annon" prefix="annon" local="after"/>
</announceAttributes>
<mergeAudio>
<item>h1</item>
<item>h2</item>
<item>h3</item>
<item>h4</item>
<item>h5</item>
<item>h6</item>
<item>level/hd</item>
</mergeAudio>
<silence>
<afterLast>2000</afterLast>
<afterFirst>800</afterFirst>
<beforeAnnouncement>300</beforeAnnouncement>
<afterAnnouncement>300</afterAnnouncement>
<afterRegularPhrase>200</afterRegularPhrase>
</silence>
</sgConfig>
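To make the structure of the example above more concrete, here is a minimal sketch that reads the item lists of one section from such a file using the standard Java XML APIs. The class SgConfigReaderSketch and its method are hypothetical and say nothing about how the transformer itself loads the file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical reader for the item lists of an sgConfig document. */
final class SgConfigReaderSketch {

    /** Returns the text of every item in the named section, e.g. "mergeAudio". */
    static List<String> items(String configXml, String section) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(configXml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("/sgConfig/" + section + "/item", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Unreadable sgConfig", e);
        }
    }
}
```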
se_tpb_speechgenerator uses a simple factory/builder to get hold of TTS implementations. The factory must be configured properly since it is not able to locate TTS systems on its own. The configuration consists of sections that are operating system specific. As subsections, there are language specific sections. Each language must contain no more than one TTS system. At runtime, the TTS Builder configuration file is validated using RELAX NG and Schematron, but since a DTD is a compact way of showing a document's structure, here's one:
<!DOCTYPE ttsbuilder [
<!ELEMENT ttsbuilder (os+)>
<!ELEMENT os (property*, lang*)>
<!ELEMENT property EMPTY>
<!ELEMENT lang (tts)>
<!ELEMENT tts (param+)>
<!ELEMENT param EMPTY>
<!ATTLIST property name CDATA #REQUIRED>
<!ATTLIST property match CDATA #REQUIRED>
<!ATTLIST lang lang CDATA #REQUIRED>
<!ATTLIST tts default (true) #IMPLIED>
<!ATTLIST tts instances CDATA #IMPLIED>
<!ATTLIST param name CDATA #REQUIRED>
<!ATTLIST param value CDATA #REQUIRED>
]>
Besides the rules expressible in a DTD, there are a few others, asserted using Schematron:
Configuration of a TTS mainly consists of parameters for a certain TTS wrapper, such as the Java class name or the command used to run a TTS program. Each TTS system needs its own Java wrapper, and hence their configurations can differ extensively. The wrapper communicates with the TTS system of your choice. The properties read from the TTS Builder Configuration are passed to the TTS Java wrapper constructor together with some utility functions wrapped together in the class se_tpb_speechgen2.tts.TTSUtils. The TTSUtils instance will also look at some configuration parameters in order to provide the desired functionality, e.g. regex filtering, character substitution and so on. TTSUtils will look at the parameters described below. After that, it is up to the wrapper to decide what to do with the remaining parameters. This gives a developer great freedom when it comes to creating a TTS wrapper and its configuration.
TTSUtils will treat parameters as follows:
An example of the configuration follows:
<?xml version="1.0" encoding="UTF-8"?>
<!-- the Java class parameter must be supplied -->
<!-- ${transformer_dir} variable will be evaluated to the directory where se_tpb_speechgenerator resides. -->
<ttsbuilder>
<!--******************************************************************************
Windows
*******************************************************************************-->
<os>
<!-- all properties must match java's System.getProperties()-properties.
Standard regex match for an os to be usable in this program. -->
<property name="os.name" match="[Ww]indows.*" />
<lang lang="en">
<!-- since xml:lang determines which tts to use when in
this program, provide only one tts per language! -->
<!-- this is configuration for one tts impl. the "default" attribute
should be set to true for one configuration for each os. -->
<tts default="true">
<!-- the Java class name -->
<param name="class" value="se_tpb_speechgen2.tts.adapters.LocalStreamTTS"/>
<!-- the binary SAPI-talking program used for tts conversion -->
<param
name="command"
value="${transformer_dir}/tts/SimpleCommandLineTTS/SimpleCommandLineTTS.exe"/>
<!-- an xml file containing simple search-replace regex rules. -->
<param name="regex" value="${transformer_dir}/regex/general.xml"/>
<!-- xslt applied on each synchpoint -->
<param name="xslt" value="${transformer_dir}/xslt/transform.xsl"/>
<!-- an xml file containing simple search-replace regex rules.
Those rules specifically replace years in digits with text. -->
<param name="yearFilename" value="${transformer_dir}/config/year_en.xml"/>
<!-- SAPI specific parameter: The value will be used to embed the text in
SAPI's xml-like way. This value will result in the following tags
surrounding the input text:
<voice optional="Gender=Male"></voice>
Where the starting point is <voice optional=""></voice>.
More on SAPI xml codes:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/SAPI51sr/Whitepapers/WP_XML_TTS_Tutorial.asp
-->
<param name="sapiVoiceSelection" value="Gender=Male"/>
<!-- An ability to filter characters and replace them with custom strings. -->
<param
name="characterSubstitutionTables"
value="${transformer_dir}/character-translation-table.xml"/>
<!-- The encoding of the character translation table. -->
<param name="characterFallbackStates" value="fallbackToLatinTransliteration"/>
</tts>
</lang>
</os>
<!--******************************************************************************
Linux
*******************************************************************************-->
<os>
<property name="os.name" match="[Ll]inux.*" />
<lang lang="en">
<tts default="true">
<param name="class" value="se_tpb_speechgen2.tts.adapters.LocalStreamTTS"/>
<param name="regex" value="${transformer_dir}/regex/general.xml"/>
<param name="ttsProperties" value="${transformer_dir}/conf/loquendo.xml"/>
<param name="xslt" value="${transformer_dir}/xslt/loquendo-en.xsl"/>
<param name="year" value="${transformer_dir}/regex/year_en.xml"/>
<!-- character substitution choices -->
<param name="characterSubstitutionTables" value="${transformer_dir}/charsubst/character-translation-table.xml"/>
</tts>
</lang>
</os>
</ttsbuilder>
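The comments in the example above state that each property of an os section is matched against Java's system properties using a regular expression. A minimal sketch of that check could look as follows; the class OsSectionMatcherSketch is hypothetical, and whether the real implementation requires the regex to match the whole value (as assumed here) or only part of it is not stated.

```java
import java.util.Map;
import java.util.Properties;
import java.util.regex.Pattern;

/** Hypothetical illustration of selecting an os section by system properties. */
final class OsSectionMatcherSketch {

    /**
     * Returns true if every (property name, regex) pair matches the
     * corresponding value in props. Assumption: the regex must match
     * the entire property value.
     */
    static boolean matches(Map<String, String> propertyPatterns, Properties props) {
        for (Map.Entry<String, String> entry : propertyPatterns.entrySet()) {
            String actual = props.getProperty(entry.getKey());
            if (actual == null || !Pattern.matches(entry.getValue(), actual)) {
                return false;
            }
        }
        return true;
    }
}
```

In real use the second argument would be System.getProperties(); passing it in explicitly keeps the check easy to try out with made-up values.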
The transformer comes with two Java TTS wrappers. One is named LocalStreamTTS, and it works in a very simple way: it communicates with an external TTS program through the standard input and output streams. That is, it writes to the external program's standard input first a filename, then a line break, and then, on a single line, the phrase to be synthesized and written to that file. The external program generates the audio, writes it to the file, and then prints "OK" to its standard output stream. If the external program reads an empty line, it is time to exit. If the program does not print "OK", Narrator will stop.
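The external program's side of this line protocol can be sketched as a simple loop. The sketch below is an assumption-laden illustration, not the code of any shipped TTS program: the class StreamTtsServerSketch and the Synthesizer interface are hypothetical, and the actual speech synthesis is left to whatever implementation is plugged in.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UncheckedIOException;

/** Hypothetical sketch of the external program's request loop. */
final class StreamTtsServerSketch {

    /** Hypothetical hook that synthesizes a phrase into the named audio file. */
    interface Synthesizer {
        void speak(String phrase, String outputFile);
    }

    /**
     * Reads filename/phrase line pairs until an empty line or end of input,
     * answering "OK" after each synthesized phrase. Returns the number of
     * requests handled.
     */
    static int serve(BufferedReader in, PrintWriter out, Synthesizer synth) {
        int handled = 0;
        try {
            String filename;
            while ((filename = in.readLine()) != null && !filename.isEmpty()) {
                String phrase = in.readLine();
                if (phrase == null) {
                    break; // input ended in the middle of a request
                }
                synth.speak(phrase, filename);
                out.println("OK");
                out.flush(); // the caller is waiting for this line
                handled++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return handled;
    }
}
```

A real program would wrap System.in and System.out and pass them to serve. Flushing after every "OK" matters because the wrapper on the other end of the pipe waits for that line before sending the next request.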
If you need to use a TTS system that cannot be used this way, it is possible to develop your own TTS Java wrapper. To do so, you develop a Java class that implements the se_tpb_speechgen2.tts.TTSAdapter interface. The class should have a constructor taking two parameters: the first an instance of se_tpb_speechgen2.tts.TTSUtils, and the second a java.util.Map containing parameters from the configuration file. This lets you use the TTS system, and any inter-process communication, of your choice. Once you have set up a proper TTS Builder Configuration, your new TTS wrapper is ready to run.
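The two-argument constructor convention can be illustrated with a small, self-contained sketch. Note that everything in it is a stand-in: UtilsStandIn and AdapterStandIn merely take the place of se_tpb_speechgen2.tts.TTSUtils and se_tpb_speechgen2.tts.TTSAdapter, whose real methods are not listed here, and the reflective factory method is only one plausible way a builder might instantiate a wrapper from the "class" parameter.

```java
import java.util.Map;

/** Hypothetical sketch; the nested types are stand-ins, not the real API. */
final class WrapperLoadingSketch {

    /** Stand-in for se_tpb_speechgen2.tts.TTSUtils. */
    static final class UtilsStandIn { }

    /** Stand-in for se_tpb_speechgen2.tts.TTSAdapter; the real methods differ. */
    interface AdapterStandIn {
        String describe();
    }

    /** Example wrapper with the (utils, parameter map) constructor. */
    static final class EchoAdapter implements AdapterStandIn {
        private final Map<String, String> params;

        public EchoAdapter(UtilsStandIn utils, Map<String, String> params) {
            this.params = params;
        }

        @Override
        public String describe() {
            return "command=" + params.get("command");
        }
    }

    /** Instantiates the class named by the "class" parameter via reflection. */
    static AdapterStandIn create(UtilsStandIn utils, Map<String, String> params) {
        try {
            Class<?> cls = Class.forName(params.get("class"));
            return (AdapterStandIn) cls
                    .getConstructor(UtilsStandIn.class, Map.class)
                    .newInstance(utils, params);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate TTS wrapper", e);
        }
    }
}
```

Passing the whole parameter map to the constructor is what allows each wrapper to define its own configuration keys without changes to the builder.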
May need to access some TTS system, which is not part of the Daisy Pipeline.
Martin Blomberg, TPB.
LGPL