Generates audio for a full-text dtbook file. The typical use is to have a TTS system generate the audio, but this is not a requirement. For example, a silent template audio file could be associated with each synch point to make an "empty" book ready for se_tpb_filesetcreator, without the need for a possibly time-consuming TTS process.
Regardless of the kind of audio, attributes are placed on the elements representing synch points. Those attributes are smil:clipBegin, smil:clipEnd and smil:src, with namespace URI http://www.w3.org/2001/SMIL20/.
This transformer is written to work with a manuscript, that is, a dtbook-2005-1 or dtbook-2005-2 document possibly enriched with elements and attributes from other namespaces. The input document must be "synch point normalized"; see se_tpb_syncPointNormalizer for such a transformation.
Some elements are supposed to be announced audibly. Those elements must have attributes holding the say-before and say-after text strings. se_tpb_annonsator can be used to add those attributes to a dtbook document. Since the attribute names are configurable, make sure they match whatever se_tpb_annonsator uses.
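As a rough illustration of the say-before and say-after attributes, the sketch below reads them from a single element and returns the phrases in a plausible speaking order. It assumes the namespace URI and the local names `before` and `after` that appear in the sgConfig example in this document. The class `AnnouncementOrder`, its method, and the sample `note` element are hypothetical and not part of the transformer; the real transformer may well handle announcements differently.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class AnnouncementOrder {
    // Namespace taken from the sgConfig example; adjust to your own configuration.
    static final String ANNON_NS = "http://www.daisy.org/ns/pipeline/annon";

    // Hypothetical input element carrying both announcement attributes.
    static final String SAMPLE =
        "<note xmlns:annon=\"" + ANNON_NS + "\""
        + " annon:before=\"Note\" annon:after=\"End of note\">Some text.</note>";

    /** Returns the phrases in assumed speaking order: say-before, content, say-after. */
    static List<String> phrases(String xml, String beforeLocal, String afterLocal) {
        Element element;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getAttributeNS to work
            element = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalStateException("Could not parse element", ex);
        }
        List<String> result = new ArrayList<>();
        String before = element.getAttributeNS(ANNON_NS, beforeLocal);
        if (!before.isEmpty()) {
            result.add(before);
        }
        result.add(element.getTextContent());
        String after = element.getAttributeNS(ANNON_NS, afterLocal);
        if (!after.isEmpty()) {
            result.add(after);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(phrases(SAMPLE, "before", "after"));
    }
}
```

If the local names passed in do not match the attributes actually present on the element, both lookups return empty strings and only the element text remains, which is why the configured names have to agree with se_tpb_annonsator.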
Given the expected input, the transformer outputs a manuscript, that is, a dtbook-2005-1/2 document enriched with, among other things, attributes indicating the corresponding audio. Those attributes, smil:clipBegin, smil:clipEnd and smil:src (namespace URI http://www.w3.org/2001/SMIL20/), point out which elements should be represented by audio in the generated talking book. The output also includes the generated audio files referenced by the smil attributes.
Synchronization on sent level should be used, although this is configurable. Other usage has not been tested.
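To make the output attributes more concrete, the following sketch reads smil:src, smil:clipBegin and smil:clipEnd from one synch point using namespace-aware DOM parsing. The sample element, the file name `speechgen0001.wav` and the clip time values are made up for illustration, and the time format the transformer actually writes is not specified here. The class `SmilAttributeReader` is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class SmilAttributeReader {
    static final String SMIL_NS = "http://www.w3.org/2001/SMIL20/";

    // Hypothetical synch point; file name and clip values are assumptions.
    static final String SAMPLE =
        "<sent xmlns:smil=\"" + SMIL_NS + "\""
        + " smil:src=\"speechgen0001.wav\""
        + " smil:clipBegin=\"0:00:00.000\""
        + " smil:clipEnd=\"0:00:02.350\">Hello world.</sent>";

    /** Returns {src, clipBegin, clipEnd} of the root element; missing values are "". */
    static String[] readClip(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // attributes are looked up by namespace URI
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            Element element = doc.getDocumentElement();
            return new String[] {
                element.getAttributeNS(SMIL_NS, "src"),
                element.getAttributeNS(SMIL_NS, "clipBegin"),
                element.getAttributeNS(SMIL_NS, "clipEnd")
            };
        } catch (Exception ex) {
            throw new IllegalStateException("Could not parse manuscript fragment", ex);
        }
    }

    public static void main(String[] args) {
        String[] clip = readClip(SAMPLE);
        System.out.println(clip[0] + ": " + clip[1] + " to " + clip[2]);
    }
}
```

Looking the attributes up by namespace URI rather than by the `smil:` prefix keeps the code working even if a document binds the same namespace to a different prefix.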
No specific recovery scheme. On error, this transformer will send a fatal message, then throw an exception and abort.
The file pointed to by the tdf variable sgConfig makes it possible to affect the processing of the document. Settings such as which elements to synch on, which elements to merge audio for, and so on, are configured there.
An example follows:
<?xml version="1.0" encoding="utf-8"?>
<sgConfig>
  <absoluteSynch>
    <item>pagenum</item>
    <item>noteref</item>
    <item>annoref</item>
    <item>linenum</item>
  </absoluteSynch>
  <containsSynch>
    <item>sent</item>
  </containsSynch>
  <announceAttributes>
    <item id="before" uri="http://www.daisy.org/ns/pipeline/annon" prefix="annon" local="before"/>
    <item id="after" uri="http://www.daisy.org/ns/pipeline/annon" prefix="annon" local="after"/>
  </announceAttributes>
  <mergeAudio>
    <item>h1</item>
    <item>h2</item>
    <item>h3</item>
    <item>h4</item>
    <item>h5</item>
    <item>h6</item>
    <item>level/hd</item>
  </mergeAudio>
  <silence>
    <afterLast>2000</afterLast>
    <afterFirst>800</afterFirst>
    <beforeAnnouncement>300</beforeAnnouncement>
    <afterAnnouncement>300</afterAnnouncement>
    <afterRegularPhrase>200</afterRegularPhrase>
  </silence>
</sgConfig>
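To illustrate the structure of this file, the sketch below reads the item lists and a silence value with the standard DOM API. It is not the transformer's own parser: the class `SgConfigReader` and its methods are hypothetical, the embedded sample is a shortened version of the example above, and treating the silence values as plain integers is an assumption.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SgConfigReader {
    // Shortened copy of the example configuration.
    static final String SAMPLE =
        "<sgConfig>"
        + "<absoluteSynch><item>pagenum</item><item>noteref</item></absoluteSynch>"
        + "<containsSynch><item>sent</item></containsSynch>"
        + "<mergeAudio><item>h1</item><item>level/hd</item></mergeAudio>"
        + "<silence><afterLast>2000</afterLast>"
        + "<afterRegularPhrase>200</afterRegularPhrase></silence>"
        + "</sgConfig>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalStateException("Invalid sgConfig", ex);
        }
    }

    /** Returns the text of every item inside the named section, in document order. */
    static List<String> items(Document doc, String section) {
        List<String> result = new ArrayList<>();
        NodeList sections = doc.getElementsByTagName(section);
        if (sections.getLength() == 0) {
            return result; // section not configured
        }
        NodeList items = ((Element) sections.item(0)).getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            result.add(items.item(i).getTextContent().trim());
        }
        return result;
    }

    /** Returns the integer value of one child of the silence element. */
    static int silence(Document doc, String name) {
        NodeList nodes = doc.getElementsByTagName(name);
        if (nodes.getLength() == 0) {
            throw new IllegalArgumentException("No silence setting named " + name);
        }
        return Integer.parseInt(nodes.item(0).getTextContent().trim());
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println("Merge audio for: " + items(doc, "mergeAudio"));
        System.out.println("Silence after last phrase: " + silence(doc, "afterLast"));
    }
}
```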
se_tpb_speechgenerator uses a simple factory to get hold of TTS implementations. The factory must be configured properly since it is not able to locate TTS systems on its own. The configuration consists of sections that are operating system specific. Within each of those there are language-specific subsections. Each language section must contain no more than one TTS system. At runtime, the TTS Builder configuration file is validated using RELAX NG and Schematron, but since a DTD is a compact way of showing a document's structure, here is one:
<!DOCTYPE ttsbuilder [
<!ELEMENT ttsbuilder (os+)>
<!ELEMENT os (property*, lang*)>
<!ELEMENT property EMPTY>
<!ELEMENT lang (tts)>
<!ELEMENT tts (param+)>
<!ELEMENT param EMPTY>
<!ATTLIST property name CDATA #REQUIRED>
<!ATTLIST property match CDATA #REQUIRED>
<!ATTLIST lang lang CDATA #REQUIRED>
<!ATTLIST tts default (true) #IMPLIED>
<!ATTLIST param name CDATA #REQUIRED>
<!ATTLIST param value CDATA #REQUIRED>
]>
Besides the rules expressible in a DTD, there are a few others, asserted using schematron:
Configuration of a TTS mainly consists of parameters for a certain TTS wrapper, such as the Java class name or the path to a binary TTS program. Each TTS system needs its own Java wrapper, and hence their configurations can differ extensively. The wrapper communicates with the TTS system of your choice. The properties read from the TTS Builder Configuration are passed to the TTS Java wrapper constructor (if there is one taking a java.util.Map as parameter), and from there it is up to the wrapper to decide what to do. This gives a developer great freedom when creating a TTS wrapper and its configuration. If the Java wrapper extends se_tpb_speechgenerator.ExternalTTS, some functionality is already available. When void setParamMap(java.util.Map) is called, the super class attempts to read the following parameters:
This will make calls to the following super class methods do something useful:
An example of the configuration follows:
<?xml version="1.0" encoding="UTF-8"?>
<!-- the Java class parameter must be supplied -->
<!-- ${transformer_dir} variable will be evaluated to the directory where se_tpb_speechgenerator resides. -->
<ttsbuilder>
<!--******************************************************************************
Windows
*******************************************************************************-->
<os>
<!-- each property name must be one of Java's System.getProperties() keys.
The match attribute is a standard regex that the property value must match for this os section to be used. -->
<property name="os.name" match="[Ww]indows.*" />
<lang lang="en">
<!-- since xml:lang determines which tts this program uses,
provide only one tts per language! -->
<!-- this is the configuration for one tts implementation. the "default" attribute
should be set to true for one configuration for each os. -->
<tts default="true">
<!-- the Java class name -->
<param name="class" value="se_tpb_speechgenerator.SAPIImpl"/>
<!-- the binary SAPI-talking program used for tts conversion -->
<param
name="binary"
value="${transformer_dir}/tts/SimpleCommandLineTTS/SimpleCommandLineTTS.exe"/>
<!-- an xml file containing simple search-replace regex rules. -->
<param name="generalRegexFilename" value="${transformer_dir}/regex/richard.xml"/>
<!-- book specific regexes, will be applied before "generalRegexFilename". -->
<param
name="specificRegexFilename"
value="${transformer_dir}/regex/someBookSpecific-re.xml"/>
<!-- xslt applied on each synchpoint -->
<param name="xsltFilename" value="${transformer_dir}/xslt/transform.xsl"/>
<!-- an xml file containing simple search-replace regex rules.
Those rules specifically replace years in digits with text. -->
<param name="yearFilename" value="${transformer_dir}/config/year_en.xml"/>
<!-- SAPI specific parameter: The value will be used to embed the text in
SAPI's xml-like way. This value will result in the following tags
surrounding the input text:
<voice optional="Gender=Male"></voice>
Where the starting point is <voice optional=""></voice>.
More on SAPI xml codes:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/SAPI51sr/Whitepapers/WP_XML_TTS_Tutorial.asp
-->
<param name="sapiVoiceSelection" value="Gender=Male"/>
<!-- An ability to filter characters and replace them with custom strings. -->
<param
name="characterSubstitutionTables"
value="${transformer_dir}/character-translation-table.xml"/>
<!-- The fallback used for characters not handled by the substitution tables. -->
<param name="characterFallbackStates" value="fallbackToLatinTransliteration"/>
</tts>
</lang>
<lang lang="sv">
<tts>
<param name="class" value="se_tpb_speechgenerator.SAPIImpl"/>
<param name="binary" value="${transformer_dir}/tts/SimpleCommandLineTTS/SimpleCommandLineTTS.exe"/>
<param name="generalRegexFilename" value="${transformer_dir}/regex/richard.xml"/>
<param name="xsltFilename" value="${transformer_dir}/xslt/transform.xsl"/>
<param name="yearFilename" value="${transformer_dir}/config/year_se.xml"/>
<param name="sapiVoiceSelection" value="Language=41D"/>
<param name="characterSubstitutionTables" value="${transformer_dir}/character-translation-table.xml"/>
<param name="characterFallbackStates" value="fallbackToLatinTransliteration"/>
</tts>
</lang>
</os>
<!--******************************************************************************
Linux
*******************************************************************************-->
<os>
<property name="os.name" match="[Ll]inux.*" />
<lang lang="en">
<tts id="loquendo" default="true">
<param name="class" value="se_tpb_speechgenerator.LoquendoImpl"/>
<param name="binary" value="${transformer_dir}/../../../narratorLoquendo"/>
<param name="generalRegexFilename" value="${transformer_dir}/regex/richard.xml"/>
<param name="ttsProperties" value="${transformer_dir}/config/loquendo.xml"/>
<param name="xsltFilename" value="${transformer_dir}/xslt/loquendo-en.xsl"/>
<param name="yearFilename" value="${transformer_dir}/config/year_en.xml"/>
</tts>
</lang>
</os>
</ttsbuilder>
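The operating system matching described in the configuration comments can be sketched as follows. This is only an illustration, not the factory's actual code: the class `OsSectionMatcher` is hypothetical, and it assumes that every `property` of an `os` section must match the whole system property value (as `String.matches` does).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

class OsSectionMatcher {
    /**
     * Returns true if every configured regex matches the corresponding
     * system property. A property that is not set never matches.
     */
    static boolean matches(Map<String, String> propertyRegexes, Properties system) {
        for (Map.Entry<String, String> entry : propertyRegexes.entrySet()) {
            String actual = system.getProperty(entry.getKey());
            if (actual == null || !actual.matches(entry.getValue())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The same name/match pairs as in the example configuration.
        Map<String, String> windows = new LinkedHashMap<>();
        windows.put("os.name", "[Ww]indows.*");
        Map<String, String> linux = new LinkedHashMap<>();
        linux.put("os.name", "[Ll]inux.*");

        Properties system = System.getProperties();
        System.out.println("Windows section usable: " + matches(windows, system));
        System.out.println("Linux section usable: " + matches(linux, system));
    }
}
```

On an operating system that matches neither pattern, both lines print false, which corresponds to a configuration with no usable `os` section.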
If you need to use a TTS system other than SAPI, you must develop your own TTS Java wrapper. One way of doing that is to develop a class from scratch that implements se_tpb_speechgenerator.TTS. The easiest way, however, is to extend se_tpb_speechgenerator.ExternalTTS. That class is abstract, leaving three methods to implement:
The parameters configured in the TTS Builder Configuration will be passed to a constructor accepting a java.util.Map as a single parameter, otherwise they will be passed to the wrapper by a call to se_tpb_speechgenerator.ExternalTTS.setParamMap(java.util.Map). See the javadoc for more details. This lets you use the TTS system - and possible inter-process communication - of your choice. Once you have set up a proper TTS Builder Configuration your new TTS wrapper is ready to run.
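The two ways of handing parameters to a wrapper can be sketched with plain reflection. This is a self-contained illustration and not the pipeline's factory: the `Configurable` interface and the two wrapper classes are hypothetical stand-ins (a real wrapper would implement se_tpb_speechgenerator.TTS or extend se_tpb_speechgenerator.ExternalTTS), and the parameter value used in `main` is made up.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

class WrapperInstantiation {
    /** Hypothetical stand-in for the setParamMap part of a TTS wrapper. */
    interface Configurable {
        void setParamMap(Map<String, String> params);
    }

    /** Hypothetical wrapper with a constructor that accepts a Map. */
    static class MapConstructorTts {
        final Map<String, String> params;
        public MapConstructorTts(Map<String, String> params) {
            this.params = params;
        }
    }

    /** Hypothetical wrapper with only a no-argument constructor. */
    static class SetterTts implements Configurable {
        Map<String, String> params;
        public SetterTts() { }
        @Override
        public void setParamMap(Map<String, String> params) {
            this.params = params;
        }
    }

    /** Prefers a constructor taking a Map; otherwise falls back to setParamMap. */
    static Object instantiate(String className, Map<String, String> params) {
        try {
            Class<?> type = Class.forName(className);
            try {
                Constructor<?> withMap = type.getDeclaredConstructor(Map.class);
                return withMap.newInstance(params);
            } catch (NoSuchMethodException noMapConstructor) {
                Object wrapper = type.getDeclaredConstructor().newInstance();
                if (wrapper instanceof Configurable) {
                    ((Configurable) wrapper).setParamMap(params);
                }
                return wrapper;
            }
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException("Cannot create TTS wrapper " + className, ex);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("binary", "/hypothetical/path/to/tts"); // made-up value
        Object wrapper = instantiate(SetterTts.class.getName(), params);
        System.out.println("Created " + wrapper.getClass().getSimpleName());
    }
}
```

Either way the wrapper ends up holding the same parameter map, so the choice between the two styles mostly depends on whether the wrapper needs its parameters during construction.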
At TPB we have been using a simple Java wrapper for the Linux version of the Loquendo TTS. Work has been done to make the TTS work better for us. Some pre-processing rules have been developed using regexes, and those may come in handy for anyone using the SAPI version of the Loquendo TTS together with Narrator. Read more about what has been done: loquendo-preproc.html.
May need access to a TTS system, which is not part of the Daisy Pipeline.
Martin Blomberg, TPB.
LGPL