This script lets you manipulate the character repertoire of the XML documents in a fileset. In practice, this means replacing a given character with one or more other characters.
Character repertoire manipulation is done, for example, when preparing an XML file for a specific output medium. One example is speech synthesizers, which typically do not recognize and correctly pronounce every character in the Unicode repertoire. Another is preparing an XML document for Braille.
The manipulation process is multilayered. You can use tables that explicitly define replacement strings for a set of characters. You can also use generic Unicode-based transliteration routines. For details, see Configuration.
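To give a feel for what a generic Unicode-based routine can look like, here is a minimal sketch in Java. It is not taken from the transformer: the class name GenericFallbackSketch and the method stripMarks are hypothetical, and the technique shown (compatibility decomposition with java.text.Normalizer, then dropping combining marks) is only one possible approach, chosen because it needs nothing beyond the standard library.

```java
import java.text.Normalizer;

// Hypothetical illustration; the transformer's own routines may work differently.
class GenericFallbackSketch {

    // Decomposes each character into base character plus combining marks (NFKD),
    // then removes the combining marks, so an accented letter becomes its base letter.
    static String stripMarks(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFKD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // \u00E9 is LATIN SMALL LETTER E WITH ACUTE
        System.out.println(stripMarks("caf\u00E9"));
    }
}
```

A routine like this covers many characters at once without listing them, which is why it complements an explicit table: the table handles characters that need a specific spoken or Braille-friendly replacement, and the generic routine handles the rest.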
The character translation table, which maps characters to their replacement strings, must comply with the XML format used by java.util.Properties. See http://java.sun.com/dtd/properties.dtd and java.util.Properties for details.
The key attribute of each entry element must be a hex value representing a Unicode code point, and the content of the entry element is a replacement string of arbitrary length.
Example of a replacement text table (this also exists as a real file, example-table.xml, in the transformer directory):
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<comment>
This is an example of an input translation table for int_daisy_unicodeTranscoder.
The key attribute contains the hex codepoint to be translated,
and the entry text node the replacement string.
The entries match two Hebrew characters, the yen sign and the registered sign.
The table can be built using: www.unicode.org/Public/UNIDATA/UnicodeData.txt
</comment>
<entry key="05E2">hebrew ayin</entry>
<entry key="05DD">hebrew final mem</entry>
<entry key="00A5">currency yen</entry>
<entry key="00AE">registered sign</entry>
</properties>
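Because the table uses the java.util.Properties XML format, it can be read with Properties.loadFromXML. The following is a minimal sketch of how such a table might be loaded and applied; it is not the transformer's actual code. The class name TableReplaceSketch and the methods loadTable and substitute are hypothetical, and the sketch assumes that characters without a table entry are copied through unchanged.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical illustration of reading and applying a replacement table.
class TableReplaceSketch {

    // Reads a table in the java.util.Properties XML format and converts
    // each hex key ("00A5") into the code point it stands for (0x00A5).
    static Map<Integer, String> loadTable(InputStream in) throws IOException {
        Properties props = new Properties();
        props.loadFromXML(in);
        Map<Integer, String> table = new HashMap<>();
        for (String key : props.stringPropertyNames()) {
            table.put(Integer.parseInt(key.trim(), 16), props.getProperty(key));
        }
        return table;
    }

    // Walks the text code point by code point. A code point with a table entry
    // is replaced by its string; assumption: all others are copied unchanged.
    static String substitute(String text, Map<Integer, String> table) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            String replacement = table.get(cp);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // A two-entry table in the same format as the example above.
        String xml =
              "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
            + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
            + "<properties>\n"
            + "<entry key=\"00A5\">currency yen</entry>\n"
            + "<entry key=\"00AE\">registered sign</entry>\n"
            + "</properties>\n";
        Map<Integer, String> table =
            loadTable(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(substitute("\u00A5100 \u00AE", table));
    }
}
```

Parsing the keys into integers up front means the lookup does not depend on whether a key was written in upper or lower case, or with leading zeros.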
The documents linked below are parts of the Transformer technical documentation. They are aimed at developers and systems administrators.