I18n Transformer

Developing and maintaining multi-language sites is a common problem for web developers. Using XML and XSL makes this task much easier, especially with Cocoon's separation of content, logic and presentation.

This approach to internationalization (hereafter i18n) of XML documents within Cocoon is based on a transformer, I18nTransformer, which uses XML dictionaries for all the i18n data. The i18n namespace is declared as follows: xmlns:i18n="http://apache.org/cocoon/i18n/2.0"

The first implementation was developed by Lassi Immonen. In this implementation the syntax was changed according to the Infozone Group's i18n proposal (with small changes) and some new features were implemented.

Enhancements for number, date and time have been contributed by Michael Enke.

  • Name : i18n
  • Class: org.apache.cocoon.transformation.I18nTransformer
  • Cacheable: no.

Features supported

The following features are supported by the i18n transformer:

  • Text translation
  • Attribute translation
  • Param substitution
  • Substitution param translation
  • Date internationalization (New!)
  • Number internationalization (New!)
  • Locale support (New!)
  • A dictionary update and language addition automation stylesheet (New!)

A simple example of i18n:

<para title="first" name="article"  i18n:attr="title name">
  <i18n:text>This text will be translated.</i18n:text>
</para>

The text inside the <i18n:text> element is used as a key to find the translation in the dictionary. All attributes listed in the 'i18n:attr' attribute are also translated, with their values used as dictionary keys.

Note: This i18n approach was redesigned to support i18n of dates, currencies, etc. Although the supported features allow for complicated formatting, you will need to use XSP to achieve more flexibility in some cases.

Markup content for translation
Simple text translation

To translate some simple text we use the <i18n:text> tag:

<i18n:text>Text to be translated</i18n:text>

The text between the <i18n:text>-tags is used as a key to find the translation in the dictionary.

The 'i18n:key' attribute can be used to specify an explicit key for the dictionary. Normally, the text itself is used as the key to find the translation in the dictionary. If we specify the 'i18n:key' attribute, this key is used to find the translation, and the text itself is used as the default value if no translation can be found.

<i18n:text i18n:key="key_text">Default value</i18n:text>
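
The lookup rule described above can be sketched in Java. This is only an illustration and not the transformer's actual code: the class KeyLookupSketch, the method translate and the in-memory map standing in for a dictionary are all assumptions made for the example. Returning the text unchanged when no 'i18n:key' is given and no translation exists is likewise an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; these names are hypothetical and not part of Cocoon.
class KeyLookupSketch {

    /**
     * Resolves the content of an <i18n:text> element.
     *
     * @param dictionary key -> translation for the current language
     * @param key        value of the i18n:key attribute, or null if absent
     * @param text       text content of the element
     */
    static String translate(Map<String, String> dictionary, String key, String text) {
        // Without i18n:key the text itself is the dictionary key.
        String lookupKey = (key != null) ? key : text;
        String translation = dictionary.get(lookupKey);
        // The element text doubles as the default value when nothing is found.
        return (translation != null) ? translation : text;
    }

    public static void main(String[] args) {
        Map<String, String> german = new HashMap<>();
        german.put("key_text", "Standardwert");

        System.out.println(translate(german, "key_text", "Default value"));    // Standardwert
        System.out.println(translate(german, "missing_key", "Default value")); // Default value
    }
}
```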

Note: Maybe it would be better to allow i18n:key on any element and not only on i18n:text? E.g.: <ul> <li i18n:key="Item1" /> <li i18n:key="Item2" /> ... </ul>

Translation with param substitution

To translate text with param substitution, the <i18n:translate> tag must be used. Inside it we can specify <i18n:param> tags that contain the parameters. The values of these parameters will be inserted into the translated text, replacing placeholders. Placeholders have the following syntax: \{[0-9]+\}. An example:

    
<i18n:translate>
	<i18n:text>Some {0} was inserted {1}.</i18n:text>
	<i18n:param>text</i18n:param>
	<i18n:param>here</i18n:param>
</i18n:translate>

Now we want to translate this into German. First, the processor looks up the following string in the dictionary we specified:

Some {0} was inserted {1}.

It finds the string and translates it to German:

Etwas {0} wurde {1} eingesetzt.

Now the processor will replace the parameters. {0} will be replaced with "text" and {1} with "here". This results in:

Etwas text wurde here eingesetzt.

As we see, it is sometimes necessary to translate the parameters as well, since "here" is not a German word and "text" should be capitalized in German. This can be done simply by marking up the parameters with <i18n:text> again:

<i18n:translate>
	<i18n:text>Some {0} was inserted {1}.</i18n:text>
	<i18n:param><i18n:text>text</i18n:text></i18n:param>
	<i18n:param><i18n:text>here</i18n:text></i18n:param>
</i18n:translate>
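
The substitution step can be sketched in Java as follows. This is a minimal sketch, assuming the translated string has already been fetched from the dictionary; the class ParamSubstitutionSketch and its method are hypothetical, and leaving a placeholder untouched when no matching param exists is an assumption, not documented transformer behaviour.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; not the transformer's implementation.
class ParamSubstitutionSketch {

    // Placeholder syntax from the text: \{[0-9]+\}
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{([0-9]+)\\}");

    /** Replaces {0}, {1}, ... in the translated text with the given params. */
    static String substitute(String translated, String... params) {
        Matcher matcher = PLACEHOLDER.matcher(translated);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            int index = Integer.parseInt(matcher.group(1));
            // Assumption: a placeholder without a matching param is kept as-is.
            String value = (index < params.length) ? params[index] : matcher.group();
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String german = "Etwas {0} wurde {1} eingesetzt.";
        // Untranslated params:
        System.out.println(substitute(german, "text", "here")); // Etwas text wurde here eingesetzt.
        // Params that were themselves translated first:
        System.out.println(substitute(german, "Text", "hier")); // Etwas Text wurde hier eingesetzt.
    }
}
```

Because the index is read from the placeholder, the translation is free to reorder the parameters, as the German sentence does.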

Note: Generally, the text used for param substitution does not need to be translated. E.g., it can come from a database with predefined placeholders for i18n params, in which case there is no need to use <i18n:text> to translate it.

Attributes

Additionally, we can translate attributes. This is very useful for HTML forms, since button labels are set via an attribute in HTML. To translate attributes of a tag, add an additional attribute named 'i18n:attr' containing a space-separated list of the attributes to be translated. An example:

<INPUT type="submit" value="Submit" i18n:attr="value"/>

Here the 'value' attribute will be translated. Parameter replacement is not available for attributes at this time.
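
As a rough illustration of how the 'i18n:attr' list might be processed, the following Java sketch splits the list on whitespace and looks up each named attribute's value in a dictionary. The class AttributeTranslationSketch, the map-based representation of an element's attributes and the choice to leave untranslated values unchanged are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; names and data structures are hypothetical.
class AttributeTranslationSketch {

    /**
     * Returns a copy of the attributes in which every attribute named in
     * attrList (the value of i18n:attr) has been translated.
     */
    static Map<String, String> translateAttributes(Map<String, String> attributes,
                                                   String attrList,
                                                   Map<String, String> dictionary) {
        Map<String, String> result = new LinkedHashMap<>(attributes);
        for (String name : attrList.trim().split("\\s+")) {
            String value = attributes.get(name);
            if (value == null) {
                continue; // listed attribute is not present on the element
            }
            // The attribute value is the dictionary key.
            String translation = dictionary.get(value);
            if (translation != null) {
                result.put(name, translation);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("type", "submit");
        attributes.put("value", "Submit");

        Map<String, String> german = new HashMap<>();
        german.put("Submit", "Absenden");

        System.out.println(translateAttributes(attributes, "value", german));
        // {type=submit, value=Absenden}
    }
}
```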

Date, time and number formatting

To format dates according to the current locale use <i18n:date src-pattern="dd/MM/yyyy" pattern="dd:MMM:yyyy" value="01/01/2001" />. The 'src-pattern' attribute is used to parse the 'value'; the date is then formatted according to the current locale using the format specified by the 'pattern' attribute.

To format time for a locale (e.g. de_DE) use <i18n:time src-pattern="dd/MM/yyyy" locale="de_DE" value="01/01/2001" />. The 'src-pattern' and 'pattern' attributes may also contain one of the style names 'short', 'medium', 'long' or 'full', in which case the value is parsed or formatted according to that style.

To format date and time use <i18n:date-time />.

It is also possible to specify a src-locale: <i18n:date src-pattern="short" src-locale="en_US" locale="de_DE"> 12/24/01 </i18n:date> will result in 24.12.2001

An explicit pattern or src-pattern (that is, anything other than short, medium, long or full) overrides the locale and src-locale.

If no pattern was specified then the date will be formatted with the DateFormat.DEFAULT format (both date and time). If no value for the date is specified then the current date will be used. E.g.: <i18n:date/> will result in the current date, formatted with default localized pattern.
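
The parse-then-format behaviour of 'src-pattern' and 'pattern' can be approximated with java.text.SimpleDateFormat. The sketch below is an assumption about how such a conversion could be written, not the transformer's source; the class DateFormatSketch and its method are hypothetical, and the style names (short, medium, long, full) and 'src-locale' are deliberately not handled.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Illustrative sketch only; explicit patterns only, no style names.
class DateFormatSketch {

    /**
     * Parses value with srcPattern, then formats it with pattern
     * using the given locale for names of months, days, etc.
     */
    static String reformat(String value, String srcPattern, String pattern, Locale locale) {
        SimpleDateFormat parser = new SimpleDateFormat(srcPattern, locale);
        parser.setLenient(false); // reject values such as 32/01/2001
        try {
            Date date = parser.parse(value);
            return new SimpleDateFormat(pattern, locale).format(date);
        } catch (ParseException e) {
            throw new IllegalArgumentException(
                    "Value '" + value + "' does not match src-pattern '" + srcPattern + "'", e);
        }
    }

    public static void main(String[] args) {
        // Corresponds to src-pattern="dd/MM/yyyy" pattern="dd:MMM:yyyy" value="01/01/2001"
        System.out.println(reformat("01/01/2001", "dd/MM/yyyy", "dd:MMM:yyyy", Locale.US));
        // 01:Jan:2001
    }
}
```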

To format numbers in a locale-sensitive manner use <i18n:number pattern="0.##" value="2.0" />. This will be useful for Arabic, Indian, etc. number formatting. Additionally, currency and percent formatting can be used. E.g.:

  • <i18n:number sub-type="currency" value="1703.74" /> will result in localized presentation of the value - $1,703.74 for US locale.
  • <i18n:number sub-type="int-currency" value="170374" /> will result in localized presentation of the value - $1,703.74 for US locale, 170374 for a currency without subunit.
  • <i18n:number sub-type="percent" value="1.2" /> will result in localized percent value - 120% for US locale.
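
The number sub-types listed above correspond closely to what java.text.NumberFormat offers. The following sketch shows one possible mapping, under the assumption that 'int-currency' means the value is given in the currency's smallest unit; the class NumberFormatSketch and its method names are hypothetical and not taken from the transformer.

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.util.Locale;

// Illustrative sketch only; method names mirror the sub-types for readability.
class NumberFormatSketch {

    /** Explicit pattern, e.g. pattern="0.##". */
    static String pattern(String pattern, double value, Locale locale) {
        return new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(locale)).format(value);
    }

    /** sub-type="currency" */
    static String currency(double value, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(value);
    }

    /** sub-type="int-currency": value is assumed to be in the smallest currency unit. */
    static String intCurrency(long minorUnits, Locale locale) {
        NumberFormat format = NumberFormat.getCurrencyInstance(locale);
        // Number of subunit digits: 2 for USD, 0 for a currency without subunit.
        int digits = Math.max(format.getCurrency().getDefaultFractionDigits(), 0);
        return format.format(BigDecimal.valueOf(minorUnits, digits));
    }

    /** sub-type="percent" */
    static String percent(double value, Locale locale) {
        return NumberFormat.getPercentInstance(locale).format(value);
    }

    public static void main(String[] args) {
        System.out.println(pattern("0.##", 2.0, Locale.US));  // 2
        System.out.println(currency(1703.74, Locale.US));     // $1,703.74
        System.out.println(intCurrency(170374, Locale.US));   // $1,703.74
        System.out.println(percent(1.2, Locale.US));          // 120%
    }
}
```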

Date and number formatting can also be used with substitution params. An additional 'type' attribute must be set on the param to indicate its type (date or number). The default type is string.

<i18n:translate>
  <i18n:text>
    You have to pay {0} for {1} pounds or {2} of your profit. Valid from {3}
  </i18n:text>
  <i18n:param type="number" sub-type="currency"
              pattern="$#,##0.00">102.5</i18n:param>
  <i18n:param type="number" value="2.5" />
  <i18n:param type="number" sub-type="percent" value="0.10" />
  <i18n:param type="date" pattern="dd-MMM-yy" />
</i18n:translate>

The result will look like this: You have to pay $102.50 for 2.5 pounds or 10% of your profit. Valid from 13-Jun-01

Dictionaries

Dictionaries contain the translations for the text to be translated. They consist of a list of entries, where each entry specifies the translation(s) for a key. An entry may contain the translation for various languages. An example:

<translations>
  <entry>
    <key>Some {0} was inserted {1}.</key>
    <translation lang="en">Some {0} was {1} inserted.</translation>		
    <translation lang="de">Etwas {0} wurde {1} eingesetzt.</translation>
  </entry>
</translations>

For each text we want to translate, we must provide a key. The key is either the text as we have written it in the document or the value of the 'i18n:key' attribute. The key must be written exactly as in the document, including spaces, linefeeds, etc.

Then we must enter a translation for the text with the <translation>-tag, where the 'lang'-attribute specifies the language of the translated text. If the text contains placeholders, they'll be replaced at the correct places in the translation with the given parameters.
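
To make the dictionary structure concrete, the following Java sketch reads a dictionary of the form shown above with the standard DOM API and builds a key-to-translation map for one language. It is an assumption-laden illustration: the class DictionarySketch and the method load are hypothetical, and the real transformer may read and store dictionaries differently.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Illustrative sketch only; not how Cocoon loads its dictionaries.
class DictionarySketch {

    /** Returns key -> translation for the requested language. */
    static Map<String, String> load(String dictionaryXml, String lang) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(dictionaryXml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> result = new HashMap<>();
            NodeList entries = doc.getElementsByTagName("entry");
            for (int i = 0; i < entries.getLength(); i++) {
                Element entry = (Element) entries.item(i);
                NodeList keys = entry.getElementsByTagName("key");
                if (keys.getLength() == 0) {
                    continue; // entry without a key cannot be looked up
                }
                // The key is kept verbatim: spaces and linefeeds are significant.
                String key = keys.item(0).getTextContent();
                NodeList translations = entry.getElementsByTagName("translation");
                for (int j = 0; j < translations.getLength(); j++) {
                    Element translation = (Element) translations.item(j);
                    if (lang.equals(translation.getAttribute("lang"))) {
                        result.put(key, translation.getTextContent());
                    }
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse dictionary", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<translations><entry>"
                + "<key>Some {0} was inserted {1}.</key>"
                + "<translation lang=\"en\">Some {0} was {1} inserted.</translation>"
                + "<translation lang=\"de\">Etwas {0} wurde {1} eingesetzt.</translation>"
                + "</entry></translations>";
        System.out.println(load(xml, "de").get("Some {0} was inserted {1}."));
        // Etwas {0} wurde {1} eingesetzt.
    }
}
```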

How to migrate from the old I18nTransformer

The dictionary structure remains the same, so old dictionaries can still be used. The previous <i:tr> tags have been renamed to <i18n:text>. (The namespace prefix is not important; you can choose any you like.)

The old transformer supported translation of any tag using its text value as the key:

<elem i18n:tr="y">This text will be translated.</elem>
				

You have to change that for the new transformer like this:

<elem><i18n:text>This text will be translated.</i18n:text></elem>
				

The old transformer could choose image paths depending on the language. Now you can achieve the same result by translating the 'src' attribute of the img element.

Note: I am not sure that image path translation in the old manner is possible without XSP, because the language code was used and not a dictionary. I'll add a feature for this kind of translation in the near future.

Sample
Sitemap configuration

To use I18nTransformer, add it to the sitemap:

<map:transformers default="xslt">
  <map:transformer name="i18n"
                   src="org.apache.cocoon.transformation.I18nTransformer"/>
</map:transformers>

Then, a match must be declared, something like this:

<map:match pattern="file">
  <map:generate src="{1}"/>
  <map:transform type="i18n">
    <parameter name="available_lang_1" value="en"/>
    <parameter name="available_lang_2" value="ru"/>
    <parameter name="src" value="translations/dictionary.xml"/>
  </map:transform>
  <map:transform src="stylesheet.xsl"/>
  <map:serialize />
</map:match>

Simple i18n file

To use i18n pages you need to declare the i18n namespace in your source files and wrap all text to be translated in <i18n:text> tags. To translate attributes of an element, add an additional attribute named 'i18n:attr' containing a space-separated list of the attributes to be translated.

<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:i18n="http://apache.org/cocoon/i18n/2.0">
  <elem title="main_title" i18n:attr="title">
    <i18n:text>Text to be translated</i18n:text>
  </elem>
</root>
				

A more interesting usage example can be found in the samples/i18n directory.

Note: Attribute translation requires a Xerces version newer than 1.3.0, in which the removeAttribute() bug is fixed.

Usage Pattern for Dictionary Generator Stylesheet

The usage is described through a real-world example: correcting an existing dictionary's Spanish translation, or adding one to it.

Key generation

Generate a dictionary with keys and placeholders for the Spanish translations. Optionally, the existing translations for one of the languages can be kept. To do this, set the stylesheet params (manually in the stylesheet or on the command line):

mode = keys (indicates that only keys must be in the result)
new-lang = es (language to be added)
keep-lang = en (language to be kept in the result, for convenience)

Command line for Xalan (Xerces and Xalan must be in your classpath):

java org.apache.xalan.xslt.Process -IN simple_dict.xml -XSL merge.xsl \
-OUT simple_dict_es.xml -PARAM mode keys -PARAM new-lang es -PARAM keep-lang en

(Windows users: Do not enter '\' symbol, continue typing on the same line.)

This will create a file simple_dict_es.xml with entries, keys and placeholders.

Translation

Replace the placeholders with translations, working from the keys or from the original translations if they were kept during generation.

Add to the original dictionary

(Note: This step will become unnecessary once multiple dictionary support is implemented, hopefully soon.) Use the same stylesheet for this purpose with these params:

mode = merge
new-lang = es
new-dict = simple_dict_es.xml

Command line for Xalan:

java org.apache.xalan.xslt.Process -IN simple_dict.xml -XSL merge.xsl \
-OUT simple_dict_new.xml -PARAM mode merge -PARAM new-lang es \
-PARAM new-dict simple_dict_es.xml

(Windows users: Do not enter '\' symbol, continue typing on the same line.)
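
The same stylesheet params can also be passed programmatically through the standard JAXP API instead of the Xalan command line. The sketch below is only an illustration of that mechanism: the class XsltParamSketch is hypothetical, and because merge.xsl itself is not reproduced here, the demo uses a tiny stand-in stylesheet that merely echoes its params. For real files, StreamSource and StreamResult can be constructed from java.io.File objects instead of strings.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Illustrative sketch only; the stand-in stylesheet is NOT merge.xsl.
class XsltParamSketch {

    /** Applies a stylesheet to an input document, passing params like -PARAM does. */
    static String transform(String stylesheet, String input, Map<String, String> params) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            for (Map.Entry<String, String> param : params.entrySet()) {
                transformer.setParameter(param.getKey(), param.getValue());
            }
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(input)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in stylesheet that only prints the params it receives.
        String echoStylesheet =
                "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                + "<xsl:output method=\"text\"/>"
                + "<xsl:param name=\"mode\"/>"
                + "<xsl:param name=\"new-lang\"/>"
                + "<xsl:template match=\"/\">"
                + "mode=<xsl:value-of select=\"$mode\"/>, new-lang=<xsl:value-of select=\"$new-lang\"/>"
                + "</xsl:template>"
                + "</xsl:stylesheet>";

        Map<String, String> params = new LinkedHashMap<>();
        params.put("mode", "keys");
        params.put("new-lang", "es");

        System.out.println(transform(echoStylesheet, "<translations/>", params));
    }
}
```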

Finally
To be done
  • Multiple dictionary support
  • Dictionary import and include capabilities (like in XSLT)
  • Command line dictionary-from-source generation
  • Dictionary caching

Contacts

Feel free to send comments and improvement ideas either directly to Konstantin Piroumian or through the Cocoon mailing list.

Copyright © 1999-2002 The Apache Software Foundation. All Rights Reserved.