Locales and I18N Text

A locale is an object that represents a language and region (and, optionally, a variant).  A variant is user-defined and may represent a dialect (but may be used in other ways, e.g., to distinguish between Windows and Mac platforms).  Examples:  en, en_US, and fr_CA.  The syntax is “language_country_variant”, where only the language is required.  (A leading “basename”, as in Msgs_en_US, is part of a resource bundle name, not of the locale itself.)  Note that some browsers use hyphens instead of underscores.
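A minimal sketch of the two spellings, assuming Java 7 or later (for Locale.forLanguageTag and toLanguageTag).  The class and method names here are made up for illustration:

```java
import java.util.Locale;

public class LocaleSyntaxDemo {
    /** Converts a hyphenated tag such as "fr-CA" to the underscore form. */
    static String underscoreForm(String tag) {
        return Locale.forLanguageTag(tag).toString();
    }

    /** Returns the hyphenated (browser-style) tag for a locale. */
    static String hyphenForm(Locale loc) {
        return loc.toLanguageTag();
    }

    public static void main(String[] args) {
        Locale loc = Locale.forLanguageTag("fr-CA");
        System.out.println(loc.getLanguage());           // fr
        System.out.println(loc.getCountry());            // CA
        System.out.println(underscoreForm("fr-CA"));     // fr_CA
        System.out.println(hyphenForm(loc));             // fr-CA
    }
}
```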

The input and display of information is locale-sensitive: dates, times, numbers, currencies, percentages, and language.

One use of locales is sorting Strings according to a language-specific alphabet.  The normal compareTo method won’t work for this (nor will equals), because it compares raw character codes, and thus java.util.Collections.sort with the natural ordering doesn’t work either.

The java.text.Collator class provides natural language comparisons.  Natural language comparisons depend upon locale-specific rules that determine the equality and ordering of characters in a particular writing system.  A collator knows that the character sequence M i c h è l e is equal to M i c h e ` l e in some situations, usually those in which natural language processing is important.  A Collator object can even understand several levels of character differences, and it has a method (setStrength) that controls how precise you want the comparison to be.  Different letters (“w” and “p”) are a primary difference.  The difference between accented and unaccented versions of the same letter (“o” and “ŏ”) is a secondary difference.  Different cases of the same letter (“A” and “a”) are a tertiary difference.  So depending upon how you configure a Collator instance, you can consider the words “Michèle” and “Michele” to be equal.
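The three strength levels can be tried out with a sketch like the following.  It assumes the default collation rules for Locale.US; the helper name sameAt is made up for illustration:

```java
import java.text.Collator;
import java.util.Locale;

public class CollatorStrengthDemo {
    /** Returns true if a and b compare as equal at the given strength. */
    static boolean sameAt(int strength, String a, String b) {
        Collator coll = Collator.getInstance(Locale.US);
        coll.setStrength(strength);
        return coll.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        // "Michèle", written with an escape to avoid source-encoding problems.
        String accented = "Mich\u00e8le";

        // PRIMARY: only the base letters matter; accents and case are ignored.
        System.out.println(sameAt(Collator.PRIMARY, accented, "Michele"));   // true
        System.out.println(sameAt(Collator.PRIMARY, "wet", "pet"));          // false

        // SECONDARY: accents now matter, case still does not.
        System.out.println(sameAt(Collator.SECONDARY, accented, "Michele")); // false
        System.out.println(sameAt(Collator.SECONDARY, "Abc", "abc"));        // true

        // TERTIARY: case matters as well.
        System.out.println(sameAt(Collator.TERTIARY, "Abc", "abc"));         // false
    }
}
```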

To sort according to a locale you must also use a Collator object, custom built for the locale you want.  This object has a working compare method you can use:

Locale loc = ...;
Collator coll = Collator.getInstance( loc );
coll.setStrength( Collator.TERTIARY );
...
if ( coll.compare( a, b ) < 0 ) ...
...
Collections.sort( myList, coll );
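A complete, runnable version of the same idea, as a sketch.  The word list and the class name are invented for the demonstration; it contrasts the natural (character code) ordering with a Collator for Locale.US:

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class CollatorSortDemo {
    /** Sorts a copy of the list using String.compareTo (character codes). */
    static List<String> naturalSort(List<String> words) {
        List<String> copy = new ArrayList<String>(words);
        Collections.sort(copy);
        return copy;
    }

    /** Sorts a copy of the list using the collation rules of the given locale. */
    static List<String> localeSort(List<String> words, Locale loc) {
        List<String> copy = new ArrayList<String>(words);
        Collections.sort(copy, Collator.getInstance(loc));
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("peach", "Zebra", "apple", "Banana");

        // Upper-case letters have smaller character codes, so they sort first.
        System.out.println(naturalSort(words));            // [Banana, Zebra, apple, peach]

        // The collator orders by letter first and treats case as a minor difference.
        System.out.println(localeSort(words, Locale.US));  // [apple, Banana, peach, Zebra]
    }
}
```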

Show ISO docs for country codes (ISO 3166) and language codes (ISO 639).

Locales are in the java.util package.  There is a default locale set by the user (in Unix via the LANG environment variable; in Windows via the Regional Settings control panel).  You can also construct a locale yourself (ex: Locale frenchCanadian = new Locale( "fr", "CA" );).  Class Locale has several useful methods, including ones that return the country and language names (getDisplayCountry, getDisplayLanguage, and getDisplayName) and static methods for getting and setting the default Locale.  In addition, a number of common locales are provided as static constants of class Locale for you to use, such as Locale.FRANCE.  (The supported locales include language-specific names for days and months.)
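A short sketch of these methods.  The class and helper names are made up; the display names are requested in English so the output does not depend on the machine's default locale:

```java
import java.util.Locale;

public class LocaleNamesDemo {
    /** Describes a locale as "language / country", named in the given language. */
    static String describe(Locale loc, Locale inLanguage) {
        return loc.getDisplayLanguage(inLanguage) + " / " + loc.getDisplayCountry(inLanguage);
    }

    public static void main(String[] args) {
        Locale loc = Locale.CANADA_FRENCH;   // predefined constant for fr_CA

        System.out.println(describe(loc, Locale.ENGLISH));   // French / Canada
        System.out.println(loc.getDisplayName(loc));          // the name in French itself

        // The default locale can be read and (temporarily) replaced.
        Locale original = Locale.getDefault();
        Locale.setDefault(Locale.GERMANY);
        System.out.println(Locale.getDefault());              // de_DE
        Locale.setDefault(original);                          // restore the user's setting
    }
}
```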

If not using the default locale, you can request one from the user (via a “Language” menu perhaps), on the command line for stand-alone applications (via a DOS BAT file, or via a Unix shell script), or from PARAM tags in an Applet.

Formatting

The java.text package contains a number of classes used for formatting numeric and date information.  Last term we discussed the abstract class DateFormat, whose static factory methods generate date, time, and date-and-time formatter objects customized for a given locale.  Numbers are formatted per locale in the same way (ex: 10000.23 ==> 10,000.23 or 1,0000.23 or 10.000,23 or ...).  These formatters know about two dozen locale conventions, all the major ones you might need for business.  However, there is a way to specify new formats for numbers and dates (see DecimalFormat and SimpleDateFormat).

These formatting objects generally have a format method to produce output, and a parse method to read a String according to the formatting conventions.
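The format/parse pairing can be sketched with DateFormat as follows.  The sample date and the class name are arbitrary choices for illustration, and the exact text produced for each locale depends on the locale data shipped with your JDK:

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

public class DateFormatDemo {
    /** An arbitrary sample date: midnight, 12 January 2005, local time. */
    static Date sampleDate() {
        return new GregorianCalendar(2005, Calendar.JANUARY, 12).getTime();
    }

    /** Formats the date using the medium-length style of the given locale. */
    static String show(Date d, Locale loc) {
        return DateFormat.getDateInstance(DateFormat.MEDIUM, loc).format(d);
    }

    /** Formats and then re-parses the date; true if the same date comes back. */
    static boolean roundTrips(Date d, Locale loc) {
        DateFormat df = DateFormat.getDateInstance(DateFormat.MEDIUM, loc);
        try {
            return df.parse(df.format(d)).equals(d);
        } catch (ParseException e) {
            return false;   // the text did not match this locale's conventions
        }
    }

    public static void main(String[] args) {
        Date d = sampleDate();
        for (Locale loc : new Locale[] { Locale.US, Locale.FRANCE, Locale.GERMANY }) {
            System.out.println(loc + ": " + show(d, loc) + "  round trip ok? " + roundTrips(d, loc));
        }
    }
}
```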

In addition to DateFormat there are the Calendar, GregorianCalendar, and DateFormatSymbols classes to help format dates and times:

DateFormatSymbols dfs = new DateFormatSymbols( Locale.FRANCE );
String [] monthNames = dfs.getMonths();
Calendar cal = Calendar.getInstance( Locale.FRANCE );
System.out.println( monthNames[ cal.get(Calendar.MONTH) ] );  // MONTH is zero-based, matching the array index

It is possible to set a time zone appropriate to the locale as well, to ensure dates and times display the way the user expects.  Note that setting the time zone doesn’t affect the underlying date or time, merely the way it is displayed.  (So dates and times for network-wide resources can be compared correctly and displayed correctly too.)
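A small sketch of that point: one instant displayed in two time zones.  The pattern string, the zone IDs, and the class name are choices made for this example:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class TimeZoneDemo {
    /** Formats the same instant as seen in the given time zone. */
    static String show(Date instant, String zoneId) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm", Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone(zoneId));
        return fmt.format(instant);
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L);   // one fixed instant: the start of 1970, UTC

        System.out.println(show(epoch, "UTC"));               // 1970-01-01 00:00
        System.out.println(show(epoch, "America/New_York"));  // 1969-12-31 19:00

        // The Date itself is unchanged; only its display differed.
        System.out.println(epoch.getTime());                  // 0
    }
}
```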

Numbers are formatted using NumberFormat:

NumberFormat nf = NumberFormat.getInstance(Locale.FRENCH);
String val = nf.format( 10000.22 );
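// parse returns a Number (and may throw ParseException); in French "," is the decimal separator, so this yields 1, not 1000
Number nObj = nf.parse( "1,000" );  long l = nObj.longValue();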

You can define custom number formats with class DecimalFormat.  To format numbers as percentages (i.e., “0.53” as “53%”) use getPercentInstance; to format as a currency use getCurrencyInstance.
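The factory methods can be compared side by side in a sketch like this one.  The sample values, the custom pattern, and the class and method names are arbitrary choices for illustration:

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

public class NumberFormatDemo {
    static String plain(double value, Locale loc) {
        return NumberFormat.getInstance(loc).format(value);
    }

    static String percent(double value, Locale loc) {
        return NumberFormat.getPercentInstance(loc).format(value);
    }

    static String currency(double value, Locale loc) {
        return NumberFormat.getCurrencyInstance(loc).format(value);
    }

    /** A custom pattern: grouping separators and exactly three decimals. */
    static String custom(double value, Locale loc) {
        DecimalFormat df = new DecimalFormat("#,##0.000", DecimalFormatSymbols.getInstance(loc));
        return df.format(value);
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(plain(10000.22, Locale.US));        // 10,000.22
        System.out.println(plain(10000.22, Locale.GERMANY));   // 10.000,22
        System.out.println(percent(0.53, Locale.US));          // 53%
        System.out.println(currency(1234.5, Locale.US));       // $1,234.50
        System.out.println(custom(1234.5, Locale.US));         // 1,234.500

        // Parsing follows the same conventions: "." groups digits in German.
        Number n = NumberFormat.getInstance(Locale.GERMANY).parse("1.000");
        System.out.println(n.longValue());                     // 1000
    }
}
```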

There are two other formatters (ChoiceFormat and MessageFormat), discussed later.

Finally, the java.text package contains the classes Collator and BreakIterator.  A Collator can be used to compare and sort strings in a locale-sensitive way, dealing with capitalization and accents (e.g., traditional Spanish order: a, b, c, ch, d, e, ...).  A BreakIterator can be used to find character, word, line, and sentence boundaries (not so easy in some languages!).
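As an illustration of BreakIterator, here is a sketch that pulls the words out of a sentence.  The helper name words and the sample text are made up; the filter on the first character is a simplification that is good enough for this English sample:

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WordBreakDemo {
    /** Returns the words of the text, using the word-boundary rules of the locale. */
    static List<String> words(String text, Locale loc) {
        BreakIterator it = BreakIterator.getWordInstance(loc);
        it.setText(text);

        List<String> result = new ArrayList<String>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end);
            // Boundaries also enclose spaces and punctuation; keep only real words.
            if (Character.isLetterOrDigit(piece.charAt(0))) {
                result.add(piece);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world! How are you?", Locale.US));
        // [Hello, world, How, are, you]
    }
}
```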

ResourceBundles and property files

ResourceBundles are classes that convert a key to an internationalized value.  The values are put in .class files that are named for their locales, for instance Msgs_en_US.class.  To use a resource bundle, you use code like this:

ResourceBundle rb = ResourceBundle.getBundle("Msgs", locale);

The system locates the correct resource bundle by constructing the class name from the supplied basename (“Msgs”) and the requested locale (or default locale).  Then the individual localized values are fetched this way:

String s = rb.getString( "key" );
int population =
   ( (Integer) rb.getObject( "key" ) ).intValue();

The set of keys should be the same for all supported ResourceBundles.  If a key isn’t found, a MissingResourceException is thrown.  The key names must be Strings; the values can be any objects but usually are translated strings.

ResourceBundle classes can be created in a number of ways, but the two easiest and most common are to extend ListResourceBundle and override the getContents method, or to create a text properties file and let the system create a PropertyResourceBundle object from it automatically.  (Both ListResourceBundle and PropertyResourceBundle extend the abstract base class ResourceBundle.)
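A minimal sketch of the ListResourceBundle approach.  The keys and values are invented, and to keep the example in one file the bundle is a nested class that is instantiated directly; in a real program it would be a public top-level class named, say, Msgs_fr so that ResourceBundle.getBundle could find it:

```java
import java.util.ListResourceBundle;
import java.util.MissingResourceException;
import java.util.ResourceBundle;

public class ListBundleDemo {
    /** Hypothetical French bundle: an array of { key, value } pairs. */
    static class MsgsFr extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                { "greeting", "Bonjour" },
                { "maxSeats", Integer.valueOf(250) },   // values need not be Strings
            };
        }
    }

    /** Stands in for ResourceBundle.getBundle("Msgs", someFrenchLocale). */
    static ResourceBundle bundle() {
        return new MsgsFr();
    }

    public static void main(String[] args) {
        ResourceBundle rb = bundle();
        System.out.println(rb.getString("greeting"));                   // Bonjour

        int maxSeats = ((Integer) rb.getObject("maxSeats")).intValue();
        System.out.println(maxSeats);                                   // 250

        try {
            rb.getString("farewell");                                   // key not defined
        } catch (MissingResourceException e) {
            System.out.println("missing key: " + e.getKey());          // missing key: farewell
        }
    }
}
```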

The properties files are easier to work with but have a limitation: The values must be ISO Latin-1 (ISO 8859-1) text.  Other characters must be written as Unicode escapes (\uXXXX), which the native2ascii tool can generate for you.  The name of the file is the same as for the class, except the extension is “.properties”:

bundleName_localeLanguage_localeCountry_localeVariant.properties  (All but the bundle name are optional; the system searches for the most specific one it can find.)  The contents of a properties file are a set of lines of the form key = value.  Blank lines and lines starting with “#” (comments) are ignored.  Spaces are allowed around the “=”.  For resources that aren’t strings (such as icons), it is easier to extend ListResourceBundle.
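The file format can be tried without creating a file, as in this sketch.  It assumes Java 6 or later (for the PropertyResourceBundle constructor that takes a Reader); the string FILE_TEXT stands in for the contents of a hypothetical Msgs_fr.properties file:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class PropertiesBundleDemo {
    // What a hypothetical Msgs_fr.properties file might contain.
    static final String FILE_TEXT =
          "# French messages (this line is a comment)\n"
        + "\n"
        + "greeting = Bonjour\n"
        + "farewell=Au revoir\n";

    /** Builds a bundle from properties-file text held in memory. */
    static ResourceBundle load(String text) {
        try {
            return new PropertyResourceBundle(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);   // not expected for an in-memory reader
        }
    }

    public static void main(String[] args) {
        ResourceBundle rb = load(FILE_TEXT);
        System.out.println(rb.getString("greeting"));   // Bonjour
        System.out.println(rb.getString("farewell"));   // Au revoir
    }
}
```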

The system locates all relevant resource bundle classes and builds a hierarchy to use in the lookup algorithm.  So you don’t need to put every key in every resource bundle; you can instead create defaults and override them in more specific resource bundles.  When looking for a specific bundle, the system first looks for a class, then a properties file if no class is found.  Suppose the default locale is en_US and the requested locale is fr_FR.  Then the lookup will search for bundles in this order (the variant is ignored to simplify):

    Msgs_fr_FR.class
    Msgs_fr_FR.properties

    Msgs_fr.class
    Msgs_fr.properties

    Msgs_en_US.class
    Msgs_en_US.properties

    Msgs_en.class
    Msgs_en.properties

    Msgs.class
    Msgs.properties

It is important to have a default (Msgs) bundle, and also any intervening bundles even if empty.  (For example, if you supply Msgs_fr_FR you must supply Msgs_fr as well, and should supply Msgs too, or the lookup algorithm may not work correctly.)
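The search order listed above can be reproduced with the ResourceBundle.Control class (Java 6 or later), as in this sketch.  It only computes names and loads nothing; the method searchOrder is made up for illustration and approximates the real algorithm by placing the base bundle last, as the list above does:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

public class BundleSearchOrderDemo {
    /** Lists candidate bundle names for the requested locale, then the fallback, then the base. */
    static List<String> searchOrder(String baseName, Locale requested, Locale fallback) {
        ResourceBundle.Control ctl =
            ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        List<String> names = new ArrayList<String>();
        for (Locale start : new Locale[] { requested, fallback }) {
            // For fr_FR the candidates are fr_FR, fr, and the root locale.
            for (Locale candidate : ctl.getCandidateLocales(baseName, start)) {
                if (!candidate.equals(Locale.ROOT)) {
                    addBoth(names, ctl.toBundleName(baseName, candidate));
                }
            }
        }
        addBoth(names, ctl.toBundleName(baseName, Locale.ROOT));   // the default bundle
        return names;
    }

    /** A class is tried first, then a properties file of the same name. */
    private static void addBoth(List<String> names, String bundleName) {
        for (String ext : new String[] { ".class", ".properties" }) {
            String name = bundleName + ext;
            if (!names.contains(name)) {
                names.add(name);
            }
        }
    }

    public static void main(String[] args) {
        for (String name : searchOrder("Msgs", Locale.FRANCE, Locale.US)) {
            System.out.println(name);
        }
    }
}
```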

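Java 6 allows the use of XML property files and other new features through the ResourceBundle.Control class.  You can use XML this way (XMLResourceBundleControl is not part of the JDK; it is a subclass of ResourceBundle.Control that you write yourself):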

ResourceBundle bundle = ResourceBundle.getBundle(
  "Test2", new XMLResourceBundleControl() );
String string = bundle.getString( "HelpKey" );
System.out.println( "HelpKey: " + string );

and the XML file Test2.xml looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM
     "http://java.sun.com/dtd/properties.dtd">
<properties>
  <entry key="OkKey">OK</entry>
  <entry key="CancelKey">Cancel</entry>
  <entry key="HelpKey">Help</entry>
  <entry key="YesKey">Yes</entry>
  <entry key="NoKey">No</entry>
</properties>
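The core of such an XML-backed bundle might look like the sketch below.  The class name XMLResourceBundle is an assumption, and the XML is read from a string to keep the example self-contained; a custom ResourceBundle.Control subclass would typically create an object like this from its newBundle method after locating the .xml file:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Collections;
import java.util.Enumeration;
import java.util.Properties;
import java.util.ResourceBundle;

public class XmlBundleDemo {
    /** A bundle whose entries come from a java.util.Properties XML document. */
    static class XMLResourceBundle extends ResourceBundle {
        private final Properties props = new Properties();

        XMLResourceBundle(InputStream in) throws IOException {
            props.loadFromXML(in);              // reads the <properties>/<entry> format
        }

        @Override
        protected Object handleGetObject(String key) {
            return props.getProperty(key);      // null means "not in this bundle"
        }

        @Override
        public Enumeration<String> getKeys() {
            return Collections.enumeration(props.stringPropertyNames());
        }
    }

    // A shortened copy of the Test2.xml file shown above.
    static final String XML =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
        + "<properties>\n"
        + "  <entry key=\"OkKey\">OK</entry>\n"
        + "  <entry key=\"HelpKey\">Help</entry>\n"
        + "</properties>\n";

    static ResourceBundle load(String xml) {
        try {
            return new XMLResourceBundle(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ResourceBundle bundle = load(XML);
        System.out.println("HelpKey: " + bundle.getString("HelpKey"));   // HelpKey: Help
    }
}
```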

Messages and Choices

When translating text, two problems can arise.  First, the position of numbers and other values within the text changes from language to language.  Second, plurals must be handled.

A MessageFormat works a bit like C’s printf function: a string with placeholders for the values is given (called the pattern), along with an array of values to substitute.  The MessageFormat class then formats the values for the proper locale and substitutes them in the string in the right places.  The string can be supplied from a resource bundle.  For example, the en_US string may look like “You have {0} files older than {1} days”.

A placeholder may just indicate which value to use, as in “{0}”, in which case the value is formatted in the default way for its type, or it may also indicate the type (and optionally the format style) of the value.  For example, “{1,number,$'#',##}” will produce a number format with the pound sign quoted, with a result such as “$#31,45”.  The general form is “{index,type,style}”, where type is one of date, time, number, or choice.  For dates and times the style can be short, medium, long, full, or a custom pattern; for numbers it can be currency, percent, integer, or a custom pattern; for choices it must be a choice format pattern.
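A sketch of MessageFormat in use.  The second and third patterns are invented for this example (the second imitates a translation that reorders the values), and the helper name fill is made up:

```java
import java.text.MessageFormat;
import java.util.Locale;

public class MessageFormatDemo {
    /** Substitutes the arguments into the pattern using the locale's conventions. */
    static String fill(String pattern, Locale loc, Object... args) {
        return new MessageFormat(pattern, loc).format(args);
    }

    public static void main(String[] args) {
        // Plain placeholders: numbers get the locale's default number format.
        System.out.println(fill("You have {0} files older than {1} days", Locale.US, 1200, 30));
        // You have 1,200 files older than 30 days

        // A translation may use the same values in a different order.
        System.out.println(fill("Older than {1} days: {0} files", Locale.US, 1200, 30));
        // Older than 30 days: 1,200 files

        // Placeholders with an explicit type and style.
        System.out.println(fill("{0,number,percent} of {1,number,integer} files", Locale.US, 0.53, 1200));
        // 53% of 1,200 files
    }
}
```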

A ChoiceFormat is used to convert ranges of numeric values to strings.  This is done to convert day-of-week numbers into day name strings, or to handle plurals:  “There are no seats available”, “There is 1 seat available”, “There are {0} seats available”.  The strings used can be kept in a resource bundle.  See the Javadoc API for examples.
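Both uses can be sketched as follows.  The pattern string is one possible way to write the seats message as a choice inside a MessageFormat, modeled on the style of the Javadoc examples; the class and method names are made up:

```java
import java.text.ChoiceFormat;
import java.text.MessageFormat;
import java.util.Locale;

public class ChoiceFormatDemo {
    // 0 -> "are no seats", 1 -> "is 1 seat", more than 1 -> "are N seats".
    static final String PATTERN =
        "There {0,choice,0#are no seats|1#is 1 seat|1<are {0,number,integer} seats} available";

    /** Builds the seats message with the right plural form. */
    static String seats(int count) {
        return new MessageFormat(PATTERN, Locale.US).format(new Object[] { count });
    }

    /** Maps a day-of-week number (1 = Sunday) to a short day name. */
    static String dayName(int day) {
        double[] limits = { 1, 2, 3, 4, 5, 6, 7 };
        String[] names = { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" };
        return new ChoiceFormat(limits, names).format(day);
    }

    public static void main(String[] args) {
        System.out.println(seats(0));     // There are no seats available
        System.out.println(seats(1));     // There is 1 seat available
        System.out.println(seats(5));     // There are 5 seats available
        System.out.println(dayName(2));   // Mon
    }
}
```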

See IGreet.java (ListResourceBundle demo) and Stocks.java (Property file demo).

Note:  For text output, use the platform’s line separator rather than a hard-coded "\n":

String sep = System.getProperty("line.separator");

As of 2005, everyone on Earth uses ISO 216 paper sizes except the US and Canada.  Every country has adopted the metric system except for the US, Liberia, and Myanmar (formerly Burma).   See web resource.