A locale is a definition of the language (and encoding, e.g. UTF-8), time, currency, and other number formats that vary by language and geographical region. Related formats are grouped into categories. *nix systems provide a set of environment variables (one per category) that you can use to pick these data formats, by specifying a locale for each.
The settings in a locale reflect a language's and geographic region's
(i.e., country's or territory's) cultural rules for formatting data.
A locale name looks like lang[_region][.encoding][@variant].
For example, en_US.utf8.
Only the lang part is required.
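To make the naming scheme concrete, here is a minimal sketch that splits a locale name into its parts using only POSIX parameter expansion. The function name parse_locale is hypothetical (it is not a standard utility), and the sketch assumes the name follows the lang[_region][.encoding][@variant] pattern exactly.

```shell
# parse_locale: hypothetical helper, not a standard utility.
# Splits lang[_region][.encoding][@variant] into its four parts.
# The variables are global, because POSIX sh has no "local".
parse_locale() {
    name=$1
    variant= encoding= region=
    # Peel the optional parts off from right to left.
    case $name in
        *@*) variant=${name#*@}; name=${name%%@*} ;;
    esac
    case $name in
        *.*) encoding=${name#*.}; name=${name%%.*} ;;
    esac
    case $name in
        *_*) region=${name#*_}; name=${name%%_*} ;;
    esac
    printf 'lang=%s region=%s encoding=%s variant=%s\n' \
        "$name" "$region" "$encoding" "$variant"
}

parse_locale en_US.utf8        # lang=en region=US encoding=utf8 variant=
parse_locale de_DE.UTF-8@euro  # lang=de region=DE encoding=UTF-8 variant=euro
parse_locale fr                # lang=fr region= encoding= variant=
```

The helper only inspects the string; it does not check whether the named locale is actually installed.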
The C (or POSIX) locales are always defined, but others may or may not be defined (installed) on any given system.
A locale can also be an absolute pathname to a file produced by the localedef utility.
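The POSIX categories, each set by the environment variable of the same name, are:
LC_COLLATE (collation order), LC_CTYPE (character classification and case conversion), LC_MESSAGES (the language of messages and of yes/no responses), LC_MONETARY (currency formats), LC_NUMERIC (other number formats), and LC_TIME (date and time formats).
(Message catalogs are located using NLSPATH.)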
(Additional categories such as LC_ADDRESS or LC_PAPER may be available on some systems.)
If some LC_* variable is not set, the value of LANG is used to define its locale.
If LC_ALL is set, that value overrides any other LC_* and LANG settings.
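The lookup order just described can be sketched as a small shell function. The name effective_locale is hypothetical, and two details are assumptions of this sketch: an empty variable is treated the same as an unset one, and "C" is printed when nothing is set (the real default in that case is up to the implementation).

```shell
# effective_locale: hypothetical helper that mimics the lookup order
# LC_ALL, then the category's own variable, then LANG.
# Printing "C" when nothing is set is an assumption of this sketch.
# Pass only trusted category names, since the argument goes through eval.
effective_locale() {
    category=$1
    eval "own=\${$category-}"     # e.g. own=${LC_TIME-}
    if [ -n "${LC_ALL-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$own" ]; then
        printf '%s\n' "$own"
    elif [ -n "${LANG-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf 'C\n'
    fi
}

# Each case runs in a subshell so the caller's environment is untouched.
( unset LC_ALL LC_TIME; LANG=en_US.UTF-8; effective_locale LC_TIME )
( unset LC_ALL; LANG=en_US.UTF-8; LC_TIME=de_DE.UTF-8; effective_locale LC_TIME )
( LANG=en_US.UTF-8; LC_TIME=de_DE.UTF-8; LC_ALL=C; effective_locale LC_TIME )
```

The three calls print en_US.UTF-8, de_DE.UTF-8, and C respectively. The function only compares strings, so the locales named here do not need to be installed, although some shells may print a warning on standard error when a variable is set to a locale they cannot load.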
You can see the current values used for each category, and details on each installed locale, by using the locale command.
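A few common invocations of the locale command are sketched below. The output depends entirely on the system and the current environment, so none is shown here, and a minimal system may not ship the utility at all.

```shell
# Show the locale in effect for each category, plus LANG and LC_ALL.
locale

# List the names of all locales installed on this system.
locale -a

# Show the keywords (and their values) defined by the current LC_TIME locale.
locale -k LC_TIME

# Print just the name of the character encoding of the current locale.
locale charmap
```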
To portably set your locale, it is best to set the LC_ALL environment variable to C (or POSIX).
Setting only (for example) LC_COLLATE has two problems:
it is ineffective if LC_ALL is also set,
and it has undefined behavior if LC_CTYPE (or LANG if LC_CTYPE is unset) is set to an incompatible value.
For example, you get undefined behavior if LC_CTYPE is ja_JP.PCK and LC_COLLATE is en_US.UTF-8.
Most shell scripts probably should set LC_ALL to POSIX at the top of the script.
(You may want to set TZ to UTC0 as well, especially for utilities that record a date in the current timezone such as diff and tar.)
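A script preamble along these lines is one way to apply that advice. The two commands after it are only illustrations of the effect and are not part of the preamble.

```shell
#!/bin/sh
# Pin the locale and the timezone so the script behaves the same everywhere.
LC_ALL=POSIX
TZ=UTC0
export LC_ALL TZ

# In the POSIX locale, sort compares byte values,
# so uppercase letters sort before lowercase ones: A B a b.
printf '%s\n' b A a B | sort

# With TZ=UTC0, date reports UTC whatever the machine's configured zone is.
date '+%Z'
```

Setting the variables and exporting them in separate steps, rather than with a combined export NAME=value, keeps the preamble usable with older Bourne-style shells as well.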
The standard utility iconv can be used to convert between (compatible) text encodings.
Use iconv -l to list all available encodings on your system.
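As a small illustration, the following converts a UTF-8 string to ISO-8859-1 and dumps the resulting bytes in hex. The encoding names UTF-8 and ISO-8859-1 are assumed to be available; they are very widely supported, but the accepted spellings vary between systems, so check the output of iconv -l if the command fails.

```shell
# "café" plus a newline, written as UTF-8 bytes (é is the two bytes 0xC3 0xA9).
# After conversion to ISO-8859-1, é becomes the single byte 0xE9.
printf 'caf\303\251\n' | iconv -f UTF-8 -t ISO-8859-1 | od -An -tx1
```

The hex dump should show the bytes 63 61 66 e9 0a, though the exact spacing of od output differs between implementations. Characters that have no equivalent in the target encoding cause iconv to report an error, which is why the encodings need to be compatible for the text being converted.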