:mod:`locale` --- Internationalization services
===============================================

.. module:: locale
   :synopsis: Internationalization services.
.. moduleauthor:: Martin von Löwis <martin@v.loewis.de>
.. sectionauthor:: Martin von Löwis <martin@v.loewis.de>


The :mod:`locale` module opens access to the POSIX locale database and
functionality. The POSIX locale mechanism allows programmers to deal with
certain cultural issues in an application, without requiring the programmer to
know all the specifics of each country where the software is executed.

.. index:: module: _locale

The :mod:`locale` module is implemented on top of the :mod:`_locale` module,
which in turn uses an ANSI C locale implementation if available.

The :mod:`locale` module defines the following exception and functions:


.. exception:: Error

   Exception raised when the locale passed to :func:`setlocale` is not
   recognized.


.. function:: setlocale(category[, locale])

   If *locale* is given and not ``None``, :func:`setlocale` modifies the locale
   setting for the *category*. The available categories are listed in the data
   description below. *locale* may be a string, or an iterable of two strings
   (language code and encoding). If it's an iterable, it's converted to a locale
   name using the locale aliasing engine. An empty string specifies the user's
   default settings. If the modification of the locale fails, the exception
   :exc:`Error` is raised. If successful, the new locale setting is returned.

   If *locale* is omitted or ``None``, the current setting for *category* is
   returned.

   :func:`setlocale` is not thread-safe on most systems. Applications typically
   start with a call of ::

      import locale
      locale.setlocale(locale.LC_ALL, '')

   This sets the locale for all categories to the user's default setting (typically
   specified in the :envvar:`LANG` environment variable).  If the locale is not
   changed thereafter, using multithreading should not cause problems.

   .. versionchanged:: 2.0
      Added support for iterable values of the *locale* parameter.
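
The two uses of :func:`setlocale` described above (changing a setting and
querying it) can be sketched as follows. This is only an illustration: the
helper name ``use_user_locale`` is hypothetical and not part of the module, and
returning ``None`` on failure is one possible policy among several.

```python
import locale


def use_user_locale():
    """Try to adopt the user's default locale for all categories.

    Returns the new locale string, or None if the platform rejected it.
    """
    try:
        return locale.setlocale(locale.LC_ALL, '')
    except locale.Error:
        # The environment named a locale the system does not provide;
        # the previous setting stays in effect.
        return None


new_setting = use_user_locale()

# Omitting the second argument only queries the current setting.
numeric_setting = locale.setlocale(locale.LC_NUMERIC)
```

Calling the helper once at program start, before any threads are created,
matches the usage recommended above.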


.. function:: localeconv()

   Returns the database of the local conventions as a dictionary. This dictionary
   has the following strings as keys:

   .. tabularcolumns:: |l|l|L|

   +----------------------+-------------------------------------+--------------------------------+
   | Category             | Key                                 | Meaning                        |
   +======================+=====================================+================================+
   | :const:`LC_NUMERIC`  | ``'decimal_point'``                 | Decimal point character.       |
   +----------------------+-------------------------------------+--------------------------------+
   |                      | ``'grouping'``                      | Sequence of numbers specifying |
   |                      |                                     | the relative positions where   |
   |                      |                                     | ``'thousands_sep'`` is         |
   |                      |                                     | expected.  If the sequence is  |
   |                      |                                     | terminated with                |
   |                      |                                     | :const:`CHAR_MAX`, no further  |
   |                      |                                     | grouping is performed. If the  |
   |                      |                                     | sequence terminates with a     |
   |                      |                                     | ``0``,  the last group size is |
   |                      |                                     | repeatedly used.               |
   +----------------------+-------------------------------------+--------------------------------+
   |                      | ``'thousands_sep'``                 | Character used between groups. |
   +----------------------+-------------------------------------+--------------------------------+
   | :const:`LC_MONETARY` | ``'int_curr_symbol'``               | International currency symbol. |
   +----------------------+-------------------------------------+--------------------------------+
   |                      | ``'currency_symbol'``               | Local currency symbol.         |
   +----------------------+-------------------------------------+--------------------------------+
   |                      | ``'p_cs_precedes/n_cs_precedes'``   | Whether the currency symbol    |
   |                      |                                     | precedes the value (for        |
   |                      |                                     | positive and negative values,  |
   |                      |                                     | respectively).                 |
   +----------------------+-------------------------------------+--------------------------------+
   |                      | ``'p_sep_by_space/n_sep_by_space'`` | Whether the currency symbol is |
   |                      |                                     | separated from the value by a  |
   |                      |                                     | space (for positive and        |
   |                      |                                     | negative values,               |
   |                      |                                     | respectively).                 |
   +----------------------+-------------------------------------+--------------------------------+
   |                      | ``'mon_decimal_point'``             | Decimal point used for         |
   |                      |                                     | monetary values.               |
   +----------------------+-------------------------------------+--------------------------------+
   |                      | ``'frac_digits'``                   | Number of fractional digits    |
   |                      |                                     | used in local formatting of    |
   |                      |                                     | monetary values.               |
   +----------------------+-------------------------------------+--------------------------------+
   |                      | ``'int_frac_digits'``               | Number of fractional digits    |
   |                      |                                     | used in international          |
   |                      |                                     | formatting of monetary values. |
   +----------------------+-------------------------------------+--------------------------------+
   |                      | ``'mon_thousands_sep'``             | Group separator used for       |
   |                      |                                     | monetary values.               |
   +----------------------+-------------------------------------+--------------------------------+
   |                      | ``'mon_grouping'``                  | Equivalent to ``'grouping'``,  |
   |                      |                                     | used for monetary values.      |
   +----------------------+-------------------------------------+--------------------------------+
   |                      | ``'positive_sign'``                 | Symbol used to annotate a      |
   |                      |                                     | positive monetary value.       |
   +----------------------+-------------------------------------+--------------------------------+
   |                      | ``'negative_sign'``                 | Symbol used to annotate a      |
   |                      |                                     | negative monetary value.       |
   +----------------------+-------------------------------------+--------------------------------+
   |                      | ``'p_sign_posn/n_sign_posn'``       | The position of the sign (for  |
   |                      |                                     | positive and negative values,  |
   |                      |                                     | respectively); see below.      |
   +----------------------+-------------------------------------+--------------------------------+

   All numeric values can be set to :const:`CHAR_MAX` to indicate that there is no
   value specified in this locale.

   The possible values for ``'p_sign_posn'`` and ``'n_sign_posn'`` are given below.

   +--------------+-----------------------------------------+
   | Value        | Explanation                             |
   +==============+=========================================+
   | ``0``        | Currency and value are surrounded by    |
   |              | parentheses.                            |
   +--------------+-----------------------------------------+
   | ``1``        | The sign should precede the value and   |
   |              | currency symbol.                        |
   +--------------+-----------------------------------------+
   | ``2``        | The sign should follow the value and    |
   |              | currency symbol.                        |
   +--------------+-----------------------------------------+
   | ``3``        | The sign should immediately precede the |
   |              | value.                                  |
   +--------------+-----------------------------------------+
   | ``4``        | The sign should immediately follow the  |
   |              | value.                                  |
   +--------------+-----------------------------------------+
   | ``CHAR_MAX`` | Nothing is specified in this locale.    |
   +--------------+-----------------------------------------+
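
The rules for the ``'grouping'`` entry in the first table can be easier to
follow in code. The sketch below interprets a grouping list by hand; the
helper names ``group_sizes`` and ``group_digits`` are hypothetical, the
grouping lists are made-up inputs rather than values read from a real locale,
and treating a list that ends without a terminator as "no further grouping"
is an assumption of this sketch.

```python
import locale


def group_sizes(grouping):
    """Yield successive group sizes, right to left, from a grouping list."""
    last = None
    for size in grouping:
        if size == locale.CHAR_MAX:
            return            # no further grouping is performed
        if size == 0:
            if last is None:
                return        # nothing to repeat (assumed: no grouping)
            while True:
                yield last    # the last group size is used repeatedly
        yield size
        last = size


def group_digits(digits, grouping, sep):
    """Insert *sep* into a string of digits according to *grouping*."""
    parts = []
    rest = digits
    for size in group_sizes(grouping):
        if len(rest) <= size:
            break
        parts.append(rest[-size:])   # peel one group off the right end
        rest = rest[:-size]
    parts.append(rest)
    return sep.join(reversed(parts))


print(group_digits('1234567', [3, 0], ','))                # 1,234,567
print(group_digits('1234567', [3, 2, 0], ','))             # 12,34,567
print(group_digits('1234567', [3, locale.CHAR_MAX], ','))  # 1234,567
```

In real code the list and separator would come from
``localeconv()['grouping']`` and ``localeconv()['thousands_sep']``.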


.. function:: nl_langinfo(option)

   Return some locale-specific information as a string.  This function is not
   available on all systems, and the set of possible options might also vary
   across platforms.  The possible argument values are numbers, for which
   symbolic constants are available in the locale module.

   The :func:`nl_langinfo` function accepts one of the following keys.  Most
   descriptions are taken from the corresponding description in the GNU C
   library.

   .. data:: CODESET

      Get a string with the name of the character encoding used in the
      selected locale.

   .. data:: D_T_FMT

      Get a string that can be used as a format string for :func:`time.strftime` to
      represent date and time in a locale-specific way.

   .. data:: D_FMT

      Get a string that can be used as a format string for :func:`time.strftime` to
      represent a date in a locale-specific way.

   .. data:: T_FMT

      Get a string that can be used as a format string for :func:`time.strftime` to
      represent a time in a locale-specific way.

   .. data:: T_FMT_AMPM

      Get a format string for :func:`time.strftime` to represent time in the am/pm
      format.

   .. data:: DAY_1 ... DAY_7

      Get the name of the n-th day of the week.

      .. note::

         This follows the US convention of :const:`DAY_1` being Sunday, not the
         international convention (ISO 8601) that Monday is the first day of the
         week.

   .. data:: ABDAY_1 ... ABDAY_7

      Get the abbreviated name of the n-th day of the week.

   .. data:: MON_1 ... MON_12

      Get the name of the n-th month.

   .. data:: ABMON_1 ... ABMON_12

      Get the abbreviated name of the n-th month.

   .. data:: RADIXCHAR

      Get the radix character (decimal dot, decimal comma, etc.).

   .. data:: THOUSEP

      Get the separator character for thousands (groups of three digits).

   .. data:: YESEXPR

      Get a regular expression that can be used with the regex(3) function to
      recognize a positive response to a yes/no question.

      .. note::

         The expression is in the syntax suitable for the :c:func:`regex` function
         from the C library, which might differ from the syntax used in :mod:`re`.

   .. data:: NOEXPR

      Get a regular expression that can be used with the regex(3) function to
      recognize a negative response to a yes/no question.

   .. data:: CRNCYSTR

      Get the currency symbol, preceded by "-" if the symbol should appear before
      the value, "+" if the symbol should appear after the value, or "." if the
      symbol should replace the radix character.

   .. data:: ERA

      Get a string that represents the era used in the current locale.

      Most locales do not define this value.  An example of a locale which does
      define this value is the Japanese one.  In Japan, the traditional
      representation of dates includes the name of the era corresponding to the
      then-emperor's reign.

      Normally it should not be necessary to use this value directly. Specifying
      the ``E`` modifier in a format string causes the :func:`time.strftime`
      function to use this information.  The format of the returned string is not
      specified, so you should not assume that it is the same on different
      systems.

   .. data:: ERA_D_T_FMT

      Get a format string for :func:`time.strftime` to represent date and time in a
      locale-specific era-based way.

   .. data:: ERA_D_FMT

      Get a format string for :func:`time.strftime` to represent a date in a
      locale-specific era-based way.

   .. data:: ERA_T_FMT

      Get a format string for :func:`time.strftime` to represent a time in a
      locale-specific era-based way.

   .. data:: ALT_DIGITS

      Get a representation of up to 100 values used to represent the values
      0 to 99.
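
Because :func:`nl_langinfo` is missing on some systems, portable code may want
to test for it before use. The following sketch collects the day names through
the ``DAY_1`` ... ``DAY_7`` constants; the helper name ``weekday_names`` is
hypothetical, and the ``'C'`` locale is selected only to keep the result
predictable.

```python
import locale


def weekday_names():
    """Return the seven day names for the current locale, DAY_1 first.

    Returns None where nl_langinfo() is not available.
    """
    if not hasattr(locale, 'nl_langinfo'):
        return None
    days = [locale.DAY_1, locale.DAY_2, locale.DAY_3, locale.DAY_4,
            locale.DAY_5, locale.DAY_6, locale.DAY_7]
    return [locale.nl_langinfo(day) for day in days]


locale.setlocale(locale.LC_ALL, 'C')   # portable locale, for a stable result
names = weekday_names()
```

As the note on ``DAY_1`` says, the list starts with Sunday, so code that
expects an ISO 8601 week would need to rotate it.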


.. function:: getdefaultlocale([envvars])

   Tries to determine the default locale settings and returns them as a tuple of
   the form ``(language code, encoding)``.

   According to POSIX, a program which has not called ``setlocale(LC_ALL, '')``
   runs using the portable ``'C'`` locale.  Calling ``setlocale(LC_ALL, '')`` lets
   it use the default locale as defined by the :envvar:`LANG` variable.  Because
   this function must not interfere with the current locale setting, it emulates
   that behavior instead of calling :func:`setlocale`.

   To maintain compatibility with other platforms, the function tests not only
   the :envvar:`LANG` variable but every variable listed in the *envvars*
   parameter.  The first one found to be defined is used.  *envvars* defaults to
   the search path used in GNU gettext; it must always contain the variable name
   ``LANG``.  The GNU gettext search path contains ``'LANGUAGE'``, ``'LC_ALL'``,
   ``'LC_CTYPE'``, and ``'LANG'``, in that order.

   Except for the code ``'C'``, the language code corresponds to :rfc:`1766`.
   *language code* and *encoding* may be ``None`` if their values cannot be
   determined.

   .. versionadded:: 2.0


.. function:: getlocale([category])

   Returns the current setting for the given locale category as a sequence
   containing *language code*, *encoding*. *category* may be one of the
   :const:`LC_\*` values except :const:`LC_ALL`.  It defaults to :const:`LC_CTYPE`.

   Except for the code ``'C'``, the language code corresponds to :rfc:`1766`.
   *language code* and *encoding* may be ``None`` if their values cannot be
   determined.

   .. versionadded:: 2.0
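
A short sketch of :func:`getlocale` in use, assuming the ``'C'`` locale can be
selected (it normally can). The variable names are illustrative only.

```python
import locale

locale.setlocale(locale.LC_ALL, 'C')

# The result always has two parts; CPython reports both as None for
# the 'C' locale because neither can be determined.
language, encoding = locale.getlocale(locale.LC_NUMERIC)

# With no argument, the LC_CTYPE category is queried.
ctype_setting = locale.getlocale()
```

Code that unpacks the result should therefore be prepared for ``None`` in
either position.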


.. function:: getpreferredencoding([do_setlocale])

   Return the encoding used for text data, according to user preferences.  User
   preferences are expressed differently on different systems, and might not be
   available programmatically on some systems, so this function only returns a
   guess.

   On some systems, it is necessary to invoke :func:`setlocale` to obtain the user
   preferences, so this function is not thread-safe. If invoking :func:`setlocale`
   is not necessary or desired, *do_setlocale* should be set to ``False``.

   .. versionadded:: 2.3


.. function:: normalize(localename)

   Returns a normalized locale code for the given locale name.  The returned locale
   code is formatted for use with :func:`setlocale`.  If normalization fails, the
   original name is returned unchanged.

   If the given encoding is not known, the function defaults to the default
   encoding for the locale code just like :func:`setlocale`.

   .. versionadded:: 2.0
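
The two outcomes of :func:`normalize` can be seen with a known and an unknown
name. In this sketch ``'notalocale'`` is a made-up string chosen because no
locale alias should match it; the exact encoding appended to ``'en_US'``
depends on the alias table and is not assumed here.

```python
import locale

# A bare language code is expanded to a full locale name, including an
# encoding chosen by the locale aliasing engine.
full_name = locale.normalize('en_US')

# A name that cannot be normalized is returned unchanged.
unknown = locale.normalize('notalocale')
```

Since failure is signalled only by an unchanged return value, callers that
need to detect it have to compare the result with the input.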


.. function:: resetlocale([category])

   Sets the locale for *category* to the default setting.

   The default setting is determined by calling :func:`getdefaultlocale`.
   *category* defaults to :const:`LC_ALL`.

   .. versionadded:: 2.0


.. function:: strcoll(string1, string2)

   Compares two strings according to the current :const:`LC_COLLATE` setting.
   Like any other comparison function, it returns a negative value, a positive
   value, or ``0``, depending on whether *string1* collates before *string2*,
   after it, or is equal to it.


.. function:: strxfrm(string)

   .. index:: builtin: cmp

   Transforms a string to one that can be used for the built-in function
   :func:`cmp`, and still yields locale-aware results.  This function can be used
   when the same string is compared repeatedly, e.g. when collating a sequence of
   strings.
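
A common use of :func:`strxfrm` is as a sort key, so that each string is
transformed once instead of being compared pairwise with :func:`strcoll`. The
sketch below uses the ``'C'`` locale only so that it behaves the same
everywhere; the word list is made up, and a real program would select the
locale whose collation rules it needs.

```python
import locale

locale.setlocale(locale.LC_ALL, 'C')   # substitute the locale you need

words = ['pear', 'Apple', 'fig', 'apple']

# strxfrm() is applied once per item; the transformed keys are then
# compared with ordinary string comparison.
ordered = sorted(words, key=locale.strxfrm)

# strcoll() compares a single pair directly.
order = locale.strcoll('apple', 'pear')
```

Under the ``'C'`` locale the result is plain code-point order, so the
capitalised word sorts first; other locales may interleave upper and lower
case.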


.. function:: format(format, val[, grouping[, monetary]])

   Formats a number *val* according to the current :const:`LC_NUMERIC` setting.
   The format follows the conventions of the ``%`` operator.  For floating point
   values, the decimal point is modified if appropriate.  If *grouping* is true,
   also takes the grouping into account.

   If *monetary* is true, the conversion uses monetary thousands separator and
   grouping strings.

   Please note that this function only works for exactly one ``%char``
   specifier.  For whole format strings, use :func:`format_string`.

   .. versionchanged:: 2.5
      Added the *monetary* parameter.


.. function:: format_string(format, val[, grouping])

   Processes formatting specifiers as in ``format % val``, but takes the current
   locale settings into account.

   .. versionadded:: 2.5
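
As a sketch of :func:`format_string` with more than one specifier, the values
are passed as a tuple just as with the ``%`` operator. The ``'C'`` locale is
used here so that the result is the same on every system; the format text and
numbers are made up.

```python
import locale

locale.setlocale(locale.LC_ALL, 'C')

# Several specifiers are allowed; grouping is requested but the 'C'
# locale defines no thousands separator, so none is inserted.
text = locale.format_string('%d files, %.2f MB', (1234567, 1234.5),
                            grouping=True)
print(text)   # 1234567 files, 1234.50 MB
```

With a locale that defines ``'thousands_sep'`` and ``'grouping'``, the same
call would insert separators and use that locale's decimal point.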


.. function:: currency(val[, symbol[, grouping[, international]]])

   Formats a number *val* according to the current :const:`LC_MONETARY` settings.

   The returned string includes the currency symbol if *symbol* is true, which is
   the default. If *grouping* is true (which is not the default), grouping is done
   with the value. If *international* is true (which is not the default), the
   international currency symbol is used.

   Note that this function will not work with the ``'C'`` locale, so you have to
   set a locale via :func:`setlocale` first.

   .. versionadded:: 2.5
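
The restriction on the ``'C'`` locale can be handled defensively. In CPython
the failure shows up as a :exc:`ValueError`; the sketch below relies on that
behaviour, and the helper name ``format_money`` is hypothetical.

```python
import locale


def format_money(amount):
    """Format *amount* as currency, or return None if the locale cannot."""
    try:
        return locale.currency(amount, grouping=True)
    except ValueError:
        # Raised under the 'C' locale, which defines no monetary format.
        return None


locale.setlocale(locale.LC_ALL, 'C')
result = format_money(1234.56)   # None: no locale has been selected yet
```

After a successful ``setlocale(LC_ALL, '')`` or similar call, the same helper
would return a string formatted by the selected locale's monetary rules.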


.. function:: str(float)

   Formats a floating point number using the same format as the built-in function
   ``str(float)``, but takes the decimal point into account.


.. function:: atof(string)

   Converts a string to a floating point number, following the :const:`LC_NUMERIC`
   settings.


.. function:: atoi(string)

   Converts a string to an integer, following the :const:`LC_NUMERIC` conventions.
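
The three conversion functions above are meant to be used together, so that
text produced under a locale can be read back under the same locale. A minimal
sketch, using the ``'C'`` locale to keep the values predictable:

```python
import locale

locale.setlocale(locale.LC_NUMERIC, 'C')

value = locale.atof('1234.5')   # string to float
count = locale.atoi('42')       # string to int
text = locale.str(value)        # float to string, locale decimal point

# Converting back and forth under the same locale preserves the value.
round_trip = locale.atof(text)
```

Under a locale with a decimal comma, ``locale.str`` would emit the comma and
``locale.atof`` would accept it, whereas the built-in ``float`` would not.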


.. data:: LC_CTYPE

   .. index:: module: string

   Locale category for the character type functions.  Depending on the settings of
   this category, the functions of module :mod:`string` dealing with case change
   their behaviour.


.. data:: LC_COLLATE

   Locale category for sorting strings.  The functions :func:`strcoll` and
   :func:`strxfrm` of the :mod:`locale` module are affected.


.. data:: LC_TIME

   Locale category for the formatting of time.  The function :func:`time.strftime`
   follows these conventions.


.. data:: LC_MONETARY

   Locale category for formatting of monetary values.  The available options can
   be obtained from the :func:`localeconv` function.


.. data:: LC_MESSAGES

   Locale category for message display. Python currently does not support
   application specific locale-aware messages.  Messages displayed by the operating
   system, like those returned by :func:`os.strerror`, might be affected by this
   category.


.. data:: LC_NUMERIC

   Locale category for formatting numbers.  The functions :func:`.format`,
   :func:`atoi`, :func:`atof` and :func:`.str` of the :mod:`locale` module are
   affected by that category.  All other numeric formatting operations are not
   affected.


.. data:: LC_ALL

   Combination of all locale settings.  If this flag is used when the locale is
   changed, setting the locale for all categories is attempted. If that fails for
   any category, no category is changed at all.  When the locale is retrieved using
   this flag, a string indicating the setting for all categories is returned. This
   string can be later used to restore the settings.
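
The save-and-restore use of :const:`LC_ALL` mentioned above might look like
the following sketch. It assumes a single-threaded program, because the
temporary change is visible to every thread while it is in effect.

```python
import locale

# Query: one string describing the setting of every category.
saved = locale.setlocale(locale.LC_ALL)
try:
    locale.setlocale(locale.LC_ALL, 'C')     # temporary change
    # ... locale-sensitive work would go here ...
finally:
    locale.setlocale(locale.LC_ALL, saved)   # restore every category

restored = locale.setlocale(locale.LC_ALL)
```

The saved string should be treated as opaque: its format differs between
platforms and it is only meant to be passed back to :func:`setlocale`.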


.. data:: CHAR_MAX

   This is a symbolic constant used for different values returned by
   :func:`localeconv`.


Example::

   >>> import locale
   >>> loc = locale.getlocale()  # get current locale
   >>> # use German locale; name might vary with platform
   >>> locale.setlocale(locale.LC_ALL, 'de_DE')
   >>> locale.strcoll('f\xe4n', 'foo')  # compare a string containing an umlaut
   >>> locale.setlocale(locale.LC_ALL, '')   # use user's preferred locale
   >>> locale.setlocale(locale.LC_ALL, 'C')  # use default (C) locale
   >>> locale.setlocale(locale.LC_ALL, loc)  # restore saved locale


Background, details, hints, tips and caveats
--------------------------------------------

The C standard defines the locale as a program-wide property that may be
relatively expensive to change.  On top of that, some implementations are broken
in such a way that frequent locale changes may cause core dumps.  This makes the
locale somewhat painful to use correctly.

Initially, when a program is started, the locale is the ``C`` locale, no matter
what the user's preferred locale is.  The program must explicitly say that it
wants the user's preferred locale settings by calling ``setlocale(LC_ALL, '')``.

It is generally a bad idea to call :func:`setlocale` in some library routine,
since as a side effect it affects the entire program.  Saving and restoring it
is almost as bad: it is expensive and affects other threads that happen to run
before the settings have been restored.

If, when coding a module for general use, you need a locale-independent version
of an operation that is affected by the locale (such as :func:`string.lower`, or
certain formats used with :func:`time.strftime`), you will have to find a way to
do it without using the standard library routine.  Even better is convincing
yourself that using locale settings is okay.  Only as a last resort should you
document that your module is not compatible with non-\ ``C`` locale settings.

.. index:: module: string

The case conversion functions in the :mod:`string` module are affected by the
locale settings.  When a call to the :func:`setlocale` function changes the
:const:`LC_CTYPE` settings, the variables ``string.lowercase``,
``string.uppercase`` and ``string.letters`` are recalculated.  Note that code
that uses these variables through ':keyword:`from` ... :keyword:`import` ...',
e.g. ``from string import letters``, is not affected by subsequent
:func:`setlocale` calls.

The only way to perform numeric operations according to the locale is to use the
special functions defined by this module: :func:`atof`, :func:`atoi`,
:func:`.format`, :func:`.str`.


.. _embedding-locale:

For extension writers and programs that embed Python
----------------------------------------------------

Extension modules should never call :func:`setlocale`, except to find out what
the current locale is.  But since the return value can only be used portably to
restore it, that is not very useful (except perhaps to find out whether or not
the locale is ``C``).

When Python code uses the :mod:`locale` module to change the locale, this also
affects the embedding application.  If the embedding application doesn't want
this to happen, it should remove the :mod:`_locale` extension module (which does
all the work) from the table of built-in modules in the :file:`config.c` file,
and make sure that the :mod:`_locale` module is not accessible as a shared
library.


.. _locale-gettext:

Access to message catalogs
--------------------------

The locale module exposes the C library's gettext interface on systems that
provide this interface.  It consists of the functions :func:`gettext`,
:func:`dgettext`, :func:`dcgettext`, :func:`textdomain`, :func:`bindtextdomain`,
and :func:`bind_textdomain_codeset`.  These are similar to the same functions in
the :mod:`gettext` module, but use the C library's binary format for message
catalogs, and the C library's search algorithms for locating message catalogs.

Python applications should normally find no need to invoke these functions, and
should use :mod:`gettext` instead.  A known exception to this rule are
applications that link with additional C libraries which internally invoke
:c:func:`gettext` or :c:func:`dcgettext`.  For these applications, it may be
necessary to bind the text domain, so that the libraries can properly locate
their message catalogs.
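
For the exceptional case described above, binding a text domain might be
sketched as follows. The helper name ``bind_c_catalogs``, the domain
``'examplelib'`` and the directory are all placeholders rather than real
values, and the ``hasattr`` check reflects that the interface exists only on
systems whose C library provides it.

```python
import locale


def bind_c_catalogs(domain, directory):
    """Tell the C library where the message catalogs of *domain* live.

    Returns the bound directory, or None where the interface is missing.
    """
    if not hasattr(locale, 'bindtextdomain'):
        return None
    return locale.bindtextdomain(domain, directory)


# Placeholders for a real C library's text domain and catalog directory.
bound = bind_c_catalogs('examplelib', '/usr/share/locale')
```

The call affects only lookups made by C code through the C library; Python
code using the :mod:`gettext` module is configured separately.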
    569