      1 .. highlightlang:: c
      2 
      3 .. _unicodeobjects:
      4 
      5 Unicode Objects and Codecs
      6 --------------------------
      7 
.. sectionauthor:: Marc-Andre Lemburg <mal@lemburg.com>
      9 
     10 Unicode Objects
     11 ^^^^^^^^^^^^^^^
     12 
     13 
     14 Unicode Type
     15 """"""""""""
     16 
     17 These are the basic Unicode object types used for the Unicode implementation in
     18 Python:
     19 
     20 
     21 .. c:type:: Py_UNICODE
     22 
     23    This type represents the storage type which is used by Python internally as
     24    basis for holding Unicode ordinals.  Python's default builds use a 16-bit type
     25    for :c:type:`Py_UNICODE` and store Unicode values internally as UCS2. It is also
     26    possible to build a UCS4 version of Python (most recent Linux distributions come
     27    with UCS4 builds of Python). These builds then use a 32-bit type for
     28    :c:type:`Py_UNICODE` and store Unicode data internally as UCS4. On platforms
     29    where :c:type:`wchar_t` is available and compatible with the chosen Python
     30    Unicode build variant, :c:type:`Py_UNICODE` is a typedef alias for
     31    :c:type:`wchar_t` to enhance native platform compatibility. On all other
     32    platforms, :c:type:`Py_UNICODE` is a typedef alias for either :c:type:`unsigned
     33    short` (UCS2) or :c:type:`unsigned long` (UCS4).
     34 
     35 Note that UCS2 and UCS4 Python builds are not binary compatible. Please keep
     36 this in mind when writing extensions or interfaces.
     37 
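To make the incompatibility concrete, the following standalone sketch (plain C,
no Python headers) models how many storage units and bytes the same text needs
when the unit type is 16 bits wide versus 32 bits wide. The names
``units_needed`` and ``buffer_bytes`` are hypothetical helpers introduced for
illustration only; they are not part of the Python API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of storage units one code point occupies
   when the unit type is unit_bytes wide (2 models a UCS2 build, 4 models
   a UCS4 build).  Code points above U+FFFF need two 16-bit units. */
size_t units_needed(uint32_t code_point, size_t unit_bytes)
{
    assert(unit_bytes == 2 || unit_bytes == 4);
    assert(code_point <= 0x10FFFF);
    if (unit_bytes == 2 && code_point > 0xFFFF)
        return 2;   /* stored as a surrogate pair */
    return 1;
}

/* Hypothetical helper: bytes needed to store n code points. */
size_t buffer_bytes(const uint32_t *code_points, size_t n, size_t unit_bytes)
{
    size_t i, units = 0;
    for (i = 0; i < n; i++)
        units += units_needed(code_points[i], unit_bytes);
    return units * unit_bytes;
}
```

Because both the unit count and the unit width differ between the two models,
code compiled against one layout cannot safely read a buffer produced under the
other.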
     38 
     39 .. c:type:: PyUnicodeObject
     40 
     41    This subtype of :c:type:`PyObject` represents a Python Unicode object.
     42 
     43 
     44 .. c:var:: PyTypeObject PyUnicode_Type
     45 
     46    This instance of :c:type:`PyTypeObject` represents the Python Unicode type.  It
     47    is exposed to Python code as ``unicode`` and ``types.UnicodeType``.
     48 
     49 The following APIs are really C macros and can be used to do fast checks and to
     50 access internal read-only data of Unicode objects:
     51 
     52 
     53 .. c:function:: int PyUnicode_Check(PyObject *o)
     54 
     55    Return true if the object *o* is a Unicode object or an instance of a Unicode
     56    subtype.
     57 
     58    .. versionchanged:: 2.2
     59       Allowed subtypes to be accepted.
     60 
     61 
     62 .. c:function:: int PyUnicode_CheckExact(PyObject *o)
     63 
     64    Return true if the object *o* is a Unicode object, but not an instance of a
     65    subtype.
     66 
     67    .. versionadded:: 2.2
     68 
     69 
     70 .. c:function:: Py_ssize_t PyUnicode_GET_SIZE(PyObject *o)
     71 
     72    Return the size of the object.  *o* has to be a :c:type:`PyUnicodeObject` (not
     73    checked).
     74 
     75    .. versionchanged:: 2.5
     76       This function returned an :c:type:`int` type. This might require changes
     77       in your code for properly supporting 64-bit systems.
     78 
     79 
     80 .. c:function:: Py_ssize_t PyUnicode_GET_DATA_SIZE(PyObject *o)
     81 
     82    Return the size of the object's internal buffer in bytes.  *o* has to be a
     83    :c:type:`PyUnicodeObject` (not checked).
     84 
     85    .. versionchanged:: 2.5
     86       This function returned an :c:type:`int` type. This might require changes
     87       in your code for properly supporting 64-bit systems.
     88 
     89 
     90 .. c:function:: Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o)
     91 
     92    Return a pointer to the internal :c:type:`Py_UNICODE` buffer of the object.  *o*
     93    has to be a :c:type:`PyUnicodeObject` (not checked).
     94 
     95 
     96 .. c:function:: const char* PyUnicode_AS_DATA(PyObject *o)
     97 
     98    Return a pointer to the internal buffer of the object. *o* has to be a
     99    :c:type:`PyUnicodeObject` (not checked).
    100 
    101 
    102 .. c:function:: int PyUnicode_ClearFreeList()
    103 
    104    Clear the free list. Return the total number of freed items.
    105 
    106    .. versionadded:: 2.6
    107 
    108 
    109 Unicode Character Properties
    110 """"""""""""""""""""""""""""
    111 
    112 Unicode provides many different character properties. The most often needed ones
    113 are available through these macros which are mapped to C functions depending on
    114 the Python configuration.
    115 
    116 
    117 .. c:function:: int Py_UNICODE_ISSPACE(Py_UNICODE ch)
    118 
    119    Return ``1`` or ``0`` depending on whether *ch* is a whitespace character.
    120 
    121 
    122 .. c:function:: int Py_UNICODE_ISLOWER(Py_UNICODE ch)
    123 
    124    Return ``1`` or ``0`` depending on whether *ch* is a lowercase character.
    125 
    126 
    127 .. c:function:: int Py_UNICODE_ISUPPER(Py_UNICODE ch)
    128 
    129    Return ``1`` or ``0`` depending on whether *ch* is an uppercase character.
    130 
    131 
    132 .. c:function:: int Py_UNICODE_ISTITLE(Py_UNICODE ch)
    133 
    134    Return ``1`` or ``0`` depending on whether *ch* is a titlecase character.
    135 
    136 
    137 .. c:function:: int Py_UNICODE_ISLINEBREAK(Py_UNICODE ch)
    138 
    139    Return ``1`` or ``0`` depending on whether *ch* is a linebreak character.
    140 
    141 
    142 .. c:function:: int Py_UNICODE_ISDECIMAL(Py_UNICODE ch)
    143 
    144    Return ``1`` or ``0`` depending on whether *ch* is a decimal character.
    145 
    146 
    147 .. c:function:: int Py_UNICODE_ISDIGIT(Py_UNICODE ch)
    148 
    149    Return ``1`` or ``0`` depending on whether *ch* is a digit character.
    150 
    151 
    152 .. c:function:: int Py_UNICODE_ISNUMERIC(Py_UNICODE ch)
    153 
    154    Return ``1`` or ``0`` depending on whether *ch* is a numeric character.
    155 
    156 
    157 .. c:function:: int Py_UNICODE_ISALPHA(Py_UNICODE ch)
    158 
    159    Return ``1`` or ``0`` depending on whether *ch* is an alphabetic character.
    160 
    161 
    162 .. c:function:: int Py_UNICODE_ISALNUM(Py_UNICODE ch)
    163 
    164    Return ``1`` or ``0`` depending on whether *ch* is an alphanumeric character.
    165 
    166 These APIs can be used for fast direct character conversions:
    167 
    168 
    169 .. c:function:: Py_UNICODE Py_UNICODE_TOLOWER(Py_UNICODE ch)
    170 
    171    Return the character *ch* converted to lower case.
    172 
    173 
    174 .. c:function:: Py_UNICODE Py_UNICODE_TOUPPER(Py_UNICODE ch)
    175 
    176    Return the character *ch* converted to upper case.
    177 
    178 
    179 .. c:function:: Py_UNICODE Py_UNICODE_TOTITLE(Py_UNICODE ch)
    180 
    181    Return the character *ch* converted to title case.
    182 
    183 
    184 .. c:function:: int Py_UNICODE_TODECIMAL(Py_UNICODE ch)
    185 
    186    Return the character *ch* converted to a decimal positive integer.  Return
    187    ``-1`` if this is not possible.  This macro does not raise exceptions.
    188 
    189 
    190 .. c:function:: int Py_UNICODE_TODIGIT(Py_UNICODE ch)
    191 
    192    Return the character *ch* converted to a single digit integer. Return ``-1`` if
    193    this is not possible.  This macro does not raise exceptions.
    194 
    195 
    196 .. c:function:: double Py_UNICODE_TONUMERIC(Py_UNICODE ch)
    197 
    198    Return the character *ch* converted to a double. Return ``-1.0`` if this is not
    199    possible.  This macro does not raise exceptions.
    200 
    201 
    202 Plain Py_UNICODE
    203 """"""""""""""""
    204 
    205 To create Unicode objects and access their basic sequence properties, use these
    206 APIs:
    207 
    208 
    209 .. c:function:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)
    210 
    211    Create a Unicode object from the Py_UNICODE buffer *u* of the given size. *u*
    212    may be *NULL* which causes the contents to be undefined. It is the user's
    213    responsibility to fill in the needed data.  The buffer is copied into the new
    214    object. If the buffer is not *NULL*, the return value might be a shared object.
    215    Therefore, modification of the resulting Unicode object is only allowed when *u*
    216    is *NULL*.
    217 
    218    .. versionchanged:: 2.5
    219       This function used an :c:type:`int` type for *size*. This might require
    220       changes in your code for properly supporting 64-bit systems.
    221 
    222 
    223 .. c:function:: PyObject* PyUnicode_FromStringAndSize(const char *u, Py_ssize_t size)
    224 
    225    Create a Unicode object from the char buffer *u*.  The bytes will be interpreted
    226    as being UTF-8 encoded.  *u* may also be *NULL* which
    227    causes the contents to be undefined. It is the user's responsibility to fill in
    228    the needed data.  The buffer is copied into the new object. If the buffer is not
    229    *NULL*, the return value might be a shared object. Therefore, modification of
    230    the resulting Unicode object is only allowed when *u* is *NULL*.
    231 
    232    .. versionadded:: 2.6
    233 
    234 
    235 .. c:function:: PyObject *PyUnicode_FromString(const char *u)
    236 
    237    Create a Unicode object from a UTF-8 encoded null-terminated char buffer
    238    *u*.
    239 
    240    .. versionadded:: 2.6
    241 
    242 
    243 .. c:function:: PyObject* PyUnicode_FromFormat(const char *format, ...)
    244 
    245    Take a C :c:func:`printf`\ -style *format* string and a variable number of
    246    arguments, calculate the size of the resulting Python unicode string and return
    247    a string with the values formatted into it.  The variable arguments must be C
    248    types and must correspond exactly to the format characters in the *format*
    249    string.  The following format characters are allowed:
    250 
    251    .. % The descriptions for %zd and %zu are wrong, but the truth is complicated
    252    .. % because not all compilers support the %z width modifier -- we fake it
    253    .. % when necessary via interpolating PY_FORMAT_SIZE_T.
    254 
    255    .. tabularcolumns:: |l|l|L|
    256 
    257    +-------------------+---------------------+--------------------------------+
    258    | Format Characters | Type                | Comment                        |
    259    +===================+=====================+================================+
    260    | :attr:`%%`        | *n/a*               | The literal % character.       |
    261    +-------------------+---------------------+--------------------------------+
    262    | :attr:`%c`        | int                 | A single character,            |
    263    |                   |                     | represented as a C int.        |
    264    +-------------------+---------------------+--------------------------------+
    265    | :attr:`%d`        | int                 | Exactly equivalent to          |
    266    |                   |                     | ``printf("%d")``.              |
    267    +-------------------+---------------------+--------------------------------+
    268    | :attr:`%u`        | unsigned int        | Exactly equivalent to          |
    269    |                   |                     | ``printf("%u")``.              |
    270    +-------------------+---------------------+--------------------------------+
    271    | :attr:`%ld`       | long                | Exactly equivalent to          |
    272    |                   |                     | ``printf("%ld")``.             |
    273    +-------------------+---------------------+--------------------------------+
    274    | :attr:`%lu`       | unsigned long       | Exactly equivalent to          |
    275    |                   |                     | ``printf("%lu")``.             |
    276    +-------------------+---------------------+--------------------------------+
    277    | :attr:`%zd`       | Py_ssize_t          | Exactly equivalent to          |
    278    |                   |                     | ``printf("%zd")``.             |
    279    +-------------------+---------------------+--------------------------------+
    280    | :attr:`%zu`       | size_t              | Exactly equivalent to          |
    281    |                   |                     | ``printf("%zu")``.             |
    282    +-------------------+---------------------+--------------------------------+
    283    | :attr:`%i`        | int                 | Exactly equivalent to          |
    284    |                   |                     | ``printf("%i")``.              |
    285    +-------------------+---------------------+--------------------------------+
    286    | :attr:`%x`        | int                 | Exactly equivalent to          |
    287    |                   |                     | ``printf("%x")``.              |
    288    +-------------------+---------------------+--------------------------------+
    289    | :attr:`%s`        | char\*              | A null-terminated C character  |
    290    |                   |                     | array.                         |
    291    +-------------------+---------------------+--------------------------------+
    292    | :attr:`%p`        | void\*              | The hex representation of a C  |
    293    |                   |                     | pointer. Mostly equivalent to  |
    294    |                   |                     | ``printf("%p")`` except that   |
    295    |                   |                     | it is guaranteed to start with |
    296    |                   |                     | the literal ``0x`` regardless  |
    297    |                   |                     | of what the platform's         |
    298    |                   |                     | ``printf`` yields.             |
    299    +-------------------+---------------------+--------------------------------+
    300    | :attr:`%U`        | PyObject\*          | A unicode object.              |
    301    +-------------------+---------------------+--------------------------------+
    302    | :attr:`%V`        | PyObject\*, char \* | A unicode object (which may be |
    303    |                   |                     | *NULL*) and a null-terminated  |
    304    |                   |                     | C character array as a second  |
    305    |                   |                     | parameter (which will be used, |
    306    |                   |                     | if the first parameter is      |
    307    |                   |                     | *NULL*).                       |
    308    +-------------------+---------------------+--------------------------------+
    309    | :attr:`%S`        | PyObject\*          | The result of calling          |
    310    |                   |                     | :func:`PyObject_Unicode`.      |
    311    +-------------------+---------------------+--------------------------------+
    312    | :attr:`%R`        | PyObject\*          | The result of calling          |
    313    |                   |                     | :func:`PyObject_Repr`.         |
    314    +-------------------+---------------------+--------------------------------+
    315 
    316    An unrecognized format character causes all the rest of the format string to be
    317    copied as-is to the result string, and any extra arguments discarded.
    318 
    319    .. versionadded:: 2.6
    320 
    321 
    322 .. c:function:: PyObject* PyUnicode_FromFormatV(const char *format, va_list vargs)
    323 
   Identical to :c:func:`PyUnicode_FromFormat` except that it takes exactly two
   arguments: the *format* string and a ``va_list``.
    326 
    327    .. versionadded:: 2.6
    328 
    329 
    330 .. c:function:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode)
    331 
    332    Return a read-only pointer to the Unicode object's internal
    333    :c:type:`Py_UNICODE` buffer, *NULL* if *unicode* is not a Unicode object.
    334    Note that the resulting :c:type:`Py_UNICODE*` string may contain embedded
    335    null characters, which would cause the string to be truncated when used in
    336    most C functions.
    337 
    338 
    339 .. c:function:: Py_ssize_t PyUnicode_GetSize(PyObject *unicode)
    340 
    341    Return the length of the Unicode object.
    342 
    343    .. versionchanged:: 2.5
    344       This function returned an :c:type:`int` type. This might require changes
    345       in your code for properly supporting 64-bit systems.
    346 
    347 
    348 .. c:function:: PyObject* PyUnicode_FromEncodedObject(PyObject *obj, const char *encoding, const char *errors)
    349 
    350    Coerce an encoded object *obj* to a Unicode object and return a reference with
    351    incremented refcount.
    352 
    353    String and other char buffer compatible objects are decoded according to the
    354    given encoding and using the error handling defined by errors.  Both can be
    355    *NULL* to have the interface use the default values (see the next section for
    356    details).
    357 
    358    All other objects, including Unicode objects, cause a :exc:`TypeError` to be
    359    set.
    360 
    361    The API returns *NULL* if there was an error.  The caller is responsible for
    362    decref'ing the returned objects.
    363 
    364 
    365 .. c:function:: PyObject* PyUnicode_FromObject(PyObject *obj)
    366 
    367    Shortcut for ``PyUnicode_FromEncodedObject(obj, NULL, "strict")`` which is used
    368    throughout the interpreter whenever coercion to Unicode is needed.
    369 
    370 If the platform supports :c:type:`wchar_t` and provides a header file wchar.h,
    371 Python can interface directly to this type using the following functions.
    372 Support is optimized if Python's own :c:type:`Py_UNICODE` type is identical to
    373 the system's :c:type:`wchar_t`.
    374 
    375 
    376 wchar_t Support
    377 """""""""""""""
    378 
    379 :c:type:`wchar_t` support for platforms which support it:
    380 
    381 .. c:function:: PyObject* PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)
    382 
    383    Create a Unicode object from the :c:type:`wchar_t` buffer *w* of the given *size*.
    384    Return *NULL* on failure.
    385 
    386    .. versionchanged:: 2.5
    387       This function used an :c:type:`int` type for *size*. This might require
    388       changes in your code for properly supporting 64-bit systems.
    389 
    390 
    391 .. c:function:: Py_ssize_t PyUnicode_AsWideChar(PyUnicodeObject *unicode, wchar_t *w, Py_ssize_t size)
    392 
    393    Copy the Unicode object contents into the :c:type:`wchar_t` buffer *w*.  At most
    394    *size* :c:type:`wchar_t` characters are copied (excluding a possibly trailing
    395    0-termination character).  Return the number of :c:type:`wchar_t` characters
    396    copied or ``-1`` in case of an error.  Note that the resulting :c:type:`wchar_t`
    397    string may or may not be 0-terminated.  It is the responsibility of the caller
    398    to make sure that the :c:type:`wchar_t` string is 0-terminated in case this is
    399    required by the application. Also, note that the :c:type:`wchar_t*` string
    400    might contain null characters, which would cause the string to be truncated
    401    when used with most C functions.
    402 
    403    .. versionchanged:: 2.5
    404       This function returned an :c:type:`int` type and used an :c:type:`int`
    405       type for *size*. This might require changes in your code for properly
    406       supporting 64-bit systems.
    407 
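The termination caveat above is easy to overlook. The sketch below is a
standalone model in plain C of the documented contract, not the real function:
``copy_wide`` is a hypothetical stand-in that copies at most *size* characters
and terminates only when there is room, and ``copy_wide_terminated`` shows one
defensive caller pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for the documented contract: copy at most `size`
   characters from src (src_len characters long) into dst and return the
   number copied.  A terminator is written only if there is room for it. */
size_t copy_wide(const wchar_t *src, size_t src_len,
                 wchar_t *dst, size_t size)
{
    size_t n = src_len < size ? src_len : size;
    assert(src != NULL && dst != NULL);
    memcpy(dst, src, n * sizeof(wchar_t));
    if (n < size)
        dst[n] = L'\0';     /* room left: result is terminated */
    return n;               /* otherwise the result is NOT terminated */
}

/* Defensive caller pattern: reserve one slot of the buffer and terminate
   explicitly, so the result is usable with functions such as wcslen(). */
size_t copy_wide_terminated(const wchar_t *src, size_t src_len,
                            wchar_t *dst, size_t dst_capacity)
{
    size_t n;
    assert(dst_capacity > 0);
    n = copy_wide(src, src_len, dst, dst_capacity - 1);
    dst[n] = L'\0';
    return n;
}
```

Even with explicit termination, a string that contains embedded null
characters still appears shorter to functions that stop at the first null, so
the returned count should be kept alongside the buffer.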
    408 
    409 .. _builtincodecs:
    410 
    411 Built-in Codecs
    412 ^^^^^^^^^^^^^^^
    413 
    414 Python provides a set of built-in codecs which are written in C for speed. All of
    415 these codecs are directly usable via the following functions.
    416 
Many of the following APIs take two arguments, *encoding* and *errors*, which
have the same semantics as those of the built-in :func:`unicode` Unicode
object constructor.

Setting *encoding* to *NULL* selects the default encoding, which is ASCII.
File system calls should use :c:data:`Py_FileSystemDefaultEncoding` as the
encoding for file names. This variable should be treated as read-only: on
some systems it is a pointer to a static string, on others it changes at
run time (such as when the application invokes ``setlocale``).

Error handling is set by *errors*, which may also be *NULL*, meaning to use
the default handling defined for the codec.  Default error handling for all
built-in codecs is "strict" (:exc:`ValueError` is raised).

The codecs all use a similar interface.  For simplicity, only deviations from
the following generic ones are documented.
    433 
    434 
    435 Generic Codecs
    436 """"""""""""""
    437 
    438 These are the generic codec APIs:
    439 
    440 
    441 .. c:function:: PyObject* PyUnicode_Decode(const char *s, Py_ssize_t size, const char *encoding, const char *errors)
    442 
    443    Create a Unicode object by decoding *size* bytes of the encoded string *s*.
    444    *encoding* and *errors* have the same meaning as the parameters of the same name
    445    in the :func:`unicode` built-in function.  The codec to be used is looked up
    446    using the Python codec registry.  Return *NULL* if an exception was raised by
    447    the codec.
    448 
    449    .. versionchanged:: 2.5
    450       This function used an :c:type:`int` type for *size*. This might require
    451       changes in your code for properly supporting 64-bit systems.
    452 
    453 
    454 .. c:function:: PyObject* PyUnicode_Encode(const Py_UNICODE *s, Py_ssize_t size, const char *encoding, const char *errors)
    455 
    456    Encode the :c:type:`Py_UNICODE` buffer *s* of the given *size* and return a Python
    457    string object.  *encoding* and *errors* have the same meaning as the parameters
    458    of the same name in the Unicode :meth:`~unicode.encode` method.  The codec
    459    to be used is looked up using the Python codec registry.  Return *NULL* if
    460    an exception was raised by the codec.
    461 
    462    .. versionchanged:: 2.5
    463       This function used an :c:type:`int` type for *size*. This might require
    464       changes in your code for properly supporting 64-bit systems.
    465 
    466 
    467 .. c:function:: PyObject* PyUnicode_AsEncodedString(PyObject *unicode, const char *encoding, const char *errors)
    468 
   Encode a Unicode object and return the result as a Python string object.
   *encoding* and *errors* have the same meaning as the parameters of the same name
   in the Unicode :meth:`~unicode.encode` method. The codec to be used is looked up
   using the Python codec registry. Return *NULL* if an exception was raised by the
   codec.
    474 
    475 
    476 UTF-8 Codecs
    477 """"""""""""
    478 
    479 These are the UTF-8 codec APIs:
    480 
    481 
    482 .. c:function:: PyObject* PyUnicode_DecodeUTF8(const char *s, Py_ssize_t size, const char *errors)
    483 
    484    Create a Unicode object by decoding *size* bytes of the UTF-8 encoded string
    485    *s*. Return *NULL* if an exception was raised by the codec.
    486 
    487    .. versionchanged:: 2.5
    488       This function used an :c:type:`int` type for *size*. This might require
    489       changes in your code for properly supporting 64-bit systems.
    490 
    491 
    492 .. c:function:: PyObject* PyUnicode_DecodeUTF8Stateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed)
    493 
    494    If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF8`. If
    495    *consumed* is not *NULL*, trailing incomplete UTF-8 byte sequences will not be
    496    treated as an error. Those bytes will not be decoded and the number of bytes
    497    that have been decoded will be stored in *consumed*.
    498 
    499    .. versionadded:: 2.4
    500 
    501    .. versionchanged:: 2.5
    502       This function used an :c:type:`int` type for *size*. This might require
    503       changes in your code for properly supporting 64-bit systems.
    504 
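The idea behind *consumed* can be modelled without the Python API. The
following standalone sketch computes how many leading bytes of a buffer form
complete UTF-8 sequences, which is the kind of value a stateful decoder would
report. ``utf8_sequence_length`` and ``utf8_complete_prefix`` are hypothetical
helpers; the model looks only at lead bytes and does not validate continuation
bytes, so it is an illustration rather than a description of CPython's decoder.

```c
#include <assert.h>
#include <stddef.h>

/* Expected length of a UTF-8 sequence judging by its lead byte.
   Invalid lead bytes are reported as length 1 so that a real decoder
   can apply its error handling to them. */
size_t utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Hypothetical helper: number of leading bytes of s that form complete
   sequences.  The remaining (size - result) bytes are an incomplete
   trailing sequence that a stateful decoder would leave undecoded. */
size_t utf8_complete_prefix(const unsigned char *s, size_t size)
{
    size_t i = 0;
    assert(s != NULL || size == 0);
    while (i < size) {
        size_t len = utf8_sequence_length(s[i]);
        if (len > size - i)
            break;          /* sequence continues past the buffer */
        i += len;
    }
    return i;
}
```

A caller reading data in chunks would keep the undecoded tail and prepend it to
the next chunk before decoding again.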
    505 
    506 .. c:function:: PyObject* PyUnicode_EncodeUTF8(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
    507 
    508    Encode the :c:type:`Py_UNICODE` buffer *s* of the given *size* using UTF-8 and return a
    509    Python string object.  Return *NULL* if an exception was raised by the codec.
    510 
    511    .. versionchanged:: 2.5
    512       This function used an :c:type:`int` type for *size*. This might require
    513       changes in your code for properly supporting 64-bit systems.
    514 
    515 
    516 .. c:function:: PyObject* PyUnicode_AsUTF8String(PyObject *unicode)
    517 
   Encode a Unicode object using UTF-8 and return the result as a Python string
   object.  Error handling is "strict".  Return *NULL* if an exception was raised
   by the codec.
    521 
    522 
    523 UTF-32 Codecs
    524 """""""""""""
    525 
    526 These are the UTF-32 codec APIs:
    527 
    528 
    529 .. c:function:: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, const char *errors, int *byteorder)
    530 
    531    Decode *size* bytes from a UTF-32 encoded buffer string and return the
    532    corresponding Unicode object.  *errors* (if non-*NULL*) defines the error
    533    handling. It defaults to "strict".
    534 
    535    If *byteorder* is non-*NULL*, the decoder starts decoding using the given byte
    536    order::
    537 
    538       *byteorder == -1: little endian
    539       *byteorder == 0:  native order
    540       *byteorder == 1:  big endian
    541 
    542    If ``*byteorder`` is zero, and the first four bytes of the input data are a
    543    byte order mark (BOM), the decoder switches to this byte order and the BOM is
    544    not copied into the resulting Unicode string.  If ``*byteorder`` is ``-1`` or
    545    ``1``, any byte order mark is copied to the output.
    546 
    547    After completion, *\*byteorder* is set to the current byte order at the end
    548    of input data.
    549 
   In a narrow build, code points outside the BMP will be decoded as surrogate pairs.
    551 
    552    If *byteorder* is *NULL*, the codec starts in native order mode.
    553 
    554    Return *NULL* if an exception was raised by the codec.
    555 
    556    .. versionadded:: 2.6
    557 
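The byte order rule described above can be sketched as a small standalone
function. ``utf32_sniff_bom`` is a hypothetical helper written in plain C for
illustration; it models only the BOM handling, not the decoding itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper modelling the BOM rule.  Returns the number of
   leading bytes to skip (4 if a BOM was consumed, otherwise 0) and
   updates *byteorder only when it starts out as 0. */
size_t utf32_sniff_bom(const unsigned char *s, size_t size, int *byteorder)
{
    assert(s != NULL && byteorder != NULL);
    if (*byteorder != 0)
        return 0;   /* explicit order: any BOM is kept as data */
    if (size < 4)
        return 0;   /* too short to hold a BOM */
    if (s[0] == 0xFF && s[1] == 0xFE && s[2] == 0x00 && s[3] == 0x00) {
        *byteorder = -1;    /* little endian */
        return 4;
    }
    if (s[0] == 0x00 && s[1] == 0x00 && s[2] == 0xFE && s[3] == 0xFF) {
        *byteorder = 1;     /* big endian */
        return 4;
    }
    return 0;       /* no BOM: stay in native order */
}
```

Passing a variable initialised to ``0`` and reading it back afterwards mirrors
how a caller of the real decoder would learn which byte order was detected.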
    558 
    559 .. c:function:: PyObject* PyUnicode_DecodeUTF32Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed)
    560 
    561    If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF32`. If
    562    *consumed* is not *NULL*, :c:func:`PyUnicode_DecodeUTF32Stateful` will not treat
    563    trailing incomplete UTF-32 byte sequences (such as a number of bytes not divisible
    564    by four) as an error. Those bytes will not be decoded and the number of bytes
    565    that have been decoded will be stored in *consumed*.
    566 
    567    .. versionadded:: 2.6
    568 
    569 
    570 .. c:function:: PyObject* PyUnicode_EncodeUTF32(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)
    571 
   Return a Python string object holding the UTF-32 encoded value of the Unicode
   data in *s*.  Output is written according to the following byte order::
    574 
    575       byteorder == -1: little endian
    576       byteorder == 0:  native byte order (writes a BOM mark)
    577       byteorder == 1:  big endian
    578 
    579    If byteorder is ``0``, the output string will always start with the Unicode BOM
    580    mark (U+FEFF). In the other two modes, no BOM mark is prepended.
    581 
    582    If *Py_UNICODE_WIDE* is not defined, surrogate pairs will be output
    583    as a single code point.
    584 
    585    Return *NULL* if an exception was raised by the codec.
    586 
    587    .. versionadded:: 2.6
    588 
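The remark about surrogate pairs refers to the standard UTF-16 arithmetic. As a
standalone illustration, the hypothetical helper ``join_surrogates`` below
combines a high and a low surrogate, as they would sit in a 16-bit buffer, into
the single code point that a UTF-32 encoder writes out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: combine a high/low surrogate pair into one code
   point in the range U+10000..U+10FFFF. */
uint32_t join_surrogates(uint16_t high, uint16_t low)
{
    assert(high >= 0xD800 && high <= 0xDBFF);
    assert(low >= 0xDC00 && low <= 0xDFFF);
    return 0x10000u
           + ((uint32_t)(high - 0xD800) << 10)
           + (uint32_t)(low - 0xDC00);
}
```

For example, the pair ``0xD83D``, ``0xDE00`` corresponds to the code point
U+1F600.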
    589 
    590 .. c:function:: PyObject* PyUnicode_AsUTF32String(PyObject *unicode)
    591 
    592    Return a Python string using the UTF-32 encoding in native byte order. The
    593    string always starts with a BOM mark.  Error handling is "strict".  Return
    594    *NULL* if an exception was raised by the codec.
    595 
    596    .. versionadded:: 2.6
    597 
    598 
    599 UTF-16 Codecs
    600 """""""""""""
    601 
    602 These are the UTF-16 codec APIs:
    603 
    604 
    605 .. c:function:: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, const char *errors, int *byteorder)
    606 
    607    Decode *size* bytes from a UTF-16 encoded buffer string and return the
    608    corresponding Unicode object.  *errors* (if non-*NULL*) defines the error
    609    handling. It defaults to "strict".
    610 
    611    If *byteorder* is non-*NULL*, the decoder starts decoding using the given byte
    612    order::
    613 
    614       *byteorder == -1: little endian
    615       *byteorder == 0:  native order
    616       *byteorder == 1:  big endian
    617 
    618    If ``*byteorder`` is zero, and the first two bytes of the input data are a
    619    byte order mark (BOM), the decoder switches to this byte order and the BOM is
    620    not copied into the resulting Unicode string.  If ``*byteorder`` is ``-1`` or
    621    ``1``, any byte order mark is copied to the output (where it will result in
    622    either a ``\ufeff`` or a ``\ufffe`` character).
    623 
    624    After completion, *\*byteorder* is set to the current byte order at the end
    625    of input data.
    626 
    627    If *byteorder* is *NULL*, the codec starts in native order mode.
    628 
    629    Return *NULL* if an exception was raised by the codec.
    630 
    631    .. versionchanged:: 2.5
    632       This function used an :c:type:`int` type for *size*. This might require
    633       changes in your code for properly supporting 64-bit systems.
    634 
    635 
    636 .. c:function:: PyObject* PyUnicode_DecodeUTF16Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed)
    637 
    638    If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF16`. If
    639    *consumed* is not *NULL*, :c:func:`PyUnicode_DecodeUTF16Stateful` will not treat
    640    trailing incomplete UTF-16 byte sequences (such as an odd number of bytes or a
    641    split surrogate pair) as an error. Those bytes will not be decoded and the
    642    number of bytes that have been decoded will be stored in *consumed*.
    643 
    644    .. versionadded:: 2.4
    645 
    646    .. versionchanged:: 2.5
    647       This function used an :c:type:`int` type for *size* and an :c:type:`int *`
    648       type for *consumed*. This might require changes in your code for
    649       properly supporting 64-bit systems.
    650 
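Both kinds of incomplete input mentioned above, an odd trailing byte and a
split surrogate pair, can be detected by looking at the end of the buffer. The
sketch below is a standalone model in plain C; ``utf16_complete_prefix`` is a
hypothetical helper and makes no claim about how CPython implements the check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of leading bytes of a UTF-16 buffer that
   can be decoded without waiting for more input.  big_endian selects how
   each two-byte unit is read. */
size_t utf16_complete_prefix(const unsigned char *s, size_t size,
                             int big_endian)
{
    size_t usable = size - (size % 2);  /* drop an odd trailing byte */
    assert(s != NULL || size == 0);
    if (usable >= 2) {
        const unsigned char *p = s + usable - 2;
        unsigned int unit = big_endian
            ? (((unsigned int)p[0] << 8) | p[1])
            : (((unsigned int)p[1] << 8) | p[0]);
        if (unit >= 0xD800 && unit <= 0xDBFF)
            usable -= 2;    /* high surrogate still waiting for its partner */
    }
    return usable;
}
```

The bytes beyond the returned count would be carried over and placed in front
of the next chunk of input.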
    651 
    652 .. c:function:: PyObject* PyUnicode_EncodeUTF16(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)
    653 
   Return a Python string object holding the UTF-16 encoded value of the Unicode
   data in *s*.  Output is written according to the following byte order::

      byteorder == -1: little endian
      byteorder == 0:  native byte order (writes a BOM)
      byteorder == 1:  big endian

   If *byteorder* is ``0``, the output string will always start with the Unicode
   BOM (U+FEFF). In the other two modes, no BOM is prepended.
    663 
   If *Py_UNICODE_WIDE* is defined, a single :c:type:`Py_UNICODE` value may get
   represented as a surrogate pair. If it is not defined, each :c:type:`Py_UNICODE`
   value is interpreted as a UCS-2 character.
    667 
    668    Return *NULL* if an exception was raised by the codec.
    669 
    670    .. versionchanged:: 2.5
    671       This function used an :c:type:`int` type for *size*. This might require
    672       changes in your code for properly supporting 64-bit systems.
    673 
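The byte order rules for :c:func:`PyUnicode_EncodeUTF16` can be modelled in a few lines of standalone C. This is only a sketch under stated assumptions, not CPython's encoder: it takes 32-bit code points (as in a build with *Py_UNICODE_WIDE* defined), writes into a caller-supplied buffer, and the names ``put_unit`` and ``sketch_encode_utf16`` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Write one 16-bit code unit in the requested byte order. */
static size_t put_unit(unsigned char *out, uint16_t unit, int big_endian)
{
    out[0] = (unsigned char)(big_endian ? unit >> 8 : unit & 0xFF);
    out[1] = (unsigned char)(big_endian ? unit & 0xFF : unit >> 8);
    return 2;
}

/* Hypothetical model of the byteorder rules; not CPython's implementation.
 * `out` must have room for 2 + 4 * size bytes.  Returns the bytes written. */
size_t sketch_encode_utf16(const uint32_t *s, size_t size, int byteorder,
                           unsigned char *out)
{
    const uint16_t probe = 1;
    int native_big = (*(const unsigned char *)&probe == 0);
    int big = byteorder > 0 || (byteorder == 0 && native_big);
    size_t n = 0, i;

    if (byteorder == 0)
        n += put_unit(out + n, 0xFEFF, big);    /* BOM only in native mode */

    for (i = 0; i < size; i++) {
        uint32_t ch = s[i];
        if (ch >= 0x10000) {
            /* Outside the BMP: split into a surrogate pair. */
            ch -= 0x10000;
            n += put_unit(out + n, (uint16_t)(0xD800 | (ch >> 10)), big);
            n += put_unit(out + n, (uint16_t)(0xDC00 | (ch & 0x3FF)), big);
        } else {
            n += put_unit(out + n, (uint16_t)ch, big);
        }
    }
    return n;
}
```

For the two code points U+0041 and U+1F600, this sketch writes six bytes when *byteorder* is ``-1`` or ``1`` and eight bytes when it is ``0``, the extra two being the BOM.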
    674 
    675 .. c:function:: PyObject* PyUnicode_AsUTF16String(PyObject *unicode)
    676 
   Return a Python string using the UTF-16 encoding in native byte order. The
   string always starts with a BOM.  Error handling is "strict".  Return
   *NULL* if an exception was raised by the codec.
    680 
    681 
    682 UTF-7 Codecs
    683 """"""""""""
    684 
    685 These are the UTF-7 codec APIs:
    686 
    687 
    688 .. c:function:: PyObject* PyUnicode_DecodeUTF7(const char *s, Py_ssize_t size, const char *errors)
    689 
    690    Create a Unicode object by decoding *size* bytes of the UTF-7 encoded string
    691    *s*.  Return *NULL* if an exception was raised by the codec.
    692 
    693 
    694 .. c:function:: PyObject* PyUnicode_DecodeUTF7Stateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed)
    695 
    696    If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF7`.  If
    697    *consumed* is not *NULL*, trailing incomplete UTF-7 base-64 sections will not
    698    be treated as an error.  Those bytes will not be decoded and the number of
    699    bytes that have been decoded will be stored in *consumed*.
    700 
    701 
    702 .. c:function:: PyObject* PyUnicode_EncodeUTF7(const Py_UNICODE *s, Py_ssize_t size, int base64SetO, int base64WhiteSpace, const char *errors)
    703 
    704    Encode the :c:type:`Py_UNICODE` buffer of the given size using UTF-7 and
    705    return a Python bytes object.  Return *NULL* if an exception was raised by
    706    the codec.
    707 
    708    If *base64SetO* is nonzero, "Set O" (punctuation that has no otherwise
    709    special meaning) will be encoded in base-64.  If *base64WhiteSpace* is
    710    nonzero, whitespace will be encoded in base-64.  Both are set to zero for the
    711    Python "utf-7" codec.
    712 
    713 
    714 Unicode-Escape Codecs
    715 """""""""""""""""""""
    716 
    717 These are the "Unicode Escape" codec APIs:
    718 
    719 
    720 .. c:function:: PyObject* PyUnicode_DecodeUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)
    721 
    722    Create a Unicode object by decoding *size* bytes of the Unicode-Escape encoded
    723    string *s*.  Return *NULL* if an exception was raised by the codec.
    724 
    725    .. versionchanged:: 2.5
    726       This function used an :c:type:`int` type for *size*. This might require
    727       changes in your code for properly supporting 64-bit systems.
    728 
    729 
    730 .. c:function:: PyObject* PyUnicode_EncodeUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size)
    731 
    732    Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Unicode-Escape and
    733    return a Python string object.  Return *NULL* if an exception was raised by the
    734    codec.
    735 
    736    .. versionchanged:: 2.5
    737       This function used an :c:type:`int` type for *size*. This might require
    738       changes in your code for properly supporting 64-bit systems.
    739 
    740 
    741 .. c:function:: PyObject* PyUnicode_AsUnicodeEscapeString(PyObject *unicode)
    742 
   Encode a Unicode object using Unicode-Escape and return the result as a
   Python string object.  Error handling is "strict". Return *NULL* if an
   exception was raised by the codec.
    746 
    747 
    748 Raw-Unicode-Escape Codecs
    749 """""""""""""""""""""""""
    750 
    751 These are the "Raw Unicode Escape" codec APIs:
    752 
    753 
    754 .. c:function:: PyObject* PyUnicode_DecodeRawUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)
    755 
    756    Create a Unicode object by decoding *size* bytes of the Raw-Unicode-Escape
    757    encoded string *s*.  Return *NULL* if an exception was raised by the codec.
    758 
    759    .. versionchanged:: 2.5
    760       This function used an :c:type:`int` type for *size*. This might require
    761       changes in your code for properly supporting 64-bit systems.
    762 
    763 
    764 .. c:function:: PyObject* PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
    765 
    766    Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Raw-Unicode-Escape
    767    and return a Python string object.  Return *NULL* if an exception was raised by
    768    the codec.
    769 
    770    .. versionchanged:: 2.5
    771       This function used an :c:type:`int` type for *size*. This might require
    772       changes in your code for properly supporting 64-bit systems.
    773 
    774 
    775 .. c:function:: PyObject* PyUnicode_AsRawUnicodeEscapeString(PyObject *unicode)
    776 
   Encode a Unicode object using Raw-Unicode-Escape and return the result as a
   Python string object. Error handling is "strict". Return *NULL* if an
   exception was raised by the codec.
    780 
    781 
    782 Latin-1 Codecs
    783 """"""""""""""
    784 
These are the Latin-1 codec APIs. Latin-1 corresponds to the first 256 Unicode
ordinals, and only these are accepted by the codecs during encoding.
    787 
    788 
    789 .. c:function:: PyObject* PyUnicode_DecodeLatin1(const char *s, Py_ssize_t size, const char *errors)
    790 
    791    Create a Unicode object by decoding *size* bytes of the Latin-1 encoded string
    792    *s*.  Return *NULL* if an exception was raised by the codec.
    793 
    794    .. versionchanged:: 2.5
    795       This function used an :c:type:`int` type for *size*. This might require
    796       changes in your code for properly supporting 64-bit systems.
    797 
    798 
    799 .. c:function:: PyObject* PyUnicode_EncodeLatin1(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
    800 
    801    Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Latin-1 and return
    802    a Python string object.  Return *NULL* if an exception was raised by the codec.
    803 
    804    .. versionchanged:: 2.5
    805       This function used an :c:type:`int` type for *size*. This might require
    806       changes in your code for properly supporting 64-bit systems.
    807 
    808 
    809 .. c:function:: PyObject* PyUnicode_AsLatin1String(PyObject *unicode)
    810 
   Encode a Unicode object using Latin-1 and return the result as a Python
   string object.  Error handling is "strict".  Return *NULL* if an exception was
   raised by the codec.
    814 
    815 
    816 ASCII Codecs
    817 """"""""""""
    818 
    819 These are the ASCII codec APIs.  Only 7-bit ASCII data is accepted. All other
    820 codes generate errors.
    821 
    822 
    823 .. c:function:: PyObject* PyUnicode_DecodeASCII(const char *s, Py_ssize_t size, const char *errors)
    824 
    825    Create a Unicode object by decoding *size* bytes of the ASCII encoded string
    826    *s*.  Return *NULL* if an exception was raised by the codec.
    827 
    828    .. versionchanged:: 2.5
    829       This function used an :c:type:`int` type for *size*. This might require
    830       changes in your code for properly supporting 64-bit systems.
    831 
    832 
    833 .. c:function:: PyObject* PyUnicode_EncodeASCII(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
    834 
    835    Encode the :c:type:`Py_UNICODE` buffer of the given *size* using ASCII and return a
    836    Python string object.  Return *NULL* if an exception was raised by the codec.
    837 
    838    .. versionchanged:: 2.5
    839       This function used an :c:type:`int` type for *size*. This might require
    840       changes in your code for properly supporting 64-bit systems.
    841 
    842 
    843 .. c:function:: PyObject* PyUnicode_AsASCIIString(PyObject *unicode)
    844 
   Encode a Unicode object using ASCII and return the result as a Python string
   object.  Error handling is "strict".  Return *NULL* if an exception was raised
   by the codec.
    848 
    849 
    850 Character Map Codecs
    851 """"""""""""""""""""
    852 
This codec is special in that it can be used to implement many different codecs
(and this is in fact what was done to obtain most of the standard codecs
included in the :mod:`encodings` package). The codec uses mappings to encode and
decode characters.
    857 
    858 Decoding mappings must map single string characters to single Unicode
    859 characters, integers (which are then interpreted as Unicode ordinals) or ``None``
    860 (meaning "undefined mapping" and causing an error).
    861 
    862 Encoding mappings must map single Unicode characters to single string
    863 characters, integers (which are then interpreted as Latin-1 ordinals) or ``None``
    864 (meaning "undefined mapping" and causing an error).
    865 
The mapping objects provided need only support the :meth:`__getitem__` mapping
interface.

If a character lookup fails with a :exc:`LookupError`, the character is copied
as-is, meaning that its ordinal value will be interpreted as a Unicode ordinal
(when decoding) or as a Latin-1 ordinal (when encoding). Because of this,
mappings only need to contain entries for characters that map to different code
points.
    873 
    874 These are the mapping codec APIs:
    875 
    876 .. c:function:: PyObject* PyUnicode_DecodeCharmap(const char *s, Py_ssize_t size, PyObject *mapping, const char *errors)
    877 
   Create a Unicode object by decoding *size* bytes of the encoded string *s* using
   the given *mapping* object.  Return *NULL* if an exception was raised by the
   codec. If *mapping* is *NULL*, Latin-1 decoding will be done. Otherwise it can
   be a dictionary that maps byte values or a Unicode string, which is treated as
   a lookup table. Byte values greater than or equal to the length of the string
   and U+FFFE "characters" are treated as "undefined mapping".
    884 
    885    .. versionchanged:: 2.4
    886       Allowed unicode string as mapping argument.
    887 
    888    .. versionchanged:: 2.5
    889       This function used an :c:type:`int` type for *size*. This might require
    890       changes in your code for properly supporting 64-bit systems.
    891 
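The lookup-table form of *mapping* can be pictured with a standalone sketch. It is plain C rather than a call into the Python C API, the names ``sketch_decode_charmap`` and ``UNDEFINED_MAPPING`` are hypothetical, and it simply stops at the first undefined mapping where the real codec would consult *errors*.

```c
#include <stddef.h>
#include <stdint.h>

#define UNDEFINED_MAPPING 0xFFFE

/* Hypothetical model of table-based charmap decoding.  `out` needs room for
 * `size` code units.  Returns the number of bytes decoded, or -1 at the first
 * undefined mapping. */
long sketch_decode_charmap(const unsigned char *s, size_t size,
                           const uint16_t *table, size_t table_len,
                           uint16_t *out)
{
    size_t i;

    for (i = 0; i < size; i++) {
        /* Bytes that index past the end of the table have no mapping. */
        uint16_t ch = UNDEFINED_MAPPING;
        if (s[i] < table_len)
            ch = table[s[i]];
        if (ch == UNDEFINED_MAPPING)
            return -1;          /* also hit by explicit U+FFFE entries */
        out[i] = ch;
    }
    return (long)size;
}
```

Using U+FFFE as the in-table marker lets a single array express both the mapped and the unmapped byte values.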
    892 
    893 .. c:function:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *mapping, const char *errors)
    894 
    895    Encode the :c:type:`Py_UNICODE` buffer of the given *size* using the given
    896    *mapping* object and return a Python string object. Return *NULL* if an
    897    exception was raised by the codec.
    898 
    899    .. versionchanged:: 2.5
    900       This function used an :c:type:`int` type for *size*. This might require
    901       changes in your code for properly supporting 64-bit systems.
    902 
    903 
    904 .. c:function:: PyObject* PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping)
    905 
   Encode a Unicode object using the given *mapping* object and return the result
   as a Python string object.  Error handling is "strict".  Return *NULL* if an
   exception was raised by the codec.
    909 
The following codec API is special in that it maps Unicode to Unicode.
    911 
    912 
    913 .. c:function:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *table, const char *errors)
    914 
    915    Translate a :c:type:`Py_UNICODE` buffer of the given *size* by applying a
    916    character mapping *table* to it and return the resulting Unicode object.  Return
    917    *NULL* when an exception was raised by the codec.
    918 
   The mapping *table* must map Unicode ordinal integers to Unicode ordinal
   integers or ``None`` (causing deletion of the character).
    921 
    922    Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
    923    and sequences work well.  Unmapped character ordinals (ones which cause a
    924    :exc:`LookupError`) are left untouched and are copied as-is.
    925 
    926    .. versionchanged:: 2.5
    927       This function used an :c:type:`int` type for *size*. This might require
    928       changes in your code for properly supporting 64-bit systems.
    929 
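The three possible outcomes of a table lookup (replace, delete, copy as-is) can be modelled without the Python C API. In this standalone sketch the table is a small array of pairs, ``DELETE_CHAR`` stands in for a ``None`` entry, and a missing entry plays the role of a :exc:`LookupError`; all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define DELETE_CHAR (-1)        /* stands in for a None table entry */

struct map_entry {
    uint32_t from;              /* source ordinal */
    int32_t to;                 /* target ordinal, or DELETE_CHAR */
};

/* Hypothetical model of ordinal-to-ordinal translation.  `out` needs room
 * for `size` items.  Returns the length of the translated text. */
size_t sketch_translate(const uint32_t *s, size_t size,
                        const struct map_entry *table, size_t entries,
                        uint32_t *out)
{
    size_t i, j, n = 0;

    for (i = 0; i < size; i++) {
        int32_t target = (int32_t)s[i];     /* unmapped: copied as-is */
        for (j = 0; j < entries; j++) {
            if (table[j].from == s[i]) {
                target = table[j].to;
                break;
            }
        }
        if (target != DELETE_CHAR)          /* deleted characters are skipped */
            out[n++] = (uint32_t)target;
    }
    return n;
}
```

With a table that maps ``a`` to ``x`` and ``b`` to ``DELETE_CHAR``, the input ``abc`` becomes ``xc``.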
    930 
    931 MBCS codecs for Windows
    932 """""""""""""""""""""""
    933 
    934 These are the MBCS codec APIs. They are currently only available on Windows and
    935 use the Win32 MBCS converters to implement the conversions.  Note that MBCS (or
    936 DBCS) is a class of encodings, not just one.  The target encoding is defined by
    937 the user settings on the machine running the codec.
    938 
    939 
    940 .. c:function:: PyObject* PyUnicode_DecodeMBCS(const char *s, Py_ssize_t size, const char *errors)
    941 
    942    Create a Unicode object by decoding *size* bytes of the MBCS encoded string *s*.
    943    Return *NULL* if an exception was raised by the codec.
    944 
    945    .. versionchanged:: 2.5
    946       This function used an :c:type:`int` type for *size*. This might require
    947       changes in your code for properly supporting 64-bit systems.
    948 
    949 
.. c:function:: PyObject* PyUnicode_DecodeMBCSStateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed)
    951 
   If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeMBCS`. If
   *consumed* is not *NULL*, :c:func:`PyUnicode_DecodeMBCSStateful` will not decode
   a trailing lead byte, and the number of bytes that have been decoded will be
   stored in *consumed*.
    956 
    957    .. versionadded:: 2.5
    958 
    959 
    960 .. c:function:: PyObject* PyUnicode_EncodeMBCS(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
    961 
    962    Encode the :c:type:`Py_UNICODE` buffer of the given *size* using MBCS and return a
    963    Python string object.  Return *NULL* if an exception was raised by the codec.
    964 
    965    .. versionchanged:: 2.5
    966       This function used an :c:type:`int` type for *size*. This might require
    967       changes in your code for properly supporting 64-bit systems.
    968 
    969 
    970 .. c:function:: PyObject* PyUnicode_AsMBCSString(PyObject *unicode)
    971 
   Encode a Unicode object using MBCS and return the result as a Python string
   object.  Error handling is "strict".  Return *NULL* if an exception was raised
   by the codec.
    975 
    976 
    977 Methods & Slots
    978 """""""""""""""
    979 
    980 .. _unicodemethodsandslots:
    981 
    982 Methods and Slot Functions
    983 ^^^^^^^^^^^^^^^^^^^^^^^^^^
    984 
    985 The following APIs are capable of handling Unicode objects and strings on input
    986 (we refer to them as strings in the descriptions) and return Unicode objects or
    987 integers as appropriate.
    988 
    989 They all return *NULL* or ``-1`` if an exception occurs.
    990 
    991 
    992 .. c:function:: PyObject* PyUnicode_Concat(PyObject *left, PyObject *right)
    993 
   Concatenate two strings, giving a new Unicode string.
    995 
    996 
    997 .. c:function:: PyObject* PyUnicode_Split(PyObject *s, PyObject *sep, Py_ssize_t maxsplit)
    998 
    999    Split a string giving a list of Unicode strings.  If *sep* is *NULL*, splitting
   1000    will be done at all whitespace substrings.  Otherwise, splits occur at the given
   1001    separator.  At most *maxsplit* splits will be done.  If negative, no limit is
   1002    set.  Separators are not included in the resulting list.
   1003 
   1004    .. versionchanged:: 2.5
   1005       This function used an :c:type:`int` type for *maxsplit*. This might require
   1006       changes in your code for properly supporting 64-bit systems.
   1007 
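The effect of *maxsplit* is easiest to see on a byte-string model. The sketch below is standalone C, not the Python C API: it records the offset and length of each piece instead of building a list, it assumes a non-empty separator (the whitespace mode used when *sep* is *NULL* is not modelled), and ``sketch_split`` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of splitting on a non-empty separator.  `starts` and
 * `lens` must have room for every piece.  Returns the number of pieces. */
size_t sketch_split(const char *s, const char *sep, long maxsplit,
                    size_t *starts, size_t *lens)
{
    size_t seplen = strlen(sep);
    size_t n = 0, begin = 0;
    const char *hit;

    /* A negative maxsplit means "no limit". */
    while ((maxsplit < 0 || (long)n < maxsplit) &&
           (hit = strstr(s + begin, sep)) != NULL) {
        starts[n] = begin;
        lens[n] = (size_t)(hit - s) - begin;
        n++;
        begin = (size_t)(hit - s) + seplen;     /* separator is dropped */
    }
    starts[n] = begin;                          /* remainder is the last piece */
    lens[n] = strlen(s) - begin;
    return n + 1;
}
```

At most *maxsplit* splits yield at most *maxsplit* + 1 pieces, with everything after the last split left in the final piece.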
   1008 
   1009 .. c:function:: PyObject* PyUnicode_Splitlines(PyObject *s, int keepend)
   1010 
   Split a Unicode string at line breaks, returning a list of Unicode strings.
   CRLF is considered to be one line break.  If *keepend* is ``0``, the line break
   characters are not included in the resulting strings.
   1014 
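The handling of CRLF and of *keepend* can be sketched on ASCII input. This standalone C helper is an illustration only: the name ``sketch_next_line`` is hypothetical, it recognizes just CR, LF and CRLF, and it makes no claim about which other line break characters the real function accepts.

```c
#include <stddef.h>

/* Hypothetical model of one splitting step.  Stores the length of the line
 * that begins at `start` in *len (including the line break only when
 * `keepend` is nonzero) and returns the offset where the next line begins. */
size_t sketch_next_line(const char *s, size_t size, size_t start,
                        int keepend, size_t *len)
{
    size_t end = start, next;

    while (end < size && s[end] != '\r' && s[end] != '\n')
        end++;

    next = end;
    if (next < size) {
        /* CRLF counts as a single line break. */
        if (s[next] == '\r' && next + 1 < size && s[next + 1] == '\n')
            next += 2;
        else
            next += 1;
    }
    *len = (keepend ? next : end) - start;
    return next;
}
```

Calling the helper repeatedly, each time starting at the previously returned offset, walks through the text line by line.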
   1015 
   1016 .. c:function:: PyObject* PyUnicode_Translate(PyObject *str, PyObject *table, const char *errors)
   1017 
   1018    Translate a string by applying a character mapping table to it and return the
   1019    resulting Unicode object.
   1020 
   1021    The mapping table must map Unicode ordinal integers to Unicode ordinal integers
   1022    or ``None`` (causing deletion of the character).
   1023 
   1024    Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
   1025    and sequences work well.  Unmapped character ordinals (ones which cause a
   1026    :exc:`LookupError`) are left untouched and are copied as-is.
   1027 
   1028    *errors* has the usual meaning for codecs. It may be *NULL* which indicates to
   1029    use the default error handling.
   1030 
   1031 
   1032 .. c:function:: PyObject* PyUnicode_Join(PyObject *separator, PyObject *seq)
   1033 
   1034    Join a sequence of strings using the given *separator* and return the resulting
   1035    Unicode string.
   1036 
   1037 
   1038 .. c:function:: Py_ssize_t PyUnicode_Tailmatch(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)
   1039 
   1040    Return ``1`` if *substr* matches ``str[start:end]`` at the given tail end
   1041    (*direction* == ``-1`` means to do a prefix match, *direction* == ``1`` a suffix match),
   1042    ``0`` otherwise. Return ``-1`` if an error occurred.
   1043 
   1044    .. versionchanged:: 2.5
   1045       This function used an :c:type:`int` type for *start* and *end*. This
   1046       might require changes in your code for properly supporting 64-bit
   1047       systems.
   1048 
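The role of *direction* in a tail match can be shown on plain byte strings. The following is a standalone sketch, not the Python C API: ``sketch_tailmatch`` is a hypothetical name, and it assumes ``0 <= start <= end <= strlen(str)`` instead of adjusting out-of-range indices.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical byte-string model of a tail match on str[start:end].
 * Returns 1 on a match and 0 otherwise. */
int sketch_tailmatch(const char *str, const char *substr,
                     size_t start, size_t end, int direction)
{
    size_t sublen = strlen(substr);

    if (sublen > end - start)
        return 0;                       /* substr cannot fit in the slice */
    if (direction > 0)
        start = end - sublen;           /* suffix match: compare at the end */
    return memcmp(str + start, substr, sublen) == 0;
}
```

A prefix match compares at *start*, while a suffix match compares the last ``strlen(substr)`` bytes before *end*.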
   1049 
   1050 .. c:function:: Py_ssize_t PyUnicode_Find(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)
   1051 
   1052    Return the first position of *substr* in ``str[start:end]`` using the given
   1053    *direction* (*direction* == ``1`` means to do a forward search, *direction* == ``-1`` a
   1054    backward search).  The return value is the index of the first match; a value of
   1055    ``-1`` indicates that no match was found, and ``-2`` indicates that an error
   1056    occurred and an exception has been set.
   1057 
   1058    .. versionchanged:: 2.5
   1059       This function used an :c:type:`int` type for *start* and *end*. This
   1060       might require changes in your code for properly supporting 64-bit
   1061       systems.
   1062 
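Forward and backward searches differ only in which end of the slice is scanned first. The sketch below models this on byte strings in standalone C; ``sketch_find`` is a hypothetical name, the indices are assumed to satisfy ``0 <= start <= end <= strlen(str)``, and the ``-2`` error return is not modelled because this version cannot fail.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical byte-string model of a directional search in str[start:end].
 * Returns the index of the match within str, or -1 when there is none. */
long sketch_find(const char *str, const char *substr,
                 size_t start, size_t end, int direction)
{
    size_t sublen = strlen(substr);
    size_t i, last;

    if (sublen > end - start)
        return -1;
    last = end - sublen;                /* right-most possible match start */

    if (direction > 0) {
        for (i = start; i <= last; i++)         /* forward: left to right */
            if (memcmp(str + i, substr, sublen) == 0)
                return (long)i;
    } else {
        for (i = last + 1; i-- > start; )       /* backward: right to left */
            if (memcmp(str + i, substr, sublen) == 0)
                return (long)i;
    }
    return -1;
}
```

The returned index is relative to the start of ``str``, not to *start*.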
   1063 
   1064 .. c:function:: Py_ssize_t PyUnicode_Count(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end)
   1065 
   1066    Return the number of non-overlapping occurrences of *substr* in
   1067    ``str[start:end]``.  Return ``-1`` if an error occurred.
   1068 
   1069    .. versionchanged:: 2.5
   1070       This function returned an :c:type:`int` type and used an :c:type:`int`
   1071       type for *start* and *end*. This might require changes in your code for
   1072       properly supporting 64-bit systems.
   1073 
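"Non-overlapping" means that a match uses up its characters before the search resumes. A standalone byte-string sketch of that rule follows; it is not the Python C API, ``sketch_count`` is a hypothetical name, the indices are assumed to satisfy ``0 <= start <= end <= strlen(str)``, and the empty pattern is deliberately left out of scope.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical byte-string model of counting matches in str[start:end]. */
size_t sketch_count(const char *str, const char *substr,
                    size_t start, size_t end)
{
    size_t sublen = strlen(substr);
    size_t count = 0, i = start;

    if (sublen == 0)
        return 0;               /* empty pattern: out of scope for this sketch */

    while (sublen <= end - i) {
        if (memcmp(str + i, substr, sublen) == 0) {
            count++;
            i += sublen;        /* skip the whole match: no overlaps */
        } else {
            i++;
        }
    }
    return count;
}
```

Under this rule ``aa`` occurs twice in ``aaaa``, not three times.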
   1074 
   1075 .. c:function:: PyObject* PyUnicode_Replace(PyObject *str, PyObject *substr, PyObject *replstr, Py_ssize_t maxcount)
   1076 
   1077    Replace at most *maxcount* occurrences of *substr* in *str* with *replstr* and
   1078    return the resulting Unicode object. *maxcount* == ``-1`` means replace all
   1079    occurrences.
   1080 
   1081    .. versionchanged:: 2.5
   1082       This function used an :c:type:`int` type for *maxcount*. This might
   1083       require changes in your code for properly supporting 64-bit systems.
   1084 
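The *maxcount* limit can be illustrated with a byte-string model. This standalone C sketch is not the Python C API: ``sketch_replace`` is a hypothetical name, it writes into a caller-supplied buffer that must be large enough for the result and its terminating NUL, and it assumes a non-empty *substr*.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical byte-string model of a limited replace.  Returns the number
 * of replacements made and leaves the NUL-terminated result in `out`. */
size_t sketch_replace(const char *str, const char *substr, const char *repl,
                      long maxcount, char *out)
{
    size_t sublen = strlen(substr), repllen = strlen(repl);
    size_t done = 0;
    const char *hit;

    /* maxcount == -1 (any negative value in this sketch) means "replace all". */
    while ((maxcount < 0 || (long)done < maxcount) &&
           (hit = strstr(str, substr)) != NULL) {
        size_t before = (size_t)(hit - str);
        memcpy(out, str, before);               /* text before the match */
        memcpy(out + before, repl, repllen);    /* the replacement itself */
        out += before + repllen;
        str = hit + sublen;                     /* resume after the match */
        done++;
    }
    strcpy(out, str);                           /* untouched remainder */
    return done;
}
```

Occurrences beyond the limit are copied through unchanged, so a *maxcount* of ``0`` returns a copy of the input.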
   1085 
   1086 .. c:function:: int PyUnicode_Compare(PyObject *left, PyObject *right)
   1087 
   1088    Compare two strings and return ``-1``, ``0``, ``1`` for less than, equal, and greater than,
   1089    respectively.
   1090 
   1091 
.. c:function:: PyObject* PyUnicode_RichCompare(PyObject *left, PyObject *right, int op)

   Rich compare two Unicode strings and return one of the following:
   1095 
   1096    * ``NULL`` in case an exception was raised
   1097    * :const:`Py_True` or :const:`Py_False` for successful comparisons
   1098    * :const:`Py_NotImplemented` in case the type combination is unknown
   1099 
   1100    Note that :const:`Py_EQ` and :const:`Py_NE` comparisons can cause a
   1101    :exc:`UnicodeWarning` in case the conversion of the arguments to Unicode fails
   1102    with a :exc:`UnicodeDecodeError`.
   1103 
   1104    Possible values for *op* are :const:`Py_GT`, :const:`Py_GE`, :const:`Py_EQ`,
   1105    :const:`Py_NE`, :const:`Py_LT`, and :const:`Py_LE`.
   1106 
   1107 
   1108 .. c:function:: PyObject* PyUnicode_Format(PyObject *format, PyObject *args)
   1109 
   1110    Return a new string object from *format* and *args*; this is analogous to
   1111    ``format % args``.
   1112 
   1113 
   1114 .. c:function:: int PyUnicode_Contains(PyObject *container, PyObject *element)
   1115 
   Check whether *element* is contained in *container* and return true or false
   accordingly.

   *element* has to coerce to a one-element Unicode string. ``-1`` is returned if
   there was an error.
   1121