:mod:`stringprep` --- Internet String Preparation
=================================================

.. module:: stringprep
   :synopsis: String preparation, as per RFC 3454
.. moduleauthor:: Martin v. Löwis <martin@v.loewis.de>
.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>


.. versionadded:: 2.3

When identifying things (such as host names) on the Internet, it is often
necessary to compare such identifications for "equality". Exactly how this
comparison is executed may depend on the application domain, e.g. whether it
should be case-insensitive or not. It may also be necessary to restrict the
possible identifications, to allow only identifications consisting of
"printable" characters.

:rfc:`3454` defines a procedure for "preparing" Unicode strings in internet
protocols. Before strings are passed onto the wire, they are processed with the
preparation procedure, after which they have a certain normalized form. The RFC
defines a set of tables, which can be combined into profiles. Each profile must
define which tables it uses, and what other optional parts of the ``stringprep``
procedure are part of the profile. One example of a ``stringprep`` profile is
``nameprep``, which is used for internationalized domain names.

The module :mod:`stringprep` only exposes the tables from :rfc:`3454`. Because
these tables would be very large if represented as dictionaries or lists, the
module uses the Unicode character database internally. The module source code
itself was generated using the ``mkstringprep.py`` utility.

As a result, these tables are exposed as functions, not as data structures.
There are two kinds of tables in the RFC: sets and mappings. For a set,
:mod:`stringprep` provides the "characteristic function", i.e. a function that
returns true if the parameter is part of the set. For mappings, it provides the
mapping function: given the key, it returns the associated value. Below is a
list of all functions available in the module.

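As a minimal sketch of the difference between the two kinds of functions
(assuming Python 3, where each function takes a one-character string; the sample
characters are chosen only for illustration):

```python
import stringprep

# Characteristic functions answer "is this character in the table?"
soft_hyphen = "\u00ad"  # SOFT HYPHEN, listed in table B.1
is_dropped = stringprep.in_table_b1(soft_hyphen)
is_dropped_a = stringprep.in_table_b1("a")
print(is_dropped, is_dropped_a)  # True False

# Mapping functions return the replacement string for a character.
folded = stringprep.map_table_b2("A")
print(folded)  # a
```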

.. function:: in_table_a1(code)

   Determine whether *code* is in table A.1 (Unassigned code points in Unicode 3.2).


.. function:: in_table_b1(code)

   Determine whether *code* is in table B.1 (Commonly mapped to nothing).


.. function:: map_table_b2(code)

   Return the mapped value for *code* according to table B.2 (Mapping for
   case-folding used with NFKC).


.. function:: map_table_b3(code)

   Return the mapped value for *code* according to table B.3 (Mapping for
   case-folding used with no normalization).

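The B tables are typically combined into a mapping step. The following is a
sketch only, assuming Python 3; ``map_step`` is a hypothetical helper, not part
of the module, and a real profile decides for itself which B tables to apply:

```python
import stringprep

def map_step(text):
    """Hypothetical helper: drop B.1 characters, then map the rest with B.2."""
    out = []
    for ch in text:
        if stringprep.in_table_b1(ch):
            continue  # table B.1: commonly mapped to nothing
        out.append(stringprep.map_table_b2(ch))  # may return several characters
    return "".join(out)

# "Stra\u00dfe" followed by a SOFT HYPHEN (U+00AD)
mapped = map_step("Stra\u00dfe\u00ad")
print(mapped)  # strasse

# B.2 also folds some compatibility characters, e.g. DOUBLE-STRUCK CAPITAL C.
folded_c = stringprep.map_table_b2("\u2102")
print(folded_c)  # c
```

Note that a mapping function can return a string longer than one character, so
the results are joined rather than assigned back character by character.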

.. function:: in_table_c11(code)

   Determine whether *code* is in table C.1.1 (ASCII space characters).


.. function:: in_table_c12(code)

   Determine whether *code* is in table C.1.2 (Non-ASCII space characters).


.. function:: in_table_c11_c12(code)

   Determine whether *code* is in table C.1 (Space characters, union of C.1.1 and
   C.1.2).

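A short illustration of how the three space tables relate, assuming Python 3
(the ``classify_space`` helper is hypothetical and exists only for this example):

```python
import stringprep

def classify_space(ch):
    """Hypothetical helper: return the (C.1.1, C.1.2, C.1) results for *ch*."""
    return (
        stringprep.in_table_c11(ch),
        stringprep.in_table_c12(ch),
        stringprep.in_table_c11_c12(ch),
    )

ascii_space = classify_space(" ")     # U+0020 SPACE
nbsp = classify_space("\u00a0")       # U+00A0 NO-BREAK SPACE
letter = classify_space("x")

print(ascii_space)  # (True, False, True)
print(nbsp)         # (False, True, True)
print(letter)       # (False, False, False)
```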

.. function:: in_table_c21(code)

   Determine whether *code* is in table C.2.1 (ASCII control characters).


.. function:: in_table_c22(code)

   Determine whether *code* is in table C.2.2 (Non-ASCII control characters).


.. function:: in_table_c21_c22(code)

   Determine whether *code* is in table C.2 (Control characters, union of C.2.1 and
   C.2.2).


.. function:: in_table_c3(code)

   Determine whether *code* is in table C.3 (Private use).


.. function:: in_table_c4(code)

   Determine whether *code* is in table C.4 (Non-character code points).


.. function:: in_table_c5(code)

   Determine whether *code* is in table C.5 (Surrogate codes).


.. function:: in_table_c6(code)

   Determine whether *code* is in table C.6 (Inappropriate for plain text).


.. function:: in_table_c7(code)

   Determine whether *code* is in table C.7 (Inappropriate for canonical
   representation).


.. function:: in_table_c8(code)

   Determine whether *code* is in table C.8 (Change display properties or are
   deprecated).


.. function:: in_table_c9(code)

   Determine whether *code* is in table C.9 (Tagging characters).

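The C tables are usually consulted together to reject prohibited characters.
The sketch below assumes Python 3; ``PROHIBITED_TABLES`` and ``find_prohibited``
are hypothetical names, and the selection of tables is an assumption made for
illustration, since each profile specifies its own set:

```python
import stringprep

# Assumed selection of prohibition tables; a real profile defines its own.
PROHIBITED_TABLES = (
    stringprep.in_table_c12,
    stringprep.in_table_c21_c22,
    stringprep.in_table_c3,
    stringprep.in_table_c4,
    stringprep.in_table_c5,
    stringprep.in_table_c6,
    stringprep.in_table_c7,
    stringprep.in_table_c8,
    stringprep.in_table_c9,
)

def find_prohibited(text):
    """Hypothetical helper: list the characters of *text* found in any table."""
    return [ch for ch in text if any(check(ch) for check in PROHIBITED_TABLES)]

clean = find_prohibited("abc")
# U+0007 is a control character (C.2.1); U+E000 is a private use character (C.3).
dirty = find_prohibited("a\u0007b\ue000")
print(clean)  # []
print(dirty)
```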

.. function:: in_table_d1(code)

   Determine whether *code* is in table D.1 (Characters with bidirectional property
   "R" or "AL").


.. function:: in_table_d2(code)

   Determine whether *code* is in table D.2 (Characters with bidirectional property
   "L").

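The D tables support the bidirectional check described in section 6 of
:rfc:`3454`. The following is a simplified sketch, assuming Python 3;
``passes_bidi_check`` is a hypothetical helper, and it leaves out the table C.8
prohibition that the RFC also lists for this step:

```python
import stringprep

def passes_bidi_check(text):
    """Hypothetical helper sketching the bidi rules of RFC 3454, section 6."""
    has_ral = any(stringprep.in_table_d1(ch) for ch in text)
    if not has_ral:
        return True  # no "R" or "AL" characters, so nothing to check
    if any(stringprep.in_table_d2(ch) for ch in text):
        return False  # "R"/"AL" characters must not be mixed with "L" characters
    # The first and the last character must both be "R" or "AL".
    return stringprep.in_table_d1(text[0]) and stringprep.in_table_d1(text[-1])

latin_only = passes_bidi_check("abc")
hebrew_only = passes_bidi_check("\u05d0\u05d1")   # HEBREW LETTER ALEF, BET
mixed = passes_bidi_check("a\u05d0")
print(latin_only, hebrew_only, mixed)  # True True False
```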