<html>
<head>
<title>pcre2serialize specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre2serialize man page</h1>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>
<p>
This page is part of the PCRE2 HTML documentation. It was generated
automatically from the original man page. If there is any nonsense in it,
please consult the man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS</a>
<li><a name="TOC2" href="#SEC2">SECURITY CONCERNS</a>
<li><a name="TOC3" href="#SEC3">SAVING COMPILED PATTERNS</a>
<li><a name="TOC4" href="#SEC4">RE-USING PRECOMPILED PATTERNS</a>
<li><a name="TOC5" href="#SEC5">AUTHOR</a>
<li><a name="TOC6" href="#SEC6">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS</a><br>
<P>
<b>int32_t pcre2_serialize_decode(pcre2_code **<i>codes</i>,</b>
<b>  int32_t <i>number_of_codes</i>, const uint8_t *<i>bytes</i>,</b>
<b>  pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
<b>int32_t pcre2_serialize_encode(pcre2_code **<i>codes</i>,</b>
<b>  int32_t <i>number_of_codes</i>, uint8_t **<i>serialized_bytes</i>,</b>
<b>  PCRE2_SIZE *<i>serialized_size</i>, pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
<b>void pcre2_serialize_free(uint8_t *<i>bytes</i>);</b>
<br>
<br>
<b>int32_t pcre2_serialize_get_number_of_codes(const uint8_t *<i>bytes</i>);</b>
<br>
<br>
If you are running an application that uses a large number of regular
expression patterns, it may be useful to store them in a precompiled form
instead of having to compile them every time the application is run. However,
if you are using the just-in-time optimization feature, it is not possible to
save and reload the JIT data, because it is position-dependent. The host on
which the patterns are reloaded must be running the same version of PCRE2, with
the same code unit width, and must also have the same endianness, pointer width
and PCRE2_SIZE type. For example, patterns compiled on a 32-bit system using
PCRE2's 16-bit library cannot be reloaded on a 64-bit system, nor can they be
reloaded using the 8-bit library.
</P>
<P>
Note that "serialization" in PCRE2 does not convert compiled patterns to an
abstract format like Java or .NET serialization. The serialized output is
really just a bytecode dump, which is why it can only be reloaded in the same
environment as the one that created it. Hence the restrictions mentioned above.
Applications that are not statically linked with a fixed version of PCRE2 must
be prepared to recompile patterns from their sources, in order to be immune to
PCRE2 upgrades.
</P>
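<P>
One way an application might decide between reloading and recompiling is to
store a small description of the host next to the serialized data. The sketch
below is only an illustration: the <b>env_tag</b> structure and both functions
are hypothetical names, not part of PCRE2, and the code assumes that PCRE2_SIZE
is <b>size_t</b>. The version string is passed in by the caller, who would
typically obtain it from <b>pcre2_config()</b> with PCRE2_CONFIG_VERSION.
</P>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header an application could store in front of the
   serialized patterns. It is not part of the PCRE2 byte stream. */
typedef struct {
    uint8_t code_unit_width;  /* 8, 16 or 32 */
    uint8_t pointer_size;     /* sizeof(void *) */
    uint8_t size_type_size;   /* sizeof(size_t); assumes PCRE2_SIZE is size_t */
    uint8_t little_endian;    /* 1 on little-endian hosts, 0 otherwise */
    char    version[32];      /* PCRE2 version string, NUL-padded */
} env_tag;

/* Fill in a tag describing the current host. */
void env_tag_init(env_tag *tag, uint8_t code_unit_width, const char *version)
{
    const uint16_t probe = 1;
    memset(tag, 0, sizeof *tag);
    tag->code_unit_width = code_unit_width;
    tag->pointer_size = (uint8_t)sizeof(void *);
    tag->size_type_size = (uint8_t)sizeof(size_t);
    /* The first byte of the value 1 is 1 only on a little-endian host. */
    tag->little_endian = *(const uint8_t *)&probe;
    strncpy(tag->version, version, sizeof tag->version - 1);
}

/* Return 1 if data saved under `saved` may be reloaded under `current`. */
int env_tag_matches(const env_tag *saved, const env_tag *current)
{
    return memcmp(saved, current, sizeof *saved) == 0;
}
```

<P>
If the saved tag does not match the tag of the running host, the application
would skip the saved data and recompile its patterns from source.
</P>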
<br><a name="SEC2" href="#TOC1">SECURITY CONCERNS</a><br>
<P>
The facility for saving and restoring compiled patterns is intended for use
within individual applications. As such, the data supplied to
<b>pcre2_serialize_decode()</b> is expected to be trusted data, not data from
arbitrary external sources. There is only some simple consistency checking, not
complete validation of what is being re-loaded. Corrupted data may cause
undefined results. For example, if the length field of a pattern in the
serialized data is corrupted, the deserializing code may read beyond the end of
the byte stream that is passed to it.
</P>
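<P>
Because the decoder does only simple consistency checking, an application may
want to detect accidental corruption (a truncated or damaged file, for example)
before handing the bytes over. The sketch below computes a 32-bit FNV-1a hash
that could be stored next to the serialized data and compared on reload. The
function name <b>blob_checksum</b> is hypothetical, and the choice of FNV-1a is
an assumption made for brevity. A checksum of this kind does not protect
against deliberately crafted input, so the data must still come from a trusted
source.
</P>

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a hash of a byte buffer. Detects accidental damage only;
   it is not a defence against maliciously constructed data. */
uint32_t blob_checksum(const uint8_t *data, size_t length)
{
    uint32_t hash = 2166136261u;          /* FNV offset basis */
    for (size_t i = 0; i < length; i++) {
        hash ^= data[i];
        hash *= 16777619u;                /* FNV prime */
    }
    return hash;
}
```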
<br><a name="SEC3" href="#TOC1">SAVING COMPILED PATTERNS</a><br>
<P>
Before compiled patterns can be saved they must be serialized, which in PCRE2
means converting the pattern to a stream of bytes. A single byte stream may
contain any number of compiled patterns, but they must all use the same
character tables. A single copy of the tables is included in the byte stream
(its size is 1088 bytes). For more details of character tables, see the
<a href="pcre2api.html#localesupport">section on locale support</a>
in the
<a href="pcre2api.html"><b>pcre2api</b></a>
documentation.
</P>
<P>
The function <b>pcre2_serialize_encode()</b> creates a serialized byte stream
from a list of compiled patterns. Its first two arguments specify the list,
being a pointer to a vector of pointers to compiled patterns, and the length of
the vector. The third and fourth arguments point to variables which are set to
point to the created byte stream and its length, respectively. The final
argument is a pointer to a general context, which can be used to specify custom
memory management functions. If this argument is NULL, <b>malloc()</b> is used
to obtain memory for the byte stream. The yield of the function is the number
of serialized patterns, or one of the following negative error codes:
<pre>
  PCRE2_ERROR_BADDATA      the number of patterns is zero or less
  PCRE2_ERROR_BADMAGIC     mismatch of id bytes in one of the patterns
  PCRE2_ERROR_NOMEMORY     memory allocation failed
  PCRE2_ERROR_MIXEDTABLES  the patterns do not all use the same tables
  PCRE2_ERROR_NULL         the 1st, 3rd, or 4th argument is NULL
</pre>
PCRE2_ERROR_BADMAGIC means either that a pattern's code has been corrupted, or
that a slot in the vector does not point to a compiled pattern.
</P>
<P>
Once a set of patterns has been serialized you can save the data in any
appropriate manner. Here is sample code that compiles two patterns and writes
them to a file. It assumes that the variable <i>fd</i> refers to a file that is
open for output. The error checking that should be present in a real
application has been omitted for simplicity.
<pre>
  int errorcode;
  uint8_t *bytes;
  PCRE2_SIZE erroroffset;
  PCRE2_SIZE bytescount;
  pcre2_code *list_of_codes[2];
  list_of_codes[0] = pcre2_compile("first pattern",
    PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);
  list_of_codes[1] = pcre2_compile("second pattern",
    PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);
  errorcode = pcre2_serialize_encode(list_of_codes, 2, &bytes,
    &bytescount, NULL);
  errorcode = fwrite(bytes, 1, bytescount, fd);
</pre>
Note that the serialized data is binary data that may contain any of the 256
possible byte values. On systems that make a distinction between binary and
non-binary data, be sure that the file is opened for binary output.
</P>
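<P>
The sample above omits error checking. As a minimal sketch of what a checked
write could look like, the helper below writes a length prefix followed by the
serialized bytes to a stdio stream. The name <b>write_blob</b> and the
length-prefix layout are assumptions made for this illustration and are not
part of PCRE2. The prefix is written in host byte order, which is acceptable
here only because the serialized data is itself tied to the host that created
it.
</P>

```c
#include <stdint.h>
#include <stdio.h>

/* Write a 64-bit length prefix followed by the serialized bytes.
   Returns 0 on success, -1 if any write or the final flush fails. */
int write_blob(FILE *out, const uint8_t *bytes, size_t length)
{
    uint64_t stored_length = (uint64_t)length;

    if (fwrite(&stored_length, sizeof stored_length, 1, out) != 1)
        return -1;
    if (length > 0 && fwrite(bytes, 1, length, out) != length)
        return -1;
    return fflush(out) == 0 ? 0 : -1;
}
```

<P>
The stream passed to this helper would be opened with a binary mode such as
"wb", for the reason given above.
</P>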
<P>
Serializing a set of patterns leaves the original data untouched, so they can
still be used for matching. Their memory must eventually be freed in the usual
way by calling <b>pcre2_code_free()</b>. When you have finished with the byte
stream, it too must be freed by calling <b>pcre2_serialize_free()</b>. If this
function is called with a NULL argument, it returns immediately without doing
anything.
</P>
<br><a name="SEC4" href="#TOC1">RE-USING PRECOMPILED PATTERNS</a><br>
<P>
In order to re-use a set of saved patterns you must first make the serialized
byte stream available in main memory (for example, by reading from a file). The
management of this memory block is up to the application. You can use the
<b>pcre2_serialize_get_number_of_codes()</b> function to find out how many
compiled patterns are in the serialized data without actually decoding the
patterns:
<pre>
  uint8_t *bytes = &#60;serialized data&#62;;
  int32_t number_of_codes = pcre2_serialize_get_number_of_codes(bytes);
</pre>
The <b>pcre2_serialize_decode()</b> function reads a byte stream and recreates
the compiled patterns in new memory blocks, setting pointers to them in a
vector. The first two arguments are a pointer to a suitable vector and its
length, and the third argument points to a byte stream. The final argument is a
pointer to a general context, which can be used to specify custom memory
management functions for the decoded patterns. If this argument is NULL,
<b>malloc()</b> and <b>free()</b> are used. After deserialization, the byte
stream is no longer needed and can be discarded.
<pre>
  pcre2_code *list_of_codes[2];
  uint8_t *bytes = &#60;serialized data&#62;;
  int32_t number_of_codes =
    pcre2_serialize_decode(list_of_codes, 2, bytes, NULL);
</pre>
If the vector is not large enough for all the patterns in the byte stream, it
is filled with those that fit, and the remainder are ignored. The yield of the
function is the number of decoded patterns, or one of the following negative
error codes:
<pre>
  PCRE2_ERROR_BADDATA            second argument is zero or less
  PCRE2_ERROR_BADMAGIC           mismatch of id bytes in the data
  PCRE2_ERROR_BADMODE            mismatch of code unit size or PCRE2 version
  PCRE2_ERROR_BADSERIALIZEDDATA  other sanity check failure
  PCRE2_ERROR_NOMEMORY           memory allocation failed
  PCRE2_ERROR_NULL               first or third argument is NULL
</pre>
PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was compiled
on a system with different endianness.
</P>
<P>
Decoded patterns can be used for matching in the usual way, and must be freed
by calling <b>pcre2_code_free()</b>. However, be aware that there is a potential
race issue if you are using multiple patterns that were decoded from a single
byte stream in a multithreaded application. A single copy of the character
tables is used by all the decoded patterns and a reference count is used to
arrange for its memory to be automatically freed when the last pattern is
freed, but there is no locking on this reference count. Therefore, if you want
to call <b>pcre2_code_free()</b> for these patterns in different threads, you
must arrange your own locking, and ensure that <b>pcre2_code_free()</b> cannot
be called by two threads at the same time.
</P>
<P>
If a pattern was processed by <b>pcre2_jit_compile()</b> before being
serialized, the JIT data is discarded and so is no longer available after a
save/restore cycle. You can, however, process a restored pattern with
<b>pcre2_jit_compile()</b> if you wish.
</P>
<br><a name="SEC5" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
University Computing Service
<br>
Cambridge, England.
<br>
</P>
<br><a name="SEC6" href="#TOC1">REVISION</a><br>
<P>
Last updated: 27 June 2018
<br>
Copyright &copy; 1997-2018 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>