<html>
<head>
<title>pcre2serialize specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre2serialize man page</h1>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>
<p>
This page is part of the PCRE2 HTML documentation. It was generated
automatically from the original man page. If there is any nonsense in it,
please consult the man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS</a>
<li><a name="TOC2" href="#SEC2">SECURITY CONCERNS</a>
<li><a name="TOC3" href="#SEC3">SAVING COMPILED PATTERNS</a>
<li><a name="TOC4" href="#SEC4">RE-USING PRECOMPILED PATTERNS</a>
<li><a name="TOC5" href="#SEC5">AUTHOR</a>
<li><a name="TOC6" href="#SEC6">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS</a><br>
<P>
<b>int32_t pcre2_serialize_decode(pcre2_code **<i>codes</i>,</b>
<b>  int32_t <i>number_of_codes</i>, const uint8_t *<i>bytes</i>,</b>
<b>  pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
<b>int32_t pcre2_serialize_encode(pcre2_code **<i>codes</i>,</b>
<b>  int32_t <i>number_of_codes</i>, uint8_t **<i>serialized_bytes</i>,</b>
<b>  PCRE2_SIZE *<i>serialized_size</i>, pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
<b>void pcre2_serialize_free(uint8_t *<i>bytes</i>);</b>
<br>
<br>
<b>int32_t pcre2_serialize_get_number_of_codes(const uint8_t *<i>bytes</i>);</b>
<br>
<br>
If you are running an application that uses a large number of regular
expression patterns, it may be useful to store them in a precompiled form
instead of having to compile them every time the application is run. However,
if you are using the just-in-time optimization feature, it is not possible to
save and reload the JIT data, because it is position-dependent. The host on
which the patterns are reloaded must be running the same version of PCRE2, with
the same code unit width, and must also have the same endianness, pointer width
and PCRE2_SIZE type. For example, patterns compiled on a 32-bit system using
PCRE2's 16-bit library cannot be reloaded on a 64-bit system, nor can they be
reloaded using the 8-bit library.
</P>
<br><a name="SEC2" href="#TOC1">SECURITY CONCERNS</a><br>
<P>
The facility for saving and restoring compiled patterns is intended for use
within individual applications. As such, the data supplied to
<b>pcre2_serialize_decode()</b> is expected to be trusted data, not data from
arbitrary external sources. Only some simple consistency checks are made; the
data being reloaded is not completely validated.
</P>
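<P>
Because the checking is so limited, an application that keeps serialized
patterns in a file may want to detect accidental damage to that file before
decoding. The following minimal sketch assumes that the application stores a
checksum alongside the byte stream when it writes it and recomputes the checksum
before calling <b>pcre2_serialize_decode()</b>. FNV-1a is used here only because
it is short, and the name <b>fnv1a32()</b> is illustrative. A checksum like this
detects truncated or partly overwritten files. It is no defence against data
that has been crafted deliberately.
</P>

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a hash of a byte stream. Stored next to serialized PCRE2
   data and re-checked before decoding, it catches accidental corruption.
   It is NOT a defence against deliberately crafted data. */
static uint32_t fnv1a32(const uint8_t *data, size_t len)
{
  uint32_t h = 2166136261u;      /* FNV-1a 32-bit offset basis */
  for (size_t i = 0; i < len; i++) {
    h ^= data[i];                /* mix in one byte... */
    h *= 16777619u;              /* ...then multiply by the FNV prime */
  }
  return h;
}
```

<P>
For example, the checksum of the single byte "a" is 0xe40c292c.
</P>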
<br><a name="SEC3" href="#TOC1">SAVING COMPILED PATTERNS</a><br>
<P>
Before compiled patterns can be saved, they must be serialized, that is,
converted to a stream of bytes. A single byte stream may contain any number of
compiled patterns, but they must all use the same character tables. A single
copy of the tables is included in the byte stream (its size is 1088 bytes). For
more details of character tables, see the
<a href="pcre2api.html#localesupport">section on locale support</a>
in the
<a href="pcre2api.html"><b>pcre2api</b></a>
documentation.
</P>
<P>
The function <b>pcre2_serialize_encode()</b> creates a serialized byte stream
from a list of compiled patterns. Its first two arguments specify the list,
being a pointer to a vector of pointers to compiled patterns, and the length of
the vector. The third and fourth arguments point to variables which are set to
point to the created byte stream and its length, respectively. The final
argument is a pointer to a general context, which can be used to specify custom
memory management functions. If this argument is NULL, <b>malloc()</b> is used
to obtain memory for the byte stream. The yield of the function is the number
of serialized patterns, or one of the following negative error codes:
<pre>
  PCRE2_ERROR_BADDATA      the number of patterns is zero or less
  PCRE2_ERROR_BADMAGIC     mismatch of id bytes in one of the patterns
  PCRE2_ERROR_NOMEMORY     memory allocation failed
  PCRE2_ERROR_MIXEDTABLES  the patterns do not all use the same tables
  PCRE2_ERROR_NULL         the 1st, 3rd, or 4th argument is NULL
</pre>
PCRE2_ERROR_BADMAGIC means either that a pattern's code has been corrupted, or
that a slot in the vector does not point to a compiled pattern.
</P>
<P>
Once a set of patterns has been serialized you can save the data in any
appropriate manner. Here is sample code that compiles two patterns and writes
them to a file. It assumes that the 8-bit library is in use and that the
variable <i>fd</i> is a <b>FILE *</b> for a file that is open for output. The
error checking that should be present in a real application has been omitted
for simplicity.
<pre>
  int errorcode;
  int32_t rc;
  uint8_t *bytes;
  PCRE2_SIZE erroroffset;
  PCRE2_SIZE bytescount;
  pcre2_code *list_of_codes[2];
  list_of_codes[0] = pcre2_compile((PCRE2_SPTR)"first pattern",
    PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);
  list_of_codes[1] = pcre2_compile((PCRE2_SPTR)"second pattern",
    PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);
  /* rc is the number of patterns serialized, or a negative error code */
  rc = pcre2_serialize_encode(list_of_codes, 2, &bytes,
    &bytescount, NULL);
  /* fwrite() returns an item count (size_t), not a PCRE2 error code */
  fwrite(bytes, 1, bytescount, fd);
</pre>
Note that the serialized data is binary data that may contain any of the 256
possible byte values. On systems that make a distinction between binary and
non-binary data, be sure that the file is opened for binary output.
</P>
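<P>
In a real application, the write should be checked, including the result of
<b>fclose()</b>, which can report an error that was deferred by buffering. The
following is a minimal sketch using only standard C I/O. <b>save_bytes()</b> is
a hypothetical helper that opens the file itself in binary mode ("wb").
</P>

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write a serialized byte stream to `path`.
   The "b" in "wb" stops systems that distinguish text files from
   translating line-ending bytes. Returns 0 on success, -1 on any error. */
static int save_bytes(const char *path, const uint8_t *bytes, size_t len)
{
  FILE *f = fopen(path, "wb");
  if (f == NULL) return -1;
  size_t written = fwrite(bytes, 1, len, f);
  /* fclose() can report a deferred write error, so check it as well. */
  if (fclose(f) != 0 || written != len) return -1;
  return 0;
}
```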
<P>
Serializing a set of patterns leaves the original compiled patterns untouched,
so they can still be used for matching. Their memory must eventually be freed
in the usual way by calling <b>pcre2_code_free()</b>. When you have finished
with the byte stream, it too must be freed by calling
<b>pcre2_serialize_free()</b>.
</P>
<br><a name="SEC4" href="#TOC1">RE-USING PRECOMPILED PATTERNS</a><br>
<P>
In order to re-use a set of saved patterns, you must first make the serialized
byte stream available in main memory (for example, by reading from a file). The
management of this memory block is up to the application. You can use the
<b>pcre2_serialize_get_number_of_codes()</b> function to find out how many
compiled patterns are in the serialized data without actually decoding the
patterns:
<pre>
  uint8_t *bytes = &#60;serialized data&#62;;
  int32_t number_of_codes = pcre2_serialize_get_number_of_codes(bytes);
</pre>
If <i>bytes</i> is NULL or the data does not look like a byte stream produced
by this build of PCRE2, the yield is a negative error code instead.
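Reading the byte stream back from a file needs only ordinary C I/O. The
following minimal sketch reads a whole file in binary mode into a block
obtained from <b>malloc()</b>. <b>load_bytes()</b> is a hypothetical helper,
not part of PCRE2.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read the whole of `path` (opened in binary mode)
   into a malloc()ed block and store its length in *len_out. The caller
   owns the block and frees it with free(). Returns NULL on error. */
static uint8_t *load_bytes(const char *path, size_t *len_out)
{
  FILE *f = fopen(path, "rb");
  if (f == NULL) return NULL;
  uint8_t *buf = NULL;
  size_t cap = 0, len = 0;
  for (;;) {
    if (len == cap) {                       /* grow the buffer as needed */
      size_t newcap = cap ? cap * 2 : 4096;
      uint8_t *p = realloc(buf, newcap);
      if (p == NULL) { free(buf); fclose(f); return NULL; }
      buf = p;
      cap = newcap;
    }
    size_t n = fread(buf + len, 1, cap - len, f);
    len += n;
    if (n == 0) break;                      /* end of file or read error */
  }
  int failed = ferror(f);
  fclose(f);
  if (failed) { free(buf); return NULL; }
  *len_out = len;
  return buf;
}
```

The returned block can then be passed as the <i>bytes</i> argument of
<b>pcre2_serialize_get_number_of_codes()</b> and
<b>pcre2_serialize_decode()</b>, and released with <b>free()</b> once decoding
is complete.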
The <b>pcre2_serialize_decode()</b> function reads a byte stream and recreates
the compiled patterns in new memory blocks, setting pointers to them in a
vector. The first two arguments are a pointer to a suitable vector and its
length, and the third argument points to a byte stream. The final argument is a
pointer to a general context, which can be used to specify custom memory
management functions for the decoded patterns. If this argument is NULL,
<b>malloc()</b> and <b>free()</b> are used. After deserialization, the byte
stream is no longer needed and can be discarded.
<pre>
  int32_t number_of_codes;
  pcre2_code *list_of_codes[2];
  uint8_t *bytes = &#60;serialized data&#62;;
  number_of_codes =
    pcre2_serialize_decode(list_of_codes, 2, bytes, NULL);
</pre>
If the vector is not large enough for all the patterns in the byte stream, it
is filled with those that fit, and the remainder are ignored. The yield of the
function is the number of decoded patterns, or one of the following negative
error codes:
<pre>
  PCRE2_ERROR_BADDATA    second argument is zero or less
  PCRE2_ERROR_BADMAGIC   mismatch of id bytes in the data
  PCRE2_ERROR_BADMODE    mismatch of code unit size or PCRE2 version
  PCRE2_ERROR_BADSERIALIZEDDATA  other sanity check failure
  PCRE2_ERROR_NOMEMORY   memory allocation failed
  PCRE2_ERROR_NULL       first or third argument is NULL
</pre>
PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was compiled
on a system with different endianness.
</P>
<P>
Decoded patterns can be used for matching in the usual way, and must be freed
by calling <b>pcre2_code_free()</b>. However, be aware that there is a potential
race condition if you are using multiple patterns that were decoded from a
single byte stream in a multithreaded application. A single copy of the
character tables is used by all the decoded patterns, and a reference count is
used to arrange for its memory to be freed automatically when the last pattern
is freed, but there is no locking on this reference count. Therefore, if you
want to call <b>pcre2_code_free()</b> for these patterns in different threads,
you must arrange your own locking, and ensure that <b>pcre2_code_free()</b>
cannot be called by two threads at the same time.
</P>
<P>
If a pattern was processed by <b>pcre2_jit_compile()</b> before being
serialized, the JIT data is discarded and so is no longer available after a
save/restore cycle. You can, however, process a restored pattern with
<b>pcre2_jit_compile()</b> if you wish.
</P>
<br><a name="SEC5" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
University Computing Service
<br>
Cambridge, England.
<br>
</P>
<br><a name="SEC6" href="#TOC1">REVISION</a><br>
<P>
Last updated: 24 May 2016
<br>
Copyright &copy; 1997-2016 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>