<html>
<head>
<title>pcre2serialize specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre2serialize man page</h1>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>
<p>
This page is part of the PCRE2 HTML documentation. It was generated
automatically from the original man page. If there is any nonsense in it,
please consult the man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS</a>
<li><a name="TOC2" href="#SEC2">SECURITY CONCERNS</a>
<li><a name="TOC3" href="#SEC3">SAVING COMPILED PATTERNS</a>
<li><a name="TOC4" href="#SEC4">RE-USING PRECOMPILED PATTERNS</a>
<li><a name="TOC5" href="#SEC5">AUTHOR</a>
<li><a name="TOC6" href="#SEC6">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS</a><br>
<P>
<b>int32_t pcre2_serialize_decode(pcre2_code **<i>codes</i>,</b>
<b>  int32_t <i>number_of_codes</i>, const uint8_t *<i>bytes</i>,</b>
<b>  pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
<b>int32_t pcre2_serialize_encode(pcre2_code **<i>codes</i>,</b>
<b>  int32_t <i>number_of_codes</i>, uint8_t **<i>serialized_bytes</i>,</b>
<b>  PCRE2_SIZE *<i>serialized_size</i>, pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
<b>void pcre2_serialize_free(uint8_t *<i>bytes</i>);</b>
<br>
<br>
<b>int32_t pcre2_serialize_get_number_of_codes(const uint8_t *<i>bytes</i>);</b>
<br>
<br>
If you are running an application that uses a large number of regular
expression patterns, it may be useful to store them in a precompiled form
instead of having to compile them every time the application is run. However,
if you are using the just-in-time optimization feature, it is not possible to
save and reload the JIT data, because it is position-dependent.
The host on which the patterns are reloaded must be running the same version
of PCRE2, with the same code unit width, and must also have the same
endianness, pointer width and PCRE2_SIZE type. For example, patterns compiled
on a 32-bit system using PCRE2's 16-bit library cannot be reloaded on a 64-bit
system, nor can they be reloaded using the 8-bit library.
</P>
<br><a name="SEC2" href="#TOC1">SECURITY CONCERNS</a><br>
<P>
The facility for saving and restoring compiled patterns is intended for use
within individual applications. As such, the data supplied to
<b>pcre2_serialize_decode()</b> is expected to be trusted data, not data from
arbitrary external sources. There is only some simple consistency checking, not
complete validation of what is being re-loaded.
</P>
<br><a name="SEC3" href="#TOC1">SAVING COMPILED PATTERNS</a><br>
<P>
Before compiled patterns can be saved they must be serialized, that is,
converted to a stream of bytes. A single byte stream may contain any number of
compiled patterns, but they must all use the same character tables. A single
copy of the tables is included in the byte stream (its size is 1088 bytes). For
more details of character tables, see the
<a href="pcre2api.html#localesupport">section on locale support</a>
in the
<a href="pcre2api.html"><b>pcre2api</b></a>
documentation.
</P>
<P>
The function <b>pcre2_serialize_encode()</b> creates a serialized byte stream
from a list of compiled patterns. Its first two arguments specify the list,
being a pointer to a vector of pointers to compiled patterns, and the length of
the vector. The third and fourth arguments point to variables which are set to
point to the created byte stream and its length, respectively. The final
argument is a pointer to a general context, which can be used to specify custom
memory management functions.
If this argument is NULL, <b>malloc()</b> is used
to obtain memory for the byte stream. The yield of the function is the number
of serialized patterns, or one of the following negative error codes:
<pre>
  PCRE2_ERROR_BADDATA      the number of patterns is zero or less
  PCRE2_ERROR_BADMAGIC     mismatch of id bytes in one of the patterns
  PCRE2_ERROR_MEMORY       memory allocation failed
  PCRE2_ERROR_MIXEDTABLES  the patterns do not all use the same tables
  PCRE2_ERROR_NULL         the 1st, 3rd, or 4th argument is NULL
</pre>
PCRE2_ERROR_BADMAGIC means either that a pattern's code has been corrupted, or
that a slot in the vector does not point to a compiled pattern.
</P>
<P>
Once a set of patterns has been serialized you can save the data in any
appropriate manner. Here is sample code that compiles two patterns and writes
them to a file. It assumes that the variable <i>fd</i> refers to a file that is
open for output. The error checking that should be present in a real
application has been omitted for simplicity.
<pre>
  int errorcode;
  uint8_t *bytes;
  PCRE2_SIZE erroroffset;
  PCRE2_SIZE bytescount;
  pcre2_code *list_of_codes[2];

  /* Compile the two patterns that are to be saved. */
  list_of_codes[0] = pcre2_compile((PCRE2_SPTR)"first pattern",
    PCRE2_ZERO_TERMINATED, 0, &amp;errorcode, &amp;erroroffset, NULL);
  list_of_codes[1] = pcre2_compile((PCRE2_SPTR)"second pattern",
    PCRE2_ZERO_TERMINATED, 0, &amp;errorcode, &amp;erroroffset, NULL);

  /* Serialize both patterns into a single byte stream. */
  errorcode = pcre2_serialize_encode(list_of_codes, 2, &amp;bytes,
    &amp;bytescount, NULL);

  /* Write the byte stream to the output file. */
  errorcode = fwrite(bytes, 1, bytescount, fd);
</pre>
Note that the serialized data is binary data that may contain any of the 256
possible byte values. On systems that make a distinction between binary and
non-binary data, be sure that the file is opened for binary output.
</P>
<P>
Serializing a set of patterns leaves the original data untouched, so they can
still be used for matching.
Their memory must eventually be freed in the usual
way by calling <b>pcre2_code_free()</b>. When you have finished with the byte
stream, it too must be freed by calling <b>pcre2_serialize_free()</b>.
</P>
<br><a name="SEC4" href="#TOC1">RE-USING PRECOMPILED PATTERNS</a><br>
<P>
In order to re-use a set of saved patterns you must first make the serialized
byte stream available in main memory (for example, by reading from a file). The
management of this memory block is up to the application. You can use the
<b>pcre2_serialize_get_number_of_codes()</b> function to find out how many
compiled patterns are in the serialized data without actually decoding the
patterns:
<pre>
  uint8_t *bytes = &lt;serialized data&gt;;
  int32_t number_of_codes = pcre2_serialize_get_number_of_codes(bytes);
</pre>
The <b>pcre2_serialize_decode()</b> function reads a byte stream and recreates
the compiled patterns in new memory blocks, setting pointers to them in a
vector. The first two arguments are a pointer to a suitable vector and its
length, and the third argument points to a byte stream. The final argument is a
pointer to a general context, which can be used to specify custom memory
management functions for the decoded patterns. If this argument is NULL,
<b>malloc()</b> and <b>free()</b> are used. After deserialization, the byte
stream is no longer needed and can be discarded.
<pre>
  pcre2_code *list_of_codes[2];
  uint8_t *bytes = &lt;serialized data&gt;;

  /* Decode at most two patterns from the byte stream into the vector. */
  int32_t number_of_codes =
    pcre2_serialize_decode(list_of_codes, 2, bytes, NULL);
</pre>
If the vector is not large enough for all the patterns in the byte stream, it
is filled with those that fit, and the remainder are ignored.
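The first step described above, making the serialized byte stream available in
main memory, could be sketched in plain C along the following lines. This is
only an illustration under stated assumptions: the helper name
read_serialized_file() is hypothetical and is not part of PCRE2, and the data is
assumed to live in an ordinary file that was written in binary mode.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of PCRE2): read a whole file into a
   malloc()ed buffer. Returns NULL on failure; on success the number of
   bytes read is stored in *size_out. The caller releases the buffer
   with free(). */
uint8_t *read_serialized_file(const char *path, size_t *size_out)
{
  assert(path != NULL && size_out != NULL);

  FILE *f = fopen(path, "rb");   /* "rb": the serialized data is binary */
  if (f == NULL) return NULL;

  uint8_t *buf = NULL;
  size_t used = 0, cap = 0;

  for (;;)
    {
    if (used == cap)             /* buffer full (or not yet allocated) */
      {
      size_t newcap = (cap == 0) ? 4096 : cap * 2;
      uint8_t *tmp = realloc(buf, newcap);
      if (tmp == NULL) { free(buf); fclose(f); return NULL; }
      buf = tmp;
      cap = newcap;
      }
    size_t n = fread(buf + used, 1, cap - used, f);
    used += n;
    if (n == 0) break;           /* end of file or read error */
    }

  if (ferror(f)) { free(buf); fclose(f); return NULL; }
  fclose(f);
  *size_out = used;
  return buf;
}
```

A buffer obtained this way can be passed to
<b>pcre2_serialize_get_number_of_codes()</b> to size the vector so that no
patterns are dropped, then to <b>pcre2_serialize_decode()</b>, and released
with <b>free()</b> once decoding is done.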
The yield of <b>pcre2_serialize_decode()</b> is the number of decoded
patterns, or one of the following negative error codes:
<pre>
  PCRE2_ERROR_BADDATA            second argument is zero or less
  PCRE2_ERROR_BADMAGIC           mismatch of id bytes in the data
  PCRE2_ERROR_BADMODE            mismatch of code unit size or PCRE2 version
  PCRE2_ERROR_BADSERIALIZEDDATA  other sanity check failure
  PCRE2_ERROR_MEMORY             memory allocation failed
  PCRE2_ERROR_NULL               first or third argument is NULL
</pre>
PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was compiled
on a system with different endianness.
</P>
<P>
Decoded patterns can be used for matching in the usual way, and must be freed
by calling <b>pcre2_code_free()</b>. However, be aware that there is a potential
race issue if you are using multiple patterns that were decoded from a single
byte stream in a multithreaded application. A single copy of the character
tables is used by all the decoded patterns and a reference count is used to
arrange for its memory to be automatically freed when the last pattern is
freed, but there is no locking on this reference count. Therefore, if you want
to call <b>pcre2_code_free()</b> for these patterns in different threads, you
must arrange your own locking, and ensure that <b>pcre2_code_free()</b> cannot
be called by two threads at the same time.
</P>
<P>
If a pattern was processed by <b>pcre2_jit_compile()</b> before being
serialized, the JIT data is discarded and so is no longer available after a
save/restore cycle. You can, however, process a restored pattern with
<b>pcre2_jit_compile()</b> if you wish.
</P>
<br><a name="SEC5" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
University Computing Service
<br>
Cambridge, England.
<br>
</P>
<br><a name="SEC6" href="#TOC1">REVISION</a><br>
<P>
Last updated: 24 May 2016
<br>
Copyright © 1997-2016 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>