      1 <html>
      2 <head>
      3 <title>pcre2api specification</title>
      4 </head>
      5 <body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
      6 <h1>pcre2api man page</h1>
      7 <p>
      8 Return to the <a href="index.html">PCRE2 index page</a>.
      9 </p>
     10 <p>
     11 This page is part of the PCRE2 HTML documentation. It was generated
     12 automatically from the original man page. If there is any nonsense in it,
     13 please consult the man page, in case the conversion went wrong.
     14 <br>
     15 <ul>
     16 <li><a name="TOC1" href="#SEC1">PCRE2 NATIVE API BASIC FUNCTIONS</a>
     17 <li><a name="TOC2" href="#SEC2">PCRE2 NATIVE API AUXILIARY MATCH FUNCTIONS</a>
     18 <li><a name="TOC3" href="#SEC3">PCRE2 NATIVE API GENERAL CONTEXT FUNCTIONS</a>
     19 <li><a name="TOC4" href="#SEC4">PCRE2 NATIVE API COMPILE CONTEXT FUNCTIONS</a>
     20 <li><a name="TOC5" href="#SEC5">PCRE2 NATIVE API MATCH CONTEXT FUNCTIONS</a>
     21 <li><a name="TOC6" href="#SEC6">PCRE2 NATIVE API STRING EXTRACTION FUNCTIONS</a>
     22 <li><a name="TOC7" href="#SEC7">PCRE2 NATIVE API STRING SUBSTITUTION FUNCTION</a>
     23 <li><a name="TOC8" href="#SEC8">PCRE2 NATIVE API JIT FUNCTIONS</a>
     24 <li><a name="TOC9" href="#SEC9">PCRE2 NATIVE API SERIALIZATION FUNCTIONS</a>
     25 <li><a name="TOC10" href="#SEC10">PCRE2 NATIVE API AUXILIARY FUNCTIONS</a>
     26 <li><a name="TOC11" href="#SEC11">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a>
     27 <li><a name="TOC12" href="#SEC12">PCRE2 API OVERVIEW</a>
     28 <li><a name="TOC13" href="#SEC13">STRING LENGTHS AND OFFSETS</a>
     29 <li><a name="TOC14" href="#SEC14">NEWLINES</a>
     30 <li><a name="TOC15" href="#SEC15">MULTITHREADING</a>
     31 <li><a name="TOC16" href="#SEC16">PCRE2 CONTEXTS</a>
     32 <li><a name="TOC17" href="#SEC17">CHECKING BUILD-TIME OPTIONS</a>
     33 <li><a name="TOC18" href="#SEC18">COMPILING A PATTERN</a>
     34 <li><a name="TOC19" href="#SEC19">COMPILATION ERROR CODES</a>
     35 <li><a name="TOC20" href="#SEC20">JUST-IN-TIME (JIT) COMPILATION</a>
     36 <li><a name="TOC21" href="#SEC21">LOCALE SUPPORT</a>
     37 <li><a name="TOC22" href="#SEC22">INFORMATION ABOUT A COMPILED PATTERN</a>
     38 <li><a name="TOC23" href="#SEC23">INFORMATION ABOUT A PATTERN'S CALLOUTS</a>
     39 <li><a name="TOC24" href="#SEC24">SERIALIZATION AND PRECOMPILING</a>
     40 <li><a name="TOC25" href="#SEC25">THE MATCH DATA BLOCK</a>
     41 <li><a name="TOC26" href="#SEC26">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a>
     42 <li><a name="TOC27" href="#SEC27">NEWLINE HANDLING WHEN MATCHING</a>
     43 <li><a name="TOC28" href="#SEC28">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a>
     44 <li><a name="TOC29" href="#SEC29">OTHER INFORMATION ABOUT A MATCH</a>
     45 <li><a name="TOC30" href="#SEC30">ERROR RETURNS FROM <b>pcre2_match()</b></a>
     46 <li><a name="TOC31" href="#SEC31">OBTAINING A TEXTUAL ERROR MESSAGE</a>
     47 <li><a name="TOC32" href="#SEC32">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a>
     48 <li><a name="TOC33" href="#SEC33">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a>
     49 <li><a name="TOC34" href="#SEC34">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a>
     50 <li><a name="TOC35" href="#SEC35">CREATING A NEW STRING WITH SUBSTITUTIONS</a>
     51 <li><a name="TOC36" href="#SEC36">DUPLICATE SUBPATTERN NAMES</a>
     52 <li><a name="TOC37" href="#SEC37">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a>
     53 <li><a name="TOC38" href="#SEC38">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a>
     54 <li><a name="TOC39" href="#SEC39">SEE ALSO</a>
     55 <li><a name="TOC40" href="#SEC40">AUTHOR</a>
     56 <li><a name="TOC41" href="#SEC41">REVISION</a>
     57 </ul>
     58 <P>
     59 <b>#include &#60;pcre2.h&#62;</b>
     60 <br>
     61 <br>
     62 PCRE2 is a new API for PCRE. This document contains a description of all its
     63 functions. See the
     64 <a href="pcre2.html"><b>pcre2</b></a>
     65 document for an overview of all the PCRE2 documentation.
     66 </P>
     67 <br><a name="SEC1" href="#TOC1">PCRE2 NATIVE API BASIC FUNCTIONS</a><br>
     68 <P>
     69 <b>pcre2_code *pcre2_compile(PCRE2_SPTR <i>pattern</i>, PCRE2_SIZE <i>length</i>,</b>
     70 <b>  uint32_t <i>options</i>, int *<i>errorcode</i>, PCRE2_SIZE *<i>erroroffset,</i></b>
     71 <b>  pcre2_compile_context *<i>ccontext</i>);</b>
     72 <br>
     73 <br>
     74 <b>void pcre2_code_free(pcre2_code *<i>code</i>);</b>
     75 <br>
     76 <br>
     77 <b>pcre2_match_data *pcre2_match_data_create(uint32_t <i>ovecsize</i>,</b>
     78 <b>  pcre2_general_context *<i>gcontext</i>);</b>
     79 <br>
     80 <br>
     81 <b>pcre2_match_data *pcre2_match_data_create_from_pattern(</b>
     82 <b>  const pcre2_code *<i>code</i>, pcre2_general_context *<i>gcontext</i>);</b>
     83 <br>
     84 <br>
     85 <b>int pcre2_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
     86 <b>  PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
     87 <b>  uint32_t <i>options</i>, pcre2_match_data *<i>match_data</i>,</b>
     88 <b>  pcre2_match_context *<i>mcontext</i>);</b>
     89 <br>
     90 <br>
     91 <b>int pcre2_dfa_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
     92 <b>  PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
     93 <b>  uint32_t <i>options</i>, pcre2_match_data *<i>match_data</i>,</b>
     94 <b>  pcre2_match_context *<i>mcontext</i>,</b>
     95 <b>  int *<i>workspace</i>, PCRE2_SIZE <i>wscount</i>);</b>
     96 <br>
     97 <br>
     98 <b>void pcre2_match_data_free(pcre2_match_data *<i>match_data</i>);</b>
     99 </P>
    100 <br><a name="SEC2" href="#TOC1">PCRE2 NATIVE API AUXILIARY MATCH FUNCTIONS</a><br>
    101 <P>
    102 <b>PCRE2_SPTR pcre2_get_mark(pcre2_match_data *<i>match_data</i>);</b>
    103 <br>
    104 <br>
    105 <b>uint32_t pcre2_get_ovector_count(pcre2_match_data *<i>match_data</i>);</b>
    106 <br>
    107 <br>
    108 <b>PCRE2_SIZE *pcre2_get_ovector_pointer(pcre2_match_data *<i>match_data</i>);</b>
    109 <br>
    110 <br>
    111 <b>PCRE2_SIZE pcre2_get_startchar(pcre2_match_data *<i>match_data</i>);</b>
    112 </P>
    113 <br><a name="SEC3" href="#TOC1">PCRE2 NATIVE API GENERAL CONTEXT FUNCTIONS</a><br>
    114 <P>
    115 <b>pcre2_general_context *pcre2_general_context_create(</b>
    116 <b>  void *(*<i>private_malloc</i>)(PCRE2_SIZE, void *),</b>
    117 <b>  void (*<i>private_free</i>)(void *, void *), void *<i>memory_data</i>);</b>
    118 <br>
    119 <br>
    120 <b>pcre2_general_context *pcre2_general_context_copy(</b>
    121 <b>  pcre2_general_context *<i>gcontext</i>);</b>
    122 <br>
    123 <br>
    124 <b>void pcre2_general_context_free(pcre2_general_context *<i>gcontext</i>);</b>
    125 </P>
    126 <br><a name="SEC4" href="#TOC1">PCRE2 NATIVE API COMPILE CONTEXT FUNCTIONS</a><br>
    127 <P>
    128 <b>pcre2_compile_context *pcre2_compile_context_create(</b>
    129 <b>  pcre2_general_context *<i>gcontext</i>);</b>
    130 <br>
    131 <br>
    132 <b>pcre2_compile_context *pcre2_compile_context_copy(</b>
    133 <b>  pcre2_compile_context *<i>ccontext</i>);</b>
    134 <br>
    135 <br>
    136 <b>void pcre2_compile_context_free(pcre2_compile_context *<i>ccontext</i>);</b>
    137 <br>
    138 <br>
    139 <b>int pcre2_set_bsr(pcre2_compile_context *<i>ccontext</i>,</b>
    140 <b>  uint32_t <i>value</i>);</b>
    141 <br>
    142 <br>
    143 <b>int pcre2_set_character_tables(pcre2_compile_context *<i>ccontext</i>,</b>
    144 <b>  const unsigned char *<i>tables</i>);</b>
    145 <br>
    146 <br>
    147 <b>int pcre2_set_max_pattern_length(pcre2_compile_context *<i>ccontext</i>,</b>
    148 <b>  PCRE2_SIZE <i>value</i>);</b>
    149 <br>
    150 <br>
    151 <b>int pcre2_set_newline(pcre2_compile_context *<i>ccontext</i>,</b>
    152 <b>  uint32_t <i>value</i>);</b>
    153 <br>
    154 <br>
    155 <b>int pcre2_set_parens_nest_limit(pcre2_compile_context *<i>ccontext</i>,</b>
    156 <b>  uint32_t <i>value</i>);</b>
    157 <br>
    158 <br>
    159 <b>int pcre2_set_compile_recursion_guard(pcre2_compile_context *<i>ccontext</i>,</b>
    160 <b>  int (*<i>guard_function</i>)(uint32_t, void *), void *<i>user_data</i>);</b>
    161 </P>
    162 <br><a name="SEC5" href="#TOC1">PCRE2 NATIVE API MATCH CONTEXT FUNCTIONS</a><br>
    163 <P>
    164 <b>pcre2_match_context *pcre2_match_context_create(</b>
    165 <b>  pcre2_general_context *<i>gcontext</i>);</b>
    166 <br>
    167 <br>
    168 <b>pcre2_match_context *pcre2_match_context_copy(</b>
    169 <b>  pcre2_match_context *<i>mcontext</i>);</b>
    170 <br>
    171 <br>
    172 <b>void pcre2_match_context_free(pcre2_match_context *<i>mcontext</i>);</b>
    173 <br>
    174 <br>
    175 <b>int pcre2_set_callout(pcre2_match_context *<i>mcontext</i>,</b>
    176 <b>  int (*<i>callout_function</i>)(pcre2_callout_block *, void *),</b>
    177 <b>  void *<i>callout_data</i>);</b>
    178 <br>
    179 <br>
    180 <b>int pcre2_set_match_limit(pcre2_match_context *<i>mcontext</i>,</b>
    181 <b>  uint32_t <i>value</i>);</b>
    182 <br>
    183 <br>
    184 <b>int pcre2_set_offset_limit(pcre2_match_context *<i>mcontext</i>,</b>
    185 <b>  PCRE2_SIZE <i>value</i>);</b>
    186 <br>
    187 <br>
    188 <b>int pcre2_set_recursion_limit(pcre2_match_context *<i>mcontext</i>,</b>
    189 <b>  uint32_t <i>value</i>);</b>
    190 <br>
    191 <br>
    192 <b>int pcre2_set_recursion_memory_management(</b>
    193 <b>  pcre2_match_context *<i>mcontext</i>,</b>
    194 <b>  void *(*<i>private_malloc</i>)(PCRE2_SIZE, void *),</b>
    195 <b>  void (*<i>private_free</i>)(void *, void *), void *<i>memory_data</i>);</b>
    196 </P>
    197 <br><a name="SEC6" href="#TOC1">PCRE2 NATIVE API STRING EXTRACTION FUNCTIONS</a><br>
    198 <P>
    199 <b>int pcre2_substring_copy_byname(pcre2_match_data *<i>match_data</i>,</b>
    200 <b>  PCRE2_SPTR <i>name</i>, PCRE2_UCHAR *<i>buffer</i>, PCRE2_SIZE *<i>bufflen</i>);</b>
    201 <br>
    202 <br>
    203 <b>int pcre2_substring_copy_bynumber(pcre2_match_data *<i>match_data</i>,</b>
    204 <b>  uint32_t <i>number</i>, PCRE2_UCHAR *<i>buffer</i>,</b>
    205 <b>  PCRE2_SIZE *<i>bufflen</i>);</b>
    206 <br>
    207 <br>
    208 <b>void pcre2_substring_free(PCRE2_UCHAR *<i>buffer</i>);</b>
    209 <br>
    210 <br>
    211 <b>int pcre2_substring_get_byname(pcre2_match_data *<i>match_data</i>,</b>
    212 <b>  PCRE2_SPTR <i>name</i>, PCRE2_UCHAR **<i>bufferptr</i>, PCRE2_SIZE *<i>bufflen</i>);</b>
    213 <br>
    214 <br>
    215 <b>int pcre2_substring_get_bynumber(pcre2_match_data *<i>match_data</i>,</b>
    216 <b>  uint32_t <i>number</i>, PCRE2_UCHAR **<i>bufferptr</i>,</b>
    217 <b>  PCRE2_SIZE *<i>bufflen</i>);</b>
    218 <br>
    219 <br>
    220 <b>int pcre2_substring_length_byname(pcre2_match_data *<i>match_data</i>,</b>
    221 <b>  PCRE2_SPTR <i>name</i>, PCRE2_SIZE *<i>length</i>);</b>
    222 <br>
    223 <br>
    224 <b>int pcre2_substring_length_bynumber(pcre2_match_data *<i>match_data</i>,</b>
    225 <b>  uint32_t <i>number</i>, PCRE2_SIZE *<i>length</i>);</b>
    226 <br>
    227 <br>
    228 <b>int pcre2_substring_nametable_scan(const pcre2_code *<i>code</i>,</b>
    229 <b>  PCRE2_SPTR <i>name</i>, PCRE2_SPTR *<i>first</i>, PCRE2_SPTR *<i>last</i>);</b>
    230 <br>
    231 <br>
    232 <b>int pcre2_substring_number_from_name(const pcre2_code *<i>code</i>,</b>
    233 <b>  PCRE2_SPTR <i>name</i>);</b>
    234 <br>
    235 <br>
    236 <b>void pcre2_substring_list_free(PCRE2_SPTR *<i>list</i>);</b>
    237 <br>
    238 <br>
    239 <b>int pcre2_substring_list_get(pcre2_match_data *<i>match_data</i>,</b>
<b>  PCRE2_UCHAR ***<i>listptr</i>, PCRE2_SIZE **<i>lengthsptr</i>);</b>
    241 </P>
    242 <br><a name="SEC7" href="#TOC1">PCRE2 NATIVE API STRING SUBSTITUTION FUNCTION</a><br>
    243 <P>
    244 <b>int pcre2_substitute(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
    245 <b>  PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
    246 <b>  uint32_t <i>options</i>, pcre2_match_data *<i>match_data</i>,</b>
<b>  pcre2_match_context *<i>mcontext</i>, PCRE2_SPTR <i>replacement</i>,</b>
    248 <b>  PCRE2_SIZE <i>rlength</i>, PCRE2_UCHAR *<i>outputbuffer</i>,</b>
    249 <b>  PCRE2_SIZE *<i>outlengthptr</i>);</b>
    250 </P>
    251 <br><a name="SEC8" href="#TOC1">PCRE2 NATIVE API JIT FUNCTIONS</a><br>
    252 <P>
    253 <b>int pcre2_jit_compile(pcre2_code *<i>code</i>, uint32_t <i>options</i>);</b>
    254 <br>
    255 <br>
    256 <b>int pcre2_jit_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
    257 <b>  PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
    258 <b>  uint32_t <i>options</i>, pcre2_match_data *<i>match_data</i>,</b>
    259 <b>  pcre2_match_context *<i>mcontext</i>);</b>
    260 <br>
    261 <br>
    262 <b>void pcre2_jit_free_unused_memory(pcre2_general_context *<i>gcontext</i>);</b>
    263 <br>
    264 <br>
    265 <b>pcre2_jit_stack *pcre2_jit_stack_create(PCRE2_SIZE <i>startsize</i>,</b>
    266 <b>  PCRE2_SIZE <i>maxsize</i>, pcre2_general_context *<i>gcontext</i>);</b>
    267 <br>
    268 <br>
    269 <b>void pcre2_jit_stack_assign(pcre2_match_context *<i>mcontext</i>,</b>
    270 <b>  pcre2_jit_callback <i>callback_function</i>, void *<i>callback_data</i>);</b>
    271 <br>
    272 <br>
    273 <b>void pcre2_jit_stack_free(pcre2_jit_stack *<i>jit_stack</i>);</b>
    274 </P>
    275 <br><a name="SEC9" href="#TOC1">PCRE2 NATIVE API SERIALIZATION FUNCTIONS</a><br>
    276 <P>
    277 <b>int32_t pcre2_serialize_decode(pcre2_code **<i>codes</i>,</b>
    278 <b>  int32_t <i>number_of_codes</i>, const uint8_t *<i>bytes</i>,</b>
    279 <b>  pcre2_general_context *<i>gcontext</i>);</b>
    280 <br>
    281 <br>
    282 <b>int32_t pcre2_serialize_encode(const pcre2_code **<i>codes</i>,</b>
    283 <b>  int32_t <i>number_of_codes</i>, uint8_t **<i>serialized_bytes</i>,</b>
    284 <b>  PCRE2_SIZE *<i>serialized_size</i>, pcre2_general_context *<i>gcontext</i>);</b>
    285 <br>
    286 <br>
    287 <b>void pcre2_serialize_free(uint8_t *<i>bytes</i>);</b>
    288 <br>
    289 <br>
    290 <b>int32_t pcre2_serialize_get_number_of_codes(const uint8_t *<i>bytes</i>);</b>
    291 </P>
    292 <br><a name="SEC10" href="#TOC1">PCRE2 NATIVE API AUXILIARY FUNCTIONS</a><br>
    293 <P>
    294 <b>pcre2_code *pcre2_code_copy(const pcre2_code *<i>code</i>);</b>
    295 <br>
    296 <br>
    297 <b>int pcre2_get_error_message(int <i>errorcode</i>, PCRE2_UCHAR *<i>buffer</i>,</b>
    298 <b>  PCRE2_SIZE <i>bufflen</i>);</b>
    299 <br>
    300 <br>
    301 <b>const unsigned char *pcre2_maketables(pcre2_general_context *<i>gcontext</i>);</b>
    302 <br>
    303 <br>
<b>int pcre2_pattern_info(const pcre2_code *<i>code</i>, uint32_t <i>what</i>, void *<i>where</i>);</b>
    305 <br>
    306 <br>
    307 <b>int pcre2_callout_enumerate(const pcre2_code *<i>code</i>,</b>
    308 <b>  int (*<i>callback</i>)(pcre2_callout_enumerate_block *, void *),</b>
    309 <b>  void *<i>user_data</i>);</b>
    310 <br>
    311 <br>
    312 <b>int pcre2_config(uint32_t <i>what</i>, void *<i>where</i>);</b>
    313 </P>
    314 <br><a name="SEC11" href="#TOC1">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a><br>
    315 <P>
    316 There are three PCRE2 libraries, supporting 8-bit, 16-bit, and 32-bit code
    317 units, respectively. However, there is just one header file, <b>pcre2.h</b>.
    318 This contains the function prototypes and other definitions for all three
    319 libraries. One, two, or all three can be installed simultaneously. On Unix-like
    320 systems the libraries are called <b>libpcre2-8</b>, <b>libpcre2-16</b>, and
    321 <b>libpcre2-32</b>, and they can also co-exist with the original PCRE libraries.
    322 </P>
    323 <P>
    324 Character strings are passed to and from a PCRE2 library as a sequence of
    325 unsigned integers in code units of the appropriate width. Every PCRE2 function
    326 comes in three different forms, one for each library, for example:
    327 <pre>
    328   <b>pcre2_compile_8()</b>
    329   <b>pcre2_compile_16()</b>
    330   <b>pcre2_compile_32()</b>
    331 </pre>
    332 There are also three different sets of data types:
    333 <pre>
    334   <b>PCRE2_UCHAR8, PCRE2_UCHAR16, PCRE2_UCHAR32</b>
    335   <b>PCRE2_SPTR8,  PCRE2_SPTR16,  PCRE2_SPTR32</b>
    336 </pre>
    337 The UCHAR types define unsigned code units of the appropriate widths. For
example, PCRE2_UCHAR16 is usually defined as <i>uint16_t</i>. The SPTR types are
    339 constant pointers to the equivalent UCHAR types, that is, they are pointers to
    340 vectors of unsigned code units.
    341 </P>
    342 <P>
    343 Many applications use only one code unit width. For their convenience, macros
    344 are defined whose names are the generic forms such as <b>pcre2_compile()</b> and
    345 PCRE2_SPTR. These macros use the value of the macro PCRE2_CODE_UNIT_WIDTH to
    346 generate the appropriate width-specific function and macro names.
    347 PCRE2_CODE_UNIT_WIDTH is not defined by default. An application must define it
    348 to be 8, 16, or 32 before including <b>pcre2.h</b> in order to make use of the
    349 generic names.
    350 </P>
    351 <P>
    352 Applications that use more than one code unit width can be linked with more
    353 than one PCRE2 library, but must define PCRE2_CODE_UNIT_WIDTH to be 0 before
    354 including <b>pcre2.h</b>, and then use the real function names. Any code that is
    355 to be included in an environment where the value of PCRE2_CODE_UNIT_WIDTH is
    356 unknown should also use the real function names. (Unfortunately, it is not
    357 possible in C code to save and restore the value of a macro.)
    358 </P>
    359 <P>
    360 If PCRE2_CODE_UNIT_WIDTH is not defined before including <b>pcre2.h</b>, a
    361 compiler error occurs.
    362 </P>
    363 <P>
    364 When using multiple libraries in an application, you must take care when
    365 processing any particular pattern to use only functions from a single library.
    366 For example, if you want to run a match using a pattern that was compiled with
    367 <b>pcre2_compile_16()</b>, you must do so with <b>pcre2_match_16()</b>, not
    368 <b>pcre2_match_8()</b>.
    369 </P>
    370 <P>
    371 In the function summaries above, and in the rest of this document and other
    372 PCRE2 documents, functions and data types are described using their generic
    373 names, without the 8, 16, or 32 suffix.
    374 </P>
    375 <br><a name="SEC12" href="#TOC1">PCRE2 API OVERVIEW</a><br>
    376 <P>
    377 PCRE2 has its own native API, which is described in this document. There are
    378 also some wrapper functions for the 8-bit library that correspond to the
    379 POSIX regular expression API, but they do not give access to all the
    380 functionality. They are described in the
    381 <a href="pcre2posix.html"><b>pcre2posix</b></a>
    382 documentation. Both these APIs define a set of C function calls.
    383 </P>
    384 <P>
    385 The native API C data types, function prototypes, option values, and error
    386 codes are defined in the header file <b>pcre2.h</b>, which contains definitions
    387 of PCRE2_MAJOR and PCRE2_MINOR, the major and minor release numbers for the
    388 library. Applications can use these to include support for different releases
    389 of PCRE2.
    390 </P>
    391 <P>
    392 In a Windows environment, if you want to statically link an application program
    393 against a non-dll PCRE2 library, you must define PCRE2_STATIC before including
    394 <b>pcre2.h</b>.
    395 </P>
    396 <P>
The functions <b>pcre2_compile()</b> and <b>pcre2_match()</b> are used for
    398 compiling and matching regular expressions in a Perl-compatible manner. A
    399 sample program that demonstrates the simplest way of using them is provided in
    400 the file called <i>pcre2demo.c</i> in the PCRE2 source distribution. A listing
    401 of this program is given in the
    402 <a href="pcre2demo.html"><b>pcre2demo</b></a>
    403 documentation, and the
    404 <a href="pcre2sample.html"><b>pcre2sample</b></a>
    405 documentation describes how to compile and run it.
    406 </P>
    407 <P>
    408 Just-in-time compiler support is an optional feature of PCRE2 that can be built
    409 in appropriate hardware environments. It greatly speeds up the matching
    410 performance of many patterns. Programs can request that it be used if
    411 available, by calling <b>pcre2_jit_compile()</b> after a pattern has been
    412 successfully compiled by <b>pcre2_compile()</b>. This does nothing if JIT
    413 support is not available.
    414 </P>
    415 <P>
    416 More complicated programs might need to make use of the specialist functions
    417 <b>pcre2_jit_stack_create()</b>, <b>pcre2_jit_stack_free()</b>, and
    418 <b>pcre2_jit_stack_assign()</b> in order to control the JIT code's memory usage.
    419 </P>
    420 <P>
    421 JIT matching is automatically used by <b>pcre2_match()</b> if it is available,
    422 unless the PCRE2_NO_JIT option is set. There is also a direct interface for JIT
    423 matching, which gives improved performance. The JIT-specific functions are
    424 discussed in the
    425 <a href="pcre2jit.html"><b>pcre2jit</b></a>
    426 documentation.
    427 </P>
    428 <P>
    429 A second matching function, <b>pcre2_dfa_match()</b>, which is not
    430 Perl-compatible, is also provided. This uses a different algorithm for the
    431 matching. The alternative algorithm finds all possible matches (at a given
    432 point in the subject), and scans the subject just once (unless there are
    433 lookbehind assertions). However, this algorithm does not return captured
    434 substrings. A description of the two matching algorithms and their advantages
    435 and disadvantages is given in the
    436 <a href="pcre2matching.html"><b>pcre2matching</b></a>
    437 documentation. There is no JIT support for <b>pcre2_dfa_match()</b>.
    438 </P>
    439 <P>
    440 In addition to the main compiling and matching functions, there are convenience
    441 functions for extracting captured substrings from a subject string that has
    442 been matched by <b>pcre2_match()</b>. They are:
    443 <pre>
    444   <b>pcre2_substring_copy_byname()</b>
    445   <b>pcre2_substring_copy_bynumber()</b>
    446   <b>pcre2_substring_get_byname()</b>
    447   <b>pcre2_substring_get_bynumber()</b>
    448   <b>pcre2_substring_list_get()</b>
    449   <b>pcre2_substring_length_byname()</b>
    450   <b>pcre2_substring_length_bynumber()</b>
    451   <b>pcre2_substring_nametable_scan()</b>
    452   <b>pcre2_substring_number_from_name()</b>
    453 </pre>
    454 <b>pcre2_substring_free()</b> and <b>pcre2_substring_list_free()</b> are also
    455 provided, to free the memory used for extracted strings.
    456 </P>
    457 <P>
    458 The function <b>pcre2_substitute()</b> can be called to match a pattern and
    459 return a copy of the subject string with substitutions for parts that were
    460 matched.
    461 </P>
    462 <P>
    463 Functions whose names begin with <b>pcre2_serialize_</b> are used for saving
    464 compiled patterns on disc or elsewhere, and reloading them later.
    465 </P>
    466 <P>
    467 Finally, there are functions for finding out information about a compiled
    468 pattern (<b>pcre2_pattern_info()</b>) and about the configuration with which
    469 PCRE2 was built (<b>pcre2_config()</b>).
    470 </P>
    471 <P>
    472 Functions with names ending with <b>_free()</b> are used for freeing memory
    473 blocks of various sorts. In all cases, if one of these functions is called with
    474 a NULL argument, it does nothing.
    475 </P>
    476 <br><a name="SEC13" href="#TOC1">STRING LENGTHS AND OFFSETS</a><br>
    477 <P>
    478 The PCRE2 API uses string lengths and offsets into strings of code units in
    479 several places. These values are always of type PCRE2_SIZE, which is an
    480 unsigned integer type, currently always defined as <i>size_t</i>. The largest
    481 value that can be stored in such a type (that is ~(PCRE2_SIZE)0) is reserved
    482 as a special indicator for zero-terminated strings and unset offsets.
    483 Therefore, the longest string that can be handled is one less than this
    484 maximum.
    485 <a name="newlines"></a></P>
    486 <br><a name="SEC14" href="#TOC1">NEWLINES</a><br>
    487 <P>
    488 PCRE2 supports five different conventions for indicating line breaks in
    489 strings: a single CR (carriage return) character, a single LF (linefeed)
    490 character, the two-character sequence CRLF, any of the three preceding, or any
    491 Unicode newline sequence. The Unicode newline sequences are the three just
    492 mentioned, plus the single characters VT (vertical tab, U+000B), FF (form feed,
    493 U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS
    494 (paragraph separator, U+2029).
    495 </P>
    496 <P>
    497 Each of the first three conventions is used by at least one operating system as
    498 its standard newline sequence. When PCRE2 is built, a default can be specified.
If none is specified, the default is LF, which is the Unix standard. However, the newline
    500 convention can be changed by an application when calling <b>pcre2_compile()</b>,
    501 or it can be specified by special text at the start of the pattern itself; this
    502 overrides any other settings. See the
    503 <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
    504 page for details of the special character sequences.
    505 </P>
    506 <P>
    507 In the PCRE2 documentation the word "newline" is used to mean "the character or
    508 pair of characters that indicate a line break". The choice of newline
    509 convention affects the handling of the dot, circumflex, and dollar
    510 metacharacters, the handling of #-comments in /x mode, and, when CRLF is a
    511 recognized line ending sequence, the match position advancement for a
    512 non-anchored pattern. There is more detail about this in the
    513 <a href="#matchoptions">section on <b>pcre2_match()</b> options</a>
    514 below.
    515 </P>
    516 <P>
    517 The choice of newline convention does not affect the interpretation of
    518 the \n or \r escape sequences, nor does it affect what \R matches; this has
    519 its own separate convention.
    520 </P>
    521 <br><a name="SEC15" href="#TOC1">MULTITHREADING</a><br>
    522 <P>
    523 In a multithreaded application it is important to keep thread-specific data
    524 separate from data that can be shared between threads. The PCRE2 library code
    525 itself is thread-safe: it contains no static or global variables. The API is
    526 designed to be fairly simple for non-threaded applications while at the same
    527 time ensuring that multithreaded applications can use it.
    528 </P>
    529 <P>
    530 There are several different blocks of data that are used to pass information
    531 between the application and the PCRE2 libraries.
    532 </P>
    533 <br><b>
    534 The compiled pattern
    535 </b><br>
    536 <P>
    537 A pointer to the compiled form of a pattern is returned to the user when
    538 <b>pcre2_compile()</b> is successful. The data in the compiled pattern is fixed,
    539 and does not change when the pattern is matched. Therefore, it is thread-safe,
    540 that is, the same compiled pattern can be used by more than one thread
    541 simultaneously. For example, an application can compile all its patterns at the
    542 start, before forking off multiple threads that use them. However, if the
    543 just-in-time optimization feature is being used, it needs separate memory stack
    544 areas for each thread. See the
    545 <a href="pcre2jit.html"><b>pcre2jit</b></a>
    546 documentation for more details.
    547 </P>
    548 <P>
    549 In a more complicated situation, where patterns are compiled only when they are
    550 first needed, but are still shared between threads, pointers to compiled
    551 patterns must be protected from simultaneous writing by multiple threads, at
    552 least until a pattern has been compiled. The logic can be something like this:
    553 <pre>
    554   Get a read-only (shared) lock (mutex) for pointer
    555   if (pointer == NULL)
    556     {
    557     Get a write (unique) lock for pointer
    558     pointer = pcre2_compile(...
    559     }
    560   Release the lock
    561   Use pointer in pcre2_match()
    562 </pre>
    563 Of course, testing for compilation errors should also be included in the code.
    564 </P>
    565 <P>
If JIT is being used, but the JIT compilation is not being done immediately
(perhaps waiting to see if the pattern is used often enough), similar logic is
    568 required. JIT compilation updates a pointer within the compiled code block, so
    569 a thread must gain unique write access to the pointer before calling
    570 <b>pcre2_jit_compile()</b>. Alternatively, <b>pcre2_code_copy()</b> can be used
    571 to obtain a private copy of the compiled code.
    572 </P>
    573 <br><b>
    574 Context blocks
    575 </b><br>
    576 <P>
    577 The next main section below introduces the idea of "contexts" in which PCRE2
    578 functions are called. A context is nothing more than a collection of parameters
    579 that control the way PCRE2 operates. Grouping a number of parameters together
    580 in a context is a convenient way of passing them to a PCRE2 function without
    581 using lots of arguments. The parameters that are stored in contexts are in some
    582 sense "advanced features" of the API. Many straightforward applications will
    583 not need to use contexts.
    584 </P>
    585 <P>
    586 In a multithreaded application, if the parameters in a context are values that
    587 are never changed, the same context can be used by all the threads. However, if
    588 any thread needs to change any value in a context, it must make its own
    589 thread-specific copy.
    590 </P>
    591 <br><b>
    592 Match blocks
    593 </b><br>
    594 <P>
    595 The matching functions need a block of memory for working space and for storing
    596 the results of a match. This includes details of what was matched, as well as
    597 additional information such as the name of a (*MARK) setting. Each thread must
    598 provide its own copy of this memory.
    599 </P>
    600 <br><a name="SEC16" href="#TOC1">PCRE2 CONTEXTS</a><br>
    601 <P>
    602 Some PCRE2 functions have a lot of parameters, many of which are used only by
    603 specialist applications, for example, those that use custom memory management
    604 or non-standard character tables. To keep function argument lists at a
    605 reasonable size, and at the same time to keep the API extensible, "uncommon"
    606 parameters are passed to certain functions in a <b>context</b> instead of
    607 directly. A context is just a block of memory that holds the parameter values.
    608 Applications that do not need to adjust any of the context parameters can pass
    609 NULL when a context pointer is required.
    610 </P>
    611 <P>
    612 There are three different types of context: a general context that is relevant
    613 for several PCRE2 operations, a compile-time context, and a match-time context.
    614 </P>
    615 <br><b>
    616 The general context
    617 </b><br>
    618 <P>
    619 At present, this context just contains pointers to (and data for) external
    620 memory management functions that are called from several places in the PCRE2
library. The context is named "general" rather than specifically "memory"
    622 because in future other fields may be added. If you do not want to supply your
    623 own custom memory management functions, you do not need to bother with a
general context. A general context is created by:
<br>
<br>
    625 <b>pcre2_general_context *pcre2_general_context_create(</b>
    626 <b>  void *(*<i>private_malloc</i>)(PCRE2_SIZE, void *),</b>
    627 <b>  void (*<i>private_free</i>)(void *, void *), void *<i>memory_data</i>);</b>
    628 <br>
    629 <br>
    630 The two function pointers specify custom memory management functions, whose
    631 prototypes are:
    632 <pre>
    633   <b>void *private_malloc(PCRE2_SIZE, void *);</b>
    634   <b>void  private_free(void *, void *);</b>
    635 </pre>
    636 Whenever code in PCRE2 calls these functions, the final argument is the value
    637 of <i>memory_data</i>. Either of the first two arguments of the creation
    638 function may be NULL, in which case the system memory management functions
    639 <i>malloc()</i> and <i>free()</i> are used. (This is not currently useful, as
    640 there are no other fields in a general context, but in future there might be.)
    641 The <i>private_malloc()</i> function is used (if supplied) to obtain memory for
    642 storing the context, and all three values are saved as part of the context.
    643 </P>
    644 <P>
    645 Whenever PCRE2 creates a data block of any kind, the block contains a pointer
    646 to the <i>free()</i> function that matches the <i>malloc()</i> function that was
    647 used. When the time comes to free the block, this function is called.
    648 </P>
    649 <P>
A general context can be copied by calling:
<br>
<br>
    651 <b>pcre2_general_context *pcre2_general_context_copy(</b>
    652 <b>  pcre2_general_context *<i>gcontext</i>);</b>
    653 <br>
    654 <br>
The memory used for a general context should be freed by calling:
<br>
<br>
    656 <b>void pcre2_general_context_free(pcre2_general_context *<i>gcontext</i>);</b>
    657 <a name="compilecontext"></a></P>
    658 <br><b>
    659 The compile context
    660 </b><br>
    661 <P>
    662 A compile context is required if you want to change the default values of any
    663 of the following compile-time parameters:
    664 <pre>
    665   What \R matches (Unicode newlines or CR, LF, CRLF only)
    666   PCRE2's character tables
    667   The newline character sequence
    668   The compile time nested parentheses limit
    669   The maximum length of the pattern string
    670   An external function for stack checking
    671 </pre>
    672 A compile context is also required if you are using custom memory management.
    673 If none of these apply, just pass NULL as the context argument of
    674 <i>pcre2_compile()</i>.
    675 </P>
    676 <P>
A compile context is created, copied, and freed by the following functions:
<br>
<br>
    678 <b>pcre2_compile_context *pcre2_compile_context_create(</b>
    679 <b>  pcre2_general_context *<i>gcontext</i>);</b>
    680 <br>
    681 <br>
    682 <b>pcre2_compile_context *pcre2_compile_context_copy(</b>
    683 <b>  pcre2_compile_context *<i>ccontext</i>);</b>
    684 <br>
    685 <br>
    686 <b>void pcre2_compile_context_free(pcre2_compile_context *<i>ccontext</i>);</b>
    687 <br>
    688 <br>
    689 A compile context is created with default values for its parameters. These can
    690 be changed by calling the following functions, which return 0 on success, or
PCRE2_ERROR_BADDATA if invalid data is detected.
<br>
<br>
    692 <b>int pcre2_set_bsr(pcre2_compile_context *<i>ccontext</i>,</b>
    693 <b>  uint32_t <i>value</i>);</b>
    694 <br>
    695 <br>
    696 The value must be PCRE2_BSR_ANYCRLF, to specify that \R matches only CR, LF,
    697 or CRLF, or PCRE2_BSR_UNICODE, to specify that \R matches any Unicode line
    698 ending sequence. The value is used by the JIT compiler and by the two
    699 interpreted matching functions, <i>pcre2_match()</i> and
<i>pcre2_dfa_match()</i>.
<br>
<br>
    701 <b>int pcre2_set_character_tables(pcre2_compile_context *<i>ccontext</i>,</b>
    702 <b>  const unsigned char *<i>tables</i>);</b>
    703 <br>
    704 <br>
    705 The value must be the result of a call to <i>pcre2_maketables()</i>, whose only
    706 argument is a general context. This function builds a set of character tables
in the current locale.
<br>
<br>
    708 <b>int pcre2_set_max_pattern_length(pcre2_compile_context *<i>ccontext</i>,</b>
    709 <b>  PCRE2_SIZE <i>value</i>);</b>
    710 <br>
    711 <br>
    712 This sets a maximum length, in code units, for the pattern string that is to be
    713 compiled. If the pattern is longer, an error is generated. This facility is
    714 provided so that applications that accept patterns from external sources can
    715 limit their size. The default is the largest number that a PCRE2_SIZE variable
can hold, which is effectively unlimited.
<br>
<br>
    717 <b>int pcre2_set_newline(pcre2_compile_context *<i>ccontext</i>,</b>
    718 <b>  uint32_t <i>value</i>);</b>
    719 <br>
    720 <br>
    721 This specifies which characters or character sequences are to be recognized as
    722 newlines. The value must be one of PCRE2_NEWLINE_CR (carriage return only),
    723 PCRE2_NEWLINE_LF (linefeed only), PCRE2_NEWLINE_CRLF (the two-character
    724 sequence CR followed by LF), PCRE2_NEWLINE_ANYCRLF (any of the above), or
    725 PCRE2_NEWLINE_ANY (any Unicode newline sequence).
    726 </P>
    727 <P>
    728 When a pattern is compiled with the PCRE2_EXTENDED option, the value of this
    729 parameter affects the recognition of white space and the end of internal
    730 comments starting with #. The value is saved with the compiled pattern for
    731 subsequent use by the JIT compiler and by the two interpreted matching
functions, <i>pcre2_match()</i> and <i>pcre2_dfa_match()</i>.
<br>
<br>
    733 <b>int pcre2_set_parens_nest_limit(pcre2_compile_context *<i>ccontext</i>,</b>
    734 <b>  uint32_t <i>value</i>);</b>
    735 <br>
    736 <br>
This parameter adjusts the limit, set when PCRE2 is built (default 250), on the
    738 depth of parenthesis nesting in a pattern. This limit stops rogue patterns
using up too much system stack when being compiled.
<br>
<br>
    740 <b>int pcre2_set_compile_recursion_guard(pcre2_compile_context *<i>ccontext</i>,</b>
    741 <b>  int (*<i>guard_function</i>)(uint32_t, void *), void *<i>user_data</i>);</b>
    742 <br>
    743 <br>
    744 There is at least one application that runs PCRE2 in threads with very limited
    745 system stack, where running out of stack is to be avoided at all costs. The
    746 parenthesis limit above cannot take account of how much stack is actually
    747 available. For a finer control, you can supply a function that is called
    748 whenever <b>pcre2_compile()</b> starts to compile a parenthesized part of a
    749 pattern. This function can check the actual stack size (or anything else that
    750 it wants to, of course).
    751 </P>
    752 <P>
The first argument to the guard function gives the current depth of
nesting, and the second is user data that is set up by the last argument of
<b>pcre2_set_compile_recursion_guard()</b>. The guard function should return
zero if all is well, or non-zero to force an error.
    757 <a name="matchcontext"></a></P>
    758 <br><b>
    759 The match context
    760 </b><br>
    761 <P>
    762 A match context is required if you want to change the default values of any
    763 of the following match-time parameters:
    764 <pre>
    765   A callout function
    766   The offset limit for matching an unanchored pattern
    767   The limit for calling <b>match()</b> (see below)
    768   The limit for calling <b>match()</b> recursively
    769 </pre>
    770 A match context is also required if you are using custom memory management.
    771 If none of these apply, just pass NULL as the context argument of
    772 <b>pcre2_match()</b>, <b>pcre2_dfa_match()</b>, or <b>pcre2_jit_match()</b>.
    773 </P>
    774 <P>
A match context is created, copied, and freed by the following functions:
<br>
<br>
    776 <b>pcre2_match_context *pcre2_match_context_create(</b>
    777 <b>  pcre2_general_context *<i>gcontext</i>);</b>
    778 <br>
    779 <br>
    780 <b>pcre2_match_context *pcre2_match_context_copy(</b>
    781 <b>  pcre2_match_context *<i>mcontext</i>);</b>
    782 <br>
    783 <br>
    784 <b>void pcre2_match_context_free(pcre2_match_context *<i>mcontext</i>);</b>
    785 <br>
    786 <br>
    787 A match context is created with default values for its parameters. These can
    788 be changed by calling the following functions, which return 0 on success, or
PCRE2_ERROR_BADDATA if invalid data is detected.
<br>
<br>
    790 <b>int pcre2_set_callout(pcre2_match_context *<i>mcontext</i>,</b>
    791 <b>  int (*<i>callout_function</i>)(pcre2_callout_block *, void *),</b>
    792 <b>  void *<i>callout_data</i>);</b>
    793 <br>
    794 <br>
    795 This sets up a "callout" function, which PCRE2 will call at specified points
    796 during a matching operation. Details are given in the
    797 <a href="pcre2callout.html"><b>pcre2callout</b></a>
documentation.
<br>
<br>
    799 <b>int pcre2_set_offset_limit(pcre2_match_context *<i>mcontext</i>,</b>
    800 <b>  PCRE2_SIZE <i>value</i>);</b>
    801 <br>
    802 <br>
    803 The <i>offset_limit</i> parameter limits how far an unanchored search can
    804 advance in the subject string. The default value is PCRE2_UNSET. The
    805 <b>pcre2_match()</b> and <b>pcre2_dfa_match()</b> functions return
    806 PCRE2_ERROR_NOMATCH if a match with a starting point before or at the given
offset is not found. For example, if the pattern /abc/ is matched against
"123abc" with an offset limit less than 3, the result is PCRE2_ERROR_NOMATCH.
    809 A match can never be found if the <i>startoffset</i> argument of
    810 <b>pcre2_match()</b> or <b>pcre2_dfa_match()</b> is greater than the offset
    811 limit.
    812 </P>
    813 <P>
    814 When using this facility, you must set PCRE2_USE_OFFSET_LIMIT when calling
    815 <b>pcre2_compile()</b> so that when JIT is in use, different code can be
compiled. If a match is started with a non-default offset limit when
PCRE2_USE_OFFSET_LIMIT is not set, an error is generated.
    818 </P>
    819 <P>
    820 The offset limit facility can be used to track progress when searching large
    821 subject strings. See also the PCRE2_FIRSTLINE option, which requires a match to
    822 start within the first line of the subject. If this is set with an offset
    823 limit, a match must occur in the first line and also within the offset limit.
In other words, whichever limit comes first is used.
<br>
<br>
    825 <b>int pcre2_set_match_limit(pcre2_match_context *<i>mcontext</i>,</b>
    826 <b>  uint32_t <i>value</i>);</b>
    827 <br>
    828 <br>
    829 The <i>match_limit</i> parameter provides a means of preventing PCRE2 from using
    830 up too many resources when processing patterns that are not going to match, but
    831 which have a very large number of possibilities in their search trees. The
    832 classic example is a pattern that uses nested unlimited repeats.
    833 </P>
    834 <P>
    835 Internally, <b>pcre2_match()</b> uses a function called <b>match()</b>, which it
    836 calls repeatedly (sometimes recursively). The limit set by <i>match_limit</i> is
    837 imposed on the number of times this function is called during a match, which
    838 has the effect of limiting the amount of backtracking that can take place. For
    839 patterns that are not anchored, the count restarts from zero for each position
    840 in the subject string. This limit is not relevant to <b>pcre2_dfa_match()</b>,
    841 which ignores it.
    842 </P>
    843 <P>
    844 When <b>pcre2_match()</b> is called with a pattern that was successfully
    845 processed by <b>pcre2_jit_compile()</b>, the way in which matching is executed
    846 is entirely different. However, there is still the possibility of runaway
    847 matching that goes on for a very long time, and so the <i>match_limit</i> value
    848 is also used in this case (but in a different way) to limit how long the
    849 matching can continue.
    850 </P>
    851 <P>
The default value for the limit can be set when PCRE2 is built; if no value is
given at build time, it is 10 million, which handles all but the most extreme cases. If the
    854 limit is exceeded, <b>pcre2_match()</b> returns PCRE2_ERROR_MATCHLIMIT. A value
    855 for the match limit may also be supplied by an item at the start of a pattern
    856 of the form
    857 <pre>
    858   (*LIMIT_MATCH=ddd)
    859 </pre>
    860 where ddd is a decimal number. However, such a setting is ignored unless ddd is
    861 less than the limit set by the caller of <b>pcre2_match()</b> or, if no such
limit is set, less than the default.
<br>
<br>
    863 <b>int pcre2_set_recursion_limit(pcre2_match_context *<i>mcontext</i>,</b>
    864 <b>  uint32_t <i>value</i>);</b>
    865 <br>
    866 <br>
    867 The <i>recursion_limit</i> parameter is similar to <i>match_limit</i>, but
    868 instead of limiting the total number of times that <b>match()</b> is called, it
    869 limits the depth of recursion. The recursion depth is a smaller number than the
    870 total number of calls, because not all calls to <b>match()</b> are recursive.
    871 This limit is of use only if it is set smaller than <i>match_limit</i>.
    872 </P>
    873 <P>
    874 Limiting the recursion depth limits the amount of system stack that can be
    875 used, or, when PCRE2 has been compiled to use memory on the heap instead of the
    876 stack, the amount of heap memory that can be used. This limit is not relevant,
    877 and is ignored, when matching is done using JIT compiled code or by the
    878 <b>pcre2_dfa_match()</b> function.
    879 </P>
    880 <P>
The default value for <i>recursion_limit</i> can be set when PCRE2 is built; if
no value is given at build time, it is the same as the default for <i>match_limit</i>. If the
    883 limit is exceeded, <b>pcre2_match()</b> returns PCRE2_ERROR_RECURSIONLIMIT. A
    884 value for the recursion limit may also be supplied by an item at the start of a
    885 pattern of the form
    886 <pre>
    887   (*LIMIT_RECURSION=ddd)
    888 </pre>
    889 where ddd is a decimal number. However, such a setting is ignored unless ddd is
    890 less than the limit set by the caller of <b>pcre2_match()</b> or, if no such
limit is set, less than the default.
<br>
<br>
    892 <b>int pcre2_set_recursion_memory_management(</b>
    893 <b>  pcre2_match_context *<i>mcontext</i>,</b>
    894 <b>  void *(*<i>private_malloc</i>)(PCRE2_SIZE, void *),</b>
    895 <b>  void (*<i>private_free</i>)(void *, void *), void *<i>memory_data</i>);</b>
    896 <br>
    897 <br>
    898 This function sets up two additional custom memory management functions for use
    899 by <b>pcre2_match()</b> when PCRE2 is compiled to use the heap for remembering
    900 backtracking data, instead of recursive function calls that use the system
    901 stack. There is a discussion about PCRE2's stack usage in the
    902 <a href="pcre2stack.html"><b>pcre2stack</b></a>
    903 documentation. See the
    904 <a href="pcre2build.html"><b>pcre2build</b></a>
    905 documentation for details of how to build PCRE2.
    906 </P>
    907 <P>
    908 Using the heap for recursion is a non-standard way of building PCRE2, for use
    909 in environments that have limited stacks. Because of the greater use of memory
    910 management, <b>pcre2_match()</b> runs more slowly. Functions that are different
    911 to the general custom memory functions are provided so that special-purpose
    912 external code can be used for this case, because the memory blocks are all the
    913 same size. The blocks are retained by <b>pcre2_match()</b> until it is about to
    914 exit so that they can be re-used when possible during the match. In the absence
    915 of these functions, the normal custom memory management functions are used, if
    916 supplied, otherwise the system functions.
    917 </P>
    918 <br><a name="SEC17" href="#TOC1">CHECKING BUILD-TIME OPTIONS</a><br>
    919 <P>
    920 <b>int pcre2_config(uint32_t <i>what</i>, void *<i>where</i>);</b>
    921 </P>
    922 <P>
    923 The function <b>pcre2_config()</b> makes it possible for a PCRE2 client to
    924 discover which optional features have been compiled into the PCRE2 library. The
    925 <a href="pcre2build.html"><b>pcre2build</b></a>
    926 documentation has more details about these optional features.
    927 </P>
    928 <P>
    929 The first argument for <b>pcre2_config()</b> specifies which information is
    930 required. The second argument is a pointer to memory into which the information
    931 is placed. If NULL is passed, the function returns the amount of memory that is
    932 needed for the requested information. For calls that return numerical values,
    933 the value is in bytes; when requesting these values, <i>where</i> should point
    934 to appropriately aligned memory. For calls that return strings, the required
    935 length is given in code units, not counting the terminating zero.
    936 </P>
    937 <P>
    938 When requesting information, the returned value from <b>pcre2_config()</b> is
    939 non-negative on success, or the negative error code PCRE2_ERROR_BADOPTION if
    940 the value in the first argument is not recognized. The following information is
    941 available:
    942 <pre>
    943   PCRE2_CONFIG_BSR
    944 </pre>
    945 The output is a uint32_t integer whose value indicates what character
    946 sequences the \R escape sequence matches by default. A value of
    947 PCRE2_BSR_UNICODE means that \R matches any Unicode line ending sequence; a
    948 value of PCRE2_BSR_ANYCRLF means that \R matches only CR, LF, or CRLF. The
    949 default can be overridden when a pattern is compiled.
    950 <pre>
    951   PCRE2_CONFIG_JIT
    952 </pre>
    953 The output is a uint32_t integer that is set to one if support for just-in-time
    954 compiling is available; otherwise it is set to zero.
    955 <pre>
    956   PCRE2_CONFIG_JITTARGET
    957 </pre>
    958 The <i>where</i> argument should point to a buffer that is at least 48 code
    959 units long. (The exact length required can be found by calling
    960 <b>pcre2_config()</b> with <b>where</b> set to NULL.) The buffer is filled with a
    961 string that contains the name of the architecture for which the JIT compiler is
    962 configured, for example "x86 32bit (little endian + unaligned)". If JIT support
    963 is not available, PCRE2_ERROR_BADOPTION is returned, otherwise the number of
    964 code units used is returned. This is the length of the string, plus one unit
    965 for the terminating zero.
    966 <pre>
    967   PCRE2_CONFIG_LINKSIZE
    968 </pre>
    969 The output is a uint32_t integer that contains the number of bytes used for
    970 internal linkage in compiled regular expressions. When PCRE2 is configured, the
    971 value can be set to 2, 3, or 4, with the default being 2. This is the value
    972 that is returned by <b>pcre2_config()</b>. However, when the 16-bit library is
    973 compiled, a value of 3 is rounded up to 4, and when the 32-bit library is
    974 compiled, internal linkages always use 4 bytes, so the configured value is not
    975 relevant.
    976 </P>
    977 <P>
    978 The default value of 2 for the 8-bit and 16-bit libraries is sufficient for all
    979 but the most massive patterns, since it allows the size of the compiled pattern
    980 to be up to 64K code units. Larger values allow larger regular expressions to
    981 be compiled by those two libraries, but at the expense of slower matching.
    982 <pre>
    983   PCRE2_CONFIG_MATCHLIMIT
    984 </pre>
    985 The output is a uint32_t integer that gives the default limit for the number of
    986 internal matching function calls in a <b>pcre2_match()</b> execution. Further
    987 details are given with <b>pcre2_match()</b> below.
    988 <pre>
    989   PCRE2_CONFIG_NEWLINE
    990 </pre>
    991 The output is a uint32_t integer whose value specifies the default character
    992 sequence that is recognized as meaning "newline". The values are:
    993 <pre>
    994   PCRE2_NEWLINE_CR       Carriage return (CR)
    995   PCRE2_NEWLINE_LF       Linefeed (LF)
    996   PCRE2_NEWLINE_CRLF     Carriage return, linefeed (CRLF)
    997   PCRE2_NEWLINE_ANY      Any Unicode line ending
    998   PCRE2_NEWLINE_ANYCRLF  Any of CR, LF, or CRLF
    999 </pre>
   1000 The default should normally correspond to the standard sequence for your
   1001 operating system.
   1002 <pre>
   1003   PCRE2_CONFIG_PARENSLIMIT
   1004 </pre>
   1005 The output is a uint32_t integer that gives the maximum depth of nesting
   1006 of parentheses (of any kind) in a pattern. This limit is imposed to cap the
   1007 amount of system stack used when a pattern is compiled. It is specified when
   1008 PCRE2 is built; the default is 250. This limit does not take into account the
   1009 stack that may already be used by the calling application. For finer control
   1010 over compilation stack usage, see <b>pcre2_set_compile_recursion_guard()</b>.
   1011 <pre>
   1012   PCRE2_CONFIG_RECURSIONLIMIT
   1013 </pre>
   1014 The output is a uint32_t integer that gives the default limit for the depth of
   1015 recursion when calling the internal matching function in a <b>pcre2_match()</b>
   1016 execution. Further details are given with <b>pcre2_match()</b> below.
   1017 <pre>
   1018   PCRE2_CONFIG_STACKRECURSE
   1019 </pre>
   1020 The output is a uint32_t integer that is set to one if internal recursion when
   1021 running <b>pcre2_match()</b> is implemented by recursive function calls that use
   1022 the system stack to remember their state. This is the usual way that PCRE2 is
   1023 compiled. The output is zero if PCRE2 was compiled to use blocks of data on the
   1024 heap instead of recursive function calls.
   1025 <pre>
   1026   PCRE2_CONFIG_UNICODE_VERSION
   1027 </pre>
   1028 The <i>where</i> argument should point to a buffer that is at least 24 code
   1029 units long. (The exact length required can be found by calling
   1030 <b>pcre2_config()</b> with <b>where</b> set to NULL.) If PCRE2 has been compiled
   1031 without Unicode support, the buffer is filled with the text "Unicode not
   1032 supported". Otherwise, the Unicode version string (for example, "8.0.0") is
   1033 inserted. The number of code units used is returned. This is the length of the
   1034 string plus one unit for the terminating zero.
   1035 <pre>
   1036   PCRE2_CONFIG_UNICODE
   1037 </pre>
   1038 The output is a uint32_t integer that is set to one if Unicode support is
   1039 available; otherwise it is set to zero. Unicode support implies UTF support.
   1040 <pre>
   1041   PCRE2_CONFIG_VERSION
   1042 </pre>
   1043 The <i>where</i> argument should point to a buffer that is at least 12 code
   1044 units long. (The exact length required can be found by calling
   1045 <b>pcre2_config()</b> with <b>where</b> set to NULL.) The buffer is filled with
   1046 the PCRE2 version string, zero-terminated. The number of code units used is
   1047 returned. This is the length of the string plus one unit for the terminating
   1048 zero.
   1049 <a name="compiling"></a></P>
   1050 <br><a name="SEC18" href="#TOC1">COMPILING A PATTERN</a><br>
   1051 <P>
   1052 <b>pcre2_code *pcre2_compile(PCRE2_SPTR <i>pattern</i>, PCRE2_SIZE <i>length</i>,</b>
<b>  uint32_t <i>options</i>, int *<i>errorcode</i>, PCRE2_SIZE *<i>erroroffset</i>,</b>
   1054 <b>  pcre2_compile_context *<i>ccontext</i>);</b>
   1055 <br>
   1056 <br>
   1057 <b>void pcre2_code_free(pcre2_code *<i>code</i>);</b>
   1058 <br>
   1059 <br>
   1060 <b>pcre2_code *pcre2_code_copy(const pcre2_code *<i>code</i>);</b>
   1061 </P>
   1062 <P>
   1063 The <b>pcre2_compile()</b> function compiles a pattern into an internal form.
   1064 The pattern is defined by a pointer to a string of code units and a length. If
   1065 the pattern is zero-terminated, the length can be specified as
   1066 PCRE2_ZERO_TERMINATED. The function returns a pointer to a block of memory that
   1067 contains the compiled pattern and related data, or NULL if an error occurred.
   1068 </P>
   1069 <P>
   1070 If the compile context argument <i>ccontext</i> is NULL, memory for the compiled
   1071 pattern is obtained by calling <b>malloc()</b>. Otherwise, it is obtained from
   1072 the same memory function that was used for the compile context. The caller must
   1073 free the memory by calling <b>pcre2_code_free()</b> when it is no longer needed.
   1074 </P>
   1075 <P>
   1076 The function <b>pcre2_code_copy()</b> makes a copy of the compiled code in new
   1077 memory, using the same memory allocator as was used for the original. However,
   1078 if the code has been processed by the JIT compiler (see
   1079 <a href="#jitcompiling">below),</a>
   1080 the JIT information cannot be copied (because it is position-dependent).
   1081 The new copy can initially be used only for non-JIT matching, though it can be
   1082 passed to <b>pcre2_jit_compile()</b> if required. The <b>pcre2_code_copy()</b>
   1083 function provides a way for individual threads in a multithreaded application
   1084 to acquire a private copy of shared compiled code.
   1085 </P>
   1086 <P>
   1087 NOTE: When one of the matching functions is called, pointers to the compiled
   1088 pattern and the subject string are set in the match data block so that they can
   1089 be referenced by the substring extraction functions. After running a match, you
   1090 must not free a compiled pattern (or a subject string) until after all
   1091 operations on the
   1092 <a href="#matchdatablock">match data block</a>
   1093 have taken place.
   1094 </P>
   1095 <P>
   1096 The <i>options</i> argument for <b>pcre2_compile()</b> contains various bit
   1097 settings that affect the compilation. It should be zero if no options are
   1098 required. The available options are described below. Some of them (in
   1099 particular, those that are compatible with Perl, but some others as well) can
   1100 also be set and unset from within the pattern (see the detailed description in
   1101 the
   1102 <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
   1103 documentation).
   1104 </P>
   1105 <P>
   1106 For those options that can be different in different parts of the pattern, the
   1107 contents of the <i>options</i> argument specifies their settings at the start of
   1108 compilation. The PCRE2_ANCHORED and PCRE2_NO_UTF_CHECK options can be set at
   1109 the time of matching as well as at compile time.
   1110 </P>
   1111 <P>
   1112 Other, less frequently required compile-time parameters (for example, the
   1113 newline setting) can be provided in a compile context (as described
   1114 <a href="#compilecontext">above).</a>
   1115 </P>
   1116 <P>
   1117 If <i>errorcode</i> or <i>erroroffset</i> is NULL, <b>pcre2_compile()</b> returns
   1118 NULL immediately. Otherwise, the variables to which these point are set to an
   1119 error code and an offset (number of code units) within the pattern,
   1120 respectively, when <b>pcre2_compile()</b> returns NULL because a compilation
   1121 error has occurred. The values are not defined when compilation is successful
   1122 and <b>pcre2_compile()</b> returns a non-NULL value.
   1123 </P>
   1124 <P>
   1125 The <b>pcre2_get_error_message()</b> function (see "Obtaining a textual error
   1126 message"
   1127 <a href="#geterrormessage">below)</a>
   1128 provides a textual message for each error code. Compilation errors have
   1129 positive error codes; UTF formatting error codes are negative. For an invalid
   1130 UTF-8 or UTF-16 string, the offset is that of the first code unit of the
   1131 failing character.
   1132 </P>
   1133 <P>
   1134 Some errors are not detected until the whole pattern has been scanned; in these
   1135 cases, the offset passed back is the length of the pattern. Note that the
   1136 offset is in code units, not characters, even in a UTF mode. It may sometimes
   1137 point into the middle of a UTF-8 or UTF-16 character.
   1138 </P>
   1139 <P>
   1140 This code fragment shows a typical straightforward call to
   1141 <b>pcre2_compile()</b>:
   1142 <pre>
  pcre2_code *re;
  PCRE2_SIZE erroffset;
  int errorcode;
  re = pcre2_compile(
    (PCRE2_SPTR)"^A.*Z",    /* the pattern, cast to the code unit pointer type */
    PCRE2_ZERO_TERMINATED,  /* the pattern is zero-terminated */
    0,                      /* default options */
    &errorcode,             /* for error code */
    &erroffset,             /* for error offset */
    NULL);                  /* no compile context */
   1153 </pre>
   1154 The following names for option bits are defined in the <b>pcre2.h</b> header
   1155 file:
   1156 <pre>
   1157   PCRE2_ANCHORED
   1158 </pre>
   1159 If this bit is set, the pattern is forced to be "anchored", that is, it is
   1160 constrained to match only at the first matching point in the string that is
   1161 being searched (the "subject string"). This effect can also be achieved by
   1162 appropriate constructs in the pattern itself, which is the only way to do it in
   1163 Perl.
   1164 <pre>
   1165   PCRE2_ALLOW_EMPTY_CLASS
   1166 </pre>
   1167 By default, for compatibility with Perl, a closing square bracket that
   1168 immediately follows an opening one is treated as a data character for the
   1169 class. When PCRE2_ALLOW_EMPTY_CLASS is set, it terminates the class, which
   1170 therefore contains no characters and so can never match.
   1171 <pre>
   1172   PCRE2_ALT_BSUX
   1173 </pre>
This option requests alternative handling of three escape sequences, which
makes PCRE2's behaviour more like ECMAScript (aka JavaScript). When it is set:
   1176 </P>
   1177 <P>
   1178 (1) \U matches an upper case "U" character; by default \U causes a compile
   1179 time error (Perl uses \U to upper case subsequent characters).
   1180 </P>
   1181 <P>
   1182 (2) \u matches a lower case "u" character unless it is followed by four
   1183 hexadecimal digits, in which case the hexadecimal number defines the code point
   1184 to match. By default, \u causes a compile time error (Perl uses it to upper
   1185 case the following character).
   1186 </P>
   1187 <P>
   1188 (3) \x matches a lower case "x" character unless it is followed by two
   1189 hexadecimal digits, in which case the hexadecimal number defines the code point
   1190 to match. By default, as in Perl, a hexadecimal number is always expected after
   1191 \x, but it may have zero, one, or two digits (so, for example, \xz matches a
   1192 binary zero character followed by z).
   1193 <pre>
   1194   PCRE2_ALT_CIRCUMFLEX
   1195 </pre>
   1196 In multiline mode (when PCRE2_MULTILINE is set), the circumflex metacharacter
   1197 matches at the start of the subject (unless PCRE2_NOTBOL is set), and also
   1198 after any internal newline. However, it does not match after a newline at the
   1199 end of the subject, for compatibility with Perl. If you want a multiline
   1200 circumflex also to match after a terminating newline, you must set
   1201 PCRE2_ALT_CIRCUMFLEX.
   1202 <pre>
   1203   PCRE2_ALT_VERBNAMES
   1204 </pre>
   1205 By default, for compatibility with Perl, the name in any verb sequence such as
   1206 (*MARK:NAME) is any sequence of characters that does not include a closing
   1207 parenthesis. The name is not processed in any way, and it is not possible to
   1208 include a closing parenthesis in the name. However, if the PCRE2_ALT_VERBNAMES
   1209 option is set, normal backslash processing is applied to verb names and only an
   1210 unescaped closing parenthesis terminates the name. A closing parenthesis can be
   1211 included in a name either as \) or between \Q and \E. If the PCRE2_EXTENDED
   1212 option is set, unescaped whitespace in verb names is skipped and #-comments are
   1213 recognized, exactly as in the rest of the pattern.
   1214 <pre>
   1215   PCRE2_AUTO_CALLOUT
   1216 </pre>
   1217 If this bit is set, <b>pcre2_compile()</b> automatically inserts callout items,
   1218 all with number 255, before each pattern item. For discussion of the callout
   1219 facility, see the
   1220 <a href="pcre2callout.html"><b>pcre2callout</b></a>
   1221 documentation.
   1222 <pre>
   1223   PCRE2_CASELESS
   1224 </pre>
   1225 If this bit is set, letters in the pattern match both upper and lower case
   1226 letters in the subject. It is equivalent to Perl's /i option, and it can be
   1227 changed within a pattern by a (?i) option setting.
   1228 <pre>
   1229   PCRE2_DOLLAR_ENDONLY
   1230 </pre>
   1231 If this bit is set, a dollar metacharacter in the pattern matches only at the
   1232 end of the subject string. Without this option, a dollar also matches
   1233 immediately before a newline at the end of the string (but not before any other
   1234 newlines). The PCRE2_DOLLAR_ENDONLY option is ignored if PCRE2_MULTILINE is
   1235 set. There is no equivalent to this option in Perl, and no way to set it within
   1236 a pattern.
   1237 <pre>
   1238   PCRE2_DOTALL
   1239 </pre>
   1240 If this bit is set, a dot metacharacter in the pattern matches any character,
   1241 including one that indicates a newline. However, it only ever matches one
   1242 character, even if newlines are coded as CRLF. Without this option, a dot does
   1243 not match when the current position in the subject is at a newline. This option
   1244 is equivalent to Perl's /s option, and it can be changed within a pattern by a
   1245 (?s) option setting. A negative class such as [^a] always matches newline
   1246 characters, independent of the setting of this option.
   1247 <pre>
   1248   PCRE2_DUPNAMES
   1249 </pre>
   1250 If this bit is set, names used to identify capturing subpatterns need not be
   1251 unique. This can be helpful for certain types of pattern when it is known that
   1252 only one instance of the named subpattern can ever be matched. There are more
   1253 details of named subpatterns below; see also the
   1254 <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
   1255 documentation.
   1256 <pre>
   1257   PCRE2_EXTENDED
   1258 </pre>
   1259 If this bit is set, most white space characters in the pattern are totally
   1260 ignored except when escaped or inside a character class. However, white space
   1261 is not allowed within sequences such as (?&#62; that introduce various
   1262 parenthesized subpatterns, nor within numerical quantifiers such as {1,3}.
   1263 Ignorable white space is permitted between an item and a following quantifier
   1264 and between a quantifier and a following + that indicates possessiveness.
   1265 </P>
   1266 <P>
   1267 PCRE2_EXTENDED also causes characters between an unescaped # outside a
   1268 character class and the next newline, inclusive, to be ignored, which makes it
   1269 possible to include comments inside complicated patterns. Note that the end of
   1270 this type of comment is a literal newline sequence in the pattern; escape
   1271 sequences that happen to represent a newline do not count. PCRE2_EXTENDED is
   1272 equivalent to Perl's /x option, and it can be changed within a pattern by a
   1273 (?x) option setting.
   1274 </P>
   1275 <P>
   1276 Which characters are interpreted as newlines can be specified by a setting in
   1277 the compile context that is passed to <b>pcre2_compile()</b> or by a special
   1278 sequence at the start of the pattern, as described in the section entitled
   1279 <a href="pcre2pattern.html#newlines">"Newline conventions"</a>
   1280 in the <b>pcre2pattern</b> documentation. A default is defined when PCRE2 is
   1281 built.
   1282 <pre>
   1283   PCRE2_FIRSTLINE
   1284 </pre>
   1285 If this option is set, an unanchored pattern is required to match before or at
   1286 the first newline in the subject string, though the matched text may continue
   1287 over the newline. See also PCRE2_USE_OFFSET_LIMIT, which provides a more
   1288 general limiting facility. If PCRE2_FIRSTLINE is set with an offset limit, a
   1289 match must occur in the first line and also within the offset limit. In other
   1290 words, whichever limit comes first is used.
   1291 <pre>
   1292   PCRE2_MATCH_UNSET_BACKREF
   1293 </pre>
   1294 If this option is set, a back reference to an unset subpattern group matches an
   1295 empty string (by default this causes the current matching alternative to fail).
   1296 A pattern such as (\1)(a) succeeds when this option is set (assuming it can
   1297 find an "a" in the subject), whereas it fails by default, for Perl
   1298 compatibility. Setting this option makes PCRE2 behave more like ECMAScript (aka
   1299 JavaScript).
   1300 <pre>
   1301   PCRE2_MULTILINE
   1302 </pre>
   1303 By default, for the purposes of matching "start of line" and "end of line",
   1304 PCRE2 treats the subject string as consisting of a single line of characters,
   1305 even if it actually contains newlines. The "start of line" metacharacter (^)
   1306 matches only at the start of the string, and the "end of line" metacharacter
   1307 ($) matches only at the end of the string, or before a terminating newline
   1308 (except when PCRE2_DOLLAR_ENDONLY is set). Note, however, that unless
   1309 PCRE2_DOTALL is set, the "any character" metacharacter (.) does not match at a
   1310 newline. This behaviour (for ^, $, and dot) is the same as Perl.
   1311 </P>
   1312 <P>
   1313 When PCRE2_MULTILINE is set, the "start of line" and "end of line"
   1314 constructs match immediately following or immediately before internal newlines
   1315 in the subject string, respectively, as well as at the very start and end. This
   1316 is equivalent to Perl's /m option, and it can be changed within a pattern by a
   1317 (?m) option setting. Note that the "start of line" metacharacter does not match
   1318 after a newline at the end of the subject, for compatibility with Perl.
   1319 However, you can change this by setting the PCRE2_ALT_CIRCUMFLEX option. If
   1320 there are no newlines in a subject string, or no occurrences of ^ or $ in a
   1321 pattern, setting PCRE2_MULTILINE has no effect.
   1322 <pre>
   1323   PCRE2_NEVER_BACKSLASH_C
   1324 </pre>
   1325 This option locks out the use of \C in the pattern that is being compiled.
   1326 This escape can cause unpredictable behaviour in UTF-8 or UTF-16 modes, because
   1327 it may leave the current matching point in the middle of a multi-code-unit
   1328 character. This option may be useful in applications that process patterns from
   1329 external sources. Note that there is also a build-time option that permanently
   1330 locks out the use of \C.
   1331 <pre>
   1332   PCRE2_NEVER_UCP
   1333 </pre>
   1334 This option locks out the use of Unicode properties for handling \B, \b, \D,
   1335 \d, \S, \s, \W, \w, and some of the POSIX character classes, as described
   1336 for the PCRE2_UCP option below. In particular, it prevents the creator of the
   1337 pattern from enabling this facility by starting the pattern with (*UCP). This
   1338 option may be useful in applications that process patterns from external
   1339 sources. The combination of PCRE2_UCP and PCRE2_NEVER_UCP causes an error.
   1340 <pre>
   1341   PCRE2_NEVER_UTF
   1342 </pre>
   1343 This option locks out interpretation of the pattern as UTF-8, UTF-16, or
   1344 UTF-32, depending on which library is in use. In particular, it prevents the
   1345 creator of the pattern from switching to UTF interpretation by starting the
   1346 pattern with (*UTF). This option may be useful in applications that process
   1347 patterns from external sources. The combination of PCRE2_UTF and
   1348 PCRE2_NEVER_UTF causes an error.
   1349 <pre>
   1350   PCRE2_NO_AUTO_CAPTURE
   1351 </pre>
   1352 If this option is set, it disables the use of numbered capturing parentheses in
   1353 the pattern. Any opening parenthesis that is not followed by ? behaves as if it
   1354 were followed by ?: but named parentheses can still be used for capturing (and
   1355 they acquire numbers in the usual way). There is no equivalent of this option
   1356 in Perl. Note that, if this option is set, references to capturing groups (back
   1357 references or recursion/subroutine calls) may only refer to named groups,
   1358 though the reference can be by name or by number.
   1359 <pre>
   1360   PCRE2_NO_AUTO_POSSESS
   1361 </pre>
   1362 If this option is set, it disables "auto-possessification", which is an
   1363 optimization that, for example, turns a+b into a++b in order to avoid
   1364 backtracks into a+ that can never be successful. However, if callouts are in
   1365 use, auto-possessification means that some callouts are never taken. You can
   1366 set this option if you want the matching functions to do a full unoptimized
   1367 search and run all the callouts, but it is mainly provided for testing
   1368 purposes.
   1369 <pre>
   1370   PCRE2_NO_DOTSTAR_ANCHOR
   1371 </pre>
   1372 If this option is set, it disables an optimization that is applied when .* is
   1373 the first significant item in a top-level branch of a pattern, and all the
   1374 other branches also start with .* or with \A or \G or ^. The optimization is
   1375 automatically disabled for .* if it is inside an atomic group or a capturing
   1376 group that is the subject of a back reference, or if the pattern contains
   1377 (*PRUNE) or (*SKIP). When the optimization is not disabled, such a pattern is
   1378 automatically anchored if PCRE2_DOTALL is set for all the .* items and
   1379 PCRE2_MULTILINE is not set for any ^ items. Otherwise, the fact that any match
   1380 must start either at the start of the subject or following a newline is
   1381 remembered. Like other optimizations, this can cause callouts to be skipped.
   1382 <pre>
   1383   PCRE2_NO_START_OPTIMIZE
   1384 </pre>
   1385 This is an option whose main effect is at matching time. It does not change
   1386 what <b>pcre2_compile()</b> generates, but it does affect the output of the JIT
   1387 compiler.
   1388 </P>
   1389 <P>
   1390 There are a number of optimizations that may occur at the start of a match, in
   1391 order to speed up the process. For example, if it is known that an unanchored
   1392 match must start with a specific character, the matching code searches the
   1393 subject for that character, and fails immediately if it cannot find it, without
   1394 actually running the main matching function. This means that a special item
   1395 such as (*COMMIT) at the start of a pattern is not considered until after a
   1396 suitable starting point for the match has been found. Also, when callouts or
   1397 (*MARK) items are in use, these "start-up" optimizations can cause them to be
   1398 skipped if the pattern is never actually used. The start-up optimizations are
   1399 in effect a pre-scan of the subject that takes place before the pattern is run.
   1400 </P>
   1401 <P>
   1402 The PCRE2_NO_START_OPTIMIZE option disables the start-up optimizations,
   1403 possibly causing performance to suffer, but ensuring that in cases where the
   1404 result is "no match", the callouts do occur, and that items such as (*COMMIT)
   1405 and (*MARK) are considered at every possible starting position in the subject
   1406 string.
   1407 </P>
   1408 <P>
   1409 Setting PCRE2_NO_START_OPTIMIZE may change the outcome of a matching operation.
   1410 Consider the pattern
   1411 <pre>
   1412   (*COMMIT)ABC
   1413 </pre>
   1414 When this is compiled, PCRE2 records the fact that a match must start with the
   1415 character "A". Suppose the subject string is "DEFABC". The start-up
   1416 optimization scans along the subject, finds "A" and runs the first match
   1417 attempt from there. The (*COMMIT) item means that the pattern must match at the
   1418 current starting position, which in this case it does. However, if the same
   1419 match is run with PCRE2_NO_START_OPTIMIZE set, the initial scan along the
   1420 subject string does not happen. The first match attempt is run starting from
   1421 "D" and when this fails, (*COMMIT) prevents any further matches being tried, so
   1422 the overall result is "no match". There are also other start-up optimizations.
   1423 For example, a minimum length for the subject may be recorded. Consider the
   1424 pattern
   1425 <pre>
   1426   (*MARK:A)(X|Y)
   1427 </pre>
   1428 The minimum length for a match is one character. If the subject is "ABC", there
   1429 will be attempts to match "ABC", "BC", and "C". An attempt to match an empty
   1430 string at the end of the subject does not take place, because PCRE2 knows that
   1431 the subject is now too short, and so the (*MARK) is never encountered. In this
   1432 case, the optimization does not affect the overall match result, which is still
   1433 "no match", but it does affect the auxiliary information that is returned.
   1434 <pre>
   1435   PCRE2_NO_UTF_CHECK
   1436 </pre>
   1437 When PCRE2_UTF is set, the validity of the pattern as a UTF string is
   1438 automatically checked. There are discussions about the validity of
   1439 <a href="pcre2unicode.html#utf8strings">UTF-8 strings,</a>
   1440 <a href="pcre2unicode.html#utf16strings">UTF-16 strings,</a>
   1441 and
   1442 <a href="pcre2unicode.html#utf32strings">UTF-32 strings</a>
   1443 in the
   1444 <a href="pcre2unicode.html"><b>pcre2unicode</b></a>
   1445 document.
   1446 If an invalid UTF sequence is found, <b>pcre2_compile()</b> returns a negative
   1447 error code.
   1448 </P>
   1449 <P>
   1450 If you know that your pattern is valid, and you want to skip this check for
   1451 performance reasons, you can set the PCRE2_NO_UTF_CHECK option. When it is set,
   1452 the effect of passing an invalid UTF string as a pattern is undefined. It may
   1453 cause your program to crash or loop. Note that this option can also be passed
   1454 to <b>pcre2_match()</b> and <b>pcre2_dfa_match()</b>, to suppress validity
   1455 checking of the subject string.
   1456 <pre>
   1457   PCRE2_UCP
   1458 </pre>
   1459 This option changes the way PCRE2 processes \B, \b, \D, \d, \S, \s, \W,
   1460 \w, and some of the POSIX character classes. By default, only ASCII characters
   1461 are recognized, but if PCRE2_UCP is set, Unicode properties are used instead to
   1462 classify characters. More details are given in the section on
   1463 <a href="pcre2pattern.html#genericchartypes">generic character types</a>
   1464 in the
   1465 <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
   1466 page. If you set PCRE2_UCP, matching one of the items it affects takes much
   1467 longer. The option is available only if PCRE2 has been compiled with Unicode
   1468 support.
   1469 <pre>
   1470   PCRE2_UNGREEDY
   1471 </pre>
   1472 This option inverts the "greediness" of the quantifiers so that they are not
   1473 greedy by default, but become greedy if followed by "?". It is not compatible
   1474 with Perl. It can also be set by a (?U) option setting within the pattern.
   1475 <pre>
   1476   PCRE2_USE_OFFSET_LIMIT
   1477 </pre>
   1478 This option must be set for <b>pcre2_compile()</b> if
   1479 <b>pcre2_set_offset_limit()</b> is going to be used to set a non-default offset
   1480 limit in a match context for matches that use this pattern. An error is
   1481 generated if an offset limit is set without this option. For more details, see
   1482 the description of <b>pcre2_set_offset_limit()</b> in the
   1483 <a href="#matchcontext">section</a>
   1484 that describes match contexts. See also the PCRE2_FIRSTLINE
   1485 option above.
   1486 <pre>
   1487   PCRE2_UTF
   1488 </pre>
   1489 This option causes PCRE2 to regard both the pattern and the subject strings
   1490 that are subsequently processed as strings of UTF characters instead of
   1491 single-code-unit strings. It is available when PCRE2 is built to include
   1492 Unicode support (which is the default). If Unicode support is not available,
   1493 the use of this option provokes an error. Details of how this option changes
   1494 the behaviour of PCRE2 are given in the
   1495 <a href="pcre2unicode.html"><b>pcre2unicode</b></a>
   1496 page.
   1497 </P>
   1498 <br><a name="SEC19" href="#TOC1">COMPILATION ERROR CODES</a><br>
   1499 <P>
   1500 There are over 80 positive error codes that <b>pcre2_compile()</b> may return
   1501 (via <i>errorcode</i>) if it finds an error in the pattern. There are also some
   1502 negative error codes that are used for invalid UTF strings. These are the same
   1503 as given by <b>pcre2_match()</b> and <b>pcre2_dfa_match()</b>, and are described
   1504 in the
   1505 <a href="pcre2unicode.html"><b>pcre2unicode</b></a>
   1506 page. The <b>pcre2_get_error_message()</b> function (see "Obtaining a textual
   1507 error message"
   1508 <a href="#geterrormessage">below)</a>
   1509 can be called to obtain a textual error message from any error code.
   1510 <a name="jitcompiling"></a></P>
   1511 <br><a name="SEC20" href="#TOC1">JUST-IN-TIME (JIT) COMPILATION</a><br>
   1512 <P>
   1513 <b>int pcre2_jit_compile(pcre2_code *<i>code</i>, uint32_t <i>options</i>);</b>
   1514 <br>
   1515 <br>
   1516 <b>int pcre2_jit_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
   1517 <b>  PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
   1518 <b>  uint32_t <i>options</i>, pcre2_match_data *<i>match_data</i>,</b>
   1519 <b>  pcre2_match_context *<i>mcontext</i>);</b>
   1520 <br>
   1521 <br>
   1522 <b>void pcre2_jit_free_unused_memory(pcre2_general_context *<i>gcontext</i>);</b>
   1523 <br>
   1524 <br>
   1525 <b>pcre2_jit_stack *pcre2_jit_stack_create(PCRE2_SIZE <i>startsize</i>,</b>
   1526 <b>  PCRE2_SIZE <i>maxsize</i>, pcre2_general_context *<i>gcontext</i>);</b>
   1527 <br>
   1528 <br>
   1529 <b>void pcre2_jit_stack_assign(pcre2_match_context *<i>mcontext</i>,</b>
   1530 <b>  pcre2_jit_callback <i>callback_function</i>, void *<i>callback_data</i>);</b>
   1531 <br>
   1532 <br>
   1533 <b>void pcre2_jit_stack_free(pcre2_jit_stack *<i>jit_stack</i>);</b>
   1534 </P>
   1535 <P>
   1536 These functions provide support for JIT compilation, which, if the just-in-time
   1537 compiler is available, further processes a compiled pattern into machine code
   1538 that executes much faster than the <b>pcre2_match()</b> interpretive matching
   1539 function. Full details are given in the
   1540 <a href="pcre2jit.html"><b>pcre2jit</b></a>
   1541 documentation.
   1542 </P>
   1543 <P>
   1544 JIT compilation is a heavyweight optimization. It can take some time for
   1545 patterns to be analyzed, and for one-off matches and simple patterns the
   1546 benefit of faster execution might be offset by a much slower compilation time.
   1547 Most, but not all patterns can be optimized by the JIT compiler.
   1548 <a name="localesupport"></a></P>
   1549 <br><a name="SEC21" href="#TOC1">LOCALE SUPPORT</a><br>
   1550 <P>
   1551 PCRE2 handles caseless matching, and determines whether characters are letters,
   1552 digits, or whatever, by reference to a set of tables, indexed by character code
   1553 point. This applies only to characters whose code points are less than 256. By
   1554 default, higher-valued code points never match escapes such as \w or \d.
   1555 However, if PCRE2 is built with UTF support, all characters can be tested with
   1556 \p and \P, or, alternatively, the PCRE2_UCP option can be set when a pattern
   1557 is compiled; this causes \w and friends to use Unicode property support
   1558 instead of the built-in tables.
   1559 </P>
   1560 <P>
   1561 The use of locales with Unicode is discouraged. If you are handling characters
   1562 with code points greater than 127, you should either use Unicode support, or
   1563 use locales, but not try to mix the two.
   1564 </P>
   1565 <P>
   1566 PCRE2 contains an internal set of character tables that are used by default.
   1567 These are sufficient for many applications. Normally, the internal tables
   1568 recognize only ASCII characters. However, when PCRE2 is built, it is possible
   1569 to cause the internal tables to be rebuilt in the default "C" locale of the
   1570 local system, which may cause them to be different.
   1571 </P>
   1572 <P>
   1573 The internal tables can be overridden by tables supplied by the application
   1574 that calls PCRE2. These may be created in a different locale from the default.
   1575 As more and more applications change to using Unicode, the need for this locale
   1576 support is expected to die away.
   1577 </P>
   1578 <P>
   1579 External tables are built by calling the <b>pcre2_maketables()</b> function, in
   1580 the relevant locale. The result can be passed to <b>pcre2_compile()</b> as often
   1581 as necessary, by creating a compile context and calling
   1582 <b>pcre2_set_character_tables()</b> to set the tables pointer therein. For
   1583 example, to build and use tables that are appropriate for the French locale
   1584 (where accented characters with values greater than 127 are treated as
   1585 letters), the following code could be used:
   1586 <pre>
   1587   setlocale(LC_CTYPE, "fr_FR");
   1588   tables = pcre2_maketables(NULL);
   1589   ccontext = pcre2_compile_context_create(NULL);
   1590   pcre2_set_character_tables(ccontext, tables);
   1591   re = pcre2_compile(..., ccontext);
   1592 </pre>
   1593 The locale name "fr_FR" is used on Linux and other Unix-like systems; if you
   1594 are using Windows, the name for the French locale is "french". It is the
   1595 caller's responsibility to ensure that the memory containing the tables remains
   1596 available for as long as it is needed.
   1597 </P>
   1598 <P>
   1599 The pointer that is passed (via the compile context) to <b>pcre2_compile()</b>
   1600 is saved with the compiled pattern, and the same tables are used by
   1601 <b>pcre2_match()</b> and <b>pcre2_dfa_match()</b>. Thus, for any single pattern,
   1602 compilation and matching both happen in the same locale, but different patterns
   1603 can be processed in different locales.
   1604 <a name="infoaboutpattern"></a></P>
   1605 <br><a name="SEC22" href="#TOC1">INFORMATION ABOUT A COMPILED PATTERN</a><br>
   1606 <P>
   1607 <b>int pcre2_pattern_info(const pcre2_code *<i>code</i>, uint32_t <i>what</i>, void *<i>where</i>);</b>
   1608 </P>
   1609 <P>
   1610 The <b>pcre2_pattern_info()</b> function returns general information about a
   1611 compiled pattern. For information about callouts, see the
   1612 <a href="pcre2pattern.html#infoaboutcallouts">next section.</a>
   1613 The first argument for <b>pcre2_pattern_info()</b> is a pointer to the compiled
   1614 pattern. The second argument specifies which piece of information is required,
   1615 and the third argument is a pointer to a variable to receive the data. If the
   1616 third argument is NULL, the first argument is ignored, and the function returns
   1617 the size in bytes of the variable that is required for the information
   1618 requested. Otherwise, the yield of the function is zero for success, or one of
   1619 the following negative numbers:
   1620 <pre>
   1621   PCRE2_ERROR_NULL           the argument <i>code</i> was NULL
   1622   PCRE2_ERROR_BADMAGIC       the "magic number" was not found
   1623   PCRE2_ERROR_BADOPTION      the value of <i>what</i> was invalid
   1624   PCRE2_ERROR_UNSET          the requested field is not set
   1625 </pre>
   1626 The "magic number" is placed at the start of each compiled pattern as a simple
   1627 check against passing an arbitrary memory pointer. Here is a typical call of
   1628 <b>pcre2_pattern_info()</b>, to obtain the length of the compiled pattern:
   1629 <pre>
   1630   int rc;
   1631   size_t length;
   1632   rc = pcre2_pattern_info(
   1633     re,               /* result of pcre2_compile() */
   1634     PCRE2_INFO_SIZE,  /* what is required */
   1635     &length);         /* where to put the data */
   1636 </pre>
   1637 The possible values for the second argument are defined in <b>pcre2.h</b>, and
   1638 are as follows:
   1639 <pre>
   1640   PCRE2_INFO_ALLOPTIONS
   1641   PCRE2_INFO_ARGOPTIONS
   1642 </pre>
   1643 Return a copy of the pattern's options. The third argument should point to a
   1644 <b>uint32_t</b> variable. PCRE2_INFO_ARGOPTIONS returns exactly the options that
   1645 were passed to <b>pcre2_compile()</b>, whereas PCRE2_INFO_ALLOPTIONS returns
   1646 the compile options as modified by any top-level (*XXX) option settings such as
   1647 (*UTF) at the start of the pattern itself.
   1648 </P>
   1649 <P>
   1650 For example, if the pattern /(*UTF)abc/ is compiled with the PCRE2_EXTENDED
   1651 option, the result for PCRE2_INFO_ALLOPTIONS is PCRE2_EXTENDED and PCRE2_UTF.
   1652 Option settings such as (?i) that can change within a pattern do not affect the
   1653 result of PCRE2_INFO_ALLOPTIONS, even if they appear right at the start of the
   1654 pattern. (This was different in some earlier releases.)
   1655 </P>
   1656 <P>
   1657 A pattern compiled without PCRE2_ANCHORED is automatically anchored by PCRE2 if
   1658 the first significant item in every top-level branch is one of the following:
   1659 <pre>
   1660   ^     unless PCRE2_MULTILINE is set
   1661   \A    always
   1662   \G    always
   1663   .*    sometimes - see below
   1664 </pre>
   1665 When .* is the first significant item, anchoring is possible only when all the
   1666 following are true:
   1667 <pre>
   1668   .* is not in an atomic group
   1669   .* is not in a capturing group that is the subject of a back reference
   1670   PCRE2_DOTALL is in force for .*
   1671   Neither (*PRUNE) nor (*SKIP) appears in the pattern.
   1672   PCRE2_NO_DOTSTAR_ANCHOR is not set.
   1673 </pre>
   1674 For patterns that are auto-anchored, the PCRE2_ANCHORED bit is set in the
   1675 options returned for PCRE2_INFO_ALLOPTIONS.
   1676 <pre>
   1677   PCRE2_INFO_BACKREFMAX
   1678 </pre>
   1679 Return the number of the highest back reference in the pattern. The third
   1680 argument should point to a <b>uint32_t</b> variable. Named subpatterns acquire
   1681 numbers as well as names, and these count towards the highest back reference.
   1682 Back references such as \4 or \g{12} match the captured characters of the
   1683 given group, but in addition, the check that a capturing group is set in a
   1684 conditional subpattern such as (?(3)a|b) is also a back reference. Zero is
   1685 returned if there are no back references.
   1686 <pre>
   1687   PCRE2_INFO_BSR
   1688 </pre>
   1689 The output is a uint32_t whose value indicates what character sequences the \R
   1690 escape sequence matches. A value of PCRE2_BSR_UNICODE means that \R matches
   1691 any Unicode line ending sequence; a value of PCRE2_BSR_ANYCRLF means that \R
   1692 matches only CR, LF, or CRLF.
   1693 <pre>
   1694   PCRE2_INFO_CAPTURECOUNT
   1695 </pre>
   1696 Return the highest capturing subpattern number in the pattern. In patterns
   1697 where (?| is not used, this is also the total number of capturing subpatterns.
   1698 The third argument should point to a <b>uint32_t</b> variable.
   1699 <pre>
   1700   PCRE2_INFO_FIRSTBITMAP
   1701 </pre>
   1702 In the absence of a single first code unit for a non-anchored pattern,
   1703 <b>pcre2_compile()</b> may construct a 256-bit table that defines a fixed set of
   1704 values for the first code unit in any match. For example, a pattern that starts
   1705 with [abc] results in a table with three bits set. When code unit values
   1706 greater than 255 are supported, the flag bit for 255 means "any code unit of
   1707 value 255 or above". If such a table was constructed, a pointer to it is
   1708 returned. Otherwise NULL is returned. The third argument should point to a
   1709 <b>const uint8_t *</b> variable.
   1710 <pre>
   1711   PCRE2_INFO_FIRSTCODETYPE
   1712 </pre>
   1713 Return information about the first code unit of any matched string, for a
   1714 non-anchored pattern. The third argument should point to a <b>uint32_t</b>
   1715 variable. If there is a fixed first value, for example, the letter "c" from a
   1716 pattern such as (cat|cow|coyote), 1 is returned, and the character value can be
   1717 retrieved using PCRE2_INFO_FIRSTCODEUNIT. If there is no fixed first value, but
   1718 it is known that a match can occur only at the start of the subject or
   1719 following a newline in the subject, 2 is returned. Otherwise, and for anchored
   1720 patterns, 0 is returned.
   1721 <pre>
   1722   PCRE2_INFO_FIRSTCODEUNIT
   1723 </pre>
   1724 Return the value of the first code unit of any matched string in the situation
   1725 where PCRE2_INFO_FIRSTCODETYPE returns 1; otherwise return 0. The third
   1726 argument should point to a <b>uint32_t</b> variable. In the 8-bit library, the
   1727 value is always less than 256. In the 16-bit library the value can be up to
   1728 0xffff. In the 32-bit library in UTF-32 mode the value can be up to 0x10ffff,
   1729 and up to 0xffffffff when not using UTF-32 mode.
   1730 <pre>
   1731   PCRE2_INFO_HASBACKSLASHC
   1732 </pre>
   1733 Return 1 if the pattern contains any instances of \C, otherwise 0. The third
   1734 argument should point to a <b>uint32_t</b> variable.
   1735 <pre>
   1736   PCRE2_INFO_HASCRORLF
   1737 </pre>
   1738 Return 1 if the pattern contains any explicit matches for CR or LF characters,
   1739 otherwise 0. The third argument should point to a <b>uint32_t</b> variable. An
   1740 explicit match is either a literal CR or LF character, or \r or \n.
   1741 <pre>
   1742   PCRE2_INFO_JCHANGED
   1743 </pre>
   1744 Return 1 if the (?J) or (?-J) option setting is used in the pattern, otherwise
   1745 0. The third argument should point to a <b>uint32_t</b> variable. (?J) and
   1746 (?-J) set and unset the local PCRE2_DUPNAMES option, respectively.
   1747 <pre>
   1748   PCRE2_INFO_JITSIZE
   1749 </pre>
   1750 If the compiled pattern was successfully processed by
   1751 <b>pcre2_jit_compile()</b>, return the size of the JIT compiled code, otherwise
   1752 return zero. The third argument should point to a <b>size_t</b> variable.
   1753 <pre>
   1754   PCRE2_INFO_LASTCODETYPE
   1755 </pre>
   1756 Return 1 if there is a rightmost literal code unit that must exist in any
   1757 matched string, other than at its start. The third argument should point to a
   1758 <b>uint32_t</b> variable. If there is no such value, 0 is returned. When 1 is
   1759 returned, the code unit value itself can be retrieved using
   1760 PCRE2_INFO_LASTCODEUNIT. For anchored patterns, a last literal value is
   1761 recorded only if it follows something of variable length. For example, for the
   1762 pattern /^a\d+z\d+/ the returned value is 1 (with "z" returned from
   1763 PCRE2_INFO_LASTCODEUNIT), but for /^a\dz\d/ the returned value is 0.
   1764 <pre>
   1765   PCRE2_INFO_LASTCODEUNIT
   1766 </pre>
   1767 Return the value of the rightmost literal code unit that must exist in any
   1768 matched string, other than at its start, if such a value has been recorded. The
   1769 third argument should point to a <b>uint32_t</b> variable. If there is no such
   1770 value, 0 is returned.
   1771 <pre>
   1772   PCRE2_INFO_MATCHEMPTY
   1773 </pre>
   1774 Return 1 if the pattern might match an empty string, otherwise 0. The third
   1775 argument should point to a <b>uint32_t</b> variable. When a pattern contains
   1776 recursive subroutine calls it is not always possible to determine whether or
   1777 not it can match an empty string. PCRE2 takes a cautious approach and returns 1
   1778 in such cases.
   1779 <pre>
   1780   PCRE2_INFO_MATCHLIMIT
   1781 </pre>
   1782 If the pattern set a match limit by including an item of the form
   1783 (*LIMIT_MATCH=nnnn) at the start, the value is returned. The third argument
   1784 should point to an unsigned 32-bit integer. If no such value has been set, the
   1785 call to <b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET.
   1786 <pre>
   1787   PCRE2_INFO_MAXLOOKBEHIND
   1788 </pre>
   1789 Return the number of characters (not code units) in the longest lookbehind
   1790 assertion in the pattern. The third argument should point to an unsigned 32-bit
   1791 integer. This information is useful when doing multi-segment matching using the
   1792 partial matching facilities. Note that the simple assertions \b and \B
   1793 require a one-character lookbehind. \A also registers a one-character
   1794 lookbehind, though it does not actually inspect the previous character. This is
   1795 to ensure that at least one character from the old segment is retained when a
   1796 new segment is processed. Otherwise, if there are no lookbehinds in the
   1797 pattern, \A might match incorrectly at the start of a new segment.
   1798 <pre>
   1799   PCRE2_INFO_MINLENGTH
   1800 </pre>
   1801 If a minimum length for matching subject strings was computed, its value is
   1802 returned. Otherwise the returned value is 0. The value is a number of
   1803 characters, which in UTF mode may be different from the number of code units.
   1804 The third argument should point to a <b>uint32_t</b> variable. The value is a
   1805 lower bound to the length of any matching string. There may not be any strings
   1806 of that length that do actually match, but every string that does match is at
   1807 least that long.
   1808 <pre>
   1809   PCRE2_INFO_NAMECOUNT
   1810   PCRE2_INFO_NAMEENTRYSIZE
   1811   PCRE2_INFO_NAMETABLE
   1812 </pre>
   1813 PCRE2 supports the use of named as well as numbered capturing parentheses. The
   1814 names are just an additional way of identifying the parentheses, which still
   1815 acquire numbers. Several convenience functions such as
   1816 <b>pcre2_substring_get_byname()</b> are provided for extracting captured
   1817 substrings by name. It is also possible to extract the data directly, by first
   1818 converting the name to a number in order to access the correct pointers in the
   1819 output vector (described with <b>pcre2_match()</b> below). To do the conversion,
   1820 you need to use the name-to-number map, which is described by these three
   1821 values.
   1822 </P>
   1823 <P>
   1824 The map consists of a number of fixed-size entries. PCRE2_INFO_NAMECOUNT gives
   1825 the number of entries, and PCRE2_INFO_NAMEENTRYSIZE gives the size of each
   1826 entry in code units; both of these return a <b>uint32_t</b> value. The entry
   1827 size depends on the length of the longest name.
   1828 </P>
   1829 <P>
   1830 PCRE2_INFO_NAMETABLE returns a pointer to the first entry of the table. This is
   1831 a PCRE2_SPTR pointer to a block of code units. In the 8-bit library, the first
   1832 two bytes of each entry are the number of the capturing parenthesis, most
   1833 significant byte first. In the 16-bit library, the pointer points to 16-bit
   1834 code units, the first of which contains the parenthesis number. In the 32-bit
   1835 library, the pointer points to 32-bit code units, the first of which contains
   1836 the parenthesis number. The rest of the entry is the corresponding name, zero
   1837 terminated.
   1838 </P>
   1839 <P>
   1840 The names are in alphabetical order. If (?| is used to create multiple groups
   1841 with the same number, as described in the
   1842 <a href="pcre2pattern.html#dupsubpatternnumber">section on duplicate subpattern numbers</a>
   1843 in the
   1844 <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
   1845 page, the groups may be given the same name, but there is only one entry in the
   1846 table. Different names for groups of the same number are not permitted.
   1847 </P>
   1848 <P>
   1849 Duplicate names for subpatterns with different numbers are permitted, but only
   1850 if PCRE2_DUPNAMES is set. They appear in the table in the order in which they
   1851 were found in the pattern. In the absence of (?| this is the order of
   1852 increasing number; when (?| is used this is not necessarily the case because
   1853 later subpatterns may have lower numbers.
   1854 </P>
   1855 <P>
   1856 As a simple example of the name/number table, consider the following pattern
   1857 after compilation by the 8-bit library (assume PCRE2_EXTENDED is set, so white
   1858 space - including newlines - is ignored):
   1859 <pre>
   1860   (?&#60;date&#62; (?&#60;year&#62;(\d\d)?\d\d) - (?&#60;month&#62;\d\d) - (?&#60;day&#62;\d\d) )
   1861 </pre>
   1862 There are four named subpatterns, so the table has four entries, and each entry
   1863 in the table is eight bytes long. The table is as follows, with non-printing
   1864 bytes shown in hexadecimal, and undefined bytes shown as ??:
   1865 <pre>
   1866   00 01 d  a  t  e  00 ??
   1867   00 05 d  a  y  00 ?? ??
   1868   00 04 m  o  n  t  h  00
   1869   00 02 y  e  a  r  00 ??
   1870 </pre>
   1871 When writing code to extract data from named subpatterns using the
   1872 name-to-number map, remember that the length of the entries is likely to be
   1873 different for each compiled pattern.
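As a rough illustration of walking the table, the following C sketch looks up a
group number by name in an 8-bit name table. The helper
<b>find_group_number()</b> and the array <b>example_table</b> are invented for
this illustration and are not part of the PCRE2 API. The array reproduces the
table shown above, with the undefined bytes set to zero for convenience. In a
real program the three inputs would come from PCRE2_INFO_NAMETABLE,
PCRE2_INFO_NAMECOUNT, and PCRE2_INFO_NAMEENTRYSIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy of the example table from the text: 4 entries of 8 bytes each.
   Bytes shown as ?? in the text are undefined; zero is used here only
   so that the array has a definite value. */
static const unsigned char example_table[] = {
  0x00, 0x01, 'd', 'a', 't', 'e', 0x00, 0x00,
  0x00, 0x05, 'd', 'a', 'y', 0x00, 0x00, 0x00,
  0x00, 0x04, 'm', 'o', 'n', 't', 'h', 0x00,
  0x00, 0x02, 'y', 'e', 'a', 'r', 0x00, 0x00
};

/* Hypothetical helper (not a PCRE2 function): return the number of the
   group called name, or -1 if the table has no such entry. */
static int find_group_number(const unsigned char *table, uint32_t name_count,
                             uint32_t entry_size, const char *name)
{
  assert(table != NULL && name != NULL);
  for (uint32_t i = 0; i < name_count; i++) {
    /* Entries are fixed-size, so entry i starts at i * entry_size. */
    const unsigned char *entry = table + (size_t)i * entry_size;
    /* Bytes 0 and 1 hold the group number, most significant byte first;
       the zero-terminated name starts at byte 2. */
    if (strcmp((const char *)(entry + 2), name) == 0)
      return (entry[0] << 8) | entry[1];
  }
  return -1;
}
```

The scan is linear and stops at the first entry with a matching name, which is
sufficient when names are unique. With PCRE2_DUPNAMES a caller would need to
examine every entry that carries the same name.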
   1874 <pre>
   1875   PCRE2_INFO_NEWLINE
   1876 </pre>
   1877 The output is a <b>uint32_t</b> with one of the following values:
   1878 <pre>
   1879   PCRE2_NEWLINE_CR       Carriage return (CR)
   1880   PCRE2_NEWLINE_LF       Linefeed (LF)
   1881   PCRE2_NEWLINE_CRLF     Carriage return, linefeed (CRLF)
   1882   PCRE2_NEWLINE_ANY      Any Unicode line ending
   1883   PCRE2_NEWLINE_ANYCRLF  Any of CR, LF, or CRLF
   1884 </pre>
   1885 This specifies the default character sequence that will be recognized as
   1886 meaning "newline" while matching.
   1887 <pre>
   1888   PCRE2_INFO_RECURSIONLIMIT
   1889 </pre>
   1890 If the pattern set a recursion limit by including an item of the form
   1891 (*LIMIT_RECURSION=nnnn) at the start, the value is returned. The third
   1892 argument should point to an unsigned 32-bit integer. If no such value has been
   1893 set, the call to <b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET.
   1894 <pre>
   1895   PCRE2_INFO_SIZE
   1896 </pre>
   1897 Return the size of the compiled pattern in bytes (for all three libraries). The
   1898 third argument should point to a <b>size_t</b> variable. This value includes the
   1899 size of the general data block that precedes the code units of the compiled
   1900 pattern itself. The value that is used when <b>pcre2_compile()</b> is getting
   1901 memory in which to place the compiled pattern may be slightly larger than the
   1902 value returned by this option, because there are cases where the code that
   1903 calculates the size has to over-estimate. Processing a pattern with the JIT
   1904 compiler does not alter the value returned by this option.
   1905 <a name="infoaboutcallouts"></a></P>
   1906 <br><a name="SEC23" href="#TOC1">INFORMATION ABOUT A PATTERN'S CALLOUTS</a><br>
   1907 <P>
   1908 <b>int pcre2_callout_enumerate(const pcre2_code *<i>code</i>,</b>
   1909 <b>  int (*<i>callback</i>)(pcre2_callout_enumerate_block *, void *),</b>
   1910 <b>  void *<i>user_data</i>);</b>
   1911 <br>
   1912 <br>
   1913 A script language that supports the use of string arguments in callouts might
   1914 like to scan all the callouts in a pattern before running the match. This can
   1915 be done by calling <b>pcre2_callout_enumerate()</b>. The first argument is a
   1916 pointer to a compiled pattern, the second points to a callback function, and
   1917 the third is arbitrary user data. The callback function is called for every
   1918 callout in the pattern in the order in which they appear. Its first argument is
   1919 a pointer to a callout enumeration block, and its second argument is the
   1920 <i>user_data</i> value that was passed to <b>pcre2_callout_enumerate()</b>. The
   1921 contents of the callout enumeration block are described in the
   1922 <a href="pcre2callout.html"><b>pcre2callout</b></a>
   1923 documentation, which also gives further details about callouts.
   1924 </P>
   1925 <br><a name="SEC24" href="#TOC1">SERIALIZATION AND PRECOMPILING</a><br>
   1926 <P>
   1927 It is possible to save compiled patterns on disc or elsewhere, and reload them
   1928 later, subject to a number of restrictions. The functions whose names begin
   1929 with <b>pcre2_serialize_</b> are used for this purpose. They are described in
   1930 the
   1931 <a href="pcre2serialize.html"><b>pcre2serialize</b></a>
   1932 documentation.
   1933 <a name="matchdatablock"></a></P>
   1934 <br><a name="SEC25" href="#TOC1">THE MATCH DATA BLOCK</a><br>
   1935 <P>
   1936 <b>pcre2_match_data *pcre2_match_data_create(uint32_t <i>ovecsize</i>,</b>
   1937 <b>  pcre2_general_context *<i>gcontext</i>);</b>
   1938 <br>
   1939 <br>
   1940 <b>pcre2_match_data *pcre2_match_data_create_from_pattern(</b>
   1941 <b>  const pcre2_code *<i>code</i>, pcre2_general_context *<i>gcontext</i>);</b>
   1942 <br>
   1943 <br>
   1944 <b>void pcre2_match_data_free(pcre2_match_data *<i>match_data</i>);</b>
   1945 </P>
   1946 <P>
   1947 Information about a successful or unsuccessful match is placed in a match
   1948 data block, which is an opaque structure that is accessed by function calls. In
   1949 particular, the match data block contains a vector of offsets into the subject
   1950 string that define the matched part of the subject and any substrings that were
   1951 captured. This is known as the <i>ovector</i>.
   1952 </P>
   1953 <P>
   1954 Before calling <b>pcre2_match()</b>, <b>pcre2_dfa_match()</b>, or
   1955 <b>pcre2_jit_match()</b> you must create a match data block by calling one of
   1956 the creation functions above. For <b>pcre2_match_data_create()</b>, the first
   1957 argument is the number of pairs of offsets in the <i>ovector</i>. One pair of
   1958 offsets is required to identify the string that matched the whole pattern, with
   1959 another pair for each captured substring. For example, a value of 4 creates
   1960 enough space to record the matched portion of the subject plus three captured
   1961 substrings. A minimum of 1 pair is imposed by
   1962 <b>pcre2_match_data_create()</b>, so it is always possible to return the overall
   1963 matched string.
   1964 </P>
   1965 <P>
   1966 The second argument of <b>pcre2_match_data_create()</b> is a pointer to a
   1967 general context, which can specify custom memory management for obtaining the
   1968 memory for the match data block. If you are not using custom memory management,
   1969 pass NULL, which causes <b>malloc()</b> to be used.
   1970 </P>
   1971 <P>
   1972 For <b>pcre2_match_data_create_from_pattern()</b>, the first argument is a
   1973 pointer to a compiled pattern. The ovector is created to be exactly the right
   1974 size to hold all the substrings a pattern might capture. The second argument is
   1975 again a pointer to a general context, but in this case if NULL is passed, the
   1976 memory is obtained using the same allocator that was used for the compiled
   1977 pattern (custom or default).
   1978 </P>
   1979 <P>
   1980 A match data block can be used many times, with the same or different compiled
   1981 patterns. You can extract information from a match data block after a match
   1982 operation has finished, using functions that are described in the sections on
   1983 <a href="#matchedstrings">matched strings</a>
   1984 and
   1985 <a href="#matchotherdata">other match data</a>
   1986 below.
   1987 </P>
   1988 <P>
   1989 When a call of <b>pcre2_match()</b> fails, valid data is available in the match
   1990 block only when the error is PCRE2_ERROR_NOMATCH, PCRE2_ERROR_PARTIAL, or one
   1991 of the error codes for an invalid UTF string. Exactly what is available depends
   1992 on the error, and is detailed below.
   1993 </P>
   1994 <P>
   1995 When one of the matching functions is called, pointers to the compiled pattern
   1996 and the subject string are set in the match data block so that they can be
   1997 referenced by the extraction functions. After running a match, you must not
   1998 free a compiled pattern or a subject string until after all operations on the
   1999 match data block (for that match) have taken place.
   2000 </P>
   2001 <P>
   2002 When a match data block itself is no longer needed, it should be freed by
   2003 calling <b>pcre2_match_data_free()</b>.
   2004 </P>
   2005 <br><a name="SEC26" href="#TOC1">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a><br>
   2006 <P>
   2007 <b>int pcre2_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
   2008 <b>  PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
   2009 <b>  uint32_t <i>options</i>, pcre2_match_data *<i>match_data</i>,</b>
   2010 <b>  pcre2_match_context *<i>mcontext</i>);</b>
   2011 </P>
   2012 <P>
   2013 The function <b>pcre2_match()</b> is called to match a subject string against a
   2014 compiled pattern, which is passed in the <i>code</i> argument. You can call
   2015 <b>pcre2_match()</b> with the same <i>code</i> argument as many times as you
   2016 like, in order to find multiple matches in the subject string or to match
   2017 different subject strings with the same pattern.
   2018 </P>
   2019 <P>
   2020 This function is the main matching facility of the library, and it operates in
   2021 a Perl-like manner. For specialist use there is also an alternative matching
   2022 function, which is described
   2023 <a href="#dfamatch">below</a>
   2024 in the section about the <b>pcre2_dfa_match()</b> function.
   2025 </P>
   2026 <P>
   2027 Here is an example of a simple call to <b>pcre2_match()</b>:
   2028 <pre>
   2029   pcre2_match_data *match_data = pcre2_match_data_create(4, NULL);
   2030   int rc = pcre2_match(
   2031     re,                         /* result of pcre2_compile() */
   2032     (PCRE2_SPTR)"some string",  /* the subject string */
   2033     11,                         /* the length of the subject string */
   2034     0,                          /* start at offset 0 in the subject */
   2035     0,                          /* default options */
   2036     match_data,                 /* the match data block created above */
   2037     NULL);                      /* a match context; NULL means use defaults */
   2038 </pre>
   2039 If the subject string is zero-terminated, the length can be given as
   2040 PCRE2_ZERO_TERMINATED. A match context must be provided if certain less common
   2041 matching parameters are to be changed. For details, see the section on
   2042 <a href="#matchcontext">the match context</a>
   2043 above.
   2044 </P>
   2045 <br><b>
   2046 The string to be matched by <b>pcre2_match()</b>
   2047 </b><br>
   2048 <P>
   2049 The subject string is passed to <b>pcre2_match()</b> as a pointer in
   2050 <i>subject</i>, a length in <i>length</i>, and a starting offset in
   2051 <i>startoffset</i>. The length and offset are in code units, not characters.
   2052 That is, they are in bytes for the 8-bit library, 16-bit code units for the
   2053 16-bit library, and 32-bit code units for the 32-bit library, whether or not
   2054 UTF processing is enabled.
   2055 </P>
   2056 <P>
   2057 If <i>startoffset</i> is greater than the length of the subject,
   2058 <b>pcre2_match()</b> returns PCRE2_ERROR_BADOFFSET. When the starting offset is
   2059 zero, the search for a match starts at the beginning of the subject, and this
   2060 is by far the most common case. In UTF-8 or UTF-16 mode, the starting offset
   2061 must point to the start of a character, or to the end of the subject (in UTF-32
   2062 mode, one code unit equals one character, so all offsets are valid). Like the
   2063 pattern string, the subject may contain binary zeroes.
   2064 </P>
   2065 <P>
   2066 A non-zero starting offset is useful when searching for another match in the
   2067 same subject by calling <b>pcre2_match()</b> again after a previous success.
   2068 Setting <i>startoffset</i> differs from passing over a shortened string and
   2069 setting PCRE2_NOTBOL in the case of a pattern that begins with any kind of
   2070 lookbehind. For example, consider the pattern
   2071 <pre>
   2072   \Biss\B
   2073 </pre>
   2074 which finds occurrences of "iss" in the middle of words. (\B matches only if
   2075 the current position in the subject is not a word boundary.) When applied to
   2076 the string "Mississippi" the first call to <b>pcre2_match()</b> finds the first
   2077 occurrence. If <b>pcre2_match()</b> is called again with just the remainder of
   2078 the subject, namely "issippi", it does not match, because \B is always false at
   2079 the start of the subject, which is deemed to be a word boundary. However, if
   2080 <b>pcre2_match()</b> is passed the entire string again, but with
   2081 <i>startoffset</i> set to 4, it finds the second occurrence of "iss" because it
   2082 is able to look behind the starting point to discover that it is preceded by a
   2083 letter.
   2084 </P>
   2085 <P>
   2086 Finding all the matches in a subject is tricky when the pattern can match an
   2087 empty string. It is possible to emulate Perl's /g behaviour by first trying the
   2088 match again at the same offset, with the PCRE2_NOTEMPTY_ATSTART and
   2089 PCRE2_ANCHORED options, and then if that fails, advancing the starting offset
   2090 and trying an ordinary match again. There is some code that demonstrates how to
   2091 do this in the
   2092 <a href="pcre2demo.html"><b>pcre2demo</b></a>
   2093 sample program. In the most general case, you have to check to see if the
   2094 newline convention recognizes CRLF as a newline, and if so, and the current
   2095 character is CR followed by LF, advance the starting offset by two characters
   2096 instead of one.
   2097 </P>
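<P>
The advance step described above might be sketched as follows for the 8-bit
library. The function name <b>advance_one_character()</b> and its parameters
are assumptions made for this illustration, not PCRE2 API. The caller is
assumed to have already found out whether the newline convention recognizes
CRLF and whether the pattern was compiled with PCRE2_UTF. The
<b>pcre2demo</b> program remains the reference for the complete loop.
</P>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a PCRE2 function): given the offset at which an
   empty match was found and the retry at that offset failed, return the
   offset at which the next ordinary match attempt should start. */
static size_t advance_one_character(const unsigned char *subject, size_t length,
                                    size_t offset, bool crlf_is_newline,
                                    bool utf8)
{
  assert(subject != NULL && offset < length);
  size_t next = offset + 1;
  if (crlf_is_newline && next < length &&
      subject[offset] == '\r' && subject[next] == '\n') {
    next++;                     /* CR followed by LF: step over both */
  } else if (utf8) {
    /* In UTF-8 mode the new offset must be at the start of a character,
       so skip any continuation bytes (10xxxxxx). */
    while (next < length && (subject[next] & 0xC0) == 0x80)
      next++;
  }
  return next;
}
```

<P>
The precondition that the offset is less than the length reflects the fact that
there is nothing left to try once an empty match has been found at the very end
of the subject.
</P>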
   2098 <P>
   2099 If a non-zero starting offset is passed when the pattern is anchored, one
   2100 attempt to match at the given offset is made. This can only succeed if the
   2101 pattern does not require the match to be at the start of the subject.
   2102 <a name="matchoptions"></a></P>
   2103 <br><b>
   2104 Option bits for <b>pcre2_match()</b>
   2105 </b><br>
   2106 <P>
   2107 The unused bits of the <i>options</i> argument for <b>pcre2_match()</b> must be
   2108 zero. The only bits that may be set are PCRE2_ANCHORED, PCRE2_NOTBOL,
   2109 PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_JIT,
   2110 PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. Their action is
   2111 described below.
   2112 </P>
   2113 <P>
   2114 Setting PCRE2_ANCHORED at match time is not supported by the just-in-time (JIT)
   2115 compiler. If it is set, JIT matching is disabled and the normal interpretive
   2116 code in <b>pcre2_match()</b> is run. Apart from PCRE2_NO_JIT (obviously), the
   2117 remaining options are supported for JIT matching.
   2118 <pre>
   2119   PCRE2_ANCHORED
   2120 </pre>
   2121 The PCRE2_ANCHORED option limits <b>pcre2_match()</b> to matching at the first
   2122 matching position. If a pattern was compiled with PCRE2_ANCHORED, or turned out
   2123 to be anchored by virtue of its contents, it cannot be made unanchored at
   2124 matching time. Note that setting the option at match time disables JIT
   2125 matching.
   2126 <pre>
   2127   PCRE2_NOTBOL
   2128 </pre>
   2129 This option specifies that the first character of the subject string is not the
   2130 beginning of a line, so the circumflex metacharacter should not match before
   2131 it. Setting this without having set PCRE2_MULTILINE at compile time causes
   2132 circumflex never to match. This option affects only the behaviour of the
   2133 circumflex metacharacter. It does not affect \A.
   2134 <pre>
   2135   PCRE2_NOTEOL
   2136 </pre>
   2137 This option specifies that the end of the subject string is not the end of a
   2138 line, so the dollar metacharacter should not match it nor (except in multiline
   2139 mode) a newline immediately before it. Setting this without having set
   2140 PCRE2_MULTILINE at compile time causes dollar never to match. This option
   2141 affects only the behaviour of the dollar metacharacter. It does not affect \Z
   2142 or \z.
   2143 <pre>
   2144   PCRE2_NOTEMPTY
   2145 </pre>
   2146 An empty string is not considered to be a valid match if this option is set. If
   2147 there are alternatives in the pattern, they are tried. If all the alternatives
   2148 match the empty string, the entire match fails. For example, if the pattern
   2149 <pre>
   2150   a?b?
   2151 </pre>
   2152 is applied to a string not beginning with "a" or "b", it matches an empty
   2153 string at the start of the subject. With PCRE2_NOTEMPTY set, this match is not
   2154 valid, so <b>pcre2_match()</b> searches further into the string for occurrences
   2155 of "a" or "b".
   2156 <pre>
   2157   PCRE2_NOTEMPTY_ATSTART
   2158 </pre>
   2159 This is like PCRE2_NOTEMPTY, except that it locks out an empty string match
   2160 only at the first matching position, that is, at the start of the subject plus
   2161 the starting offset. An empty string match later in the subject is permitted.
   2162 If the pattern is anchored, such a match can occur only if the pattern contains
   2163 \K.
   2164 <pre>
   2165   PCRE2_NO_JIT
   2166 </pre>
   2167 By default, if a pattern has been successfully processed by
   2168 <b>pcre2_jit_compile()</b>, JIT is automatically used when <b>pcre2_match()</b>
   2169 is called with options that JIT supports. Setting PCRE2_NO_JIT disables the use
   2170 of JIT; it forces matching to be done by the interpreter.
   2171 <pre>
   2172   PCRE2_NO_UTF_CHECK
   2173 </pre>
   2174 When PCRE2_UTF is set at compile time, the validity of the subject as a UTF
   2175 string is checked by default when <b>pcre2_match()</b> is subsequently called.
   2176 If a non-zero starting offset is given, the check is applied only to that part
   2177 of the subject that could be inspected during matching, and there is a check
   2178 that the starting offset points to the first code unit of a character or to the
   2179 end of the subject. If there are no lookbehind assertions in the pattern, the
   2180 check starts at the starting offset. Otherwise, it starts at the length of the
   2181 longest lookbehind before the starting offset, or at the start of the subject
   2182 if there are not that many characters before the starting offset. Note that the
   2183 sequences \b and \B are one-character lookbehinds.
   2184 </P>
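<P>
The rule for where the validity check begins can be modelled in a few lines of
C for UTF-8 subjects. This is only a model of the behaviour described above,
not PCRE2's own code. The function name <b>utf8_check_start()</b> is
hypothetical, and the lookbehind length is assumed to be the value reported by
PCRE2_INFO_MAXLOOKBEHIND.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model (not a PCRE2 function): return the code unit offset at
   which a UTF-8 validity check would begin, given the starting offset of the
   match and the length in characters of the longest lookbehind. */
static size_t utf8_check_start(const unsigned char *subject,
                               size_t start_offset, uint32_t max_lookbehind)
{
  assert(subject != NULL);
  size_t pos = start_offset;
  while (max_lookbehind > 0 && pos > 0) {
    pos--;                               /* step back one code unit */
    while (pos > 0 && (subject[pos] & 0xC0) == 0x80)
      pos--;                             /* continue back to the lead byte */
    max_lookbehind--;
  }
  return pos;   /* 0 if there were not that many characters available */
}
```

<P>
With no lookbehind the result is the starting offset itself. With a lookbehind
longer than the available text the result is the start of the subject.
</P>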
   2185 <P>
   2186 The check is carried out before any other processing takes place, and a
   2187 negative error code is returned if the check fails. There are several UTF error
   2188 codes for each code unit width, corresponding to different problems with the
   2189 code unit sequence. There are discussions about the validity of
   2190 <a href="pcre2unicode.html#utf8strings">UTF-8 strings,</a>
   2191 <a href="pcre2unicode.html#utf16strings">UTF-16 strings,</a>
   2192 and
   2193 <a href="pcre2unicode.html#utf32strings">UTF-32 strings</a>
   2194 in the
   2195 <a href="pcre2unicode.html"><b>pcre2unicode</b></a>
   2196 page.
   2197 </P>
   2198 <P>
   2199 If you know that your subject is valid, and you want to skip these checks for
   2200 performance reasons, you can set the PCRE2_NO_UTF_CHECK option when calling
   2201 <b>pcre2_match()</b>. You might want to do this for the second and subsequent
   2202 calls to <b>pcre2_match()</b> if you are making repeated calls to find all the
   2203 matches in a single subject string.
   2204 </P>
   2205 <P>
   2206 NOTE: When PCRE2_NO_UTF_CHECK is set, the effect of passing an invalid string
   2207 as a subject, or an invalid value of <i>startoffset</i>, is undefined. Your
   2208 program may crash or loop indefinitely.
   2209 <pre>
   2210   PCRE2_PARTIAL_HARD
   2211   PCRE2_PARTIAL_SOFT
   2212 </pre>
   2213 These options turn on the partial matching feature. A partial match occurs if
   2214 the end of the subject string is reached successfully, but there are not enough
   2215 subject characters to complete the match. If this happens when
   2216 PCRE2_PARTIAL_SOFT (but not PCRE2_PARTIAL_HARD) is set, matching continues by
   2217 testing any remaining alternatives. Only if no complete match can be found is
   2218 PCRE2_ERROR_PARTIAL returned instead of PCRE2_ERROR_NOMATCH. In other words,
   2219 PCRE2_PARTIAL_SOFT specifies that the caller is prepared to handle a partial
   2220 match, but only if no complete match can be found.
   2221 </P>
   2222 <P>
   2223 If PCRE2_PARTIAL_HARD is set, it overrides PCRE2_PARTIAL_SOFT. In this case, if
   2224 a partial match is found, <b>pcre2_match()</b> immediately returns
   2225 PCRE2_ERROR_PARTIAL, without considering any other alternatives. In other
   2226 words, when PCRE2_PARTIAL_HARD is set, a partial match is considered to be more
   2227 important than an alternative complete match.
   2228 </P>
   2229 <P>
   2230 There is a more detailed discussion of partial and multi-segment matching, with
   2231 examples, in the
   2232 <a href="pcre2partial.html"><b>pcre2partial</b></a>
   2233 documentation.
   2234 </P>
   2235 <br><a name="SEC27" href="#TOC1">NEWLINE HANDLING WHEN MATCHING</a><br>
   2236 <P>
   2237 When PCRE2 is built, a default newline convention is set; this is usually the
   2238 standard convention for the operating system. The default can be overridden in
   2239 a
   2240 <a href="#compilecontext">compile context</a>
   2241 by calling <b>pcre2_set_newline()</b>. It can also be overridden by starting a
   2242 pattern string with, for example, (*CRLF), as described in the
   2243 <a href="pcre2pattern.html#newlines">section on newline conventions</a>
   2244 in the
   2245 <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
   2246 page. During matching, the newline choice affects the behaviour of the dot,
   2247 circumflex, and dollar metacharacters. It may also alter the way the match
   2248 starting position is advanced after a match failure for an unanchored pattern.
   2249 </P>
   2250 <P>
   2251 When PCRE2_NEWLINE_CRLF, PCRE2_NEWLINE_ANYCRLF, or PCRE2_NEWLINE_ANY is set as
   2252 the newline convention, and a match attempt for an unanchored pattern fails
   2253 when the current starting position is at a CRLF sequence, and the pattern
   2254 contains no explicit matches for CR or LF characters, the match position is
   2255 advanced by two characters instead of one, in other words, to after the CRLF.
   2256 </P>
   2257 <P>
   2258 The above rule is a compromise that makes the most common cases work as
   2259 expected. For example, if the pattern is .+A (and the PCRE2_DOTALL option is
   2260 not set), it does not match the string "\r\nA" because, after failing at the
   2261 start, it skips both the CR and the LF before retrying. However, the pattern
   2262 [\r\n]A does match that string, because it contains an explicit CR or LF
   2263 reference, and so advances only by one character after the first failure.
   2264 </P>
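<P>
The rule above can be expressed as a small C model. It is a sketch of the
documented behaviour only, not the library's implementation. The function name
<b>next_start_after_failure()</b> is invented for illustration, one character
is taken to be one code unit (that is, non-UTF mode), and the flag
<i>pattern_has_cr_or_lf</i> stands for what PCRE2_INFO_HASCRORLF reports.
</P>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model (not a PCRE2 function): return the next starting
   position after a failed match attempt at pos for an unanchored pattern. */
static size_t next_start_after_failure(const char *subject, size_t length,
                                       size_t pos, bool crlf_is_valid_newline,
                                       bool pattern_has_cr_or_lf)
{
  assert(subject != NULL && pos < length);
  if (crlf_is_valid_newline && !pattern_has_cr_or_lf &&
      pos + 1 < length &&
      subject[pos] == '\r' && subject[pos + 1] == '\n')
    return pos + 2;   /* skip the whole CRLF sequence */
  return pos + 1;     /* ordinary advance */
}
```

<P>
For the subject "\r\nA" this gives 2 when the pattern has no explicit CR or LF
(the .+A case) and 1 when it does (the [\r\n]A case).
</P>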
   2265 <P>
   2266 An explicit match for CR or LF is either a literal appearance of one of those
   2267 characters in the pattern, or one of the \r or \n escape sequences. Implicit
   2268 matches such as [^X] do not count, nor does \s, even though it includes CR and
   2269 LF in the characters that it matches.
   2270 </P>
   2271 <P>
   2272 Notwithstanding the above, anomalous effects may still occur when CRLF is a
   2273 valid newline sequence and explicit \r or \n escapes appear in the pattern.
   2274 <a name="matchedstrings"></a></P>
   2275 <br><a name="SEC28" href="#TOC1">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a><br>
   2276 <P>
   2277 <b>uint32_t pcre2_get_ovector_count(pcre2_match_data *<i>match_data</i>);</b>
   2278 <br>
   2279 <br>
   2280 <b>PCRE2_SIZE *pcre2_get_ovector_pointer(pcre2_match_data *<i>match_data</i>);</b>
   2281 </P>
   2282 <P>
   2283 In general, a pattern matches a certain portion of the subject, and in
   2284 addition, further substrings from the subject may be picked out by
   2285 parenthesized parts of the pattern. Following the usage in Jeffrey Friedl's
   2286 book, this is called "capturing" in what follows, and the phrase "capturing
   2287 subpattern" or "capturing group" is used for a fragment of a pattern that picks
   2288 out a substring. PCRE2 supports several other kinds of parenthesized subpattern
   2289 that do not cause substrings to be captured. The <b>pcre2_pattern_info()</b>
   2290 function can be used to find out how many capturing subpatterns there are in a
   2291 compiled pattern.
   2292 </P>
   2293 <P>
   2294 You can use auxiliary functions for accessing captured substrings
   2295 <a href="#extractbynumber">by number</a>
   2296 or
   2297 <a href="#extractbyname">by name,</a>
   2298 as described in sections below.
   2299 </P>
   2300 <P>
   2301 Alternatively, you can make direct use of the vector of PCRE2_SIZE values,
   2302 called the <b>ovector</b>, which contains the offsets of captured strings. It is
   2303 part of the
   2304 <a href="#matchdatablock">match data block.</a>
   2305 The function <b>pcre2_get_ovector_pointer()</b> returns the address of the
   2306 ovector, and <b>pcre2_get_ovector_count()</b> returns the number of pairs of
   2307 values it contains.
   2308 </P>
   2309 <P>
   2310 Within the ovector, the first in each pair of values is set to the offset of
   2311 the first code unit of a substring, and the second is set to the offset of the
   2312 first code unit after the end of a substring. These values are always code unit
   2313 offsets, not character offsets. That is, they are byte offsets in the 8-bit
   2314 library, 16-bit offsets in the 16-bit library, and 32-bit offsets in the 32-bit
   2315 library.
   2316 </P>
   2317 <P>
   2318 After a partial match (error return PCRE2_ERROR_PARTIAL), only the first pair
   2319 of offsets (that is, <i>ovector[0]</i> and <i>ovector[1]</i>) are set. They
   2320 identify the part of the subject that was partially matched. See the
   2321 <a href="pcre2partial.html"><b>pcre2partial</b></a>
   2322 documentation for details of partial matching.
   2323 </P>
   2324 <P>
   2325 After a successful match, the first pair of offsets identifies the portion of
   2326 the subject string that was matched by the entire pattern. The next pair is
   2327 used for the first capturing subpattern, and so on. The value returned by
   2328 <b>pcre2_match()</b> is one more than the highest numbered pair that has been
   2329 set. For example, if two substrings have been captured, the returned value is
   2330 3. If there are no capturing subpatterns, the return value from a successful
   2331 match is 1, indicating that just the first pair of offsets has been set.
   2332 </P>
   2333 <P>
   2334 If a pattern uses the \K escape sequence within a positive assertion, the
   2335 reported start of a successful match can be greater than the end of the match.
   2336 For example, if the pattern (?=ab\K) is matched against "ab", the start and
   2337 end offset values for the match are 2 and 0.
   2338 </P>
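<P>
Because the offsets are unsigned, subtracting a start that lies after the end
would wrap around. The following is a minimal sketch of a guard against that;
the helper name <b>whole_match_length()</b> is hypothetical and not part of
PCRE2. It treats such a match as empty.
</P>

```c
#include <stddef.h>

/* ovector is the array returned by pcre2_get_ovector_pointer();
   PCRE2_SIZE is size_t. Returns the length in code units of the
   overall match, treating a start that lies after the end (possible
   when \K appears in a positive assertion) as an empty match so that
   the unsigned subtraction cannot wrap around. */
static size_t whole_match_length(const size_t *ovector)
{
    size_t start = ovector[0];
    size_t end = ovector[1];
    return (start > end) ? 0 : end - start;
}
```

<P>
For the (?=ab\K) example, where the offsets are 2 and 0, this helper yields a
length of zero.
</P>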
   2339 <P>
   2340 If a capturing subpattern group is matched repeatedly within a single match
   2341 operation, it is the last portion of the subject that it matched that is
   2342 returned.
   2343 </P>
   2344 <P>
   2345 If the ovector is too small to hold all the captured substring offsets, as much
   2346 as possible is filled in, and the function returns a value of zero. If captured
   2347 substrings are not of interest, <b>pcre2_match()</b> may be called with a match
   2348 data block whose ovector is of minimum length (that is, one pair). However, if
   2349 the pattern contains back references and the <i>ovector</i> is not big enough to
   2350 remember the related substrings, PCRE2 has to get additional memory for use
   2351 during matching. Thus it is usually advisable to set up a match data block
   2352 containing an ovector of reasonable size.
   2353 </P>
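<P>
The rules for the return value can be condensed into a small helper. This is
only a sketch: <b>usable_pairs()</b> is a hypothetical name, and the caller is
assumed to pass the value returned by <b>pcre2_match()</b> together with the
count reported by <b>pcre2_get_ovector_count()</b>.
</P>

```c
#include <stdint.h>

/* Hypothetical helper (not part of PCRE2). Given the value returned by
   pcre2_match() and the number of pairs in the ovector, return how many
   leading offset pairs are worth inspecting. */
static uint32_t usable_pairs(int rc, uint32_t ovector_pairs)
{
    if (rc > 0) return (uint32_t)rc;    /* one more than the highest pair set */
    if (rc == 0) return ovector_pairs;  /* ovector too small: filled as far as possible */
    return 0;                           /* negative: an error code; a partial match
                                           sets pair 0 and must be handled separately */
}
```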
   2354 <P>
   2355 It is possible for capturing subpattern number <i>n+1</i> to match some part of
   2356 the subject when subpattern <i>n</i> has not been used at all. For example, if
   2357 the string "abc" is matched against the pattern (a|(z))(bc) the return from the
   2358 function is 4, and subpatterns 1 and 3 are matched, but 2 is not. When this
   2359 happens, both values in the offset pairs corresponding to unused subpatterns
   2360 are set to PCRE2_UNSET.
   2361 </P>
   2362 <P>
   2363 Offset values that correspond to unused subpatterns at the end of the
   2364 expression are also set to PCRE2_UNSET. For example, if the string "abc" is
   2365 matched against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are not matched.
   2366 The return from the function is 2, because the highest used capturing
   2367 subpattern number is 1. The offsets for the second and third capturing
   2368 subpatterns (assuming the vector is large enough, of course) are set to
   2369 PCRE2_UNSET.
   2370 </P>
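<P>
A loop over the ovector therefore has to test for PCRE2_UNSET before using a
pair. The sketch below is illustrative only: DEMO_UNSET is a stand-in so that
the code compiles without <b>pcre2.h</b>, and <b>count_set_groups()</b> is a
hypothetical helper that assumes an 8-bit subject.
</P>

```c
#include <stddef.h>
#include <stdio.h>

/* Stand-in so the sketch compiles without pcre2.h; real code would use
   PCRE2_UNSET, which pcre2.h defines as (~(PCRE2_SIZE)0). */
#define DEMO_UNSET (~(size_t)0)

/* Print groups 1..pairs-1 and return how many of them took part in the
   match. ovector holds `pairs` offset pairs; subject is 8-bit text. */
static unsigned count_set_groups(const char *subject, const size_t *ovector,
                                 unsigned pairs)
{
    unsigned set = 0;
    for (unsigned i = 1; i < pairs; i++) {
        size_t start = ovector[2 * i];
        size_t end = ovector[2 * i + 1];
        if (start == DEMO_UNSET) {          /* group did not participate */
            printf("%u: <unset>\n", i);
            continue;
        }
        printf("%u: %.*s\n", i, (int)(end - start), subject + start);
        set++;
    }
    return set;
}
```

<P>
With the ovector that results from matching "abc" against (a|(z))(bc), that is
the pairs (0,3), (0,1), (unset,unset), (1,3), the helper reports groups 1 and 3
as set and group 2 as unset.
</P>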
   2371 <P>
   2372 Elements in the ovector that do not correspond to capturing parentheses in the
   2373 pattern are never changed. That is, if a pattern contains <i>n</i> capturing
   2374 parentheses, no more than <i>ovector[0]</i> to <i>ovector[2n+1]</i> are set by
   2375 <b>pcre2_match()</b>. The other elements retain whatever values they previously
   2376 had.
   2377 <a name="matchotherdata"></a></P>
   2378 <br><a name="SEC29" href="#TOC1">OTHER INFORMATION ABOUT A MATCH</a><br>
   2379 <P>
   2380 <b>PCRE2_SPTR pcre2_get_mark(pcre2_match_data *<i>match_data</i>);</b>
   2381 <br>
   2382 <br>
   2383 <b>PCRE2_SIZE pcre2_get_startchar(pcre2_match_data *<i>match_data</i>);</b>
   2384 </P>
   2385 <P>
   2386 As well as the offsets in the ovector, other information about a match is
   2387 retained in the match data block and can be retrieved by the above functions in
   2388 appropriate circumstances. If they are called at other times, the result is
   2389 undefined.
   2390 </P>
   2391 <P>
   2392 After a successful match, a partial match (PCRE2_ERROR_PARTIAL), or a failure
   2393 to match (PCRE2_ERROR_NOMATCH), a (*MARK) name may be available, and
   2394 <b>pcre2_get_mark()</b> can be called. It returns a pointer to the
   2395 zero-terminated name, which is within the compiled pattern. Otherwise NULL is
   2396 returned. The length of the (*MARK) name (excluding the terminating zero) is
   2397 stored in the code unit that precedes the name. You should use this instead of
   2398 relying on the terminating zero if the (*MARK) name might contain a binary
   2399 zero.
   2400 </P>
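<P>
Reading the stored length might look like the sketch below. It assumes the
8-bit library, where a code unit is an 8-bit value, and
<b>mark_name_length()</b> is a hypothetical helper name.
</P>

```c
#include <stddef.h>
#include <stdint.h>

/* For the 8-bit library, PCRE2_SPTR points to const 8-bit code units.
   Given the pointer returned by pcre2_get_mark(), the code unit just
   before the name holds the name's length. Returns 0 when no mark is
   available (NULL pointer). */
static size_t mark_name_length(const uint8_t *mark)
{
    return (mark == NULL) ? 0 : (size_t)mark[-1];
}
```

<P>
For a name stored as the code units 3, 'A', 0, 'B', 0, the helper returns 3,
whereas scanning for the terminating zero would stop after one code unit.
</P>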
   2401 <P>
   2402 After a successful match, the (*MARK) name that is returned is the
   2403 last one encountered on the matching path through the pattern. After a "no
   2404 match" or a partial match, the last encountered (*MARK) name is returned. For
   2405 example, consider this pattern:
   2406 <pre>
   2407   ^(*MARK:A)((*MARK:B)a|b)c
   2408 </pre>
   2409 When it matches "bc", the returned mark is A. The B mark is "seen" in the first
   2410 branch of the group, but it is not on the matching path. On the other hand,
   2411 when this pattern fails to match "bx", the returned mark is B.
   2412 </P>
   2413 <P>
   2414 After a successful match, a partial match, or one of the invalid UTF errors
   2415 (for example, PCRE2_ERROR_UTF8_ERR5), <b>pcre2_get_startchar()</b> can be
   2416 called. After a successful or partial match it returns the code unit offset of
   2417 the character at which the match started. For a non-partial match, this can be
   2418 different to the value of <i>ovector[0]</i> if the pattern contains the \K
   2419 escape sequence. After a partial match, however, this value is always the same
   2420 as <i>ovector[0]</i> because \K does not affect the result of a partial match.
   2421 </P>
   2422 <P>
   2423 After a UTF check failure, <b>pcre2_get_startchar()</b> can be used to obtain
   2424 the code unit offset of the invalid UTF character. Details are given in the
   2425 <a href="pcre2unicode.html"><b>pcre2unicode</b></a>
   2426 page.
   2427 <a name="errorlist"></a></P>
   2428 <br><a name="SEC30" href="#TOC1">ERROR RETURNS FROM <b>pcre2_match()</b></a><br>
   2429 <P>
   2430 If <b>pcre2_match()</b> fails, it returns a negative number. This can be
   2431 converted to a text string by calling the <b>pcre2_get_error_message()</b>
   2432 function (see "Obtaining a textual error message"
   2433 <a href="#geterrormessage">below).</a>
   2434 Negative error codes are also returned by other functions, and are documented
   2435 with them. The codes are given names in the header file. If UTF checking is in
   2436 force and an invalid UTF subject string is detected, one of a number of
   2437 UTF-specific negative error codes is returned. Details are given in the
   2438 <a href="pcre2unicode.html"><b>pcre2unicode</b></a>
   2439 page. The following are the other errors that may be returned by
   2440 <b>pcre2_match()</b>:
   2441 <pre>
   2442   PCRE2_ERROR_NOMATCH
   2443 </pre>
   2444 The subject string did not match the pattern.
   2445 <pre>
   2446   PCRE2_ERROR_PARTIAL
   2447 </pre>
   2448 The subject string did not match, but it did match partially. See the
   2449 <a href="pcre2partial.html"><b>pcre2partial</b></a>
   2450 documentation for details of partial matching.
   2451 <pre>
   2452   PCRE2_ERROR_BADMAGIC
   2453 </pre>
   2454 PCRE2 stores a 4-byte "magic number" at the start of the compiled code, to
   2455 catch the case when it is passed a junk pointer. This is the error that is
   2456 returned when the magic number is not present.
   2457 <pre>
   2458   PCRE2_ERROR_BADMODE
   2459 </pre>
   2460 This error is given when a pattern that was compiled by the 8-bit library is
   2461 passed to a 16-bit or 32-bit library function, or vice versa.
   2462 <pre>
   2463   PCRE2_ERROR_BADOFFSET
   2464 </pre>
   2465 The value of <i>startoffset</i> was greater than the length of the subject.
   2466 <pre>
   2467   PCRE2_ERROR_BADOPTION
   2468 </pre>
   2469 An unrecognized bit was set in the <i>options</i> argument.
   2470 <pre>
   2471   PCRE2_ERROR_BADUTFOFFSET
   2472 </pre>
   2473 The UTF code unit sequence that was passed as a subject was checked and found
   2474 to be valid (the PCRE2_NO_UTF_CHECK option was not set), but the value of
   2475 <i>startoffset</i> did not point to the beginning of a UTF character or the end
   2476 of the subject.
   2477 <pre>
   2478   PCRE2_ERROR_CALLOUT
   2479 </pre>
   2480 This error is never generated by <b>pcre2_match()</b> itself. It is provided for
   2481 use by callout functions that want to cause <b>pcre2_match()</b> or
   2482 <b>pcre2_callout_enumerate()</b> to return a distinctive error code. See the
   2483 <a href="pcre2callout.html"><b>pcre2callout</b></a>
   2484 documentation for details.
   2485 <pre>
   2486   PCRE2_ERROR_INTERNAL
   2487 </pre>
   2488 An unexpected internal error has occurred. This error could be caused by a bug
   2489 in PCRE2 or by overwriting of the compiled pattern.
   2490 <pre>
   2491   PCRE2_ERROR_JIT_BADOPTION
   2492 </pre>
   2493 This error is returned when a pattern that was successfully studied using JIT
   2494 is being matched, but the matching mode (partial or complete match) does not
   2495 correspond to any JIT compilation mode. When the JIT fast path function is
   2496 used, this error may be also given for invalid options. See the
   2497 <a href="pcre2jit.html"><b>pcre2jit</b></a>
   2498 documentation for more details.
   2499 <pre>
   2500   PCRE2_ERROR_JIT_STACKLIMIT
   2501 </pre>
   2502 This error is returned when a pattern that was successfully studied using JIT
   2503 is being matched, but the memory available for the just-in-time processing
   2504 stack is not large enough. See the
   2505 <a href="pcre2jit.html"><b>pcre2jit</b></a>
   2506 documentation for more details.
   2507 <pre>
   2508   PCRE2_ERROR_MATCHLIMIT
   2509 </pre>
   2510 The backtracking limit was reached.
   2511 <pre>
   2512   PCRE2_ERROR_NOMEMORY
   2513 </pre>
   2514 If a pattern contains back references, but the ovector is not big enough to
   2515 remember the referenced substrings, PCRE2 gets a block of memory at the start
   2516 of matching to use for this purpose. There are some other special cases where
   2517 extra memory is needed during matching. This error is given when memory cannot
   2518 be obtained.
   2519 <pre>
   2520   PCRE2_ERROR_NULL
   2521 </pre>
   2522 Either the <i>code</i>, <i>subject</i>, or <i>match_data</i> argument was passed
   2523 as NULL.
   2524 <pre>
   2525   PCRE2_ERROR_RECURSELOOP
   2526 </pre>
   2527 This error is returned when <b>pcre2_match()</b> detects a recursion loop within
   2528 the pattern. Specifically, it means that either the whole pattern or a
   2529 subpattern has been called recursively for the second time at the same position
   2530 in the subject string. Some simple patterns that might do this are detected and
   2531 faulted at compile time, but more complicated cases, in particular mutual
   2532 recursions between two different subpatterns, cannot be detected until matching
   2533 is attempted.
   2534 <pre>
   2535   PCRE2_ERROR_RECURSIONLIMIT
   2536 </pre>
   2537 The internal recursion limit was reached.
   2538 <a name="geterrormessage"></a></P>
   2539 <br><a name="SEC31" href="#TOC1">OBTAINING A TEXTUAL ERROR MESSAGE</a><br>
   2540 <P>
   2541 <b>int pcre2_get_error_message(int <i>errorcode</i>, PCRE2_UCHAR *<i>buffer</i>,</b>
   2542 <b>  PCRE2_SIZE <i>bufflen</i>);</b>
   2543 </P>
   2544 <P>
   2545 A text message for an error code from any PCRE2 function (compile, match, or
   2546 auxiliary) can be obtained by calling <b>pcre2_get_error_message()</b>. The code
   2547 is passed as the first argument, with the remaining two arguments specifying a
   2548 code unit buffer and its length, into which the text message is placed. Note
   2549 that the message is returned in code units of the appropriate width for the
   2550 library that is being used.
   2551 </P>
   2552 <P>
   2553 The returned message is terminated with a trailing zero, and the function
   2554 returns the number of code units used, excluding the trailing zero. If the
   2555 error number is unknown, the negative error code PCRE2_ERROR_BADDATA is
   2556 returned. If the buffer is too small, the message is truncated (but still with
   2557 a trailing zero), and the negative error code PCRE2_ERROR_NOMEMORY is returned.
   2558 None of the messages are very long; a buffer size of 120 code units is ample.
   2559 <a name="extractbynumber"></a></P>
   2560 <br><a name="SEC32" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a><br>
   2561 <P>
   2562 <b>int pcre2_substring_length_bynumber(pcre2_match_data *<i>match_data</i>,</b>
   2563 <b>  uint32_t <i>number</i>, PCRE2_SIZE *<i>length</i>);</b>
   2564 <br>
   2565 <br>
   2566 <b>int pcre2_substring_copy_bynumber(pcre2_match_data *<i>match_data</i>,</b>
   2567 <b>  uint32_t <i>number</i>, PCRE2_UCHAR *<i>buffer</i>,</b>
   2568 <b>  PCRE2_SIZE *<i>bufflen</i>);</b>
   2569 <br>
   2570 <br>
   2571 <b>int pcre2_substring_get_bynumber(pcre2_match_data *<i>match_data</i>,</b>
   2572 <b>  uint32_t <i>number</i>, PCRE2_UCHAR **<i>bufferptr</i>,</b>
   2573 <b>  PCRE2_SIZE *<i>bufflen</i>);</b>
   2574 <br>
   2575 <br>
   2576 <b>void pcre2_substring_free(PCRE2_UCHAR *<i>buffer</i>);</b>
   2577 </P>
   2578 <P>
   2579 Captured substrings can be accessed directly by using the ovector as described
   2580 <a href="#matchedstrings">above.</a>
   2581 For convenience, auxiliary functions are provided for extracting captured
   2582 substrings as new, separate, zero-terminated strings. A substring that contains
   2583 a binary zero is correctly extracted and has a further zero added on the end,
   2584 but the result is not, of course, a C string.
   2585 </P>
   2586 <P>
   2587 The functions in this section identify substrings by number. The number zero
   2588 refers to the entire matched substring, with higher numbers referring to
   2589 substrings captured by parenthesized groups. After a partial match, only
   2590 substring zero is available. An attempt to extract any other substring gives
   2591 the error PCRE2_ERROR_PARTIAL. The next section describes similar functions for
   2592 extracting captured substrings by name.
   2593 </P>
   2594 <P>
   2595 If a pattern uses the \K escape sequence within a positive assertion, the
   2596 reported start of a successful match can be greater than the end of the match.
   2597 For example, if the pattern (?=ab\K) is matched against "ab", the start and
   2598 end offset values for the match are 2 and 0. In this situation, calling these
   2599 functions with a zero substring number extracts a zero-length empty string.
   2600 </P>
   2601 <P>
   2602 You can find the length in code units of a captured substring without
   2603 extracting it by calling <b>pcre2_substring_length_bynumber()</b>. The first
   2604 argument is a pointer to the match data block, the second is the group number,
   2605 and the third is a pointer to a variable into which the length is placed. If
   2606 you just want to know whether or not the substring has been captured, you can
   2607 pass the third argument as NULL.
   2608 </P>
   2609 <P>
   2610 The <b>pcre2_substring_copy_bynumber()</b> function copies a captured substring
   2611 into a supplied buffer, whereas <b>pcre2_substring_get_bynumber()</b> copies it
   2612 into new memory, obtained using the same memory allocation function that was
   2613 used for the match data block. The first two arguments of these functions are a
   2614 pointer to the match data block and a capturing group number.
   2615 </P>
   2616 <P>
   2617 The final arguments of <b>pcre2_substring_copy_bynumber()</b> are a pointer to
   2618 the buffer and a pointer to a variable that contains its length in code units.
   2619 This is updated to contain the actual number of code units used for the
   2620 extracted substring, excluding the terminating zero.
   2621 </P>
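<P>
The in/out behaviour of the length variable can be modelled as follows. This is
a sketch of the contract described above, not the library's code:
<b>copy_capture()</b> and the DEMO_ constants are hypothetical, the subject is
assumed to be 8-bit, and the requested group is assumed to be set.
</P>

```c
#include <stddef.h>
#include <string.h>

enum { DEMO_OK = 0, DEMO_NOMEMORY = -1 };  /* stand-ins, not PCRE2's values */

/* Model of the buffer contract: *bufflen holds the buffer capacity on
   entry and, on success, the number of code units copied, excluding
   the terminating zero. The group is assumed to be set. */
static int copy_capture(const char *subject, const size_t *ovector,
                        unsigned number, char *buffer, size_t *bufflen)
{
    size_t start = ovector[2 * number];
    size_t end = ovector[2 * number + 1];
    size_t length = (start > end) ? 0 : end - start;

    if (length + 1 > *bufflen) return DEMO_NOMEMORY;  /* text plus zero must fit */
    memcpy(buffer, subject + start, length);
    buffer[length] = 0;
    *bufflen = length;
    return DEMO_OK;
}
```

<P>
Note that a buffer of exactly the substring's length is too small, because room
is also needed for the terminating zero.
</P>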
   2622 <P>
   2623 For <b>pcre2_substring_get_bynumber()</b> the third and fourth arguments point
   2624 to variables that are updated with a pointer to the new memory and the number
   2625 of code units that comprise the substring, again excluding the terminating
   2626 zero. When the substring is no longer needed, the memory should be freed by
   2627 calling <b>pcre2_substring_free()</b>.
   2628 </P>
   2629 <P>
   2630 The return value from all these functions is zero for success, or a negative
   2631 error code. If the pattern match failed, the match failure code is returned.
   2632 If a substring number greater than zero is used after a partial match,
   2633 PCRE2_ERROR_PARTIAL is returned. Other possible error codes are:
   2634 <pre>
   2635   PCRE2_ERROR_NOMEMORY
   2636 </pre>
   2637 The buffer was too small for <b>pcre2_substring_copy_bynumber()</b>, or the
   2638 attempt to get memory failed for <b>pcre2_substring_get_bynumber()</b>.
   2639 <pre>
   2640   PCRE2_ERROR_NOSUBSTRING
   2641 </pre>
   2642 There is no substring with that number in the pattern, that is, the number is
   2643 greater than the number of capturing parentheses.
   2644 <pre>
   2645   PCRE2_ERROR_UNAVAILABLE
   2646 </pre>
   2647 The substring number, though not greater than the number of captures in the
   2648 pattern, is greater than the number of slots in the ovector, so the substring
   2649 could not be captured.
   2650 <pre>
   2651   PCRE2_ERROR_UNSET
   2652 </pre>
   2653 The substring did not participate in the match. For example, if the pattern is
   2654 (abc)|(def) and the subject is "def", and the ovector contains at least two
   2655 capturing slots, substring number 1 is unset.
   2656 </P>
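<P>
One way to read the error descriptions above is as an ordered series of checks.
The model below is a sketch under that reading: the <b>demo_match</b> structure,
the function name, and the DEMO_ constants (with illustrative values) are all
hypothetical and do not reflect PCRE2's internal data structures.
</P>

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_UNSET (~(size_t)0)       /* stand-in for PCRE2_UNSET */

enum {                                /* illustrative values only */
    DEMO_ERROR_PARTIAL     = -2,
    DEMO_ERROR_NOSUBSTRING = -3,
    DEMO_ERROR_UNAVAILABLE = -4,
    DEMO_ERROR_UNSET       = -5
};

typedef struct {
    int rc;                 /* value the match function returned */
    uint32_t group_count;   /* capturing groups in the pattern */
    uint32_t pair_count;    /* offset pairs in the ovector */
    const size_t *ovector;
} demo_match;

static int demo_length_bynumber(const demo_match *m, uint32_t number,
                                size_t *length)
{
    if (m->rc == DEMO_ERROR_PARTIAL) {
        if (number > 0) return DEMO_ERROR_PARTIAL;  /* only substring 0 exists */
    } else if (m->rc < 0) {
        return m->rc;                               /* match failure code */
    }
    if (number > m->group_count) return DEMO_ERROR_NOSUBSTRING;
    if (number >= m->pair_count) return DEMO_ERROR_UNAVAILABLE;

    size_t start = m->ovector[2 * number];
    size_t end = m->ovector[2 * number + 1];
    if (start == DEMO_UNSET) return DEMO_ERROR_UNSET;
    if (length != NULL) *length = (start > end) ? 0 : end - start;
    return 0;
}
```

<P>
For the pattern (abc)|(def) matched against "def" with three pairs available,
the model gives the unset error for group 1, length 3 for group 2, and the "no
substring" error for group 3.
</P>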
   2657 <br><a name="SEC33" href="#TOC1">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a><br>
   2658 <P>
   2659 <b>int pcre2_substring_list_get(pcre2_match_data *<i>match_data</i>,</b>
   2660 <b>  PCRE2_UCHAR ***<i>listptr</i>, PCRE2_SIZE **<i>lengthsptr</i>);</b>
   2661 <br>
   2662 <br>
   2663 <b>void pcre2_substring_list_free(PCRE2_SPTR *<i>list</i>);</b>
   2664 </P>
   2665 <P>
   2666 The <b>pcre2_substring_list_get()</b> function extracts all available substrings
   2667 and builds a list of pointers to them. It also (optionally) builds a second
   2668 list that contains their lengths (in code units), excluding a terminating zero
   2669 that is added to each of them. All this is done in a single block of memory
   2670 that is obtained using the same memory allocation function that was used to get
   2671 the match data block.
   2672 </P>
   2673 <P>
   2674 This function must be called only after a successful match. If called after a
   2675 partial match, the error code PCRE2_ERROR_PARTIAL is returned.
   2676 </P>
   2677 <P>
   2678 The address of the memory block is returned via <i>listptr</i>, which is also
   2679 the start of the list of string pointers. The end of the list is marked by a
   2680 NULL pointer. The address of the list of lengths is returned via
   2681 <i>lengthsptr</i>. If your strings do not contain binary zeros and you do not
   2682 therefore need the lengths, you may supply NULL as the <i>lengthsptr</i>
   2683 argument to disable the creation of a list of lengths. The yield of the
   2684 function is zero if all went well, or PCRE2_ERROR_NOMEMORY if the memory block
   2685 could not be obtained. When the list is no longer needed, it should be freed by
   2686 calling <b>pcre2_substring_list_free()</b>.
   2687 </P>
   2688 <P>
   2689 If this function encounters a substring that is unset, which can happen when
   2690 capturing subpattern number <i>n+1</i> matches some part of the subject, but
   2691 subpattern <i>n</i> has not been used at all, it returns an empty string. This
   2692 can be distinguished from a genuine zero-length substring by inspecting the
   2693 appropriate offset in the ovector, which contains PCRE2_UNSET for unset
   2694 substrings, or by calling <b>pcre2_substring_length_bynumber()</b>.
   2695 <a name="extractbyname"></a></P>
   2696 <br><a name="SEC34" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a><br>
   2697 <P>
   2698 <b>int pcre2_substring_number_from_name(const pcre2_code *<i>code</i>,</b>
   2699 <b>  PCRE2_SPTR <i>name</i>);</b>
   2700 <br>
   2701 <br>
   2702 <b>int pcre2_substring_length_byname(pcre2_match_data *<i>match_data</i>,</b>
   2703 <b>  PCRE2_SPTR <i>name</i>, PCRE2_SIZE *<i>length</i>);</b>
   2704 <br>
   2705 <br>
   2706 <b>int pcre2_substring_copy_byname(pcre2_match_data *<i>match_data</i>,</b>
   2707 <b>  PCRE2_SPTR <i>name</i>, PCRE2_UCHAR *<i>buffer</i>, PCRE2_SIZE *<i>bufflen</i>);</b>
   2708 <br>
   2709 <br>
   2710 <b>int pcre2_substring_get_byname(pcre2_match_data *<i>match_data</i>,</b>
   2711 <b>  PCRE2_SPTR <i>name</i>, PCRE2_UCHAR **<i>bufferptr</i>, PCRE2_SIZE *<i>bufflen</i>);</b>
   2712 <br>
   2713 <br>
   2714 <b>void pcre2_substring_free(PCRE2_UCHAR *<i>buffer</i>);</b>
   2715 </P>
   2716 <P>
   2717 To extract a substring by name, you first have to find the associated number.
   2718 For example, for this pattern:
   2719 <pre>
   2720   (a+)b(?&#60;xxx&#62;\d+)...
   2721 </pre>
   2722 the number of the subpattern called "xxx" is 2. If the name is known to be
   2723 unique (PCRE2_DUPNAMES was not set), you can find the number from the name by
   2724 calling <b>pcre2_substring_number_from_name()</b>. The first argument is the
   2725 compiled pattern, and the second is the name. The yield of the function is the
   2726 subpattern number, PCRE2_ERROR_NOSUBSTRING if there is no subpattern of that
   2727 name, or PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one subpattern of
   2728 that name. Given the number, you can extract the substring directly, or use one
   2729 of the functions described above.
   2730 </P>
   2731 <P>
   2732 For convenience, there are also "byname" functions that correspond to the
   2733 "bynumber" functions, the only difference being that the second argument is a
   2734 name instead of a number. If PCRE2_DUPNAMES is set and there are duplicate
   2735 names, these functions scan all the groups with the given name, and return the
   2736 first named string that is set.
   2737 </P>
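<P>
The scan over duplicate names can be pictured with the sketch below. It is a
model only: <b>first_set_group()</b> is hypothetical, and the list of group
numbers that share the name is assumed to be supplied by the caller in the
order in which it should be scanned.
</P>

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_UNSET (~(size_t)0)       /* stand-in for PCRE2_UNSET */

/* groups lists the numbers of all groups that share one name, in the
   order to be scanned (hypothetical input). Returns the number of the
   first group that has a slot in the ovector and is set, or 0 if there
   is none; 0 is safe as a sentinel because it denotes the whole match,
   never a named group. */
static uint32_t first_set_group(const uint32_t *groups, size_t ngroups,
                                const size_t *ovector, uint32_t pair_count)
{
    for (size_t i = 0; i < ngroups; i++) {
        uint32_t n = groups[i];
        if (n < pair_count && ovector[2 * n] != DEMO_UNSET) return n;
    }
    return 0;
}
```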
   2738 <P>
   2739 If there are no groups with the given name, PCRE2_ERROR_NOSUBSTRING is
   2740 returned. If all groups with the name have numbers that are greater than the
   2741 number of slots in the ovector, PCRE2_ERROR_UNAVAILABLE is returned. If there
   2742 is at least one group with a slot in the ovector, but no group is found to be
   2743 set, PCRE2_ERROR_UNSET is returned.
   2744 </P>
   2745 <P>
   2746 <b>Warning:</b> If the pattern uses the (?| feature to set up multiple
   2747 subpatterns with the same number, as described in the
   2748 <a href="pcre2pattern.html#dupsubpatternnumber">section on duplicate subpattern numbers</a>
   2749 in the
   2750 <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
   2751 page, you cannot use names to distinguish the different subpatterns, because
   2752 names are not included in the compiled code. The matching process uses only
   2753 numbers. For this reason, the use of different names for subpatterns of the
   2754 same number causes an error at compile time.
   2755 </P>
   2756 <br><a name="SEC35" href="#TOC1">CREATING A NEW STRING WITH SUBSTITUTIONS</a><br>
   2757 <P>
   2758 <b>int pcre2_substitute(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
   2759 <b>  PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
   2760 <b>  uint32_t <i>options</i>, pcre2_match_data *<i>match_data</i>,</b>
   2761 <b>  pcre2_match_context *<i>mcontext</i>, PCRE2_SPTR <i>replacement</i>,</b>
   2762 <b>  PCRE2_SIZE <i>rlength</i>, PCRE2_UCHAR *<i>outputbuffer</i>,</b>
   2763 <b>  PCRE2_SIZE *<i>outlengthptr</i>);</b>
   2764 </P>
   2765 <P>
   2766 This function calls <b>pcre2_match()</b> and then makes a copy of the subject
   2767 string in <i>outputbuffer</i>, replacing the part that was matched with the
   2768 <i>replacement</i> string, whose length is supplied in <i>rlength</i>. This can
   2769 be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. Matches in
   2770 which a \K item in a lookahead in the pattern causes the match to end before
   2771 it starts are not supported, and give rise to an error return.
   2772 </P>
   2773 <P>
   2774 The first seven arguments of <b>pcre2_substitute()</b> are the same as for
   2775 <b>pcre2_match()</b>, except that the partial matching options are not
   2776 permitted, and <i>match_data</i> may be passed as NULL, in which case a match
   2777 data block is obtained and freed within this function, using memory management
   2778 functions from the match context, if provided, or else those that were used to
   2779 allocate memory for the compiled code.
   2780 </P>
   2781 <P>
   2782 The <i>outlengthptr</i> argument must point to a variable that contains the
   2783 length, in code units, of the output buffer. If the function is successful, the
   2784 value is updated to contain the length of the new string, excluding the
   2785 trailing zero that is automatically added.
   2786 </P>
   2787 <P>
   2788 If the function is not successful, the value set via <i>outlengthptr</i> depends
   2789 on the type of error. For syntax errors in the replacement string, the value is
   2790 the offset in the replacement string where the error was detected. For other
   2791 errors, the value is PCRE2_UNSET by default. This includes the case of the
   2792 output buffer being too small, unless PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set
   2793 (see below), in which case the value is the minimum length needed, including
   2794 space for the trailing zero. Note that in order to compute the required length,
   2795 <b>pcre2_substitute()</b> has to simulate all the matching and copying, instead
   2796 of giving an error return as soon as the buffer overflows. Note also that the
   2797 length is in code units, not bytes.
   2798 </P>
   2799 <P>
   2800 In the replacement string, which is interpreted as a UTF string in UTF mode,
   2801 and is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option is set, a
   2802 dollar character is an escape character that can specify the insertion of
   2803 characters from capturing groups or (*MARK) items in the pattern. The following
   2804 forms are always recognized:
   2805 <pre>
   2806   $$                  insert a dollar character
   2807   $&#60;n&#62; or ${&#60;n&#62;}      insert the contents of group &#60;n&#62;
   2808   $*MARK or ${*MARK}  insert the name of the last (*MARK) encountered
   2809 </pre>
   2810 Either a group number or a group name can be given for &#60;n&#62;. Curly brackets are
   2811 required only if the following character would be interpreted as part of the
   2812 number or name. The number may be zero to include the entire matched string.
   2813 For example, if the pattern a(b)c is matched with "=abc=" and the replacement
   2814 string "+$1$0$1+", the result is "=+babcb+=".
   2815 </P>
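<P>
To make the expansion rules concrete, here is a small model of a single,
non-global substitution. It is a sketch, not the library's algorithm:
<b>demo_substitute()</b> and <b>put()</b> are hypothetical, only numeric group
references are handled (no names and no $*MARK), the subject is assumed to be a
zero-terminated 8-bit string, and the match is assumed to start no later than
it ends.
</P>

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define DEMO_UNSET (~(size_t)0)       /* stand-in for PCRE2_UNSET */

/* Append n bytes to out if they fit, leaving room for a final zero. */
static int put(char *out, size_t cap, size_t *len, const char *s, size_t n)
{
    if (*len + n + 1 > cap) return -1;
    memcpy(out + *len, s, n);
    *len += n;
    return 0;
}

/* Model of one substitution with $$, $n and ${n} (numeric n only).
   Returns 0 on success, or -1 on any error: bad syntax, unknown or
   unset group, or output buffer too small. */
static int demo_substitute(const char *subject, const size_t *ovector,
                           unsigned pairs, const char *rep,
                           char *out, size_t cap)
{
    size_t len = 0;

    /* Copy the text that precedes the match. */
    if (put(out, cap, &len, subject, ovector[0]) != 0) return -1;

    for (const char *p = rep; *p != '\0'; p++) {
        if (*p != '$') {                       /* literal character */
            if (put(out, cap, &len, p, 1) != 0) return -1;
            continue;
        }
        p++;
        if (*p == '$') {                       /* $$ inserts one dollar */
            if (put(out, cap, &len, "$", 1) != 0) return -1;
            continue;
        }
        int braced = (*p == '{');
        if (braced) p++;
        if (!isdigit((unsigned char)*p)) return -1;
        unsigned n = 0;
        while (isdigit((unsigned char)*p)) n = n * 10 + (unsigned)(*p++ - '0');
        if (braced) {
            if (*p != '}') return -1;          /* p stays on '}' for the loop step */
        } else {
            p--;                               /* step back onto the last digit */
        }
        if (n >= pairs || ovector[2 * n] == DEMO_UNSET) return -1;
        if (put(out, cap, &len, subject + ovector[2 * n],
                ovector[2 * n + 1] - ovector[2 * n]) != 0) return -1;
    }

    /* Copy the text that follows the match, then terminate. */
    if (put(out, cap, &len, subject + ovector[1],
            strlen(subject + ovector[1])) != 0) return -1;
    out[len] = '\0';
    return 0;
}
```

<P>
With the subject "=abc=", the offset pairs (1,4) and (2,3) that a(b)c produces,
and the replacement "+$1$0$1+", the model writes "=+babcb+=".
</P>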
   2816 <P>
   2817 The facility for inserting a (*MARK) name can be used to perform simple
   2818 simultaneous substitutions, as this <b>pcre2test</b> example shows:
   2819 <pre>
   2820   /(*:pear)apple|(*:orange)lemon/g,replace=${*MARK}
   2821       apple lemon
   2822    2: pear orange
   2823 </pre>
   2824 As well as the usual options for <b>pcre2_match()</b>, a number of additional
   2825 options can be set in the <i>options</i> argument.
   2826 </P>
   2827 <P>
   2828 PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject string,
   2829 replacing every matching substring. If this is not set, only the first matching
   2830 substring is replaced. If any matched substring has zero length, after the
   2831 substitution has happened, an attempt to find a non-empty match at the same
   2832 position is performed. If this is not successful, the current position is
   2833 advanced by one character except when CRLF is a valid newline sequence and the
   2834 next two characters are CR, LF. In this case, the current position is advanced
   2835 by two characters.
   2836 </P>
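<P>
The advance after an empty match can be sketched as below. The helper name is
hypothetical, and the sketch assumes that each character occupies exactly one
code unit (that is, UTF mode is not in use) and that the current position is
inside the subject.
</P>

```c
#include <stddef.h>

/* Model of the step taken after an empty match when no non-empty match
   is found at the same position. Assumes one code unit per character
   and pos < length. crlf_is_newline is non-zero when CRLF is a valid
   newline sequence for the pattern. */
static size_t advance_after_empty_match(const char *subject, size_t length,
                                        size_t pos, int crlf_is_newline)
{
    if (crlf_is_newline && pos + 1 < length &&
        subject[pos] == '\r' && subject[pos + 1] == '\n')
        return pos + 2;                 /* step over CR and LF together */
    return pos + 1;                     /* step over a single character */
}
```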
   2837 <P>
   2838 PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output buffer is
   2839 too small. The default action is to return PCRE2_ERROR_NOMEMORY immediately. If
   2840 this option is set, however, <b>pcre2_substitute()</b> continues to go through
   2841 the motions of matching and substituting (without, of course, writing anything)
   2842 in order to compute the size of buffer that is needed. This value is passed
   2843 back via the <i>outlengthptr</i> variable, with the result of the function still
   2844 being PCRE2_ERROR_NOMEMORY.
   2845 </P>
   2846 <P>
   2847 Passing a buffer size of zero is a permitted way of finding out how much memory
   2848 is needed for a given substitution. However, this does mean that the entire
   2849 operation is carried out twice. Depending on the application, it may be more
   2850 efficient to allocate a large buffer and free the excess afterwards, instead of
   2851 using PCRE2_SUBSTITUTE_OVERFLOW_LENGTH.
   2852 </P>
   2853 <P>
   2854 PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capturing groups that do
   2855 not appear in the pattern to be treated as unset groups. This option should be
   2856 used with care, because it means that a typo in a group name or number no
   2857 longer causes the PCRE2_ERROR_NOSUBSTRING error.
   2858 </P>
   2859 <P>
   2860 PCRE2_SUBSTITUTE_UNSET_EMPTY causes unset capturing groups (including unknown
   2861 groups when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) to be treated as empty
   2862 strings when inserted as described above. If this option is not set, an attempt
   2863 to insert an unset group causes the PCRE2_ERROR_UNSET error. This option does
   2864 not influence the extended substitution syntax described below.
   2865 </P>
   2866 <P>
   2867 PCRE2_SUBSTITUTE_EXTENDED causes extra processing to be applied to the
   2868 replacement string. Without this option, only the dollar character is special,
   2869 and only the group insertion forms listed above are valid. When
   2870 PCRE2_SUBSTITUTE_EXTENDED is set, two things change:
   2871 </P>
   2872 <P>
   2873 Firstly, backslash in a replacement string is interpreted as an escape
   2874 character. The usual forms such as \n or \x{ddd} can be used to specify
   2875 particular character codes, and backslash followed by any non-alphanumeric
   2876 character quotes that character. Extended quoting can be coded using \Q...\E,
   2877 exactly as in pattern strings.
   2878 </P>
   2879 <P>
   2880 There are also four escape sequences for forcing the case of inserted letters.
   2881 The insertion mechanism has three states: no case forcing, force upper case,
   2882 and force lower case. The escape sequences change the current state: \U and
   2883 \L change to upper or lower case forcing, respectively, and \E (when not
   2884 terminating a \Q quoted sequence) reverts to no case forcing. The sequences
   2885 \u and \l force the next character (if it is a letter) to upper or lower
   2886 case, respectively, and then the state automatically reverts to no case
   2887 forcing. Case forcing applies to all inserted characters, including those from
   2888 captured groups and letters within \Q...\E quoted sequences.
   2889 </P>
   2890 <P>
   2891 Note that case forcing sequences such as \U...\E do not nest. For example,
   2892 the result of processing "\Uaa\LBB\Ecc\E" is "AAbbcc"; the final \E has no
   2893 effect.
   2894 </P>
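<P>
The case forcing states can be modelled with two variables: the forcing that
applies to the next character, and the forcing to fall back to afterwards. The
sketch below is a model for ASCII literal text only; the function name is
hypothetical, and other escapes, \Q...\E quoting, and group insertion are not
modelled.
</P>

```c
#include <ctype.h>
#include <stddef.h>

enum force { FORCE_NONE, FORCE_UPPER, FORCE_LOWER };

/* Apply \U, \L, \E, \u and \l to ASCII literal text in rep, writing a
   zero-terminated result to out (capacity cap). Returns the output
   length, or -1 if out is too small. */
static int apply_case_forcing(const char *rep, char *out, size_t cap)
{
    enum force current = FORCE_NONE;  /* applies to the next character */
    enum force reset = FORCE_NONE;    /* state to revert to afterwards */
    size_t len = 0;

    for (const char *p = rep; *p != '\0'; p++) {
        if (*p == '\\' && p[1] != '\0') {
            char e = p[1];
            if (e == 'U') { current = reset = FORCE_UPPER; p++; continue; }
            if (e == 'L') { current = reset = FORCE_LOWER; p++; continue; }
            if (e == 'E') { current = reset = FORCE_NONE; p++; continue; }
            if (e == 'u') { current = FORCE_UPPER; reset = FORCE_NONE; p++; continue; }
            if (e == 'l') { current = FORCE_LOWER; reset = FORCE_NONE; p++; continue; }
        }
        unsigned char c = (unsigned char)*p;
        if (current == FORCE_UPPER) c = (unsigned char)toupper(c);
        else if (current == FORCE_LOWER) c = (unsigned char)tolower(c);
        current = reset;                    /* one-shot forcing ends here */
        if (len + 2 > cap) return -1;       /* need room for c and the zero */
        out[len++] = (char)c;
    }
    if (cap == 0) return -1;
    out[len] = '\0';
    return (int)len;
}
```

<P>
Given the replacement text \Uaa\LBB\Ecc\E, the model produces "AAbbcc": the \L
simply replaces the upper case state, and the final \E finds no forcing in
effect.
</P>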
   2895 <P>
   2896 The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
   2897 flexibility to group substitution. The syntax is similar to that used by Bash:
   2898 <pre>
   2899   ${&#60;n&#62;:-&#60;string&#62;}
   2900   ${&#60;n&#62;:+&#60;string1&#62;:&#60;string2&#62;}
   2901 </pre>
   2902 As before, &#60;n&#62; may be a group number or a name. The first form specifies a
   2903 default value. If group &#60;n&#62; is set, its value is inserted; if not, &#60;string&#62; is
   2904 expanded and the result inserted. The second form specifies strings that are
   2905 expanded and inserted when group &#60;n&#62; is set or unset, respectively. The first
   2906 form is just a convenient shorthand for
   2907 <pre>
   2908   ${&#60;n&#62;:+${&#60;n&#62;}:&#60;string&#62;}
   2909 </pre>
   2910 Backslash can be used to escape colons and closing curly brackets in the
   2911 replacement strings. A change of the case forcing state within a replacement
   2912 string remains in force afterwards, as shown in this <b>pcre2test</b> example:
   2913 <pre>
   2914   /(some)?(body)/substitute_extended,replace=${1:+\U:\L}HeLLo
   2915       body
   2916    1: hello
   2917       somebody
   2918    1: HELLO
   2919 </pre>
   2920 The PCRE2_SUBSTITUTE_UNSET_EMPTY option does not affect these extended
   2921 substitutions. However, PCRE2_SUBSTITUTE_UNKNOWN_UNSET does cause unknown
   2922 groups in the extended syntax forms to be treated as unset.
   2923 </P>
   2924 <P>
   2925 If successful, <b>pcre2_substitute()</b> returns the number of replacements that
   2926 were made. This may be zero if no matches were found, and is never greater than
   2927 1 unless PCRE2_SUBSTITUTE_GLOBAL is set.
   2928 </P>
   2929 <P>
   2930 In the event of an error, a negative error code is returned. Except for
   2931 PCRE2_ERROR_NOMATCH (which is never returned), errors from <b>pcre2_match()</b>
   2932 are passed straight back.
   2933 </P>
   2934 <P>
   2935 PCRE2_ERROR_NOSUBSTRING is returned for a non-existent substring insertion,
   2936 unless PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set.
   2937 </P>
   2938 <P>
   2939 PCRE2_ERROR_UNSET is returned for an unset substring insertion (including an
   2940 unknown substring when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) when the simple
   2941 (non-extended) syntax is used and PCRE2_SUBSTITUTE_UNSET_EMPTY is not set.
   2942 </P>
   2943 <P>
   2944 PCRE2_ERROR_NOMEMORY is returned if the output buffer is not big enough. If the
   2945 PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set, the size of buffer that is
   2946 needed is returned via <i>outlengthptr</i>. Note that this does not happen by
   2947 default.
   2948 </P>
   2949 <P>
   2950 PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax errors in the
   2951 replacement string, with more particular errors being PCRE2_ERROR_BADREPESCAPE
   2952 (invalid escape sequence), PCRE2_ERROR_REPMISSINGBRACE (closing curly bracket
   2953 not found), PCRE2_ERROR_BADSUBSTITUTION (syntax error in extended group
   2954 substitution), and PCRE2_ERROR_BADSUBSPATTERN (the pattern match ended before it
   2955 started, which can happen if \K is used in an assertion).
   2956 </P>
   2957 <P>
   2958 As for all PCRE2 errors, a text message that describes the error can be
   2959 obtained by calling the <b>pcre2_get_error_message()</b> function (see
   2960 "Obtaining a textual error message"
   2961 <a href="#geterrormessage">above).</a>
   2962 </P>
   2963 <br><a name="SEC36" href="#TOC1">DUPLICATE SUBPATTERN NAMES</a><br>
   2964 <P>
   2965 <b>int pcre2_substring_nametable_scan(const pcre2_code *<i>code</i>,</b>
   2966 <b>  PCRE2_SPTR <i>name</i>, PCRE2_SPTR *<i>first</i>, PCRE2_SPTR *<i>last</i>);</b>
   2967 </P>
   2968 <P>
   2969 When a pattern is compiled with the PCRE2_DUPNAMES option, names for
   2970 subpatterns are not required to be unique. Duplicate names are always allowed
   2971 for subpatterns with the same number, created by using the (?| feature. Indeed,
   2972 if such subpatterns are named, they are required to use the same names.
   2973 </P>
   2974 <P>
   2975 Normally, patterns with duplicate names are such that in any one match, only
   2976 one of the named subpatterns participates. An example is shown in the
   2977 <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
   2978 documentation.
   2979 </P>
   2980 <P>
   2981 When duplicates are present, <b>pcre2_substring_copy_byname()</b> and
   2982 <b>pcre2_substring_get_byname()</b> return the first substring corresponding to
   2983 the given name that is set. Only if none are set is PCRE2_ERROR_UNSET
   2984 returned. The <b>pcre2_substring_number_from_name()</b> function returns the
   2985 error PCRE2_ERROR_NOUNIQUESUBSTRING when there are duplicate names.
   2986 </P>
   2987 <P>
   2988 If you want to get full details of all captured substrings for a given name,
   2989 you must use the <b>pcre2_substring_nametable_scan()</b> function. The first
   2990 argument is the compiled pattern, and the second is the name. If the third and
   2991 fourth arguments are NULL, the function returns a group number for a unique
   2992 name, or PCRE2_ERROR_NOUNIQUESUBSTRING otherwise.
   2993 </P>
   2994 <P>
   2995 When the third and fourth arguments are not NULL, they must be pointers to
   2996 variables that are updated by the function. After it has run, they point to the
   2997 first and last entries in the name-to-number table for the given name, and the
   2998 function returns the length of each entry in code units. In both cases,
   2999 PCRE2_ERROR_NOSUBSTRING is returned if there are no entries for the given name.
   3000 </P>
   3001 <P>
   3002 The format of the name table is described
   3003 <a href="#infoaboutpattern">above</a>
   3004 in the section entitled <i>Information about a pattern</i>. Given all the
   3005 relevant entries for the name, you can extract each of their numbers, and hence
   3006 the captured data.
   3007 </P>
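<P>
As a sketch of that last step, the helper below walks the entries between
<i>first</i> and <i>last</i> and collects their group numbers. It assumes the
8-bit library, where each entry starts with the group number in two bytes, most
significant first, followed by the zero-terminated name, and where the entry
size is the value returned by <b>pcre2_substring_nametable_scan()</b>. The
function name <b>collect_group_numbers()</b> is hypothetical, and plain
<b>unsigned char</b> pointers stand in for PCRE2_SPTR so that the fragment does
not depend on <b>pcre2.h</b>.
</P>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: reads the group number from every name-table entry in
   the inclusive range first..last (8-bit layout assumed). At most max_numbers
   values are stored in numbers[]; the return value is the total entry count. */
size_t collect_group_numbers(const unsigned char *first,
                             const unsigned char *last,
                             size_t entry_size,
                             unsigned int *numbers, size_t max_numbers)
{
    size_t count = 0;

    assert(first != NULL && last != NULL && numbers != NULL);
    if (entry_size == 0) return 0;      /* avoid looping forever on bad input */

    for (const unsigned char *entry = first; entry <= last; entry += entry_size) {
        if (count < max_numbers) {
            /* Two bytes, most significant first, then the name. */
            numbers[count] = ((unsigned int)entry[0] << 8) | entry[1];
        }
        count++;
    }
    return count;
}
```

<P>
Each number obtained this way can then be passed to the functions that extract
substrings by number, skipping any group that is unset.
</P>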
   3008 <br><a name="SEC37" href="#TOC1">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a><br>
   3009 <P>
   3010 The traditional matching function uses a similar algorithm to Perl, which stops
   3011 when it finds the first match at a given point in the subject. If you want to
   3012 find all possible matches, or the longest possible match at a given position,
   3013 consider using the alternative matching function (see below) instead. If you
   3014 cannot use the alternative function, you can kludge it up by making use of the
   3015 callout facility, which is described in the
   3016 <a href="pcre2callout.html"><b>pcre2callout</b></a>
   3017 documentation.
   3018 </P>
   3019 <P>
   3020 What you have to do is to insert a callout right at the end of the pattern.
   3021 When your callout function is called, extract and save the current matched
   3022 substring. Then return 1, which forces <b>pcre2_match()</b> to backtrack and try
   3023 other alternatives. Ultimately, when it runs out of matches,
   3024 <b>pcre2_match()</b> will yield PCRE2_ERROR_NOMATCH.
   3025 <a name="dfamatch"></a></P>
   3026 <br><a name="SEC38" href="#TOC1">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a><br>
   3027 <P>
   3028 <b>int pcre2_dfa_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
   3029 <b>  PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
   3030 <b>  uint32_t <i>options</i>, pcre2_match_data *<i>match_data</i>,</b>
   3031 <b>  pcre2_match_context *<i>mcontext</i>,</b>
   3032 <b>  int *<i>workspace</i>, PCRE2_SIZE <i>wscount</i>);</b>
   3033 </P>
   3034 <P>
   3035 The function <b>pcre2_dfa_match()</b> is called to match a subject string
   3036 against a compiled pattern, using a matching algorithm that scans the subject
   3037 string just once, and does not backtrack. This has different characteristics to
   3038 the normal algorithm, and is not compatible with Perl. Some of the features of
   3039 PCRE2 patterns are not supported. Nevertheless, there are times when this kind
   3040 of matching can be useful. For a discussion of the two matching algorithms, and
   3041 a list of features that <b>pcre2_dfa_match()</b> does not support, see the
   3042 <a href="pcre2matching.html"><b>pcre2matching</b></a>
   3043 documentation.
   3044 </P>
   3045 <P>
   3046 The arguments for the <b>pcre2_dfa_match()</b> function are the same as for
   3047 <b>pcre2_match()</b>, plus two extras. The ovector within the match data block
   3048 is used in a different way, and this is described below. The other common
   3049 arguments are used in the same way as for <b>pcre2_match()</b>, so their
   3050 description is not repeated here.
   3051 </P>
   3052 <P>
   3053 The two additional arguments provide workspace for the function. The workspace
   3054 vector should contain at least 20 elements. It is used for keeping track of
   3055 multiple paths through the pattern tree. More workspace is needed for patterns
   3056 and subjects where there are a lot of potential matches.
   3057 </P>
   3058 <P>
   3059 Here is an example of a simple call to <b>pcre2_dfa_match()</b>:
   3060 <pre>
   3061   int wspace[20];
   3062   pcre2_match_data *md = pcre2_match_data_create(4, NULL);
   3063   int rc = pcre2_dfa_match(
   3064     re,             /* result of pcre2_compile() */
   3065     (PCRE2_SPTR)"some string",  /* the subject string */
   3066     11,             /* the length of the subject string */
   3067     0,              /* start at offset 0 in the subject */
   3068     0,              /* default options */
   3069     md,             /* the match data block */
   3070     NULL,           /* a match context; NULL means use defaults */
   3071     wspace,         /* working space vector */
   3072     20);            /* number of elements (NOT size in bytes) */
   3073 </PRE>
   3074 </P>
   3075 <br><b>
   3076 Option bits for <b>pcre2_dfa_match()</b>
   3077 </b><br>
   3078 <P>
   3079 The unused bits of the <i>options</i> argument for <b>pcre2_dfa_match()</b> must
   3080 be zero. The only bits that may be set are PCRE2_ANCHORED, PCRE2_NOTBOL,
   3081 PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK,
   3082 PCRE2_PARTIAL_HARD, PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST, and
   3083 PCRE2_DFA_RESTART. All but the last four of these are exactly the same as for
   3084 <b>pcre2_match()</b>, so their description is not repeated here.
   3085 <pre>
   3086   PCRE2_PARTIAL_HARD
   3087   PCRE2_PARTIAL_SOFT
   3088 </pre>
   3089 These have the same general effect as they do for <b>pcre2_match()</b>, but the
   3090 details are slightly different. When PCRE2_PARTIAL_HARD is set for
   3091 <b>pcre2_dfa_match()</b>, it returns PCRE2_ERROR_PARTIAL if the end of the
   3092 subject is reached and there is still at least one matching possibility that
   3093 requires additional characters. This happens even if some complete matches have
   3094 already been found. When PCRE2_PARTIAL_SOFT is set, the return code
   3095 PCRE2_ERROR_NOMATCH is converted into PCRE2_ERROR_PARTIAL if the end of the
   3096 subject is reached, there have been no complete matches, but there is still at
   3097 least one matching possibility. The portion of the string that was inspected
   3098 when the longest partial match was found is set as the first matching string in
   3099 both cases. There is a more detailed discussion of partial and multi-segment
   3100 matching, with examples, in the
   3101 <a href="pcre2partial.html"><b>pcre2partial</b></a>
   3102 documentation.
   3103 <pre>
   3104   PCRE2_DFA_SHORTEST
   3105 </pre>
   3106 Setting the PCRE2_DFA_SHORTEST option causes the matching algorithm to stop as
   3107 soon as it has found one match. Because of the way the alternative algorithm
   3108 works, this is necessarily the shortest possible match at the first possible
   3109 matching point in the subject string.
   3110 <pre>
   3111   PCRE2_DFA_RESTART
   3112 </pre>
   3113 When <b>pcre2_dfa_match()</b> returns a partial match, it is possible to call it
   3114 again, with additional subject characters, and have it continue with the same
   3115 match. The PCRE2_DFA_RESTART option requests this action; when it is set, the
   3116 <i>workspace</i> and <i>wscount</i> arguments must reference the same vector as
   3117 before because data about the match so far is left in them after a partial
   3118 match. There is more discussion of this facility in the
   3119 <a href="pcre2partial.html"><b>pcre2partial</b></a>
   3120 documentation.
   3121 </P>
   3122 <br><b>
   3123 Successful returns from <b>pcre2_dfa_match()</b>
   3124 </b><br>
   3125 <P>
   3126 When <b>pcre2_dfa_match()</b> succeeds, it may have matched more than one
   3127 substring in the subject. Note, however, that all the matches from one run of
   3128 the function start at the same point in the subject. The shorter matches are
   3129 all initial substrings of the longer matches. For example, if the pattern
   3130 <pre>
   3131   &#60;.*&#62;
   3132 </pre>
   3133 is matched against the string
   3134 <pre>
   3135   This is &#60;something&#62; &#60;something else&#62; &#60;something further&#62; no more
   3136 </pre>
   3137 the three matched strings are
   3138 <pre>
   3139   &#60;something&#62; &#60;something else&#62; &#60;something further&#62;
   3140   &#60;something&#62; &#60;something else&#62;
   3141   &#60;something&#62;
   3142 </pre>
   3143 On success, the yield of the function is a number greater than zero, which is
   3144 the number of matched substrings. The offsets of the substrings are returned in
   3145 the ovector, and can be extracted by number in the same way as for
   3146 <b>pcre2_match()</b>, but the numbers bear no relation to any capturing groups
   3147 that may exist in the pattern, because DFA matching does not support group
   3148 capture.
   3149 </P>
   3150 <P>
   3151 Calls to the convenience functions that extract substrings by name
   3152 return the error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used after a
   3153 DFA match. The convenience functions that extract substrings by number never
   3154 return PCRE2_ERROR_NOSUBSTRING, and the meanings of some other errors are
   3155 slightly different:
   3156 <pre>
   3157   PCRE2_ERROR_UNAVAILABLE
   3158 </pre>
   3159 The ovector is not big enough to include a slot for the given substring number.
   3160 <pre>
   3161   PCRE2_ERROR_UNSET
   3162 </pre>
   3163 There is a slot in the ovector for this substring, but there were insufficient
   3164 matches to fill it.
   3165 </P>
   3166 <P>
   3167 The matched strings are stored in the ovector in reverse order of length; that
   3168 is, the longest matching string is first. If there were too many matches to fit
   3169 into the ovector, the yield of the function is zero, and the vector is filled
   3170 with the longest matches.
   3171 </P>
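<P>
The ovector layout described above can be read with a few lines of code. The
following is a minimal sketch, not part of the library: the names
<b>dfa_usable_matches()</b> and <b>dfa_copy_match()</b> are hypothetical, the
subject is assumed to be an 8-bit string, and the ovector is taken as an array
of <b>size_t</b> pairs (which is what PCRE2_SIZE is normally defined as) so that
the fragment does not depend on <b>pcre2.h</b>.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of ovector pairs that hold matches after a DFA match. A positive
   return code is the match count; zero means the ovector overflowed and every
   pair was filled with the longest matches; negative means no usable pairs. */
size_t dfa_usable_matches(int rc, size_t ovector_pairs)
{
    if (rc > 0) return (size_t)rc;
    if (rc == 0) return ovector_pairs;
    return 0;
}

/* Copies match number `index` (0 is the longest) into `out` as a
   zero-terminated string. Returns its length, or (size_t)-1 if `out` is too
   small. The caller must ensure index < dfa_usable_matches(...). */
size_t dfa_copy_match(const char *subject, const size_t *ovector,
                      size_t index, char *out, size_t out_size)
{
    assert(subject != NULL && ovector != NULL && out != NULL);

    size_t start = ovector[2 * index];      /* same for every match in a run */
    size_t end = ovector[2 * index + 1];
    assert(end >= start);

    size_t length = end - start;
    if (length + 1 > out_size) return (size_t)-1;

    memcpy(out, subject + start, length);
    out[length] = '\0';
    return length;
}
```

<P>
For the example subject shown earlier, the three pairs would be (8,56), (8,36),
and (8,19), so index 0 yields the longest string and index 2 the shortest.
</P>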
   3172 <P>
   3173 NOTE: PCRE2's "auto-possessification" optimization usually applies to character
   3174 repeats at the end of a pattern (as well as internally). For example, the
   3175 pattern "a\d+" is compiled as if it were "a\d++". For DFA matching, this
   3176 means that only one possible match is found. If you really do want multiple
   3177 matches in such cases, either use an ungreedy repeat such as "a\d+?" or set
   3178 the PCRE2_NO_AUTO_POSSESS option when compiling.
   3179 </P>
   3180 <br><b>
   3181 Error returns from <b>pcre2_dfa_match()</b>
   3182 </b><br>
   3183 <P>
   3184 The <b>pcre2_dfa_match()</b> function returns a negative number when it fails.
   3185 Many of the errors are the same as for <b>pcre2_match()</b>, as described
   3186 <a href="#errorlist">above.</a>
   3187 There are in addition the following errors that are specific to
   3188 <b>pcre2_dfa_match()</b>:
   3189 <pre>
   3190   PCRE2_ERROR_DFA_UITEM
   3191 </pre>
   3192 This return is given if <b>pcre2_dfa_match()</b> encounters an item in the
   3193 pattern that it does not support, for instance, a back reference or the use of
   3194 \C in a UTF mode.
   3195 <pre>
   3196   PCRE2_ERROR_DFA_UCOND
   3197 </pre>
   3198 This return is given if <b>pcre2_dfa_match()</b> encounters a condition item
   3199 that uses a back reference for the condition, or a test for recursion in a
   3200 specific group. These are not supported.
   3201 <pre>
   3202   PCRE2_ERROR_DFA_WSSIZE
   3203 </pre>
   3204 This return is given if <b>pcre2_dfa_match()</b> runs out of space in the
   3205 <i>workspace</i> vector.
   3206 <pre>
   3207   PCRE2_ERROR_DFA_RECURSE
   3208 </pre>
   3209 When a recursive subpattern is processed, the matching function calls itself
   3210 recursively, using private memory for the ovector and <i>workspace</i>. This
   3211 error is given if the internal ovector is not large enough. This should be
   3212 extremely rare, as a vector of size 1000 is used.
   3213 <pre>
   3214   PCRE2_ERROR_DFA_BADRESTART
   3215 </pre>
   3216 When <b>pcre2_dfa_match()</b> is called with the <b>PCRE2_DFA_RESTART</b> option,
   3217 some plausibility checks are made on the contents of the workspace, which
   3218 should contain data about the previous partial match. If any of these checks
   3219 fail, this error is given.
   3220 </P>
   3221 <br><a name="SEC39" href="#TOC1">SEE ALSO</a><br>
   3222 <P>
   3223 <b>pcre2build</b>(3), <b>pcre2callout</b>(3), <b>pcre2demo</b>(3),
   3224 <b>pcre2matching</b>(3), <b>pcre2partial</b>(3), <b>pcre2posix</b>(3),
   3225 <b>pcre2sample</b>(3), <b>pcre2stack</b>(3), <b>pcre2unicode</b>(3).
   3226 </P>
   3227 <br><a name="SEC40" href="#TOC1">AUTHOR</a><br>
   3228 <P>
   3229 Philip Hazel
   3230 <br>
   3231 University Computing Service
   3232 <br>
   3233 Cambridge, England.
   3234 <br>
   3235 </P>
   3236 <br><a name="SEC41" href="#TOC1">REVISION</a><br>
   3237 <P>
   3238 Last updated: 17 June 2016
   3239 <br>
   3240 Copyright &copy; 1997-2016 University of Cambridge.
   3241 <br>
   3242 <p>
   3243 Return to the <a href="index.html">PCRE2 index page</a>.
   3244 </p>
   3245