<html>
<head>
<title>pcre2convert specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre2convert man page</h1>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>
<p>
This page is part of the PCRE2 HTML documentation. It was generated
automatically from the original man page. If there is any nonsense in it,
please consult the man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">EXPERIMENTAL PATTERN CONVERSION FUNCTIONS</a>
<li><a name="TOC2" href="#SEC2">THE CONVERT CONTEXT</a>
<li><a name="TOC3" href="#SEC3">THE CONVERSION FUNCTION</a>
<li><a name="TOC4" href="#SEC4">CONVERTING GLOBS</a>
<li><a name="TOC5" href="#SEC5">CONVERTING POSIX PATTERNS</a>
<li><a name="TOC6" href="#SEC6">AUTHOR</a>
<li><a name="TOC7" href="#SEC7">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">EXPERIMENTAL PATTERN CONVERSION FUNCTIONS</a><br>
<P>
This document describes a set of functions that can be used to convert
"foreign" patterns into PCRE2 regular expressions. This facility is currently
experimental, and may be changed in future releases. Two kinds of pattern,
globs and POSIX patterns, are supported.
</P>
<br><a name="SEC2" href="#TOC1">THE CONVERT CONTEXT</a><br>
<P>
<b>pcre2_convert_context *pcre2_convert_context_create(</b>
<b>  pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
<b>pcre2_convert_context *pcre2_convert_context_copy(</b>
<b>  pcre2_convert_context *<i>cvcontext</i>);</b>
<br>
<br>
<b>void pcre2_convert_context_free(pcre2_convert_context *<i>cvcontext</i>);</b>
<br>
<br>
<b>int pcre2_set_glob_escape(pcre2_convert_context *<i>cvcontext</i>,</b>
<b>  uint32_t <i>escape_char</i>);</b>
<br>
<br>
<b>int pcre2_set_glob_separator(pcre2_convert_context *<i>cvcontext</i>,</b>
<b>  uint32_t <i>separator_char</i>);</b>
<br>
<br>
A convert context holds parameters that affect the way pattern conversion
works. As with all PCRE2 contexts, you need to use one only if you want to
override the defaults. There are the usual create, copy, and free
functions. If custom memory management functions are set in a general context
that is passed to <b>pcre2_convert_context_create()</b>, they are used for all
memory management within the conversion functions.
</P>
<P>
There are only two parameters in the convert context at present. Both apply
only to glob conversions. The escape character defaults to grave accent under
Windows, otherwise backslash. It can be set to zero, meaning no escape
character, or to any punctuation character with a code point less than 256.
The separator character defaults to backslash under Windows, otherwise forward
slash. It can be set to forward slash, backslash, or dot.
</P>
<P>
The two setting functions return zero on success, or PCRE2_ERROR_BADDATA if
their second argument is invalid.
</P>
<br><a name="SEC3" href="#TOC1">THE CONVERSION FUNCTION</a><br>
<P>
<b>int pcre2_pattern_convert(PCRE2_SPTR <i>pattern</i>, PCRE2_SIZE <i>length</i>,</b>
<b>  uint32_t <i>options</i>, PCRE2_UCHAR **<i>buffer</i>,</b>
<b>  PCRE2_SIZE *<i>blength</i>, pcre2_convert_context *<i>cvcontext</i>);</b>
<br>
<br>
<b>void pcre2_converted_pattern_free(PCRE2_UCHAR *<i>converted_pattern</i>);</b>
<br>
<br>
The first two arguments of <b>pcre2_pattern_convert()</b> define the foreign
pattern that is to be converted. The length may be given as
PCRE2_ZERO_TERMINATED. The <b>options</b> argument defines how the pattern is to
be processed. If the input is UTF, the PCRE2_CONVERT_UTF option should be set.
PCRE2_CONVERT_NO_UTF_CHECK may also be set if you are sure the input is valid
UTF. To define the type of conversion that is required, either one or more of
the glob options or one of the POSIX options in the following list must be set:
<pre>
  PCRE2_CONVERT_GLOB
  PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR
  PCRE2_CONVERT_GLOB_NO_STARSTAR
  PCRE2_CONVERT_POSIX_BASIC
  PCRE2_CONVERT_POSIX_EXTENDED
</pre>
Details of the conversions are given below. The <b>buffer</b> and <b>blength</b>
arguments define how the output is handled:
</P>
<P>
If <b>buffer</b> is NULL, the function just returns the length of the converted
pattern via <b>blength</b>. This is one less than the length of buffer needed,
because a terminating zero is always added to the output.
</P>
<P>
If <b>buffer</b> points to a NULL pointer, an output buffer is obtained using
the allocator in the context or <b>malloc()</b> if no context is supplied. A
pointer to this buffer is placed in the variable to which <b>buffer</b> points.
When no longer needed the output buffer must be freed by calling
<b>pcre2_converted_pattern_free()</b>. If this function is called with a NULL
argument, it returns immediately without doing anything.
</P>
<P>
If <b>buffer</b> points to a non-NULL pointer, <b>blength</b> must be set to the
actual length of the buffer provided (in code units).
</P>
<P>
In all cases, after successful conversion, the variable pointed to by
<b>blength</b> is updated to the length actually used (in code units), excluding
the terminating zero that is always added.
</P>
<P>
If an error occurs, the length (via <b>blength</b>) is set to the offset
within the input pattern where the error was detected. Only gross syntax errors
are caught; there are plenty of errors that will get passed on for
<b>pcre2_compile()</b> to discover.
</P>
<P>
The return from <b>pcre2_pattern_convert()</b> is zero on success or a non-zero
PCRE2 error code. Note that PCRE2 error codes may be positive or negative:
<b>pcre2_compile()</b> uses mostly positive codes and <b>pcre2_match()</b>
negative ones; <b>pcre2_pattern_convert()</b> uses existing codes of both kinds.
A textual error message can be obtained by calling
<b>pcre2_get_error_message()</b>.
</P>
<br><a name="SEC4" href="#TOC1">CONVERTING GLOBS</a><br>
<P>
Globs are used to match file names, and consequently have the concept of a
"path separator", which defaults to backslash under Windows and forward slash
otherwise. If PCRE2_CONVERT_GLOB is set, the wildcards * and ? are not
permitted to match separator characters, but the double-star (**) feature
(which does match separators) is supported.
</P>
<P>
PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR converts globs with wildcards allowed to
match separator characters. PCRE2_CONVERT_GLOB_NO_STARSTAR converts globs with
the double-star feature disabled. These options may be given together.
</P>
<br><a name="SEC5" href="#TOC1">CONVERTING POSIX PATTERNS</a><br>
<P>
POSIX defines two kinds of regular expression pattern: basic and extended.
These can be processed by setting PCRE2_CONVERT_POSIX_BASIC or
PCRE2_CONVERT_POSIX_EXTENDED, respectively.
</P>
<P>
In POSIX patterns, backslash is not special in a character class. Unmatched
closing parentheses are treated as literals.
</P>
<P>
In basic patterns, ? + | {} and () must be escaped to be recognized
as metacharacters outside a character class. If the first character in the
pattern is * it is treated as a literal. ^ is a metacharacter only at the start
of a branch.
</P>
<P>
In extended patterns, a backslash not in a character class always
makes the next character literal, whatever it is. There are no backreferences.
</P>
<P>
Note: POSIX mandates that the longest possible match at the first matching
position must be found. This is not what <b>pcre2_match()</b> does; it yields
the first match that is found. An application can use <b>pcre2_dfa_match()</b>
to find the longest match, but that does not support backreferences (but then
neither do POSIX extended patterns).
</P>
<br><a name="SEC6" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
University Computing Service
<br>
Cambridge, England.
<br>
</P>
<br><a name="SEC7" href="#TOC1">REVISION</a><br>
<P>
Last updated: 28 June 2018
<br>
Copyright &copy; 1997-2018 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>