      1 PCRE2TEST(1)                General Commands Manual               PCRE2TEST(1)
      2 
      3 
      4 
      5 NAME
      6        pcre2test - a program for testing Perl-compatible regular expressions.
      7 
      8 SYNOPSIS
      9 
     10        pcre2test [options] [input file [output file]]
     11 
     12        pcre2test is a test program for the PCRE2 regular expression libraries,
     13        but it can also be used for  experimenting  with  regular  expressions.
     14        This  document  describes the features of the test program; for details
     15        of the regular expressions themselves, see the pcre2pattern  documenta-
     16        tion.  For  details  of  the  PCRE2  library  function  calls and their
     17        options, see the pcre2api documentation.
     18 
     19        The input for pcre2test is a sequence of  regular  expression  patterns
     20        and  subject  strings  to  be matched. There are also command lines for
     21        setting defaults and controlling some special actions. The output shows
     22        the  result  of  each  match attempt. Modifiers on external or internal
     23        command lines, the patterns, and the subject lines specify PCRE2  func-
     24        tion  options, control how the subject is processed, and what output is
     25        produced.
     26 
     27        As the original fairly simple PCRE library evolved,  it  acquired  many
     28        different  features,  and  as  a  result, the original pcretest program
     29        ended up with a lot of options in a messy, arcane  syntax  for  testing
     30        all the features. The move to the new PCRE2 API provided an opportunity
     31        to re-implement the test program as pcre2test, with a cleaner  modifier
     32        syntax.  Nevertheless,  there are still many obscure modifiers, some of
     33        which are specifically designed for use in conjunction  with  the  test
     34        script  and  data  files that are distributed as part of PCRE2. All the
     35        modifiers are documented here, some  without  much  justification,  but
     36        many  of  them  are  unlikely  to  be  of  use  except when testing the
     37        libraries.
     38 
     39 
     40 PCRE2's 8-BIT, 16-BIT AND 32-BIT LIBRARIES
     41 
     42        Different versions of the PCRE2 library can be built to support charac-
     43        ter  strings  that  are encoded in 8-bit, 16-bit, or 32-bit code units.
     44        One, two, or  all  three  of  these  libraries  may  be  simultaneously
     45        installed. The pcre2test program can be used to test all the libraries.
     46        However, its own input and output are  always  in  8-bit  format.  When
     47        testing  the  16-bit  or 32-bit libraries, patterns and subject strings
     48        are converted to 16-bit or 32-bit format before  being  passed  to  the
     49        library  functions.  Results are converted back to 8-bit code units for
     50        output.
     51 
     52        In the rest of this document, the names of library functions and struc-
     53        tures  are  given  in  generic form, for example, pcre2_compile(). The
     54        actual names used in the libraries have a suffix _8, _16,  or  _32,  as
     55        appropriate.
     56 
     57 
     58 INPUT ENCODING
     59 
     60        Input  to  pcre2test is processed line by line, either by calling the C
     61        library's fgets() function, or via the  libreadline  library.  In  some
     62        Windows  environments  character 26 (hex 1A) causes an immediate end of
     63        file, and no further data is read, so this character should be  avoided
     64        unless you really want that action.
     65 
     66        The  input  is  processed  using C's string functions, so it must not
     67        contain binary zeros, even though in  Unix-like  environments,  fgets()
     68        treats  any  bytes  other  than newline as data characters. An error is
     69        generated if a binary zero is encountered. By default subject lines are
     70        processed for backslash escapes, which makes it possible to include any
     71        data value in strings that are passed to the library for matching.  For
     72        patterns,  there  is a facility for specifying some or all of the 8-bit
     73        input characters as hexadecimal  pairs,  which  makes  it  possible  to
     74        include binary zeros.
     75 
     76    Input for the 16-bit and 32-bit libraries
     77 
     78        When testing the 16-bit or 32-bit libraries, there is a need to be able
     79        to generate character code points greater than 255 in the strings  that
     80        are  passed to the library. For subject lines, backslash escapes can be
     81        used. In addition, when the  utf  modifier  (see  "Setting  compilation
     82        options" below) is set, the pattern and any following subject lines are
     83        interpreted as UTF-8 strings and translated  to  UTF-16  or  UTF-32  as
     84        appropriate.
     85 
     86        For  non-UTF testing of wide characters, the utf8_input modifier can be
     87        used. This is mutually exclusive with  utf,  and  is  allowed  only  in
     88        16-bit  or  32-bit  mode.  It  causes the pattern and following subject
     89        lines to be treated as UTF-8 according to the original definition  (RFC
     90        2279), which allows for character values up to 0x7fffffff. Each charac-
     91        ter is placed in one 16-bit or 32-bit code unit (in  the  16-bit  case,
     92        values greater than 0xffff cause an error to occur).
     93 
     94        UTF-8  (in  its  original definition) is not capable of encoding values
     95        greater than 0x7fffffff, but such values can be handled by  the  32-bit
     96        library. When testing this library in non-UTF mode with utf8_input set,
     97        if any character is preceded by the byte 0xff (which is an invalid byte
     98        in  UTF-8)  0x80000000  is  added to the character's value. This is the
     99        only way of passing such code points in a pattern string.  For  subject
    100        strings, using an escape sequence is preferable.
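
               The 0xff prefix convention described above can be  illustrated
               with  a  small Python sketch. This is not code from pcre2test;
               the function name decode_utf8_input is hypothetical,  and  the
               sketch  assumes  32-bit  mode  (the 16-bit limit of 0xffff is
               not modelled).

```python
def decode_utf8_input(data):
    """Decode original-definition (RFC 2279) UTF-8 into code unit values.

    A 0xff byte in front of a character adds 0x80000000 to its value.
    Illustrative only: this mirrors the description, not pcre2test's source.
    """
    values = []
    i = 0
    while i < len(data):
        extra = 0
        if data[i] == 0xFF:
            # 0xff is never valid in UTF-8, so it can serve as a marker.
            extra = 0x80000000
            i += 1
            if i >= len(data):
                raise ValueError("0xff prefix without a following character")
        lead = data[i]
        if lead < 0x80:
            length, value = 1, lead
        else:
            # RFC 2279 allows sequences of 2 to 6 bytes.
            for length in range(2, 7):
                if lead >> (7 - length) == (1 << (length + 1)) - 2:
                    value = lead & ((1 << (7 - length)) - 1)
                    break
            else:
                raise ValueError(f"invalid lead byte 0x{lead:02x}")
        if i + length > len(data):
            raise ValueError("truncated sequence")
        for byte in data[i + 1:i + length]:
            if byte & 0xC0 != 0x80:
                raise ValueError("invalid continuation byte")
            value = (value << 6) | (byte & 0x3F)
        i += length
        values.append(value + extra)
    return values
```

               For example, the bytes ff 01 decode to the single  value
               0x80000001, which cannot be written in UTF-8 alone.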
    101 
    102 
    103 COMMAND LINE OPTIONS
    104 
    105        -8        If the 8-bit library has been built, this option causes it to
    106                  be used (this is the default). If the 8-bit library  has  not
    107                  been built, this option causes an error.
    108 
    109        -16       If  the  16-bit library has been built, this option causes it
    110                  to be used. If only the 16-bit library has been  built,  this
    111                  is  the  default.  If  the 16-bit library has not been built,
    112                  this option causes an error.
    113 
    114        -32       If the 32-bit library has been built, this option  causes  it
    115                  to  be  used. If only the 32-bit library has been built, this
    116                  is the default. If the 32-bit library  has  not  been  built,
    117                  this option causes an error.
    118 
    119        -ac       Behave as if each pattern has the auto_callout modifier, that
    120                  is, insert automatic callouts into every pattern that is com-
    121                  piled.
    122 
    123        -AC       As  for  -ac,  but in addition behave as if each subject line
    124                  has the callout_extra  modifier,  that  is,  show  additional
    125                  information from callouts.
    126 
    127        -b        Behave  as  if each pattern has the fullbincode modifier; the
    128                  full internal binary form of the pattern is output after com-
    129                  pilation.
    130 
    131        -C        Output  the  version  number  of  the  PCRE2 library, and all
    132                  available information about the optional  features  that  are
    133                  included,  and  then  exit  with  zero  exit  code. All other
    134                  options are ignored. If both -C and -LM are  present,  which-
    135                  ever is first is recognized.
    136 
    137        -C option Output  information  about a specific build-time option, then
    138                  exit. This functionality is intended for use in scripts  such
    139                  as  RunTest.  The  following options output the value and set
    140                  the exit code as indicated:
    141 
    142                    ebcdic-nl  the code for LF (= NL) in an EBCDIC environment:
    143                                 0x15 or 0x25
    144                                 0 if used in an ASCII environment
    145                                 exit code is always 0
    146                    linksize   the configured internal link size (2, 3, or 4)
    147                                 exit code is set to the link size
    148                    newline    the default newline setting:
    149                                 CR, LF, CRLF, ANYCRLF, ANY, or NUL
    150                                 exit code is always 0
    151                    bsr        the default setting for what \R matches:
    152                                 ANYCRLF or ANY
    153                                 exit code is always 0
    154 
    155                  The following options output 1 for true or 0 for  false,  and
    156                  set the exit code to the same value:
    157 
    158                    backslash-C  \C is supported (not locked out)
    159                    ebcdic       compiled for an EBCDIC environment
    160                    jit          just-in-time support is available
    161                    pcre2-16     the 16-bit library was built
    162                    pcre2-32     the 32-bit library was built
    163                    pcre2-8      the 8-bit library was built
    164                    unicode      Unicode support is available
    165 
    166                  If  an  unknown  option is given, an error message is output;
    167                  the exit code is 0.
    168 
    169        -d        Behave as if each pattern has the debug modifier; the  inter-
    170                  nal form and information about the compiled pattern are output
    171                  after compilation; -d is equivalent to -b -i.
    172 
    173        -dfa      Behave as if each subject line has the dfa modifier; matching
    174                  is  done  using the pcre2_dfa_match() function instead of the
    175                  default pcre2_match().
    176 
    177        -error number[,number,...]
    178                  Call pcre2_get_error_message() for each of the error  numbers
    179                  in  the  comma-separated list, display the resulting messages
    180                  on the standard output, then exit with zero  exit  code.  The
    181                  numbers  may  be  positive or negative. This is a convenience
    182                  facility for PCRE2 maintainers.
    183 
    184        -help     Output a brief summary of these options and then exit.
    185 
    186        -i        Behave as if each pattern has the info modifier;  information
    187                  about the compiled pattern is given after compilation.
    188 
    189        -jit      Behave  as  if  each pattern line has the jit modifier; after
    190                  successful compilation, each pattern is passed to  the  just-
    191                  in-time compiler, if available.
    192 
    193        -jitverify
    194                  Behave  as  if  each pattern line has the jitverify modifier;
    195                  after successful compilation, each pattern is passed  to  the
    196                  just-in-time  compiler,  if  available, and the use of JIT is
    197                  verified.
    198 
    199        -LM       List modifiers: write a list of available pattern and subject
    200                  modifiers  to  the  standard output, then exit with zero exit
    201                  code. All other options are ignored.  If both -C and -LM  are
    202                  present, whichever is first is recognized.
    203 
    204        -pattern modifier-list
    205                  Behave as if each pattern line contains the given modifiers.
    206 
    207        -q        Do not output the version number of pcre2test at the start of
    208                  execution.
    209 
    210        -S size   On Unix-like systems, set the size of the run-time  stack  to
    211                  size mebibytes (units of 1024*1024 bytes).
    212 
    213        -subject modifier-list
    214                  Behave as if each subject line contains the given modifiers.
    215 
    216        -t        Run  each compile and match many times with a timer, and out-
    217                  put the resulting times per compile or  match.  When  JIT  is
    218                  used,  separate  times  are given for the initial compile and
    219                  the JIT compile. You can control  the  number  of  iterations
    220                  that  are used for timing by following -t with a number (as a
    221                  separate item on the command line). For  example,  "-t  1000"
    222                  iterates 1000 times. The default is to iterate 500,000 times.
    223 
    224        -tm       This is like -t except that it times only the matching phase,
    225                  not the compile phase.
    226 
    227        -T -TM    These behave like -t and -tm, but in addition, at the end  of
    228                  a  run, the total times for all compiles and matches are out-
    229                  put.
    230 
    231        -version  Output the PCRE2 version number and then exit.
    232 
    233 
    234 DESCRIPTION
    235 
    236        If pcre2test is given two filename arguments, it reads from  the  first
    237        and writes to the second. If the first name is "-", input is taken from
    238        the standard input. If pcre2test is given only one argument,  it  reads
    239        from that file and writes to stdout. Otherwise, it reads from stdin and
    240        writes to stdout.
    241 
    242        When pcre2test is built, a configuration option  can  specify  that  it
    243        should  be linked with the libreadline or libedit library. When this is
    244        done, if the input is from a terminal, it is read using the  readline()
    245        function. This provides line-editing and history facilities. The output
    246        from the -help option states whether or not readline() will be used.
    247 
    248        The program handles any number of tests, each of which  consists  of  a
    249        set  of input lines. Each set starts with a regular expression pattern,
    250        followed by any number of subject lines to be matched against that pat-
    251        tern. In between sets of test data, command lines that begin with # may
    252        appear. This file format, with some restrictions, can also be processed
    253        by  the perltest.sh script that is distributed with PCRE2 as a means of
    254        checking that the behaviour of PCRE2 and Perl is the same. For a speci-
    255        fication of perltest.sh, see the comments near its beginning.
    256 
    257        When the input is a terminal, pcre2test prompts for each line of input,
    258        using "re>" to prompt for regular expression patterns, and  "data>"  to
    259        prompt  for subject lines. Command lines starting with # can be entered
    260        only in response to the "re>" prompt.
    261 
    262        Each subject line is matched separately and independently. If you  want
    263        to do multi-line matches, you have to use the \n escape sequence (or \r
    264        or \r\n, etc., depending on the newline setting) in a  single  line  of
    265        input  to encode the newline sequences. There is no limit on the length
    266        of subject lines; the input buffer is automatically extended if  it  is
    267        too  small.  There  are  replication features that make it possible to
    268        generate long repetitive pattern or subject  lines  without  having  to
    269        supply them explicitly.
    270 
    271        An  empty  line  or  the end of the file signals the end of the subject
    272        lines for a test, at which point a  new  pattern  or  command  line  is
    273        expected if there is still input to be read.
    274 
    275 
    276 COMMAND LINES
    277 
    278        In  between sets of test data, a line that begins with # is interpreted
    279        as a command line. If the first character is followed by white space or
    280        an  exclamation  mark,  the  line is treated as a comment, and ignored.
    281        Otherwise, the following commands are recognized:
    282 
    283          #forbid_utf
    284 
    285        Subsequent  patterns  automatically  have   the   PCRE2_NEVER_UTF   and
    286        PCRE2_NEVER_UCP  options  set, which locks out the use of the PCRE2_UTF
    287        and PCRE2_UCP options and the use of (*UTF) and (*UCP) at the start  of
    288        patterns.  This  command  also  forces an error if a subsequent pattern
    289        contains any occurrences of \P, \p, or \X, which  are  still  supported
    290        when  PCRE2_UTF  is not set, but which require Unicode property support
    291        to be included in the library.
    292 
    293        This is a trigger guard that is used in test files to ensure  that  UTF
    294        or  Unicode property tests are not accidentally added to files that are
    295        used when Unicode support is  not  included  in  the  library.  Setting
    296        PCRE2_NEVER_UTF  and  PCRE2_NEVER_UCP as a default can also be obtained
    297        by the use of #pattern; the difference is that  #forbid_utf  cannot  be
    298        unset,  and the automatic options are not displayed in pattern informa-
    299        tion, to avoid cluttering up test output.
    300 
    301          #load <filename>
    302 
    303        This command is used to load a set of precompiled patterns from a file,
    304        as  described  in  the  section entitled "Saving and restoring compiled
    305        patterns" below.
    306 
    307          #newline_default [<newline-list>]
    308 
    309        When PCRE2 is built, a default newline  convention  can  be  specified.
    310        This  determines which characters and/or character pairs are recognized
    311        as indicating a newline in a pattern or subject string. The default can
    312        be  overridden when a pattern is compiled. The standard test files con-
    313        tain tests of various newline conventions,  but  the  majority  of  the
    314        tests  expect  a  single  linefeed  to  be  recognized  as a newline by
    315        default. Without special action the tests would fail when PCRE2 is com-
    316        piled with either CR or CRLF as the default newline.
    317 
    318        The #newline_default command specifies a list of newline types that are
    319        acceptable as the default. The types must be one of CR, LF, CRLF,  ANY-
    320        CRLF, ANY, or NUL (in upper or lower case), for example:
    321 
    322          #newline_default LF Any anyCRLF
    323 
    324        If the default newline is in the list, this command has no effect. Oth-
    325        erwise, except when testing the POSIX  API,  a  newline  modifier  that
    326        specifies  the  first  newline  convention in the list (LF in the above
    327        example) is added to any pattern that does not already have  a  newline
    328        modifier. If the newline list is empty, the feature is turned off. This
    329        command is present in a number of the standard test input files.
    330 
    331        When the POSIX API is being tested there is  no  way  to  override  the
    332        default  newline  convention,  though it is possible to set the newline
    333        convention from within the pattern. A warning is given if the posix  or
    334        posix_nosub  modifier is used when #newline_default would set a default
    335        for the non-POSIX API.
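
               The  decision  that #newline_default makes for each pattern can
               be sketched in Python as follows. The function and  parameter
               names  are  hypothetical,  and  the  lower-case  spelling of the
               generated modifier value is an  assumption;  the  sketch  only
               restates the rules given above.

```python
def newline_modifier_to_add(build_default, acceptable, pattern_modifiers,
                            posix=False):
    """Return the newline modifier that would be added, or None.

    build_default      the newline default PCRE2 was built with, e.g. "CR"
    acceptable         the list given to #newline_default (any letter case)
    pattern_modifiers  modifiers already present on the pattern
    posix              True when the POSIX API is being tested
    """
    accepted = [name.upper() for name in acceptable]
    if not accepted:
        return None                      # empty list turns the feature off
    if build_default.upper() in accepted:
        return None                      # default is acceptable: no effect
    if posix:
        return None                      # POSIX API: default cannot be overridden
    if any(m.split("=")[0] == "newline" for m in pattern_modifiers):
        return None                      # pattern already chooses a newline
    return "newline=" + accepted[0].lower()
```

               With  a  library  built for CR and the list "LF Any anyCRLF", a
               pattern without its own newline modifier  is  treated  as  if
               newline=lf had been given.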
    336 
    337          #pattern <modifier-list>
    338 
    339        This command sets a default modifier list that applies  to  all  subse-
    340        quent patterns. Modifiers on a pattern can change these settings.
    341 
    342          #perltest
    343 
    344        The  appearance of this line causes all subsequent modifier settings to
    345        be checked for compatibility with the perltest.sh script, which is used
    346        to  confirm that Perl gives the same results as PCRE2. Also, apart from
    347        comment lines, #pattern commands, and #subject  commands  that  set  or
    348        unset  "mark", no command lines are permitted, because they and many of
    349        the modifiers are specific to pcre2test, and should not be used in test
    350        files  that  are  also  processed by perltest.sh. The #perltest command
    351        helps detect tests that are accidentally put in the wrong file.
    352 
    353          #pop [<modifiers>]
    354          #popcopy [<modifiers>]
    355 
    356        These commands are used to manipulate the stack of  compiled  patterns,
    357        as  described  in  the  section entitled "Saving and restoring compiled
    358        patterns" below.
    359 
    360          #save <filename>
    361 
    362        This command is used to save a set of compiled patterns to a  file,  as
    363        described  in  the section entitled "Saving and restoring compiled pat-
    364        terns" below.
    365 
    366          #subject <modifier-list>
    367 
    368        This command sets a default modifier list that applies  to  all  subse-
    369        quent  subject lines. Modifiers on a subject line can change these set-
    370        tings.
    371 
    372 
    373 MODIFIER SYNTAX
    374 
    375        Modifier lists are used with both pattern and subject lines. Items in a
    376        list are separated by commas followed by optional white space. Trailing
    377        whitespace in a modifier list is ignored. Some modifiers may  be  given
    378        for  both patterns and subject lines, whereas others are valid only for
    379        one  or  the  other.  Each  modifier  has  a  long  name,  for  example
    380        "anchored",  and  some of them must be followed by an equals sign and a
    381        value, for example, "offset=12". Values cannot  contain  comma  charac-
    382        ters,  but may contain spaces. Modifiers that do not take values may be
    383        preceded by a minus sign to turn off a previous setting.
    384 
    385        A few of the more common modifiers can also be specified as single let-
    386        ters,  for  example "i" for "caseless". In documentation, following the
    387        Perl convention, these are written with a slash ("the /i modifier") for
    388        clarity.  Abbreviated  modifiers  must all be concatenated in the first
    389        item of a modifier list. If the first item is not recognized as a  long
    390        modifier  name, it is interpreted as a sequence of these abbreviations.
    391        For example:
    392 
    393          /abc/ig,newline=cr,jit=3
    394 
    395        This is a pattern line whose modifier list starts with  two  one-letter
    396        modifiers  (/i  and  /g).  The lower-case abbreviated modifiers are the
    397        same as used in Perl.
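
               As  an  illustration  of  these  rules, the following Python
               sketch splits a modifier list into its items.  It  is  not
               taken from pcre2test: the function name is hypothetical, and
               LONG_NAMES is a small assumed subset of the  real  modifier
               names (the full list is printed by the -LM option).

```python
# Assumed subset of long modifier names, for illustration only.
LONG_NAMES = {"anchored", "caseless", "newline", "jit", "offset", "utf"}


def parse_modifiers(text):
    """Split a modifier list into (name, value) pairs.

    value is None for a plain modifier, False when a leading minus sign
    turns the modifier off, and a string for a name=value item.
    """
    # Items are separated by commas followed by optional white space;
    # trailing white space in the whole list is ignored.
    items = [item.lstrip() for item in text.rstrip().split(",")]
    result = []
    for index, item in enumerate(items):
        if not item:
            continue
        name, sep, value = item.partition("=")
        negated = name.startswith("-")
        bare = name[1:] if negated else name
        if bare in LONG_NAMES:
            if sep:
                result.append((bare, value))
            else:
                result.append((bare, False if negated else None))
        elif index == 0 and not sep and not negated:
            # The first item is not a long name, so read it as a run of
            # single-letter abbreviations such as "ig".
            result.extend(("/" + letter, None) for letter in item)
        else:
            raise ValueError(f"unknown modifier: {item!r}")
    return result
```

               Applied to the list "ig,newline=cr,jit=3" from  the  example
               above, the sketch yields /i and /g followed by the two long
               modifiers with their values.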
    398 
    399 
    400 PATTERN SYNTAX
    401 
    402        A pattern line must start with one of the following characters  (common
    403        symbols, excluding pattern meta-characters):
    404 
    405          / ! " ' ` - = _ : ; , % & @ ~
    406 
    407        This  is  interpreted  as the pattern's delimiter. A regular expression
    408        may be continued over several input lines, in which  case  the  newline
    409        characters are included within it. It is possible to include the delim-
    410        iter within the pattern by escaping it with a backslash, for example
    411 
    412          /abc\/def/
    413 
    414        If you do this, the escape and the delimiter form part of the  pattern,
    415        but since the delimiters are all non-alphanumeric, this does not affect
    416        its interpretation. If the terminating delimiter  is  immediately  fol-
    417        lowed by a backslash, for example,
    418 
    419          /abc/\
    420 
    421        then  a  backslash  is added to the end of the pattern. This is done to
    422        provide a way of testing the error condition that arises if  a  pattern
    423        finishes with a backslash, because
    424 
    425          /abc\/
    426 
    427        is  interpreted as the first line of a pattern that starts with "abc/",
    428        causing pcre2test to read the next line as a continuation of the  regu-
    429        lar expression.
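
               The  delimiter  rules  can  be  sketched in Python as shown
               here. The function name is hypothetical, only a single input
               line  is  examined,  and  a return value of None stands for
               "the pattern continues on the next line"; the real  program
               may differ in details that this description does not cover.

```python
DELIMITERS = set("/!\"'`-=_:;,%&@~")


def split_pattern_line(line):
    """Return (pattern, modifier_text) for one input line.

    Returns None when no closing delimiter is found, which means the
    pattern continues on the following input line.
    """
    delimiter = line[0]
    if delimiter not in DELIMITERS:
        raise ValueError("invalid pattern delimiter")
    i = 1
    while i < len(line):
        if line[i] == "\\" and i + 1 < len(line):
            # The escape and the escaped character both stay in the pattern.
            i += 2
            continue
        if line[i] == delimiter:
            pattern, rest = line[1:i], line[i + 1:]
            if rest.startswith("\\"):
                # A backslash right after the closing delimiter is added
                # to the end of the pattern.
                pattern, rest = pattern + "\\", rest[1:]
            return pattern, rest
        i += 1
    return None
```

               The three example lines above  behave  as  described:  the
               first  gives  the  pattern abc\/def, the second gives abc\,
               and the third is reported as unfinished.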
    430 
    431        A pattern can be followed by a modifier list (details below).
    432 
    433 
    434 SUBJECT LINE SYNTAX
    435 
    436        Before    each   subject   line   is   passed   to   pcre2_match()   or
    437        pcre2_dfa_match(), leading and trailing white space is removed, and the
    438        line is scanned for backslash escapes, unless the subject_literal modi-
    439        fier was set for the pattern. The following provide a means of encoding
    440        non-printing characters in a visible way:
    441 
    442          \a         alarm (BEL, \x07)
    443          \b         backspace (\x08)
    444          \e         escape (\x1b)
    445          \f         form feed (\x0c)
    446          \n         newline (\x0a)
    447          \r         carriage return (\x0d)
    448          \t         tab (\x09)
    449          \v         vertical tab (\x0b)
    450          \nnn       octal character (up to 3 octal digits); always
    451                       a byte unless > 255 in UTF-8 or 16-bit or 32-bit mode
    452          \o{dd...}  octal character (any number of octal digits)
    453          \xhh       hexadecimal byte (up to 2 hex digits)
    454          \x{hh...}  hexadecimal character (any number of hex digits)
    455 
    456        The use of \x{hh...} is not dependent on the use of the utf modifier on
    457        the pattern. It is recognized always. There may be any number of  hexa-
    458        decimal  digits  inside  the  braces; invalid values provoke error mes-
    459        sages.
    460 
    461        Note that \xhh specifies one byte rather than one  character  in  UTF-8
    462        mode;  this  makes it possible to construct invalid UTF-8 sequences for
    463        testing purposes. On the other hand, \x{hh} is interpreted as  a  UTF-8
    464        character  in UTF-8 mode, generating more than one byte if the value is
    465        greater than 127.  When testing the 8-bit library not  in  UTF-8  mode,
    466        \x{hh} generates one byte for values less than 256, and causes an error
    467        for greater values.
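
               The  difference  between  \xhh  and  \x{hh}  for  the  8-bit
               library can be shown with a short Python  sketch.  The  func-
               tion  name  and  its  parameters are hypothetical, and Python's
               own UTF-8 encoder stands in for the encoding step.

```python
def encode_x_escape(value, braced, utf):
    """Bytes produced for an \\x escape when testing the 8-bit library.

    braced=False models \\xhh, braced=True models \\x{hh...}.
    utf says whether the utf modifier is set on the pattern.
    """
    if braced and utf:
        # \x{hh} is a character in UTF-8 mode: values above 127 need
        # more than one byte. (Python's encoder rejects surrogates and
        # values above 0x10ffff; that limit belongs to this sketch.)
        return chr(value).encode("utf-8")
    if value > 0xFF:
        raise ValueError("value does not fit in one byte")
    # \xhh always gives one byte; so does \x{hh} outside UTF-8 mode.
    return bytes([value])
```

               In UTF-8 mode, \xe9 therefore yields the single byte e9 (an
               invalid UTF-8 string on its own), whereas  \x{e9}  yields  the
               two bytes c3 a9.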
    468 
    469        In UTF-16 mode, all 4-digit \x{hhhh} values are accepted. This makes it
    470        possible to construct invalid UTF-16 sequences for testing purposes.
    471 
    472        In  UTF-32  mode,  all  4- to 8-digit \x{...} values are accepted. This
    473        makes it possible to construct invalid  UTF-32  sequences  for  testing
    474        purposes.
    475 
    476        There is a special backslash sequence that specifies replication of one
    477        or more characters:
    478 
    479          \[<characters>]{<count>}
    480 
    481        This makes it possible to test long strings without having  to  provide
    482        them as part of the file. For example:
    483 
    484          \[abc]{4}
    485 
    486        is  converted to "abcabcabcabc". This feature does not support nesting.
    487        To include a closing square bracket in the characters, code it as \x5D.
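
               A  minimal  Python  sketch of the replication expansion fol-
               lows. The names are hypothetical, and the sketch leaves  any
               other  backslash  escapes  in  the line untouched, so it shows
               only the replication step itself.

```python
import re

# Backslash, "[", any characters except "]", "]", then "{count}".
REPLICATION = re.compile(r"\\\[([^\]]*)\]\{(\d+)\}")


def expand_replication(subject):
    """Expand every \\[<characters>]{<count>} item; nesting is not supported."""
    return REPLICATION.sub(lambda m: m.group(1) * int(m.group(2)), subject)
```

               For example, expand_replication applied to the  text  \[abc]{4}
               gives "abcabcabcabc".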
    488 
    489        A backslash followed by an equals sign marks the  end  of  the  subject
    490        string and the start of a modifier list. For example:
    491 
    492          abc\=notbol,notempty
    493 
    494        If  the  subject  string is empty and \= is followed by whitespace, the
    495        line is treated as a comment line, and is not used  for  matching.  For
    496        example:
    497 
    498          \= This is a comment.
    499          abc\= This is an invalid modifier list.
    500 
    501        A  backslash  followed  by  any  other  non-alphanumeric character just
    502        escapes that character. A backslash followed by anything else causes an
    503        error.  However,  if the very last character in the line is a backslash
    504        (and there is no modifier list), it is ignored. This  gives  a  way  of
    505        passing  an  empty line as data, since a real empty line terminates the
    506        data input.
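
               The  handling  of  \=  and  of  a  final  backslash  can  be
               sketched in Python as below. The function name is hypotheti-
               cal; None is returned for a comment line, and other escapes
               are left undecoded.

```python
def split_subject_line(line):
    """Return (subject, modifier_text), or None for a comment line."""
    line = line.strip()          # leading and trailing white space is removed
    i = 0
    while i < len(line):
        if line[i] == "\\":
            if i + 1 == len(line):
                # A backslash as the very last character is ignored, which
                # makes it possible to pass an empty subject.
                return line[:i], ""
            if line[i + 1] == "=":
                subject, modifiers = line[:i], line[i + 2:]
                if subject == "" and modifiers[:1].isspace():
                    return None  # empty subject, then white space: comment
                return subject, modifiers
            i += 2               # skip the escaped character
            continue
        i += 1
    return line, ""
```

               A line consisting of a single backslash is thus passed on as
               an empty subject string with no modifiers.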
    507 
    508        If the subject_literal modifier is set for a pattern, all subject lines
    509        that follow are treated as literals, with no special treatment of back-
    510        slashes.  No replication is possible, and any subject modifiers must be
    511        set as defaults by a #subject command.
    512 
    513 
    514 PATTERN MODIFIERS
    515 
    516        There  are  several types of modifier that can appear in pattern lines.
    517        Except where noted below, they may also be used in #pattern commands. A
    518        pattern's  modifier  list can add to or override default modifiers that
    519        were set by a previous #pattern command.
    520 
    521    Setting compilation options
    522 
    523        The following modifiers set options for pcre2_compile(). Most  of  them
    524        set  bits  in  the  options  argument of that function, but those whose
    525        names start with PCRE2_EXTRA are additional options that are set in the
    526        compile  context.  For  the  main options, there are some single-letter
    527        abbreviations that are the same as Perl options. There is special  han-
    528        dling  for  /x:  if  a second x is present, PCRE2_EXTENDED is converted
    529        into  PCRE2_EXTENDED_MORE  as  in  Perl.  A   third   appearance   adds
    530        PCRE2_EXTENDED  as  well,  though  this  makes no difference to the way
    531        pcre2_compile() behaves. See pcre2api for a description of the  effects
    532        of these options.
    533 
    534              allow_empty_class         set PCRE2_ALLOW_EMPTY_CLASS
    535              allow_surrogate_escapes   set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
    536              alt_bsux                  set PCRE2_ALT_BSUX
    537              alt_circumflex            set PCRE2_ALT_CIRCUMFLEX
    538              alt_verbnames             set PCRE2_ALT_VERBNAMES
    539              anchored                  set PCRE2_ANCHORED
    540              auto_callout              set PCRE2_AUTO_CALLOUT
    541              bad_escape_is_literal     set PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
    542          /i  caseless                  set PCRE2_CASELESS
    543              dollar_endonly            set PCRE2_DOLLAR_ENDONLY
    544          /s  dotall                    set PCRE2_DOTALL
    545              dupnames                  set PCRE2_DUPNAMES
    546              endanchored               set PCRE2_ENDANCHORED
    547          /x  extended                  set PCRE2_EXTENDED
    548          /xx extended_more             set PCRE2_EXTENDED_MORE
    549              firstline                 set PCRE2_FIRSTLINE
    550              literal                   set PCRE2_LITERAL
    551              match_line                set PCRE2_EXTRA_MATCH_LINE
    552              match_unset_backref       set PCRE2_MATCH_UNSET_BACKREF
    553              match_word                set PCRE2_EXTRA_MATCH_WORD
    554          /m  multiline                 set PCRE2_MULTILINE
    555              never_backslash_c         set PCRE2_NEVER_BACKSLASH_C
    556              never_ucp                 set PCRE2_NEVER_UCP
    557              never_utf                 set PCRE2_NEVER_UTF
    558          /n  no_auto_capture           set PCRE2_NO_AUTO_CAPTURE
    559              no_auto_possess           set PCRE2_NO_AUTO_POSSESS
    560              no_dotstar_anchor         set PCRE2_NO_DOTSTAR_ANCHOR
    561              no_start_optimize         set PCRE2_NO_START_OPTIMIZE
    562              no_utf_check              set PCRE2_NO_UTF_CHECK
    563              ucp                       set PCRE2_UCP
    564              ungreedy                  set PCRE2_UNGREEDY
    565              use_offset_limit          set PCRE2_USE_OFFSET_LIMIT
    566              utf                       set PCRE2_UTF
    567 
    568        As well as turning on the PCRE2_UTF option, the utf modifier causes all
    569        non-printing characters in output  strings  to  be  printed  using  the
    570        \x{hh...}  notation. Otherwise, those less than 0x100 are output in hex
    571        without the curly brackets. Setting utf in 16-bit or 32-bit  mode  also
    572        causes  pattern  and  subject  strings  to  be  translated to UTF-16 or
    573        UTF-32, respectively, before being passed to library functions.
    574 
    575    Setting compilation controls
    576 
    577        The following modifiers  affect  the  compilation  process  or  request
    578        information  about  the  pattern. There are single-letter abbreviations
    579        for some that are heavily used in the test files.
    580 
    581              bsr=[anycrlf|unicode]     specify \R handling
    582          /B  bincode                   show binary code without lengths
    583              callout_info              show callout information
    584              convert=<options>         request foreign pattern conversion
    585              convert_glob_escape=c     set glob escape character
    586              convert_glob_separator=c  set glob separator character
    587              convert_length            set convert buffer length
    588              debug                     same as info,fullbincode
    589              framesize                 show matching frame size
    590              fullbincode               show binary code with lengths
    591          /I  info                      show info about compiled pattern
    592              hex                       unquoted characters are hexadecimal
    593              jit[=<number>]            use JIT
    594              jitfast                   use JIT fast path
    595              jitverify                 verify JIT use
    596              locale=<name>             use this locale
    597              max_pattern_length=<n>    set the maximum pattern length
    598              memory                    show memory used
    599              newline=<type>            set newline type
    600              null_context              compile with a NULL context
    601              parens_nest_limit=<n>     set maximum parentheses depth
    602              posix                     use the POSIX API
    603              posix_nosub               use the POSIX API with REG_NOSUB
    604              push                      push compiled pattern onto the stack
    605              pushcopy                  push a copy onto the stack
    606              stackguard=<number>       test the stackguard feature
    607              subject_literal           treat all subject lines as literal
    608              tables=[0|1|2]            select internal tables
    609              use_length                do not zero-terminate the pattern
    610              utf8_input                treat input as UTF-8
    611 
    612        The effects of these modifiers are described in the following sections.
    613 
    614    Newline and \R handling
    615 
    616        The bsr modifier specifies what \R in a pattern should match. If it  is
    617        set  to  "anycrlf",  \R  matches  CR, LF, or CRLF only. If it is set to
    618        "unicode", \R matches any Unicode newline sequence. The default can  be
    619        specified when PCRE2 is built; if it is not, the default is set to Uni-
    620        code.
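
       The difference between the two bsr settings can be illustrated with  a
       short  Python  sketch. It is not part of pcre2test; the helper name and
       the use of Python's re module are assumptions made  for  illustration.
       The  alternatives  listed  for  "unicode" follow the definition of \R in
       the pcre2pattern documentation.

```python
import re

# Illustrative stand-ins for the two bsr settings (names are hypothetical).
# "anycrlf": CR, LF, or CRLF only.
BSR_ANYCRLF = re.compile("\r\n|\n|\r")
# "unicode": CRLF, LF, VT, FF, CR, NEL, LINE SEPARATOR, PARAGRAPH SEPARATOR.
BSR_UNICODE = re.compile("\r\n|[\n\x0b\x0c\r\x85\u2028\u2029]")


def newline_sequences(subject, bsr="unicode"):
    """Return the substrings that \\R would match under the given setting."""
    pattern = BSR_ANYCRLF if bsr == "anycrlf" else BSR_UNICODE
    return pattern.findall(subject)


sample = "a\r\nb\x0bc\u2028d\ne"
print(newline_sequences(sample, "anycrlf"))
print(newline_sequences(sample, "unicode"))
```

       With "anycrlf" only the CRLF and LF in the sample are  reported;  with
       "unicode" the VT and U+2028 characters are reported as well.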
    621 
    622        The newline modifier specifies which characters are to  be  interpreted
    623        as newlines, both in the pattern and in subject lines. The type must be
    624        one of CR, LF, CRLF, ANYCRLF, ANY, or NUL (in upper or lower case).
    625 
    626    Information about a pattern
    627 
    628        The debug modifier is a shorthand for info,fullbincode, requesting  all
    629        available information.
    630 
    631        The bincode modifier causes a representation of the compiled code to be
    632        output after compilation. This information does not contain length  and
    633        offset values, which ensures that the same output is generated for dif-
    634        ferent internal link sizes and different code  unit  widths.  By  using
    635        bincode,  the  same  regression tests can be used in different environ-
    636        ments.
    637 
    638        The fullbincode modifier, by contrast, does include length  and  offset
    639        values.  This is used in a few special tests that run only for specific
    640        code unit widths and link sizes, and is also useful for one-off tests.
    641 
    642        The info modifier  requests  information  about  the  compiled  pattern
    643        (whether  it  is anchored, has a fixed first character, and so on). The
    644        information is obtained from the  pcre2_pattern_info()  function.  Here
    645        are some typical examples:
    646 
    647            re> /(?i)(^a|^b)/m,info
    648          Capturing subpattern count = 1
    649          Compile options: multiline
    650          Overall options: caseless multiline
    651          First code unit at start or follows newline
    652          Subject length lower bound = 1
    653 
    654            re> /(?i)abc/info
    655          Capturing subpattern count = 0
    656          Compile options: <none>
    657          Overall options: caseless
    658          First code unit = 'a' (caseless)
    659          Last code unit = 'c' (caseless)
    660          Subject length lower bound = 3
    661 
    662        "Compile  options"  are those specified by modifiers; "overall options"
    663        have added options that are taken or deduced from the pattern. If  both
    664        sets  of  options are the same, just a single "options" line is output;
    665        if there are no options, the line is  omitted.  "First  code  unit"  is
    666        where  any  match must start; if there is more than one they are listed
    667        as "starting code units". "Last code unit" is  the  last  literal  code
    668        unit  that  must  be  present in any match. This is not necessarily the
    669        last character. These lines are omitted if no starting or  ending  code
    670        units are recorded.
    671 
    672        The  framesize modifier shows the size, in bytes, of the storage frames
    673        used by pcre2_match() for handling backtracking. The  size  depends  on
    674        the number of capturing parentheses in the pattern.
    675 
    676        The  callout_info  modifier requests information about all the callouts
    677        in the pattern. A list of them is output at the end of any other infor-
    678        mation that is requested. For each callout, either its number or string
    679        is given, followed by the item that follows it in the pattern.
    680 
    681    Passing a NULL context
    682 
    683        Normally, pcre2test passes a context block to pcre2_compile().  If  the
    684        null_context  modifier  is  set,  however,  NULL is passed. This is for
    685        testing that pcre2_compile() behaves correctly in this  case  (it  uses
    686        default values).
    687 
    688    Specifying pattern characters in hexadecimal
    689 
    690        The  hex  modifier specifies that the characters of the pattern, except
    691        for substrings enclosed in single or double quotes, are  to  be  inter-
    692        preted  as  pairs  of hexadecimal digits. This feature is provided as a
    693        way of creating patterns that contain binary zeros and other non-print-
    694        ing  characters.  White space is permitted between pairs of digits. For
    695        example, this pattern contains three characters:
    696 
    697          /ab 32 59/hex
    698 
    699        Parts of such a pattern are taken literally  if  quoted.  This  pattern
    700        contains  nine characters, only two of which are specified in hexadeci-
    701        mal:
    702 
    703          /ab "literal" 32/hex
    704 
    705        Either single or double quotes may be used. There is no way of  includ-
    706        ing  the delimiter within a substring. The hex and expand modifiers are
    707        mutually exclusive.
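
       The  decoding rules above can be sketched in Python as follows. This is
       a rough model of the described behaviour, not the code  that  pcre2test
       uses;  the  function name is hypothetical, and the error handling is an
       assumption.

```python
import string


def decode_hex_pattern(text):
    """Decode the body of a hex-mode pattern into a list of character codes.

    Pairs of hexadecimal digits become one character each, white space
    between items is skipped, and quoted substrings are taken literally.
    """
    codes = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch in "'\"":
            # A quoted substring runs to the next matching quote; the
            # delimiter itself cannot appear inside the substring.
            end = text.find(ch, i + 1)
            if end < 0:
                raise ValueError("missing closing quote")
            codes.extend(ord(c) for c in text[i + 1:end])
            i = end + 1
        else:
            pair = text[i:i + 2]
            if len(pair) != 2 or any(c not in string.hexdigits for c in pair):
                raise ValueError("expected a pair of hexadecimal digits")
            codes.append(int(pair, 16))
            i += 2
    return codes


print(decode_hex_pattern("ab 32 59"))
print(len(decode_hex_pattern('ab "literal" 32')))
```

       The first call yields three character codes, and  the  second  pattern
       decodes  to  nine  characters,  in  line  with  the  two examples given
       above.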
    708 
    709    Specifying the pattern's length
    710 
    711        By default, patterns are passed to the compiling functions as zero-ter-
    712        minated strings. The use_length modifier causes a pattern to be  passed
    713        by  length  instead,  with  no  terminating  zero.  Using   a   length
    714        happens  automatically  (whether  or not use_length is set) when hex is
    715        set, because patterns  specified  in  hexadecimal  may  contain  binary
    716        zeros.
    717 
    718        If hex or use_length is used with the POSIX wrapper API (see "Using the
    719        POSIX wrapper API" below), the REG_PEND extension is used to  pass  the
    720        pattern's length.
    721 
    722    Specifying wide characters in 16-bit and 32-bit modes
    723 
    724        In 16-bit and 32-bit modes, all input is automatically treated as UTF-8
    725        and translated to UTF-16 or UTF-32 when the utf modifier  is  set.  For
    726        testing the 16-bit and 32-bit libraries in non-UTF mode, the utf8_input
    727        modifier can be used. It is mutually exclusive with utf. When  it  is
    728        set,  input  lines  are interpreted as UTF-8 as a means of specifying
    729        wide characters. More details are given in "Input encoding" above.
    730 
    731    Generating long repetitive patterns
    732 
    733        Some tests use long patterns that are very repetitive. Instead of  cre-
    734        ating  a very long input line for such a pattern, you can use a special
    735        repetition feature, similar to the  one  described  for  subject  lines
    736        above.  If  the  expand  modifier is present on a pattern, parts of the
    737        pattern that have the form
    738 
    739          \[<characters>]{<count>}
    740 
    741        are expanded before the pattern is passed to pcre2_compile(). For exam-
    742        ple, \[AB]{6000} is expanded to "ABAB..." 6000 times. This construction
    743        cannot be nested. An initial "\[" sequence is recognized only  if  "]{"
    744        followed  by  decimal  digits and "}" is found later in the pattern. If
    745        not, the characters remain in the pattern unaltered. The expand and hex
    746        modifiers are mutually exclusive.
    747 
    748        If  part  of an expanded pattern looks like an expansion, but is really
    749        part of the actual pattern, unwanted expansion can be avoided by giving
    750        two values in the quantifier. For example, \[AB]{6000,6000} is not rec-
    751        ognized as an expansion item.
    752 
    753        If the info modifier is set on an expanded pattern, the result  of  the
    754        expansion is included in the information that is output.
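
       The expansion step can be modelled with a  small  Python  sketch.  The
       function  name  is  hypothetical,  and  the  sketch  assumes  that  the
       repeated characters do not themselves contain  "]";  pcre2test's  own
       scanning may differ in such edge cases.

```python
import re

# Backslash, "[", the characters to repeat, "]{", decimal digits, "}".
_EXPANSION = re.compile(r"\\\[([^\]]*)\]\{(\d+)\}")


def expand_pattern(pattern):
    """Replace each \\[<characters>]{<count>} item by its repeated text.

    Anything that does not have exactly this form, including an item with
    two values in the quantifier, is left in the pattern unaltered.
    """
    return _EXPANSION.sub(lambda m: m.group(1) * int(m.group(2)), pattern)


print(expand_pattern(r"\[AB]{3}"))
print(expand_pattern(r"\[AB]{3,3}"))
```

       The  first call produces ABABAB. The second is returned unchanged, be-
       cause a quantifier with two values is not recognized as  an  expansion
       item.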
    755 
    756    JIT compilation
    757 
    758        Just-in-time  (JIT)  compiling  is  a heavyweight optimization that can
    759        greatly speed up pattern matching. See the pcre2jit  documentation  for
    760        details.  JIT  compiling  happens, optionally, after a pattern has been
    761        successfully compiled into an internal form. The JIT compiler  converts
    762        this to optimized machine code. It needs to know whether the match-time
    763        options PCRE2_PARTIAL_HARD and PCRE2_PARTIAL_SOFT are going to be used,
    764        because  different  code  is generated for the different cases. See the
    765        partial modifier in "Subject Modifiers" below for details of how  these
    766        options are specified for each match attempt.
    767 
    768        JIT  compilation  is  requested  by the jit pattern modifier, which may
    769        optionally be followed by an equals sign and a number in the range 0 to
    770        7.   The  three bits that make up the number specify which of the three
    771        JIT operating modes are to be compiled:
    772 
    773          1  compile JIT code for non-partial matching
    774          2  compile JIT code for soft partial matching
    775          4  compile JIT code for hard partial matching
    776 
    777        The possible values for the jit modifier are therefore:
    778 
    779          0  disable JIT
    780          1  normal matching only
    781          2  soft partial matching only
    782          3  normal and soft partial matching
    783          4  hard partial matching only
                 5  normal and hard partial matching
    784          6  soft and hard partial matching only
    785          7  all three modes
    786 
    787        If no number is given, 7 is  assumed.  The  phrase  "partial  matching"
    788        means a call to pcre2_match() with either the PCRE2_PARTIAL_SOFT or the
    789        PCRE2_PARTIAL_HARD option set. Note that such a call may return a  com-
    790        plete match; the options enable the possibility of a partial match, but
    791        do not require it. Note also that if you request JIT  compilation  only
    792        for  partial  matching  (for example, jit=2) but do not set the partial
    793        modifier on a subject line, that match will not use  JIT  code  because
    794        none was compiled for non-partial matching.
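
       The way the three bits combine can be shown with a short Python sketch.
       The names used here are hypothetical, and the sketch ignores  the  case
       of  incompatible  run-time  options; it only models which modes a given
       jit value compiles and whether a match attempt of  a  given  kind  has
       JIT code available.

```python
# Bit values as listed above.
JIT_MODES = {1: "non-partial", 2: "soft partial", 4: "hard partial"}


def jit_modes(value=7):
    """Return the matching modes compiled for a jit=<value> setting."""
    if not 0 <= value <= 7:
        raise ValueError("jit value must be in the range 0 to 7")
    return [name for bit, name in JIT_MODES.items() if value & bit]


def jit_code_available(jit_value, partial=None):
    """Report whether JIT code exists for a match attempt.

    partial is None for a normal match, or "soft" / "hard" when the
    corresponding partial matching option is set on the subject line.
    """
    needed = {None: 1, "soft": 2, "hard": 4}[partial]
    return bool(jit_value & needed)


print(jit_modes(6))
print(jit_code_available(2))
print(jit_code_available(2, "soft"))
```

       For  jit=2,  a  subject  line without a partial modifier finds no JIT
       code, whereas a soft partial match does.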
    795 
    796        If  JIT compilation is successful, the compiled JIT code will automati-
    797        cally be used when an appropriate type of match  is  run,  except  when
    798        incompatible  run-time options are specified. For more details, see the
    799        pcre2jit documentation. See also the jitstack modifier below for a  way
    800        of setting the size of the JIT stack.
    801 
    802        If  the  jitfast  modifier is specified, matching is done using the JIT
    803        "fast path" interface, pcre2_jit_match(), which skips some of the  san-
    804        ity  checks that are done by pcre2_match(), and of course does not work
    805        when JIT is not supported. If jitfast is specified without  jit,  jit=7
    806        is assumed.
    807 
    808        If  the jitverify modifier is specified, information about the compiled
    809        pattern shows whether JIT compilation was or  was  not  successful.  If
    810        jitverify  is  specified without jit, jit=7 is assumed. If JIT compila-
    811        tion is successful when jitverify is set, the text "(JIT)" is added  to
    812        the first output line after a match or non-match when JIT-compiled code
    813        was actually used in the match.
    814 
    815    Setting a locale
    816 
    817        The locale modifier must specify the name of a locale, for example:
    818 
    819          /pattern/locale=fr_FR
    820 
    821        The given locale is set, pcre2_maketables() is called to build a set of
    822        character  tables for the locale, and this is then passed to pcre2_com-
    823        pile() when compiling the regular expression. The same tables are  used
    824        when  matching the following subject lines. The locale modifier applies
    825        only to the pattern on which it appears, but can be given in a #pattern
    826        command  if a default is needed. Setting a locale and alternate charac-
    827        ter tables are mutually exclusive.
    828 
    829    Showing pattern memory
    830 
    831        The memory modifier causes the size in bytes of the memory used to hold
    832        the  compiled  pattern  to be output. This does not include the size of
    833        the pcre2_code block; it is just the actual compiled data. If the  pat-
    834        tern  is  subsequently  passed to the JIT compiler, the size of the JIT
    835        compiled code is also output. Here is an example:
    836 
    837            re> /a(b)c/jit,memory
    838          Memory allocation (code space): 21
    839          Memory allocation (JIT code): 1910
    840 
    841 
    842    Limiting nested parentheses
    843 
    844        The parens_nest_limit modifier sets a limit  on  the  depth  of  nested
    845        parentheses  in  a  pattern.  Breaching  the limit causes a compilation
    846        error.  The default for the library is set when  PCRE2  is  built,  but
    847        pcre2test  sets  its  own default of 220, which is required for running
    848        the standard test suite.
    849 
    850    Limiting the pattern length
    851 
    852        The max_pattern_length modifier sets a limit, in  code  units,  to  the
    853        length of pattern that pcre2_compile() will accept. Breaching the limit
    854        causes a compilation  error.  The  default  is  the  largest  number  a
    855        PCRE2_SIZE variable can hold (essentially unlimited).
    856 
    857    Using the POSIX wrapper API
    858 
    859        The  posix  and posix_nosub modifiers cause pcre2test to call PCRE2 via
    860        the POSIX wrapper API rather than its native API. When  posix_nosub  is
    861        used,  the  POSIX  option  REG_NOSUB  is passed to regcomp(). The POSIX
    862        wrapper supports only the 8-bit library. Note that it  does  not  imply
    863        POSIX matching semantics; for more detail see the pcre2posix documenta-
    864        tion. The following pattern modifiers set  options  for  the  regcomp()
    865        function:
    866 
    867          caseless           REG_ICASE
    868          multiline          REG_NEWLINE
    869          dotall             REG_DOTALL     )
    870          ungreedy           REG_UNGREEDY   ) These options are not part of
    871          ucp                REG_UCP        )   the POSIX standard
    872          utf                REG_UTF8       )
    873 
    874        The  regerror_buffsize  modifier  specifies a size for the error buffer
    875        that is passed to regerror() in the event of a compilation  error.  For
    876        example:
    877 
    878          /abc/posix,regerror_buffsize=20
    879 
    880        This  provides  a means of testing the behaviour of regerror() when the
    881        buffer is too small for the error message. If  this  modifier  has  not
    882        been set, a large buffer is used.
    883 
    884        The  aftertext  and  allaftertext  subject  modifiers work as described
    885        below. All other modifiers are either ignored, with a warning  message,
    886        or cause an error.
    887 
    888        The  pattern  is  passed  to  regcomp()  as a zero-terminated string by
    889        default, but if the use_length or hex modifiers are set,  the  REG_PEND
    890        extension is used to pass it by length.
    891 
    892    Testing the stack guard feature
    893 
    894        The  stackguard  modifier  is  used  to  test the use of pcre2_set_com-
    895        pile_recursion_guard(), a function that is  provided  to  enable  stack
    896        availability  to  be checked during compilation (see the pcre2api docu-
    897        mentation for details). If the number  specified  by  the  modifier  is
    898        greater than zero, pcre2_set_compile_recursion_guard() is called to set
    899        up a callback from pcre2_compile() to a local function. The argument it
    900        receives is the current parenthesis nesting depth; if this  is  greater
    901        than the value given by the modifier, non-zero is returned, causing the
    902        compilation to be aborted.
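
       The  guard  logic  can  be pictured with a toy Python model. It is not
       the PCRE2 API: both function names are hypothetical, and the  stand-in
       "compiler"  merely  counts parentheses, ignoring escapes and character
       classes.

```python
def make_stack_guard(limit):
    """Return a guard callback that rejects nesting deeper than limit."""
    def guard(depth):
        # Non-zero means "abort the compilation".
        return 1 if depth > limit else 0
    return guard


def toy_compile(pattern, guard):
    """Toy stand-in for a compiler that consults a guard callback.

    The guard is called with the current nesting depth each time a
    parenthesis is opened. Returns False if the guard aborts compilation.
    """
    depth = 0
    for ch in pattern:
        if ch == "(":
            depth += 1
            if guard(depth) != 0:
                return False
        elif ch == ")":
            depth -= 1
    return True


print(toy_compile("((a))", make_stack_guard(2)))
print(toy_compile("((a))", make_stack_guard(1)))
```

       With a limit of 2 the doubly nested pattern is accepted; with a  limit
       of 1 the guard returns non-zero at depth 2 and compilation is aborted.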
    903 
    904    Using alternative character tables
    905 
    906        The  value  specified for the tables modifier must be one of the digits
    907        0, 1, or 2. It causes a specific set of built-in character tables to be
    908        passed to pcre2_compile(). This is used in the PCRE2 tests to check be-
    909        haviour with different character tables. The digit specifies the tables
    910        as follows:
    911 
    912          0   do not pass any special character tables
    913          1   the default ASCII tables, as distributed in
    914                pcre2_chartables.c.dist
    915          2   a set of tables defining ISO 8859 characters
    916 
    917        In  table 2, some characters whose codes are greater than 128 are iden-
    918        tified as letters, digits, spaces,  etc.  Setting  alternate  character
    919        tables and a locale are mutually exclusive.
    920 
    921    Setting certain match controls
    922 
    923        The following modifiers are really subject modifiers, and are described
    924        under "Subject Modifiers" below. However, they may  be  included  in  a
    925        pattern's  modifier  list, in which case they are applied to every sub-
    926        ject line that is processed with that pattern. These modifiers  do  not
    927        affect the compilation process.
    928 
    929              aftertext                  show text after match
    930              allaftertext               show text after captures
    931              allcaptures                show all captures
    932              allusedtext                show all consulted text
    933              altglobal                  alternative global matching
    934          /g  global                     global matching
    935              jitstack=<n>               set size of JIT stack
    936              mark                       show mark values
    937              replace=<string>           specify a replacement string
    938              startchar                  show starting character when relevant
    939              substitute_extended        use PCRE2_SUBSTITUTE_EXTENDED
    940              substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
    941              substitute_unknown_unset   use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
    942              substitute_unset_empty     use PCRE2_SUBSTITUTE_UNSET_EMPTY
    943 
    944        These  modifiers may not appear in a #pattern command. If you want them
    945        as defaults, set them in a #subject command.
    946 
    947    Specifying literal subject lines
    948 
    949        If the subject_literal modifier is present on a pattern, all  the  sub-
    950        ject lines that it matches are taken as literal strings, with no inter-
    951        pretation of backslashes. It is not possible to set  subject  modifiers
    952        on  such  lines, but any that are set as defaults by a #subject command
    953        are recognized.
    954 
    955    Saving a compiled pattern
    956 
    957        When a pattern with the push modifier is successfully compiled,  it  is
    958        pushed  onto  a  stack  of compiled patterns, and pcre2test expects the
    959        next line to contain a new pattern (or a command) instead of a  subject
    960        line. This facility is used when saving compiled patterns to a file, as
    961        described in the section entitled "Saving and restoring  compiled  pat-
    962        terns"  below.  If pushcopy is used instead of push, a copy of the com-
    963        piled pattern is stacked, leaving the original  as  current,  ready  to
    964        match  the  following  input  lines. This provides a way of testing the
    965        pcre2_code_copy() function.   The  push  and  pushcopy   modifiers  are
    966        incompatible with pattern-line modifiers such  as  global  that  act  at
    967        match time. Any that are specified are ignored (for the stacked  copy),
    968        with a warning message, except for replace, which causes an error. Note
    969        that jitverify, which is allowed, does not carry through to any  subse-
    970        quent matching that uses a stacked pattern.
    971 
    972    Testing foreign pattern conversion
    973 
    974        The  experimental  foreign pattern conversion functions in PCRE2 can be
    975        tested by setting the convert modifier. Its argument is  a  colon-sepa-
    976        rated  list  of  options,  which  set  the  equivalent  option  for the
    977        pcre2_pattern_convert() function:
    978 
    979          glob                    PCRE2_CONVERT_GLOB
    980          glob_no_starstar        PCRE2_CONVERT_GLOB_NO_STARSTAR
    981          glob_no_wild_separator  PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR
    982          posix_basic             PCRE2_CONVERT_POSIX_BASIC
    983          posix_extended          PCRE2_CONVERT_POSIX_EXTENDED
    984          unset                   Unset all options
    985 
    986        The "unset" value is useful for turning off a default that has been set
    987        by a #pattern command. When one of these options is set, the input pat-
    988        tern is passed to pcre2_pattern_convert(). If the  conversion  is  suc-
    989        cessful,  the  result  is  reflected  in  the output and then passed to
    990        pcre2_compile(). The normal utf and no_utf_check options, if set, cause
    991        the  PCRE2_CONVERT_UTF  and  PCRE2_CONVERT_NO_UTF_CHECK  options  to be
    992        passed to pcre2_pattern_convert().
    993 
    994        By default, the conversion function is allowed to allocate a buffer for
    995        its  output.  However, if the convert_length modifier is set to a value
    996        greater than zero, pcre2test passes a buffer of the given length.  This
    997        makes it possible to test the length check.
    998 
    999        The  convert_glob_escape  and  convert_glob_separator  modifiers can be
   1000        used to specify the escape and separator characters for  glob  process-
   1001        ing, overriding the defaults, which are operating-system dependent.
   1002 
   1003 
   1004 SUBJECT MODIFIERS
   1005 
   1006        The modifiers that can appear in subject lines and the #subject command
   1007        are of two types.
   1008 
   1009    Setting match options
   1010 
   1011        The   following   modifiers   set   options   for   pcre2_match()    or
   1012        pcre2_dfa_match(). See pcre2api for a description of their effects.
   1013 
   1014              anchored                  set PCRE2_ANCHORED
   1015              endanchored               set PCRE2_ENDANCHORED
   1016              dfa_restart               set PCRE2_DFA_RESTART
   1017              dfa_shortest              set PCRE2_DFA_SHORTEST
   1018              no_jit                    set PCRE2_NO_JIT
   1019              no_utf_check              set PCRE2_NO_UTF_CHECK
   1020              notbol                    set PCRE2_NOTBOL
   1021              notempty                  set PCRE2_NOTEMPTY
   1022              notempty_atstart          set PCRE2_NOTEMPTY_ATSTART
   1023              noteol                    set PCRE2_NOTEOL
   1024              partial_hard (or ph)      set PCRE2_PARTIAL_HARD
   1025              partial_soft (or ps)      set PCRE2_PARTIAL_SOFT
   1026 
   1027        The  partial matching modifiers are provided with abbreviations because
   1028        they appear frequently in tests.
   1029 
   1030        If the posix or posix_nosub modifier was present on the pattern,  caus-
   1031        ing the POSIX wrapper API to be used, the only option-setting modifiers
   1032        that have any effect are notbol, notempty, and noteol, causing REG_NOT-
   1033        BOL,  REG_NOTEMPTY,  and  REG_NOTEOL,  respectively,  to  be  passed to
   1034        regexec(). The other modifiers are ignored, with a warning message.
   1035 
   1036        There is one additional modifier that can be used with the POSIX  wrap-
   1037        per. It is ignored (with a warning) if used for non-POSIX matching.
   1038 
   1039              posix_startend=<n>[:<m>]
   1040 
   1041        This  causes  the  subject  string  to be passed to regexec() using the
   1042        REG_STARTEND option, which uses offsets to specify which  part  of  the
   1043        string  is  searched.  If  only  one number is given, the end offset is
   1044        passed as the end of the subject string. For more detail  of  REG_STAR-
   1045        TEND,  see the pcre2posix documentation. If the subject string contains
   1046        binary zeros (coded as escapes such as \x{00}  because  pcre2test  does
   1047        not support actual binary zeros in its input), you must use posix_star-
   1048        tend to specify its length.
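
       The  handling  of  the two offsets can be sketched in Python. The func-
       tion names are hypothetical, and slicing the subject is  a  simplifica-
       tion  of what REG_STARTEND does; the sketch only shows how a missing end
       offset defaults to the end of the subject.

```python
def parse_startend(value, subject_length):
    """Parse "<n>" or "<n>:<m>" into (start, end) offsets.

    If only one number is given, the end offset is the subject length.
    """
    start_text, sep, end_text = value.partition(":")
    start = int(start_text)
    end = int(end_text) if sep else subject_length
    return start, end


def searched_part(subject, value):
    """Return the part of the subject selected by the two offsets."""
    start, end = parse_startend(value, len(subject))
    return subject[start:end]


print(parse_startend("2", 10))
print(searched_part(b"ab\x00cd", "1:4"))
```

       Because the length is given explicitly, a subject that contains  binary
       zeros, as in the second call, is handled in full.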
   1049 
   1050    Setting match controls
   1051 
   1052        The following modifiers affect the matching process  or  request  addi-
   1053        tional  information.  Some  of  them may also be specified on a pattern
   1054        line (see above), in which case they apply to every subject  line  that
   1055        is matched against that pattern.
   1056 
   1057              aftertext                  show text after match
   1058              allaftertext               show text after captures
   1059              allcaptures                show all captures
   1060              allusedtext                show all consulted text (non-JIT only)
   1061              altglobal                  alternative global matching
   1062              callout_capture            show captures at callout time
   1063              callout_data=<n>           set a value to pass via callouts
   1064              callout_error=<n>[:<m>]    control callout error
   1065              callout_extra              show extra callout information
   1066              callout_fail=<n>[:<m>]     control callout failure
   1067              callout_no_where           do not show position of a callout
   1068              callout_none               do not supply a callout function
   1069              copy=<number or name>      copy captured substring
   1070              depth_limit=<n>            set a depth limit
   1071              dfa                        use pcre2_dfa_match()
   1072              find_limits                find match and depth limits
   1073              get=<number or name>       extract captured substring
   1074              getall                     extract all captured substrings
   1075          /g  global                     global matching
   1076              heap_limit=<n>             set a limit on heap memory (Kbytes)
   1077              jitstack=<n>               set size of JIT stack
   1078              mark                       show mark values
   1079              match_limit=<n>            set a match limit
   1080              memory                     show heap memory usage
   1081              null_context               match with a NULL context
   1082              offset=<n>                 set starting offset
   1083              offset_limit=<n>           set offset limit
   1084              ovector=<n>                set size of output vector
   1085              recursion_limit=<n>        obsolete synonym for depth_limit
   1086              replace=<string>           specify a replacement string
   1087              startchar                  show startchar when relevant
   1088              startoffset=<n>            same as offset=<n>
   1089              substitute_extended        use PCRE2_SUBSTITUTE_EXTENDED
   1090              substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
   1091              substitute_unknown_unset   use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
   1092              substitute_unset_empty     use PCRE2_SUBSTITUTE_UNSET_EMPTY
   1093              zero_terminate             pass the subject as zero-terminated
   1094 
   1095        The effects of these modifiers are described in the following sections.
   1096        When matching via the POSIX wrapper API, the  aftertext,  allaftertext,
   1097        and  ovector subject modifiers work as described below. All other modi-
   1098        fiers are either ignored, with a warning message, or cause an error.
   1099 
   1100    Showing more text
   1101 
   1102        The aftertext modifier requests that as well as outputting the part  of
   1103        the subject string that matched the entire pattern, pcre2test should in
   1104        addition output the remainder of the subject string. This is useful for
   1105        tests where the subject contains multiple copies of the same substring.
   1106        The allaftertext modifier requests the same action  for  captured  sub-
   1107        strings as well as the main matched substring. In each case the remain-
   1108        der is output on the following line with a plus character following the
   1109        capture number.
   1110 
   1111        The  allusedtext modifier requests that all the text that was consulted
   1112        during a successful pattern match by the interpreter should  be  shown.
   1113        This  feature  is not supported for JIT matching, and if requested with
   1114        JIT it is ignored (with  a  warning  message).  Setting  this  modifier
   1115        affects the output if there is a lookbehind at the start of a match, or
   1116        a lookahead at the end, or if \K is used  in  the  pattern.  Characters
   1117        that  precede or follow the start and end of the actual match are indi-
   1118        cated in the output by '<' or '>' characters underneath them.  Here  is
   1119        an example:
   1120 
   1121            re> /(?<=pqr)abc(?=xyz)/
   1122          data> 123pqrabcxyz456\=allusedtext
   1123           0: pqrabcxyz
   1124              <<<   >>>
   1125 
   1126        This  shows  that  the  matched string is "abc", with the preceding and
   1127        following strings "pqr" and "xyz"  having  been  consulted  during  the
   1128        match (when processing the assertions).
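
               The layout of this output can be reproduced with a few lines of
               Python. This is only an illustrative sketch, not pcre2test's own
               code: the function name render_allusedtext and its four offset
               parameters are hypothetical, and the " 0: " prefix is copied
               from the example above.

```python
def render_allusedtext(subject, used_start, match_start, match_end, used_end):
    """Return the two output lines for one match (hypothetical helper).

    used_start/used_end bound all consulted text; match_start/match_end
    bound the actual match. All four are offsets into subject.
    """
    prefix = " 0: "
    text_line = prefix + subject[used_start:used_end]
    markers = (
        "<" * (match_start - used_start)      # consulted before the match
        + " " * (match_end - match_start)     # the match itself is unmarked
        + ">" * (used_end - match_end)        # consulted after the match
    )
    return text_line, " " * len(prefix) + markers


lines = render_allusedtext("123pqrabcxyz456", 3, 6, 9, 12)
print("\n".join(lines))
```

               With the offsets of the example above (consulted text 3 to 12,
               match 6 to 9), the two lines printed are the same as those
               shown by pcre2test.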
   1129 
   1130        The  startchar  modifier  requests  that the starting character for the
   1131        match be indicated, if it is different to  the  start  of  the  matched
   1132        string. The only time when this occurs is when \K has been processed as
   1133        part of the match. In this situation, the output for the matched string
   1134        is  displayed  from  the  starting  character instead of from the match
   1135        point, with circumflex characters under  the  earlier  characters.  For
   1136        example:
   1137 
   1138            re> /abc\Kxyz/
   1139          data> abcxyz\=startchar
   1140           0: abcxyz
   1141              ^^^
   1142 
   1143        Unlike  allusedtext, the startchar modifier can be used with JIT.  How-
   1144        ever, these two modifiers are mutually exclusive.
   1145 
   1146    Showing the value of all capture groups
   1147 
   1148        The allcaptures modifier requests that the values of all potential cap-
   1149        tured parentheses be output after a match. By default, only those up to
   1150        the highest one actually used in the match are output (corresponding to
   1151        the  return  code from pcre2_match()). Groups that did not take part in
   1152        the match are output as "<unset>". This modifier is  not  relevant  for
   1153        DFA  matching  (which does no capturing); it is ignored, with a warning
   1154        message, if present.
   1155 
   1156    Testing callouts
   1157 
   1158        A callout function is supplied when pcre2test calls the library  match-
   1159        ing  functions,  unless callout_none is specified. Its behaviour can be
   1160        controlled by various modifiers listed above  whose  names  begin  with
   1161        callout_. Details are given in the section entitled "Callouts" below.
   1162 
   1163    Finding all matches in a string
   1164 
   1165        Searching for all possible matches within a subject can be requested by
   1166        the global or altglobal modifier. After finding a match,  the  matching
   1167        function  is  called  again to search the remainder of the subject. The
   1168        difference between global and altglobal is that  the  former  uses  the
   1169        start_offset  argument  to  pcre2_match() or pcre2_dfa_match() to start
   1170        searching at a new point within the entire string (which is  what  Perl
   1171        does), whereas the latter passes over a shortened subject. This makes a
   1172        difference to the matching process if the pattern begins with a lookbe-
   1173        hind assertion (including \b or \B).
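
               The difference between the two approaches can be sketched with
               Python's re module standing in for PCRE2 (an assumption made
               only for illustration; pcre2test itself calls the PCRE2
               functions). In Python, the pos argument of search() plays the
               role of start_offset, and slicing the string plays the role of
               the shortened subject. The two function names are hypothetical.

```python
import re


def global_matches(pattern, subject):
    """Search the whole subject each time, moving only the start offset."""
    found, pos = [], 0
    while pos <= len(subject):
        m = pattern.search(subject, pos)   # text before pos stays visible
        if m is None:
            break
        found.append(m.group(0))
        pos = m.end() if m.end() > m.start() else m.end() + 1
    return found


def altglobal_matches(pattern, subject):
    """Search a shortened subject, so earlier text is no longer visible."""
    found, pos = [], 0
    while pos <= len(subject):
        m = pattern.search(subject[pos:])  # text before pos is cut away
        if m is None:
            break
        found.append(m.group(0))
        pos += m.end() if m.end() > m.start() else m.end() + 1
    return found


pattern = re.compile(r"\Bi(\w\w)")
print(global_matches(pattern, "Mississippi"))     # ['iss', 'iss', 'ipp']
print(altglobal_matches(pattern, "Mississippi"))  # ['iss', 'ipp']
```

               In the second call the middle "iss" is lost: once the subject
               has been shortened, that "i" is the first character of the
               string, so \B can no longer see the "s" that preceded it.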
   1174 
   1175        If  an  empty  string  is  matched,  the  next  match  is done with the
   1176        PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search
   1177        for another, non-empty, match at the same point in the subject. If this
   1178        match fails, the start offset is advanced,  and  the  normal  match  is
   1179        retried.  This  imitates the way Perl handles such cases when using the
   1180        /g modifier or the split() function.  Normally,  the  start  offset  is
   1181        advanced  by  one  character,  but if the newline convention recognizes
   1182        CRLF as a newline, and the current character is CR followed by  LF,  an
   1183        advance of two characters occurs.
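
               The rule for advancing the start offset can be written down as
               a small helper. This is a sketch in Python under stated
               assumptions: the name next_start_offset is hypothetical, the
               crlf_is_newline flag stands for "the newline convention
               recognizes CRLF", and offsets count Python characters rather
               than PCRE2 code units.

```python
def next_start_offset(subject, offset, crlf_is_newline):
    """Offset at which to retry after the anchored, non-empty retry fails."""
    if crlf_is_newline and subject[offset:offset + 2] == "\r\n":
        return offset + 2   # step over CR and LF together
    return offset + 1       # normal case: advance by one character


print(next_start_offset("a\r\nb", 1, True))    # 3
print(next_start_offset("a\r\nb", 1, False))   # 2
```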
   1184 
   1185    Testing substring extraction functions
   1186 
   1187        The  copy  and  get  modifiers  can  be  used  to  test  the pcre2_sub-
   1188        string_copy_xxx() and pcre2_substring_get_xxx() functions.  They can be
   1189        given  more than once, and each can specify a group name or number, for
   1190        example:
   1191 
   1192           abcd\=copy=1,copy=3,get=G1
   1193 
   1194        If the #subject command is used to set default copy and/or  get  lists,
   1195        these  can  be unset by specifying a negative number to cancel all num-
   1196        bered groups and an empty name to cancel all named groups.
   1197 
   1198        The getall modifier tests  pcre2_substring_list_get(),  which  extracts
   1199        all captured substrings.
   1200 
   1201        If  the  subject line is successfully matched, the substrings extracted
   1202        by the convenience functions are output with  C,  G,  or  L  after  the
   1203        string  number  instead  of  a colon. This is in addition to the normal
   1204        full list. The string length (that is, the return from  the  extraction
   1205        function) is given in parentheses after each substring, followed by the
   1206        name when the extraction was by name.
   1207 
   1208    Testing the substitution function
   1209 
   1210        If the replace modifier is  set,  the  pcre2_substitute()  function  is
   1211        called  instead of one of the matching functions. Note that replacement
   1212        strings cannot contain commas, because a comma signifies the end  of  a
   1213        modifier. This is not thought to be an issue in a test program.
   1214 
   1215        Unlike  subject strings, pcre2test does not process replacement strings
   1216        for escape sequences. In UTF mode, a replacement string is  checked  to
   1217        see  if it is a valid UTF-8 string. If so, it is correctly converted to
   1218        a UTF string of the appropriate code unit width. If it is not  a  valid
   1219        UTF-8  string, the individual code units are copied directly. This pro-
   1220        vides a means of passing an invalid UTF-8 string for testing purposes.
   1221 
   1222        The following modifiers set options (in addition to the normal  match
   1223        options) for pcre2_substitute():
   1224 
   1225          global                      PCRE2_SUBSTITUTE_GLOBAL
   1226          substitute_extended         PCRE2_SUBSTITUTE_EXTENDED
   1227          substitute_overflow_length  PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
   1228          substitute_unknown_unset    PCRE2_SUBSTITUTE_UNKNOWN_UNSET
   1229          substitute_unset_empty      PCRE2_SUBSTITUTE_UNSET_EMPTY
   1230 
   1231 
   1232        After  a  successful  substitution, the modified string is output, pre-
   1233        ceded by the number of replacements. This may be zero if there were  no
   1234        matches. Here is a simple example of a substitution test:
   1235 
   1236          /abc/replace=xxx
   1237              =abc=abc=
   1238           1: =xxx=abc=
   1239              =abc=abc=\=global
   1240           2: =xxx=xxx=
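
               For comparison, the same two substitutions can be reproduced
               with Python's re.subn(), which returns both the new string and
               the number of replacements. Python's re module is used here
               purely as a stand-in for pcre2_substitute(); limiting the count
               to 1 corresponds to omitting the global modifier.

```python
import re

subject = "=abc=abc="

# Without "global": replace only the first match.
first_only, n_first = re.subn(r"abc", "xxx", subject, count=1)

# With "global": replace every match.
all_matches, n_all = re.subn(r"abc", "xxx", subject)

print(f"{n_first:2d}: {first_only}")   #  1: =xxx=abc=
print(f"{n_all:2d}: {all_matches}")    #  2: =xxx=xxx=
```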
   1241 
   1242        Subject  and replacement strings should be kept relatively short (fewer
   1243        than 256 characters) for substitution tests, as fixed-size buffers  are
   1244        used.  To  make it easy to test for buffer overflow, if the replacement
   1245        string starts with a number in square brackets, that number  is  passed
   1246        to  pcre2_substitute()  as  the  size  of  the  output buffer, with the
   1247        replacement string starting at the next character. Here is  an  example
   1248        that tests the edge case:
   1249 
   1250          /abc/
   1251              123abc123\=replace=[10]XYZ
   1252           1: 123XYZ123
   1253              123abc123\=replace=[9]XYZ
   1254          Failed: error -47: no more memory
   1255 
   1256        The    default    action    of    pcre2_substitute()   is   to   return
   1257        PCRE2_ERROR_NOMEMORY when the output buffer is too small.  However,  if
   1258        the  PCRE2_SUBSTITUTE_OVERFLOW_LENGTH  option is set (by using the sub-
   1259        stitute_overflow_length modifier), pcre2_substitute() continues  to  go
   1260        through  the  motions of matching and substituting, in order to compute
   1261        the size of buffer that is required. When this happens, pcre2test shows
   1262        the required buffer length (which includes space for the trailing zero)
   1263        as part of the error message. For example:
   1264 
   1265          /abc/substitute_overflow_length
   1266              123abc123\=replace=[9]XYZ
   1267          Failed: error -47: no more memory: 10 code units are needed
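
               The arithmetic behind these two examples is that the output
               buffer must hold the result plus one code unit for the trailing
               zero. The following Python sketch models that check; the helper
               substitute_with_buffer is hypothetical, Python's re.sub() stands
               in for pcre2_substitute(), and string length is taken as the
               number of code units (true for the ASCII data used here).

```python
import re


def substitute_with_buffer(pattern, replacement, subject, buffer_size):
    """Replace the first match, enforcing a buffer size in code units.

    Hypothetical helper: returns (True, result) on success, or
    (False, needed_size) when the buffer is too small.
    """
    result = re.sub(pattern, replacement, subject, count=1)
    needed = len(result) + 1          # one extra unit for the trailing zero
    if needed > buffer_size:
        return False, needed          # analogous to PCRE2_ERROR_NOMEMORY
    return True, result


print(substitute_with_buffer(r"abc", "XYZ", "123abc123", 10))  # (True, '123XYZ123')
print(substitute_with_buffer(r"abc", "XYZ", "123abc123", 9))   # (False, 10)
```

               The result "123XYZ123" has nine characters, so ten code units
               are needed, which agrees with the figure in the error message
               above.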
   1268 
   1269        A replacement string is ignored with POSIX and DFA matching. Specifying
   1270        partial  matching  provokes  an  error return ("bad option value") from
   1271        pcre2_substitute().
   1272 
   1273    Setting the JIT stack size
   1274 
   1275        The jitstack modifier provides a way of setting the maximum stack  size
   1276        that  is  used  by the just-in-time optimization code. It is ignored if
   1277        JIT optimization is not being used. The value is a number of  kibibytes
   1278        (units  of  1024  bytes). Setting zero reverts to the default of 32KiB.
   1279        Providing a stack that is larger than the default is necessary only for
   1280        very  complicated  patterns.  If  jitstack is set non-zero on a subject
   1281        line it overrides any value that was set on the pattern.
   1282 
   1283    Setting heap, match, and depth limits
   1284 
   1285        The heap_limit, match_limit, and depth_limit modifiers set  the  appro-
   1286        priate  limits  in the match context. These values are ignored when the
   1287        find_limits modifier is specified.
   1288 
   1289    Finding minimum limits
   1290 
   1291        If the find_limits modifier is present on  a  subject  line,  pcre2test
   1292        calls  the  relevant matching function several times, setting different
   1293        values   in   the    match    context    via    pcre2_set_heap_limit(),
   1294        pcre2_set_match_limit(),  or pcre2_set_depth_limit() until it finds the
   1295        minimum value for each parameter that allows  the  match  to  complete
   1296        without error. If JIT is being used, only the match limit is relevant.
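
               One way to carry out such a search is to double a trial limit
               until the match completes and then bisect between the last
               failing and first succeeding values. The Python sketch below
               shows that idea only; it is not taken from pcre2test, and the
               callback completes (which would run the match with a given
               limit and report whether it finished without a limit error) is
               a hypothetical stand-in.

```python
def find_minimum_limit(completes, start=64):
    """Smallest limit n >= 1 for which completes(n) is true.

    completes is assumed to be monotonic: once a limit is large enough,
    every larger limit is large enough too.
    """
    low, high = 0, start            # low is a limit known (or assumed) to fail
    while not completes(high):      # grow until the match completes
        low, high = high, high * 2
    while high - low > 1:           # bisect between failure and success
        mid = (low + high) // 2
        if completes(mid):
            high = mid
        else:
            low = mid
    return high


# Stand-in for a real match that needs a limit of at least 37.
print(find_minimum_limit(lambda limit: limit >= 37))   # 37
```

               Doubling first keeps the number of trial matches small even
               when the minimum is large, which matters because each trial is
               a complete match attempt.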
   1297 
   1298        When using this modifier, the pattern should not contain any limit set-
   1299        tings such as (*LIMIT_MATCH=...)  within  it.  If  such  a  setting  is
   1300        present and is lower than the minimum matching value, the minimum value
   1301        cannot be found because pcre2_set_match_limit() etc. are only  able  to
   1302        reduce the value of an in-pattern limit; they cannot increase it.
   1303 
   1304        For  non-DFA  matching,  the minimum depth_limit number is a measure of
   1305        how much nested backtracking happens (that is, how deeply the pattern's
   1306        tree  is  searched).  In the case of DFA matching, depth_limit controls
   1307        the depth of recursive calls of the internal function that is used  for
   1308        handling pattern recursion, lookaround assertions, and atomic groups.
   1309 
   1310        For non-DFA matching, the match_limit number is a measure of the amount
   1311        of backtracking that takes place, and learning the minimum value can be
   1312        instructive.  For  most  simple matches, the number is quite small, but
   1313        for patterns with very large numbers of matching possibilities, it  can
   1314        become  large very quickly with increasing length of subject string. In
   1315        the case of DFA matching, match_limit  controls  the  total  number  of
   1316        calls, both recursive and non-recursive, to the internal matching func-
   1317        tion, thus controlling the overall amount of computing resource that is
   1318        used.
   1319 
   1320        For  both  kinds  of  matching,  the  heap_limit  number,  which  is in
   1321        kibibytes (units of 1024 bytes), limits the amount of heap memory  used
   1322        for matching. A value of zero disables the use of any heap memory; many
   1323        simple pattern matches can be done without using the heap, so  zero  is
   1324        not an unreasonable setting.
   1325 
   1326    Showing MARK names
   1327 
   1328 
   1329        The mark modifier causes the names from backtracking control verbs that
   1330        are returned from calls to pcre2_match() to be displayed. If a mark  is
   1331        returned  for a match, non-match, or partial match, pcre2test shows it.
   1332        For a match, it is on a line by itself, tagged with  "MK:".  Otherwise,
   1333        it is added to the non-match message.
   1334 
   1335    Showing memory usage
   1336 
   1337        The  memory modifier causes pcre2test to log the sizes of all heap mem-
   1338        ory  allocation  and  freeing  calls  that  occur  during  a  call   to
   1339        pcre2_match()  or  pcre2_dfa_match().  These  occur  only  when a match
   1340        requires a bigger vector than the default for remembering  backtracking
   1341        points  (pcre2_match())  or for internal workspace (pcre2_dfa_match()).
   1342        In many cases there will be no heap memory used and therefore no  addi-
   1343        tional output. No heap memory is allocated during matching with JIT, so
   1344        in that case the memory modifier never has any effect. For  this  modi-
   1345        fier  to  work,  the  null_context modifier must not be set on both the
   1346        pattern and the subject, though it can be set on one or the other.
   1347 
   1348    Setting a starting offset
   1349 
   1350        The offset modifier sets an offset  in  the  subject  string  at  which
   1351        matching starts. Its value is a number of code units, not characters.
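
               In UTF mode a character may occupy more than one code unit, so
               a character position has to be converted before it is given to
               the offset modifier. The Python helper below illustrates the
               conversion for the 8-bit, 16-bit, and 32-bit code unit widths;
               the name code_unit_offset is hypothetical and the helper is not
               part of pcre2test.

```python
def code_unit_offset(subject, char_index, width=8):
    """Convert a character index into an offset in UTF code units."""
    encodings = {8: "utf-8", 16: "utf-16-le", 32: "utf-32-le"}
    encoded = subject[:char_index].encode(encodings[width])
    return len(encoded) // (width // 8)


word = "na\u00efve"                      # "naive" with a diaeresis on the i
print(code_unit_offset(word, 3, 8))     # 4: U+00EF takes two UTF-8 code units
print(code_unit_offset(word, 3, 16))    # 3
print(code_unit_offset(word, 3, 32))    # 3
```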
   1352 
   1353    Setting an offset limit
   1354 
   1355        The  offset_limit  modifier  sets  a limit for unanchored matches. If a
   1356        match cannot be found starting at or before this offset in the subject,
   1357        a "no match" return is given. The data value is a number of code units,
   1358        not characters. When this modifier is used, the use_offset_limit  modi-
   1359        fier must have been set for the pattern; if not, an error is generated.
   1360 
   1361    Setting the size of the output vector
   1362 
   1363        The  ovector  modifier  applies  only  to  the subject line in which it
   1364        appears, though of course it can also be used to set  a  default  in  a
   1365        #subject  command. It specifies the number of pairs of offsets that are
   1366        available for storing matching information. The default is 15.
   1367 
   1368        A value of zero is useful when testing the POSIX API because it  causes
   1369        regexec() to be called with a NULL capture vector. When not testing the
   1370        POSIX API, a value of  zero  is  used  to  cause  pcre2_match_data_cre-
   1371        ate_from_pattern()  to  be  called, in order to create a match block of
   1372        exactly the right size for the pattern. (It is not possible to create a
   1373        match  block  with  a zero-length ovector; there is always at least one
   1374        pair of offsets.)
   1375 
   1376    Passing the subject as zero-terminated
   1377 
   1378        By default, the subject string is passed to a native API matching func-
   1379        tion with its correct length. In order to test the facility for passing
   1380        a zero-terminated string, the zero_terminate modifier is  provided.  It
   1381        causes  the length to be passed as PCRE2_ZERO_TERMINATED. When matching
   1382        via the POSIX interface, this modifier is ignored, with a warning.
   1383 
   1384        When testing pcre2_substitute(), this modifier also has the  effect  of
   1385        passing the replacement string as zero-terminated.
   1386 
   1387    Passing a NULL context
   1388 
   1389        Normally,   pcre2test   passes   a   context  block  to  pcre2_match(),
   1390        pcre2_dfa_match() or pcre2_jit_match(). If the null_context modifier is
   1391        set,  however,  NULL  is  passed. This is for testing that the matching
   1392        functions behave correctly in this case (they use default values). This
   1393        modifier  cannot  be used with the find_limits modifier or when testing
   1394        the substitution function.
   1395 
   1396 
   1397 THE ALTERNATIVE MATCHING FUNCTION
   1398 
   1399        By default,  pcre2test  uses  the  standard  PCRE2  matching  function,
   1400        pcre2_match() to match each subject line. PCRE2 also supports an alter-
   1401        native matching function, pcre2_dfa_match(), which operates in  a  dif-
   1402        ferent  way, and has some restrictions. The differences between the two
   1403        functions are described in the pcre2matching documentation.
   1404 
   1405        If the dfa modifier is set, the alternative matching function is  used.
   1406        This  function  finds all possible matches at a given point in the sub-
   1407        ject. If, however, the dfa_shortest modifier is set,  processing  stops
   1408        after  the  first  match is found. This is always the shortest possible
   1409        match.
   1410 
   1411 
   1412 DEFAULT OUTPUT FROM pcre2test
   1413 
   1414        This section describes the output when the  normal  matching  function,
   1415        pcre2_match(), is being used.
   1416 
   1417        When  a  match  succeeds,  pcre2test  outputs the list of captured sub-
   1418        strings, starting with number 0 for the string that matched  the  whole
   1419        pattern.    Otherwise,  it  outputs  "No  match"  when  the  return  is
   1420        PCRE2_ERROR_NOMATCH, or "Partial  match:"  followed  by  the  partially
   1421        matching  substring  when the return is PCRE2_ERROR_PARTIAL. (Note that
   1422        this is the entire substring that  was  inspected  during  the  partial
   1423        match;  it  may  include  characters before the actual match start if a
   1424        lookbehind assertion, \K, \b, or \B was involved.)
   1425 
   1426        For any other return, pcre2test outputs the PCRE2 negative error number
   1427        and  a  short  descriptive  phrase. If the error is a failed UTF string
   1428        check, the code unit offset of the start of the  failing  character  is
   1429        also output. Here is an example of an interactive pcre2test run.
   1430 
   1431          $ pcre2test
   1432          PCRE2 version 10.22 2016-07-29
   1433 
   1434            re> /^abc(\d+)/
   1435          data> abc123
   1436           0: abc123
   1437           1: 123
   1438          data> xyz
   1439          No match
   1440 
   1441        Unset capturing substrings that are not followed by one that is set are
   1442        not shown by pcre2test unless the allcaptures modifier is specified. In
   1443        the following example, there are two capturing substrings, but when the
   1444        first data line is matched, the second, unset substring is  not  shown.
   1445        An  "internal" unset substring is shown as "<unset>", as for the second
   1446        data line.
   1447 
   1448            re> /(a)|(b)/
   1449          data> a
   1450           0: a
   1451           1: a
   1452          data> b
   1453           0: b
   1454           1: <unset>
   1455           2: b
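
               The rule for which capture lines appear can be sketched in
               Python, using the re module as a stand-in for pcre2_match()
               (an assumption for illustration only). The helper name
               capture_lines is hypothetical; the line format is copied from
               the example above.

```python
import re


def capture_lines(match, allcaptures=False):
    """Format a match in the style of the capture list shown above."""
    groups = [match.group(0), *match.groups()]
    if not allcaptures:
        while groups[-1] is None:      # drop trailing unset groups
            groups.pop()
    return [
        f"{i:2d}: {'<unset>' if g is None else g}"
        for i, g in enumerate(groups)
    ]


pattern = re.compile(r"(a)|(b)")
print("\n".join(capture_lines(pattern.search("a"))))
print("\n".join(capture_lines(pattern.search("b"))))
print("\n".join(capture_lines(pattern.search("a"), allcaptures=True)))
```

               The first two calls print the same lines as the example above;
               the third adds " 2: <unset>" for the trailing unset group.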
   1456 
   1457        If the strings contain any non-printing characters, they are output  as
   1458        \xhh  escapes  if  the  value is less than 256 and UTF mode is not set.
   1459        Otherwise they are output as \x{hh...} escapes. See below for the defi-
   1460        nition  of  non-printing  characters. If the aftertext modifier is set,
   1461        the output for substring 0 is followed by the rest of  the  subject
   1462        string, identified by "0+" like this:
   1463 
   1464            re> /cat/aftertext
   1465          data> cataract
   1466           0: cat
   1467           0+ aract
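
               The escaping rule for non-printing characters given above can
               be expressed as a short Python function. This is a sketch under
               an explicit assumption: code points 32 to 126 are treated as
               printing, which may differ from the definition used by
               pcre2test when a locale is set. The name show_string is
               hypothetical.

```python
def show_string(text, utf=False):
    """Render text with non-printing characters as hex escapes.

    Assumption: only code points 32..126 count as printing.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if 32 <= cp <= 126:
            out.append(ch)
        elif cp < 256 and not utf:
            out.append(f"\\x{cp:02x}")       # \xhh form
        else:
            out.append(f"\\x{{{cp:02x}}}")   # \x{hh...} form
    return "".join(out)


print(show_string("a\x01b"))             # a\x01b
print(show_string("a\x01b", utf=True))   # a\x{01}b
print(show_string("\u0100"))             # \x{100}
```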
   1468 
   1469        If  global  matching  is  requested, the results of successive matching
   1470        attempts are output in sequence, like this:
   1471 
   1472            re> /\Bi(\w\w)/g
   1473          data> Mississippi
   1474           0: iss
   1475           1: ss
   1476           0: iss
   1477           1: ss
   1478           0: ipp
   1479           1: pp
   1480 
   1481        "No match" is output only if the first match attempt fails. Here is  an
   1482        example  of  a  failure  message (the offset 4 that is specified by the
   1483        offset modifier is past the end of the subject string):
   1484 
   1485            re> /xyz/
   1486          data> xyz\=offset=4
   1487          Error -24 (bad offset value)
   1488 
   1489        Note that whereas patterns can be continued over several lines (a plain
   1490        ">"  prompt  is used for continuations), subject lines may not. However,
   1491        newlines can be included in a subject by means of the \n escape (or \r,
   1492        \r\n, etc., depending on the newline sequence setting).
   1493 
   1494 
   1495 OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
   1496 
   1497        When the alternative matching function, pcre2_dfa_match(), is used, the
   1498        output consists of a list of all the matches that start  at  the  first
   1499        point in the subject where there is at least one match. For example:
   1500 
   1501            re> /(tang|tangerine|tan)/
   1502          data> yellow tangerine\=dfa
   1503           0: tangerine
   1504           1: tang
   1505           2: tan
   1506 
   1507        Using  the normal matching function on this data finds only "tang". The
   1508        longest matching string is always  given  first  (and  numbered  zero).
   1509        After  a  PCRE2_ERROR_PARTIAL  return,  the output is "Partial match:",
   1510        followed by the partially matching substring. Note  that  this  is  the
   1511        entire  substring  that  was inspected during the partial match; it may
   1512        include characters before the actual match start if a lookbehind asser-
   1513        tion, \b, or \B was involved. (\K is not supported for DFA matching.)
   1514 
   1515        If global matching is requested, the search for further matches resumes
   1516        at the end of the longest match. For example:
   1517 
   1518            re> /(tang|tangerine|tan)/g
   1519          data> yellow tangerine and tangy sultana\=dfa
   1520           0: tangerine
   1521           1: tang
   1522           2: tan
   1523           0: tang
   1524           1: tan
   1525           0: tan
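
               The shape of this output can be imitated in Python by brute
               force: at each starting point, try every possible end point and
               keep those where the whole span matches. This is not how
               pcre2_dfa_match() works internally, and Python's re module is
               only a stand-in; the sketch is meant to show which strings are
               reported and in what order. Both function names are
               hypothetical.

```python
import re


def all_matches_at(pattern, subject, start):
    """All matches beginning at start, longest first."""
    return [
        subject[start:end]
        for end in range(len(subject), start - 1, -1)
        if pattern.fullmatch(subject, start, end)
    ]


def dfa_style_search(pattern, subject, global_search=False):
    """Collect match sets in the order described above."""
    results, start = [], 0
    while start <= len(subject):
        matches = all_matches_at(pattern, subject, start)
        if not matches:
            start += 1
            continue
        results.append(matches)
        if not global_search:
            break
        # Resume at the end of the longest match (at least one step on).
        start += max(len(matches[0]), 1)
    return results


pattern = re.compile(r"(tang|tangerine|tan)")
print(dfa_style_search(pattern, "yellow tangerine"))
# [['tangerine', 'tang', 'tan']]
print(dfa_style_search(pattern, "yellow tangerine and tangy sultana", True))
# [['tangerine', 'tang', 'tan'], ['tang', 'tan'], ['tan']]
print(pattern.search("yellow tangerine").group(0))   # tang
```

               The last line shows the contrast with ordinary backtracking
               matching, which stops at the first alternative that succeeds.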
   1526 
   1527        The alternative matching function does not support  substring  capture,
   1528        so  the  modifiers  that are concerned with captured substrings are not
   1529        relevant.
   1530 
   1531 
   1532 RESTARTING AFTER A PARTIAL MATCH
   1533 
   1534        When the alternative matching function has given  the  PCRE2_ERROR_PAR-
   1535        TIAL return, indicating that the subject partially matched the pattern,
   1536        you can restart the match with additional subject data by means of  the
   1537        dfa_restart modifier. For example:
   1538 
   1539            re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
   1540          data> 23ja\=P,dfa
   1541          Partial match: 23ja
   1542          data> n05\=dfa,dfa_restart
   1543           0: n05
   1544 
   1545        For  further  information  about partial matching, see the pcre2partial
   1546        documentation.
   1547 
   1548 
   1549 CALLOUTS
   1550 
   1551        If the pattern contains any callout requests, pcre2test's callout func-
   1552        tion  is  called during matching unless callout_none is specified. This
   1553        works with both matching functions, and with JIT, though there are some
   1554        differences  in behaviour. The output for callouts with numerical argu-
   1555        ments and those with string arguments is slightly different.
   1556 
   1557    Callouts with numerical arguments
   1558 
   1559        By default, the callout function displays the callout number, the start
   1560        and  current positions in the subject text at the callout time, and the
   1561        next pattern item to be tested. For example:
   1562 
   1563          --->pqrabcdef
   1564            0    ^  ^     \d
   1565 
   1566        This output indicates that  callout  number  0  occurred  for  a  match
   1567        attempt  starting  at  the fourth character of the subject string, when
   1568        the pointer was at the seventh character, and  when  the  next  pattern
   1569        item  was  \d.  Just  one circumflex is output if the start and current
   1570        positions are the same, or if the current position precedes  the  start
   1571        position, which can happen if the callout is in a lookbehind assertion.
   1572 
   1573        Callouts numbered 255 are assumed to be automatic callouts, inserted as
   1574        a result of the auto_callout pattern modifier. In this case, instead of
   1575        showing  the  callout  number, the offset in the pattern, preceded by a
   1576        plus, is output. For example:
   1577 
   1578            re> /\d?[A-E]\*/auto_callout
   1579          data> E*
   1580          --->E*
   1581           +0 ^      \d?
   1582           +3 ^      [A-E]
   1583           +8 ^^     \*
   1584          +10 ^ ^
   1585           0: E*
   1586 
   1587        If a pattern contains (*MARK) items, an additional line is output when-
   1588        ever  a  change  of  latest mark is passed to the callout function. For
   1589        example:
   1590 
   1591            re> /a(*MARK:X)bc/auto_callout
   1592          data> abc
   1593          --->abc
   1594           +0 ^       a
   1595           +1 ^^      (*MARK:X)
   1596          +10 ^^      b
   1597          Latest Mark: X
   1598          +11 ^ ^     c
   1599          +12 ^  ^
   1600           0: abc
   1601 
   1602        The mark changes between matching "a" and "b", but stays the  same  for
   1603        the  rest  of  the match, so nothing more is output. If, as a result of
   1604        backtracking, the mark reverts to being unset, the  text  "<unset>"  is
   1605        output.
   1606 
   1607    Callouts with string arguments
   1608 
   1609        The output for a callout with a string argument is similar, except that
   1610        instead of outputting a callout number before the position  indicators,
   1611        the  callout  string  and  its  offset in the pattern string are output
   1612        before the reflection of the subject string, and the subject string  is
   1613        reflected for each callout. For example:
   1614 
   1615            re> /^ab(?C'first')cd(?C"second")ef/
   1616          data> abcdefg
   1617          Callout (7): 'first'
   1618          --->abcdefg
   1619              ^ ^         c
   1620          Callout (20): "second"
   1621          --->abcdefg
   1622              ^   ^       e
   1623           0: abcdef
   1624 
   1625 
   1626    Callout modifiers
   1627 
   1628        The  callout  function in pcre2test returns zero (carry on matching) by
   1629        default, but you can use a callout_fail modifier in a subject  line  to
   1630        change this and other parameters of the callout (see below).
   1631 
   1632        If the callout_capture modifier is set, the current captured groups are
   1633        output when a callout occurs. This is useful only for non-DFA matching,
   1634        as  pcre2_dfa_match()  does  not  support capturing, so no captures are
   1635        ever shown.
   1636 
   1637        The normal callout output, showing the callout number or pattern offset
   1638        (as  described above) is suppressed if the callout_no_where modifier is
   1639        set.
   1640 
   1641        When using the interpretive  matching  function  pcre2_match()  without
   1642        JIT,  setting  the callout_extra modifier causes additional output from
   1643        pcre2test's callout function to be generated. For the first callout  in
   1644        a  match  attempt at a new starting position in the subject, "New match
   1645        attempt" is output. If there has been a backtrack since the last  call-
   1646        out (or start of matching if this is the first callout), "Backtrack" is
   1647        output, followed by "No other matching paths" if  the  backtrack  ended
   1648        the previous match attempt. For example:
   1649 
   1650           re> /(a+)b/auto_callout,no_start_optimize,no_auto_possess
   1651          data> aac\=callout_extra
   1652          New match attempt
   1653          --->aac
   1654           +0 ^       (
   1655           +1 ^       a+
   1656           +3 ^ ^     )
   1657           +4 ^ ^     b
   1658          Backtrack
   1659          --->aac
   1660           +3 ^^      )
   1661           +4 ^^      b
   1662          Backtrack
   1663          No other matching paths
   1664          New match attempt
   1665          --->aac
   1666           +0  ^      (
   1667           +1  ^      a+
   1668           +3  ^^     )
   1669           +4  ^^     b
   1670          Backtrack
   1671          No other matching paths
   1672          New match attempt
   1673          --->aac
   1674           +0   ^     (
   1675           +1   ^     a+
   1676          Backtrack
   1677          No other matching paths
   1678          New match attempt
   1679          --->aac
   1680           +0    ^    (
   1681           +1    ^    a+
   1682          No match
   1683 
   1684        Notice  that  various  optimizations must be turned off if you want all
   1685        possible matching paths to be  scanned.  If  no_start_optimize  is  not
   1686        used,  there  is an immediate "no match", without any callouts, because
   1687        the starting optimization fails to find "b" in the  subject,  which  it
   1688        knows  must  be  present for any match. If no_auto_possess is not used,
   1689        the "a+" item is turned into "a++", which reduces the number  of  back-
   1690        tracks.
   1691 
   1692        The  callout_extra modifier has no effect if used with the DFA matching
   1693        function, or with JIT.
   1694 
   1695    Return values from callouts
   1696 
   1697        The default return from the callout  function  is  zero,  which  allows
   1698        matching to continue. The callout_fail modifier can be given one or two
   1699        numbers. If there is only one number, 1 is returned instead of 0 (caus-
   1700        ing matching to backtrack) when a callout of that number is reached. If
   1701        two numbers (<n>:<m>) are given, 1 is  returned  when  callout  <n>  is
   1702        reached  and  there  have been at least <m> callouts. The callout_error
   1703        modifier is similar, except that PCRE2_ERROR_CALLOUT is returned, caus-
   1704        ing  the entire matching process to be aborted. If both these modifiers
   1705        are set for the same callout number,  callout_error  takes  precedence.
   1706        Note  that  callouts  with string arguments are always given the number
   1707        zero.
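
               The return-value rules above can be summarised in a  short  C
               sketch. It is a model written from this description, not an
               extract from pcre2test's source. The type and function names
               are hypothetical, SKETCH_ERROR_CALLOUT is a placeholder for
               PCRE2_ERROR_CALLOUT from pcre2.h, and the sketch assumes that
               the callout count includes the callout being processed.

```c
#include <stdint.h>

/* Stand-ins for the values the callout function can return. The real
   PCRE2_ERROR_CALLOUT constant comes from pcre2.h; a placeholder value is
   used here so that the sketch compiles without the library. */
enum {
  SKETCH_CONTINUE      = 0,   /* carry on matching */
  SKETCH_BACKTRACK     = 1,   /* force a backtrack at this point */
  SKETCH_ERROR_CALLOUT = -1   /* placeholder: abort the whole match */
};

/* Hypothetical holder for one "<n>" or "<n>:<m>" modifier setting. */
typedef struct {
  int      is_set;      /* non-zero if the modifier was given */
  uint32_t number;      /* <n>: the callout number that triggers it */
  uint32_t min_count;   /* <m>: callouts required so far; 0 if only <n> given */
} sketch_trigger;

/* True when the trigger applies to this callout. */
static int sketch_trigger_fires(const sketch_trigger *t,
                                uint32_t callout_number,
                                uint32_t callouts_so_far)
{
  return t->is_set &&
         callout_number == t->number &&
         callouts_so_far >= t->min_count;
}

/* callout_number is 0 for callouts with string arguments.
   callouts_so_far is assumed to include the current callout. */
static int sketch_callout_return(const sketch_trigger *fail,
                                 const sketch_trigger *error,
                                 uint32_t callout_number,
                                 uint32_t callouts_so_far)
{
  /* callout_error takes precedence when both are set for the same number. */
  if (sketch_trigger_fires(error, callout_number, callouts_so_far))
    return SKETCH_ERROR_CALLOUT;
  if (sketch_trigger_fires(fail, callout_number, callouts_so_far))
    return SKETCH_BACKTRACK;
  return SKETCH_CONTINUE;
}
```

               With callout_fail=2:3 modelled as {1, 2, 3}, the function
               returns 0 until callout number 2 is reached with at least
               three callouts counted, and 1 from then on for that number.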
   1708 
   1709        The callout_data modifier can be given an unsigned or a  negative  num-
   1710        ber.   This  is  set  as the "user data" that is passed to the matching
   1711        function, and passed back when the callout  function  is  invoked.  Any
   1712        value  other  than  zero  is  used as a return from pcre2test's callout
   1713        function.
   1714 
   1715        Inserting callouts can be helpful when using pcre2test to check compli-
   1716        cated  regular expressions. For further information about callouts, see
   1717        the pcre2callout documentation.
   1718 
   1719 
   1720 NON-PRINTING CHARACTERS
   1721 
   1722        When pcre2test is outputting text in the compiled version of a pattern,
   1723        bytes  other  than 32-126 are always treated as non-printing characters
   1724        and are therefore shown as hex escapes.
   1725 
   1726        When pcre2test is outputting text that is a matched part of  a  subject
   1727        string,  it behaves in the same way, unless a different locale has been
   1728        set for the pattern (using the locale  modifier).  In  this  case,  the
   1729        isprint()  function  is  used  to distinguish printing and non-printing
   1730        characters.
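
               The rule can be pictured with the following C sketch. It is
               illustrative only: the function name is hypothetical, and the
               "\xhh" spelling of the escape is an assumption about the
               output format rather than something stated in this section.

```c
#include <ctype.h>
#include <stdio.h>

/* Formats one byte into buf the way the text above describes.
   use_locale == 0: only bytes 32-126 count as printing characters.
   use_locale != 0: isprint() decides, so the current locale matters.
   Returns the value of snprintf(), i.e. the number of characters needed. */
static int sketch_show_byte(unsigned char c, int use_locale,
                            char *buf, size_t size)
{
  int printable = use_locale ? (isprint(c) != 0) : (c >= 32 && c <= 126);

  if (printable)
    return snprintf(buf, size, "%c", c);

  /* Assumed escape format: backslash, 'x', two lower-case hex digits. */
  return snprintf(buf, size, "\\x%02x", (unsigned int)c);
}
```

               In this sketch a tab (byte 9) is rendered as \x09 and the
               letter A is passed through unchanged.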
   1731 
   1732 
   1733 SAVING AND RESTORING COMPILED PATTERNS
   1734 
   1735        It is possible to save compiled patterns  on  disc  or  elsewhere,  and
   1736        reload them later, subject to a number of restrictions. JIT data cannot
   1737        be saved. The host on which the patterns are reloaded must  be  running
   1738        the same version of PCRE2, with the same code unit width, and must also
   1739        have the same endianness, pointer width  and  PCRE2_SIZE  type.  Before
   1740        compiled  patterns  can be saved they must be serialized, that is, con-
   1741        verted to a stream of bytes. A single byte stream may contain any  num-
   1742        ber  of  compiled  patterns,  but  they must all use the same character
   1743        tables. A single copy of the tables is included in the byte stream (its
   1744        size is 1088 bytes).
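
               The reload restrictions amount to comparing a handful of host
               properties. The C sketch below shows one way to picture that
               comparison. It is not how PCRE2 itself performs the check:
               every name in it is hypothetical, and it assumes that
               PCRE2_SIZE has the same width as size_t.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record of the properties that must match on reload. */
typedef struct {
  char    version[16];       /* PCRE2 version string in use */
  uint8_t code_unit_bytes;   /* 1, 2 or 4 for the 8-, 16- or 32-bit library */
  uint8_t pointer_bytes;     /* pointer width */
  uint8_t size_type_bytes;   /* width of PCRE2_SIZE (assumed to be size_t) */
  uint8_t little_endian;     /* 1 on a little-endian host, else 0 */
} sketch_host;

/* Describes the host this code is running on. */
static sketch_host sketch_this_host(const char *version,
                                    uint8_t code_unit_bytes)
{
  sketch_host h;
  const uint16_t probe = 1;

  memset(&h, 0, sizeof h);
  snprintf(h.version, sizeof h.version, "%s", version);
  h.code_unit_bytes = code_unit_bytes;
  h.pointer_bytes   = (uint8_t)sizeof(void *);
  h.size_type_bytes = (uint8_t)sizeof(size_t);
  /* The low-order byte comes first in memory on a little-endian host. */
  h.little_endian   = (uint8_t)(*(const unsigned char *)&probe == 1);
  return h;
}

/* Non-zero only if every property listed in the text above agrees. */
static int sketch_can_reload(const sketch_host *saved, const sketch_host *now)
{
  return strcmp(saved->version, now->version) == 0 &&
         saved->code_unit_bytes == now->code_unit_bytes &&
         saved->pointer_bytes   == now->pointer_bytes &&
         saved->size_type_bytes == now->size_type_bytes &&
         saved->little_endian   == now->little_endian;
}
```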
   1745 
   1746        The  functions  whose  names  begin  with pcre2_serialize_ are used for
   1747        serializing and de-serializing. They are described in the  pcre2serial-
   1748        ize  documentation.  In  this  section  we  describe  the  features  of
   1749        pcre2test that can be used to test these functions.
   1750 
   1751        Note that "serialization" in PCRE2 does not convert  compiled  patterns
   1752        to  an  abstract format, as serialization in Java or .NET does. It just
   1753        produces a reloadable byte code stream, hence the restrictions above.
   1754 
   1755        In pcre2test, when a pattern with the  push  modifier  is  successfully
   1756        compiled, it is pushed onto a stack of compiled patterns, and pcre2test
   1757        expects the next line to contain a new pattern (or command) instead  of
   1758        a subject line. By contrast, the pushcopy modifier causes a copy of the
   1759        compiled pattern to be stacked,  leaving  the  original  available  for
   1760        immediate matching. By using push and/or pushcopy, a number of patterns
   1761        can be compiled and retained. These  modifiers  are  incompatible  with
   1762        posix, and control modifiers that act at match time are ignored (with a
   1763        message) for the stacked patterns. The jitverify modifier applies  only
   1764        at compile time.
   1765 
   1766        The command
   1767 
   1768          #save <filename>
   1769 
   1770        causes all the stacked patterns to be serialized and the result written
   1771        to the named file. Afterwards, all the stacked patterns are freed.  The
   1772        command
   1773 
   1774          #load <filename>
   1775 
   1776        reads  the  data in the file, and then arranges for it to be de-serial-
   1777        ized, with the resulting compiled patterns added to the pattern  stack.
   1778        The  pattern  on the top of the stack can be retrieved by the #pop com-
   1779        mand, which must be followed by  lines  of  subjects  that  are  to  be
   1780        matched  with  the pattern, terminated as usual by an empty line or end
   1781        of file. This command may be followed by  a  modifier  list  containing
   1782        only  control  modifiers that act after a pattern has been compiled. In
   1783        particular,  hex,  posix,  posix_nosub,  push,  and  pushcopy  are  not
   1784        allowed, nor are any option-setting modifiers. The JIT  modifiers  are,
   1785        however,  permitted.  Here  is  an  example  that saves and reloads two
   1786        patterns.
   1787 
   1788          /abc/push
   1789          /xyz/push
   1790          #save tempfile
   1791          #load tempfile
   1792          #pop info
   1793          xyz
   1794 
   1795          #pop jit,bincode
   1796          abc
   1797 
   1798        If  jitverify  is  used with #pop, it does not automatically imply jit,
   1799        which is different behaviour from when it is used on a pattern.
   1800 
   1801        The #popcopy command is analogous to the pushcopy modifier in  that  it
   1802        makes current a copy of the topmost stack pattern, leaving the original
   1803        still on the stack.
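
               The stack behaviour described in this section can be modelled
               with the C sketch below. It is an illustration under stated
               assumptions, not pcre2test's implementation: plain strings
               stand in for compiled patterns, all names are hypothetical,
               and the stack limit is an arbitrary value chosen for the
               sketch.

```c
#include <stddef.h>

#define SKETCH_STACK_MAX 20   /* arbitrary limit chosen for this sketch */

typedef struct {
  const char *items[SKETCH_STACK_MAX];  /* stand-ins for compiled patterns */
  size_t      depth;                    /* number of stacked patterns */
} sketch_stack;

/* push modifier, or #load: add a pattern to the top of the stack.
   Returns 0 on success, -1 if the stack is full. */
static int sketch_push(sketch_stack *s, const char *pattern)
{
  if (s->depth >= SKETCH_STACK_MAX) return -1;
  s->items[s->depth++] = pattern;
  return 0;
}

/* #pop: remove the topmost pattern and make it current.
   Returns NULL if the stack is empty. */
static const char *sketch_pop(sketch_stack *s)
{
  if (s->depth == 0) return NULL;
  return s->items[--s->depth];
}

/* #popcopy: make the topmost pattern current but leave it stacked.
   A real implementation would duplicate the compiled pattern; this
   sketch simply returns the same pointer. */
static const char *sketch_popcopy(const sketch_stack *s)
{
  if (s->depth == 0) return NULL;
  return s->items[s->depth - 1];
}

/* #save: once the patterns have been written out, the stack is emptied. */
static void sketch_save_and_clear(sketch_stack *s)
{
  s->depth = 0;
}
```

               Pushing "abc" and then "xyz" and popping twice yields "xyz"
               first and "abc" second, which is the order shown in the
               example earlier in this section.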
   1804 
   1805 
   1806 SEE ALSO
   1807 
   1808        pcre2(3), pcre2api(3), pcre2callout(3), pcre2jit(3),  pcre2matching(3),
   1809        pcre2partial(3), pcre2pattern(3), pcre2serialize(3).
   1810 
   1811 
   1812 AUTHOR
   1813 
   1814        Philip Hazel
   1815        University Computing Service
   1816        Cambridge, England.
   1817 
   1818 
   1819 REVISION
   1820 
   1821        Last updated: 21 July 2018
   1822        Copyright (c) 1997-2018 University of Cambridge.
   1823