PCRE2GREP(1)                General Commands Manual               PCRE2GREP(1)



NAME
       pcre2grep - a grep with Perl-compatible regular expressions.

SYNOPSIS
       pcre2grep [options] [long options] [pattern] [path1 path2 ...]


DESCRIPTION

       pcre2grep searches files for character patterns, in the same way as
       other grep commands do, but it uses the PCRE2 regular expression
       library to support patterns that are compatible with the regular
       expressions of Perl 5. See pcre2syntax(3) for a quick-reference summary
       of pattern syntax, or pcre2pattern(3) for a full description of the
       syntax and semantics of the regular expressions that PCRE2 supports.

       Patterns, whether supplied on the command line or in a separate file,
       are given without delimiters. For example:

         pcre2grep Thursday /etc/motd

       If you attempt to use delimiters (for example, by surrounding a pattern
       with slashes, as is common in Perl scripts), they are interpreted as
       part of the pattern. Quotes can of course be used to delimit patterns
       on the command line because they are interpreted by the shell, and
       indeed quotes are required if a pattern contains white space or shell
       metacharacters.

       The first argument that follows any option settings is treated as the
       single pattern to be matched when neither -e nor -f is present.
       Conversely, when one or both of these options are used to specify
       patterns, all arguments are treated as path names. At least one of -e,
       -f, or an argument pattern must be provided.

       If no files are specified, pcre2grep reads the standard input. The
       standard input can also be referenced by a name consisting of a single
       hyphen. For example:

         pcre2grep some-pattern file1 - file3

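       The argument rules above can be modelled as a small Python function.
       This is an illustrative sketch, not pcre2grep's source: split_arguments
       is a hypothetical helper that assumes option parsing (including --) has
       already removed the options, and it ignores where the -e and -f
       patterns themselves come from.

```python
def split_arguments(args, have_e_or_f):
    """Return (argument_pattern, paths) for the non-option arguments."""
    if have_e_or_f:
        # With -e or -f, every argument is a path name.
        pattern, paths = None, list(args)
    elif args:
        # Otherwise the first argument is the single pattern.
        pattern, paths = args[0], list(args[1:])
    else:
        raise ValueError("one of -e, -f, or an argument pattern is required")
    # With no paths, "-" (the standard input) is searched.
    return pattern, paths or ["-"]
```

       For the two command lines shown above, the function returns
       ("Thursday", ["/etc/motd"]) and ("some-pattern", ["file1", "-",
       "file3"]) respectively.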
       Input files are searched line by line. By default, each line that
       matches a pattern is copied to the standard output, and if there is
       more than one file, the file name is output at the start of each line,
       followed by a colon. However, there are options that can change how
       pcre2grep behaves. In particular, the -M option makes it possible to
       search for strings that span line boundaries. What defines a line
       boundary is controlled by the -N (--newline) option.

       The amount of memory used for buffering files that are being scanned is
       controlled by parameters that can be set by the --buffer-size and
       --max-buffer-size options. The first of these sets the size of buffer
       that is obtained at the start of processing. If an input file contains
       very long lines, a larger buffer may be needed; this is handled by
       automatically extending the buffer, up to the limit specified by
       --max-buffer-size. The default values for these parameters can be set
       when pcre2grep is built; if nothing is specified, the defaults are set
       to 20KiB and 1MiB respectively. An error occurs if a line is too long
       and the buffer can no longer be expanded.

       The block of memory that is actually used is three times the "buffer
       size", to allow for buffering "before" and "after" lines. If the buffer
       size is too small, fewer than requested "before" and "after" lines may
       be output.

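       As a worked example of the arithmetic, the following Python sketch
       computes the memory used with the default sizes. buffer_memory is a
       hypothetical helper, and it assumes that the factor of three also
       applies once the buffer has been extended to --max-buffer-size; this
       page does not state that explicitly.

```python
KIB = 1024
MIB = 1024 * 1024

def buffer_memory(buffer_size=20 * KIB, max_buffer_size=1 * MIB):
    """Return (initial_bytes, ceiling_bytes) used for buffering input.

    The allocated block is three times the buffer size, leaving room for
    "before" and "after" context lines. Assumption: the same factor
    applies when the buffer has grown to its maximum size.
    """
    return 3 * buffer_size, 3 * max_buffer_size
```

       With the built-in defaults this gives 61440 bytes (60KiB) at the start
       and at most 3145728 bytes (3MiB).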
       Patterns can be no longer than 8KiB or BUFSIZ bytes, whichever is the
       greater. BUFSIZ is defined in <stdio.h>. When there is more than one
       pattern (specified by the use of -e and/or -f), each pattern is applied
       to each line in the order in which they are defined, except that all
       the -e patterns are tried before the -f patterns.

       By default, as soon as one pattern matches a line, no further patterns
       are considered. However, if --colour (or --color) is used to colour the
       matching substrings, or if --only-matching, --file-offsets, or
       --line-offsets is used to output only the part of the line that matched
       (either shown literally, or as an offset), scanning resumes immediately
       following the match, so that further matches on the same line can be
       found. If there are multiple patterns, they are all tried on the
       remainder of the line, but patterns that follow the one that matched
       are not tried on the earlier part of the line.

       This behaviour means that the order in which multiple patterns are
       specified can affect the output when one of the above options is used.
       This is no longer the same behaviour as GNU grep, which now manages to
       display earlier matches for later patterns (as long as there is no
       overlap).

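       The scanning rule can be modelled in Python to show why pattern order
       matters. matched_parts is a hypothetical helper, Python's re module
       stands in for PCRE2, and the patterns are assumed not to match the
       empty string.

```python
import re

def matched_parts(line, patterns):
    """Return the parts of line that -o would show for several patterns."""
    compiled = [re.compile(p) for p in patterns]
    found, pos = [], 0
    while pos < len(line):
        for rx in compiled:
            # Try each pattern, in order, on the remainder of the line.
            m = rx.search(line, pos)
            if m and m.end() > m.start():
                found.append(m.group())
                pos = m.end()  # resume scanning just after this match
                break
        else:
            break  # no pattern matches the remainder of the line
    return found
```

       On the line "Y then X", the two patterns X and Y (in that order) yield
       only ["X"], because Y is never tried on the text before X, whereas the
       single pattern X|Y yields ["Y", "X"].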
       Patterns that can match an empty string are accepted, but empty string
       matches are never recognized. An example is the pattern
       "(super)?(man)?", in which all components are optional. This pattern
       finds all occurrences of both "super" and "man"; the output differs
       from matching with "super|man" when only the matching substrings are
       being shown.

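       The difference in output can be approximated in Python. Python's re
       module is not PCRE2, and discarding empty results from re.finditer only
       approximates the rule that empty matches are never recognized, but for
       the pattern above it gives the behaviour described. nonempty_matches is
       a hypothetical helper.

```python
import re

def nonempty_matches(pattern, line):
    """Approximate -o output for one pattern, skipping empty matches."""
    return [m.group() for m in re.finditer(pattern, line) if m.group()]
```

       On the line "superman", (super)?(man)? yields the single match
       "superman", whereas super|man yields "super" and "man" separately.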
       If the LC_ALL or LC_CTYPE environment variable is set, pcre2grep uses
       the value to set a locale when calling the PCRE2 library. The --locale
       option can be used to override this.


SUPPORT FOR COMPRESSED FILES

       It is possible to compile pcre2grep so that it uses libz or libbz2 to
       read compressed files whose names end in .gz or .bz2, respectively. You
       can find out whether your pcre2grep binary has support for one or both
       of these file types by running it with the --help option. If the
       appropriate support is not present, all files are treated as plain
       text. The standard input is always so treated. When input is from a
       compressed .gz or .bz2 file, the --line-buffered option is ignored.

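       A rough Python analogue of the file-name rule, assuming a pcre2grep
       built with both libz and libbz2 support. Python's gzip and bz2 modules
       stand in for those libraries, and open_input is a hypothetical helper.

```python
import bz2
import gzip
import sys

def open_input(path):
    """Open path for reading as bytes, decompressing by file-name suffix."""
    if path == "-":
        return sys.stdin.buffer  # the standard input is always plain text
    if path.endswith(".gz"):
        return gzip.open(path, "rb")
    if path.endswith(".bz2"):
        return bz2.open(path, "rb")
    return open(path, "rb")  # any other name is read as plain text
```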

BINARY FILES

       By default, a file that contains a binary zero byte within the first
       1024 bytes is identified as a binary file, and is processed specially.
       (GNU grep identifies binary files in this manner.) However, if the
       newline type is specified as "nul", that is, the line terminator is a
       binary zero, the test for a binary file is not applied. See the
       --binary-files option for a means of changing the way binary files are
       handled.

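       The default binary-file test is simple enough to express directly.
       looks_binary is a hypothetical Python helper, not pcre2grep's code; it
       inspects the first 1024 bytes and skips the test when the newline type
       is "nul".

```python
def looks_binary(data, newline_is_nul=False):
    """Return True if data would be identified as a binary file."""
    if newline_is_nul:
        return False  # zero bytes are line terminators, so no test
    return b"\x00" in data[:1024]  # a zero byte within the first 1024 bytes
```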

BINARY ZEROS IN PATTERNS

       Patterns passed from the command line are strings that are terminated
       by a binary zero, so cannot contain internal zeros. However, patterns
       that are read from a file via the -f option may contain binary zeros.


OPTIONS

       The order in which some of the options appear can affect the output.
       For example, both the -H and -l options affect the printing of file
       names. Whichever comes later in the command line will be the one that
       takes effect. Similarly, except where noted below, if an option is
       given twice, the later setting is used. Numerical values for options
       may be followed by K or M, to signify multiplication by 1024 or
       1024*1024 respectively.

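       The K and M multipliers can be illustrated with a small Python parser.
       parse_number is a hypothetical helper; it accepts only upper-case
       suffixes, because that is all this page mentions.

```python
def parse_number(text):
    """Parse a numeric option value with an optional K or M suffix."""
    multipliers = {"K": 1024, "M": 1024 * 1024}
    if text and text[-1] in multipliers:
        return int(text[:-1]) * multipliers[text[-1]]
    return int(text)  # no suffix: a plain decimal number
```

       For example, parse_number("20K") returns 20480, the default
       --buffer-size, and parse_number("1M") returns 1048576, the default
       --max-buffer-size.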
       --        This terminates the list of options. It is useful if the next
                 item on the command line starts with a hyphen but is not an
                 option. This allows for the processing of patterns and file
                 names that start with hyphens.

       -A number, --after-context=number
                 Output up to number lines of context after each matching
                 line. Fewer lines are output if the next match or the end of
                 the file is reached, or if the processing buffer size has
                 been set too small. If file names and/or line numbers are
                 being output, a hyphen separator is used instead of a colon
                 for the context lines. A line containing "--" is output
                 between each group of lines, unless they are in fact
                 contiguous in the input file. The value of number is expected
                 to be relatively small. When -c is used, -A is ignored.

       -a, --text
                 Treat binary files as text. This is equivalent to
                 --binary-files=text.

       -B number, --before-context=number
                 Output up to number lines of context before each matching
                 line. Fewer lines are output if the previous match or the
                 start of the file is within number lines, or if the
                 processing buffer size has been set too small. If file names
                 and/or line numbers are being output, a hyphen separator is
                 used instead of a colon for the context lines. A line
                 containing "--" is output between each group of lines, unless
                 they are in fact contiguous in the input file. The value of
                 number is expected to be relatively small. When -c is used,
                 -B is ignored.

       --binary-files=word
                 Specify how binary files are to be processed. If the word is
                 "binary" (the default), pattern matching is performed on
                 binary files, but the only output is "Binary file <name>
                 matches" when a match succeeds. If the word is "text", which
                 is equivalent to the -a or --text option, binary files are
                 processed in the same way as any other file. In this case,
                 when a match succeeds, the output may be binary garbage,
                 which can have nasty effects if sent to a terminal. If the
                 word is "without-match", which is equivalent to the -I
                 option, binary files are not processed at all; they are
                 assumed not to be of interest and are skipped without causing
                 any output or affecting the return code.

       --buffer-size=number
                 Set the parameter that controls how much memory is obtained
                 at the start of processing for buffering files that are being
                 scanned. See also --max-buffer-size below.

       -C number, --context=number
                 Output number lines of context both before and after each
                 matching line. This is equivalent to setting both -A and -B
                 to the same value.

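                 The grouping rules for context lines can be modelled in
                 Python, as they would appear with line numbers shown.
                 with_context is a hypothetical helper; it does not model the
                 buffer-size limit or file-name prefixes.

```python
def with_context(lines, is_match, before=0, after=0):
    """Model of -B/-A output with line numbers, as a list of strings.

    Matching lines are shown as "N:text", context lines as "N-text", and
    "--" separates groups that are not contiguous in the input.
    """
    kind = {}  # line index -> ":" for a match, "-" for context
    for i, line in enumerate(lines):
        if is_match(line):
            for j in range(max(0, i - before), min(len(lines), i + after + 1)):
                kind.setdefault(j, "-")
            kind[i] = ":"  # a matching line is never shown as context
    out, previous = [], None
    for i in sorted(kind):
        if (before or after) and previous is not None and i != previous + 1:
            out.append("--")  # gap between two groups
        out.append(f"{i + 1}{kind[i]}{lines[i]}")
        previous = i
    return out
```

                 With -C 1 (one line before and after), two matches that are
                 five lines apart produce two groups separated by "--"; when
                 the groups touch, no separator is output.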
       -c, --count
                 Do not output lines from the files that are being scanned;
                 instead output the number of lines that would have been
                 shown, either because they matched, or, if -v is set, because
                 they failed to match. By default, this count is exactly the
                 same as the number of lines that would have been output, but
                 if the -M (multiline) option is used (without -v), there may
                 be more suppressed lines than the count (that is, the number
                 of matches).

                 If no lines are selected, the number zero is output. If
                 several files are being scanned, a count is output for each
                 of them and the -t option can be used to cause a total to be
                 output at the end. However, if the --files-with-matches
                 option is also used, only those files whose counts are
                 greater than zero are listed. When -c is used, the -A, -B,
                 and -C options are ignored.

       --colour, --color
                 If this option is given without any data, it is equivalent to
                 "--colour=auto". If data is required, it must be given in the
                 same shell item, separated by an equals sign.

       --colour=value, --color=value
                 This option specifies under what circumstances the parts of a
                 line that matched a pattern should be coloured in the output.
                 By default, the output is not coloured. The value (which is
                 optional, see above) may be "never", "always", or "auto". In
                 the latter case, colouring happens only if the standard
                 output is connected to a terminal. More resources are used
                 when colouring is enabled, because pcre2grep has to search
                 for all possible matches in a line, not just one, in order to
                 colour them all.

                 The colour that is used can be specified by setting one of
                 the environment variables PCRE2GREP_COLOUR, PCRE2GREP_COLOR,
                 PCREGREP_COLOUR, or PCREGREP_COLOR, which are checked in that
                 order. If none of these are set, pcre2grep looks for
                 GREP_COLORS or GREP_COLOR (in that order). The value of the
                 variable should be a string of two numbers, separated by a
                 semicolon, except in the case of GREP_COLORS, which must
                 start with "ms=" or "mt=" followed by two semicolon-separated
                 colours, terminated by the end of the string or by a colon.
                 If GREP_COLORS does not start with "ms=" or "mt=" it is
                 ignored, and GREP_COLOR is checked.

                 If the string obtained from one of the above variables
                 contains any characters other than semicolon or digits, the
                 setting is ignored and the default colour is used. The string
                 is copied directly into the control string for setting colour
                 on a terminal, so it is your responsibility to ensure that
                 the values make sense. If no relevant environment variable is
                 set, the default is "1;31", which gives red.

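                 The lookup order described above can be summarised in Python.
                 match_colour is a hypothetical helper that takes the
                 environment as a dictionary (for example, os.environ); it
                 models GREP_COLORS only as far as this page describes it, and
                 it treats a variable as set whenever it is present.

```python
DEFAULT_COLOUR = "1;31"  # red
PCRE_VARIABLES = ("PCRE2GREP_COLOUR", "PCRE2GREP_COLOR",
                  "PCREGREP_COLOUR", "PCREGREP_COLOR")

def match_colour(env):
    """Return the colour string used for matches, given an environment."""
    value = None
    for name in PCRE_VARIABLES:  # checked in this order
        if name in env:
            value = env[name]
            break
    else:
        colors = env.get("GREP_COLORS")
        if colors is not None and colors[:3] in ("ms=", "mt="):
            # Keep the text after "ms="/"mt=" up to a colon or the end.
            value = colors[3:].split(":", 1)[0]
        elif "GREP_COLOR" in env:  # also used when GREP_COLORS is ignored
            value = env["GREP_COLOR"]
    if value is None or any(c not in "0123456789;" for c in value):
        return DEFAULT_COLOUR  # unset or invalid: use the default
    return value
```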
       -D action, --devices=action
                 If an input path is not a regular file or a directory,
                 "action" specifies how it is to be processed. Valid values
                 are "read" (the default) or "skip" (silently skip the path).

       -d action, --directories=action
                 If an input path is a directory, "action" specifies how it is
                 to be processed. Valid values are "read" (the default in
                 non-Windows environments, for compatibility with GNU grep),
                 "recurse" (equivalent to the -r option), or "skip" (silently
                 skip the path, the default in Windows environments). In the
                 "read" case, directories are read as if they were ordinary
                 files. In some operating systems the effect of reading a
                 directory like this is an immediate end-of-file; in others it
                 may provoke an error.

       --depth-limit=number
                 See --match-limit below.

       -e pattern, --regex=pattern, --regexp=pattern
                 Specify a pattern to be matched. This option can be used
                 multiple times in order to specify several patterns. It can
                 also be used as a way of specifying a single pattern that
                 starts with a hyphen. When -e is used, no argument pattern is
                 taken from the command line; all arguments are treated as
                 file names. There is no limit to the number of patterns. They
                 are applied to each line in the order in which they are
                 defined until one matches.

                 If -f is used with -e, the command line patterns are matched
                 first, followed by the patterns from the file(s), independent
                 of the order in which these options are specified. Note that
                 multiple use of -e is not the same as a single pattern with
                 alternatives. For example, X|Y finds the first character in a
                 line that is X or Y, whereas if the two patterns are given
                 separately, with X first, pcre2grep finds X if it is present,
                 even if it follows Y in the line. It finds Y only if there is
                 no X in the line. This matters only if you are using -o or
                 --colo(u)r to show the part(s) of the line that matched.

       --exclude=pattern
                 Files (but not directories) whose names match the pattern are
                 skipped without being processed. This applies to all files,
                 whether listed on the command line, obtained from
                 --file-list, or by scanning a directory. The pattern is a
                 PCRE2 regular expression, and is matched against the final
                 component of the file name, not the entire path. The -F, -w,
                 and -x options do not apply to this pattern. The option may
                 be given any number of times in order to specify multiple
                 patterns. If a file name matches both an --include and an
                 --exclude pattern, it is excluded. There is no short form for
                 this option.

       --exclude-from=filename
                 Treat each non-empty line of the file as the data for an
                 --exclude option. What constitutes a newline when reading the
                 file is the operating system's default. The --newline option
                 has no effect on this option. This option may be given more
                 than once in order to specify a number of files to read.

       --exclude-dir=pattern
                 Directories whose names match the pattern are skipped without
                 being processed, whatever the setting of the --recursive
                 option. This applies to all directories, whether listed on
                 the command line, obtained from --file-list, or by scanning a
                 parent directory. The pattern is a PCRE2 regular expression,
                 and is matched against the final component of the directory
                 name, not the entire path. The -F, -w, and -x options do not
                 apply to this pattern. The option may be given any number of
                 times in order to specify more than one pattern. If a
                 directory matches both --include-dir and --exclude-dir, it is
                 excluded. There is no short form for this option.

       -F, --fixed-strings
                 Interpret each data-matching pattern as a list of fixed
                 strings, separated by newlines, instead of as a regular
                 expression. What constitutes a newline for this purpose is
                 controlled by the --newline option. The -w (match as a word)
                 and -x (match whole line) options can be used with -F. They
                 apply to each of the fixed strings. A line is selected if any
                 of the fixed strings are found in it (subject to -w or -x, if
                 present). This option applies only to the patterns that are
                 matched against the contents of files; it does not apply to
                 patterns specified by any of the --include or --exclude
                 options.

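                 A minimal Python model of how -F selects a line, for
                 illustration only. fixed_string_selects is a hypothetical
                 helper; it assumes the default "\n" newline, approximates -w
                 by wrapping each escaped string in \b word boundaries, and
                 lets -x take precedence if both are given, which this page
                 does not specify.

```python
import re

def fixed_string_selects(line, pattern, word=False, whole_line=False):
    """Return True if any newline-separated fixed string selects line."""
    for s in pattern.split("\n"):
        if whole_line:
            hit = line == s  # -x: the string must be the whole line
        elif word:
            # -w (assumed): the literal string bounded by word boundaries.
            hit = re.search(r"\b" + re.escape(s) + r"\b", line) is not None
        else:
            hit = s in line  # plain substring; no regex metacharacters
        if hit:
            return True
    return False
```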
       -f filename, --file=filename
                 Read patterns from the file, one per line, and match them
                 against each line of input. As is the case with patterns on
                 the command line, no delimiters should be used. What
                 constitutes a newline when reading the file is the operating
                 system's default interpretation of \n. The --newline option
                 has no effect on this option. Trailing white space is removed
                 from each line, and blank lines are ignored. An empty file
                 contains no patterns and therefore matches nothing. Patterns
                 read from a file in this way may contain binary zeros, which
                 are treated as ordinary data characters. See also the
                 comments about multiple patterns versus a single pattern with
                 alternatives in the description of -e above.

                 If this option is given more than once, all the specified
                 files are read. A data line is output if any of the patterns
                 match it. A file name can be given as "-" to refer to the
                 standard input. When -f is used, patterns specified on the
                 command line using -e may also be present; they are tested
                 before the file's patterns. However, no other pattern is
                 taken from the command line; all arguments are treated as the
                 names of paths to be searched.

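                 The ordering and clean-up rules for -e and -f can be modelled
                 as follows. collect_patterns is a hypothetical helper that
                 receives the already-read contents of each -f file as a
                 string; it assumes the files are consulted in the order given
                 and that "\n" is the system newline.

```python
def collect_patterns(e_patterns, pattern_files):
    """Return patterns in the order they are tried: -e first, then -f."""
    patterns = list(e_patterns)
    for text in pattern_files:  # contents of each -f file, in order
        for line in text.split("\n"):
            line = line.rstrip()  # trailing white space is removed
            if line:              # blank lines are ignored
                patterns.append(line)
    return patterns
```

                 Leading white space is kept, so a file line "  lead " yields
                 the pattern "  lead", and an empty file adds no patterns.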
       --file-list=filename
                 Read a list of files and/or directories that are to be
                 scanned from the given file, one per line. What constitutes a
                 newline when reading the file is the operating system's
                 default. Trailing white space is removed from each line, and
                 blank lines are ignored. These paths are processed before any
                 that are listed on the command line. The file name can be
                 given as "-" to refer to the standard input. If --file and
                 --file-list are both specified as "-", patterns are read
                 first. This is useful only when the standard input is a
                 terminal, from which further lines (the list of files) can be
                 read after an end-of-file indication. If this option is given
                 more than once, all the specified files are read.

       --file-offsets
                 Instead of showing lines or parts of lines that match, show
                 each match as an offset from the start of the file and a
                 length, separated by a comma. In this mode, no context is
                 shown. That is, the -A, -B, and -C options are ignored. If
                 there is more than one match in a line, each of them is shown
                 separately. This option is mutually exclusive with --output,
                 --line-offsets, and --only-matching.

       -H, --with-filename
                 Force the inclusion of the file name at the start of output
                 lines when searching a single file. By default, the file name
                 is not shown in this case. For matching lines, the file name
                 is followed by a colon; for context lines, a hyphen separator
                 is used. If a line number is also being output, it follows
                 the file name. When the -M option causes a pattern to match
                 more than one line, only the first is preceded by the file
                 name. This option overrides any previous -h, -l, or -L
                 options.

       -h, --no-filename
                 Suppress the output file names when searching multiple files.
                 By default, file names are shown when multiple files are
                 searched. For matching lines, the file name is followed by a
                 colon; for context lines, a hyphen separator is used. If a
                 line number is also being output, it follows the file name.
                 This option overrides any previous -H, -L, or -l options.

       --heap-limit=number
                 See --match-limit below.

       --help    Output a help message, giving brief details of the command
                 options and file type support, and then exit. Anything else
                 on the command line is ignored.

       -I        Ignore binary files. This is equivalent to
                 --binary-files=without-match.

       -i, --ignore-case
                 Ignore upper/lower case distinctions during comparisons.

       --include=pattern
                 If any --include patterns are specified, the only files that
                 are processed are those that match one of the patterns (and
                 do not match an --exclude pattern). This option does not
                 affect directories, but it applies to all files, whether
                 listed on the command line, obtained from --file-list, or by
                 scanning a directory. The pattern is a PCRE2 regular
                 expression, and is matched against the final component of the
                 file name, not the entire path. The -F, -w, and -x options do
                 not apply to this pattern. The option may be given any number
                 of times. If a file name matches both an --include and an
                 --exclude pattern, it is excluded. There is no short form for
                 this option.

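                 The interaction between --include and --exclude for files can
                 be sketched in Python, with Python's re module standing in
                 for PCRE2 (adequate for simple patterns like those below).
                 file_is_processed is a hypothetical helper; directories,
                 which are governed by --include-dir and --exclude-dir
                 instead, are not modelled.

```python
import os
import re

def file_is_processed(path, includes=(), excludes=()):
    """Model of --include/--exclude for a file (not a directory)."""
    name = os.path.basename(path)  # only the final component is matched
    if any(re.search(p, name) for p in excludes):
        return False  # --exclude wins over --include
    if includes:
        return any(re.search(p, name) for p in includes)
    return True  # no --include patterns: every file is processed
```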
       --include-from=filename
                 Treat each non-empty line of the file as the data for an
                 --include option. What constitutes a newline for this purpose
                 is the operating system's default. The --newline option has
                 no effect on this option. This option may be given any number
                 of times; all the files are read.

       --include-dir=pattern
                 If any --include-dir patterns are specified, the only
                 directories that are processed are those that match one of
                 the patterns (and do not match an --exclude-dir pattern).
                 This applies to all directories, whether listed on the
                 command line, obtained from --file-list, or by scanning a
                 parent directory. The pattern is a PCRE2 regular expression,
                 and is matched against the final component of the directory
                 name, not the entire path. The -F, -w, and -x options do not
                 apply to this pattern. The option may be given any number of
                 times. If a directory matches both --include-dir and
                 --exclude-dir, it is excluded. There is no short form for
                 this option.

       -L, --files-without-match
                 Instead of outputting lines from the files, just output the
                 names of the files that do not contain any lines that would
                 have been output. Each file name is output once, on a
                 separate line. This option overrides any previous -H, -h, or
                 -l options.

       -l, --files-with-matches
                 Instead of outputting lines from the files, just output the
                 names of the files containing lines that would have been
                 output. Each file name is output once, on a separate line.
                 Searching normally stops as soon as a matching line is found
                 in a file. However, if the -c (count) option is also used,
                 matching continues in order to obtain the correct count, and
                 those files that have at least one match are listed along
                 with their counts. Using this option with -c is a way of
                 suppressing the listing of files with no matches. This option
                 overrides any previous -H, -h, or -L options.

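                 Because -H, -h, -l, and -L each override any earlier one, only
                 the last of them matters. A minimal Python sketch of that
                 rule follows; file_name_mode and its mode names are
                 hypothetical, and "more than one file" is simplified to a
                 count of paths.

```python
MODES = {"-H": "with-filename", "-h": "no-filename",
         "-l": "files-with-matches", "-L": "files-without-match"}

def file_name_mode(flags, file_count):
    """Return the file-name mode chosen by the last of -H/-h/-l/-L."""
    mode = None
    for flag in flags:  # later options override earlier ones
        if flag in MODES:
            mode = MODES[flag]
    if mode is None:  # default: names are shown only for several files
        mode = "with-filename" if file_count > 1 else "no-filename"
    return mode
```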
    468        --label=name
    469                  This option supplies a name to be used for the standard input
    470                  when file names are being output. If not supplied, "(standard
    471                  input)" is used. There is no short form for this option.
    472 
       --line-buffered
                 When this option is given, non-compressed input is read and
                 processed line by line, and the output is flushed after each
                 write. By default, input is read in large chunks, unless
                 pcre2grep can determine that it is reading from a terminal
                 (which is currently possible only in Unix-like environments
                 or Windows). Output to a terminal is normally flushed
                 automatically by the operating system. This option can be
                 useful when the input or output is attached to a pipe and you
                 do not want pcre2grep to buffer up large amounts of data.
                 However, its use will affect performance, and the -M
                 (multiline) option ceases to work. When input is from a
                 compressed .gz or .bz2 file, --line-buffered is ignored.

       --line-offsets
                 Instead of showing lines or parts of lines that match, show
                 each match as a line number, the offset from the start of the
                 line, and a length. The line number is terminated by a colon
                 (as usual; see the -n option), and the offset and length are
                 separated by a comma. In this mode, no context is shown. That
                 is, the -A, -B, and -C options are ignored. If there is more
                 than one match in a line, each of them is shown separately.
                 This option is mutually exclusive with --output,
                 --file-offsets, and --only-matching.

       --locale=locale-name
                 This option specifies a locale to be used for pattern
                 matching. It overrides the value in the LC_ALL or LC_CTYPE
                 environment variables. If no locale is specified, the PCRE2
                 library's default (usually the "C" locale) is used. There is
                 no short form for this option.

       --match-limit=number
                 Processing some regular expression patterns may take a very
                 long time to search for all possible matching strings. Others
                 may require a very large amount of memory. There are three
                 options that set resource limits for matching.

                 The --match-limit option provides a means of limiting
                 computing resource usage when processing patterns that are
                 not going to match, but which have a very large number of
                 possibilities in their search trees. The classic example is a
                 pattern that uses nested unlimited repeats. Internally, PCRE2
                 has a counter that is incremented each time around its main
                 processing loop. If the value set by --match-limit is
                 reached, an error occurs.

                 The --heap-limit option specifies, as a number of kibibytes
                 (units of 1024 bytes), the amount of heap memory that may be
                 used for matching. Heap memory is needed only if matching the
                 pattern requires a significant number of nested backtracking
                 points to be remembered. This parameter can be set to zero to
                 forbid the use of heap memory altogether.

                 The --depth-limit option limits the depth of nested
                 backtracking points, which indirectly limits the amount of
                 memory that is used. The amount of memory needed for each
                 backtracking point depends on the number of capturing
                 parentheses in the pattern, so the amount of memory that is
                 used before this limit acts varies from pattern to pattern.
                 This limit is of use only if it is set smaller than
                 --match-limit.

                 There are no short forms for these options. The default
                 limits can be set when the PCRE2 library is compiled; if they
                 are not specified, the defaults are very large and so
                 effectively unlimited.

       --max-buffer-size=number
                 This limits the expansion of the processing buffer, whose
                 initial size can be set by --buffer-size. The maximum buffer
                 size is silently forced to be no smaller than the starting
                 buffer size.

       -M, --multiline
                 Allow patterns to match more than one line. When this option
                 is set, the PCRE2 library is called in "multiline" mode. This
                 allows a matched string to extend past the end of a line and
                 continue on one or more subsequent lines. Patterns used with
                 -M may usefully contain literal newline characters and
                 internal occurrences of ^ and $ characters. The output for a
                 successful match may consist of more than one line. The first
                 line is the line in which the match started, and the last
                 line is the line in which the match ended. If the matched
                 string ends with a newline sequence, the output ends at the
                 end of that line. If -v is set, none of the lines in a
                 multi-line match are output. Once a match has been handled,
                 scanning restarts at the beginning of the line after the one
                 in which the match ended.

                 The newline sequence that separates multiple lines must be
                 matched as part of the pattern. For example, to find the
                 phrase "regular expression" in a file where "regular" might
                 be at the end of a line and "expression" at the start of the
                 next line, you could use this command:

                   pcre2grep -M 'regular\s+expression' <file>

                 The \s escape sequence matches any white space character,
                 including newlines, and is followed by + so as to match
                 trailing white space on the first line as well as possibly
                 handling a two-character newline sequence.

                 There is a limit to the number of lines that can be matched,
                 imposed by the way that pcre2grep buffers the input file as
                 it scans it. With a sufficiently large processing buffer,
                 this should not be a problem, but the -M option does not work
                 when input is read line by line (see --line-buffered).

       -N newline-type, --newline=newline-type
                 The PCRE2 library supports five different conventions for
                 indicating the ends of lines. They are the single-character
                 sequences CR (carriage return) and LF (linefeed), the
                 two-character sequence CRLF, an "anycrlf" convention, which
                 recognizes any of the preceding three types, and an "any"
                 convention, in which any Unicode line ending sequence is
                 assumed to end a line. The Unicode sequences are the three
                 just mentioned, plus VT (vertical tab, U+000B), FF (form
                 feed, U+000C), NEL (next line, U+0085), LS (line separator,
                 U+2028), and PS (paragraph separator, U+2029).

                 When the PCRE2 library is built, a default line-ending
                 sequence is specified. This is normally the standard
                 sequence for the operating system. Unless otherwise specified
                 by this option, pcre2grep uses the library's default. The
                 possible values for this option are CR, LF, CRLF, ANYCRLF, or
                 ANY. This makes it possible to use pcre2grep to scan files
                 that have come from other environments without having to
                 modify their line endings. If the data that is being scanned
                 does not agree with the convention set by this option,
                 pcre2grep may behave in strange ways. Note that this option
                 does not apply to files specified by the -f, --file-list,
                 --exclude-from, or --include-from options, which are expected
                 to use the operating system's standard newline sequence.

       -n, --line-number
                 Precede each output line by its line number in the file,
                 followed by a colon for matching lines or a hyphen for
                 context lines. If the file name is also being output, it
                 precedes the line number. When the -M option causes a pattern
                 to match more than one line, only the first is preceded by
                 its line number. This option is forced if --line-offsets is
                 used.

       --no-jit  If the PCRE2 library is built with support for just-in-time
                 compiling (which speeds up matching), pcre2grep automatically
                 makes use of this, unless it was explicitly disabled at build
                 time. This option can be used to disable the use of JIT at
                 run time. It is provided for testing and working round
                 problems. It should never be needed in normal use.

       -O text, --output=text
                 When there is a match, instead of outputting the whole line
                 that matched, output just the given text. This option is
                 mutually exclusive with --only-matching, --file-offsets, and
                 --line-offsets. Escape sequences starting with a dollar
                 character may be used to insert the contents of the matched
                 part of the line and/or captured substrings into the text.

                 $<digits> or ${<digits>} is replaced by the captured
                 substring of the given decimal number; zero substitutes the
                 whole match. If the number is greater than the number of
                 capturing substrings, or if the capture is unset, the
                 replacement is empty.

                 $a is replaced by bell; $b by backspace; $e by escape; $f by
                 form feed; $n by newline; $r by carriage return; $t by tab;
                 $v by vertical tab.

                 $o<digits> is replaced by the character represented by the
                 given octal number; up to three digits are processed.

                 $x<digits> is replaced by the character represented by the
                 given hexadecimal number; up to two digits are processed.

                 Any other character is substituted by itself. In particular,
                 $$ is replaced by a single dollar.

       -o, --only-matching
                 Show only the part of the line that matched a pattern instead
                 of the whole line. In this mode, no context is shown. That
                 is, the -A, -B, and -C options are ignored. If there is more
                 than one match in a line, each of them is shown separately,
                 on a separate line of output. If -o is combined with -v
                 (invert the sense of the match to find non-matching lines),
                 no output is generated, but the return code is set
                 appropriately. If the matched portion of the line is empty,
                 nothing is output unless the file name or line number are
                 being printed, in which case they are shown on an otherwise
                 empty line. This option is mutually exclusive with --output,
                 --file-offsets and --line-offsets.

       -onumber, --only-matching=number
                 Show only the part of the line that matched the capturing
                 parentheses of the given number. Up to 32 capturing
                 parentheses are supported, and -o0 is equivalent to -o
                 without a number. Because these options can be given without
                 an argument (see above), if an argument is present, it must
                 be given in the same shell item, for example, -o3 or
                 --only-matching=2. The comments given for the non-argument
                 case above also apply to this option. If the specified
                 capturing parentheses do not exist in the pattern, or were
                 not set in the match, nothing is output unless the file name
                 or line number are being output.

                 If this option is given multiple times, multiple substrings
                 are output for each match, in the order the options are
                 given, and all on one line. For example, -o3 -o1 -o3 causes
                 the substrings matched by capturing parentheses 3 and 1 and
                 then 3 again to be output. By default, there is no separator
                 (but see the next option).

       --om-separator=text
                 Specify a separating string for multiple occurrences of -o.
                 The default is an empty string. Separating strings are never
                 coloured.

       -q, --quiet
                 Work quietly, that is, display nothing except error messages.
                 The exit status indicates whether or not any matches were
                 found.

       -r, --recursive
                 If any given path is a directory, recursively scan the files
                 it contains, taking note of any --include and --exclude
                 settings. By default, a directory is read as a normal file;
                 in some operating systems this gives an immediate
                 end-of-file. This option is a shorthand for setting the -d
                 option to "recurse".

       --recursion-limit=number
                 This is an obsolete synonym for --depth-limit. See
                 --match-limit above.

       -s, --no-messages
                 Suppress error messages about non-existent or unreadable
                 files. Such files are quietly skipped. However, the return
                 code is still 2, even if matches were found in other files.

       -t, --total-count
                 This option is useful when scanning more than one file. If
                 used on its own, -t suppresses all output except for a grand
                 total number of matching lines (or non-matching lines if -v
                 is used) in all the files. If -t is used with -c, a grand
                 total is output except when the previous output is just one
                 line. In other words, it is not output when just one file's
                 count is listed. If file names are being output, the grand
                 total is preceded by "TOTAL:". Otherwise, it appears as just
                 another number. The -t option is ignored when used with -L
                 (list files without matches), because the grand total would
                 always be zero.

       -u, --utf-8
                 Operate in UTF-8 mode. This option is available only if PCRE2
                 has been compiled with UTF-8 support. All patterns (including
                 those for any --exclude and --include options) and all
                 subject lines that are scanned must be valid strings of UTF-8
                 characters.

       -V, --version
                 Write the version numbers of pcre2grep and the PCRE2 library
                 to the standard output and then exit. Anything else on the
                 command line is ignored.

       -v, --invert-match
                 Invert the sense of the match, so that lines which do not
                 match any of the patterns are the ones that are found.

       -w, --word-regex, --word-regexp
                 Force the patterns only to match "words". That is, there must
                 be a word boundary at the start and end of each matched
                 string. This is equivalent to having "\b(?:" at the start of
                 each pattern, and ")\b" at the end. This option applies only
                 to the patterns that are matched against the contents of
                 files; it does not apply to patterns specified by any of the
                 --include or --exclude options.

       -x, --line-regex, --line-regexp
                 Force the patterns to start matching only at the beginnings
                 of lines, and in addition, require them to match entire
                 lines. In multiline mode the match may be more than one line.
                 This is equivalent to having "^(?:" at the start of each
                 pattern and ")$" at the end. This option applies only to the
                 patterns that are matched against the contents of files; it
                 does not apply to patterns specified by any of the --include
                 or --exclude options.


ENVIRONMENT VARIABLES

       The environment variables LC_ALL and LC_CTYPE are examined, in that
       order, for a locale. The first one that is set is used. This can be
       overridden by the --locale option. If no locale is set, the PCRE2
       library's default (usually the "C" locale) is used.


NEWLINES

       The -N (--newline) option allows pcre2grep to scan files with
       different newline conventions from the default. Any parts of the input
       files that are written to the standard output are copied identically,
       with whatever newline sequences they have in the input. However, the
       setting of this option affects only the way scanned files are
       processed. It does not affect the interpretation of files specified by
       the -f, --file-list, --exclude-from, or --include-from options, nor
       does it affect the way in which pcre2grep writes informational
       messages to the standard error and output streams. For these it uses
       the string "\n" to indicate newlines, relying on the C I/O library to
       convert this to an appropriate sequence.


OPTIONS COMPATIBILITY

       Many of the short and long forms of pcre2grep's options are the same
       as in the GNU grep program. Any long option of the form --xxx-regexp
       (GNU terminology) is also available as --xxx-regex (PCRE2
       terminology). However, the --depth-limit, --file-list, --file-offsets,
       --heap-limit, --include-dir, --line-offsets, --locale, --match-limit,
       -M, --multiline, -N, --newline, --om-separator, --output, -u, and
       --utf-8 options are specific to pcre2grep, as is the use of the
       --only-matching option with a capturing parentheses number.

       Although most of the common options work the same way, a few are
       different in pcre2grep. For example, the --include option's argument
       is a glob for GNU grep, but a regular expression for pcre2grep. If
       both the -c and -l options are given, GNU grep lists only file names,
       without counts, but pcre2grep gives the counts as well.


OPTIONS WITH DATA

       There are four different ways in which an option with data can be
       specified. If a short form option is used, the data may follow
       immediately, or (with one exception) in the next command line item.
       For example:

         -f/some/file
         -f /some/file

       The exception is the -o option, which may appear with or without
       data. Because of this, if data is present, it must follow immediately
       in the same item, for example -o3.

       If a long form option is used, the data may appear in the same command
       line item, separated by an equals character, or (with two exceptions)
       it may appear in the next command line item. For example:

         --file=/some/file
         --file /some/file

       Note, however, that if you want to supply a file name beginning with ~
       as data in a shell command, and have the shell expand ~ to a home
       directory, you must separate the file name from the option, because
       the shell does not treat ~ specially unless it is at the start of an
       item.

       The exceptions to the above are the --colour (or --color) and
       --only-matching options, for which the data is optional. If one of
       these options does have data, it must be given in the first form,
       using an equals character. Otherwise pcre2grep will assume that it has
       no data.


USING PCRE2'S CALLOUT FACILITY

       pcre2grep has, by default, support for calling external programs or
       scripts or echoing specific strings during matching by making use of
       PCRE2's callout facility. However, this support can be disabled when
       pcre2grep is built. You can find out whether your binary has support
       for callouts by running it with the --help option. If the support is
       not enabled, all callouts in patterns are ignored by pcre2grep.

       A callout in a PCRE2 pattern is of the form (?C<arg>) where the
       argument is either a number or a quoted string (see the pcre2callout
       documentation for details). Numbered callouts are ignored by
       pcre2grep; only callouts with string arguments are useful.

   Calling external programs or scripts

       If the callout string does not start with a pipe (vertical bar)
       character, it is parsed into a list of substrings separated by pipe
       characters. The first substring must be an executable name, with the
       following substrings specifying arguments:

         executable_name|arg1|arg2|...

       Any substring (including the executable name) may contain escape
       sequences started by a dollar character: $<digits> or ${<digits>} is
       replaced by the captured substring of the given decimal number, which
       must be greater than zero. If the number is greater than the number of
       capturing substrings, or if the capture is unset, the replacement is
       empty.

       Any other character is substituted by itself. In particular, $$ is
       replaced by a single dollar and $| is replaced by a pipe character.
       Here is an example:

         echo -e "abcde\n12345" | pcre2grep \
           '(?x)(.)(..(.))
           (?C"/bin/echo|Arg1: [$1] [$2] [$3]|Arg2: $|${1}$| ($4)")()' -

         Output:

           Arg1: [a] [bcd] [d] Arg2: |a| ()
           abcde
           Arg1: [1] [234] [4] Arg2: |1| ()
           12345

       The parameters for the execv() system call that is used to run the
       program or script are zero-terminated strings. This means that binary
       zero characters in the callout argument will cause premature
       termination of their substrings, and therefore should not be present.
       Any syntax errors in the string (for example, a dollar not followed by
       another character) cause the callout to be ignored. If running the
       program fails for any reason (including the non-existence of the
       executable), a local matching failure occurs and the matcher
       backtracks in the normal way.

   Echoing a specific string

       If the callout string starts with a pipe (vertical bar) character, the
       rest of the string is written to the output, having been passed
       through the same escape processing as text from the --output option.
       This provides a simple echoing facility that avoids calling an
       external program or script. No terminator is added to the string, so
       if you want a newline, you must include it explicitly. Matching
       continues normally after the string is output. If you want to see only
       the callout output but not any output from an actual match, you should
       end the relevant pattern with (*FAIL).


MATCHING ERRORS

       It is possible to supply a regular expression that takes a very long
       time to fail to match certain lines. Such patterns normally involve
       nested indefinite repeats, for example: (a+)*\d when matched against a
       line of a's with no final digit. The PCRE2 matching function has a
       resource limit that causes it to abort in these circumstances. If this
       happens, pcre2grep outputs an error message and the line that caused
       the problem to the standard error stream. If there are more than 20
       such errors, pcre2grep gives up.

       The --match-limit option of pcre2grep can be used to set the overall
       resource limit. There are also other limits that affect the amount of
       memory used during matching; see the discussion of --heap-limit and
       --depth-limit above.


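       As a sketch, the following sets a deliberately small match limit
       (10000 is an arbitrary illustrative value) so that a line of thirty
       a's triggers the resource error at once. The error message and the
       offending line go to the standard error stream; nothing is written to
       the standard output:

```shell
         # Thirty a's and no digit: (a+)*\d tries a huge number of ways of
         # dividing the a's between the repeats before it can fail.
         printf 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\n' > many_as.txt
         pcre2grep --match-limit=10000 '(a+)*\d' many_as.txt
```
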
DIAGNOSTICS

       Exit status is 0 if any matches were found, 1 if no matches were
       found, and 2 for syntax errors, overlong lines, non-existent or
       inaccessible files (even if matches were found in other files) or too
       many matching errors. Using the -s option to suppress error messages
       about inaccessible files does not affect the return code.

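       A script can therefore distinguish "no match" from "error". A minimal
       sketch, in which the function name classify_log and the log files are
       hypothetical:

```shell
         # classify_log FILE: report what pcre2grep's exit status means.
         classify_log() {
           pcre2grep -q 'ERROR' "$1"
           case $? in
             0) echo "errors found" ;;
             1) echo "no errors" ;;
             *) echo "search failed" ;;  # status 2: unreadable file, bad pattern, ...
           esac
         }

         printf 'INFO start\nERROR disk full\n' > app.log
         classify_log app.log    # prints: errors found
```
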
       When run under VMS, the return code is placed in the symbol
       PCRE2GREP_RC because VMS does not distinguish between exit(0) and
       exit(1).


SEE ALSO

       pcre2pattern(3), pcre2syntax(3), pcre2callout(3).


AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge, England.


REVISION

       Last updated: 24 February 2018
       Copyright (c) 1997-2018 University of Cambridge.