PCRE2GREP(1)                General Commands Manual               PCRE2GREP(1)



NAME
       pcre2grep - a grep with Perl-compatible regular expressions.

SYNOPSIS
       pcre2grep [options] [long options] [pattern] [path1 path2 ...]


DESCRIPTION

       pcre2grep searches files for character patterns, in the same way as
       other grep commands do, but it uses the PCRE2 regular expression
       library to support patterns that are compatible with the regular
       expressions of Perl 5. See pcre2syntax(3) for a quick-reference summary
       of pattern syntax, or pcre2pattern(3) for a full description of the
       syntax and semantics of the regular expressions that PCRE2 supports.

       Patterns, whether supplied on the command line or in a separate file,
       are given without delimiters. For example:

         pcre2grep Thursday /etc/motd

       If you attempt to use delimiters (for example, by surrounding a pattern
       with slashes, as is common in Perl scripts), they are interpreted as
       part of the pattern. Quotes can of course be used to delimit patterns
       on the command line because they are interpreted by the shell, and
       indeed quotes are required if a pattern contains white space or shell
       metacharacters.

       The first argument that follows any option settings is treated as the
       single pattern to be matched when neither -e nor -f is present.
       Conversely, when one or both of these options are used to specify
       patterns, all arguments are treated as path names. At least one of -e,
       -f, or an argument pattern must be provided.

       If no files are specified, pcre2grep reads the standard input. The
       standard input can also be referenced by a name consisting of a single
       hyphen. For example:

         pcre2grep some-pattern file1 - file3

       Input files are searched line by line. By default, each line that
       matches a pattern is copied to the standard output, and if there is
       more than one file, the file name is output at the start of each line,
       followed by a colon. However, there are options that can change how
       pcre2grep behaves. In particular, the -M option makes it possible to
       search for strings that span line boundaries. What defines a line
       boundary is controlled by the -N (--newline) option.

       The amount of memory used for buffering files that are being scanned is
       controlled by a parameter that can be set by the --buffer-size option.
       The default value for this parameter is specified when pcre2grep is
       built; unless it is changed at build time, it is 20K. A block of memory
       three times this size is used (to allow for buffering "before" and
       "after" lines). An error occurs if a line overflows the buffer.

       Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the
       greater. BUFSIZ is defined in <stdio.h>. When there is more than one
       pattern (specified by the use of -e and/or -f), the patterns are
       applied to each line in the order in which they are defined, except
       that all the -e patterns are tried before the -f patterns.

       By default, as soon as one pattern matches a line, no further patterns
       are considered. However, if --colour (or --color) is used to colour the
       matching substrings, or if --only-matching, --file-offsets, or
       --line-offsets is used to output only the part of the line that matched
       (either shown literally, or as an offset), scanning resumes immediately
       following the match, so that further matches on the same line can be
       found. If there are multiple patterns, they are all tried on the
       remainder of the line, but patterns that follow the one that matched
       are not tried on the earlier part of the line.
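
       The scanning order described above can be sketched in Python, as a
       rough illustration only. The sketch uses Python's re module as a
       stand-in for PCRE2, and only_matching is a hypothetical name that is
       not part of pcre2grep. At each position the patterns are tried in the
       order given. The first one that finds a non-empty match in the rest of
       the line wins, and scanning resumes just after that match.

```python
import re
from typing import List


def only_matching(patterns: List[str], line: str) -> List[str]:
    """Hypothetical sketch of the -o scanning order for several patterns.

    Python's re module stands in for PCRE2. At each position the patterns
    are tried in order; the first one with a non-empty match anywhere in
    the remainder of the line wins, and scanning resumes after that match.
    """
    compiled = [re.compile(p) for p in patterns]
    results: List[str] = []
    pos = 0
    while pos < len(line):
        for rx in compiled:
            # First non-empty match at or after pos (empty matches are skipped).
            m = next((m for m in rx.finditer(line, pos) if m.end() > m.start()), None)
            if m is not None:
                results.append(m.group())
                pos = m.end()
                break
        else:
            break  # no pattern matches the remainder of the line
    return results
```

       Take the separate patterns X and Y, in that order. On the line "YX"
       they give only "X", because Y is never tried on the part of the line
       before the X. The single pattern X|Y gives "Y" and then "X".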

       This behaviour means that the order in which multiple patterns are
       specified can affect the output when one of the above options is used.
       This is no longer the same behaviour as GNU grep, which now manages to
       display earlier matches for later patterns (as long as there is no
       overlap).

       Patterns that can match an empty string are accepted, but empty string
       matches are never recognized. An example is the pattern
       "(super)?(man)?", in which all components are optional. This pattern
       finds all occurrences of both "super" and "man"; the output differs
       from matching with "super|man" when only the matching substrings are
       being shown.
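
       The following Python sketch illustrates that difference. It uses
       Python's re module as a stand-in for PCRE2. It approximates the rule
       that empty matches are never recognized by discarding zero-length
       matches. The helper name shown_substrings is hypothetical.

```python
import re
from typing import List


def shown_substrings(pattern: str, line: str) -> List[str]:
    """Substrings that an "only matching" listing would show (sketch).

    Zero-length matches are dropped, approximating the rule that empty
    string matches are never recognized. Python's re stands in for PCRE2.
    """
    return [m.group() for m in re.finditer(pattern, line) if m.group()]
```

       For the line "superman and man", "(super)?(man)?" yields "superman"
       and "man", while "super|man" yields "super", "man", and "man".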

       If the LC_ALL or LC_CTYPE environment variable is set, pcre2grep uses
       the value to set a locale when calling the PCRE2 library. The --locale
       option can be used to override this.


SUPPORT FOR COMPRESSED FILES

       It is possible to compile pcre2grep so that it uses libz or libbz2 to
       read files whose names end in .gz or .bz2, respectively. You can find
       out whether your binary has support for one or both of these file types
       by running it with the --help option. If the appropriate support is not
       present, files are treated as plain text. The standard input is always
       treated as plain text.
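
       The file-name rule above can be sketched with Python's gzip and bz2
       modules, for illustration only. This is not pcre2grep's code. The
       helper open_for_search is hypothetical, and it assumes a build with
       both kinds of support. A build without them would take the plain-file
       branch for every name.

```python
import bz2
import gzip
from typing import BinaryIO


def open_for_search(path: str) -> BinaryIO:
    """Open `path` for reading, decompressing .gz and .bz2 files.

    Hypothetical helper mirroring the name-based rule for a build that
    has both libz and libbz2 support.
    """
    if path.endswith(".gz"):
        return gzip.open(path, "rb")
    if path.endswith(".bz2"):
        return bz2.open(path, "rb")
    # Any other name is read as plain, uncompressed data.
    return open(path, "rb")
```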


BINARY FILES

       By default, a file that contains a binary zero byte within the first
       1024 bytes is identified as a binary file, and is processed specially.
       (GNU grep also identifies binary files in this manner.) See the
       --binary-files option for a means of changing the way binary files are
       handled.
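
       Here is a minimal Python sketch of the detection rule. It assumes the
       file's leading bytes are already in memory. The name looks_binary is
       hypothetical.

```python
def looks_binary(data: bytes, window: int = 1024) -> bool:
    """Return True if a zero byte occurs within the first `window` bytes."""
    return b"\x00" in data[:window]
```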


OPTIONS

       The order in which some of the options appear can affect the output.
       For example, both the -h and -l options affect the printing of file
       names. Whichever comes later in the command line will be the one that
       takes effect. Similarly, except where noted below, if an option is
       given twice, the later setting is used. Numerical values for options
       may be followed by K or M, to signify multiplication by 1024 or
       1024*1024 respectively.
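
       The K and M suffixes can be sketched as a small Python parser. The
       function parse_size is hypothetical. It accepts only upper-case
       suffixes, because that is what the text describes; whether pcre2grep
       also accepts lower case is not stated here.

```python
def parse_size(value: str) -> int:
    """Convert an option value such as "20K" or "2M" to a number.

    K multiplies by 1024 and M by 1024*1024. Only upper-case suffixes
    are handled in this sketch.
    """
    multipliers = {"K": 1024, "M": 1024 * 1024}
    if value and value[-1] in multipliers:
        return int(value[:-1]) * multipliers[value[-1]]
    return int(value)
```

       Take a --buffer-size of 20K, the usual built-in default. The
       three-block buffer described under DESCRIPTION then occupies
       3 * 20480 = 61440 bytes.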

       --        This terminates the list of options. It is useful if the next
                 item on the command line starts with a hyphen but is not an
                 option. This allows for the processing of patterns and file
                 names that start with hyphens.

       -A number, --after-context=number
                 Output number lines of context after each matching line. If
                 file names and/or line numbers are being output, a hyphen
                 separator is used instead of a colon for the context lines. A
                 line containing "--" is output between each group of lines,
                 unless they are in fact contiguous in the input file. The
                 value of number is expected to be relatively small. However,
                 pcre2grep guarantees to have up to 8K of following text
                 available for context output.

       -a, --text
                 Treat binary files as text. This is equivalent to
                 --binary-files=text.

       -B number, --before-context=number
                 Output number lines of context before each matching line. If
                 file names and/or line numbers are being output, a hyphen
                 separator is used instead of a colon for the context lines. A
                 line containing "--" is output between each group of lines,
                 unless they are in fact contiguous in the input file. The
                 value of number is expected to be relatively small. However,
                 pcre2grep guarantees to have up to 8K of preceding text
                 available for context output.

       --binary-files=word
                 Specify how binary files are to be processed. If the word is
                 "binary" (the default), pattern matching is performed on
                 binary files, but the only output is "Binary file <name>
                 matches" when a match succeeds. If the word is "text", which
                 is equivalent to the -a or --text option, binary files are
                 processed in the same way as any other file. In this case,
                 when a match succeeds, the output may be binary garbage,
                 which can have nasty effects if sent to a terminal. If the
                 word is "without-match", which is equivalent to the -I
                 option, binary files are not processed at all; they are
                 assumed not to be of interest and are skipped without causing
                 any output or affecting the return code.

       --buffer-size=number
                 Set the parameter that controls how much memory is used for
                 buffering files that are being scanned.

       -C number, --context=number
                 Output number lines of context both before and after each
                 matching line. This is equivalent to setting both -A and -B
                 to the same value.
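
                 The context layout shared by -A, -B, and -C can be sketched
                 in Python. The sketch assumes that line numbers are being
                 output (as with -n) and that there is a single file, so no
                 file names appear. The function format_with_context and its
                 arguments are hypothetical. "-C 1" corresponds to before=1,
                 after=1.

```python
from typing import List, Optional


def format_with_context(lines: List[str], matches: List[int],
                        before: int, after: int) -> List[str]:
    """Render matching lines with context, numbered as with -n (sketch).

    `matches` holds 0-based indexes of matching lines. Matching lines use
    a colon after the line number, context lines use a hyphen, and "--"
    separates groups that are not contiguous in the input.
    """
    selected = set(matches)
    shown = set()
    for i in matches:
        for j in range(max(0, i - before), min(len(lines), i + after + 1)):
            shown.add(j)
    out: List[str] = []
    prev: Optional[int] = None
    for j in sorted(shown):
        if prev is not None and j != prev + 1:
            out.append("--")  # gap between groups
        sep = ":" if j in selected else "-"
        out.append(f"{j + 1}{sep}{lines[j]}")
        prev = j
    return out
```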

       -c, --count
                 Do not output lines from the files that are being scanned;
                 instead output the number of matches (or non-matches if -v is
                 used) that would otherwise have caused lines to be shown. By
                 default, this count is the same as the number of suppressed
                 lines, but if the -M (multiline) option is used (without -v),
                 there may be more suppressed lines than the number of
                 matches.

                 If no lines are selected, the number zero is output. If
                 several files are being scanned, a count is output for each
                 of them. However, if the --files-with-matches option is also
                 used, only those files whose counts are greater than zero are
                 listed. When -c is used, the -A, -B, and -C options are
                 ignored.

       --colour, --color
                 If this option is given without a value, it is equivalent to
                 "--colour=auto". If a value is supplied, it must be given in
                 the same shell item, separated by an equals sign.

       --colour=value, --color=value
                 This option specifies under what circumstances the parts of a
                 line that matched a pattern should be coloured in the output.
                 By default, the output is not coloured. The value (which is
                 optional, see above) may be "never", "always", or "auto". In
                 the latter case, colouring happens only if the standard
                 output is connected to a terminal. More resources are used
                 when colouring is enabled, because pcre2grep has to search
                 for all possible matches in a line, not just one, in order to
                 colour them all.

                 The colour that is used can be specified by setting the
                 environment variable PCRE2GREP_COLOUR or PCRE2GREP_COLOR. The
                 value of this variable should be a string of two numbers,
                 separated by a semicolon. They are copied directly into the
                 control string for setting colour on a terminal, so it is
                 your responsibility to ensure that they make sense. If
                 neither of the environment variables is set, the default is
                 "1;31", which gives red.
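
                 Highlighting a match could look like the following Python
                 sketch, which is a guess at the mechanism rather than
                 documented behaviour. It assumes that the colour string goes
                 into the conventional ANSI "ESC [ ... m" sequence and that
                 "ESC [ 0 m" ends it. The helper colourize is hypothetical.
                 The reset sequence and the precedence of PCRE2GREP_COLOUR
                 over PCRE2GREP_COLOR are both assumptions.

```python
import os
from typing import Mapping


def colourize(text: str, env: Mapping[str, str] = os.environ) -> str:
    """Wrap matched text in an ANSI colour sequence (hypothetical sketch)."""
    # Assumed precedence: PCRE2GREP_COLOUR is consulted before PCRE2GREP_COLOR.
    colour = env.get("PCRE2GREP_COLOUR")
    if colour is None:
        colour = env.get("PCRE2GREP_COLOR", "1;31")  # documented default: red
    # ESC [ <colour> m starts the highlight; ESC [ 0 m (assumed) resets it.
    return f"\x1b[{colour}m{text}\x1b[0m"
```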

       -D action, --devices=action
                 If an input path is not a regular file or a directory,
                 "action" specifies how it is to be processed. Valid values
                 are "read" (the default) or "skip" (silently skip the path).

       -d action, --directories=action
                 If an input path is a directory, "action" specifies how it is
                 to be processed. Valid values are "read" (the default in
                 non-Windows environments, for compatibility with GNU grep),
                 "recurse" (equivalent to the -r option), or "skip" (silently
                 skip the path, the default in Windows environments). In the
                 "read" case, directories are read as if they were ordinary
                 files. In some operating systems the effect of reading a
                 directory like this is an immediate end-of-file; in others it
                 may provoke an error.

       -e pattern, --regex=pattern, --regexp=pattern
                 Specify a pattern to be matched. This option can be used
                 multiple times in order to specify several patterns. It can
                 also be used as a way of specifying a single pattern that
                 starts with a hyphen. When -e is used, no argument pattern is
                 taken from the command line; all arguments are treated as
                 file names. There is no limit to the number of patterns. They
                 are applied to each line in the order in which they are
                 defined until one matches.

                 If -f is used with -e, the command line patterns are matched
                 first, followed by the patterns from the file(s), independent
                 of the order in which these options are specified. Note that
                 multiple use of -e is not the same as a single pattern with
                 alternatives. For example, X|Y finds the first character in a
                 line that is X or Y, whereas if the two patterns are given
                 separately, with X first, pcre2grep finds X if it is present,
                 even if it follows Y in the line. It finds Y only if there is
                 no X in the line. This matters only if you are using -o or
                 --colo(u)r to show the part(s) of the line that matched.

       --exclude=pattern
                 Files (but not directories) whose names match the pattern are
                 skipped without being processed. This applies to all files,
                 whether listed on the command line, obtained from
                 --file-list, or by scanning a directory. The pattern is a
                 PCRE2 regular expression, and is matched against the final
                 component of the file name, not the entire path. The -F, -w,
                 and -x options do not apply to this pattern. The option may
                 be given any number of times in order to specify multiple
                 patterns. If a file name matches both an --include and an
                 --exclude pattern, it is excluded. There is no short form for
                 this option.

       --exclude-from=filename
                 Treat each non-empty line of the file as the data for an
                 --exclude option. What constitutes a newline when reading the
                 file is the operating system's default. The --newline option
                 has no effect on this option. This option may be given more
                 than once in order to specify a number of files to read.

       --exclude-dir=pattern
                 Directories whose names match the pattern are skipped without
                 being processed, whatever the setting of the --recursive
                 option. This applies to all directories, whether listed on
                 the command line, obtained from --file-list, or by scanning a
                 parent directory. The pattern is a PCRE2 regular expression,
                 and is matched against the final component of the directory
                 name, not the entire path. The -F, -w, and -x options do not
                 apply to this pattern. The option may be given any number of
                 times in order to specify more than one pattern. If a
                 directory matches both --include-dir and --exclude-dir, it is
                 excluded. There is no short form for this option.

       -F, --fixed-strings
                 Interpret each data-matching pattern as a list of fixed
                 strings, separated by newlines, instead of as a regular
                 expression. What constitutes a newline for this purpose is
                 controlled by the --newline option. The -w (match as a word)
                 and -x (match whole line) options can be used with -F. They
                 apply to each of the fixed strings. A line is selected if any
                 of the fixed strings are found in it (subject to -w or -x, if
                 present). This option applies only to the patterns that are
                 matched against the contents of files; it does not apply to
                 patterns specified by any of the --include or --exclude
                 options.
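
                 The following Python sketch shows how -w and -x combine with
                 -F by building a roughly equivalent regular expression. It
                 is not how pcre2grep implements -F. The helper
                 fixed_strings_regex is hypothetical and assumes "\n" as the
                 newline. The lookarounds are one common way to express
                 "match as a word".

```python
import re
from typing import List


def fixed_strings_regex(pattern: str, word: bool = False,
                        whole_line: bool = False) -> str:
    """Build a regex roughly equivalent to -F for one pattern (sketch)."""
    strings: List[str] = pattern.split("\n")  # one fixed string per line
    alternation = "|".join(re.escape(s) for s in strings)
    if whole_line:
        # -x: a fixed string must make up the entire line.
        return rf"^(?:{alternation})$"
    if word:
        # -w: a fixed string must not touch a word character on either side.
        return rf"(?<!\w)(?:{alternation})(?!\w)"
    return alternation
```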

       -f filename, --file=filename
                 Read patterns from the file, one per line, and match them
                 against each line of input. What constitutes a newline when
                 reading the file is the operating system's default. The
                 --newline option has no effect on this option. Trailing white
                 space is removed from each line, and blank lines are ignored.
                 An empty file contains no patterns and therefore matches
                 nothing. See also the comments about multiple patterns versus
                 a single pattern with alternatives in the description of -e
                 above.

                 If this option is given more than once, all the specified
                 files are read. A data line is output if any of the patterns
                 match it. A file name can be given as "-" to refer to the
                 standard input. When -f is used, patterns specified on the
                 command line using -e may also be present; they are tested
                 before the file's patterns. However, no other pattern is
                 taken from the command line; all arguments are treated as the
                 names of paths to be searched.
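
                 Here is a minimal Python sketch of how the final pattern list
                 is assembled. The helper names are hypothetical, and
                 splitlines() is assumed to be an acceptable stand-in for the
                 operating system's default newline.

```python
from typing import List


def patterns_from_file_text(text: str) -> List[str]:
    """Split -f file contents into patterns (sketch).

    Trailing white space is removed and blank lines are dropped, so an
    empty file yields no patterns. Leading white space is kept.
    """
    patterns: List[str] = []
    for line in text.splitlines():
        stripped = line.rstrip()
        if stripped:
            patterns.append(stripped)
    return patterns


def all_patterns(e_patterns: List[str], f_texts: List[str]) -> List[str]:
    """-e patterns always come first, then -f patterns in file order."""
    result = list(e_patterns)
    for text in f_texts:
        result.extend(patterns_from_file_text(text))
    return result
```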

       --file-list=filename
                 Read a list of files and/or directories that are to be
                 scanned from the given file, one per line. Trailing white
                 space is removed from each line, and blank lines are ignored.
                 These paths are processed before any that are listed on the
                 command line. The file name can be given as "-" to refer to
                 the standard input. If --file and --file-list are both
                 specified as "-", patterns are read first. This is useful
                 only when the standard input is a terminal, from which
                 further lines (the list of files) can be read after an
                 end-of-file indication. If this option is given more than
                 once, all the specified files are read.

       --file-offsets
                 Instead of showing lines or parts of lines that match, show
                 each match as an offset from the start of the file and a
                 length, separated by a comma. In this mode, no context is
                 shown. That is, the -A, -B, and -C options are ignored. If
                 there is more than one match in a line, each of them is shown
                 separately. This option is mutually exclusive with
                 --line-offsets and --only-matching.
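
                 The offset arithmetic can be sketched in Python. The helper
                 file_offsets is hypothetical. It assumes "\n" line endings
                 and uses Python's re as a stand-in for PCRE2. Because it
                 works on str, its offsets count characters, which equal byte
                 offsets only for single-byte text.

```python
import re
from typing import List


def file_offsets(pattern: str, data: str) -> List[str]:
    """Report each match as "offset,length" from the start of the data."""
    rx = re.compile(pattern)
    out: List[str] = []
    line_start = 0
    for line in data.split("\n"):
        for m in rx.finditer(line):
            if m.end() > m.start():  # empty matches are never reported
                out.append(f"{line_start + m.start()},{m.end() - m.start()}")
        line_start += len(line) + 1  # +1 for the "\n" terminator
    return out
```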

       -H, --with-filename
                 Force the inclusion of the file name at the start of output
                 lines when searching a single file. By default, the file name
                 is not shown in this case. For matching lines, the file name
                 is followed by a colon; for context lines, a hyphen separator
                 is used. If a line number is also being output, it follows
                 the file name. When the -M option causes a pattern to match
                 more than one line, only the first is preceded by the file
                 name.

       -h, --no-filename
                 Suppress the output of file names when searching multiple
                 files. By default, file names are shown when multiple files
                 are searched. For matching lines, the file name is followed
                 by a colon; for context lines, a hyphen separator is used. If
                 a line number is also being output, it follows the file name.

       --help    Output a help message, giving brief details of the command
                 options and file type support, and then exit. Anything else
                 on the command line is ignored.

       -I        Ignore binary files. This is equivalent to
                 --binary-files=without-match.

       -i, --ignore-case
                 Ignore upper/lower case distinctions during comparisons.

       --include=pattern
                 If any --include patterns are specified, the only files that
                 are processed are those that match one of the patterns (and
                 do not match an --exclude pattern). This option does not
                 affect directories, but it applies to all files, whether
                 listed on the command line, obtained from --file-list, or by
                 scanning a directory. The pattern is a PCRE2 regular
                 expression, and is matched against the final component of
                 the file name, not the entire path. The -F, -w, and -x
                 options do not apply to this pattern. The option may be given
                 any number of times. If a file name matches both an --include
                 and an --exclude pattern, it is excluded. There is no short
                 form for this option.
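
                 The combined --include/--exclude rule can be sketched as a
                 Python predicate. The name should_process_file is
                 hypothetical, and an unanchored re.search stands in for PCRE2
                 matching.

```python
import os
import re
from typing import List


def should_process_file(path: str, includes: List[str],
                        excludes: List[str]) -> bool:
    """Apply the --include/--exclude rules to one file path (sketch)."""
    name = os.path.basename(path)  # only the final path component is tested
    if any(re.search(p, name) for p in excludes):
        return False  # an exclude match always wins
    if includes and not any(re.search(p, name) for p in includes):
        return False  # includes exist, but none matched
    return True
```

                 The --include-dir and --exclude-dir options follow the same
                 logic, applied to directory names.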

       --include-from=filename
                 Treat each non-empty line of the file as the data for an
                 --include option. What constitutes a newline for this purpose
                 is the operating system's default. The --newline option has
                 no effect on this option. This option may be given any number
                 of times; all the files are read.

       --include-dir=pattern
                 If any --include-dir patterns are specified, the only
                 directories that are processed are those that match one of
                 the patterns (and do not match an --exclude-dir pattern).
                 This applies to all directories, whether listed on the
                 command line, obtained from --file-list, or by scanning a
                 parent directory. The pattern is a PCRE2 regular expression,
                 and is matched against the final component of the directory
                 name, not the entire path. The -F, -w, and -x options do not
                 apply to this pattern. The option may be given any number of
                 times. If a directory matches both --include-dir and
                 --exclude-dir, it is excluded. There is no short form for
                 this option.

       -L, --files-without-match
                 Instead of outputting lines from the files, just output the
                 names of the files that do not contain any lines that would
                 have been output. Each file name is output once, on a
                 separate line.

       -l, --files-with-matches
                 Instead of outputting lines from the files, just output the
                 names of the files containing lines that would have been
                 output. Each file name is output once, on a separate line.
                 Searching normally stops as soon as a matching line is found
                 in a file. However, if the -c (count) option is also used,
                 matching continues in order to obtain the correct count, and
                 those files that have at least one match are listed along
                 with their counts. Using this option with -c is a way of
                 suppressing the listing of files with no matches.

       --label=name
                 This option supplies a name to be used for the standard input
                 when file names are being output. If not supplied, "(standard
                 input)" is used. There is no short form for this option.

       --line-buffered
                 When this option is given, input is read and processed line
                 by line, and the output is flushed after each write. By
                 default, input is read in large chunks, unless pcre2grep can
                 determine that it is reading from a terminal (which is
                 currently possible only in Unix-like environments). Output to
                 a terminal is normally automatically flushed by the operating
                 system. This option can be useful when the input or output is
                 attached to a pipe and you do not want pcre2grep to buffer up
                 large amounts of data. However, its use will affect
                 performance, and the -M (multiline) option ceases to work.

       --line-offsets
                 Instead of showing lines or parts of lines that match, show
                 each match as a line number, the offset from the start of the
                 line, and a length. The line number is terminated by a colon
                 (as usual; see the -n option), and the offset and length are
                 separated by a comma. In this mode, no context is shown.
                 That is, the -A, -B, and -C options are ignored. If there is
                 more than one match in a line, each of them is shown
                 separately. This option is mutually exclusive with
                 --file-offsets and --only-matching.
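
                 Here is a Python sketch of the --line-offsets format. The
                 helper line_offsets is hypothetical. It assumes "\n" line
                 endings, numbers lines from 1, and uses Python's re module as
                 a stand-in for PCRE2.

```python
import re
from typing import List


def line_offsets(pattern: str, data: str) -> List[str]:
    """Report each match as "lineno:offset,length" (sketch)."""
    rx = re.compile(pattern)
    out: List[str] = []
    for lineno, line in enumerate(data.split("\n"), start=1):
        for m in rx.finditer(line):
            if m.end() > m.start():  # skip empty matches
                out.append(f"{lineno}:{m.start()},{m.end() - m.start()}")
    return out
```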

       --locale=locale-name
                 This option specifies a locale to be used for pattern
                 matching. It overrides the value in the LC_ALL or LC_CTYPE
                 environment variables. If no locale is specified, the PCRE2
                 library's default (usually the "C" locale) is used. There is
                 no short form for this option.

       --match-limit=number
                 Processing some regular expression patterns can require a
                 very large amount of memory, leading in some cases to a
                 program crash if not enough is available. Other patterns may
                 take a very long time to search for all possible matching
                 strings. The pcre2_match() function that is called by
                 pcre2grep to do the matching has two parameters that can
                 limit the resources that it uses.

                 The --match-limit option provides a means of limiting
                 resource usage when processing patterns that are not going to
                 match, but which have a very large number of possibilities in
                 their search trees. The classic example is a pattern that
                 uses nested unlimited repeats. Internally, PCRE2 uses a
                 function called match() which it calls repeatedly (sometimes
                 recursively). The limit set by --match-limit is imposed on
                 the number of times this function is called during a match,
                 which has the effect of limiting the amount of backtracking
                 that can take place.

                 The --recursion-limit option is similar to --match-limit, but
                 instead of limiting the total number of times that match() is
                 called, it limits the depth of recursive calls, which in turn
                 limits the amount of memory that can be used. The recursion
                 depth is a smaller number than the total number of calls,
                 because not all calls to match() are recursive. This limit is
                 of use only if it is set smaller than --match-limit.

                 There are no short forms for these options. The default
                 settings are specified when the PCRE2 library is compiled;
                 unless they are changed at build time, the default is 10
                 million.

       -M, --multiline
                 Allow patterns to match more than one line. When this option
                 is given, patterns may usefully contain literal newline
                 characters and internal occurrences of ^ and $ characters.
                 The output for a successful match may consist of more than
                 one line. The first is the line in which the match started,
                 and the last is the line in which the match ended. If the
                 matched string ends with a newline sequence, the output ends
                 at the end of that line.

                 When this option is set, the PCRE2 library is called in
                 "multiline" mode. This allows a matched string to extend past
                 the end of a line and continue on one or more subsequent
                 lines. However, pcre2grep still processes the input line by
                 line. Once a match has been handled, scanning restarts at the
                 beginning of the next line, just as it does when -M is not
                 present. This means that it is possible for the second or
                 subsequent lines in a multiline match to be output again as
                 part of another match.

                 The newline sequence that separates multiple lines must be
                 matched as part of the pattern. For example, to find the
                 phrase "regular expression" in a file where "regular" might
                 be at the end of a line and "expression" at the start of the
                 next line, you could use this command:

                   pcre2grep -M 'regular\s+expression' <file>

                 The \s escape sequence matches any white space character,
                 including newlines, and is followed by + so as to match
                 trailing white space on the first line as well as possibly
                 handling a two-character newline sequence.

                 There is a limit to the number of lines that can be matched,
                 imposed by the way that pcre2grep buffers the input file as
                 it scans it. However, pcre2grep ensures that at least 8K
                 characters or the rest of the file (whichever is the shorter)
                 are available for forward matching, and similarly the
                 previous 8K characters (or all the previous characters, if
                 fewer than 8K) are guaranteed to be available for lookbehind
                 assertions. The -M option does not work when input is read
                 line by line (see --line-buffered).

       -N newline-type, --newline=newline-type
                 The PCRE2 library supports five different conventions for
                 indicating the ends of lines. They are the single-character
                 sequences CR (carriage return) and LF (linefeed), the
                 two-character sequence CRLF, an "anycrlf" convention, which
                 recognizes any of the preceding three types, and an "any"
                 convention, in which any Unicode line ending sequence is
                 assumed to end a line. The Unicode sequences are the three
                 just mentioned, plus VT (vertical tab, U+000B), FF (form
                 feed, U+000C), NEL (next line, U+0085), LS (line separator,
                 U+2028), and PS (paragraph separator, U+2029).

                 When the PCRE2 library is built, a default line-ending
                 sequence is specified. This is normally the standard
                 sequence for the operating system. Unless otherwise specified
                 by this option, pcre2grep uses the library's default. The
                 possible values for this option are CR, LF, CRLF, ANYCRLF, or
                 ANY. This makes it possible to use pcre2grep to scan files
                 that have come from other environments without having to
                 modify their line endings. If the data that is being scanned
                 does not agree with the convention set by this option,
                 pcre2grep may behave in strange ways. Note that this option
                 does not apply to files specified by the -f, --exclude-from,
                 or --include-from options, which are expected to use the
                 operating system's standard newline sequence.

       -n, --line-number
                 Precede each output line by its line number in the file,
                 followed by a colon for matching lines or a hyphen for
                 context lines. If the file name is also being output, it
                 precedes the line number. When the -M option causes a pattern
                 to match more than one line, only the first is preceded by
                 its line number. This option is forced if --line-offsets is
                 used.

       --no-jit  If the PCRE2 library is built with support for just-in-time
                 compiling (which speeds up matching), pcre2grep automatically
                 makes use of this, unless it was explicitly disabled at build
                 time. This option can be used to disable the use of JIT at
                 run time. It is provided for testing and working round
                 problems. It should never be needed in normal use.

       -o, --only-matching
                 Show only the part of the line that matched a pattern instead
                 of the whole line. In this mode, no context is shown; that
                 is, the -A, -B, and -C options are ignored. If there is more
                 than one match in a line, each of them is shown separately.
                 If -o is combined with -v (invert the sense of the match to
                 find non-matching lines), no output is generated, but the
                 return code is set appropriately. If the matched portion of
                 the line is empty, nothing is output unless file names or
                 line numbers are being printed, in which case they are shown
                 on an otherwise empty line. This option is mutually exclusive
                 with --file-offsets and --line-offsets.

       -onumber, --only-matching=number
                 Show only the part of the line that matched the capturing
                 parentheses of the given number. Up to 32 capturing
                 parentheses are supported, and -o0 is equivalent to -o
                 without a number. Because these options can be given without
                 an argument (see above), if an argument is present, it must
                 be given in the same shell item, for example, -o3 or
                 --only-matching=2. The comments given for the non-argument
                 case above also apply to this case. If the specified
                 capturing parentheses do not exist in the pattern, or were
                 not set in the match, nothing is output unless file names or
                 line numbers are being output.

                 If this option is given multiple times, multiple substrings
                 are output, in the order the options are given. For example,
                 -o3 -o1 -o3 causes the substrings matched by capturing
                 parentheses 3 and 1 and then 3 again to be output. By
                 default, there is no separator (but see the next option).

       --om-separator=text
                 Specify a separating string for multiple occurrences of -o.
                 The default is an empty string. Separating strings are never
                 coloured.

       -q, --quiet
                 Work quietly, that is, display nothing except error messages.
                 The exit status indicates whether or not any matches were
                 found.

       -r, --recursive
                 If any given path is a directory, recursively scan the files
                 it contains, taking note of any --include and --exclude
                 settings. By default, a directory is read as a normal file;
                 in some operating systems this gives an immediate
                 end-of-file. This option is a shorthand for setting the -d
                 option to "recurse".

       --recursion-limit=number
                 See --match-limit above.

       -s, --no-messages
                 Suppress error messages about non-existent or unreadable
                 files. Such files are quietly skipped. However, the return
                 code is still 2, even if matches were found in other files.

       -u, --utf-8
                 Operate in UTF-8 mode. This option is available only if PCRE2
                 has been compiled with UTF-8 support. All patterns (including
                 those for any --exclude and --include options) and all
                 subject lines that are scanned must be valid strings of UTF-8
                 characters.

       -V, --version
                 Write the version numbers of pcre2grep and the PCRE2 library
                 to the standard output and then exit. Anything else on the
                 command line is ignored.

       -v, --invert-match
                 Invert the sense of the match, so that lines which do not
                 match any of the patterns are the ones that are found.

       -w, --word-regex, --word-regexp
                 Force the patterns to match only whole words. This is
                 equivalent to having \b at the start and end of the pattern.
                 This option applies only to the patterns that are matched
                 against the contents of files; it does not apply to patterns
                 specified by any of the --include or --exclude options.

       -x, --line-regex, --line-regexp
                 Force the patterns to be anchored (each must start matching
                 at the beginning of a line) and in addition, require them to
                 match entire lines. This is equivalent to having ^ and $
                 characters at the start and end of each alternative
                 top-level branch in every pattern. This option applies only
                 to the patterns that are matched against the contents of
                 files; it does not apply to patterns specified by any of the
                 --include or --exclude options.
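
       The -x description above says that the anchors apply to each top-level
       alternative, not merely to the two ends of the pattern text. The
       sketch below illustrates why this matters. It uses Python's re module
       as a stand-in for PCRE2; the two agree for these simple patterns. The
       helpers line_regex() and naive_anchor() are hypothetical. Wrapping the
       pattern in a non-capturing group is one way to get the described
       effect, and it is not taken from pcre2grep's source.

```python
import re


def line_regex(pattern):
    """Anchor every top-level alternative at once, in the spirit of -x.

    Hypothetical helper: the (?:...) group puts ^ and $ outside all of the
    pattern's top-level branches.
    """
    return "^(?:" + pattern + ")$"


def naive_anchor(pattern):
    """Anchor only the two ends of the pattern text (not what -x describes)."""
    return "^" + pattern + "$"


for line in ["cat", "dog", "catalog", "hotdog"]:
    naive = bool(re.search(naive_anchor("cat|dog"), line))
    whole = bool(re.search(line_regex("cat|dog"), line))
    print(f"{line:8} naive={naive} whole-line={whole}")
```

       With the naive form, ^ binds only to "cat" and $ only to "dog", so
       "catalog" and "hotdog" are accepted. With the grouped form, only lines
       consisting entirely of "cat" or "dog" match.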


ENVIRONMENT VARIABLES

       The environment variables LC_ALL and LC_CTYPE are examined, in that
       order, for a locale. The first one that is set is used. This can be
       overridden by the --locale option. If no locale is set, the PCRE2
       library's default (usually the "C" locale) is used.
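
       The lookup order can be sketched as follows. This is a simplified
       Python model, not pcre2grep's source. The function name
       choose_locale() is hypothetical. "Set" is assumed to mean that the
       variable is present in the environment. A result of None stands for
       "use the PCRE2 library's default".

```python
import os


def choose_locale(environ, locale_option=None):
    """Pick a locale following the order described above (simplified model).

    environ: a mapping such as os.environ.
    locale_option: the value given with --locale, or None if it was absent.
    Returns a locale name, or None meaning "use the library default".
    """
    if locale_option is not None:
        return locale_option              # --locale overrides the environment
    for name in ("LC_ALL", "LC_CTYPE"):   # examined in this order
        if name in environ:               # assumption: "set" means present
            return environ[name]
    return None


print(choose_locale(os.environ))
```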


NEWLINES

       The -N (--newline) option allows pcre2grep to scan files with
       different newline conventions from the default. Any parts of the input
       files that are written to the standard output are copied identically,
       with whatever newline sequences they have in the input. However, the
       setting of this option does not affect the interpretation of files
       specified by the -f, --exclude-from, or --include-from options, which
       are assumed to use the operating system's standard newline sequence,
       nor does it affect the way in which pcre2grep writes informational
       messages to the standard error and output streams. For these it uses
       the string "\n" to indicate newlines, relying on the C I/O library to
       convert this to an appropriate sequence.
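
       To make the five -N conventions concrete, the following Python sketch
       splits a buffer into lines under each of them. It is an illustrative
       model only; split_lines() and NEWLINE_SEQUENCES are hypothetical
       names. It shows why CRLF must be tried before a lone CR. It also shows
       how a mismatched convention leaves stray CR characters inside lines.

```python
# Line-ending sequences recognized by each -N setting, as listed under -N.
# CRLF is listed before CR so that a CR LF pair counts as one line ending.
NEWLINE_SEQUENCES = {
    "CR": ["\r"],
    "LF": ["\n"],
    "CRLF": ["\r\n"],
    "ANYCRLF": ["\r\n", "\r", "\n"],
    "ANY": ["\r\n", "\r", "\n", "\x0b", "\x0c", "\x85", "\u2028", "\u2029"],
}


def split_lines(data, newline_type):
    """Split data into lines (endings removed) under one convention."""
    endings = NEWLINE_SEQUENCES[newline_type.upper()]
    lines = []
    start = i = 0
    while i < len(data):
        for ending in endings:
            if data.startswith(ending, i):
                lines.append(data[start:i])
                i += len(ending)
                start = i
                break
        else:
            i += 1  # not at a line ending: move on one character
    if start < len(data):
        lines.append(data[start:])  # last line had no terminating newline
    return lines


sample = "one\r\ntwo\rthree\nfour"
for kind in ("LF", "CRLF", "ANYCRLF"):
    print(kind, split_lines(sample, kind))
```

       Under LF, the CR characters remain inside the lines. This is an
       example of the kind of mismatch that the -N description warns may
       make pcre2grep behave in strange ways.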


OPTIONS COMPATIBILITY

       Many of the short and long forms of pcre2grep's options are the same
       as in the GNU grep program. Any long option of the form --xxx-regexp
       (GNU terminology) is also available as --xxx-regex (PCRE2
       terminology). However, the --file-list, --file-offsets, --include-dir,
       --line-offsets, --locale, --match-limit, -M, --multiline, -N,
       --newline, --om-separator, --recursion-limit, -u, and --utf-8 options
       are specific to pcre2grep, as is the use of the --only-matching option
       with a capturing parentheses number.

       Although most of the common options work the same way, a few are
       different in pcre2grep. For example, the --include option's argument
       is a glob for GNU grep, but a regular expression for pcre2grep. If both
       the -c and -l options are given, GNU grep lists only file names,
       without counts, but pcre2grep gives the counts as well.
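
       The difference in --include arguments can be illustrated with two
       Python modules: fnmatch for glob matching, as in GNU grep, and re as a
       stand-in for PCRE2. The sketch assumes that the pattern is tested
       against the file name alone. It also assumes that the regular
       expression search is unanchored, so an anchor such as $ is needed to
       restrict a match to a suffix. The helper names are hypothetical.
       Note that a glob such as *.c is not even a valid regular expression.

```python
import fnmatch
import re


def included_by_glob(filename, glob):
    """GNU grep style: the --include argument is a shell glob."""
    return fnmatch.fnmatchcase(filename, glob)


def included_by_regex(filename, regex):
    """pcre2grep style, modelled with re: the argument is a regular expression.

    Assumption: an unanchored search over the file name.
    """
    return re.search(regex, filename) is not None


for name in ("main.c", "main.cpp", "notes.txt"):
    print(name,
          included_by_glob(name, "*.c"),       # glob
          included_by_regex(name, r"\.c"),     # unanchored regex
          included_by_regex(name, r"\.c$"))    # regex anchored at the end

try:
    re.compile("*.c")
except re.error as exc:
    print("'*.c' is not a valid regular expression:", exc)
```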


OPTIONS WITH DATA

       There are four different ways in which an option with data can be
       specified. If a short form option is used, the data may follow
       immediately, or (with one exception) in the next command line item.
       For example:

         -f/some/file
         -f /some/file

       The exception is the -o option, which may appear with or without data.
       Because of this, if data is present, it must follow immediately in the
       same item, for example -o3.

       If a long form option is used, the data may appear in the same command
       line item, separated by an equals character, or (with two exceptions)
       it may appear in the next command line item. For example:

         --file=/some/file
         --file /some/file

       Note, however, that if you want to supply a file name beginning with ~
       as data in a shell command, and have the shell expand ~ to a home
       directory, you must separate the file name from the option, because
       the shell does not treat ~ specially unless it is at the start of an
       item.

       The exceptions to the above are the --colour (or --color) and
       --only-matching options, for which the data is optional. If one of
       these options does have data, it must be given in the first form,
       using an equals character. Otherwise pcre2grep will assume that it has
       no data.
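
       The rules above can be modelled with a small parser. This Python
       sketch is not pcre2grep's own option handling. It knows only about -f
       and --file, whose data is required, and about -o, --only-matching,
       --colour, and --color, whose data is optional. It does not handle
       bundled single-letter options or the -- terminator. The names
       parse_options(), SHORT_OPTIONS, and LONG_OPTIONS are hypothetical.

```python
# (kind, canonical name) for each option this sketch understands.
SHORT_OPTIONS = {
    "-f": ("required", "file"),
    "-o": ("optional", "only-matching"),
}
LONG_OPTIONS = {
    "--file": ("required", "file"),
    "--only-matching": ("optional", "only-matching"),
    "--colour": ("optional", "colour"),
    "--color": ("optional", "colour"),
}


def parse_options(argv):
    """Return (options, arguments); options holds (name, data or None) pairs."""
    options, arguments = [], []
    i = 0
    while i < len(argv):
        item = argv[i]
        i += 1
        if not item.startswith("-") or item == "-":
            arguments.append(item)  # a pattern or a path name
            continue
        if item.startswith("--"):
            name, sep, data = item.partition("=")  # --file=/some/file
            attached, table = bool(sep), LONG_OPTIONS
        else:
            name, data = item[:2], item[2:]        # -f/some/file, -o3
            attached, table = bool(data), SHORT_OPTIONS
        if name not in table:
            raise ValueError(f"option not handled by this sketch: {name}")
        kind, canonical = table[name]
        if attached:
            options.append((canonical, data))      # data in the same item
        elif kind == "required":
            if i >= len(argv):
                raise ValueError(f"{name} requires data")
            options.append((canonical, argv[i]))   # data in the next item
            i += 1
        else:
            options.append((canonical, None))      # optional data: next item not used
    return options, arguments


print(parse_options(["-o", "3", "pattern", "file.txt"]))
```

       In this model, "-o 3" and "--colour always" leave "3" and "always" as
       ordinary arguments. pcre2grep would therefore take them as the
       pattern or as a path name.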


CALLING EXTERNAL SCRIPTS

       On non-Windows systems, pcre2grep has, by default, support for calling
       external programs or scripts during matching by making use of PCRE2's
       callout facility. However, this support can be disabled when pcre2grep
       is built. You can find out whether your binary has support for
       callouts by running it with the --help option. If the support is not
       enabled, all callouts in patterns are ignored by pcre2grep.

       A callout in a PCRE2 pattern is of the form (?C<arg>) where the
       argument is either a number or a quoted string (see the pcre2callout
       documentation for details). Numbered callouts are ignored by
       pcre2grep. String arguments are parsed as a list of substrings
       separated by pipe (vertical bar) characters. The first substring must
       be an executable name, with the following substrings specifying
       arguments:

         executable_name|arg1|arg2|...

       Any substring (including the executable name) may contain escape
       sequences started by a dollar character: $<digits> or ${<digits>} is
       replaced by the captured substring of the given decimal number, which
       must be greater than zero. If the number is greater than the number of
       capturing substrings, or if the capture is unset, the replacement is
       empty.

       A dollar followed by any other character is replaced by that
       character. In particular, $$ is replaced by a single dollar and $| is
       replaced by a pipe character. Here is an example:

         echo -e "abcde\n12345" | pcre2grep \
           '(?x)(.)(..(.))
           (?C"/bin/echo|Arg1: [$1] [$2] [$3]|Arg2: $|${1}$| ($4)")()' -

         Output:

           Arg1: [a] [bcd] [d] Arg2: |a| ()
           abcde
           Arg1: [1] [234] [4] Arg2: |1| ()
           12345

       The parameters for the execv() system call that is used to run the
       program or script are zero-terminated strings. This means that binary
       zero characters in the callout argument will cause premature
       termination of their substrings, and therefore should not be present.
       Any syntax errors in the string (for example, a dollar not followed by
       another character) cause the callout to be ignored. If running the
       program fails for any reason (including the non-existence of the
       executable), a local matching failure occurs and the matcher
       backtracks in the normal way.
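
       The substitution rules for string callouts can be modelled as follows.
       This is a Python sketch for illustration, not pcre2grep's C code.
       parse_callout() is a hypothetical name. captures is assumed to be a
       list indexed by group number, with None for unset groups. A result of
       None stands for "syntax error, callout ignored". Rejecting $0 is an
       assumption based on the rule that the number must be greater than
       zero. Applied to the first line of the example above, the sketch
       reproduces the arguments that are passed to /bin/echo.

```python
DIGITS = "0123456789"


def parse_callout(arg, captures):
    """Split a callout string on unescaped '|' and expand $ escapes.

    captures[n] is the text of capture group n, or None if it is unset.
    Returns the list of substrings, or None if the string has a syntax error.
    """
    parts, current = [], []
    i = 0
    while i < len(arg):
        ch = arg[i]
        i += 1
        if ch == "|":                    # unescaped pipe: next substring
            parts.append("".join(current))
            current = []
            continue
        if ch != "$":
            current.append(ch)
            continue
        if i >= len(arg):
            return None                  # dollar at the end: syntax error
        if arg[i] in DIGITS:             # $<digits>
            j = i
            while j < len(arg) and arg[j] in DIGITS:
                j += 1
            number, i = int(arg[i:j]), j
        elif arg[i] == "{":              # ${<digits>}
            close = arg.find("}", i)
            digits = arg[i + 1:close] if close != -1 else ""
            if not digits or any(c not in DIGITS for c in digits):
                return None
            number, i = int(digits), close + 1
        else:                            # $$, $| and $x give the character itself
            current.append(arg[i])
            i += 1
            continue
        if number == 0:
            return None                  # assumption: $0 is rejected
        value = captures[number] if number < len(captures) else None
        current.append(value or "")      # unset or missing group -> empty
    parts.append("".join(current))
    return parts


captures = ["abcd", "a", "bcd", "d", None]  # group 4 is unset at the callout
argv = parse_callout("/bin/echo|Arg1: [$1] [$2] [$3]|Arg2: $|${1}$| ($4)", captures)
print(argv)
print(" ".join(argv[1:]))  # what /bin/echo would print for "abcde"
```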


MATCHING ERRORS

       It is possible to supply a regular expression that takes a very long
       time to fail to match certain lines. Such patterns normally involve
       nested indefinite repeats, for example: (a+)*\d when matched against a
       line of a's with no final digit. The PCRE2 matching function has a
       resource limit that causes it to abort in these circumstances. If this
       happens, pcre2grep outputs an error message and the line that caused
       the problem to the standard error stream. If there are more than 20
       such errors, pcre2grep gives up.

       The --match-limit option of pcre2grep can be used to set the overall
       resource limit; there is a second option called --recursion-limit that
       sets a limit on the amount of memory (usually stack) that is used (see
       the discussion of these options above).
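
       A toy backtracking matcher shows two things: why a limit on calls
       bounds the work, and why the recursion depth stays much smaller than
       the call count. The Python sketch below handles only the anchored
       pattern (a+)*\d. It is not PCRE2's match() function, and the names
       toy_match(), match_limit, and depth_limit are hypothetical. On a line
       of n a's with no digit, it makes 2^n calls while never nesting more
       than n+1 deep.

```python
class LimitExceeded(Exception):
    """Raised when a toy limit is hit (PCRE2 itself returns an error code)."""


def toy_match(subject, match_limit, depth_limit):
    """Toy backtracking matcher for (a+)*\\d, anchored at the start of subject.

    Every call of step() counts towards match_limit (compare --match-limit);
    its nesting depth is checked against depth_limit (compare
    --recursion-limit). Returns (matched, total_calls, max_depth).
    """
    calls = 0
    max_depth = 0

    def step(pos, depth):
        nonlocal calls, max_depth
        calls += 1
        max_depth = max(max_depth, depth)
        if calls > match_limit:
            raise LimitExceeded("match limit exceeded")
        if depth > depth_limit:
            raise LimitExceeded("recursion limit exceeded")
        end = pos
        while end < len(subject) and subject[end] == "a":
            end += 1
        # Greedy (a+): try one more iteration of each possible length,
        # longest first; every choice is a backtracking point.
        for stop in range(end, pos, -1):
            if step(stop, depth + 1):
                return True
        # Leave the (a+)* loop and try to match \d at this position.
        return pos < len(subject) and subject[pos] in "0123456789"

    return step(0, 1), calls, max_depth


print(toy_match("aaa1", 1000, 100))
print(toy_match("aaaa", 1000, 100))
try:
    toy_match("a" * 30, 10_000, 100)
except LimitExceeded as exc:
    print("30 a's:", exc)
```

       In this toy, raising match_limit only delays the failure, because each
       extra "a" doubles the number of calls.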


DIAGNOSTICS

       Exit status is 0 if any matches were found, 1 if no matches were found,
       and 2 for syntax errors, overlong lines, non-existent or inaccessible
       files (even if matches were found in other files) or too many matching
       errors. Using the -s option to suppress error messages about
       inaccessible files does not affect the return code.
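
       The precedence among these exit codes can be summarized in a few lines
       of Python. This is a simplified model with a hypothetical function
       name, and it covers only the cases named above. It has no parameter
       for -s, because suppressing messages does not change the result.

```python
def exit_status(match_counts, other_error=False):
    """Model of the exit status rules above.

    match_counts: one entry per file, giving the number of matching lines,
    or None if the file did not exist or could not be read.
    other_error: True for a syntax error, an overlong line, or too many
    matching errors.
    """
    if other_error or any(count is None for count in match_counts):
        return 2  # errors take precedence, even if other files matched
    return 0 if any(match_counts) else 1


print(exit_status([3, 0]))     # -> 0 (a match in the first file)
print(exit_status([0, 0]))     # -> 1 (no matches anywhere)
print(exit_status([5, None]))  # -> 2 (second file unreadable)
```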


SEE ALSO

       pcre2pattern(3), pcre2syntax(3), pcre2callout(3).


AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge, England.


REVISION

       Last updated: 19 June 2016
       Copyright (c) 1997-2016 University of Cambridge.