      1 <html>
      2 <head>
      3 <title>pcre2grep specification</title>
      4 </head>
      5 <body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
      6 <h1>pcre2grep man page</h1>
      7 <p>
      8 Return to the <a href="index.html">PCRE2 index page</a>.
      9 </p>
     10 <p>
     11 This page is part of the PCRE2 HTML documentation. It was generated
     12 automatically from the original man page. If there is any nonsense in it,
     13 please consult the man page, in case the conversion went wrong.
     14 <br>
     15 <ul>
     16 <li><a name="TOC1" href="#SEC1">SYNOPSIS</a>
     17 <li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
     18 <li><a name="TOC3" href="#SEC3">SUPPORT FOR COMPRESSED FILES</a>
     19 <li><a name="TOC4" href="#SEC4">BINARY FILES</a>
     20 <li><a name="TOC5" href="#SEC5">OPTIONS</a>
     21 <li><a name="TOC6" href="#SEC6">ENVIRONMENT VARIABLES</a>
     22 <li><a name="TOC7" href="#SEC7">NEWLINES</a>
     23 <li><a name="TOC8" href="#SEC8">OPTIONS COMPATIBILITY</a>
     24 <li><a name="TOC9" href="#SEC9">OPTIONS WITH DATA</a>
     25 <li><a name="TOC10" href="#SEC10">CALLING EXTERNAL SCRIPTS</a>
     26 <li><a name="TOC11" href="#SEC11">MATCHING ERRORS</a>
     27 <li><a name="TOC12" href="#SEC12">DIAGNOSTICS</a>
     28 <li><a name="TOC13" href="#SEC13">SEE ALSO</a>
     29 <li><a name="TOC14" href="#SEC14">AUTHOR</a>
     30 <li><a name="TOC15" href="#SEC15">REVISION</a>
     31 </ul>
     32 <br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
     33 <P>
     34 <b>pcre2grep [options] [long options] [pattern] [path1 path2 ...]</b>
     35 </P>
     36 <br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
     37 <P>
     38 <b>pcre2grep</b> searches files for character patterns, in the same way as other
     39 grep commands do, but it uses the PCRE2 regular expression library to support
     40 patterns that are compatible with the regular expressions of Perl 5. See
     41 <a href="pcre2syntax.html"><b>pcre2syntax</b>(3)</a>
     42 for a quick-reference summary of pattern syntax, or
     43 <a href="pcre2pattern.html"><b>pcre2pattern</b>(3)</a>
     44 for a full description of the syntax and semantics of the regular expressions
     45 that PCRE2 supports.
     46 </P>
     47 <P>
     48 Patterns, whether supplied on the command line or in a separate file, are given
     49 without delimiters. For example:
     50 <pre>
     51   pcre2grep Thursday /etc/motd
     52 </pre>
     53 If you attempt to use delimiters (for example, by surrounding a pattern with
     54 slashes, as is common in Perl scripts), they are interpreted as part of the
     55 pattern. Quotes can of course be used to delimit patterns on the command line
     56 because they are interpreted by the shell, and indeed quotes are required if a
     57 pattern contains white space or shell metacharacters.
     58 </P>
     59 <P>
     60 The first argument that follows any option settings is treated as the single
     61 pattern to be matched when neither <b>-e</b> nor <b>-f</b> is present.
     62 Conversely, when one or both of these options are used to specify patterns, all
     63 arguments are treated as path names. At least one of <b>-e</b>, <b>-f</b>, or an
     64 argument pattern must be provided.
     65 </P>
     66 <P>
     67 If no files are specified, <b>pcre2grep</b> reads the standard input. The
     68 standard input can also be referenced by a name consisting of a single hyphen.
     69 For example:
     70 <pre>
     71   pcre2grep some-pattern file1 - file3
     72 </pre>
     73 Input files are searched line by line. By default, each line that matches a
     74 pattern is copied to the standard output, and if there is more than one file,
     75 the file name is output at the start of each line, followed by a colon.
     76 However, there are options that can change how <b>pcre2grep</b> behaves. In
     77 particular, the <b>-M</b> option makes it possible to search for strings that
     78 span line boundaries. What defines a line boundary is controlled by the
     79 <b>-N</b> (<b>--newline</b>) option.
     80 </P>
     81 <P>
     82 The amount of memory used for buffering files that are being scanned is
     83 controlled by a parameter that can be set by the <b>--buffer-size</b> option.
     84 The default value for this parameter is specified when <b>pcre2grep</b> is
built; unless changed at build time, it is 20K. A block of memory three times this
     86 size is used (to allow for buffering "before" and "after" lines). An error
     87 occurs if a line overflows the buffer.
     88 </P>
     89 <P>
     90 Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the greater.
     91 BUFSIZ is defined in <b>&#60;stdio.h&#62;</b>. When there is more than one pattern
     92 (specified by the use of <b>-e</b> and/or <b>-f</b>), each pattern is applied to
     93 each line in the order in which they are defined, except that all the <b>-e</b>
     94 patterns are tried before the <b>-f</b> patterns.
     95 </P>
     96 <P>
     97 By default, as soon as one pattern matches a line, no further patterns are
     98 considered. However, if <b>--colour</b> (or <b>--color</b>) is used to colour the
     99 matching substrings, or if <b>--only-matching</b>, <b>--file-offsets</b>, or
    100 <b>--line-offsets</b> is used to output only the part of the line that matched
    101 (either shown literally, or as an offset), scanning resumes immediately
    102 following the match, so that further matches on the same line can be found. If
    103 there are multiple patterns, they are all tried on the remainder of the line,
    104 but patterns that follow the one that matched are not tried on the earlier part
    105 of the line.
    106 </P>
    107 <P>
    108 This behaviour means that the order in which multiple patterns are specified
    109 can affect the output when one of the above options is used. This is no longer
    110 the same behaviour as GNU grep, which now manages to display earlier matches
    111 for later patterns (as long as there is no overlap).
    112 </P>
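<P>
The effect of pattern order can be illustrated with a small model of the
scanning loop described above. This is only a sketch: it uses Python's
<b>re</b> module rather than PCRE2, the function name <b>only_matching</b> is
hypothetical, and empty matches are simply treated as "no match", which is a
cruder rule than the one <b>pcre2grep</b> applies.
</P>

```python
import re

def only_matching(line, patterns):
    """Return the matched substrings of `line`, trying patterns in order."""
    compiled = [re.compile(p) for p in patterns]
    found = []
    pos = 0
    while pos <= len(line):
        for rx in compiled:
            m = rx.search(line, pos)
            # Assumption: an empty match counts as "no match" in this sketch.
            if m and m.end() > m.start():
                found.append(m.group())
                pos = m.end()      # resume immediately after the match
                break              # later patterns never see the earlier text
        else:
            break                  # no pattern matched the rest of the line
    return found

print(only_matching("aYbX", ["X|Y"]))     # ['Y', 'X']
print(only_matching("aYbX", ["X", "Y"]))  # ['X']
```

<P>
With the two patterns given separately, X is found first and scanning resumes
after it, so the earlier Y is never reported.
</P>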
    113 <P>
    114 Patterns that can match an empty string are accepted, but empty string
    115 matches are never recognized. An example is the pattern "(super)?(man)?", in
    116 which all components are optional. This pattern finds all occurrences of both
    117 "super" and "man"; the output differs from matching with "super|man" when only
    118 the matching substrings are being shown.
    119 </P>
    120 <P>
    121 If the <b>LC_ALL</b> or <b>LC_CTYPE</b> environment variable is set,
    122 <b>pcre2grep</b> uses the value to set a locale when calling the PCRE2 library.
    123 The <b>--locale</b> option can be used to override this.
    124 </P>
    125 <br><a name="SEC3" href="#TOC1">SUPPORT FOR COMPRESSED FILES</a><br>
    126 <P>
    127 It is possible to compile <b>pcre2grep</b> so that it uses <b>libz</b> or
    128 <b>libbz2</b> to read files whose names end in <b>.gz</b> or <b>.bz2</b>,
    129 respectively. You can find out whether your binary has support for one or both
    130 of these file types by running it with the <b>--help</b> option. If the
    131 appropriate support is not present, files are treated as plain text. The
    132 standard input is always so treated.
    133 </P>
    134 <br><a name="SEC4" href="#TOC1">BINARY FILES</a><br>
    135 <P>
    136 By default, a file that contains a binary zero byte within the first 1024 bytes
    137 is identified as a binary file, and is processed specially. (GNU grep also
    138 identifies binary files in this manner.) See the <b>--binary-files</b> option
    139 for a means of changing the way binary files are handled.
    140 </P>
    141 <br><a name="SEC5" href="#TOC1">OPTIONS</a><br>
    142 <P>
    143 The order in which some of the options appear can affect the output. For
    144 example, both the <b>-h</b> and <b>-l</b> options affect the printing of file
    145 names. Whichever comes later in the command line will be the one that takes
    146 effect. Similarly, except where noted below, if an option is given twice, the
    147 later setting is used. Numerical values for options may be followed by K or M,
    148 to signify multiplication by 1024 or 1024*1024 respectively.
    149 </P>
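<P>
As an illustration of the K and M suffixes, the following sketch converts an
option value to a plain number. The helper name <b>parse_number</b> is
hypothetical, and only the upper-case suffixes mentioned above are handled.
</P>

```python
def parse_number(text: str) -> int:
    """Convert a value such as '20K' or '2M' to an integer."""
    multipliers = {"K": 1024, "M": 1024 * 1024}
    suffix = text[-1:]
    if suffix in multipliers:
        return int(text[:-1]) * multipliers[suffix]
    return int(text)   # no suffix: the value is used as given

print(parse_number("20K"))   # 20480
print(parse_number("2M"))    # 2097152
print(parse_number("100"))   # 100
```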
    150 <P>
    151 <b>--</b>
    152 This terminates the list of options. It is useful if the next item on the
    153 command line starts with a hyphen but is not an option. This allows for the
    154 processing of patterns and file names that start with hyphens.
    155 </P>
    156 <P>
    157 <b>-A</b> <i>number</i>, <b>--after-context=</b><i>number</i>
    158 Output <i>number</i> lines of context after each matching line. If file names
    159 and/or line numbers are being output, a hyphen separator is used instead of a
    160 colon for the context lines. A line containing "--" is output between each
    161 group of lines, unless they are in fact contiguous in the input file. The value
    162 of <i>number</i> is expected to be relatively small. However, <b>pcre2grep</b>
    163 guarantees to have up to 8K of following text available for context output.
    164 </P>
    165 <P>
    166 <b>-a</b>, <b>--text</b>
    167 Treat binary files as text. This is equivalent to
    168 <b>--binary-files</b>=<i>text</i>.
    169 </P>
    170 <P>
    171 <b>-B</b> <i>number</i>, <b>--before-context=</b><i>number</i>
    172 Output <i>number</i> lines of context before each matching line. If file names
    173 and/or line numbers are being output, a hyphen separator is used instead of a
    174 colon for the context lines. A line containing "--" is output between each
    175 group of lines, unless they are in fact contiguous in the input file. The value
    176 of <i>number</i> is expected to be relatively small. However, <b>pcre2grep</b>
    177 guarantees to have up to 8K of preceding text available for context output.
    178 </P>
    179 <P>
    180 <b>--binary-files=</b><i>word</i>
    181 Specify how binary files are to be processed. If the word is "binary" (the
    182 default), pattern matching is performed on binary files, but the only output is
    183 "Binary file &#60;name&#62; matches" when a match succeeds. If the word is "text",
    184 which is equivalent to the <b>-a</b> or <b>--text</b> option, binary files are
    185 processed in the same way as any other file. In this case, when a match
    186 succeeds, the output may be binary garbage, which can have nasty effects if
    187 sent to a terminal. If the word is "without-match", which is equivalent to the
    188 <b>-I</b> option, binary files are not processed at all; they are assumed not to
    189 be of interest and are skipped without causing any output or affecting the
    190 return code.
    191 </P>
    192 <P>
    193 <b>--buffer-size=</b><i>number</i>
    194 Set the parameter that controls how much memory is used for buffering files
    195 that are being scanned.
    196 </P>
    197 <P>
    198 <b>-C</b> <i>number</i>, <b>--context=</b><i>number</i>
    199 Output <i>number</i> lines of context both before and after each matching line.
    200 This is equivalent to setting both <b>-A</b> and <b>-B</b> to the same value.
    201 </P>
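<P>
The layout of context output can be modelled as follows. This sketch assumes
that line numbers are being shown (as with <b>-n</b>), uses Python's <b>re</b>
module instead of PCRE2, and the function name <b>grep_with_context</b> is
hypothetical; the exact output of <b>pcre2grep</b> may differ in detail.
</P>

```python
import re

def grep_with_context(lines, pattern, before=0, after=0):
    """Return output lines: 'N:text' for matches, 'N-text' for context."""
    rx = re.compile(pattern)
    hits = {i for i, text in enumerate(lines) if rx.search(text)}
    wanted = set()
    for i in hits:
        first = max(0, i - before)
        last = min(len(lines) - 1, i + after)
        wanted.update(range(first, last + 1))
    out = []
    prev = None
    for i in sorted(wanted):
        if prev is not None and i != prev + 1:
            out.append("--")                  # groups are not contiguous
        sep = ":" if i in hits else "-"       # colon for matches, hyphen for context
        out.append(f"{i + 1}{sep}{lines[i]}")
        prev = i
    return out

sample = ["a", "match1", "b", "c", "d", "e", "match2", "f"]
for row in grep_with_context(sample, "match", before=1, after=1):
    print(row)
```

<P>
For the sample data this prints two groups of three lines each, separated by a
line containing "--".
</P>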
    202 <P>
    203 <b>-c</b>, <b>--count</b>
    204 Do not output lines from the files that are being scanned; instead output the
    205 number of matches (or non-matches if <b>-v</b> is used) that would otherwise
    206 have caused lines to be shown. By default, this count is the same as the number
    207 of suppressed lines, but if the <b>-M</b> (multiline) option is used (without
    208 <b>-v</b>), there may be more suppressed lines than the number of matches.
    209 <br>
    210 <br>
If no lines are selected, the number zero is output. If several files are
being scanned, a count is output for each of them. However, if the
    213 <b>--files-with-matches</b> option is also used, only those files whose counts
    214 are greater than zero are listed. When <b>-c</b> is used, the <b>-A</b>,
    215 <b>-B</b>, and <b>-C</b> options are ignored.
    216 </P>
    217 <P>
    218 <b>--colour</b>, <b>--color</b>
    219 If this option is given without any data, it is equivalent to "--colour=auto".
    220 If data is required, it must be given in the same shell item, separated by an
    221 equals sign.
    222 </P>
    223 <P>
    224 <b>--colour=</b><i>value</i>, <b>--color=</b><i>value</i>
    225 This option specifies under what circumstances the parts of a line that matched
    226 a pattern should be coloured in the output. By default, the output is not
    227 coloured. The value (which is optional, see above) may be "never", "always", or
    228 "auto". In the latter case, colouring happens only if the standard output is
    229 connected to a terminal. More resources are used when colouring is enabled,
    230 because <b>pcre2grep</b> has to search for all possible matches in a line, not
    231 just one, in order to colour them all.
    232 <br>
    233 <br>
    234 The colour that is used can be specified by setting the environment variable
    235 PCRE2GREP_COLOUR or PCRE2GREP_COLOR. The value of this variable should be a
    236 string of two numbers, separated by a semicolon. They are copied directly into
    237 the control string for setting colour on a terminal, so it is your
    238 responsibility to ensure that they make sense. If neither of the environment
    239 variables is set, the default is "1;31", which gives red.
    240 </P>
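<P>
The way the two numbers end up in a terminal control string can be sketched as
below. The escape sequences shown are the usual ANSI "select graphic
rendition" form, which is an assumption about what <b>pcre2grep</b> emits; the
order in which the two variables are consulted and the function name
<b>colourise</b> are likewise assumptions made for illustration.
</P>

```python
import os

def colourise(text, environ=os.environ):
    """Wrap `text` in an SGR colour sequence taken from the environment."""
    code = environ.get("PCRE2GREP_COLOUR")
    if code is None:
        code = environ.get("PCRE2GREP_COLOR")
    if code is None:
        code = "1;31"                       # documented default: red
    # The value is copied in unchecked, so a nonsensical value gives
    # nonsensical terminal output.
    return f"\x1b[{code}m{text}\x1b[0m"

print(repr(colourise("hit", {})))
print(repr(colourise("hit", {"PCRE2GREP_COLOR": "1;32"})))
```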
    241 <P>
    242 <b>-D</b> <i>action</i>, <b>--devices=</b><i>action</i>
    243 If an input path is not a regular file or a directory, "action" specifies how
    244 it is to be processed. Valid values are "read" (the default) or "skip"
    245 (silently skip the path).
    246 </P>
    247 <P>
    248 <b>-d</b> <i>action</i>, <b>--directories=</b><i>action</i>
    249 If an input path is a directory, "action" specifies how it is to be processed.
    250 Valid values are "read" (the default in non-Windows environments, for
    251 compatibility with GNU grep), "recurse" (equivalent to the <b>-r</b> option), or
    252 "skip" (silently skip the path, the default in Windows environments). In the
    253 "read" case, directories are read as if they were ordinary files. In some
    254 operating systems the effect of reading a directory like this is an immediate
    255 end-of-file; in others it may provoke an error.
    256 </P>
    257 <P>
    258 <b>-e</b> <i>pattern</i>, <b>--regex=</b><i>pattern</i>, <b>--regexp=</b><i>pattern</i>
    259 Specify a pattern to be matched. This option can be used multiple times in
    260 order to specify several patterns. It can also be used as a way of specifying a
    261 single pattern that starts with a hyphen. When <b>-e</b> is used, no argument
    262 pattern is taken from the command line; all arguments are treated as file
    263 names. There is no limit to the number of patterns. They are applied to each
    264 line in the order in which they are defined until one matches.
    265 <br>
    266 <br>
    267 If <b>-f</b> is used with <b>-e</b>, the command line patterns are matched first,
    268 followed by the patterns from the file(s), independent of the order in which
    269 these options are specified. Note that multiple use of <b>-e</b> is not the same
    270 as a single pattern with alternatives. For example, X|Y finds the first
    271 character in a line that is X or Y, whereas if the two patterns are given
    272 separately, with X first, <b>pcre2grep</b> finds X if it is present, even if it
    273 follows Y in the line. It finds Y only if there is no X in the line. This
    274 matters only if you are using <b>-o</b> or <b>--colo(u)r</b> to show the part(s)
    275 of the line that matched.
    276 </P>
    277 <P>
    278 <b>--exclude</b>=<i>pattern</i>
    279 Files (but not directories) whose names match the pattern are skipped without
    280 being processed. This applies to all files, whether listed on the command line,
    281 obtained from <b>--file-list</b>, or by scanning a directory. The pattern is a
    282 PCRE2 regular expression, and is matched against the final component of the
    283 file name, not the entire path. The <b>-F</b>, <b>-w</b>, and <b>-x</b> options do
    284 not apply to this pattern. The option may be given any number of times in order
    285 to specify multiple patterns. If a file name matches both an <b>--include</b>
    286 and an <b>--exclude</b> pattern, it is excluded. There is no short form for this
    287 option.
    288 </P>
    289 <P>
    290 <b>--exclude-from=</b><i>filename</i>
    291 Treat each non-empty line of the file as the data for an <b>--exclude</b>
    292 option. What constitutes a newline when reading the file is the operating
    293 system's default. The <b>--newline</b> option has no effect on this option. This
    294 option may be given more than once in order to specify a number of files to
    295 read.
    296 </P>
    297 <P>
    298 <b>--exclude-dir</b>=<i>pattern</i>
    299 Directories whose names match the pattern are skipped without being processed,
    300 whatever the setting of the <b>--recursive</b> option. This applies to all
    301 directories, whether listed on the command line, obtained from
    302 <b>--file-list</b>, or by scanning a parent directory. The pattern is a PCRE2
    303 regular expression, and is matched against the final component of the directory
    304 name, not the entire path. The <b>-F</b>, <b>-w</b>, and <b>-x</b> options do not
    305 apply to this pattern. The option may be given any number of times in order to
    306 specify more than one pattern. If a directory matches both <b>--include-dir</b>
    307 and <b>--exclude-dir</b>, it is excluded. There is no short form for this
    308 option.
    309 </P>
    310 <P>
    311 <b>-F</b>, <b>--fixed-strings</b>
    312 Interpret each data-matching pattern as a list of fixed strings, separated by
    313 newlines, instead of as a regular expression. What constitutes a newline for
    314 this purpose is controlled by the <b>--newline</b> option. The <b>-w</b> (match
    315 as a word) and <b>-x</b> (match whole line) options can be used with <b>-F</b>.
    316 They apply to each of the fixed strings. A line is selected if any of the fixed
    317 strings are found in it (subject to <b>-w</b> or <b>-x</b>, if present). This
    318 option applies only to the patterns that are matched against the contents of
    319 files; it does not apply to patterns specified by any of the <b>--include</b> or
    320 <b>--exclude</b> options.
    321 </P>
    322 <P>
    323 <b>-f</b> <i>filename</i>, <b>--file=</b><i>filename</i>
    324 Read patterns from the file, one per line, and match them against
    325 each line of input. What constitutes a newline when reading the file is the
    326 operating system's default. The <b>--newline</b> option has no effect on this
    327 option. Trailing white space is removed from each line, and blank lines are
    328 ignored. An empty file contains no patterns and therefore matches nothing. See
    329 also the comments about multiple patterns versus a single pattern with
    330 alternatives in the description of <b>-e</b> above.
    331 <br>
    332 <br>
    333 If this option is given more than once, all the specified files are
    334 read. A data line is output if any of the patterns match it. A file name can
    335 be given as "-" to refer to the standard input. When <b>-f</b> is used, patterns
    336 specified on the command line using <b>-e</b> may also be present; they are
    337 tested before the file's patterns. However, no other pattern is taken from the
    338 command line; all arguments are treated as the names of paths to be searched.
    339 </P>
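<P>
The treatment of lines in a pattern file can be sketched like this. The helper
name <b>read_patterns</b> is hypothetical; the code only models the two rules
stated above (trailing white space removed, blank lines ignored).
</P>

```python
def read_patterns(lines):
    """Return the patterns contained in an iterable of text lines."""
    patterns = []
    for line in lines:
        stripped = line.rstrip()    # drops trailing white space and the newline
        if stripped:                # blank lines contribute no pattern
            patterns.append(stripped)
    return patterns

print(read_patterns(["foo  \n", "\n", "   \n", " bar\n"]))   # ['foo', ' bar']
print(read_patterns([]))                                     # []
```

<P>
Leading white space is kept, because only trailing white space is removed.
</P>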
    340 <P>
    341 <b>--file-list</b>=<i>filename</i>
    342 Read a list of files and/or directories that are to be scanned from the given
    343 file, one per line. Trailing white space is removed from each line, and blank
    344 lines are ignored. These paths are processed before any that are listed on the
    345 command line. The file name can be given as "-" to refer to the standard input.
    346 If <b>--file</b> and <b>--file-list</b> are both specified as "-", patterns are
    347 read first. This is useful only when the standard input is a terminal, from
    348 which further lines (the list of files) can be read after an end-of-file
    349 indication. If this option is given more than once, all the specified files are
    350 read.
    351 </P>
    352 <P>
    353 <b>--file-offsets</b>
    354 Instead of showing lines or parts of lines that match, show each match as an
    355 offset from the start of the file and a length, separated by a comma. In this
    356 mode, no context is shown. That is, the <b>-A</b>, <b>-B</b>, and <b>-C</b>
    357 options are ignored. If there is more than one match in a line, each of them is
    358 shown separately. This option is mutually exclusive with <b>--line-offsets</b>
    359 and <b>--only-matching</b>.
    360 </P>
    361 <P>
    362 <b>-H</b>, <b>--with-filename</b>
    363 Force the inclusion of the file name at the start of output lines when
    364 searching a single file. By default, the file name is not shown in this case.
    365 For matching lines, the file name is followed by a colon; for context lines, a
    366 hyphen separator is used. If a line number is also being output, it follows the
    367 file name. When the <b>-M</b> option causes a pattern to match more than one
    368 line, only the first is preceded by the file name.
    369 </P>
    370 <P>
    371 <b>-h</b>, <b>--no-filename</b>
Suppress the output of file names when searching multiple files. By default,
    373 file names are shown when multiple files are searched. For matching lines, the
    374 file name is followed by a colon; for context lines, a hyphen separator is used.
    375 If a line number is also being output, it follows the file name.
    376 </P>
    377 <P>
    378 <b>--help</b>
    379 Output a help message, giving brief details of the command options and file
    380 type support, and then exit. Anything else on the command line is
    381 ignored.
    382 </P>
    383 <P>
    384 <b>-I</b>
    385 Ignore binary files. This is equivalent to
    386 <b>--binary-files</b>=<i>without-match</i>.
    387 </P>
    388 <P>
    389 <b>-i</b>, <b>--ignore-case</b>
    390 Ignore upper/lower case distinctions during comparisons.
    391 </P>
    392 <P>
    393 <b>--include</b>=<i>pattern</i>
    394 If any <b>--include</b> patterns are specified, the only files that are
    395 processed are those that match one of the patterns (and do not match an
    396 <b>--exclude</b> pattern). This option does not affect directories, but it
    397 applies to all files, whether listed on the command line, obtained from
    398 <b>--file-list</b>, or by scanning a directory. The pattern is a PCRE2 regular
    399 expression, and is matched against the final component of the file name, not
    400 the entire path. The <b>-F</b>, <b>-w</b>, and <b>-x</b> options do not apply to
    401 this pattern. The option may be given any number of times. If a file name
    402 matches both an <b>--include</b> and an <b>--exclude</b> pattern, it is excluded.
    403 There is no short form for this option.
    404 </P>
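<P>
The interaction of <b>--include</b> and <b>--exclude</b> can be modelled as
follows. This is a sketch using Python's <b>re</b> module rather than PCRE2,
and the function name <b>is_selected</b> is hypothetical.
</P>

```python
import os
import re

def is_selected(path, includes=(), excludes=()):
    """Decide whether a file would be processed, given the two pattern lists."""
    name = os.path.basename(path)           # only the final component is tested
    if any(re.search(p, name) for p in excludes):
        return False                        # exclusion wins over inclusion
    if includes:
        return any(re.search(p, name) for p in includes)
    return True                             # no --include given: everything else passes

print(is_selected("src/main.c", includes=[r"\.c$"]))                         # True
print(is_selected("src/main.h", includes=[r"\.c$"]))                         # False
print(is_selected("src/test_main.c", includes=[r"\.c$"], excludes=[r"^test_"]))  # False
print(is_selected("test_dir/main.c", excludes=[r"^test_"]))                  # True
```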
    405 <P>
    406 <b>--include-from=</b><i>filename</i>
    407 Treat each non-empty line of the file as the data for an <b>--include</b>
    408 option. What constitutes a newline for this purpose is the operating system's
    409 default. The <b>--newline</b> option has no effect on this option. This option
    410 may be given any number of times; all the files are read.
    411 </P>
    412 <P>
    413 <b>--include-dir</b>=<i>pattern</i>
    414 If any <b>--include-dir</b> patterns are specified, the only directories that
    415 are processed are those that match one of the patterns (and do not match an
    416 <b>--exclude-dir</b> pattern). This applies to all directories, whether listed
    417 on the command line, obtained from <b>--file-list</b>, or by scanning a parent
    418 directory. The pattern is a PCRE2 regular expression, and is matched against
    419 the final component of the directory name, not the entire path. The <b>-F</b>,
    420 <b>-w</b>, and <b>-x</b> options do not apply to this pattern. The option may be
    421 given any number of times. If a directory matches both <b>--include-dir</b> and
    422 <b>--exclude-dir</b>, it is excluded. There is no short form for this option.
    423 </P>
    424 <P>
    425 <b>-L</b>, <b>--files-without-match</b>
    426 Instead of outputting lines from the files, just output the names of the files
    427 that do not contain any lines that would have been output. Each file name is
    428 output once, on a separate line.
    429 </P>
    430 <P>
    431 <b>-l</b>, <b>--files-with-matches</b>
    432 Instead of outputting lines from the files, just output the names of the files
    433 containing lines that would have been output. Each file name is output
    434 once, on a separate line. Searching normally stops as soon as a matching line
    435 is found in a file. However, if the <b>-c</b> (count) option is also used,
    436 matching continues in order to obtain the correct count, and those files that
    437 have at least one match are listed along with their counts. Using this option
    438 with <b>-c</b> is a way of suppressing the listing of files with no matches.
    439 </P>
    440 <P>
    441 <b>--label</b>=<i>name</i>
    442 This option supplies a name to be used for the standard input when file names
    443 are being output. If not supplied, "(standard input)" is used. There is no
    444 short form for this option.
    445 </P>
    446 <P>
    447 <b>--line-buffered</b>
    448 When this option is given, input is read and processed line by line, and the
    449 output is flushed after each write. By default, input is read in large chunks,
    450 unless <b>pcre2grep</b> can determine that it is reading from a terminal (which
is currently possible only in Unix-like environments). Output to a terminal is
    452 normally automatically flushed by the operating system. This option can be
    453 useful when the input or output is attached to a pipe and you do not want
    454 <b>pcre2grep</b> to buffer up large amounts of data. However, its use will
    455 affect performance, and the <b>-M</b> (multiline) option ceases to work.
    456 </P>
    457 <P>
    458 <b>--line-offsets</b>
    459 Instead of showing lines or parts of lines that match, show each match as a
    460 line number, the offset from the start of the line, and a length. The line
    461 number is terminated by a colon (as usual; see the <b>-n</b> option), and the
    462 offset and length are separated by a comma. In this mode, no context is shown.
    463 That is, the <b>-A</b>, <b>-B</b>, and <b>-C</b> options are ignored. If there is
    464 more than one match in a line, each of them is shown separately. This option is
    465 mutually exclusive with <b>--file-offsets</b> and <b>--only-matching</b>.
    466 </P>
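<P>
The two offset formats (this option and <b>--file-offsets</b>) can be compared
with a short sketch. It assumes that offsets are counted in bytes starting
from zero, uses Python's <b>re</b> module in place of PCRE2, and the function
name <b>offsets</b> is hypothetical.
</P>

```python
import re

def offsets(data: bytes, pattern: bytes):
    """Return (file_form, line_form) lists of match positions."""
    rx = re.compile(pattern)
    file_form, line_form = [], []
    line_start = 0                               # offset of the current line in the file
    for number, line in enumerate(data.splitlines(keepends=True), start=1):
        for m in rx.finditer(line):
            length = m.end() - m.start()
            if length == 0:
                continue                         # empty matches are not reported
            file_form.append(f"{line_start + m.start()},{length}")
            line_form.append(f"{number}:{m.start()},{length}")
        line_start += len(line)
    return file_form, line_form

file_form, line_form = offsets(b"one cat\ntwo cats\n", rb"cat")
print(file_form)   # ['4,3', '12,3']
print(line_form)   # ['1:4,3', '2:4,3']
```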
    467 <P>
    468 <b>--locale</b>=<i>locale-name</i>
    469 This option specifies a locale to be used for pattern matching. It overrides
    470 the value in the <b>LC_ALL</b> or <b>LC_CTYPE</b> environment variables. If no
    471 locale is specified, the PCRE2 library's default (usually the "C" locale) is
    472 used. There is no short form for this option.
    473 </P>
    474 <P>
    475 <b>--match-limit</b>=<i>number</i>
    476 Processing some regular expression patterns can require a very large amount of
    477 memory, leading in some cases to a program crash if not enough is available.
    478 Other patterns may take a very long time to search for all possible matching
    479 strings. The <b>pcre2_match()</b> function that is called by <b>pcre2grep</b> to
    480 do the matching has two parameters that can limit the resources that it uses.
    481 <br>
    482 <br>
    483 The <b>--match-limit</b> option provides a means of limiting resource usage
    484 when processing patterns that are not going to match, but which have a very
    485 large number of possibilities in their search trees. The classic example is a
    486 pattern that uses nested unlimited repeats. Internally, PCRE2 uses a function
    487 called <b>match()</b> which it calls repeatedly (sometimes recursively). The
    488 limit set by <b>--match-limit</b> is imposed on the number of times this
    489 function is called during a match, which has the effect of limiting the amount
    490 of backtracking that can take place.
    491 <br>
    492 <br>
    493 The <b>--recursion-limit</b> option is similar to <b>--match-limit</b>, but
    494 instead of limiting the total number of times that <b>match()</b> is called, it
    495 limits the depth of recursive calls, which in turn limits the amount of memory
    496 that can be used. The recursion depth is a smaller number than the total number
    497 of calls, because not all calls to <b>match()</b> are recursive. This limit is
    498 of use only if it is set smaller than <b>--match-limit</b>.
    499 <br>
    500 <br>
    501 There are no short forms for these options. The default settings are specified
when the PCRE2 library is compiled; unless changed at build time, each is 10 million.
    503 </P>
    504 <P>
    505 <b>-M</b>, <b>--multiline</b>
    506 Allow patterns to match more than one line. When this option is given, patterns
    507 may usefully contain literal newline characters and internal occurrences of ^
    508 and $ characters. The output for a successful match may consist of more than
    509 one line. The first is the line in which the match started, and the last is the
    510 line in which the match ended. If the matched string ends with a newline
    511 sequence the output ends at the end of that line.
    512 <br>
    513 <br>
    514 When this option is set, the PCRE2 library is called in "multiline" mode. This
    515 allows a matched string to extend past the end of a line and continue on one or
    516 more subsequent lines. However, <b>pcre2grep</b> still processes the input line
    517 by line. Once a match has been handled, scanning restarts at the beginning of
    518 the next line, just as it does when <b>-M</b> is not present. This means that it
    519 is possible for the second or subsequent lines in a multiline match to be
    520 output again as part of another match.
    521 <br>
    522 <br>
    523 The newline sequence that separates multiple lines must be matched as part of
    524 the pattern. For example, to find the phrase "regular expression" in a file
    525 where "regular" might be at the end of a line and "expression" at the start of
    526 the next line, you could use this command:
    527 <pre>
    528   pcre2grep -M 'regular\s+expression' &#60;file&#62;
    529 </pre>
    530 The \s escape sequence matches any white space character, including newlines,
    531 and is followed by + so as to match trailing white space on the first line as
    532 well as possibly handling a two-character newline sequence.
    533 <br>
    534 <br>
    535 There is a limit to the number of lines that can be matched, imposed by the way
    536 that <b>pcre2grep</b> buffers the input file as it scans it. However,
    537 <b>pcre2grep</b> ensures that at least 8K characters or the rest of the file
    538 (whichever is the shorter) are available for forward matching, and similarly
    539 the previous 8K characters (or all the previous characters, if fewer than 8K)
    540 are guaranteed to be available for lookbehind assertions. The <b>-M</b> option
    541 does not work when input is read line by line (see <b>--line-buffered</b>).
    542 </P>
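<P>
As a small sketch of the multiline behaviour described above, the following
feeds a made-up three-line input on the standard input (the sample text is an
assumption chosen for illustration) and captures what <b>-M</b> prints:
<pre>
```shell
# Hypothetical input: "regular" ends line 1, "expression" starts line 2.
out=$(printf 'a regular\nexpression here\nunrelated\n' |
  pcre2grep -M 'regular\s+expression' -)
# Both lines spanned by the match are output; the third line is not.
printf '%s\n' "$out"
```
</pre>
Both the line where the match starts and the line where it ends should be
printed, because the \s+ in the pattern consumes the newline between them.
</P>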
    543 <P>
    544 <b>-N</b> <i>newline-type</i>, <b>--newline</b>=<i>newline-type</i>
    545 The PCRE2 library supports five different conventions for indicating
    546 the ends of lines. They are the single-character sequences CR (carriage return)
    547 and LF (linefeed), the two-character sequence CRLF, an "anycrlf" convention,
    548 which recognizes any of the preceding three types, and an "any" convention, in
    549 which any Unicode line ending sequence is assumed to end a line. The Unicode
    550 sequences are the three just mentioned, plus VT (vertical tab, U+000B), FF
    551 (form feed, U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and
    552 PS (paragraph separator, U+2029).
    553 <br>
    554 <br>
    555 When the PCRE2 library is built, a default line-ending sequence is specified.
    556 This is normally the standard sequence for the operating system. Unless
    557 otherwise specified by this option, <b>pcre2grep</b> uses the library's default.
    558 The possible values for this option are CR, LF, CRLF, ANYCRLF, or ANY. This
    559 makes it possible to use <b>pcre2grep</b> to scan files that have come from
    560 other environments without having to modify their line endings. If the data
    561 that is being scanned does not agree with the convention set by this option,
    562 <b>pcre2grep</b> may behave in strange ways. Note that this option does not
    563 apply to files specified by the <b>-f</b>, <b>--exclude-from</b>, or
    564 <b>--include-from</b> options, which are expected to use the operating system's
    565 standard newline sequence.
    566 </P>
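<P>
A minimal sketch of scanning data with CRLF line endings, assuming a made-up
two-line input supplied on the standard input. With <b>-N CRLF</b> the two-character
sequence is treated as the line terminator, so an anchored pattern can match
the whole of the second line:
<pre>
```shell
# Hypothetical input with CRLF line endings.
count=$(printf 'one\r\ntwo\r\n' | pcre2grep -N CRLF -c '^two$' -)
# -c prints the number of matching lines.
printf '%s\n' "$count"
```
</pre>
If the library default is LF and <b>-N</b> is omitted, the carriage return stays part
of the line, which is one of the "strange ways" in which a mismatch can show
up.
</P>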
    567 <P>
    568 <b>-n</b>, <b>--line-number</b>
    569 Precede each output line by its line number in the file, followed by a colon
    570 for matching lines or a hyphen for context lines. If the file name is also
    571 being output, it precedes the line number. When the <b>-M</b> option causes a
    572 pattern to match more than one line, only the first is preceded by its line
    573 number. This option is forced if <b>--line-offsets</b> is used.
    574 </P>
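<P>
For illustration, assuming a made-up three-line input on the standard input,
the line number and a colon precede the matching line:
<pre>
```shell
# Hypothetical input; "beta" is on line 2.
out=$(printf 'alpha\nbeta\ngamma\n' | pcre2grep -n 'beta' -)
printf '%s\n' "$out"
```
</pre>
This should print 2:beta, because only one input is scanned and so no file
name is output.
</P>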
    575 <P>
    576 <b>--no-jit</b>
    577 If the PCRE2 library is built with support for just-in-time compiling (which
    578 speeds up matching), <b>pcre2grep</b> automatically makes use of this, unless it
    579 was explicitly disabled at build time. This option can be used to disable the
    580 use of JIT at run time. It is provided for testing and working round problems.
    581 It should never be needed in normal use.
    582 </P>
    583 <P>
    584 <b>-o</b>, <b>--only-matching</b>
    585 Show only the part of the line that matched a pattern instead of the whole
    586 line. In this mode, no context is shown. That is, the <b>-A</b>, <b>-B</b>, and
    587 <b>-C</b> options are ignored. If there is more than one match in a line, each
    588 of them is shown separately. If <b>-o</b> is combined with <b>-v</b> (invert the
    589 sense of the match to find non-matching lines), no output is generated, but the
    590 return code is set appropriately. If the matched portion of the line is empty,
    591 nothing is output unless the file name or line number is being printed, in
    592 which case they are shown on an otherwise empty line. This option is mutually
    593 exclusive with <b>--file-offsets</b> and <b>--line-offsets</b>.
    594 </P>
    595 <P>
    596 <b>-o</b><i>number</i>, <b>--only-matching</b>=<i>number</i>
    597 Show only the part of the line that matched the capturing parentheses of the
    598 given number. Up to 32 capturing parentheses are supported, and -o0 is
    599 equivalent to <b>-o</b> without a number. Because these options can be given
    600 without an argument (see above), if an argument is present, it must be given in
    601 the same shell item, for example, -o3 or --only-matching=2. The comments given
    602 for the non-argument case above also apply to this case. If the specified
    603 capturing parentheses do not exist in the pattern, or were not set in the
    604 match, nothing is output unless the file name or line number is being output.
    605 <br>
    606 <br>
    607 If this option is given multiple times, multiple substrings are output, in the
    608 order the options are given. For example, -o3 -o1 -o3 causes the substrings
    609 matched by capturing parentheses 3 and 1 and then 3 again to be output. By
    610 default, there is no separator (but see the next option).
    611 </P>
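<P>
A short sketch of selecting capture groups, assuming a made-up key=value input
line and a pattern with two capturing parentheses:
<pre>
```shell
# Hypothetical input line and pattern, chosen for illustration only.
first=$(printf 'key=value\n' | pcre2grep -o1 '(\w+)=(\w+)' -)
# Groups are output in the order the options are given, with no separator.
swapped=$(printf 'key=value\n' | pcre2grep -o2 -o1 '(\w+)=(\w+)' -)
printf '%s\n%s\n' "$first" "$swapped"
```
</pre>
The first command should print key, and the second valuekey, since by default
nothing separates the two substrings.
</P>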
    612 <P>
    613 <b>--om-separator</b>=<i>text</i>
    614 Specify a separating string for multiple occurrences of <b>-o</b>. The default
    615 is an empty string. Separating strings are never coloured.
    616 </P>
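<P>
As an illustration, assuming the same kind of made-up key=value input, a comma
can be placed between two substrings selected by <b>-o</b>:
<pre>
```shell
# Hypothetical input line; the comma separator is an arbitrary choice.
out=$(printf 'key=value\n' |
  pcre2grep -o2 -o1 --om-separator=, '(\w+)=(\w+)' -)
printf '%s\n' "$out"
```
</pre>
This should print value,key.
</P>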
    617 <P>
    618 <b>-q</b>, <b>--quiet</b>
    619 Work quietly, that is, display nothing except error messages. The exit
    620 status indicates whether or not any matches were found.
    621 </P>
    622 <P>
    623 <b>-r</b>, <b>--recursive</b>
    624 If any given path is a directory, recursively scan the files it contains,
    625 taking note of any <b>--include</b> and <b>--exclude</b> settings. By default, a
    626 directory is read as a normal file; in some operating systems this gives an
    627 immediate end-of-file. This option is a shorthand for setting the <b>-d</b>
    628 option to "recurse".
    629 </P>
    630 <P>
    631 <b>--recursion-limit</b>=<i>number</i>
    632 See <b>--match-limit</b> above.
    633 </P>
    634 <P>
    635 <b>-s</b>, <b>--no-messages</b>
    636 Suppress error messages about non-existent or unreadable files. Such files are
    637 quietly skipped. However, the return code is still 2, even if matches were
    638 found in other files.
    639 </P>
    640 <P>
    641 <b>-u</b>, <b>--utf-8</b>
    642 Operate in UTF-8 mode. This option is available only if PCRE2 has been compiled
    643 with UTF-8 support. All patterns (including those for any <b>--exclude</b> and
    644 <b>--include</b> options) and all subject lines that are scanned must be valid
    645 strings of UTF-8 characters.
    646 </P>
    647 <P>
    648 <b>-V</b>, <b>--version</b>
    649 Write the version numbers of <b>pcre2grep</b> and the PCRE2 library to the
    650 standard output and then exit. Anything else on the command line is
    651 ignored.
    652 </P>
    653 <P>
    654 <b>-v</b>, <b>--invert-match</b>
    655 Invert the sense of the match, so that lines which do <i>not</i> match any of
    656 the patterns are the ones that are found.
    657 </P>
    658 <P>
    659 <b>-w</b>, <b>--word-regex</b>, <b>--word-regexp</b>
    660 Force the patterns to match only whole words. This is equivalent to having \b
    661 at the start and end of the pattern. This option applies only to the patterns
    662 that are matched against the contents of files; it does not apply to patterns
    663 specified by any of the <b>--include</b> or <b>--exclude</b> options.
    664 </P>
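<P>
A small sketch of whole-word matching, assuming a made-up two-line input. The
word "cat" appears in both lines, but only the first has it as a whole word:
<pre>
```shell
# Hypothetical input: "concatenate" contains "cat" inside a longer word.
out=$(printf 'cat\nconcatenate\n' | pcre2grep -w 'cat' -)
printf '%s\n' "$out"
```
</pre>
Only the line cat should be printed.
</P>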
    665 <P>
    666 <b>-x</b>, <b>--line-regex</b>, <b>--line-regexp</b>
    667 Force the patterns to be anchored (each must start matching at the beginning of
    668 a line) and in addition, require them to match entire lines. This is equivalent
    669 to having ^ and $ characters at the start and end of each alternative top-level
    670 branch in every pattern. This option applies only to the patterns that are
    671 matched against the contents of files; it does not apply to patterns specified
    672 by any of the <b>--include</b> or <b>--exclude</b> options.
    673 </P>
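<P>
To illustrate that <b>-x</b> applies to each top-level alternative, here is a sketch
that assumes a made-up three-line input and a pattern with two branches:
<pre>
```shell
# Hypothetical input: "xabc" contains "abc" but is not equal to either branch.
out=$(printf 'abc\nabcd\nxabc\n' | pcre2grep -x 'abc|abcd' -)
printf '%s\n' "$out"
```
</pre>
The lines abc and abcd should be printed, and xabc should not, because each
branch has to match an entire line.
</P>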
    674 <br><a name="SEC6" href="#TOC1">ENVIRONMENT VARIABLES</a><br>
    675 <P>
    676 The environment variables <b>LC_ALL</b> and <b>LC_CTYPE</b> are examined, in that
    677 order, for a locale. The first one that is set is used. This can be overridden
    678 by the <b>--locale</b> option. If no locale is set, the PCRE2 library's default
    679 (usually the "C" locale) is used.
    680 </P>
    681 <br><a name="SEC7" href="#TOC1">NEWLINES</a><br>
    682 <P>
    683 The <b>-N</b> (<b>--newline</b>) option allows <b>pcre2grep</b> to scan files with
    684 different newline conventions from the default. Any parts of the input files
    685 that are written to the standard output are copied identically, with whatever
    686 newline sequences they have in the input. However, the setting of this option
    687 does not affect the interpretation of files specified by the <b>-f</b>,
    688 <b>--exclude-from</b>, or <b>--include-from</b> options, which are assumed to use
    689 the operating system's standard newline sequence, nor does it affect the way in
    690 which <b>pcre2grep</b> writes informational messages to the standard error and
    691 output streams. For these it uses the string "\n" to indicate newlines,
    692 relying on the C I/O library to convert this to an appropriate sequence.
    693 </P>
    694 <br><a name="SEC8" href="#TOC1">OPTIONS COMPATIBILITY</a><br>
    695 <P>
    696 Many of the short and long forms of <b>pcre2grep</b>'s options are the same
    697 as in the GNU <b>grep</b> program. Any long option of the form
    698 <b>--xxx-regexp</b> (GNU terminology) is also available as <b>--xxx-regex</b>
    699 (PCRE2 terminology). However, the <b>--file-list</b>, <b>--file-offsets</b>,
    700 <b>--include-dir</b>, <b>--line-offsets</b>, <b>--locale</b>, <b>--match-limit</b>,
    701 <b>-M</b>, <b>--multiline</b>, <b>-N</b>, <b>--newline</b>, <b>--om-separator</b>,
    702 <b>--recursion-limit</b>, <b>-u</b>, and <b>--utf-8</b> options are specific to
    703 <b>pcre2grep</b>, as is the use of the <b>--only-matching</b> option with a
    704 capturing parentheses number.
    705 </P>
    706 <P>
    707 Although most of the common options work the same way, a few are different in
    708 <b>pcre2grep</b>. For example, the <b>--include</b> option's argument is a glob
    709 for GNU <b>grep</b>, but a regular expression for <b>pcre2grep</b>. If both the
    710 <b>-c</b> and <b>-l</b> options are given, GNU grep lists only file names,
    711 without counts, but <b>pcre2grep</b> gives the counts as well.
    712 </P>
    713 <br><a name="SEC9" href="#TOC1">OPTIONS WITH DATA</a><br>
    714 <P>
    715 There are four different ways in which an option with data can be specified.
    716 If a short form option is used, the data may follow immediately, or (with one
    717 exception) in the next command line item. For example:
    718 <pre>
    719   -f/some/file
    720   -f /some/file
    721 </pre>
    722 The exception is the <b>-o</b> option, which may appear with or without data.
    723 Because of this, if data is present, it must follow immediately in the same
    724 item, for example -o3.
    725 </P>
    726 <P>
    727 If a long form option is used, the data may appear in the same command line
    728 item, separated by an equals character, or (with two exceptions) it may appear
    729 in the next command line item. For example:
    730 <pre>
    731   --file=/some/file
    732   --file /some/file
    733 </pre>
    734 Note, however, that if you want to supply a file name beginning with ~ as data
    735 in a shell command, and have the shell expand ~ to a home directory, you must
    736 separate the file name from the option, because the shell does not treat ~
    737 specially unless it is at the start of an item.
    738 </P>
    739 <P>
    740 The exceptions to the above are the <b>--colour</b> (or <b>--color</b>) and
    741 <b>--only-matching</b> options, for which the data is optional. If one of these
    742 options does have data, it must be given in the first form, using an equals
    743 character. Otherwise <b>pcre2grep</b> will assume that it has no data.
    744 </P>
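<P>
The four forms can be compared side by side. This sketch assumes a one-line
made-up input and uses the <b>-N</b> (<b>--newline</b>) option, which always takes data, so
all four spellings are accepted:
<pre>
```shell
# $form is left unquoted on purpose so that '-N LF' and '--newline LF'
# are split into two command line items by the shell.
out=$(for form in '-NLF' '-N LF' '--newline=LF' '--newline LF'; do
  printf 'x\n' | pcre2grep $form -c 'x' -
done)
printf '%s\n' "$out"
```
</pre>
Each of the four runs should report a count of 1.
</P>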
    745 <br><a name="SEC10" href="#TOC1">CALLING EXTERNAL SCRIPTS</a><br>
    746 <P>
    747 On non-Windows systems, <b>pcre2grep</b> has, by default, support for calling
    748 external programs or scripts during matching by making use of PCRE2's callout
    749 facility. However, this support can be disabled when <b>pcre2grep</b> is built.
    750 You can find out whether your binary has support for callouts by running it
    751 with the <b>--help</b> option. If the support is not enabled, all callouts in
    752 patterns are ignored by <b>pcre2grep</b>.
    753 </P>
    754 <P>
    755 A callout in a PCRE2 pattern is of the form (?C&#60;arg&#62;) where the argument is
    756 either a number or a quoted string (see the
    757 <a href="pcre2callout.html"><b>pcre2callout</b></a>
    758 documentation for details). Numbered callouts are ignored by <b>pcre2grep</b>.
    759 String arguments are parsed as a list of substrings separated by pipe (vertical
    760 bar) characters. The first substring must be an executable name, with the
    761 following substrings specifying arguments:
    762 <pre>
    763   executable_name|arg1|arg2|...
    764 </pre>
    765 Any substring (including the executable name) may contain escape sequences
    766 started by a dollar character: $&#60;digits&#62; or ${&#60;digits&#62;} is replaced by the
    767 captured substring of the given decimal number, which must be greater than
    768 zero. If the number is greater than the number of capturing substrings, or if
    769 the capture is unset, the replacement is empty.
    770 </P>
    771 <P>
    772 A dollar followed by any other character is replaced by that character. In
    773 particular, $$ is replaced by a single dollar and $| is replaced by a pipe
    773 character. Here is an example:
    774 <pre>
    775   echo -e "abcde\n12345" | pcre2grep \
    776     '(?x)(.)(..(.))
    777     (?C"/bin/echo|Arg1: [$1] [$2] [$3]|Arg2: $|${1}$| ($4)")()' -
    778 
    779   Output:
    780 
    781     Arg1: [a] [bcd] [d] Arg2: |a| ()
    782     abcde
    783     Arg1: [1] [234] [4] Arg2: |1| ()
    784     12345
    785 </pre>
    786 The parameters for the <b>execv()</b> system call that is used to run the
    787 program or script are zero-terminated strings. This means that binary zero
    788 characters in the callout argument will cause premature termination of their
    789 substrings, and therefore should not be present. Any syntax errors in the
    790 string (for example, a dollar not followed by another character) cause the
    791 callout to be ignored. If running the program fails for any reason (including
    792 the non-existence of the executable), a local matching failure occurs and the
    793 matcher backtracks in the normal way.
    794 </P>
    795 <br><a name="SEC11" href="#TOC1">MATCHING ERRORS</a><br>
    796 <P>
    797 It is possible to supply a regular expression that takes a very long time to
    798 fail to match certain lines. Such patterns normally involve nested indefinite
    799 repeats, for example: (a+)*\d when matched against a line of a's with no final
    800 digit. The PCRE2 matching function has a resource limit that causes it to abort
    801 in these circumstances. If this happens, <b>pcre2grep</b> outputs an error
    802 message and the line that caused the problem to the standard error stream. If
    803 there are more than 20 such errors, <b>pcre2grep</b> gives up.
    804 </P>
    805 <P>
    806 The <b>--match-limit</b> option of <b>pcre2grep</b> can be used to set the
    807 overall resource limit; there is a second option called <b>--recursion-limit</b>
    808 that sets a limit on the amount of memory (usually stack) that is used (see the
    809 discussion of these options above).
    810 </P>
    811 <br><a name="SEC12" href="#TOC1">DIAGNOSTICS</a><br>
    812 <P>
    813 Exit status is 0 if any matches were found, 1 if no matches were found, and 2
    814 for syntax errors, overlong lines, non-existent or inaccessible files (even if
    815 matches were found in other files) or too many matching errors. Using the
    816 <b>-s</b> option to suppress error messages about inaccessible files does not
    817 affect the return code.
    818 </P>
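<P>
The three exit statuses can be observed with a sketch like the following. The
input text is made up, and /no/such/file is a hypothetical path that is assumed
not to exist:
<pre>
```shell
# Exit 0: at least one line matched (-q suppresses the normal output).
found=0
printf 'abc\n' | pcre2grep -q 'b' - || found=$?
# Exit 1: no line matched.
missing=0
printf 'abc\n' | pcre2grep -q 'z' - || missing=$?
# Exit 2: the file cannot be read; -s hides the message, not the status.
failed=0
pcre2grep -q -s 'b' /no/such/file || failed=$?
printf '%s %s %s\n' "$found" "$missing" "$failed"
```
</pre>
This should print 0 1 2.
</P>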
    819 <br><a name="SEC13" href="#TOC1">SEE ALSO</a><br>
    820 <P>
    821 <b>pcre2pattern</b>(3), <b>pcre2syntax</b>(3), <b>pcre2callout</b>(3).
    822 </P>
    823 <br><a name="SEC14" href="#TOC1">AUTHOR</a><br>
    824 <P>
    825 Philip Hazel
    826 <br>
    827 University Computing Service
    828 <br>
    829 Cambridge, England.
    830 <br>
    831 </P>
    832 <br><a name="SEC15" href="#TOC1">REVISION</a><br>
    833 <P>
    834 Last updated: 19 June 2016
    835 <br>
    836 Copyright &copy; 1997-2016 University of Cambridge.
    837 <br>
    838 <p>
    839 Return to the <a href="index.html">PCRE2 index page</a>.
    840 </p>
    841