PCRE2GREP(1)                General Commands Manual               PCRE2GREP(1)

NAME
       pcre2grep - a grep with Perl-compatible regular expressions.

SYNOPSIS
       pcre2grep [options] [long options] [pattern] [path1 path2 ...]

DESCRIPTION

       pcre2grep searches files for character patterns, in the same way as
       other grep commands do, but it uses the PCRE2 regular expression
       library to support patterns that are compatible with the regular
       expressions of Perl 5. See pcre2syntax(3) for a quick-reference summary
       of pattern syntax, or pcre2pattern(3) for a full description of the
       syntax and semantics of the regular expressions that PCRE2 supports.

       Patterns, whether supplied on the command line or in a separate file,
       are given without delimiters. For example:

         pcre2grep Thursday /etc/motd

       If you attempt to use delimiters (for example, by surrounding a pattern
       with slashes, as is common in Perl scripts), they are interpreted as
       part of the pattern. Quotes can of course be used to delimit patterns
       on the command line because they are interpreted by the shell, and
       indeed quotes are required if a pattern contains white space or shell
       metacharacters.

       The first argument that follows any option settings is treated as the
       single pattern to be matched when neither -e nor -f is present.
       Conversely, when one or both of these options are used to specify
       patterns, all arguments are treated as path names. At least one of -e,
       -f, or an argument pattern must be provided.

       If no files are specified, pcre2grep reads the standard input. The
       standard input can also be referenced by a name consisting of a single
       hyphen. For example:

         pcre2grep some-pattern file1 - file3

       Input files are searched line by line. By default, each line that
       matches a pattern is copied to the standard output, and if there is
       more than one file, the file name is output at the start of each line,
       followed by a colon.
       However, there are options that can change how pcre2grep behaves. In
       particular, the -M option makes it possible to search for strings that
       span line boundaries. What defines a line boundary is controlled by
       the -N (--newline) option.

       The amount of memory used for buffering files that are being scanned is
       controlled by parameters that can be set by the --buffer-size and
       --max-buffer-size options. The first of these sets the size of buffer
       that is obtained at the start of processing. If an input file contains
       very long lines, a larger buffer may be needed; this is handled by
       automatically extending the buffer, up to the limit specified by
       --max-buffer-size. The default values for these parameters can be set
       when pcre2grep is built; if nothing is specified, the defaults are set
       to 20KiB and 1MiB respectively. An error occurs if a line is too long
       and the buffer can no longer be expanded.

       The block of memory that is actually used is three times the "buffer
       size", to allow for buffering "before" and "after" lines. If the buffer
       size is too small, fewer than requested "before" and "after" lines may
       be output.

       Patterns can be no longer than 8KiB or BUFSIZ bytes, whichever is the
       greater. BUFSIZ is defined in <stdio.h>. When there is more than one
       pattern (specified by the use of -e and/or -f), each pattern is applied
       to each line in the order in which they are defined, except that all
       the -e patterns are tried before the -f patterns.

       By default, as soon as one pattern matches a line, no further patterns
       are considered. However, if --colour (or --color) is used to colour the
       matching substrings, or if --only-matching, --file-offsets, or
       --line-offsets is used to output only the part of the line that matched
       (either shown literally, or as an offset), scanning resumes immediately
       following the match, so that further matches on the same line can be
       found.
       If there are multiple patterns, they are all tried on the remainder of
       the line, but patterns that follow the one that matched are not tried
       on the earlier part of the line.

       This behaviour means that the order in which multiple patterns are
       specified can affect the output when one of the above options is used.
       This is no longer the same behaviour as GNU grep, which now manages to
       display earlier matches for later patterns (as long as there is no
       overlap).

       Patterns that can match an empty string are accepted, but empty string
       matches are never recognized. An example is the pattern
       "(super)?(man)?", in which all components are optional. This pattern
       finds all occurrences of both "super" and "man"; the output differs
       from matching with "super|man" when only the matching substrings are
       being shown.

       If the LC_ALL or LC_CTYPE environment variable is set, pcre2grep uses
       the value to set a locale when calling the PCRE2 library. The --locale
       option can be used to override this.

SUPPORT FOR COMPRESSED FILES

       It is possible to compile pcre2grep so that it uses libz or libbz2 to
       read compressed files whose names end in .gz or .bz2, respectively. You
       can find out whether your pcre2grep binary has support for one or both
       of these file types by running it with the --help option. If the
       appropriate support is not present, all files are treated as plain
       text. The standard input is always so treated. When input is from a
       compressed .gz or .bz2 file, the --line-buffered option is ignored.

BINARY FILES

       By default, a file that contains a binary zero byte within the first
       1024 bytes is identified as a binary file, and is processed specially.
       (GNU grep identifies binary files in this manner.) However, if the
       newline type is specified as "nul", that is, the line terminator is a
       binary zero, the test for a binary file is not applied.
       See the --binary-files option for a means of changing the way binary
       files are handled.

BINARY ZEROS IN PATTERNS

       Patterns passed from the command line are strings that are terminated
       by a binary zero, so cannot contain internal zeros. However, patterns
       that are read from a file via the -f option may contain binary zeros.

OPTIONS

       The order in which some of the options appear can affect the output.
       For example, both the -H and -l options affect the printing of file
       names. Whichever comes later in the command line will be the one that
       takes effect. Similarly, except where noted below, if an option is
       given twice, the later setting is used. Numerical values for options
       may be followed by K or M, to signify multiplication by 1024 or
       1024*1024 respectively.

       --        This terminates the list of options. It is useful if the next
                 item on the command line starts with a hyphen but is not an
                 option. This allows for the processing of patterns and file
                 names that start with hyphens.

       -A number, --after-context=number
                 Output up to number lines of context after each matching
                 line. Fewer lines are output if the next match or the end of
                 the file is reached, or if the processing buffer size has
                 been set too small. If file names and/or line numbers are
                 being output, a hyphen separator is used instead of a colon
                 for the context lines. A line containing "--" is output
                 between each group of lines, unless they are in fact
                 contiguous in the input file. The value of number is
                 expected to be relatively small. When -c is used, -A is
                 ignored.

       -a, --text
                 Treat binary files as text. This is equivalent to
                 --binary-files=text.

       -B number, --before-context=number
                 Output up to number lines of context before each matching
                 line.
                 Fewer lines are output if the previous match or the
                 start of the file is within number lines, or if the
                 processing buffer size has been set too small. If file names
                 and/or line numbers are being output, a hyphen separator is
                 used instead of a colon for the context lines. A line
                 containing "--" is output between each group of lines,
                 unless they are in fact contiguous in the input file. The
                 value of number is expected to be relatively small. When -c
                 is used, -B is ignored.

       --binary-files=word
                 Specify how binary files are to be processed. If the word is
                 "binary" (the default), pattern matching is performed on
                 binary files, but the only output is "Binary file <name>
                 matches" when a match succeeds. If the word is "text", which
                 is equivalent to the -a or --text option, binary files are
                 processed in the same way as any other file. In this case,
                 when a match succeeds, the output may be binary garbage,
                 which can have nasty effects if sent to a terminal. If the
                 word is "without-match", which is equivalent to the -I
                 option, binary files are not processed at all; they are
                 assumed not to be of interest and are skipped without causing
                 any output or affecting the return code.

       --buffer-size=number
                 Set the parameter that controls how much memory is obtained
                 at the start of processing for buffering files that are being
                 scanned. See also --max-buffer-size below.

       -C number, --context=number
                 Output number lines of context both before and after each
                 matching line. This is equivalent to setting both -A and -B
                 to the same value.

       -c, --count
                 Do not output lines from the files that are being scanned;
                 instead output the number of lines that would have been
                 shown, either because they matched, or, if -v is set, because
                 they failed to match.
                 By default, this count is exactly the
                 same as the number of lines that would have been output, but
                 if the -M (multiline) option is used (without -v), there may
                 be more suppressed lines than the count (that is, the number
                 of matches).

                 If no lines are selected, the number zero is output. If
                 several files are being scanned, a count is output for each
                 of them and the -t option can be used to cause a total to be
                 output at the end. However, if the --files-with-matches
                 option is also used, only those files whose counts are
                 greater than zero are listed. When -c is used, the -A, -B,
                 and -C options are ignored.

       --colour, --color
                 If this option is given without any data, it is equivalent to
                 "--colour=auto". If data is required, it must be given in
                 the same shell item, separated by an equals sign.

       --colour=value, --color=value
                 This option specifies under what circumstances the parts of a
                 line that matched a pattern should be coloured in the output.
                 By default, the output is not coloured. The value (which is
                 optional, see above) may be "never", "always", or "auto". In
                 the latter case, colouring happens only if the standard
                 output is connected to a terminal. More resources are used
                 when colouring is enabled, because pcre2grep has to search
                 for all possible matches in a line, not just one, in order to
                 colour them all.

                 The colour that is used can be specified by setting one of
                 the environment variables PCRE2GREP_COLOUR, PCRE2GREP_COLOR,
                 PCREGREP_COLOUR, or PCREGREP_COLOR, which are checked in that
                 order. If none of these are set, pcre2grep looks for
                 GREP_COLORS or GREP_COLOR (in that order).
                 The value of the
                 variable should be a string of two numbers, separated by a
                 semicolon, except in the case of GREP_COLORS, which must
                 start with "ms=" or "mt=" followed by two semicolon-separated
                 colours, terminated by the end of the string or by a colon.
                 If GREP_COLORS does not start with "ms=" or "mt=" it is
                 ignored, and GREP_COLOR is checked.

                 If the string obtained from one of the above variables
                 contains any characters other than semicolon or digits, the
                 setting is ignored and the default colour is used. The string
                 is copied directly into the control string for setting colour
                 on a terminal, so it is your responsibility to ensure that
                 the values make sense. If no relevant environment variable is
                 set, the default is "1;31", which gives red.

       -D action, --devices=action
                 If an input path is not a regular file or a directory,
                 "action" specifies how it is to be processed. Valid values
                 are "read" (the default) or "skip" (silently skip the path).

       -d action, --directories=action
                 If an input path is a directory, "action" specifies how it is
                 to be processed. Valid values are "read" (the default in
                 non-Windows environments, for compatibility with GNU grep),
                 "recurse" (equivalent to the -r option), or "skip" (silently
                 skip the path, the default in Windows environments). In the
                 "read" case, directories are read as if they were ordinary
                 files. In some operating systems the effect of reading a
                 directory like this is an immediate end-of-file; in others it
                 may provoke an error.

       --depth-limit=number
                 See --match-limit below.

       -e pattern, --regex=pattern, --regexp=pattern
                 Specify a pattern to be matched. This option can be used
                 multiple times in order to specify several patterns. It can
                 also be used as a way of specifying a single pattern that
                 starts with a hyphen.
                 When -e is used, no argument pattern is taken
                 from the command line; all arguments are treated as file
                 names. There is no limit to the number of patterns. They are
                 applied to each line in the order in which they are defined
                 until one matches.

                 If -f is used with -e, the command line patterns are matched
                 first, followed by the patterns from the file(s), independent
                 of the order in which these options are specified. Note that
                 multiple use of -e is not the same as a single pattern with
                 alternatives. For example, X|Y finds the first character in a
                 line that is X or Y, whereas if the two patterns are given
                 separately, with X first, pcre2grep finds X if it is present,
                 even if it follows Y in the line. It finds Y only if there is
                 no X in the line. This matters only if you are using -o or
                 --colo(u)r to show the part(s) of the line that matched.

       --exclude=pattern
                 Files (but not directories) whose names match the pattern are
                 skipped without being processed. This applies to all files,
                 whether listed on the command line, obtained from
                 --file-list, or by scanning a directory. The pattern is a
                 PCRE2 regular expression, and is matched against the final
                 component of the file name, not the entire path. The -F, -w,
                 and -x options do not apply to this pattern. The option may
                 be given any number of times in order to specify multiple
                 patterns. If a file name matches both an --include and an
                 --exclude pattern, it is excluded. There is no short form for
                 this option.

       --exclude-from=filename
                 Treat each non-empty line of the file as the data for an
                 --exclude option. What constitutes a newline when reading the
                 file is the operating system's default. The --newline option
                 has no effect on this option. This option may be given more
                 than once in order to specify a number of files to read.
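
                 The exclusion rule described for --exclude and
                 --exclude-from can be modelled roughly as follows. This is
                 only an illustrative sketch, not pcre2grep's code: Python's
                 re module stands in for PCRE2, the helper names
                 load_pattern_file and is_excluded are hypothetical, and an
                 unanchored search is assumed.

```python
import posixpath
import re


def load_pattern_file(text):
    """Return one pattern per non-empty line, as --exclude-from is described."""
    return [line for line in text.splitlines() if line]


def is_excluded(path, exclude_patterns):
    """True if the FINAL path component matches any exclude pattern.

    Assumption: an unanchored search is used, so a pattern has to carry
    its own ^ or $ when it should match the whole name.
    """
    name = posixpath.basename(path)
    return any(re.search(pattern, name) for pattern in exclude_patterns)


# A hypothetical exclude file with one blank line, which is skipped.
patterns = load_pattern_file("\\.o$\n\n^core$\n")

print(is_excluded("build/main.o", patterns))   # file name ends in .o
print(is_excluded("objs.o/main.c", patterns))  # only the directory ends in .o
```

                 The second call shows why the "final component" wording
                 matters: a directory name in the path never causes a file
                 to be excluded by --exclude.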

       --exclude-dir=pattern
                 Directories whose names match the pattern are skipped without
                 being processed, whatever the setting of the --recursive
                 option. This applies to all directories, whether listed on
                 the command line, obtained from --file-list, or by scanning a
                 parent directory. The pattern is a PCRE2 regular expression,
                 and is matched against the final component of the directory
                 name, not the entire path. The -F, -w, and -x options do not
                 apply to this pattern. The option may be given any number of
                 times in order to specify more than one pattern. If a
                 directory matches both --include-dir and --exclude-dir, it is
                 excluded. There is no short form for this option.

       -F, --fixed-strings
                 Interpret each data-matching pattern as a list of fixed
                 strings, separated by newlines, instead of as a regular
                 expression. What constitutes a newline for this purpose is
                 controlled by the --newline option. The -w (match as a word)
                 and -x (match whole line) options can be used with -F. They
                 apply to each of the fixed strings. A line is selected if any
                 of the fixed strings are found in it (subject to -w or -x, if
                 present). This option applies only to the patterns that are
                 matched against the contents of files; it does not apply to
                 patterns specified by any of the --include or --exclude
                 options.

       -f filename, --file=filename
                 Read patterns from the file, one per line, and match them
                 against each line of input. As is the case with patterns on
                 the command line, no delimiters should be used. What
                 constitutes a newline when reading the file is the operating
                 system's default interpretation of \n. The --newline option
                 has no effect on this option. Trailing white space is removed
                 from each line, and blank lines are ignored. An empty file
                 contains no patterns and therefore matches nothing.
                 Patterns
                 read from a file in this way may contain binary zeros, which
                 are treated as ordinary data characters. See also the
                 comments about multiple patterns versus a single pattern with
                 alternatives in the description of -e above.

                 If this option is given more than once, all the specified
                 files are read. A data line is output if any of the patterns
                 match it. A file name can be given as "-" to refer to the
                 standard input. When -f is used, patterns specified on the
                 command line using -e may also be present; they are tested
                 before the file's patterns. However, no other pattern is
                 taken from the command line; all arguments are treated as the
                 names of paths to be searched.

       --file-list=filename
                 Read a list of files and/or directories that are to be
                 scanned from the given file, one per line. What constitutes a
                 newline when reading the file is the operating system's
                 default. Trailing white space is removed from each line, and
                 blank lines are ignored. These paths are processed before any
                 that are listed on the command line. The file name can be
                 given as "-" to refer to the standard input. If --file and
                 --file-list are both specified as "-", patterns are read
                 first. This is useful only when the standard input is a
                 terminal, from which further lines (the list of files) can be
                 read after an end-of-file indication. If this option is given
                 more than once, all the specified files are read.

       --file-offsets
                 Instead of showing lines or parts of lines that match, show
                 each match as an offset from the start of the file and a
                 length, separated by a comma. In this mode, no context is
                 shown. That is, the -A, -B, and -C options are ignored. If
                 there is more than one match in a line, each of them is shown
                 separately. This option is mutually exclusive with --output,
                 --line-offsets, and --only-matching.
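
                 The "offset,length" form produced by --file-offsets can be
                 pictured with a small model. This is a sketch under stated
                 assumptions, not the program's implementation: Python's re
                 module stands in for PCRE2, lines are assumed to end in LF,
                 offsets are counted in bytes, and the function name
                 file_offsets is hypothetical.

```python
import re


def file_offsets(data, pattern):
    """Return "offset,length" strings for every non-empty match.

    data and pattern are bytes. The input is searched line by line
    (LF-terminated lines assumed), and each offset is counted from the
    start of the whole input rather than from the start of the line.
    """
    results = []
    line_start = 0
    for line in data.split(b"\n"):
        for m in re.finditer(pattern, line):
            if m.end() > m.start():  # empty-string matches are not reported
                results.append(f"{line_start + m.start()},{m.end() - m.start()}")
        line_start += len(line) + 1  # +1 for the LF removed by split()
    return results


print(file_offsets(b"abc abc\nxabc\n", rb"abc"))
```

                 Two matches on the first line are reported separately, and
                 the match on the second line carries an offset that includes
                 the whole of the first line and its newline.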

       -H, --with-filename
                 Force the inclusion of the file name at the start of output
                 lines when searching a single file. By default, the file name
                 is not shown in this case. For matching lines, the file name
                 is followed by a colon; for context lines, a hyphen separator
                 is used. If a line number is also being output, it follows
                 the file name. When the -M option causes a pattern to match
                 more than one line, only the first is preceded by the file
                 name. This option overrides any previous -h, -l, or -L
                 options.

       -h, --no-filename
                 Suppress the output of file names when searching multiple
                 files. By default, file names are shown when multiple files
                 are searched. For matching lines, the file name is followed
                 by a colon; for context lines, a hyphen separator is used. If
                 a line number is also being output, it follows the file name.
                 This option overrides any previous -H, -L, or -l options.

       --heap-limit=number
                 See --match-limit below.

       --help    Output a help message, giving brief details of the command
                 options and file type support, and then exit. Anything else
                 on the command line is ignored.

       -I        Ignore binary files. This is equivalent to
                 --binary-files=without-match.

       -i, --ignore-case
                 Ignore upper/lower case distinctions during comparisons.

       --include=pattern
                 If any --include patterns are specified, the only files that
                 are processed are those that match one of the patterns (and
                 do not match an --exclude pattern). This option does not
                 affect directories, but it applies to all files, whether
                 listed on the command line, obtained from --file-list, or by
                 scanning a directory. The pattern is a PCRE2 regular
                 expression, and is matched against the final component of the
                 file name, not the entire path. The -F, -w, and -x options do
                 not apply to this pattern. The option may be given any number
                 of times.
                 If a file name matches both an --include and an
                 --exclude pattern, it is excluded. There is no short form
                 for this option.

       --include-from=filename
                 Treat each non-empty line of the file as the data for an
                 --include option. What constitutes a newline for this purpose
                 is the operating system's default. The --newline option has
                 no effect on this option. This option may be given any number
                 of times; all the files are read.

       --include-dir=pattern
                 If any --include-dir patterns are specified, the only
                 directories that are processed are those that match one of
                 the patterns (and do not match an --exclude-dir pattern).
                 This applies to all directories, whether listed on the
                 command line, obtained from --file-list, or by scanning a
                 parent directory. The pattern is a PCRE2 regular expression,
                 and is matched against the final component of the directory
                 name, not the entire path. The -F, -w, and -x options do not
                 apply to this pattern. The option may be given any number of
                 times. If a directory matches both --include-dir and
                 --exclude-dir, it is excluded. There is no short form for
                 this option.

       -L, --files-without-match
                 Instead of outputting lines from the files, just output the
                 names of the files that do not contain any lines that would
                 have been output. Each file name is output once, on a
                 separate line. This option overrides any previous -H, -h, or
                 -l options.

       -l, --files-with-matches
                 Instead of outputting lines from the files, just output the
                 names of the files containing lines that would have been
                 output. Each file name is output once, on a separate line.
                 Searching normally stops as soon as a matching line is found
                 in a file. However, if the -c (count) option is also used,
                 matching continues in order to obtain the correct count, and
                 those files that have at least one match are listed along
                 with their counts.
                 Using this option with -c is a way of
                 suppressing the listing of files with no matches. This option
                 overrides any previous -H, -h, or -L options.

       --label=name
                 This option supplies a name to be used for the standard input
                 when file names are being output. If not supplied, "(standard
                 input)" is used. There is no short form for this option.

       --line-buffered
                 When this option is given, non-compressed input is read and
                 processed line by line, and the output is flushed after each
                 write. By default, input is read in large chunks, unless
                 pcre2grep can determine that it is reading from a terminal
                 (which is currently possible only in Unix-like environments
                 or Windows). Output to a terminal is normally automatically
                 flushed by the operating system. This option can be useful
                 when the input or output is attached to a pipe and you do not
                 want pcre2grep to buffer up large amounts of data. However,
                 its use will affect performance, and the -M (multiline)
                 option ceases to work. When input is from a compressed .gz or
                 .bz2 file, --line-buffered is ignored.

       --line-offsets
                 Instead of showing lines or parts of lines that match, show
                 each match as a line number, the offset from the start of the
                 line, and a length. The line number is terminated by a colon
                 (as usual; see the -n option), and the offset and length are
                 separated by a comma. In this mode, no context is shown.
                 That is, the -A, -B, and -C options are ignored. If there is
                 more than one match in a line, each of them is shown
                 separately. This option is mutually exclusive with --output,
                 --file-offsets, and --only-matching.

       --locale=locale-name
                 This option specifies a locale to be used for pattern
                 matching. It overrides the value in the LC_ALL or LC_CTYPE
                 environment variables. If no locale is specified, the PCRE2
                 library's default (usually the "C" locale) is used.
                 There is
                 no short form for this option.

       --match-limit=number
                 Processing some regular expression patterns may take a very
                 long time to search for all possible matching strings. Others
                 may require a very large amount of memory. There are three
                 options that set resource limits for matching.

                 The --match-limit option provides a means of limiting
                 computing resource usage when processing patterns that are
                 not going to match, but which have a very large number of
                 possibilities in their search trees. The classic example is a
                 pattern that uses nested unlimited repeats. Internally, PCRE2
                 has a counter that is incremented each time around its main
                 processing loop. If the value set by --match-limit is
                 reached, an error occurs.

                 The --heap-limit option specifies, as a number of kibibytes
                 (units of 1024 bytes), the amount of heap memory that may be
                 used for matching. Heap memory is needed only if matching the
                 pattern requires a significant number of nested backtracking
                 points to be remembered. This parameter can be set to zero to
                 forbid the use of heap memory altogether.

                 The --depth-limit option limits the depth of nested
                 backtracking points, which indirectly limits the amount of
                 memory that is used. The amount of memory needed for each
                 backtracking point depends on the number of capturing
                 parentheses in the pattern, so the amount of memory that is
                 used before this limit acts varies from pattern to pattern.
                 This limit is of use only if it is set smaller than
                 --match-limit.

                 There are no short forms for these options. The default
                 limits can be set when the PCRE2 library is compiled; if they
                 are not specified, the defaults are very large and so
                 effectively unlimited.

       --max-buffer-size=number
                 This limits the expansion of the processing buffer, whose
                 initial size can be set by --buffer-size.
                 The maximum buffer
                 size is silently forced to be no smaller than the starting
                 buffer size.

       -M, --multiline
                 Allow patterns to match more than one line. When this option
                 is set, the PCRE2 library is called in "multiline" mode. This
                 allows a matched string to extend past the end of a line and
                 continue on one or more subsequent lines. Patterns used with
                 -M may usefully contain literal newline characters and
                 internal occurrences of ^ and $ characters. The output for a
                 successful match may consist of more than one line. The first
                 line is the line in which the match started, and the last
                 line is the line in which the match ended. If the matched
                 string ends with a newline sequence, the output ends at the
                 end of that line. If -v is set, none of the lines in a
                 multi-line match are output. Once a match has been handled,
                 scanning restarts at the beginning of the line after the one
                 in which the match ended.

                 The newline sequence that separates multiple lines must be
                 matched as part of the pattern. For example, to find the
                 phrase "regular expression" in a file where "regular" might
                 be at the end of a line and "expression" at the start of the
                 next line, you could use this command:

                   pcre2grep -M 'regular\s+expression' <file>

                 The \s escape sequence matches any white space character,
                 including newlines, and is followed by + so as to match
                 trailing white space on the first line as well as possibly
                 handling a two-character newline sequence.

                 There is a limit to the number of lines that can be matched,
                 imposed by the way that pcre2grep buffers the input file as
                 it scans it. With a sufficiently large processing buffer,
                 this should not be a problem, but the -M option does not work
                 when input is read line by line (see --line-buffered).
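
                 The difference between line-by-line matching and the -M
                 behaviour described above can be sketched as follows. This
                 is an illustration only, assuming LF line endings and using
                 Python's re module in place of PCRE2; the function name
                 multiline_output is hypothetical and buffering limits are
                 not modelled.

```python
import re


def multiline_output(text, pattern):
    """Return the whole lines covered by the first match, or None.

    Models the -M description: output runs from the start of the line in
    which the match began to the end of the line in which it ended.
    """
    m = re.search(pattern, text)
    if m is None:
        return None
    start = text.rfind("\n", 0, m.start()) + 1
    if m.end() > m.start() and text[m.end() - 1] == "\n":
        # The match itself ends with a newline: stop at the end of that line.
        end = m.end() - 1
    else:
        end = text.find("\n", m.end())
        if end == -1:
            end = len(text)
    return text[start:end]


text = "a regular\nexpression here\nno match\n"
pattern = r"regular\s+expression"

# Without -M each line is searched on its own, so nothing matches.
line_by_line = [line for line in text.split("\n") if re.search(pattern, line)]

print(line_by_line)
print(multiline_output(text, pattern))
```

                 Searching each line separately finds nothing, while searching
                 across the line boundary selects both lines touched by the
                 match.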

       -N newline-type, --newline=newline-type
                 The PCRE2 library supports five different conventions for
                 indicating the ends of lines. They are the single-character
                 sequences CR (carriage return) and LF (linefeed), the
                 two-character sequence CRLF, an "anycrlf" convention, which
                 recognizes any of the preceding three types, and an "any"
                 convention, in which any Unicode line ending sequence is
                 assumed to end a line. The Unicode sequences are the three
                 just mentioned, plus VT (vertical tab, U+000B), FF (form
                 feed, U+000C), NEL (next line, U+0085), LS (line separator,
                 U+2028), and PS (paragraph separator, U+2029).

                 When the PCRE2 library is built, a default line-ending
                 sequence is specified. This is normally the standard
                 sequence for the operating system. Unless otherwise specified
                 by this option, pcre2grep uses the library's default. The
                 possible values for this option are CR, LF, CRLF, ANYCRLF, or
                 ANY. This makes it possible to use pcre2grep to scan files
                 that have come from other environments without having to
                 modify their line endings. If the data that is being scanned
                 does not agree with the convention set by this option,
                 pcre2grep may behave in strange ways. Note that this option
                 does not apply to files specified by the -f, --exclude-from,
                 or --include-from options, which are expected to use the
                 operating system's standard newline sequence.

       -n, --line-number
                 Precede each output line by its line number in the file,
                 followed by a colon for matching lines or a hyphen for
                 context lines. If the file name is also being output, it
                 precedes the line number. When the -M option causes a pattern
                 to match more than one line, only the first is preceded by
                 its line number. This option is forced if --line-offsets is
                 used.
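
                 The five newline conventions, and the way -n numbers lines
                 under them, can be modelled with a short sketch. It is not
                 taken from pcre2grep: the table NEWLINE_RE and the functions
                 split_lines and grep_n are hypothetical names, and Python's
                 re module stands in for PCRE2.

```python
import re

# One regular expression per newline convention named in the -N description.
NEWLINE_RE = {
    "CR": r"\r",
    "LF": r"\n",
    "CRLF": r"\r\n",
    "ANYCRLF": r"\r\n|\r|\n",
    # CRLF is tried first so that it counts as a single line ending.
    "ANY": r"\r\n|[\r\n\x0b\x0c\x85\u2028\u2029]",
}


def split_lines(text, newline_type):
    """Split text into lines according to the chosen convention."""
    parts = re.split(NEWLINE_RE[newline_type], text)
    if parts and parts[-1] == "":
        parts.pop()  # a trailing line ending does not start another line
    return parts


def grep_n(text, pattern, newline_type="LF"):
    """Return matching lines prefixed by "number:", in the style of -n."""
    return [
        f"{number}:{line}"
        for number, line in enumerate(split_lines(text, newline_type), 1)
        if re.search(pattern, line)
    ]


print(split_lines("a\r\nb\nc\r", "ANYCRLF"))
print(split_lines("a\r\nb\nc", "LF"))  # the CR stays attached to "a"
print(grep_n("x\r\nfoo\r\nbar foo\r\n", "foo", "CRLF"))
```

                 The second call shows the kind of mismatch the text warns
                 about: data with CRLF endings scanned under the LF
                 convention leaves a stray CR at the end of a line.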

       --no-jit  If the PCRE2 library is built with support for just-in-time
                 compiling (which speeds up matching), pcre2grep automatically
                 makes use of this, unless it was explicitly disabled at build
                 time. This option can be used to disable the use of JIT at
                 run time. It is provided for testing and working round
                 problems. It should never be needed in normal use.

       -O text, --output=text
                 When there is a match, instead of outputting the whole line
                 that matched, output just the given text. This option is
                 mutually exclusive with --only-matching, --file-offsets, and
                 --line-offsets. Escape sequences starting with a dollar
                 character may be used to insert the contents of the matched
                 part of the line and/or captured substrings into the text.

                 $<digits> or ${<digits>} is replaced by the captured
                 substring of the given decimal number; zero substitutes the
                 whole match. If the number is greater than the number of
                 capturing substrings, or if the capture is unset, the
                 replacement is empty.

                 $a is replaced by bell; $b by backspace; $e by escape; $f by
                 form feed; $n by newline; $r by carriage return; $t by tab;
                 $v by vertical tab.

                 $o<digits> is replaced by the character represented by the
                 given octal number; up to three digits are processed.

                 $x<digits> is replaced by the character represented by the
                 given hexadecimal number; up to two digits are processed.

                 Any other character is substituted by itself. In particular,
                 $$ is replaced by a single dollar.

       -o, --only-matching
                 Show only the part of the line that matched a pattern instead
                 of the whole line. In this mode, no context is shown. That
                 is, the -A, -B, and -C options are ignored. If there is more
                 than one match in a line, each of them is shown separately,
                 on a separate line of output.
                 If -o is combined with -v (invert the sense of the match to
                 find non-matching lines), no output is generated, but the
                 return code is set appropriately. If the matched portion of
                 the line is empty, nothing is output unless the file name or
                 line number is being printed, in which case it is shown on an
                 otherwise empty line. This option is mutually exclusive with
                 --output, --file-offsets and --line-offsets.

       -onumber, --only-matching=number
                 Show only the part of the line that matched the capturing
                 parentheses of the given number. Up to 32 capturing
                 parentheses are supported, and -o0 is equivalent to -o
                 without a number. Because these options can be given without
                 an argument (see above), if an argument is present, it must
                 be given in the same shell item, for example, -o3 or
                 --only-matching=2. The comments given for the non-argument
                 case above also apply to this option. If the specified
                 capturing parentheses do not exist in the pattern, or were
                 not set in the match, nothing is output unless the file name
                 or line number is being output.

                 If this option is given multiple times, multiple substrings
                 are output for each match, in the order the options are
                 given, and all on one line. For example, -o3 -o1 -o3 causes
                 the substrings matched by capturing parentheses 3 and 1 and
                 then 3 again to be output. By default, there is no separator
                 (but see the next option).

       --om-separator=text
                 Specify a separating string for multiple occurrences of -o.
                 The default is an empty string. Separating strings are never
                 coloured.

       -q, --quiet
                 Work quietly, that is, display nothing except error messages.
                 The exit status indicates whether or not any matches were
                 found.

       -r, --recursive
                 If any given path is a directory, recursively scan the files
                 it contains, taking note of any --include and --exclude
                 settings.
                 By default, a directory is read as a normal file; in some
                 operating systems this gives an immediate end-of-file. This
                 option is a shorthand for setting the -d option to "recurse".

       --recursion-limit=number
                 See --match-limit above.

       -s, --no-messages
                 Suppress error messages about non-existent or unreadable
                 files. Such files are quietly skipped. However, the return
                 code is still 2, even if matches were found in other files.

       -t, --total-count
                 This option is useful when scanning more than one file. If
                 used on its own, -t suppresses all output except for a grand
                 total number of matching lines (or non-matching lines if -v
                 is used) in all the files. If -t is used with -c, a grand
                 total is output except when the previous output is just one
                 line. In other words, it is not output when just one file's
                 count is listed. If file names are being output, the grand
                 total is preceded by "TOTAL:". Otherwise, it appears as just
                 another number. The -t option is ignored when used with -L
                 (list files without matches), because the grand total would
                 always be zero.

       -u, --utf-8
                 Operate in UTF-8 mode. This option is available only if PCRE2
                 has been compiled with UTF-8 support. All patterns (including
                 those for any --exclude and --include options) and all
                 subject lines that are scanned must be valid strings of UTF-8
                 characters.

       -V, --version
                 Write the version numbers of pcre2grep and the PCRE2 library
                 to the standard output and then exit. Anything else on the
                 command line is ignored.

       -v, --invert-match
                 Invert the sense of the match, so that lines which do not
                 match any of the patterns are the ones that are found.

       -w, --word-regex, --word-regexp
                 Force the patterns only to match "words". That is, there must
                 be a word boundary at the start and end of each matched
                 string.
                 This is equivalent to having "\b(?:" at the start of each
                 pattern, and ")\b" at the end. This option applies only to
                 the patterns that are matched against the contents of files;
                 it does not apply to patterns specified by any of the
                 --include or --exclude options.

       -x, --line-regex, --line-regexp
                 Force the patterns to start matching only at the beginnings
                 of lines, and in addition, require them to match entire
                 lines. In multiline mode the match may be more than one line.
                 This is equivalent to having "^(?:" at the start of each
                 pattern and ")$" at the end. This option applies only to the
                 patterns that are matched against the contents of files; it
                 does not apply to patterns specified by any of the --include
                 or --exclude options.


ENVIRONMENT VARIABLES

       The environment variables LC_ALL and LC_CTYPE are examined, in that
       order, for a locale. The first one that is set is used. This can be
       overridden by the --locale option. If no locale is set, the PCRE2
       library's default (usually the "C" locale) is used.


NEWLINES

       The -N (--newline) option allows pcre2grep to scan files with different
       newline conventions from the default. Any parts of the input files that
       are written to the standard output are copied identically, with
       whatever newline sequences they have in the input. However, the setting
       of this option affects only the way scanned files are processed. It
       does not affect the interpretation of files specified by the -f,
       --file-list, --exclude-from, or --include-from options, nor does it
       affect the way in which pcre2grep writes informational messages to the
       standard error and output streams. For these it uses the string "\n" to
       indicate newlines, relying on the C I/O library to convert this to an
       appropriate sequence.
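
       The five conventions accepted by -N can be illustrated with a small
       Python model. This is only a sketch of how the same data divides into
       lines under each setting; it is not how pcre2grep or the PCRE2 library
       is implemented. The names NEWLINE_PATTERNS, split_lines, and sample are
       invented for the illustration.

```python
import re

# One regular expression per convention named under -N. These patterns are
# assumptions of this sketch, written from the description in the text.
NEWLINE_PATTERNS = {
    "CR": r"\r",
    "LF": r"\n",
    "CRLF": r"\r\n",
    # CRLF is listed first so that it is consumed as one terminator.
    "ANYCRLF": r"\r\n|\r|\n",
    # CRLF, LF, VT, FF, CR, NEL, LS, PS.
    "ANY": r"\r\n|[\n\x0b\x0c\r\x85\u2028\u2029]",
}


def split_lines(text, convention):
    """Split text into lines under one convention, dropping the terminators."""
    lines = re.split(NEWLINE_PATTERNS[convention], text)
    # A terminator at the very end leaves one empty trailing piece; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


# Mixed line endings: CRLF, LF, CR, and LS (U+2028).
sample = "one\r\ntwo\nthree\rfour\u2028five"

for name in NEWLINE_PATTERNS:
    print(name, len(split_lines(sample, name)), "lines")
```

       With data of this kind, a setting that does not match the file (for
       example CRLF on a file that mostly uses LF) merges several visible
       lines into one, which is one way the "strange" behaviour mentioned
       under -N can arise.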


OPTIONS COMPATIBILITY

       Many of the short and long forms of pcre2grep's options are the same as
       in the GNU grep program. Any long option of the form --xxx-regexp (GNU
       terminology) is also available as --xxx-regex (PCRE2 terminology).
       However, the --depth-limit, --file-list, --file-offsets, --heap-limit,
       --include-dir, --line-offsets, --locale, --match-limit, -M,
       --multiline, -N, --newline, --om-separator, --output, -u, and --utf-8
       options are specific to pcre2grep, as is the use of the --only-matching
       option with a capturing parentheses number.

       Although most of the common options work the same way, a few are
       different in pcre2grep. For example, the --include option's argument is
       a glob for GNU grep, but a regular expression for pcre2grep. If both
       the -c and -l options are given, GNU grep lists only file names,
       without counts, but pcre2grep gives the counts as well.


OPTIONS WITH DATA

       There are four different ways in which an option with data can be
       specified. If a short form option is used, the data may follow
       immediately, or (with one exception) in the next command line item. For
       example:

         -f/some/file
         -f /some/file

       The exception is the -o option, which may appear with or without data.
       Because of this, if data is present, it must follow immediately in the
       same item, for example -o3.

       If a long form option is used, the data may appear in the same command
       line item, separated by an equals character, or (with two exceptions)
       it may appear in the next command line item.
       For example:

         --file=/some/file
         --file /some/file

       Note, however, that if you want to supply a file name beginning with ~
       as data in a shell command, and have the shell expand ~ to a home
       directory, you must separate the file name from the option, because the
       shell does not treat ~ specially unless it is at the start of an item.

       The exceptions to the above are the --colour (or --color) and
       --only-matching options, for which the data is optional. If one of
       these options does have data, it must be given in the first form, using
       an equals character. Otherwise pcre2grep will assume that it has no
       data.


USING PCRE2'S CALLOUT FACILITY

       pcre2grep has, by default, support for calling external programs or
       scripts or echoing specific strings during matching by making use of
       PCRE2's callout facility. However, this support can be disabled when
       pcre2grep is built. You can find out whether your binary has support
       for callouts by running it with the --help option. If the support is
       not enabled, all callouts in patterns are ignored by pcre2grep.

       A callout in a PCRE2 pattern is of the form (?C<arg>) where the
       argument is either a number or a quoted string (see the pcre2callout
       documentation for details). Numbered callouts are ignored by pcre2grep;
       only callouts with string arguments are useful.

   Calling external programs or scripts

       If the callout string does not start with a pipe (vertical bar)
       character, it is parsed into a list of substrings separated by pipe
       characters. The first substring must be an executable name, with the
       following substrings specifying arguments:

         executable_name|arg1|arg2|...
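
       As a rough illustration, the following Python sketch turns a callout
       string of this form into an argument list, applying the dollar escapes
       described in the rest of this section. It is a model written from this
       text, not pcre2grep's own code; expand_callout, TOKEN, and the way
       captured substrings are passed in as a list are all assumptions of the
       sketch. The sample string is the one used in the example later in this
       section.

```python
import re

# One token: a dollar escape, an unescaped pipe, or a run of ordinary text.
TOKEN = re.compile(r"\$(?:(\d+)|\{(\d+)\}|(.))|(\|)|([^$|]+)", re.S)


def expand_callout(callout, captures):
    """Split 'prog|arg1|...' into an argv list, expanding dollar escapes.

    captures[0] holds capture group 1, captures[1] group 2, and so on;
    an unset group may be given as None. ValueError stands in for the
    syntax errors that the text says cause the callout to be ignored.
    """
    argv, current, pos = [], "", 0
    while pos < len(callout):
        m = TOKEN.match(callout, pos)
        if m is None:  # only a dollar at the very end fails to match
            raise ValueError("dollar not followed by another character")
        number, braced, literal, pipe, text = m.groups()
        if number or braced:
            n = int(number or braced)
            if n == 0:
                raise ValueError("capture number must be greater than zero")
            # Out-of-range or unset captures expand to the empty string.
            value = captures[n - 1] if n <= len(captures) else None
            current += value or ""
        elif literal is not None:  # $$ gives "$", $| gives "|"
            current += literal
        elif pipe:  # an unescaped pipe ends the current substring
            argv.append(current)
            current = ""
        else:
            current += text
        pos = m.end()
    argv.append(current)
    return argv


argv = expand_callout(
    "/bin/echo|Arg1: [$1] [$2] [$3]|Arg2: $|${1}$| ($4)",
    ["a", "bcd", "d", ""],
)
print(argv)
```

       The first element of the resulting list is the executable name and the
       remaining elements are its arguments.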

       Any substring (including the executable name) may contain escape
       sequences started by a dollar character: $<digits> or ${<digits>} is
       replaced by the captured substring of the given decimal number, which
       must be greater than zero. If the number is greater than the number of
       capturing substrings, or if the capture is unset, the replacement is
       empty.

       Any other character is substituted by itself. In particular, $$ is
       replaced by a single dollar and $| is replaced by a pipe character.
       Here is an example:

         echo -e "abcde\n12345" | pcre2grep \
           '(?x)(.)(..(.))
           (?C"/bin/echo|Arg1: [$1] [$2] [$3]|Arg2: $|${1}$| ($4)")()' -

       Output:

         Arg1: [a] [bcd] [d] Arg2: |a| ()
         abcde
         Arg1: [1] [234] [4] Arg2: |1| ()
         12345

       The parameters for the execv() system call that is used to run the
       program or script are zero-terminated strings. This means that binary
       zero characters in the callout argument will cause premature
       termination of their substrings, and therefore should not be present.
       Any syntax errors in the string (for example, a dollar not followed by
       another character) cause the callout to be ignored. If running the
       program fails for any reason (including the non-existence of the
       executable), a local matching failure occurs and the matcher backtracks
       in the normal way.

   Echoing a specific string

       If the callout string starts with a pipe (vertical bar) character, the
       rest of the string is written to the output, having been passed through
       the same escape processing as text from the --output option. This
       provides a simple echoing facility that avoids calling an external
       program or script. No terminator is added to the string, so if you want
       a newline, you must include it explicitly. Matching continues normally
       after the string is output.
       If you want to see only the callout output but not any output from an
       actual match, you should end the relevant pattern with (*FAIL).


MATCHING ERRORS

       It is possible to supply a regular expression that takes a very long
       time to fail to match certain lines. Such patterns normally involve
       nested indefinite repeats, for example: (a+)*\d when matched against a
       line of a's with no final digit. The PCRE2 matching function has a
       resource limit that causes it to abort in these circumstances. If this
       happens, pcre2grep outputs an error message and the line that caused
       the problem to the standard error stream. If there are more than 20
       such errors, pcre2grep gives up.

       The --match-limit option of pcre2grep can be used to set the overall
       resource limit. There are also other limits that affect the amount of
       memory used during matching; see the discussion of --heap-limit and
       --depth-limit above.


DIAGNOSTICS

       Exit status is 0 if any matches were found, 1 if no matches were found,
       and 2 for syntax errors, overlong lines, non-existent or inaccessible
       files (even if matches were found in other files) or too many matching
       errors. Using the -s option to suppress error messages about
       inaccessible files does not affect the return code.

       When run under VMS, the return code is placed in the symbol
       PCRE2GREP_RC because VMS does not distinguish between exit(0) and
       exit(1).


SEE ALSO

       pcre2pattern(3), pcre2syntax(3), pcre2callout(3).


AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge, England.


REVISION

       Last updated: 24 February 2018
       Copyright (c) 1997-2018 University of Cambridge.