This is ../../doc/sed.info, produced by makeinfo version 4.12 from
../../doc//config.texi.

INFO-DIR-SECTION Text creation and manipulation
START-INFO-DIR-ENTRY
* sed: (sed).                   Stream EDitor.

END-INFO-DIR-ENTRY

   This file documents version 4.2.1 of GNU `sed', a stream editor.

   Copyright (C) 1998, 1999, 2001, 2002, 2003, 2004 Free Software
Foundation, Inc.

   This document is released under the terms of the GNU Free
Documentation License as published by the Free Software Foundation;
either version 1.1, or (at your option) any later version.

   You should have received a copy of the GNU Free Documentation
License along with GNU `sed'; see the file `COPYING.DOC'.  If not,
write to the Free Software Foundation, 59 Temple Place - Suite 330,
Boston, MA 02110-1301, USA.

   There are no Cover Texts and no Invariant Sections; this text, along
with its equivalent in the printed manual, constitutes the Title Page.


File: sed.info,  Node: Top,  Next: Introduction,  Up: (dir)

sed, a stream editor
********************

This file documents version 4.2.1 of GNU `sed', a stream editor.

* Menu:

* Introduction::               Introduction
* Invoking sed::               Invocation
* sed Programs::               `sed' programs
* Examples::                   Some sample scripts
* Limitations::                Limitations and (non-)limitations of GNU `sed'
* Other Resources::            Other resources for learning about `sed'
* Reporting Bugs::             Reporting bugs

* Extended regexps::           `egrep'-style regular expressions

* Concept Index::              A menu with all the topics in this manual.
* Command and Option Index::   A menu with all `sed' commands and
                               command-line options.

--- The detailed node listing ---

sed Programs:
* Execution Cycle::            How `sed' works
* Addresses::                  Selecting lines with `sed'
* Regular Expressions::        Overview of regular expression syntax
* Common Commands::            Often used commands
* The "s" Command::            `sed''s Swiss Army Knife
* Other Commands::             Less frequently used commands
* Programming Commands::       Commands for `sed' gurus
* Extended Commands::          Commands specific to GNU `sed'
* Escapes::                    Specifying special characters

Examples:
* Centering lines::
* Increment a number::
* Rename files to lower case::
* Print bash environment::
* Reverse chars of lines::
* tac::                        Reverse lines of files
* cat -n::                     Numbering lines
* cat -b::                     Numbering non-blank lines
* wc -c::                      Counting chars
* wc -w::                      Counting words
* wc -l::                      Counting lines
* head::                       Printing the first lines
* tail::                       Printing the last lines
* uniq::                       Make duplicate lines unique
* uniq -d::                    Print duplicated lines of input
* uniq -u::                    Remove all duplicated lines
* cat -s::                     Squeezing blank lines


File: sed.info,  Node: Introduction,  Next: Invoking sed,  Prev: Top,  Up: Top

1 Introduction
**************

`sed' is a stream editor.  A stream editor is used to perform basic text
transformations on an input stream (a file or input from a pipeline).
While in some ways similar to an editor which permits scripted edits
(such as `ed'), `sed' works by making only one pass over the input(s),
and is consequently more efficient.  But it is `sed''s ability to
filter text in a pipeline which particularly distinguishes it from
other types of editors.


File: sed.info,  Node: Invoking sed,  Next: sed Programs,  Prev: Introduction,  Up: Top

2 Invocation
************

Normally `sed' is invoked like this:

     sed SCRIPT INPUTFILE...

   The full format for invoking `sed' is:

     sed OPTIONS... [SCRIPT] [INPUTFILE...]

   If you do not specify INPUTFILE, or if INPUTFILE is `-', `sed'
filters the contents of the standard input.  The SCRIPT is actually the
first non-option parameter, which `sed' specially considers a script
and not an input file if (and only if) none of the other OPTIONS
specifies a script to be executed, that is if neither of the `-e' and
`-f' options is specified.

   `sed' may be invoked with the following command-line options:

`--version'
     Print out the version of `sed' that is being run and a copyright
     notice, then exit.

`--help'
     Print a usage message briefly summarizing these command-line
     options and the bug-reporting address, then exit.

`-n'
`--quiet'
`--silent'
     By default, `sed' prints out the pattern space at the end of each
     cycle through the script (*note How `sed' works: Execution Cycle.).
     These options disable this automatic printing, and `sed' only
     produces output when explicitly told to via the `p' command.

`-e SCRIPT'
`--expression=SCRIPT'
     Add the commands in SCRIPT to the set of commands to be run while
     processing the input.

`-f SCRIPT-FILE'
`--file=SCRIPT-FILE'
     Add the commands contained in the file SCRIPT-FILE to the set of
     commands to be run while processing the input.

`-i[SUFFIX]'
`--in-place[=SUFFIX]'
     This option specifies that files are to be edited in-place.  GNU
     `sed' does this by creating a temporary file and sending output to
     this file rather than to the standard output.(1)

     This option implies `-s'.

     When the end of the file is reached, the temporary file is renamed
     to the output file's original name.  The extension, if supplied,
     is used to modify the name of the old file before renaming the
     temporary file, thereby making a backup copy.(2)

     This rule is followed: if the extension doesn't contain a `*',
     then it is appended to the end of the current filename as a
     suffix; if the extension does contain one or more `*' characters,
     then _each_ asterisk is replaced with the current filename.  This
     allows you to add a prefix to the backup file, instead of (or in
     addition to) a suffix, or even to place backup copies of the
     original files into another directory (provided the directory
     already exists).

     If no extension is supplied, the original file is overwritten
     without making a backup.

`-l N'
`--line-length=N'
     Specify the default line-wrap length for the `l' command.  A
     length of 0 (zero) means to never wrap long lines.  If not
     specified, it is taken to be 70.

`--posix'
     GNU `sed' includes several extensions to POSIX sed.  In order to
     simplify writing portable scripts, this option disables all the
     extensions that this manual documents, including additional
     commands.  Most of the extensions accept `sed' programs that are
     outside the syntax mandated by POSIX, but some of them (such as
     the behavior of the `N' command described in *note Reporting
     Bugs::) actually violate the standard.  If you want to disable
     only the latter kind of extension, you can set the
     `POSIXLY_CORRECT' variable to a non-empty value.

`-b'
`--binary'
     This option is available on every platform, but is only effective
     where the operating system makes a distinction between text files
     and binary files.  When such a distinction is made--as is the case
     for MS-DOS, Windows, Cygwin--text files are composed of lines
     separated by a carriage return _and_ a line feed character, and
     `sed' does not see the ending CR.  When this option is specified,
     `sed' will open input files in binary mode, thus not requesting
     this special processing and considering lines to end at a line
     feed.

`--follow-symlinks'
     This option is available only on platforms that support symbolic
     links and has an effect only if option `-i' is specified.  In this
     case, if the file that is specified on the command line is a
     symbolic link, `sed' will follow the link and edit the ultimate
     destination of the link.  The default behavior is to break the
     symbolic link, so that the link destination will not be modified.

`-r'
`--regexp-extended'
     Use extended regular expressions rather than basic regular
     expressions.  Extended regexps are those that `egrep' accepts;
     they can be clearer because they usually have fewer backslashes,
     but are a GNU extension and hence scripts that use them are not
     portable.  *Note Extended regular expressions: Extended regexps.

`-s'
`--separate'
     By default, `sed' will consider the files specified on the command
     line as a single continuous long stream.  This GNU `sed' extension
     allows the user to consider them as separate files: range
     addresses (such as `/abc/,/def/') are not allowed to span several
     files, line numbers are relative to the start of each file, `$'
     refers to the last line of each file, and files invoked from the
     `R' commands are rewound at the start of each file.

`-u'
`--unbuffered'
     Buffer both input and output as minimally as practical.
     (This is particularly useful if the input is coming from the likes
     of `tail -f', and you wish to see the transformed output as soon
     as possible.)


   If no `-e', `-f', `--expression', or `--file' options are given on
the command-line, then the first non-option argument on the command
line is taken to be the SCRIPT to be executed.

   If any command-line parameters remain after processing the above,
these parameters are interpreted as the names of input files to be
processed.  A file name of `-' refers to the standard input stream.
The standard input will be processed if no file names are specified.

   ---------- Footnotes ----------

   (1) This applies to commands such as `=', `a', `c', `i', `l', `p'.
You can still write to the standard output by using the `w' or `W'
commands together with the `/dev/stdout' special file.

   (2) Note that GNU `sed' creates the backup file whether or not any
output is actually changed.


File: sed.info,  Node: sed Programs,  Next: Examples,  Prev: Invoking sed,  Up: Top

3 `sed' Programs
****************

A `sed' program consists of one or more `sed' commands, passed in by
one or more of the `-e', `-f', `--expression', and `--file' options, or
the first non-option argument if zero of these options are used.  This
document will refer to "the" `sed' script; this is understood to mean
the in-order catenation of all of the SCRIPTs and SCRIPT-FILEs passed
in.

   Each `sed' command consists of an optional address or address range,
followed by a one-character command name and any additional
command-specific code.
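
   As a small illustration of how several script fragments form "the"
script, the sketch below passes the same two commands once as separate
`-e' options and once as a single non-option argument.  The sample
input and the variable names `from_e' and `from_arg' are made up for
this example; it assumes a POSIX shell with `sed' on the path.

```shell
# Each command is: [address] command-name [command-specific text].
#   1d         address "1", command "d" (delete line 1)
#   s/two/2/   no address, command "s" with its own arguments

# Two -e fragments are catenated, in order, into one script.
from_e=$(printf 'one\ntwo\nthree\n' | sed -e '1d' -e 's/two/2/')

# With no -e or -f option, the first non-option argument is the script.
from_arg=$(printf 'one\ntwo\nthree\n' | sed '1d;s/two/2/')

printf '%s\n' "$from_e"
printf '%s\n' "$from_arg"
```

   Both invocations should produce the same two lines of output, since
the script that `sed' ends up running is the same in each case.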

* Menu:

* Execution Cycle::          How `sed' works
* Addresses::                Selecting lines with `sed'
* Regular Expressions::      Overview of regular expression syntax
* Common Commands::          Often used commands
* The "s" Command::          `sed''s Swiss Army Knife
* Other Commands::           Less frequently used commands
* Programming Commands::     Commands for `sed' gurus
* Extended Commands::        Commands specific to GNU `sed'
* Escapes::                  Specifying special characters


File: sed.info,  Node: Execution Cycle,  Next: Addresses,  Up: sed Programs

3.1 How `sed' Works
===================

`sed' maintains two data buffers: the active _pattern_ space, and the
auxiliary _hold_ space.  Both are initially empty.

   `sed' operates by performing the following cycle on each line of
input: first, `sed' reads one line from the input stream, removes any
trailing newline, and places it in the pattern space.  Then commands
are executed; each command can have an address associated to it:
addresses are a kind of condition code, and a command is only executed
if the condition is verified before the command is to be executed.

   When the end of the script is reached, unless the `-n' option is in
use, the contents of pattern space are printed out to the output
stream, adding back the trailing newline if it was removed.(1) Then the
next cycle starts for the next input line.

   Unless special commands (like `D') are used, the pattern space is
deleted between two cycles.  The hold space, on the other hand, keeps
its data between cycles (see commands `h', `H', `x', `g', `G' to move
data between both buffers).
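
   The cycle described above can be observed with a few one-line
scripts.  This is only a sketch: the three-line input and the variable
names (`doubled', `only_b', `joined') are invented for the example, and
the last command relies on GNU `sed' accepting `;' before `}'.

```shell
# With auto-print on, "p" prints the pattern space once explicitly and
# the end of the cycle prints it again, so every line appears twice.
doubled=$(printf 'a\nb\nc\n' | sed 'p')

# With -n, nothing is printed unless a command asks for it.
only_b=$(printf 'a\nb\nc\n' | sed -n '2p')

# The hold space survives between cycles: copy line 1 into it (h),
# append every later line (H), and on the last line swap it back into
# the pattern space (x), join the pieces with commas, and print.
joined=$(printf 'a\nb\nc\n' | sed -n '1h;1!H;${x;s/\n/,/g;p}')

printf '%s\n' "$doubled" "$only_b" "$joined"
```

   The first command shows the automatic print at the end of each
cycle, the second shows `-n' suppressing it, and the third shows data
accumulating in the hold space while the pattern space is replaced on
every cycle.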

   ---------- Footnotes ----------

   (1) Actually, if `sed' prints a line without the terminating
newline, it will nevertheless print the missing newline as soon as more
text is sent to the same output stream, which gives the "least expected
surprise" even though it does not make commands like `sed -n p' exactly
identical to `cat'.


File: sed.info,  Node: Addresses,  Next: Regular Expressions,  Prev: Execution Cycle,  Up: sed Programs

3.2 Selecting lines with `sed'
==============================

Addresses in a `sed' script can be in any of the following forms:
`NUMBER'
     Specifying a line number will match only that line in the input.
     (Note that `sed' counts lines continuously across all input files
     unless `-i' or `-s' options are specified.)

`FIRST~STEP'
     This GNU extension matches every STEPth line starting with line
     FIRST.  In particular, lines will be selected when there exists a
     non-negative N such that the current line-number equals FIRST + (N
     * STEP).  Thus, to select the odd-numbered lines, one would use
     `1~2'; to pick every third line starting with the second, `2~3'
     would be used; to pick every fifth line starting with the tenth,
     use `10~5'; and `50~0' is just an obscure way of saying `50'.

`$'
     This address matches the last line of the last file of input, or
     the last line of each file when the `-i' or `-s' options are
     specified.

`/REGEXP/'
     This will select any line which matches the regular expression
     REGEXP.  If REGEXP itself includes any `/' characters, each must
     be escaped by a backslash (`\').

     The empty regular expression `//' repeats the last regular
     expression match (the same holds if the empty regular expression is
     passed to the `s' command).
     Note that modifiers to regular expressions are evaluated when the
     regular expression is compiled, thus it is invalid to specify them
     together with the empty regular expression.

`\%REGEXP%'
     (The `%' may be replaced by any other single character.)

     This also matches the regular expression REGEXP, but allows one to
     use a different delimiter than `/'.  This is particularly useful
     if the REGEXP itself contains a lot of slashes, since it avoids
     the tedious escaping of every `/'.  If REGEXP itself includes any
     delimiter characters, each must be escaped by a backslash (`\').

`/REGEXP/I'
`\%REGEXP%I'
     The `I' modifier to regular-expression matching is a GNU extension
     which causes the REGEXP to be matched in a case-insensitive manner.

`/REGEXP/M'
`\%REGEXP%M'
     The `M' modifier to regular-expression matching is a GNU `sed'
     extension which causes `^' and `$' to match respectively (in
     addition to the normal behavior) the empty string after a newline,
     and the empty string before a newline.  There are special character
     sequences (`\`' and `\'') which always match the beginning or the
     end of the buffer.  `M' stands for `multi-line'.


   If no addresses are given, then all lines are matched; if one
address is given, then only lines matching that address are matched.

   An address range can be specified by specifying two addresses
separated by a comma (`,').  An address range matches lines starting
from where the first address matches, and continues until the second
address matches (inclusively).

   If the second address is a REGEXP, then checking for the ending
match will start with the line _following_ the line which matched the
first address: a range will always span at least two lines (except of
course if the input stream ends).
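
   The address forms and the range rule above can be tried out as
follows.  This is a sketch with made-up input and variable names
(`by_number', `by_step', `last', `by_regexp', `range'); the `1~2' form
assumes GNU `sed', since `FIRST~STEP' is a GNU extension.

```shell
# Six numbered lines to select from.
nums=$(printf '%s\n' 1 2 3 4 5 6)

by_number=$(printf '%s\n' "$nums" | sed -n '2,4p')      # lines 2 to 4
by_step=$(printf '%s\n' "$nums" | sed -n '1~2p')        # odd lines (GNU)
last=$(printf '%s\n' "$nums" | sed -n '$p')             # last line only
by_regexp=$(printf '%s\n' "$nums" | sed -n '/2/,/4/p')  # regexp range

# When the second address is a regexp, the search for the end of the
# range starts on the line after the one that opened it.  Here the
# first "x" opens the range and only the second "x" can close it.
range=$(printf 'x\ny\nx\nz\n' | sed -n '/x/,/x/p')

printf '%s\n' "$by_number" "$by_step" "$last" "$by_regexp" "$range"
```

   In the last command the range covers the first three lines, not just
the first, because the line that matched the first address is never
tested against the second.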

   If the second address is a NUMBER less than (or equal to) the line
matching the first address, then only the one line is matched.

   GNU `sed' also supports some special two-address forms; all these
are GNU extensions:
`0,/REGEXP/'
     A line number of `0' can be used in an address specification like
     `0,/REGEXP/' so that `sed' will try to match REGEXP in the first
     input line too.  In other words, `0,/REGEXP/' is similar to
     `1,/REGEXP/', except that if ADDR2 matches the very first line of
     input the `0,/REGEXP/' form will consider it to end the range,
     whereas the `1,/REGEXP/' form will match the beginning of its
     range and hence make the range span up to the _second_ occurrence
     of the regular expression.

     Note that this is the only place where the `0' address makes
     sense; there is no 0-th line and commands which are given the `0'
     address in any other way will give an error.

`ADDR1,+N'
     Matches ADDR1 and the N lines following ADDR1.

`ADDR1,~N'
     Matches ADDR1 and the lines following ADDR1 until the next line
     whose input line number is a multiple of N.

   Appending the `!' character to the end of an address specification
negates the sense of the match.  That is, if the `!' character follows
an address range, then only lines which do _not_ match the address range
will be selected.  This also works for singleton addresses, and,
perhaps perversely, for the null address.


File: sed.info,  Node: Regular Expressions,  Next: Common Commands,  Prev: Addresses,  Up: sed Programs

3.3 Overview of Regular Expression Syntax
=========================================

To know how to use `sed', people should understand regular expressions
("regexp" for short).  A regular expression is a pattern that is
matched against a subject string from left to right.
Most characters are "ordinary": they stand for themselves in a pattern,
and match the corresponding characters in the subject.  As a trivial
example, the pattern

     The quick brown fox

matches a portion of a subject string that is identical to itself.  The
power of regular expressions comes from the ability to include
alternatives and repetitions in the pattern.  These are encoded in the
pattern by the use of "special characters", which do not stand for
themselves but instead are interpreted in some special way.  Here is a
brief description of regular expression syntax as used in `sed'.

`CHAR'
     A single ordinary character matches itself.

`*'
     Matches a sequence of zero or more instances of matches for the
     preceding regular expression, which must be an ordinary character,
     a special character preceded by `\', a `.', a grouped regexp (see
     below), or a bracket expression.  As a GNU extension, a postfixed
     regular expression can also be followed by `*'; for example, `a**'
     is equivalent to `a*'.  POSIX 1003.1-2001 says that `*' stands for
     itself when it appears at the start of a regular expression or
     subexpression, but many non-GNU implementations do not support this
     and portable scripts should instead use `\*' in these contexts.

`\+'
     As `*', but matches one or more.  It is a GNU extension.

`\?'
     As `*', but only matches zero or one.  It is a GNU extension.

`\{I\}'
     As `*', but matches exactly I sequences (I is a decimal integer;
     for portability, keep it between 0 and 255 inclusive).

`\{I,J\}'
     Matches between I and J, inclusive, sequences.

`\{I,\}'
     Matches more than or equal to I sequences.

`\(REGEXP\)'
     Groups the inner REGEXP as a whole; this is used to:

        * Apply postfix operators, like `\(abcd\)*': this will search
          for zero or more whole sequences of `abcd', while `abcd*'
          would search for `abc' followed by zero or more occurrences
          of `d'.  Note that support for `\(abcd\)*' is required by
          POSIX 1003.1-2001, but many non-GNU implementations do not
          support it and hence it is not universally portable.

        * Use back references (see below).

`.'
     Matches any character, including newline.

`^'
     Matches the null string at beginning of the pattern space, i.e.
     what appears after the circumflex must appear at the beginning of
     the pattern space.

     In most scripts, pattern space is initialized to the content of
     each line (*note How `sed' works: Execution Cycle.).  So, it is a
     useful simplification to think of `^#include' as matching only
     lines where `#include' is the first thing on the line--if there
     are spaces before, for example, the match fails.  This
     simplification is valid as long as the original content of pattern
     space is not modified, for example with an `s' command.

     `^' acts as a special character only at the beginning of the
     regular expression or subexpression (that is, after `\(' or `\|').
     Portable scripts should avoid `^' at the beginning of a
     subexpression, though, as POSIX allows implementations that treat
     `^' as an ordinary character in that context.

`$'
     It is the same as `^', but refers to end of pattern space.  `$'
     also acts as a special character only at the end of the regular
     expression or subexpression (that is, before `\)' or `\|'), and
     its use at the end of a subexpression is not portable.

`[LIST]'
`[^LIST]'
     Matches any single character in LIST: for example, `[aeiou]'
     matches all vowels.
     A list may include sequences like `CHAR1-CHAR2', which matches any
     character between (inclusive) CHAR1 and CHAR2.

     A leading `^' reverses the meaning of LIST, so that it matches any
     single character _not_ in LIST.  To include `]' in the list, make
     it the first character (after the `^' if needed); to include `-'
     in the list, make it the first or last; to include `^' put it
     after the first character.

     The characters `$', `*', `.', `[', and `\' are normally not
     special within LIST.  For example, `[\*]' matches either `\' or
     `*', because the `\' is not special here.  However, strings like
     `[.ch.]', `[=a=]', and `[:space:]' are special within LIST and
     represent collating symbols, equivalence classes, and character
     classes, respectively, and `[' is therefore special within LIST
     when it is followed by `.', `=', or `:'.  Also, when not in
     `POSIXLY_CORRECT' mode, special escapes like `\n' and `\t' are
     recognized within LIST.  *Note Escapes::.

`REGEXP1\|REGEXP2'
     Matches either REGEXP1 or REGEXP2.  Use parentheses to use complex
     alternative regular expressions.  The matching process tries each
     alternative in turn, from left to right, and the first one that
     succeeds is used.  It is a GNU extension.

`REGEXP1REGEXP2'
     Matches the concatenation of REGEXP1 and REGEXP2.  Concatenation
     binds more tightly than `\|', `^', and `$', but less tightly than
     the other regular expression operators.

`\DIGIT'
     Matches the DIGIT-th `\(...\)' parenthesized subexpression in the
     regular expression.  This is called a "back reference".
     Subexpressions are implicitly numbered by counting occurrences of
     `\(' left-to-right.

`\n'
     Matches the newline character.

`\CHAR'
     Matches CHAR, where CHAR is one of `$', `*', `.', `[', `\', or `^'.
     Note that the only C-like backslash sequences that you can
     portably assume to be interpreted are `\n' and `\\'; in particular
     `\t' is not portable, and matches a `t' under most implementations
     of `sed', rather than a tab character.


   Note that the regular expression matcher is greedy, i.e., matches
are attempted from left to right and, if two or more matches are
possible starting at the same character, it selects the longest.

Examples:
`abcdef'
     Matches `abcdef'.

`a*b'
     Matches zero or more `a's followed by a single `b'.  For example,
     `b' or `aaaaab'.

`a\?b'
     Matches `b' or `ab'.

`a\+b\+'
     Matches one or more `a's followed by one or more `b's: `ab' is the
     shortest possible match, but other examples are `aaaab' or
     `abbbbb' or `aaaaaabbbbbbb'.

`.*'
`.\+'
     These two both match all the characters in a string; however, the
     first matches every string (including the empty string), while the
     second matches only strings containing at least one character.

`^main.*(.*)'
     This matches a string starting with `main', followed by an opening
     and closing parenthesis.  The `n', `(' and `)' need not be
     adjacent.

`^#'
     This matches a string beginning with `#'.

`\\$'
     This matches a string ending with a single backslash.  The regexp
     contains two backslashes for escaping.

`\$'
     Instead, this matches a string consisting of a single dollar sign,
     because it is escaped.

`[a-zA-Z0-9]'
     In the C locale, this matches any ASCII letters or digits.

`[^ tab]\+'
     (Here `tab' stands for a single tab character.)  This matches a
     string of one or more characters, none of which is a space or a
     tab.  Usually this means a word.

`^\(.*\)\n\1$'
     This matches a string consisting of two equal substrings separated
     by a newline.

`.\{9\}A$'
     This matches nine characters followed by an `A'.

`^.\{15\}A'
     This matches the start of a string that contains 16 characters,
     the last of which is an `A'.



File: sed.info,  Node: Common Commands,  Next: The "s" Command,  Prev: Regular Expressions,  Up: sed Programs

3.4 Often-Used Commands
=======================

If you use `sed' at all, you will quite likely want to know these
commands.

`#'
     [No addresses allowed.]

     The `#' character begins a comment; the comment continues until
     the next newline.

     If you are concerned about portability, be aware that some
     implementations of `sed' (which are not POSIX conformant) may only
     support a single one-line comment, and then only when the very
     first character of the script is a `#'.

     Warning: if the first two characters of the `sed' script are `#n',
     then the `-n' (no-autoprint) option is forced.  If you want to put
     a comment in the first line of your script and that comment begins
     with the letter `n' and you do not want this behavior, then be
     sure to either use a capital `N', or place at least one space
     before the `n'.

`q [EXIT-CODE]'
     This command only accepts a single address.

     Exit `sed' without processing any more commands or input.  Note
     that the current pattern space is printed if auto-print is not
     disabled with the `-n' option.  The ability to return an exit
     code from the `sed' script is a GNU `sed' extension.

`d'
     Delete the pattern space; immediately start next cycle.

`p'
     Print out the pattern space (to the standard output).  This
     command is usually only used in conjunction with the `-n'
     command-line option.

`n'
     If auto-print is not disabled, print the pattern space, then,
     regardless, replace the pattern space with the next line of input.
     If there is no more input then `sed' exits without processing any
     more commands.

`{ COMMANDS }'
     A group of commands may be enclosed between `{' and `}' characters.
     This is particularly useful when you want a group of commands to
     be triggered by a single address (or address-range) match.



File: sed.info,  Node: The "s" Command,  Next: Other Commands,  Prev: Common Commands,  Up: sed Programs

3.5 The `s' Command
===================

The syntax of the `s' (as in substitute) command is
`s/REGEXP/REPLACEMENT/FLAGS'.  The `/' characters may be uniformly
replaced by any other single character within any given `s' command.
The `/' character (or whatever other character is used in its stead)
can appear in the REGEXP or REPLACEMENT only if it is preceded by a `\'
character.

   The `s' command is probably the most important in `sed' and has a
lot of different options.  Its basic concept is simple: the `s' command
attempts to match the pattern space against the supplied REGEXP; if the
match is successful, then that portion of the pattern space which was
matched is replaced with REPLACEMENT.

   The REPLACEMENT can contain `\N' (N being a number from 1 to 9,
inclusive) references, which refer to the portion of the match which is
contained between the Nth `\(' and its matching `\)'.  Also, the
REPLACEMENT can contain unescaped `&' characters which reference the
whole matched portion of the pattern space.  Finally, as a GNU `sed'
extension, you can include a special sequence made of a backslash and
one of the letters `L', `l', `U', `u', or `E'.  The meaning is as
follows:

`\L'
     Turn the replacement to lowercase until a `\U' or `\E' is found,

`\l'
     Turn the next character to lowercase,

`\U'
     Turn the replacement to uppercase until a `\L' or `\E' is found,

`\u'
     Turn the next character to uppercase,

`\E'
     Stop case conversion started by `\L' or `\U'.
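
   The replacement features described above can be seen in the
following sketch.  The sample strings and the variable names
(`swapped', `wrapped', `recased', `repathed') are invented for the
example; the `\U', `\E' and `\u' sequences assume GNU `sed'.

```shell
# \1 and \2 refer to the first and second \( ... \) groups.
swapped=$(printf 'hello world\n' | sed 's/\(hello\) \(world\)/\2 \1/')

# An unescaped & stands for the whole matched text.
wrapped=$(printf 'hello world\n' | sed 's/world/[&]/')

# GNU extension: \U upper-cases until \E; \u upper-cases one character.
recased=$(printf 'hello world\n' | sed 's/\(hello\) \(world\)/\U\1\E \u\2/')

# Any single character can replace "/" as the delimiter, which avoids
# escaping the slashes in a path.
repathed=$(printf '/usr/bin\n' | sed 's|/usr|/opt|')

printf '%s\n' "$swapped" "$wrapped" "$recased" "$repathed"
```

   On a non-GNU `sed' the third command is likely to behave
differently, since the case-conversion sequences are not part of POSIX.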

   To include a literal `\', `&', or newline in the final replacement,
be sure to precede the desired `\', `&', or newline in the REPLACEMENT
with a `\'.

   The `s' command can be followed by zero or more of the following
FLAGS:

`g'
     Apply the replacement to _all_ matches to the REGEXP, not just the
     first.

`NUMBER'
     Only replace the NUMBERth match of the REGEXP.

     Note: the POSIX standard does not specify what should happen when
     you mix the `g' and NUMBER modifiers, and currently there is no
     widely agreed upon meaning across `sed' implementations.  For GNU
     `sed', the interaction is defined to be: ignore matches before the
     NUMBERth, and then match and replace all matches from the NUMBERth
     on.

`p'
     If the substitution was made, then print the new pattern space.

     Note: when both the `p' and `e' options are specified, the
     relative ordering of the two produces very different results.  In
     general, `ep' (evaluate then print) is what you want, but
     operating the other way round can be useful for debugging.  For
     this reason, the current version of GNU `sed' interprets specially
     the presence of `p' options both before and after `e', printing
     the pattern space before and after evaluation, while in general
     flags for the `s' command show their effect just once.  This
     behavior, although documented, might change in future versions.

`w FILE-NAME'
     If the substitution was made, then write out the result to the
     named file.  As a GNU `sed' extension, two special values of
     FILE-NAME are supported: `/dev/stderr', which writes the result to
     the standard error, and `/dev/stdout', which writes to the standard
     output.(1)

`e'
     This command allows one to pipe input from a shell command into
     pattern space.
     If a substitution was made, the command that is found in pattern
     space is executed and pattern space is replaced with its output.
     A trailing newline is suppressed; results are undefined if the
     command to be executed contains a NUL character.  This is a GNU
     `sed' extension.

`I'
`i'
     The `I' modifier to regular-expression matching is a GNU extension
     which makes `sed' match REGEXP in a case-insensitive manner.

`M'
`m'
     The `M' modifier to regular-expression matching is a GNU `sed'
     extension which causes `^' and `$' to match respectively (in
     addition to the normal behavior) the empty string after a newline,
     and the empty string before a newline.  There are special character
     sequences (`\`' and `\'') which always match the beginning or the
     end of the buffer.  `M' stands for `multi-line'.


   ---------- Footnotes ----------

   (1) This is equivalent to `p' unless the `-i' option is being used.


File: sed.info,  Node: Other Commands,  Next: Programming Commands,  Prev: The "s" Command,  Up: sed Programs

3.6 Less Frequently-Used Commands
=================================

Though perhaps less frequently used than those in the previous section,
some very small yet useful `sed' scripts can be built with these
commands.

`y/SOURCE-CHARS/DEST-CHARS/'
     (The `/' characters may be uniformly replaced by any other single
     character within any given `y' command.)

     Transliterate any characters in the pattern space which match any
     of the SOURCE-CHARS with the corresponding character in DEST-CHARS.

     Instances of the `/' (or whatever other character is used in its
     stead), `\', or newlines can appear in the SOURCE-CHARS or
     DEST-CHARS lists, provided that each instance is escaped by a `\'.
     The SOURCE-CHARS and DEST-CHARS lists _must_ contain the same
     number of characters (after de-escaping).
830 831 `a\' 832 `TEXT' 833 As a GNU extension, this command accepts two addresses. 834 835 Queue the lines of text which follow this command (each but the 836 last ending with a `\', which are removed from the output) to be 837 output at the end of the current cycle, or when the next input 838 line is read. 839 840 Escape sequences in TEXT are processed, so you should use `\\' in 841 TEXT to print a single backslash. 842 843 As a GNU extension, if between the `a' and the newline there is 844 other than a whitespace-`\' sequence, then the text of this line, 845 starting at the first non-whitespace character after the `a', is 846 taken as the first line of the TEXT block. (This enables a 847 simplification in scripting a one-line add.) This extension also 848 works with the `i' and `c' commands. 849 850 `i\' 851 `TEXT' 852 As a GNU extension, this command accepts two addresses. 853 854 Immediately output the lines of text which follow this command 855 (each but the last ending with a `\', which are removed from the 856 output). 857 858 `c\' 859 `TEXT' 860 Delete the lines matching the address or address-range, and output 861 the lines of text which follow this command (each but the last 862 ending with a `\', which are removed from the output) in place of 863 the last line (or in place of each line, if no addresses were 864 specified). A new cycle is started after this command is done, 865 since the pattern space will have been deleted. 866 867 `=' 868 As a GNU extension, this command accepts two addresses. 869 870 Print out the current input line number (with a trailing newline). 871 872 `l N' 873 Print the pattern space in an unambiguous form: non-printable 874 characters (and the `\' character) are printed in C-style escaped 875 form; long lines are split, with a trailing `\' character to 876 indicate the split; the end of each line is marked with a `$'. 877 878 N specifies the desired line-wrap length; a length of 0 (zero) 879 means to never wrap long lines. 
If omitted, the default as 880 specified on the command line is used. The N parameter is a GNU 881 `sed' extension. 882 883 `r FILENAME' 884 As a GNU extension, this command accepts two addresses. 885 886 Queue the contents of FILENAME to be read and inserted into the 887 output stream at the end of the current cycle, or when the next 888 input line is read. Note that if FILENAME cannot be read, it is 889 treated as if it were an empty file, without any error indication. 890 891 As a GNU `sed' extension, the special value `/dev/stdin' is 892 supported for the file name, which reads the contents of the 893 standard input. 894 895 `w FILENAME' 896 Write the pattern space to FILENAME. As a GNU `sed' extension, 897 two special values of FILE-NAME are supported: `/dev/stderr', 898 which writes the result to the standard error, and `/dev/stdout', 899 which writes to the standard output.(1) 900 901 The file will be created (or truncated) before the first input 902 line is read; all `w' commands (including instances of `w' flag on 903 successful `s' commands) which refer to the same FILENAME are 904 output without closing and reopening the file. 905 906 `D' 907 Delete text in the pattern space up to the first newline. If any 908 text is left, restart cycle with the resultant pattern space 909 (without reading a new line of input), otherwise start a normal 910 new cycle. 911 912 `N' 913 Add a newline to the pattern space, then append the next line of 914 input to the pattern space. If there is no more input then `sed' 915 exits without processing any more commands. 916 917 `P' 918 Print out the portion of the pattern space up to the first newline. 919 920 `h' 921 Replace the contents of the hold space with the contents of the 922 pattern space. 923 924 `H' 925 Append a newline to the contents of the hold space, and then 926 append the contents of the pattern space to that of the hold space. 
927 928 `g' 929 Replace the contents of the pattern space with the contents of the 930 hold space. 931 932 `G' 933 Append a newline to the contents of the pattern space, and then 934 append the contents of the hold space to that of the pattern space. 935 936 `x' 937 Exchange the contents of the hold and pattern spaces. 938 939 940 ---------- Footnotes ---------- 941 942 (1) This is equivalent to `p' unless the `-i' option is being used. 943 944 945 File: sed.info, Node: Programming Commands, Next: Extended Commands, Prev: Other Commands, Up: sed Programs 946 947 3.7 Commands for `sed' gurus 948 ============================ 949 950 In most cases, use of these commands indicates that you are probably 951 better off programming in something like `awk' or Perl. But 952 occasionally one is committed to sticking with `sed', and these 953 commands can enable one to write quite convoluted scripts. 954 955 `: LABEL' 956 [No addresses allowed.] 957 958 Specify the location of LABEL for branch commands. In all other 959 respects, a no-op. 960 961 `b LABEL' 962 Unconditionally branch to LABEL. The LABEL may be omitted, in 963 which case the next cycle is started. 964 965 `t LABEL' 966 Branch to LABEL only if there has been a successful `s'ubstitution 967 since the last input line was read or conditional branch was taken. 968 The LABEL may be omitted, in which case the next cycle is started. 969 970 971 972 File: sed.info, Node: Extended Commands, Next: Escapes, Prev: Programming Commands, Up: sed Programs 973 974 3.8 Commands Specific to GNU `sed' 975 ================================== 976 977 These commands are specific to GNU `sed', so you must use them with 978 care and only when you are sure that hindering portability is not evil. 979 They allow you to check for GNU `sed' extensions or to do tasks that 980 are required quite often, yet are unsupported by standard `sed's. 
981 982 `e [COMMAND]' 983 This command allows one to pipe input from a shell command into 984 pattern space. Without parameters, the `e' command executes the 985 command that is found in pattern space and replaces the pattern 986 space with the output; a trailing newline is suppressed. 987 988 If a parameter is specified, instead, the `e' command interprets 989 it as a command and sends its output to the output stream (like 990 `r' does). The command can run across multiple lines, all but the 991 last ending with a back-slash. 992 993 In both cases, the results are undefined if the command to be 994 executed contains a NUL character. 995 996 `L N' 997 This GNU `sed' extension fills and joins lines in pattern space to 998 produce output lines of (at most) N characters, like `fmt' does; 999 if N is omitted, the default as specified on the command line is 1000 used. This command is considered a failed experiment and unless 1001 there is enough request (which seems unlikely) will be removed in 1002 future versions. 1003 1004 `Q [EXIT-CODE]' 1005 This command only accepts a single address. 1006 1007 This command is the same as `q', but will not print the contents 1008 of pattern space. Like `q', it provides the ability to return an 1009 exit code to the caller. 1010 1011 This command can be useful because the only alternative ways to 1012 accomplish this apparently trivial function are to use the `-n' 1013 option (which can unnecessarily complicate your script) or 1014 resorting to the following snippet, which wastes time by reading 1015 the whole file without any visible effect: 1016 1017 :eat 1018 $d Quit silently on the last line 1019 N Read another line, silently 1020 g Overwrite pattern space each time to save memory 1021 b eat 1022 1023 `R FILENAME' 1024 Queue a line of FILENAME to be read and inserted into the output 1025 stream at the end of the current cycle, or when the next input 1026 line is read. 
Note that if FILENAME cannot be read, or if its end 1027 is reached, no line is appended, without any error indication. 1028 1029 As with the `r' command, the special value `/dev/stdin' is 1030 supported for the file name, which reads a line from the standard 1031 input. 1032 1033 `T LABEL' 1034 Branch to LABEL only if there have been no successful 1035 `s'ubstitutions since the last input line was read or conditional 1036 branch was taken. The LABEL may be omitted, in which case the next 1037 cycle is started. 1038 1039 `v VERSION' 1040 This command does nothing, but makes `sed' fail if GNU `sed' 1041 extensions are not supported, simply because other versions of 1042 `sed' do not implement it. In addition, you can specify the 1043 version of `sed' that your script requires, such as `4.0.5'. The 1044 default is `4.0' because that is the first version that 1045 implemented this command. 1046 1047 This command enables all GNU extensions even if `POSIXLY_CORRECT' 1048 is set in the environment. 1049 1050 `W FILENAME' 1051 Write to the given filename the portion of the pattern space up to 1052 the first newline. Everything said under the `w' command about 1053 file handling holds here too. 1054 1055 `z' 1056 This command empties the content of pattern space. It is usually 1057 the same as `s/.*//', but is more efficient and works in the 1058 presence of invalid multibyte sequences in the input stream. 1059 POSIX mandates that such sequences are _not_ matched by `.', so 1060 that there is no portable way to clear `sed''s buffers in the 1061 middle of the script in most multibyte locales (including UTF-8 1062 locales). 
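As a rough illustration of some of the commands above, the following shell sketch exercises `Q', `T' and `z' on invented sample input. It assumes that GNU `sed' is installed as `sed'; the function names are hypothetical and are not part of `sed' itself.

```shell
#!/bin/sh
# Sketch only: assumes GNU sed is available as `sed'.
# The function names and sample input are made up for illustration.

# `Q' quits without printing the current pattern space, so only
# the lines before the first line equal to END are printed.
before_marker() {
    sed '/^END$/Q'
}

# `T' branches to the end of the script when the preceding `s' did
# not substitute anything, so only changed lines get the suffix.
mark_changed() {
    sed -e 's/foo/bar/' -e T -e 's/$/ (changed)/'
}

# `z' empties the pattern space; comment lines become empty lines
# and the number of output lines stays the same.
blank_comments() {
    sed '/^#/z'
}

printf 'one\ntwo\nEND\nthree\n' | before_marker
printf 'foo\nbaz\n' | mark_changed
printf '# note\ncode\n' | blank_comments
```

Since none of these three commands is specified by POSIX, a script that relies on them could start with a `v' command so that other implementations fail early instead of misbehaving.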
1063 1064 1065 File: sed.info, Node: Escapes, Prev: Extended Commands, Up: sed Programs 1066 1067 3.9 GNU Extensions for Escapes in Regular Expressions 1068 ===================================================== 1069 1070 Until this chapter, we have only encountered escapes of the form `\^', 1071 which tell `sed' not to interpret the circumflex as a special 1072 character, but rather to take it literally. For example, `\*' matches 1073 a single asterisk rather than zero or more backslashes. 1074 1075 This chapter introduces another kind of escape(1)--that is, escapes 1076 that are applied to a character or sequence of characters that 1077 ordinarily are taken literally, and that `sed' replaces with a special 1078 character. This provides a way of encoding non-printable characters in 1079 patterns in a visible manner. There is no restriction on the 1080 appearance of non-printing characters in a `sed' script but when a 1081 script is being prepared in the shell or by text editing, it is usually 1082 easier to use one of the following escape sequences than the binary 1083 character it represents: 1084 1085 The list of these escapes is: 1086 1087 `\a' 1088 Produces or matches a BEL character, that is an "alert" (ASCII 7). 1089 1090 `\f' 1091 Produces or matches a form feed (ASCII 12). 1092 1093 `\n' 1094 Produces or matches a newline (ASCII 10). 1095 1096 `\r' 1097 Produces or matches a carriage return (ASCII 13). 1098 1099 `\t' 1100 Produces or matches a horizontal tab (ASCII 9). 1101 1102 `\v' 1103 Produces or matches a so called "vertical tab" (ASCII 11). 1104 1105 `\cX' 1106 Produces or matches `CONTROL-X', where X is any character. The 1107 precise effect of `\cX' is as follows: if X is a lower case 1108 letter, it is converted to upper case. Then bit 6 of the 1109 character (hex 40) is inverted. Thus `\cz' becomes hex 1A, but 1110 `\c{' becomes hex 3B, while `\c;' becomes hex 7B. 
1111 1112 `\dXXX' 1113 Produces or matches a character whose decimal ASCII value is XXX. 1114 1115 `\oXXX' 1116 Produces or matches a character whose octal ASCII value is XXX. 1117 1118 `\xXX' 1119 Produces or matches a character whose hexadecimal ASCII value is 1120 XX. 1121 1122 `\b' (backspace) was omitted because of the conflict with the 1123 existing "word boundary" meaning. 1124 1125 Other escapes match a particular character class and are valid only 1126 in regular expressions: 1127 1128 `\w' 1129 Matches any "word" character. A "word" character is any letter or 1130 digit or the underscore character. 1131 1132 `\W' 1133 Matches any "non-word" character. 1134 1135 `\b' 1136 Matches a word boundary; that is it matches if the character to 1137 the left is a "word" character and the character to the right is a 1138 "non-word" character, or vice-versa. 1139 1140 `\B' 1141 Matches everywhere but on a word boundary; that is it matches if 1142 the character to the left and the character to the right are 1143 either both "word" characters or both "non-word" characters. 1144 1145 `\`' 1146 Matches only at the start of pattern space. This is different 1147 from `^' in multi-line mode. 1148 1149 `\'' 1150 Matches only at the end of pattern space. This is different from 1151 `$' in multi-line mode. 1152 1153 1154 ---------- Footnotes ---------- 1155 1156 (1) All the escapes introduced here are GNU extensions, with the 1157 exception of `\n'. In basic regular expression mode, setting 1158 `POSIXLY_CORRECT' disables them inside bracket expressions. 1159 1160 1161 File: sed.info, Node: Examples, Next: Limitations, Prev: sed Programs, Up: Top 1162 1163 4 Some Sample Scripts 1164 ********************* 1165 1166 Here are some `sed' scripts to guide you in the art of mastering `sed'. 
1167 1168 * Menu: 1169 1170 Some exotic examples: 1171 * Centering lines:: 1172 * Increment a number:: 1173 * Rename files to lower case:: 1174 * Print bash environment:: 1175 * Reverse chars of lines:: 1176 1177 Emulating standard utilities: 1178 * tac:: Reverse lines of files 1179 * cat -n:: Numbering lines 1180 * cat -b:: Numbering non-blank lines 1181 * wc -c:: Counting chars 1182 * wc -w:: Counting words 1183 * wc -l:: Counting lines 1184 * head:: Printing the first lines 1185 * tail:: Printing the last lines 1186 * uniq:: Make duplicate lines unique 1187 * uniq -d:: Print duplicated lines of input 1188 * uniq -u:: Remove all duplicated lines 1189 * cat -s:: Squeezing blank lines 1190 1191 1192 File: sed.info, Node: Centering lines, Next: Increment a number, Up: Examples 1193 1194 4.1 Centering Lines 1195 =================== 1196 1197 This script centers all lines of a file within a width of 80 columns. To 1198 change that width, the number in `\{...\}' must be replaced, and the 1199 number of added spaces also must be changed. 1200 1201 Note how the buffer commands are used to separate parts in the 1202 regular expressions to be matched--this is a common technique. 1203 1204 #!/usr/bin/sed -f 1205 1206 # Put 80 spaces in the buffer (10 spaces, repeated 8 times) 1207 1 { 1208 x 1209 s/^$/          / 1210 s/^.*$/&&&&&&&&/ 1211 x 1212 } 1213 1214 # del leading and trailing spaces 1215 y/tab/ / 1216 s/^ *// 1217 s/ *$// 1218 1219 # add a newline and 80 spaces to end of line 1220 G 1221 1222 # keep first 81 chars (80 + a newline) 1223 s/^\(.\{81\}\).*$/\1/ 1224 1225 # \2 matches half of the spaces, which are moved to the beginning 1226 s/^\(.*\)\n\(.*\)\2/\2\1/ 1227 1228 1229 File: sed.info, Node: Increment a number, Next: Rename files to lower case, Prev: Centering lines, Up: Examples 1230 1231 4.2 Increment a Number 1232 ====================== 1233 1234 This script is one of a few that demonstrate how to do arithmetic in 1235 `sed'. This is indeed possible,(1) but must be done manually.
1236 1237 To increment a number, you just add 1 to the last digit, replacing it 1238 with the following digit. There is one exception: when the digit is a 1239 nine, the preceding digits must also be incremented until you reach 1240 a digit that is not a nine. 1241 1242 This solution by Bruno Haible is very clever because it 1243 uses a single buffer; if you don't have this limitation, the algorithm 1244 used in *note Numbering lines: cat -n, is faster. It works by 1245 replacing trailing nines with an underscore, then using multiple `s' 1246 commands to increment the last digit, and then again substituting 1247 underscores with zeros. 1248 1249 #!/usr/bin/sed -f 1250 1251 /[^0-9]/ d 1252 1253 # replace all trailing 9s by _ (any other character except digits could 1254 # be used) 1255 :d 1256 s/9\(_*\)$/_\1/ 1257 td 1258 1259 # incr last digit only. The first line adds a most-significant 1260 # digit of 1 if we have to add a digit. 1261 # 1262 # Only the first `tn' is strictly necessary; the others just make 1263 # the thing faster 1264 1265 s/^\(_*\)$/1\1/; tn 1266 s/8\(_*\)$/9\1/; tn 1267 s/7\(_*\)$/8\1/; tn 1268 s/6\(_*\)$/7\1/; tn 1269 s/5\(_*\)$/6\1/; tn 1270 s/4\(_*\)$/5\1/; tn 1271 s/3\(_*\)$/4\1/; tn 1272 s/2\(_*\)$/3\1/; tn 1273 s/1\(_*\)$/2\1/; tn 1274 s/0\(_*\)$/1\1/; tn 1275 1276 :n 1277 y/_/0/ 1278 1279 ---------- Footnotes ---------- 1280 1281 (1) `sed' guru Greg Ubben wrote an implementation of the `dc' RPN 1282 calculator! It is distributed together with sed. 1283 1284 1285 File: sed.info, Node: Rename files to lower case, Next: Print bash environment, Prev: Increment a number, Up: Examples 1286 1287 4.3 Rename Files to Lower Case 1288 ============================== 1289 1290 This is a pretty strange use of `sed'. We transform text, and 1291 transform it to be shell commands, then just feed them to the shell. Don't 1292 worry, even worse hacks are done when using `sed'; I have seen a script 1293 converting the output of `date' into a `bc' program!
1294 1295 The main body of this is the `sed' script, which remaps the name 1296 from lower to upper (or vice-versa) and even checks whether the remapped 1297 name is the same as the original name. Note how the script is 1298 parameterized using shell variables and proper quoting. 1299 1300 #! /bin/sh 1301 # rename files to lower/upper case... 1302 # 1303 # usage: 1304 # move-to-lower * 1305 # move-to-upper * 1306 # or 1307 # move-to-lower -R . 1308 # move-to-upper -R . 1309 # 1310 1311 help() 1312 { 1313 cat << eof 1314 Usage: $0 [-n] [-R] [-h] files... 1315 1316 -n do nothing, only see what would be done 1317 -R recursive (use find) 1318 -h this message 1319 files files to remap to lower case 1320 1321 Examples: 1322 $0 -n * (see if everything is ok, then...) 1323 $0 * 1324 1325 $0 -R . 1326 1327 eof 1328 } 1329 1330 apply_cmd='sh' 1331 finder='echo "$@" | tr " " "\n"' 1332 files_only= 1333 1334 while : 1335 do 1336 case "$1" in 1337 -n) apply_cmd='cat' ;; 1338 -R) finder='find "$@" -type f';; 1339 -h) help ; exit 1 ;; 1340 *) break ;; 1341 esac 1342 shift 1343 done 1344 1345 if [ -z "$1" ]; then 1346 echo Usage: $0 [-h] [-n] [-R] files... 1347 exit 1 1348 fi 1349 1350 LOWER='abcdefghijklmnopqrstuvwxyz' 1351 UPPER='ABCDEFGHIJKLMNOPQRSTUVWXYZ' 1352 1353 case `basename $0` in 1354 *upper*) TO=$UPPER; FROM=$LOWER ;; 1355 *) FROM=$UPPER; TO=$LOWER ;; 1356 esac 1357 1358 eval $finder | sed -n ' 1359 1360 # remove all trailing slashes 1361 s/\/*$// 1362 1363 # add ./ if there is no path, only a filename 1364 /\//!
s/^/.\// 1365 1366 # save path+filename 1367 h 1368 1369 # remove path 1370 s/.*\/// 1371 1372 # do conversion only on filename 1373 y/'$FROM'/'$TO'/ 1374 1375 # now line contains original path+file, while 1376 # hold space contains the new filename 1377 x 1378 1379 # add converted file name to line, which now contains 1380 # path/file-name\nconverted-file-name 1381 G 1382 1383 # check if converted file name is equal to original file name, 1384 # if it is, do not print anything 1385 /^.*\/\(.*\)\n\1/b 1386 1387 # now, transform path/fromfile\n, into 1388 # mv path/fromfile path/tofile and print it 1389 s/^\(.*\/\)\(.*\)\n\(.*\)$/mv "\1\2" "\1\3"/p 1390 1391 ' | $apply_cmd 1392 1393 1394 File: sed.info, Node: Print bash environment, Next: Reverse chars of lines, Prev: Rename files to lower case, Up: Examples 1395 1396 4.4 Print `bash' Environment 1397 ============================ 1398 1399 This script strips the definitions of the shell functions from the 1400 output of the `set' Bourne-shell command. 1401 1402 #!/bin/sh 1403 1404 set | sed -n ' 1405 :x 1406 1407 # if no occurrence of "=()" print and load next line 1408 /=()/! { p; b; } 1409 / () $/! { p; b; } 1410 1411 # possible start of functions section 1412 # save the line in case this is a var like FOO="() " 1413 h 1414 1415 # if the next line has a brace, we quit because 1416 # nothing comes after functions 1417 n 1418 /^{/ q 1419 1420 # print the old line 1421 x; p 1422 1423 # work on the new line now 1424 x; bx 1425 ' 1426 1427 1428 File: sed.info, Node: Reverse chars of lines, Next: tac, Prev: Print bash environment, Up: Examples 1429 1430 4.5 Reverse Characters of Lines 1431 =============================== 1432 1433 This script can be used to reverse the position of characters in lines. 1434 The technique moves two characters at a time, hence it is faster than 1435 more intuitive implementations. 1436 1437 Note the `tx' command before the definition of the label.
This is 1438 often needed to reset the flag that is tested by the `t' command. 1439 1440 Imaginative readers will find uses for this script. An example is 1441 reversing the output of `banner'.(1) 1442 1443 #!/usr/bin/sed -f 1444 1445 /../! b 1446 1447 # Reverse a line. Begin embedding the line between two newlines 1448 s/^.*$/\ 1449 &\ 1450 / 1451 1452 # Move first character at the end. The regexp matches until 1453 # there are zero or one characters between the markers 1454 tx 1455 :x 1456 s/\(\n.\)\(.*\)\(.\n\)/\3\2\1/ 1457 tx 1458 1459 # Remove the newline markers 1460 s/\n//g 1461 1462 ---------- Footnotes ---------- 1463 1464 (1) This requires another script to pad the output of banner; for 1465 example 1466 1467 #! /bin/sh 1468 1469 banner -w $1 $2 $3 $4 | 1470 sed -e :a -e '/^.\{0,'$1'\}$/ { s/$/ /; ba; }' | 1471 ~/sedscripts/reverseline.sed 1472 1473 1474 File: sed.info, Node: tac, Next: cat -n, Prev: Reverse chars of lines, Up: Examples 1475 1476 4.6 Reverse Lines of Files 1477 ========================== 1478 1479 This one begins a series of totally useless (yet interesting) scripts 1480 emulating various Unix commands. This, in particular, is a `tac' 1481 workalike. 1482 1483 Note that on implementations other than GNU `sed' this script might 1484 easily overflow internal buffers. 1485 1486 #!/usr/bin/sed -nf 1487 1488 # reverse all lines of input, i.e. first line became last, ... 1489 1490 # from the second line, the buffer (which contains all previous lines) 1491 # is *appended* to current line, so, the order will be reversed 1492 1! G 1493 1494 # on the last line we're done -- print everything 1495 $ p 1496 1497 # store everything on the buffer again 1498 h 1499 1500 1501 File: sed.info, Node: cat -n, Next: cat -b, Prev: tac, Up: Examples 1502 1503 4.7 Numbering Lines 1504 =================== 1505 1506 This script replaces `cat -n'; in fact it formats its output exactly 1507 like GNU `cat' does. 
1508 1509 Of course this is completely useless and for two reasons: first, 1510 because somebody else did it in C, second, because the following 1511 Bourne-shell script could be used for the same purpose and would be 1512 much faster: 1513 1514 #! /bin/sh 1515 sed -e "=" $@ | sed -e ' 1516 s/^/ / 1517 N 1518 s/^ *\(......\)\n/\1 / 1519 ' 1520 1521 It uses `sed' to print the line number, then groups lines two by two 1522 using `N'. Of course, this script does not teach as much as the one 1523 presented below. 1524 1525 The algorithm used for incrementing uses both buffers, so the line 1526 is printed as soon as possible and then discarded. The number is split 1527 so that changing digits go in a buffer and unchanged ones go in the 1528 other; the changed digits are modified in a single step (using a `y' 1529 command). The line number for the next line is then composed and 1530 stored in the hold space, to be used in the next iteration. 1531 1532 #!/usr/bin/sed -nf 1533 1534 # Prime the pump on the first line 1535 x 1536 /^$/ s/^.*$/1/ 1537 1538 # Add the correct line number before the pattern 1539 G 1540 h 1541 1542 # Format it and print it 1543 s/^/ / 1544 s/^ *\(......\)\n/\1 /p 1545 1546 # Get the line number from hold space; add a zero 1547 # if we're going to add a digit on the next line 1548 g 1549 s/\n.*$// 1550 /^9*$/ s/^/0/ 1551 1552 # separate changing/unchanged digits with an x 1553 s/.9*$/x&/ 1554 1555 # keep changing digits in hold space 1556 h 1557 s/^.*x// 1558 y/0123456789/1234567890/ 1559 x 1560 1561 # keep unchanged digits in pattern space 1562 s/x.*$// 1563 1564 # compose the new number, remove the newline implicitly added by G 1565 G 1566 s/\n// 1567 h 1568 1569 1570 File: sed.info, Node: cat -b, Next: wc -c, Prev: cat -n, Up: Examples 1571 1572 4.8 Numbering Non-blank Lines 1573 ============================= 1574 1575 Emulating `cat -b' is almost the same as `cat -n'--we only have to 1576 select which lines are to be numbered and which are not. 
1577 1578 The part that is common to this script and the previous one is not 1579 commented to show how important it is to comment `sed' scripts 1580 properly... 1581 1582 #!/usr/bin/sed -nf 1583 1584 /^$/ { 1585 p 1586 b 1587 } 1588 1589 # Same as cat -n from now 1590 x 1591 /^$/ s/^.*$/1/ 1592 G 1593 h 1594 s/^/ / 1595 s/^ *\(......\)\n/\1 /p 1596 x 1597 s/\n.*$// 1598 /^9*$/ s/^/0/ 1599 s/.9*$/x&/ 1600 h 1601 s/^.*x// 1602 y/0123456789/1234567890/ 1603 x 1604 s/x.*$// 1605 G 1606 s/\n// 1607 h 1608 1609 1610 File: sed.info, Node: wc -c, Next: wc -w, Prev: cat -b, Up: Examples 1611 1612 4.9 Counting Characters 1613 ======================= 1614 1615 This script shows another way to do arithmetic with `sed'. In this 1616 case we have to add possibly large numbers, so implementing this by 1617 successive increments would not be feasible (and possibly even more 1618 complicated to contrive than this script). 1619 1620 The approach is to map numbers to letters, kind of an abacus 1621 implemented with `sed'. `a's are units, `b's are tens and so on: we 1622 simply add the number of characters on the current line as units, and 1623 then propagate the carry to tens, hundreds, and so on. 1624 1625 As usual, running totals are kept in hold space. 1626 1627 On the last line, we convert the abacus form back to decimal. For 1628 the sake of variety, this is done with a loop rather than with some 80 1629 `s' commands(1): first we convert units, removing `a's from the number; 1630 then we rotate letters so that tens become `a's, and so on until no 1631 more letters remain. 1632 1633 #!/usr/bin/sed -nf 1634 1635 # Add n+1 a's to hold space (+1 is for the newline) 1636 s/./a/g 1637 H 1638 x 1639 s/\n/a/ 1640 1641 # Do the carry. 
The t's and b's are not necessary, 1642 # but they do speed up the thing 1643 t a 1644 : a; s/aaaaaaaaaa/b/g; t b; b done 1645 : b; s/bbbbbbbbbb/c/g; t c; b done 1646 : c; s/cccccccccc/d/g; t d; b done 1647 : d; s/dddddddddd/e/g; t e; b done 1648 : e; s/eeeeeeeeee/f/g; t f; b done 1649 : f; s/ffffffffff/g/g; t g; b done 1650 : g; s/gggggggggg/h/g; t h; b done 1651 : h; s/hhhhhhhhhh//g 1652 1653 : done 1654 $! { 1655 h 1656 b 1657 } 1658 1659 # On the last line, convert back to decimal 1660 1661 : loop 1662 /a/! s/[b-h]*/&0/ 1663 s/aaaaaaaaa/9/ 1664 s/aaaaaaaa/8/ 1665 s/aaaaaaa/7/ 1666 s/aaaaaa/6/ 1667 s/aaaaa/5/ 1668 s/aaaa/4/ 1669 s/aaa/3/ 1670 s/aa/2/ 1671 s/a/1/ 1672 1673 : next 1674 y/bcdefgh/abcdefg/ 1675 /[a-h]/ b loop 1676 p 1677 1678 ---------- Footnotes ---------- 1679 1680 (1) Some implementations have a limit of 199 commands per script 1681 1682 1683 File: sed.info, Node: wc -w, Next: wc -l, Prev: wc -c, Up: Examples 1684 1685 4.10 Counting Words 1686 =================== 1687 1688 This script is almost the same as the previous one, once each of the 1689 words on the line is converted to a single `a' (in the previous script 1690 each letter was changed to an `a'). 1691 1692 It is interesting that real `wc' programs have optimized loops for 1693 `wc -c', so they are much slower at counting words rather than 1694 characters. This script's bottleneck, instead, is arithmetic, and 1695 hence the word-counting one is faster (it has to manage smaller 1696 numbers). 1697 1698 Again, the common parts are not commented to show the importance of 1699 commenting `sed' scripts. 1700 1701 #!/usr/bin/sed -nf 1702 1703 # Convert words to a's 1704 s/[ tab][ tab]*/ /g 1705 s/^/ / 1706 s/ [^ ][^ ]*/a /g 1707 s/ //g 1708 1709 # Append them to hold space 1710 H 1711 x 1712 s/\n// 1713 1714 # From here on it is the same as in wc -c. 1715 /aaaaaaaaaa/! bx; s/aaaaaaaaaa/b/g 1716 /bbbbbbbbbb/! bx; s/bbbbbbbbbb/c/g 1717 /cccccccccc/! bx; s/cccccccccc/d/g 1718 /dddddddddd/! 
bx; s/dddddddddd/e/g 1719 /eeeeeeeeee/! bx; s/eeeeeeeeee/f/g 1720 /ffffffffff/! bx; s/ffffffffff/g/g 1721 /gggggggggg/! bx; s/gggggggggg/h/g 1722 s/hhhhhhhhhh//g 1723 :x 1724 $! { h; b; } 1725 :y 1726 /a/! s/[b-h]*/&0/ 1727 s/aaaaaaaaa/9/ 1728 s/aaaaaaaa/8/ 1729 s/aaaaaaa/7/ 1730 s/aaaaaa/6/ 1731 s/aaaaa/5/ 1732 s/aaaa/4/ 1733 s/aaa/3/ 1734 s/aa/2/ 1735 s/a/1/ 1736 y/bcdefgh/abcdefg/ 1737 /[a-h]/ by 1738 p 1739 1740 1741 File: sed.info, Node: wc -l, Next: head, Prev: wc -w, Up: Examples 1742 1743 4.11 Counting Lines 1744 =================== 1745 1746 No strange things are done now, because `sed' gives us `wc -l' 1747 functionality for free! Look: 1748 1749 #!/usr/bin/sed -nf 1750 $= 1751 1752 1753 File: sed.info, Node: head, Next: tail, Prev: wc -l, Up: Examples 1754 1755 4.12 Printing the First Lines 1756 ============================= 1757 1758 This script is probably the simplest useful `sed' script. It displays 1759 the first 10 lines of input; the number of displayed lines is right 1760 before the `q' command. 1761 1762 #!/usr/bin/sed -f 1763 10q 1764 1765 1766 File: sed.info, Node: tail, Next: uniq, Prev: head, Up: Examples 1767 1768 4.13 Printing the Last Lines 1769 ============================ 1770 1771 Printing the last N lines rather than the first is more complex but 1772 indeed possible. N is encoded in the second line, before the bang 1773 character. 1774 1775 This script is similar to the `tac' script in that it keeps the 1776 final output in the hold space and prints it at the end: 1777 1778 #!/usr/bin/sed -nf 1779 1780 1! {; H; g; } 1781 1,10 !s/[^\n]*\n// 1782 $p 1783 h 1784 1785 In essence, the script keeps a window of 10 lines and slides it by 1786 adding a line and deleting the oldest (the substitution command on the 1787 second line works like a `D' command but does not restart the loop).
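The script above hard-codes a window of 10 lines. As a hedged sketch of how the window size could be passed in from the shell, the wrapper below splices a count into the same commands. The name `last_lines' is hypothetical, and GNU `sed' is assumed (for `\n' inside the bracket expression).

```shell
#!/bin/sh
# Sketch only: `last_lines' is a made-up name; assumes GNU sed.
# Usage: last_lines COUNT < input   (COUNT must be 1 or more)
last_lines() {
    # From the second line on, append the line to the hold space
    # and copy the accumulated window back into pattern space.
    # Once more than COUNT lines have been read, drop the oldest
    # line.  Print the window on the last line; otherwise save it.
    sed -n -e '1!{H;g;}' \
           -e "1,$1"'!s/[^\n]*\n//' \
           -e '$p' \
           -e h
}

printf '1\n2\n3\n4\n5\n' | last_lines 3
```

With the five-line input shown, the sketch prints the lines 3, 4 and 5. Only the count is placed inside double quotes, so the shell expands `$1' while the rest of the script stays protected by single quotes.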
1788 1789 The "sliding window" technique is a very powerful way to write 1790 efficient and complex `sed' scripts, because commands like `P' would 1791 require a lot of work if implemented manually. 1792 1793 To introduce the technique, which is fully demonstrated in the rest 1794 of this chapter and is based on the `N', `P' and `D' commands, here is 1795 an implementation of `tail' using a simple "sliding window." 1796 1797 This looks complicated but in fact it works the same way as the 1798 last script: after we have kicked in the appropriate number of lines, 1799 however, we stop using the hold space to keep inter-line state, and 1800 instead use `N' and `D' to slide pattern space by one line: 1801 1802 #!/usr/bin/sed -f 1803 1804 1h 1805 2,10 {; H; g; } 1806 $q 1807 1,9d 1808 N 1809 D 1810 1811 Note how the first, second and fourth lines are inactive after the 1812 first ten lines of input. After that, all the script does is exit 1813 on the last line of input, append the next input line to pattern 1814 space, and remove the first line. 1815 1816 1817 File: sed.info, Node: uniq, Next: uniq -d, Prev: tail, Up: Examples 1818 1819 4.14 Make Duplicate Lines Unique 1820 ================================ 1821 1822 This is an example of the art of using the `N', `P' and `D' commands, 1823 probably the most difficult to master. 1824 1825 #!/usr/bin/sed -f 1826 h 1827 1828 :b 1829 # On the last line, print and exit 1830 $b 1831 N 1832 /^\(.*\)\n\1$/ { 1833 # The two lines are identical. Undo the effect of 1834 # the `N' command. 1835 g 1836 bb 1837 } 1838 1839 # If the `N' command had added the last line, print and exit 1840 $b 1841 1842 # The lines are different; print the first and go 1843 # back working on the second. 1844 P 1845 D 1846 1847 As you can see, we maintain a 2-line window using `P' and `D'. This 1848 technique is often used in advanced `sed' scripts.
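For comparison, the same two-line window can be written much more compactly. The one-liner below is a commonly seen variant, not the script from this manual. It is wrapped in a hypothetical shell function named `squeeze_dups' and has only been checked against GNU `sed'.

```shell
#!/bin/sh
# Sketch only: `squeeze_dups' is a made-up name.
squeeze_dups() {
    # $!N  append the next line, unless this is the last line
    # !P   print the first line of the window unless both lines match
    # D    delete the first line and restart without reading input
    sed '$!N; /^\(.*\)\n\1$/!P; D'
}

printf 'a\na\nb\nb\nb\nc\n' | squeeze_dups
```

This variant does not use the hold space at all: when the two lines in the window are equal, `D' simply discards the first copy, and the surviving copy is compared against the next input line on the following pass.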
1849 1850 1851 File: sed.info, Node: uniq -d, Next: uniq -u, Prev: uniq, Up: Examples 1852 1853 4.15 Print Duplicated Lines of Input 1854 ==================================== 1855 1856 This script prints only duplicated lines, like `uniq -d'. 1857 1858 #!/usr/bin/sed -nf 1859 1860 $b 1861 N 1862 /^\(.*\)\n\1$/ { 1863 # Print the first of the duplicated lines 1864 s/.*\n// 1865 p 1866 1867 # Loop until we get a different line 1868 :b 1869 $b 1870 N 1871 /^\(.*\)\n\1$/ { 1872 s/.*\n// 1873 bb 1874 } 1875 } 1876 1877 # The last line cannot be followed by duplicates 1878 $b 1879 1880 # Found a different one. Leave it alone in the pattern space 1881 # and go back to the top, hunting its duplicates 1882 D 1883 1884 1885 File: sed.info, Node: uniq -u, Next: cat -s, Prev: uniq -d, Up: Examples 1886 1887 4.16 Remove All Duplicated Lines 1888 ================================ 1889 1890 This script prints only unique lines, like `uniq -u'. 1891 1892 #!/usr/bin/sed -f 1893 1894 # Search for a duplicate line --- until that, print what you find. 1895 $b 1896 N 1897 /^\(.*\)\n\1$/ ! { 1898 P 1899 D 1900 } 1901 1902 :c 1903 # Got two equal lines in pattern space. At the 1904 # end of the file we simply exit 1905 $d 1906 1907 # Else, we keep reading lines with `N' until we 1908 # find a different one 1909 s/.*\n// 1910 N 1911 /^\(.*\)\n\1$/ { 1912 bc 1913 } 1914 1915 # Remove the last instance of the duplicate line 1916 # and go back to the top 1917 D 1918 1919 1920 File: sed.info, Node: cat -s, Prev: uniq -u, Up: Examples 1921 1922 4.17 Squeezing Blank Lines 1923 ========================== 1924 1925 As a final example, here are three scripts, of increasing complexity 1926 and speed, that implement the same function as `cat -s', that is 1927 squeezing blank lines. 1928 1929 The first leaves a blank line at the beginning and end if there are 1930 some already. 

     #!/usr/bin/sed -f

     # on empty lines, join with next
     # Note there is a star in the regexp
     :x
     /^\n*$/ {
       N
       bx
     }

     # now, squeeze all '\n', this can be also done by:
     # s/^\(\n\)*/\1/
     s/\n*/\
     /

   This one is a bit more complex and removes all empty lines at the
beginning.  It does leave a single blank line at the end if one was
there.

     #!/usr/bin/sed -f

     # delete all leading empty lines
     1,/^./{
       /./!d
     }

     # on an empty line we remove it and all the following
     # empty lines, but one
     :x
     /./!{
       N
       s/^\n$//
       tx
     }

   This removes leading and trailing blank lines.  It is also the
fastest.  Note that loops are completely done with `n' and `b', without
relying on `sed' to restart the script automatically at the end of
a line.

     #!/usr/bin/sed -nf

     # delete all (leading) blanks
     /./!d

     # get here: so there is a non empty
     :x
     # print it
     p
     # get next
     n
     # got chars? print it again, etc...
     /./bx

     # no, don't have chars: got an empty line
     :z
     # get next, if last line we finish here so no trailing
     # empty lines are written
     n
     # also empty? then ignore it, and get next... this will
     # remove ALL empty lines
     /./!bz

     # all empty lines were deleted/ignored, but we have a non empty.  As
     # what we want to do is to squeeze, insert a blank line artificially
     i\

     bx


File: sed.info,  Node: Limitations,  Next: Other Resources,  Prev: Examples,  Up: Top

5 GNU `sed''s Limitations and Non-limitations
*********************************************

For those who want to write portable `sed' scripts, be aware that some
implementations have been known to limit line lengths (for the pattern
and hold spaces) to be no more than 4000 bytes.
The POSIX standard
specifies that conforming `sed' implementations shall support at least
8192 byte line lengths.  GNU `sed' has no built-in limit on line length;
as long as it can `malloc()' more (virtual) memory, you can feed or
construct lines as long as you like.

   However, recursion is used to handle subpatterns and indefinite
repetition.  This means that the available stack space may limit the
size of the buffer that can be processed by certain patterns.


File: sed.info,  Node: Other Resources,  Next: Reporting Bugs,  Prev: Limitations,  Up: Top

6 Other Resources for Learning About `sed'
******************************************

In addition to several books that have been written about `sed' (either
specifically or as chapters in books which discuss shell programming),
one can find out more about `sed' (including suggestions of a few
books) from the FAQ for the `sed-users' mailing list, available from:
`http://sed.sourceforge.net/sedfaq.html'

   Also of interest are
`http://www.student.northpark.edu/pemente/sed/index.htm' and
`http://sed.sf.net/grabbag', which include `sed' tutorials and other
`sed'-related goodies.

   The `sed-users' mailing list is itself maintained by Sven Guckes.  To
subscribe, visit `http://groups.yahoo.com' and search for the
`sed-users' mailing list.


File: sed.info,  Node: Reporting Bugs,  Next: Extended regexps,  Prev: Other Resources,  Up: Top

7 Reporting Bugs
****************

Email bug reports to <bonzini@gnu.org>.  Be sure to include the word
"sed" somewhere in the `Subject:' field.  Also, please include the
output of `sed --version' in the body of your report if at all possible.
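
   The pieces of such a report can be gathered with a few shell
commands.  The following is only a sketch, assuming a POSIX shell and a
`sed' that accepts `--version' (as GNU `sed' does); the function name
`sed_report' and the sample input are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of sed): prints material for a report.
sed_report() {
    # First line of the version banner, if this sed supports --version.
    sed --version 2>/dev/null | head -n 1
    # The input data and the exact sed invocation, together in one
    # self-contained command.
    printf 'hello\n' | sed 's/hello/world/'
}

sed_report
```

   The second command carries its own input, so anyone reading the
report can run it unchanged.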

   Please do not send a bug report like this:

         while building frobme-1.3.4
           $ configure
         error--> sed: file sedscr line 1: Unknown option to 's'

   If GNU `sed' doesn't configure your favorite package, take a few
extra minutes to identify the specific problem and make a stand-alone
test case.  Unlike other programs such as C compilers, making such test
cases for `sed' is quite simple.

   A stand-alone test case includes all the data necessary to perform
the test, and the specific invocation of `sed' that causes the problem.
The smaller a stand-alone test case is, the better.  A test case should
not involve something as far removed from `sed' as "try to configure
frobme-1.3.4".  Yes, that is in principle enough information to look
for the bug, but that is not a very practical prospect.

   Here are a few commonly reported bugs that are not bugs.

`N' command on the last line
     Most versions of `sed' exit without printing anything when the `N'
     command is issued on the last line of a file.  GNU `sed' prints
     pattern space before exiting unless of course the `-n' command
     switch has been specified.  This choice is by design.

     For example, the behavior of
          sed N foo bar
     would depend on whether foo has an even or an odd number of
     lines(1).  Or, when writing a script to read the next few lines
     following a pattern match, traditional implementations of `sed'
     would force you to write something like
          /foo/{ $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N }
     instead of just
          /foo/{ N;N;N;N;N;N;N;N;N; }

     In any case, the simplest workaround is to use `$d;N' in scripts
     that rely on the traditional behavior, or to set the
     `POSIXLY_CORRECT' variable to a non-empty value.

Regex syntax clashes (problems with backslashes)
     `sed' uses the POSIX basic regular expression syntax.
     According to
     the standard, the meaning of some escape sequences is undefined in
     this syntax; notable in the case of `sed' are `\|', `\+', `\?',
     `\`', `\'', `\<', `\>', `\b', `\B', `\w', and `\W'.

     As in all GNU programs that use POSIX basic regular expressions,
     `sed' interprets these escape sequences as special characters.
     So, `x\+' matches one or more occurrences of `x'.  `abc\|def'
     matches either `abc' or `def'.

     This syntax may cause problems when running scripts written for
     other `sed's.  Some `sed' programs have been written with the
     assumption that `\|' and `\+' match the literal characters `|' and
     `+'.  Such scripts must be modified by removing the spurious
     backslashes if they are to be used with modern implementations of
     `sed', like GNU `sed'.

     On the other hand, some scripts use `s|abc\|def||g' to remove
     occurrences of _either_ `abc' or `def'.  While this worked until
     `sed' 4.0.x, newer versions interpret this as removing the string
     `abc|def'.  This is again undefined behavior according to POSIX,
     and this interpretation is arguably more robust: older `sed's, for
     example, required that the regex matcher parsed `\/' as `/' in the
     common case of escaping a slash, which is again undefined
     behavior; the new behavior avoids this, and this is good because
     the regex matcher is only partially under our control.

     In addition, this version of `sed' supports several escape
     characters (some of which are multi-character) to insert
     non-printable characters in scripts (`\a', `\c', `\d', `\o', `\r',
     `\t', `\v', `\x').  These can cause similar problems with scripts
     written for other `sed's.

`-i' clobbers read-only files
     In short, `sed -i' will let you delete the contents of a read-only
     file, and in general the `-i' option (*note Invocation: Invoking
     sed.) lets you clobber protected files.
     This is not a bug, but
     rather a consequence of how the Unix filesystem works.

     The permissions on a file say what can happen to the data in that
     file, while the permissions on a directory say what can happen to
     the list of files in that directory.  `sed -i' will never open
     for writing a file that is already on disk.  Rather, it will work
     on a temporary file that is finally renamed to the original name:
     if you rename or delete files, you're actually modifying the
     contents of the directory, so the operation depends on the
     permissions of the directory, not of the file.  For this same
     reason, `sed' does not let you use `-i' on a writeable file in a
     read-only directory, and will break hard or symbolic links when
     `-i' is used on such a file.

`0a' does not work (gives an error)
     There is no line 0.  0 is a special address that is only used to
     treat addresses like `0,/RE/' as active when the script starts: if
     you write `1,/abc/d' and the first line includes the word `abc',
     then that match would be ignored because address ranges must span
     at least two lines (barring the end of the file); but what you
     probably wanted is to delete every line up to the first one
     including `abc', and this is obtained with `0,/abc/d'.

`[a-z]' is case insensitive
     You are encountering problems with locales.  POSIX mandates that
     `[a-z]' uses the current locale's collation order - in C parlance,
     that means using `strcoll(3)' instead of `strcmp(3)'.  Some
     locales have a case-insensitive collation order, others don't.

     Another problem is that `[a-z]' tries to use collation symbols.
     This only happens if you are on the GNU system, using GNU libc's
     regular expression matcher instead of compiling the one supplied
     with GNU sed.
     In a Danish locale, for example, the regular
     expression `^[a-z]$' matches the string `aa', because this is a
     single collating symbol that comes after `a' and before `b'; `ll'
     behaves similarly in Spanish locales, or `ij' in Dutch locales.

     To work around these problems, which may cause bugs in shell
     scripts, set the `LC_COLLATE' and `LC_CTYPE' environment variables
     to `C'.

`s/.*//' does not clear pattern space
     This happens if your input stream includes invalid multibyte
     sequences.  POSIX mandates that such sequences are _not_ matched
     by `.', so that `s/.*//' will not clear pattern space as you would
     expect.  In fact, there is no way to clear sed's buffers in the
     middle of the script in most multibyte locales (including UTF-8
     locales).  For this reason, GNU `sed' provides a `z' command (for
     `zap') as an extension.

     To work around these problems, which may cause bugs in shell
     scripts, set the `LC_COLLATE' and `LC_CTYPE' environment variables
     to `C'.

   ---------- Footnotes ----------

   (1) which is the actual "bug" that prompted the change in behavior


File: sed.info,  Node: Extended regexps,  Next: Concept Index,  Prev: Reporting Bugs,  Up: Top

Appendix A Extended regular expressions
***************************************

The only difference between basic and extended regular expressions is in
the behavior of a few characters: `?', `+', parentheses, and braces
(`{}').  While basic regular expressions require these to be escaped if
you want them to behave as special characters, when using extended
regular expressions you must escape them if you want them _to match a
literal character_.

Examples:
`abc?'
     becomes `abc\?' when using extended regular expressions.  It
     matches the literal string `abc?'.

`c\+'
     becomes `c+' when using extended regular expressions.
     It matches
     one or more `c's.

`a\{3,\}'
     becomes `a{3,}' when using extended regular expressions.  It
     matches three or more `a's.

`\(abc\)\{2,3\}'
     becomes `(abc){2,3}' when using extended regular expressions.  It
     matches either `abcabc' or `abcabcabc'.

`\(abc*\)\1'
     becomes `(abc*)\1' when using extended regular expressions.
     Backreferences must still be escaped when using extended regular
     expressions.


File: sed.info,  Node: Concept Index,  Next: Command and Option Index,  Prev: Extended regexps,  Up: Top

Concept Index
*************

This is a general index of all issues discussed in this manual, with the
exception of the `sed' commands and command-line options.

[index]
* Menu:

* 0 address: Reporting Bugs. (line 103)
* Additional reading about sed: Other Resources. (line 6)
* ADDR1,+N: Addresses. (line 78)
* ADDR1,~N: Addresses. (line 78)
* Address, as a regular expression: Addresses. (line 27)
* Address, last line: Addresses. (line 22)
* Address, numeric: Addresses. (line 8)
* Addresses, in sed scripts: Addresses. (line 6)
* Append hold space to pattern space: Other Commands. (line 125)
* Append next input line to pattern space: Other Commands. (line 105)
* Append pattern space to hold space: Other Commands. (line 117)
* Appending text after a line: Other Commands. (line 27)
* Backreferences, in regular expressions: The "s" Command. (line 19)
* Branch to a label, if s/// failed: Extended Commands. (line 63)
* Branch to a label, if s/// succeeded: Programming Commands.
        (line 22)
* Branch to a label, unconditionally: Programming Commands.
        (line 18)
* Buffer spaces, pattern and hold: Execution Cycle. (line 6)
* Bugs, reporting: Reporting Bugs. (line 6)
* Case-insensitive matching: The "s" Command.
        (line 94)
* Caveat -- #n on first line: Common Commands. (line 20)
* Command groups: Common Commands. (line 50)
* Comments, in scripts: Common Commands. (line 12)
* Conditional branch <1>: Extended Commands. (line 63)
* Conditional branch: Programming Commands.
        (line 22)
* Copy hold space into pattern space: Other Commands. (line 121)
* Copy pattern space into hold space: Other Commands. (line 113)
* Delete first line from pattern space: Other Commands. (line 99)
* Disabling autoprint, from command line: Invoking sed. (line 34)
* empty regular expression: Addresses. (line 31)
* Emptying pattern space <1>: Reporting Bugs. (line 130)
* Emptying pattern space: Extended Commands. (line 85)
* Evaluate Bourne-shell commands: Extended Commands. (line 12)
* Evaluate Bourne-shell commands, after substitution: The "s" Command.
        (line 85)
* Exchange hold space with pattern space: Other Commands. (line 129)
* Excluding lines: Addresses. (line 101)
* Extended regular expressions, choosing: Invoking sed. (line 113)
* Extended regular expressions, syntax: Extended regexps. (line 6)
* Files to be processed as input: Invoking sed. (line 141)
* Flow of control in scripts: Programming Commands.
        (line 11)
* Global substitution: The "s" Command. (line 51)
* GNU extensions, /dev/stderr file <1>: Other Commands. (line 88)
* GNU extensions, /dev/stderr file: The "s" Command. (line 78)
* GNU extensions, /dev/stdin file <1>: Extended Commands. (line 53)
* GNU extensions, /dev/stdin file: Other Commands. (line 78)
* GNU extensions, /dev/stdout file <1>: Other Commands. (line 88)
* GNU extensions, /dev/stdout file <2>: The "s" Command. (line 78)
* GNU extensions, /dev/stdout file: Invoking sed. (line 149)
* GNU extensions, 0 address <1>: Reporting Bugs. (line 103)
* GNU extensions, 0 address: Addresses.
        (line 78)
* GNU extensions, 0,ADDR2 addressing: Addresses. (line 78)
* GNU extensions, ADDR1,+N addressing: Addresses. (line 78)
* GNU extensions, ADDR1,~N addressing: Addresses. (line 78)
* GNU extensions, branch if s/// failed: Extended Commands. (line 63)
* GNU extensions, case modifiers in s commands: The "s" Command.
        (line 23)
* GNU extensions, checking for their presence: Extended Commands.
        (line 69)
* GNU extensions, disabling: Invoking sed. (line 81)
* GNU extensions, emptying pattern space <1>: Reporting Bugs. (line 130)
* GNU extensions, emptying pattern space: Extended Commands. (line 85)
* GNU extensions, evaluating Bourne-shell commands <1>: Extended Commands.
        (line 12)
* GNU extensions, evaluating Bourne-shell commands: The "s" Command.
        (line 85)
* GNU extensions, extended regular expressions: Invoking sed. (line 113)
* GNU extensions, g and NUMBER modifier interaction in s command: The "s" Command.
        (line 57)
* GNU extensions, I modifier <1>: The "s" Command. (line 94)
* GNU extensions, I modifier: Addresses. (line 49)
* GNU extensions, in-place editing <1>: Reporting Bugs. (line 85)
* GNU extensions, in-place editing: Invoking sed. (line 51)
* GNU extensions, L command: Extended Commands. (line 26)
* GNU extensions, M modifier: The "s" Command. (line 99)
* GNU extensions, modifiers and the empty regular expression: Addresses.
        (line 31)
* GNU extensions, N~M addresses: Addresses. (line 13)
* GNU extensions, quitting silently: Extended Commands. (line 36)
* GNU extensions, R command: Extended Commands. (line 53)
* GNU extensions, reading a file a line at a time: Extended Commands.
        (line 53)
* GNU extensions, reformatting paragraphs: Extended Commands. (line 26)
* GNU extensions, returning an exit code <1>: Extended Commands.
        (line 36)
* GNU extensions, returning an exit code: Common Commands. (line 30)
* GNU extensions, setting line length: Other Commands. (line 65)
* GNU extensions, special escapes <1>: Reporting Bugs. (line 78)
* GNU extensions, special escapes: Escapes. (line 6)
* GNU extensions, special two-address forms: Addresses. (line 78)
* GNU extensions, subprocesses <1>: Extended Commands. (line 12)
* GNU extensions, subprocesses: The "s" Command. (line 85)
* GNU extensions, to basic regular expressions <1>: Reporting Bugs.
        (line 51)
* GNU extensions, to basic regular expressions: Regular Expressions.
        (line 26)
* GNU extensions, two addresses supported by most commands: Other Commands.
        (line 25)
* GNU extensions, unlimited line length: Limitations. (line 6)
* GNU extensions, writing first line to a file: Extended Commands.
        (line 80)
* Goto, in scripts: Programming Commands.
        (line 18)
* Greedy regular expression matching: Regular Expressions. (line 143)
* Grouping commands: Common Commands. (line 50)
* Hold space, appending from pattern space: Other Commands. (line 117)
* Hold space, appending to pattern space: Other Commands. (line 125)
* Hold space, copy into pattern space: Other Commands. (line 121)
* Hold space, copying pattern space into: Other Commands. (line 113)
* Hold space, definition: Execution Cycle. (line 6)
* Hold space, exchange with pattern space: Other Commands. (line 129)
* In-place editing: Reporting Bugs. (line 85)
* In-place editing, activating: Invoking sed. (line 51)
* In-place editing, Perl-style backup file names: Invoking sed.
        (line 62)
* Inserting text before a line: Other Commands. (line 46)
* Labels, in scripts: Programming Commands.
        (line 14)
* Last line, selecting: Addresses. (line 22)
* Line length, setting <1>: Other Commands.
        (line 65)
* Line length, setting: Invoking sed. (line 76)
* Line number, printing: Other Commands. (line 62)
* Line selection: Addresses. (line 6)
* Line, selecting by number: Addresses. (line 8)
* Line, selecting by regular expression match: Addresses. (line 27)
* Line, selecting last: Addresses. (line 22)
* List pattern space: Other Commands. (line 65)
* Mixing g and NUMBER modifiers in the s command: The "s" Command.
        (line 57)
* Next input line, append to pattern space: Other Commands. (line 105)
* Next input line, replace pattern space with: Common Commands.
        (line 44)
* Non-bugs, 0 address: Reporting Bugs. (line 103)
* Non-bugs, in-place editing: Reporting Bugs. (line 85)
* Non-bugs, localization-related: Reporting Bugs. (line 112)
* Non-bugs, N command on the last line: Reporting Bugs. (line 31)
* Non-bugs, regex syntax clashes: Reporting Bugs. (line 51)
* Parenthesized substrings: The "s" Command. (line 19)
* Pattern space, definition: Execution Cycle. (line 6)
* Perl-style regular expressions, multiline: Addresses. (line 54)
* Portability, comments: Common Commands. (line 15)
* Portability, line length limitations: Limitations. (line 6)
* Portability, N command on the last line: Reporting Bugs. (line 31)
* POSIXLY_CORRECT behavior, bracket expressions: Regular Expressions.
        (line 105)
* POSIXLY_CORRECT behavior, enabling: Invoking sed. (line 84)
* POSIXLY_CORRECT behavior, escapes: Escapes. (line 11)
* POSIXLY_CORRECT behavior, N command: Reporting Bugs. (line 46)
* Print first line from pattern space: Other Commands. (line 110)
* Printing line number: Other Commands. (line 62)
* Printing text unambiguously: Other Commands. (line 65)
* Quitting <1>: Extended Commands. (line 36)
* Quitting: Common Commands. (line 30)
* Range of lines: Addresses.
        (line 65)
* Range with start address of zero: Addresses. (line 78)
* Read next input line: Common Commands. (line 44)
* Read text from a file <1>: Extended Commands. (line 53)
* Read text from a file: Other Commands. (line 78)
* Reformat pattern space: Extended Commands. (line 26)
* Reformatting paragraphs: Extended Commands. (line 26)
* Replace hold space with copy of pattern space: Other Commands.
        (line 113)
* Replace pattern space with copy of hold space: Other Commands.
        (line 121)
* Replacing all text matching regexp in a line: The "s" Command.
        (line 51)
* Replacing only Nth match of regexp in a line: The "s" Command.
        (line 55)
* Replacing selected lines with other text: Other Commands. (line 52)
* Requiring GNU sed: Extended Commands. (line 69)
* Script structure: sed Programs. (line 6)
* Script, from a file: Invoking sed. (line 46)
* Script, from command line: Invoking sed. (line 41)
* sed program structure: sed Programs. (line 6)
* Selecting lines to process: Addresses. (line 6)
* Selecting non-matching lines: Addresses. (line 101)
* Several lines, selecting: Addresses. (line 65)
* Slash character, in regular expressions: Addresses. (line 41)
* Spaces, pattern and hold: Execution Cycle. (line 6)
* Special addressing forms: Addresses. (line 78)
* Standard input, processing as input: Invoking sed. (line 143)
* Stream editor: Introduction. (line 6)
* Subprocesses <1>: Extended Commands. (line 12)
* Subprocesses: The "s" Command. (line 85)
* Substitution of text, options: The "s" Command. (line 47)
* Text, appending: Other Commands. (line 27)
* Text, deleting: Common Commands. (line 36)
* Text, insertion: Other Commands. (line 46)
* Text, printing: Common Commands. (line 39)
* Text, printing after substitution: The "s" Command.
        (line 65)
* Text, writing to a file after substitution: The "s" Command.
        (line 78)
* Transliteration: Other Commands. (line 14)
* Unbuffered I/O, choosing: Invoking sed. (line 131)
* Usage summary, printing: Invoking sed. (line 28)
* Version, printing: Invoking sed. (line 24)
* Working on separate files: Invoking sed. (line 121)
* Write first line to a file: Extended Commands. (line 80)
* Write to a file: Other Commands. (line 88)
* Zero, as range start address: Addresses. (line 78)


File: sed.info,  Node: Command and Option Index,  Prev: Concept Index,  Up: Top

Command and Option Index
************************

This is an alphabetical list of all `sed' commands and command-line
options.

[index]
* Menu:

* # (comments): Common Commands. (line 12)
* --binary: Invoking sed. (line 93)
* --expression: Invoking sed. (line 41)
* --file: Invoking sed. (line 46)
* --follow-symlinks: Invoking sed. (line 104)
* --help: Invoking sed. (line 28)
* --in-place: Invoking sed. (line 51)
* --line-length: Invoking sed. (line 76)
* --quiet: Invoking sed. (line 34)
* --regexp-extended: Invoking sed. (line 113)
* --silent: Invoking sed. (line 34)
* --unbuffered: Invoking sed. (line 131)
* --version: Invoking sed. (line 24)
* -b: Invoking sed. (line 93)
* -e: Invoking sed. (line 41)
* -f: Invoking sed. (line 46)
* -i: Invoking sed. (line 51)
* -l: Invoking sed. (line 76)
* -n: Invoking sed. (line 34)
* -n, forcing from within a script: Common Commands. (line 20)
* -r: Invoking sed. (line 113)
* -u: Invoking sed. (line 131)
* : (label) command: Programming Commands.
        (line 14)
* = (print line number) command: Other Commands. (line 62)
* a (append text lines) command: Other Commands. (line 27)
* b (branch) command: Programming Commands.
        (line 18)
* c (change to text lines) command: Other Commands. (line 52)
* D (delete first line) command: Other Commands. (line 99)
* d (delete) command: Common Commands. (line 36)
* e (evaluate) command: Extended Commands. (line 12)
* G (appending Get) command: Other Commands. (line 125)
* g (get) command: Other Commands. (line 121)
* H (append Hold) command: Other Commands. (line 117)
* h (hold) command: Other Commands. (line 113)
* i (insert text lines) command: Other Commands. (line 46)
* L (fLow paragraphs) command: Extended Commands. (line 26)
* l (list unambiguously) command: Other Commands. (line 65)
* N (append Next line) command: Other Commands. (line 105)
* n (next-line) command: Common Commands. (line 44)
* P (print first line) command: Other Commands. (line 110)
* p (print) command: Common Commands. (line 39)
* q (quit) command: Common Commands. (line 30)
* Q (silent Quit) command: Extended Commands. (line 36)
* r (read file) command: Other Commands. (line 78)
* R (read line) command: Extended Commands. (line 53)
* s command, option flags: The "s" Command. (line 47)
* T (test and branch if failed) command: Extended Commands. (line 63)
* t (test and branch if successful) command: Programming Commands.
        (line 22)
* v (version) command: Extended Commands. (line 69)
* w (write file) command: Other Commands. (line 88)
* W (write first line) command: Extended Commands. (line 80)
* x (eXchange) command: Other Commands. (line 129)
* y (transliterate) command: Other Commands. (line 14)
* z (Zap) command: Extended Commands. (line 85)
* {} command grouping: Common Commands.
        (line 50)