* ABOUT BUGS

Before reporting a bug, please check the list of known bugs
and the list of oft-reported non-bugs (below).

Bugs and comments may be sent to bonzini (a] gnu.org; please
include in the Subject: header the first line of the output of
``sed --version''.

Please do not send a bug report like this:

    [while building frobme-1.3.4]
    $ configure
    sed: file sedscr line 1: Unknown option to 's'

If sed doesn't configure your favorite package, take a few extra
minutes to identify the specific problem and make a stand-alone test
case.

A stand-alone test case includes all the data necessary to perform the
test, and the specific invocation of sed that causes the problem.  The
smaller a stand-alone test case is, the better.  A test case should
not involve something as far removed from sed as ``try to configure
frobme-1.3.4''.  Yes, that is in principle enough information to look
for the bug, but that is not a very practical prospect.



* NON-BUGS

`N' command on the last line

  Most versions of sed exit without printing anything when the `N'
  command is issued on the last line of a file.  GNU sed instead
  prints pattern space before exiting unless, of course, the `-n'
  command switch has been specified.  More information on the reason
  behind this choice can be found in the Info manual.


regex syntax clashes (problems with backslashes)

  sed uses the POSIX basic regular expression syntax.  According to
  the standard, the meaning of some escape sequences is undefined in
  this syntax; notable in the case of GNU sed are `\|', `\+', `\?',
  `\`', `\'', `\<', `\>', `\b', `\B', `\w', and `\W'.

  As in all GNU programs that use POSIX basic regular expressions, sed
  interprets these escape sequences as meta-characters.  So, `x\+'
  matches one or more occurrences of `x'.  `abc\|def' matches either
  `abc' or `def'.
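
  The two escape sequences just described can be tried from a shell.
  This is a minimal sketch that assumes GNU sed is the `sed' on your
  PATH; the sample input strings and the variable names `plus' and
  `alt' are made up for illustration.

```shell
# Assumes GNU sed; other seds may treat `\+' and `\|' differently.

# `x\+' matches one or more occurrences of `x'.
plus=$(printf 'xxx y\n' | sed 's/x\+/Z/')

# `abc\|def' matches either `abc' or `def'.
alt=$(printf 'abc def ghi\n' | sed 's/abc\|def/-/g')

printf '%s\n%s\n' "$plus" "$alt"
```

  With GNU sed this should print `Z y' followed by `- - ghi'.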

  This syntax may cause problems when running scripts written for other
  seds.  Some sed programs have been written with the assumption that
  `\|' and `\+' match the literal characters `|' and `+'.  Such scripts
  must be modified by removing the spurious backslashes if they are to
  be used with recent versions of sed (not only GNU sed).

  On the other hand, some scripts use `s|abc\|def||g' to remove occurrences
  of _either_ `abc' or `def'.  While this worked until sed 4.0.x, newer
  versions interpret this as removing the string `abc|def'.  This is
  again undefined behavior according to POSIX, but this interpretation
  is arguably more robust: the older one, for example, required that
  the regex matcher parse `\/' as `/' in the common case of escaping
  a slash, which is again undefined behavior; the new behavior avoids
  this, and this is good because the regex matcher is only partially
  under our control.

  In addition, GNU sed supports several escape characters (some of
  which are multi-character) to insert non-printable characters
  in scripts (`\a', `\c', `\d', `\o', `\r', `\t', `\v', `\x').  These
  can cause similar problems with scripts written for other seds.


-i clobbers read-only files

  In short, `sed d -i' will let one delete the contents of
  a read-only file, and in general the `-i' option will let
  one clobber protected files.  This is not a bug, but rather a
  consequence of how the Unix filesystem works.

  The permissions on a file say what can happen to the data
  in that file, while the permissions on a directory say what can
  happen to the list of files in that directory.
  `sed -i' will never open for writing a file that is already on disk;
  rather, it works on a temporary file that is finally renamed
  to the original name.  If you rename or delete files, you're actually
  modifying the contents of the directory, so the operation depends on
  the permissions of the directory, not of the file.  For this same
  reason, sed will not let one use `-i' on a writeable file in a
  read-only directory, and will break hard or symbolic links when
  `-i' is used on such a file.


`0a' does not work (gives an error)

  There is no line 0.  0 is a special address that is only used to treat
  addresses like `0,/RE/' as active when the script starts: if you
  write `1,/abc/d' and the first line includes the word `abc', then
  that match would be ignored because address ranges must span at least
  two lines (barring the end of the file); but what you probably wanted is
  to delete every line up to the first one including `abc', and this
  is obtained with `0,/abc/d'.


`[a-z]' is case insensitive
`s/.*//' does not clear pattern space

  You are encountering problems with locales.  POSIX mandates that `[a-z]'
  uses the current locale's collation order -- in C parlance, that means
  strcoll(3) instead of strcmp(3).  Some locales have a case insensitive
  strcoll, others don't.

  Another problem is that `[a-z]' tries to use collation symbols.  This
  only happens if you are on the GNU system, using GNU libc's regular
  expression matcher instead of compiling the one supplied with GNU sed.
  In a Danish locale, for example, the regular expression `^[a-z]$'
  matches the string `aa', because `aa' is a single collating symbol that
  comes after `a' and before `b'; `ll' behaves similarly in Spanish
  locales, or `ij' in Dutch locales.

  Another common localization-related problem happens if your input stream
  includes invalid multibyte sequences.
  POSIX mandates that such sequences are _not_ matched by `.', so that
  `s/.*//' will not clear pattern space as you would expect.  In fact,
  there is no way to clear sed's buffers in the middle of the script in
  most multibyte locales (including UTF-8 locales).  For this reason,
  GNU sed provides a `z' command (for `zap') as an extension.

  To work around both of these problems, which may cause bugs
  in shell scripts, you can set the LC_ALL environment variable to `C',
  or set the locale on a more fine-grained basis with the other LC_*
  environment variables.
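
  The workarounds above can be sketched as follows.  This assumes GNU
  sed (the `z' command is a GNU extension and may be missing from other
  seds or from old releases); the sample inputs and the variable names
  `lower', `cleared' and `zapped' are made up for illustration.  Here
  LC_ALL is set for a single sed invocation only, so the rest of the
  script keeps its locale.

```shell
# Assumes GNU sed; the sample inputs are made up for illustration.

# 1. In the C locale `[a-z]' matches only the ASCII lowercase letters.
lower=$(printf 'aBc\n' | LC_ALL=C sed 's/[a-z]/_/g')

# 2. In the C locale every byte is a character, so `.' also matches the
#    byte \377, which would be an invalid sequence in a UTF-8 locale.
cleared=$(printf 'a\377b\n' | LC_ALL=C sed 's/.*//')

# 3. The `z' command (GNU extension) empties pattern space in any locale.
zapped=$(printf 'a\377b\n' | sed 'z')

printf '[%s] [%s] [%s]\n' "$lower" "$cleared" "$zapped"
```

  With GNU sed this should print `[_B_] [] []'.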