* ABOUT BUGS

Before reporting a bug, please check the list of known bugs
and the list of oft-reported non-bugs (below).

Bugs and comments may be sent to bonzini@gnu.org; please
include in the Subject: header the first line of the output of
``sed --version''.

Please do not send a bug report like this:

	[while building frobme-1.3.4]
	$ configure
	sed: file sedscr line 1: Unknown option to 's'

If sed doesn't configure your favorite package, take a few extra
minutes to identify the specific problem and make a stand-alone test
case.

A stand-alone test case includes all the data necessary to perform the
test, and the specific invocation of sed that causes the problem.  The
smaller a stand-alone test case is, the better.  A test case should
not involve something as far removed from sed as ``try to configure
frobme-1.3.4''.  Yes, that is in principle enough information to look
for the bug, but that is not a very practical prospect.

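As a rough illustration, a stand-alone test case can be as small as
the sketch below.  The input data, the `2p' script and the variable
names are invented for this example and do not describe a real bug;
the point is only that the data and the exact sed invocation travel
together.

```shell
# Hypothetical stand-alone test case: the input and the script are
# invented for illustration and do not reproduce a known bug.
input='one
two
three'

# The exact invocation under test, with its input supplied inline.
actual=$(printf '%s\n' "$input" | sed -n '2p')

# State what you expected next to what you got.
expected='two'
printf 'expected: %s\nactual:   %s\n' "$expected" "$actual"
```

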
* NON-BUGS

`N' command on the last line

  Most versions of sed exit without printing anything when the `N'
  command is issued on the last line of a file.  GNU sed instead
  prints pattern space before exiting, unless of course the `-n'
  command-line option has been specified.  More information on the
  reason behind this choice can be found in the Info manual.

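  A short sketch of the difference, assuming GNU sed with
  POSIXLY_CORRECT unset; the three-line input is made up for the
  example.  Writing `$!N' instead of `N' skips the command on the last
  line, so the result no longer depends on how a given sed treats the
  missing next line.

```shell
# Three input lines, so the second `N' runs on the last line.
gnu_out=$(printf '1\n2\n3\n' | sed 'N;s/\n/+/')

# Portable form: skip `N' on the last line, so nothing depends on
# how a given sed handles the missing next line.
portable_out=$(printf '1\n2\n3\n' | sed '$!N;s/\n/+/')

printf '%s\n' "$gnu_out"
printf '%s\n' "$portable_out"
```

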
regex syntax clashes (problems with backslashes)

  sed uses the POSIX basic regular expression syntax.  According to
  the standard, the meaning of some escape sequences is undefined in
  this syntax; notable in the case of GNU sed are `\|', `\+', `\?',
  `\`', `\'', `\<', `\>', `\b', `\B', `\w', and `\W'.

  As in all GNU programs that use POSIX basic regular expressions, sed
  interprets these escape sequences as meta-characters.  So, `x\+'
  matches one or more occurrences of `x', and `abc\|def' matches either
  `abc' or `def'.

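  The two cases just mentioned can be tried directly.  This is a
  minimal sketch assuming GNU sed; the sample strings and variable
  names are made up for the example.

```shell
# `\+' means "one or more" in GNU sed.
plus_out=$(printf 'xxxy\n' | sed 's/x\+/Z/')

# `\|' means alternation in GNU sed.
alt_out=$(printf 'abc def ghi\n' | sed 's/abc\|def/-/g')

printf '%s\n' "$plus_out" "$alt_out"
```
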
  This syntax may cause problems when running scripts written for other
  seds.  Some sed programs have been written with the assumption that
  `\|' and `\+' match the literal characters `|' and `+'.  Such scripts
  must be modified by removing the spurious backslashes if they are to
  be used with recent versions of sed (not only GNU sed).

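  A sketch of the corrected form, with made-up sample strings: once
  the backslash is removed, `+' and `|' are ordinary characters in a
  basic regular expression.

```shell
# In a basic regular expression an unescaped `+' or `|' is an
# ordinary character, so no backslash is needed to match it.
literal_plus=$(printf 'a+b\n' | sed 's/+/ plus /')
literal_bar=$(printf 'a|b\n' | sed 's/|/ or /')

printf '%s\n' "$literal_plus" "$literal_bar"
```
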
  On the other hand, some scripts use `s|abc\|def||g' to remove occurrences
  of _either_ `abc' or `def'.  While this worked until sed 4.0.x, newer
  versions interpret this as removing the string `abc|def'.  This is
  again undefined behavior according to POSIX, but the new interpretation
  is arguably more robust.  The older one, for example, required the
  regex matcher to parse `\/' as `/' in the common case of escaping
  a slash, which is itself undefined behavior; the new behavior avoids
  this, which is good because the regex matcher is only partially
  under our control.

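  The effect of the delimiter can be seen side by side.  This sketch
  assumes a GNU sed newer than 4.0.x; the input string is invented for
  the example.

```shell
# With `|' as the delimiter, `\|' is a literal `|', so only the
# string `abc|def' is removed.
bar_delim=$(printf 'abc|def,abc,def\n' | sed 's|abc\|def||g')

# With another delimiter, `\|' is alternation again and every
# `abc' and `def' is removed.
slash_delim=$(printf 'abc|def,abc,def\n' | sed 's/abc\|def//g')

printf '%s\n' "$bar_delim" "$slash_delim"
```
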
  In addition, GNU sed supports several escape sequences (some of
  which are multi-character) to insert non-printable characters
  in scripts (`\a', `\c', `\d', `\o', `\r', `\t', `\v', `\x').  These
  can cause similar problems with scripts written for other seds.

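  A few of these escapes in use, as a sketch that assumes GNU sed;
  other seds may treat the same sequences as plain letters.  The
  sample input is made up.

```shell
# `\t' in the replacement inserts a tab character.
tab_out=$(printf 'a b\n' | sed 's/ /\t/')

# `\x41', `\o101' and `\d065' all name the character `A'
# (hexadecimal, octal and decimal respectively).
hex_out=$(printf 'a\n' | sed 's/a/\x41/')
oct_out=$(printf 'a\n' | sed 's/a/\o101/')
dec_out=$(printf 'a\n' | sed 's/a/\d065/')

printf '%s\n' "$tab_out" "$hex_out" "$oct_out" "$dec_out"
```

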
-i clobbers read-only files

  In short, `sed d -i' will let one delete the contents of
  a read-only file, and in general the `-i' option will let
  one clobber protected files.  This is not a bug, but rather a
  consequence of how the Unix filesystem works.

  The permissions on a file say what can happen to the data
  in that file, while the permissions on a directory say what can
  happen to the list of files in that directory.  `sed -i'
  will never open for writing a file that is already on disk;
  rather, it works on a temporary file that is finally renamed
  to the original name.  If you rename or delete files, you're actually
  modifying the contents of the directory, so the operation depends on
  the permissions of the directory, not of the file.  For this same
  reason, sed will not let one use `-i' on a writable file in a
  read-only directory, and will break hard or symbolic links when
  `-i' is used on such a file.

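  The behavior can be reproduced in a scratch directory.  This is a
  sketch assuming GNU sed and a mktemp that supports `-d'; the file
  names `file' and `link' are arbitrary.

```shell
# Work in a scratch directory that we own and can write to.
dir=$(mktemp -d)
printf 'old\n' > "$dir/file"
ln "$dir/file" "$dir/link"      # second hard link to the same data
chmod 444 "$dir/file"           # make the file read-only

# sed writes a temporary file in $dir and renames it over `file';
# only the directory's permissions matter for that.
sed -i 's/old/new/' "$dir/file"

file_text=$(cat "$dir/file")    # replaced contents
link_text=$(cat "$dir/link")    # the old data, now a separate file

printf 'file: %s\nlink: %s\n' "$file_text" "$link_text"
rm -rf "$dir"
```

  The read-only file ends up with the new contents, while the other
  hard link still holds the old data because the link was broken.

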
`0a' does not work (gives an error)

  There is no line 0.  0 is a special address that is only used to treat
  addresses like `0,/RE/' as active when the script starts.  If you
  write `1,/abc/d' and the first line includes the word `abc', then
  that match is ignored because address ranges must span at least
  two lines (barring the end of the file); but what you probably wanted is
  to delete every line up to the first one including `abc', and this
  is obtained with `0,/abc/d'.

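  The two address forms can be compared on an input whose first line
  already contains `abc'.  A sketch assuming GNU sed (the `0,/RE/'
  address is an extension); the four-line input is made up.

```shell
# `abc' is on the first line.  With `1,/abc/' the end of the range is
# searched for from line 2 on, so the range runs to the next `abc'.
from_one=$(printf 'abc\nx\nabc\ny\n' | sed '1,/abc/d')

# With `0,/abc/' the first line may end the range.
from_zero=$(printf 'abc\nx\nabc\ny\n' | sed '0,/abc/d')

printf '%s\n' "$from_one"
printf '%s\n' "$from_zero"
```

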
`[a-z]' is case insensitive
`s/.*//' does not clear pattern space

  You are encountering problems with locales.  POSIX mandates that `[a-z]'
  uses the current locale's collation order -- in C parlance, that means
  strcoll(3) instead of strcmp(3).  Some locales have a case insensitive
  strcoll, others don't.

  Another problem is that `[a-z]' tries to use collation symbols.  This
  only happens if you are on the GNU system, using GNU libc's regular
  expression matcher instead of compiling the one supplied with GNU sed.
  In a Danish locale, for example, the regular expression `^[a-z]$'
  matches the string `aa', because `aa' is a single collating symbol that
  comes after `a' and before `b'; `ll' behaves similarly in Spanish
  locales, or `ij' in Dutch locales.

  Another common localization-related problem happens if your input stream
  includes invalid multibyte sequences.  POSIX mandates that such
  sequences are _not_ matched by `.', so that `s/.*//' will not clear
  pattern space as you would expect.  In fact, there is no way to clear
  sed's buffers in the middle of the script in most multibyte locales
  (including UTF-8 locales).  For this reason, GNU sed provides a `z'
  command (for `zap') as an extension.

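  A minimal sketch of `z', assuming a GNU sed recent enough to provide
  the command.  The input contains the byte \377, which is not valid
  UTF-8 on its own; the variable name is arbitrary.

```shell
# `z' empties pattern space no matter what bytes it holds; the
# following `s' then sees an empty line.
zapped=$(printf 'a\377b\n' | sed 'z;s/^$/empty/')

printf '%s\n' "$zapped"
```
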
  To work around both of these problems, which may cause bugs
  in shell scripts, you can set the LC_ALL environment variable to `C',
  or set the locale on a more fine-grained basis with the other LC_*
  environment variables.

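  Setting the variable for a single invocation is enough, as in this
  sketch; the sample inputs and variable names are made up.

```shell
# In the C locale `[a-z]' covers exactly the ASCII lower-case letters.
range_out=$(printf 'aBc\n' | LC_ALL=C sed 's/[a-z]//g')

# In the C locale every byte is a character, so `.*' also spans the
# byte \377, which is not valid UTF-8 on its own.
clear_out=$(printf 'a\377b\n' | LC_ALL=C sed 's/.*/cleared/')

printf '%s\n' "$range_out" "$clear_out"
```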