The small set of tools you can expect to find on any machine can still include some limitations you should be aware of.
Don't leave white space before the parentheses in user function calls; GNU awk rejects it:
$ gawk 'function die () { print "Aaaaarg!" }
BEGIN { die () }'
gawk: cmd. line:2: BEGIN { die () }
gawk: cmd. line:2: ^ parse error
$ gawk 'function die () { print "Aaaaarg!" }
BEGIN { die() }'
Aaaaarg!
If you want your program to be deterministic, don't depend on the
traversal order of `for' on arrays:
$ cat for.awk
END {
arr["foo"] = 1
arr["bar"] = 1
for (i in arr)
print i
}
$ gawk -f for.awk </dev/null
foo
bar
$ nawk -f for.awk </dev/null
bar
foo
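When the order of the output matters, one option is to impose it explicitly instead of relying on the traversal order. The sketch below pipes the keys through sort; it assumes an awk is available on PATH (overridable through the AWK variable) and that sorted order is acceptable for the task at hand.

```sh
# Collect the array keys, then sort them so the result does not depend
# on the traversal order chosen by a particular awk implementation.
keys=`${AWK-awk} 'END {
  arr["foo"] = 1
  arr["bar"] = 1
  for (i in arr)
    print i
}' </dev/null | sort`
echo "$keys"
```

With the sort in place, gawk and nawk should produce the same listing for this script.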
Some AWK implementations, such as HP-UX 11.0's native one, have regex engines that mishandle inner anchors:
$ echo xfoo | $AWK '/foo|^bar/ { print }'
$ echo bar | $AWK '/foo|^bar/ { print }'
bar
$ echo xfoo | $AWK '/^bar|foo/ { print }'
xfoo
$ echo bar | $AWK '/^bar|foo/ { print }'
bar
Either do not depend on such patterns (i.e., use `/^(.*foo|bar)/'),
or use a simple test to reject such AWK implementations.
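Such a test could look like the following sketch, which reuses the failing case from the transcript above. The variable name awk_inner_anchor_ok is hypothetical, and the fallback to plain awk when AWK is unset is an assumption for standalone use.

```sh
# Probe the awk named by $AWK (default: awk) with the pattern that
# trips fragile regex engines; a sound awk prints the input line back.
AWK=${AWK-awk}
if test "`echo xfoo | $AWK '/foo|^bar/ { print }'`" = xfoo; then
  awk_inner_anchor_ok=yes
else
  awk_inner_anchor_ok=no
fi
echo "inner anchors handled: $awk_inner_anchor_ok"
```

A configure-time check could then refuse an awk for which the variable ends up set to no.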
AC_PROG_CC_C_O.
When a compilation such as `cc -o foo foo.c' fails, some compilers (such as CDS on Reliant UNIX) leave a foo.o behind.
HP-UX cc doesn't accept .S files to preprocess and assemble. `cc -c foo.S' will appear to succeed, but in fact does nothing.
The default executable, produced by `cc foo.c', can be
The C compiler's traditional name is cc, but other names like
gcc are common. POSIX 1003.1-2001 specifies the
name c99, but older POSIX editions specified
c89 and anyway these standard names are rarely used in
practice. Typically the C compiler is invoked from makefiles that use
`$(CC)', so the value of the `CC' make variable selects the
compiler name.
utime, which has 1-second resolution, but some newer
cp implementations use utimes, which has
1-microsecond resolution. These newer implementations include GNU
coreutils 5.0.91 or later, and Solaris 8 (sparc) patch 109933-02 or
later. Unfortunately as of September 2003 there is still no system
call to set time stamps to the full nanosecond resolution.
SunOS cp does not support -f, although its
mv does. It's possible to deduce why mv and
cp are different with respect to -f. mv
prompts by default before overwriting a read-only file. cp
does not. Therefore, mv requires a -f option, but
cp does not. mv and cp behave differently
with respect to read-only files because the simplest form of
cp cannot overwrite a read-only file, but the simplest form of
mv can. This is because cp opens the target for
write access, whereas mv simply calls link (or, in
newer systems, rename).
Bob Proulx notes that `cp -p' always tries to copy ownerships. But whether it actually does copy ownerships or not is a system-dependent policy decision implemented by the kernel. If the kernel allows it then it happens. If the kernel does not allow it then it does not happen. It is not something cp itself has control over.
In SysV any user can chown files to any other user, and SysV also had a non-sticky /tmp. That undoubtedly derives from the heritage of SysV in a business environment without hostile users. BSD changed this to be a more secure model where only root can chown files and a sticky /tmp is used. That undoubtedly derives from the heritage of BSD in a campus environment.
Linux by default follows BSD, but it can be configured to allow
chown. HP-UX, as an alternate example, follows SysV, but it can
be configured to use the modern security model and disallow
chown. Since it is an administrator-configurable parameter,
you can't use the name of the kernel as an indicator of the behavior.
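Some versions of date do not recognize special `%' directives and, instead of complaining, pass them through unchanged:
$ uname -a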
OSF1 medusa.sis.pasteur.fr V5.1 732 alpha
$ date "+%s"
%s
Some implementations, such as Tru64's, fail when comparing to
/dev/null. Use an empty file instead.
AS_DIRNAME (see Programming in M4sh). For example:
dir=`dirname "$file"` # This is not portable.
dir=`AS_DIRNAME(["$file"])` # This is more portable.
This handles a few subtleties in the standard way required by POSIX. For example, under UN*X, should `dirname //1' give `/'? Paul Eggert answers:
No, under some older flavors of Unix, leading `//' is a special path name: it refers to a “super-root” and is used to access other machines' files. Leading `///', `////', etc. are equivalent to `/'; but leading `//' is special. I think this tradition started with Apollo Domain/OS, an OS that is still in use on some older hosts.
POSIX allows but does not require the special treatment for `//'. It says that the behavior of dirname on path names of the form `//([^/]+/*)?' is implementation defined. In these cases, GNU dirname returns `/', but it's more portable to return `//' as this works even on those older flavors of Unix.
grep -E. To work around this problem, invoke
AC_PROG_EGREP and then use $EGREP.
The empty alternative is not portable; use `?' instead. For instance, with Digital Unix v5.0:
> printf "foo\n|foo\n" | $EGREP '^(|foo|bar)$'
|foo
> printf "bar\nbar|\n" | $EGREP '^(foo|bar|)$'
bar|
> printf "foo\nfoo|\n|bar\nbar\n" | $EGREP '^(foo||bar)$'
foo
|bar
$EGREP also suffers the limitations of grep.
Don't use length, substr, match and index.
expr '' \| ''
GNU/Linux and POSIX.2-1992 return the empty string for this case, but traditional Unix returns `0' (Solaris is one such example). In POSIX.1-2001, the specification has been changed to match traditional Unix's behavior (which is bizarre, but it's too late to fix this). Please note that the same problem arises when the empty string results from a computation, as in:
expr bar : foo \| foo : bar
Avoid this portability problem by avoiding the empty string.
The POSIX standard is ambiguous as to whether `expr 'a' : '\(b\)'' outputs `0' or the empty string. In practice, it outputs the empty string on most platforms, but portable scripts should not assume this. For instance, the QNX 4.25 native expr returns `0'.
One might think that a way to get a uniform behavior would be to use the empty string as a default value:
expr a : '\(b\)' \| ''
Unfortunately this behaves exactly as the original expression; see the `expr (`:')' entry for more information.
Older expr implementations (e.g., SunOS 4 expr and Solaris 8 /usr/ucb/expr) have a silly length limit that causes expr to fail if the matched substring is longer than 120 bytes. In this case, you might want to fall back on `echo|sed' if expr fails.
Don't leave, there is some more!
The QNX 4.25 expr, in addition to preferring `0' to the empty string, has a funny behavior in its exit status: it's always 1 when parentheses are used!
$ val=`expr 'a' : 'a'`; echo "$?: $val"
0: 1
$ val=`expr 'a' : 'b'`; echo "$?: $val"
1: 0
$ val=`expr 'a' : '\(a\)'`; echo "$?: $val"
1: a
$ val=`expr 'a' : '\(b\)'`; echo "$?: $val"
1: 0
In practice this can be a big problem if you are ready to catch failures of expr programs with some other method (such as using sed), since you may get the result twice. For instance:
$ expr 'a' : '\(a\)' || echo 'a' | sed 's/^\(a\)$/\1/'
will output `a' on most hosts, but `aa' on QNX 4.25. A
simple workaround consists of testing expr and using a variable
set to either expr or false according to the result.
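The workaround can be sketched as follows. The variable names my_expr, file, and dir are hypothetical, as is the sample path; the extraction shown (the directory part of a file name) is only one possible use.

```sh
# Probe once: does expr exit with status 0 on a successful match that
# uses parentheses?  If not, substitute false so that only the sed
# fallback ever produces output.
if expr a : '\(a\)' >/dev/null 2>&1; then
  my_expr=expr
else
  my_expr=false
fi

file=/usr/local/bin/tool
# Try the probed command first; fall back to sed when it fails.
dir=`$my_expr "X$file" : 'X\(.*\)/[^/]*$' ||
     echo "X$file" | sed 's|^X\(.*\)/[^/]*$|\1|'`
echo "$dir"
```

Because false prints nothing and always fails, a host with the QNX behavior goes straight to the sed branch and the result is emitted only once.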
grep -F. To work around this problem, invoke
AC_PROG_FGREP and then use $FGREP.
The replacement of `{}' is guaranteed only if the argument is exactly {}, not if it's only a part of an argument. For instance, on DU, HP-UX 10.20, and HP-UX 11:
$ touch foo
$ find . -name foo -exec echo "{}-{}" \;
{}-{}
while GNU find reports `./foo-./foo'.
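One way to stay within the guaranteed behavior is to pass {} as a separate argument to a small shell command and build the string there. This is only a sketch: the scratch directory name is made up for the demonstration, and it assumes sh is found on PATH.

```sh
# Set up a scratch directory holding a single file named foo.
demo=./find-demo.$$
mkdir "$demo" && touch "$demo/foo"

# {} appears as an argument of its own; the inner shell receives the
# file name as $1 and performs the concatenation itself.
out=`cd "$demo" && find . -name foo -exec sh -c 'echo "$1-$1"' sh {} \;`
echo "$out"

rm -rf "$demo"
```

The second sh on the find command line only fills in $0 of the inner shell, so that the file name lands in $1.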
grep to /dev/null. Check the exit
status of grep to determine whether it found a match.
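A minimal sketch of that idiom follows; the sample input and the variable name found are illustrative only.

```sh
# Discard anything grep prints and decide from its exit status alone.
if echo 'hello world' | grep hello >/dev/null 2>&1; then
  found=yes
else
  found=no
fi
echo "match found: $found"
```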
Don't use multiple regexps with -e, as some grep implementations
honor only the last pattern (e.g., IRIX 6.5 and Solaris 2.5.1). In any
case, Stardent Vistra SVR4 grep lacks -e altogether. Instead, use
extended regular expressions and alternation.
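For example, two patterns can be folded into a single extended regular expression, as in this sketch. In a configure script EGREP would normally be set by AC_PROG_EGREP; the `grep -E' default used here is an assumption for running the snippet on its own.

```sh
# One extended regular expression with alternation stands in for
# several -e patterns.
EGREP=${EGREP-grep -E}
matches=`printf 'foo\nbar\nbaz\n' | $EGREP 'foo|bar'`
echo "$matches"
```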
Don't rely on -w, as IRIX 6.5.16m's grep does not
support it.
For versions of DJGPP before 2.04, ln emulates soft links
to executables by generating a stub that in turn calls the real
program. This feature also works with nonexistent files like in the
Unix spec. So `ln -s file link' will generate link.exe,
which will attempt to call file.exe if run. But this feature only
works for executables, so `cp -p' is used instead for these
systems. DJGPP versions 2.04 and later have full symlink support.
Modern practice is for all diagnostics to go to standard error, but
traditional `ls foo' prints the message `foo not found' to
standard output if foo does not exist. Be careful when writing
shell commands like `sources=`ls *.c 2>/dev/null`', since with
traditional ls this is equivalent to `sources="*.c not
found"' if there are no `.c' files.
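One way to sidestep the problem is to let the shell expand the pattern and to filter out the unexpanded case by hand. The following is a sketch only; the scratch directory is created empty on purpose to exercise the case with no `.c' files.

```sh
# An empty scratch directory: no .c files to find.
demo=./ls-demo.$$
mkdir "$demo"

sources=
for f in "$demo"/*.c; do
  # With no match the pattern stays literal, so skip nonexistent names.
  test -f "$f" || continue
  sources="$sources $f"
done
echo "sources:$sources"

rm -rf "$demo"
```

Since no diagnostic is ever captured, the list stays empty when nothing matches.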
AS_MKDIR_P(filename) (see Programming in M4sh).
POSIX does not clearly specify whether `mkdir -p foo' should succeed when foo is a symbolic link to an already-existing directory. GNU Coreutils 5.1.0 mkdir succeeds, but Solaris 9 mkdir fails.
Not all mkdir -p implementations are thread-safe. When it is not
and you call mkdir -p a/b and mkdir -p a/c at the same
time, both will detect that a/ is missing, one will create
a/, then the other will try to create a/ and die with a
File exists error. At least Solaris 8, NetBSD 1.6, and OpenBSD
3.4 have an unsafe mkdir -p. GNU Coreutils (since Fileutils
version 4.0c), FreeBSD 5.0, and NetBSD-current are known to have a
race-free mkdir -p. This possible race is harmful in parallel
builds when several Makefile rules call mkdir -p to
construct directories. You may use mkinstalldirs or
install-sh -d as a safe replacement, provided these scripts are
recent enough (the copies shipped with Automake 1.8.3 are OK, those from
older versions are not thread-safe either).
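Where neither script is at hand, the failure mode described above can at least be tolerated by checking the result and retrying once. The helper name mkdir_p_retry is hypothetical, and this sketch narrows the window without making mkdir -p itself race-free.

```sh
# If mkdir -p fails because a concurrent process created a parent
# first, accept the outcome when the target now exists; otherwise
# try once more, since the parent is in place by then.
mkdir_p_retry () {
  mkdir -p "$1" 2>/dev/null || test -d "$1" || mkdir -p "$1"
}

demo=./mkdir-demo.$$
mkdir_p_retry "$demo/a/b" && mkdir_p_retry "$demo/a/c"
made=no
test -d "$demo/a/b" && test -d "$demo/a/c" && made=yes
echo "directories created: $made"

rm -rf "$demo"
```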
Moving individual files between file systems is portable (it was in V6), but it is not always atomic: when doing `mv new existing', there's a critical section where neither the old nor the new version of existing actually exists.
Be aware that moving files from /tmp can sometimes cause
undesirable (but perfectly valid) warnings, even if you created these
files. On some systems, a file created in /tmp is given the group
wheel, of which you may not be a member. So the file is copied,
and then the chgrp fails:
$ touch /tmp/foo
$ mv /tmp/foo .
error-->mv: ./foo: set owner/group (was: 3830/0): Operation not permitted
$ echo $?
0
$ ls foo
foo
This behavior conforms to POSIX:
“If the duplication of the file characteristics fails for any reason, mv shall write a diagnostic message to standard error, but this failure shall not cause mv to modify its exit status.”
Moving directories across mount points is not portable; use cp and rm instead.
Moving/Deleting open files isn't portable. The following can't be done on DOS/WIN32:
exec > foo
mv foo bar
nor can
exec > foo
rm -f foo
Sed scripts should not use branch labels longer than 8 characters and should not contain comments.
Don't include extra `;', as some sed implementations, such as NetBSD 1.4.2's, try to interpret the second one as a command:
$ echo a | sed 's/x/x/;;s/x/x/'
sed: 1: "s/x/x/;;s/x/x/": invalid command code ;
Input lines should be of reasonable length, since some sed implementations have an input buffer limited to 4000 bytes.
Alternation, `\|', is common but POSIX does not require its support, so it should be avoided in portable scripts. Solaris 8 sed does not support alternation; e.g., `sed '/a\|b/d'' deletes only lines that contain the literal string `a|b'.
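When the alternatives select lines for the same command, the alternation can usually be replaced by one command per alternative, as in this sketch with made-up input:

```sh
# Delete lines containing 'a' or 'b' without relying on \|.
kept=`printf 'a\nb\nc\n' | sed -e '/a/d' -e '/b/d'`
echo "$kept"
```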
Anchors (`^' and `$') inside groups are not portable.
Nested parenthesization in patterns (e.g., `\(\(a*\)b*\)') is quite portable to modern hosts, but is not supported by some older sed implementations like SVR3.
Of course the option -e is portable, but it is not needed. No valid Sed program can start with a dash, so it does not help disambiguate. Its sole use is to help enforce indentation, as in:
sed -e instruction-1 \
-e instruction-2
as opposed to
sed instruction-1;instruction-2
Contrary to yet another urban legend, you may portably use `&' in
the replacement part of the s command to mean “what was
matched”. All descendants of Bell Labs' V7 sed (at least; we
don't have first-hand experience with older seds) have
supported it.
POSIX requires that you must not have any white space between `!' and the following command. It is OK to have blanks between the address and the `!'. For instance, on Solaris 8:
$ echo "foo" | sed -n '/bar/ ! p'
error-->Unrecognized command: /bar/ ! p
$ echo "foo" | sed -n '/bar/! p'
error-->Unrecognized command: /bar/! p
$ echo "foo" | sed -n '/bar/ !p'
foo
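Some sed implementations fail to reset their t flag when starting a new cycle. With such a sed, if you run the following script (the trailing letters and numbers are reference labels, not part of the script or the input):
s/keep me/kept/g # a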
t end # b
s/.*/deleted/g # c
: end # d
on
delete me # 1
delete me # 2
keep me # 3
delete me # 4
you get
deleted
delete me
kept
deleted
instead of
deleted
deleted
kept
deleted
Why? When processing line 1, command a matches and therefore sets the t flag, b jumps to d, and the output is produced. When processing line 2, the t flag is still set (this is the bug). Command a fails to match, but sed is not supposed to clear the t flag when a substitution fails. Command b sees that the flag is set, therefore it clears it and jumps to d, hence you get `delete me' instead of `deleted'. When processing line 3, t is clear, a matches, so the flag is set, hence b clears the flag and jumps. Finally, since the flag is clear, line 4 is processed properly.
There are two things one should remember about `t' in sed. Firstly, always remember that `t' jumps if some substitution succeeded, not only the immediately preceding substitution. Therefore, always use a fake `t clear; : clear' to reset the t flag where needed.
Secondly, you cannot rely on sed to clear the flag at each new cycle.
One portable implementation of the script above is:
t clear
: clear
s/keep me/kept/g
t end
s/.*/deleted/g
: end
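To try the portable script, it can be fed the four sample lines directly. The sketch below merely wraps the script above in a pipeline; the variable name out is illustrative.

```sh
# Run the portable variant on the sample input.  The leading
# 't clear; : clear' pair resets the t flag at the start of each cycle.
out=`printf 'delete me\ndelete me\nkeep me\ndelete me\n' | sed 't clear
: clear
s/keep me/kept/g
t end
s/.*/deleted/g
: end'`
echo "$out"
```

The result should be the second listing shown earlier (`deleted', `deleted', `kept', `deleted'), whether or not the sed in use resets the flag on its own.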
utime or
utimes system call, which can result in the same kind of
timestamp truncation problems that `cp -p' has.
On some old BSD systems, touch or any command that
results in an empty file does not update the timestamps, so use a
command like echo as a workaround.
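For instance, a stamp file can be refreshed by writing a word into it, as sketched here. The file name and its contents are arbitrary choices for the demonstration.

```sh
stamp=./stamp-demo.$$
# Writing some content forces a timestamp update even where creating
# or touching an empty file would not.
echo timestamp > "$stamp"
written=no
test -s "$stamp" && written=yes
echo "stamp file written: $written"
rm -f "$stamp"
```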
GNU touch 3.16r (and presumably all before that) fails to work on SunOS 4.1.3 when the empty file is on an NFS-mounted 4.2 volume.