Discussion:
[PATCH] maint.mk: less syntax-check noise when SIGPIPE is ignored
Eric Blake
2014-07-08 19:42:43 UTC
For a project with enough files, such as libvirt, vc-list-files
can produce so much output that earlier parts of a pipeline get
SIGPIPE when later parts do a quick filter. Also,
many buildbot environments (annoyingly) ignore SIGPIPE, which
causes a number of tools to be rather chatty about reporting
EPIPE write failures. It doesn't help that POSIX has standardized
that the shell is unable to revert SIGPIPE to unignored status
if it inherits it as ignored - otherwise, the solution would just
be to re-enable SIGPIPE anywhere we expect to benefit from early
filtering exits. Here's a short demonstration:

$ ( trap '' PIPE; build-aux/vc-list-files | grep -l '\.c$' >/dev/null)
sed: couldn't write 16 items to stdout: Broken pipe

and a link to the much larger buildbot results against libvirt
which provoked this patch:
http://honk.sigxcpu.org:8001/job/libvirt-syntax-check/2465/

But look at the above example: we are piping data to grep -l,
and then discarding that output. At most, data | grep -l will
output "(standard input)", and exit early if the first match
is found before the end of a page (causing SIGPIPE to the process
feeding the pipe). It makes much more sense to use grep -l when
searching for a subset of files that have a match among a larger
set of file names passed as arguments, and NOT when used to
filter stdin. Sure, we're burning a bit more CPU power by
processing the full list instead of exiting early, but at least
it cuts down on the noise.
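
A minimal sketch of the difference described above, not part of the patch itself. The list_files function and its file names are hypothetical stand-ins for the output of vc-list-files; the "(standard input)" label is forced to the C locale so it is not translated.

```shell
# Hypothetical file list standing in for the output of vc-list-files.
list_files() { printf '%s\n' src/foo.c src/bar.h src/baz.c; }

# With -l on stdin, grep prints only a label and may stop reading at the
# first match, which is what lets the writer hit a closed pipe.
label=$(list_files | LC_ALL=C grep -l '\.c$')
echo "$label"

# Without -l, grep reads the whole list and prints every matching line.
matches=$(list_files | grep '\.c$')
echo "$matches"

# The patched checks only need grep's exit status, so output is discarded.
if list_files | grep '\.c$' >/dev/null; then
  found=yes
else
  found=no
fi
echo "$found"
```

With a list this short the writer finishes before grep exits either way; the noise only shows up when the list is larger than the pipe buffer.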

* top/maint.mk (_sc_header_without_use)
(sc_require_config_h_first): Parse full list.

Signed-off-by: Eric Blake <***@redhat.com>
---

I'll push this to gnulib to work around the issue. But it really
raises the question: can sed and grep be taught a way to silently
ignore EPIPE errors?

See also http://austingroupbugs.net/view.php?id=789, which is
considering standardizing the shell's 'set -o pipefail', and where
it becomes vital to be able to exit with 0 status when used on
the left side of a pipe if the only reason we are exiting early
is because the right side of the pipe is also exiting early without
consuming everything we are shoving into the pipe. It is unclear
at this point whether POSIX would recommend that filter
applications should _always_ exit with 0 status on pipe failure,
or only do this for EPIPE write failures when SIGPIPE is ignored,
or whether it should be optional behavior that must be explicitly
enabled via a command-line option and/or system-wide environment
variable. But the point remains that among all possible write
failures, the failure to write to a pipe is often expected as part
of an optimized pipeline in order to reduce CPU usage, and there
should be a way to handle it silently.

ChangeLog | 6 ++++++
top/maint.mk | 4 ++--
2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index e793866..a93f468 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,9 @@
+2014-07-08 Eric Blake <***@redhat.com>
+
+ maint.mk: less syntax-check noise when SIGPIPE is ignored
+ * top/maint.mk (_sc_header_without_use)
+ (sc_require_config_h_first): Parse full list.
+
2014-06-27 Paul Eggert <***@cs.ucla.edu>

mktime: merge #if/#ifdef usage from glibc
diff --git a/top/maint.mk b/top/maint.mk
index 3f369b7..0cc769c 100644
--- a/top/maint.mk
+++ b/top/maint.mk
@@ -440,7 +440,7 @@ sc_require_config_h:
# You must include <config.h> before including any other header file.
# This can possibly be via a package-specific header, if given by cfg.mk.
sc_require_config_h_first:
- @if $(VC_LIST_EXCEPT) | grep -l '\.c$$' > /dev/null; then \
+ @if $(VC_LIST_EXCEPT) | grep '\.c$$' > /dev/null; then \
fail=0; \
for i in $$($(VC_LIST_EXCEPT) | grep '\.c$$'); do \
grep '^# *include\>' $$i | $(SED) 1q \
@@ -464,7 +464,7 @@ sc_prohibit_HAVE_MBRTOWC:
define _sc_header_without_use
dummy=; : so we do not need a semicolon before each use; \
h_esc=`echo '[<"]'"$$h"'[">]'|$(SED) 's/\./\\\\./g'`; \
- if $(VC_LIST_EXCEPT) | grep -l '\.c$$' > /dev/null; then \
+ if $(VC_LIST_EXCEPT) | grep '\.c$$' > /dev/null; then \
files=$$(grep -l '^# *include '"$$h_esc" \
$$($(VC_LIST_EXCEPT) | grep '\.c$$')) && \
grep -LE "$$re" $$files | grep . && \
--
1.9.3
Paul Eggert
2014-07-11 20:58:24 UTC
Post by Eric Blake
It is unclear
at this point whether POSIX would recommend that filter
applications should _always_ exit with 0 status on pipe failure,
or only do this for EPIPE write failures when SIGPIPE is ignored,
or whether it should be optional behavior that must be explicitly
enabled via a command-line option and/or system-wide environment
variable.
None of these options sound appealing, I'm afraid. The first two would
be an incompatible change to longstanding standard behavior. A
system-wide environment variable would be problematic for all the usual
reasons. A command-line option would be a pain to use (what? I have to
modify all my shell scripts?).

Instead, how about this idea? Change the behavior of the shell so that
SIGPIPE is not ignored in a pipeline (except in the pipeline's last
member of course), even if it is ignored in the parent. This is also a
change to POSIX, but it's a relatively minor one. Or, if we want to be
conservative about it, we could make the new behavior depend on a new
shell option. Either way, this would solve the problem without having
to change grep, sed, etc.

We might also want to have a way to reenable traps in the shell when
they're disabled; that's been a longstanding problem even aside from
this SIGPIPE business.
Eric Blake
2014-07-11 21:10:21 UTC
Post by Paul Eggert
Post by Eric Blake
It is unclear
at this point whether POSIX would recommend that filter
applications should _always_ exit with 0 status on pipe failure,
or only do this for EPIPE write failures when SIGPIPE is ignored,
or whether it should be optional behavior that must be explicitly
enabled via a command-line option and/or system-wide environment
variable.
None of these options sound appealing, I'm afraid. The first two would
be an incompatible change to longstanding standard behavior. A
system-wide environment variable would be problematic for all the usual
reasons. A command-line option would be a pain to use (what? I have to
modify all my shell scripts?).
Not all your scripts, only those scripts where you plan to use 'set -o
pipefail'.
Post by Paul Eggert
Instead, how about this idea? Change the behavior of the shell so that
SIGPIPE is not ignored in a pipeline (except in the pipeline's last
member of course), even if it is ignored in the parent. This is also a
change to POSIX, but it's a relatively minor one. Or, if we want to be
conservative about it, we could make the new behavior depend on a new
shell option. Either way, this would solve the problem without having
to change grep, sed, etc.
That's not quite right. Remember, the choice is between:

sigpipe normal:
foo | bar

if foo dies from SIGPIPE, but 'set -o pipefail' is in effect, then the
whole pipeline fails with status 141 (SIGPIPE killed a member of the
pipeline).

vs. sigpipe ignored:

foo | bar

foo will NOT get SIGPIPE, but instead gets EPIPE. If it treats the
write error as fatal, and exits non-zero, then 'set -o pipefail' fails
the whole pipeline. But if it treats the write error as the request to
do an early non-fatal exit with status 0, then the whole pipeline can
also have status 0.

The idea is that if you are going to write code with 'set -o pipefail',
you would do it like:

set -o pipefail
(trap '' PIPE; foo) | bar
set +o pipefail

where you explicitly ignore SIGPIPE (and force EPIPE write errors) on
any element of the pipeline where you expect the right side of the pipe
may exit early.
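
A rough sketch of that pattern, assuming bash is available for 'set -o pipefail'. The quiet_producer function is a hypothetical filter that treats a failed write as a request to stop and exit 0; 'yes' stands in for a tool that treats EPIPE as a fatal write error.

```shell
# Producer that treats EPIPE as fatal: with SIGPIPE ignored, 'yes' sees a
# write error, exits non-zero, and pipefail fails the whole pipeline.
# Its "Broken pipe" complaint is sent to /dev/null to keep the demo quiet.
noisy_status=$(bash -c '
  set -o pipefail
  (trap "" PIPE; yes 2>/dev/null) | head -n 1 >/dev/null
  echo $?')
echo "EPIPE treated as fatal: $noisy_status"

# Hypothetical producer that stops quietly on the first failed write and
# returns 0, so the pipeline as a whole can also report 0.
quiet_status=$(bash -c '
  set -o pipefail
  quiet_producer() {
    while printf "y\n" 2>/dev/null; do :; done
    return 0
  }
  (trap "" PIPE; quiet_producer) | head -n 1 >/dev/null
  echo $?')
echo "EPIPE treated as early exit: $quiet_status"
```

The first status is whatever the producer picks for a write error (non-zero); only the second form lets pipefail report success when the right side exits early.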
Post by Paul Eggert
We might also want to have a way to reenable traps in the shell when
they're disabled; that's been a longstanding problem even aside from
this SIGPIPE business.
Yes, that would be nice to have. But it goes in the opposite direction
of 'set -o pipefail', because re-enabling default SIGPIPE behavior
causes processes to die with status 141.

Or are you arguing that any shell that provides 'set -o pipefail' should
ALSO provide a knob to explicitly treat death due to SIGPIPE as not
impacting pipefail? At which point, then you DO want to re-enable
rather than ignore SIGPIPE, and don't have to worry about what child
processes do on EPIPE, but only what they do on SIGPIPE, where death by
SIGPIPE is not fatal to the pipeline.
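
No such knob exists in the shell as far as this thread is concerned, but the effect can be approximated by hand. The helper below is a hypothetical sketch: it takes the exit statuses of the pipeline members and treats 141 (128 + SIGPIPE, assuming SIGPIPE is signal 13) the same as 0.

```shell
# Hypothetical helper: return the rightmost exit status that is neither 0
# nor 141 (death by SIGPIPE), or 0 if every member ended one of those ways.
pipe_status_ignoring_sigpipe() {
  rc=0
  for s in "$@"; do
    case $s in
      0|141) ;;        # success, or killed by SIGPIPE: not a failure
      *) rc=$s ;;      # any other status fails the pipeline
    esac
  done
  return "$rc"
}

# Intended use in bash, where PIPESTATUS holds each member's status:
#   foo | bar
#   pipe_status_ignoring_sigpipe "${PIPESTATUS[@]}"
```

This only helps when SIGPIPE is at its default disposition; if it was inherited as ignored, the members report EPIPE write failures instead and their statuses are tool-specific.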

Probably worth adding some of these thoughts to the Austin Group bug
proposal.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Paul Eggert
2014-07-11 21:12:31 UTC
Post by Eric Blake
are you arguing that any shell that provides 'set -o pipefail' should
ALSO provide a knob to explicitly treat death due to SIGPIPE as not
impacting pipefail? At which point, then you DO want to re-enable
rather than ignore SIGPIPE, and don't have to worry about what child
processes do on EPIPE, but only what they do on SIGPIPE, where death by
SIGPIPE is not fatal to the pipeline.
Yes, that's the basic idea. Sorry I did not explain it clearly enough.
Please feel free to forward this on to the Austin group.
