sed bug: ASCII NUL doesn't work on the rhs of y// commands
t***@inventati.org
2014-09-03 01:09:02 UTC
Hi,

The subject pretty much says it all for this bug. Compare the output of

echo abc | sed -e 's/b/\x00/' | hexdump -c

and

echo abc | sed -e 'y/b/\x00/' | hexdump -c

The s command behaves as I would expect: it replaces the 'b' with a NUL
character. The y command outputs nothing where it should print NUL, so
the output is one byte shorter than the input.

Thanks,
table

x86_64 Linux
$ sed --version
sed (GNU sed) 4.2.2
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Jay Fenlason, Tom Lord, Ken Pizzini,
and Paolo Bonzini.
GNU sed home page: <http://www.gnu.org/software/sed/>.
General help using GNU software: <http://www.gnu.org/gethelp/>.
E-mail bug reports to: <bug-***@gnu.org>.
Be sure to include the word ``sed'' somewhere in the ``Subject:'' field.
Paolo Bonzini
2014-09-03 11:19:33 UTC
Looks like the bug was introduced when "y" was extended to support
multibyte characters. The minimal patch should be to change

int trans_len = strlen(trans[2*i+1]);

to

char *trans = trans[2*i+1];
int trans_len = *trans == '\0' ? 1 : strlen(trans);

in sed/execute.c

Paolo
Jim Meyering
2014-09-06 16:29:43 UTC
Hi Paolo,
Thanks for the suggestion.
Here's a complete patch (can't reuse the name
"trans" that way, and I prefer to s/int/size_t/).
I expect to find the precise commit that introduced
the bug, adjust the log and NEWS,
and then push tomorrow.
I updated NEWS, but didn't take the time to find the precise commit.
I was surprised to see there is only one git tag.

I've also pushed the following to fix the trivial "version" test failure: