Diff doesn't properly ignore whitespace for this input
Tyler Bletsch
2015-07-14 19:01:28 UTC
I believe I've found a bug in diff's handling of "ignore whitespace"
mode. I have two test files that differ only in whitespace and newlines;
I've verified this using a separate tool (WinMerge) plus doing a diff on
the files after doing s/\s*/ / on the whole file. When I ask for the
diff using "-wB", it reports a spurious difference only in whitespace if
I give the arguments in one order, but correctly reports no differences
if I give it the reverse order. Further, I get consistently correct
behavior if I add the "-d" option.

Example:

$ diff -wB in1.txt in2.txt
3946c4201,4203
< Exits:
---
$ diff -wB in2.txt in1.txt
$ diff -dwB in1.txt in2.txt
$ diff -dwB in2.txt in1.txt

This came up while using diff to automatically grade a text adventure
I'm having students do in my class -- this is the ONLY file pair out of
over 3000 that appears to exhibit the problem. This leads me to believe
that it must be a fairly rare issue. I'm fixing it on my end by always
using -d, but I think this should be classified as a bug, because it
reports a non-whitespace difference in files where none exists.
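
The independent check and the -d workaround described above could be scripted roughly as follows. This is only a sketch under stated assumptions: the function names same_modulo_whitespace and grade_diff are hypothetical, and the check removes every whitespace character (newlines included) before comparing, which is a looser test than the one diff -wB applies line by line.

```shell
# Hypothetical helper: succeeds when the two files have identical content
# once every whitespace character (spaces, tabs, newlines) is removed.
# Assumes plain text files without NUL bytes.
same_modulo_whitespace() {
    [ "$(tr -d '[:space:]' < "$1")" = "$(tr -d '[:space:]' < "$2")" ]
}

# Hypothetical grading wrapper: always passes -d (the workaround
# mentioned above) together with -w and -B.
grade_diff() {
    diff -dwB "$1" "$2"
}
```

If same_modulo_whitespace succeeds for a pair while diff -wB still prints a hunk, that pair is worth inspecting by hand.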

I'm not sure if this mailing list allows attachments, so I've put the
files in question here:

https://dl.dropboxusercontent.com/u/68643317/diff-bug-test-files.zip

I tried paring the files down to just demonstrate the bug and nothing
else, but the behavior would seemingly go away at random as I removed
content from the files. Therefore, I'm including the files in their
original form. The files represent test output of the text adventure,
specifically navigation of the default world from the ROM 2.4b6 MUD
(after having been converted to a format for my class's assignment).
This content is safe to share.

I've confirmed that this behavior is present in the following builds of
diff:
- diff (GNU diffutils) 2.8.1 on Red Hat Enterprise Linux Server release
6.5 (Santiago)
- diff (GNU diffutils) 3.2 on Ubuntu 12.04.4 LTS
- diff (GNU diffutils) 2.9 on Cygwin 32-bit (Windows 7 x64)

Let me know if there's any further information I can provide that might
assist. Thanks for producing quality utilities used the world over!

Regards,
Dr. Tyler Bletsch
Adjunct Professor, NC State University
Jim Meyering
2015-07-16 20:45:55 UTC
Thank you for the report.
I confirm that it also affects diff-3.3, but found that with the very
latest from diff.git (v3.3-30-g29e8de4), the problem does not arise.
I.e., comparing your two files like this produces no output:

$ src/diff -wBu /t/in{1,2}.txt | wc -c
0
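
The same check can rely on diff's exit status instead of counting output bytes: diff exits with 0 when it finds no differences, 1 when it finds some, and 2 on trouble. A minimal sketch, in which the function name report_diff_status is hypothetical and the file arguments are placeholders:

```shell
# Hypothetical helper: runs diff -wB quietly and reports the result
# based on the exit status alone.
report_diff_status() {
    diff -wB "$1" "$2" > /dev/null 2>&1
    case $? in
        0) echo "no differences" ;;
        1) echo "differences found" ;;
        *) echo "diff reported trouble" ;;
    esac
}
```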

I suspect that it was fixed via this change by Paul Eggert:

http://git.savannah.gnu.org/cgit/diffutils.git/commit/?id=9b48bf3d3ed002e32fad
http://bugs.gnu.org/16848
Tyler Bletsch
2015-07-17 17:22:36 UTC
Thanks for the reply. It is so neat to see one of the original authors
update diff with a fix that actually affects me. A nice object lesson in
production software development for my class, too.

- Tyler