Discussion:
[PATCH 0/4] Cross compiling sharutils
(too old to reply)
Bruce Korb
2015-05-19 23:23:10 UTC
Permalink
Hi Bruce, Eric,
Excuse me for writing to you directly... Is there a mailing list for sharutils
discussion? I couldn't really find one...
"bug-gnu-***@gnu.org"

But since I don't carefully monitor it, you are better off cc-ing me directly.

http://lists.gnu.org/archive/html/info-gnu/2015-02/msg00002.html
http://www.gnu.org/software/sharutils/

I'm confused as to where this is coming from. Anyway, after grubbing around a bit,
https://www.sourceware.org/bugzilla/show_bug.cgi?id=11312
wherein our glibc maintainer suggested that all adaptations belong in Windows
so the One True Libc Source can be as absolutely efficient as possible. *sigh*.
1) Turn the popen support for binary mode into a runtime check and fallback to
using non-binary mode if it fails as unsupported.
Just use an #if defined(_WIN32) || defined(_WIN64), yes?
2) Use a configure check to detect whether the host OS *requires* binary mode
for that particular popen call. It seems the patch was introduced for
cygwin, unfortunately I don't really have access to a cygwin machine to
check whether that makes a difference and to propose a check here. In the
that's a possibility?
I have no idea. I don't have access to very many platforms.
3) If we keep a configure check that possibly requires running a program
(either current case or proposal #2), then introduce an autoconf cache
variable for it, so that at least it is possible to override the check by
passing something like gl_cv_popen_supports_binary=yes in the ./configure
command line. Also, ideally, use the third case of AC_RUN_IFELSE which is
triggered when cross-compiling to default it to something sane (I'd say "no"
which is fine as long as someone cross-compiling for cygwin can still
override it, perhaps it's possible to use some config.guess magic to detect
cygwin and default it differently there.)
Either #3 or a variation on #1 that goes to the extra somersaults IFF the target
is a Windows platform. I think we can pretty well agree that we're only talking
Windows here.
Eric Blake
2015-05-20 01:57:25 UTC
Permalink
2) Use a configure check to detect whether the host OS *requires*
binary mode
for that particular popen call. It seems the patch was introduced for
cygwin, unfortunately I don't really have access to a cygwin
machine to
check whether that makes a difference and to propose a check here.
In the
perhaps
that's a possibility?
You probably don't even need a configure check. As in simply:

popen(..., O_BINARY ? "rb" : "r")

That is, the few platforms where O_BINARY is non-zero probably already
support "rb" as a mode for popen (are there any besides Cygwin and mingw?).
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Eli Zaretskii
2015-05-20 02:56:37 UTC
Permalink
Date: Tue, 19 May 2015 19:57:25 -0600
That is, the few platforms where O_BINARY is non-zero probably already
support "rb" as a mode for popen (are there any besides Cygwin and mingw?).
DJGPP.
Bruce Korb
2015-05-21 03:51:16 UTC
Permalink
Post by Eli Zaretskii
Date: Tue, 19 May 2015 19:57:25 -0600
That is, the few platforms where O_BINARY is non-zero probably already
support "rb" as a mode for popen (are there any besides Cygwin and mingw?).
DJGPP.
That answers the question, "Does popen support "rb" as an open mode?"
The other question is: "Is O_BINARY non-zero on DJGPP, too?" I would expect.

Actually, since we're also dealing with glibc, which is perfect in every way
and has no need of facilitating cross platform development, O_BINARY
Post by Eli Zaretskii
$ cc -o tmp tmp.c
tmp.c:7:54: error: 'O_BINARY' undeclared (first use in this function)
printf("O_BINARY has the value %1$u (0x%1$X)\n", O_BINARY);
so it is actually necessary to add kludges for a platform that doesn't define it.
Post by Eli Zaretskii
#if defined(O_BINARY) && (O_BINARY != 0)
# define READ_BINARY_MODE "rb"
#else
# define READ_BINARY_MODE "r"
#endif
It seems one or two other projects have stubbed their toes, too:

$ grep -w O_BINARY $list
/usr/include/mysql/my_global.h:#ifndef O_BINARY
/usr/include/mysql/my_global.h:#define O_BINARY 0 /* Flag to my_open for binary files */
/usr/include/mysql/my_global.h:#define FILE_BINARY O_BINARY /* Flag to my_fopen for binary streams */
/usr/include/X11/Xw32defs.h:# define O_BINARY _O_BINARY
/usr/include/ImageMagick-6/magick/studio.h:#if !defined(O_BINARY)
/usr/include/ImageMagick-6/magick/studio.h:#define O_BINARY 0x00
/usr/include/kde_file.h:#ifndef O_BINARY
/usr/include/kde_file.h:#define O_BINARY 0 /* for open() */
/usr/include/jasper/jas_stream.h:#ifndef O_BINARY
/usr/include/jasper/jas_stream.h:#define O_BINARY 0
Eric Blake
2015-05-21 04:05:14 UTC
Permalink
Post by Bruce Korb
Post by Eli Zaretskii
Date: Tue, 19 May 2015 19:57:25 -0600
That is, the few platforms where O_BINARY is non-zero probably already
support "rb" as a mode for popen (are there any besides Cygwin and
mingw?).
DJGPP.
That answers the question, "Does popen support "rb" as an open mode?"
The other question is: "Is O_BINARY non-zero on DJGPP, too?" I would
expect.
Actually, since we're also dealing with glibc, which is perfect in every
way
and has no need of facilitating cross platform development, O_BINARY
gnulib does:

#ifndef O_BINARY
#define O_BINARY 0
#endif

which is perfectly fine; you can then use 'if (O_BINARY)' throughout
your code to bracket code that is only needed on platforms where text is
different from binary.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Filipe Brandenburger
2015-05-21 04:05:18 UTC
Permalink
Hi Bruce,

Just a short update that I'm working on an updated revision of this patch.

My current plan is along the O_BINARY lines, in fact I noticed there's
already code in lib/system.h to check for that and define
FOPEN_READ_BINARY and FOPEN_WRITE_BINARY, so I'm planning to just
reuse that:

http://git.savannah.gnu.org/cgit/sharutils.git/tree/lib/system.h#n102

#if ! defined(O_BINARY) || (O_BINARY == 0)
# define FOPEN_READ_BINARY "r"
# define FOPEN_WRITE_BINARY "w"
#else
# define FOPEN_READ_BINARY "rb"
# define FOPEN_WRITE_BINARY "wb"
#endif

I got my hands on a Windows instance and installed cygwin on it, so
I'm currently running cygwin tests to ensure there's no regression.
Post by Bruce Korb
Post by Eli Zaretskii
Post by Eric Blake
That is, the few platforms where O_BINARY is non-zero probably already
support "rb" as a mode for popen (are there any besides Cygwin and
mingw?).
DJGPP.
That answers the question, "Does popen support "rb" as an open mode?"
The other question is: "Is O_BINARY non-zero on DJGPP, too?" I would
expect.
I'm not really sure that's relevant. I looked into a mingw build
(actually cross compiling from Linux) and it stumbled on things like
#include <pwd.h> among many others, so I'm not convinced that windows
other than cygwin actually builds...

Though yeah, I think it's pretty safe to assume they'll have O_BINARY
defined to != 0 so I'm expecting if they ever worked, they will keep
working.
Post by Bruce Korb
Actually, since we're also dealing with glibc, which is perfect in every way
and has no need of facilitating cross platform development, O_BINARY
Post by Eli Zaretskii
$ cc -o tmp tmp.c
tmp.c:7:54: error: 'O_BINARY' undeclared (first use in this function)
printf("O_BINARY has the value %1$u (0x%1$X)\n", O_BINARY);
so it is actually necessary to add kludges for a platform that doesn't
define it.
That's actually handled by gnulib in binary-io.h (I believe indirectly
through fcntl.h in that directory) which is what I believe Eric
suggested in the original thread from 5 years ago.

http://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/fcntl.in.h#n319

I tested with that on glibc and O_BINARY ? "rb" : "r" works when
#include "binary-io.h" is in place.
Post by Bruce Korb
Post by Eli Zaretskii
#if defined(O_BINARY) && (O_BINARY != 0)
# define READ_BINARY_MODE "rb"
#else
# define READ_BINARY_MODE "r"
#endif
Seems like the snippet I just pasted above which already exists in the code :-)

Later tonight, I should have an updated patch, tested on both cygwin and glibc.

Cheers,
Filipe
Filipe Brandenburger
2015-05-21 04:40:16 UTC
Permalink
On Wed, May 20, 2015 at 9:05 PM, Filipe Brandenburger
Post by Filipe Brandenburger
I got my hands on a Windows instance and installed cygwin on it, so
I'm currently running cygwin tests to ensure there's no regression.
Hmmm... So I checked cygwin and I built a version of the unmodified
tarball and then another one where I just changed the freadonly_binary
to "r" in the popen call.

I tried shar -C gzip/xz/bzip2 on many files, unpacked them by running
the shar script and compared them with the original, but I didn't
really manage to reproduce the issue by which popen with a simple "r"
corrupted the files...

Looking at the cygwin popen implementation (assuming I have the right
one), it doesn't really look like they look for a 'b' and handle that
any differently:

https://cygwin.com/git/gitweb.cgi?p=newlib-cygwin.git;a=blob;f=newlib/libc/posix/popen.c;h=faf72b6e380a8f74cc6af590a540b9fe292aab28;hb=HEAD#l120

So I'm really at a loss at why this code was even there, as I can't
really think of another platform that would conceivably handle
popen("rb") different from popen("r") if cygwin doesn't...

In any case, I think it makes sense to check for O_BINARY and use the
FOPEN_{READ,WRITE}_BINARY defines. They don't really hurt... So I'll
send you a patch for that anyways.

Just doing a few more tests, will send you as soon as I'm confident it's good.

Cheers,
Filipe
Eric Blake
2015-05-21 14:40:29 UTC
Permalink
Post by Filipe Brandenburger
Looking at the cygwin popen implementation (assuming I have the right
one), it doesn't really look like they look for a 'b' and handle that
https://cygwin.com/git/gitweb.cgi?p=newlib-cygwin.git;a=blob;f=newlib/libc/posix/popen.c;h=faf72b6e380a8f74cc6af590a540b9fe292aab28;hb=HEAD#l120
Wrong source. That's the generic newlib implementation of popen, but
cygwin overrides with its own version at:

https://cygwin.com/git/?p=newlib-cygwin.git;a=blob;f=winsup/cygwin/syscalls.cc;h=86faa3;hb=HEAD#l4297

which indeed treats "rb" differently from "r" and the extension "rt".
Post by Filipe Brandenburger
So I'm really at a loss at why this code was even there, as I can't
really think of another platform that would conceivably handle
popen("rb") different from popen("r") if cygwin doesn't...
But cygwin DOES care about "rb"; if you don't pass the 'b', then you
risk \r being unexpectedly stripped from the command's output.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Eli Zaretskii
2015-05-21 15:57:35 UTC
Permalink
Date: Wed, 20 May 2015 20:51:16 -0700
Post by Eli Zaretskii
Date: Tue, 19 May 2015 19:57:25 -0600
That is, the few platforms where O_BINARY is non-zero probably already
support "rb" as a mode for popen (are there any besides Cygwin and mingw?).
DJGPP.
That answers the question, "Does popen support "rb" as an open mode?"
Yes.
The other question is: "Is O_BINARY non-zero on DJGPP, too?" I would expect.
Yes.
Eli Zaretskii
2015-05-21 16:00:23 UTC
Permalink
Date: Wed, 20 May 2015 21:05:18 -0700
Post by Bruce Korb
That answers the question, "Does popen support "rb" as an open mode?"
The other question is: "Is O_BINARY non-zero on DJGPP, too?" I would
expect.
I'm not really sure that's relevant. I looked into a mingw build
(actually cross compiling from Linux) and it stumbled on things like
#include <pwd.h> among many others, so I'm not convinced that windows
other than cygwin actually builds...
The MinGW port of Sharutils certainly builds (or at least it did a few
versions ago).
Eli Zaretskii
2015-05-21 16:12:20 UTC
Permalink
Date: Wed, 20 May 2015 21:40:16 -0700
On Wed, May 20, 2015 at 9:05 PM, Filipe Brandenburger
Post by Filipe Brandenburger
I got my hands on a Windows instance and installed cygwin on it, so
I'm currently running cygwin tests to ensure there's no regression.
Hmmm... So I checked cygwin and I built a version of the unmodified
tarball and then another one where I just changed the freadonly_binary
to "r" in the popen call.
I tried shar -C gzip/xz/bzip2 on many files, unpacked them by running
the shar script and compared them with the original, but I didn't
really manage to reproduce the issue by which popen with a simple "r"
corrupted the files...
It was I who added the binary mode to some popen calls in Sharutils.
I did that when I worked on the MinGW port of Sharutils. The reason
is simple: you _must_ read output from a compressor in binary mode,
because what the compressor outputs is binary data, not text. If you
read it in text mode, the read will stop on the first ^Z character,
and it will strip CR characters from the data. (I'm guessing that the
former doesn't happen in Cygwin, so you don't see problems because
your examples, by sheer luck, didn't have CR characters in the
compressed data, or maybe didn't have them before an LF character.)
So I'm really at a loss at why this code was even there, as I can't
really think of another platform that would conceivably handle
popen("rb") different from popen("r") if cygwin doesn't...
See above: all Windows platforms do.
Filipe Brandenburger
2015-05-22 04:50:55 UTC
Permalink
Hi Eli,
Post by Eli Zaretskii
It was I who added the binary mode to some popen calls in Sharutils.
I did that when I worked on the MinGW port of Sharutils. The reason
is simple: you _must_ read output from a compressor in binary mode,
because what the compressor outputs is binary data, not text. If you
read it in text mode, the read will stop on the first ^Z character,
and it will strip CR characters from the data. (I'm guessing that the
former doesn't happen in Cygwin, so you don't see problems because
your examples, by sheer luck, didn't have CR characters in the
compressed data, or maybe didn't have them before an LF character.)
Ok, so I spent some time doing some tests to understand how this works...

TL;DR: cygwin doesn't seem to need popen("rb") though mingw definitely does.

Even though cygwin does differentiate "r" and "rb", it my tests it
seemed to be using binary by default. Not sure if that's a cygwin
environment setting or if it was detecting that gzip output "looked"
binary and was using binary that way...

I set up an experiment generating random strings, writing them to
files and calling popen of "gzip -c -9" on those files.

For example, this particular random string:

"3pgcYgX yir4aRb mlWU2Zu lt1LO80 URPM6TO UZzj5Pb 6wdgIZ0 yMiH59i I57sMM2\r\n"

Compressing it with "gzip -c -9" will produce contents that will
contain a ^Z, a CRLF and a bare CR.

In cygwin, both popen(cmd, "r") and popen(cmd, "rb") returned the
exact same output, including the unmodified ^Z, CRLF and bare CR
characters.

Used popen(cmd, "rt") in Cygwin would indeed convert th CRLF into an
LF (though it would not choke on the ^Z or touch the bare CR), so I
could get some data corruption but I had to ask for it with the
explicit "t" modifier.

I tried the same tests with random strings on binaries built on mingw.
Indeed, in that environment, I saw exactly what you described. With
popen(cmd, "r"), a ^Z did truncate the input and a CRLF would be
converted into a bare LF. With popen(cmd, "rb"), none of that
happened.

I tried to go deeper in the mingw experiment, by actually building
sharutils with a modified popen without the "b" modifier. But I didn't
really get to build it in mingw/msys. I can run ./configure fine, but
make chokes when compiling lib/idcache.c that wants to #include
<pwd.h> which is not there (among other problems, it's not just this
one.) I'm not sure whether a recent change broke the mingw build of
sharutils or whether I'm making a mistake while trying to build it
myself. But at this point, looks like diminshing returns, so I'll give
myself for satisfied with the experiment on mingw calling popen on
random strings.

In any case, this all seems to corroborate that mingw does indeed need
the "b" modifier and that using the check for O_BINARY being defined
to a non-zero value is a good way to determine whether to pass the "b"
to popen and fopen calls or not.

Do you still have any objections to this O_BINARY approach?

Cheers,
Filipe
Eli Zaretskii
2015-05-22 08:31:05 UTC
Permalink
Date: Thu, 21 May 2015 21:50:55 -0700
I tried to go deeper in the mingw experiment, by actually building
sharutils with a modified popen without the "b" modifier. But I didn't
really get to build it in mingw/msys. I can run ./configure fine, but
make chokes when compiling lib/idcache.c that wants to #include
<pwd.h> which is not there (among other problems, it's not just this
one.)
When I ported Sharutils to MinGW, I indeed added that header (and also
grp.h.).
In any case, this all seems to corroborate that mingw does indeed need
the "b" modifier and that using the check for O_BINARY being defined
to a non-zero value is a good way to determine whether to pass the "b"
to popen and fopen calls or not.
Do you still have any objections to this O_BINARY approach?
No.
Eric Blake
2015-05-22 21:32:13 UTC
Permalink
Post by Filipe Brandenburger
TL;DR: cygwin doesn't seem to need popen("rb") though mingw definitely does.
Even though cygwin does differentiate "r" and "rb", it my tests it
seemed to be using binary by default. Not sure if that's a cygwin
environment setting or if it was detecting that gzip output "looked"
binary and was using binary that way...
Actually, https://cygwin.com/cygwin-ug-net/using-cygwinenv.html
documents that CYGWIN=nopipe_byte defined in your environment will force
cygwin to default to handling pipe contents in text mode (the default is
pipe_byte, which handles pipe contents in binary mode). Using
popen("rt") forces text mode; popen("rb") forces binary mode, and
popen("r") uses the settings of $CYGWIN to decide which mode to use
(usually binary, as the default of pipe_byte was set for a reason).

If you MUST test text mode pipes on cygwin, then you'll have to
experiment with changing $CYGWIN
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Loading...