say if grep can find non-ascii
Dan Jacobson
2006-02-24 23:37:29 UTC
The grep manual page should say how to do
$ perl -nwe 'print if /[^[:ascii:]]/'
but with grep. Or say if there's no way.
Never mind this:
$ grep -P '[^[:ascii:]]'
grep: The -P option is not supported
C> bug-***@gnu.org is probably the right place to ask.
Well I was just doing as the man page on Debian sid said.
Paul Eggert
2006-03-07 08:22:43 UTC
> I don't think the Grep manual should say explicitly how to do that
> particular thing.
I disagree. I think it'd be useful to have a simple pattern that
tests for ASCII characters (i.e., bytes in the range 00 through 7F).

I myself needed such a pattern in the last couple of days, when I
mentioned to Andrew Josey of the Open Group that some of their
published text documents contained non-ASCII characters, and he
responded "How can I easily check for this?". I ended up telling him
"LC_ALL=C grep '[^[:space:][:print:]]'", which (1) is not quite
correct, and (2) is far less convenient than "grep '[[:ascii:]]'"
would be.
> I'm not sure what the definition of "ASCII" is in this case
The standard one. See <http://en.wikipedia.org/wiki/ASCII>.
> Does the following command do what you want?
> grep '[ -~]'
That isn't correct, first because it's not portable outside the C
locale, and second because it doesn't match the 33 ASCII control
characters.
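
A sketch of the check discussed here, not something grep provides under this name: in the C locale, [:print:] and [:cntrl:] together cover all 128 ASCII codes (95 printable plus the 33 control characters), so negating both leaves only bytes 80 through FF. This assumes GNU grep, or any grep that treats every byte as a character in the C locale; the function name non_ascii_lines is made up for illustration.

```shell
# Print (with line numbers) every line that contains a byte outside
# the ASCII range 00-7F. Hypothetical helper, not part of grep.
non_ascii_lines() {
    # LC_ALL=C makes the character classes mean "ASCII printable" and
    # "ASCII control"; anything left over is a byte >= 0x80.
    LC_ALL=C grep -n -e '[^[:print:][:cntrl:]]' -- "$@"
}
```

Called as "non_ascii_lines file.txt", it exits with status 1 when the file is pure ASCII, like any grep that finds nothing.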
Claudio Fontana
2006-03-07 15:50:28 UTC
> > I don't think the Grep manual should say explicitly how to do that
> > particular thing.
> I disagree. I think it'd be useful to have a simple pattern that
> tests for ASCII characters (i.e., bytes in the range 00 through 7F).
I had good results using the bash quoting.
Example (find all lines containing bytes between 0x80 and 0xFF):

$ grep [$'\x80'-$'\xFF'] *.txt

However the character 0x0 cannot be explicitly
specified for obvious argv reasons.

Claudio
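
A variant of the byte-range idea, offered only as a sketch: the pattern is built with printf octal escapes so it does not depend on bash's $'...' quoting, it is passed quoted so the shell cannot glob-expand the brackets, and LC_ALL=C is set so the range is compared byte by byte. It assumes GNU grep behaviour in the C locale; the name high_byte_lines is hypothetical.

```shell
# Print (with line numbers) lines containing any byte in 0x80-0xFF.
# Hypothetical helper for illustration.
high_byte_lines() {
    # \200 and \377 are octal for 0x80 and 0xFF; printf emits the raw bytes.
    pattern=$(printf '[\200-\377]')
    # Quote the pattern so it reaches grep unchanged, and force the C
    # locale so the range means "these byte values".
    LC_ALL=C grep -n -e "$pattern" -- "$@"
}
```

Usage would look like "high_byte_lines *.txt".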
Julian Foad
2006-03-06 23:59:45 UTC
Post by Dan Jacobson
The grep manual page should say how to do
$ perl -nwe 'print if /[^[:ascii:]]/'
but with grep. Or say if there's no way.
I don't think the Grep manual should say explicitly how to do that particular
thing. The Grep manual should (and does) describe exactly what Grep can do and
how to make it do what it can do, and it may give some examples. Is there any
particular reason you feel it should describe how to do the equivalent of your
particular Perl command?

As far as I can tell, you want to find each line that contains a non-ASCII
character, except that I'm not sure what the definition of "ASCII" is in this
case, nor how the behaviour of this Perl command is affected by locale.

I assume you've seen in Grep's manual (and I'll assume you're using Grep
v2.5.1) that it supports a set of character classes such as "[:alnum:]" but not
"[:ascii:]".

Does the following command do what you want?

grep '[ -~]'

- Julian
Dan Jacobson
2006-03-07 00:45:47 UTC
There should be some mention of how one is to give control characters
as arguments to grep. Must one do
$ grep $'\x00' file.txt, etc.?
And if bash is not handy, must one pass them raw right there as
arguments? I.e., document whether there is a way to describe
arbitrary octets, or whether one must pass them raw.
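
One possible answer for shells without $'...', as a sketch rather than documented grep behaviour: let printf produce the raw octet and hand it to grep as a fixed string. The helper name lines_with_byte and its octal-argument convention are invented here. Two limits apply: NUL (000) cannot be passed in an argument at all, and a newline (012) is stripped by command substitution, so neither works with this approach.

```shell
# Print (with line numbers) lines containing one given byte.
# Usage: lines_with_byte OCTAL FILE...   (hypothetical helper)
lines_with_byte() {
    octal=$1
    shift
    # Build the format string \NNN and let printf emit the raw byte.
    byte=$(printf "\\$octal")
    # -F matches the byte literally, so regex metacharacters are safe.
    LC_ALL=C grep -n -F -e "$byte" -- "$@"
}
```

For example, "lines_with_byte 033 file.txt" looks for the ESC character.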
