SED collation sequence
william moss
2014-08-06 20:27:13 UTC
On Debian
uname -a
Linux bbunny 3.2.0-4-686-pae #1 SMP Debian 3.2.60-1+deb7u3 i686 GNU/Linux

sed(1) collation sequence \x00-\x1f would not work, \x01-\x1f works
fine. The \x00 works on a custom system of mine with the same version of
sed(1) but with a 3.4.91 kernel compiled by me.

Sed version on both is 4.2.1.

command used is
A=$( 'the result of a call to an application'
|& sed -r -e's/\x00-\x1f\x80-\xff/?/g' )

The idea is to duplicate the output format of ls(1) with respect to
non-printing characters and those outside the normal 7-bit ASCII range.

Not a big deal, I simply used \x01, but figured that it would be
something to note.

--
William (Bill) Moss
***@acm.org
NY (USA)

Those who will not reason, are bigots,
those who cannot, are fools,
and those who dare not, are slaves.
by Lord Byron
Justice will not be served until those who are
unaffected are as outraged as those who are.
by Benjamin Franklin
That government is best which governs least.
Henry David Thoreau
Honor, justice and humanity forbid us tamely to
surrender that freedom which we received from
our gallant ancestors and which our innocent
posterity have a right to receive from us. We
cannot endure the infamy and guilt of resigning
succeeding generations to that wretchedness which
inevitably awaits them if we basely entail
hereditary bondage on them.
by Thomas Jefferson
Declaration of the Causes and Necessities
of Taking up Arms
6 July 1775
Petr Pisar
2014-08-07 06:37:04 UTC
Post by william moss
On Debian
uname -a
Linux bbunny 3.2.0-4-686-pae #1 SMP Debian 3.2.60-1+deb7u3 i686 GNU/Linux
This has nothing to do with the kernel. The libc and sed versions and your
locale are what matter.
Post by william moss
sed(1) collation sequence \x00-\x1f would not work, \x01-\x1f works
fine.
Maybe sed has some weird implementation I don't know about; in general, however,
character ranges are subject to collation, which is defined by the locale
(LC_COLLATE and maybe LC_CTYPE). If your locale sorts \x00 after \x1f,
then the range will be empty. Try setting them to the "C" locale.
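As a rough illustration of the locale suggestion, the sketch below forces the
"C" locale for a single sed invocation, so that the range is taken by byte
value rather than by the locale's collation order. It assumes GNU sed (the
\xHH escapes are a GNU extension); mask_ctrl is a made-up helper name, and
\x00 is left out on purpose because that is the byte reported as problematic.

```shell
# Hypothetical helper (name invented for illustration): replace the control
# bytes 0x01-0x1f with '?'. LC_ALL=C applies only to this one sed process and
# overrides LC_COLLATE and LC_CTYPE for it.
mask_ctrl() {
    LC_ALL=C sed -e 's/[\x01-\x1f]/?/g'
}

# Example input containing the bytes 0x01 and 0x1f.
printf 'a\001b\037c\n' | mask_ctrl
```

Running `locale` without arguments shows the settings that apply when no
override is given.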
Post by william moss
command used is
A=$( 'the result of a call to an application'
|& sed -r -e's/\x00-\x1f\x80-\xff/?/g' )
Does that really work? Shouldn't the expression be:

s/[\x00-\x1f\x80-\xff]/?/g
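
To make the difference between the two forms concrete, here is a small sketch,
again assuming GNU sed and the "C" locale. The function names literal_seq and
any_byte are invented for this example, and \x00 is omitted since it is the
byte in dispute. Without brackets the pattern is a literal three-byte sequence
(0x01, '-', 0x1f); with brackets it matches any single byte in the ranges.

```shell
# Without brackets: matches only the literal sequence 0x01 '-' 0x1f.
literal_seq() {
    LC_ALL=C sed -e 's/\x01-\x1f/?/g'
}

# With brackets: matches any one byte in 0x01-0x1f or 0x80-0xff.
any_byte() {
    LC_ALL=C sed -e 's/[\x01-\x1f\x80-\xff]/?/g'
}

# A lone control byte is not touched by the unbracketed form ...
printf 'a\001b\n' | literal_seq | od -c
# ... but the exact three-byte sequence is replaced by a single '?'.
printf 'a\001-\037b\n' | literal_seq
# The bracketed form replaces each matching byte, including both bytes of a
# two-byte UTF-8 character (0xc3 0xa9 here).
printf 'a\001b\303\251c\n' | any_byte
```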

If you cannot find a solution with sed, try perl, which supports the
special character class `ascii', like this:

$ echo ažb | perl -pe 's/[[:^ascii:]]/?/g'
a??b

-- Petr
