Re: Supporting Macintosh FinderInfo/Resource Fork in Linux NWFS 2.0

Mike Touloumtzis (miket@bluemug.com)
Mon, 13 Dec 1999 15:25:29 -0800


On Mon, Dec 13, 1999 at 09:57:45AM -0700, Jeff V. Merkey wrote:
>
> I've read through all the threads, and I am still mystified as to why
> our standard bash shell (which I love) does not have this enabled by
> default -- I think I've tracked down the problem here, it's the
> character set the bash shell is using to output unicode. The data does
> make it through with the eighth bit, but gets translated into a "non PC"
> character set by default on bash (???).
>

Jeff,

As several people have noted, bash has nothing to do with the output
character set. Your terminal controls that. Bash hooks your program's
output to its controlling terminal, but does not actually process the
program's output. You are probably already getting 8 bit output, but
that does not automatically mean that you'll get box drawing glyphs.

The "PC character set" box drawing characters you refer to are grossly
nonstandard. In particular, they are not part of ASCII, which doesn't
go above 127. Nor are they necessarily present in Windows fonts,
unless those fonts provide the so-called "OEM" charset. They are in
fact present in a number of Linux/X fonts, but there are no guarantees.

Right now, your terminal/console is probably using Unicode or a subset
thereof. From 0 to 127, Unicode is identical to ASCII. From 128 to
255, Unicode is identical to ISO-8859-1 aka latin-1. Your terminal
is probably using fonts that contain ISO-8859-1 glyphs at the moment,
since this provides the Roman alphabet complete with accented characters
suitable for several European languages.

FYI, Unicode _does_ have box drawing characters; they are in the range
0x2500 to 0x257F and look just like the PC ROM ones. Your terminal
is probably not using a full Unicode glyph set, though, so you may
not have access to these. Also, to output non-ASCII (>127) Unicode
characters on the console, you'll have to use the UTF-8 encoding, which
is a simple multibyte encoding of the normally fixed-width Unicode (UTF-8
can actually encode the full UCS-4, I believe). Just dumping raw UTF-8
into your terminal is not portable, though, since the terminal may not
be Unicode/UTF-8 based.

In summary, characters in 128-255 could be interpreted as:

-- Junk (doesn't happen much nowadays).
-- Single byte characters in ISO-8859-[1-9].
-- Single byte characters in another charset, such as KOI8-R (which
is optimized for Cyrillic character handling and degrades cleverly
when high bits are accidentally stripped).
-- Partial characters in the multibyte UTF-8 encoding of Unicode.
-- Partial characters in other multibyte charsets, such as EUC-JP
or Shift-JIS (both Japanese).
-- (even more possibilities; has anyone been crazy enough to implement
a UTF-7-capable terminal emulator? :-).

Although you want easy-to-use and nice looking box drawing characters,
you must realize that people using Kana/Kanji charsets, KOI8-R charsets,
ISO-8859-* charsets, etc. may not have access to box drawing characters
-- they are inherently nonportable. If you can't handle the appearance
degradation that results from use of the ncurses box drawing functions
(or those present in higher level text-widget libraries), I suggest you
look at the "fat line" (background-color based) drawing used by e.g. the
"dialog" package.

miket

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/