Re: somebody dropped a (warning) bomb

From: Sergei Organov
Date: Mon Feb 19 2007 - 06:57:21 EST


Bodo Eggert <7eggert@xxxxxx> writes:
> On Fri, 16 Feb 2007, Sergei Organov wrote:
>> Bodo Eggert <7eggert@xxxxxx> writes:
>> > Sergei Organov <osv@xxxxxxxxx> wrote:
>> >> Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:
>
[...]
>> If we start talking about the C language, my opinion is that it's C
>> problem that it allows numeric comparison of "char" variables at
>> all. If one actually needs to compare alphabetic characters numerically,
>> he should first cast to required integer type.
>
> Char values are indexes in a character set. Taking the difference of
> indexes is legal. Other languages have ord('x'), but it's only an
> expensive typecast (byte('x') does exactly the same thing).

Why this typecast should be expensive? Is

char c1, c2;
...
if((unsigned char)c1 < (unsigned char)c2)

any more expensive than if(c1 < c2)?

[...]

>> Comparison of characters being numeric is not a very good property of
>> the C language.
>
> NACK, as above

The point was not to entirely disable it or to make it any more
expensive, but to somehow force programmer to be explicit that he indeed
needs numeric comparison of characters' indexes. Well, I do realize that
the above is not in the C "spirit" that is to allow implicit conversions
almost everywhere.

>> > I repeat: Thanks to using signed chars, the programs only
>> > work /by/ /accident/! Promoting the use of signed char strings is promoting
>> > bugs and endangering the stability of all our systems. You should stop this
>> > bullshit now, instead of increasing the pile.
>>
>> Where did you see I promoted using of "singed char strings"?!
>
> char strings are often signed char strings, only using unsigned char
> prevents this in a portable way. If you care about ordinal values of
> your strings, use unsigned.

It means that (as I don't want to change the type of my
strings/characters in case I suddenly start to care about ordinal values
of the characters) I must always use "unsigned char" for storing
characters in practice. Right?

There are at least three problems then:

1. The type of "abc" remains "char*", so effectively I'll have two
different representations of strings.

2. I can't mix two different representations of strings in C without
compromising type safety. In C, the type for characters is "char",
that by definition is different type than "unsigned char".

3. As I now represent characters by unsigned tiny integers, I loose
distinct type for unsigned tiny integers. Effectively this brings me
back in time to the old days when C didn't have distinct type for
signed tiny integers.

Therefore, your "solution" to one problem creates a bunch of
others. Worse yet, you in fact don't solve initial problem of comparing
strings. Take, for example, KOI8-R encoding. Comparing characters from
this encoding using "unsigned char" representation of character indexes
makes no more sense than comparing "signed char" representations, as
both comparisons will bring meaningless results, due to the fact that
the order of (some of) the characters in the table doesn't match their
alphabetical order.

>> If you don't like the fact that in C language characters are "char",
>> strings are "char*", and the sign of char is implementation-defined,
>> please argue with the C committee, not with me.
>>
>> Or use -funsigned-char to get dialect of C that fits your requirements
>> better.
>
> If you restrict yourself to a specific C compiler, you are doomed, and
> that's not a good start.

Isn't it you who insists that "char" should always be unsigned? It's
your problem then to use compiler that understands those imaginary
language that meets the requirement.

-- Sergei.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/