Re: somebody dropped a (warning) bomb

From: Linus Torvalds
Date: Mon Feb 12 2007 - 11:27:05 EST




On Mon, 12 Feb 2007, Sergei Organov wrote:
>
> Why strlen() should be allowed to be called with an incompatible pointer
> type? My point is that gcc should issue *different warning*, -- the same
> warning it issues here:

I agree that "strlen()" per se isn't different.

The issue is not that the warning isn't "technically correct". IT IS.

Nobody should ever argue that the warning isn't "correct". I hope people
didn't think I argued that.

I've argued that the warning is STUPID. That's a totally different thing.

I can say totally idiotic things in perfectly reasonable English grammar
and spelling. Does that make the things I say "good"? No.

The same is true of this gcc warning. It's technically perfectly
reasonable both in English grammar and spelling (well, as far as
any compiler warning ever is) _and_ in "C grammar and spelling" too.

But being grammatically correct does not make it "smart". IT IS STILL
STUPID.

Can people not see the difference between "grammatically correct" and
"intelligent"? People on the internet seem to have often acquired the
understanding that "bad grammar and spelling" => "stupid", and yes, there
is definitely some kind of correlation there. But as any logician and
matematician hopefully knows, "a => b" does NOT imply "!a => !b".

Some people think that "warnings are always good". HELL NO!

A warnign is only as good as
(a) the thing it warns about
(b) the thing you can do about it

And THAT is the fundamental problem with that *idiotic* warning. Yes, it's
technically correct. Yes, it's "proper C grammar". But if you can't get
over the hump of realizing that there is a difference between "grammar"
and "intelligent speech", you shouldn't be doing compilers.

So the warning sucks because:

- the thing it warns about (passing "unsigned char" to something that
doesn't specify a sign at all!) is not something that sounds wrong in
the first place. Yes, it's unsigned. But no, the thing it is passed to
didn't specify that it wanted a "signed" thing in the first place. The
"strlen()" function literally says

"I want a char of indeterminate sign"!

which implies that strlen really doesn't care about the sign. The same
is true of *any* function that takes a "char *". Such a function
doesn't care, and fundamentally CANNOT care about the sign, since it's
not even defined!

So the warning fails the (a) criterion. The warning isn't valid,
because the thing it warns about isn't a valid problem!

- it _also_ fails the (b) criterion, because quite often there is nothing
you can do about it. Yes, you can add a cast, but adding a cast
actually causes _worse_ code (but the warning is certainly gone). But
that makes the _other_ argument for the warning totally point-less: if
the reason for the warning was "bad code", then having the warning is
actively BAD, because the end result is actually "worse code".

See? The second point is why it's important to also realize that there is
a lot of real and valid code that actually _does_ pass "strlen()" an
unsigned string. There are tons of reasons for that to happen: the part of
the program that _does_ care wants to use a "unsigned char" array, because
it ends up doing things like "isspace(array[x])", and that is not
well-defined if you use a "char *" array.

So there are lots of reasons to use "unsigned char" arrays for strings.
Look it up. Look up any half-way reasonable man-page for the "isspace()"
kind of functions, and if they don't actually explicitly say that you
should use unsigned characters for it, those man-pages are crap. Because
those functions really *are* defined in "int", but it's the same kind of
namespace that "getchar()" works in (ie "unsigned char" + EOF, where EOF
_usually_ is -1, although other values are certainly technically legal
too).

So:

- in practice, a lot of "good programming" uses "unsigned char" pointers
for doing strings. There are LOTS of reasons for that, but "isspace()"
and friends is the most obvious one.

- if you can't call "strlen()" on your strings without the compiler
warning, there's two choices: the compiler warning is CRAP, or your
program is bad. But as I just showed you, "unsigned char *" is actually
often the *right* thing to use for string work, so it clearly wasn't
the program that was bad.

So *please* understand:

- yes, the warning is "correct" from a C grammatical standpoint

- the warnign is STILL CRAP, because grammar isn't the only thing about a
computer language. Sane usage is MUCH MORE important than any grammar.

Thus ends the sacred teachings of Linus "always right" Torvalds. Go and
ponder these words, and please send me all your money (certified checks
only, please - sending small unmarked bills is against USPS rules) to show
your support of the holy church of good taste.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/