Re: [PATCH] kbuild: treat char as always signed

From: Gabriel Paubert
Date: Sat Oct 22 2022 - 07:02:01 EST


On Fri, Oct 21, 2022 at 03:46:01PM -0700, Linus Torvalds wrote:
> On Thu, Oct 20, 2022 at 3:41 AM Gabriel Paubert <paubert@xxxxxxx> wrote:
> >
> > I must miss something, the strcmp man page says:
> >
> > "The comparison is done using unsigned characters."
>
> You're not missing anything, I just hadn't looked at strcmp() in forever.
>
> Yeah, strcmp clearly doesn't care about the signedness of 'char', and
> arguably an unsigned char argument makes more sense considering the
> semantics of the funmction.
>
> > But it's not for this that I wrote this message. Has anybody considered
> > using transparent unions?
>
> I don't love the transparent union-as-argument syntax, but you're
> right, that would fix the warning.

I'm not in love with the syntax either.

>
> Except it then doesn't actually *work* very well.
>
> Try this:
>
> #include <sys/types.h>
>
> #if USE_UNION
> typedef union {
> const char *a;
> const signed char *b;
> const unsigned char *c;
> } conststring_arg __attribute__ ((__transparent_union__));
> size_t strlen(conststring_arg);
> #else
> size_t strlen(const char *);
> #endif
>
> int test(char *a, unsigned char *b)
> {
> return strlen(a)+strlen(b);
> }
>
> int test2(void)
> {
> return strlen("hello");
> }
>
> and now compile it both ways with
>
> gcc -DUSE_UNION -Wall -O2 -S t.c
> gcc -Wall -O2 -S t.c
>

Ok, I´ve just tried it, except that I had something slightly different in
mind, but perhaps should have been clearer in my first post.

I have change your code to the following:


#include <sys/types.h>

#if USE_UNION
typedef union {
const char *a;
const signed char *b;
const unsigned char *c;
} conststring_arg __attribute__ ((__transparent_union__));
static inline size_t strlen(conststring_arg p)
{
return __builtin_strlen(p.a);
}
#else
size_t strlen(const char *);
#endif

int test(char *a, unsigned char *b)
{
return strlen(a)+strlen(b);
}

int test2(void)
{
return strlen("hello");
}

> and notice how yes, the "-DUSE_UNION" one silences the warning about
> using 'unsigned char *' for strlen. So it seems to work fine.
>
> But then look at the code it generates for 'test2()" in the two cases.

Now test2 looks properly optimized.

This is a bit exploiting a compiler loophole, it calls an external
function which has been defined with the same name!

Depending on how you look at it, it's either disgusting or clever.

I don´t have clang installed, so I don't know whether it would swallow
this code or react with a strong allergy.

Gabriel
>
> The transparent union version actually generates a function call to an
> external 'strlen()' function.
>
> The regular version uses the compiler builtin, and just compiles
> test2() to return the constant value 5.
>
> So playing games with anonymous union arguments ends up also disabling
> all the compiler optimizations we do want, becaue apparently gcc then
> decides "ok, I'm not going to warn about you declaring this
> differently, but I'm also not going to use the regular one because you
> declared it differently".
>
> This, btw, is also the reason why we don't use --freestanding in the
> kernel. We do want the basic <string.h> things to just DTRT.
>
> For the sockaddr_in games, the above isn't an issue. For strlen() and
> friends, it very much is.
>
> Linus