Re: [RFCv2] string: Use faster alternatives when constant arguments are used

From: Sultan Alsawaf
Date: Sun Mar 24 2019 - 18:32:15 EST


On Sun, Mar 24, 2019 at 10:17:49PM +0100, Rasmus Villemoes wrote:
> gcc already knows the semantics of these functions and can optimize
> accordingly. E.g. for strcpy() of a literal to a buffer, gcc readily
> compiles

The example you gave appears to get optimized accordingly, but there are
numerous sane counterexamples that don't get optimized.

On arm64, with GCC 8.3.0, let's look at the simple strcmp usage in
kernel/trace/preemptirq_delay_test.c:

static int preemptirq_delay_run(void *data)
{
unsigned long flags;

if (!strcmp(test_mode, "irq")) {
local_irq_save(flags);
busy_wait(delay);
local_irq_restore(flags);
} else if (!strcmp(test_mode, "preempt")) {
preempt_disable();
busy_wait(delay);
preempt_enable();
}

return 0;
}

This is what happens without my patch:

preemptirq_delay_run:
.LFB2517:
.cfi_startproc
stp x29, x30, [sp, -32]!
.cfi_def_cfa_offset 32
.cfi_offset 29, -32
.cfi_offset 30, -24
adrp x1, .LC0
add x1, x1, :lo12:.LC0
mov x29, sp
stp x19, x20, [sp, 16]
.cfi_offset 19, -16
.cfi_offset 20, -8
adrp x19, .LANCHOR0
add x19, x19, :lo12:.LANCHOR0
mov x0, x19
> bl strcmp
cbz w0, .L22
adrp x1, .LC1
mov x0, x19
add x1, x1, :lo12:.LC1
> bl strcmp
cbz w0, .L23

The calls to strcmp are not optimized out. Here is what this snippet looks like
after my patch:

preemptirq_delay_run:
.LFB2517:
.cfi_startproc
stp x29, x30, [sp, -32]!
.cfi_def_cfa_offset 32
.cfi_offset 29, -32
.cfi_offset 30, -24
adrp x0, .LANCHOR0
mov w1, 29289
mov x29, sp
ldr w2, [x0, #:lo12:.LANCHOR0]
movk w1, 0x71, lsl 16
add x3, x0, :lo12:.LANCHOR0
cmp w2, w1
beq .L23
ldr x1, [x0, #:lo12:.LANCHOR0]
mov x0, 29296
movk x0, 0x6565, lsl 16
movk x0, 0x706d, lsl 32
movk x0, 0x74, lsl 48
cmp x1, x0
beq .L24

The calls to strcmp were optimized out completely, not even replaced by a call
to memcmp.

I can produce lots of kernel examples that don't seem to follow basic str*
optimization, which is what prompted me to write this in the first place.

> Does this even compile? It's a well-known (or perhaps
> not-so-well-known?) pitfall that __builtin_constant_p() is not
> guaranteed to be usable in __builtin_choose_expr() - the latter only
> accepts bona fide integer constant expressions, while evaluation of
> __builtin_constant_p can be delayed until various optimization phases.

It not only compiles, but also seems to work correctly from what I've observed.

> This seems to be buggy - you don't know that src is at least as long as
> dest. And arguing "if it's shorter, there's a nul byte, which will
> differ from dest at that point, so memcmp will/should stop" means that
> the whole word-at-a-time argument would be out.

You mean reading the last word of a string could read out of bounds of the
string when using memcmp? There are lots of memcmp instances using a literal
string in the kernel; I don't think it would be hard to find one that violates
what you've pointed out, so I'm not really sure why it's a problem.

After a few minutes of grepping, it seems like the memcmp usage in
drivers/md/dm-integrity.c can read out of bounds on its arguments:
} else if (!memcmp(opt_string, "internal_hash:", strlen("internal_hash:"))) {
r = get_alg_and_key(opt_string, &ic->internal_hash_alg, &ti->error,
"Invalid internal_hash argument");
if (r)
goto bad;
} else if (!memcmp(opt_string, "journal_crypt:", strlen("journal_crypt:"))) {
r = get_alg_and_key(opt_string, &ic->journal_crypt_alg, &ti->error,
"Invalid journal_crypt argument");
if (r)
goto bad;
} else if (!memcmp(opt_string, "journal_mac:", strlen("journal_mac:"))) {

Where opt_string is a dynamically-set argument with no specified minimum length.

Sultan