Re: [PATCH] x86/smp: Use atomic_try_cmpxchg() to micro-optimize native_stop_other_cpus()

From: Dave Hansen
Date: Fri Nov 17 2023 - 12:23:06 EST


On 11/17/23 08:37, Uros Bizjak wrote:
> On Fri, Nov 17, 2023 at 5:18 PM Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>> Is there some other side benefit that I'm missing here? Applying this
>> patch doesn't seem to have a great risk/reward ratio.
>
> Yes, in addition to better asm code, I think that the use of magic
> constant (-1) is not descriptive at all. I tried to make this code
> look like nmi_panic() from kernel/panic.c, which has similar
> functionality, and describe that this constant belongs to old_cpu
> (same as in nmi_panic() ).

I guess it's a step in the direction of nmi_panic(). But honestly, it
doesn't go far enough for me at least. The nice part about nmi_panic()
is that it gives -1 nice symbolic name and uses that name in the
definition of the atomic_t.

> Also, from converting many cmpxchg to try_cmpxchg, it becomes evident
> that in cases like this (usage in "if" clauses) the correct locking
> primitive is try_cmpxchg. Additionally, in this particular case, it
> is not the speed, but a little code save that can be achieved with
> the same functionality.

I think I understand what you're trying to say: using try_cmpxchg can
result in better code generation in general than plain cmpxchg. Thus,
it's more "correct" to use try_cmpxchg in any case where plain cmpxchg
is in use. Right?

I honestly don't think cmpxchg is more or less "correct" than
try_cmpxchg. If you're going to be sending patches like these, I'd
remove this kind of language from your changelogs and discussions
because it holds zero weight.

Here's what I'm afraid of: this patch is not tested enough. We apply it
and then start getting reports of reboot breaking on some server.
Someone spends two hours bisecting to this patch. We'll wonder after
the fact: Was this patch worth it?

I don't think what you have here is more descriptive than what was there
before. It still has a -1 and still doesn't logically connect it to the
'stopping_cpu' definition. I have no idea how much this was tested. It
doesn't _clearly_ move the needle enough on making the code better to
apply it.

We shouldn't be blindly converting cmpxchg=>try_cmpxchg, and then trying
to justify it after the fact. I _do_ agree that try_cmpxchg() leads to
better looking C code *AND* generated code. So, for new x86 code, it
seems like the best thing to do. But for old (working) code, I think it
should mostly be left alone.

Maybe you could keep an eye on:

> https://lore.kernel.org/lkml/?q=dfb%3Aatomic_cmpxchg+-dfb%3Aatomic_try_cmpxchg

and remind folks what they should be using, rather than expending
efforts on trying to move existing code over.