Re: [GIT PULL] locking changes for v6.4

From: Ingo Molnar
Date: Sat Apr 29 2023 - 02:50:39 EST



* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Thu, Apr 27, 2023 at 12:58 PM Ingo Molnar <mingo@xxxxxxxxxx> wrote:
> >
> > - Add non-atomic __xchg() variant, use it in a couple of places
>
> Guys, this is insane, and completely unacceptable.
>
> I pulled this, but I'm going to unpull it, because the code is
> actively wrong and ugly.
>
> It not only randomly decides to re-use a name that has existing users
> that now need to be fixed up.

meh - you are 100% right, I'm not sure what we were thinking there ... [
actually, I know what we were thinking, but it's a bit complicated - see
the various non-perfect nomenclature options further below. ]

So the first line of our thinking was that "__" also often & additionally
means 'lighter weight version of a similar API signature, beware, here be
dragons, use at your own risk', and more of the focus of these particular
changes was on identifying hand-coded xchg-ish pieces of code, such as in:

26ace5d28d36 ("arch/*/uprobes: simplify arch_uretprobe_hijack_return_addr")

... but while that background of '__' is somewhat valid logic that we use
quite often in various kernel facilities, it doesn't really excuse the
sloppy decision to slap __ in front of an existing API without trying
harder, *especially* that a better name with fetch_and_zero() already
existed :-/

> It then *also* decides to start "preferring" this absolutely
> disgusting new name over a much more legible one in the i915 driver,
> which had this same functionality except it used a prettier name:
>
> fetch_and_zero()
>
> But what then takes the cake for me is that this horribly ugly feature
> then didn't even get that right, and only randomly converted *some* of
> the users, with most of them remaining:
>
> git grep fetch_and_zero drivers/gpu/drm/i915/ | wc
> 58 187 5534
> git grep -w __xchg drivers/gpu/drm/i915/ | wc
> 22 109 1899
>
> and it looks like the only "logic" to this is that the converted ones
> were in the "gt/" subdirectory. What a random choice, but happily it
> caused a trivial conflict, and as a result I noticed how bad things
> were.
>
> Anyway, I really find this all offensively ugly and pointless. I'm not
> going to pull some "fixed" version of this. This needs to go away and
> never come back.

Yeah. So I've rebased locking/core to take out these changes - a simple
revert is too ugly and the history has no value here really.

Will re-send the rest of locking/core.

> What was so magically great about the name "__xchg" that it needed to be
> taken over by this function? And why was that legibly named version of it
> replaced so randomly?

Yeah.

So fetch_and_zero() has a bit of a nomenclature & ambiguity problem as
well: there's already an atomic_fetch_*() API family, and it's easy to
think that fetch_and_zero() is atomic too - a bit like how xchg() is atomic
without mentioning 'atomic'.

Adding to the confusion is that there's already atomic APIs that don't use
atomic_t:

xchg()
cmpxchg()
try_cmpxchg()

... and by *that* implicit nomenclature logic, dropping the atomic_ from a
atomic_fetch_and_zero() API means: 'atomic API, not using atomic_t'. Which
fetch_and_zero() clearly isnt ...

So by all that logic and somewhat idiosynchratic API history, the new
facility should probably not be fetch_and_zero(), but something like
nonatomic_fetch_and_zero(), but that's quite a mouthful for something so
simple - and the API family connection to xchg() is lost as well, which is
a bit sad...

In all that context the least bad approach sounded to add a __ to denote
__xchg() is 'something special and also lighter weight' (which it is).

I *think* the bigger danger in locking nomenclature is to falsely imply
atomicity - in that sense I'm not sure fetch_and_zero() is ideal - but I
can certainly live with it b/c the perfect name keeps eluding me.

> The *whole* point of two underscores is to say "don't use this - it's
> an internal implementation". That's the historical meaning, and it's
> the meaning we have in the kernel too. Two underscores means "this is
> special and doesn't do everything required" (it might need locking
> around it, for example).

Yeah. I do think we might want to keep one related change though:

e27cff3d2d43 ("arch: rename all internal names __xchg to __arch_xchg")

... not because we want to use the __xchg namespace, but because an _arch
prefix makes it even *less* likely to be used by non-infrastructure code.

> So then making a new interface with two underscores and thinking "we
> should now make random drivers use this" is fundamentally bogus.
>
> Look, just grep for "__xchg" in the main tree (ie the one *without* this
> change). It all makes sense. It's all clearly an internal helper - as
> marked by that double underscore - and it's not used by any driver or
> filesystem code.
>
> Exactly like K&R and God intended.

Yeah. We'll try this new facility again in v6.5, but with a better name.
Sorry about that!

Thanks,

Ingo