Re: [GIT PULL] locking changes for v6.4

From: Andrzej Hajda
Date: Thu May 04 2023 - 13:30:56 EST


Hi all,

Sorry for late response.
Just two cents to constructively defend myself.

On 29.04.2023 08:47, Ingo Molnar wrote:
* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

On Thu, Apr 27, 2023 at 12:58 PM Ingo Molnar <mingo@xxxxxxxxxx> wrote:
- Add non-atomic __xchg() variant, use it in a couple of places
Guys, this is insane, and completely unacceptable.

I pulled this, but I'm going to unpull it, because the code is
actively wrong and ugly.

It not only randomly decides to re-use a name that has existing users
that now need to be fixed up.
meh - you are 100% right, I'm not sure what we were thinking there ... [
actually, I know what we were thinking, but it's a bit complicated - see
the various non-perfect nomenclature options further below. ]

So the first line of our thinking was that "__" also often & additionally
means 'lighter weight version of a similar API signature, beware, here be
dragons, use at your own risk', and more of the focus of these particular
changes was on identifying hand-coded xchg-ish pieces of code, such as in:

26ace5d28d36 ("arch/*/uprobes: simplify arch_uretprobe_hijack_return_addr")

... but while that background of '__' is somewhat valid logic that we use
quite often in various kernel facilities, it doesn't really excuse the
sloppy decision to slap __ in front of an existing API without trying
harder, *especially* that a better name with fetch_and_zero() already
existed :-/

It then *also* decides to start "preferring" this absolutely
disgusting new name over a much more legible one in the i915 driver,
which had this same functionality except it used a prettier name:

fetch_and_zero()

My 1st approach was to follow C++ library and I have proposed 'exchange' [1][2].
Later I have also showed/suggested Rust's approach [3][4]:
- 'replace' for __xchg,
- 'take' for fetch_and_zero

Anyway it ended as is.

[1]: https://lore.kernel.org/lkml/Y5OFSvaYbv4XCxhE@xxxxxxxxxxxxxxxxxx/T/
[2]: https://en.cppreference.com/w/cpp/header/utility
[3]: https://doc.rust-lang.org/std/mem/index.html
[4]: https://lore.kernel.org/lkml/20230118153529.57695-1-andrzej.hajda@xxxxxxxxx/


But what then takes the cake for me is that this horribly ugly feature
then didn't even get that right, and only randomly converted *some* of
the users, with most of them remaining:

git grep fetch_and_zero drivers/gpu/drm/i915/ | wc
58 187 5534
git grep -w __xchg drivers/gpu/drm/i915/ | wc
22 109 1899

and it looks like the only "logic" to this is that the converted ones
were in the "gt/" subdirectory. What a random choice, but happily it
caused a trivial conflict, and as a result I noticed how bad things
were.

The logic was just to focus on introduction of the helper, providing few (not all) examples of usage and finish i915 conversion later, when the helper will be accepted.
Conversion should be split anyway - as i915 uses different repo for i915/gt, sorry for not clearly stating this.

Regards
Andrzej


Anyway, I really find this all offensively ugly and pointless. I'm not
going to pull some "fixed" version of this. This needs to go away and
never come back.
Yeah. So I've rebased locking/core to take out these changes - a simple
revert is too ugly and the history has no value here really.

Will re-send the rest of locking/core.

What was so magically great about the name "__xchg" that it needed to be
taken over by this function? And why was that legibly named version of it
replaced so randomly?
Yeah.

So fetch_and_zero() has a bit of a nomenclature & ambiguity problem as
well: there's already an atomic_fetch_*() API family, and it's easy to
think that fetch_and_zero() is atomic too - a bit like how xchg() is atomic
without mentioning 'atomic'.

Adding to the confusion is that there's already atomic APIs that don't use
atomic_t:

xchg()
cmpxchg()
try_cmpxchg()

... and by *that* implicit nomenclature logic, dropping the atomic_ from a
atomic_fetch_and_zero() API means: 'atomic API, not using atomic_t'. Which
fetch_and_zero() clearly isnt ...

So by all that logic and somewhat idiosynchratic API history, the new
facility should probably not be fetch_and_zero(), but something like
nonatomic_fetch_and_zero(), but that's quite a mouthful for something so
simple - and the API family connection to xchg() is lost as well, which is
a bit sad...

In all that context the least bad approach sounded to add a __ to denote
__xchg() is 'something special and also lighter weight' (which it is).

I *think* the bigger danger in locking nomenclature is to falsely imply
atomicity - in that sense I'm not sure fetch_and_zero() is ideal - but I
can certainly live with it b/c the perfect name keeps eluding me.

The *whole* point of two underscores is to say "don't use this - it's
an internal implementation". That's the historical meaning, and it's
the meaning we have in the kernel too. Two underscores means "this is
special and doesn't do everything required" (it might need locking
around it, for example).
Yeah. I do think we might want to keep one related change though:

e27cff3d2d43 ("arch: rename all internal names __xchg to __arch_xchg")

... not because we want to use the __xchg namespace, but because an _arch
prefix makes it even *less* likely to be used by non-infrastructure code.

So then making a new interface with two underscores and thinking "we
should now make random drivers use this" is fundamentally bogus.

Look, just grep for "__xchg" in the main tree (ie the one *without* this
change). It all makes sense. It's all clearly an internal helper - as
marked by that double underscore - and it's not used by any driver or
filesystem code.

Exactly like K&R and God intended.
Yeah. We'll try this new facility again in v6.5, but with a better name.
Sorry about that!

Thanks,

Ingo