Re: [PATCH] workqueue: Fix memory ordering race in queue_work*()

From: Linus Torvalds
Date: Tue Aug 16 2022 - 04:06:34 EST


On Mon, Aug 15, 2022 at 9:15 PM Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx> wrote:
>
> Please revert this as test_and_set_bit was always supposed to be
> a full memory barrier. This is an arch bug.

Yes, the bitops are kind of strange for various legacy reasons:

- set/clear_bit is atomic, but without a memory barrier, and need a
"smp_mb__before_atomic()" or similar for barriers

- test_and_set/clear_bit() are atomic, _and_ are memory barriers

- test_and_set_bit_lock and test_and_clear_bit_unlock are atomic and
_weaker_ than full memory barriers, but sufficient for locking (ie
acquire/release)

Does any of this make any sense at all? No. But those are the
documented semantics exactly because people were worried about
test_and_set_bit being used for locking, since on x86 all the atomics
are also memory barriers.

>From looking at it, the asm-generic implementation is a bit
questionable, though. In particular, it does

if (READ_ONCE(*p) & mask)
return 1;

so it's *entirely* unordered for the "bit was already set" case.

That looks very wrong to me, since it basically means that the
test_and_set_bit() can return "bit was already set" based on an old
value - not a memory barrier at all.

So if you use "test_and_set_bit()" as some kind of "I've done my work,
now I am going to set the bit to tell people to pick it up", then that
early "bit was already set" code completely breaks it.

Now, arguably our atomic bitop semantics are very very odd, and it
might be time to revisit them. But that code looks very very buggy to
me.

The bug seems to go back to commit e986a0d6cb36 ("locking/atomics,
asm-generic/bitops/atomic.h: Rewrite using atomic_*() APIs"), and the
fix looks to be as simple as just removing that early READ_ONCE return
case (test_and_clear has the same bug).

Will?

IOW, the proper fix for this should, I think, look something like this
(whitespace mangled) thing:

--- a/include/asm-generic/bitops/atomic.h
+++ b/include/asm-generic/bitops/atomic.h
@@ -39,9 +39,6 @@ arch_test_and_set_bit(
unsigned long mask = BIT_MASK(nr);

p += BIT_WORD(nr);
- if (READ_ONCE(*p) & mask)
- return 1;
-
old = arch_atomic_long_fetch_or(mask, (atomic_long_t *)p);
return !!(old & mask);
}
@@ -53,9 +50,6 @@ arch_test_and_clear_bit
unsigned long mask = BIT_MASK(nr);

p += BIT_WORD(nr);
- if (!(READ_ONCE(*p) & mask))
- return 0;
-
old = arch_atomic_long_fetch_andnot(mask, (atomic_long_t *)p);
return !!(old & mask);
}

but the above is not just whitespace-damaged, it's entirely untested
and based purely on me looking at that code.

Linus