Re: [GIT PULL] ucount fix for v5.14-rc

From: Linus Torvalds
Date: Sat Aug 07 2021 - 21:02:20 EST


On Sat, Aug 7, 2021 at 5:42 PM Hillf Danton <hdanton@xxxxxxxx> wrote:
>
> Given the syzbot report, I doubt 3 is correct.

I doubt your whole scenario.

> If 3 is actually correct, however, the fix in this pull request is
> incorrect.

Why do you not accept the fact that the old code was buggy, and the
bug was that the alloc->find didn't increment the count from 0
correctly under the lock?

The fact is, the commit in question is ObviouslyCorrect(tm), and I
don't understand any of your arguments against it.

The old code would look up a ucounts entry, but then drop the lock
before incrementing it.

That explains *everything*. It means that you have this basic race:

Thread (a) on CPU1: starting out _without_ a reference to the
ucounts, look up entry under the lock, but don't increment the count,
release lock.

Thread (b) on CPU2: have a reference, do a put_ucounts(). Count goes
to zero, take the lock, unhash it, free the entry.

Thread (a) continues, increments the count on a UAF entry, triggers KASAN.
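
The interleaving above can be replayed deterministically in a small
userspace sketch. Everything here is a hypothetical stand-in, not the
kernel code: struct entry is not struct ucounts, the locking is only
marked in comments because the replay runs in a single thread, and a
"freed" flag replaces the real free() so the replay stays well defined.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ucounts; not the kernel definition. */
struct entry {
    int count;      /* reference count */
    bool hashed;    /* still reachable through lookup */
    bool freed;     /* set instead of calling free() */
};

/* Old-style lookup: finds the entry under the lock, takes no reference. */
static struct entry *buggy_lookup(struct entry *e)
{
    /* lock(); */
    struct entry *found = e->hashed ? e : NULL;
    /* unlock();  <- the count has not been incremented yet */
    return found;
}

/* Drop a reference; on zero, unhash and "free" the entry. */
static void put_entry(struct entry *e)
{
    if (--e->count == 0) {
        /* lock(); */
        e->hashed = false;
        e->freed = true;
        /* unlock(); */
    }
}

/* Replays the race; returns true if the increment hit a freed entry. */
static bool replay_race(void)
{
    struct entry e = { .count = 1, .hashed = true, .freed = false };

    struct entry *a = buggy_lookup(&e);  /* thread (a) on CPU1 */
    put_entry(&e);                       /* thread (b) on CPU2: 1 -> 0 */
    if (a == NULL)
        return false;

    bool uaf = a->freed;                 /* what KASAN would flag */
    a->count++;                          /* thread (a) continues */
    return uaf;
}
```

In this replay the lookup succeeds, the put frees the entry, and the
late increment lands on the freed entry, so replay_race() returns true.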

Look, the fix in question _fixes_ exactly the above. The KASAN traces
clearly show that alloc_ucounts() was involved. Now it does the right
thing, and it does the count increment under the lock, and
put_ucounts() uses atomic_dec_and_lock_irqsave().
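
A minimal userspace sketch of that corrected pattern, assuming a
pthread mutex in place of the spinlock and C11 atomics in place of
atomic_t. The names dec_and_lock(), get_entry(), put_entry() and
struct entry are hypothetical; this is not the kernel implementation,
and it ignores IRQ state and refcount overflow.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ucounts. */
struct entry {
    atomic_int count;   /* reference count */
    bool hashed;        /* protected by table_lock */
};

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Userspace analogue of atomic_dec_and_lock(): decrement the count and
 * return true with the lock held only if the count reached zero.
 */
static bool dec_and_lock(atomic_int *cnt, pthread_mutex_t *lock)
{
    int old = atomic_load(cnt);

    /* Fast path: not the last reference, so no lock is needed. */
    while (old > 1) {
        if (atomic_compare_exchange_weak(cnt, &old, old - 1))
            return false;
    }

    /* Slow path: maybe the last reference, so decrement under the lock. */
    pthread_mutex_lock(lock);
    if (atomic_fetch_sub(cnt, 1) == 1)
        return true;            /* caller must unlock */
    pthread_mutex_unlock(lock);
    return false;
}

/* Lookup that takes its reference before the lock is dropped. */
static struct entry *get_entry(struct entry *e)
{
    struct entry *found = NULL;

    pthread_mutex_lock(&table_lock);
    if (e->hashed) {
        atomic_fetch_add(&e->count, 1);
        found = e;
    }
    pthread_mutex_unlock(&table_lock);
    return found;
}

/* Returns true if this call dropped the last reference and unhashed. */
static bool put_entry(struct entry *e)
{
    if (!dec_and_lock(&e->count, &table_lock))
        return false;
    e->hashed = false;          /* unhash while holding the lock */
    pthread_mutex_unlock(&table_lock);
    return true;                /* caller may now free the entry */
}
```

Because the final decrement and the unhash happen under the same lock
that the lookup holds while incrementing, a lookup either takes its
reference before the count can reach zero or no longer finds the entry.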

And this isn't even an interesting case. This was not a subtle bug.
The ucounts code had an _obvious_ and unquestionable bug, and handled
this wrong. The ucounts refcount code wasn't even doing anything
unusual, it was just doing it BADLY and WRONG.

This situation is _literally_ why atomic_dec_and_lock exists in the
first place. The fact that the ucounts code had missed this all was
just a sad and pitiful bug, and it was just embarrassing that we
hadn't noticed the obvious problem with commit b6c336528926 ("Use
atomic_t for ucounts reference counting") earlier.

What is it you claim happens that _isn't_ just due to this stupid and
trivial bug? Because the scenario you outlined did not make sense, and
I've pointed out _why_ it did not.

Linus