Re: [RFC 3/8] mm: Avoid using set_page_count() in set_page_recounted()

From: John Hubbard
Date: Mon Nov 01 2021 - 15:35:37 EST


On 11/1/21 07:30, Pasha Tatashin wrote:
On Wed, Oct 27, 2021 at 9:35 PM John Hubbard <jhubbard@xxxxxxxxxx> wrote:

On 10/27/21 18:20, John Hubbard wrote:
But it's still not good to have this function name doing something completely
different than its name indicates.

I see, I can rename it to: 'set_page_recounted/get_page_recounted' ?


What? No, that's not where I was going at all. The function is already
named set_page_refcounted(), and one of the problems I see is that your
changes turn it into something that most certainly does not
set_page_refcounted(). Instead, this patch *increments* the refcount.
That is not the same thing.

And then it uses a .config-sensitive assertion to "prevent" problems.
And by that I mean, the wording throughout this series seems to equate
VM_BUG_ON_PAGE() assertions with real assertions. They are only active,
however, in CONFIG_DEBUG_VM configurations, and provide no protection at
all for normal (most distros) users. That's something that the wording,
comments, and even design should be tweaked to account for.
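
To make the point about config-sensitive assertions concrete, here is a
minimal userspace sketch. It is not the kernel's code: SKETCH_DEBUG_VM,
SKETCH_VM_BUG_ON(), struct sketch_page and sketch_set_page_refcounted() are
hypothetical names, the macro is only loosely modeled on how VM_BUG_ON-style
checks compile away, and the counter is a plain int rather than an atomic.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for CONFIG_DEBUG_VM: build with -DSKETCH_DEBUG_VM
 * to turn the check on. */
#ifdef SKETCH_DEBUG_VM
#define SKETCH_VM_BUG_ON(cond)                              \
    do {                                                    \
        if (cond) {                                         \
            fprintf(stderr, "BUG: %s\n", #cond);            \
            abort();                                        \
        }                                                   \
    } while (0)
#else
/* Check compiled out: the condition is type-checked but never evaluated. */
#define SKETCH_VM_BUG_ON(cond) ((void)sizeof((int)(cond)))
#endif

/* Hypothetical page with a plain (non-atomic) refcount, for illustration. */
struct sketch_page {
    int refcount;
};

/* Increment-based variant: assumes the old value was zero, but only
 * verifies that assumption when SKETCH_DEBUG_VM is defined. */
static int sketch_set_page_refcounted(struct sketch_page *page)
{
    int old = page->refcount++;

    SKETCH_VM_BUG_ON(old != 0);
    return page->refcount;
}
```

Built without -DSKETCH_DEBUG_VM, a page whose refcount was already 706 simply
ends up at 707 and execution continues. The assertion offers no protection in
that configuration.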

...and to clarify a bit more, maybe this also helps:

These patches are attempting to improve debugging, and that is fine, as

They are attempting to catch potential race conditions where
_refcount is changed between the time we verified what it was and we
set it to something else.

They also attempt to prevent overflow and underflow bugs which are
not all tested today, but can be tested with this patch set at least
on kernels where DEBUG_VM is enabled.

OK, but did you get my point about the naming problem?


far as debugging goes. However, a point that seems to be slightly
misunderstood is: incrementing a bad refcount value is not actually any
better than overwriting it, from a recovery point of view. Maybe (?)
it's better from a debugging point of view.

It is better for debugging as well: if one is tracing the page
_refcount history, knowing that the _refcount can only be
incremented/decremented/frozen/unfrozen provides a contiguous history
of refcount that can be tracked. When we set the refcount in some
places, as we do today, the contiguous history is lost, as we do not
know the actual _refcount value at the time of the set operation.


OK, that is a reasonable argument. Let's put it somewhere, maybe in a
comment block, if it's not already there.
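
As a rough illustration of the history argument, here is a userspace sketch
using C11 atomics. The names struct traced_page, traced_set_count() and
traced_inc_return() are hypothetical and the printf() calls merely stand in
for a tracepoint. This is not what the patch series implements.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical page with an atomic refcount, for illustration only. */
struct traced_page {
    atomic_int refcount;
};

/* Overwrite: the previous value is never read, so the trace can record
 * only the new value and the history has a gap. */
static void traced_set_count(struct traced_page *page, int v)
{
    atomic_store(&page->refcount, v);
    printf("set: ? -> %d\n", v);
}

/* Increment: the atomic read-modify-write returns the old value, so
 * every trace entry links the previous value to the new one. */
static int traced_inc_return(struct traced_page *page)
{
    int old = atomic_fetch_add(&page->refcount, 1);

    printf("inc: %d -> %d\n", old, old + 1);
    return old + 1;
}
```

With only increment-style operations, each logged entry starts from the value
where the previous one ended. A set operation breaks that chain because the
value it overwrote was never observed.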


That's because the problem occurred before this code, and its debug-only
assertions, ran. Once here, the code cannot actually recover: there is
no automatic way to recover from a refcount that is 1, -1, 2, or 706,
when it was supposed to be zero. Incrementing it is, again, not really
necessarily better than setting: setting it might actually make the
broken system appear to run--and in some cases, even avoid symptoms.
Whereas incrementing doesn't cover anything up. The only thing you can
really do is just panic() or BUG().

This is what my patch series attempts to do. I chose to use VM_BUG()
instead of BUG() because this is VM code, and to avoid potential
performance regressions for those who choose performance over possible
security implications.

Yes, the VM_BUG() vs. BUG() is awkward. But you cannot rely on VM_BUG()
to stop the system, even if Fedora does turn it on.



Don't get me wrong, I don't want bugs covered up. But the claim that
incrementing is somehow better deserves some actual thinking about it.

I think it does, I described my points above, if you still disagree
please let me know.

Thank you for providing your thoughts on this RFC, I will send out a
new version, and we can continue discussion in the new thread.

Pasha


Yes, let's see what it looks like.

thanks,
--
John Hubbard
NVIDIA