Re: [EXTERNAL] Re: [PATCH v3] mm/thp: fix "mm: thp: kill __transhuge_page_enabled()"

From: Zach O'Keefe
Date: Fri Aug 25 2023 - 11:10:53 EST


On Fri, Aug 25, 2023 at 5:58 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 25.08.23 14:49, Matthew Wilcox wrote:
> > On Fri, Aug 25, 2023 at 09:59:23AM +0200, David Hildenbrand wrote:
> >> Especially, we do have bigger ->huge_fault changes coming up:
> >>
> >> https://lkml.kernel.org/r/20230818202335.2739663-1-willy@xxxxxxxxxxxxx

FWIW, one of those patches updates the docs to read,

"->huge_fault() is called when there is no PUD or PMD entry present. This
gives the filesystem the opportunity to install a PUD or PMD sized page.
Filesystems can also use the ->fault method to return a PMD sized page,
so implementing this function may not be necessary. In particular,
filesystems should not call filemap_fault() from ->huge_fault(). [..]"

Which won't work (in the general case) without this patch (well, at
least the ->huge_fault() check part).

So, if we're advertising this is the way it works, maybe that gives a
stronger argument for addressing it sooner vs when the first in-tree
user depends on it?

> >> If the driver is not in the tree, people don't care.
> >>
> >> You really should try upstreaming that driver.
> >>
> >>
> >> So this patch here adds complexity (which I don't like) in order to keep an
> >> OOT driver working -- possibly for a short time. I'm tempted to say "please
> >> fix your driver to not use huge faults in that scenario, it is no longer
> >> supported".
> >>
> >> But I'm just about to vanish for 1.5 week into vacation :)
> >>
> >> @Willy, what are your thoughts?
> >
> > Fundamentally there was a bad assumption with the original patch --
> > it assumed that the only reason to support ->huge_fault was for DAX,
> > and that's not true. It's just that the only drivers in-tree which
> > support ->huge_fault do so in order to support DAX.
>
> Okay, and we are willing to continue supporting that then and it's
> nothing we want to stop OOT drivers from doing.
>
> Fine with me; we should probably reflect that in the patch description.

I can change these paragraphs,

"During the review of the above commits, it was determined that in-tree
users weren't affected by the change; most notably, since the only relevant
user (in terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is
explicitly approved early in approval logic. However, there is at least
one occurrence where an out-of-tree driver that used
VM_HUGEPAGE|VM_MIXEDMAP with a vm_ops->huge_fault handler, was broken.

Remove the VM_NO_KHUGEPAGED check when not in collapse path and give
any ->huge_fault handler a chance to handle the fault. Note that we
don't validate the file mode or mapping alignment, which is consistent
with the behavior before the aforementioned commits."

To read,

"The above commits, however, overfit the existing in-tree use cases,
and assume that
the only reason to support ->huge_fault was for DAX (which is
explicitly approved early in the approval logic).
This is a bad assumption to make and unnecessarily prevents general
support of ->huge_fault by filesystems. Allow returning "true" if such
a handler exists, giving the fault path an opportunity to exercise it.

Similarly, the rationale for including the VM_NO_KHUGEPAGED check
along the fault path was that it didn't alter any in-tree users, but
was likewise similarly unnecessarily restrictive (and reads odd).
Remove the check from the fault path."

> >
> > Keeping a driver out of tree is always a risky and costly proposition.
> > It will continue to be broken by core kernel changes, particularly
> > if/when it does unusual things.
> >
>
> Yes.
>
> > I think the complexity is entirely on us. I think there's a simpler way
> > to handle the problem, but I'd start by turning all of this "admin and
> > app get to control when THP are used" nonsense into no-ops.
>
> Well, simpler, yes, but also more controversial :)
>
> --
> Cheers,
>
> David / dhildenb
>