Re: [PATCH v4 2/5] mm: LARGE_ANON_FOLIO for improved performance

From: Ryan Roberts
Date: Wed Aug 09 2023 - 12:08:40 EST


[...]

>>>> Let me reiterate [1]:
>>>> My impression is we only agreed on one thing: at the current stage, we
>>>> should respect things we absolutely have to. We didn't agree on what
>>>> "never" means ("never 2MB" or "never >4KB"), and we didn't touch on
>>>> how "always" should behave at all.
>>>>
>>>> And [2]:
>>>> (Thanks to David, now I agree that) we have to interpret MADV_NOHUGEPAGE
>>>> as nothing >4KB.
>>>>
>>>> My final take [3]:
>>>> I agree these points require more discussion. But I don't think we
>>>> need to conclude them now, unless they cause correctness issues like
>>>> ignoring MADV_NOHUGEPAGE would.
>>>
>>> Thanks, I've read all of these comments previously, and appreciate the time you
>>> have put into the feedback. I'm not sure I fully agree with your point that we
>>> don't need to conclude on a policy now; I certainly don't think we need the
>>> whole thing in place on day 1, but I do think that whatever we put in should
>>> strive to be a strict subset of where we think we are going. For example, if we
>>> put something in with one policy (i.e. "never" only means "never 2MB") then find
>>> a problem and have to change that to be more conservative, are we risking perf
>>> regressions for any LAF users that started using it on day 1?
>>
>> It's not that I don't want to -- I just don't think we have enough
>> information before we have a wider deployment [1] and gain a better
>> understanding of real-world scenarios.
>>
>> Of course we could force a conclusion, a mostly opinion-based one. But
>> it would still involve prolonged discussions and delay this series, or
>> rush into decisions we might regret later.
>>
>> [1] Our fleets (servers, laptops and phones) support large-scale
>> experiments and I plan to run them on both client and server devices.

This all sounds great and I'm looking forward to seeing results! But I guess I
had been assuming that this sort of testing would be preferable to do before we
merge; that allows us to get confidence in the approach and reduces the changes
of having to change it later. I guess you have policies that prevent you from
testing this series at the scale you want until it is merged?

I'm not convinced this testing will help us answer the "what does never mean?"
question; if nothing breaks in your testing, it doesn't mean there aren't
systems out there that would break - it's hard to prove a negative. I think its
mostly embedded systems that use thp=never to reduce memory footprint to the
absolute minimum?


>>
>>>> But I should have been clear about the parameters to
>>>> hugepage_vma_check(): enforce_sysfs=false.
>>>
>>> So hugepage_vma_check(..., smaps=false, in_pf=true, enforce_sysfs=false) would
>>> give us:
>>>
>>> | prctl/fw | sysfs | sysfs | sysfs
>>> | disable | never | madvise | always
>>> ----------------|-----------|-----------|-----------|-----------
>>> no hint | S | LAF>S | LAF>S | THP>LAF>S
>>> MADV_HUGEPAGE | S | LAF>S | THP>LAF>S | THP>LAF>S
>>> MADV_NOHUGEPAGE | S | S | S | S
>>>
>>> Where "prctl/fw disable" trumps the sysfs setting.
>>>
>>> I can certainly see the benefit of this approach; it gives us a way to enable
>>> LAF while disabling THP (thp=never). It doesn't give us a way to enable THP
>>> without enabling LAF though (unless you recompile with LAF disabled). Does
>>> anyone see a problem with this?
>>
>> I do myself :)
>>
>> This is just something temporary to get this series landed. We are
>> hiding behind a Kconfig, not making any ABI changes, and not exposing
>> this policy to userspace (i.e., not updating Documentation/, man
>> pages, etc.)
>>
>> Meanwhile, we can keep discussing all the open questions in parallel.

You're right - don't want to slow down the testing, so I'm going to post a v5
tomorrow with the policy in the table above. We're still waiting for the
prerequisites to land before we can kick off testing in anger though.

>
> And the stat ABI changes should be discussed before or at the same
> time. If we came up with a policy but there was *zero* observability
> of how well that policy works...

Yep agreed. I have a series at [1] which I hoped would kickstart that discussion.

[1] https://lore.kernel.org/linux-mm/20230613160950.3554675-1-ryan.roberts@xxxxxxx/

Thanks,
Ryan