Re: [PATCH V7 2/2] mm: shmem: implement POSIX_FADV_[WILL|DONT]NEED for shmem

From: Charan Teja Kalla
Date: Wed Feb 14 2024 - 04:14:39 EST


Hello Hugh,

Based on offline discussion with some folks on the list, it seems that
this syscall can be helpful. This patch might have been forgotten, and
I hope this ping helps in resurrecting the thread.

On 5/18/2023 6:16 PM, Charan Teja Kalla wrote:
> On 5/17/2023 5:02 PM, Hugh Dickins wrote:
>>> Sure, will include those range calculations for shmem pages too.
>> Oh, I forgot this issue, you would have liked me to look at V8 by now,
>> to see whether I agree with your resolution there. Sorry, no, I've
>> not been able to divert my concentration to it yet.
>>
>> And it's quite likely that I shall disagree, because I've a history of
>> disagreeing even with myself on such range widening/narrowing issues -
>> reconciling conflicting precedents is difficult 🙁
>>
> If you can at least help by commenting which part of the patch you
> disagree with, I can try hard to convince you there:) .
>
>>> Please let me know if I'm missing something where I should be counting
>>> these as NR_ISOLATED.
>> Please grep for NR_ISOLATED, to see where and how they get manipulated
>> already, and follow the existing examples. The case that sticks in my
>> mind is in mm/mempolicy.c, where the migrate_pages() syscall can build
>> up a gigantic quantity of transiently isolated pages: your syscall can
>> do the same, so should account for itself in the same way.

Based on the grep, almost all the call paths that isolate folios do so
for migration, and NR_ISOLATED is decremented after the migration (in
migrate_folio_done()). Those call paths are compaction, memory hotplug
and mempolicy.

The other call path is reclaim, where we isolate 'nr' pages belonging
to a pgdat and account/unaccount them in NR_ISOLATED across the reclaim.

I think it is easy to account for the above call paths as we know "which
folio corresponds to which pgdat".

Whereas in this patch, we are isolating a set of folios (which can
belong to different nodes) and relying on reclaim_pages() to do the
swap out. It is straightforward to account NR_ISOLATED while isolating,
but the unaccounting would have to happen in shrink_folio_list(), where
the folio is freed after swap out. Doing so requires changes in all the
other callers (eg: shrink_inactive_list()), which would then have to
account NR_ISOLATED while isolating and let shrink_folio_list()
unaccount it.

So, accounting NR_ISOLATED requires changes in other places in the code
that this patch does not touch.

If isolating a large number of pages without recording them in
NR_ISOLATED is really a problem, then may I please know your opinion on
isolating (without accounting) and reclaiming in small batches? The
batch size can be SWAP_CLUSTER_MAX pages.
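
To make the batching idea concrete, here is a rough userspace model of
it. This is only a sketch and not kernel code: reclaim_in_batches() and
struct model_stats are hypothetical names made up for illustration, the
"reclaim" step merely stands in for a reclaim_pages() call on the
isolated batch, and SWAP_CLUSTER_MAX is defined locally with the value
the kernel uses (32). The point it shows is that the number of
unaccounted isolated folios at any instant is bounded by the batch size
rather than by the size of the range.

```c
#include <assert.h>
#include <stddef.h>

/* Defined locally for this model; mirrors the kernel's value. */
#define SWAP_CLUSTER_MAX 32UL

/* Hypothetical bookkeeping, for illustration only. */
struct model_stats {
	size_t reclaimed;	/* total folios handed to "reclaim" */
	size_t peak_isolated;	/* most folios isolated at any one time */
	size_t batches;		/* number of isolate+reclaim rounds */
};

/*
 * Walk nr_folios folios, isolating at most batch_max of them before
 * each "reclaim" round, so that the isolated-but-unaccounted set never
 * grows beyond batch_max.
 */
static struct model_stats reclaim_in_batches(size_t nr_folios,
					     size_t batch_max)
{
	struct model_stats st = { 0, 0, 0 };
	size_t next = 0;

	assert(batch_max > 0);

	while (next < nr_folios) {
		size_t isolated = nr_folios - next;

		/* Isolate step: cap the batch at batch_max folios. */
		if (isolated > batch_max)
			isolated = batch_max;
		if (isolated > st.peak_isolated)
			st.peak_isolated = isolated;

		/* Stand-in for reclaim_pages() on the isolated batch. */
		st.reclaimed += isolated;
		next += isolated;
		st.batches++;
	}

	return st;
}
```

With 100 folios in the range, this does four rounds (32, 32, 32, 4) and
never has more than SWAP_CLUSTER_MAX folios isolated at once.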

> I had a V8 posted without this into accounting. Let me make the changes
> to account for the NR_ISOLATED too.

Thanks,
Charan