Re: [PATCH] mm:vmscan: shrink skip folios in the exiting task

From: zhiguojiang
Date: Sun Jan 28 2024 - 21:21:55 EST

On 2024/1/28 14:35, Matthew Wilcox wrote:
>>> On Thu, Jan 25, 2024 at 09:34:53AM +0800, zhiguojiang wrote:
>>>> In the scenarios of the low memory system and mutil backed-applications,
>>>> the time-consuming problem caused by shrinking the exiting task's folios
>>>> will be more severe.
>>> What testing have you done of this patch?  How often does it happen?
>>> Are there particular workloads that benefit from this?  (I'm not sure
>>> what "mutil backed-applications" are?)
>> 1 Yes, this patch has been tested.
>>
>> 2 When a task exit and shrink_inactive_list occur at the same time,
>>    the folios which shrink_inactive_list reclaims may be the exiting
>>    task's folios in lruvecs. And when the system is low on memory, this
>>    is more likely to occur, because more backend applications will be
>>    killed.
>>    shrink_inactive_list reclaims the exiting task's folios in lruvecs
>>    and swaps the exiting task's anon folios out to swap memory, which
>>    increases the load of the exiting task.
> Ah, we're talking about an OOM scenario.  OK, that makes some sense.
> You should have mentioned that in the patch description.
Hi,

1 Ok, I will update to a more comprehensive description in the next version.

2 I think this issue can occur not only in the OOM scenario, but also in
  the normal task exit scenario. So:

1) if (unlikely(!atomic_read(&vma->vm_mm->mm_users))) represents
   the scenario where the task exits normally (see the mmput() excerpt
   below).

2) if (test_bit(MMF_OOM_SKIP, &vma->vm_mm->flags)) represents the
   OOM reaper scenario.
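
As a reference for 1): during a normal exit, the final mmput() drops
mm_users to zero before the address space teardown begins, so reclaim can
observe mm_users == 0 while the exiting task's folios are still on the
lru. A minimal excerpt, abridged from mmput() in kernel/fork.c (please
double-check against the current tree):

    /*
     * Abridged from mmput() in kernel/fork.c: the final mmput() drops
     * mm_users to zero and only then tears down the address space, so
     * mm_users == 0 is visible while exit_mmap() is still freeing the
     * task's folios.
     */
    void mmput(struct mm_struct *mm)
    {
        might_sleep();

        if (atomic_dec_and_test(&mm->mm_users))
            __mmput(mm);    /* __mmput() ends up calling exit_mmap(mm) */
    }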

3 MMF_OOM_SKIP alone can only represent the OOM scenario and cannot
  represent the normal task exit scenario: by the time MMF_OOM_SKIP is
  set in the normal exit path, the task's memory folios have already
  been released, whereas shrink_inactive_list should recognize the lru
  folios of an exiting task before that task releases them. See the
  normal exit path in exit_mmap(), where the flag is set only after
  unmap_vmas() has freed the folios:

    tlb_gather_mmu_fullmm(&tlb, mm);
    /* update_hiwater_rss(mm) here? but nobody should be looking */
    /* Use ULONG_MAX here to ensure all VMAs in the mm are unmapped */
    unmap_vmas(&tlb, &mas, vma, 0, ULONG_MAX, ULONG_MAX, false);
    mmap_read_unlock(mm);

    /*
     * Set MMF_OOM_SKIP to hide this task from the oom killer/reaper
     * because the memory has been already freed.
     */
    set_bit(MMF_OOM_SKIP, &mm->flags);
4 This patch can reduce the load of the exiting task: it lets the
   exiting task release its anon folios faster, instead of
   shrink_inactive_list swapping those anon folios out first and the
   exiting task then having to release the resulting swap memory.

4 "mutil backed-applications" means that there are a large number of
    the backend applications in the system.

Thanks
>>> And I do mean specifically of this patch, because to my eyes it
>>> shouldn't even compile.
>> Has been tested.
> That's odd.  I thought GCC used to complain about
>
> 	long x = 0x100000000;
>
> but it does the right thing.  Except on 32-bit where it'll say
> "warning: integer constant out of range".
>
> In any case, the number you chose is not going to work on 32-bit systems
> or on ARM or x86.  It conflicts with protection keys on x86 and MTE on
> ARM.
You're right, I overlooked the situation with the 32-bit system.
> We can do it without any new magic numbers.  I'm not sure this is a
> great approach, but this should work:
>
> +++ b/mm/rmap.c
> @@ -840,6 +840,12 @@ static bool folio_referenced_one(struct folio *folio,
>  	int referenced = 0;
>  	unsigned long start = address, ptes = 0;
>
> +	/* Skip this folio if it's mapped by an exiting task */
> +	if (test_bit(MMF_OOM_SKIP, &vma->vm_mm->flags)) {
> +		pra->referenced = -1;
> +		return false;
> +	}
> +
>  	while (page_vma_mapped_walk(&pvmw)) {
>  		address = pvmw.address;
I agree with your point of view. I think this is currently the best
solution, but I think the normal task exit check also needs to be added:

if (unlikely(!atomic_read(&vma->vm_mm->mm_users)))
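
For clarity, a sketch of the combined condition, assuming it sits at the
same spot in folio_referenced_one() as in your diff (untested, only to
illustrate the proposal):

    /*
     * Sketch only, not a tested patch: combine the normal-exit check
     * with the MMF_OOM_SKIP check from the diff above, so folios
     * mapped by a task that is exiting either way get skipped by the
     * reference scan.
     */
    /* Skip this folio if it's mapped by an exiting task */
    if (unlikely(!atomic_read(&vma->vm_mm->mm_users)) ||
        test_bit(MMF_OOM_SKIP, &vma->vm_mm->flags)) {
        pra->referenced = -1;
        return false;
    }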

Please help review it again.

Best Regards.