Re: [PATCH v4 6/6] mm: madvise: Avoid split during MADV_PAGEOUT and MADV_COLD

From: Ryan Roberts
Date: Fri Mar 15 2024 - 06:55:36 EST


On 15/03/2024 10:35, David Hildenbrand wrote:
>> -        if (!pageout && pte_young(ptent)) {
>> -            ptent = ptep_get_and_clear_full(mm, addr, pte,
>> -                            tlb->fullmm);
>> -            ptent = pte_mkold(ptent);
>> -            set_pte_at(mm, addr, pte, ptent);
>> -            tlb_remove_tlb_entry(tlb, pte, addr);
>> +        if (!pageout) {
>> +            for (; nr != 0; nr--, pte++, addr += PAGE_SIZE) {
>> +                if (ptep_test_and_clear_young(vma, addr, pte))
>> +                    tlb_remove_tlb_entry(tlb, pte, addr);
>> +            }
>>           }
>
>
> The following might turn out a bit nicer: Make folio_pte_batch() collect
> "any_young", then doing something like we do with "any_writable" in the fork()
> case:
>
> ...
>     nr = folio_pte_batch(folio, addr, pte, ptent, max_nr,
>                  fpb_flags, NULL, &any_young);
>     if (any_young)
>         ptent = pte_mkyoung(ptent);
> ...
>
> if (!pageout && pte_young(ptent)) {
>     mkold_full_ptes(mm, addr, pte, nr, tlb->fullmm);
>     tlb_remove_tlb_entries(tlb, pte, nr, addr);
> }
>

I thought about that but decided that it would be better to only TLBI the
entries that were actually young. However, looking at tlb_remove_tlb_entry(),
I see that it just maintains a range between the lowest and highest address,
so flushing all nr entries won't actually make any difference.

So, yes, this will be a nice improvement, and it also avoids the O(n^2) pte
reads in the contpte case. I'll change it in the next version.