Re: [PATCH v4 6/8] hugetlb: batch PMD split for bulk vmemmap dedup

From: Joao Martins
Date: Tue Sep 19 2023 - 04:28:09 EST


On 19/09/2023 07:42, Muchun Song wrote:
> On 2023/9/19 07:01, Mike Kravetz wrote:
>> From: Joao Martins <joao.m.martins@xxxxxxxxxx>
>>
>> In an effort to minimize amount of TLB flushes, batch all PMD splits
>> belonging to a range of pages in order to perform only 1 (global) TLB
>> flush.
>>
>> Add a flags field to the walker and pass whether it's a bulk allocation
>> or just a single page to decide to remap. First value
>> (VMEMMAP_SPLIT_NO_TLB_FLUSH) designates the request to not do the TLB
>> flush when we split the PMD.
>>
>> Rebased and updated by Mike Kravetz
>>
>> Signed-off-by: Joao Martins <joao.m.martins@xxxxxxxxxx>
>> Signed-off-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
>> ---
>>   mm/hugetlb_vmemmap.c | 79 +++++++++++++++++++++++++++++++++++++++++---
>>   1 file changed, 75 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>> index 147ed15bcae4..e8bc2f7567db 100644
>> --- a/mm/hugetlb_vmemmap.c
>> +++ b/mm/hugetlb_vmemmap.c
>> @@ -27,6 +27,7 @@
>>    * @reuse_addr:        the virtual address of the @reuse_page page.
>>    * @vmemmap_pages:    the list head of the vmemmap pages that can be freed
>>    *            or is mapped from.
>> + * @flags:        used to modify behavior in bulk operations
>
> Better to describe it as "used to modify behavior in vmemmap page table walking
> operations"
>
OK

>>   void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head
>> *folio_list)
>>   {
>>       struct folio *folio;
>>       LIST_HEAD(vmemmap_pages);
>>   +    list_for_each_entry(folio, folio_list, lru)
>> +        hugetlb_vmemmap_split(h, &folio->page);
>> +
>> +    flush_tlb_all();
>> +
>>       list_for_each_entry(folio, folio_list, lru) {
>>           int ret = __hugetlb_vmemmap_optimize(h, &folio->page,
>>                                   &vmemmap_pages);
>
> This is unlikely to be failed since the page table allocation
> is moved to the above

> (Note that the head vmemmap page allocation
> is not mandatory).

Good point that I almost forgot

> So we should handle the error case in the above
> splitting operation.

But back to the previous discussion in v2... the thinking was that /some/ PMDs
got split, and say could allow some PTE remapping to occur and free some pages
back (each page allows 6 more splits worst case). Then the next
__hugetlb_vmemmap_optimize() will have to split PMD pages again for those
hugepages that failed the batch PMD split (as we only defer the PTE remap tlb
flush in this stage).

Unless this isn't something worth handling

Joao