Re: [PATCH v3 0/3] A Solution to Re-enable hugetlb vmemmap optimize

From: Nanyong Sun
Date: Mon Mar 25 2024 - 16:07:25 EST


On 2024/3/14 7:32, David Rientjes wrote:

On Thu, 8 Feb 2024, Will Deacon wrote:

How about taking a new lock with IRQs disabled during BBM, like:

+void vmemmap_update_pte(unsigned long addr, pte_t *ptep, pte_t pte)
+{
+    spin_lock_irq(NEW_LOCK);
+    pte_clear(&init_mm, addr, ptep);
+    flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
+    set_pte_at(&init_mm, addr, ptep, pte);
+    spin_unlock_irq(NEW_LOCK);
+}

I really think the only maintainable way to achieve this is to avoid
the possibility of a fault altogether.

Will


Nanyong, are you still actively working on making HVO possible on arm64?

This would yield a substantial memory savings on hosts that are largely
configured with hugetlbfs. In our case, the size of this hugetlbfs pool
is actually never changed after boot, but it sounds from the thread that
there was an idea to make HVO conditional on FEAT_BBM. Is this being
pursued?

If so, any testing help needed?

I'm afraid FEAT_BBM may not solve the problem here. From the Arm ARM, I see
that FEAT_BBM only relaxes the break-before-make requirements when the block
size of a translation changes. It can therefore help in the PMD-split stage
of HVO, i.e. BBM can be avoided in vmemmap_split_pmd, but in the subsequent
vmemmap_remap_pte the output address of the PTE still has to change, and I'm
afraid FEAT_BBM cannot cover that stage. Perhaps my understanding of the Arm
FEAT_BBM is wrong, and I hope someone can correct me.
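To illustrate what I mean, a rough sketch of the two stages (not the real
kernel code, just the shape of the problem; the two helper names here are
made up for illustration):

#include <linux/mm.h>
#include <asm/pgalloc.h>
#include <asm/tlbflush.h>

/* Stage 1: split the vmemmap PMD into a table of PTEs. The output
 * addresses stay the same and only the block size changes, which is
 * the case FEAT_BBM (level 2) allows without break-before-make.
 */
static void vmemmap_split_pmd_sketch(pmd_t *pmdp, pte_t *pgtable,
                                     unsigned long addr)
{
    pmd_populate_kernel(&init_mm, pmdp, pgtable);
    flush_tlb_kernel_range(addr, addr + PMD_SIZE);
}

/* Stage 2: remap a vmemmap PTE to a new output address. This is not
 * a block size change, so full break-before-make is still required,
 * and the window between pte_clear() and set_pte_at() is where a
 * concurrent vmemmap access can fault.
 */
static void vmemmap_remap_pte_sketch(pte_t *ptep, pte_t newpte,
                                     unsigned long addr)
{
    pte_clear(&init_mm, addr, ptep);
    flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
    set_pte_at(&init_mm, addr, ptep, newpte);
}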
Actually, the solution I considered first was stop_machine(), but we have
products that rely on /proc/sys/vm/nr_overcommit_hugepages to allocate and
free hugepages dynamically, so I have to take the performance impact into
account. If your product never changes the number of hugepages after
booting, then stop_machine() may be a feasible way, roughly along the lines
of the sketch below.
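Something like this (only a rough sketch of the idea, untested; struct
vmemmap_remap_arg and the function names are made up for illustration):

#include <linux/stop_machine.h>
#include <linux/mm.h>
#include <asm/tlbflush.h>

/* Hypothetical argument block for one PTE remap. */
struct vmemmap_remap_arg {
    unsigned long addr; /* vmemmap address to remap */
    pte_t *ptep;        /* PTE to rewrite */
    pte_t newpte;       /* new PTE value (new output address) */
};

/*
 * Runs with all other CPUs spinning with interrupts disabled, so
 * nothing can access the vmemmap while the PTE is invalid, and the
 * break-before-make window is therefore harmless.
 */
static int vmemmap_remap_stop_machine_fn(void *data)
{
    struct vmemmap_remap_arg *arg = data;

    pte_clear(&init_mm, arg->addr, arg->ptep);
    flush_tlb_kernel_range(arg->addr, arg->addr + PAGE_SIZE);
    set_pte_at(&init_mm, arg->addr, arg->ptep, arg->newpte);
    return 0;
}

static void vmemmap_remap_pte_safe(unsigned long addr, pte_t *ptep,
                                   pte_t newpte)
{
    struct vmemmap_remap_arg arg = {
        .addr = addr, .ptep = ptep, .newpte = newpte,
    };

    /* Expensive: serializes every CPU for the duration. */
    stop_machine(vmemmap_remap_stop_machine_fn, &arg, NULL);
}

In practice one would batch all the PTEs of a vmemmap range into a single
stop_machine() call instead of one call per PTE, but even then the
whole-system pause is exactly the performance problem for workloads that
resize the hugepage pool frequently.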
So far, I still haven't come up with a good solution.