Re: [PATCH 3/3] mm/mmu_gather: send tlb_remove_table_smp_sync IPI only to CPUs in kernel mode

From: David Hildenbrand
Date: Wed Apr 19 2023 - 07:31:53 EST


On 06.04.23 20:27, Peter Zijlstra wrote:
On Thu, Apr 06, 2023 at 05:51:52PM +0200, David Hildenbrand wrote:
On 06.04.23 17:02, Peter Zijlstra wrote:

DavidH, what do you think about reviving Jann's patches here:

https://bugs.chromium.org/p/project-zero/issues/detail?id=2365#c1

Those are far more invasive, but afaict they seem to do the right thing.


I recall seeing those while they were discussed on security@xxxxxxxxxx. What we
currently have was (IMHO for good reasons) deemed better to fix the issue,
especially when caring about backports and getting it right.

Yes, and I think that was the right call. However, we can now revisit
without having the pressure of a known defect and backport
considerations.

The alternative that was discussed in that context IIRC was to simply
allocate a fresh page table, place the fresh page table into the list
instead, and simply free the old page table (then using common machinery).

TBH, I'd wish (and recently raised) that we could just stop wasting memory
on page tables for THPs that are maybe never going to get PTE-mapped ... and
eventually just allocate on demand (with some caching?) and handle the
places where we're OOM and cannot PTE-map a THP in some decent way.

... instead of trying to figure out how to deal with these page tables we
cannot free but have to special-case simply because of GUP-fast.

Not keeping them around sounds good to me, but I'm not *that* familiar
with the THP code, most of that happened after I stopped tracking mm. So
I'm not sure how feasible it is.

But it does look entirely feasible to rework this page-table freeing
along the lines Jann did.

It's most probably more feasible, although the easiest would be to just allocate a fresh page table to deposit and free the old one using the mmu gatherer.

This way we can avoid the khugepaged use of tlb_remove_table_smp_sync(), but not the tlb_remove_table_one() usage. I suspect khugepaged isn't really relevant in RT kernels (IIRC, most RT setups disable THP completely).

tlb_remove_table_one() only triggers if __get_free_page(GFP_NOWAIT | __GFP_NOWARN) fails. IIUC, that can happen easily under memory pressure because it doesn't wait for direct reclaim.
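
To illustrate the fallback I mean, here is a rough userspace sketch, not the actual mm/mmu_gather.c code. The names struct gather, try_alloc_batch(), remove_table(), flush_batch(), BATCH_CAPACITY and the simulate_alloc_failure knob are all made up for illustration; malloc/free stand in for page-table pages, and a counter stands in for the sync IPI. The real batched path defers the free (IIRC via RCU) rather than freeing immediately as done here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical capacity; the kernel sizes a batch to fit in one page. */
#define BATCH_CAPACITY 4

struct table_batch {
	size_t nr;
	void *tables[BATCH_CAPACITY];
};

struct gather {
	struct table_batch *batch;
	bool simulate_alloc_failure;	/* stands in for memory pressure */
	unsigned int sync_ipis;		/* how often the slow path was taken */
	unsigned int batched_frees;	/* tables freed through a batch */
};

/* Stand-in for __get_free_page(GFP_NOWAIT | __GFP_NOWARN): may fail, never waits. */
static struct table_batch *try_alloc_batch(const struct gather *g)
{
	if (g->simulate_alloc_failure)
		return NULL;
	return calloc(1, sizeof(struct table_batch));
}

/* Stand-in for tlb_remove_table_one(): sync with other CPUs, then free right away. */
static void remove_table_one(struct gather *g, void *table)
{
	g->sync_ipis++;		/* the kernel would send the sync IPI here */
	free(table);
}

/* Free everything queued in the current batch (the kernel defers this step). */
static void flush_batch(struct gather *g)
{
	size_t i;

	if (!g->batch)
		return;
	for (i = 0; i < g->batch->nr; i++)
		free(g->batch->tables[i]);
	g->batched_frees += (unsigned int)g->batch->nr;
	free(g->batch);
	g->batch = NULL;
}

/* Queue a table for batched freeing; fall back to the IPI path if no batch page. */
static void remove_table(struct gather *g, void *table)
{
	if (!g->batch) {
		g->batch = try_alloc_batch(g);
		if (!g->batch) {
			remove_table_one(g, table);
			return;
		}
	}
	assert(g->batch->nr < BATCH_CAPACITY);
	g->batch->tables[g->batch->nr++] = table;
	if (g->batch->nr == BATCH_CAPACITY)
		flush_batch(g);
}
```

The point of the sketch: the IPI path is not limited to khugepaged, it is reachable from any page-table free whenever the non-blocking batch allocation fails.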

I don't know much about RT workloads (so I'd appreciate some feedback), but I guess we can run into memory pressure as well due to some !rt housekeeping task on the system?

--
Thanks,

David / dhildenb