Re: [PATCH v2 2/2] mm/mmu_gather: send tlb_remove_table_smp_sync IPI only to MM CPUs

From: Yang Shi
Date: Thu Jun 22 2023 - 23:39:18 EST


On Wed, Jun 21, 2023 at 11:02 AM Nadav Amit <nadav.amit@xxxxxxxxx> wrote:
>
> >
> > On Jun 20, 2023, at 7:46 AM, Yair Podemsky <ypodemsk@xxxxxxxxxx> wrote:
> >
> > @@ -1525,7 +1525,7 @@ static void collapse_and_free_pmd(struct mm_struct *mm, struct vm_area_struct *v
> > addr + HPAGE_PMD_SIZE);
> > mmu_notifier_invalidate_range_start(&range);
> > pmd = pmdp_collapse_flush(vma, addr, pmdp);
> > - tlb_remove_table_sync_one();
> > + tlb_remove_table_sync_one(mm);
>
> Can’t pmdp_collapse_flush() have one additional argument “freed_tables”
> that it would propagate, for instance on x86 to flush_tlb_mm_range() ?
> Then you would not need tlb_remove_table_sync_one() to issue an additional
> IPI, no?
>
> It just seems that you might still have 2 IPIs in many cases instead of
> one, and unless I am missing something, I don’t see why.

tlb_remove_table_sync_one() is used to serialize against fast GUP on
architectures that don't broadcast TLB flushes by IPI, for example
arm64. It may incur one extra IPI on x86 and some others, but x86
virtualization needs it since the guest may not flush the TLB by
sending IPIs, IIUC. So if the one extra IPI is really a problem, we
may be able to define an arch-specific function to deal with it, for
example a pv op, off the top of my head. But I'm not a virtualization
expert, so I'm not entirely sure whether that is the best way. And the
complexity seems like overkill TBH since khugepaged is usually not
called that often.
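
For what it's worth, my reading of the mm-targeted variant in this
patch is roughly the sketch below. This is just an illustration based
on the quoted hunk, not the actual v2 code; the on_each_cpu_mask() /
mm_cpumask() usage here is my assumption:

#include <linux/smp.h>
#include <linux/mm_types.h>

static void tlb_remove_table_smp_sync(void *arg)
{
	/* Simply deliver the interrupt; waiting for it is the sync. */
}

/* Sketch only: target the IPI at CPUs that may have this mm active. */
void tlb_remove_table_sync_one(struct mm_struct *mm)
{
	/*
	 * IRQ-disabled sections on the targeted CPUs (e.g. fast GUP)
	 * are guaranteed to have finished once this returns, since we
	 * wait for the handler to run on every CPU in mm_cpumask(mm)
	 * instead of broadcasting to all online CPUs.
	 */
	on_each_cpu_mask(mm_cpumask(mm), tlb_remove_table_smp_sync,
			 NULL, true);
}

The point of the $subject change, as I understand it, is exactly that
CPUs which never ran this mm don't get interrupted at all.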
