Re: [PATCH v3 2/3] mm/khugepaged: Fix GUP-fast interaction by sending IPI

From: Jann Horn
Date: Mon Nov 28 2022 - 11:59:11 EST


On Mon, Nov 28, 2022 at 2:46 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
> On 25.11.22 22:37, Jann Horn wrote:
> > Since commit 70cbc3cc78a99 ("mm: gup: fix the fast GUP race against THP
> > collapse"), the lockless_pages_from_mm() fastpath rechecks the pmd_t to
> > ensure that the page table was not removed by khugepaged in between.
> >
> > However, lockless_pages_from_mm() still requires that the page table is not
> > concurrently freed.
>
> That's an interesting point. For anon THPs, the page table won't get
> immediately freed, but instead will be deposited in the "pgtable list"
> stored alongside the THP.
>
> From there, it might get withdrawn (pgtable_trans_huge_withdraw()) and
>
> a) Reused as a page table when splitting the THP. That should be fine,
> no garbage in it, simply a page table again.

Depends on the definition of "fine" - it will be a page table again,
but deposited page tables are not associated with a specific address,
so it might be reused at a different address. If GUP-fast on address A
races with a page table from address A being deposited and reused at
address B, and then GUP-fast returns something from address B, that's
not exactly great either.

> b) Freed when zapping the THP (zap_deposited_table()). that would be bad.
>
> ... but I just realized that e.g., radix__pgtable_trans_huge_deposit
> uses actual page content to link the deposited page tables, which means
> we'd already storing garbage in there when depositing the page, not when
> freeing+reusing the page ....
>
> Maybe worth adding to the description.

Yeah, okay, I'll change the commit message and resend...

[...]
> With CONFIG_MMU_GATHER_RCU_TABLE_FREE this will most certainly do the
> right thing. I assume with CONFIG_MMU_GATHER_RCU_TABLE_FREE, the
> assumption is that there will be an implicit IPI.
>
> That implicit IPI has to happen before we deposit. I assume that is
> expected to happen during pmdp_collapse_flush() ?

Yeah, pmdp_collapse_flush() does a TLB flush, as the name says. And as
documented in a comment in mm/gup.c:

* Before activating this code, please be aware that the following assumptions
* are currently made:
*
* *) Either MMU_GATHER_RCU_TABLE_FREE is enabled, and
tlb_remove_table() is used to
* free pages containing page tables or TLB flushing requires IPI broadcast.

I'll go sprinkle that in a comment somewhere, either in the file or in
the commit message...