Re: [v3 2/3] mm: Defer TLB flush by keeping both src and dst folios at migration

From: Nadav Amit
Date: Thu Nov 09 2023 - 05:17:17 EST




> On Nov 8, 2023, at 6:12 AM, Byungchul Park <byungchul@xxxxxx> wrote:
>
> On Mon, Oct 30, 2023 at 09:51:30PM +0900, Byungchul Park wrote:
>>>> diff --git a/mm/memory.c b/mm/memory.c
>>>> index 6c264d2f969c..75dc48b6e15f 100644
>>>> --- a/mm/memory.c
>>>> +++ b/mm/memory.c
>>>> @@ -3359,6 +3359,19 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
>>>>  	if (vmf->page)
>>>>  		folio = page_folio(vmf->page);
>>>>
>>>> +	/*
>>>> +	 * This folio has its read copy to prevent inconsistency while
>>>> +	 * deferring TLB flushes. However, the problem might arise if
>>>> +	 * it's going to become writable.
>>>> +	 *
>>>> +	 * To prevent it, give up the deferring TLB flushes and perform
>>>> +	 * TLB flush right away.
>>>> +	 */
>>>> +	if (folio && migrc_pending_folio(folio)) {
>>>> +		migrc_unpend_folio(folio);
>>>> +		migrc_try_flush_free_folios(NULL);
>>>
>>> So many potential function calls… Probably they should have been combined
>>> into one and at least migrc_pending_folio() should have been an inline
>>> function in the header.
>>
>> I will try to change it as you mention.
>>
>>>> +	}
>>>> +
>>>
>>> What about mprotect? I thought David has changed it so it can set writable
>>> PTEs.
>>
>> I will check it out.
>
> I found mprotect stuff is already performing TLB flushes needed for it.
> So some redundant TLB flushes might happen by migrc but it's not that
> harmful I think. Thanks.

Let me explain the scenario I am concerned with. Assume page P is read-only
(RO) and is being migrated from Psrc to Pdst. Pointer “p” points to P.
Initially (*p == 0).

Let’s also assume we have an atomic variable “a”. Initially (a == 0).

I hope I got the migration function names right, but the problem itself
should be clear regardless.

CPU0                      CPU1                   CPU2             CPU3
----                      ----                   ----             ----
                          (user-mode)            (user-mode)

                          Access *p
                          [Psrc cached in TLB]

migrate_pages_batch()
-> migrate_folio_unmap()

[ PTE updated,
  still no flush ]

                                                                  mprotect(p,
                                                                           RW)

                                                                  [ Psrc is
                                                                    RW ]

                                                                  [ flush
                                                                    deferred ]


                                                 *p = 1 # Pdst

                                                 xchg(&a, 1)
                                                 mfence
                          if (a == 1)
                            assert(*p == 1);



At this point the assertion might fail: CPU2 wrote into Pdst, whereas CPU1
still reads from Psrc through its stale TLB entry. Under the x86 memory
model, userspace would not expect this outcome to be possible, so it can
lead to bugs.
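
For reference, the userspace side of the scenario can be sketched roughly as
below. This is only an illustration under my own assumptions: the names
(struct litmus, reader, writer, run_litmus) are made up for this sketch, and
it does not trigger migration by itself, so that part would have to be
provoked separately while the test runs. On a kernel that flushes correctly
the returned failure count should always be 0.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Shared state for one round; all names here are illustrative only. */
struct litmus {
	volatile int *p;	/* pointer "p" into page P */
	atomic_int a;		/* atomic variable "a" */
	atomic_int primed;	/* set once the reader has touched *p */
	int violated;		/* set if the reader saw a == 1 but *p != 1 */
};

/* CPU1 in the diagram: cache the RO translation, then check the invariant. */
static void *reader(void *arg)
{
	struct litmus *t = arg;

	(void)*t->p;			/* access *p while P is still RO */
	atomic_store(&t->primed, 1);

	while (atomic_load(&t->a) != 1)
		;			/* wait until the writer publishes a == 1 */
	t->violated = (*t->p != 1);	/* the assert(*p == 1) from the diagram */
	return NULL;
}

/* CPU2 in the diagram: write the page, then publish with a full barrier. */
static void *writer(void *arg)
{
	struct litmus *t = arg;

	*t->p = 1;
	atomic_exchange(&t->a, 1);	/* seq_cst exchange, an xchg on x86 */
	return NULL;
}

/* Runs the scenario `rounds` times; returns how often the check failed. */
int run_litmus(int rounds)
{
	size_t psz = (size_t)sysconf(_SC_PAGESIZE);
	int failures = 0;

	for (int i = 0; i < rounds; i++) {
		struct litmus t;
		pthread_t r, w;
		int rc;
		void *page = mmap(NULL, psz, PROT_READ | PROT_WRITE,
				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		assert(page != MAP_FAILED);
		*(volatile int *)page = 0;		/* populate P, *p == 0 */
		rc = mprotect(page, psz, PROT_READ);	/* P starts out RO */
		assert(rc == 0);

		t.p = page;
		t.violated = 0;
		atomic_init(&t.a, 0);
		atomic_init(&t.primed, 0);

		rc = pthread_create(&r, NULL, reader, &t);
		assert(rc == 0);
		while (!atomic_load(&t.primed))
			;				/* reader has accessed *p */

		/* CPU3 in the diagram: mprotect(p, RW). */
		rc = mprotect(page, psz, PROT_READ | PROT_WRITE);
		assert(rc == 0);

		rc = pthread_create(&w, NULL, writer, &t);
		assert(rc == 0);
		pthread_join(w, NULL);
		pthread_join(r, NULL);

		failures += t.violated;
		munmap(page, psz);
	}
	return failures;
}
```

Calling run_litmus(1000) from a small main() and checking that the result is
0 is enough to use it. Build with -pthread.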