Re: [RFC PATCH 2/3] x86/mm: make sure LAM is up-to-date during context switching

From: Dave Hansen
Date: Thu Mar 07 2024 - 16:40:03 EST


On 3/7/24 13:04, Yosry Ahmed wrote:
> I thought about doing inc_mm_tlb_gen() when LAM is updated, but it felt
> hacky and more importantly doesn't make it clear in switch_mm_irqs_off()
> that we correctly handle LAM updates. We can certainly add a comment,
> but I think an explicit check for CPU LAM vs. mm LAM is much clearer.
>
> WDYT?

The mm generations are literally there so that if the mm changes that
all the CPUs know they need an update. Changing LAM enabling is 100%
consistent with telling other CPUs that they need an update.

I'd be curious of Andy feels differently though.

>> Considering how fun this code path is, a little effort at an actual
>> reproduction would be really appreciated.
>
> I tried reproducing it but gave up quickly. We need a certain sequence
> of events to happen:
>
> CPU 1 CPU 2
> kthread_use_mm()
> /* user thread enables LAM */
> context_switch()
> context_switch() /* to user thread */

First, it would be fine to either create a new kthread for reproduction
purposes or to hack an existing one. For instance, have have the LAM
prctl() take an extra ref on the mm and stick it in a global variable:

mmgrab(current->mm);
global_mm = current->mm;

Then in the kthread, grab the mm and use it:

while (!global_mm);
kthread_use_mm(global_mm);
... check for the race
mmdrop(global_mm);

You can also hackily wait for thread to move with a stupid spin loop:

while (smp_processor_id() != 1);

and then actually move it with sched_setaffinity() from userspace. That
can make it easier to get that series of events to happen in lockstep.