Re: [PATCH v3 06/11] x86/mm: Rework lazy TLB mode and TLB freshness tracking

From: Andy Lutomirski
Date: Wed Jun 21 2017 - 12:24:01 EST


On Wed, Jun 21, 2017 at 2:01 AM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> On Tue, 20 Jun 2017, Andy Lutomirski wrote:
>> -/*
>> - * The flush IPI assumes that a thread switch happens in this order:
>> - * [cpu0: the cpu that switches]
>> - * 1) switch_mm() either 1a) or 1b)
>> - * 1a) thread switch to a different mm
>> - * 1a1) set cpu_tlbstate to TLBSTATE_OK
>> - * Now the tlb flush NMI handler flush_tlb_func won't call leave_mm
>> - * if cpu0 was in lazy tlb mode.
>> - * 1a2) update cpu active_mm
>> - * Now cpu0 accepts tlb flushes for the new mm.
>> - * 1a3) cpu_set(cpu, new_mm->cpu_vm_mask);
>> - * Now the other cpus will send tlb flush ipis.
>> - * 1a4) change cr3.
>> - * 1a5) cpu_clear(cpu, old_mm->cpu_vm_mask);
>> - * Stop ipi delivery for the old mm. This is not synchronized with
>> - * the other cpus, but flush_tlb_func ignore flush ipis for the wrong
>> - * mm, and in the worst case we perform a superfluous tlb flush.
>> - * 1b) thread switch without mm change
>> - * cpu active_mm is correct, cpu0 already handles flush ipis.
>> - * 1b1) set cpu_tlbstate to TLBSTATE_OK
>> - * 1b2) test_and_set the cpu bit in cpu_vm_mask.
>> - * Atomically set the bit [other cpus will start sending flush ipis],
>> - * and test the bit.
>> - * 1b3) if the bit was 0: leave_mm was called, flush the tlb.
>> - * 2) switch %%esp, ie current
>> - *
>> - * The interrupt must handle 2 special cases:
>> - * - cr3 is changed before %%esp, ie. it cannot use current->{active_,}mm.
>> - * - the cpu performs speculative tlb reads, i.e. even if the cpu only
>> - * runs in kernel space, the cpu could load tlb entries for user space
>> - * pages.
>> - *
>> - * The good news is that cpu_tlbstate is local to each cpu, no
>> - * write/read ordering problems.
>
> While the new code is really well commented, it would be a good thing to
> have a single place where all of this, including the ordering constraints,
> is documented.

I'll look at the end of the whole series and see if I can come up with
something good.
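The removed comment describes a strict order of steps in switch_mm(). A hypothetical, single-threaded user-space model can make that order easier to follow. `sim_mm`, `sim_cpu`, `sim_switch_mm()`, `sim_leave_mm()` and `SIM_KERNEL_ONLY_PGD` are invented for illustration and are not kernel APIs. C11 atomics stand in for the cpumask operations and a plain variable stands in for CR3. The model also assumes, as in the old code, that leave_mm() may only run in lazy mode, clears the CPU's bit in the mm's mask, and points CR3 at kernel-only page tables. It shows the sequence 1a1 to 1a5 and 1b1 to 1b3 only, not the cross-CPU races the comment warns about.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of the removed comment's ordering; not kernel code. */

enum sim_tlbstate { SIM_TLBSTATE_OK = 1, SIM_TLBSTATE_LAZY = 2 };

/* Hypothetical stand-in for init_mm's kernel-only page tables. */
#define SIM_KERNEL_ONLY_PGD 0UL

struct sim_mm {
	atomic_ulong cpu_vm_mask;	/* bit n set => CPU n gets flush IPIs */
	unsigned long pgd;		/* page-table root loaded into "CR3" */
};

struct sim_cpu {
	unsigned int id;
	enum sim_tlbstate state;	/* models cpu_tlbstate.state */
	struct sim_mm *active_mm;	/* models cpu_tlbstate.active_mm */
	unsigned long cr3;		/* models the CR3 register */
	unsigned int full_flushes;	/* CR3 reloads done in step 1b3 */
};

static void sim_mm_init(struct sim_mm *mm, unsigned long pgd)
{
	atomic_init(&mm->cpu_vm_mask, 0UL);
	mm->pgd = pgd;
}

/* Models the old leave_mm(): only legal while lazy. */
static void sim_leave_mm(struct sim_cpu *c)
{
	assert(c->state != SIM_TLBSTATE_OK);	/* the old code BUG()s here */
	atomic_fetch_and(&c->active_mm->cpu_vm_mask, ~(1UL << c->id));
	c->cr3 = SIM_KERNEL_ONLY_PGD;
}

static void sim_switch_mm(struct sim_cpu *c, struct sim_mm *prev,
			  struct sim_mm *next)
{
	unsigned long bit = 1UL << c->id;

	if (prev != next) {
		c->state = SIM_TLBSTATE_OK;			/* 1a1 */
		c->active_mm = next;				/* 1a2 */
		atomic_fetch_or(&next->cpu_vm_mask, bit);	/* 1a3 */
		c->cr3 = next->pgd;				/* 1a4 */
		atomic_fetch_and(&prev->cpu_vm_mask, ~bit);	/* 1a5 */
		return;
	}

	c->state = SIM_TLBSTATE_OK;				/* 1b1 */
	/* 1b2: set our bit and learn, atomically, whether it was set. */
	if (!(atomic_fetch_or(&next->cpu_vm_mask, bit) & bit)) {
		/* 1b3: leave_mm() ran while we were lazy; reload CR3. */
		c->cr3 = next->pgd;
		c->full_flushes++;
	}
}
```

The step the model is meant to highlight is 1b2: because the bit is set and tested in one atomic operation, a CPU returning from lazy mode learns whether leave_mm() removed it from the mask in the meantime, and only in that case pays for a full flush.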

>
>> @@ -215,12 +200,13 @@ static void flush_tlb_func_common(const struct flush_tlb_info *f,
>> VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[0].ctx_id) !=
>> loaded_mm->context.ctx_id);
>>
>> - if (this_cpu_read(cpu_tlbstate.state) != TLBSTATE_OK) {
>> + if (!cpumask_test_cpu(smp_processor_id(), mm_cpumask(loaded_mm))) {
>> /*
>> - * leave_mm() is adequate to handle any type of flush, and
>> - * we would prefer not to receive further IPIs.
>> + * We're in lazy mode -- don't flush. We can get here on
>> + * remote flushes due to races and on local flushes if a
>> + * kernel thread coincidentally flushes the mm it's lazily
>> + * still using.
>
> Ok. That's more informative.
>
> Reviewed-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
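The hunk above replaces the per-CPU `TLBSTATE_OK` test with a membership test on `mm_cpumask(loaded_mm)`: if this CPU's bit is clear, it is treated as lazy and the flush is skipped. A minimal sketch of that decision follows, assuming a single-threaded user-space model. `model_mm`, `model_cpu`, `model_cpu_in_mm()`, `model_flush_tlb_func_common()` and `model_enter_lazy()` are hypothetical names, not kernel APIs. `model_enter_lazy()` exists only to put the CPU into the state the check tests for; the quoted hunk does not show where the patch actually clears the bit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the reworked lazy check; not kernel code. */

struct model_mm {
	atomic_ulong cpumask;		/* models mm_cpumask(mm) */
};

struct model_cpu {
	unsigned int id;
	struct model_mm *loaded_mm;	/* models cpu_tlbstate.loaded_mm */
	unsigned int flushes;		/* flushes actually performed */
};

static bool model_cpu_in_mm(const struct model_cpu *c, struct model_mm *mm)
{
	return (atomic_load(&mm->cpumask) & (1UL << c->id)) != 0;
}

/*
 * Hypothetical helper: the CPU keeps the mm loaded but drops out of
 * its cpumask, which is what the new check interprets as "lazy".
 */
static void model_enter_lazy(struct model_cpu *c)
{
	atomic_fetch_and(&c->loaded_mm->cpumask, ~(1UL << c->id));
}

/* Returns true if a flush was performed. */
static bool model_flush_tlb_func_common(struct model_cpu *c)
{
	if (!model_cpu_in_mm(c, c->loaded_mm)) {
		/*
		 * Lazy: don't flush.  Reached by a racing remote flush, or
		 * by a kernel thread flushing the mm it still has loaded.
		 */
		return false;
	}
	c->flushes++;
	return true;
}
```

Keying lazy mode off the cpumask means that, in this model, one bit answers both "should this CPU be sent flush IPIs?" and "should it act on a flush?", so a flush that arrives while the CPU is lazy is simply ignored, which is the behaviour the new comment describes.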