Re: [PATCH 4/7] x86,tlb: make lazy TLB mode lazier

From: Andy Lutomirski
Date: Wed Jul 18 2018 - 19:13:19 EST




> On Jul 18, 2018, at 10:58 AM, Rik van Riel <riel@xxxxxxxxxxx> wrote:
>
>
>
>> On Jul 17, 2018, at 4:04 PM, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>>
>>
>> I think you've introduced a minor-ish performance regression due to
>> changing the old (admittedly terribly documented) control flow a bit.
>> Before, if real_prev == next, we would skip:
>>
>> load_mm_cr4(next);
>> switch_ldt(real_prev, next);
>>
>> Now we don't any more. I think you should reinstate that
>> optimization. It's probably as simple as wrapping them in an if
>> (real_prev != next) with a comment like /* Remote changes that would
>> require a cr4 or ldt reload will unconditionally send an IPI even to
>> lazy CPUs. So, if we aren't changing our mm, we don't need to refresh
>> cr4 or the ldt */
>
> Looks like switch_ldt already skips reloading the LDT when prev equals
> next, or when they simply have the same LDT values:
>
> if (unlikely((unsigned long)prev->context.ldt |
>              (unsigned long)next->context.ldt))
>         load_mm_ldt(next);
>

Read that again? It will reload if there's an LDT, even if it's the same one.
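
To spell it out: the check ORs the two LDT pointers, so it only skips the
reload when *neither* mm has an LDT. A quick standalone model of that
decision (made-up struct and helper names, not the kernel code):

#include <stdio.h>

struct mm_ctx { void *ldt; };	/* stand-in for mm->context.ldt */

static int would_reload_ldt(struct mm_ctx *prev, struct mm_ctx *next)
{
	/* mirrors the switch_ldt() condition quoted above */
	return ((unsigned long)prev->ldt | (unsigned long)next->ldt) != 0;
}

int main(void)
{
	struct mm_ctx with_ldt = { .ldt = (void *)0x1000 };	/* mm that has an LDT */
	struct mm_ctx no_ldt   = { .ldt = NULL };		/* mm without one */

	/* prev == next but the mm has an LDT: the reload still happens */
	printf("same mm, has LDT: reload=%d\n", would_reload_ldt(&with_ldt, &with_ldt));
	/* only when there is no LDT at all does the check skip the reload */
	printf("same mm, no LDT : reload=%d\n", would_reload_ldt(&no_ldt, &no_ldt));
	return 0;
}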

> It appears that the cr4 bits have a similar optimization:
>
> static inline void cr4_set_bits(unsigned long mask)
> {
>         unsigned long cr4, flags;
>
>         local_irq_save(flags);
>         cr4 = this_cpu_read(cpu_tlbstate.cr4);
>         if ((cr4 | mask) != cr4)
>                 __cr4_set(cr4 | mask);
>         local_irq_restore(flags);
> }
>>
>> Hmm. load_mm_cr4() should bypass itself when mm == &init_mm. Want to
>> fix that part or should I?
>>
> Looks like there might not be anything to do here, after all.

But if init_mm and the thread that just went idle have different selected cr4 values, we'll still write cr4. With your lazy TLB work, it's less of a big deal, but still.

I'm happy to fix this myself, though.
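
Something along these lines is what I have in mind -- only a sketch, not a
tested patch, and mm_wants_rdpmc() is a placeholder for the PCE/rdpmc check
that load_mm_cr4() already does today:

static inline void load_mm_cr4(struct mm_struct *mm)
{
	/* Switching to the idle/kernel thread: don't touch cr4 at all. */
	if (mm == &init_mm)
		return;

	/* mm_wants_rdpmc() stands in for the existing rdpmc/PCE check */
	if (mm_wants_rdpmc(mm))
		cr4_set_bits(X86_CR4_PCE);
	else
		cr4_clear_bits(X86_CR4_PCE);
}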

>
> On to the lazy TLB mm_struct refcounting stuff :)
>

Which refcount? mm_users shouldn't be hot, so I assume you're talking about mm_count. My suggestion is to get rid of mm_count instead of trying to optimize it.
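
For reference, the pattern I mean is roughly this (paraphrased, not the exact
scheduler code): when a kernel thread is switched in it borrows the outgoing
task's mm and bumps mm_count, and the reference is dropped again in
finish_task_switch(). Those two atomics on a shared mm_struct are the cost
I'd rather remove entirely than shave:

	if (!next->mm) {				/* next is a kernel thread */
		next->active_mm = prev->active_mm;	/* borrow the previous mm */
		mmgrab(prev->active_mm);		/* atomic inc of mm_count */
		enter_lazy_tlb(prev->active_mm, next);
	}

	/* ... later, in finish_task_switch(), once the borrowed mm is let go: */
	mmdrop(mm);					/* atomic dec-and-test of mm_count */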