Re: [PATCH v3 08/10] x86/hyper-v: use hypercall for remote TLB flush

From: Andy Lutomirski
Date: Mon May 22 2017 - 14:28:33 EST


On Mon, May 22, 2017 at 3:43 AM, Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> wrote:
> Andy Lutomirski <luto@xxxxxxxxxx> writes:
>
>> On 05/19/2017 07:09 AM, Vitaly Kuznetsov wrote:
>>> The Hyper-V host can suggest that we use a hypercall for remote TLB
>>> flushes; this is supposed to be faster than IPIs.
>>>
>>> Implementation details: the HvFlushVirtualAddress{Space,List}
>>> hypercalls need their input placed in memory, and we don't want to
>>> allocate memory on each call, so we pre-allocate per-cpu memory areas
>>> on boot. These areas are of fixed size, limited to an arbitrary
>>> maximum of 16 GVA entries (16 GVAs can specify 16 * 4096 pages).
>>>
>>> pv_ops patching happens very early, so we need to separate
>>> hyperv_setup_mmu_ops() and hyper_alloc_mmu().
>>>
>>> It would also be possible and easy to implement local TLB flushing,
>>> and there is even a hint for that. However, I don't see room for
>>> optimization on the host side, as both the hypercall and a native TLB
>>> flush result in a vmexit. The hint is also not set on modern Hyper-V
>>> versions.
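
For context, the pre-allocation described above presumably amounts to
something like the sketch below; the identifiers and layout here are my
guesses, not the actual patch:

#define HV_FLUSH_MAX_GVAS	16

struct hv_flush_pcpu {
	u64 address_space;		/* CR3 of the target mm */
	u64 flags;			/* HV_FLUSH_* control flags */
	u64 processor_mask;		/* which CPUs to flush */
	u64 gva_list[HV_FLUSH_MAX_GVAS];/* each entry covers up to 4096 pages */
};

static DEFINE_PER_CPU(struct hv_flush_pcpu *, hv_flush_args);

static int __init hv_alloc_flush_args(void)
{
	int cpu;

	/* hypercall input must live in its own page; allocate one per CPU */
	for_each_possible_cpu(cpu)
		per_cpu(hv_flush_args, cpu) =
			(void *)get_zeroed_page(GFP_KERNEL);
	return 0;
}
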
>>
>> Why do local flushes exit?
>
> "exist"? I don't know, to be honest. To me it makes no difference from
> hypervisor's point of view as intercepting tlb flushing instructions is
> not any different from implmenting a hypercall.
>
> Hyper-V gives its guests 'hints' to indicate whether they need to use
> hypercalls for remote/local TLB flushes, and I don't remember seeing
> the 'local' bit set.

What I meant was: why aren't local flushes handled directly in the
guest without exiting to the host? Or are they? In principle,
INVPCID should just work, right? Even reading and writing CR3 back
should work if the hypervisor sets up the magic list of allowed CR3
values, right?

I guess on older CPUs there might not be any way to flush the local
TLB without exiting, but I'm not *that* familiar with the details of
the virtualization extensions.
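
FWIW, what I have in mind for a guest-local flush is one of the two
operations below, roughly mirroring the kernel's own helpers (treat
this as a sketch, not the exact tlbflush.h code):

/* flush everything, globals included: INVPCID "all context" (type 2) */
static inline void invpcid_flush_all_sketch(void)
{
	struct { u64 d[2]; } desc = { { 0, 0 } };

	/* invpcid (%rcx), %rax, encoded by hand for old assemblers */
	asm volatile (".byte 0x66, 0x0f, 0x38, 0x82, 0x01"
		      : : "m" (desc), "a" (2UL), "c" (&desc) : "memory");
}

/* or: flush non-global entries by reloading CR3 */
static inline void local_flush_sketch(void)
{
	native_write_cr3(native_read_cr3());
}

Neither of these has to trap to the hypervisor if the execution
controls are set up for it, which is the crux of my question.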

>
>>
>>> +static void hyperv_flush_tlb_others(const struct cpumask *cpus,
>>> +                                    struct mm_struct *mm, unsigned long start,
>>> +                                    unsigned long end)
>>> +{
>>
>> What tree will this go through? I'm about to send a signature change
>> for this function for tip:x86/mm.
>
> I think this was going to go through Greg's char-misc tree, but if we
> need to synchronize I think we can push this through x86.

Works for me. Linus can probably resolve the trivial conflict. But
going through the x86 tree might make sense here if that's okay with
you.

>
>>
>> Also, how would this interact with PCID? I have PCID patches that I'm
>> pretty happy with now, and I'm hoping to support PCID in 4.13.
>>
>
> Sorry, I wasn't following this work closely. The .flush_tlb_others()
> hook is not going away from pv_mmu_ops, right? In that case we can
> have both in 4.13. Or do you see any other clashes?
>

The issue is that I'm changing the whole flush algorithm. The main
patch that affects this is here:

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/pcid&id=a67bff42e1e55666fdbaddf233a484a8773688c1

The interactions between that patch and the paravirt flush helpers may
be complex, and they'll need some thought. PCID makes everything even
more subtle, so just turning off PCID when paravirt flush is involved
seems safest for now. Ideally we'd eventually support PCID and paravirt
flushes together (and even native remote flushes, assuming they ever
get added).
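
FWIW, "turning off PCID" could be as simple as something like the
sketch below during feature setup; the condition is a placeholder, not
a concrete proposal:

/*
 * Sketch: don't use PCID when a paravirt remote TLB flush
 * implementation is installed. The test here is a placeholder.
 */
if (pv_mmu_ops.flush_tlb_others != native_flush_tlb_others)
	setup_clear_cpu_cap(X86_FEATURE_PCID);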

Also, can you share the benchmark you used for these patches?