Re: [PATCH 1/2] KVM: Optimize kvm_make_vcpus_request_mask() a bit

From: Vitaly Kuznetsov
Date: Mon Aug 23 2021 - 04:03:53 EST


Sean Christopherson <seanjc@xxxxxxxxxx> writes:

> On Fri, Aug 20, 2021, Vitaly Kuznetsov wrote:
>> Iterating over set bits in 'vcpu_bitmap' should be faster than going
>> through all vCPUs, especially when just a few bits are set.
>>
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
>> ---
>> virt/kvm/kvm_main.c | 49 +++++++++++++++++++++++++++++----------------
>> 1 file changed, 32 insertions(+), 17 deletions(-)
>>
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 3e67c93ca403..0f873c5ed538 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -257,34 +257,49 @@ static inline bool kvm_kick_many_cpus(const struct cpumask *cpus, bool wait)
>> return true;
>> }
>>
>> +static void kvm_make_vcpu_request(struct kvm *kvm, struct kvm_vcpu *vcpu,
>> + unsigned int req, cpumask_var_t tmp)
>> +{
>> + int cpu = vcpu->cpu;
>
> This reminds me, syzbot found a data race a while back[*] in kvm_vcpu_kick()
> related to reading vcpu->cpu. That race is benign, but legitimate. I believe
> this code has a similar race, and I'm not as confident that it's benign.
>
> If the target vCPU changes vcpu->cpu after it's read by this code, then the IPI
> can be sent to the wrong pCPU, e.g. this pCPU gets waylaid by an IRQ and the target
> vCPU is migrated to a new pCPU.
>
> The TL;DR is that the race is benign because the target vCPU is still guaranteed
> to see the request before entering the guest, even if the IPI goes to the wrong
> pCPU. I believe the same holds true for KVM_REQUEST_WAIT, e.g. if the vCPU doing the
> lockless shadow PTE walk gets migrated to a new pCPU just before setting vcpu->mode to
> READING_SHADOW_PAGE_TABLES, this code can use a stale "cpu" for __cpumask_set_cpu().
> The race is benign because the vCPU would have to enter READING_SHADOW_PAGE_TABLES
> _after_ the SPTE modifications were made, as vcpu->cpu can't change while the vCPU
> is reading SPTEs. The same logic holds true for the case where the vCPU is migrated
> after the call to __cpumask_set_cpu(); the goal is to wait for the vCPU to return to
> OUTSIDE_GUEST_MODE, which is guaranteed if the vCPU is migrated even if this path
> doesn't wait for an ack from the _new_ pCPU.
>
> I'll send patches to fix the races later today, maybe they can be folded into
> v2? Even though the races are benign, I think they're worth fixing, if only to
> provide an opportunity to document why it's ok to send IPIs to the
> wrong pCPU.

You're blazingly fast as usual :-) I'll do v2 on top of your patches.
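For my own understanding: I assume the fix boils down to annotating the
vcpu->cpu access so it is read exactly once, something along these lines
(just my guess, not necessarily what your patches will do):

static void kvm_make_vcpu_request(struct kvm *kvm, struct kvm_vcpu *vcpu,
                                  unsigned int req, cpumask_var_t tmp)
{
        /*
         * Read vcpu->cpu once; the target vCPU can be migrated
         * concurrently, so don't re-read it later.  A stale value is
         * fine: the request is set before the IPI decision, so the vCPU
         * will observe it before re-entering the guest even if the IPI
         * ends up on the wrong pCPU.
         */
        int cpu = READ_ONCE(vcpu->cpu);

        kvm_make_request(req, vcpu);

        if (!(req & KVM_REQUEST_NO_WAKEUP) && kvm_vcpu_wake_up(vcpu))
                return;

        if (tmp != NULL && cpu != -1 && cpu != raw_smp_processor_id() &&
            kvm_request_needs_ipi(vcpu, req))
                __cpumask_set_cpu(cpu, tmp);
}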

>
> [*] On an upstream kernel, but I don't think the bug report was posted to LKML.
>
>> +
>> + kvm_make_request(req, vcpu);
>> +
>> + if (!(req & KVM_REQUEST_NO_WAKEUP) && kvm_vcpu_wake_up(vcpu))
>> + return;
>> +
>> + if (tmp != NULL && cpu != -1 && cpu != raw_smp_processor_id() &&
>
> For large VMs, it might be worth keeping get_cpu() in the caller and passing in @me?

The only reason against it was that I tried to keep the newly introduced
kvm_make_vcpu_request()'s interface nicer, so that it can be reused some
day. I'll get back to get_cpu() in v2.
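
In case it helps, I imagine it would look roughly like this on top of the
new helper (sketch only, the parameter name is just my placeholder):

static void kvm_make_vcpu_request(struct kvm *kvm, struct kvm_vcpu *vcpu,
                                  unsigned int req, cpumask_var_t tmp,
                                  int current_cpu)
{
        int cpu = READ_ONCE(vcpu->cpu);

        kvm_make_request(req, vcpu);

        if (!(req & KVM_REQUEST_NO_WAKEUP) && kvm_vcpu_wake_up(vcpu))
                return;

        /*
         * Compare against the caller's CPU instead of calling
         * raw_smp_processor_id() for every vCPU; the caller runs with
         * preemption disabled (get_cpu()), so its CPU can't change
         * across the loop.
         */
        if (tmp != NULL && cpu != -1 && cpu != current_cpu &&
            kvm_request_needs_ipi(vcpu, req))
                __cpumask_set_cpu(cpu, tmp);
}

with kvm_make_vcpus_request_mask() doing 'me = get_cpu();' before the
loops, passing 'me' into kvm_make_vcpu_request(), and calling 'put_cpu();'
after kvm_kick_many_cpus().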

>
>> + kvm_request_needs_ipi(vcpu, req))
>> + __cpumask_set_cpu(cpu, tmp);
>> +}
>> +
>> bool kvm_make_vcpus_request_mask(struct kvm *kvm, unsigned int req,
>> struct kvm_vcpu *except,
>> unsigned long *vcpu_bitmap, cpumask_var_t tmp)
>> {
>> - int i, cpu, me;
>> + int i;
>> struct kvm_vcpu *vcpu;
>> bool called;
>>
>> - me = get_cpu();
>> -
>> - kvm_for_each_vcpu(i, vcpu, kvm) {
>> - if ((vcpu_bitmap && !test_bit(i, vcpu_bitmap)) ||
>> - vcpu == except)
>> - continue;
>> -
>> - kvm_make_request(req, vcpu);
>> - cpu = vcpu->cpu;
>> -
>> - if (!(req & KVM_REQUEST_NO_WAKEUP) && kvm_vcpu_wake_up(vcpu))
>> - continue;
>> + preempt_disable();
>>
>> - if (tmp != NULL && cpu != -1 && cpu != me &&
>> - kvm_request_needs_ipi(vcpu, req))
>> - __cpumask_set_cpu(cpu, tmp);
>> + if (likely(vcpu_bitmap)) {
>
> I don't think this is actually "likely". kvm_make_all_cpus_request() is by far
> the most common caller and does not pass in a vcpu_bitmap. Practically speaking
> I highly doubt the code organization will matter, but from a documentation
> perspective it's wrong.

Right, I was thinking more about the two other users, IOAPIC and Hyper-V,
which call kvm_make_vcpus_request_mask() directly, but I agree that
kvm_make_all_cpus_request() is probably much more common.
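
So for v2 I'll simply drop the hint, i.e.

-	if (likely(vcpu_bitmap)) {
+	if (vcpu_bitmap) {

and keep the branch ordering as is.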

>
>> + for_each_set_bit(i, vcpu_bitmap, KVM_MAX_VCPUS) {
>> + vcpu = kvm_get_vcpu(kvm, i);
>> + if (!vcpu || vcpu == except)
>> + continue;
>> + kvm_make_vcpu_request(kvm, vcpu, req, tmp);
>> + }
>> + } else {
>> + kvm_for_each_vcpu(i, vcpu, kvm) {
>> + if (vcpu == except)
>> + continue;
>> + kvm_make_vcpu_request(kvm, vcpu, req, tmp);
>> + }
>> }
>>
>> called = kvm_kick_many_cpus(tmp, !!(req & KVM_REQUEST_WAIT));
>> - put_cpu();
>> +
>> + preempt_enable();
>>
>> return called;
>> }
>> --
>> 2.31.1
>>
>

--
Vitaly