Re: [PATCH v4] kvm: make vcpu life cycle separated from kvm instance

From: Liu ping fan
Date: Fri Dec 16 2011 - 02:50:36 EST


On Thu, Dec 15, 2011 at 5:10 PM, Gleb Natapov <gleb@xxxxxxxxxx> wrote:
> On Thu, Dec 15, 2011 at 12:28:48PM +0800, Liu Ping Fan wrote:
>> From: Liu Ping Fan <pingfank@xxxxxxxxxxxxxxxxxx>
>>
>> Currently, a vcpu can be destructed only when the kvm instance is
>> destroyed. Change this so that a vcpu's destruction can precede the
>> kvm instance's; that is, a vcpu MUST and CAN be destroyed before the
>> kvm instance itself is destroyed.
>>
> I see reference counting is back.
>
I will resort to SRCU in the next version and remove the refcnt.
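
Roughly, the destruction path would then look like the sketch below. This
is only a sketch: the per-kvm srcu_struct (called kvm->srcu_vcpus here) and
the RCU-linked vcpu list are my assumptions, neither exists in the v4 patch:

	/* Sketch only: kvm->srcu_vcpus and the vcpu list linkage are
	 * assumed here; they are not part of the current patch. */
	static void kvm_vcpu_destruct(struct kvm_vcpu *vcpu)
	{
		struct kvm *kvm = vcpu->kvm;

		mutex_lock(&kvm->lock);
		list_del_rcu(&vcpu->list);	/* unpublish the vcpu */
		mutex_unlock(&kvm->lock);

		/* SRCU readers may sleep (e.g. in yield_to()), so wait
		 * for all of them to drain before freeing. */
		synchronize_srcu(&kvm->srcu_vcpus);
		kvm_arch_vcpu_destroy(vcpu);
	}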

>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index d9cfb78..71dda47 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -141,6 +141,7 @@ void vcpu_load(struct kvm_vcpu *vcpu)
>>  {
>>      int cpu;
>>
>> +    kvm_vcpu_get(vcpu);
>>      mutex_lock(&vcpu->mutex);
>>      if (unlikely(vcpu->pid != current->pids[PIDTYPE_PID].pid)) {
>>              /* The thread running this VCPU changed. */
>> @@ -163,6 +164,7 @@ void vcpu_put(struct kvm_vcpu *vcpu)
>>      preempt_notifier_unregister(&vcpu->preempt_notifier);
>>      preempt_enable();
>>      mutex_unlock(&vcpu->mutex);
>> +    kvm_vcpu_put(vcpu);
>>  }
>>
> Why are kvm_vcpu_get/kvm_vcpu_put needed in vcpu_load/vcpu_put?
> As far as I can see, load/put are only called in the vcpu ioctl,
> kvm_arch_vcpu_setup(), kvm_arch_vcpu_destroy() and kvm_arch_destroy_vm().
>
> kvm_arch_vcpu_setup() and kvm_arch_vcpu_destroy() are called before the
> vcpu is added to the vcpus list, so it can't be accessed by another thread
> at that point. kvm_arch_destroy_vm() is called on the KVM destruction path,
> when all vcpus should already be destroyed. So the only interesting place
> is the vcpu ioctl, and I think we are protected by the fd refcount there.
> A vcpu fd can't be closed while an ioctl is executing for that vcpu.
> Otherwise we would have a problem now too.
>
Yeah, the ioctl is protected by the fd refcount. That is what I was relying
on, but as you pointed out, the extra get/put is not necessary at all.
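
(For the record, the protection comes from the VFS itself. Simplified from
fs/ioctl.c, roughly:

	/* Simplified sketch of the ioctl entry path: the file, and thus
	 * the vcpu behind filp->private_data, stays pinned for the whole
	 * duration of kvm_vcpu_ioctl(). */
	SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd,
			unsigned long, arg)
	{
		int fput_needed, error = -EBADF;
		struct file *filp = fget_light(fd, &fput_needed);

		if (!filp)
			return error;
		error = do_vfs_ioctl(filp, fd, cmd, arg); /* -> kvm_vcpu_ioctl() */
		fput_light(filp, fput_needed);	/* ref dropped only afterwards */
		return error;
	}

so the final fput() of a vcpu fd cannot run while an ioctl on that fd is
still executing.)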

>> @@ -1539,12 +1547,10 @@ EXPORT_SYMBOL_GPL(kvm_resched);
>>  void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>>  {
>>      struct kvm *kvm = me->kvm;
>> -    struct kvm_vcpu *vcpu;
>> -    int last_boosted_vcpu = me->kvm->last_boosted_vcpu;
>> -    int yielded = 0;
>> -    int pass;
>> -    int i;
>> -
>> +    struct kvm_vcpu *vcpu, *v;
>> +    struct task_struct *task = NULL;
>> +    struct pid *pid;
>> +    int pass, firststart, lastone = 0, yielded = 0;
>>      /*
>>       * We boost the priority of a VCPU that is runnable but not
>>       * currently running, because it got preempted by something
>> @@ -1552,15 +1558,22 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>>       * VCPU is holding the lock that we need and will release it.
>>       * We approximate round-robin by starting at the last boosted VCPU.
>>       */
>> -    for (pass = 0; pass < 2 && !yielded; pass++) {
>> -            kvm_for_each_vcpu(i, vcpu, kvm) {
>> -                    struct task_struct *task = NULL;
>> -                    struct pid *pid;
>> -                    if (!pass && i < last_boosted_vcpu) {
>> -                            i = last_boosted_vcpu;
>> +    for (pass = 0, firststart = 0; pass < 2 && !yielded; pass++) {
>> +
>> +            rcu_read_lock();
>> +            kvm_for_each_vcpu(vcpu, kvm) {
>> +                    if (!pass && !firststart &&
>> +                        vcpu != kvm->last_boosted_vcpu &&
>> +                        kvm->last_boosted_vcpu != NULL) {
>> +                            vcpu = kvm->last_boosted_vcpu;
>> +                            firststart = 1;
>>                              continue;
>> -                    } else if (pass && i > last_boosted_vcpu)
>> +                    } else if (pass && !lastone) {
>> +                            if (vcpu == kvm->last_boosted_vcpu)
>> +                                    lastone = 1;
>> +                    } else if (pass && lastone)
>>                              break;
>> +
>>                      if (vcpu == me)
>>                              continue;
>>                      if (waitqueue_active(&vcpu->wq))
>> @@ -1576,15 +1589,29 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>>                              put_task_struct(task);
>>                              continue;
>>                      }
>> +                    v = kvm_vcpu_get(vcpu);
>> +                    if (v == NULL)
>> +                            continue;
>> +
>> +                    rcu_read_unlock();
>>                      if (yield_to(task, 1)) {
>>                              put_task_struct(task);
>> -                            kvm->last_boosted_vcpu = i;
>> +                            mutex_lock(&kvm->lock);
>> +                            /* Remember to release it. */
>> +                            if (kvm->last_boosted_vcpu != NULL)
>> +                                    kvm_vcpu_put(kvm->last_boosted_vcpu);
>> +                            kvm->last_boosted_vcpu = vcpu;
>> +                            mutex_unlock(&kvm->lock);
>>                              yielded = 1;
> I think we can be smart and protect kvm->last_boosted_vcpu with the same
> RCU as the vcpus, but yield_to() can sleep anyway. Hmm, maybe we should use
> SRCU in the first place :( Or rewrite the logic of the function
> somehow to call yield_to() outside of the loop. This is a heuristic anyway.
>
And I think changing to SRCU will be easier :-); I have already started on it.
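
What I have in mind is roughly the following. Again only a sketch: the
round-robin/last_boosted_vcpu bookkeeping is omitted, the pid lookup is
simplified, and the dedicated srcu_struct (kvm->srcu_vcpus) is an assumption:

	void kvm_vcpu_on_spin(struct kvm_vcpu *me)
	{
		struct kvm *kvm = me->kvm;
		struct kvm_vcpu *vcpu;
		int idx;

		/* Sketch only: kvm->srcu_vcpus does not exist yet. */
		idx = srcu_read_lock(&kvm->srcu_vcpus);
		kvm_for_each_vcpu(vcpu, kvm) {
			struct task_struct *task;

			if (vcpu == me)
				continue;
			if (waitqueue_active(&vcpu->wq))
				continue;
			/* Simplified: the real code dereferences vcpu->pid
			 * more carefully. */
			task = get_pid_task(vcpu->pid, PIDTYPE_PID);
			if (!task)
				continue;
			/* yield_to() may sleep, which is legal inside an
			 * SRCU read-side critical section, unlike under
			 * rcu_read_lock(). */
			if (yield_to(task, 1)) {
				put_task_struct(task);
				break;
			}
			put_task_struct(task);
		}
		srcu_read_unlock(&kvm->srcu_vcpus, idx);
	}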

Thanks and regards
ping fan
> --
>                       Gleb.