Re: [PATCH v7] kvm: make vcpu life cycle separated from kvm instance

From: Liu ping fan
Date: Sun Jan 15 2012 - 08:17:39 EST


On Thu, Jan 12, 2012 at 8:37 PM, Avi Kivity <avi@xxxxxxxxxx> wrote:
> On 01/07/2012 04:55 AM, Liu Ping Fan wrote:
>> From: Liu Ping Fan <pingfank@xxxxxxxxxxxxxxxxxx>
>>
>> Currently, a vcpu is destructed only after the kvm instance is
>> destroyed. As a result, the vcpu stays idle in the kernel and cannot
>> be freed when it is unplugged in the guest.
>>
>> Change this so that vcpus are destructed before the kvm instance, so a vcpu MUST
>
> Must?
>
Yes. kvm_arch_vcpu_destruct() calls kvm_put_kvm(kvm), so the kvm
instance can only be destroyed after all of its vcpus have been destroyed.

>> and CAN be destroyed before the kvm instance. In this way, we can remove
>> a vcpu when the guest no longer needs it.
>>
>> TODO: push changes to other archs besides x86.
>>
>> -Rename kvm_vcpu_zap to kvm_vcpu_destruct and so on.
>
> kvm_vcpu_destroy.
>
The name "kvm_arch_vcpu_destroy" is already taken by several archs,
so I changed
kvm_vcpu_zap -> kvm_vcpu_destruct
kvm_vcpu_arch_zap -> kvm_vcpu_arch_destruct

>>
>>  struct kvm_vcpu {
>>          struct kvm *kvm;
>> +       struct list_head list;
>
> vcpu_list_link, so it's clear this is not a head but a link, and so we
> know which list it belongs to.
>
OK
>> -       struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
>> +       struct list_head vcpus;
>
> This has the potential for a slight performance regression by bouncing
> an extra cache line, but it's acceptable IMO.  We can always introduce

Sorry, I am not clear about this scenario. Do you mean that changes to
the vcpu linked list will invalidate cache lines on other CPUs? The
linked list does not change often, though.
> an apic ID -> vcpu hash table which improves things all around.
>
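If I understand the hash table idea correctly, the lookup side could be
sketched roughly as below. This is plain userspace C for illustration only,
not kernel code: struct vcpu_model, struct id_table, ID_HASH_BUCKETS and both
helpers are hypothetical names, and no locking or RCU is modelled.

```c
#include <stddef.h>

#define ID_HASH_BUCKETS 16      /* assumed size; must be a power of two */

/* Hypothetical stand-in for a vcpu that can be found by its ID. */
struct vcpu_model {
        unsigned int id;
        struct vcpu_model *hash_next;   /* chain within one bucket */
};

struct id_table {
        struct vcpu_model *bucket[ID_HASH_BUCKETS];
};

static unsigned int id_hash(unsigned int id)
{
        return id & (ID_HASH_BUCKETS - 1);
}

/* Insert at the head of the bucket chain. */
void id_table_add(struct id_table *t, struct vcpu_model *v)
{
        unsigned int h = id_hash(v->id);

        v->hash_next = t->bucket[h];
        t->bucket[h] = v;
}

/* Walk one bucket only, instead of the whole vcpu list. */
struct vcpu_model *id_table_find(const struct id_table *t, unsigned int id)
{
        struct vcpu_model *v;

        for (v = t->bucket[id_hash(id)]; v; v = v->hash_next)
                if (v->id == id)
                        return v;
        return NULL;
}
```

A lookup then touches one bucket chain rather than every entry on the list.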
>> |
>> @@ -1593,11 +1598,9 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>>  {
>>          struct kvm *kvm = me->kvm;
>>          struct kvm_vcpu *vcpu;
>> -       int last_boosted_vcpu = me->kvm->last_boosted_vcpu;
>> -       int yielded = 0;
>> -       int pass;
>> -       int i;
>> -
>> +       struct task_struct *task = NULL;
>> +       struct pid *pid;
>> +       int pass, firststart, lastone, yielded, idx;
>
> Avoid unrelated changes please.
>
OK
>> @@ -1605,15 +1608,26 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>>           * VCPU is holding the lock that we need and will release it.
>>           * We approximate round-robin by starting at the last boosted VCPU.
>>           */
>> -       for (pass = 0; pass < 2 && !yielded; pass++) {
>> -               kvm_for_each_vcpu(i, vcpu, kvm) {
>> -                       struct task_struct *task = NULL;
>> -                       struct pid *pid;
>> -                       if (!pass && i < last_boosted_vcpu) {
>> -                               i = last_boosted_vcpu;
>> +       for (pass = 0, firststart = 0; pass < 2 && !yielded; pass++) {
>> +
>> +               idx = srcu_read_lock(&kvm->srcu);
>
> Can move the lock to the top level.
>
OK
>> +               kvm_for_each_vcpu(vcpu, kvm) {
>> +                       if (kvm->last_boosted_vcpu_id < 0 && !pass) {
>> +                               pass = 1;
>> +                               break;
>> +                       }
>> +                       if (!pass && !firststart &&
>> +                           vcpu->vcpu_id != kvm->last_boosted_vcpu_id) {
>> +                               continue;
>> +                       } else if (!pass && !firststart) {
>> +                               firststart = 1;
>>                                  continue;
>> -                       } else if (pass && i > last_boosted_vcpu)
>> +                       } else if (pass && !lastone) {
>> +                               if (vcpu->vcpu_id == kvm->last_boosted_vcpu_id)
>> +                                       lastone = 1;
>> +                       } else if (pass && lastone)
>>                                  break;
>> +
>
> Seems like a large change.  Is this because the vcpu list is unordered?
> Maybe it's better to order it.
>
To find the last boosted vcpu (I guess it is the most likely lock
holder), we must walk the vcpu linked list. With the kvm->vcpus[]
array this was easier, because the index could be used directly.
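To make the difference concrete, here is a minimal userspace sketch (not the
patch itself) of a round-robin pick over an unordered singly linked list.
struct node, pick_next() and the eligible flag are hypothetical; eligible
stands in for the checks kvm_vcpu_on_spin() makes before calling yield_to().

```c
#include <stddef.h>

/* Hypothetical stand-in for a vcpu on a linked list. */
struct node {
        int id;                 /* plays the role of vcpu_id */
        int eligible;           /* plays the role of "worth yielding to" */
        struct node *next;
};

/*
 * Return the first eligible node after the one whose id is last_id,
 * wrapping around to the head of the list. If last_id is not on the
 * list (for example -1), the search simply starts at the head.
 * The node with last_id itself is tried last.
 */
struct node *pick_next(struct node *head, int last_id)
{
        struct node *start = NULL, *n;

        /* Extra walk that the array version did not need. */
        for (n = head; n; n = n->next)
                if (n->id == last_id) {
                        start = n;
                        break;
                }

        /* Pass 1: from just after the last boosted node to the tail. */
        for (n = start ? start->next : head; n; n = n->next)
                if (n->eligible)
                        return n;

        /* Pass 2: from the head up to and including the last boosted node. */
        if (start)
                for (n = head; n != start->next; n = n->next)
                        if (n->eligible)
                                return n;

        return NULL;
}
```

With an array the starting point is just an index; with a list it has to be
found by comparing IDs first, which is where the extra code comes from.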

> Rik?
>
>>                          if (vcpu == me)
>>                                  continue;
>>                          if (waitqueue_active(&vcpu->wq))
>> @@ -1629,15 +1643,20 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>>                                  put_task_struct(task);
>>                                  continue;
>>                          }
>> +
>>                          if (yield_to(task, 1)) {
>>                                  put_task_struct(task);
>> -                               kvm->last_boosted_vcpu = i;
>> +                               mutex_lock(&kvm->lock);
>> +                               kvm->last_boosted_vcpu_id = vcpu->vcpu_id;
>> +                               mutex_unlock(&kvm->lock);
>
> Why take the mutex?
>
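Because kvm_vcpu_release() does a check followed by a store, and the
update in kvm_vcpu_on_spin() must not slip in between the two:

mutex_lock(&kvm->lock);
if (kvm->last_boosted_vcpu_id == vcpu->vcpu_id)
        /* must not be interrupted between the check and the store */
        kvm->last_boosted_vcpu_id = -1;
mutex_unlock(&kvm->lock);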
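
The race being guarded against is a check-then-store: if another thread
updates last_boosted_vcpu_id between the comparison and the assignment, the
release path would wipe out the newer value. As an illustration only, the
same guarantee can be modelled in userspace C11 with a compare-and-swap
instead of a mutex. This is not what the patch does; last_boosted,
note_boosted() and clear_if_last() are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of kvm->last_boosted_vcpu_id. */
static atomic_int last_boosted = -1;

/* Model of the store done after a successful yield_to(). */
void note_boosted(int vcpu_id)
{
        atomic_store(&last_boosted, vcpu_id);
}

/*
 * Model of the release path: reset the hint to -1 only if it still
 * names the vcpu being released. The compare and the store happen as
 * one atomic step, so a concurrent note_boosted() cannot be lost.
 * Returns true if the hint was cleared.
 */
bool clear_if_last(int vcpu_id)
{
        int expected = vcpu_id;

        return atomic_compare_exchange_strong(&last_boosted, &expected, -1);
}
```

In the kernel the analogous primitive would be cmpxchg(); whether that is
preferable to taking kvm->lock here is a separate question.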

>> @@ -1673,11 +1692,30 @@ static int kvm_vcpu_mmap(struct file *file, struct vm_area_struct *vma)
>>          return 0;
>>  }
>>
>> +static void kvm_vcpu_destruct(struct kvm_vcpu *vcpu)
>> +{
>> +       kvm_arch_vcpu_destruct(vcpu);
>> +}
>> +
>>  static int kvm_vcpu_release(struct inode *inode, struct file *filp)
>>  {
>>          struct kvm_vcpu *vcpu = filp->private_data;
>> +       struct kvm *kvm = vcpu->kvm;
>> +       filp->private_data = NULL;
>> +
>> +       mutex_lock(&kvm->lock);
>> +       list_del_rcu(&vcpu->list);
>> +       atomic_dec(&kvm->online_vcpus);
>> +       mutex_unlock(&kvm->lock);
>> +       synchronize_srcu_expedited(&kvm->srcu);
>
> Why _expedited?
>
> Even better would be call_srcu() but it doesn't exist.
>
> I think we can actually use regular rcu.  The only user that blocks is
> kvm_vcpu_on_spin(), yes? so we can convert the vcpu to a task using
> get_pid_task(), then, outside the rcu lock, call yield_to().
>
Yes, kvm_vcpu_on_spin() is the only one. But I think that if yield_to()
is called outside the rcu lock, the code will look like the following:

again:
        rcu_read_lock();
        kvm_for_each_vcpu(vcpu, kvm) {
                ......
        }
        rcu_read_unlock();
        if (yield_to(task, 1)) {
                .....
        } else
                goto again;

We must walk the linked list again to find the next vcpu.
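
To illustrate the cost I am worried about, below is a small single-threaded
userspace model, not kernel code. read_lock() and read_unlock() are empty
stubs standing in for the rcu read-side calls, refs stands in for the task
reference taken by get_pid_task(), and runnable stands in for yield_to()
succeeding. All names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for a vcpu task on the list. */
struct task {
        int id;
        int refs;               /* models the task reference count */
        int runnable;           /* models "yield_to() would succeed" */
        struct task *next;
};

static int walks;               /* counts how often the list is walked */

/* Stubs for the read-side critical section; they only count walks. */
static void read_lock(void)   { walks++; }
static void read_unlock(void) { }

/* Inside the read side: skip `skip` nodes, pin the next one, return it. */
static struct task *pin_nth(struct task *head, int skip)
{
        struct task *t;

        read_lock();
        for (t = head; t && skip > 0; t = t->next)
                skip--;
        if (t)
                t->refs++;      /* like get_pid_task() */
        read_unlock();
        return t;
}

/* Try candidates one by one; the blocking step runs outside the lock. */
int boost_one(struct task *head)
{
        struct task *t;
        int tried = 0;

        while ((t = pin_nth(head, tried)) != NULL) {
                int ok = t->runnable;   /* where yield_to() would be called */
                int id = t->id;

                t->refs--;              /* like put_task_struct() */
                if (ok)
                        return id;
                tried++;                /* the "goto again" case */
        }
        return -1;
}
```

Every failed attempt costs one more walk from the head, so with N candidates
the worst case is on the order of N walks instead of one.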

>
>>
>> -       kvm_put_kvm(vcpu->kvm);
>> +       mutex_lock(&kvm->lock);
>> +       if (kvm->last_boosted_vcpu_id == vcpu->vcpu_id)
>> +               kvm->last_boosted_vcpu_id = -1;
>> +       mutex_unlock(&kvm->lock);
>> +
>> +       /*vcpu is out of list,drop it safely*/
>> +       kvm_vcpu_destruct(vcpu);
>
> Can call kvm_arch_vcpu_destroy() directly.
>
>> +static struct kvm_vcpu *kvm_vcpu_create(struct kvm *kvm, u32 id)
>> +{
>> +       struct kvm_vcpu *vcpu;
>> +       vcpu = kvm_arch_vcpu_create(kvm, id);
>> +       if (IS_ERR(vcpu))
>> +               return vcpu;
>> +       INIT_LIST_HEAD(&vcpu->list);
>
> Really needed?
>
You are right, it is unnecessary.
>> +       return vcpu;
>> +}
>
> Just fold this into the caller.
>
OK

Thanks and regards,
ping fan
>> +
>>  /*
>>   * Creates some virtual cpus.  Good luck creating more than one.
>>   */
>>  static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
>>  {
>> -       int r;
>> +       int r, idx;
>>          struct kvm_vcpu *vcpu, *v;
>>
>> -       vcpu = kvm_arch_vcpu_create(kvm, id);
>> +       vcpu = kvm_vcpu_create(kvm, id);
>>          if (IS_ERR(vcpu))
>>                  return PTR_ERR(vcpu);
>>
>> @@ -1723,13 +1771,15 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
>>                  goto unlock_vcpu_destroy;
>>          }
>>
>> -       kvm_for_each_vcpu(r, v, kvm)
>> +       idx = srcu_read_lock(&kvm->srcu);
>> +       kvm_for_each_vcpu(v, kvm) {
>>                  if (v->vcpu_id == id) {
>>                          r = -EEXIST;
>> +                       srcu_read_unlock(&kvm->srcu, idx);
>
> Put that in the error path please (add a new label if needed).
>
>>                          goto unlock_vcpu_destroy;
>
>>
>> -       kvm->vcpus[atomic_read(&kvm->online_vcpus)] = vcpu;
>> -       smp_wmb();
>> +       /*Protected by kvm->lock*/
>
> Spaces.
>
>> +       list_add_rcu(&vcpu->list, &kvm->vcpus);
>>          atomic_inc(&kvm->online_vcpus);
>
>
>
> --
> error compiling committee.c: too many arguments to function
>