Re: [PATCH 3/4 changelog-v2] KVM: Switch to srcu-less get_dirty_log()

From: Takuya Yoshikawa
Date: Tue Mar 06 2012 - 09:43:27 EST


Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:

> > + spin_lock(&kvm->mmu_lock);
>
> It is not clear why mmu_lock is needed. Dropping it across the xchg loop
> should be similar to srcu implementation, in that concurrent updates
> will be visible only on the next get_dirty call? Well, it is necessary
> anyway for write protecting the sptes.

My implementation does the write protection inside the xchg loop and
then, after that loop, flushes the TLB.

mmu_lock must protect both of these together.

If we do not mind scanning the bitmap twice, we can decouple the
xchg loop from the write protection, but it will be a bit slower, and
in any case we need to hold mmu_lock until the TLB is flushed.
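
For illustration, the coupling described above can be modelled in user
space. This is only a sketch under stated assumptions: struct sketch_vm
and the sketch_* helpers are hypothetical stand-ins for kvm->mmu_lock,
kvm_mmu_write_protect_pt_masked() and kvm_flush_remote_tlbs() that merely
count calls, and C11 atomic_exchange() stands in for the kernel's xchg().
Unlike the patch, the sketch also zeroes the buffer word for clean
entries so that it needs no setup beforehand.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical model of the VM state; it only records what happened. */
struct sketch_vm {
	bool lock_held;                    /* stand-in for kvm->mmu_lock */
	unsigned long protected_pages;     /* pages write protected so far */
	unsigned int tlb_flushes;          /* number of TLB flushes issued */
	unsigned int flushes_outside_lock; /* flushes issued without the lock */
};

/* Stand-in for write protecting every page whose bit is set in mask. */
static void sketch_write_protect_masked(struct sketch_vm *vm, size_t offset,
					unsigned long mask)
{
	(void)offset; /* a real implementation would start at this gfn offset */
	assert(vm->lock_held);
	while (mask) {
		mask &= mask - 1; /* clear the lowest set bit */
		vm->protected_pages++;
	}
}

/* Stand-in for the remote TLB flush. */
static void sketch_flush_tlbs(struct sketch_vm *vm)
{
	vm->tlb_flushes++;
	if (!vm->lock_held)
		vm->flushes_outside_lock++;
}

/* One pass: harvest, write protect and flush under a single lock hold. */
static bool sketch_get_dirty_log(struct sketch_vm *vm,
				 _Atomic unsigned long *dirty_bitmap,
				 unsigned long *buffer, size_t nwords)
{
	bool is_dirty = false;

	vm->lock_held = true; /* spin_lock(&kvm->mmu_lock) in the patch */
	for (size_t i = 0; i < nwords; i++) {
		unsigned long mask;

		if (!atomic_load(&dirty_bitmap[i])) {
			buffer[i] = 0;
			continue;
		}
		is_dirty = true;

		/* Atomically take the dirty bits and clear the live word. */
		mask = atomic_exchange(&dirty_bitmap[i], 0);
		buffer[i] = mask;
		sketch_write_protect_masked(vm, i * SKETCH_BITS_PER_LONG, mask);
	}
	if (is_dirty)
		sketch_flush_tlbs(vm); /* still inside the critical section */
	vm->lock_held = false; /* spin_unlock(&kvm->mmu_lock) */

	return is_dirty;
}
```

The point of the model is that the flush is counted while lock_held is
still true, so no write protected page can be left with a stale writable
TLB entry once the lock is released.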

As can be seen from the unit-test result, the majority of the time is
spent on write protecting sptes, so my guess is that decoupling the
xchg loop alone will not alleviate the problem much.

> A cond_resched_lock() would alleviate the potentially long held
> times for mmu_lock (can you measure it with large memslots?)

How to move the TLB flush out of mmu_lock critical sections was
discussed before, and there seemed to be some proposals.

Is anyone working on that?

After that we can do many things.

One idea is to shrink the extra bitmap buffer to one page or so and
run the xchg and write protection loop in chunks of that limited size.

Because we can drop mmu_lock between chunks, it is possible to
copy_to_user one part of the dirty bitmap and then go on to the next part.

After everything is protected, we can then do the TLB flush after
dropping mmu_lock.
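
A rough user-space sketch of this chunked idea follows. It assumes the
TLB flush has already been made safe outside mmu_lock, which is exactly
the open question above. CHUNK_WORDS, struct chunk_vm and the chunk_*
helpers are hypothetical names, memcpy() stands in for copy_to_user(),
and C11 atomic_exchange() stands in for xchg().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical chunk size in words; the idea is "one page or so". */
#define CHUNK_WORDS 8

/* Hypothetical model of the VM state; it only records what happened. */
struct chunk_vm {
	bool lock_held;                 /* stand-in for kvm->mmu_lock */
	unsigned int lock_acquisitions; /* how many times the lock was taken */
	unsigned int tlb_flushes;       /* number of TLB flushes issued */
	unsigned long protected_pages;  /* pages write protected so far */
};

/* Stand-in for write protecting every page whose bit is set in mask. */
static void chunk_write_protect_masked(struct chunk_vm *vm, unsigned long mask)
{
	while (mask) {
		mask &= mask - 1; /* clear the lowest set bit */
		vm->protected_pages++;
	}
}

static bool chunk_get_dirty_log(struct chunk_vm *vm,
				_Atomic unsigned long *dirty_bitmap,
				unsigned long *user_bitmap, size_t nwords)
{
	unsigned long buffer[CHUNK_WORDS]; /* the small extra buffer */
	bool is_dirty = false;

	for (size_t start = 0; start < nwords; start += CHUNK_WORDS) {
		size_t len = nwords - start;

		if (len > CHUNK_WORDS)
			len = CHUNK_WORDS;

		/* Short critical section: xchg and write protect one chunk. */
		vm->lock_held = true;
		vm->lock_acquisitions++;
		for (size_t i = 0; i < len; i++) {
			unsigned long mask =
				atomic_exchange(&dirty_bitmap[start + i], 0);

			buffer[i] = mask;
			if (mask) {
				is_dirty = true;
				chunk_write_protect_masked(vm, mask);
			}
		}
		vm->lock_held = false;

		/* Lock dropped: copy this part out before the next chunk. */
		memcpy(&user_bitmap[start], buffer, len * sizeof(buffer[0]));
	}

	/* Single flush after everything is protected, outside the lock. */
	if (is_dirty)
		vm->tlb_flushes++;

	return is_dirty;
}
```

With 20 bitmap words and CHUNK_WORDS set to 8, the lock is taken three
times for short periods instead of once for the whole memslot, and the
flush still happens only once.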

> Otherwise looks nice.

Thanks,
Takuya


> > - r = -ENOMEM;
> > - slots = kmemdup(kvm->memslots, sizeof(*kvm->memslots), GFP_KERNEL);
> > - if (!slots)
> > - goto out;
> > + for (i = 0; i < n / sizeof(long); i++) {
> > + unsigned long mask;
> > + gfn_t offset;
> >
> > - memslot = id_to_memslot(slots, log->slot);
> > - memslot->nr_dirty_pages = 0;
> > - memslot->dirty_bitmap = dirty_bitmap_head;
> > - update_memslots(slots, NULL);
> > + if (!dirty_bitmap[i])
> > + continue;
> >
> > - old_slots = kvm->memslots;
> > - rcu_assign_pointer(kvm->memslots, slots);
> > - synchronize_srcu_expedited(&kvm->srcu);
> > - kfree(old_slots);
> > + is_dirty = true;
> >
> > - write_protect_slot(kvm, memslot, dirty_bitmap, nr_dirty_pages);
> > + mask = xchg(&dirty_bitmap[i], 0);
> > + dirty_bitmap_buffer[i] = mask;
> >
> > - r = -EFAULT;
> > - if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n))
> > - goto out;
> > - } else {
> > - r = -EFAULT;
> > - if (clear_user(log->dirty_bitmap, n))
> > - goto out;
> > + offset = i * BITS_PER_LONG;
> > + kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
> > }
> > + if (is_dirty)
> > + kvm_flush_remote_tlbs(kvm);
> > +
> > + spin_unlock(&kvm->mmu_lock);