Re: reproducible BUG() in kvm_mmu_get_root() in TDP MMU

From: Ben Gardon
Date: Tue Jan 05 2021 - 14:22:58 EST


On Tue, Jan 5, 2021 at 10:46 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Tue, Jan 05, 2021, Paolo Bonzini wrote:
> > On 05/01/21 18:49, Ben Gardon wrote:
> > > for_each_tdp_mmu_root(kvm, root) {
> > > kvm_mmu_get_root(kvm, root);
> > > <Do something, yield the MMU lock>
> > > kvm_mmu_put_root(kvm, root);
> > > }
> > >
> > > In these cases the get and put root calls are there to ensure that the
> > > root is not freed while the function is running, however they do this
> > > too well. If the put root call reduces the root's root_count to 0, it
> > > should be removed from the roots list and freed before the MMU lock is
> > > released. However the above pattern never bothers to free the root.
> > > The following would fix this bug:
> > >
> > > -kvm_mmu_put_root(kvm, root);
> > > +if (kvm_mmu_put_root(kvm, root))
> > > + kvm_tdp_mmu_free_root(kvm, root);
> >
> > Is it worth writing a more complex iterator struct, so that
> > for_each_tdp_mmu_root takes care of the get and put?
>
> Ya, and maybe with an "as_id" variant to avoid the get/put? Not sure that's a
> worthwhile optimization though.

I'll see about adding such an iterator. I don't think avoiding the get
/ put is really worthwhile in this case since they're cheap operations
and putting them in the iterator makes it less obvious that they're
missing if those functions ever need to yield.

>
> On a related topic, there are a few subtleties with respect to
> for_each_tdp_mmu_root() that we should document/comment. The flows that drop
> mmu_lock while iterating over the roots don't protect against the list itself
> from being modified. E.g. the next entry could be deleted, or a new root
> could be added. I think I've convinced myself that there are no existing bugs,
> but we should document that the exact current behavior is required for
> correctness.
>
> The use of "unsafe" list_for_each_entry() in particular is unintuitive, as using
> the "safe" variant would dereference a deleted entry in the "next entry is
> deleted" scenario.
>
> And regarding addomg a root, using list_add_tail() instead of list_add() in
> get_tdp_mmu_vcpu_root() would cause iteration to visit a root that was added
> after the iteration started (though I don't think this would ever be problematic
> in practice?).

A lot of these observations are safe because the operations using this
iterator only consider one root at a time and aren't really interested
in the state of the list.
Your point about the dangers of adding and removing roots while one of
these functions is running is valid, but like the legacy / shadow MMU
implementation, properties which need to be guaranteed across multiple
roots need to be managed at a higher level.
I believe that with the legacy / shadow MMU the creation of a new
root, while enabling write protection dirty logging for example, could
result in entries in the new root being considered, and others not
considered. That's why we set the dirty logging flag on the memslot
before we write protect SPTEs.
I'm not sure where exactly to document these properties, but I agree
it would be a good thing to do in a future patch set.