Re: [KVM PATCH v2 2/2] kvm: use POLLHUP to close an irqfd insteadof an explicit ioctl

From: Michael S. Tsirkin
Date: Thu Jun 18 2009 - 10:37:00 EST


On Thu, Jun 18, 2009 at 10:03:36AM -0400, Gregory Haskins wrote:
> Michael S. Tsirkin wrote:
> > On Thu, Jun 18, 2009 at 08:00:39AM -0400, Gregory Haskins wrote:
> >
> >> Michael S. Tsirkin wrote:
> >>
> >>
> >>> BTW, Gregory, this can be used to fix the race in the design: create a
> >>> thread and let it drop the module reference with module_put_and_exit.
> >>>
> >>>
> >> I had thought of doing something like this initially too, but I think
> >> its racy as well. Ultimately, you need to make sure the eventfd
> >> callback is completely out before its safe to run, and deferring to a
> >> thread would not change this race. The only sane way I can see to do
> >> that is to have the caller infrastructure annotate the event somehow
> >> (either directly with a module_put(), or indirectly with some kind of
> >> state transition that can be tracked with something like
> >> synchronize_sched().
> >>
> >
> > Here's what one could do: create a thread for each irqfd, and increment
> > module ref count, put that thread to sleep. When done with
> > irqfd, don't delete it and don't decrement module refcount, wake thread
> > instead. thread kills irqfd and calls module_put_and_exit.
> >
> > I don't think it's racy
>
>
> I believe it is. How would you prevent the thread from doing the
> module_put_and_exit() before the eventfd callback thread is known to
> have exited the relevant .text section?

Right.

> All this talk does give me an idea, tho. Ill make a patch.

OK, but ask yourself whether this bag of tricks is worth it, and whether
we'll find another hole later. Let's reserve the trickiness for
fast-path, where it's needed, and keep at least the assign/deassign simple.

> >
> >>> Which will work, but I guess at this point we should ask ourselves
> >>> whether all the hearburn with srcu, threads and module references is
> >>> better than just asking the user to call and ioctl.
> >>>
> >>>
> >> I am starting to agree with you, here. :)
> >>
> >> Note one thing: the SRCU stuff is mostly orthogonal from the rest of the
> >> conversation re: the module_put() races. I only tied it into the
> >> current thread because the eventfd_notifier_register() thread gave me a
> >> convenient way to hook some other context to do the module_put(). In
> >> the long term, the srcu changes are for the can_sleep() stuff. So on
> >> that note, lets see if I can convince Davide that the srcu stuff is not
> >> so evil before we revert the POLLHUP patches, since the module_put() fix
> >> is trivial once that is in place.
> >>
> >
> > Can this help with DEASSIGN as well? We need it for migration.
> >
>
> No, but afaict you do not need this for migration anyway. Migrate the
> GSI and re-call kvm_irqfd() on the other side. Would the fd even be
> relevant across a migration anyway? I would think not, but admittedly I
> know little about how qemu/kvm migration actually works.

Yes but that's not live migration. For live migration, the trick is that
you are running locally but send changes to remote guest. For that, we
need to put qemu in the middle between the device and the guest, so it
can detect activity and update the remote side.

And the best way to do that is to take poll eventfd that device assigns
and push eventfd that kvm polls. To switch between this setup
and the one where kvm polls the ventfd from device directly,
you need deassign.

--
MST
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/