Re: [PATCH RFC 11/39] KVM: x86/xen: evtchn signaling via eventfd

From: Joao Martins
Date: Mon Nov 30 2020 - 10:10:01 EST


On 11/30/20 12:55 PM, David Woodhouse wrote:
> On Mon, 2020-11-30 at 12:17 +0000, Joao Martins wrote:
>> On 11/30/20 9:41 AM, David Woodhouse wrote:
>>> On Wed, 2019-02-20 at 20:15 +0000, Joao Martins wrote:
>>>> EVTCHNOP_send short-circuiting happens by marking the event as pending
>>>> in the shared info and vcpu info pages and doing the upcall. For IPIs
>>>> and interdomain event channels, we do the upcall on the assigned vcpu.
>>>
>>> This part I understand, 'peeking' at the EVTCHNOP_send hypercall so
>>> that we can short-circuit IPI delivery without it having to bounce
>>> through userspace.
>>>
>>> But why would I then want then short-circuit the short-circuit,
>>> providing an eventfd for it to signal... so that I can then just
>>> receive the event in userspace in a *different* form to the original
>>> hypercall exit I would have got?
>>>
>>
>> One thing I didn't quite do at the time, is the whitelisting of unregistered
>> ports to userspace. Right now, it's a blacklist i.e. if it's not handled in
>> the kernel (IPIs, timer vIRQ, etc) it goes back to userspace. When the only
>> ones which go to userspace should be explicitly requested as such
>> and otherwise return -ENOENT in the hypercall.
>
> Hm, why would -ENOENT be a fast path which needs to be handled in the
> kernel?
>
It's not that it's a fast path.

Like sending an event channel to an unbound vector, now becomes an possible vector to
worry about in userspace VMM e.g. should that port lookup logic be fragile.

So it's more along the lines of Nack-ing the invalid port earlier to rather go
to go userspace to invalidate it, provided we do the lookup anyway in the kernel.

>> Perhaps eventfd could be a way to express this? Like if you register
>> without an eventfd it's offloaded, otherwise it's assigned to userspace,
>> or if neither it's then returned an error without bothering the VMM.
>
> I much prefer the simple model where the *only* event channels that the
> kernel knows about are the ones it's expected to handle.
>
> For any others, the bypass doesn't kick in, and userspace gets the
> KVM_EXIT_HYPERCALL exit.
>
/me nods

I should comment on your other patch but: if we're going to make it generic for
the userspace hypercall handling, might as well move hyper-v there too. In this series,
I added KVM_EXIT_XEN, much like it exists KVM_EXIT_HYPERV -- but with a generic version
I wonder if a capability could gate KVM_EXIT_HYPERCALL to handle both guest types, while
disabling KVM_EXIT_HYPERV. But this is probably subject of its own separate patch :)

>> But still, eventfd is probably unnecessary complexity when another @type
>> (XEN_EVTCHN_TYPE_USER) would serve, and then just exiting to userspace
>> and let it route its evtchn port handling to the its own I/O handling thread.
>
> Hmm... so the benefit of the eventfd is that we can wake the I/O thread
> directly instead of bouncing out to userspace on the vCPU thread only
> for it to send a signal and return to the guest? Did you ever use that,
> and it is worth the additional in-kernel code?
>
This was my own preemptive optimization to the interface -- it's not worth
the added code for vIRQ and IPI at this point which is what *for sure* the
kernel will handle.

> Is there any reason we'd want that for IPI or VIRQ event channels, or
> can it be only for INTERDOM/UNBOUND event channels which come later?
>
/me nods.

No reason to keep that for IPI/vIRQ.

> I'm tempted to leave it out of the first patch, and *maybe* add it back
> in a later patch, putting it in the union alongside .virq.type.
>
>
> struct kvm_xen_eventfd {
>
> #define XEN_EVTCHN_TYPE_VIRQ 0
> #define XEN_EVTCHN_TYPE_IPI 1
> __u32 type;
> __u32 port;
> __u32 vcpu;
> - __s32 fd;
>
> #define KVM_XEN_EVENTFD_DEASSIGN (1 << 0)
> #define KVM_XEN_EVENTFD_UPDATE (1 << 1)
> __u32 flags;
> union {
> struct {
> __u8 type;
> } virq;
> + struct {
> + __s32 eventfd;
> + } interdom; /* including unbound */
> __u32 padding[2];
> };
> } evtchn;
>
> Does that make sense to you?
>
Yeap! :)

Joao