Re: [PATCH] xen: privcmd: Add support for irqfd

From: Oleksandr Tyshchenko
Date: Thu Jul 13 2023 - 10:40:34 EST




On 13.07.23 10:44, Juergen Gross wrote:

Hello all.


> On 12.07.23 10:48, Viresh Kumar wrote:
>> Xen provides support for injecting interrupts to the guests via the
>> HYPERVISOR_dm_op() hypercall. The same is used by the Virtio based
>> device backend implementations, in an inefficient manner currently.
>>
>> Generally, the Virtio backends are implemented to work with the Eventfd
>> based mechanism. In order to make such backends work with Xen, another
>> software layer needs to poll the Eventfds and raise an interrupt to the
>> guest using the Xen based mechanism. This results in an extra context
>> switch.
>>
>> This is not a new problem in Linux though. It is present with other
>> hypervisors like KVM, etc. as well. The generic solution implemented in
>> the kernel for them is to provide an IOCTL call to pass the interrupt
>> details and eventfd, which lets the kernel take care of polling the
>> eventfd and raising of the interrupt, instead of handling this in user
>> space (which involves an extra context switch).
>>
>> This patch adds support to inject a specific interrupt to guest using
>> the eventfd mechanism, by preventing the extra context switch.
>>
>> Inspired by existing implementations for KVM, etc..


Viresh, great work!

Do you perhaps have corresponding users-space (virtio backend) example
adopted for that feature (I would like to take a look at it if possible)?



>>
>> Signed-off-by: Viresh Kumar <viresh.kumar@xxxxxxxxxx>
>> ---
>>   drivers/xen/privcmd.c      | 285 ++++++++++++++++++++++++++++++++++++-
>>   include/uapi/xen/privcmd.h |  14 ++
>>   2 files changed, 297 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
>> index e2f580e30a86..e8096b09c113 100644
>> --- a/drivers/xen/privcmd.c
>> +++ b/drivers/xen/privcmd.c
>> @@ -9,11 +9,16 @@
>>   #define pr_fmt(fmt) "xen:" KBUILD_MODNAME ": " fmt
>> +#include <linux/eventfd.h>
>> +#include <linux/file.h>
>>   #include <linux/kernel.h>
>>   #include <linux/module.h>
>> +#include <linux/mutex.h>
>> +#include <linux/poll.h>
>>   #include <linux/sched.h>
>>   #include <linux/slab.h>
>>   #include <linux/string.h>
>> +#include <linux/workqueue.h>
>>   #include <linux/errno.h>
>>   #include <linux/mm.h>
>>   #include <linux/mman.h>
>> @@ -833,6 +838,266 @@ static long privcmd_ioctl_mmap_resource(struct
>> file *file,
>>       return rc;
>>   }
>> +/* Irqfd support */
>> +static struct workqueue_struct *irqfd_cleanup_wq;
>> +static DEFINE_MUTEX(irqfds_lock);
>> +static LIST_HEAD(irqfds_list);
>> +
>> +struct privcmd_kernel_irqfd {
>> +    domid_t dom;
>> +    u8 level;
>> +    u32 irq;
>> +    struct eventfd_ctx *eventfd;
>> +    struct work_struct shutdown;
>> +    wait_queue_entry_t wait;
>> +    struct list_head list;
>> +    poll_table pt;
>> +};
>> +
>> +/* From xen/include/public/hvm/dm_op.h */
>> +#define XEN_DMOP_set_irq_level 19
>> +
>> +struct xen_dm_op_set_irq_level {
>> +    u32 irq;
>> +    /* IN - Level: 0 -> deasserted, 1 -> asserted */
>> +    u8 level;
>> +    u8 pad[3];
>> +};
>> +
>> +struct xen_dm_op {
>> +    u32 op;
>> +    u32 pad;
>> +    union {
>> +        /*
>> +         * There are more structures here, we won't be using them, so
>> +         * can skip adding them here.
>> +         */
>> +        struct xen_dm_op_set_irq_level set_irq_level;
>> +    } u;
>> +};
>
> Instead of copying definitions over from Xen into privcmd.c, please just
> update
> the related linux header include/xen/interface/dm_op.h from the Xen public
> header.
>
>> +
>> +static void irqfd_deactivate(struct privcmd_kernel_irqfd *kirqfd)
>> +{
>> +    lockdep_assert_held(&irqfds_lock);
>> +
>> +    list_del_init(&kirqfd->list);
>> +    queue_work(irqfd_cleanup_wq, &kirqfd->shutdown);
>> +}
>> +
>> +static void irqfd_shutdown(struct work_struct *work)
>> +{
>> +    struct privcmd_kernel_irqfd *kirqfd =
>> +        container_of(work, struct privcmd_kernel_irqfd, shutdown);
>> +    u64 cnt;
>> +
>> +    eventfd_ctx_remove_wait_queue(kirqfd->eventfd, &kirqfd->wait, &cnt);
>> +    eventfd_ctx_put(kirqfd->eventfd);
>> +    kfree(kirqfd);
>> +}
>> +
>> +static void irqfd_inject(struct privcmd_kernel_irqfd *kirqfd)
>> +{
>> +    struct xen_dm_op dm_op = {
>> +        .op = XEN_DMOP_set_irq_level,
>> +        .u.set_irq_level.irq = kirqfd->irq,
>> +        .u.set_irq_level.level = kirqfd->level,
>> +    };
>> +    struct xen_dm_op_buf xbufs = {
>> +        .size = sizeof(dm_op),
>> +    };
>> +    u64 cnt;
>> +
>> +    eventfd_ctx_do_read(kirqfd->eventfd, &cnt);
>> +    set_xen_guest_handle(xbufs.h, &dm_op);
>> +
>> +    xen_preemptible_hcall_begin();
>> +    HYPERVISOR_dm_op(kirqfd->dom, 1, &xbufs);
>
> Please add some error handling, e.g. by issuing a message in case this
> hypercall
> was failing. Adding a bool "error" to struct privcmd_kernel_irqfd in
> order to
> avoid multiple error messages for the same device might be a good idea.


In addition to provided comments, I would like to mention that this
particular dm_op has Arm implementation only in vanilla hypervisor.

So this feature cannot be immediately reused on x86 because of
XEN_DMOP_set_irq_level at least. As I understand, the x86's variant is
XEN_DMOP_set_isa_irq_level (there is also XEN_DMOP_set_pci_intx_level
for legacy PCI interrupts).

Please note, I am not asking to wire on x86, but maybe it is worth
considering to put this new feature under something like Kconfig's
XEN_PRIVCMD_IRQFD which depends on Arm for now? Or maybe to put dm_op
specific part of irqfd_inject() under CONFIG_ARCH_XXX here or introduce
per-ARCH helpers to form a dm_op (Arm's variant will use
XEN_DMOP_set_irq_level like in current commit). If not, then at least a
sentence in the description mentioning that this works on Arm only needs
to be added, I think.

Also, I am wondering whether we should foresee the IOCTL_PRIVCMD_IRQFD
interface to be suitable for all existing irq related dm_ops (not only
XEN_DMOP_set_irq_level) right now or this can be extended/updated later
on (when there is a real need)?



[snip]