Re: [KVM PATCH v4 2/2] kvm: add support for irqfd via eventfd-notification interface

From: Gregory Haskins
Date: Thu May 07 2009 - 10:55:12 EST


Avi Kivity wrote:
> Gregory Haskins wrote:
>> One thing I was thinking here was that I could create a flag for the
>> kvm_irqfd() function for something like "KVM_IRQFD_MODE_CLEAR". This
>> flag when specified at creation time will cause the event to execute a
>> clear operation instead of a set when triggered. That way, the default
>> mode is an edge-triggered set. The non-default mode is to trigger a
>> clear. Level-triggered ints could therefore create two irqfds, one for
>> raising, the other for clearing.
>>
>
> That's my second choice option.
>
>> An alternative is to abandon the use of eventfd, and allow the irqfd to
>> be a first-class anon-fd. The parameters passed to the write/signal()
>> function could then indicate the desired level. The disadvantage would
>> be that it would not be compatible with eventfd, so we would need to
>> decide if the tradeoff is worth it.
>>
>
> I would really like to keep using eventfd. Which is why I asked
> Davide about the prospects of direct callbacks (vs wakeups).

I saw that request. That would be ideal.

>
>> OTOH, I suspect level triggered interrupts will be primarily in the
>> legacy domain, so perhaps we do not need to worry about it too much.
>> Therefore, another option is that we *could* simply set the stake in the
>> ground that legacy/level cannot use irqfd.
>>
>
> This is my preferred option. For a virtio-net-server in the kernel,
> we'd service its eventfd in qemu, raising and lowering the pci
> interrupt in the traditional way.
>
> But we'd still need to know when to lower the interrupt. How?

IIUC, isn't that usually device/subsystem specific, and out of scope of
the GSI delivery vehicle? For instance, most devices I have seen with
level ints have a register in their device register namespace for acking
the int. As an aside, this is what causes some of the grief in dealing
with shared interrupts like KVM pass-through and/or threaded-isrs:
There isn't a standardized way to ACK them.
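
To make that concrete, here is a rough sketch of the pattern I mean, with a
made-up device. The register layout, the DEV_INT_PENDING bit, and the
write-1-to-clear behavior are all assumptions for illustration; real devices
each do this differently, which is exactly the problem.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEV_INT_PENDING 0x1u   /* hypothetical "interrupt pending" status bit */

/* Hypothetical device with a single status register. */
struct fake_dev {
    uint32_t status;
};

/* The level-triggered line stays asserted while the pending bit is set. */
static bool dev_line_level(const struct fake_dev *d)
{
    return (d->status & DEV_INT_PENDING) != 0;
}

/* Device side: raise the interrupt. */
static void dev_raise(struct fake_dev *d)
{
    d->status |= DEV_INT_PENDING;
}

/* Driver side: device-specific ack, modeled as write-1-to-clear. */
static void dev_ack(struct fake_dev *d, uint32_t bits)
{
    d->status &= ~bits;
}

/* ISR: returns false if the interrupt was not ours (shared line). */
static bool dev_isr(struct fake_dev *d)
{
    if (!(d->status & DEV_INT_PENDING))
        return false;
    dev_ack(d, DEV_INT_PENDING);   /* this is what lowers the line */
    return true;
}
```

The point is that the line drops as a side effect of the driver touching the
device, not through anything the GSI delivery path does.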

You may also see some generalization of masking/acking in things like
the MSI-X table. But again, this would be out of scope of the general
GSI delivery path IIUC.

I understand that there is a feedback mechanism in the ioapic model for
calling back on acknowledgment of the interrupt. But I am not sure that
is how the real hardware normally works, and therefore I am not
convinced that it is something we need to feed all the way back (i.e. via
irqfd or whatever). In the interest of full disclosure, it's been a few
years since I studied the xAPIC docs, so I might be out to lunch on that
assertion. ;)

-Greg


