Re: [PATCH 3/3] eventfd: add internal reference counting to fix notifierrace conditions

From: Gregory Haskins
Date: Mon Jun 22 2009 - 13:43:59 EST


Davide Libenzi wrote:
> On Mon, 22 Jun 2009, Gregory Haskins wrote:
>
>
>> Davide Libenzi wrote:
>>
>>> On Sun, 21 Jun 2009, Gregory Haskins wrote:
>>>
>>>
>>>
>>>> This looks great, Davide. I am fairly certain I can now solve the races
>>>> and even implement Michael's DEASSIGN feature with this patch in place.
>>>> I will actually fire it up tomorrow when I am back in the office and
>>>> give it a spin, but I do not spy any more races via visual inspection.
>>>>
>>>> Kind Regards,
>>>> -Greg
>>>>
>>>> PS: I was wrong with my previous statement about requiring an embeddable
>>>> object (eventfd_notifier for me, eventfd_pollcb for you). I think you
>>>> can technically solve this issue minimally by merely locking the POLLHUP
>>>> and exposing the kref. However, I think that leads to an more awkward
>>>> interface (e.g. we already have eventfd_fget() plus we add a new one
>>>> like eventfd_refget(), which might confuse users), so I prefer what you
>>>> did here. Just thought I would throw that out there in case you would
>>>> prefer to change even fewer lines.
>>>>
>>>>
>>> I actually ended up exposing the eventfd context anyway, since IMO is a
>>> better option instead of holding references to the eventfd file (that
>>> makes VFS people uneasy).
>>>
>>>
>> I liked "version - 1" better ;)
>>
>> I think ultimately we still want to hold the fget() for
>> eventfd_signal(), as it is the producer side. Without it, we have no
>> way of knowing when the last producer goes away if they happen to be an
>> in-kernel user.
>>
>
> No you don't. Holding a file* reference does not give you any assurance
> whatsoever that the last consumer did not go away. The f_count value will
> simply be 1 (just you).
>

I am probably confused or perhaps have the wrong terminology, but isnt
that "ok". I am concerned about the consumer (the guy getting the
POLLINs) to be able to detect POLLHUP when the last producer
(f_ops->write() from userspace, eventfd_signal() from kernel) goes away.

Consider the following sequence:

-------------------

userspace calls "fd = eventfd()", and gives one to KVM as an irqfd, and
the other to some PCI-passthrough device.

The kvm/irqfd side acquires a kref, the pci side acquires a file. At
this moment, userspace has the fd, and the pci device has the file (for
eventfd_signal()). The fget() count is 2. Userspace closes the fd
because its done with it, and the count drops to 1.

Some time later, pci does an fput(), and KVM sees the POLLHUP and cleans up.

-------------------

In this new model, the POLLHUP would have gone out as soon as userspace
closed the fd, even though the intended producer (the PCI device) and
the consumer (the KVM guest) are still up and running. This doesnt seem
right to me. Or am I missing something?

Thanks Davide,
-Greg

Attachment: signature.asc
Description: OpenPGP digital signature