Re: Review request: draft userfaultfd(2) manual page

From: Michael Kerrisk (man-pages)
Date: Fri Apr 21 2017 - 02:31:14 EST


Hello Mike,

On 03/21/2017 03:01 PM, Mike Rapoport wrote:
> Hello Michael,
>
> On Mon, Mar 20, 2017 at 09:08:05PM +0100, Michael Kerrisk (man-pages) wrote:
>> Hello Andrea, Mike, and all,
>>
>> Mike: thanks for the page that you sent. I've reworked it
>> a bit, and also added a lot of further information,
>> and an example program. In the process, I split the page
>> into two pieces, with one piece describing the userfaultfd()
>> system call and the other describing the ioctl() operations.
>>
>> I'd like to get review input, especially from you and
>> Andrea, but also anyone else, for the current version
>> of this page, which includes a few FIXMEs to be sorted.
>
> Thanks for the update. I'm addressing the FIXME points you've mentioned
> below.

Thanks!

> Otherwise, everything seems the right description of the current upstream.
> 4.11 will have quite a few updates to userfault and we'll need to update
> this page and ioctl_userfaultfd(2) to address those updates. I am planning
> to work on the man update in the next few weeks.
>
>> I've shown the rendered version of the page below.
>> The groff source is attached, and can also be found
>> at the branch here:
>
>> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=draft_userfaultfd
>>
>> The new ioctl_userfaultfd(2) page follows this mail.
>>
>> Cheers,
>>
>> Michael
>
> --
> Sincerely yours,
> Mike.
>
>
>> USERFAULTFD(2) Linux Programmer's Manual USERFAULTFD(2)
>>
>> ┌─────────────────────────────────────────────────────┐
>> │FIXME                                                │
>> ├─────────────────────────────────────────────────────┤
>> │Need to describe close(2) semantics for userfaultfd  │
>> │file descriptor: what happens when the userfaultfd   │
>> │FD is closed?                                        │
>> │                                                     │
>> └─────────────────────────────────────────────────────┘
>
> When userfaultfd is closed, it unregisters all memory ranges that were
> previously registered with it and flushes the outstanding page fault
> events.

Presumably, this is more precisely stated as, "when the last
file descriptor referring to a userfaultfd object is closed..."?

I've made the text:

When the last file descriptor referring to a userfaultfd object
is closed, all memory ranges that were registered with the
object are unregistered and unread page-fault events are
flushed.

[...]

>> Reading from the userfaultfd structure
>> ┌─────────────────────────────────────────────────────┐
>> │FIXME                                                │
>> ├─────────────────────────────────────────────────────┤
>> │are the details below correct?                       │
>> └─────────────────────────────────────────────────────┘
>
> Yes, at least for the current upstream version. 4.11 will have quite a few
> updates to userfaultfd.

Okay.

>> Each read(2) from the userfaultfd file descriptor returns one
>> or more uffd_msg structures, each of which describes a page-
>> fault event:
>>
>>     struct uffd_msg {
>>         __u8  event;            /* Type of event */
>>         ...
>>         union {
>>             struct {
>>                 __u64 flags;    /* Flags describing fault */
>>                 __u64 address;  /* Faulting address */
>>             } pagefault;
>>             ...
>>         } arg;
>>
>>         /* Padding fields omitted */
>>     } __packed;
>>
>> If multiple events are available and the supplied buffer is
>> large enough, read(2) returns as many events as will fit in the
>> supplied buffer. If the buffer supplied to read(2) is smaller
>> than the size of the uffd_msg structure, the read(2) fails with
>> the error EINVAL.
>>
>> The fields set in the uffd_msg structure are as follows:
>>
>> event The type of event. Currently, only one value can appear
>> in this field: UFFD_EVENT_PAGEFAULT, which indicates a
>> page-fault event.
>>
>> address
>> The address that triggered the page fault.
>>
>> flags A bit mask of flags that describe the event. For
>> UFFD_EVENT_PAGEFAULT, the following flag may appear:
>>
>> UFFD_PAGEFAULT_FLAG_WRITE
>> If the address is in a range that was registered
>> with the UFFDIO_REGISTER_MODE_MISSING flag (see
>> ioctl_userfaultfd(2)) and this flag is set, this
>> is a write fault; otherwise it is a read fault.
>>
>> A read(2) on a userfaultfd file descriptor can fail with the
>> following errors:
>>
>> EINVAL The userfaultfd object has not yet been enabled using
>> the UFFDIO_API ioctl(2) operation
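
To make the read(2) description above concrete, here is a minimal
sketch of how a caller might consume events. The helper names
read_events() and count_write_faults() are made up for illustration
and are not part of any API; the sketch also assumes that read(2)
only ever returns whole uffd_msg structures, and it leaves the
blocking or nonblocking mode of the descriptor to the caller.

```c
#include <assert.h>
#include <linux/userfaultfd.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to 'max' events from 'uffd' into 'msgs'.
   Returns the number of events read, or -1 on error (for example,
   EINVAL if the buffer is smaller than one uffd_msg, or if the
   UFFDIO_API ioctl(2) operation has not been performed yet). */
static ssize_t
read_events(int uffd, struct uffd_msg *msgs, size_t max)
{
    ssize_t nread = read(uffd, msgs, max * sizeof(struct uffd_msg));

    if (nread == -1)
        return -1;

    /* Assumption: the kernel never returns a partial uffd_msg */
    assert((size_t) nread % sizeof(struct uffd_msg) == 0);

    return nread / (ssize_t) sizeof(struct uffd_msg);
}

/* Hypothetical helper: count the page-fault events in 'msgs' that have
   UFFD_PAGEFAULT_FLAG_WRITE set, that is, the write faults. */
static size_t
count_write_faults(const struct uffd_msg *msgs, size_t nmsgs)
{
    size_t nwrites = 0;

    for (size_t i = 0; i < nmsgs; i++) {
        if (msgs[i].event == UFFD_EVENT_PAGEFAULT &&
            (msgs[i].arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WRITE))
            nwrites++;
    }

    return nwrites;
}
```

A caller would typically pass an array of several uffd_msg
structures, so that one read(2) can return as many queued events
as fit.
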
>>
>> The userfaultfd file descriptor can be monitored with poll(2),
>> select(2), and epoll(7). When events are available, the file
>> descriptor indicates as readable.
>>
>>
>> ┌─────────────────────────────────────────────────────┐
>> │FIXME                                                │
>> ├─────────────────────────────────────────────────────┤
>> │But, it seems, the object must be created with       │
>> │O_NONBLOCK. What is the rationale for this           │
>> │requirement? Something needs to be said in this      │
>> │manual page.                                         │
>> └─────────────────────────────────────────────────────┘
>
> The object can be created without O_NONBLOCK, so probably the above
> sentence can be rephrased as:
>
> When the userfaultfd file descriptor is opened in non-blocking mode, it can
> be monitored with ...

Yes, but why is there this requirement for poll() etc. with the
O_NONBLOCK flag? I think something about that needs to be said in the
man page. Sorry, my FIXME was not clear enough. I've reworded the text
and the FIXME:

If the O_NONBLOCK flag is enabled in the associated open file
description, the userfaultfd file descriptor can be monitored
with poll(2), select(2), and epoll(7). When events are
available, the file descriptor indicates as readable. If the
O_NONBLOCK flag is not enabled, then poll(2) (always) indicates
the file as having a POLLERR condition, and select(2) indicates
the file descriptor as both readable and writable.

┌─────────────────────────────────────────────────────┐
│FIXME                                                │
├─────────────────────────────────────────────────────┤
│What is the reason for this seemingly odd behavior   │
│with respect to the O_NONBLOCK flag? (see            │
│userfaultfd_poll() in fs/userfaultfd.c). Something   │
│needs to be said about this.                         │
└─────────────────────────────────────────────────────┘
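
For anyone who wants to experiment with the poll(2) behavior
described in the reworded text, the following is a rough sketch
rather than a definitive recipe. The helper names
uffd_open_nonblock() and wait_readable() are hypothetical. The
sketch assumes the raw syscall(2) interface is used to create the
object, and it treats POLLERR (which, per the text above, is what
poll(2) reports when O_NONBLOCK is not enabled) as an error.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: create a userfaultfd object whose open file
   description has O_NONBLOCK set. Returns the new file descriptor,
   or -1 on error (the call may fail if the kernel or the caller's
   privileges do not allow userfaultfd). */
static int
uffd_open_nonblock(void)
{
    return (int) syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
}

/* Hypothetical helper: wait up to 'timeout_ms' milliseconds for 'fd'
   to become readable. Returns 1 if readable, 0 on timeout, and -1 if
   poll(2) fails or reports POLLERR for the descriptor. */
static int
wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready;

    assert(fd >= 0);

    ready = poll(&pfd, 1, timeout_ms);
    if (ready <= 0)
        return ready;           /* 0: timeout; -1: poll() failed */

    if (pfd.revents & POLLERR)
        return -1;

    return (pfd.revents & POLLIN) ? 1 : 0;
}
```

After UFFDIO_API and UFFDIO_REGISTER have been done on the
descriptor returned by uffd_open_nonblock(), a monitoring thread
could call wait_readable() on it and then read(2) the queued
events once it returns 1.
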

[...]

Thanks,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/