Re: [RFC PATCH] vfs: shutdown lease notifications on file close

From: Dan Williams
Date: Fri Oct 13 2017 - 13:43:41 EST


On Fri, Oct 13, 2017 at 10:01 AM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Fri, Oct 13, 2017 at 08:56:10AM -0700, Dan Williams wrote:
>> While implementing MAP_DIRECT, an mmap flag that arranges for an
>> FL_LAYOUT lease to be established, Al noted:
>>
>> You are not even guaranteed that the descriptor will still be
>> open by the time you pass it down to your helper, never mind the
>> moment when the event actually happens...
>>
>> The first problem can be solved with an fd{get,put} at mmap
>> {entry,exit}.
>
> Huh? fdget() does *NOT* guarantee that descriptor won't get closed. What
> it does is guarantee that struct file won't get closed under you, which
> is nowhere near the same thing. And while we are at it, it certainly
> _is_ called by mmap()...
>
>> The second problem appears to be a general issue.
>>
>> Leases follow the lifetime of the inode, so it is possible for a lease
>> to be broken after the file is closed. When that happens userspace may
>> get a notification on a stale fd. Of course it is not recommended that a
>> process close a file descriptor with an active lease, but if it does we
>> should assume that the notification is not needed either. Walk leases at
>> close time and invalidate any pending fasync instances.
>
> What the hell is special about close(2) and not, e.g. dup2(2)? Or execve(2)
> triggering close-on-exec, etc... Besides, you are changing a user-visible
> behaviour here. Suppose your process forks and the child closes all
> descriptors; should that stop SIGIO delivery to the parent?
>
> Let's step back for a minute; could you describe how the userland is supposed
> to use that thing?
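Al's distinction between a descriptor and the struct file behind it is visible from userspace. The following is a minimal Linux sketch that assumes a writable /tmp (the helper names are illustrative). It checks two things: a shared mapping keeps the open file reachable after close(fd), and a forked child closing an inherited descriptor leaves the parent's copy usable. In neither case does close(2) end the life of the open file, so a close-time hook sees only part of the picture.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create an unlinked temp file holding msg; returns an fd or -1. */
static int make_temp_file(const char *msg)
{
    char path[] = "/tmp/closedemoXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                 /* file lives on while referenced */
    size_t len = strlen(msg);
    if (write(fd, msg, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    return fd;
}

/* 1 if the mapping still reads the file after its fd is closed. */
int mapping_survives_close(void)
{
    const char msg[] = "persistent";
    size_t len = sizeof(msg) - 1;
    int fd = make_temp_file(msg);
    if (fd < 0)
        return -1;
    char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                    /* the descriptor is gone... */
    if (p == MAP_FAILED)
        return -1;
    int ok = memcmp(p, msg, len) == 0; /* ...the file is not */
    munmap(p, len);
    return ok;
}

/* 1 if a child's close() of an inherited fd leaves the parent's fd intact. */
int child_close_is_local(void)
{
    int fd = make_temp_file("x");
    if (fd < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        close(fd);                /* only the child's table entry goes away */
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid) {
        close(fd);
        return -1;
    }
    char c;
    int ok = pread(fd, &c, 1, 0) == 1 && c == 'x';
    close(fd);
    return ok;
}
```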

MAP_DIRECT is meant as a way to safely pass DAX mappings of a file
to the RDMA sub-system, or any sub-system that follows a memory
registration design pattern. RDMA expects that once it has done
get_user_pages() it has exclusive access to the memory backing the
file mapping indefinitely. With page cache backed file mappings we
can truncate and hole punch the file at will, and the RDMA operations
will continue to target pages that are no longer part of the file.
Yes, that breaks coherency, but it otherwise does not cause damage to
unrelated file blocks. With DAX we do not have the luxury of an
indirect page for the RDMA to land in; the operations go straight to
file blocks in persistent memory.
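The page cache versus DAX difference can be modeled with a toy allocator. This is a hypothetical, single-threaded sketch: the names and sizes are invented, and real truncate, page cache and DMA paths are far more involved. After a truncate, the freed block is reused by another file, and a "device" keeps writing through the address it pinned at registration time.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BLOCK_SIZE 8
#define NR_BLOCKS  4

/* Toy "persistent memory": the only storage the filesystem has. */
static char pmem[NR_BLOCKS][BLOCK_SIZE];
static bool block_used[NR_BLOCKS];

static int alloc_block(void)
{
    for (int i = 0; i < NR_BLOCKS; i++) {
        if (!block_used[i]) {
            block_used[i] = true;
            return i;
        }
    }
    return -1;
}

static void free_block(int b) { block_used[b] = false; }

/*
 * Returns true if file B's data survives a late RDMA write that was
 * registered against file A before A was truncated.
 */
bool rdma_after_truncate(bool dax)
{
    static char page_cache_page[BLOCK_SIZE];
    memset(pmem, 0, sizeof(pmem));
    memset(block_used, 0, sizeof(block_used));

    int a = alloc_block();                 /* file A's only block */
    memcpy(pmem[a], "A-data", 7);

    /* Memory registration pins whatever backs file A's mapping. */
    char *pinned;
    if (dax) {
        pinned = pmem[a];                  /* DAX: the mapping is the block */
    } else {
        memcpy(page_cache_page, pmem[a], BLOCK_SIZE);
        pinned = page_cache_page;          /* page cache: an indirect copy */
    }

    /* Truncate file A; its block is freed and reused by file B. */
    free_block(a);
    int b = alloc_block();
    memcpy(pmem[b], "B-data", 7);

    /* The device keeps writing to the address it pinned. */
    memcpy(pinned, "rdma!!", 7);

    return memcmp(pmem[b], "B-data", 7) == 0;
}
```

With the page cache the late write lands in a detached copy; with DAX it lands in the block that now belongs to file B.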

With MAP_DIRECT the proposal is that when the RDMA memory registration
code sees 'vma_is_dax(vma) == true' it calls a new ->lease_direct()
vm_operation to take an FL_LAYOUT lease against the file to protect
against truncate / fallocate. Lease expiration triggers a callback to
redirect or shut down RDMA. The filesystem mmap implementation also
arranges for an FL_LAYOUT lease to be taken at mmap time, when the fd
is available, to set up a SIGIO notification.
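The ordering in this proposal can be sketched as a single-threaded toy model. Every type and function name below (toy_vma, toy_vm_ops, toy_register_mr and so on) is hypothetical, and the lease_direct signature in particular is an assumption, not the one in the patch set. The sketch only shows the sequence: check for DAX, take the lease before pinning, and have truncate break the lease, invoking the RDMA callback, before any blocks are reused.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lease record: whom to notify when the lease is broken. */
struct toy_lease {
    void (*on_break)(void *ctx);
    void *ctx;
    bool active;
};

struct toy_file {
    struct toy_lease layout_lease;   /* at most one layout-style lease */
};

struct toy_vma;

/* Hypothetical stand-in for the proposed ->lease_direct() vm_operation. */
struct toy_vm_ops {
    int (*lease_direct)(struct toy_vma *vma, void (*on_break)(void *),
                        void *ctx);
};

struct toy_vma {
    bool dax;                        /* what vma_is_dax() would report */
    struct toy_file *file;
    const struct toy_vm_ops *ops;
};

static int toy_lease_direct(struct toy_vma *vma, void (*on_break)(void *),
                            void *ctx)
{
    struct toy_lease *l = &vma->file->layout_lease;
    if (l->active)
        return -EBUSY;
    l->on_break = on_break;
    l->ctx = ctx;
    l->active = true;
    return 0;
}

static const struct toy_vm_ops toy_dax_ops = { .lease_direct = toy_lease_direct };

/* Truncate/fallocate path: break the lease before touching the block map. */
void toy_truncate(struct toy_file *f)
{
    struct toy_lease *l = &f->layout_lease;
    if (l->active) {
        l->active = false;
        l->on_break(l->ctx);         /* redirect or shut down RDMA */
    }
    /* ...blocks could now be freed without a live registration... */
}

/* RDMA side: a memory registration that the lease break can kill. */
struct toy_mr { bool live; };

static void toy_mr_shutdown(void *ctx) { ((struct toy_mr *)ctx)->live = false; }

int toy_register_mr(struct toy_mr *mr, struct toy_vma *vma)
{
    if (vma->dax) {
        if (!vma->ops || !vma->ops->lease_direct)
            return -EOPNOTSUPP;      /* no way to protect DAX blocks */
        int err = vma->ops->lease_direct(vma, toy_mr_shutdown, mr);
        if (err)
            return err;
    }
    mr->live = true;                 /* get_user_pages() would happen here */
    return 0;
}
```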

If we don't take a lease at mmap time, then we would need to develop a
notification mechanism that is specific to the RDMA code. Using SIGIO
on the mmap fd seemed a more generic solution to me.
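On the userspace side, fd-based lease notification is the existing fcntl(2) lease interface: F_SETLEASE takes the lease and F_SETSIG selects the signal delivered when it is broken. The sketch below is Linux-only. It uses an ordinary read lease, since FL_LAYOUT is kernel-internal and not something fcntl(2) exposes. The temp-file path and 5-second timeout are arbitrary choices, and the function reports 0 rather than failing where the filesystem refuses leases. It takes a read lease, lets a forked child start a conflicting open, and checks that the signal carries the lease holder's fd in si_fd.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Returns 1 if the lease-break signal named our fd, 0 if leases are
 * unavailable here, -1 on error. */
int lease_break_notifies(void)
{
    char path[] = "/tmp/leasedemoXXXXXX";
    int sig = SIGRTMIN;
    int rc = -1, fd, got, status;
    pid_t pid;
    sigset_t set;
    siginfo_t info;
    struct timespec timeout = { .tv_sec = 5, .tv_nsec = 0 };
    struct timespec zero = { .tv_sec = 0, .tv_nsec = 0 };

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    close(fd);                     /* a read lease requires no open writers */
    fd = open(path, O_RDONLY);
    if (fd < 0) {
        unlink(path);
        return -1;
    }

    /* Block the signal so it stays queued for sigtimedwait(). */
    sigemptyset(&set);
    sigaddset(&set, sig);
    sigprocmask(SIG_BLOCK, &set, NULL);

    if (fcntl(fd, F_SETSIG, sig) < 0)
        goto out;
    if (fcntl(fd, F_SETLEASE, F_RDLCK) < 0) {
        rc = 0;                    /* e.g. the filesystem refuses leases */
        goto out;
    }

    pid = fork();
    if (pid < 0)
        goto out;
    if (pid == 0) {
        /* Opening for write breaks the read lease; O_NONBLOCK makes the
         * open fail with EWOULDBLOCK instead of waiting for the holder. */
        int w = open(path, O_WRONLY | O_NONBLOCK);
        _exit(w < 0 && errno == EWOULDBLOCK ? 0 : 1);
    }

    got = sigtimedwait(&set, &info, &timeout);
    if (waitpid(pid, &status, 0) == pid && got == sig &&
        info.si_fd == fd && WIFEXITED(status) && WEXITSTATUS(status) == 0)
        rc = 1;
    fcntl(fd, F_SETLEASE, F_UNLCK); /* give the lease up */
out:
    close(fd);
    unlink(path);
    /* Drain any queued instance before unblocking (default action kills). */
    while (sigtimedwait(&set, &info, &zero) > 0)
        ;
    sigprocmask(SIG_UNBLOCK, &set, NULL);
    return rc;
}
```

si_fd is the descriptor number recorded when the lease was taken. If that descriptor has since been closed while something else, such as a mapping, keeps the file and its lease alive, the signal can name a stale or reused fd, which is the situation the RFC is trying to address.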