Re: [patch] close_range.2: new page documenting close_range(2)

From: Alejandro Colomar (man-pages)
Date: Sat Dec 12 2020 - 12:59:24 EST


Hi Christian,

Makes sense to me.

Thanks,

Alex

On 12/12/20 1:14 PM, Christian Brauner wrote:
> On Thu, Dec 10, 2020 at 03:36:42PM +0100, Alejandro Colomar (man-pages) wrote:
>> Hi Christian,
>
> Hi Alex,
>
>>
>> Thanks for confirming that behavior. Seems reasonable.
>>
>> I was wondering...
>> If this call is equivalent to unshare(2)+{close(2) in a loop},
>> shouldn't it fail for the same reasons those syscalls can fail?
>>
>> What about the following errors?:
>>
>> From unshare(2):
>>
>> EPERM The calling process did not have the required privileges
>> for this operation.
>
> unshare(CLONE_FILES) doesn't require any privileges. Only flags relevant
> to kernel/nsproxy.c:unshare_nsproxy_namespaces() require privileges,
> i.e.
> CLONE_NEWNS
> CLONE_NEWUTS
> CLONE_NEWIPC
> CLONE_NEWNET
> CLONE_NEWPID
> CLONE_NEWCGROUP
> CLONE_NEWTIME
> so the permissions are the same.
>
>>
>> From close(2):
>> EBADF fd isn't a valid open file descriptor.
>>
>> OK, this one can't happen with the current code.
>> Let's say there are fds 1 to 10, and you call 'close_range(20,30,0)'.
>> It's a no-op (although it will still unshare if the flag is set).
>> But shouldn't it fail with EBADF?
>
> CLOSE_RANGE_UNSHARE should always give you a private file descriptor
> table independent of whether or not any file descriptors need to be
> closed. That's also how we documented the flag:
>
> /* Unshare the file descriptor table before closing file descriptors. */
> #define CLOSE_RANGE_UNSHARE (1U << 1)
>
> A caller calling unshare(CLONE_FILES) and then an emulated close_range()
> or the proper close_range() syscall wants to make sure that all unwanted
> file descriptors are closed (if any) and that no new file descriptors
> can be injected afterwards. If you skip the unshare(CLONE_FILES) because
> there are no fds to be closed you open up a race window. It would also
> be annoying for userspace if they _may_ have received a private file
> descriptor table but only if any fds needed to be closed.
>
> If people were really keen on skipping the unshare when no fd needs to
> be closed, then this could become a new flag. But I really don't think
> that's necessary, and it also doesn't make a lot of sense, imho.
>
>>
>> EINTR The close() call was interrupted by a signal; see
>> signal(7).
>>
>> EIO An I/O error occurred.
>>
>> ENOSPC, EDQUOT
>> On NFS, these errors are not normally reported against
>> the first write which exceeds the available storage
>> space, but instead against a subsequent write(2),
>> fsync(2), or close().
>
> None of these will be seen by userspace because close_range() currently
> ignores all errors after it has begun closing files.
>
> Christian
>