[GIT PULL] open: fix O_DIRECTORY | O_CREAT

From: Christian Brauner
Date: Fri Apr 21 2023 - 10:03:22 EST


Hey Linus,

/* Summary */
EINVAL ist keinmal: This contains the changes to make O_DIRECTORY when
specified together with O_CREAT an invalid request.

The wider background is that a regression report about the behavior of
O_DIRECTORY | O_CREAT was sent to fsdevel about a behavior that was
changed multiple years and LTS releases earlier during v5.7 development.

On kernels prior to v5.7 combinations of O_DIRECTORY, O_CREAT, O_EXCL
had the following semantics:

open("/tmp/d", O_DIRECTORY | O_CREAT)
* d doesn't exist: create regular file
* d exists and is a regular file: ENOTDIR
* d exists and is a directory: EISDIR

open("/tmp/d", O_DIRECTORY | O_CREAT | O_EXCL)
* d doesn't exist: create regular file
* d exists and is a regular file: EEXIST
* d exists and is a directory: EEXIST

open("/tmp/d", O_DIRECTORY | O_EXCL)
* d doesn't exist: ENOENT
* d exists and is a regular file: ENOTDIR
* d exists and is a directory: open directory

On kernels since to v5.7 combinations of O_DIRECTORY, O_CREAT, O_EXCL
have the following semantics:

open("/tmp/d", O_DIRECTORY | O_CREAT)
* d doesn't exist: ENOTDIR (create regular file)
* d exists and is a regular file: ENOTDIR
* d exists and is a directory: EISDIR

open("/tmp/d", O_DIRECTORY | O_CREAT | O_EXCL)
* d doesn't exist: ENOTDIR (create regular file)
* d exists and is a regular file: EEXIST
* d exists and is a directory: EEXIST

open("/tmp/d", O_DIRECTORY | O_EXCL)
* d doesn't exist: ENOENT
* d exists and is a regular file: ENOTDIR
* d exists and is a directory: open directory

This is a fairly substantial semantic change that userspace didn't
notice until someone took the time to _deliberately_ figure out corner
cases. Since no one noticed this breakage it can somewhat safely be
assumed that O_DIRECTORY | O_CREAT combinations are likely unused.

The v5.7 breakage is especially weird because while ENOTDIR is returned
indicating failure a regular file is actually created. This doesn't make
a lot of sense.

Time was spent finding potential users of this combination. Searching on
codesearch.debian.net showed that codebases often express semantical
expectations about O_DIRECTORY | O_CREAT which are completely contrary
to what the code has done and currently does.

The expectation often is that this particular combination would create
and open a directory. This suggests users who tried to use that
combination would stumble upon the counterintuitive behavior no matter
if pre-v5.7 or post v5.7 and quickly realize neither semantics give them
what they want.

There are various ways to address this issue. The lazy/simple option
would be to restore the pre-v5.7 behavior and to just live with that bug
forever. But since there's a real chance that the O_DIRECTORY | O_CREAT
quirk isn't relied upon it was agreed to make this an invalid request
instead.

With this pull request, EINVAL is returned for O_DIRECTORY | O_CREAT
combinations. Now, the following semantics apply:

open("/tmp/d", O_DIRECTORY | O_CREAT)
* d doesn't exist: EINVAL
* d exists and is a regular file: EINVAL
* d exists and is a directory: EINVAL

open("/tmp/d", O_DIRECTORY | O_CREAT | O_EXCL)
* d doesn't exist: EINVAL
* d exists and is a regular file: EINVAL
* d exists and is a directory: EINVAL

open("/tmp/d", O_DIRECTORY | O_EXCL)
* d doesn't exist: ENOENT
* d exists and is a regular file: ENOTDIR
* d exists and is a directory: open directory

One additional note, O_TMPFILE is implemented as:

#define __O_TMPFILE 020000000
#define O_TMPFILE (__O_TMPFILE | O_DIRECTORY)
#define O_TMPFILE_MASK (__O_TMPFILE | O_DIRECTORY | O_CREAT)

For older kernels it was important to return an explicit error when
O_TMPFILE wasn't supported. So O_TMPFILE requires that O_DIRECTORY is
raised alongside __O_TMPFILE. It also enforced that O_CREAT wasn't
specified. Since O_DIRECTORY | O_CREAT could be used to create a regular
allowing that combination together with __O_TMPFILE would've meant that
false positives were possible, i.e., that a regular file was created
instead of a O_TMPFILE. This could've been used to trick userspace into
thinking it operated on a O_TMPFILE when it wasn't. Now that O_DIRECTORY
with O_CREAT is completely blocked the interaction and checks for
O_TMPFILE are simplified as well.

This has also been covered in

https://lwn.net/Articles/926782/

which should be publicly available by now. It provides an excellent
summary of the discussion.

/* Testing */
clang: Ubuntu clang version 15.0.6
gcc: (Ubuntu 12.2.0-3ubuntu1) 12.2.0

All patches are based on 6.3-rc3 and have been sitting in linux-next.
No build failures or warnings were observed. All old and new tests in
fstests, selftests, and LTP pass without regressions.

/* Conflicts */
At the time of creating this PR no merge conflicts were reported from
linux-next and no merge conflicts showed up doing a test-merge with
current mainline.

The following changes since commit e8d018dd0257f744ca50a729e3d042cf2ec9da65:

Linux 6.3-rc3 (2023-03-19 13:27:55 -0700)

are available in the Git repository at:

git@xxxxxxxxxxxxxxxxxxx:pub/scm/linux/kernel/git/vfs/vfs tags/v6.4/vfs.open

for you to fetch changes up to 43b450632676fb60e9faeddff285d9fac94a4f58:

open: return EINVAL for O_DIRECTORY | O_CREAT (2023-03-22 11:06:55 +0100)

Please consider pulling these changes from the signed v6.4/vfs.open tag.

Thanks!
Christian

----------------------------------------------------------------
v6.4/vfs.open

----------------------------------------------------------------
Christian Brauner (1):
open: return EINVAL for O_DIRECTORY | O_CREAT

fs/open.c | 18 +++++++++++++-----
include/uapi/asm-generic/fcntl.h | 1 -
tools/include/uapi/asm-generic/fcntl.h | 1 -
3 files changed, 13 insertions(+), 7 deletions(-)