[PATCH v3 04/15] fs: Add RWF_ATOMIC and IOCB_ATOMIC flags for atomic write support

From: John Garry
Date: Wed Jan 24 2024 - 06:41:02 EST


From: Prasad Singamsetty <prasad.singamsetty@xxxxxxxxxx>

Userspace may add flag RWF_ATOMIC to pwritev2() to indicate that the
write is to be issued with torn write prevention, according to special
alignment and length rules.

Torn write prevention means that for a power or any other HW failure, all
or none of the data will be committed to storage, but never a mix of old
and new.

For any syscall interface utilizing struct iocb, add IOCB_ATOMIC for
iocb->ki_flags field to indicate the same.

A call to statx will give the relevant atomic write info:
- atomic_write_unit_min
- atomic_write_unit_max

Both values are a power-of-2.

Applications can avail of atomic write feature by ensuring that the total
length of a write is a power-of-2 in size and also sized between
atomic_write_unit_min and atomic_write_unit_max, inclusive. Applications
must ensure that the write is at a naturally-aligned offset in the file
wrt the total write length.

Add file mode flag FMODE_CAN_ATOMIC_WRITE, so files which do not have the
flag set will have RWF_ATOMIC rejected and not just ignored.

Signed-off-by: Prasad Singamsetty <prasad.singamsetty@xxxxxxxxxx>
Signed-off-by: John Garry <john.g.garry@xxxxxxxxxx>
---
include/linux/fs.h | 9 +++++++++
include/uapi/linux/fs.h | 5 ++++-
2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index c316c0a92fff..95a7e2889d2c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -119,6 +119,10 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
#define FMODE_PWRITE ((__force fmode_t)0x10)
/* File is opened for execution with sys_execve / sys_uselib */
#define FMODE_EXEC ((__force fmode_t)0x20)
+
+/* File supports atomic writes */
+#define FMODE_CAN_ATOMIC_WRITE ((__force fmode_t)0x40)
+
/* 32bit hashes as llseek() offset (for directories) */
#define FMODE_32BITHASH ((__force fmode_t)0x200)
/* 64bit hashes as llseek() offset (for directories) */
@@ -328,6 +332,7 @@ enum rw_hint {
#define IOCB_SYNC (__force int) RWF_SYNC
#define IOCB_NOWAIT (__force int) RWF_NOWAIT
#define IOCB_APPEND (__force int) RWF_APPEND
+#define IOCB_ATOMIC (__force int) RWF_ATOMIC

/* non-RWF related bits - start at 16 */
#define IOCB_EVENTFD (1 << 16)
@@ -3344,6 +3349,10 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags)
return -EOPNOTSUPP;
kiocb_flags |= IOCB_NOIO;
}
+ if (flags & RWF_ATOMIC) {
+ if (!(ki->ki_filp->f_mode & FMODE_CAN_ATOMIC_WRITE))
+ return -EOPNOTSUPP;
+ }
kiocb_flags |= (__force int) (flags & RWF_SUPPORTED);
if (flags & RWF_SYNC)
kiocb_flags |= IOCB_DSYNC;
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 48ad69f7722e..a0975ae81e64 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -301,9 +301,12 @@ typedef int __bitwise __kernel_rwf_t;
/* per-IO O_APPEND */
#define RWF_APPEND ((__force __kernel_rwf_t)0x00000010)

+/* Atomic Write */
+#define RWF_ATOMIC ((__force __kernel_rwf_t)0x00000040)
+
/* mask of flags supported by the kernel */
#define RWF_SUPPORTED (RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\
- RWF_APPEND)
+ RWF_APPEND | RWF_ATOMIC)

/* Pagemap ioctl */
#define PAGEMAP_SCAN _IOWR('f', 16, struct pm_scan_arg)
--
2.31.1