> > > atomic_write_unit_max is the largest application block size which we can
> > > support, while atomic_write_max_bytes is the max size of an atomic operation
> > > which the HW supports.
> >
> > Why are these different? If the hardware supports 128kB atomic
> > writes, why limit applications to something smaller?
>
> From your review on the iomap patch, I assume that now you realise that we
> are proposing a write which may include multiple application data blocks
> (each limited in size to atomic_write_unit_max), and the limit in total size
> of that write is atomic_write_max_bytes.

I still don't get it - you haven't explained why/what an application
atomic block write might be, nor why the block device should be
determining the size of application data blocks, etc. If the block
device can do 128kB atomic writes, why wouldn't the device allow the
application to do 128kB atomic writes if they've aligned the atomic
write correctly?

What happens when we get hardware that can do atomic writes at any
alignment, of any size up to atomic_write_max_bytes? Because this
interface defines atomic writes as "must be a multiple of 2 of
atomic_write_unit_min" then hardware that can do atomic writes of
any size can not be effectively utilised by this interface....
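
To make the objection concrete, here is a minimal user-space sketch of the
kind of check the proposed rule implies. It is not code from the patch set:
the helper names are hypothetical, and it assumes the rule is "length is a
power of two between atomic_write_unit_min and atomic_write_unit_max, and
the offset is naturally aligned to the length".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if x is a non-zero power of two. */
static bool is_pow2(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Hypothetical helper, not a kernel function: check one atomic write
 * against the assumed rule. All values are in bytes; unit_min and
 * unit_max are assumed to be powers of two.
 */
static bool atomic_write_unit_ok(uint64_t offset, uint64_t len,
				 uint64_t unit_min, uint64_t unit_max)
{
	assert(is_pow2(unit_min) && is_pow2(unit_max));

	if (!is_pow2(len))
		return false;	/* length must be a power of two */
	if (len < unit_min || len > unit_max)
		return false;	/* outside the advertised unit range */
	return (offset & (len - 1)) == 0;	/* naturally aligned */
}
```

Under these assumptions a 12kB write is rejected even on hardware that
could write any size atomically, which is the limitation being pointed out.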

> User applications should only pay attention to what we return from statx,
> that being atomic_write_unit_min and atomic_write_unit_max.
>
> atomic_write_max_bytes and atomic_write_boundary are only relevant to the
> block layer.

If applications can issue a multi-atomic_write_unit_max-block
write as a single, non-atomic, multi-bio RWF_ATOMIC pwritev2() IO
and such IO is constrained to atomic_write_max_bytes, then
atomic_write_max_bytes is most definitely relevant to user
applications.

> > > +What:		/sys/block/<disk>/atomic_write_boundary
> > > +Date:		May 2023
> > > +Contact:	Himanshu Madhani <himanshu.madhani@xxxxxxxxxx>
> > > +Description:
> > > +		[RO] A device may need to internally split I/Os which
> > > +		straddle a given logical block address boundary. In that
> > > +		case a single atomic write operation will be processed as
> > > +		one or more sub-operations which each complete atomically.
> > > +		This parameter specifies the size in bytes of the atomic
> > > +		boundary if one is reported by the device. This value must
> > > +		be a power-of-two.
> >
> > How are users/filesystems supposed to use this?
>
> As above, this is not relevant to the user.

Applications will greatly care if their atomic IO gets split into
multiple IOs whose persistence order is undefined. I think it also
matters for filesystems when it comes to allocation, because we are
going to have to be very careful not to have extents straddle ranges
that will cause an atomic write to be split.

e.g. how does this work with striped devices? e.g. we have a stripe
unit of 16kB, but the devices support atomic_write_unit_max = 32kB.
Instantly, we have a configuration where atomic writes need to be
split at 16kB boundaries, and so the maximum atomic write size that
can be supported is actually 16kB - the stripe unit of the RAID device.

This means the filesystem must, at minimum, align all allocations
for atomic IO to 16kB stripe unit alignment, and must not allow
atomic IOs that are not stripe unit aligned or sized to proceed
because they can't be processed as an atomic IO....
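
The striped-device arithmetic above can be sketched as follows. This is an
illustrative user-space sketch only: the function names are hypothetical,
all sizes are in bytes, and it assumes a write is split at every stripe
unit boundary it crosses.

```c
#include <assert.h>
#include <stdint.h>

/* Largest atomic write (bytes) that striping can never split. */
static uint64_t striped_atomic_max(uint64_t unit_max, uint64_t stripe_unit)
{
	assert(stripe_unit != 0);
	return unit_max < stripe_unit ? unit_max : stripe_unit;
}

/*
 * Number of per-stripe-unit pieces that the byte range
 * [offset, offset + len) is split into. A result of 1 means the
 * write stays inside a single stripe unit.
 */
static uint64_t stripe_pieces(uint64_t offset, uint64_t len,
			      uint64_t stripe_unit)
{
	assert(stripe_unit != 0 && len != 0);
	return (offset + len - 1) / stripe_unit - offset / stripe_unit + 1;
}
```

With the numbers from the example, striped_atomic_max(32768, 16384) gives
16384, and a 32kB write at offset 0 is split into two pieces, as is a 16kB
write that starts 8kB into a stripe unit.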

> > >  /**
> > > @@ -183,6 +186,59 @@ void blk_queue_max_discard_sectors(struct request_queue *q,
> > >  }
> > >  EXPORT_SYMBOL(blk_queue_max_discard_sectors);
> > > +/**
> > > + * blk_queue_atomic_write_max_bytes - set max bytes supported by
> > > + * the device for atomic write operations.
> > > + * @q: the request queue for the device
> > > + * @size: maximum bytes supported
> > > + */
> > > +void blk_queue_atomic_write_max_bytes(struct request_queue *q,
> > > +				      unsigned int size)
> > > +{
> > > +	q->limits.atomic_write_max_bytes = size;
> > > +}
> > > +EXPORT_SYMBOL(blk_queue_atomic_write_max_bytes);
> > > +
> > > +/**
> > > + * blk_queue_atomic_write_boundary - Device's logical block address space
> > > + * which an atomic write should not cross.
> >
> > I have no idea what "logical block address space which an atomic
> > write should not cross" means, especially as the unit is in bytes
> > and not in sectors (which are the units LBAs are expressed in).
>
> It means that an atomic operation which straddles the atomic boundary is not
> guaranteed to be atomic by the device, so we should (must) not cross it to
> maintain atomic behaviour for an application block. That's one reason that
> we have all these size and alignment rules.

Yes, that much is obvious. What I have no idea about is what
this means in practice. When is this ever going to be non-zero, and
what should we be doing at the filesystem allocation level when it
is non-zero to ensure that allocations for atomic writes never cross
such a boundary? i.e. how do we prevent applications from ever
needing this functionality to be triggered? i.e. so the filesystem
can guarantee a single RWF_ATOMIC user IO is actually dispatched
as a single REQ_ATOMIC IO....

> > > +static inline unsigned int queue_atomic_write_unit_max(const struct request_queue *q)
> > > +{
> > > +	return q->limits.atomic_write_unit_max << SECTOR_SHIFT;
> > > +}
> > > +
> > > +static inline unsigned int queue_atomic_write_unit_min(const struct request_queue *q)
> > > +{
> > > +	return q->limits.atomic_write_unit_min << SECTOR_SHIFT;
> > > +}
> >
> > Ah, what? This undocumented interface reports "unit limits" in
> > bytes, but it's not using the physical device sector size to convert
> > between sector units and bytes. This really needs some more
> > documentation and work to make it present all units consistently and
> > not result in confusion when devices have 4kB sector sizes and not
> > 512 byte sectors...
>
> ok, we'll look to fix this up to give a coherent and clear interface.
>
> > Also, I think all the byte ranges should support full 64 bit values,
> > otherwise there will be silent overflows in converting 32 bit sector
> > counts to byte ranges. And, eventually, something will want to do
> > larger than 4GB atomic IOs
>
> ok, we can do that but would also then make statx field 64b. I'm fine with
> that if it is wise to do so - I don't want to wastefully use up an
> extra 2 x 32b in struct statx.

Why do we need specific variables for DIO atomic write alignment
limits?
We already have direct IO alignment and size constraints in statx(),
so why wouldn't we just reuse those variables when the user requests
atomic limits for DIO?
i.e. if STATX_DIOALIGN is set, we return normal DIO alignment
constraints. If STATX_DIOALIGN_ATOMIC is set, we return the atomic
DIO alignment requirements in those variables.....
Yes, we probably need the dio max size to be added to statx for
this. Historically speaking, I wanted statx to support this in the
first place because that's what we were already giving userspace
with XFS_IOC_DIOINFO and we already knew that atomic IO when it came
along would require a bound maximum IO size much smaller than normal
DIO limits. i.e.:

	struct dioattr {
		__u32	d_mem;		/* data buffer memory alignment */
		__u32	d_miniosz;	/* min xfer size */
		__u32	d_maxiosz;	/* max xfer size */
	};

where d_miniosz defined the alignment and size constraints for DIOs.
If we simply document that STATX_DIOALIGN_ATOMIC returns minimum
(unit) atomic IO size and alignment in statx->dio_offset_align (as
per STATX_DIOALIGN) and the maximum atomic IO size in
statx->dio_max_iosize, then we don't burn up anywhere near as much
space in the statx structure....
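
As a rough illustration of how an application could consume three such
values, here is a minimal sketch. The struct and function names are
hypothetical stand-ins that mirror the dioattr fields quoted above; this is
not the XFS or statx API, and it assumes d_miniosz acts as both the
alignment and the minimum size, as described.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in mirroring the three dioattr fields. */
struct dioattr_sketch {
	uint32_t d_mem;		/* data buffer memory alignment */
	uint32_t d_miniosz;	/* min xfer size, also offset/length alignment */
	uint32_t d_maxiosz;	/* max xfer size */
};

/* Check one direct IO request against the reported limits. */
static bool dio_request_ok(const struct dioattr_sketch *da,
			   const void *buf, uint64_t offset, uint64_t len)
{
	assert(da != NULL);

	if (da->d_mem == 0 || da->d_miniosz == 0)
		return false;	/* malformed limits */
	if ((uintptr_t)buf % da->d_mem)
		return false;	/* buffer not memory-aligned */
	if (offset % da->d_miniosz || len % da->d_miniosz)
		return false;	/* offset or length not unit-aligned */
	return len >= da->d_miniosz && len <= da->d_maxiosz;
}
```

If atomic limits were reported through the same three slots, the same check
would apply unchanged, with the minimum atomic unit in place of d_miniosz
and the maximum atomic IO size in place of d_maxiosz.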