Re: [PATCH 4/6] fs: xfs: Support atomic write for statx

From: Ojaswin Mujoo
Date: Fri Feb 09 2024 - 02:01:40 EST


Hi John,

Thanks for the patch, I've added some review comments and questions
below.

On Wed, Jan 24, 2024 at 02:26:43PM +0000, John Garry wrote:
> Support providing info on atomic write unit min and max for an inode.
>
> For simplicity, currently we limit the min at the FS block size, but a
> lower limit could be supported in future.
>
> The atomic write unit min and max is limited by the guaranteed extent
> alignment for the inode.
>
> Signed-off-by: John Garry <john.g.garry@xxxxxxxxxx>
> ---
> fs/xfs/xfs_iops.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
> fs/xfs/xfs_iops.h | 4 ++++
> 2 files changed, 49 insertions(+)
>
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index a0d77f5f512e..0890d2f70f4d 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -546,6 +546,44 @@ xfs_stat_blksize(
> return PAGE_SIZE;
> }
>
> +void xfs_get_atomic_write_attr(
> + struct xfs_inode *ip,
> + unsigned int *unit_min,
> + unsigned int *unit_max)
> +{
> + xfs_extlen_t extsz = xfs_get_extsz(ip);
> + struct xfs_buftarg *target = xfs_inode_buftarg(ip);
> + struct block_device *bdev = target->bt_bdev;
> + unsigned int awu_min, awu_max, align;
> + struct request_queue *q = bdev->bd_queue;
> + struct xfs_mount *mp = ip->i_mount;
> +
> + /*
> + * Convert to multiples of the BLOCKSIZE (as we support a minimum
> + * atomic write unit of BLOCKSIZE).
> + */
> + awu_min = queue_atomic_write_unit_min_bytes(q);
> + awu_max = queue_atomic_write_unit_max_bytes(q);
> +
> + awu_min &= ~mp->m_blockmask;
> + awu_max &= ~mp->m_blockmask;

I don't understand why we try to round down the awu_max to blocks size
here and not just have an explicit check of (awu_max < blocksize).

I think the issue with changing the awu_max is that we are using awu_max
to also indirectly reflect the alignment so as to ensure we don't cross
atomic boundaries set by the hw (eg we check uint_max % atomic alignment
== 0 in scsi). So once we change the awu_max, there's a chance that even
if an atomic write aligns to the new awu_max it still doesn't have the
right alignment and fails.

It works right now since eveything is power of 2 but it should cause
issues incase we decide to remove that limitation. Anyways, I think
this implicit behavior of things working since eveything is a power of 2
should atleast be documented in a comment, so these things are
immediately clear.

> +
> + align = XFS_FSB_TO_B(mp, extsz);
> +
> + if (!awu_max || !xfs_inode_atomicwrites(ip) || !align ||
> + !is_power_of_2(align)) {

Correct me if I'm wrong but here as well, the is_power_of_2(align) is
esentially checking if the align % uinit_max == 0 (or vice versa if
unit_max is greater) so that an allocation of extsize will always align
nicely as needed by the device.

So maybe we should use the % expression explicitly so that the intention
is immediately clear.

> + *unit_min = 0;
> + *unit_max = 0;
> + } else {
> + if (awu_min)
> + *unit_min = min(awu_min, align);

How will the min() here work? If awu_min is the minumum set by the
device, how can statx be allowed to advertise something smaller than
that?

If I understand correctly, right now the way we set awu_min in scsi and
nvme, the follwoing should usually be true for a sane device:

awu_min <= blocks size of fs <= align

so the min() anyways becomes redundant, but if we do assume that there
might be some weird devices with awu_min absurdly large (SCSI with
high atomic granularity) we still can't actually advertise a min
smaller than that of the device, or am I missing something here?

> + else
> + *unit_min = mp->m_sb.sb_blocksize;
> +
> + *unit_max = min(awu_max, align);
> + }
> +}
> +

Regards,
ojaswin