Re: [PATCH v2 08/16] block: Limit atomic write IO size according to atomic_write_max_sectors

From: John Garry
Date: Fri Dec 15 2023 - 08:56:32 EST


On 15/12/2023 02:27, Ming Lei wrote:
On Tue, Dec 12, 2023 at 11:08:36AM +0000, John Garry wrote:
Currently an IO size is limited by the request_queue limit max_sectors.
Limit the size for an atomic write to the queue limit atomic_write_max_sectors
value.

Signed-off-by: John Garry <john.g.garry@xxxxxxxxxx>
---
block/blk-merge.c | 12 +++++++++++-
block/blk.h | 3 +++
2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 0ccc251e22ff..8d4de9253fe9 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -171,7 +171,17 @@ static inline unsigned get_max_io_size(struct bio *bio,
{
unsigned pbs = lim->physical_block_size >> SECTOR_SHIFT;
unsigned lbs = lim->logical_block_size >> SECTOR_SHIFT;
- unsigned max_sectors = lim->max_sectors, start, end;
+ unsigned max_sectors, start, end;
+
+ /*
+ * We ignore lim->max_sectors for atomic writes simply because
+ * it may be less than bio->write_atomic_unit, which we cannot
+ * tolerate.
+ */
+ if (bio->bi_opf & REQ_ATOMIC)
+ max_sectors = lim->atomic_write_max_sectors;
+ else
+ max_sectors = lim->max_sectors;

Please note that I mentioned this issue in the cover letter, so you can see some discussion there (if you missed it).


I can understand the trouble for atomic writes from bio splitting, which
may simply split at the max_sectors boundary; however, this change is
still too fragile:

1) ->max_sectors may be set from userspace
- so this change simply overrides the userspace setting

yes


2) otherwise ->max_sectors is the same as ->max_hw_sectors:

- then something must be wrong on the device side or driver side, because
->write_atomic_unit conflicts with ->max_hw_sectors; this is supposed
to be figured out before the device is set up


Right, so I think that it is proper to limit atomic_write_unit_max et al. to max_hw_sectors, or whatever the DMA engine/device limits are. I can make that change.
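
Something roughly like the below is what I have in mind (just a sketch in
terms of the queue_limits fields discussed in this series; the helper name,
its call site and the sectors-only handling are illustrative, not a final
interface):

/*
 * Sketch only: clamp the advertised atomic write limits so that they can
 * never exceed what a single request may carry. Field names follow the
 * discussion above; the helper itself and where it is called from are
 * hypothetical. atomic_write_unit_max et al. would be capped in the same
 * way (unit conversion elided).
 */
static void blk_atomic_write_clamp_limits(struct queue_limits *lim)
{
	lim->atomic_write_max_sectors =
		min(lim->atomic_write_max_sectors, lim->max_hw_sectors);
}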

3) a too-big max_sectors may break the driver or device; for example, nvme-pci
aligns max_hw_sectors with the DMA-optimized mapping size

And there might be more (better) choices:

1) make sure the atomic write limit is respected when userspace updates
->max_sectors

I have changed my mind and would say no; we can treat atomic_write_unit_max as special. Indeed, max_sectors and atomic_write_unit_max are complementary. If someone sets max_sectors to 4KB and then tries an atomic write of 16KB, then they don't know what they are doing.

My original idea was to dynamically limit atomic_write_unit_max et al. to max_sectors (so that max_sectors is always respected for atomic writes).

As an alternative, how about we keep the value of atomic_write_unit_max static, but reject an atomic write if it exceeds max_sectors? It's not too different from dynamically limiting atomic_write_unit_max. But as mentioned, it may be asking for trouble....
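
To make that alternative concrete, something like the below (sketch only;
REQ_ATOMIC and the limits fields are from this series, but the helper name
and where the check would actually live are made up for illustration):

/*
 * Sketch of the reject-instead-of-split alternative: leave
 * atomic_write_unit_max static and fail an atomic write up front if it no
 * longer fits under the current max_sectors, rather than ever splitting it.
 */
static bool bio_atomic_write_fits(struct bio *bio,
				  const struct queue_limits *lim)
{
	if (!(bio->bi_opf & REQ_ATOMIC))
		return true;

	/* an atomic write must never be split, so reject it outright */
	return bio_sectors(bio) <= lim->max_sectors;
}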


2) when the driver finds that the atomic write limits conflict with other
existing hardware limits, fail or resolve the conflict (such as by reducing
the atomic write unit) before the queue is started; with a single atomic write
limits update API, the conflict can be figured out earlier by the block layer too.

I think that we can do this, but I am not sure what other queue limits may conflict (apart from max_sectors / max_hw_sectors). There is max segment size, but we just rely on a single PAGE per iovec to evaluate atomic_write_unit_max, so that should not be an issue.
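
As a rough sketch of that early check (assuming the only real conflict is
with max_hw_sectors; the function name and error convention are
illustrative only):

/*
 * Hypothetical sanity check, run once when the driver registers its
 * limits rather than per-bio.
 */
static int blk_atomic_write_check_limits(const struct queue_limits *lim)
{
	/* the largest advertised atomic write must fit in one request */
	if (lim->atomic_write_max_sectors > lim->max_hw_sectors)
		return -EINVAL;

	/*
	 * Max segment size should not matter as long as
	 * atomic_write_unit_max is evaluated from single-PAGE iovecs,
	 * per the above.
	 */
	return 0;
}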

Thanks,
John