Re: [bisected] RAID1 direct IO redirecting sector loop since 6.0

From: Dmitrii Tcvetkov
Date: Tue Nov 01 2022 - 16:51:50 EST


On Tue, 1 Nov 2022 11:22:21 -0600
Keith Busch <kbusch@xxxxxxxxxx> wrote:

> On Tue, Nov 01, 2022 at 12:15:58AM +0300, Dmitrii Tcvetkov wrote:
> >
> > # cat /proc/7906/stack
> > [<0>] submit_bio_wait+0xdb/0x140
> > [<0>] blkdev_direct_IO+0x62f/0x770
> > [<0>] blkdev_read_iter+0xc1/0x140
> > [<0>] vfs_read+0x34e/0x3c0
> > [<0>] __x64_sys_pread64+0x74/0xc0
> > [<0>] do_syscall_64+0x6a/0x90
> > [<0>] entry_SYSCALL_64_after_hwframe+0x4b/0xb5
> >
> > After the "mdadm --fail" invocation, the last line becomes:
> > [pid 7906] pread64(13, 0x627c34c8d200, 4096, 0) = -1 EIO (Input/output error)
>
> It looks like something isn't accounting for the IO size correctly
> when there's an offset. It may be something specific to one of the
> stacking drivers in your block setup. Does this still happen without
> the cryptsetup step?
>
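For context, the failing read in the trace above amounts to an O_DIRECT pread of one 4 KiB block through a buffer that is 512-byte aligned but not 4 KiB aligned (the traced address 0x627c34c8d200 has exactly that alignment). A minimal sketch of such a read; the device path is illustrative, not from the report:

#define _GNU_SOURCE		/* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	void *mem;
	char *buf;
	int fd = open("/dev/mapper/testvol", O_RDONLY | O_DIRECT);

	if (fd < 0 || posix_memalign(&mem, 4096, 8192))
		return 1;
	buf = (char *)mem + 512;	/* 512-byte aligned, not 4 KiB aligned */

	/* per the stack trace this is where the reader gets stuck in
	 * submit_bio_wait(); after "mdadm --fail" it returns EIO instead */
	if (pread(fd, buf, 4096, 0) < 0)
		perror("pread");

	close(fd);
	free(mem);
	return 0;
}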
I created an lvm(mdraid(gpt(HDD))) setup:

# lsblk -t -a
NAME                 ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE  RA WSAME
...
sdd                          0    512      0     512     512    1 bfq        64 128    0B
├─sdd3                       0    512      0     512     512    1 bfq        64 128    0B
│ └─md1                      0    512      0     512     512    1          128 128    0B
│   ├─512lvmraid-zfs         0    512      0     512     512    1          128 128    0B
│   └─512lvmraid-wrk         0    512      0     512     512    1          128 128    0B
sde                          0    512      0     512     512    1 bfq        64 128    0B
├─sde3                       0    512      0     512     512    1 bfq        64 128    0B
│ └─md1                      0    512      0     512     512    1          128 128    0B
│   ├─512lvmraid-zfs         0    512      0     512     512    1          128 128    0B
│   └─512lvmraid-wrk         0    512      0     512     512    1          128 128    0B

where:
# mdadm --create --level=1 --metadata=1.2 \
--raid-devices=2 /dev/md1 /dev/sdd3 /dev/sde3
# pvcreate /dev/md1
# vgcreate 512lvmraid /dev/md1
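The two LVs shown in the lsblk output were created roughly like this (the sizes are placeholders, not taken from the actual setup):

# lvcreate -n zfs -L 20G 512lvmraid
# lvcreate -n wrk -L 20G 512lvmraid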

In this case the problem doesn't reproduce; both guests start successfully.

It also doesn't reproduce with loop devices using 4096-byte sectors:
# lsblk -t -a
NAME                 ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE  RA WSAME
loop0                        0   4096      0    4096    4096    0 none      128 128    0B
└─md2                        0   4096      0    4096    4096    0          128 128    0B
  ├─4096lvmraid-zfs          0   4096      0    4096    4096    0          128 128    0B
  └─4096lvmraid-wrk          0   4096      0    4096    4096    0          128 128    0B
loop1                        0   4096      0    4096    4096    0 none      128 128    0B
└─md2                        0   4096      0    4096    4096    0          128 128    0B
  ├─4096lvmraid-zfs          0   4096      0    4096    4096    0          128 128    0B
  └─4096lvmraid-wrk          0   4096      0    4096    4096    0          128 128    0B

where:
# losetup --sector-size 4096 -f /dev/sdd4
# losetup --sector-size 4096 -f /dev/sde4
# mdadm --create --level=1 --metadata=1.2 \
--raid-devices=2 /dev/md2 /dev/loop0 /dev/loop1
# pvcreate /dev/md2
# vgcreate 4096lvmraid /dev/md2
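To double-check that the 4096-byte sector size really propagates up the stack, the standard blockdev queries can be used (not part of the original report; the LV path assumes the wrk volume):

# blockdev --getss --getpbsz /dev/loop0
# blockdev --getss /dev/md2
# blockdev --getss /dev/4096lvmraid/wrk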

So indeed something seems to be wrong in the LUKS layer.

> For a different experiment, it may be safer to just force all
> alignment for stacking drivers. Could you try the following and see
> if that gets it working again?
>
> ---
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 8bb9eef5310e..5c16fdb00c6f 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -646,6 +646,7 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>  		t->misaligned = 1;
>  		ret = -1;
>  	}
> +	blk_queue_dma_alignment(t, t->logical_block_size - 1);
>
>  	t->max_sectors = blk_round_down_sectors(t->max_sectors, t->logical_block_size);
>  	t->max_hw_sectors = blk_round_down_sectors(t->max_hw_sectors, t->logical_block_size);
> --

This doesn't compile:
  CC      block/blk-settings.o
block/blk-settings.c: In function ‘blk_stack_limits’:
block/blk-settings.c:649:33: error: passing argument 1 of ‘blk_queue_dma_alignment’ from incompatible pointer type [-Werror=incompatible-pointer-types]
  649 |         blk_queue_dma_alignment(t, t->logical_block_size - 1);
      |                                 ^
      |                                 |
      |                                 struct queue_limits *
In file included from block/blk-settings.c:9:
./include/linux/blkdev.h:956:37: note: expected ‘struct request_queue *’ but argument is of type ‘struct queue_limits *’
  956 | extern void blk_queue_dma_alignment(struct request_queue *, int);

I didn't find an obvious way to get the request_queue pointer that corresponds to struct queue_limits *t.
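For what it's worth, the intent would presumably be something like the line below, but that only works if the alignment limit itself lives in struct queue_limits, which it doesn't in this tree (as the note from blkdev.h shows, it hangs off struct request_queue):

	/* hypothetical: assumes a dma_alignment member in struct queue_limits */
	t->dma_alignment = max(t->dma_alignment, t->logical_block_size - 1);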