Re: [PATCH 2/2] md/raid10: handle replacement devices in fix_recovery_read_error

From: Song Liu
Date: Fri Jul 07 2023 - 04:33:57 EST


On Tue, Jun 27, 2023 at 11:42 AM <linan666@xxxxxxxxxxxxxxx> wrote:
>
> From: Li Nan <linan122@xxxxxxxxxx>
>
> In fix_recovery_read_error(), the handling of replacement devices is
> missing. Add it. If io error is from replacement, error this device
> directly. If io error is from other device, just set badblocks for
> replacement.
>
> Signed-off-by: Li Nan <linan122@xxxxxxxxxx>
> ---
> drivers/md/raid10.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 5105273f60e9..6d9025089455 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -2551,7 +2551,7 @@ static void fix_recovery_read_error(struct r10bio *r10_bio)
>
> while (sectors) {
> int s = sectors;
> - struct md_rdev *rdev;
> + struct md_rdev *rdev, *repl;
> sector_t addr;
> int ok;
>
> @@ -2559,6 +2559,7 @@ static void fix_recovery_read_error(struct r10bio *r10_bio)
> s = PAGE_SIZE >> 9;
>
> rdev = conf->mirrors[dr].rdev;
> + repl = conf->mirrors[dw].replacement;
> addr = r10_bio->devs[0].addr + sect,
> ok = sync_page_io(rdev,
> addr,
> @@ -2580,6 +2581,9 @@ static void fix_recovery_read_error(struct r10bio *r10_bio)
> set_bit(MD_RECOVERY_NEEDED,
> &rdev->mddev->recovery);
> }
> + if (repl && !sync_page_io(repl, addr, s << 9,
> + pages[idx], REQ_OP_WRITE, false))
> + md_error(mddev, repl);
> }
> if (!ok) {
> /* We don't worry if we cannot set a bad block -
> @@ -2592,7 +2596,9 @@ static void fix_recovery_read_error(struct r10bio *r10_bio)
> /* need bad block on destination too */
> rdev = conf->mirrors[dw].rdev;
> addr = r10_bio->devs[1].addr + sect;
> - if (!rdev_set_badblocks(rdev, addr, s, 0)) {
> + if (!rdev_set_badblocks(rdev, addr, s, 0) ||
> + (repl &&
> + !rdev_set_badblocks(repl, addr, s, 0))) {

Do we really want this inside the if () condition? Shall we always set
bad blocks on both rdev and repl?
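
To illustrate the concern: because || short-circuits, the condition as
written never attempts the call on repl when the call on rdev fails. Below
is a rough userspace sketch, not kernel code. struct fake_rdev and
fake_set_badblocks() are hypothetical stand-ins, and the sketch assumes
rdev_set_badblocks() returns nonzero on success and 0 on failure.

```c
#include <stdbool.h>
#include <stddef.h>	/* NULL, for callers that pass no replacement */

/* Hypothetical stand-in for struct md_rdev; not the kernel type. */
struct fake_rdev {
	bool can_record;	/* whether recording a bad block succeeds */
	int calls;		/* how many times recording was attempted */
};

/* Assumed convention: nonzero on success, 0 on failure. */
static int fake_set_badblocks(struct fake_rdev *rdev)
{
	rdev->calls++;
	return rdev->can_record ? 1 : 0;
}

/* Shape of the condition in the patch: repl is skipped if rdev fails. */
static bool abort_short_circuit(struct fake_rdev *rdev, struct fake_rdev *repl)
{
	return !fake_set_badblocks(rdev) ||
	       (repl && !fake_set_badblocks(repl));
}

/* Possible alternative: always attempt both, then decide whether to abort. */
static bool abort_always_both(struct fake_rdev *rdev, struct fake_rdev *repl)
{
	bool failed = !fake_set_badblocks(rdev);

	if (repl && !fake_set_badblocks(repl))
		failed = true;
	return failed;
}
```

With a failing rdev, the first variant returns true without touching repl,
while the second still attempts repl before returning true. Whether that
matters here depends on whether repl's bad block list is used after the
recovery is aborted.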

Thanks,
Song

> /* just abort the recovery */
> pr_notice("md/raid10:%s: recovery aborted due to read error\n",
> mdname(mddev));
> --
> 2.39.2
>