Re: Please revert a91a2785b20

From: Mike Snitzer
Date: Mon Mar 28 2011 - 19:04:29 EST


On Mon, Mar 28 2011 at 6:43pm -0400,
Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:

> Forgot to cc Jens and fixed up the borked mail address of Mike which
> I took from the changelog :(
>
> On Tue, 29 Mar 2011, Thomas Gleixner wrote:
>
> > Out of the blue all my perfectly fine working test machines which use
> > RAID stopped working with the very helpful error message:
> >
> > md/raid1:md1: active with 2 out of 2 mirrors
> > md: pers->run() failed ...
> >
> > Reverting a91a2785b20 fixes the problem.
> >
> > Neither the subject line:
> >
> > block: Require subsystems to explicitly allocate bio_set integrity mempool
> >
> > nor the changelog have any hint why that wreckage is in any way
> > sensible.
> >
> > The wreckage happens due to:
> >
> > - md_integrity_register(mddev);
> > - return 0;
> > + return md_integrity_register(mddev);

Right, a kernel.org BZ was filed on this here:
https://bugzilla.kernel.org/show_bug.cgi?id=32062

Martin is working to "conjure up a patch" to fix the common case where
no devices in the MD have DIF/DIX capabilities.
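
Presumably that fix belongs in md_integrity_register() itself rather
than in every personality.  Just to illustrate the idea (a rough,
untested sketch on my part, NOT Martin's actual patch; field names
are from memory of the current tree): if none of the rdevs exposes
an integrity profile there is nothing to register, so return 0
instead of failing array assembly:

/*
 * Illustrative sketch only -- not Martin's patch.  If no rdev in the
 * array carries a DIF/DIX integrity profile, there is nothing to
 * register, so report success instead of failing pers->run().
 */
int md_integrity_register(mddev_t *mddev)
{
	mdk_rdev_t *rdev;
	bool integrity_capable = false;

	if (list_empty(&mddev->disks))
		return 0;

	list_for_each_entry(rdev, &mddev->disks, same_set)
		if (bdev_get_integrity(rdev->bdev))
			integrity_capable = true;

	if (!integrity_capable)
		return 0;	/* common case: no DIF/DIX anywhere, not an error */

	/* ... existing profile comparison and blk_integrity_register() ... */
	return 0;
}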

> > But the changelog does not give the courtesy of explaining these
> > changes. Also there is no fcking reason why the kernel cannot deal
> > with the missing integrity capabilities of a drive just by emitting a
> > warning msg and dealing gracefully with the outcome.
> >
> > All my RAID setups have been working perfectly fine until now, so
> > what's the rationale to break this?
> >
> > Did anyone test this shite on a non enterprise class hardware with a
> > distro default config ? Obviously _NOT_.

Seems not. I didn't even look at the MD changes when I ack'd the DM
changes. But I clearly stated as much and also cc'd neilb:
http://www.redhat.com/archives/dm-devel/2011-March/msg00066.html
and again for the final version that got committed:
http://www.redhat.com/archives/dm-devel/2011-March/msg00098.html

I should've just looked at the MD changes too! As Neil would say, that
damn DM/MD walled garden... luckily that wall is slowly crumbling!

> > FYI, the config files of those machines are based off a fedora default
> > config, so this would hit all raid users based on popular distro
> > configs sooner than later.
> >
> > Thanks for stealing my time,

Sorry this screwed you. The following patch is the stop-gap I suggested
in the kernel.org BZ (it just reverts the MD error propagation while
keeping the good aspects of Martin's commit):

---
 drivers/md/linear.c    |    3 ++-
 drivers/md/multipath.c |    7 ++-----
 drivers/md/raid0.c     |    3 ++-
 drivers/md/raid1.c     |    5 +++--
 drivers/md/raid10.c    |    7 ++-----
 5 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index abfb59a..338804f8 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -210,7 +210,8 @@ static int linear_run (mddev_t *mddev)
blk_queue_merge_bvec(mddev->queue, linear_mergeable_bvec);
mddev->queue->backing_dev_info.congested_fn = linear_congested;
mddev->queue->backing_dev_info.congested_data = mddev;
- return md_integrity_register(mddev);
+ md_integrity_register(mddev);
+ return 0;
}

static void free_conf(struct rcu_head *head)
diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index c358909..5e694b1 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -315,7 +315,7 @@ static int multipath_remove_disk(mddev_t *mddev, int number)
p->rdev = rdev;
goto abort;
}
- err = md_integrity_register(mddev);
+ md_integrity_register(mddev);
}
abort:

@@ -489,10 +489,7 @@ static int multipath_run (mddev_t *mddev)

mddev->queue->backing_dev_info.congested_fn = multipath_congested;
mddev->queue->backing_dev_info.congested_data = mddev;
-
- if (md_integrity_register(mddev))
- goto out_free_conf;
-
+ md_integrity_register(mddev);
return 0;

out_free_conf:
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index e86bf36..95916fd 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -379,7 +379,8 @@ static int raid0_run(mddev_t *mddev)

blk_queue_merge_bvec(mddev->queue, raid0_mergeable_bvec);
dump_zones(mddev);
- return md_integrity_register(mddev);
+ md_integrity_register(mddev);
+ return 0;
}

static int raid0_stop(mddev_t *mddev)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index c2a21ae..8f34ad5 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1132,7 +1132,7 @@ static int raid1_remove_disk(mddev_t *mddev, int number)
p->rdev = rdev;
goto abort;
}
- err = md_integrity_register(mddev);
+ md_integrity_register(mddev);
}
abort:

@@ -2017,7 +2017,8 @@ static int run(mddev_t *mddev)

mddev->queue->backing_dev_info.congested_fn = raid1_congested;
mddev->queue->backing_dev_info.congested_data = mddev;
- return md_integrity_register(mddev);
+ md_integrity_register(mddev);
+ return 0;
}

static int stop(mddev_t *mddev)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index f7b6237..c0d0f5f 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1188,7 +1188,7 @@ static int raid10_remove_disk(mddev_t *mddev, int number)
p->rdev = rdev;
goto abort;
}
- err = md_integrity_register(mddev);
+ md_integrity_register(mddev);
}
abort:

@@ -2343,10 +2343,7 @@ static int run(mddev_t *mddev)

if (conf->near_copies < conf->raid_disks)
blk_queue_merge_bvec(mddev->queue, raid10_mergeable_bvec);
-
- if (md_integrity_register(mddev))
- goto out_free_conf;
-
+ md_integrity_register(mddev);
return 0;

out_free_conf:
--