Re: [PATCH -next v2] md: synchronize flush io with array reconfiguration

From: Song Liu
Date: Mon Nov 27 2023 - 18:33:17 EST


On Mon, Nov 27, 2023 at 2:16 PM Song Liu <song@xxxxxxxxxx> wrote:
>
> On Fri, Nov 24, 2023 at 10:54 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
> >
> > From: Yu Kuai <yukuai3@xxxxxxxxxx>
> >
> > Currently rcu is used to protect iterating rdev from submit_flushes():
> >
> > submit_flushes                     remove_and_add_spares
> >                                     synchronize_rcu
> >                                     pers->hot_remove_disk()
> >  rcu_read_lock()
> >  rdev_for_each_rcu
> >   if (rdev->raid_disk >= 0)
> >                                     rdev->raid_disk = -1;
> >    atomic_inc(&rdev->nr_pending)
> >    rcu_read_unlock()
> >    bi = bio_alloc_bioset()
> >    bi->bi_end_io = md_end_flush
> >    bi->private = rdev
> >    submit_bio
> >    // issue io for removed rdev
> >
> > Fix this problem by grabbing 'active_io' before iterating rdev, to make
> > sure that remove_and_add_spares() won't run concurrently with
> > submit_flushes().
> >
> > Fixes: a2826aa92e2e ("md: support barrier requests on all personalities.")
> > Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>
> > ---
> > Changes v2:
> > - Add WARN_ON in case md_flush_request() is not called from
> > md_handle_request() in future.
> >
> > drivers/md/md.c | 22 ++++++++++++++++------
> > 1 file changed, 16 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index 86efc9c2ae56..2ffedc39edd6 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -538,6 +538,9 @@ static void md_end_flush(struct bio *bio)
> > rdev_dec_pending(rdev, mddev);
> >
> > if (atomic_dec_and_test(&mddev->flush_pending)) {
> > + /* The pair is percpu_ref_tryget() from md_flush_request() */
> > + percpu_ref_put(&mddev->active_io);
> > +
> > /* The pre-request flush has finished */
> > queue_work(md_wq, &mddev->flush_work);
> > }
> > @@ -557,12 +560,8 @@ static void submit_flushes(struct work_struct *ws)
> > rdev_for_each_rcu(rdev, mddev)
> > if (rdev->raid_disk >= 0 &&
> > !test_bit(Faulty, &rdev->flags)) {
> > - /* Take two references, one is dropped
> > - * when request finishes, one after
> > - * we reclaim rcu_read_lock
> > - */
> > struct bio *bi;
> > - atomic_inc(&rdev->nr_pending);
> > +
> > atomic_inc(&rdev->nr_pending);
> > rcu_read_unlock();
> > bi = bio_alloc_bioset(rdev->bdev, 0,
> > @@ -573,7 +572,6 @@ static void submit_flushes(struct work_struct *ws)
> > atomic_inc(&mddev->flush_pending);
> > submit_bio(bi);
> > rcu_read_lock();
> > - rdev_dec_pending(rdev, mddev);
> > }
> > rcu_read_unlock();
> > if (atomic_dec_and_test(&mddev->flush_pending))
> > @@ -626,6 +624,18 @@ bool md_flush_request(struct mddev *mddev, struct bio *bio)
> > /* new request after previous flush is completed */
> > if (ktime_after(req_start, mddev->prev_flush_start)) {
> > WARN_ON(mddev->flush_bio);
> > + /*
> > + * Grab a reference to make sure mddev_suspend() will wait for
> > + * this flush to be done.
> > + *
> > + * md_flush_request() is called under md_handle_request() and
> > + * 'active_io' is already grabbed, hence percpu_ref_tryget()
> > + * won't fail, percpu_ref_tryget_live() can't be used because
> > + * percpu_ref_kill() can be called by mddev_suspend()
> > + * concurrently.
> > + */
> > + if (WARN_ON(percpu_ref_tryget(&mddev->active_io)))
>
> This should be "if (!WARN_ON(..))", right?
>
> Song
>
> > + percpu_ref_get(&mddev->active_io);

Actually, we can just use percpu_ref_get(), no?

Thanks,
Song