Re: [Stable-review] [93/93] dm mpath: fix stall when requeueing io

From: Greg KH
Date: Tue Feb 23 2010 - 10:34:39 EST


On Sun, Feb 21, 2010 at 05:07:25PM +0100, Stefan Bader wrote:
> Greg KH wrote:
> > 2.6.32-stable review patch. If anyone has any objections, please let us know.
> >
> > ------------------
> >
> > From: Kiyoshi Ueda <k-ueda@xxxxxxxxxxxxx>
> >
> > commit 9eef87da2a8ea4920e0d913ff977cac064b68ee0 upstream.
> >
> > This patch fixes the problem that system may stall if target's ->map_rq
> > returns DM_MAPIO_REQUEUE in map_request().
> > E.g. stall happens on 1 CPU box when a dm-mpath device with queue_if_no_path
> > bounces between all-paths-down and paths-up on I/O load.
> >
> > When target's ->map_rq returns DM_MAPIO_REQUEUE, map_request() requeues
> > the request and returns to dm_request_fn(). Then, dm_request_fn()
> > doesn't exit the I/O dispatching loop and continues processing
> > the requeued request again.
> > This map and requeue loop can be done with interrupt disabled,
> > so 1 CPU system can be stalled if this situation happens.
> >
> > For example, commands below can stall my 1 CPU box within 1 minute or so:
> > # dmsetup table mp
> > mp: 0 2097152 multipath 1 queue_if_no_path 0 1 1 service-time 0 1 2 8:144 1 1
> > # while true; do dd if=/dev/mapper/mp of=/dev/null bs=1M count=100; done &
> > # while true; do \
> > > dmsetup message mp 0 "fail_path 8:144" \
> > > dmsetup suspend --noflush mp \
> > > dmsetup resume mp \
> > > dmsetup message mp 0 "reinstate_path 8:144" \
> > > done
> >
> > To fix the problem above, this patch changes dm_request_fn() to exit
> > the I/O dispatching loop once if a request is requeued in map_request().
> >
> > Signed-off-by: Kiyoshi Ueda <k-ueda@xxxxxxxxxxxxx>
> > Signed-off-by: Jun'ichi Nomura <j-nomura@xxxxxxxxxxxxx>
> > Signed-off-by: Alasdair G Kergon <agk@xxxxxxxxxx>
> > Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxx>
> >
> > ---
> > drivers/md/dm.c | 21 ++++++++++++++++-----
> > 1 file changed, 16 insertions(+), 5 deletions(-)
> >
> > --- a/drivers/md/dm.c
> > +++ b/drivers/md/dm.c
> > @@ -1487,11 +1487,15 @@ static int dm_prep_fn(struct request_que
> > return BLKPREP_OK;
> > }
> >
> > -static void map_request(struct dm_target *ti, struct request *rq,
> > - struct mapped_device *md)
> > +/*
> > + * Returns:
> > + * 0 : the request has been processed (not requeued)
> > + * !0 : the request has been requeued
> > + */
> > +static int map_request(struct dm_target *ti, struct request *clone,
> > + struct mapped_device *md)
> > {
> > - int r;
> > - struct request *clone = rq->special;
>
> This change requires the argument to this function to be a rq->special
> pointer. This is changed in the map_request function by the following
> patch:
>
> commit b4324feeae304ae39e631a254d238a7d63be004b
> Author: Kiyoshi Ueda <k-ueda@xxxxxxxxxxxxx>
> Date: Thu Dec 10 23:52:16 2009 +0000
>
> dm: use md pending for in flight IO counting
>
> > + int r, requeued = 0;
> > struct dm_rq_target_io *tio = clone->end_io_data;
> >
> > /*
> > @@ -1516,6 +1520,7 @@ static void map_request(struct dm_target
> > case DM_MAPIO_REQUEUE:
> > /* The target wants to requeue the I/O */
> > dm_requeue_unmapped_request(clone);
> > + requeued = 1;
> > break;
> > default:
> > if (r > 0) {
> > @@ -1527,6 +1532,8 @@ static void map_request(struct dm_target
> > dm_kill_unmapped_request(clone, r);
> > break;
> > }
> > +
> > + return requeued;
> > }
> >
> > /*
> > @@ -1568,12 +1575,16 @@ static void dm_request_fn(struct request
> >
> > blk_start_request(rq);
> > spin_unlock(q->queue_lock);
> > - map_request(ti, rq, md);
> > + if (map_request(ti, rq, md))
> > + goto requeued;
> > spin_lock_irq(q->queue_lock);
> > }
>
> That is the current state of dm_request_function:
>
> clone = rq->special;
> atomic_inc(&md->pending[rq_data_dir(clone)]);
>
> spin_unlock(q->queue_lock);
> if (map_request(ti, clone, md))
>
> While looking over the code I also noticed that the spinlock is dropped with
> spin_unlock and then reacquired with spin_lock_irq. Isn't the irq version too
> much in that case? Or was the intention to have interrupts enabled when unlocking?

Ick, thanks for the review. I'll drop this patch for now and fix it up
in the next release if needed.

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/