Re: [PATCH 1/5] Drivers: scsi: storvsc: Make the scsi timeout a module parameter

From: James Bottomley
Date: Mon Jun 03 2013 - 19:47:36 EST


On Mon, 2013-06-03 at 23:25 +0000, KY Srinivasan wrote:
>
> > -----Original Message-----
> > From: James Bottomley [mailto:jbottomley@xxxxxxxxxxxxx]
> > Sent: Monday, June 03, 2013 7:03 PM
> > To: KY Srinivasan
> > Cc: gregkh@xxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> > devel@xxxxxxxxxxxxxxxxxxxxxx; ohering@xxxxxxxx; hch@xxxxxxxxxxxxx;
> > linux-scsi@xxxxxxxxxxxxxxx
> > Subject: Re: [PATCH 1/5] Drivers: scsi: storvsc: Make the scsi timeout a module
> > parameter
> >
> > On Mon, 2013-06-03 at 16:21 -0700, K. Y. Srinivasan wrote:
> > > The standard scsi timeout is not appropriate in some of the environments
> > > where Hyper-V is deployed. Set this timeout appropriately for all devices
> > > managed by this driver. Further make this a module parameter.
> > >
> > > Signed-off-by: K. Y. Srinivasan <kys@xxxxxxxxxxxxx>
> > > Reviewed-by: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> > > ---
> > > drivers/scsi/storvsc_drv.c | 9 +++++++++
> > > 1 files changed, 9 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> > > index 16a3a0c..8d29a95 100644
> > > --- a/drivers/scsi/storvsc_drv.c
> > > +++ b/drivers/scsi/storvsc_drv.c
> > > @@ -221,6 +221,13 @@ static int storvsc_ringbuffer_size = (20 * PAGE_SIZE);
> > > module_param(storvsc_ringbuffer_size, int, S_IRUGO);
> > > MODULE_PARM_DESC(storvsc_ringbuffer_size, "Ring buffer size (bytes)");
> > >
> > > +/*
> > > + * Timeout in seconds for all devices managed by this driver.
> > > + */
> > > +static int storvsc_timeout = 180;
> > > +module_param(storvsc_timeout, uint, (S_IRUGO | S_IWUSR));
> > > +MODULE_PARM_DESC(storvsc_timeout, "Device timeout (seconds)");
> > > +
> > > #define STORVSC_MAX_IO_REQUESTS 128
> > >
> > > /*
> > > @@ -1204,6 +1211,8 @@ static int storvsc_device_configure(struct scsi_device *sdevice)
> > >
> > > blk_queue_bounce_limit(sdevice->request_queue, BLK_BOUNCE_ANY);
> > >
> > > + blk_queue_rq_timeout(sdevice->request_queue, (storvsc_timeout * HZ));
> >
> > Why does this need to be a module parameter? It's already a sysfs one
> > in the scsi_device class? Three minutes is also a bit large. The
> > default is 30s with huge cache arrays recommending upping this to
> > 60s ... you're three times this.
>
> James,
> This number was arrived at based on some testing that was done on the
> cloud. On our cloud, we have 120-second timeouts that trigger broader
> VM-level recovery, and in cases where there are storage access issues
> (which is when we would hit this timeout), it will be better to defer
> to the fabric-level recovery than to attempt SCSI-level recovery/retry.
> The default value chosen for devices managed by storvsc should be just fine,

So are you sure you want to set the command timeout to 3 minutes? ...
it's an incredibly high value. The actual complete timeout is this
value multiplied by the number of retries, which is 5 for disk devices,
so you'll be waiting up to 15 minutes before we signal a failure in some
circumstances. It sounds like you want the actual path length of error
recovery to be on average 3 minutes.
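
To make the arithmetic above concrete, here is a minimal sketch in C. The constants are simply the numbers quoted in this thread (180 seconds from the patch, 5 retries for disk devices); the function and macro names are hypothetical and are not part of storvsc or the SCSI midlayer.

```c
/* Values taken from this thread; assumptions, not read from a live system. */
#define PROPOSED_TIMEOUT_SECS 180 /* per-command timeout proposed by the patch */
#define DISK_RETRIES          5   /* retry count for disk devices, as stated above */

/* Worst-case seconds before a failure is signalled: timeout * retries. */
static unsigned int worst_case_secs(unsigned int timeout_secs,
                                    unsigned int retries)
{
    return timeout_secs * retries;
}

/*
 * Per-command timeout that keeps the whole retry path within
 * target_total_secs. Returns 0 if retries is 0 (no meaningful answer).
 */
static unsigned int per_command_timeout_secs(unsigned int target_total_secs,
                                             unsigned int retries)
{
    if (retries == 0)
        return 0;
    return target_total_secs / retries;
}
```

With these numbers, worst_case_secs(PROPOSED_TIMEOUT_SECS, DISK_RETRIES) is 900 seconds, the 15 minutes mentioned above, while a 3 minute total path would correspond to a per-command timeout of about 36 seconds.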

The value of the timeout should be a compromise between the longest time
you want the user to wait for a failure and the longest time a device
should take to respond.

> I made it a module parameter to have more flexibility.

It's *already* a sysfs parameter ... why do you want an additional
module parameter? Multiple parameters for the same quantity, especially
ones which can't be altered at runtime like module parameters, end up
confusing users.
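
For illustration, the existing per-device sysfs attribute can be read from user space with something like the sketch below. The helper name is hypothetical, and the path in the usage note assumes a disk that happens to be named sda.

```c
#include <stdio.h>

/*
 * Read a timeout (in seconds) from a sysfs-style attribute file.
 * Returns the value, or -1 if the file cannot be opened or parsed.
 */
static int read_timeout_secs(const char *path)
{
    FILE *f = fopen(path, "r");
    int secs;

    if (!f)
        return -1;
    if (fscanf(f, "%d", &secs) != 1)
        secs = -1;
    fclose(f);
    return secs;
}
```

A call such as read_timeout_secs("/sys/block/sda/device/timeout") would return the current per-device command timeout; writing a new number to the same file as root changes it at runtime.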

James
