Re: [PATCH] scsi: core: Rate limit "rejecting I/O" messages

From: Joe Perches
Date: Wed Apr 08 2020 - 15:52:00 EST


On Wed, 2020-04-08 at 15:16 -0400, Ewan D. Milne wrote:
> On Wed, 2020-04-08 at 19:10 +0200, Daniel Wagner wrote:
> > Prevent excessive logging by rate limiting the "rejecting I/O"
> > messages. For example in setups where remote syslog is used the link
> > is saturated by those messages when a storage controller/disk
> > misbehaves.
> >
> > Cc: "James E.J. Bottomley" <jejb@xxxxxxxxxxxxx>
> > Cc: "Martin K. Petersen" <martin.petersen@xxxxxxxxxx>
> > Signed-off-by: Daniel Wagner <dwagner@xxxxxxx>
> > ---
> > drivers/scsi/scsi_lib.c | 4 ++--
> > include/scsi/scsi_device.h | 10 ++++++++++
> > 2 files changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index 47835c4b4ee0..01c35c58c6f3 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -1217,7 +1217,7 @@ scsi_prep_state_check(struct scsi_device *sdev, struct request *req)
> > */
> > if (!sdev->offline_already) {
> > sdev->offline_already = true;
> > - sdev_printk(KERN_ERR, sdev,
> > + sdev_printk_ratelimited(KERN_ERR, sdev,
> > "rejecting I/O to offline device\n");
>
> I would really prefer we not do it this way if at all possible.
> It loses information we may need to debug SAN outage problems.
>
> The reason I didn't use ratelimit is that the ratelimit structure is
> per-instance of the ratelimit call here, not per-device. So this
> doesn't work right -- it will drop messages for other devices.

You could add a ratelimit_state to struct scsi_device, so that the
limit is tracked per device rather than per call site.

Something like:
---
drivers/scsi/scsi_scan.c | 2 ++
include/scsi/scsi_device.h | 2 ++
2 files changed, 4 insertions(+)

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index f2437a..938c83f 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -279,6 +279,8 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget,
scsi_change_queue_depth(sdev, sdev->host->cmd_per_lun ?
sdev->host->cmd_per_lun : 1);

+ ratelimit_state_init(&sdev->rs, DEFAULT_RATELIMIT_INTERVAL,
+ DEFAULT_RATELIMIT_BURST);
scsi_sysfs_device_initialize(sdev);

if (shost->hostt->slave_alloc) {
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index c3cba2..2600de7 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -8,6 +8,7 @@
#include <linux/blkdev.h>
#include <scsi/scsi.h>
#include <linux/atomic.h>
+#include <linux/ratelimit.h>

struct device;
struct request_queue;
@@ -233,6 +234,7 @@ struct scsi_device {
struct mutex state_mutex;
enum scsi_device_state sdev_state;
struct task_struct *quiesced_by;
+ struct ratelimit_state rs;
unsigned long sdev_data[];
} __attribute__((aligned(sizeof(unsigned long))));
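
For illustration only, here is a minimal userspace sketch (not kernel
code) of the problem Ewan describes and of what per-device state changes.
Everything prefixed toy_ is hypothetical and only loosely modelled on
struct ratelimit_state; the interval of 5 ticks and burst of 10 are
assumptions chosen to resemble the kernel defaults. With one shared
limiter, a flood from one device uses up the burst and the single
message from another device is dropped. With one limiter per device,
that message gets through.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's ratelimit state. */
struct toy_ratelimit {
	long interval;	/* window length, in arbitrary ticks */
	int burst;	/* messages allowed per window */
	long begin;	/* tick at which the current window started */
	int printed;	/* messages emitted in the current window */
	int missed;	/* messages suppressed in the current window */
};

static void toy_ratelimit_init(struct toy_ratelimit *rs, long interval,
			       int burst)
{
	assert(rs != NULL);
	rs->interval = interval;
	rs->burst = burst;
	rs->begin = 0;
	rs->printed = 0;
	rs->missed = 0;
}

/* Return true if a message may be emitted at tick 'now'. */
static bool toy_ratelimit_allow(struct toy_ratelimit *rs, long now)
{
	assert(rs != NULL);
	if (now - rs->begin >= rs->interval) {
		/* Window expired: start a new one and reset the counters. */
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}

/* Hypothetical device carrying its own limiter, like sdev->rs above. */
struct toy_sdev {
	int id;
	struct toy_ratelimit rs;
};

/* One limiter shared by all devices: the quiet device is starved. */
int quiet_msgs_shared(void)
{
	struct toy_ratelimit shared;
	int i, emitted = 0;

	toy_ratelimit_init(&shared, 5, 10);
	for (i = 0; i < 100; i++)		/* noisy device floods at tick 1 */
		toy_ratelimit_allow(&shared, 1);
	if (toy_ratelimit_allow(&shared, 2))	/* quiet device logs once */
		emitted++;
	return emitted;
}

/* One limiter per device: the quiet device's message is emitted. */
int quiet_msgs_per_device(void)
{
	struct toy_sdev noisy = { .id = 0 }, quiet = { .id = 1 };
	int i, emitted = 0;

	toy_ratelimit_init(&noisy.rs, 5, 10);
	toy_ratelimit_init(&quiet.rs, 5, 10);
	for (i = 0; i < 100; i++)		/* noisy device floods at tick 1 */
		toy_ratelimit_allow(&noisy.rs, 1);
	if (toy_ratelimit_allow(&quiet.rs, 2))	/* quiet device logs once */
		emitted++;
	return emitted;
}
```

In this sketch quiet_msgs_shared() returns 0 and quiet_msgs_per_device()
returns 1. In the kernel the call site would presumably check something
like __ratelimit(&sdev->rs) before calling sdev_printk(); that part is
an assumption and is not included in the patch above.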