[GIT PULL] SCSI fixes for 4.6-rc2

From: James Bottomley
Date: Sat Jun 25 2016 - 15:15:17 EST

Next message: Peter Zijlstra: "Re: [PATCH] locking/osq: Drop the overload of osq lock"
Previous message: Peter Zijlstra: "Re: [PATCH] locking/osq: Drop the overload of osq lock"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Two straightforward fixes. One is a concurrency issue only affecting
SAS connected SATA drives, but which could hang the storage subsystem
if it triggers (because the outstanding command count on error never
goes back to zero) and the other is a SCSI_NO_TAG fallout from the
switch to host wide tags which causes the system to crash on module
insertion (we've checked carefully and only the 53c700 family of
drivers is vulnerable to this issue).

The patch is available here:

git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git scsi-fixes

The short changelog is:

James Bottomley (1):
53c700: fix BUG on untagged commands

Wei Fang (1):
scsi: fix race between simultaneous decrements of ->host_failed

With the diffstat:

Documentation/scsi/scsi_eh.txt | 8 ++++++--
drivers/ata/libata-eh.c | 2 +-
drivers/scsi/53c700.c | 4 ++--
drivers/scsi/scsi_error.c | 4 +++-
4 files changed, 12 insertions(+), 6 deletions(-)

and full diff below.

James

---

diff --git a/Documentation/scsi/scsi_eh.txt b/Documentation/scsi/scsi_eh.txt
index 8638f61..37eca00 100644
--- a/Documentation/scsi/scsi_eh.txt
+++ b/Documentation/scsi/scsi_eh.txt
@@ -263,19 +263,23 @@ scmd->allowed.

3. scmd recovered
ACTION: scsi_eh_finish_cmd() is invoked to EH-finish scmd
- - shost->host_failed--
- clear scmd->eh_eflags
- scsi_setup_cmd_retry()
- move from local eh_work_q to local eh_done_q
LOCKING: none
+ CONCURRENCY: at most one thread per separate eh_work_q to
+ keep queue manipulation lockless

4. EH completes
ACTION: scsi_eh_flush_done_q() retries scmds or notifies upper
- layer of failure.
+ layer of failure. May be called concurrently but must have
+ a no more than one thread per separate eh_work_q to
+ manipulate the queue locklessly
- scmd is removed from eh_done_q and scmd->eh_entry is cleared
- if retry is necessary, scmd is requeued using
scsi_queue_insert()
- otherwise, scsi_finish_command() is invoked for scmd
+ - zero shost->host_failed
LOCKING: queue or finish function performs appropriate locking

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 61dc7a9..c6f0174 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -606,7 +606,7 @@ void ata_scsi_error(struct Scsi_Host *host)
ata_scsi_port_error_handler(host, ap);

/* finish or retry handled scmd's and clean up */
- WARN_ON(host->host_failed || !list_empty(&eh_work_q));
+ WARN_ON(!list_empty(&eh_work_q));

DPRINTK("EXIT\n");
}
diff --git a/drivers/scsi/53c700.c b/drivers/scsi/53c700.c
index d4c2856..3ddc85e 100644
--- a/drivers/scsi/53c700.c
+++ b/drivers/scsi/53c700.c
@@ -1122,7 +1122,7 @@ process_script_interrupt(__u32 dsps, __u32 dsp, struct scsi_cmnd *SCp,
} else {
struct scsi_cmnd *SCp;

- SCp = scsi_host_find_tag(SDp->host, SCSI_NO_TAG);
+ SCp = SDp->current_cmnd;
if(unlikely(SCp == NULL)) {
sdev_printk(KERN_ERR, SDp,
"no saved request for untagged cmd\n");
@@ -1826,7 +1826,7 @@ NCR_700_queuecommand_lck(struct scsi_cmnd *SCp, void (*done)(struct scsi_cmnd *)
slot->tag, slot);
} else {
slot->tag = SCSI_NO_TAG;
- /* must populate current_cmnd for scsi_host_find_tag to work */
+ /* save current command for reselection */
SCp->device->current_cmnd = SCp;
}
/* sanity check: some of the commands generated by the mid-layer
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index a8b610e..106a6ad 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1128,7 +1128,6 @@ static int scsi_eh_action(struct scsi_cmnd *scmd, int rtn)
*/
void scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q)
{
- scmd->device->host->host_failed--;
scmd->eh_eflags = 0;
list_move_tail(&scmd->eh_entry, done_q);
}
@@ -2227,6 +2226,9 @@ int scsi_error_handler(void *data)
else
scsi_unjam_host(shost);

+ /* All scmds have been handled */
+ shost->host_failed = 0;
+
/*
* Note - if the above fails completely, the action is to take
* individual devices offline and flush the queue of any

Next message: Peter Zijlstra: "Re: [PATCH] locking/osq: Drop the overload of osq lock"
Previous message: Peter Zijlstra: "Re: [PATCH] locking/osq: Drop the overload of osq lock"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]