Re: [Regression] commit 8a4aeec8d(libata/ahci: accommodate tag ordered controllers)

From: Dan Williams
Date: Thu Jun 05 2014 - 22:57:16 EST


On Fri, 2014-06-06 at 09:47 +0800, Ming Lei wrote:
> Hi Tejun,
>
> On Thu, Jun 5, 2014 at 9:41 PM, Tejun Heo <tj@xxxxxxxxxx> wrote:
> > Hello,
> >
> > (cc'ing ahci_xgene folks)
> >
> > On Thu, Jun 05, 2014 at 09:24:04PM +0800, Ming Lei wrote:
> >> On Thu, Jun 5, 2014 at 9:12 PM, Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> >> > On Thu, Jun 5, 2014 at 1:53 AM, Ming Lei <ming.lei@xxxxxxxxxxxxx> wrote:
> >> >> Hi Dan and Tejun,
> >> >>
> >> >> Looks the commit 8a4aeec8d(libata/ahci: accommodate tag ordered
> >> >> controllers) causes below sata failure[1] on APM AHCI controller.
> >> >>
> >> >> And the error does disappear after reverting the commit.
> >> >
> >> > Can you give some more details about the workload and the disk? What
> >> > is the APM AHCI?
> >>
> >> The commit causes the APM arm64 based system not bootable, and not
> >> special workload, just when mounting rootfs or reading init files.
> >>
> >> APM AHCI:
> >> drivers/ata/ahci_xgene.c
> >
> > Urgh... so, the controller can't handle tags allocated in round-robin
> > instead of lowest-first? Ming, can you please test with different
> > controllers / disks / ssds and see whether the problem stays with the
> > controller? Loc, can you guys please look into it so that at least
> > the next revision hardware can fix the issue?
>
> The problem has been observed on several apm arm64 boards
> with different disks.
>
> >
> > Provided that the tag allocation itself isn't broken, which isn't too
> > likely given the lack of bug reports on other controllers, we'll need
> > to blacklist ahci_xgene somehow. Either disable NCQ support on it or
> > implement an alternative lowest-first tag allocation for it. If this
> > actually is the controller freaking out about tags allocated in a
> > different order, I'm not sure how much confidence we can put in its
> > NCQ implementation tho and it might be a better idea to simply plug
> > NCQ support on it until we understand the details of the problem.
> >
> > So, something like the following? Ming, can you please verify whether
> > the following makes the machine boot again?
>
> Looks the problem still persists after applying your patch of disabling
> NCQ.

How about the following patch?

8<---------
libata: allow xgene-ahci to opt-out of tag ordered submission

From: Dan Williams <dan.j.williams@xxxxxxxxx>

Ming Lei reports:
"Looks the commit 8a4aeec8d(libata/ahci: accommodate tag ordered
controllers) causes below sata failure[1] on APM AHCI controller.

And the error does disappear after reverting the commit.

ata4.00: exception Emask 0x40 SAct 0xff00 SErr 0x800 action 0x6 frozen
ata4: SError: { HostInt }
ata4.00: failed command: READ FPDMA QUEUED
ata4.00: cmd 60/08:40:e0:a4:88/00:00:04:00:00/40 tag 8 ncq 4096 in
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x44 (timeout)"

Maintain tag ordered submission as the default, but allow xgene-ahci to
continue with the old behavior.

Reported-by: Ming Lei <ming.lei@xxxxxxxxxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
---
drivers/ata/ahci_xgene.c | 1 +
drivers/ata/libata-core.c | 42 +++++++++++++++++++++++++++++++++++++++---
drivers/ata/libata.h | 2 ++
include/linux/libata.h | 1 +
4 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/drivers/ata/ahci_xgene.c b/drivers/ata/ahci_xgene.c
index 77c89bf171f1..9be069f5f58a 100644
--- a/drivers/ata/ahci_xgene.c
+++ b/drivers/ata/ahci_xgene.c
@@ -300,6 +300,7 @@ static struct ata_port_operations xgene_ahci_ops = {
.host_stop = xgene_ahci_host_stop,
.hardreset = xgene_ahci_hardreset,
.read_id = xgene_ahci_read_id,
+ .qc_new = ata_qc_new_fifo_order,
};

static const struct ata_port_info xgene_ahci_port_info = {
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 943cc8b83e59..dd554354791f 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -83,6 +83,7 @@ const struct ata_port_operations ata_base_port_ops = {
.error_handler = ata_std_error_handler,
.sched_eh = ata_std_sched_eh,
.end_eh = ata_std_end_eh,
+ .qc_new = ata_qc_new_tag_order,
};

const struct ata_port_operations sata_port_ops = {
@@ -4784,14 +4785,17 @@ void swap_buf_le16(u16 *buf, unsigned int buf_words)
}

/**
- * ata_qc_new - Request an available ATA command, for queueing
+ * ata_qc_new_tag_order - Request an available ATA command, for queueing
* @ap: target port
*
+ * Accomodates controllers that issue commands in tag order rather
+ * than FIFO order (Intel AHCI).
+ *
* LOCKING:
* None.
*/

-static struct ata_queued_cmd *ata_qc_new(struct ata_port *ap)
+struct ata_queued_cmd *ata_qc_new_tag_order(struct ata_port *ap)
{
struct ata_queued_cmd *qc = NULL;
unsigned int i, tag;
@@ -4819,6 +4823,38 @@ static struct ata_queued_cmd *ata_qc_new(struct ata_port *ap)
}

/**
+ * ata_qc_new_fifo_order - Request an available ATA command, for queueing
+ * @ap: target port
+ *
+ * Allocate first available tag, for hosts that maintain fifo issue
+ * order on the wire, or otherwise cannot handle tag order.
+ *
+ * LOCKING:
+ * None.
+ */
+
+struct ata_queued_cmd *ata_qc_new_fifo_order(struct ata_port *ap)
+{
+ struct ata_queued_cmd *qc = NULL;
+ unsigned int i;
+
+ /* no command while frozen */
+ if (unlikely(ap->pflags & ATA_PFLAG_FROZEN))
+ return NULL;
+
+ /* the last tag is reserved for internal command. */
+ for (i = 0; i < ATA_MAX_QUEUE - 1; i++)
+ if (!test_and_set_bit(i, &ap->qc_allocated)) {
+ qc = __ata_qc_from_tag(ap, i);
+ qc->tag = i;
+ break;
+ }
+
+ return qc;
+}
+EXPORT_SYMBOL_GPL(ata_qc_new_fifo_order);
+
+/**
* ata_qc_new_init - Request an available ATA command, and initialize it
* @dev: Device from whom we request an available command structure
*
@@ -4831,7 +4867,7 @@ struct ata_queued_cmd *ata_qc_new_init(struct ata_device *dev)
struct ata_port *ap = dev->link->ap;
struct ata_queued_cmd *qc;

- qc = ata_qc_new(ap);
+ qc = ap->ops->qc_new(ap);
if (qc) {
qc->scsicmd = NULL;
qc->ap = ap;
diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
index 45b5ab3a95d5..381ba41de464 100644
--- a/drivers/ata/libata.h
+++ b/drivers/ata/libata.h
@@ -90,6 +90,8 @@ extern int ata_down_xfermask_limit(struct ata_device *dev, unsigned int sel);
extern unsigned int ata_dev_set_feature(struct ata_device *dev,
u8 enable, u8 feature);
extern void ata_sg_clean(struct ata_queued_cmd *qc);
+extern struct ata_queued_cmd *ata_qc_new_tag_order(struct ata_port *ap);
+extern struct ata_queued_cmd *ata_qc_new_fifo_order(struct ata_port *ap);
extern void ata_qc_free(struct ata_queued_cmd *qc);
extern void ata_qc_issue(struct ata_queued_cmd *qc);
extern void __ata_qc_complete(struct ata_queued_cmd *qc);
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 5ab4e3a76721..852686837d1c 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -880,6 +880,7 @@ struct ata_port_operations {
void (*qc_prep)(struct ata_queued_cmd *qc);
unsigned int (*qc_issue)(struct ata_queued_cmd *qc);
bool (*qc_fill_rtf)(struct ata_queued_cmd *qc);
+ struct ata_queued_cmd *(*qc_new)(struct ata_port *ap);

/*
* Configuration and exception handling


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/