Re: Playing with SATA NCQ

From: Jens Axboe
Date: Thu May 26 2005 - 12:16:07 EST


On Thu, May 26 2005, Jeff Garzik wrote:
> > static void ahci_qc_prep(struct ata_queued_cmd *qc)
> > {
> > struct ahci_port_priv *pp = qc->ap->private_data;
> >- u32 opts;
> >+ void *port_mmio = (void *) qc->ap->ioaddr.cmd_addr;
> > const u32 cmd_fis_len = 5; /* five dwords */
> >+ dma_addr_t cmd_tbl_dma;
> >+ u32 opts;
> >+ int offset;
> >+
> >+ if (qc->flags & ATA_QCFLAG_NCQ) {
> >+ pp->sactive |= (1 << qc->tag);
> >+
> >+ writel(1 << qc->tag, port_mmio + PORT_SCR_ACT);
> >+ readl(port_mmio + PORT_SCR_ACT); /* flush */
> >+ }
>
> Wrong, you should do this in ahci_qc_issue not here.

Are you sure, I moved this on purpose? I think the reason I did this was
the wording at the back of the the sata-ii spec (appendix b) that says
something ala 'preset the active bit and transmit a register FIS'. Feel
free to point me at the authoritative wording in the ACHI spec.

One thing that I definitely think _was_ wrong with the sactive bit
before, is that you set it unconditionally of whether this was an NCQ
command or not. The maxtor drives don't clear sactive on non-fpdma
commands, which confused me at first.

> >+static void ahci_host_ncq_intr_err(struct ata_port *ap)
> >+{
> >+ struct ahci_port_priv *pp = ap->private_data;
> >+ void *mmio = ap->host_set->mmio_base;
> >+ void *port_mmio = ahci_port_base(mmio, ap->port_no);
> >+ unsigned long flags;
> >+ char *buffer;
> >+
> >+ printk(KERN_ERR "ata%u: ncq interrupt error\n", ap->id);
> >+
> >+ /*
> >+ * error all io first
> >+ */
> >+ spin_lock_irqsave(&ap->host_set->lock, flags);
> >+
> >+ while (pp->sactive) {
> >+ struct ata_queued_cmd *qc;
> >+ int tag = ffs(pp->sactive) - 1;
> >+
> >+ pp->sactive &= ~(1 << tag);
> >+ qc = ata_qc_from_tag(ap, tag);
> >+ if (qc)
> >+ ata_qc_complete(qc, ATA_ERR);
>
> else printk error "I couldn't find the tag!"

Agree.

> >@@ -632,18 +767,27 @@
> > status = readl(port_mmio + PORT_IRQ_STAT);
> > writel(status, port_mmio + PORT_IRQ_STAT);
> >
> >- ci = readl(port_mmio + PORT_CMD_ISSUE);
> >- if (likely((ci & 0x1) == 0)) {
> >- if (qc) {
> >- ata_qc_complete(qc, 0);
> >- qc = NULL;
> >+ if (ap->ncq_depth) {
> >+ if (!serr)
>
> incorrect test. serr is not the only error indicator.

Yes, I'm aware of that. There are still quite a few loose ends wrt error
handling, as mentioned. I also don't check the irq fatal stat in the ncq
interrupt handler.

> >@@ -1139,6 +1143,10 @@
> > goto err_out_nosup;
> > }
> >
> >+ if (ap->flags & ATA_FLAG_NCQ)
> >+ if (ata_id_queue_depth(dev->id) > 1)
> >+ dev->flags |= ATA_DFLAG_NCQ;
>
> This will turn on queueing for older TCQ devices that are plugged into
> an NCQ-capable board.

Yup, that is an error. At that time, I could not easily find what
feature bit encompassed NCQ, that needs to be added of course.

> >+int ata_read_log_page(struct ata_port *ap, unsigned int device, char page,
> >+ char *buffer, unsigned int sectors)
> >+{
> >+ struct ata_device *dev = &ap->device[device];
> >+ DECLARE_COMPLETION(wait);
> >+ struct ata_queued_cmd *qc;
> >+ unsigned long flags;
> >+ u8 status;
> >+ int rc;
> >+
> >+ assert(dev->class == ATA_DEV_ATA);
> >+
> >+ ata_dev_select(ap, device, 1, 1);
>
> is this needed? These types of calls need to be removed, in general, as
> they don't make sense on FIS-based hardware at all.

You tell me, this read_log_page() was mainly copy-pasted from the pio
driven function above it. I'll try and kill the select when doing error
testing.

> >+ printk("RLP issue\n");
> >+ spin_lock_irqsave(&ap->host_set->lock, flags);
> >+ rc = ata_qc_issue(qc);
> >+ spin_unlock_irqrestore(&ap->host_set->lock, flags);
> >+ printk("RLP issue done\n");
> >+
> >+ if (rc)
> >+ return -EIO;
> >+
> >+ wait_for_completion(&wait);
> >+
> >+ printk("RLP wait done\n");
> >+
> >+ status = ata_chk_status(ap);
> >+ if (status & (ATA_ERR | ATA_ABORTED))
> >+ return -EIO;
>
> we need to get rid of this too for AHCI-like devices

Can you expand on that?

> >@@ -2753,6 +2830,16 @@
> > struct ata_port *ap = qc->ap;
> > unsigned int tag, do_clear = 0;
> >
> >+ if (likely(qc->flags & ATA_QCFLAG_ACCOUNT)) {
> >+ if (qc->flags & ATA_QCFLAG_NCQ) {
> >+ assert(ap->ncq_depth);
> >+ ap->ncq_depth--;
> >+ } else {
> >+ assert(ap->depth);
> >+ ap->depth--;
> >+ }
> >+ }
>
> why is this accounting conditional?

Because if you free a qc before it has gone to the device, you don't
want to account it. Hmm, perhaps ACTIVE would work fine in fact. I'll
double check!

> > #ifdef CONFIG_PCI
> > EXPORT_SYMBOL_GPL(pci_test_config_bits);
> >Index: drivers/scsi/libata-scsi.c
> >===================================================================
> >--- f5c58b6b0cfd2a92fb3b1d1f4cbfdfb3df6f45d6/drivers/scsi/libata-scsi.c
> >(mode:100644)
> >+++ uncommitted/drivers/scsi/libata-scsi.c (mode:100644)
> >@@ -336,6 +336,7 @@
> > if (sdev->id < ATA_MAX_DEVICES) {
> > struct ata_port *ap;
> > struct ata_device *dev;
> >+ int depth;
> >
> > ap = (struct ata_port *) &sdev->host->hostdata[0];
> > dev = &ap->device[sdev->id];
> >@@ -353,6 +354,13 @@
> > */
> > blk_queue_max_sectors(sdev->request_queue, 2048);
> > }
> >+
> >+ if (dev->flags & ATA_DFLAG_NCQ) {
> >+ int ddepth = ata_id_queue_depth(dev->id) + 1;
> >+
> >+ depth = min(sdev->host->can_queue, ddepth);
> >+ scsi_adjust_queue_depth(sdev, MSG_ORDERED_TAG,
> >depth);
>
> For all hardware that uses SActive (all NCQ), the max is 31 not 32.

That's not true, the max is 32 counting 0 as a valid tag. So 31 is
indeed th max tag value, but 32 is the depth.

> 32 tags == 0xffffffff SActive, which is the same value as that which
> occurs when the hardware is dead/unplugged/etc.

Well you would think you would notice the hardware going dead by other
means than just SActive, right? Seems silly to cap the depth because of
that.

> Also... NCQ does not provide ordered tags. I think MSG_SIMPLE_TAG is
> more appropriate.

Yep indeed, thanks.

> >+ if (ncq)
> >+ qc->flags |= ATA_QCFLAG_NCQ;
> >+
> > if (scsicmd[0] == READ_10 || scsicmd[0] == WRITE_10) {
> >- if (lba48) {
> >+ if (ncq) {
> >+ tf->hob_feature = scsicmd[7];
> >+ tf->feature = scsicmd[8];
> >+ tf->nsect = qc->tag << 3;
> >+ tf->hob_lbal = scsicmd[2];
> >+ qc->nsect = ((unsigned int)scsicmd[7] << 8) |
> >+ scsicmd[8];
> >+ } else if (lba48) {
> > tf->hob_nsect = scsicmd[7];
> > tf->hob_lbal = scsicmd[2];
> >
> >@@ -569,7 +588,8 @@
> > qc->nsect = scsicmd[8];
> > }
> >
> >- tf->nsect = scsicmd[8];
> >+ if (!ncq)
> >+ tf->nsect = scsicmd[8];
>
> the other changes seem fine, but this seem strange

How so? FPDMA has the tag in nsect, others store the bottom 8 bits of
lba.

> >@@ -308,6 +317,11 @@
> > struct ata_queued_cmd qcmd[ATA_MAX_QUEUE];
> > unsigned long qactive;
> > unsigned int active_tag;
> >+ int ncq_depth;
> >+ int depth;
>
> I don't think we need two depths

The two depths were added because we need to differentiate between the
two for issuing new commands. ncq_depth > 0 is fine for issuing a new
FPDMA request, where as non-FPDMA commands need both !ncq_depth and
!depth.

Thanks for your comments!

--
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/