Re: [PATCH] ahci: libahci: clear pending interrupt status

From: Damien Le Moal
Date: Thu Aug 10 2023 - 06:00:48 EST


On 8/10/23 18:31, Szuying Chen wrote:
> On 8/10/23 14:12, Damien Le Moal wrote:
>> On 8/10/23 14:05, Szuying Chen wrote:
>> > When ISR handle interface fatal error with error recovery after clear PxIS
>> > and PxIE. Another FIS(SDB FIS with err) that set PxIS.IFS to 1 is recevied
>> > during error recovery, which causing the HBA not issue any new commands
>> > after cmd.ST set 1.
>>
>> This is not very clear. If there was a fatal error, the drive should be in
>> error state and no other SDB FIS can be received as the drive does absolutely
>> nothing while in error state (it only waits for a read log 10h command to be>
>> issued to get it out of error state). So if you are seeing 2 SDB FIS with
>> errors one after the other, you have a buggy device...
>>
>> However, I may be misunderstanding your issue here. Could you provide more
>> details and a dmesg output example of the issue ?
>
> According to AHCI 1.3.1 specification ch6.1.9, when an R_ERR is received
> on an H2D Data FIS in normal operation, the HBA sets PxIS.IFS to 1
> (fatal error) and halts operation. Referring to SATA 3.0 specification we
> know the device will halt queued command processing and transmit SDB FIS to
> host with ERR bit in Status field set to one(set PxIS.TFES to 1).

Sure, but that SBD FIS should be completely ignored by the adapter since it
stopped operation. If you see it, then it means that the handling of the first
error was incomplete.

> In our case, the ISR handles fatal errors(PxIS.IFS) and enters error
> recovery after cleaning up PxIS and PxIE. Then a SDB FIS is received
> with interrupt bit(PxIS.TFES) set to 1. According to AHCI 1.3.1
> specification ch6.2.2, HBA can't issue(cmd.ST set to 1) any new commands
> under PxIS.TFES alive during error recovery.

But how come you see a new command being issued ? This entire situation should
result in a port reset with the first error. I do not see how this is possible.
Are you saying that the port reset is not cleaning things up properly ? Could
you share the dmesg output of this case ?

>
>> >
>> > Signed-off-by: Szuying Chen <Chloe_Chen@xxxxxxxxxxxxxx>
>> > ---
>> > drivers/ata/libahci.c | 12 ++++++++++++
>> > 1 file changed, 12 insertions(+)
>> >
>> > diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
>> > index 06aec35f88f2..0ae51fd95d46 100644
>> > --- a/drivers/ata/libahci.c
>> > +++ b/drivers/ata/libahci.c
>> > @@ -679,9 +679,21 @@ static int ahci_scr_write(struct ata_link *link, unsigned int sc_reg, u32 val)
>> >
>> > void ahci_start_engine(struct ata_port *ap)
>> > {
>> > + struct ahci_host_priv *hpriv = ap->host->private_data;
>> > void __iomem *port_mmio = ahci_port_base(ap);
>> > u32 tmp;
>> >
>> > + /* clear SError */
>> > + tmp = readl(port_mmio + PORT_SCR_ERR);
>> > + writel(tmp, port_mmio + PORT_SCR_ERR);
>> > +
>> > + /* clear port IRQ */
>> > + tmp = readl(port_mmio + PORT_IRQ_STAT);
>> > + if (tmp)
>> > + writel(tmp, port_mmio + PORT_IRQ_STAT);
>> > +
>> > + writel(1 << ap->port_no, hpriv->mmio + PORT_IRQ_STAT);
>> > +
>> > /* start DMA */
>> > tmp = readl(port_mmio + PORT_CMD);
>> > tmp |= PORT_CMD_START;
>> > --
>> > 2.39.2
>> >
>>
>> --
>> Damien Le Moal
>> Western Digital Research
>>
> Thanks.
>

--
Damien Le Moal
Western Digital Research