pm80xx: Issues with SATA drives behind expander

From: Michal Grzedzicki
Date: Wed Aug 30 2023 - 16:45:02 EST


Hi,
I'm trying to run Linux 6.5-rc6 on an x86_64 system with an old Adaptec HBA
using the pm80xx driver (Adaptec Device 8074 Subsystem: PMC-Sierra Inc. Device 0800).

The machine has 2 expanders with 10 SATA disks each, and 2 SAS drives connected directly.

pm80xx --+-- port0 (3 phys) ---> exp0 ---> SATA, ..SATA, SES enc0   * works
         +-- port1 ------------> SAS                                * works
         +-- port2 (3 phys) ---> exp1 ---> SATA, ..SATA, SES enc1   * does not work
         +-- port3 ------------> SAS                                * works


If CONFIG_SCSI_SAS_ATA is not enabled, the machine discovers only the 2 SAS drives
and works correctly.

When it is enabled, the kernel runs out of reserved task tags and never finishes the discovery.

Both expanders have the same SAS address, but they are connected to different ports.

If I pass "libata.dma=0 libata.force=noncq" and apply the changes below, the kernel is able to detect the drives on the first expander.
Drives on the second expander are detected by the link layer, but they all fail to complete ATA IDENTIFY commands.

[pm80xx] : Do not leak reserved tag in mpi_set_controller_config_resp()
Saves 1 reserved tag from leaking.

diff --git a/drivers/scsi/pm8001/pm80xx_hwi.c b/drivers/scsi/pm8001/pm80xx_hwi.c
index 97f54fbb3812..3a6157b9a77b 100644
--- a/drivers/scsi/pm8001/pm80xx_hwi.c
+++ b/drivers/scsi/pm8001/pm80xx_hwi.c
@@ -3673,10 +3673,12 @@ static int mpi_set_controller_config_resp(struct pm8001_hba_info *pm8001_ha,
 			(struct set_ctrl_cfg_resp *)(piomb + 4);
 	u32 status = le32_to_cpu(pPayload->status);
 	u32 err_qlfr_pgcd = le32_to_cpu(pPayload->err_qlfr_pgcd);
+	u32 tag = le32_to_cpu(pPayload->tag);
 
 	pm8001_dbg(pm8001_ha, MSG,
 		   "SET CONTROLLER RESP: status 0x%x qlfr_pgcd 0x%x\n",
 		   status, err_qlfr_pgcd);
+	pm8001_tag_free(pm8001_ha, tag);
 
 	return 0;
 }
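
To illustrate why a leaked tag matters with only 8 reserved slots, here is a toy model in plain C. It is not the driver's allocator: struct tag_pool, tag_alloc(), tag_free() and issue_requests() are hypothetical names made up for this sketch, and the only value taken from the driver is the reserved slot count of 8.

```c
#include <stdbool.h>
#include <stdint.h>

#define RESERVED_TAGS 8	/* same value as PM8001_RESERVE_SLOT before the change */

/* Hypothetical toy pool: bit i set means tag i is allocated. */
struct tag_pool {
	uint32_t in_use;
};

/* Return a free tag in [0, RESERVED_TAGS), or -1 when the pool is exhausted. */
static int tag_alloc(struct tag_pool *p)
{
	for (int i = 0; i < RESERVED_TAGS; i++) {
		if (!(p->in_use & (1u << i))) {
			p->in_use |= 1u << i;
			return i;
		}
	}
	return -1;
}

static void tag_free(struct tag_pool *p, int tag)
{
	if (tag >= 0 && tag < RESERVED_TAGS)
		p->in_use &= ~(1u << tag);
}

/*
 * Issue up to n requests one after another. free_on_resp selects whether
 * the (simulated) response handler releases the tag. Returns how many
 * requests managed to get a tag.
 */
static int issue_requests(struct tag_pool *p, int n, bool free_on_resp)
{
	int ok = 0;

	for (int i = 0; i < n; i++) {
		int tag = tag_alloc(p);

		if (tag < 0)
			break;		/* out of reserved tags */
		ok++;
		if (free_on_resp)
			tag_free(p, tag);
	}
	return ok;
}
```

In this model, a response handler that never frees its tag stops making progress after the 8th request, while one that frees it can keep going indefinitely.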


[pm80xx] : Decrease running_req for null tasks in mpi_sata_completion
Without it, the discovery process never finishes.

diff --git a/drivers/scsi/pm8001/pm80xx_hwi.c b/drivers/scsi/pm8001/pm80xx_hwi.c
index 39a12ee94a72..97f54fbb3812 100644
--- a/drivers/scsi/pm8001/pm80xx_hwi.c
+++ b/drivers/scsi/pm8001/pm80xx_hwi.c
@@ -2292,6 +2292,8 @@ mpi_sata_completion(struct pm8001_hba_info *pm8001_ha,
 		pm8001_dbg(pm8001_ha, FAIL, "task null, freeing CCB tag %d\n",
 			   ccb->ccb_tag);
 		pm8001_ccb_free(pm8001_ha, ccb);
+		if (pm8001_dev)
+			atomic_dec(&pm8001_dev->running_req);
 		return;
 	}
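
The accounting problem can be sketched outside the kernel with C11 atomics. This is only a model under the assumption that some other path waits for running_req to drain to zero before it proceeds. struct toy_dev, toy_submit(), toy_complete() and toy_device_idle() are hypothetical names; only the running_req counter and the "task null" early return come from the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-device state. */
struct toy_dev {
	atomic_int running_req;	/* requests submitted but not yet completed */
};

static void toy_submit(struct toy_dev *dev)
{
	atomic_fetch_add(&dev->running_req, 1);
}

/*
 * task == NULL models the "task null" early-return path.
 * dec_on_null selects the behaviour with (true) or without (false)
 * the extra decrement added by the patch.
 */
static void toy_complete(struct toy_dev *dev, const void *task, bool dec_on_null)
{
	if (task == NULL) {
		if (dec_on_null && dev)
			atomic_fetch_sub(&dev->running_req, 1);
		return;
	}
	atomic_fetch_sub(&dev->running_req, 1);
}

/* A waiter polling this would never see true if a decrement was skipped. */
static bool toy_device_idle(struct toy_dev *dev)
{
	return atomic_load(&dev->running_req) == 0;
}
```

With dec_on_null set to false, one submit followed by one null-task completion leaves the counter at 1, so anything waiting for the device to become idle would wait forever.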


[pm80xx] : Increase PM8001_RESERVE_SLOT so it can abort jobs on more than 8 devices
Without it, the driver runs out of tags and loops while trying to abort all 10 failed ATA IDENTIFY commands.

diff --git a/drivers/scsi/pm8001/pm8001_defs.h b/drivers/scsi/pm8001/pm8001_defs.h
index 501b574239e8..f7d348165f7e 100644
--- a/drivers/scsi/pm8001/pm8001_defs.h
+++ b/drivers/scsi/pm8001/pm8001_defs.h
@@ -90,7 +90,7 @@ enum port_type {
 #define PM8001_MAX_PORTS 16 /* max. possible ports */
 #define PM8001_MAX_DEVICES 2048 /* max supported device */
 #define PM8001_MAX_MSIX_VEC 64 /* max msi-x int for spcv/ve */
-#define PM8001_RESERVE_SLOT 8
+#define PM8001_RESERVE_SLOT 64
 
 #define CONFIG_SCSI_PM8001_MAX_DMA_SG 528
 #define PM8001_MAX_DMA_SG CONFIG_SCSI_PM8001_MAX_DMA_SG


Both expanders are visible and report their devices correctly over SMP. The same hardware works correctly on FreeBSD,
and the devices found by SMP discovery are consistent with those reported by FreeBSD's camcontrol smpphylist.

# smp_discover_list /dev/bsg/expander-0\:0
phy 0:U:attached:[500e004abbbbbb00:00 t(SATA)] 6 Gbps ZG:10
phy 1: inaccessible (phy vacant)
phy 2:U:attached:[500e004abbbbbb02:00 t(SATA)] 6 Gbps ZG:10
phy 3: inaccessible (phy vacant)
..
phy 22:U:attached:[ffffffffffffffff:00 i(SSP+STP+SMP)] 12 Gbps ZG:8
phy 23:U:attached:[ffffffffffffffff:01 i(SSP+STP+SMP)] 12 Gbps ZG:8
..
phy 36:D:attached:[500e004abbbbbb7e:36 V i(SSP) t(SSP)] 12 Gbps ZG:2

# smp_discover_list /dev/bsg/expander-0\:1
phy 0: inaccessible (phy vacant)
phy 1:U:attached:[500e004abbbbbb01:00 t(SATA)] 6 Gbps ZG:11
phy 2: inaccessible (phy vacant)
phy 3:U:attached:[500e004abbbbbb03:00 t(SATA)] 6 Gbps ZG:11
phy 4:U:attached:[500e004abbbbbb04:00 t(SATA)] 6 Gbps ZG:11
...
phy 36:D:attached:[500e004abbbbbb7e:36 V i(SSP) t(SSP)] 12 Gbps ZG:2


Working SATA drive:
# smp_rep_phy_sata -p 0 /dev/bsg/expander-0\:0
Report phy SATA response:
expander change count: 36861
phy identifier: 0
STP I_T nexus loss occurred: 0
affiliations supported: 1
affiliation valid: 1
STP SAS address: 0x500e004abbbbbb00
register device to host FIS:
34 00 50 01 01 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
affiliated STP initiator SAS address: 0xffffffffffffffff
STP I_T nexus loss SAS address: 0x0
affiliation context: 0
current affiliation contexts: 1
maximum affiliation contexts: 1

Not working, on the second expander:
# smp_rep_phy_sata -p 3 /dev/bsg/expander-0\:1
Report phy SATA response:
expander change count: 36861
^^^^^^^^^
The reported change count is the same for both expanders, which looks suspicious.

phy identifier: 3
STP I_T nexus loss occurred: 1
affiliations supported: 1
affiliation valid: 0
^^^^^^^
Affiliation valid is zero.

STP SAS address: 0x500e004abbbbbb03
register device to host FIS:
34 00 50 01 01 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
affiliated STP initiator SAS address: 0xffffffffffffffff
^^^^^
Does this mean the affiliation succeeded but was undone by a nexus loss or some other event?

STP I_T nexus loss SAS address: 0xffffffffffffffff
affiliation context: 0
current affiliation contexts: 0
maximum affiliation contexts: 1
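
For reference, the register device to host FIS is byte-for-byte identical on the working and the non-working phy. A small decoder, as a sketch only: struct d2h_fis, parse_d2h() and is_ata_signature() are names invented here, and the byte offsets follow the usual Register D2H FIS layout (type in byte 0, status in byte 2, error in byte 3, LBA low/mid/high in bytes 4 to 6, sector count in byte 12).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the fields of interest. */
struct d2h_fis {
	uint8_t fis_type;	/* 0x34 for Register Device to Host */
	uint8_t status;
	uint8_t error;
	uint8_t lba_low;
	uint8_t lba_mid;
	uint8_t lba_high;
	uint8_t sector_count;
};

/* Pick the relevant bytes out of the 20-byte FIS as printed by smp_rep_phy_sata. */
static struct d2h_fis parse_d2h(const uint8_t b[20])
{
	struct d2h_fis f = {
		.fis_type = b[0],
		.status = b[2],
		.error = b[3],
		.lba_low = b[4],
		.lba_mid = b[5],
		.lba_high = b[6],
		.sector_count = b[12],
	};
	return f;
}

/* ATA (non-packet) device signature: count 01, LBA low 01, LBA mid 00, LBA high 00. */
static bool is_ata_signature(const struct d2h_fis *f)
{
	return f->fis_type == 0x34 &&
	       f->sector_count == 0x01 &&
	       f->lba_low == 0x01 &&
	       f->lba_mid == 0x00 &&
	       f->lba_high == 0x00;
}
```

Fed with the bytes above, this reports a plain ATA device signature with status 0x50 (DRDY and DSC set) for both phys, so the initial FIS itself does not distinguish the two expanders.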

Logs:
https://gist.github.com/mge-fbe-com/084abe34038f5f10630b5c4519f301d2

Verbose logs:
https://gist.github.com/mge-fbe-com/a7c830599e6cc7f8017b4722bb58a901


Thanks,
Michal