mlx5: Regression VFs fail to probe on v6.8-rc1

From: Niklas Schnelle
Date: Mon Jan 22 2024 - 06:22:55 EST


Hi Saeed, Hi Leon,

On current v6.8-rc1, on both s390x and an Intel x86_64 test system
with a ConnectX-6 DX, the mlx5 driver fails to probe VFs. On x86,
"echo 1 > /sys/bus/pci/devices/<dev>/sriov_numvfs" after a fresh boot
is enough to trigger it, and it is 100% reproducible.
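
For anyone trying to reproduce this, the step above can be sketched as a
small helper. This is only an illustration: the function name
enable_one_vf and the SYSFS_PCI override (handy for dry runs against a
scratch directory) are made up here, and <dev> still has to be replaced
with the PCI address of the PF.

```shell
#!/bin/sh
# Sketch only: write 1 to sriov_numvfs of the given PF.
# SYSFS_PCI is a hypothetical override for dry runs; by default the
# real sysfs PCI device directory is used.
SYSFS_PCI=${SYSFS_PCI:-/sys/bus/pci/devices}

enable_one_vf() {
    pf=$1
    numvfs="$SYSFS_PCI/$pf/sriov_numvfs"

    # Bail out early if the attribute is missing or not writable
    # (wrong address, no SR-IOV support, or not running as root).
    if [ ! -w "$numvfs" ]; then
        echo "cannot write $numvfs" >&2
        return 1
    fi

    echo 1 > "$numvfs"
}
```

Usage would be something like "enable_one_vf <dev> && dmesg | grep mlx5_core"
run as root after a fresh boot.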

In dmesg I see the following messages (from the Intel server but it's
basically the same on s390x):

[ 110.443950] mlx5_core 0000:6f:00.1: E-Switch: Enable: mode(LEGACY), nvfs(1), necvfs(0), active vports(2)
[ 110.546248] pci 0000:6f:08.2: [15b3:101e] type 00 class 0x020000 PCIe Endpoint
[ 110.546340] pci 0000:6f:08.2: enabling Extended Tags
[ 110.547626] pci 0000:6f:08.2: Adding to iommu group 115
[ 110.553328] mlx5_core 0000:6f:08.2: enabling device (0000 -> 0002)
[ 110.553478] mlx5_core 0000:6f:08.2: firmware version: 22.36.1010
[ 110.718748] mlx5_core 0000:6f:08.2: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[ 110.730136] mlx5_core 0000:6f:08.2: Assigned random MAC address ce:a6:ec:9e:70:49
[ 110.734351] mlx5_core 0000:6f:08.2: mlx5_cmd_out_err:808:(pid 650): CREATE_TIS(0x912) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x595b5d), err(-22)
[ 110.735776] mlx5_core 0000:6f:08.2: mlx5e_create_mdev_resources:174:(pid 650): alloc tises failed, -22
[ 110.736819] mlx5_core 0000:6f:08.2: _mlx5e_probe:6076:(pid 650): mlx5e_resume failed, -22
[ 110.749146] mlx5_core.eth: probe of mlx5_core.eth.2 failed with error -22
[ 110.776533] mlx5_core 0000:6f:08.2: is_dpll_supported:213:(pid 650): Missing SyncE capability

I actually encountered this problem before, on December 21 on
linux-next, but didn't investigate further at the time as the holidays
were coming up and it was affecting x86 as well. It was gone after the
holidays on next-20240104. Somehow it's now back on both linux-next and
v6.8-rc1. The same configuration works fine on v6.7. On s390x at least,
this also affects ConnectX-4 and ConnectX-5, and it also occurs when the
VF is passed through to a different logical partition from the one
controlling the PF.

One point of difference to other common setups may be that this Intel
Sapphire Rapids server as well as s390x are running with IOMMU enabled
and no pass-through for kernel code, i.e. on the Intel server my kernel
command line includes "iommu=nopt intel_iommu=on".
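
To compare against another x86 setup, a rough check for the same two
options could look like the sketch below. The function name
has_iommu_nopt is made up for illustration, and it only looks for the
exact option strings quoted above (so it does not apply to s390x).

```shell
#!/bin/sh
# Sketch only: succeed if the given kernel command line contains both
# "iommu=nopt" and "intel_iommu=on" as separate, whole options.
has_iommu_nopt() {
    cmdline=$1

    # Pad with spaces so options at the start or end also match.
    case " $cmdline " in
        *" iommu=nopt "*) ;;
        *) return 1 ;;
    esac

    case " $cmdline " in
        *" intel_iommu=on "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: has_iommu_nopt "$(cat /proc/cmdline)" && echo "same IOMMU options"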

Thanks,
Niklas