Re: [regression] Bug 217218 - Trying to boot Linux version 6-2.2 kernel with Marvell SATA controller 88SE9235

From: Linux regression tracking (Thorsten Leemhuis)
Date: Mon May 22 2023 - 07:34:41 EST


On 22.05.23 13:01, Robin Murphy wrote:
> On 2023-05-22 11:26, Linux regression tracking (Thorsten Leemhuis) wrote:
>>
>> I might be missing something, but it looks to me like this regression
>> was never fixed in mainline. Which is strange, as we apparently had a
>> patch from Robin that fixed the issue for the reporter.
>>
>> Did it fall through the cracks or what am I missing?
>
> Strictly, the regression itself has not been fixed - I guess it does
> just about qualify since the rather-out-of-date
> Documentation/arch/x86/x86_64/boot-options.rst does still say that
> iommu=soft "can be used to prevent the usage of an available hardware
> IOMMU", and that seems to be what has stopped happening here.
>
> What it exposed was a latent issue that this particular device has never
> been properly supported for use with an IOMMU, and that's what I guessed
> at a fix for.

Thx for the summary. This sounds a lot like you have no interest in
submitting the quirk entry yourself (please correct me if I'm wrong).
Jason from looking at lore doesn't seem to be involved in kernel
development regularly. And I try to stay out of such waters as well, as
I try to draw a line there. Which leads to the question:

Who will now submit the quirk entry?

>From "git blame" it seems Bjorn and Alex added most of the other quirk
entries for the marvel controllers (both CCed now). Could one of you add
this one that Ron suggested in [1] as well?

/me wonders if they'd need a "Signed-off-by" from Ron for a one-liner
that is mainly copy-n-paste

Ciao, Thorsten

[1]
https://lore.kernel.org/all/1539e760-392f-a33e-436e-bbf043e79bfc@xxxxxxx/

>> On 25.04.23 15:58, Jason Adriaanse wrote:
>>> I am happy to report that the change worked, this is what
>>> drivers/pci/quirks.c looks like
>>>
>>> /* https://bugzilla.kernel.org/show_bug.cgi?id=42679#c49 */
>>> DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9230,
>>>                           quirk_dma_func1_alias);
>>> /* https://bugzilla.kernel.org/show_bug.cgi?id=217218 */
>>> DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9235,
>>>                           quirk_dma_func1_alias);
>>> DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_TTI, 0x0642,
>>>                           quirk_dma_func1_alias);
>>>
>>> Relevant output of dmesg -T with the new kernel running
>>>
>>> Tue Apr 25 21:45:13 2023] scsi host0: ahci
>>> [Tue Apr 25 21:45:13 2023] scsi host1: ahci
>>> [Tue Apr 25 21:45:13 2023] scsi host2: ahci
>>> [Tue Apr 25 21:45:13 2023] scsi host3: ahci
>>> [Tue Apr 25 21:45:13 2023] ata1: SATA max UDMA/133 abar m2048@0xf7d06000
>>> port 0xf7d06100 irq 40
>>> [Tue Apr 25 21:45:13 2023] ata2: SATA max UDMA/133 abar m2048@0xf7d06000
>>> port 0xf7d06180 irq 40
>>> [Tue Apr 25 21:45:13 2023] ata3: DUMMY
>>> [Tue Apr 25 21:45:13 2023] ata4: DUMMY
>>> [Tue Apr 25 21:45:13 2023] igb 0000:05:00.0 enp5s0: renamed from eth0
>>> [Tue Apr 25 21:45:13 2023] ahci 0000:07:00.0: AHCI 0001.0000 32 slots 4
>>> ports 6 Gbps 0xf impl SATA mode
>>> [Tue Apr 25 21:45:13 2023] ahci 0000:07:00.0: flags: 64bit ncq sntf led
>>> only pmp fbs pio slum part sxs
>>> [Tue Apr 25 21:45:13 2023] scsi host4: ahci
>>> [Tue Apr 25 21:45:13 2023] scsi host5: ahci
>>> [Tue Apr 25 21:45:13 2023] scsi host6: ahci
>>> [Tue Apr 25 21:45:13 2023] scsi host7: ahci
>>> [Tue Apr 25 21:45:13 2023] ata5: SATA max UDMA/133 abar m2048@0xf7b10000
>>> port 0xf7b10100 irq 41
>>> [Tue Apr 25 21:45:13 2023] ata6: SATA max UDMA/133 abar m2048@0xf7b10000
>>> port 0xf7b10180 irq 41
>>> [Tue Apr 25 21:45:13 2023] ata7: SATA max UDMA/133 abar m2048@0xf7b10000
>>> port 0xf7b10200 irq 41
>>> [Tue Apr 25 21:45:13 2023] ata8: SATA max UDMA/133 abar m2048@0xf7b10000
>>> port 0xf7b10280 irq 41
>>> [Tue Apr 25 21:45:13 2023] usb 1-1: new high-speed USB device number 2
>>> using ehci-pci
>>> [Tue Apr 25 21:45:14 2023] usb 3-1: new high-speed USB device number 2
>>> using ehci-pci
>>> [Tue Apr 25 21:45:14 2023] ata8: SATA link up 6.0 Gbps (SStatus 133
>>> SControl 300)
>>> [Tue Apr 25 21:45:14 2023] ata6: SATA link up 6.0 Gbps (SStatus 133
>>> SControl 300)
>>> [Tue Apr 25 21:45:14 2023] ata7: SATA link up 6.0 Gbps (SStatus 133
>>> SControl 300)
>>> [Tue Apr 25 21:45:14 2023] ata5: SATA link up 6.0 Gbps (SStatus 133
>>> SControl 300)
>>> [Tue Apr 25 21:45:14 2023] ata7.00: ATA-9: WDC WD40EFRX-68WT0N0,
>>> 80.00A80, max UDMA/133
>>> [Tue Apr 25 21:45:14 2023] ata6.00: ATA-9: WDC WD40EFRX-68WT0N0,
>>> 80.00A80, max UDMA/133
>>> [Tue Apr 25 21:45:14 2023] ata8.00: ATA-9: WDC WD40EFRX-68WT0N0,
>>> 80.00A80, max UDMA/133
>>> [Tue Apr 25 21:45:14 2023] ata5.00: ATA-10: CT2000BX500SSD1, M6CR030,
>>> max UDMA/133
>>> [Tue Apr 25 21:45:14 2023] ata6.00: 7814037168 sectors, multi 0: LBA48
>>> NCQ (depth 32), AA
>>> [Tue Apr 25 21:45:14 2023] ata7.00: 7814037168 sectors, multi 0: LBA48
>>> NCQ (depth 32), AA
>>> [Tue Apr 25 21:45:14 2023] ata8.00: 7814037168 sectors, multi 0: LBA48
>>> NCQ (depth 32), AA
>>> [Tue Apr 25 21:45:14 2023] ata5.00: 3907029168 sectors, multi 1: LBA48
>>> NCQ (depth 32), AA
>>> [Tue Apr 25 21:45:14 2023] ata6.00: configured for UDMA/133
>>> [Tue Apr 25 21:45:14 2023] ata7.00: configured for UDMA/133
>>> [Tue Apr 25 21:45:14 2023] ata8.00: configured for UDMA/133
>>> [Tue Apr 25 21:45:14 2023] ata1: SATA link down (SStatus 0 SControl 300)
>>> [Tue Apr 25 21:45:14 2023] ata5.00: Features: Dev-Sleep
>>> [Tue Apr 25 21:45:14 2023] ata5.00: configured for UDMA/133
>>> [Tue Apr 25 21:45:14 2023] usb 1-1: New USB device found, idVendor=8087,
>>> idProduct=0024, bcdDevice= 0.00
>>> [Tue Apr 25 21:45:14 2023] usb 1-1: New USB device strings: Mfr=0,
>>> Product=0, SerialNumber=0
>>> [Tue Apr 25 21:45:14 2023] hub 1-1:1.0: USB hub found
>>> [Tue Apr 25 21:45:14 2023] hub 1-1:1.0: 4 ports detected
>>> [Tue Apr 25 21:45:14 2023] usb 3-1: New USB device found, idVendor=8087,
>>> idProduct=0024, bcdDevice= 0.00
>>> [Tue Apr 25 21:45:14 2023] usb 3-1: New USB device strings: Mfr=0,
>>> Product=0, SerialNumber=0
>>> [Tue Apr 25 21:45:14 2023] hub 3-1:1.0: USB hub found
>>> [Tue Apr 25 21:45:14 2023] hub 3-1:1.0: 6 ports detected
>>> [Tue Apr 25 21:45:14 2023] ata2: SATA link down (SStatus 0 SControl 300)
>>> [Tue Apr 25 21:45:14 2023] scsi 4:0:0:0: Direct-Access ATA
>>> CT2000BX500SSD1  030  PQ: 0 ANSI: 5
>>> [Tue Apr 25 21:45:14 2023] scsi 5:0:0:0: Direct-Access ATA      WDC
>>> WD40EFRX-68W 0A80 PQ: 0 ANSI: 5
>>> [Tue Apr 25 21:45:14 2023] scsi 6:0:0:0: Direct-Access ATA      WDC
>>> WD40EFRX-68W 0A80 PQ: 0 ANSI: 5
>>> [Tue Apr 25 21:45:14 2023] scsi 7:0:0:0: Direct-Access ATA      WDC
>>> WD40EFRX-68W 0A80 PQ: 0 ANSI: 5
>>>
>>> Thanks everyone for all your help.
>>>
>>> Jason
>>>
>>>
>>> On 25/04/2023 19:37, Robin Murphy wrote:
>>>> On 2023-04-25 05:17, Jason Adriaanse wrote:
>>>>> Ok great,
>>>>>
>>>>> I take it a change needs to be made in
>>>>> drivers/pci/quirks.c
>>>>> ?
>>>>> I do not mind making the change locally here and letting you know if
>>>>> it works or not.
>>>>
>>>> Indeed, something like this (make sure the IDs actually match what your
>>>> device reports, I'm just guessing):
>>>>
>>>>
>>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>>> index 44cab813bf95..a9166e886b75 100644
>>>> --- a/drivers/pci/quirks.c
>>>> +++ b/drivers/pci/quirks.c
>>>> @@ -4161,6 +4161,8 @@
>>>> DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9220,
>>>>   /* https://bugzilla.kernel.org/show_bug.cgi?id=42679#c49 */
>>>>   DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9230,
>>>>                quirk_dma_func1_alias);
>>>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9235,
>>>> +             quirk_dma_func1_alias);
>>>>   DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_TTI, 0x0642,
>>>>                quirk_dma_func1_alias);
>>>>   DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_TTI, 0x0645,
>>>>
>>>>
>>>> Marvell themselves seem to lump the 88SE92xx products together as a
>>>> closely-related family, so given that we do have quirks for 3 of the 4
>>>> already, this one does rather seem conspicuous by its absence...
>>>>
>>>> Thanks,
>>>> Robin.
>>>>
>>>>> On 24/04/2023 22:07, Robin Murphy wrote:
>>>>>> On 2023-04-24 14:44, Jason Adriaanse wrote:
>>>>>>> I took out "iommu=soft" and the server failed to boot, so yes it
>>>>>>> does break.
>>>>>>>
>>>>>>> The first error was
>>>>>>> ata7.00: Failed to IDENTIFY (INIT_DEV_PARAMS failed , err_mask=0x80)
>>>>>>
>>>>>> OK, great, that confirms the underlying issue existed all along, so
>>>>>> the regression is only a change in who wins a fight between certain
>>>>>> conflicting command-line arguments, which is arguably not so
>>>>>> critical.
>>>>>>
>>>>>> The rest of the evidence points to 88SE9235 wanting the same phantom
>>>>>> function quirk as most other Marvell controllers, since although
>>>>>> it's apparently been half-fixed such that DMA for two of the ports
>>>>>> is being correctly emitted from function 0 - given that you say two
>>>>>> of the disks *are* detected OK - the other two are still claiming to
>>>>>> be function 1 after all.
>>>>>>
>>>>>> Thanks,
>>>>>> Robin.
>>>>>>
>>>>>>> On 24/04/2023 21:20, Robin Murphy wrote:
>>>>>>>> On 2023-04-22 07:25, Jason Adriaanse wrote:
>>>>>>>>> Hi Christoph,
>>>>>>>>>
>>>>>>>>> Sorry for my late reply, I have been on the road.
>>>>>>>>>
>>>>>>>>> So, if I boot with
>>>>>>>>> intel_iommu=off
>>>>>>>>> Then the server boots fine..although that is not a solution
>>>>>>>>> because I need Intel iommu for virtualisation.
>>>>>>>>>
>>>>>>>>> Also, I build all my kernels with CONFIG_INTEL_IOMMU=y
>>>>>>>>>
>>>>>>>>
>>>>>>>> If you boot 5.15 *without* the "iommu=soft" argument, just
>>>>>>>> "intel_iommu=on", does that also break?
>>>>>>>>
>>>>>>>> Robin.
>>>
>>>
>
>