Re: Regression from dcadfd7f7c74ef9ee415e072a19bdf6c085159eb

From: Takashi Sakamoto
Date: Tue Nov 07 2023 - 07:26:09 EST


Hi Mario,

Thanks for the report.

I apologize for the inconvenience you and your reporters are facing.
However, I have to say that the problem appears to be specific to AMD
Ryzen machines.

I've already received a similar report[1] and have been investigating
it for the last few weeks, which has given me some insight. Please take
a look at my short report about it in the PR to Linus for 6.7-rc1:
https://lore.kernel.org/lkml/20231105144852.GA165906@workstation.local/

I can confirm that I have been able to reproduce the problem on an AMD
Ryzen machine. However, it's important to note that I have not observed
the problem on the following systems:

* Intel machines (Sandy Bridge and Skylake generations)
* AMD machines predating Ryzen (Sempron 145)
* Machines using different 1394 OHCI hardware from other vendors such as
  TI
* VIA VT6307 connected directly to a PCI slot (i.e. without the
  PCIe/PCI bridge in question)

Currently, I have not been able to obtain any useful debug output from
the Linux system or any hardware error reports when the system reboots.
It seems that the system reboots spontaneously. My assumption at this
point is that AMD Ryzen machines detect a specific hardware error
triggered by a Ryzen machine quirk related to the combination of the
Asmedia ASM1083/1085 and VIA VT6306/6307/6308, leading to a power reset.

I genuinely appreciate your assistance in debugging this elusive
hardware issue. If a workaround specific to the AMD Ryzen machine quirk
is required in the PCI driver for 1394 OHCI hardware, I'm willing to
apply it. However, I think it is preferable to figure out the reboot
mechanism first.

On Mon, Nov 06, 2023 at 02:14:39PM -0600, Mario Limonciello wrote:
> Hi,
>
> I recently came across a kernel bugzilla that bisected a boot problem [1]
> introduced in kernel 6.5 to this change.
>
> commit dcadfd7f7c74ef9ee415e072a19bdf6c085159eb (HEAD -> dcadfd7f7c7)
> Author: Takashi Sakamoto <o-takashi@xxxxxxxxxxxxx>
> Date: Tue May 30 08:12:40 2023 +0900
>
> firewire: core: use union for callback of transaction completion
>
> Removing the firewire card from the system fixes it for both reporters
> (CC'ed)
>
> As the author of this issue can you please take a look at it?
>
> Thanks,
>
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=217993


[1] https://bugzilla.suse.com/show_bug.cgi?id=1215436
[2] https://bugzilla.kernel.org/show_bug.cgi?id=217994

Thanks

Takashi Sakamoto