Re: Regression from "ACPI: OSI: Remove Linux-Dell-Video _OSI string"? (was: Re: Bug#1036530: linux-signed-amd64: Hard lock up of system)

From: Thorsten Leemhuis
Date: Fri Jun 30 2023 - 09:02:23 EST


On 27.06.23 00:34, Nick Hastings wrote:
> * Linux regression tracking (Thorsten Leemhuis) <regressions@xxxxxxxxxxxxx> [230626 21:09]:
>> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
>> for once, to make this easily accessible to everyone.
>>
>> Nick, what's the status/was there any progress? Did you do what Mario
>> suggested and file a nouveau bug?
>
> It was not apparent that the suggestion to open "a Nouveau drm bug" was
> addressed to me.

I wish things were easier for reporters, but from what I can see this
is the only way forward if you or some silent bystander cares.

>> I ask, as I still have this on my list of regressions and it seems there
>> was no progress in three+ weeks now.
>
> I have not pursued this further since as far as I could tell I already
> provided all requested information and I don't actually use nouveau, so
> I blacklisted it.

I doubt any developer cares enough to take a closer look[1] without a
proper nouveau bug and some help & prodding from someone affected. And
it looks to me like reverting the culprit now might create even bigger
problems for users.

Hence I guess this won't be fixed in the end. In an ideal world this
would not happen, but we don't live in one and all have just 24 hours in
a day. :-/

Nevertheless: thx for your report and your help throughout this thread.

[1] some points on the following page kinda explain this
https://linux-regtracking.leemhuis.info/post/frequent-reasons-why-linux-kernel-bug-reports-are-ignored/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot inconclusive: reporting deadlock (see thread for details)



>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>> --
>> Everything you wanna know about Linux kernel regression tracking:
>> https://linux-regtracking.leemhuis.info/about/#tldr
>> If I did something stupid, please tell me, as explained on that page.
>>
>> #regzbot backburner: slow progress, likely just affects one machine
>> #regzbot poke
>>
>>
>> On 02.06.23 02:57, Limonciello, Mario wrote:
>>> [AMD Official Use Only - General]
>>>
>>>> -----Original Message-----
>>>> From: Nick Hastings <nicholaschastings@xxxxxxxxx>
>>>> Sent: Thursday, June 1, 2023 7:02 PM
>>>> To: Karol Herbst <kherbst@xxxxxxxxxx>
>>>> Cc: Limonciello, Mario <Mario.Limonciello@xxxxxxx>; Lyude Paul
>>>> <lyude@xxxxxxxxxx>; Lukas Wunner <lukas@xxxxxxxxx>; Salvatore
>>>> Bonaccorso <carnil@xxxxxxxxxx>; 1036530@xxxxxxxxxxxxxxx; Rafael J.
>>>> Wysocki <rafael@xxxxxxxxxx>; Len Brown <lenb@xxxxxxxxxx>; linux-
>>>> acpi@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
>>>> regressions@xxxxxxxxxxxxxxx
>>>> Subject: Re: Regression from "ACPI: OSI: Remove Linux-Dell-Video _OSI
>>>> string"? (was: Re: Bug#1036530: linux-signed-amd64: Hard lock up of system)
>>>>
>>>> Hi,
>>>>
>>>> * Karol Herbst <kherbst@xxxxxxxxxx> [230602 03:10]:
>>>>> On Thu, Jun 1, 2023 at 7:21 PM Limonciello, Mario
>>>>> <Mario.Limonciello@xxxxxxx> wrote:
>>>>>>> -----Original Message-----
>>>>>>> From: Karol Herbst <kherbst@xxxxxxxxxx>
>>>>>>> Sent: Thursday, June 1, 2023 12:19 PM
>>>>>>> To: Limonciello, Mario <Mario.Limonciello@xxxxxxx>
>>>>>>> Cc: Nick Hastings <nicholaschastings@xxxxxxxxx>; Lyude Paul
>>>>>>> <lyude@xxxxxxxxxx>; Lukas Wunner <lukas@xxxxxxxxx>; Salvatore
>>>>>>> Bonaccorso <carnil@xxxxxxxxxx>; 1036530@xxxxxxxxxxxxxxx; Rafael J.
>>>>>>> Wysocki <rafael@xxxxxxxxxx>; Len Brown <lenb@xxxxxxxxxx>; linux-
>>>>>>> acpi@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
>>>>>>> regressions@xxxxxxxxxxxxxxx
>>>>>>> Subject: Re: Regression from "ACPI: OSI: Remove Linux-Dell-Video _OSI
>>>>>>> string"? (was: Re: Bug#1036530: linux-signed-amd64: Hard lock up of
>>>> system)
>>>>>>>
>>>>>>> On Thu, Jun 1, 2023 at 6:54 PM Limonciello, Mario
>>>>>>> <Mario.Limonciello@xxxxxxx> wrote:
>>>>>>>>
>>>>>>>> [AMD Official Use Only - General]
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Karol Herbst <kherbst@xxxxxxxxxx>
>>>>>>>>> Sent: Thursday, June 1, 2023 11:33 AM
>>>>>>>>> To: Limonciello, Mario <Mario.Limonciello@xxxxxxx>
>>>>>>>>> Cc: Nick Hastings <nicholaschastings@xxxxxxxxx>; Lyude Paul
>>>>>>>>> <lyude@xxxxxxxxxx>; Lukas Wunner <lukas@xxxxxxxxx>; Salvatore
>>>>>>>>> Bonaccorso <carnil@xxxxxxxxxx>; 1036530@xxxxxxxxxxxxxxx; Rafael
>>>> J.
>>>>>>>>> Wysocki <rafael@xxxxxxxxxx>; Len Brown <lenb@xxxxxxxxxx>; linux-
>>>>>>>>> acpi@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
>>>>>>>>> regressions@xxxxxxxxxxxxxxx
>>>>>>>>> Subject: Re: Regression from "ACPI: OSI: Remove Linux-Dell-Video
>>>> _OSI
>>>>>>>>> string"? (was: Re: Bug#1036530: linux-signed-amd64: Hard lock up of
>>>>>>> system)
>>>>>>>>>
>>>>>>>>> On Thu, Jun 1, 2023 at 6:18 PM Limonciello, Mario
>>>>>>>>>>
>>>>>>>>>> Lyude, Lukas, Karol
>>>>>>>>>>
>>>>>>>>>> This thread is in relation to this commit:
>>>>>>>>>>
>>>>>>>>>> 24867516f06d ("ACPI: OSI: Remove Linux-Dell-Video _OSI string")
>>>>>>>>>>
>>>>>>>>>> Nick has found that runtime PM is *not* working for nouveau.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> keep in mind we have a list of PCIe controllers where we apply a
>>>>>>>>> workaround:
>>>>>>>>>
>>>>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers
>>>>>>>>> /gpu/drm/nouveau/nouveau_drm.c?h=v6.4-rc4#n682
>>>>>>>>>
>>>>>>>>> And I suspect there might be one or two more IDs we'll have to add
>>>>>>>>> there. Do we have any logs?
>>>>>>>>
>>>>>>>> There's some archived onto the distro bug. Search this page for
>>>>>>> "journalctl.log.gz"
>>>>>>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1036530
>>>>>>>>
>>>>>>>
>>>>>>> interesting.. It seems to be the same controller used here. I wonder
>>>>>>> if the pci topology is different or if the workaround is applied at
>>>>>>> all.
>>>>>>
>>>>>> I didn't see the message in the log about the workaround being applied
>>>>>> in that log, so I guess PCI topology difference is a likely suspect.
>>>>>>
>>>>>
>>>>> yeah, but I also couldn't see a log with the usual nouveau messages,
>>>>> so it's kinda weird.
>>>>>
>>>>> Anyway, the output of `lspci -tvnn` would help
>>>>
>>>> % lspci -tvnn
>>>> -[0000:00]-+-00.0 Intel Corporation Device [8086:3e20]
>>>> +-01.0-[01]----00.0 NVIDIA Corporation TU117M [GeForce GTX 1650
>>>> Mobile / Max-Q] [10de:1f91]
>>>
>>> So the bridge it's connected to is the same that the quirk *should have been* triggering.
>>>
>>> May 29 15:02:42 xps kernel: pci 0000:00:01.0: [8086:1901] type 01 class 0x060400
>>>
>>> Since the quirk isn't working and this is still a problem in 6.4-rc4 I suggest opening a
>>> Nouveau drm bug to figure out why.
>>>
>>>> +-02.0 Intel Corporation CoffeeLake-H GT2 [UHD Graphics 630]
>>>> [8086:3e9b]
>>>> +-04.0 Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core
>>>> Processor Thermal Subsystem [8086:1903]
>>>> +-08.0 Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 /
>>>> 6th/7th/8th Gen Core Processor Gaussian Mixture Model [8086:1911]
>>>> +-12.0 Intel Corporation Cannon Lake PCH Thermal Controller
>>>> [8086:a379]
>>>> +-14.0 Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller
>>>> [8086:a36d]
>>>> +-14.2 Intel Corporation Cannon Lake PCH Shared SRAM [8086:a36f]
>>>> +-15.0 Intel Corporation Cannon Lake PCH Serial IO I2C Controller #0
>>>> [8086:a368]
>>>> +-15.1 Intel Corporation Cannon Lake PCH Serial IO I2C Controller #1
>>>> [8086:a369]
>>>> +-16.0 Intel Corporation Cannon Lake PCH HECI Controller [8086:a360]
>>>> +-17.0 Intel Corporation Cannon Lake Mobile PCH SATA AHCI Controller
>>>> [8086:a353]
>>>> +-1b.0-[02-3a]----00.0-[03-3a]--+-00.0-[04]----00.0 Intel Corporation
>>>> JHL6340 Thunderbolt 3 NHI (C step) [Alpine Ridge 2C 2016] [8086:15d9]
>>>> | +-01.0-[05-39]--
>>>> | \-02.0-[3a]----00.0 Intel Corporation JHL6340
>>>> Thunderbolt 3 USB 3.1 Controller (C step) [Alpine Ridge 2C 2016]
>>>> [8086:15db]
>>>> +-1c.0-[3b]----00.0 Intel Corporation Wi-Fi 6 AX200 [8086:2723]
>>>> +-1c.4-[3c]----00.0 Realtek Semiconductor Co., Ltd. RTS525A PCI
>>>> Express Card Reader [10ec:525a]
>>>> +-1d.0-[3d]----00.0 Samsung Electronics Co Ltd NVMe SSD Controller
>>>> SM981/PM981/PM983 [144d:a808]
>>>> +-1f.0 Intel Corporation Cannon Lake LPC Controller [8086:a30e]
>>>> +-1f.3 Intel Corporation Cannon Lake PCH cAVS [8086:a348]
>>>> +-1f.4 Intel Corporation Cannon Lake PCH SMBus Controller
>>>> [8086:a323]
>>>> \-1f.5 Intel Corporation Cannon Lake PCH SPI Controller
>>>> [8086:a324]
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Nick.
>>>
>
>
>