Fwd: Kernels v6.5 and v6.6 break resume from standby (s3) on some Intel systems if VT-d is enabled

From: Bagas Sanjaya
Date: Tue Nov 28 2023 - 08:09:38 EST


Hi,

I noticed a regression report on Bugzilla [1]. Quoting from it:

> Note:
>
> I'm just a Linux user, I don't work in IT or even write code, so, I'm probably using terms to describe the issue that are not the ones someone who knows code and what the system does under the hood would use.
>
> Affected system:
>
> Thinkpad, Intel Kaby Lake (i7-7600U) chipset / cpu and onboard gpu (Intel HD 620), no separate graphics card, current bios firmware; running Void Linux, xfce / lightdm
>
> Symptom / problem:
>
> Since the upgrade to kernel v6.5.5 (from v6.3.13) my system doesn't wake up from standby, i.e. resume from s3 fails 100% of the time.
> When pressing a key or the power button, nothing happens. The LED that indicates the system's state keeps indicating standby mode.
> The only way to use the system again is a hard reset by pressing the power button for a few seconds.
>
> So, this is not a crash on resume, an incomplete resume, an intermittent failure to resume, or a failure to go into standby in the first place.
>
> Granted, this issue was present with kernels before v6.5, but only occasionally and it would not re-appear for many many boot cycles. So, I never had any lead as to why it would happen.
>
> I installed kernel v6.4.16 to test for the bug - it's not in there.
>
> For further testing I also installed kernel v6.5.2, as this was the first kernel of the 6.5 series available on Void Linux (and because the kernel logs mention VT-d for kernel v6.5.5 and v6.5.3, see below). Result: The bug is already in v6.5.2, too.
>
> There's only one thing I noticed from comparing logs between kernels v6.5/6.6 and v6.1/6.3/6.4. At the moment the system goes into standby, the latter three kernel versions print the following messages:
>
> [elogind-daemon] Entering sleep state 'suspend'...
> [kernel] PM: suspend entry (deep)
>
>
> But with kernels v6.5/6.6, the kernel message is missing, only the elogind-daemon message shows up in the logs. As if the kernel didn't get the memo and thus didn't prepare and didn't listen for the wake-up call to resume.
>
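
For anyone who wants to check their own logs for the same pattern, here is a minimal sketch. It assumes the log has already been read into a list of lines and that the two message strings look like the ones quoted above; the function name is made up for illustration and is not part of any existing tool.

```python
def unmatched_sleep_requests(log_lines):
    """Return indices of 'Entering sleep state' lines that are not followed
    by a kernel 'PM: suspend entry' line before the next sleep request."""
    unmatched = []
    pending = None  # index of a sleep request still waiting for the kernel line
    for i, line in enumerate(log_lines):
        if "Entering sleep state" in line:
            if pending is not None:
                # A new request arrived before the kernel acknowledged the old one.
                unmatched.append(pending)
            pending = i
        elif "PM: suspend entry" in line and pending is not None:
            pending = None
    if pending is not None:
        unmatched.append(pending)
    return unmatched
```

An empty result means every userspace sleep request was followed by the kernel message; on the affected kernels the report suggests the request would show up as unmatched.
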
>
> To see if this bug might be tied to a certain chipset / cpu generation, I tested kernel v6.5 on my old Thinkpad (Intel Sandy Bridge chipset / cpu, and also onboard graphics only). Its BIOS also has VT-d enabled. Interestingly, on that system, resume from standby with kernel v6.5 is no problem, even though its system is set up the same as the current Thinkpad.
>
> So, this bug seems to be limited to a certain set of chipsets / cpus, which seems plausible, as I couldn't find a bug report on this; not too many users seem to be affected.
>
>
>
> There's an older bug report on similar symptoms, but the cure doesn't work on my system:
>
> "intel_iommu=on breaks resume from suspend on several Thinkpad models"
> https://bugzilla.kernel.org/show_bug.cgi?id=197029
>
>
> It sounds just like what my system is experiencing, except that the term "suspend" is sometimes also used to describe hibernation, and the bug report does not specify which one is meant.
>
> So, I was hopeful on the one hand that the (workaround) fix (adding intel_iommu=off to the kernel parameters) would work on my system, too - on the other hand, this bug report was for kernel v4.13, so it's probably not necessarily relevant to similar symptoms with kernel v6.5 and v6.6, respectively.
>
> Anyway, adding intel_iommu=off to the kernel parameters didn't change anything on my system. Of course, once the system was running, I made sure that intel_iommu=off was indeed in use as one of the kernel parameters.
>
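
For others trying the same workaround, a small sketch of how the active parameter could be confirmed. It assumes the text passed in is the content of /proc/cmdline and splits on whitespace only, so quoted values containing spaces are not handled; both function names are hypothetical.

```python
def cmdline_params(cmdline):
    """Split a kernel command line into a dict of parameter -> value.

    Flags without '=' (for example 'quiet') map to None.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def intel_iommu_setting(cmdline):
    """Return the value of intel_iommu, or None if it is not set."""
    return cmdline_params(cmdline).get("intel_iommu")
```

On a running system, the string to pass in would be the contents of /proc/cmdline; a return value of "off" indicates the parameter was handed to the kernel at boot.
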
>
> With this information in mind I did a regular internet search and found some information that in case intel_iommu=off in the kernel parameters doesn't help, disabling VT-d in BIOS might.
> And in my case it does indeed help avoid the bug, for both kernel versions, v6.5 and v6.6.
>
> Reading some other bug reports and some changelogs, I noticed that iommu and VT-d are connected, so I posted this bug report in drivers/iommu. If it is misplaced here, please feel free to move it to the correct category.
>
>
> I attached a file with the output of some commands I found being used in several other bug reports on here, just in case they might be needed / helpful.
>
>
> Thank you very much for your help in advance!

See Bugzilla for the full thread.

Anyway, I'm adding this regression to regzbot:

#regzbot introduced: v6.3..v6.5 https://bugzilla.kernel.org/show_bug.cgi?id=218191
#regzbot title: resume from standby fails on Thinkpad with Kaby Lake CPU

Thanks.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=218191

--
An old man doll... just what I always wanted! - Clara