RE: [PATCH] platform/x86/amd: pmc: Add a workaround for an s0i3 issue on Cezanne

From: Limonciello, Mario
Date: Mon Dec 05 2022 - 18:03:05 EST


[Public]



> -----Original Message-----
> From: Hans de Goede <hdegoede@xxxxxxxxxx>
> Sent: Thursday, November 17, 2022 10:09
> To: Limonciello, Mario <Mario.Limonciello@xxxxxxx>; S-k, Shyam-sundar
> <Shyam-sundar.S-k@xxxxxxx>
> Cc: Mahapatra, Rajib <Rajib.Mahapatra@xxxxxxx>; Raul Rangel
> <rrangel@xxxxxxxxxxxx>; Mark Gross <markgross@xxxxxxxxxx>; platform-
> driver-x86@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH] platform/x86/amd: pmc: Add a workaround for an s0i3
> issue on Cezanne
>
> Hi,
>
> On 11/17/22 17:06, Limonciello, Mario wrote:
> > [Public]
> >
> >
> >
> >> -----Original Message-----
> >> From: Hans de Goede <hdegoede@xxxxxxxxxx>
> >> Sent: Thursday, November 17, 2022 08:06
> >> To: Limonciello, Mario <Mario.Limonciello@xxxxxxx>; S-k, Shyam-sundar
> >> <Shyam-sundar.S-k@xxxxxxx>
> >> Cc: Mahapatra, Rajib <Rajib.Mahapatra@xxxxxxx>; Raul Rangel
> >> <rrangel@xxxxxxxxxxxx>; Mark Gross <markgross@xxxxxxxxxx>;
> platform-
> >> driver-x86@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> >> Subject: Re: [PATCH] platform/x86/amd: pmc: Add a workaround for an
> s0i3
> >> issue on Cezanne
> >>
> >> Hi Mario,
> >>
> >> On 11/16/22 16:43, Mario Limonciello wrote:
> >>> Cezanne platforms under the right circumstances have a synchronization
> >>> problem where attempting to enter s2idle may fail if the x86 cores are
> >>> put into HLT before hardware resume from the previous attempt has
> >>> completed.
> >>>
> >>> To avoid this issue add a 10-20ms delay before entering s2idle another
> >>> time. This workaround will only be applied on interrupts that wake the
> >>> hardware but don't break the s2idle loop.
> >>>
> >>> Cc: "Mahapatra, Rajib" <Rajib.Mahapatra@xxxxxxx>
> >>> Cc: "Raul Rangel" <rrangel@xxxxxxxxxxxx>
> >>> Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx>
> >>
> >> Thank you for your patch, I've applied this patch to my review-hans
> >> branch:
> >>
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.k
> %2F&amp;data=05%7C01%7Cmario.limonciello%40amd.com%7Cb3c04b4449
> 154cad4f8208dac8b61509%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C
> 0%7C638042981632459900%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLj
> AwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%
> 7C%7C&amp;sdata=3ZzdcI0BsHknBInf8V4MfrmNCkkc2U9ygYf4IP25LJ4%3D&
> amp;reserved=0
> ernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Fpdx86%2Fplatform-
> >> drivers-x86.git%2Flog%2F%3Fh%3Dreview-
> >>
> hans&amp;data=05%7C01%7Cmario.limonciello%40amd.com%7C674f8bf7a8
> >>
> 114f83a3b408dac8a4d941%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C
> >>
> 0%7C638042907591739047%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLj
> >>
> AwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%
> >>
> 7C%7C&amp;sdata=XYwl%2FOvUFy%2Bgz9EY9oa35M%2BkLf%2Bud8PKXynQ
> >> FlrUdoE%3D&amp;reserved=0
> >>
> >> Please let me know if it important to get this as a fix into 6.1,
> >> I wasn't really planning on doing any more fixes pull-reqs for 6.1,
> >> but I can do one if necessary.
> >>
> >
> > AFAIK it's a corner case. I think it can wait until 6.2, but I think it should
> probably
> > be Cc to 6.1 stable (which has the ability to run code in the check()) phase.
>
> Ok, I have added a:
>
> Cc: stable@xxxxxxxxxxxxxxx # 6.1
>
> to the commit msg.
>
> Regards,
>
> Hans

Hi Hans,

I just wanted to update you on this workaround. Previously it was believed to
only be a very specific set of circumstances that happened on chromebooks running
coreboot and an EC running cros_ec being utilized with unfortunate timing.

However it turns out that it can be "relatively" easily reproduced on UEFI machines
as well though by suspending the laptop and then issuing anything that causes an
ACPI event that otherwise shouldn't break the s2idle loop (such as closing the lid or
unplugging the power adapter).

What will happen is that the SOC enters the deepest state up until the time of that
ACPI event and then never enters again. The most common case this will break I think
is someone suspends the laptop in GNOME, closes the lid and then tosses it in their bag.
If you examine /sys/kernel/debug/amd_pmc/* you'll see that the duration of time in
deepest state matches the time between suspending in GNOME and closing the lid.

I wanted to provide you that context to decide if this should still try to catch this in
a 6.1 final pull request or not. Had I known how widely this helped at that time I
would have advocated accordingly.

Thanks!

>
>
>
> >>> ---
> >>> drivers/platform/x86/amd/pmc.c | 6 ++++++
> >>> 1 file changed, 6 insertions(+)
> >>>
> >>> diff --git a/drivers/platform/x86/amd/pmc.c
> >> b/drivers/platform/x86/amd/pmc.c
> >>> index ef4ae977b8e0..439d282aafd1 100644
> >>> --- a/drivers/platform/x86/amd/pmc.c
> >>> +++ b/drivers/platform/x86/amd/pmc.c
> >>> @@ -739,8 +739,14 @@ static void amd_pmc_s2idle_prepare(void)
> >>> static void amd_pmc_s2idle_check(void)
> >>> {
> >>> struct amd_pmc_dev *pdev = &pmc;
> >>> + struct smu_metrics table;
> >>> int rc;
> >>>
> >>> + /* CZN: Ensure that future s0i3 entry attempts at least 10ms passed
> >> */
> >>> + if (pdev->cpu_id == AMD_CPU_ID_CZN &&
> >> !get_metrics_table(pdev, &table) &&
> >>> + table.s0i3_last_entry_status)
> >>> + usleep_range(10000, 20000);
> >>> +
> >>> /* Dump the IdleMask before we add to the STB */
> >>> amd_pmc_idlemask_read(pdev, pdev->dev, NULL);
> >>>
> >

<<attachment: winmail.dat>>