Re: Excessive delays from GHES polling on dual-socket AMD EPYC

From: Alexander Monakov
Date: Thu Dec 02 2021 - 13:46:47 EST


On Thu, 2 Dec 2021, Yazen Ghannam wrote:

> I believe the large number of GHES structures you have are intended to be used
> for the ACPI "GHES_ASSIST" feature. The GHES structures in this case are not
> to be used as independent sources. However, this feature is not implemented
> yet in Linux, so the kernel does set up these GHES structures as independent
> error sources.

Yes, our HEST has "GHES Assist: 1". But it is disappointing that those sources
have the "Polled" type. ACPI allocated eight bits for the type, and only 12
types are registered so far, so it's not like they were running out of space to
designate a separate type for this kind of source.
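
As a rough illustration of that headroom, here is a minimal standalone sketch
(not kernel code). The identifiers hest_notify_names, hest_notify_name and
hest_notify_unassigned are hypothetical, and the list of names is an assumption
based on the notification types in the ACPI 6.x Hardware Error Notification
structure, so check it against the spec before relying on it.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Notification types 0..11 as assumed from the ACPI 6.x "Hardware Error
 * Notification" structure. The strings are illustrative labels, not the
 * kernel's ACPI_HEST_NOTIFY_* identifiers.
 */
static const char *const hest_notify_names[] = {
	"Polled",
	"External Interrupt",
	"Local Interrupt",
	"SCI",
	"NMI",
	"CMCI",
	"MCE",
	"GPIO-Signal",
	"ARMv8 SEA",
	"ARMv8 SEI",
	"External Interrupt - GSIV",
	"Software Delegated Exception",
};

#define HEST_NOTIFY_DEFINED \
	(sizeof(hest_notify_names) / sizeof(hest_notify_names[0]))

/* Map the one-byte type field to a label; NULL means the value is unassigned. */
static const char *hest_notify_name(uint8_t type)
{
	return type < HEST_NOTIFY_DEFINED ? hest_notify_names[type] : NULL;
}

/* Number of values of the eight-bit type field that have no assigned meaning. */
static unsigned int hest_notify_unassigned(void)
{
	return 256u - (unsigned int)HEST_NOTIFY_DEFINED;
}
```

With 12 entries in the table, hest_notify_unassigned() leaves 244 of the 256
possible values free, any of which could in principle have marked an
assist-only source.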

[snip increasing polling interval]

> Ultimately, I think we'd want the kernel to ignore the GHES structures used
> for GHES_ASSIST, and then GHES_ASSIST support can be implemented and used
> where appropriate.
>
> I can send a patchset for ignoring the structures. This would be setup for
> another set that can fully implement the GHES_ASSIST feature. Would you be
> willing to test out that first set to see if it resolves the issue?

Sure, please Cc me on the patches.

Alexander