Re: [PATCH] RAS/AMD/ATL: Add MI300 support

From: Borislav Petkov
Date: Mon Jan 29 2024 - 04:28:27 EST


On Sun, Jan 28, 2024 at 09:59:50AM -0600, Yazen Ghannam wrote:
> From: Muralidhara M K <muralidhara.mk@xxxxxxx>
>
> AMD MI300 systems include on-die HBM3 memory and a unique topology, and
> their overall design falls under Data Fabric version 4.5.
>
> Generally, topology information (IDs, etc.) is gathered from Data Fabric
> registers. However, the unique topology for MI300 means that some
> topology information is fixed in hardware and follows arbitrary
> mappings. Furthermore, not all hardware instances are software-visible,
> so register accesses must be adjusted.
>
> Recognize the new MI300 interleave modes and add helper functions for
> them. Add lookup tables for fixed values where appropriate. Adjust how
> Die and Node IDs are found and used.
>
> Also, fix some register bitmasks that were mislabeled.
>
> Signed-off-by: Muralidhara M K <muralidhara.mk@xxxxxxx>
> Co-developed-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>
> Signed-off-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>

Applied, thanks.

> ---
> Notes:
> This patch is based on patches 2, 4, and 5 from the following set.
> https://lore.kernel.org/r/20231129073521.2127403-1-muralimk@xxxxxxx
>
> Patch 3 from above set is still needed for complete MI300 address
> translation support. This will be the first to follow.
>
> Patch 6 from above set is needed for expanding page retirement on MI300
> systems. This will be the second to follow.
>
> Patch 1 from above set adds MI200 support to ATL. This will be deferred
> until after priority MI300 updates.

Yap, agreed.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette