Re: [PATCH 1/2] platform/x86/amd: Introduce AMD Address Translation Library

From: Yazen Ghannam
Date: Tue Aug 08 2023 - 12:07:56 EST


On 8/8/2023 10:37 AM, Borislav Petkov wrote:
> On Tue, Aug 08, 2023 at 10:28:51AM -0400, Yazen Ghannam wrote:
>> Because this isn't intended to be only for MCA errors. The translation code
>> is related to the AMD Data Fabric. And it'll be a common back-end for memory
>> errors coming from MCA and CXL.
>
> But EDAC is not only about memory errors. Why not extend this into
> something which does other RAS functionality instead of doing a second
> one which is more or less related?
>
> mce_amd is already loaded on the system, why add a second module if it
> can be part of the first one just the same?


I think it would be better to avoid dependencies between independent things.

For example, amd_smn_read() is mostly used in amd64_edac. EDAC was the original user of SMN accesses, and all the SMN stuff could have been included in EDAC. However, SMN is not specific to EDAC, so it was added to amd_nb.c to be commonly available. SMN accesses are now done in other modules as well. I don't think it would have been a good idea to force those modules or subsystems to depend on EDAC.

This is my reasoning for a separate, independent module for the translation. EDAC is the first user of this. But there will be future code that can leverage this, like CXL, and even the MCE subsystem. And, yes, mce_amd may be already loaded, but this isn't a given. A person may want MCE and CXL support without wanting to use EDAC.

Furthermore, some users of the translation will be built-in, so the translation code will need to be built-in too. And it seems unnecessary to require all of mce_amd to be built-in just for the translation part.

> Strictly speaking, this all should've been drivers/ras/ from the very
> beginning and all EDAC should move there but that's going to be madness
> to do now.


I agree. And I don't think much of the existing code in EDAC should be moved out. But this is new code, so there's an opportunity to have it in a more appropriate place.

And, thinking on it more, this could be another example for future "common RAS" functionality. Isn't that why the CEC is in drivers/ras? It seems like things go into EDAC because it's thought of as the de facto RAS location. But why have something in EDAC if it doesn't provide EDAC functionality? Other RAS things, like AER, APEI, etc., don't live in EDAC.

Thanks,
Yazen