Re: [PATCH 2/2] x86/MCE/AMD Support new memory interleaving schemes during address translation

From: Ingo Molnar
Date: Sat Aug 15 2020 - 18:06:48 EST



* Yazen Ghannam <Yazen.Ghannam@xxxxxxx> wrote:

> + /* Read D18F1x208 (System Fabric ID Mask 0). */
> + if (amd_df_indirect_read(nid, 1, 0x208, umc, &tmp))
> + goto out_err;
> +
> + /* Determine if system is a legacy Data Fabric type. */
> + legacy_df = !(tmp & 0xFF);

1)

I see this pattern in a lot of places in the code, first the magic
constant 0x208 is explained a comment, then it is *repeated* and used
it in the code...

How about introducing an obviously named enum for it instead, which
would then be self-documenting, saving the comment and removing magic
numbers:

if (amd_df_indirect_read(nid, 1, AMD_REG_FAB_ID, umc, &reg_fab_id))
goto out_err;

(The symbolic name should be something better, I just guessed
something quickly.)

Please clean this up in a separate patch, not part of the already
large patch that introduces a new feature.

2)

'tmp & 0xFF' is some sort of fabric version ID value, with a value of
0 denoting legacy (pre-Rome) systems, right?

How about making that explicit:

df_version = reg_fab_id & 0xFF;

I'm pretty sure such a version ID might come handy later on, should
there be quirks or new capabilities with the newer systems ...


> ret_addr -= hi_addr_offset;
> @@ -728,23 +740,31 @@ int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr)
> }
>
> lgcy_mmio_hole_en = tmp & BIT(1);
> - intlv_num_chan = (tmp >> 4) & 0xF;
> - intlv_addr_sel = (tmp >> 8) & 0x7;
> - dram_base_addr = (tmp & GENMASK_ULL(31, 12)) << 16;
>
> - /* {0, 1, 2, 3} map to address bits {8, 9, 10, 11} respectively */
> - if (intlv_addr_sel > 3) {
> - pr_err("%s: Invalid interleave address select %d.\n",
> - __func__, intlv_addr_sel);
> - goto out_err;
> + if (legacy_df) {
> + intlv_num_chan = (tmp >> 4) & 0xF;
> + intlv_addr_sel = (tmp >> 8) & 0x7;
> + } else {
> + intlv_num_chan = (tmp >> 2) & 0xF;
> + intlv_num_dies = (tmp >> 6) & 0x3;
> + intlv_num_sockets = (tmp >> 8) & 0x1;
> + intlv_addr_sel = (tmp >> 9) & 0x7;
> }
>
> + dram_base_addr = (tmp & GENMASK_ULL(31, 12)) << 16;
> +
> /* Read D18F0x114 (DramLimitAddress). */
> if (amd_df_indirect_read(nid, 0, 0x114 + (8 * base), umc, &tmp))
> goto out_err;
>
> - intlv_num_sockets = (tmp >> 8) & 0x1;
> - intlv_num_dies = (tmp >> 10) & 0x3;
> + if (legacy_df) {
> + intlv_num_sockets = (tmp >> 8) & 0x1;
> + intlv_num_dies = (tmp >> 10) & 0x3;
> + dst_fabric_id = tmp & 0xFF;
> + } else {
> + dst_fabric_id = tmp & 0x3FF;
> + }
> +
> dram_limit_addr = ((tmp & GENMASK_ULL(31, 12)) << 16) | GENMASK_ULL(27, 0);

Could we please structure this code in a bit more readable fashion?

1)

Such as not using the meaningless 'tmp' variable name to first read
out DramOffset, then DramLimitAddress?

How about naming them a bit more obviously, and retrieving them in a
single step:

if (amd_df_indirect_read(nid, 0, 0x1B4, umc, &reg_dram_off))
goto out_err;

/* Remove HiAddrOffset from normalized address, if enabled: */
if (reg_dram_off & BIT(0)) {
u64 hi_addr_offset = (tmp & GENMASK_ULL(31, 20)) << 8;

if (norm_addr >= hi_addr_offset) {
ret_addr -= hi_addr_offset;
base = 1;
}
}

if (amd_df_indirect_read(nid, 0, 0x114 + (8 * base), umc, &reg_dram_lim))
goto out_err;

('reg' stands for register value - but 'val' would work too.)

Side note: why is the above code using BIT() and GENMASK_UUL() when
all the other and new code is using fixed masks? Use one of these
versions instead of a weird mix ...

2)

Then all the fabric version dependent logic could be consolidated
instead of being spread out:

if (df_version) {
intlv_num_chan = (reg_dram_off >> 2) & 0xF;
intlv_num_dies = (reg_dram_off >> 6) & 0x3;
intlv_num_sockets = (reg_dram_off >> 8) & 0x1;
intlv_addr_sel = (reg_dram_off >> 9) & 0x7;

dst_fabric_id = (reg_dram_lim >> 0) & 0x3FF;
} else {
intlv_num_chan = (reg_dram_off >> 4) & 0xF;
intlv_num_dies = (reg_dram_lim >> 10) & 0x3;
intlv_num_sockets = (reg_dram_lim >> 8) & 0x1;
intlv_addr_sel = (reg_dram_off >> 8) & 0x7;

dst_fabric_id = (reg_dram_lim >> 0) & 0xFF;
}

Also note a couple of more formatting & ordering edits I did to the
code, to improve the structure. My copy & paste job is untested
though.

3)

Notably, note how the new code on current systems is the first branch
- that's the most interesting code most of the time anyaway, legacy
systems being legacy.

BTW., please do any such suggested code flow and structure clean-up
patches first in the series, then introduce the new logic, to make it
easier to verify.

Thanks,

Ingo