[PATCH] cxl/pci: Fix disabling CXL memory for zero-based addressing

From: Robert Richter
Date: Fri Feb 09 2024 - 14:35:40 EST


Based on CPU implementation and architecture, the CXL memory address
decode per memory channel can be implemented as zero based address for
addressing the CXL attached memory. In such case, the CXL host
physical address may not match the system address. The CFMWS contains
CXL ranges that are based on the system address range for the host
physical address and may not match with the CXL decoders.

During HDM decoder setup, the DVSEC CXL range registers (cxl-3.1,
8.1.3.8) are checked if the memory is enabled and the CXL range is in
an HPA window that is described in a CFMWS structure of the CXL host
bridge (cxl-3.1, 9.18.1.3).

Now, if the range registers are programmed with zero-based addresses,
the ranges do not match the CFMWS windows and the CXL memory range
will be disabled. The HDM decoder stops working then which causes
system memory being disabled and further a kernel hang during HDM
decoder initialization, typically when a CXL enabled kernel boots.

If the decoder is programmed with a zero-based hardware address and
the range is enabled, the CXL memory range is then in use by the
system.

Fix a kernel hang due to disabling of CXL memory during HDM decoder
initialization by adding a check for zero-based address ranges, mark
such ranges as used which prevents the CXL memory from being disabled.

Note this patch only fixes HDM initialization for zero-based address
ranges and a kernel hang this may cause. Decoder setup still does not
enable the HPA ranges for zero-based address ranges, the HDM decoder
cannot be added then and the kernel shows a message like the
following:

cxl decoder1.0: failed to add to region: 0x0-0x3ffffffff

However, support for this can be added in a later series.

Fix for stable, please add stable tag.

Fixes: 9de321e93c3b ("cxl/pci: Refactor cxl_hdm_decode_init()")
Fixes: 34e37b4c432c ("cxl/port: Enable HDM Capability after validating DVSEC Ranges")
Signed-off-by: Robert Richter <rrichter@xxxxxxx>
---
drivers/cxl/core/pci.c | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 569354a5536f..3a36a2f0c94f 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -466,6 +466,18 @@ int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm,
for (i = 0, allowed = 0; info->mem_enabled && i < info->ranges; i++) {
struct device *cxld_dev;

+ /*
+ * Handle zero-based hardware addresses
+ */
+ if (!info->dvsec_range[i].start &&
+ info->dvsec_range[i].end != CXL_RESOURCE_NONE &&
+ info->dvsec_range[i].end) {
+ dev_dbg(dev, "Zero-based hardware range found [%#llx - %#llx]\n",
+ info->dvsec_range[i].start, info->dvsec_range[i].end);
+ allowed++;
+ continue;
+ }
+
cxld_dev = device_find_child(&root->dev, &info->dvsec_range[i],
dvsec_range_allowed);
if (!cxld_dev) {
--
2.39.2