RE: [PATCH] cxl/pci: Fix disabling CXL memory for zero-based addressing

From: Dan Williams
Date: Fri Feb 09 2024 - 18:44:28 EST


Dan Williams wrote:
> Robert Richter wrote:
> > Based on CPU implementation and architecture, the CXL memory address
> > decode per memory channel can be implemented as zero based address for
> > addressing the CXL attached memory. In such case, the CXL host
> > physical address may not match the system address. The CFMWS contains
> > CXL ranges that are based on the system address range for the host
> > physical address and may not match with the CXL decoders.
> >
> > During HDM decoder setup, the DVSEC CXL range registers (cxl-3.1,
> > 8.1.3.8) are checked if the memory is enabled and the CXL range is in
> > an HPA window that is described in a CFMWS structure of the CXL host
> > bridge (cxl-3.1, 9.18.1.3).
> >
> > Now, if the range registers are programmed with zero-based addresses,
> > the ranges do not match the CFMWS windows and the CXL memory range
> > will be disabled. The HDM decoder stops working then which causes
> > system memory being disabled and further a kernel hang during HDM
> > decoder initialization, typically when a CXL enabled kernel boots.
> >
> > If the decoder is programmed with a zero-based hardware address and
> > the range is enabled, the CXL memory range is then in use by the
> > system.
> >
> > Fix a kernel hang due to disabling of CXL memory during HDM decoder
> > initialization by adding a check for zero-based address ranges, mark
> > such ranges as used which prevents the CXL memory from being disabled.
> >
> > Note this patch only fixes HDM initialization for zero-based address
> > ranges and a kernel hang this may cause. Decoder setup still does not
> > enable the HPA ranges for zero-based address ranges, the HDM decoder
> > cannot be added then and the kernel shows a message like the
> > following:
> >
> > cxl decoder1.0: failed to add to region: 0x0-0x3ffffffff
> >
> > However, support for this can be added in a later series.
> >
> > Fix for stable, please add stable tag.
> >
> > Fixes: 9de321e93c3b ("cxl/pci: Refactor cxl_hdm_decode_init()")
> > Fixes: 34e37b4c432c ("cxl/port: Enable HDM Capability after validating DVSEC Ranges")
> > Signed-off-by: Robert Richter <rrichter@xxxxxxx>
> > ---
> > drivers/cxl/core/pci.c | 12 ++++++++++++
> > 1 file changed, 12 insertions(+)
> >
> > diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> > index 569354a5536f..3a36a2f0c94f 100644
> > --- a/drivers/cxl/core/pci.c
> > +++ b/drivers/cxl/core/pci.c
> > @@ -466,6 +466,18 @@ int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm,
> > for (i = 0, allowed = 0; info->mem_enabled && i < info->ranges; i++) {
> > struct device *cxld_dev;
> >
> > + /*
> > + * Handle zero-based hardware addresses
> > + */
> > + if (!info->dvsec_range[i].start &&
> > + info->dvsec_range[i].end != CXL_RESOURCE_NONE &&
> > + info->dvsec_range[i].end) {
> > + dev_dbg(dev, "Zero-based hardware range found [%#llx - %#llx]\n",
> > + info->dvsec_range[i].start, info->dvsec_range[i].end);
> > + allowed++;
> > + continue;
> > + }
> > +
>
> I am not comfortable with this. It should be checking a platform
> specific quirk, or similar for the possibility of HPA != SPA. The
> entirety of the Linux CXL subsystem is built on the assumption that HPA
> == SPA, and if a platform wants to inject an offset between those Linux
> needs some way to enumerate that it is running in that new world. Yes,
> nothing in the CXL specification precludes HPA != SPA, but Linux has
> long since shipped the opposite assumption.

In other words, this nothing to do with "zero-based addressing", this is
about a missing platform translation that the CXL subsystem needs to
apply to the values it reads from hardware to correctly map them into
the HPA == SPA expectations of the Linux implementation. Something like
this (untested, not even compiled) where cxl_acpi picks up a quirk from
somewhere to identify the platform and replace default_cxl_xlat_hpa()
with what is needed:

diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index dcf2b39e1048..8543c9230484 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -312,8 +312,18 @@ static int cxl_acpi_qos_class(struct cxl_root *cxl_root,
return cxl_acpi_evaluate_qtg_dsm(handle, coord, entries, qos_class);
}

+static resource_size_t default_xlat_hpa(struct pci_dev *pdev, resource_size_t dev_hpa)
+{
+ /*
+ * default expectation is that device HPA values match host
+ * physical address resource values
+ */
+ return hpa;
+}
+
static const struct cxl_root_ops acpi_root_ops = {
.qos_class = cxl_acpi_qos_class,
+ .xlat_hpa = default_cxl_xlat_hpa,
};

static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 6c9c8d92f8f7..a945a0eccd6a 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -322,7 +322,7 @@ static int devm_cxl_enable_hdm(struct device *host, struct cxl_hdm *cxlhdm)
return devm_add_action_or_reset(host, disable_hdm, cxlhdm);
}

-int cxl_dvsec_rr_decode(struct device *dev, int d,
+int cxl_dvsec_rr_decode(struct cxl_root *cxl_root, struct device *dev, int d,
struct cxl_endpoint_dvsec_info *info)
{
struct pci_dev *pdev = to_pci_dev(dev);
@@ -411,6 +411,7 @@ int cxl_dvsec_rr_decode(struct device *dev, int d,

base |= temp & CXL_DVSEC_MEM_BASE_LOW_MASK;

+ base = cxl_root->ops->xlat_hpa(pdev, base);
info->dvsec_range[i] = (struct range) {
.start = base,
.end = base + size - 1
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index 97c21566677a..83fe7018d243 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -91,6 +91,7 @@ static int cxl_switch_port_probe(struct cxl_port *port)

static int cxl_endpoint_port_probe(struct cxl_port *port)
{
+ struct cxl_root *cxl_root __free(put_cxl_root) = find_cxl_root(port);
struct cxl_endpoint_dvsec_info info = { .port = port };
struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
@@ -98,7 +99,7 @@ static int cxl_endpoint_port_probe(struct cxl_port *port)
struct cxl_port *root;
int rc;

- rc = cxl_dvsec_rr_decode(cxlds->dev, cxlds->cxl_dvsec, &info);
+ rc = cxl_dvsec_rr_decode(cxl_root, cxlds->dev, cxlds->cxl_dvsec, &info);
if (rc < 0)
return rc;