Re: [PATCH] cxl/memdev: Avoid mailbox functionality on device memory CXL devices

From: Davidlohr Bueso
Date: Mon Jul 31 2023 - 23:31:47 EST


On Fri, 28 Jul 2023, Dan Williams wrote:

> Ira Weiny wrote:
> > Using the proposed type-2 cxl-test device[1] the following
> > splat was observed:
> >
> > BUG: kernel NULL pointer dereference, address: 0000000000000278
> > [...]
> > RIP: 0010:devm_cxl_add_memdev+0x1de/0x2c0 [cxl_core]
>
> It would be useful to decode this to a line number, the rest of this
> call trace is not adding much.
>
> > [...]
> > Call Trace:
> > <TASK>
> > ? __die+0x1f/0x70
> > ? page_fault_oops+0x149/0x420
> > ? fixup_exception+0x22/0x310
> > ? kernelmode_fixup_or_oops+0x84/0x110
> > ? exc_page_fault+0x6d/0x150
> > ? asm_exc_page_fault+0x22/0x30
> > ? devm_cxl_add_memdev+0x1de/0x2c0 [cxl_core]
> > cxl_mock_mem_probe+0x632/0x870 [cxl_mock_mem]
> > platform_probe+0x40/0x90
> > really_probe+0x19e/0x3e0
> > ? __pfx___driver_attach+0x10/0x10
> > __driver_probe_device+0x78/0x160
> > driver_probe_device+0x1f/0x90
> > __driver_attach+0xce/0x1c0
> > bus_for_each_dev+0x63/0xa0
> > bus_add_driver+0x112/0x210
> > driver_register+0x55/0x100
> > ? __pfx_cxl_mock_mem_driver_init+0x10/0x10 [cxl_mock_mem]
> > [...]
> >
> > Commit f6b8ab32e3ec made the mailbox functionality optional. However,
> > some mailbox functionality was merged after that patch. Therefore some
> > mailbox functionality can be accessed on a device which did not set up
> > the mailbox.
>
> cxl_memdev_security_init() definitely needs to move out of
> devm_cxl_add_memdev() and after that I do not think @mds NULL checks
> need to be sprinkled everywhere. In other words something is wrong at a
> higher level if we get into some of these helper functions without the
> memory device state.

Right, so we can move it directly into cxl_pci_probe() - just as with other
mbox based functionality. This leaves me wondering, however, what to do about
the cxl_memdev_security_shutdown() counterpart. As with the below diff, leaving
it as is and just adding an mds NULL check might still be considered a layering
violation in that it would be asymmetrical wrt the init; but this is tightly
coupled with cxl_memdev_unregister().

Ira, does the below fix the crash?

Thanks,
Davidlohr

----8<-------
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 14b547c07f54..4d1bf80c0e54 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -561,7 +561,7 @@ static void cxl_memdev_security_shutdown(struct device *dev)
struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
- if (mds->security.poll)
+ if (mds && mds->security.poll)
cancel_delayed_work_sync(&mds->security.poll_dwork);
}
@@ -1009,11 +1009,11 @@ static void put_sanitize(void *data)
sysfs_put(mds->security.sanitize_node);
}
-static int cxl_memdev_security_init(struct cxl_memdev *cxlmd)
+int cxl_memdev_security_state_init(struct cxl_memdev_state *mds)
{
- struct cxl_dev_state *cxlds = cxlmd->cxlds;
- struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
- struct device *dev = &cxlmd->dev;
+
+ struct cxl_dev_state *cxlds = &mds->cxlds;
+ struct device *dev = &cxlds->cxlmd->dev;
struct kernfs_node *sec;
sec = sysfs_get_dirent(dev->kobj.sd, "security");
@@ -1029,7 +1029,8 @@ static int cxl_memdev_security_init(struct cxl_memdev *cxlmd)
}
return devm_add_action_or_reset(cxlds->dev, put_sanitize, mds);
- }
+}
+EXPORT_SYMBOL_NS_GPL(cxl_memdev_security_state_init, CXL);
struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds)
{
@@ -1059,10 +1060,6 @@ struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds)
if (rc)
goto err;
- rc = cxl_memdev_security_init(cxlmd);
- if (rc)
- goto err;
-
rc = devm_add_action_or_reset(cxlds->dev, cxl_memdev_unregister, cxlmd);
if (rc)
return ERR_PTR(rc);
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index f86afef90c91..441270770519 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -884,6 +884,7 @@ static inline void cxl_mem_active_dec(void)
#endif
int cxl_mem_sanitize(struct cxl_memdev_state *mds, u16 cmd);
+int cxl_memdev_security_state_init(struct cxl_memdev_state *mds);
struct cxl_hdm {
struct cxl_component_regs regs;
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 1cb1494c28fe..5242dbf0044d 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -887,6 +887,10 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
if (IS_ERR(cxlmd))
return PTR_ERR(cxlmd);
+ rc = cxl_memdev_security_state_init(mds);
+ if (rc)
+ return rc;
+
rc = cxl_memdev_setup_fw_upload(mds);
if (rc)
return rc;
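
For anyone following along, the init/shutdown asymmetry discussed above can
be modelled in userspace roughly as follows. This is only a sketch under
stated assumptions, not the driver code: every mock_* name is hypothetical,
the structures are heavily simplified stand-ins, and the type check is
modelled on how to_cxl_memdev_state() appears to return NULL for devices
that are not class-code memory devices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, modelled on the CXL device type split. */
enum mock_devtype { MOCK_DEVTYPE_DEVMEM, MOCK_DEVTYPE_CLASSMEM };

struct mock_dev_state {
	enum mock_devtype type;
};

/* Mailbox-capable state embeds the generic device state. */
struct mock_memdev_state {
	struct mock_dev_state cxlds;
	bool poll;           /* models mds->security.poll */
	bool security_ready; /* set once the security init has run */
};

#define mock_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Returns NULL when the device never set up mailbox state. */
static struct mock_memdev_state *mock_to_mds(struct mock_dev_state *cxlds)
{
	if (cxlds->type != MOCK_DEVTYPE_CLASSMEM)
		return NULL;
	return mock_container_of(cxlds, struct mock_memdev_state, cxlds);
}

/*
 * Init side: takes mds directly and is only called from the
 * mailbox-capable probe path, so a NULL mds here is a caller bug.
 */
static int mock_security_state_init(struct mock_memdev_state *mds)
{
	assert(mds != NULL);
	mds->security_ready = true;
	return 0;
}

/*
 * Shutdown side: runs on unregister for every memdev, including
 * ones without mailbox state, hence the NULL check.
 * Returns 1 if pending poll work was cancelled, 0 otherwise.
 */
static int mock_security_shutdown(struct mock_dev_state *cxlds)
{
	struct mock_memdev_state *mds = mock_to_mds(cxlds);

	if (mds && mds->poll) {
		mds->poll = false;
		return 1;
	}
	return 0;
}
```

In this model a device-memory (type-2 style) mock_dev_state goes through
mock_security_shutdown() without ever touching mailbox state, while the
init helper is only reachable from code that already holds a valid mds.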