RE: [PATCH] cxl/memdev: Avoid mailbox functionality on device memory CXL devices

From: Dan Williams
Date: Fri Jul 28 2023 - 19:59:06 EST


Ira Weiny wrote:
> Using the proposed type-2 cxl-test device[1] the following
> splat was observed:
>
> BUG: kernel NULL pointer dereference, address: 0000000000000278
> [...]
> RIP: 0010:devm_cxl_add_memdev+0x1de/0x2c0 [cxl_core]

It would be useful to decode this to a line number; the rest of this
call trace is not adding much.

> [...]
> Call Trace:
> <TASK>
> ? __die+0x1f/0x70
> ? page_fault_oops+0x149/0x420
> ? fixup_exception+0x22/0x310
> ? kernelmode_fixup_or_oops+0x84/0x110
> ? exc_page_fault+0x6d/0x150
> ? asm_exc_page_fault+0x22/0x30
> ? devm_cxl_add_memdev+0x1de/0x2c0 [cxl_core]
> cxl_mock_mem_probe+0x632/0x870 [cxl_mock_mem]
> platform_probe+0x40/0x90
> really_probe+0x19e/0x3e0
> ? __pfx___driver_attach+0x10/0x10
> __driver_probe_device+0x78/0x160
> driver_probe_device+0x1f/0x90
> __driver_attach+0xce/0x1c0
> bus_for_each_dev+0x63/0xa0
> bus_add_driver+0x112/0x210
> driver_register+0x55/0x100
> ? __pfx_cxl_mock_mem_driver_init+0x10/0x10 [cxl_mock_mem]
> [...]
>
> Commit f6b8ab32e3ec made the mailbox functionality optional. However,
> some mailbox functionality was merged after that patch. Therefore some
> mailbox functionality can be accessed on a device which did not set up
> the mailbox.

cxl_memdev_security_init() definitely needs to move out of
devm_cxl_add_memdev(), and after that I do not think @mds NULL checks
need to be sprinkled everywhere. In other words, something is wrong at a
higher level if we get into some of these helper functions without the
memory device state.
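
A minimal userspace sketch of that layering, not the actual cxl_core
code: every type and function name below is a hypothetical stand-in
invented for illustration. It assumes the probe path is the one place
that knows whether the device has mailbox-backed state, so the generic
add path stays mailbox-free and the security helper can treat a missing
@mds as a caller bug rather than checking for it at every entry point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real cxl_core structures. */
struct memdev_state {
	int security_ready;		/* set once security init has run */
};

struct dev_state {
	struct memdev_state *mds;	/* NULL: no mailbox (e.g. type-2 device) */
};

struct memdev {
	struct dev_state *cxlds;
};

/* Generic registration: touches nothing that depends on the mailbox. */
static int add_memdev(struct memdev *cxlmd, struct dev_state *cxlds)
{
	cxlmd->cxlds = cxlds;
	return 0;
}

/*
 * Mailbox-only helper. It does not handle a NULL @mds gracefully;
 * reaching it without one means a higher-level caller got it wrong.
 */
static int security_init(struct memdev_state *mds)
{
	assert(mds != NULL);
	mds->security_ready = 1;
	return 0;
}

/* The probe path decides once whether mailbox features apply. */
static int probe(struct memdev *cxlmd, struct dev_state *cxlds)
{
	int rc = add_memdev(cxlmd, cxlds);

	if (rc)
		return rc;
	if (cxlds->mds)
		return security_init(cxlds->mds);
	return 0;
}
```

With this shape a device without a mailbox goes through add_memdev()
untouched, and the single @mds test lives in probe() instead of in each
helper.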

So this definitely uncovered a problem where cxl_memdev_security_init()
needs to move, but the rest of the mds NULL checks need clear
reproduction scenarios, and I expect most of them are precluded higher
in the call stack.