Re: [PATCH v3 2/6] cxl/mbox: Add GET_POISON_LIST mailbox command

From: Dan Williams
Date: Wed Dec 07 2022 - 16:40:12 EST


Alison Schofield wrote:
> On Tue, Dec 06, 2022 at 06:41:34PM -0800, Dan Williams wrote:
> > alison.schofield@ wrote:
[..]
> > > +int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
> > > + struct cxl_region *cxlr)
> > > +{
> > > + struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > > + const char *memdev_name = dev_name(&cxlmd->dev);
> > > + const char *pcidev_name = dev_name(cxlds->dev);
> > > + struct cxl_mbox_poison_payload_out *po;
> > > + struct cxl_mbox_poison_payload_in pi;
> > > + int nr_records = 0;
> > > + int rc;
> > > +
> > > + po = kvmalloc(cxlds->payload_size, GFP_KERNEL);
> > > + if (!po)
> > > + return -ENOMEM;
> > > +
> > > + pi.offset = cpu_to_le64(offset);
> > > + pi.length = cpu_to_le64(len);
> > > +
> > > + rc = mutex_lock_interruptible(&cxlds->poison_list_mutex);
> >
> > So I do not know what this mutex is protecting if there is an allocation
> > per cxl_mem_get_poison() invocation. Although I suspect that's somewhat
> > wasteful. Just allocate one buffer at the beginning of time and then use
> > the lock to protect that buffer.
>
> Intent was a single lock on the device to protect the state of the
> poison list retrieval - do not allow > 1 reader. With > 1 reader
> software may not know if it retrieved the complete list.
>
> I'm not understanding about protecting a buffer, instead of protecting
> the state. Here I try to protect the state.

Ah, sorry I read cxlds->poison_list_mutex and assumed it was serializing
access to the buffer, not a state machine. I do think it would be
worthwhile to make this a self contained structure with its own kdoc to
explain what is going on, something like:

diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index ab138004f644..02697d1d951c 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -193,6 +193,19 @@ struct cxl_endpoint_dvsec_info {
struct range dvsec_range[2];
};

+/**
+ * struct cxl_poison_state - summary
+ * @payload: ...
+ * @lock: ...
+ *
+ * A bit more description of why state needs to be held over successive
+ * mbox commands...
+ */
+struct cxl_poison_state {
+ void *payload;
+ struct mutex lock;
+};
+
/**
* struct cxl_dev_state - The driver device state
*
@@ -210,6 +223,7 @@ struct cxl_endpoint_dvsec_info {
* @lsa_size: Size of Label Storage Area
* (CXL 2.0 8.2.9.5.1.1 Identify Memory Device)
* @mbox_mutex: Mutex to synchronize mailbox access.
+ * @poison: Poison list retrieval and tracing
* @firmware_version: Firmware version for the memory device.
* @enabled_cmds: Hardware commands found enabled in CEL.
* @exclusive_cmds: Commands that are kernel-internal only
@@ -244,6 +258,7 @@ struct cxl_dev_state {
size_t payload_size;
size_t lsa_size;
struct mutex mbox_mutex; /* Protects device mailbox and firmware */
+ struct cxl_poison_state poison;
char firmware_version[0x10];
DECLARE_BITMAP(enabled_cmds, CXL_MEM_COMMAND_ID_MAX);
DECLARE_BITMAP(exclusive_cmds, CXL_MEM_COMMAND_ID_MAX);