Re: [PATCH 09/11] cxl/test: Add generic mock events

From: Ira Weiny
Date: Tue Nov 29 2022 - 13:29:57 EST


On Wed, Nov 16, 2022 at 04:00:53PM +0000, Jonathan Cameron wrote:
> On Thu, 10 Nov 2022 10:57:56 -0800
> ira.weiny@xxxxxxxxx wrote:
>
> > From: Ira Weiny <ira.weiny@xxxxxxxxx>
> >
> > Facilitate testing basic Get/Clear Event functionality by creating
> > multiple logs and generic events with made up UUID's.
> >
> > Data is completely made up with data patterns which should be easy to
> > spot in trace output.
> >
> > A single sysfs entry resets the event data and triggers collecting the
> > events for testing.
> >
> > Test traces are easy to obtain with a small script such as this:
> >
> > #!/bin/bash -x
> >
> > devices=`find /sys/devices/platform -name cxl_mem*`
> >
> > # Turn on tracing
> > echo "" > /sys/kernel/tracing/trace
> > echo 1 > /sys/kernel/tracing/events/cxl/enable
> > echo 1 > /sys/kernel/tracing/tracing_on
> >
> > # Generate fake interrupt
> > for device in $devices; do
> > echo 1 > $device/event_trigger
> > done
> >
> > # Turn off tracing and report events
> > echo 0 > /sys/kernel/tracing/tracing_on
> > cat /sys/kernel/tracing/trace
> >
> > Signed-off-by: Ira Weiny <ira.weiny@xxxxxxxxx>
> Hi Ira,
>
> I don't think your mocked device is now obeying the spec
> after changes in the core code that mean it gets a larger
> request than previously.
> If it has more than 1 record and the read is for 3 it
> must return more than 1 and only set MORE_RECORDS if there
> are more than 3.

Based on our other conversations I think it is fine if this test only returns 1
record at a time. As long as it sets Event Record Count to 1 and More Event
Records appropriately. The Clear can loop through the number of handles
specified (which will be 1 in this case but it is not hard to do this).

>
> Gah. The more event records approach also suffers the
> same problem that poison list does. You have no way
> to be sure that "previous software" (which may have crashed)
> hasn't already read some. So in the core code we probably
> need to do one more read on initial boot to be sure we have
> all the records. Not sure how I spotted that for poison
> but never noticed it for these. At least for these records
> the expectation is that there won't be a huge number of them
> so reading one more time is fine - particularly as you clear
> them on that initial read so the list will get shorter.

I can throw in another round of reads on start up. Actually to be 100% sure we
get all the events cleared the initial read of events needs to be _after_ the
irq setup. Because irqs are only triggered when the event logs go from zero
to non-zero events. :-(

Thanks for making me think though this more.
Ira

>
> Jonathan
>
> >
> > ---
> > Changes from RFC v2:
> > Adjust to simulate the event status register
> >
> > Changes from RFC:
> > Separate out the event code
> > Adjust for struct changes.
> > Clean up devm_cxl_mock_event_logs()
> > Clean up naming and comments
> > Jonathan
> > Remove dynamic allocation of event logs
> > Clean up comment
> > Remove unneeded xarray
> > Ensure event_trigger sysfs is valid prior to the driver
> > going active.
> > Dan
> > Remove the fill/reset event sysfs as these operations
> > can be done together
> > ---
> > drivers/cxl/core/mbox.c | 31 +++--
> > drivers/cxl/cxlmem.h | 1 +
> > tools/testing/cxl/test/Kbuild | 2 +-
> > tools/testing/cxl/test/events.c | 222 ++++++++++++++++++++++++++++++++
> > tools/testing/cxl/test/events.h | 9 ++
> > tools/testing/cxl/test/mem.c | 35 ++++-
> > 6 files changed, 286 insertions(+), 14 deletions(-)
> > create mode 100644 tools/testing/cxl/test/events.c
> > create mode 100644 tools/testing/cxl/test/events.h
>
>
> > diff --git a/tools/testing/cxl/test/events.c b/tools/testing/cxl/test/events.c
> > new file mode 100644
> > index 000000000000..a4816f230bb5
> > --- /dev/null
> > +++ b/tools/testing/cxl/test/events.c
>
>
> xl_event_record_raw *events[CXL_TEST_EVENT_CNT_MAX];
> > +};
> > +
> > +struct mock_event_store {
> > + struct cxl_dev_state *cxlds;
> > + struct mock_event_log mock_logs[CXL_EVENT_TYPE_MAX];
> > + u32 ev_status;
> > +};
> > +
> > +DEFINE_XARRAY(mock_dev_event_store);
>
> Perhaps add a comment on what this xarray is for.
> I think it's all to allow associating some extra data with the devices
> without bloating structures outside of tests?
>
> > +
> > +struct mock_event_log *find_event_log(struct device *dev, int log_type)
> > +{
> > + struct mock_event_store *mes = xa_load(&mock_dev_event_store,
> > + (unsigned long)dev);
> > +
> > + if (!mes || log_type >= CXL_EVENT_TYPE_MAX)
> > + return NULL;
> > + return &mes->mock_logs[log_type];
> > +}
> > +
>
> > +
> > +int mock_get_event(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
> > +{
> > + struct cxl_get_event_payload *pl;
> > + struct mock_event_log *log;
> > + u8 log_type;
> > +
> > + /* Valid request? */
> > + if (cmd->size_in != sizeof(log_type))
> > + return -EINVAL;
> > +
> > + log_type = *((u8 *)cmd->payload_in);
> > + if (log_type >= CXL_EVENT_TYPE_MAX)
> > + return -EINVAL;
> > +
> > + log = find_event_log(cxlds->dev, log_type);
> > + if (!log || log_empty(log))
> > + goto no_data;
> > +
> > + pl = cmd->payload_out;
> > + memset(pl, 0, sizeof(*pl));
> > +
> > + pl->record_count = cpu_to_le16(1);
>
> Not valid. Kernel now requests 3 and as I read the spec we have
> to return 3 if we have 3 or more to return. Can't send 1 and set
> MORE_RECORDS as done here.
>
> > +
> > + if (log_rec_left(log) > 1)
> > + pl->flags |= CXL_GET_EVENT_FLAG_MORE_RECORDS;
> > +
> > + memcpy(&pl->record[0], get_cur_event(log), sizeof(pl->record[0]));
> > + pl->record[0].hdr.handle = get_cur_event_handle(log);
> > + return 0;
> > +
> > +no_data:
> > + /* Room for header? */
>
> Why check for space here, but not when setting records above?
>
> > + if (cmd->size_out < (sizeof(*pl) - sizeof(pl->record[0])))
> > + return -EINVAL;
> > +
> > + memset(cmd->payload_out, 0, cmd->size_out);
> > + return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(mock_get_event);
> > +
> > +/*
> > + * Get and clear event only handle 1 record at a time as this is what is
> > + * currently implemented in the main code.
> > + */
> > +int mock_clear_event(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
> > +{
> > + struct cxl_mbox_clear_event_payload *pl = cmd->payload_in;
> > + struct mock_event_log *log;
> > + u8 log_type = pl->event_log;
> > +
> > + /* Don't handle more than 1 record at a time */
> > + if (pl->nr_recs != 1)
> > + return -EINVAL;
> > +
> > + if (log_type >= CXL_EVENT_TYPE_MAX)
> > + return -EINVAL;
> > +
> > + log = find_event_log(cxlds->dev, log_type);
> > + if (!log)
> > + return 0; /* No mock data in this log */
> > +
> > + /*
> > + * Test code only reported 1 event at a time. So only support 1 event
> > + * being cleared.
> > + */
> > + if (log->cur_event != le16_to_cpu(pl->handle[0])) {
> > + dev_err(cxlds->dev, "Clearing events out of order\n");
> > + return -EINVAL;
> > + }
> > +
> > + log->cur_event++;
> > + return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(mock_clear_event);
>
> ...
>
> > +
> > +struct cxl_event_record_raw maint_needed = {
> > + .hdr = {
> > + .id = UUID_INIT(0xDEADBEEF, 0xCAFE, 0xBABE,
> > + 0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
> > + .length = sizeof(struct cxl_event_record_raw),
> > + .flags[0] = CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,
> > + /* .handle = Set dynamically */
>
> Multiple devices... So this should be const and a copy made for each one
> to avoid races.
>
> > + .related_handle = cpu_to_le16(0xa5b6),
> > + },
> > + .data = { 0xDE, 0xAD, 0xBE, 0xEF },
> > +};
> > +
> > +struct cxl_event_record_raw hardware_replace = {
> > + .hdr = {
> > + .id = UUID_INIT(0xBABECAFE, 0xBEEF, 0xDEAD,
> > + 0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
> > + .length = sizeof(struct cxl_event_record_raw),
> > + .flags[0] = CXL_EVENT_RECORD_FLAG_HW_REPLACE,
> > + /* .handle = Set dynamically */
> > + .related_handle = cpu_to_le16(0xb6a5),
> > + },
> > + .data = { 0xDE, 0xAD, 0xBE, 0xEF },
> > +};
> > +
> > +u32 cxl_mock_add_event_logs(struct cxl_dev_state *cxlds)
> > +{
> > + struct device *dev = cxlds->dev;
> > + struct mock_event_store *mes;
> > +
> > + mes = devm_kzalloc(dev, sizeof(*mes), GFP_KERNEL);
> > + if (WARN_ON(!mes))
> > + return 0;
> > + mes->cxlds = cxlds;
> > +
> > + if (xa_insert(&mock_dev_event_store, (unsigned long)dev, mes,
> > + GFP_KERNEL)) {
> > + dev_err(dev, "Event store not available for %s\n",
> > + dev_name(dev));
> > + return 0;
> > + }
> > +
> > + event_store_add_event(mes, CXL_EVENT_TYPE_INFO, &maint_needed);
> > + mes->ev_status |= CXLDEV_EVENT_STATUS_INFO;
> > +
> > + event_store_add_event(mes, CXL_EVENT_TYPE_FATAL, &hardware_replace);
> > + mes->ev_status |= CXLDEV_EVENT_STATUS_FATAL;
> > +
> > + return mes->ev_status;
> > +}
> > +EXPORT_SYMBOL_GPL(cxl_mock_add_event_logs);
> > +
> > +void cxl_mock_remove_event_logs(struct device *dev)
> > +{
> > + struct mock_event_store *mes;
> > +
> > + mes = xa_erase(&mock_dev_event_store, (unsigned long)dev);
> > +}
> > +EXPORT_SYMBOL_GPL(cxl_mock_remove_event_logs);
>
> > diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> > index e2f5445d24ff..333fa8527a07 100644
> > --- a/tools/testing/cxl/test/mem.c
> > +++ b/tools/testing/cxl/test/mem.c
> ...
>
>
> > static int cxl_mock_mem_probe(struct platform_device *pdev)
> > {
> > struct device *dev = &pdev->dev;
> > struct cxl_memdev *cxlmd;
> > struct cxl_dev_state *cxlds;
> > + u32 ev_status;
> > void *lsa;
> > int rc;
> >
> > @@ -281,11 +304,13 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
> > if (rc)
> > return rc;
> >
> > + ev_status = cxl_mock_add_event_logs(cxlds);
> For below comment, add a devm_add_action_or_reset() here to
> undo this. If nothing else, without one you should have error
> handling...
>
> > +
> > cxlmd = devm_cxl_add_memdev(cxlds);
> > if (IS_ERR(cxlmd))
> > return PTR_ERR(cxlmd);
> >
> > - cxl_mem_get_event_records(cxlds);
> > + __cxl_mem_get_event_records(cxlds, ev_status);
> >
> > if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
> > rc = devm_cxl_add_nvdimm(dev, cxlmd);
> > @@ -293,6 +318,12 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
> > return 0;
> > }
> >
> > +static int cxl_mock_mem_remove(struct platform_device *pdev)
> > +{
> > + cxl_mock_remove_event_logs(&pdev->dev);
>
> Given you have a bunch of devm above, probably better to just use
> a devm_add_action_or_reset() to clean this up.
> Saves on introducing remove for just this one call + any potential
> ordering issues (I'm too lazy to check if there are any ;)
>
>
> > + return 0;
> > +}
> > +
> > static const struct platform_device_id cxl_mock_mem_ids[] = {
> > { .name = "cxl_mem", },
> > { },
> > @@ -301,9 +332,11 @@ MODULE_DEVICE_TABLE(platform, cxl_mock_mem_ids);
> >
> > static struct platform_driver cxl_mock_mem_driver = {
> > .probe = cxl_mock_mem_probe,
> > + .remove = cxl_mock_mem_remove,
> > .id_table = cxl_mock_mem_ids,
> > .driver = {
> > .name = KBUILD_MODNAME,
> > + .dev_groups = cxl_mock_event_groups,
> > },
> > };
> >
>