Re: [PATCH V2 02/11] cxl/mem: Implement Get Event Records command

From: Ira Weiny
Date: Fri Dec 02 2022 - 16:47:30 EST


On Thu, Dec 01, 2022 at 05:39:12PM -0800, Dan Williams wrote:
> ira.weiny@ wrote:
> > From: Ira Weiny <ira.weiny@xxxxxxxxx>
> >

[snip]

> >
> > +#define CREATE_TRACE_POINTS
> > +#include <trace/events/cxl.h>
> > +
> > #include "core.h"
> >
> > static bool cxl_raw_allow_all;
> > @@ -48,6 +51,7 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
> > CXL_CMD(RAW, CXL_VARIABLE_PAYLOAD, CXL_VARIABLE_PAYLOAD, 0),
> > #endif
> > CXL_CMD(GET_SUPPORTED_LOGS, 0, CXL_VARIABLE_PAYLOAD, CXL_CMD_FLAG_FORCE_ENABLE),
> > + CXL_CMD(GET_EVENT_RECORD, 1, CXL_VARIABLE_PAYLOAD, 0),
> > CXL_CMD(GET_FW_INFO, 0, 0x50, 0),
> > CXL_CMD(GET_PARTITION_INFO, 0, 0x20, 0),
> > CXL_CMD(GET_LSA, 0x8, CXL_VARIABLE_PAYLOAD, 0),
>
> Similar to this patch:
>
> https://lore.kernel.org/linux-cxl/166993221008.1995348.11651567302609703175.stgit@xxxxxxxxxxxxxxxxxxxxxxxxx/
>
> CXL_MEM_COMMAND_ID_GET_EVENT_RECORD, should be added to the "always
> kernel" / cxlds->exclusive_cmds mask.

Done for all the commands. I'll rebase as well before sending this out.

>
> > @@ -704,6 +708,106 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
> > }
> > EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
> >
> > +static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> > + enum cxl_event_log_type type)
> > +{
> > + struct cxl_get_event_payload *payload;
> > + u16 nr_rec;
> > +
> > + mutex_lock(&cxlds->event_buf_lock);
> > +
> > + payload = cxlds->event_buf;
> > +
> > + do {
> > + u8 log_type = type;
> > + int rc;
> > +
> > + rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_EVENT_RECORD,
> > + &log_type, sizeof(log_type),
> > + payload, cxlds->payload_size);
> > + if (rc) {
> > + dev_err(cxlds->dev, "Event log '%s': Failed to query event records : %d",
> > + cxl_event_log_type_str(type), rc);
> > + goto unlock_buffer;
> > + }
> > +
> > + nr_rec = le16_to_cpu(payload->record_count);
> > + if (trace_cxl_generic_event_enabled()) {
>
> This feels like a premature micro-optimization as none of this code is
> fast path and it is dwarfed by the cost of executing the mailbox
> command. I started with trying to reduce the 80 column collision
> pressure, but then stepped back even further and asked, why?

Because Steven told me to. :-( I should have been smarter than that.

>
> > + int i;
> > +
> > + for (i = 0; i < nr_rec; i++)
> > + trace_cxl_generic_event(dev_name(cxlds->dev),
> > + type,
> > + &payload->records[i]);
>
> As far as I can tell trace_cxl_generic_event() always expects a
> device-name as its first argument. So why not enforce that with
> type-safety? I.e. I think trace_cxl_generic_event() should take a
> "struct device *", not a string unless it is really the case that any
> old string will do as the first argument to the trace event. Otherwise
> the trace point can do "__string(dev_name, dev_name(dev))", and mandate
> that callers pass devices.

>From a trace point view 'any old string' will do. There was nothing else the
trace needed from struct device so I skipped it.

[snip]

> > +
> > +/**
> > + * cxl_mem_get_event_records - Get Event Records from the device
> > + * @cxlds: The device data for the operation
> > + *
> > + * Retrieve all event records available on the device and report them as trace
> > + * events.
> > + *
> > + * See CXL rev 3.0 @8.2.9.2.2 Get Event Records
> > + */
> > +void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
> > +{
> > + u32 status = readl(cxlds->regs.status + CXLDEV_DEV_EVENT_STATUS_OFFSET);
> > +
> > + dev_dbg(cxlds->dev, "Reading event logs: %x\n", status);
> > +
> > + if (!cxlds->event_buf) {
> > + cxlds->event_buf = alloc_event_buf(cxlds);
> > + if (WARN_ON_ONCE(!cxlds->event_buf))
> > + return;
> > + }
>
> What's the point of having an event_buf_lock if event_buf is reallocated
> every call?

This is only called on start up.

>
> Just allocate event_buf once at the beginning of time during init if the
> device supports event log retrieval, and fail the driver load if that
> allocation fails. No runtime WARN() for memory allocation.

It was. I'll make that more clear in the next series.

>
> I notice this patch does not clear events, I trust that comes later in
> the series, but I think it belongs here to make this patch a complete
> standalone thought.

Squashed. But it does make for a large patch. Which I'm not a fan of for
review. Lucky that now we have a lot of review on the parts.

>
> > + if (status & CXLDEV_EVENT_STATUS_INFO)
> > + cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_INFO);
> > + if (status & CXLDEV_EVENT_STATUS_WARN)
> > + cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_WARN);
> > + if (status & CXLDEV_EVENT_STATUS_FAIL)
> > + cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FAIL);
> > + if (status & CXLDEV_EVENT_STATUS_FATAL)
> > + cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FATAL);
>
> This retrieval order should be flipped. If there is a FATAL pending I
> expect a monitor wants that first and would be happy to parse the INFO
> later. I would go so far as to say that if the INFO logger is looping
> and a FATAL comes in the driver should get that out first before going
> back for more INFO logs. That would mean executing Clear Events and
> looping through the logs by priority until all the status bits fall
> silent inside cxl_mem_get_records_log().

I'll flip them. And determine if this is really what we want to do for the
irq.

The issue with the irq handling calling a single function which checks all
status is that we may end up with some odd interrupts doing nothing depending
on racing etc.

A buffer per log would eliminate this to some extent if the message numbers are
not shared. I don't doubt that vendors are unlikely to burn more than 1
message number so the irq may indeed be shared and serialized anyway.

For simplicity I'll throw them all together.

>
> > +}
> > +EXPORT_SYMBOL_NS_GPL(cxl_mem_get_event_records, CXL);
> > +
> > /**
> > * cxl_mem_get_partition_info - Get partition info
> > * @cxlds: The device data for the operation
> > @@ -846,6 +950,7 @@ struct cxl_dev_state *cxl_dev_state_create(struct device *dev)
> > }
> >
> > mutex_init(&cxlds->mbox_mutex);
> > + mutex_init(&cxlds->event_buf_lock);
> > cxlds->dev = dev;
> >
> > return cxlds;
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > index f680450f0b16..d4baae74cd97 100644
> > --- a/drivers/cxl/cxl.h
> > +++ b/drivers/cxl/cxl.h
> > @@ -132,6 +132,13 @@ static inline int ways_to_cxl(unsigned int ways, u8 *iw)
> > #define CXLDEV_CAP_CAP_ID_SECONDARY_MAILBOX 0x3
> > #define CXLDEV_CAP_CAP_ID_MEMDEV 0x4000
> >
> > +/* CXL 3.0 8.2.8.3.1 Event Status Register */
> > +#define CXLDEV_DEV_EVENT_STATUS_OFFSET 0x00
> > +#define CXLDEV_EVENT_STATUS_INFO BIT(0)
> > +#define CXLDEV_EVENT_STATUS_WARN BIT(1)
> > +#define CXLDEV_EVENT_STATUS_FAIL BIT(2)
> > +#define CXLDEV_EVENT_STATUS_FATAL BIT(3)
> > +
> > /* CXL 2.0 8.2.8.4 Mailbox Registers */
> > #define CXLDEV_MBOX_CAPS_OFFSET 0x00
> > #define CXLDEV_MBOX_CAP_PAYLOAD_SIZE_MASK GENMASK(4, 0)
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index cd35f43fedd4..55d57f5a64bc 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -4,6 +4,7 @@
> > #define __CXL_MEM_H__
> > #include <uapi/linux/cxl_mem.h>
> > #include <linux/cdev.h>
> > +#include <linux/uuid.h>
> > #include "cxl.h"
> >
> > /* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
> > @@ -250,12 +251,16 @@ struct cxl_dev_state {
> >
> > bool msi_enabled;
> >
> > + struct cxl_get_event_payload *event_buf;
> > + struct mutex event_buf_lock;
> > +
>
> Missing kdoc.

Got it from Jonathan.

>
> > int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
> > };
> >
> > enum cxl_opcode {
> > CXL_MBOX_OP_INVALID = 0x0000,
> > CXL_MBOX_OP_RAW = CXL_MBOX_OP_INVALID,
> > + CXL_MBOX_OP_GET_EVENT_RECORD = 0x0100,
> > CXL_MBOX_OP_GET_FW_INFO = 0x0200,
> > CXL_MBOX_OP_ACTIVATE_FW = 0x0202,
> > CXL_MBOX_OP_GET_SUPPORTED_LOGS = 0x0400,
> > @@ -325,6 +330,72 @@ struct cxl_mbox_identify {
> > u8 qos_telemetry_caps;
> > } __packed;
> >
> > +/*
> > + * Common Event Record Format
> > + * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
> > + */
> > +struct cxl_event_record_hdr {
> > + uuid_t id;
> > + u8 length;
> > + u8 flags[3];
> > + __le16 handle;
> > + __le16 related_handle;
> > + __le64 timestamp;
> > + u8 maint_op_class;
> > + u8 reserved[0xf];
>
> Nit, but lets not copy the spec here and just make all the field sizes
> decimal. It even saves a character to write 15 instead of 0xf, and @flags
> is also decimal.

Ok.

>
> > +} __packed;
> > +
> > +#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
> > +struct cxl_event_record_raw {
> > + struct cxl_event_record_hdr hdr;
> > + u8 data[CXL_EVENT_RECORD_DATA_LENGTH];
> > +} __packed;
> > +
> > +/*
> > + * Get Event Records output payload
> > + * CXL rev 3.0 section 8.2.9.2.2; Table 8-50
> > + */
> > +#define CXL_GET_EVENT_FLAG_OVERFLOW BIT(0)
> > +#define CXL_GET_EVENT_FLAG_MORE_RECORDS BIT(1)
> > +struct cxl_get_event_payload {
> > + u8 flags;
> > + u8 reserved1;
> > + __le16 overflow_err_count;
> > + __le64 first_overflow_timestamp;
> > + __le64 last_overflow_timestamp;
> > + __le16 record_count;
> > + u8 reserved2[0xa];
>
> Same nit.

Done.

[snip]

> > +
> > +/* This part must be outside protection */
> > +#undef TRACE_INCLUDE_FILE
> > +#define TRACE_INCLUDE_FILE cxl
> > +#include <trace/define_trace.h>
> > diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
> > index c71021a2a9ed..70459be5bdd4 100644
> > --- a/include/uapi/linux/cxl_mem.h
> > +++ b/include/uapi/linux/cxl_mem.h
> > @@ -24,6 +24,7 @@
> > ___C(IDENTIFY, "Identify Command"), \
> > ___C(RAW, "Raw device command"), \
> > ___C(GET_SUPPORTED_LOGS, "Get Supported Logs"), \
> > + ___C(GET_EVENT_RECORD, "Get Event Record"), \
> > ___C(GET_FW_INFO, "Get FW Info"), \
> > ___C(GET_PARTITION_INFO, "Get Partition Information"), \
> > ___C(GET_LSA, "Get Label Storage Area"), \
>
> Yikes, no, this is an enum. New commands need to come at the end
> otherwise different kernels will disagree about the command numbering.

Ooops sorry about that. I somehow thought these were tied to the command
values.

Thanks, Changed for all the commands,

> Likely needs a comment here alerting future devs about the ABI breakage
> danger here.

Additional patch added.

Ira