Re: [PATCH 2/4] acpi/ghes: Process CXL Component Events

From: Ira Weiny
Date: Fri Mar 01 2024 - 17:05:33 EST


Dan Williams wrote:
> Ira Weiny wrote:
> > BIOS can configure memory devices as firmware first. This will send CXL
> > events to the firmware instead of the OS. The firmware can then send
> > these events to the OS via UEFI. Currently a configuration such as this
> > will trace a non standard event in the log. Using the specific CXL
> > trace points with the additional information CXL can provide is much
^^^^^^^^^^^^^^^^^^^^^^^^
See below.
> > more useful to users.
> > Specifically, future support can be added to CXL to
> > provide the DPA to HPA mapping configured at the time of the event.
>
> One could argue that support should have happened first and taken the
> event all the way to EDAC, so this needs to be merged on faith that
> those patches are in flight.

Sure, but then you also don't get the additional information already
provided by the CXL trace points. Reading this back, I see that my use of
'Specifically' masked the fact that the CXL code already has better
support for decoding these records, and we want to leverage that.

>
> > UEFI v2.10 section N.2.14 defines a Common Platform Error Record (CPER)
> > format for CXL Component Events. The format is mostly the same as the
> > CXL Common Event Record Format. The difference is the use of a GUID in
> > the Section Type rather than a UUID as part of the event itself.
> >
> > Add GHES support to detect CXL CPER records and call into the CXL code
> > to process the event.
> >
> > Multiple methods were considered for the call into the CXL code. A
> > notifier chain was considered for the callback, but the use case did
> > not justify the complexity.
>
> Not sure what this adds. If you want to talk about alternatives and
> tradeoffs, great, but that should be a comparative analysis in support
> of the chosen direction.

I'll remove it. This was carried from the previous versions.

>
> > Furthermore, the CXL code is required to be called from process
> > context as it needs to take a device lock, so a simple callback
> > registration proved difficult. Dan Williams suggested using 2 work items
> > as an atomic way of switching between a callback being registered and
> > not. This allows the callback to run without any locking.[1]
> >
> > Note that a local work item is required to dump any messages seen during
> > a race between any check done in cxl_cper_post_event() and the
> > scheduling of work. That said, no attempt is made to stop the addition
> > of messages into the kfifo because this local work item provides a hook
> > to add a local CXL CPER trace point in a future patch.
> >
> > This new combined patch addresses the report by Dan Carpenter[2]. Thus
> > the Reported-by tag.
> >
> > [1] https://lore.kernel.org/all/65d111eb87115_6c745294ac@xxxxxxxxxxxxxxxxxxxxxxxxx.notmuch/
> > [2] https://lore.kernel.org/all/b963c490-2c13-4b79-bbe7-34c6568423c7@moroto.mountain/
> >
> > Cc: Ard Biesheuvel <ardb@xxxxxxxxxx>
> > Cc: "Rafael J. Wysocki" <rafael@xxxxxxxxxx>
> > Cc: Tony Luck <tony.luck@xxxxxxxxx>
> > Cc: Borislav Petkov <bp@xxxxxxxxx>
> > Reported-by: Dan Carpenter <dan.carpenter@xxxxxxxxxx>
>
> checkpatch will whine about a missing Closes: tag.

I know. How about Suggested-by? I want to give Dan credit because your
revert already closed the actual bug.
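
For reference, what checkpatch is asking for would be the pair:

        Reported-by: Dan Carpenter <dan.carpenter@xxxxxxxxxx>
        Closes: https://lore.kernel.org/all/b963c490-2c13-4b79-bbe7-34c6568423c7@moroto.mountain/

vs. what I have in mind:

        Suggested-by: Dan Carpenter <dan.carpenter@xxxxxxxxxx>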


[snip]

> > +
> > +DEFINE_KFIFO(cxl_cper_fifo, struct cxl_cper_work_data, 32);
> > +static DEFINE_SPINLOCK(cxl_cper_read_lock);
> > +static DEFINE_SPINLOCK(cxl_cper_write_lock);
> > +
> > +static cxl_cper_callback cper_callback;
> > +/* cb function dumps the records */
> > +static void cxl_cper_cb_fn(struct work_struct *work)
> > +{
> > +        struct cxl_cper_work_data wd;
> > +
> > +        while (kfifo_out_spinlocked(&cxl_cper_fifo, &wd, 1,
> > +                                    &cxl_cper_read_lock)) {
>
> Why is this taking the lock on retrieval? The work item is
> single-threaded. The only potential contention is between
> cxl_cper_local_fn() and cxl_cper_cb_fn() collision, but that can be
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Because of this collision, yes.

> handled by a cancel_work_sync(&cxl_local_work) on registration to pair
> with the cancel_work_sync(&cxl_cb_work) on unregistration.

Ok, yeah, I could use cancel_work_sync().
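
With that cancel_work_sync() pairing keeping the two work functions from
running at the same time, the read side would not need the spinlock at all.
Roughly (only a sketch, assuming the pairing is in place on the
register/unregister paths, which are not in this hunk):

static void cxl_cper_cb_fn(struct work_struct *work)
{
        struct cxl_cper_work_data wd;

        /* sole consumer of the fifo at this point; no read lock needed */
        while (kfifo_out(&cxl_cper_fifo, &wd, 1))
                cper_callback(wd.event_type, &wd.rec);
}

and cxl_cper_local_fn() could likewise drop to a plain kfifo_out() (or just
kfifo_reset_out(), as you note below).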

>
> ...but I am not sure 2 work items are needed unless some default
> processing is going to happen in the "local" case.

I thought about merging patch 3 into this one; that would make the use of
this work function clearer.

>
> > +                cper_callback(wd.event_type, &wd.rec);
> > +        }
> > +}
> > +static DECLARE_WORK(cxl_cb_work, cxl_cper_cb_fn);
> > +
> > +static void cxl_cper_local_fn(struct work_struct *work)
> > +{
> > +        struct cxl_cper_work_data wd;
> > +
> > +        while (kfifo_out_spinlocked(&cxl_cper_fifo, &wd, 1,
> > +                                    &cxl_cper_read_lock)) {
>
> This just looks like open coded / less efficient kfifo_reset_out().
>
> > +                /* drop msg */
>
> If the proposal is to do nothing when no callback is registered, then no
> need to have cxl_local_work at all.

Patch 4 makes use of this.

>
> > +        }
> > +}
> > +static DECLARE_WORK(cxl_local_work, cxl_cper_local_fn);
> > +
> > +/* Pointer for atomic switch of record processing */
> > +struct work_struct *cxl_cper_work = &cxl_local_work;
> > +
> > +static void cxl_cper_post_event(enum cxl_event_type event_type,
> > +                                struct cxl_cper_event_rec *rec)
> > +{
> > +        struct cxl_cper_work_data wd;
> > +
> > +        if (rec->hdr.length <= sizeof(rec->hdr) ||
> > +            rec->hdr.length > sizeof(*rec)) {
> > +                pr_err(FW_WARN "CXL CPER Invalid section length (%u)\n",
> > +                       rec->hdr.length);
> > +                return;
> > +        }
> > +
> > +        if (!(rec->hdr.validation_bits & CPER_CXL_COMP_EVENT_LOG_VALID)) {
> > +                pr_err(FW_WARN "CXL CPER invalid event\n");
> > +                return;
> > +        }
> > +
> > +        wd.event_type = event_type;
> > +        memcpy(&wd.rec, rec, sizeof(wd.rec));
>
> Unfortunate to have a double copy of the record into the stack variable
> and then again into the kfifo, but I can not immediately think of a way
> around that.

Yep. I could not either.

>
> > +
> > +        kfifo_in_spinlocked(&cxl_cper_fifo, &wd, 1, &cxl_cper_write_lock);
> > +        schedule_work(cxl_cper_work);
>
> I think you don't need 2 work items if you write it this way:
>
> spin_lock_irqsave(&cxl_cper_write_lock, flags);
> if (cxl_cper_work) {
>         if (kfifo_put(&cxl_cper_fifo, wd))
>                 schedule_work(cxl_cper_work);
>         else
>                 pr_err_ratelimited(
>                         "buffer overflow when queuing CXL CPER record\n");
> }
> spin_unlock_irqrestore(&cxl_cper_write_lock, flags);
>
> ...and then take the write lock when modifying that cxl_cper_work
> pointer between NULL and non-NULL.

I'll merge patch 4 here. Then this will be used.
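
For the register/unregister side of that scheme I am picturing something
like this (only a sketch; those functions are not in this hunk, so the
names and details here are a guess):

int cxl_cper_register_callback(cxl_cper_callback callback)
{
        unsigned long flags;

        if (cper_callback)
                return -EINVAL;

        /* publish the callback before the work pointer becomes visible */
        cper_callback = callback;
        spin_lock_irqsave(&cxl_cper_write_lock, flags);
        cxl_cper_work = &cxl_cb_work;
        spin_unlock_irqrestore(&cxl_cper_write_lock, flags);

        return 0;
}

void cxl_cper_unregister_callback(cxl_cper_callback callback)
{
        unsigned long flags;

        if (callback != cper_callback)
                return;

        spin_lock_irqsave(&cxl_cper_write_lock, flags);
        cxl_cper_work = NULL;
        spin_unlock_irqrestore(&cxl_cper_write_lock, flags);

        /* nothing new can queue the work now; flush anything in flight */
        cancel_work_sync(&cxl_cb_work);
        cper_callback = NULL;
        kfifo_reset_out(&cxl_cper_fifo);
}

with cxl_cper_work becoming a plain pointer that starts out NULL.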

Ira