Re: [PATCH V4 2/9] cxl/mem: Read, trace, and clear events on driver load

From: Ira Weiny
Date: Wed Jan 04 2023 - 18:53:31 EST


On Sun, Dec 18, 2022 at 03:55:53PM +0000, Jonathan Cameron wrote:
> On Sun, 18 Dec 2022 08:25:34 +0800
> johnny <johnny.li@xxxxxxxxxxxxxxxx> wrote:
>

[snip]

> > >
> > > > > + }
> > > > > +
> > > > > + mbox_cmd = (struct cxl_mbox_cmd) {
> > > > > + .opcode = CXL_MBOX_OP_CLEAR_EVENT_RECORD,
> > > > > + .payload_in = &payload,
> > > > > + .size_in = pl_size,
> > > >
> > > > This payload size should be whatever we need to store the records,
> > > > not the max size possible. Particularly as that size is currently
> > > > bigger than the mailbox might be.
> > >
> > > But the above check and set ensures that does not happen.
> > >
> > > >
> > > > It shouldn't fail (I think) simply because a later version of the spec might
> > > > add more to this message and things should still work, but definitely not
> > > > good practice to tell the hardware this is much longer than it actually is.
> > >
> > > I don't follow.
> > >
> > > The full payload is going to be sent even if we are just clearing 1 record
> > > which is inefficient but it should never overflow the hardware because it is
> > > limited by the check above.
> > >
> > > So why would this be a problem?
> > >
> >
> > Per the CXL 3.0 spec, the Event Record Handles field is "A list of Event
> > Record Handles the host has consumed and the device shall now remove from its
> > internal Event Log store.". An extra list of unused handles does not follow
> > that description. The spec also says "All event record handles shall be
> > nonzero value. A value of 0 shall be treated by the device as an invalid
> > handle.". So if any of the extra unused handles is 0, the device shall return
> > the invalid handle error code.
>
> I don't think we call into that particular corner as the number of event
> record handles is set correctly. Otherwise I agree this isn't following the
> spec - though I think key here is that it won't be broken against CXL 3.0 devices
> (with that rather roundabout argument that a CXL 3.0 device should handle later
> spec messages as those should be backwards compatible) but it might be broken
> against CXL 3.0+ ones that interpret the 0s at the end as having meaning.

I'm respinning this to add the pci_set_master() anyway, so I'm going to change
this as well. I really don't see how hardware would go off anything but the
number of records to process the handles. I could, however, see some overly
strict firmware validating that the size exactly matches the number of records
specified, rather than just checking that the records fit within it. That is
where I would anticipate an issue.
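
For illustration, here is a rough sketch of sizing the payload from the record
count instead of sending the maximum. It is not the code from the respin. The
struct layout, the 16-bit handle width, and all names below are assumptions
made for the example, and the two check functions only model how lenient or
strict firmware might behave.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the Clear Event Records input payload.
 * Field names and layout are assumptions for this sketch only.
 */
struct clear_event_payload {
	uint8_t  event_log;
	uint8_t  clear_flags;
	uint8_t  nr_recs;      /* number of valid entries in handles[] */
	uint8_t  reserved[3];
	uint16_t handles[];    /* only nr_recs entries are meaningful */
};

/* Bytes needed for the header plus exactly nr_recs handles. */
static size_t clear_event_payload_size(uint8_t nr_recs)
{
	return offsetof(struct clear_event_payload, handles) +
	       (size_t)nr_recs * sizeof(uint16_t);
}

/* Model of lenient firmware: the handles only have to fit in the payload. */
static int lenient_size_ok(size_t size_in, uint8_t nr_recs)
{
	return size_in >= clear_event_payload_size(nr_recs);
}

/* Model of overly strict firmware: the size must match the count exactly. */
static int strict_size_ok(size_t size_in, uint8_t nr_recs)
{
	return size_in == clear_event_payload_size(nr_recs);
}
```

With this, .size_in would be set to clear_event_payload_size(nr_recs) rather
than the full buffer size, which satisfies both models. Sending the maximum
size with a small nr_recs passes only the lenient check.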

Dan has agreed to land the patch I need, which moves the trace point definition
to drivers/cxl, in cxl/next. After that I will rebase and send out.

Ira

>
> Thanks,
>
> Jonathan
>