Re: [PATCH 04/14] cxl/mem: Implement polled mode mailbox

From: Dan Williams
Date: Mon Feb 01 2021 - 14:28:58 EST


On Mon, Feb 1, 2021 at 11:13 AM Ben Widawsky <ben.widawsky@xxxxxxxxx> wrote:
>
> On 21-02-01 12:54:00, Konrad Rzeszutek Wilk wrote:
> > > +#define cxl_doorbell_busy(cxlm) \
> > > + (cxl_read_mbox_reg32(cxlm, CXLDEV_MB_CTRL_OFFSET) & \
> > > + CXLDEV_MB_CTRL_DOORBELL)
> > > +
> > > +#define CXL_MAILBOX_TIMEOUT_US 2000
> >
> > You've been using the spec for the values. Is that number also from it?
> >
>
> Yes it is. I'll add a comment with the spec reference.
>
> > > +
> > > +enum opcode {
> > > + CXL_MBOX_OP_IDENTIFY = 0x4000,
> > > + CXL_MBOX_OP_MAX = 0x10000
> > > +};
> > > +
> > > +/**
> > > + * struct mbox_cmd - A command to be submitted to hardware.
> > > + * @opcode: (input) The command set and command submitted to hardware.
> > > + * @payload_in: (input) Pointer to the input payload.
> > > + * @payload_out: (output) Pointer to the output payload. Must be allocated by
> > > + * the caller.
> > > + * @size_in: (input) Number of bytes to load from @payload_in.
> > > + * @size_out: (output) Number of bytes loaded into @payload_out.
> > > + * @return_code: (output) Error code returned from hardware.
> > > + *
> > > + * This is the primary mechanism used to send commands to the hardware.
> > > + * All the fields except @payload_* correspond exactly to the fields described in
> > > + * the Command Register section of the CXL 2.0 spec (8.2.8.4.5). @payload_in is
> > > + * written to, and @payload_out is read from, the Command Payload Registers
> > > + * (8.2.8.4.8).
> > > + */
> > > +struct mbox_cmd {
> > > + u16 opcode;
> > > + void *payload_in;
> > > + void *payload_out;
> >
> > On a 32-bit OS (not that we use those much anymore, but let's assume
> > someone really wants to), a void pointer is 4 bytes, while on 64-bit it is
> > 8 bytes.
> >
> > `pahole` is your friend, as I think there is a gap between opcode and
> > payload_in in the structure.
> >
> > > + size_t size_in;
> > > + size_t size_out;
> >
> > And those can also change depending on 32-bit/64-bit.
> >
> > > + u16 return_code;
> > > +#define CXL_MBOX_SUCCESS 0
> > > +};
> >
> > Do you want to use __packed to match the spec?
> >
> > Ah, reading further, I see you don't care about it.
> >
> > In that case, may I recommend moving 'return_code' (or perhaps just calling
> > it rc?) right after opcode? That leaves fewer gaps in the structure.
> >
>
> I guess I hadn't realized we're supposed to try to fully pack structs by
> default.

This is just the internal parsed context of a command; I can't imagine
packing is relevant here. pahole optimization feels premature as well.
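For what it's worth, here is a userspace sketch of the padding in question. It assumes uint16_t stands in for the kernel's u16, and the names mbox_cmd_posted and mbox_cmd_reordered are hypothetical: the first keeps the field order from the patch, the second moves return_code next to opcode as suggested above. On LP64 targets such as x86-64 that is 48 versus 40 bytes; on ILP32 it is 24 versus 20. Either way the saving is one pointer's worth of bytes in a short-lived, per-command object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field order as posted in the patch (u16 spelled uint16_t for userspace). */
struct mbox_cmd_posted {
	uint16_t opcode;
	void *payload_in;      /* padded up to pointer alignment after opcode */
	void *payload_out;
	size_t size_in;
	size_t size_out;
	uint16_t return_code;  /* followed by tail padding */
};

/* Hypothetical reordering from the review: return_code right after opcode. */
struct mbox_cmd_reordered {
	uint16_t opcode;
	uint16_t return_code;  /* occupies bytes that were padding before */
	void *payload_in;
	void *payload_out;
	size_t size_in;
	size_t size_out;
};

/* Holds on both ILP32 (24 vs 20 bytes) and LP64 (48 vs 40 bytes) targets. */
static_assert(sizeof(struct mbox_cmd_reordered) < sizeof(struct mbox_cmd_posted),
	      "moving return_code next to opcode removes padding");
```

Running pahole on the built object would report the same holes, but the compile-time check keeps the sketch self-contained.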

>
> > > +
> > > +static int cxl_mem_wait_for_doorbell(struct cxl_mem *cxlm)
> > > +{
> > > + const int timeout = msecs_to_jiffies(CXL_MAILBOX_TIMEOUT_US);
> > > + const unsigned long start = jiffies;
> > > + unsigned long end = start;
> > > +
> > > + while (cxl_doorbell_busy(cxlm)) {
> > > + end = jiffies;
> > > +
> > > + if (time_after(end, start + timeout)) {
> > > + /* Check again in case preempted before timeout test */
> > > + if (!cxl_doorbell_busy(cxlm))
> > > + break;
> > > + return -ETIMEDOUT;
> > > + }
> > > + cpu_relax();
> > > + }
> >
> > Hm, that is not very scheduler-friendly. I mean, we are sitting here for
> > 2000us (2 ms); that is quite a long time to spin.
> >
> > Should this perhaps be put in a workqueue?
>
> So let me first point you to the friendlier version, which was shot down:
> https://lore.kernel.org/linux-cxl/20201111054356.793390-8-ben.widawsky@xxxxxxxxx/
>
> I'm not opposed to this being moved to a workqueue at some point, but I think
> that's unnecessary complexity for now. In practice, commands are expected to
> finish much sooner than this or to be implemented as background commands.
> I've heard someone who makes a lot of the spec decisions say, "if it's 2
> seconds, nobody will use these things".

That said, asynchronous probe needs to be enabled for the next driver update.
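The polled wait quoted above can be modeled outside the kernel. This is a hedged userspace sketch, not the driver code: clock_gettime(CLOCK_MONOTONIC) plays the role of jiffies, and busy_fn, wait_for_doorbell(), countdown_busy() and stuck_busy() are hypothetical stand-ins for cxl_doorbell_busy() and the device. The sketch keeps the timeout explicitly in milliseconds; note that the patch as posted names its constant CXL_MAILBOX_TIMEOUT_US but passes it to msecs_to_jiffies(), which takes milliseconds, so the posted loop actually waits up to 2000 ms.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for cxl_doorbell_busy(): returns true while busy. */
typedef bool (*busy_fn)(void *ctx);

/* Monotonic milliseconds; plays the role of jiffies in this sketch. */
static long long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Spin until busy() clears or timeout_ms (milliseconds, stated explicitly)
 * has elapsed. Returns 0 once the doorbell clears, -ETIMEDOUT otherwise.
 */
static int wait_for_doorbell(busy_fn busy, void *ctx, long long timeout_ms)
{
	const long long deadline = now_ms() + timeout_ms;

	assert(timeout_ms >= 0);
	while (busy(ctx)) {
		if (now_ms() > deadline) {
			/* We may have been preempted between the read and the
			 * time check, so sample the doorbell one last time. */
			return busy(ctx) ? -ETIMEDOUT : 0;
		}
		/* The kernel version calls cpu_relax() here. */
	}
	return 0;
}

/* Simulated device: busy for the first *remaining reads, then idle. */
static bool countdown_busy(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return true;
	}
	return false;
}

/* Simulated hung device: the doorbell never clears. */
static bool stuck_busy(void *ctx)
{
	(void)ctx;
	return true;
}
```

A caller passes its register-read helper as busy, e.g. wait_for_doorbell(countdown_busy, &reads, 2000). The final re-sample mirrors the "Check again in case preempted" branch in the patch, and a worst case of seconds of spinning per command is part of why probe should not block on it.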