Re: [PATCH 1/2] cxl/pci: Add generic MSI-X/MSI irq support

From: Dan Williams
Date: Sun Oct 23 2022 - 22:09:11 EST


Ira Weiny wrote:
> On Sat, Oct 22, 2022 at 03:05:45PM -0700, Dan Williams wrote:
> > Davidlohr Bueso wrote:
> > > Introduce a generic irq table for CXL components/features that can have
> > > standard irq support - DOE requires dynamic vector sizing and is not
> > > considered here. For now the table is empty.
> > >
> > > Create an infrastructure to query the max vectors required for the CXL
> > > device. Upon successful allocation, users can plug in their respective isr
> > > at any point thereafter, which is supported by a new cxlds->has_irq flag,
> > > for example, if the irq setup is not done in the PCI driver, such as
> > > the case of the CXL-PMU.
> > >
> > > Reviewed-by: Dave Jiang <dave.jiang@xxxxxxxxx>
> > > Signed-off-by: Davidlohr Bueso <dave@xxxxxxxxxxxx>
> > > ---
> > > drivers/cxl/cxlmem.h | 3 ++
> > > drivers/cxl/pci.c | 72 ++++++++++++++++++++++++++++++++++++++++++++
> > > 2 files changed, 75 insertions(+)
> > >
> > > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > > index 88e3a8e54b6a..72b69b003302 100644
> > > --- a/drivers/cxl/cxlmem.h
> > > +++ b/drivers/cxl/cxlmem.h
> > > @@ -211,6 +211,7 @@ struct cxl_endpoint_dvsec_info {
> > > * @info: Cached DVSEC information about the device.
> > > * @serial: PCIe Device Serial Number
> > > * @doe_mbs: PCI DOE mailbox array
> > > + * @has_irq: PCIe MSI-X/MSI support
> > > * @mbox_send: @dev specific transport for transmitting mailbox commands
> > > *
> > > * See section 8.2.9.5.2 Capacity Configuration and Label Storage for
> > > @@ -247,6 +248,8 @@ struct cxl_dev_state {
> > >
> > > struct xarray doe_mbs;
> > >
> > > + bool has_irq;
> > > +
> > > int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
> > > };
> > >
> > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > > index faeb5d9d7a7a..9c3e95ebaa26 100644
> > > --- a/drivers/cxl/pci.c
> > > +++ b/drivers/cxl/pci.c
> > > @@ -428,6 +428,73 @@ static void devm_cxl_pci_create_doe(struct cxl_dev_state *cxlds)
> > > }
> > > }
> > >
> > > +/**
> > > + * struct cxl_irq_cap - CXL feature that is capable of receiving MSI-X/MSI irqs.
> > > + *
> > > + * @name: Name of the device/component generating this interrupt.
> > > + * @get_max_msgnum: Get the feature's largest interrupt message number. If the
> > > + * feature does not have the Interrupt Supported bit set, then
> > > + * return -1.
> > > + */
> > > +struct cxl_irq_cap {
> > > + const char *name;
> > > + int (*get_max_msgnum)(struct cxl_dev_state *cxlds);
> >
> > Why is this a callback, why not just have the features populate their
> > irq numbers?
>
> I think we have decided to forgo the callback but I'm not sure what you mean by
> 'populate their irq numbers'?
>
> >
> > > +};
> > > +
> > > +static const struct cxl_irq_cap cxl_irq_cap_table[] = {
> > > + NULL
> > > +};
> > > +
> > > +static void cxl_pci_free_irq_vectors(void *data)
> > > +{
> > > + pci_free_irq_vectors(data);
> > > +}
> > > +
> > > +/*
> > > + * Attempt to allocate the largest amount of necessary vectors.
> > > + *
> > > + * Returns 0 upon a successful allocation of *all* vectors, or a
> > > + * negative value otherwise.
> > > + */
> > > +static int cxl_pci_alloc_irq_vectors(struct cxl_dev_state *cxlds)
> > > +{
> > > + struct device *dev = cxlds->dev;
> > > + struct pci_dev *pdev = to_pci_dev(dev);
> > > + int rc, i, vectors = -1;
> > > +
> > > + for (i = 0; i < ARRAY_SIZE(cxl_irq_cap_table); i++) {
> > > + int irq;
> > > +
> > > + if (!cxl_irq_cap_table[i].get_max_msgnum)
> > > + continue;
> > > +
> > > + irq = cxl_irq_cap_table[i].get_max_msgnum(cxlds);
> > > + vectors = max_t(int, irq, vectors);
> > > + }
> >
> > Forgive me if I have missed something, I only look at interrupt enable
> > code once every few years, and the APIs are always a bit different, but
> > is this not too early to read the message number? The number is not
> > stable until either MSI or MSI-X has been selected below at
> > pci_alloc_irq_vectors() time?
>
> Well I keep getting wrapped around the axle on this one too.
>
> This all started back when Jonathan originally attempted to allocate the
> maximum number of vectors a device _could_ allocate. But it was recommended that
> we determine the max number first then allocate that number.
>
> This seems like a chicken and egg issue. How is the number not stable before
> calling pci_alloc_irq_vectors() when you need the max msg number in that call?

Are we talking about the same thing? I am talking about the value in the
"Interrupt Message Number" field. That depends on whether MSI or MSI-X
gets enabled. The number of vectors the device can support is static.

Since CXL is such an a la carte spec I think this is situation to just
specify a large number of amx vectors to pci_alloc_irq_vectors() and
then find out after the fact if all of the interrupt generators that
today's cxl_pci knows about in the device each got their own vector.