Re: [PATCH 1/2] cxl/pci: Add generic MSI-X/MSI irq support

From: Ira Weiny
Date: Fri Oct 21 2022 - 00:15:33 EST


On Tue, Oct 18, 2022 at 11:52:27AM +0100, Jonathan Cameron wrote:
> On Tue, 18 Oct 2022 10:36:19 +0100
> Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> wrote:
>
> > On Mon, 17 Oct 2022 20:00:09 -0700
> > Davidlohr Bueso <dave@xxxxxxxxxxxx> wrote:
> >
> > > Introduce a generic irq table for CXL components/features that can have
> > > standard irq support - DOE requires dynamic vector sizing and is not
> > > considered here. For now the table is empty.
> > >
> > > Create an infrastructure to query the max vectors required for the CXL
> > > device. Upon successful allocation, users can plug in their respective ISRs
> > > at any point thereafter; a new cxlds->has_irq flag supports this for cases
> > > where the irq setup is not done in the PCI driver, such as the CXL-PMU.
> > >
> > > Reviewed-by: Dave Jiang <dave.jiang@xxxxxxxxx>
> > > Signed-off-by: Davidlohr Bueso <dave@xxxxxxxxxxxx>
> >
> > A few nitpicks inline.
> >
> > With the one comment tidied up (the other is optional)
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
> >
> > I'll rebase my cpmu code on top of this shortly.
> Hi Davidlohr,
>
> Doing the CPMU rebase has shown that using this generic infrastructure
> ends up rather ugly.
>
> Previously I had a local array to manage the required register maps
> that was then freed. Now I have to move that into the cxl device state
> just so I can get at it from the irq finding callback.
>
> So I have an extra step to be able to use this generic framework.
>
> 1. Query how many CPMU devices there are. Stash that and the register map
> info in cxlds. I could do this in the callback, but that would be a really
> horrible layering issue as most of what is done there has nothing to do
> with finding the vector numbers.

FWIW I did this for the event stuff and did not find it so distasteful... :-/

However, the information I am stashing in the cxlds is all interrupt
information, so I think it is different from what I see in the CPMU stuff.
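
For reference, on the event side it ends up looking roughly like this (rough
sketch only; the field and function names below are placeholders, not what is
actually in my series):

static int cxl_event_get_max_msgnum(struct cxl_dev_state *cxlds)
{
	/*
	 * Hypothetical field: the largest message number cached while
	 * reading the event interrupt settings, stashed purely so this
	 * callback can report it.
	 */
	return cxlds->event_irq_max_msgnum;
}

i.e. the only thing parked in the cxlds is the message number itself, not a
register map.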

> 2. The callback below to find those numbers
> 3. Registration of the cpmu devices.
>
> The reality is that it is cleaner to more or less ignore the infrastructure
> proposed in this patch.
>
> 1. Query how many CPMU devices there are. Whilst there, stash the maximum
> cpmu vector number in the cxlds.
> 2. Run a stub in this infrastructure that does max(irq, cxlds->irq_num);
> 3. Carry on as before.
>
> That destroys the point of this infrastructure for that use case, at least,
> and leaves an extra bit of state in the cxl_dev_state whose only job is
> to squirt a value into the callback...

I'm not sure I follow. Do you mean something like this?

static int cxl_cpmu_get_max_msgnum(struct cxl_dev_state *cxlds)
{
	return cxlds->cpmu_max_vector;
}
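
If so, then plugging it into this patch's infrastructure would presumably just
be a table entry along these lines (untested sketch; cpmu_max_vector is a field
your series would have to add to cxlds):

static const struct cxl_irq_cap cxl_irq_cap_table[] = {
	{
		.name		= "cpmu",
		.get_max_msgnum	= cxl_cpmu_get_max_msgnum,
	},
};

which, admittedly, is exactly the stub-plus-stashed-value arrangement you are
objecting to.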

>
> So with that in mind I'm withdrawing the RB above. This looks to be
> an idea that, with hindsight, doesn't necessarily pan out.
> The long-hand equivalent, with the specific handling needed for each case,
> is probably going to be neater than walking a table of much more
> restricted callbacks.

I'm not married to the idea of the array of callbacks, but I'm not sure how
dropping it avoids having to iterate over the CPMU devices twice.

Ira

> Maybe there is a nice way to fit the CPMU
> registration into this infrastructure, but I'm not immediately seeing it.
>
> One other note inline via a compiler warning.
>
> Jonathan
>
> >
> > Jonathan
> >
> >
> > > ---
> > > drivers/cxl/cxlmem.h | 3 ++
> > > drivers/cxl/pci.c | 72 ++++++++++++++++++++++++++++++++++++++++++++
> > > 2 files changed, 75 insertions(+)
> > >
> > > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > > index 88e3a8e54b6a..72b69b003302 100644
> > > --- a/drivers/cxl/cxlmem.h
> > > +++ b/drivers/cxl/cxlmem.h
> > > @@ -211,6 +211,7 @@ struct cxl_endpoint_dvsec_info {
> > > * @info: Cached DVSEC information about the device.
> > > * @serial: PCIe Device Serial Number
> > > * @doe_mbs: PCI DOE mailbox array
> > > + * @has_irq: PCIe MSI-X/MSI support
> > > * @mbox_send: @dev specific transport for transmitting mailbox commands
> > > *
> > > * See section 8.2.9.5.2 Capacity Configuration and Label Storage for
> > > @@ -247,6 +248,8 @@ struct cxl_dev_state {
> > >
> > > struct xarray doe_mbs;
> > >
> > > + bool has_irq;
> > > +
> > > int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
> > > };
> > >
> > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > > index faeb5d9d7a7a..9c3e95ebaa26 100644
> > > --- a/drivers/cxl/pci.c
> > > +++ b/drivers/cxl/pci.c
> > > @@ -428,6 +428,73 @@ static void devm_cxl_pci_create_doe(struct cxl_dev_state *cxlds)
> > > }
> > > }
> > >
> > > +/**
> > > + * struct cxl_irq_cap - CXL feature that is capable of receiving MSI-X/MSI irqs.
> > > + *
> > > + * @name: Name of the device/component generating this interrupt.
> > > + * @get_max_msgnum: Get the feature's largest interrupt message number. If the
> > > + * feature does not have the Interrupt Supported bit set, then
> > > + * return -1.
> > > + */
> > > +struct cxl_irq_cap {
> > > + const char *name;
> > > + int (*get_max_msgnum)(struct cxl_dev_state *cxlds);
> > > +};
> > > +
> > > +static const struct cxl_irq_cap cxl_irq_cap_table[] = {
> > > + NULL
>
> That's not valid; just make it empty instead.
>
>
> > > +};
> > > +
> > > +static void cxl_pci_free_irq_vectors(void *data)
> > > +{
> > > + pci_free_irq_vectors(data);
> > > +}
> > > +
> > > +/*
> > > + * Attempt to allocate the largest amount of necessary vectors.
> > > + *
> > > + * Returns 0 upon a successful allocation of *all* vectors, or a
> >
> > Technically not all vectors. If we wanted to do that we could
> > just directly query it via pci_msix_vec_count() etc., which gets
> > it from the MSI-X capability. That's frowned upon because it's common
> > to stick lots of extra vectors on the end for stuff that Linux never
> > cares about (debug etc., or optional features).
> >
> > All vectors up to the maximum one the code uses would be more accurate.
> >
> > > + * negative value otherwise.
> > > + */
> > > +static int cxl_pci_alloc_irq_vectors(struct cxl_dev_state *cxlds)
> > > +{
> > > + struct device *dev = cxlds->dev;
> > > + struct pci_dev *pdev = to_pci_dev(dev);
> > > + int rc, i, vectors = -1;
> > > +
> > > + for (i = 0; i < ARRAY_SIZE(cxl_irq_cap_table); i++) {
> > > + int irq;
> > > +
> > > + if (!cxl_irq_cap_table[i].get_max_msgnum)
> > > + continue;
> > > +
> > > + irq = cxl_irq_cap_table[i].get_max_msgnum(cxlds);
> > > + vectors = max_t(int, irq, vectors);
> > > + }
> > > +
> > > + /*
> > > + * Semantically lack of irq support is not an error, but we
> > > + * still fail to allocate, so return negative.
> > > + */
> > > + if (vectors == -1)
> > > + return -1;
> > > +
> > > + vectors++;
> > > + rc = pci_alloc_irq_vectors(pdev, vectors, vectors,
> > > + PCI_IRQ_MSIX | PCI_IRQ_MSI);
> > > + if (rc < 0)
> > > + return rc;
> > > +
> > > + if (rc != vectors) {
> > > + dev_dbg(dev, "Not enough interrupts; use polling instead.\n");
> > > + /* some got allocated, clean them up */
> > > + cxl_pci_free_irq_vectors(pdev);
> > > + return -ENOSPC;
> > > + }
> > > +
> > > + return devm_add_action_or_reset(dev, cxl_pci_free_irq_vectors, pdev);
> > > +}
> > > +
> > > static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > > {
> > > struct cxl_register_map map;
> > > @@ -494,6 +561,11 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > > if (rc)
> > > return rc;
> > >
> > > + if (!cxl_pci_alloc_irq_vectors(cxlds)) {
> > > + cxlds->has_irq = true;
> > > + } else
> > > + cxlds->has_irq = false;
> > > +
> > cxlds->has_irq = !(cxl_pci_alloc_irq_vectors(cxlds) < 0);
> >
> > maybe...
> >
> > > cxlmd = devm_cxl_add_memdev(cxlds);
> > > if (IS_ERR(cxlmd))
> > > return PTR_ERR(cxlmd);
> >
>