RE: [PATCH] cxl/pci: Skip irq features if MSI/MSI-X are not supported

From: Dan Williams
Date: Fri Jan 12 2024 - 18:17:12 EST


Ira Weiny wrote:
> CXL 3.1 Section 3.1.1 states:
>
> "A Function on a CXL device must not generate INTx messages if
> that Function participates in CXL.cache protocol or CXL.mem
> protocols."
>
> The generic CXL memory driver only supports devices which use the
> CXL.mem protocol. The current driver attempts to allocate MSI/MSI-X
> vectors in anticipation of their need for mailbox interrupts or event
> processing. However, the above requirement does not require a device to
> support interrupts, only that they use MSI/MSI-X. For example, a device
> may disable mailbox interrupts and be configured for firmware first
> event processing and function well with the driver.
>
> Rather than fail device probe if interrupts are not supported; flag that
> irqs are not enabled and avoid features which require interrupts.
> Emit messages appropriate for the situation to aid in debugging should
> device behavior be unexpected due to a failure to allocate vectors.
>
> Note that it is possible for a device to have host based event
> processing through polling. However, the driver does not support
> polling and it is not anticipated to be required. Leave that case to
> the future if such a device comes along.
>
> Signed-off-by: Ira Weiny <ira.weiny@xxxxxxxxx>
> ---
> Changes in v1:
> - [djbw: remove persistent irq boolean]
> - [djbw: Simplify error messages]
> - [Alison: spell check]
> - [iweiny: test]
> - Link to RFC: https://lore.kernel.org/r/20240108-dont-fail-irq-v1-1-4407228debd2@xxxxxxxxx
> ---
> drivers/cxl/pci.c | 29 +++++++++++++++++++----------
> 1 file changed, 19 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 0155fb66b580..bd12b97bb38e 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -381,7 +381,7 @@ static int cxl_pci_mbox_send(struct cxl_memdev_state *mds,
> return rc;
> }
>
> -static int cxl_pci_setup_mailbox(struct cxl_memdev_state *mds)
> +static int cxl_pci_setup_mailbox(struct cxl_memdev_state *mds, bool irq_avail)
> {
> struct cxl_dev_state *cxlds = &mds->cxlds;
> const int cap = readl(cxlds->regs.mbox + CXLDEV_MBOX_CAPS_OFFSET);
> @@ -443,6 +443,11 @@ static int cxl_pci_setup_mailbox(struct cxl_memdev_state *mds)
> if (!(cap & CXLDEV_MBOX_CAP_BG_CMD_IRQ))
> return 0;
>
> + if (!irq_avail) {
> + dev_err(cxlds->dev, "Mailbox irq enabled but no interrupt vectors.\n");
> + return 0;
> + }

It is just odd for this to be dev_err() when the very next thing this
function does is:

irq = pci_irq_vector(to_pci_dev(cxlds->dev), msgnum);
if (irq < 0)
return 0;

..which is silently return if using the interrupt does not work, or
there is a problem with the device where it returns an invalid @msgnum.
Why does one device "bug" deserve a dev_err() callout and the other
doesn't?

So at a minimum the dev_err() should be a dev_dbg(), but I would just
fold the @irq_avail check with the @cap check and be silent, because
nobody will care that this device is not sending mailbox interrupts
polling will just work.

> +
> msgnum = FIELD_GET(CXLDEV_MBOX_CAP_IRQ_MSGNUM_MASK, cap);
> irq = pci_irq_vector(to_pci_dev(cxlds->dev), msgnum);
> if (irq < 0)
> @@ -587,7 +592,7 @@ static int cxl_mem_alloc_event_buf(struct cxl_memdev_state *mds)
> return devm_add_action_or_reset(mds->cxlds.dev, free_event_buf, buf);
> }
>
> -static int cxl_alloc_irq_vectors(struct pci_dev *pdev)
> +static bool cxl_alloc_irq_vectors(struct pci_dev *pdev)
> {
> int nvecs;
>
> @@ -604,9 +609,9 @@ static int cxl_alloc_irq_vectors(struct pci_dev *pdev)
> PCI_IRQ_MSIX | PCI_IRQ_MSI);
> if (nvecs < 1) {
> dev_dbg(&pdev->dev, "Failed to alloc irq vectors: %d\n", nvecs);
> - return -ENXIO;
> + return false;
> }
> - return 0;
> + return true;
> }
>
> static irqreturn_t cxl_event_thread(int irq, void *id)
> @@ -742,7 +747,7 @@ static bool cxl_event_int_is_fw(u8 setting)
> }
>
> static int cxl_event_config(struct pci_host_bridge *host_bridge,
> - struct cxl_memdev_state *mds)
> + struct cxl_memdev_state *mds, bool irq_avail)
> {
> struct cxl_event_interrupt_policy policy;
> int rc;
> @@ -754,6 +759,11 @@ static int cxl_event_config(struct pci_host_bridge *host_bridge,
> if (!host_bridge->native_cxl_error)
> return 0;
>
> + if (!irq_avail) {
> + dev_info(mds->cxlds.dev, "No interrupt vectors, no polling, skip event processing.\n");

Not going to demand dev_dbg() here just because a device is mandated to
have event command support and it is driver policy to skip enabling it.
However, do drop the "no polling" that's implied. So I think it is:

"No interrupt support, disable event processing."