Re: [PATCH v4 9/9] PCI/MSI: Introduce pci_auto_enable_msi*() familyhelpers

From: Bjorn Helgaas
Date: Tue Dec 17 2013 - 19:30:19 EST


On Mon, Dec 16, 2013 at 09:35:02AM +0100, Alexander Gordeev wrote:
> Currently many device drivers need contiguously call functions
> pci_enable_msix() for MSI-X or pci_enable_msi_block() for MSI
> in a loop until success or failure. This update generalizes
> this usage pattern and introduces pci_auto_enable_msi*() family
> helpers.

I think the idea of this series is excellent and will really make MSI/MSI-X
easier to use and less error-prone for drivers, so I don't want this to
sound discouraging. I haven't been paying attention to this in detail, so
likely some of my questions have already been hashed out and I missed the
answers.

After this patch, we would have:

pci_enable_msi() # existing (1 vector)
pci_enable_msi_block(nvec) # existing
pci_enable_msi_block_auto(maxvec) # existing (removed)

pci_auto_enable_msi(maxvec) # new (1-maxvec)
pci_auto_enable_msi_range(minvec, maxvec) # new
pci_auto_enable_msi_exact(nvec) # new (nvec-nvec)

pci_enable_msix(nvec) # existing

pci_auto_enable_msix(maxvec) # new (1-maxvec)
pci_auto_enable_msix_range(minvec, maxvec) # new
pci_auto_enable_msix_exact(nvec) # new (nvec-nvec)

That seems like a lot of interfaces to document and understand, especially
since most of them are built on each other. I'd prefer just these:

pci_enable_msi() # existing (1 vector)
pci_enable_msi_range(minvec, maxvec) # new

pci_enable_msix(nvec) # existing
pci_enable_msix_range(minvec, maxvec) # new

with examples in the documentation about how to call them with ranges like
(1, maxvec), (nvec, nvec), etc. I think that will be easier than
understanding several interfaces.

I don't think the "auto" in the names really adds anything, does it? The
whole point of supplying a range is that the core has the flexibility to
choose any number of vectors within the range.

> As result, device drivers do not have to deal with tri-state
> return values from pci_enable_msix() and pci_enable_msi_block()
> functions directly and expected to have more clearer and straight
> code.

I only see five users of pci_enable_msi_block() (nvme, ath10k, wil6210,
ipr, vfio); we can easily convert those to use pci_enable_msi_range() and
then remove pci_enable_msi_block().

pci_enable_msi() itself can simply be pci_enable_msi_range(1, 1).

There are nearly 80 callers of pci_enable_msix(), so that's a bit harder.
Can we deprecate that somehow, and incrementally convert callers to use
pci_enable_msix_range() instead? Maybe you're already planning that; I
know you dropped some driver patches from the series for now, and I didn't
look to see exactly what they did.

It would be good if pci_enable_msix() could be implemented in terms of
pci_enable_msix_range(nvec, nvec), with a little extra glue to handle the
positive return values.

> So i.e. the request loop described in the documentation...
>
> int foo_driver_enable_msix(struct foo_adapter *adapter,
> int nvec)
> {
> while (nvec >= FOO_DRIVER_MINIMUM_NVEC) {
> rc = pci_enable_msix(adapter->pdev,
> adapter->msix_entries,
> nvec);
> if (rc > 0)
> nvec = rc;
> else
> return rc;
> }
>
> return -ENOSPC;
> }

I think we should remove this example from the documentation because we
want to get rid of the tri-state return idea completely. I think the same
thing could be accomplished with pci_enable_msix_range() (correct me if I'm
wrong).

> ...would turn into a single helper call....
>
> rc = pci_auto_enable_msix_range(adapter->pdev,
> adapter->msix_entries,
> FOO_DRIVER_MINIMUM_NVEC,
> nvec);
>
> Device drivers with more specific requirements (i.e. a number of
> MSI-Xs which is a multiple of a certain number within a specified
> range) would still need to implement the loop using the two old
> functions.
>
> Signed-off-by: Alexander Gordeev <agordeev@xxxxxxxxxx>
> Suggested-by: Ben Hutchings <bhutchings@xxxxxxxxxxxxxx>
> Reviewed-by: Tejun Heo <tj@xxxxxxxxxx>
> ---
> Documentation/PCI/MSI-HOWTO.txt | 134 +++++++++++++++++++++++++++++++++++++--
> drivers/pci/msi.c | 74 +++++++++++++++++++++
> include/linux/pci.h | 57 +++++++++++++++++
> 3 files changed, 260 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/PCI/MSI-HOWTO.txt b/Documentation/PCI/MSI-HOWTO.txt
> index 7d19656..168d9c3 100644
> --- a/Documentation/PCI/MSI-HOWTO.txt
> +++ b/Documentation/PCI/MSI-HOWTO.txt
> @@ -127,7 +127,62 @@ on the number of vectors that can be allocated; pci_enable_msi_block()
> returns as soon as it finds any constraint that doesn't allow the
> call to succeed.
>
> -4.2.3 pci_disable_msi
> +4.2.3 pci_auto_enable_msi_range
> +
> +int pci_auto_enable_msi_range(struct pci_dev *dev, struct msix_entry *entries,
> + int minvec, int maxvec)
> +
> +This variation on pci_enable_msi_block() call allows a device driver to
> +request any number of MSIs within specified range 'minvec' to 'maxvec'.
> +Whenever possible device drivers are encouraged to use this function
> +rather than explicit request loop calling pci_enable_msi_block().

I think we should remove pci_enable_msi_block() completely, including its
mention here.

> +If this function returns a negative number, it indicates an error and
> +the driver should not attempt to request any more MSI interrupts for
> +this device.
> +
> +If this function returns a positive number it indicates at least the
> +returned number of MSI interrupts have been successfully allocated (it may
> +have allocated more in order to satisfy the power-of-two requirement).

I assume this means the return value may be larger than the "maxvec"
requested, right? And the driver is free to use all the vectors up to the
return value, even those above maxvec, right?

> +Device drivers can use this number to further initialize devices.
> +
> +4.2.4 pci_auto_enable_msi
> +
> +int pci_auto_enable_msi(struct pci_dev *dev,
> + struct msix_entry *entries, int maxvec)
> +
> +This variation on pci_enable_msi_block() call allows a device driver to
> +request any number of MSIs up to 'maxvec'. Whenever possible device drivers
> +are encouraged to use this function rather than explicit request loop
> +calling pci_enable_msi_block().
> +
> +If this function returns a negative number, it indicates an error and
> +the driver should not attempt to request any more MSI interrupts for
> +this device.
> +
> +If this function returns a positive number it indicates at least the
> +returned number of MSI interrupts have been successfully allocated (it may
> +have allocated more in order to satisfy the power-of-two requirement).
> +Device drivers can use this number to further initialize devices.
> +
> +4.2.5 pci_auto_enable_msi_exact
> +
> +int pci_auto_enable_msi_exact(struct pci_dev *dev,
> + struct msix_entry *entries, int nvec)
> +
> +This variation on pci_enable_msi_block() call allows a device driver to
> +request exactly 'nvec' MSIs.
> +
> +If this function returns a negative number, it indicates an error and
> +the driver should not attempt to request any more MSI interrupts for
> +this device.
> +
> +If this function returns the value of 'nvec' it indicates MSI interrupts
> +have been successfully allocated. No other value in case of success could
> +be returned. Device drivers can use this value to further allocate and
> +initialize device resources.
> +
> +4.2.6 pci_disable_msi
>
> void pci_disable_msi(struct pci_dev *dev)
>
> @@ -142,7 +197,7 @@ on any interrupt for which it previously called request_irq().
> Failure to do so results in a BUG_ON(), leaving the device with
> MSI enabled and thus leaking its vector.
>
> -4.2.4 pci_get_msi_vec_count
> +4.2.7 pci_get_msi_vec_count
>
> int pci_get_msi_vec_count(struct pci_dev *dev)
>
> @@ -222,7 +277,76 @@ static int foo_driver_enable_msix(struct foo_adapter *adapter, int nvec)
> return -ENOSPC;
> }
>
> -4.3.2 pci_disable_msix
> +4.3.2 pci_auto_enable_msix_range
> +
> +int pci_auto_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries,
> + int minvec, int maxvec)
> +
> +This variation on pci_enable_msix() call allows a device driver to request
> +any number of MSI-Xs within specified range 'minvec' to 'maxvec'. Whenever
> +possible device drivers are encouraged to use this function rather than
> +explicit request loop calling pci_enable_msix().

I guess maybe I'm wrong, and there *are* cases where
pci_enable_msix_range() isn't sufficient to replace pci_enable_msix()?
Can you remind me what they are?

> +If this function returns a negative number, it indicates an error and
> +the driver should not attempt to allocate any more MSI-X interrupts for
> +this device.
> +
> +If this function returns a positive number it indicates the number of
> +MSI-X interrupts that have been successfully allocated. Device drivers
> +can use this number to further allocate and initialize device resources.
> +
> +A modified function calling pci_enable_msix() in a loop might look like:

There's no loop in this example. Don't use "A modified function"; that
only makes sense during the transition from pci_enable_msix() to
pci_enable_msix_range(). "A function using pci_enable_msix_range() might
look like this:" should be sufficient.

> +static int foo_driver_enable_msix(struct foo_adapter *adapter, int nvec)
> +{
> + rc = pci_auto_enable_msix_range(adapter->pdev, adapter->msix_entries,
> + FOO_DRIVER_MINIMUM_NVEC, nvec);
> + if (rc < 0)
> + return rc;
> +
> + rc = foo_driver_init_other(adapter, rc);
> + if (rc < 0)
> + pci_disable_msix(adapter->pdev);
> +
> + return rc;
> +}
> +
> +4.3.3 pci_auto_enable_msix
> +
> +int pci_auto_enable_msix(struct pci_dev *dev,
> + struct msix_entry *entries, int maxvec)
> +
> +This variation on pci_enable_msix() call allows a device driver to request
> +any number of MSI-Xs up to 'maxvec'. Whenever possible device drivers are
> +encouraged to use this function rather than explicit request loop calling
> +pci_enable_msix().
> +
> +If this function returns a negative number, it indicates an error and
> +the driver should not attempt to allocate any more MSI-X interrupts for
> +this device.
> +
> +If this function returns a positive number it indicates the number of
> +MSI-X interrupts that have been successfully allocated. Device drivers
> +can use this number to further allocate and initialize device resources.
> +
> +4.3.4 pci_auto_enable_msix_exact
> +
> +int pci_auto_enable_msix_exact(struct pci_dev *dev,
> + struct msix_entry *entries, int nvec)
> +
> +This variation on pci_enable_msix() call allows a device driver to request
> +exactly 'nvec' MSI-Xs.
> +
> +If this function returns a negative number, it indicates an error and
> +the driver should not attempt to allocate any more MSI-X interrupts for
> +this device.
> +
> +If this function returns the value of 'nvec' it indicates MSI-X interrupts
> +have been successfully allocated. No other value in case of success could
> +be returned. Device drivers can use this value to further allocate and
> +initialize device resources.
> +
> +4.3.5 pci_disable_msix
>
> void pci_disable_msix(struct pci_dev *dev)
>
> @@ -236,14 +360,14 @@ on any interrupt for which it previously called request_irq().
> Failure to do so results in a BUG_ON(), leaving the device with
> MSI-X enabled and thus leaking its vector.
>
> -4.3.3 The MSI-X Table
> +4.3.6 The MSI-X Table
>
> The MSI-X capability specifies a BAR and offset within that BAR for the
> MSI-X Table. This address is mapped by the PCI subsystem, and should not
> be accessed directly by the device driver. If the driver wishes to
> mask or unmask an interrupt, it should call disable_irq() / enable_irq().
>
> -4.3.4 pci_get_msix_vec_count
> +4.3.7 pci_get_msix_vec_count
>
> int pci_get_msix_vec_count(struct pci_dev *dev)
>
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 18e877f5..ccfd49b 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -1093,3 +1093,77 @@ void pci_msi_init_pci_dev(struct pci_dev *dev)
> if (dev->msix_cap)
> msix_set_enable(dev, 0);
> }
> +
> +/**
> + * pci_auto_enable_msi_range - configure device's MSI capability structure
> + * @dev: device to configure
> + * @minvec: minimal number of interrupts to configure
> + * @maxvec: maximum number of interrupts to configure
> + *
> + * This function tries to allocate a maximum possible number of interrupts in a
> + * range between @minvec and @maxvec. It returns a negative errno if an error
> + * occurs. If it succeeds, it returns the actual number of interrupts allocated
> + * and updates the @dev's irq member to the lowest new interrupt number;
> + * the other interrupt numbers allocated to this device are consecutive.
> + **/
> +int pci_auto_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec)
> +{
> + int nvec = maxvec;
> + int rc;
> +
> + if (maxvec < minvec)
> + return -ERANGE;
> +
> + do {
> + rc = pci_enable_msi_block(dev, nvec);
> + if (rc < 0) {
> + return rc;
> + } else if (rc > 0) {
> + if (rc < minvec)
> + return -ENOSPC;
> + nvec = rc;
> + }
> + } while (rc);
> +
> + return nvec;
> +}
> +EXPORT_SYMBOL(pci_auto_enable_msi_range);
> +
> +/**
> + * pci_auto_enable_msix_range - configure device's MSI-X capability structure
> + * @dev: pointer to the pci_dev data structure of MSI-X device function
> + * @entries: pointer to an array of MSI-X entries
> + * @minvec: minimum number of MSI-X irqs requested
> + * @maxvec: maximum number of MSI-X irqs requested
> + *
> + * Setup the MSI-X capability structure of device function with a maximum
> + * possible number of interrupts in the range between @minvec and @maxvec
> + * upon its software driver call to request for MSI-X mode enabled on its
> + * hardware device function. It returns a negative errno if an error occurs.
> + * If it succeeds, it returns the actual number of interrupts allocated and
> + * indicates the successful configuration of MSI-X capability structure
> + * with new allocated MSI-X interrupts.
> + **/
> +int pci_auto_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries,
> + int minvec, int maxvec)
> +{
> + int nvec = maxvec;
> + int rc;
> +
> + if (maxvec < minvec)
> + return -ERANGE;
> +
> + do {
> + rc = pci_enable_msix(dev, entries, nvec);
> + if (rc < 0) {
> + return rc;
> + } else if (rc > 0) {
> + if (rc < minvec)
> + return -ENOSPC;
> + nvec = rc;
> + }
> + } while (rc);
> +
> + return nvec;

I think it would be better to make pci_enable_msix_range() the fundamental
implementation, with pci_enable_msix() built on top of it. That way we
could deprecate and eventually remove pci_enable_msix() and its tri-state
return values.

Bjorn

> +}
> +EXPORT_SYMBOL(pci_auto_enable_msix_range);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 7941f06..7e30b52 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1193,6 +1193,38 @@ static inline int pci_msi_enabled(void)
> {
> return 0;
> }
> +
> +int pci_auto_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec)
> +{
> + return -ENOSYS;
> +}
> +static inline int pci_auto_enable_msi(struct pci_dev *dev, int maxvec)
> +{
> + return -ENOSYS;
> +}
> +static inline int pci_auto_enable_msi_exact(struct pci_dev *dev, int nvec)
> +{
> + return -ENOSYS;
> +}
> +
> +static inline int
> +pci_auto_enable_msix_range(struct pci_dev *dev,
> + struct msix_entry *entries, int minvec, int maxvec)
> +{
> + return -ENOSYS;
> +}
> +static inline int
> +pci_auto_enable_msix(struct pci_dev *dev,
> + struct msix_entry *entries, int maxvec)
> +{
> + return -ENOSYS;
> +}
> +static inline int
> +pci_auto_enable_msix_exact(struct pci_dev *dev,
> + struct msix_entry *entries, int nvec)
> +{
> + return -ENOSYS;
> +}
> #else
> int pci_get_msi_vec_count(struct pci_dev *dev);
> int pci_enable_msi_block(struct pci_dev *dev, int nvec);
> @@ -1205,6 +1237,31 @@ void pci_disable_msix(struct pci_dev *dev);
> void msi_remove_pci_irq_vectors(struct pci_dev *dev);
> void pci_restore_msi_state(struct pci_dev *dev);
> int pci_msi_enabled(void);
> +
> +int pci_auto_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec);
> +static inline int pci_auto_enable_msi(struct pci_dev *dev, int maxvec)
> +{
> + return pci_auto_enable_msi_range(dev, 1, maxvec);
> +}
> +static inline int pci_auto_enable_msi_exact(struct pci_dev *dev, int nvec)
> +{
> + return pci_auto_enable_msi_range(dev, nvec, nvec);
> +}
> +
> +int pci_auto_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries,
> + int minvec, int maxvec);
> +static inline int
> +pci_auto_enable_msix(struct pci_dev *dev,
> + struct msix_entry *entries, int maxvec)
> +{
> + return pci_auto_enable_msix_range(dev, entries, 1, maxvec);
> +}
> +static inline int
> +pci_auto_enable_msix_exact(struct pci_dev *dev,
> + struct msix_entry *entries, int nvec)
> +{
> + return pci_auto_enable_msix_range(dev, entries, nvec, nvec);
> +}
> #endif
>
> #ifdef CONFIG_PCIEPORTBUS
> --
> 1.7.7.6
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/