Re: [PATCH v2 1/2] PCI/MSI: Fix the confusing IRQ sysfs ABI for MSI-X

From: Marc Zyngier
Date: Mon Aug 23 2021 - 07:28:19 EST


On Mon, 23 Aug 2021 12:03:08 +0100,
Barry Song <21cnbao@xxxxxxxxx> wrote:
>
> On Mon, Aug 23, 2021 at 10:30 PM Marc Zyngier <maz@xxxxxxxxxx> wrote:
> >
> > On Sat, 21 Aug 2021 23:14:35 +0100,
> > Barry Song <21cnbao@xxxxxxxxx> wrote:
> > >
> > > On Sat, Aug 21, 2021 at 10:42 PM Marc Zyngier <maz@xxxxxxxxxx> wrote:
> > > >
> > > > Hi Bjorn,
> > > >
> > > > On Sat, 21 Aug 2021 00:33:28 +0100,
> > > > Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > > >
> >
> > [...]
> >
> > > > > In msix_setup_entries(), we get nvecs msi_entry structs, and we
> > > > > get a saved .default_irq in each one?
> > > >
> > > > That's a key point.
> > > >
> > > > Old-school PCI/MSI is represented by a single interrupt, and you
> > > > *could* somehow make it relatively easy for drivers that only
> > > > understand INTx to migrate to MSI if you replaced whatever is held in
> > > > dev->irq (which should only represent the INTx mapping) with the MSI
> > > > interrupt number. Which I guess is what the MSI code is doing.
> > > >
> > > > This is the 21st century, and nobody should ever rely on such horror,
> > > > but I'm sure we do have such drivers in the tree. Boo.
> > > >
> > > > However, this *cannot* hold true for Multi-MSI, nor MSI-X, because
> > > > there is a plurality of interrupts. Even worse, for MSI-X, there is
> > > > zero guarantee that the allocated interrupts will be in a contiguous
> > > > space.
> > > >
> > > > Given that, what is dev->irq good for? "Absolutely Nothing! (say it
> > > > again!)".
> > > >
> > >
> > > The only thing is that dev->irq is an sysfs ABI to userspace. Due to
> > > the inconsistency between legacy PCI INTx, MSI, MSI-X, this ABI
> > > should have been absolutely broken nowadays. This is actually what
> > > the patchset was originally aiming at to fix.
> >
> > I do not think we should expose more of a broken abstraction to
> > userspace. We will have to carry on exposing the first MSI in this
> > field forever, but it doesn't mean we should have to do it for MSI-X.
> >
> > > One more question from me is that does dev->irq actually hold any
> > > valid hardware INTx information while hardware is using MSI-X? At
> > > least in my hardware, sysfs ABI for PCI is all "0".
> >
> > That's probably because nothing actually configured the interrupt, or
> > that there is no INTx implementation. I have that on systems with
> > pretty dodgy (or incomplete) firmware.
> >
> > > root@ubuntu:/sys/devices/pci0000:7c/0000:7c:00.0/0000:7d:00.3# cat irq
> > > 0
> > >
> > > root@ubuntu:/sys/devices/pci0000:7c/0000:7c:00.0/0000:7d:00.3# ls -l msi_irqs/*
> > > -r--r--r-- 1 root root 4096 Aug 21 22:04 msi_irqs/499
> > > -r--r--r-- 1 root root 4096 Aug 21 22:04 msi_irqs/500
> > > -r--r--r-- 1 root root 4096 Aug 21 22:04 msi_irqs/501
> > > ...
> > > root@ubuntu:/sys/devices/pci0000:7c/0000:7c:00.0/0000:7d:00.3# cat msi_irqs/499
> > > msix
> > >
> > > Not quite sure how it is going on different hardware platforms.
> >
> > My D05 does that as well, and it doesn't expose any INTx support.
> >
> > >
> > > > MSI-X is not something you can "accidentally" use. You have to
> > > > actively embrace it. In all honesty, this patch tries to move in the
> > > > wrong direction. If anything, we should kill this hack altogether and
> > > > fix the (handful of?) drivers that rely on it. That'd actually be a
> > > > good way to find whether they are still worth keeping in the tree. And
> > > > if it breaks too many of them, then at least we'll know where we
> > > > stand.
> > > >
> > > > I'd be tempted to leave the below patch simmer in -next for a few
> > > > weeks and see if how many people shout:
> > >
> > > This looks like a more proper direction to go.
> > > but here i am wondering how sysfs ABI document should follow the below change
> > > doc is patch 2/2:
> > > https://lore.kernel.org/lkml/20210820223744.8439-3-21cnbao@xxxxxxxxx/
> > >
> > > On the other hand, my feeling is that nobody should depend on sysfs
> > > irq entry nowadays.
> >
> > Too late. It is there, and we need to preserve it. I just don't think
> > feeding it more erroneous information is the right thing to do.
> >
> > My patch was only dealing with the kernel side of things, not the
> > userspace ABI. That ABI should be carried on unchanged.
>
> it seems this isn't true. your patch is also changing userspace ABI
> as long as you change pci_dev->irq which will be shown in sysfs irq
> entry.

I guess I wasn't clear enough above. Let me rephrase this:

My patch was only dealing with the kernel side of things, not the
userspace ABI. That ABI should be carried on unchanged, which requires
additional changes in the sysfs code.

> if we don't want to change the behaviour of any existing ABI, it
> seems the only thing we can do here to document it well in ABI
> doc. i actually doubt anyone has really understood what the irq
> entry is really showing.

Given that we can't prove that it is actually the case, I believe this
is the only option.

Thanks,

M.

--
Without deviation from the norm, progress is not possible.