Re: [PATCH v3] xen-pciback: Consider INTx disabled when MSI/MSI-X is enabled

From: Marek Marczykowski-Górecki
Date: Fri Nov 18 2022 - 16:46:45 EST


On Fri, Nov 18, 2022 at 03:46:47PM -0500, Jason Andryuk wrote:
> On Fri, Nov 18, 2022 at 10:50 AM Marek Marczykowski-Górecki
> <marmarek@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Linux enables MSI-X before disabling INTx, but keeps MSI-X masked until
> > the table is filled. Then it disables INTx just before clearing MASKALL
> > bit. Currently this approach is rejected by xen-pciback.
> > According to the PCIe spec, device cannot use INTx when MSI/MSI-X is
> > enabled (in other words: enabling MSI/MSI-X implicitly disables INTx).
> >
> > Change the logic to consider INTx disabled if MSI/MSI-X is enabled. This
> > applies to three places:
> > - checking currently enabled interrupts type,
> > - transition to MSI/MSI-X - where INTx would be implicitly disabled,
> > - clearing INTx disable bit - which can be allowed even if MSI/MSI-X is
> > enabled, as device should consider INTx disabled anyway in that case
> >
> > Fixes: 5e29500eba2a ("xen-pciback: Allow setting PCI_MSIX_FLAGS_MASKALL too")
> > Signed-off-by: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
> > ---
> > Changes in v3:
> > - allow clearing INTx regardless of MSI/MSI-X state, to be consistent
> > with enabling MSI/MSI-X
> > Changes in v2:
> > - restructure the patch to consider not only MASKALL bit, but enabling
> > MSI/MSI-X generally, without explicitly disabling INTx first
> > ---
>
> I was trying to test your xen-pciback v3 patch, and I am having
> assignment fail consistently now. It is actually failing to
> quarantine to domIO in the first place, which matches the failure from
> the other day (when I more carefully read through the logs). It now
> consistently fails to quarantine on every boot unlike the other day
> where it happened once.

Does this include the very first assignment too, or only after domain
reboot? If the latter, maybe some cleanup missed clearing MASKALL?

FWIW, the patch applied to Qubes
(https://github.com/QubesOS/qubes-linux-kernel/pull/680) seems to work
fine (the full test run is still in progress, but I see some green marks
already).

> I added some printks and it 's getting -EBUSY from pdev_msix_assign()
> which means pci_reset_msix_state() is failing:
> if ( pci_conf_read16(pdev->sbdf, msix_control_reg(pos)) &
> PCI_MSIX_FLAGS_MASKALL )
> return -EBUSY;
>
> # lspci -vv -s 14.3
> ...
> Capabilities: [80] MSI-X: Enable- Count=16 Masked+
> Vector table: BAR=0 offset=00002000
> PBA: BAR=0 offset=00003000
>
> So it looks like MASKALL is set and prevents assignment.
>
> setpci -s 00:14.3 82.W=f
> cleared that out for me and I could assign the device.
>
> My dom0 boots, it runs flask-label-pci for a set of PCI devices
> (including iwlwifi), then xl pci-assignable-add for all PCI devices
> which will be passed through, then a little later it boots the
> associated domains. Dom0 does not have a driver for iwlwifi.
>
> I'll have to investigate more to see how MASKALL is getting set. This
> had not been an issue before your recent patches.

I guess before the patches nothing set anything in MSI-X capability,
because it was hidden...

Anyway, to support my cleanup hypothesis, I tried to destroy a
PCI-having domain, and it left MSI-X enabled (at least according to the
config space). MASKALL was _not_ set, but I haven't checked masking of
individual vectors. TBH, I'm not sure what should be responsible for the
MSI-X cleanup after guest destroy. Should it be Xen? Qemu? Pciback?
Pciback calls PHYSDEVOP_{prepare,release}_msix only when
binding/unbinding from the device (so - xl pci-assignable-{add,remove}),
so this isn't the right place.
Should that be in Xen, in deassign_device() (part of
DOMCTL_deassign_device)?

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

Attachment: signature.asc
Description: PGP signature