Re: [PATCH v4 1/5] iommu/s390: Fix duplicate domain attachments

From: Niklas Schnelle
Date: Thu Oct 06 2022 - 07:53:17 EST


On Wed, 2022-10-05 at 08:48 -0300, Jason Gunthorpe wrote:
> On Wed, Oct 05, 2022 at 09:58:58AM +0200, Niklas Schnelle wrote:
>
> > A failed aperture test leaving the IOAT registered would indeed be bad.
> > I guess I focused too much on the failure scenarios at the state after
> > these patches where this can't happen. I think this would leave us in a
> > bad state because zpci_register_ioat() succeeded with the domain's DMA
> > table but we won't have attached leading to the wrong decisions in
> > recovery paths (see below).
>
> Domain attach should either completely move to the new domain and
> succeed, or it should leave everything as is and fail.
>
> So it looks OK to me.
>
> > Recovery (via zpci_hot_reset_device()) should then be able to deal with
> > these situations as long as zdev->dma_table matches the IOAT
> > registration state.
>
> If you are doing reset the s390 driver should keep track of what
> domain is supposed to be attached and fix it when the reset is
> completed. In this case it should not fail attach here for the
> mandatory success domain types.

Our reset/recovery code won't do a detach/attach, it directly re-
establishes the DMA table that was previously in use with firmware. If
that fails the reset fails and one will have to "power cyle"* the
device.

Also automatic recovery is blocked while the IOMMU API is in use.
Though "echo 1 > /sys/bus/pci/../reset" is available and does re-
register the DMA table if the device was in an error state.

>
> The core code does not reasonably handle failures from this routine,
> it must be avoided if you want it to be robust.
>
> Jason

Makes sense. I see the following failure cases:

1. After patch 3 failure in the aperture check leaves
everything as it is. Before that my proposal would
leave it with DMA blocked and no domain attached
so it will need to be "power cycled"*.

2. If zpci_register_ioat() fails the device is left detached
from all domains. This however only happens in one of two cases:

2a. The device was surprise unplugged. This seems fine as
we tear things down and the calling code just needs to
back off which from what I can see it does.
2b. The device has entered an error state.

In case 2b the device is going to need recovery and will not be usable
until that succeeded (DMA and MMIO access is blocked). In automatic
recovery if zdev->dma_table == NULL the device will re-initialized for
use with the DMA API while if the IOMMU API is in use we currently
don't attempt recovery and the user needs to "power cycle"* the device
manually. The "re-initalized for DMA API" part of course doesn't work
for the upcoming DMA API conversion.

One option I see would be to ignore the error return from
zpci_register_ioat() if it indicates case 2b. Then we would still add
the device to the IOMMU's devices list and return success despite
knowing that the device is inaccessible (DMA and MMIO blocked).

Then the recovery/reset code will register the new domain once the
device comes out of the error state. At least from an IOMMU API point
of view that would make the attachment always succeed for all
zpci_register_ioat() error cases that aren't programming bugs and can
conceivably be recovered from.

If you agree I would propose adding this as a robustness improvement as
part of my upcoming series of IOMMU improvements needed for the DMA API
conversion. As stated above before the DMA API conversion any error
that would cause zpci_register_ioat() to fail while the IOMMU API is
being used will need a "power cycle" anyway so postponing this doesn't
hurt.

* I say "power cycle" but this isn't usually a real power cycle rather
an architecture specific low level disabled/enable but from Linux
driver point of view the device is completely unplugged and re-plugged.

Niklas