Re: [PATCH v13 09/15] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device

From: Halil Pasic
Date: Wed Feb 03 2021 - 19:22:19 EST


On Wed, 3 Feb 2021 18:13:09 -0500
Tony Krowiak <akrowiak@xxxxxxxxxxxxx> wrote:

> On 1/12/21 12:55 PM, Halil Pasic wrote:
> > On Tue, 12 Jan 2021 02:12:51 +0100
> > Halil Pasic <pasic@xxxxxxxxxxxxx> wrote:
> >
> >>> @@ -1347,8 +1437,11 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
> >>> apqi = AP_QID_QUEUE(q->apqn);
> >>> vfio_ap_mdev_reset_queue(apid, apqi, 1);
> >>>
> >>> - if (q->matrix_mdev)
> >>> + if (q->matrix_mdev) {
> >>> + matrix_mdev = q->matrix_mdev;
> >>> vfio_ap_mdev_unlink_queue(q);
> >>> + vfio_ap_mdev_refresh_apcb(matrix_mdev);
> >>> + }
> >>>
> >>> kfree(q);
> >>> mutex_unlock(&matrix_dev->lock);
> > Shouldn't we first remove the queue from the APCB and then
> > reset? Sorry, I missed this one yesterday.
>
> I agreed to move the reset, however if the remove callback is
> invoked due to a manual unbind of the queue and the queue is
> in use by a guest, the cleanup of the IRQ resources after the
> reset of the queue will not happen because the link from the
> queue to the matrix mdev was removed. Consequently, I'm going
> to have to change the patch 05/15 to split the vfio_ap_mdev_unlink_queue()
> function into two functions: one to remove the link from the matrix mdev to
> the queue; and, one to remove the link from the queue to the matrix
> mdev.

Does that mean we should reset before the unlink (or before the second
part of it after the split up)?

I mean have a look at unassign_adapter_store() with all patches
of this series applied. It does an unlink but doesn't do any reset,
or cleanup IRQ resources. And after the unlink we can't clean up
the IRQ resources properly.

But before all this we should resolve this circular lock dependency
problem in a satisfactory way. I'm quite worried about how it is going
to mesh with this series and dynamic ap pass-through.

Regards,
Halil

>Only the first will be used for the remove callback which should
> be fine since the queue object is freed at the end of the remove
> function anyway.
>
> >
> > Regards,
> > Halil
>