Re: [PATCH v10 04/16] s390/zcrypt: driver callback to indicate resource in use

From: Cornelia Huck
Date: Thu Sep 17 2020 - 08:15:53 EST


On Tue, 15 Sep 2020 15:32:35 -0400
Tony Krowiak <akrowiak@xxxxxxxxxxxxx> wrote:

> On 9/14/20 11:29 AM, Cornelia Huck wrote:
> > On Fri, 21 Aug 2020 15:56:04 -0400
> > Tony Krowiak <akrowiak@xxxxxxxxxxxxx> wrote:
> >
> >> Introduces a new driver callback to prevent a root user from unbinding
> >> an AP queue from its device driver if the queue is in use. The intent of
> >> this callback is to provide a driver with the means to prevent a root user
> >> from inadvertently taking a queue away from a matrix mdev and giving it to
> >> the host while it is assigned to the matrix mdev. The callback will
> >> be invoked whenever a change to the AP bus's sysfs apmask or aqmask
> >> attributes would result in one or more AP queues being removed from its
> >> driver. If the callback responds in the affirmative for any driver
> >> queried, the change to the apmask or aqmask will be rejected with a device
> >> in use error.
> >>
> >> For this patch, only non-default drivers will be queried. Currently,
> >> there is only one non-default driver, the vfio_ap device driver. The
> >> vfio_ap device driver facilitates pass-through of an AP queue to a
> >> guest. The idea here is that a guest may be administered by a different
> >> sysadmin than the host and we don't want AP resources to unexpectedly
> >> disappear from a guest's AP configuration (i.e., adapters, domains and
> >> control domains assigned to the matrix mdev). This will enforce the proper
> >> procedure for removing AP resources intended for guest usage which is to
> >> first unassign them from the matrix mdev, then unbind them from the
> >> vfio_ap device driver.
> >>
> >> Signed-off-by: Tony Krowiak <akrowiak@xxxxxxxxxxxxx>
> >> Reported-by: kernel test robot <lkp@xxxxxxxxx>
> > This looks a bit odd...
>
> I've removed all of those. These kernel test robot errors were flagged
> in the last series. The review comments from the robot suggested
> the reported-by, but I assume that was for patches intended to
> fix those errors, so I am removing these as per Christian's comments.

Yes, I think the Reported-by: mostly makes sense if you include a patch
to fix something on top.

>
> >
> >> ---
> >> drivers/s390/crypto/ap_bus.c | 148 ++++++++++++++++++++++++++++++++---
> >> drivers/s390/crypto/ap_bus.h | 4 +
> >> 2 files changed, 142 insertions(+), 10 deletions(-)
> >>
> > (...)
> >
> >> @@ -1107,12 +1118,70 @@ static ssize_t apmask_show(struct bus_type *bus, char *buf)
> >> return rc;
> >> }
> >>
> >> +static int __verify_card_reservations(struct device_driver *drv, void *data)
> >> +{
> >> + int rc = 0;
> >> + struct ap_driver *ap_drv = to_ap_drv(drv);
> >> + unsigned long *newapm = (unsigned long *)data;
> >> +
> >> + /*
> >> + * No need to verify whether the driver is using the queues if it is the
> >> + * default driver.
> >> + */
> >> + if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
> >> + return 0;
> >> +
> >> + /* The non-default driver's module must be loaded */
> >> + if (!try_module_get(drv->owner))
> >> + return 0;
> >> +
> >> + if (ap_drv->in_use)
> >> + if (ap_drv->in_use(newapm, ap_perms.aqm))
> >> + rc = -EADDRINUSE;
> > ISTR that Christian suggested -EBUSY in a past revision of this series?
> > I think that would be more appropriate.
>
> I went back and looked and sure enough, he did recommend that.
> You have a great memory! I didn't respond to that comment, so I
> must have missed it at the time.
>
> I personally prefer EADDRINUSE because I think it is more indicative
> of the reason an AP resource can not be assigned back to the host
> drivers is because it is in use by a guest or, at the very least, reserved
> for use by a guest (i.e., assigned to an mdev). To say it is busy implies
> that the device is busy performing encryption services which may or
> may not be true at a given moment. Even if so, that is not the reason
> for refusing to allow reassignment of the device.

I have a different understanding of these error codes: EADDRINUSE is
something used in the networking context when an actual address is
already used elsewhere. EBUSY is more of a generic error that indicates
that a certain resource is not free to perform the requested operation;
it does not necessarily mean that the resource is currently actively
doing something. Kind of when you get EBUSY when trying to eject
something another program holds a reference on: that other program
might not actually be doing anything, but it potentially could.