Re: [PATCH v2] s390/vfio-ap: fix memory leak in mdev remove callback

From: Halil Pasic
Date: Wed May 19 2021 - 07:26:04 EST


On Wed, 19 May 2021 10:17:49 +0200
Christian Borntraeger <borntraeger@xxxxxxxxxx> wrote:

> On 19.05.21 01:27, Halil Pasic wrote:
> > On Tue, 18 May 2021 19:01:42 +0200
> > Christian Borntraeger <borntraeger@xxxxxxxxxx> wrote:
> >
> >> On 18.05.21 17:33, Halil Pasic wrote:
> >>> On Tue, 18 May 2021 15:59:36 +0200
> >>> Christian Borntraeger <borntraeger@xxxxxxxxxx> wrote:
> > [..]
> >>>>>>
> >>>>>> Would it help, if the code in priv.c would read the hook once
> >>>>>> and then only work on the copy? We could protect that with rcu
> >>>>>> and do a synchronize rcu in vfio_ap_mdev_unset_kvm after
> >>>>>> unsetting the pointer?
> >>>
> >>> Unfortunately just "the hook" is ambiguous in this context. We
> >>> have kvm->arch.crypto.pqap_hook that is supposed to point to
> >>> a struct kvm_s390_module_hook member of struct ap_matrix_mdev
> >>> which is also called pqap_hook. And struct kvm_s390_module_hook
> >>> has function pointer member named "hook".
> >>
> >> I was referring to the full struct.
> >>>
> >>>>>
> >>>>> I'll look into this.
> >>>>
> >>>> I think it could work. In priv.c use rcu_read_lock, save the
> >>>> pointer, do the check and call, call rcu_read_unlock.
> >>>> In vfio_ap use rcu_assign_pointer to set the pointer and
> >>>> after setting it to zero call synchronize_rcu.
> >>>
> >>> In my opinion, we should make the accesses to the
> >>> kvm->arch.crypto.pqap_hook pointer properly synchronized. I'm
> >>> not sure if that is what you are proposing. How do we usually
> >>> do synchronisation on the stuff that lives in kvm->arch?
> >>>
> >>
> >> RCU is a method of synchronization. We make sure that structure
> >> pqap_hook is still valid as long as we are inside the rcu read
> >> lock. So the idea is: clear pointer, wait until all old readers
> >> have finished and then proceed with getting rid of the structure.
> >
> > Yes I know that RCU is a method of synchronization, but I'm not
> > very familiar with it. I'm a little confused by "read the hook
> > once and then work on a copy". I guess, I would have to read up
> > on the RCU again to get clarity. I intend to brush up my RCU knowledge
> > once the patch comes along. I would be glad to have your help when
> > reviewing an RCU based solution for this.
>
> Just had a quick look. It's not trivial, as the hook function itself
> takes a mutex and an rcu section must not sleep. Will have a deeper
> look.

I refreshed my RCU knowledge and RCU seems to be a reasonable choice
here. I don't think we have to make the rcu read section span the
call to the callback. That is, something like

--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -613,6 +613,7 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
 	unsigned long reg0;
 	int ret;
 	uint8_t fc;
+	int (*pqap_hook)(struct kvm_vcpu *vcpu);

 	/* Verify that the AP instruction are available */
 	if (!ap_instructions_available())
@@ -657,14 +658,21 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
 	 * Verify that the hook callback is registered, lock the owner
 	 * and call the hook.
 	 */
+	rcu_read_lock();
 	if (vcpu->kvm->arch.crypto.pqap_hook) {
-		if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
+		if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner)) {
+			rcu_read_unlock();
 			return -EOPNOTSUPP;
-		ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);
+		}
+		pqap_hook = READ_ONCE(vcpu->kvm->arch.crypto.pqap_hook->hook);
+		rcu_read_unlock();
+		ret = pqap_hook(vcpu);
 		module_put(vcpu->kvm->arch.crypto.pqap_hook->owner);
 		if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000)
 			kvm_s390_set_psw_cc(vcpu, 3);
 		return ret;
+	} else {
+		rcu_read_unlock();
 	}
 	/*
 	 * A vfio_driver must register a hook.

Should be sufficient. The module get ensures that the pointee is still
around for the duration of the call. The handle_pqap() from
vfio_ap_ops.c checks vcpu->kvm->arch.crypto.pqap_hook under the same
lock that is used to set it to NULL, and bails out if it is NULL. It
is a bit convoluted, but it should work.
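
To make the ordering argued above easier to check, here is a minimal
userspace sketch of it. It is not kernel code. Every name in it
(struct module_hook, hook_set, call_hook, demo_hook, NO_HOOK) is
hypothetical, a plain pthread mutex stands in for the RCU read-side
section, and an atomic counter stands in for try_module_get() and
module_put(). The only thing it is meant to show is: snapshot the
function pointer and pin the owner inside the short critical section,
then make the possibly sleeping call outside of it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NO_HOOK (-1)	/* hypothetical stand-in for -EOPNOTSUPP */

struct vcpu { int id; };

struct module_hook {
	int (*hook)(struct vcpu *vcpu);
	atomic_int refs;	/* assumption: models the owner's module refcount */
};

/* Assumption: a mutex models the read-side section; real RCU readers do not block. */
static pthread_mutex_t hook_lock = PTHREAD_MUTEX_INITIALIZER;
static struct module_hook *current_hook;	/* protected by hook_lock */

/* Publish a hook, or pass NULL to clear it (the "unset" path). */
static void hook_set(struct module_hook *h)
{
	pthread_mutex_lock(&hook_lock);
	current_hook = h;
	pthread_mutex_unlock(&hook_lock);
}

static int call_hook(struct vcpu *vcpu)
{
	struct module_hook *h;
	int (*fn)(struct vcpu *vcpu);
	int ret;

	pthread_mutex_lock(&hook_lock);
	h = current_hook;		/* read the pointer exactly once */
	if (!h) {
		pthread_mutex_unlock(&hook_lock);
		return NO_HOOK;
	}
	atomic_fetch_add(&h->refs, 1);	/* pin the owner before leaving the section */
	fn = h->hook;			/* work on the local copy from here on */
	pthread_mutex_unlock(&hook_lock);

	ret = fn(vcpu);			/* may sleep: no longer inside the section */

	assert(atomic_load(&h->refs) > 0);
	atomic_fetch_sub(&h->refs, 1);	/* drop the pin via the local copy, not the global */
	return ret;
}

/* Hypothetical callback used only to exercise the sketch. */
static int demo_hook(struct vcpu *vcpu)
{
	return vcpu->id + 1;
}
```

Note that the sketch drops the reference through the local copy h
rather than re-reading the shared pointer after the unlock, since the
shared pointer may have been cleared in the meantime.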

Regards,
Halil