Re: [PATCH] s390/cio: Fix a memleak in css_alloc_subchannel

From: Halil Pasic
Date: Sun Sep 24 2023 - 14:02:50 EST


On Fri, 22 Sep 2023 21:15:48 +0200
Vineeth Vijayan <vneethv@xxxxxxxxxxxxx> wrote:

> On 9/22/23 15:20, Halil Pasic wrote:
> >> Author of 2ec2298412e1 here. If I don't completely misremember things,
> >> this was for the orphanage stuff (i.e. ccw devices that were still kept
> >> as disconnected, like dasd still in use, that had to be moved from their
> >> old subchannel object because a different device appeared on that
> >> subchannel.) That orphanage used a single dummy subchannel for all ccw
> >> devices moved there.
> >>
> >> I have no idea how the current common I/O layer works, but that might
> >> give you a hint about what to look for 😄
> > Yes, that is what the commit states and what the series is about. I hope
> > Vineeth can give us some answers 😄 maybe even off the top of his
> > head... If not, I would trust his judgment on whether figuring things
> > out is worthwhile or not.
> >
> As Corny mentioned, the orphanage is the only case I remember where
> this scenario of a dynamically allocated sch->lock is used. I hope
> you remember the cdev->ccwlock, which is nothing but a copy of the
> sch->lock pointer. This is a rather tricky design, where we are using
> sch->lock and cdev->ccwlock, which are the same pointer,
> because this sch is exclusively for the cdev ops. But at the same time,
> a CC3 from stsch can send the attached device to the orphanage and
> remove the sch.
>
> We have already seen an issue with this approach and had a couple of
> discussions about avoiding this pointer usage without using an extra
> lock, but we do not have the right solution for this yet.

Based on your response, it seems you do understand the problem but are
struggling to find a solution. You are ahead of me. I'm still at the
stage where I don't understand the problem. I had another look at
that orphanage code, especially at ccw_device_move_to_sch(). It looks
to me like *(sch->lock) is not required to outlive *sch, and
also that there are no move semantics in place.
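
To make that reading concrete, here is a rough userspace model. It is
not the kernel code: the struct layouts and helper names are made up for
illustration, the lock is a plain flag rather than a spinlock_t, and it
assumes that moving a cdev simply re-points cdev->ccwlock at the new
parent's lock rather than carrying the old lock along.

```c
/* Userspace model only: layouts and helper names are illustrative,
 * not the kernel's definitions. */
#include <assert.h>
#include <stdlib.h>

struct lock { int held; };      /* stand-in for spinlock_t */

struct sch {
	struct lock *lock;      /* allocated separately from the sch */
};

struct cdev {
	struct lock *ccwlock;   /* copy of the parent sch's lock pointer */
	struct sch *parent;
};

/* Allocate a subchannel together with its separately allocated lock. */
static struct sch *sch_alloc(void)
{
	struct sch *sch = calloc(1, sizeof(*sch));

	if (!sch)
		return NULL;
	sch->lock = calloc(1, sizeof(*sch->lock));
	if (!sch->lock) {
		free(sch);
		return NULL;
	}
	return sch;
}

/* Free the subchannel and its lock at the same point. */
static void sch_free(struct sch *sch)
{
	free(sch->lock);
	free(sch);
}

/* Re-parent the cdev: ccwlock is re-pointed, the old lock is not moved. */
static void cdev_move_to_sch(struct cdev *cdev, struct sch *sch)
{
	sch->lock->held = 1;    /* models taking sch->lock */
	cdev->parent = sch;
	cdev->ccwlock = sch->lock;
	sch->lock->held = 0;    /* models releasing sch->lock */
}
```

In this model, once the cdev has been moved to another sch (for example
the pseudo subchannel), freeing the old sch together with its lock
leaves nothing dangling in the cdev. Whether the real code has other
holders of the old lock pointer is the part I have not checked.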

Based on that let's take this offline, find a quiet hour and have a look
at the code and the problem. Maybe I can help with the solution once I
understand the problem -- but maybe not.

Regards,
Halil