Re: [PATCH v2 0/4] zram: fix few sysfs races

From: Greg Kroah-Hartman
Date: Fri May 21 2021 - 16:45:05 EST


On Fri, May 21, 2021 at 08:16:18PM +0000, Luis Chamberlain wrote:
> On Fri, May 21, 2021 at 10:01:52PM +0200, Greg Kroah-Hartman wrote:
> > On Wed, May 19, 2021 at 08:20:23PM +0000, Luis Chamberlain wrote:
> > > Greg,
> > >
> > > your feedback would be appreciated here.
> >
> > Appreciated where? This is a zram patchset, what do I need to mess with
> > it for?
>
> This patchset has 2 issues which I noted in the last series that are
> generic, and could best be dealt with on sysfs, and suggested
> how this could actually be dealt with on sysfs / kernfs.
>
> > > Greg, can you comment on technical levels why a general core fix is not
> > > desirable upstream for those two issues?
> >
> > What issues exactly?
>
> When I suggested the generic way to fix this your main argument against
> a generic solution was that we don't support module removal. Given that
> argument did not seem to hold any water it begs the question if you
> still would rather not see this fixed in sysfs / kernfs.
>
> If you however are more open to it now, I can instead take that work, and
> send a proper patch for review.

I looked at the last patch here and I really do not see the issue.

In order for the module to be removed, zram_exit() has to return, right?
And that function calls destroy_devices() which will then remove all
devices in sysfs that are associated with this driver. At that point in
time, sysfs detaches the attributes from kernfs so that any open file
handle that happened to be around for an attribute file will not call
back into the show/store function for that device.

Then destroy_devices() returns, and zram_exit() returns, and the module
is unloaded.

So how can a show/store function in zram_drv.c be called after
destroy_devices() returns?

The changelog text in patch 4/4 is odd, destroy_devices() shouldn't be
racing with anything as devices have reference counts in order to
protect this type of thing from happening, right? How can a store
function be called when a device is somehow removed from memory at the
same time? Don't we properly incremement/decrement the device
structure's reference count? If not, wouldn't that be the simplest
solution here?

And who is ripping out zram drivers while the system is running anyway?
What workflow causes this to happen so much so that the sysfs files need
to be "protected"? What tool/script/whatever is hammering on those
sysfs files so much while someone wants to unload the module?

What am I missing?

thanks,

greg k-h