Re: [PATCH v3 2/3] zram: fix deadlock with sysfs attribute usage and driver removal

From: Luis Chamberlain
Date: Tue Jun 22 2021 - 13:00:14 EST


On Tue, Jun 22, 2021 at 06:51:13PM +0200, Greg KH wrote:
> On Tue, Jun 22, 2021 at 09:40:27AM -0700, Luis Chamberlain wrote:
> > On Tue, Jun 22, 2021 at 06:27:52PM +0200, Greg KH wrote:
> > > On Tue, Jun 22, 2021 at 08:27:13AM -0700, Luis Chamberlain wrote:
> > > > On Tue, Jun 22, 2021 at 09:41:23AM +0200, Greg KH wrote:
> > > > > On Mon, Jun 21, 2021 at 04:36:34PM -0700, Luis Chamberlain wrote:
> > > > > > + ssize_t __ret; \
> > > > > > + if (!try_module_get(THIS_MODULE)) \
> > > > >
> > > > > try_module_get(THIS_MODULE) is always racy and probably does not do what
> > > > > you want it to do. You always want to get/put module references from
> > > > > code that is NOT the code calling these functions.
> > > >
> > > > In this case, we want it to trump module removal if it succeeds. That's all.
> > >
> > > True, but either you stop the race, or you do not, right? If you are so
> > > invested in your load/unload test, this should show up with this code
> > > eventually as well.
> >
> > I still do not see how the race is possible given the goal of preventing
> > module removal while a sysfs file is being used. If rmmod is taking
> > place, this simply will bail out.
> >
> > > > > > + return -ENODEV; \
> > > > > > + __ret = _name ## _store(dev, attr, buf, len); \
> > > > > > + module_put(THIS_MODULE); \
> > > > >
> > > > > This too is going to be racy.
> > > > >
> > > > > While fun to poke at, I still think this is pointless.
> > > >
> > > > If you have a better idea, which does not "DOS" module removal, please
> > > > let me know!
> > >
> > > I have yet to understand why you think that the load/unload in a loop is
> > > a valid use case.
> >
> > That depends on the infrastructure tests built for a driver.
> >
> > In the case of fstests and blktests we have drivers which *always* get
> > removed and loaded on each test. Take for instance scsi_debug, which
> > creates / destroys virtual devices on a per-test basis. Likewise, to build
> > confidence that failure rate is as close as possible to 0, one must run
> > a test as many times as possible in a loop. And, to build confidence in
> > a test, in some situations one ends up running modprobe / rmmod in a
> > loop.
> >
> > In this case a customer does have a complex system of tests, and by looking
> > at the crash logs I managed to simplify the way to reproduce it using
> > simple shell scripts.
>
> And is _this_ change needed even with the changes in patch 1/3?

Oh absolutely. This patch is needed 100%. Without it, it is actually
pretty trivial to deadlock as noted in my instructions on how to
reproduce.
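
For anyone following along, the intent of the wrapper quoted above can be
sketched in plain userspace C. This is only a single-threaded model of the
pattern, not the patch itself and not kernel code: every model_* name below is
hypothetical, and the refusal of removal while a reference is held is modeled
as -EWOULDBLOCK by assumption. It does not model concurrency, so it says
nothing about the race Greg is worried about; it only shows the "bail out with
-ENODEV once removal has started" behaviour.

```c
/* Userspace model only: none of the model_* names are kernel APIs. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

enum model_state { MODEL_LIVE, MODEL_GOING };

struct model_module {
	enum model_state state;
	int refcnt;
};

/* Models the intent of try_module_get(): fail once removal has begun. */
static bool model_try_get(struct model_module *m)
{
	if (m->state == MODEL_GOING)
		return false;
	m->refcnt++;
	return true;
}

/* Models module_put(): drop a reference taken by model_try_get(). */
static void model_put(struct model_module *m)
{
	assert(m->refcnt > 0);
	m->refcnt--;
}

/* Models rmmod (assumption): refused while any reference is held. */
static int model_remove(struct model_module *m)
{
	if (m->refcnt > 0)
		return -EWOULDBLOCK;
	m->state = MODEL_GOING;
	return 0;
}

/* Stand-in for a driver's _name##_store(); just reports bytes consumed. */
static ssize_t model_echo_store(const char *buf, size_t len)
{
	(void)buf;
	return (ssize_t)len;
}

/* Same shape as the quoted macro: get, call the real store, put. */
static ssize_t model_store(struct model_module *m,
			   ssize_t (*store)(const char *, size_t),
			   const char *buf, size_t len)
{
	ssize_t ret;

	if (!model_try_get(m))
		return -ENODEV;
	ret = store(buf, len);
	model_put(m);
	return ret;
}
```

In this model a store on a live module goes through and leaves the reference
count at zero, removal is refused while a reference is held, and a store
attempted after removal has started returns -ENODEV without ever calling the
driver's store routine.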

> I think that commit fixes your issues given that you will not unload the
> module until after the sysfs devices are removed from the system. Have
> you tried that alone with your test?

I have tried that, and it does not resolve the deadlock.

That is *why* I have been insisting that this is a real issue, and why I
decided to try to implement something generic instead, after the
livepatch folks hinted that they had also observed a similar deadlock
and would appreciate a generic solution.

Luis