Re: [PATCH v4] sysfs: fix kobject refcount to address races with kobject removal

From: Greg KH
Date: Fri Jul 23 2021 - 07:14:16 EST


On Thu, Jul 22, 2021 at 02:31:37PM -0700, Luis Chamberlain wrote:
> On Wed, Jul 21, 2021 at 01:30:29PM +0200, Greg KH wrote:
> > On Thu, Jul 01, 2021 at 03:48:16PM -0700, Luis Chamberlain wrote:
> > > On Fri, Jun 25, 2021 at 02:56:03PM -0700, Luis Chamberlain wrote:
> > > > On Thu, Jun 24, 2021 at 01:09:03PM +0200, Greg KH wrote:
> > > > > thanks for making this change and sticking with it!
> > > > >
> > > > > Oh, and with this change, does your modprobe/rmmod crazy test now work?
> > > >
> > > > It does but I wrote a test_syfs driver and I believe I see an issue with
> > > > this. I'll debug a bit more and see what it was, and I'll then also use
> > > > the driver to demo the issue more clearly, and then verification can be
> > > > an easy selftest test.
> > >
> > > OK my conclusion based on a new selftest driver I wrote is we can drop
> > > this patch safely. The selftest will cover this corner case well now.
> > >
> > > In short: the kernfs active reference will ensure the store operation
> > > still exists. The kernfs mutex is not enough, but if the driver removes
> > > the operation prior to getting the active reference, the write will just
> > > fail. The deferencing inside of the sysfs operation is abstract to
> > > kernfs, and while kernfs can't do anything to prevent a driver from
> > > doing something stupid, it at least can ensure an open file ensure the
> > > op is not removed until the operation completes.
> >
> > Ok, so all is good?
>
> It would seem to be the case.
>
> > Then why is your zram test code blowing up so badly?
>
> I checked the logs for the backtrace where the crash did happen
> and we did see clear evidence of the race we feared here. The *first*
> bug that happened was the CPU hotplug race:
>
> [132004.787099] Error: Removing state 61 which has instances left.
> [132004.787124] WARNING: CPU: 17 PID: 9307 at ../kernel/cpu.c:1879 __cpuhp_remove_state_cpuslocked+0x1c4/0x1d0

I do not understand what this issue is, is it fixed? Why is a cpu being
hot unplugged at the same time a zram?

thanks,

greg k-h