Re: [PATCH 5.10 000/406] 5.10.195-rc1 review

From: Greg Kroah-Hartman
Date: Thu Sep 21 2023 - 16:29:32 EST


On Wed, Sep 20, 2023 at 12:13:25PM -0700, Kyle Zeng wrote:
> On Wed, Sep 20, 2023 at 10:01:55AM -0700, Florian Fainelli wrote:
> > On 9/20/23 08:18, Guenter Roeck wrote:
> > > On 9/20/23 01:11, Greg Kroah-Hartman wrote:
> > > > On Tue, Sep 19, 2023 at 09:57:25PM -0700, Guenter Roeck wrote:
> > > > > On 9/17/23 12:07, Greg Kroah-Hartman wrote:
> > > > > > This is the start of the stable review cycle for the 5.10.195 release.
> > > > > > There are 406 patches in this series, all will be posted as a response
> > > > > > to this one.  If anyone has any issues with these being applied, please
> > > > > > let me know.
> > > > > >
> > > > > > Responses should be made by Tue, 19 Sep 2023 19:10:04 +0000.
> > > > > > Anything received after that time might be too late.
> > > > > >
> > > > >
> > > > > chromeos-5.10 locks up in configfs_lookup() after the merge of
> > > > > v5.10.195.
> > > > >
> > > > > I am a bit puzzled because I see
> > > > >
> > > > > c709c7ca020a configfs: fix a race in configfs_lookup()
> > > > >
> > > > > in v5.10.195 but not in the list of commits below. I guess I must be
> > > > > missing something.
> > > >
> > > > It was part of the big patchset, it was posted here:
> > > >     https://lore.kernel.org/r/20230917191101.511939651@xxxxxxxxxxxxxxxxxxx
> > > >
> > > > Not hidden at all :)
> > > >
> > > > and was submitted here:
> > > >     https://lore.kernel.org/r/ZPOZFHHA0abVmGx+@westworld
> > > >
> > > > > Either case, the code now looks as follows.
> > > > >
> > > > > configfs_lookup()
> > > > > {
> > > > >      ...
> > > > >      spin_lock(&configfs_dirent_lock);
> > > > >      ...
> > > > >          err = configfs_attach_attr(sd, dentry);
> > > > >      ...
> > > > >      spin_unlock(&configfs_dirent_lock);
> > > > >      ...
> > > > > }
> > > > >
> > > > > and
> > > > >
> > > > > configfs_attach_attr(...)
> > > > > {
> > > > >      ...
> > > > >      spin_lock(&configfs_dirent_lock);
> > > > >      ...
> > > > > }
> > > > >
> > > > > which unless it is way too late here and I really need to go to sleep
> > > > > just won't work.
> > > >
> > > > Kyle, you did the backport, any comments?
> > > >
> > >
> > > After a good night sleep, the code still looks wrong to me. Reverting
> > > the offending patch in chromeos-5.10 solved the problem there.
> > > That makes me suspect that no one actually tests configfs.
> >
> > Humm indeed, looking at our testing we don't have our USB devices being
> > tested which would exercise configfs since we switch the USB device between
> > different configurations (mass storage, serial, networking etc.). Let me see
> > about adding that so we get some coverage.
> > --
> > Florian
> >
>
> Sorry for the wrong patch. My intention was to backport c42dd069be8dfc9b2239a5c89e73bbd08ab35de0
> to v5.10 to avoid a race condition triggered in my test. I tested the
> patch with my PoC program and made sure it won't trigger the crash. But
> I didn't notice that it could hang the kernel.
> I sincerely apologize for the mistake.
>
> My new proposed patch backports both
> c42dd069be8dfc9b2239a5c89e73bbd08ab35de0 and d07f132a225c013e59aa77f514ad9211ecab82ee.
> I made sure it does not trigger the race condition anymore.
> Can anyone having access to more comprehensive tests please check whether it works?
>
> Also, I'm not sure whether it is OK or how to backport two patches in
> one patch. Please advise on how to do it properly.

Please backport them both individually, do not merge them together.

I'll go revert the current change now and push out a release with it so
that it fixes users of this kernel tree.

thanks,

greg k-h