Re: linux 5.14.3: free_user_ns causes NULL pointer dereference

From: Jordan Glover
Date: Sun Oct 03 2021 - 15:38:09 EST


On Wednesday, September 29th, 2021 at 5:36 PM, Alexey Gladkov <legion@xxxxxxxxxx> wrote:

> On Tue, Sep 28, 2021 at 01:40:48PM +0000, Jordan Glover wrote:
>
> > On Thursday, September 16th, 2021 at 5:30 PM, ebiederm@xxxxxxxxxxxx wrote:
> >
> > > Jordan Glover Golden_Miller83@xxxxxxxxxxxxx writes:
> > >
> > > > On Wednesday, September 15th, 2021 at 10:42 PM, Jordan Glover Golden_Miller83@xxxxxxxxxxxxx wrote:
> > > >
> > > > > I had about 2 containerized (flatpak/bubblewrap) apps (browser + music player) running. I quickly closed them, intending to shut down the system, but instead got a freeze and had to use magic SysRq to reboot. The system logs end with what I posted, and there is nothing suspicious before that.
> > > > >
> > > > > Maybe it's some random fluke. I'll reply if I hit it again.
> > > >
> > > > Heh, it just happened again. This time closing firefox alone had such
> > > > effect:
> > >
> > > Ok. It looks like we have a couple of folks seeing issues here.
> > >
> > > I thought we had all of the issues sorted out for the release of v5.14,
> > > but it looks like there is still some little bug left.
> > >
> > > If Alex doesn't beat me to it I will see if I can come up with a
> > > debugging patch to make it easy to help track down where the reference
> > > count is going wrong. It will be a little bit as my brain is mush at
> > > the moment.
> > >
> > > Eric
> >
> > As the issue persists in 5.14.7, I would be very interested in such a patch.
> >
> > For now the thing is mostly reproducible when I close several tabs in ff and then
> > close the browser within a short period of time. When I close the tabs, wait
> > a bit, and then close the browser, it doesn't happen, so I guess some interrupted
> > cleanup triggers it.
>
> I'm still investigating, but I would like to rule out one option.
>
> Could you check out the patch?
>
> diff --git a/kernel/ucount.c b/kernel/ucount.c
> index bb51849e6375..f23f906f4f62 100644
> --- a/kernel/ucount.c
> +++ b/kernel/ucount.c
> @@ -201,11 +201,14 @@ void put_ucounts(struct ucounts *ucounts)
>  {
>  	unsigned long flags;
>  
> -	if (atomic_dec_and_lock_irqsave(&ucounts->count, &ucounts_lock, flags)) {
> +	spin_lock_irqsave(&ucounts_lock, flags);
> +	if (atomic_dec_and_test(&ucounts->count)) {
>  		hlist_del_init(&ucounts->node);
>  		spin_unlock_irqrestore(&ucounts_lock, flags);
>  		kfree(ucounts);
> +		return;
>  	}
> +	spin_unlock_irqrestore(&ucounts_lock, flags);
>  }
>  
>  static inline bool atomic_long_inc_below(atomic_long_t *v, int u)
>
> ---------------------------------------------------------------------
>
> Rgrds, legion

I'm still able to reproduce the issue with the above patch, although the
situation improved a bit: now I have to close the tabs and the browser really
fast to hit it, which makes it less likely to happen during real usage.
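
For reference, here is a minimal userspace sketch of the two release patterns
the patch above switches between. It is not the kernel code: it assumes C11
atomics and a pthread mutex in place of atomic_t and ucounts_lock, and the
names struct obj, list_lock, dec_and_lock(), put_obj_fast(), put_obj_locked()
and new_obj() are hypothetical stand-ins made up for illustration. The first
variant takes the lock only when the count might drop to zero, the second
always takes the lock before decrementing.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ucounts: a refcounted, list-linked object. */
struct obj {
	atomic_int count;
	bool on_list;	/* models membership in the hash list */
};

/* Hypothetical stand-in for ucounts_lock. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Models the atomic_dec_and_lock() idea: decrement without the lock unless
 * this might be the last reference. Returns true with the lock held only
 * when the count reached zero.
 */
static bool dec_and_lock(atomic_int *count, pthread_mutex_t *lock)
{
	int old = atomic_load(count);

	/* Fast path: other references remain, so no lock is needed. */
	while (old > 1) {
		if (atomic_compare_exchange_weak(count, &old, old - 1))
			return false;
	}

	/* Slow path: possibly the last reference, decrement under the lock. */
	pthread_mutex_lock(lock);
	if (atomic_fetch_sub(count, 1) == 1)
		return true;
	pthread_mutex_unlock(lock);
	return false;
}

/* Pattern before the patch. Returns true if the object was freed. */
static bool put_obj_fast(struct obj *o)
{
	if (dec_and_lock(&o->count, &list_lock)) {
		o->on_list = false;
		pthread_mutex_unlock(&list_lock);
		free(o);
		return true;
	}
	return false;
}

/* Pattern after the patch: every decrement happens under the lock. */
static bool put_obj_locked(struct obj *o)
{
	pthread_mutex_lock(&list_lock);
	if (atomic_fetch_sub(&o->count, 1) == 1) {
		o->on_list = false;
		pthread_mutex_unlock(&list_lock);
		free(o);
		return true;
	}
	pthread_mutex_unlock(&list_lock);
	return false;
}

/* Allocate an object that starts with the given number of references. */
static struct obj *new_obj(int refs)
{
	struct obj *o = malloc(sizeof(*o));

	assert(o != NULL);
	atomic_init(&o->count, refs);
	o->on_list = true;
	return o;
}
```

With the second variant every decrement is serialized against anything else
that holds the lock, at the cost of taking the lock on every put. Neither
variant helps if a reference is dropped once too often somewhere else.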

On the other hand, the kernel logging now cuts off much earlier, after just a
few lines:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 20387 at kernel/ucount.c:256 dec_ucount+0x43/0x50
Modules linked in: ...

Jordan