Re: [PATCH] pid: Add the judgment of whether ns is NULL in the find_pid_ns

From: Xuewen Yan
Date: Wed Jul 26 2023 - 21:22:06 EST


On Wed, Jul 26, 2023 at 5:25 PM Christian Brauner <brauner@xxxxxxxxxx> wrote:
>
> On Wed, Jul 26, 2023 at 09:23:13AM +0800, Xuewen Yan wrote:
> > On Tue, Jul 25, 2023 at 8:47 PM Christian Brauner <brauner@xxxxxxxxxx> wrote:
> > >
> > > On Tue, Jul 25, 2023 at 08:24:18PM +0800, Xuewen Yan wrote:
> > > > On Tue, Jul 25, 2023 at 4:49 PM Christian Brauner <brauner@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Thu, Jul 13, 2023 at 03:17:13PM +0800, Xuewen Yan wrote:
> > > > > > There is no the judgment of whether namspace is NULL in find_pid_ns.
> > > > > > But there is a corner case when ns is null, for example: if user
> > > > > > call find_get_pid when current is in exiting, the following stack would
> > > > > > set thread_id be null:
> > > > > > release_task
> > > > > > __exit_signal(p);
> > > > > > __unhash_process(tsk, group_dead);
> > > > > > detach_pid(p, PIDTYPE_PID);
> > > > > > __change_pid(task, type, NULL);
> > > > > >
> > > > > > If user call find_get_pid at now, in find_vpid function, the
> > > > >
> > > > > I fail to see how this can happen. The code you're referencing is in
> > > > > release_task(). If current has gone through that then current obviously
> > > > > can't call find_vpid() on itself anymore or anything else for that
> > > > > matter.
> > > >
> > > > This happened when user calls find_vpid() in irq.
> > > >
> > > > [72117.635162] Call trace:
> > > > [72117.635595] idr_find+0xc/0x24
> > > > [72117.636103] find_get_pid+0x40/0x68
> > > > [72117.636812] send_event+0x88/0x180 [demux]
> > > > [72117.637593] vbvop_copy_data+0x150/0x344 [demux]
> > > > [72117.638434] dmisr_video_parsing_mpeg12+0x29c/0x42c [demux]
> > > > [72117.639393] dmisr_video_parsing_switch+0x68/0xec [demux]
> > > > [72117.640332] dmisr_handle_video_pes+0x10c/0x26c [demux]
> > > > [72117.641108] tasklet_action_common+0x130/0x224
> > > > [72117.641784] tasklet_action+0x28/0x34
> > > > [72117.642366] __do_softirq+0x128/0x4dc
> > > > [72117.642944] irq_exit+0xf8/0xfc
> > > > [72117.643459] __handle_domain_irq+0xb0/0x108
> > > > [72117.644102] gic_handle_irq+0x6c/0x124
> > > > [72117.644691] el1_irq+0x108/0x200
> > > > [72117.645217] _raw_write_unlock_irq+0x2c/0x5c
> > > > [72117.645870] release_task+0x144/0x1ac <<<<<<
> > > > [72117.646447] do_exit+0x524/0x94c
> > > > [72117.646970] __do_sys_exit_group+0x0/0x14
> > > > [72117.647591] do_group_exit+0x0/0xa0
> > > > [72117.648146] __se_sys_exit+0x0/0x20
> > > > [72117.648704] el0_svc_common+0xcc/0x1bc
> > > > [72117.649292] el0_svc_handler+0x2c/0x3c
> > > > [72117.649881] el0_svc+0x8/0xc
> > > >
> > > > In release_task, write_unlock_irq(&tasklist_lock) will open irq, at
> > > > this time, if user calls find_get_pid() in irq, because
> > > > current->thread_id is NULL,
> > > > it will handle the NULL pointer.
> > >
> > > Uhm, where is that code from? This doesn't seem to be upstream.
> >
> > It's from our own platform, we found someone called find_get_pid() in
> > the module, and caused the crash.
>
> So this is a bug report for an out of tree driver which I'm sure you're
> aware we consider mostly irrelevant unless this is an upstream issue.
>
> Please work around or better fix this in your out of tree driver or
> please show a reproducer how this can happen on upstream kernels.
>
> Otherwise I don't see why we'd care.

Okay, Thanks a lot!

BR