Re: [PATCH] pid: Add the judgment of whether ns is NULL in the find_pid_ns

From: Christian Brauner
Date: Wed Jul 26 2023 - 05:27:41 EST


On Wed, Jul 26, 2023 at 09:23:13AM +0800, Xuewen Yan wrote:
> On Tue, Jul 25, 2023 at 8:47 PM Christian Brauner <brauner@xxxxxxxxxx> wrote:
> >
> > On Tue, Jul 25, 2023 at 08:24:18PM +0800, Xuewen Yan wrote:
> > > On Tue, Jul 25, 2023 at 4:49 PM Christian Brauner <brauner@xxxxxxxxxx> wrote:
> > > >
> > > > On Thu, Jul 13, 2023 at 03:17:13PM +0800, Xuewen Yan wrote:
> > > > > There is no the judgment of whether namspace is NULL in find_pid_ns.
> > > > > But there is a corner case when ns is null, for example: if user
> > > > > call find_get_pid when current is in exiting, the following stack would
> > > > > set thread_id be null:
> > > > > release_task
> > > > > __exit_signal(p);
> > > > > __unhash_process(tsk, group_dead);
> > > > > detach_pid(p, PIDTYPE_PID);
> > > > > __change_pid(task, type, NULL);
> > > > >
> > > > > If user call find_get_pid at now, in find_vpid function, the
> > > >
> > > > I fail to see how this can happen. The code you're referencing is in
> > > > release_task(). If current has gone through that then current obviously
> > > > can't call find_vpid() on itself anymore or anything else for that
> > > > matter.
> > >
> > > This happened when user calls find_vpid() in irq.
> > >
> > > [72117.635162] Call trace:
> > > [72117.635595] idr_find+0xc/0x24
> > > [72117.636103] find_get_pid+0x40/0x68
> > > [72117.636812] send_event+0x88/0x180 [demux]
> > > [72117.637593] vbvop_copy_data+0x150/0x344 [demux]
> > > [72117.638434] dmisr_video_parsing_mpeg12+0x29c/0x42c [demux]
> > > [72117.639393] dmisr_video_parsing_switch+0x68/0xec [demux]
> > > [72117.640332] dmisr_handle_video_pes+0x10c/0x26c [demux]
> > > [72117.641108] tasklet_action_common+0x130/0x224
> > > [72117.641784] tasklet_action+0x28/0x34
> > > [72117.642366] __do_softirq+0x128/0x4dc
> > > [72117.642944] irq_exit+0xf8/0xfc
> > > [72117.643459] __handle_domain_irq+0xb0/0x108
> > > [72117.644102] gic_handle_irq+0x6c/0x124
> > > [72117.644691] el1_irq+0x108/0x200
> > > [72117.645217] _raw_write_unlock_irq+0x2c/0x5c
> > > [72117.645870] release_task+0x144/0x1ac <<<<<<
> > > [72117.646447] do_exit+0x524/0x94c
> > > [72117.646970] __do_sys_exit_group+0x0/0x14
> > > [72117.647591] do_group_exit+0x0/0xa0
> > > [72117.648146] __se_sys_exit+0x0/0x20
> > > [72117.648704] el0_svc_common+0xcc/0x1bc
> > > [72117.649292] el0_svc_handler+0x2c/0x3c
> > > [72117.649881] el0_svc+0x8/0xc
> > >
> > > In release_task, write_unlock_irq(&tasklist_lock) will open irq, at
> > > this time, if user calls find_get_pid() in irq, because
> > > current->thread_id is NULL,
> > > it will handle the NULL pointer.
> >
> > Uhm, where is that code from? This doesn't seem to be upstream.
>
> It's from our own platform, we found someone called find_get_pid() in
> the module, and caused the crash.

So this is a bug report for an out of tree driver which I'm sure you're
aware we consider mostly irrelevant unless this is an upstream issue.

Please work around or better fix this in your out of tree driver or
please show a reproducer how this can happen on upstream kernels.

Otherwise I don't see why we'd care.