RE: 答复: [External Mail]Re: [PATCH 1/1] sched: fix user_mask double free

From: Wenjie Li (Evan)
Date: Thu Nov 24 2022 - 07:07:00 EST


Hi, Waiman.

"The clearing of user_cpus_ptr is protected by pi_lock. IOW, racing between dup_user_cpus_ptr() and do_set_cpus_allowed is not possible and double free like what you have suggested should not happen." We still can understand why it is impossible to happen. Because we indeed met this issue. Following is we got from ftrace.

1. Task A pid 27961 run on core6 and is forking/cloning task pid 28051, and task B pid 28051 will copy task struct data from task A pid 27961. So task A p->user_cpus_ptr = ffffff884fbf9200 is equal to task B p->user_cpus_ptr=ffffff884fbf9200 through arch_dup_task_struct.

2. Then core6 met hotplug which will task migration/6 pid 61 preempt schedule task A 27961
Trac log: rso-inner-27961 [006] 56114.972822: sched_switch: rso-inner:27961 [139] R ==> migration/6:61 [0]

3. Then task migration/6 pid 61 migrate task A 27961 from core6 to core0 and clear task A 27961 p->user_cpus_ptr through this process: migrate_tasks -> select_fallback_rq -> do_set_cpus_allowed -> free user_cpus_ptr and change task A 27961 user_cpus_ptr = NULL.
Trace log: migration/6-61 [006] 56114.972937: bprint: do_set_cpus_allowed: do_set_cpus_allowed: p->comm:rso-inner pid:27961, maskp:0 ac.user_mask:ffffff884fbf9200

4. Then task A pid 27961 enqueue on core0 and run on core0.
Ttrace log : binder:27524_5-27775 [000] 56114.973592: sched_switch: binder:27524_5:27775 [120] S ==> rso-inner:27961 [139]

5. Then task A pid 27961 call dup_user_cpus_ptr which found src->user_cpus_ptr is NULL, So It will directly return and will not allocate task B pid 28051's user_cpus_ptr. So task B pid 28051's user_cpus_ptr still is ffffff884fbf9200.

6. Then task A pid 27961 wake up task B28051 in this process "kernel_clone-> wake_up_new_task-> select_fallback_rq -> do_set_cpus_allowed" which will call do_set_cpus_allowed again will double free ffffff884fbf9200:
Trace log: rso-inner-27961 [000] 56114.973966: bprint: do_set_cpus_allowed: do_set_cpus_allowed: p->comm:rso-inner pid:28051, maskp:0 ac.user_mask:ffffff884fbf9200


Thanks

-----Original Message-----
From: Waiman Long <longman@xxxxxxxxxx>
Sent: Thursday, November 24, 2022 12:00 PM
To: David Wang 王标 <wangbiao3@xxxxxxxxxx>; Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: mingo@xxxxxxxxxx; juri.lelli@xxxxxxxxxx; vincent.guittot@xxxxxxxxxx; brauner@xxxxxxxxxx; bsegall@xxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Wenjie Li (Evan) <wenjieli@xxxxxxxxxxxxxxxx>; 陈冠有 <chenguanyou@xxxxxxxxxx>; Will Deacon <will@xxxxxxxxxx>; Ting11 Wang 王婷 <wangting11@xxxxxxxxxx>
Subject: Re: 答复: [External Mail]Re: [PATCH 1/1] sched: fix user_mask double free

WARNING: This email originated from outside of Qualcomm. Please be wary of any links or attachments, and do not enable macros.

On 11/23/22 21:37, David Wang 王标 wrote:
>
> Dear Waiman,
>
> Yes, we have read
> https://lore.kernel.org/lkml/20221123131612.914906-1-longman@xxxxxxxxx
> m/
> and checked it carefully.
>
> We mean dup_user_cpus_ptr should not judge if the src->user_cpus_ptr
> is null at the entry of the Function dup_user_cpus_ptr.
> If do this when user_cpus_ptr is freed by othe thread, but the parent
> task copy the user_cpus_ptr data for new task through dup_task_struct
> <https://opengrok.qualcomm.com/source/s?refs=dup_task_struct&project=K
> ERNEL.PLATFORM.2.0>
> -> arch_dup_task_struct
> <https://opengrok.qualcomm.com/source/xref/KERNEL.PLATFORM.2.0/kernel_
> platform/common/kernel/fork.c#arch_dup_task_struct>(tsk
> <https://opengrok.qualcomm.com/source/s?defs=tsk&project=KERNEL.PLATFO
> RM.2.0>,
> orig
> <https://opengrok.qualcomm.com/source/s?defs=orig&project=KERNEL.PLATF
> ORM.2.0>)
> before the user_cpus_ptr
> ,
> is freed, ,next , when dup_task_struct
> <https://opengrok.qualcomm.com/source/s?refs=dup_task_struct&project=K
> ERNEL.PLATFORM.2.0>
> call dup_user_cpus_ptr
> <https://opengrok.qualcomm.com/source/s?defs=dup_user_cpus_ptr&project
> =KERNEL.PLATFORM.2.0>(tsk
> <https://opengrok.qualcomm.com/source/s?defs=tsk&project=KERNEL.PLATFO
> RM.2.0>,
> orig
> <https://opengrok.qualcomm.com/source/s?defs=orig&project=KERNEL.PLATF
> ORM.2.0>,
> node
> <https://opengrok.qualcomm.com/source/s?defs=node&project=KERNEL.PLATF
> ORM.2.0>),it
> will return directly
> without doing nothing. When wake up new task , then call
> select_fallback_rqàdo_set_cpus_allowed,
> it will meet slub double free issue. Then new path can not fix issue
> in this scenario.
> void do_set_cpus_allowed(struct task_struct *p, const struct cpumask
> *new_mask)
> {
> struct affinity_context ac = {
> .new_mask = new_mask,
> .user_mask = NULL,
> .flags = SCA_USER, /* clear the user requested
> mask */
> };
> __do_set_cpus_allowed(p, &ac);
> kfree(ac.user_mask);
> }
> Kfree kfree(ac.user_mask) cause double free issue. New patch just
> cover the user_cpus_ptr is freed after code running into
> raw_spin_lock_irqsave, if it can not enter into pi_lock critical
> section, what will happen.
>
> Maybe should delte following code at the entry of fuction . Please
> help check it.
>
> - if (!src->user_cpus_ptr) //delete this
>
> - return 0; //delete this
>
> We think maybe path needs a little more modification like following :
>
> kernel/sched/core.c
> <https://lore.kernel.org/lkml/20221123131612.914906-1-longman@redhat.c
> om/#Z31kernel:sched:core.c>
> | 23 +++++++++++++++++++++--
>
> 1 file changed
> <https://lore.kernel.org/lkml/20221123131612.914906-1-longman@redhat.c
> om/#related>,
> 21 insertions(+), 2 deletions(-)
>
> diff
> <https://lore.kernel.org/lkml/20221123131612.914906-1-longman@redhat.c
> om/#iZ31kernel:sched:core.c> --git a/kernel/sched/core.c
> b/kernel/sched/core.c
>
> index 8df51b08bb38..6b259d9e127a 100644
>
> --- a/kernel/sched/core.c
>
> +++ b/kernel/sched/core.c
>
> @@ -2624,8 +2624,14 @@ void do_set_cpus_allowed(struct task_struct *p,
> const struct cpumask *new_mask)
>
> int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct
> *src,
>
> int node)
>
> {
>
> + cpumask_t *user_mask = NULL;
>
> unsigned long flags;
>
> + /*
>
> + * This check is racy and losing the race is a valid situation.
>
> + * It is not worth the extra overhead of taking the pi_lock on
>
> + * every fork/clone.
>
> + */
>
> - if (!src->user_cpus_ptr) //delete this
>
> - return 0; //delete this
>
The clearing of user_cpus_ptr is protected by pi_lock. IOW, racing between dup_user_cpus_ptr() and do_set_cpus_allowed is not possible and double free like what you have suggested should not happen. Yes, the user_cpus_ptr check here is racy. The worst case result is that a user_cpus_ptr has just been set in the task to be cloned, but it fail to copy over the user mask. With or without the check, the race can happen.
The check is an optimization. Its effect is just make one outcome more likely than the other.

Cheers,
Longman