Re: [PATCH-tip] sched: Don't call kfree() in do_set_cpus_allowed()

From: Waiman Long
Date: Tue Nov 22 2022 - 14:32:11 EST


On 11/22/22 14:24, Peter Zijlstra wrote:
On Tue, Nov 22, 2022 at 10:23:43AM -0500, Waiman Long wrote:
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 78b2d5cabcc5..5fac4aa6ac7f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2593,6 +2593,11 @@ __do_set_cpus_allowed(struct task_struct *p, struct affinity_context *ctx)
                set_next_task(rq, p);
 }

+union cpumask_rcuhead {
+       void *cpumask;
+       struct rcu_head rcu;
+};
+
Hehe; I had this union too; I just figured it'd be nice to not have to
spend these 4 lines to express this. Esp. since we're casting pointers
*anyway*.

Well, that is true. As long as the NULL check is there, I am OK with calling kvfree_call_rcu() directly if Paul doesn't object.

 /*
  * Used for kthread_bind() and select_fallback_rq(), in both cases the user
  * affinity (if any) should be destroyed too.
@@ -2606,7 +2611,12 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
        };

        __do_set_cpus_allowed(p, &ac);
-       kfree(ac.user_mask);
+       /*
+        * Because this is called with p->pi_lock held, it is not possible
+        * to use kfree() here (when PREEMPT_RT=y), therefore punt to using
+        * kfree_rcu().
+        */
+       kfree_rcu((union cpumask_rcuhead *)ac.user_mask, rcu);
 }

 int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src,
@@ -8196,7 +8206,7 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
        struct affinity_context ac;
        struct cpumask *user_mask;
        struct task_struct *p;
-       int retval;
+       int retval, size;

        rcu_read_lock();

@@ -8229,7 +8239,11 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
        if (retval)
                goto out_put_task;

-       user_mask = kmalloc(cpumask_size(), GFP_KERNEL);
+       /*
+        * See do_set_cpus_allowed() for the rcu_head usage.
+        */
+       size = max_t(int, cpumask_size(), sizeof(union cpumask_rcuhead));
+       user_mask = kmalloc(size, GFP_KERNEL);
        if (!user_mask) {
                retval = -ENOMEM;
                goto out_put_task;

We also should fix the allocation in dup_user_cpus_ptr() -- perhaps pull
the thing into a helper.

I had just sent out a new patch to fix that before I saw your email. I did forget to put -tip in the subject line, though.
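
For illustration only, the sizing rule and the "pull it into a helper" idea discussed above can be sketched in plain userspace C. This is not the kernel code and not the follow-up patch: struct rcu_head_sketch, cpumask_size_sketch(), user_mask_alloc_size() and alloc_user_mask_sketch() are hypothetical stand-ins, and calloc() stands in for kmalloc(). The point it shows is that every user mask allocation has to be at least as large as the union, so that the buffer can later be reinterpreted as an rcu_head when it is freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct rcu_head (two pointers). */
struct rcu_head_sketch {
	struct rcu_head_sketch *next;
	void (*func)(struct rcu_head_sketch *head);
};

/* Same shape as the union in the patch: the buffer holds a mask or an rcu_head. */
union cpumask_rcuhead_sketch {
	void *cpumask;
	struct rcu_head_sketch rcu;
};

/* Assumed model of cpumask_size(): a bitmap of nr_cpus bits in whole longs. */
static size_t cpumask_size_sketch(unsigned int nr_cpus)
{
	size_t bits_per_long = 8 * sizeof(unsigned long);

	return ((nr_cpus + bits_per_long - 1) / bits_per_long) *
	       sizeof(unsigned long);
}

/* The allocation must never be smaller than the union it is later cast to. */
static size_t user_mask_alloc_size(unsigned int nr_cpus)
{
	size_t mask = cpumask_size_sketch(nr_cpus);
	size_t head = sizeof(union cpumask_rcuhead_sketch);

	return mask > head ? mask : head;
}

/* Hypothetical shared helper, so that every allocation site uses the same size. */
static void *alloc_user_mask_sketch(unsigned int nr_cpus)
{
	return calloc(1, user_mask_alloc_size(nr_cpus));
}
```

With a small CPU count the bitmap is shorter than the union, so the union size wins; with a large count the bitmap size wins. Routing both sched_setaffinity() and dup_user_cpus_ptr() through one helper of this kind would keep the two allocation sites from drifting apart.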

Cheers,
Longman