Re: [PATCH-tip] sched: Fix use-after-free bug in dup_user_cpus_ptr()

From: Waiman Long
Date: Fri Dec 02 2022 - 09:32:09 EST


On 12/2/22 05:18, Will Deacon wrote:
On Thu, Dec 01, 2022 at 12:03:39PM -0500, Waiman Long wrote:
On 12/1/22 08:44, Will Deacon wrote:
On Sun, Nov 27, 2022 at 08:44:41PM -0500, Waiman Long wrote:
Since commit 07ec77a1d4e8 ("sched: Allow task CPU affinity to be
restricted on asymmetric systems"), the setting and clearing of
user_cpus_ptr are done under pi_lock for arm64 architecture. However,
dup_user_cpus_ptr() accesses user_cpus_ptr without any lock
protection. When racing with the clearing of user_cpus_ptr in
__set_cpus_allowed_ptr_locked(), it can lead to use-after-free and
double-free in arm64 kernel.

Commit 8f9ea86fdf99 ("sched: Always preserve the user requested
cpumask") fixes this problem as user_cpus_ptr, once set, will never
be cleared in a task's lifetime. However, this bug was re-introduced
in commit 851a723e45d1 ("sched: Always clear user_cpus_ptr in
do_set_cpus_allowed()") which allows the clearing of user_cpus_ptr in
do_set_cpus_allowed(). This time, it will affect all arches.

Fix this bug by always clearing the user_cpus_ptr of the newly
cloned/forked task before the copying process starts and checking the
user_cpus_ptr state of the source task under pi_lock.

Note to stable: this patch won't apply to stable releases as is.
Just copy the new dup_user_cpus_ptr() function over.

Fixes: 07ec77a1d4e8 ("sched: Allow task CPU affinity to be restricted on asymmetric systems")
Fixes: 851a723e45d1 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()")
CC: stable@xxxxxxxxxxxxxxx
Reported-by: David Wang 王标 <wangbiao3@xxxxxxxxxx>
Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
---
kernel/sched/core.c | 32 ++++++++++++++++++++++++++++----
1 file changed, 28 insertions(+), 4 deletions(-)
As per my comments on the previous version of this patch:

https://lore.kernel.org/lkml/20221201133602.GB28489@willie-the-truck/T/#t

I think there are other issues to fix when racing affinity changes with
fork() too.
It is certainly possible that there are other bugs hiding somewhere:-)
Right, but I actually took the time to hit the same race for the other
affinity mask field so it seems a bit narrow-minded for us just to fix the
one issue.

I focused on this particular one because of a double-free bug report from
David. What other fields have you found to be subject to data races?


diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8df51b08bb38..f2b75faaf71a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2624,19 +2624,43 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src,
int node)
{
+ cpumask_t *user_mask;
unsigned long flags;
+ /*
+ * Always clear dst->user_cpus_ptr first as their user_cpus_ptr's
+ * may differ by now due to racing.
+ */
+ dst->user_cpus_ptr = NULL;
+
+ /*
+ * This check is racy and losing the race is a valid situation.
+ * It is not worth the extra overhead of taking the pi_lock on
+ * every fork/clone.
+ */
if (!src->user_cpus_ptr)
return 0;
data_race() ?
A race is certainly possible, but clearing dst->user_cpus_ptr beforehand
will mitigate any risk.
Sorry, I meant let's wrap this access in the data_race() macro and add a
comment so that KCSAN won't report the false positive.

Good point. I should have done that.
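
For reference, here is a rough userspace sketch of the pattern discussed above: clear the destination pointer first, do the unlocked check through data_race(), then recheck under the lock before copying. It is not the kernel code. The data_race() macro here is a no-op stand-in for the kernel's KCSAN annotation, a pthread mutex models pi_lock, malloc()/free() model the kernel allocator, and struct task, the simplified cpumask_t and the name dup_user_cpus_ptr_sketch() are all hypothetical. The allocate-then-recheck part is assumed from the commit message, since the quoted diff stops at the early return.

```c
#include <pthread.h>
#include <stdlib.h>

/* Stand-in only: in the kernel, data_race() marks an intentional race for KCSAN. */
#define data_race(expr) (expr)

/* Hypothetical simplified CPU mask, not the kernel's cpumask_t. */
typedef struct { unsigned long bits[4]; } cpumask_t;

/* Hypothetical model of the two task_struct fields involved. */
struct task {
	pthread_mutex_t pi_lock;	/* models task_struct::pi_lock */
	cpumask_t *user_cpus_ptr;
};

int dup_user_cpus_ptr_sketch(struct task *dst, struct task *src)
{
	cpumask_t *user_mask;

	/* dst started as a copy of src, so drop the inherited pointer first. */
	dst->user_cpus_ptr = NULL;

	/*
	 * Racy on purpose: losing the race is a valid outcome, and this
	 * avoids taking pi_lock on every fork/clone.
	 */
	if (!data_race(src->user_cpus_ptr))
		return 0;

	/* Allocate outside the lock. */
	user_mask = malloc(sizeof(*user_mask));
	if (!user_mask)
		return -1;	/* the kernel version would return -ENOMEM */

	/* Recheck under the lock; src may have been cleared meanwhile. */
	pthread_mutex_lock(&src->pi_lock);
	if (src->user_cpus_ptr) {
		*user_mask = *src->user_cpus_ptr;
		dst->user_cpus_ptr = user_mask;
		user_mask = NULL;	/* ownership moved to dst */
	}
	pthread_mutex_unlock(&src->pi_lock);

	free(user_mask);	/* no-op if the mask was handed to dst */
	return 0;
}
```

With this ordering, dst never keeps a pointer it shares with src, so a concurrent clear on the source side cannot cause a double free through dst.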

Thanks,
Longman