Re: [PATCH v5 08/24] sched: Introduce per memory space current virtual cpu id

From: Mathieu Desnoyers
Date: Wed Nov 09 2022 - 10:04:36 EST


On 2022-11-09 04:28, Peter Zijlstra wrote:
On Thu, Nov 03, 2022 at 04:03:43PM -0400, Mathieu Desnoyers wrote:

diff --git a/kernel/fork.c b/kernel/fork.c
index 08969f5aa38d..6a2323266942 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1047,6 +1047,10 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
tsk->reported_split_lock = 0;
#endif
+#ifdef CONFIG_SCHED_MM_VCPU
+ tsk->mm_vcpu = -1;
+ tsk->mm_vcpu_active = 0;
+#endif
return tsk;
free_stack:

Note how the above hunk does exactly the same as the below thunk, and I
think they're even on the same code-path.

How about moving all of this to __sched_fork() or something?

@@ -1579,6 +1586,7 @@ static int copy_mm(unsigned long clone_flags, struct task_struct *tsk)
tsk->mm = mm;
tsk->active_mm = mm;
+ sched_vcpu_fork(tsk);
return 0;
}

+void sched_vcpu_fork(struct task_struct *t)
+{
+ WARN_ON_ONCE((t->flags & PF_KTHREAD) || !t->mm);
+ t->mm_vcpu = -1;
+ t->mm_vcpu_active = 1;
+}

Let's look at how things are brought up in copy_process():

p = dup_task_struct(current, node);
tsk->mm_vcpu = -1;
tsk->mm_vcpu_active = 0;
­[...]

/* Perform scheduler related setup. Assign this task to a CPU. */
retval = sched_fork(clone_flags, p);

-> I presume that from this point the task is observable by the scheduler. However, tsk->mm does not point to the new mm yet.
The "mm_vcpu_active" flag == 0 prevents the scheduler from trying to poke into the wrong mm vcpu_id bitmaps across early mm-struct
lifetime (clone/fork), late in the mm-struct lifetime (exit), and across reset of the mm-struct (exec).

[...]

retval = copy_mm(clone_flags, p);
new_mm = dup_mm(old_mm)
mm_init(new_mm)
mm_init_vcpu(new_mm)
-> At this point it becomes OK for the scheduler to poke into the tsk->mm vcpu_id bitmaps. Therefore, sched_vcpu_fork() sets
mm_vcpu_active flag = 1.
sched_vcpu_fork(tsk)
-> From this point the scheduler can poke into tsk->mm's vcpu_id bitmaps.

So what I think we should to do here is to remove the extra "t->mm_vcpu = -1;" assignment from sched_vcpu_fork(), because
it has already been set by dup_task_struct. We could actually turn that into a "WARN_ON_ONCE(t->mm_vcpu != -1)".

However, if my understanding is correct, keeping "tsk->mm_vcpu_active = 0;" early in dup_task_struct and "tsk->mm_vcpu_active = 1"
in sched_vcpu_fork() after the new mm is initialized is really important. Or am I missing something ?

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com