[PATCH v2] KVM: Move VM's worker kthreads back to the original cgroups before exiting

From: Vipin Sharma
Date: Wed Dec 22 2021 - 17:53:56 EST


VM worker kthreads can linger in the VM process's cgroup for some time
after KVM terminates the VM process.

KVM terminates the worker kthreads by calling kthread_stop(), which
waits on the 'exited' completion that is triggered by exit_mm(), via
mm_release(), during the kthread's exit. However, these kthreads are
removed from the cgroup by cgroup_exit(), which runs after exit_mm(). A
VM process can terminate in the window between exit_mm() and
cgroup_exit(), leaving only worker kthreads in the cgroup.

Moving the worker kthreads back to the original cgroup (kthreadd_task's
cgroup) makes sure that the VM process's cgroup is empty as soon as the
main VM process is terminated.

kthreadd_task is not an exported symbol, which causes build errors if
KVM is built as a loadable module. Both users of
cgroup_attach_task_all() (kvm_main and vhost) have the same issue, so
the API now uses kthreadd_task by default when it is called with a NULL
@from argument.

Signed-off-by: Vipin Sharma <vipinsh@xxxxxxxxxx>
---

v2:
- Use kthreadd_task in the cgroup API to avoid the build issue.

v1: https://lore.kernel.org/lkml/20211214050708.4040200-1-vipinsh@xxxxxxxxxx/
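
For reviewers, the ordering described in the commit message can be
sketched as a small user-space model. This is only an illustration
under stated assumptions, not kernel code: struct model_task, the
model_*() helpers, cgroup_seen_by_stopper() and the cgroup ids are
hypothetical names standing in for the exit_mm(), cgroup_exit() and
cgroup_attach_task_all() paths.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of the exit ordering; not kernel code. */
enum { ROOT_CGROUP = 0, VM_CGROUP = 1 };

struct model_task {
	bool exited_signaled;	/* stands in for the 'exited' completion */
	int cgroup;		/* stands in for cgroup membership */
};

/* Models exit_mm() -> mm_release(): wakes the kthread_stop() waiter. */
static void model_exit_mm(struct model_task *t)
{
	t->exited_signaled = true;
}

/* Models cgroup_exit(): the task finally leaves its cgroup. */
static void model_cgroup_exit(struct model_task *t)
{
	t->cgroup = ROOT_CGROUP;
}

/* Models cgroup_attach_task_all(NULL, current) from this patch. */
static void model_attach_back(struct model_task *t)
{
	t->cgroup = ROOT_CGROUP;
}

/*
 * Returns the cgroup that the kthread_stop() caller can observe the
 * worker in at the earliest point where kthread_stop() may return,
 * i.e. after the completion fires but before cgroup_exit() has run.
 */
static int cgroup_seen_by_stopper(bool move_back_first)
{
	struct model_task worker = { false, VM_CGROUP };
	int seen;

	if (move_back_first)
		model_attach_back(&worker);

	model_exit_mm(&worker);
	assert(worker.exited_signaled);
	seen = worker.cgroup;	/* the stopper may run here */

	model_cgroup_exit(&worker);
	return seen;
}
```

In this model, cgroup_seen_by_stopper(false) reports the VM's cgroup,
which corresponds to the lingering kthread, while
cgroup_seen_by_stopper(true) reports the root cgroup because the move
happens before the completion is signaled.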

kernel/cgroup/cgroup-v1.c | 5 +++++
virt/kvm/kvm_main.c | 15 ++++++++++++++-
2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index 81c9e0685948..81d4b2f2acf0 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -51,6 +51,8 @@ bool cgroup1_ssid_disabled(int ssid)
* @from: attach to all cgroups of a given task
* @tsk: the task to be attached
*
+ * If @from is NULL then use kthreadd_task for finding the destination cgroups.
+ *
* Return: %0 on success or a negative errno code on failure
*/
int cgroup_attach_task_all(struct task_struct *from, struct task_struct *tsk)
@@ -58,6 +60,9 @@ int cgroup_attach_task_all(struct task_struct *from, struct task_struct *tsk)
struct cgroup_root *root;
int retval = 0;

+ if (!from)
+ from = kthreadd_task;
+
mutex_lock(&cgroup_mutex);
percpu_down_write(&cgroup_threadgroup_rwsem);
for_each_root(root) {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b0f7e6eb00ff..f7504578c374 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5785,7 +5785,7 @@ static int kvm_vm_worker_thread(void *context)
init_context = NULL;

if (err)
- return err;
+ goto out;

/* Wait to be woken up by the spawner before proceeding. */
kthread_parkme();
@@ -5793,6 +5793,19 @@ static int kvm_vm_worker_thread(void *context)
if (!kthread_should_stop())
err = thread_fn(kvm, data);

+out:
+ /*
+ * We need to move the kthread back to its original cgroups, so that it
+ * doesn't linger in the cgroups of the user process after the user
+ * process has already terminated.
+ *
+ * kthread_stop() waits on 'exited' completion condition which is set
+ * in exit_mm(), via mm_release(), in do_exit(). However, kthread
+ * is removed from cgroups in the cgroup_exit() which is called after
+ * exit_mm(). This causes lingering of kthreads in cgroups after main
+ * VM process has finished.
+ */
+ WARN_ON(cgroup_attach_task_all(NULL, current));
return err;
}


base-commit: 5e4e84f1124aa02643833b7ea40abd5a8e964388
--
2.34.1.307.g9b7440fafd-goog