Re: [RFC v1 2/4] kernel/fork.c: implement new process_mmput_async syscall

From: Eric W. Biederman
Date: Thu Nov 11 2021 - 14:20:27 EST


Claudio Imbrenda <imbrenda@xxxxxxxxxxxxx> writes:

> The goal of this new syscall is to be able to asynchronously free the
> mm of a dying process. This is especially useful for processes that use
> huge amounts of memory (e.g. databases or KVM guests). The process is
> allowed to terminate immediately, while its mm is cleaned/reclaimed
> asynchronously.
>
> A separate process needs use the process_mmput_async syscall to attach
> itself to the mm of a running target process. The process will then
> sleep until the last user of the target mm has gone.
>
> When the last user of the mm has gone, instead of synchronously free
> the mm, the attached process is awoken. The syscall will then continue
> and clean up the target mm.
>
> This solution has the advantage that the cleanup of the target mm can
> happen both be asynchronous and properly accounted for (e.g. cgroups).
>
> Tested on s390x.
>
> A separate patch will actually wire up the syscall.

I am a bit confused.

You want the process report that it has finished immediately,
and you want the cleanup work to continue on in the background.

Why do you need a separate process?

Why not just modify the process cleanup code to keep the task_struct
running while allowing waitpid to reap the process (aka allowing
release_task to run)? All tasks can be already be reaped after
exit_notify in do_exit.

I can see some reasons for wanting an opt-in. It is nice to know all of
a processes resources have been freed when waitpid succeeds.

Still I don't see why this whole thing isn't exit_mm returning
the mm_sturct when a flag is set, and then having an exit_mm_late
being called and passed the returned mm after exit_notify.

Or maybe something with schedule_work or task_work, instead of an
exit_mm_late. I don't see any practical difference.

I really don't see why this needs a whole other process to connect to
the process you care about asynchronously.

This whole thing seems an exercise in spending lots of resources to free
resources much later.

Eric