Re: [PATCH 2/2] mm,fork: introduce MADV_WIPEONFORK

From: Andrew Morton
Date: Tue Aug 15 2017 - 18:51:23 EST


On Fri, 11 Aug 2017 17:28:29 -0400 riel@xxxxxxxxxx wrote:

> From: Rik van Riel <riel@xxxxxxxxxx>
>
> Introduce MADV_WIPEONFORK semantics, which result in a VMA being
> empty in the child process after fork. This differs from MADV_DONTFORK
> in one important way.
>
> If a child process accesses memory that was MADV_WIPEONFORK, it
> will get zeroes. The address ranges are still valid, they are just empty.
>
> If a child process accesses memory that was MADV_DONTFORK, it will
> get a segmentation fault, since those address ranges are no longer
> valid in the child after fork.
>
> Since MADV_DONTFORK also seems to be used to allow very large
> programs to fork in systems with strict memory overcommit restrictions,
> changing the semantics of MADV_DONTFORK might break existing programs.
>
> MADV_WIPEONFORK only works on private, anonymous VMAs.
>
> The use case is libraries that store or cache information, and
> want to know that they need to regenerate it in the child process
> after fork.
>
> Examples of this would be:
> - systemd/pulseaudio API checks (fail after fork)
> (replacing a getpid check, which is too slow without a PID cache)
> - PKCS#11 API reinitialization check (mandated by specification)
> - glibc's upcoming PRNG (reseed after fork)
> - OpenSSL PRNG (reseed after fork)
>
> The security benefits of a forking server having a re-initialized
> PRNG in every child process are pretty obvious. However, due to
> libraries having all kinds of internal state, and programs getting
> compiled with many different versions of each library, it is
> unreasonable to expect calling programs to re-initialize everything
> manually after fork.
>
> A further complication is the proliferation of clone flags,
> programs bypassing glibc's functions to call clone directly,
> and programs calling unshare, causing the glibc pthread_atfork
> hook to not get called.
>
> It would be better to have the kernel take care of this automatically.

I'll add "The patch also adds MADV_KEEPONFORK, to undo the effects of a
prior MADV_WIPEONFORK." here.

I guess it isn't worth mentioning that these things can cause VMA
merges and splits.
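
For anyone wanting to poke at the semantics described in the changelog,
a minimal userspace sketch might look like the following. It is not part
of the patch. The function name wipeonfork_demo() is made up for
illustration, and the fallback value 18 for MADV_WIPEONFORK is an
assumption that holds for the asm-generic headers but may differ on some
architectures.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MADV_WIPEONFORK
#define MADV_WIPEONFORK 18	/* assumed asm-generic value; check your headers */
#endif

/*
 * Hypothetical helper: returns 1 if the child saw only zeroes while the
 * parent kept its data, 0 if not, and -1 if setup failed or the kernel
 * rejected MADV_WIPEONFORK (e.g. a kernel without this patch).
 */
int wipeonfork_demo(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	unsigned char *p;
	int status, child_ok, parent_ok;
	pid_t pid;
	size_t i;

	/* Private, anonymous memory: the only kind the patch accepts. */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	memset(p, 0xAA, len);

	if (madvise(p, len, MADV_WIPEONFORK) != 0) {
		munmap(p, len);
		return -1;
	}

	pid = fork();
	if (pid < 0) {
		munmap(p, len);
		return -1;
	}

	if (pid == 0) {
		/* Child: the range is still mapped, but should read as zeroes. */
		for (i = 0; i < len; i++)
			if (p[i] != 0)
				_exit(1);
		_exit(0);
	}

	if (waitpid(pid, &status, 0) < 0) {
		munmap(p, len);
		return -1;
	}

	child_ok = WIFEXITED(status) && WEXITSTATUS(status) == 0;
	/* Parent: contents must be untouched by the fork. */
	parent_ok = p[0] == 0xAA && p[len - 1] == 0xAA;

	munmap(p, len);
	return child_ok && parent_ok;
}
```

Calling wipeonfork_demo() from a small main() and printing the result is
enough to see the difference from MADV_DONTFORK, where the child's read
would fault instead of returning zeroes.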

> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -80,6 +80,17 @@ static long madvise_behavior(struct vm_area_struct *vma,
>  		}
>  		new_flags &= ~VM_DONTCOPY;
>  		break;
> +	case MADV_WIPEONFORK:
> +		/* MADV_WIPEONFORK is only supported on anonymous memory. */
> +		if (vma->vm_file || vma->vm_flags & VM_SHARED) {
> +			error = -EINVAL;
> +			goto out;
> +		}
> +		new_flags |= VM_WIPEONFORK;
> +		break;
> +	case MADV_KEEPONFORK:
> +		new_flags &= ~VM_WIPEONFORK;
> +		break;
>  	case MADV_DONTDUMP:
>  		new_flags |= VM_DONTDUMP;
>  		break;

It seems odd to permit MADV_KEEPONFORK against other-than-anon vmas?
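
The asymmetry can be probed from userspace with something like the
sketch below. It is only an illustration: advise_shared_mapping() is a
made-up helper, the fallback values 18 and 19 are assumed from the
asm-generic headers, and a MAP_SHARED | MAP_ANONYMOUS mapping is used as
a convenient stand-in for an other-than-anon vma because it has
VM_SHARED set.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_WIPEONFORK
#define MADV_WIPEONFORK 18	/* assumed asm-generic value */
#endif
#ifndef MADV_KEEPONFORK
#define MADV_KEEPONFORK 19	/* assumed asm-generic value */
#endif

/*
 * Hypothetical helper: apply "advice" to a fresh shared mapping.
 * Returns 0 if madvise() succeeded, the errno value if it failed,
 * or -1 if the mapping itself could not be created.
 */
int advise_shared_mapping(int advice)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	void *p;
	int ret;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* Capture errno before munmap() has a chance to change it. */
	ret = madvise(p, len, advice) == 0 ? 0 : errno;

	munmap(p, len);
	return ret;
}
```

With the hunk as posted, advise_shared_mapping(MADV_WIPEONFORK) would be
expected to return EINVAL, while advise_shared_mapping(MADV_KEEPONFORK)
would return 0 because that case performs no vma check. A kernel without
the patch returns EINVAL for both, since neither advice value is known.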