Re: Fwd: Memory corruption in multithreaded user space program while calling fork

From: Linus Torvalds
Date: Sat Jul 08 2023 - 14:05:17 EST


On Sat, 8 Jul 2023 at 10:39, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> That was the v1 fix, but after some discussion
> (https://lkml.kernel.org/r/20230705063711.2670599-1-surenb@xxxxxxxxxx)
> it was decided to take the "excessive" approach.

That makes absolutely _zero_ sense.

It seems to be complete voodoo programming.

To some degree I don't care what happens in stable kernels, but
there's no way we'll do that kind of thing in mainline without some
logic or reason, when it makes no sense.

flush_cache_dup_mm() is entirely irrelevant to the whole issue, for
several reason, but the core one being that it only matters on broken
virtually indexed caches, so none of the architectures that do per-vma
locking.

And the argument that "After the mmap_write_lock_killable(), there
will still be a period where page faults can happen" may be true
(that's kind of the *point* of per-vma locking), but it's irrelevant.

It's true for *all* users of mmap_write_lock_killable, whether in fork
or anywhere else. What makes fork() so magically special?

It's why we have that vma_start_write(), to say "I'm now modifying
*this* vma, so stop accessing it in parallel".

Because no, flush_cache_dup_mm() is not the magical reason to do that thing.

Maybe there is something else going on, but no, we don't write crazy
code without a reason for it. That's completely unmaintainable,
because people will look at that code, not understand it (because
there is nothing to understand) and be afraid to touch it. For no
actual reason.

The obvious place to say "I'm now starting to modify the vma" is when
you actually start to modify the vma.

> Also, this change needs a couple more updates:

Those updates seem sane, and come with explanations of why they exist.
Looks fine to me.

Suren, please send me the proper fixes. Not the voodoo one. The ones
you can explain.

And if stable wants to do something else, then that's fine. But for
the development kernel,. we have two options:

- fix the PER_VMA_LOCK code

- decide that it's not worth it, and just revert it all

and honestly, I'm ok with that second option, simply because this has
all been way too much pain.

But no, we don't mark it broken thinking we can't deal with it, or do
random non-sensible code code we can't explain.

Linus