Re: [PATCH 1/2] fs/exec: allow to unshare a time namespace on vfork+exec

From: Christian Brauner
Date: Wed Jun 15 2022 - 04:00:20 EST


On Wed, Jun 15, 2022 at 09:53:29AM +0200, Florian Weimer wrote:
> * Kees Cook:
>
> > On Sun, Jun 12, 2022 at 11:07:22PM -0700, Andrei Vagin wrote:
> >> Right now, a new process can't be forked in another time namespace
> >> if it shares mm with its parent. It is prohibited, because each time
> >> namespace has its own vvar page that is mapped into a process address
> >> space.
> >>
> >> When a process calls exec, it gets a new mm and so it could be "legal"
> >> to switch time namespace in that case. This was not implemented and
> >> now if we want to do this, we need to add another clone flag to not
> >> break backward compatibility.
> >>
> >> We don't have any user requests to switch times on exec except the
> >> vfork+exec combination, so there is no reason to add a new clone flag.
> >> As for vfork+exec, this should be safe to allow switching timens with
> >> the current clone flag. Right now, vfork (CLONE_VFORK | CLONE_VM) fails
> >> if a child is forked into another time namespace. With this change,
> >> vfork creates a new process in parent's timens, and the following exec
> >> does the actual switch to the target time namespace.
> >
> > This seems like a very special case. None of the other namespaces do
> > this, do they?
>
> I think this started with CLONE_NEWPID, which had a similar delayed
> effect with unshare: it happens only after fork, not for the current
> process image. I think it's just a limitation of the unshare interface.
> Some of the effects simply have to be delayed due to their nature.

I tried to give more context in another mail wrt time namespaces
specifically.

For pid namespaces one problem would be that it could end up confusing a
process about its own pid. This was a more serious problem when the pid
cache was still active in glibc; but fwiw systemd still has a pid cache
afair.
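
To make the pid cache problem concrete, here is a minimal sketch. It
assumes a hypothetical userspace cache (cached_getpid() and
child_sees_stale_pid() are made-up names, not the glibc or systemd
code) and uses plain fork() as an unprivileged stand-in. If
unshare(CLONE_NEWPID) moved the caller itself into the new pid
namespace, the same kind of mismatch would show up within a single
process, with no fork involved.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical userspace pid cache, for illustration only. */
static pid_t cached_pid;

pid_t cached_getpid(void)
{
	if (cached_pid == 0)
		cached_pid = getpid();	/* fill the cache on first use */
	return cached_pid;
}

/*
 * Returns 1 if a forked child observed a stale cached pid, 0 if the
 * cache matched, and -1 on error.
 */
int child_sees_stale_pid(void)
{
	int status;
	pid_t child;

	cached_getpid();		/* prime the cache in the parent */
	assert(cached_pid == getpid());	/* the cache is correct here */

	child = fork();
	if (child < 0)
		return -1;
	if (child == 0) {
		/* The inherited cache still holds the parent's pid. */
		_exit(cached_getpid() != getpid());
	}

	if (waitpid(child, &status, 0) != child || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

Called from a small main(), child_sees_stale_pid() returns 1: nothing
invalidates the cache, so the child keeps reporting its parent's pid.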

Christian