Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

From: Bernd Edlinger
Date: Thu Apr 02 2020 - 15:31:31 EST


On 4/2/20 9:04 PM, Linus Torvalds wrote:
> On Wed, Apr 1, 2020 at 9:16 AM Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>>
>> The work on exec starts solving a long standing issue with exec that
>> it takes mutexes of blocking userspace applications, which makes exec
>> extremely deadlock prone. For the moment this adds a second mutex
>> with a narrower scope that handles all of the easy cases. Which
>> makes the tricky cases easy to spot. With a little luck the code to
>> solve those deadlocks will be ready by next merge window.
>
> So this worries me.
>
> I've pulled it, but I'm not entirely happy about some of it.
>
> For example, the "rationale" for some of the changes is
>
> This should be safe, as the credentials are only used for reading.
>

What I meant, but did probably not find a good way to say it.

There are places where credentials of other threads are written,
e.g. set no new privs on a thread, that already started to execve
a setuid process.

You always have the right to change the credentials of the own thread,
you dont need a mutex for it.

This is at least what is my impression how the existing mutexes are used,
a mutex called "cred_guard_mutex" is a not very good self explaining name,
in my opinion, it is totally unclear what it does "guard", and why.


Bernd.

> which is just nonsensical. "Only used for reading" is immaterial, and
> there's no explanation for why that would matter at all. Most of the
> credentials are ever used for reading, and the worry about execve() is
> that the credentials can change, and people race with them and use the
> new 'suid' credentials and allow things that shouldn't be allowed. So
> the rationale makes no sense at all.
>
> Btw, if "this only takes it for reading" is such a big deal, why is
> that mutex not an rw-semaphore?
>
> The pidfd change at least has a rationale that makes sense:
>
> This should be safe, as the credentials do not change
> before exec_update_mutex is locked. Therefore whatever
> file access is possible with holding the cred_guard_mutex
> here is also possbile with the exec_update_mutex.
>
> so now you at least have an explanation of why that particular lock
> makes sense and works and is equivalent.
>
> It's still not a *great* explanation for why it would be equivalent,
> because cred_guard_mutex ends up not just guarding the write of the
> credentials, but makes it atomic wrt *other* data. That's the same
> problem as "I'm only reading".
>
> Locking is not about *one* value being consistent - if that was the
> case, then you could just do a "get refcount on the credentials, now I
> have a stable set of creds I can read forever". No lock needed.
>
> So locking is about *multiple* values being consistent, which is why
> that "I'm only reading" is not an explanation for why you can change
> the lock.
>
> It's also why that second one is questionable: it's a _better_ attempt
> at explaining things, but the point is really that cred_guard_mutex
> protects *other* things too.
>
> A real explanation would have absolutely *nothing* to do with
> "reading" or "same value of credentials". Both of those are entirely
> immaterial, since - as mentioned - you could just get a snapshot of
> the creds instead.
>
> A real explanation would be about how there is no other state that
> cred_guard_mutex protects that matters.
>
> See what I'm saying?
>
> This code is subtle as h*ll, and we've had bugs in it, and it has a
> series of tens of patches to fix them. But that also means that the
> explanations for the patches should take the subtleties into account,
> and not gloss over them with things like this.
>
> Ok, enough about the explanations. The actual _code_ is kind of odd
> too. For example, you have that "bprm->called_exec_mmap" flag to say
> "I've taken the exec_update_mutex, and need to drop it".
>
> But that flag is not set anywhere _near_ actually taking the lock.
> Sure, it is taken after exec_mmap() returns successfully, and that
> makes sense from a naming standpoint, but wouldn't it have been a
> _lot_ more obvious if you just set the flag when you took that lock,
> and instead of naming it by some magical code sequence, you named it
> for what it does?
>
> Again, this looks all technically correct, but it's written in a way
> that doesn't seem to make a lot of sense. Why is the code literally
> written with a magical assumption of "calling exec_mmap takes this
> lock, so if the flag named called_exec_mmap is set, I have to free
> that lock that is not named that at all".
>
> I hate conditional locking in the first place, but if it has to exist,
> then the conditional should be named after the lock, and the lock
> getting should be very very explicitly tied to it.
>
> Wouldn't it have been much clearer if you called that flag
> "exec_update_mutex_taken", and set it WHEN YOU TAKE IT?
>
> In fact, then you could drop the
>
> mutex_unlock(&tsk->signal->exec_update_mutex);
>
> in the error case of exec_mmap(), because now the error handling in
> free_bprm() would do the cleanup automatically.
>
> See what I'm saying? You've made the locking more complex and subtle
> than it needed to be. And since the whole point of the *new* lock is
> that it should replace an old lock that was really complex and subtle,
> that's a problem.
>
> Linus
>