Re: How should rlimits, suid exec, and capabilities interact?

From: Eric W. Biederman
Date: Wed Feb 23 2022 - 20:33:00 EST



Linus Torvalds <linus@xxxxxxxxxxxx> writes:

> Basic rule: it's better to be too lenient than to be too strict.

Thank you. With that guideline I can explore the space of what is
possible.

Question: Running a suid program today charges the activity of that
program to the user who ran that program, not to the user the program
runs as. Does anyone see a problem with charging the user the program
runs as?

The reason I want to change which user is charged with a process
(besides it making more sense in my head) is so that
"capable(CAP_SYS_RESOURCE)" can be used instead of the magic incantation
"(cred->user == INIT_USER)".

Today "capable(CAP_SYS_RESOURCE)" with respect to RLIMIT_NPROC is
effectively meaningless for suid programs because of the mismatch of
charging the real user while using the effective user's credentials.
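
To make the mismatch concrete, here is a minimal userspace model. It is
only a sketch: the struct and function names are made up for illustration
and are not kernel APIs, and it models nothing beyond which user is charged
and whose credentials carry the capability.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_user {
	unsigned int uid;
	unsigned int processes;	/* processes currently charged to this user */
};

struct model_task {
	const struct model_user *real_user;	 /* who ran the program */
	const struct model_user *effective_user; /* who the program runs as */
	bool cap_sys_resource;	/* comes from the effective credentials */
};

/* Today's accounting as described above: the real user is charged. */
static const struct model_user *charged_today(const struct model_task *t)
{
	return t->real_user;
}

/* The accounting being asked about: the effective user is charged. */
static const struct model_user *charged_proposed(const struct model_task *t)
{
	return t->effective_user;
}

/* True when the capability and the charged count belong to the same user. */
static bool capability_matches_charge(const struct model_task *t,
				      const struct model_user *charged)
{
	return t->cap_sys_resource && charged == t->effective_user;
}
```

For a suid-root program run by uid 1000, charged_today() picks uid 1000's
count while the capability comes from root's credentials, so
capability_matches_charge() is false. With charged_proposed() the two line
up.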

An accidental experiment happened in v5.14-rc1 in July when the ucount
rlimit code was merged. It was only this last week, after Michal
Koutný discovered the discrepancy through code inspection, that I merged
a bug fix, because the code was not preserving the existing behavior as
intended.


This behavior has existed in some form since Linux v1.0 when per user
process limits were added.

The original code in v1.0 was:
> static int find_empty_process(void)
> {
> 	int free_task;
> 	int i, tasks_free;
> 	int this_user_tasks;
>
> repeat:
> 	if ((++last_pid) & 0xffff8000)
> 		last_pid=1;
> 	this_user_tasks = 0;
> 	tasks_free = 0;
> 	free_task = -EAGAIN;
> 	i = NR_TASKS;
> 	while (--i > 0) {
> 		if (!task[i]) {
> 			free_task = i;
> 			tasks_free++;
> 			continue;
> 		}
> 		if (task[i]->uid == current->uid)
> 			this_user_tasks++;
> 		if (task[i]->pid == last_pid || task[i]->pgrp == last_pid ||
> 		    task[i]->session == last_pid)
> 			goto repeat;
> 	}
> 	if (tasks_free <= MIN_TASKS_LEFT_FOR_ROOT ||
> 	    this_user_tasks > MAX_TASKS_PER_USER)
> 		if (current->uid)
> 			return -EAGAIN;
> 	return free_task;
> }

Having tracked the use of real uid in limits back this far, my guess
is that it was an accident of the implementation and that real uid vs
effective uid had not been considered.

Does anyone know if choosing the real uid vs the effective uid for
accounting a user's processes was a deliberate decision anywhere in the
history of Linux?



Linus, you were talking about making it possible to log in as, I think, a
non-root user and still be able to use sudo to kill a fork bomb.

The counter case is apache having a dedicated user for running
cgi-scripts and using RLIMIT_NPROC to limit how many of those processes
can exist. Unless I am misunderstanding something, that looks exactly
like your case of logging in as non-root so you can run sudo to kill a
fork bomb.

A comment from an in-process cleanup patch explains this as best I can:
/*
* In general rlimits are only enforced when a new resource
* is acquired. That would be during fork for RLIMIT_NPROC.
* That is insufficient for RLIMIT_NPROC as many attributes of
* a new process must be set between fork and exec.
*
* A case where this matters is when apache forks a process
* and calls setuid to run cgi-scripts as a different user,
* generating those processes through a code sequence like:
*
* fork()
* setrlimit(RLIMIT_NPROC, ...)
* execve() -- suid wrapper
* setuid()
* execve() -- cgi script
*
* The cgi-scripts are unlikely to fork on their own so unless
* RLIMIT_NPROC is checked after the user change and before
* the cgi-script starts, RLIMIT_NPROC simply will not be enforced
* for the cgi-scripts.
*
* So the code tracks whether an operation occurs between fork
* and exec that could cause the RLIMIT_NPROC check to fail. If
* such an operation has happened, re-check RLIMIT_NPROC.
*/
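
As a rough userspace illustration of the sequence in that comment, here is
a minimal sketch. It is not apache's suexec: run_limited() is a
hypothetical helper, and it collapses the suid wrapper step into a direct
setuid(), which only succeeds when the caller is privileged or the target
uid is its own.

```c
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] as `uid` with the soft RLIMIT_NPROC
 * set to `nproc`.  Returns the child's exit status, or -1 on failure.
 */
static int run_limited(uid_t uid, rlim_t nproc, char *const argv[])
{
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		struct rlimit rl;

		/*
		 * fork() has already succeeded by now, so lowering the
		 * limit here denies nothing by itself.
		 */
		if (getrlimit(RLIMIT_NPROC, &rl) != 0)
			_exit(125);
		rl.rlim_cur = nproc;
		if (setrlimit(RLIMIT_NPROC, &rl) != 0)
			_exit(125);
		/* After this the process belongs to a different user. */
		if (setuid(uid) != 0)
			_exit(126);
		/*
		 * If the script never forks, this exec is the last point
		 * at which the limit could be applied to the new user.
		 */
		execv(argv[0], argv);
		_exit(127);
	}

	int status;

	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Nothing in the child calls fork() after setuid(), which is why a check
made only at fork time would never see the new user's limit.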


Answered-Question: I was trying to ask if anyone knows of a reason why
we can't just sanitize the rlimits of the process during suid exec?
Linus, your guideline would appear to allow that behavior. Unfortunately,
that looks like it would break current usage of apache suexec.

Eric