Re: [PATCH] pipe: limit the per-user amount of pages allocated in pipes

From: Tetsuo Handa
Date: Sun Jan 10 2016 - 04:01:33 EST


On 2015/12/28 23:04, Willy Tarreau wrote:
>
> On no-so-small systems, it is possible for a single process to cause an
> OOM condition by filling large pipes with data that are never read. A
> typical process filling 4000 pipes with 1 MB of data will use 4 GB of
> memory. On small systems it may be tricky to set the pipe max size to
> prevent this from happening.
>
> This patch makes it possible to enforce a per-user limit above which
> new pipes will be limited to a single page, effectively limiting them
> to 4 kB each. This has the effect of protecting the system against
> memory abuse without hurting other users, and still allowing pipes to
> work correctly though with less data at once.
>
> The limit is controlled by the new sysctl user-max-pipe-pages, and may
> be disabled by setting it to zero. The default limit allows the default
> number of FDs per process (1024) to create pipes of the default size
> (64kB), thus reaching a limit of 64MB before starting to create only
> smaller pipes. With a process enforcing the FD limit to 4096, this
> results in 64MB + 3072 * 4kB = 72 MB.

"1024 * 64kB + 3072 * 4kB = 72MB" might be easier to understand.

But description of upper limit is not clear for me. It is "With a *process*
enforcing the FD limit to 4096", not "With a *user* enforcing the FD limit
to 4096"? Then, stronger upper limit would be expected because actual upper
limit is nearly
"1024 * 64kB + ("max user processes" * "open files" - 1024) * 4kB"
(e.g. "max user processes" = 4096 and "open files" = 1024 would result in
16GB that may be still too much).

>
> Reported-by: socketpair@xxxxxxxxx
> Suggested-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Willy Tarreau <w@xxxxxx>

Reported-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Mitigates: CVE-2013-4312 (Linux 2.0+)

> +/* Maximum allocatable pages per user, matches default values */
> +unsigned long user_max_pipe_pages = PIPE_DEF_BUFFERS * INR_OPEN_CUR;
> +EXPORT_SYMBOL(user_max_pipe_pages);

I think both fs/pipe.o and kernel/sysctl.o are built-in modules.
Do we need to export this variable?

> @@ -1066,7 +1094,8 @@ long pipe_fcntl(struct file *file, unsigned int cmd, unsigned long arg)
> if (!nr_pages)
> goto out;
>
> - if (!capable(CAP_SYS_RESOURCE) && size > pipe_max_size) {
> + if (!capable(CAP_SYS_RESOURCE) &&
> + (size > pipe_max_size || too_many_pipe_buffers(pipe->user))) {
> ret = -EPERM;
> goto out;
> }

Maybe !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN) is better
for too_many_pipe_buffers(pipe->user) case because the reason to limit
resembles too_many_unix_fds() ?

Maybe -ENOMEM is better for too_many_pipe_buffers(pipe->user) case?