Re: [PATCH] pipe: limit the per-user amount of pages allocated in pipes

From: Willy Tarreau
Date: Sun Jan 10 2016 - 05:31:09 EST


On Sun, Jan 10, 2016 at 06:00:05PM +0900, Tetsuo Handa wrote:
> On 2015/12/28 23:04, Willy Tarreau wrote:
> >
> > On no-so-small systems, it is possible for a single process to cause an
> > OOM condition by filling large pipes with data that are never read. A
> > typical process filling 4000 pipes with 1 MB of data will use 4 GB of
> > memory. On small systems it may be tricky to set the pipe max size to
> > prevent this from happening.
> >
> > This patch makes it possible to enforce a per-user limit above which
> > new pipes will be limited to a single page, effectively limiting them
> > to 4 kB each. This has the effect of protecting the system against
> > memory abuse without hurting other users, and still allowing pipes to
> > work correctly though with less data at once.
> >
> > The limit is controlled by the new sysctl user-max-pipe-pages, and may
> > be disabled by setting it to zero. The default limit allows the default
> > number of FDs per process (1024) to create pipes of the default size
> > (64kB), thus reaching a limit of 64MB before starting to create only
> > smaller pipes. With a process enforcing the FD limit to 4096, this
> > results in 64MB + 3072 * 4kB = 72 MB.
>
> "1024 * 64kB + 3072 * 4kB = 72MB" might be easier to understand.
>
> But description of upper limit is not clear for me. It is "With a *process*
> enforcing the FD limit to 4096", not "With a *user* enforcing the FD limit
> to 4096"?

Yes, and the reported amount of memory is for the first process of this
user.

> Then, stronger upper limit would be expected because actual upper
> limit is nearly
> "1024 * 64kB + ("max user processes" * "open files" - 1024) * 4kB"
> (e.g. "max user processes" = 4096 and "open files" = 1024 would result in
> 16GB that may be still too much).

Actually it's not hard to limit the per-user number of process on a machine,
and it's generally done on most (all?) massively shared user environments.
Trying to enforce stricter limits may end up randomly breaking software,
which is why I preferred to stick to the single-page pipe solution. The real
issue is that by default, even trying to play with aggressive process limits
and per-process FD limits, it's very hard to maintain low memory usage with
pipes. Now it becomes possible. In a shared environment, limiting to 256
processes and 1024 FDs per process, you end up with "only" 1 GB of RAM which
is more reasonable a on large shared system.

Ideally we'd like to be able to only allocate the pipes on the fly when the
limits are lowered. But that requires changing a lots more things in the
system to handle a queue of pending allocation requests and such things.
While it may make sense for the long term, it's not acceptable as a fix.

>
> >
> > Reported-by: socketpair@xxxxxxxxx
> > Suggested-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> > Signed-off-by: Willy Tarreau <w@xxxxxx>
>
> Reported-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
> Mitigates: CVE-2013-4312 (Linux 2.0+)

OK will add this.

> > +/* Maximum allocatable pages per user, matches default values */
> > +unsigned long user_max_pipe_pages = PIPE_DEF_BUFFERS * INR_OPEN_CUR;
> > +EXPORT_SYMBOL(user_max_pipe_pages);
>
> I think both fs/pipe.o and kernel/sysctl.o are built-in modules.
> Do we need to export this variable?

Good question, I don't think so, indeed.

> > @@ -1066,7 +1094,8 @@ long pipe_fcntl(struct file *file, unsigned int cmd, unsigned long arg)
> > if (!nr_pages)
> > goto out;
> >
> > - if (!capable(CAP_SYS_RESOURCE) && size > pipe_max_size) {
> > + if (!capable(CAP_SYS_RESOURCE) &&
> > + (size > pipe_max_size || too_many_pipe_buffers(pipe->user))) {
> > ret = -EPERM;
> > goto out;
> > }
>
> Maybe !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN) is better
> for too_many_pipe_buffers(pipe->user) case because the reason to limit
> resembles too_many_unix_fds() ?

That makes sense, I'll do this.

> Maybe -ENOMEM is better for too_many_pipe_buffers(pipe->user) case?

I have hesitated on this one. EPERM makes it clear that the error may go
away with more privileges, while ENOMEM would not give this hint. But I'm
not strongly attached to it.

Thanks!
Willy