Re: Possible deny of service with memfd_create()

From: Christian König
Date: Fri Feb 05 2021 - 19:37:42 EST


On 05.02.21 at 11:50, Michal Hocko wrote:
On Fri 05-02-21 08:54:31, Christian König wrote:
On 05.02.21 at 01:32, Hugh Dickins wrote:
On Thu, 4 Feb 2021, Michal Hocko wrote:
On Thu 04-02-21 17:32:20, Christian Koenig wrote:
Hi Michal,

as requested in the other mail thread, the following sample code brings my
test system down within seconds.

The issue is that the memory allocated for the file descriptor is not
accounted to the process allocating it, so the OOM killer picks whatever
process it thinks is a good victim but never my small test program.

Since memfd_create() doesn't need any special permission, this is a rather
nice denial-of-service attack and, as far as I can see, it also works with a
standard Ubuntu 5.4.0-65-generic kernel.
Thanks for following up. This is really nasty but now that I am looking
at it more closely, this is not really different from tmpfs in general.
You are free to create files and eat the memory without being accounted
for it, because that is not seen as your memory from the system's
POV. You would have to map that memory for it to be part of your RSS.
I mostly agree. The big difference is that tmpfs is only available when
mounted.

And tmpfs can be restricted in size per mount point as well as by per-user
quotas, IIRC. Looking at my desktop system, those restrictions are actually
exactly what I see there.
I cannot find anything about per-user quotas for tmpfs in the tmpfs man
page. Or maybe I am looking at the wrong layer and there is generic
handling somewhere in the vfs core?

I think so, yes. I vaguely remember a discussion about how to implement quotas for tmpfs, but that was a really long time ago and I didn't follow it to the end.

But memfd_create() is just a free-for-all: there is neither a size limit nor
an access restriction as far as I can see.
Yes, this is unfortunate and a design decision that should have been
considered when the syscall was introduced. But that boat sailed long
ago; it cannot be changed now without risking userspace breakage.

The only existing protection right now is to use the memory cgroup
controller, because the tmpfs memory is accounted to the process which
faults the memory in (or writes to the file).
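
As a rough sketch of that mitigation, assuming a cgroup v2 hierarchy: the
memory.max and cgroup.procs files are the standard cgroup v2 interface, while
the function name confine and the directory used in the usage note are
hypothetical. Writing a byte limit to memory.max caps the group, and writing
a PID to cgroup.procs moves that process into it, so memfd or tmpfs pages it
faults in or writes are charged against the limit.

```python
import os


def confine(cgroup_dir, max_bytes, pid):
    """Cap a cgroup v2 group's memory and move `pid` into it.

    `cgroup_dir` must already exist and have the memory controller
    enabled; creating it is outside the scope of this sketch.
    """
    # Hard limit in bytes for everything charged to this group.
    with open(os.path.join(cgroup_dir, "memory.max"), "w") as f:
        f.write(str(max_bytes))
    # Move the process into the group so its allocations are charged here.
    with open(os.path.join(cgroup_dir, "cgroup.procs"), "w") as f:
        f.write(str(pid))
```

A call might look like confine("/sys/fs/cgroup/untrusted", 512 * 1024 * 1024,
os.getpid()), where the "untrusted" group is an assumption. It needs
sufficient privileges on the cgroup directory, and the memory controller has
to be enabled in the parent's cgroup.subtree_control.
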
Agreed, but having to rely on cgroups is not really satisfying when you have
to maintain a hardened server.
Yes, I do recognize the pain. The only other way to mitigate the risk is
to deny the syscall to untrusted users in a hardened environment.
You should be very strict about tmpfs usage there already.


Well, it is perfectly valid for a process to use as much memory as it wants. The problem is that we are not holding the process accountable for it.

As I said, we have similar problems with GPU drivers, and I think we just need a way to account this memory to the process.

Let me think about it a bit. Maybe we can somehow use the file owner for this.

Thanks,
Christian.