Re: [GIT PULL] execve updates for v6.8-rc1

From: Linus Torvalds
Date: Sat Jan 20 2024 - 17:19:09 EST


On Thu, 11 Jan 2024 at 09:42, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> It's the "don't allocate filp until you actually need it" that looks
> nasty. And yes, atomic_open() is part of the problem, but so is the
> fact that wee end up saving some flags in the filp early.

So just an update on this, since I came back to it.

It turns out that the bulk of the cost of the 'struct file *'
allocation is actually the exact same thing that was discussed the
other day about file locking: it's the fact that the file allocation
is done with SLAB_ACCOUNT. See

https://lore.kernel.org/all/CAHk-=wg_CoTOfkREgaQQA6oJ5nM9ZKYrTn=E1r-JnvmQcgWpSg@xxxxxxxxxxxxxx/

and that thread on the recent file locking accounting discussion.

Now, the allocation itself isn't free either, but the SLAB_ACCOUNT
really does make it noticeable more expensive than it should be.

It's a bit more spread out: the cost of the slab allocation itself is
mainly the (optimized) path that does a cmpxchg and the memset, but
the SLAB_ACCOUNT cost is spread out in mod_objcg_state,
__memcg_slab_post_alloc_hook, obj_cgroup_charge,
__memcg_slab_free_hook).

And that actually opens the door up for a _somewhat_ simple partial
workaround: instead of using SLAB_ACCOUNT, we could just do the memcg
accounting when we set FMODE_OPEN instead, and de-account it when we
free the filp (which checks FMODE_OPEN since other cleanup is
dependent on that anyway).

That would help not just the execve() case, but the whole "open
non-existent file" case too.

And I suspect "open()" with ENOENT is actually way more common than
execve() is. All those open->fstat paths for various perfectly normal
loads.

Anyway, I didn't actually write that patch, but I did want to mention
it as a smaller-scale fix (because getting rid of the 'struct file'
allocation entirely does look somewhat painful).

End result: I committed my "move do_open_execat() to the beginning of
execve()" patch, since it's clearly an improvement on the existing
behavior, and that whole "struct file allocations are unnecessarily
expensive" issue is a separate thing.

Linus