Re: [GIT PULL] execve updates for v6.8-rc1

From: Linus Torvalds
Date: Wed Jan 10 2024 - 15:12:43 EST


On Wed, 10 Jan 2024 at 11:24, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>
> I've been trying to figure out how to measure only the execve portion of
> a workload (with perf)[1] to get a more real-world measurement, but the
> above does show improvements for the "open once early". I'll get the
> behavior landed in -next after the merge window closes, and we can
> continue examining if we can make do_filp_open() better...

Well, so the workload that shows the "early open" issue most is
actually something disgusting like this:

#include <unistd.h>

int main(int argc, char **argv, char **envp)
{
for (int i = 0; i < 10000000; i++)
execve("nonexistent", argv, envp);
return 0;
}

and it's trivial to run under perf. You'll get something like this
with my patch:

8.65% [k] strncpy_from_user
8.37% [k] kmem_cache_alloc
7.71% [k] kmem_cache_free
5.14% [k] mod_objcg_state
4.84% [k] link_path_walk
4.36% [k] memset_orig
3.93% [k] percpu_counter_add_batch
3.66% [k] do_syscall_64
3.63% [k] path_openat

and with the hacky "open twice" you'll see that kmem_cache_alloc/free
should be much lower - it still does a kmem_cache_alloc/free() pair
for the pathname, but the 'struct file *' allocation/free goes away.

Anyway, you can see a very similar issue by replacing the "execve()" line with

open("nonexistent", O_RDONLY);

instead, and for exactly the same reason. Just to show that this issue
isn't execve-related.

I really think that the "open twice" is wrong. It will look
artificially good in this "does not exist" case, but it will penalize
other cases, and it just hides this issue.

Without either of the patches, you'll see that execve case spend all
its time counting environment variables, and be much slower as a
result. Instead of that "strncpy_from_user()", you'll see
"strnlen_user()" and ccopy_from_user() shoot up because of that.

The above perf profile is actually quote good in general: the slab
alloc/free is a big issue only because nothing else is.

Oh, and the above profile depends *heavily* on your particular
microarchitecture and which mitigations you have in place. System call
overhead might be at the top, for example.

And the cost of "strncpy_from_user()" is so high above not because we
do a lot of copies (it's just that shortish filename), but simply
mainly because user copies are so insanely expensive on some uarchs
due to CLAC/STAC being expensive.

So even a short filename copy can end up taking more than the whole path walk.

So your exact mileage will vary, but you should see that pattern of
"kmem_cache_alloc/free" (and the "strnlen_user()" issue with none of
the patches being applied) etc etc.

Linus