Re: [PATCH] fs/exec.c: Add fast path for ENOENT on PATH search before allocating mm

From: Mateusz Guzik
Date: Tue Nov 07 2023 - 19:03:36 EST


On 11/8/23, Kees Cook <kees@xxxxxxxxxx> wrote:
>
>
> On November 7, 2023 3:08:47 PM PST, Mateusz Guzik <mjguzik@xxxxxxxxx>
> wrote:
>>On 11/7/23, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>>> On Tue, Nov 07, 2023 at 10:23:16PM +0100, Mateusz Guzik wrote:
>>>> If the patch which dodges second lookup still somehow appears slower a
>>>> flamegraph or other profile would be nice. I can volunteer to take a
>>>> look at what's going on provided above measurements will be done and
>>>> show funkyness.
>>>
>>> When I looked at this last, it seemed like all the work done in
>>> do_filp_open() (my patch, which moved the lookup earlier) was heavier
>>> than the duplicate filename_lookup().
>>>
>>> What I didn't test was moving the sched_exec() before the mm creation,
>>> which Peter confirmed shouldn't be a problem, but I think that might be
>>> only a tiny benefit, if at all.
>>>
>>> If you can do some comparisons, that would be great; it always takes me
>>> a fair bit of time to get set up for flame graph generation, etc. :)
>>>
>>
>>So I spawned *one* process executing one statocally linked binary in a
>>loop, test case from http://apollo.backplane.com/DFlyMisc/doexec.c .
>>
>>The profile is definitely not what I expected:
>> 5.85% [kernel] [k] asm_exc_page_fault
>> 5.84% [kernel] [k] __pv_queued_spin_lock_slowpath
>>[snip]
>>
>>I'm going to have to recompile with lock profiling, meanwhile
>>according to bpftrace
>>(bpftrace -e 'kprobe:__pv_queued_spin_lock_slowpath { @[kstack()] =
>> count(); }')
>>top hits would be:
>>
>>@[
>> __pv_queued_spin_lock_slowpath+1
>> _raw_spin_lock+37
>> __schedule+192
>> schedule_idle+38
>> do_idle+366
>> cpu_startup_entry+38
>> start_secondary+282
>> secondary_startup_64_no_verify+381
>>]: 181
>>@[
>> __pv_queued_spin_lock_slowpath+1
>> _raw_spin_lock_irq+43
>> wait_for_completion+141
>> stop_one_cpu+127
>> sched_exec+165
>
> There's the suspicious sched_exec() I was talking about! :)
>
> I think it needs to be moved, and perhaps _later_ instead of earlier?
> Hmm...
>

I'm getting around 3.4k execs/s. However, if I "taskset -c 3
./static-doexec 1" the number goes up to about 9.5k and lock
contention disappears from the profile. So off hand looks like the
task is walking around the box when it perhaps could be avoided -- it
is idle apart from running the test. Again this is going to require a
serious look instead of ad hoc pokes.

Side note I actually read your patch this time around instead of
skimming through it and assuming it did what I thought.

do_filp_open is of course very expensive and kmalloc + kfree are slow.
On top of it deallocating a file object even after a failed open was
very expensive due to delegation to task_work (recently fixed).

What I claim should be clear-cut faster is that lookup as in the
original patch and only messing with file allocation et al if it
succeeds.

--
Mateusz Guzik <mjguzik gmail.com>