Re: [RFC PATCH 00/14] context_tracking,x86: Defer some IPIs until a user->kernel transition

From: Nadav Amit
Date: Wed Jul 05 2023 - 14:48:29 EST




> On Jul 5, 2023, at 11:12 AM, Valentin Schneider <vschneid@xxxxxxxxxx> wrote:
>
> Deferral approach
> =================
>
> Storing each and every callback, like a secondary call_single_queue turned out
> to be a no-go: the whole point of deferral is to keep NOHZ_FULL CPUs in
> userspace for as long as possible - no signal of any form would be sent when
> deferring an IPI. This means that any form of queuing for deferred callbacks
> would end up as a convoluted memory leak.
>
> Deferred IPIs must thus be coalesced, which this series achieves by assigning
> IPIs a "type" and having a mapping of IPI type to callback, leveraged upon
> kernel entry.

I have some experience with similar an optimization. Overall, it can make
sense and as you show, it can reduce the number of interrupts.

The main problem of such an approach might be in cases where a process
frequently enters and exits the kernel between deferred-IPIs, or even worse -
the IPI is sent while the remote CPU is inside the kernel. In such cases, you
pay the extra cost of synchronization and cache traffic, and might not even
get the benefit of reducing the number of IPIs.

In a sense, it's a more extreme case of the overhead that x86’s lazy-TLB
mechanism introduces while tracking whether a process is running or not. But
lazy-TLB would change is_lazy much less frequently than context tracking,
which means that the deferring the IPIs as done in this patch-set has a
greater potential to hurt performance than lazy-TLB.

tl;dr - it would be beneficial to show some performance number for both a
“good” case where a process spends most of the time in userspace, and “bad”
one where a process enters and exits the kernel very frequently. Reducing
the number of IPIs is good but I don’t think it is a goal by its own.

[ BTW: I did not go over the patches in detail. Obviously, there are
various delicate points that need to be checked, as avoiding the
deferring of IPIs if page-tables are freed. ]