Re: [tip:timers/core] [timers] 7ee9887703: netperf.Throughput_Mbps -1.2% regression

From: Frederic Weisbecker
Date: Tue Mar 12 2024 - 19:57:38 EST


On Fri, Mar 01, 2024 at 04:09:24PM +0800, kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed a -1.2% regression of netperf.Throughput_Mbps on:
>
>
> commit: 7ee988770326fca440472200c3eb58935fe712f6 ("timers: Implement the hierarchical pull model")
> https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git timers/core
>
> testcase: netperf
> test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> parameters:
>
> ip: ipv4
> runtime: 300s
> nr_threads: 200%
> cluster: cs-localhost
> test: SCTP_STREAM
> cpufreq_governor: performance
>
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> | Closes: https://lore.kernel.org/oe-lkp/202403011511.24defbbd-oliver.sang@xxxxxxxxx
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20240301/202403011511.24defbbd-oliver.sang@xxxxxxxxx
>
> =========================================================================================
> cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/tbox_group/test/testcase:
> cs-localhost/gcc-12/performance/ipv4/x86_64-rhel-8.3/200%/debian-12-x86_64-20240206.cgz/300s/lkp-icl-2sp2/SCTP_STREAM/netperf
>
> commit:
> 57e95a5c41 ("timers: Introduce function to check timer base is_idle flag")
> 7ee9887703 ("timers: Implement the hierarchical pull model")

So I can reproduce this. After hours of staring at traces, I haven't found the
real cause yet; a 1% difference is not always easy to track down.
Here are my conclusions so far:

_ There is an increase in ksoftirqd usage (+13%), but if I boot with threadirqs
both before and after the patch (which means that ksoftirqd is used all the
time for softirq handling), I still see the performance regression. So this
shouldn't play a role here.

_ I suspected that timer migrators handling big queues of timers on behalf of
idle CPUs would delay NET_RX softirqs, but that doesn't seem to be the case.
I don't see the TIMER vector delaying the NET_RX vector with the hierarchical
pull model. Quite the opposite, actually: NET_RX softirqs are less delayed
overall.

_ I suspected that timer migrators handling big queues would add scheduling
latency. But it doesn't seem to be the case. Quite the opposite again,
surprisingly.

_ I have observed that, on average, timers execute later with the hierarchical
pull model. The following delta:
time of callback execution - bucket_expiry
is 3 times higher with the hierarchical pull model. Whether that plays a role
is unclear. It might still be interesting to investigate.

_ The initial perf profile seems to suggest a big increase in task migrations.
Is it the result of ping-pong wakeups? Does that play a role?
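
To make the timer delta mentioned above concrete, here is a minimal Python
sketch of how the average lateness could be compared between the two kernels.
The function names, the (bucket_expiry, exec_time) pair format and the sample
values are all hypothetical and only illustrate the arithmetic; they are not
taken from the actual traces.

```python
from statistics import mean


def mean_lateness(samples):
    """Average of (callback execution time - bucket_expiry).

    `samples` is an iterable of (bucket_expiry, exec_time) pairs, both
    expressed in the same unit (for example jiffies). Hypothetical format.
    """
    return mean(exec_time - bucket_expiry for bucket_expiry, exec_time in samples)


def lateness_ratio(before, after):
    """How many times later timers run, on average, after the change."""
    return mean_lateness(after) / mean_lateness(before)


# Made-up samples, chosen only so that the ratio comes out at 3.
before = [(1000, 1001), (2000, 2002), (3000, 3000)]
after = [(1000, 1003), (2000, 2004), (3000, 3002)]

print(lateness_ratio(before, after))  # prints 3.0
```

A mean hides the distribution, so a per-bucket or percentile breakdown of the
same delta may be more telling than the single ratio.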

Thanks.