Re: [PATCH v2] sched/task_group: Re-layout structure to reduce false sharing

From: Aaron Lu
Date: Fri Jun 30 2023 - 05:35:39 EST


On Wed, Jun 28, 2023 at 01:18:34PM +0800, Aaron Lu wrote:
> On Tue, Jun 27, 2023 at 12:14:37PM +0200, Peter Zijlstra wrote:
> > and can we still measure an improvement over this with that approach?
>
> Let me re-run those tests and see how things change.
>
> In my previous tests I didn't turn on CONFIG_RT_GROUP_SCHED. To test
> this, I suppose I'll turn CONFIG_RT_GROUP_SCHED on and apply this change
> here that made tg->load_avg in a dedicated cacheline, then see how
> performances change with the "Make tg->load_avg per node" patch. Will
> report back once done.

The test summary is:
- On 2sockets/112cores/224threads SPR, it's still an overall win:
  transactions of postgres_sysbench improved 47.7%, hackbench improved
  13.5% and netperf improved 39.5%;
- On 2sockets/64cores/128threads Icelake, hackbench and netperf
  improved while postgres_sysbench transactions dropped slightly:
  hackbench improved 6.2%, netperf improved 20.3% and transactions of
  postgres_sysbench dropped 1.2%;
- On 2sockets/48cores/96threads CascadeLake, hackbench and netperf are
  roughly flat.

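Below are detailed results ("aligned" is the kernel with tg->load_avg in
a dedicated cacheline, "per_node" is the kernel with the "Make
tg->load_avg per node" patch):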

SPR: 2sockets/112cores/224threads

postgres_sysbench/1instance/100%(nr_client=nr_cpu)
kernel      transactions(higher is better)
aligned      89623.85±0.35%
per_node    132401.37±0.83%

hackbench/pipe/threads
kernel      time(less is better)
aligned     47.43±0.48%
per_node    41.02±0.69%

netperf/UDP_RR/100%(nr_client=nr_cpu)
kernel      throughput(higher is better)
aligned      9415.97±3.81%
per_node    13131.24±2.67%

ICL: 2sockets/64cores/128threads

postgres_sysbench/1instance/75%
kernel      transactions
aligned     62291.58±0.64%
per_node    61561.40±0.39%

hackbench/pipe/threads
kernel      time
aligned     41.66±0.04%
per_node    39.07±0.36%

netperf/UDP_RR/100%
kernel      throughput
aligned     21365.01±3.32%
per_node    25692.05±2.03%

CSL: 2sockets/48cores/96threads

hackbench/pipe/threads
kernel      time
aligned     48.78±0.61%
per_node    48.95±1.06%

netperf/UDP_RR/100%
kernel      throughput
aligned     25853.82±11.46%
per_node    25264.38±0.85%
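
For anyone who wants to double check the percentages in the summary, they
can be recomputed from the mean values above with a small script. This is
only a sketch: the names pct_change, RESULTS and summarize are made up for
illustration, the means are copied from the tables, and the ± deviations
are ignored.

```python
def pct_change(aligned, per_node, lower_is_better=False):
    """Improvement of per_node over aligned, in percent.

    A positive result means per_node did better. For time-based results
    (hackbench) a lower value is better, so the sign is flipped.
    """
    delta = (aligned - per_node) if lower_is_better else (per_node - aligned)
    return delta / aligned * 100.0


# (machine, benchmark) -> (aligned mean, per_node mean, lower_is_better)
# Values copied from the result tables in this mail.
RESULTS = {
    ("SPR", "postgres_sysbench"): (89623.85, 132401.37, False),
    ("SPR", "hackbench"): (47.43, 41.02, True),
    ("SPR", "netperf"): (9415.97, 13131.24, False),
    ("ICL", "postgres_sysbench"): (62291.58, 61561.40, False),
    ("ICL", "hackbench"): (41.66, 39.07, True),
    ("ICL", "netperf"): (21365.01, 25692.05, False),
    ("CSL", "hackbench"): (48.78, 48.95, True),
    ("CSL", "netperf"): (25853.82, 25264.38, False),
}


def summarize(results):
    """Return {(machine, benchmark): change in percent, one decimal}."""
    return {key: round(pct_change(*vals), 1) for key, vals in results.items()}


if __name__ == "__main__":
    for (machine, bench), change in summarize(RESULTS).items():
        print(f"{machine:<5}{bench:<20}{change:+.1f}%")
```

The two CSL deltas this produces are both smaller than the run-to-run
deviation reported for those tests, which is why they are called roughly
flat in the summary.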

I think I'll spin a new version for the "Make tg->load_avg per-node"
patch with all the information I collected.