Re: [PATCH 14/24] workqueue: Generalize unbound CPU pods

From: Tejun Heo
Date: Wed Jun 21 2023 - 16:38:21 EST


Hello, Swapnil.

On Mon, Jun 19, 2023 at 10:00:33AM +0530, Swapnil Sapkal wrote:
...
> Thanks for the patchset. I tested the patchset with fiotests.
> Tests were run on a dual socket 3rd Generation EPYC server(2 x64C/128T)
> with NPS1, NPS2 and NPS4 modes.

Can you elaborate or point me to a doc explaining the differences between
NPS1, 2 and 4? My feeble attempt at googling didn't lead to anything useful.
What are the tests doing and how long are they running?

> With affinity-scopes-v2, below are the observations:
> BW, LAT AVG and CLAT AVG shows improvement with some combinations
> of the params in NPS1 and NPS2 while all other combinations of params
> show no loss or gain in the performance. Those combinations showing
> improvement are marked with ### and those showing drop in performance
> are marked with ***. CLAT 99 shows mixed results in all the NPS modes.
> SLAT 99 is suffering tremendously in all NPS mode.

Lower thread count tests showing larger variance is consistent with my
experience. Sometimes the scheduling and its interaction with the workload
seem to exhibit bimodal (or higher-degree multimodal) behaviors and the
swings get a lot more severe when clock boosting is involved.

Outside of that tho, I'm having a difficult time interpreting the results.
It's definitely possible that I made some mistakes but in theory NUMA should
behave about the same as before the patchset, which seems to hold for most
of the results but there are some striking outliers.

So, here's a suggestion. How about we pick two scenarios, one where CACHE is
doing better and one worse, and then run those two specific scenarios
multiple times and see how consistent the results are?
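
As a rough sketch of what checking consistency could look like: collect the
same metric (e.g. BW) from each repeated run of a scenario and compare the
gap between the two configurations against the run-to-run spread. The helper
names and the k=2 threshold below are assumptions for illustration only; they
are not part of the patchset or of fio.

```python
import statistics


def summarize(samples):
    """Return (mean, sample stddev, coefficient of variation) for one scenario.

    `samples` holds one metric value per repeated run (hypothetical input).
    """
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # requires at least two runs
    return mean, stdev, stdev / mean


def differs_beyond_noise(baseline, patched, k=2.0):
    """True if the gap between the means exceeds k times the larger stddev.

    This is a crude noise check, not a formal significance test; k is an
    arbitrary illustrative threshold.
    """
    b_mean, b_sd, _ = summarize(baseline)
    p_mean, p_sd, _ = summarize(patched)
    return abs(p_mean - b_mean) > k * max(b_sd, p_sd)
```

A large coefficient of variation within a single scenario would suggest the
run-to-run swings dominate, in which case a single-run comparison between
configurations says little either way.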

Thanks.

--
tejun