Re: [PATCHSET v1 wq/for-6.5] workqueue: Improve unbound workqueue execution locality

From: Pin-yen Lin
Date: Thu Jun 29 2023 - 05:49:51 EST


Hi Linus and Tejun,

On Thu, Jun 22, 2023 at 3:32 AM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, 21 Jun 2023 at 12:16, Tejun Heo <tj@xxxxxxxxxx> wrote:
> >
> > I find that perplexing given that switching to a per-cpu workqueue remedies
> > the situation quite a bit, which is how this patchset came to be. #3 is the
> > same as per-cpu workqueue, so if you're seeing noticeably different
> > performance numbers between #3 and per-cpu workqueue, there's something
> > wrong with either the code or test setup.
>
In our case, switching to a per-cpu workqueue (i.e. removing WQ_UNBOUND)
doesn't give us better results. However, since pinning the tasks to a
single CPU core does help, we suspected that the regression is related
to the behavior of WQ_UNBOUND. Our findings are listed in [1].

We already use WQ_SYSFS and the sysfs interface to pin the tasks, but
thanks for the suggestion.
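For reference, here is how the WQ_SYSFS pinning mentioned above works. An unbound workqueue allocated with WQ_SYSFS (e.g. alloc_workqueue("example_wq", WQ_UNBOUND | WQ_SYSFS, 0)) exposes a cpumask attribute under /sys/devices/virtual/workqueue/<name>/. That attribute accepts a hex CPU mask. Below is a minimal user-space sketch of pinning such a workqueue to one CPU. The workqueue name "example_wq" and the helper names are hypothetical. Only CPUs 0 to 31 are handled, and writing the attribute requires root.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build the sysfs path of a WQ_SYSFS workqueue's
 * cpumask attribute. Returns 0 on success, -1 if the buffer is too small. */
static int wq_cpumask_path(char *buf, size_t len, const char *wq_name)
{
    int n = snprintf(buf, len, "/sys/devices/virtual/workqueue/%s/cpumask",
                     wq_name);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: format a hex cpumask selecting a single CPU.
 * This sketch only handles CPUs 0..31 (one 32-bit mask word). */
static int single_cpu_mask(char *buf, size_t len, unsigned int cpu)
{
    if (cpu >= 32)
        return -1;
    int n = snprintf(buf, len, "%x", 1u << cpu);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Pin the named workqueue to one CPU by writing its cpumask attribute.
 * Fails (returns -1) if the workqueue is not exposed in sysfs or the
 * caller lacks permission to write the attribute. */
static int pin_workqueue(const char *wq_name, unsigned int cpu)
{
    char path[256], mask[16];

    if (wq_cpumask_path(path, sizeof path, wq_name) != 0 ||
        single_cpu_mask(mask, sizeof mask, cpu) != 0)
        return -1;

    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fputs(mask, f) >= 0;
    if (fclose(f) != 0) /* sysfs store errors surface on flush/close */
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, pin_workqueue("example_wq", 2) writes the mask "4", which restricts the workqueue's workers to CPU 2.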

[1]: https://lore.kernel.org/all/ZFvpJb9Dh0FCkLQA@xxxxxxxxxx/

> Or maybe there's some silly thinko in the wq code that is hidden by
> the percpu code.
>
> For example, WQ_UNBOUND triggers a lot of other overhead at least on
> wq allocation and free. Maybe some of that stuff then indirectly
> affects workqueue execution even when strict cpu affinity is set.
>
> Pin-Yen Li - can you do a system-wide profile of the two cases (the
> percpu case vs the "strict cpu affinity" one), to see if something
> stands out?

The two cases actually show similar performance, so I guess a profile
comparing them would not be very interesting to you. Please let me know
if you want to see any data, and I'll be happy to collect it and post an
update here.

Best regards,
Pin-yen
>
> Linus