Re: [PATCH 14/24] workqueue: Generalize unbound CPU pods

From: Tejun Heo
Date: Wed Jun 21 2023 - 16:15:26 EST


Hello,

On Wed, Jun 14, 2023 at 11:49:53AM -0700, Sandeep Dhavale wrote:
> Thank you for your patches! I tested the affinity-scopes-v2 with app launch
> benchmarks. The numbers below are total scheduling latency for erofs kworkers
> and last column is with percpu highpri kthreads that is
> CONFIG_EROFS_FS_PCPU_KTHREAD=y
> CONFIG_EROFS_FS_PCPU_KTHREAD_HIPRI=y
>
> Scheduling latency is the latency between when the task became eligible to run
> to when it actually started running. The test does 50 cold app launches for each
> and aggregates the numbers.
>
> | Table | Upstream | Cache nostrict | CPU nostrict | PCPU hpri |
> |--------------+----------+----------------+--------------+-----------|
> | Average (us) | 12286 | 7440 | 4435 | 2717 |
> | Median (us) | 12528 | 3901 | 3258 | 2476 |
> | Minimum (us) | 287 | 555 | 638 | 357 |
> | Maximum (us) | 35600 | 35911 | 13364 | 6874 |
> | Stdev (us) | 7918 | 7503 | 3323 | 1918 |
> |--------------+----------+----------------+--------------+-----------|
>
> We see here that with affinity-scopes-v2 (which defaults to cache nostrict),
> there is a good improvement when compared to the current codebase.
> Affinity scope "CPU nostrict" for erofs workqueue has even better numbers
> for my test launches and it resembles logically to percpu highpri kthreads
> approach. Percpu highpri kthreads has the lowest latency and variation,
> probably down to running at higher priority as those threads are set to
> sched_set_fifo_low().

If you set workqueue to CPU strict and set its nice value to -19 in the
sysfs interface, it should behave simliar to the hardcoded PCPU hpri. I'd
also love to see the comparison between strict and nostrict too if possible.

> At high level, the app launch numbers itself improved with your series as
> entire workqueue subsystem improved across the board.

Glad to hear.

Thanks.

--
tejun