Re: [PATCH v4] erofs: replace erofs_unzipd workqueue with per-cpu threads

From: Gao Xiang
Date: Mon Feb 06 2023 - 05:02:27 EST


Hi Sandeep,

On Fri, Jan 06, 2023 at 07:35:01AM +0000, Sandeep Dhavale wrote:
> Using per-cpu thread pool we can reduce the scheduling latency compared
> to workqueue implementation. With this patch scheduling latency and
> variation is reduced as per-cpu threads are high priority kthread_workers.
>
> The results were evaluated on arm64 Android devices running 5.10 kernel.
>
> The table below shows resulting improvements of total scheduling latency
> for the same app launch benchmark runs with 50 iterations. Scheduling
> latency is the latency between when the task (workqueue kworker vs
> kthread_worker) became eligible to run to when it actually started
> running.
> +-------------------------+-----------+----------------+---------+
> | | workqueue | kthread_worker | diff |
> +-------------------------+-----------+----------------+---------+
> | Average (us) | 15253 | 2914 | -80.89% |
> | Median (us) | 14001 | 2912 | -79.20% |
> | Minimum (us) | 3117 | 1027 | -67.05% |
> | Maximum (us) | 30170 | 3805 | -87.39% |
> | Standard deviation (us) | 7166 | 359 | |
> +-------------------------+-----------+----------------+---------+
>
> Background: Boot times and cold app launch benchmarks are very
> important to the android ecosystem as they directly translate to
> responsiveness from user point of view. While erofs provides
> a lot of important features like space savings, we saw some
> performance penalty in cold app launch benchmarks in few scenarios.
> Analysis showed that the significant variance was coming from the
> scheduling cost while decompression cost was more or less the same.
>
> Having per-cpu thread pool we can see from the above table that this
> variation is reduced by ~80% on average. This problem was discussed
> at LPC 2022. Link to LPC 2022 slides and
> talk at [1]
>
> [1] https://lpc.events/event/16/contributions/1338/
>
> Signed-off-by: Sandeep Dhavale <dhavale@xxxxxxxxxx>
> ---
> V3 -> V4
> * Updated commit message with background information
> V2 -> V3
> * Fix a warning Reported-by: kernel test robot <lkp@xxxxxxxxx>
> V1 -> V2
> * Changed name of kthread_workers from z_erofs to erofs_worker
> * Added kernel configuration to run kthread_workers at normal or
> high priority
> * Added cpu hotplug support
> * Added wrapped kthread_workers under worker_pool
> * Added one unbound thread in a pool to handle a context where
> we already stopped per-cpu kthread worker
> * Updated commit message

I've just modified your v4 patch based on erofs -dev branch with
my previous suggestion [1], but I haven't tested it.

Could you help check if the updated patch looks good to you and
test it on your side? If there are unexpected behaviors, please
help update as well, thanks!

[1] https://lore.kernel.org/r/5e1b7191-9ea6-3781-7928-72ac4cd88591@xxxxxxxxxxxxxxxxx/

Thanks,
Gao Xiang