Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs

From: Waiman Long
Date: Thu Feb 02 2023 - 11:07:50 EST


On 2/2/23 03:34, Peter Zijlstra wrote:
On Tue, Jan 31, 2023 at 11:14:27PM -0500, Waiman Long wrote:

A major concern I have is the overhead of creating a poor man version of v2
cpus_allowed. This issue can be worked around even for cpuset v1 if it is
mounted with the cpuset_v2_mode option to behave more like v2 in its cpumask
handling. Alternatively we may be able to provide a config option to make
this the default for v1 without the special mount option, if necessary.
It is equally broken for v2, it masks against effective_cpus. Not to
mention it explicitly starts with cpu_online_mask.

After taking a close look at the patch, my understanding of what it is doing is as follows:

v2: cpus_allowed will not be affected by hotplug. So the new cpuset_cpus_allowed() will return effective_cpus + offline cpus that should have been part of effective_cpus if online before masking it with allowable cpus and then go up the cpuset hierarchy if necessary.

v1: cpus_allowed is equivalent to v2 effective_cpus. It starts at the current cpuset and move up the hierarchy if necessary to find a cpuset that have at least one allowable cpu.

First of all, it does not take into account of the v2 partition feature that may cause it to produce incorrect result if partition is enabled somewhere. Secondly, I don't see any benefit other than having some additional offline cpu available in a task's cpumask which the scheduler will ignore anyway. v2 is able to recover a previously offlined cpu. So we don't gain any net benefit other than the going up the cpuset hierarchy part.

For v1, I agree we should go up the cpuset hierarchy to find a usable cpuset. Instead of introducing such a complexity in cpuset_cpus_allowed(), my current preference is to do the hierarchy climbing part in an enhanced cpuset_cpus_allowed_fallback() after an initial failure of cpuset_cpus_allowed(). That will be easier to understand than having such complexity and overhead in cpuset_cpus_allowed() alone.

I will work on a patchset to do that as a counter offer.

Cheers,
Longman