On Thu, Feb 02, 2023 at 11:06:51AM -0500, Waiman Long wrote:There can be a child partition underneath it that uses up some of the cpus in cpus_allowed mask. So if you cascading up the cpuset tree from another child, your code will wrongly include those cpus that are dedicated to the other child partition.
After taking a close look at the patch, my understanding of what it is doingHow so? For a partition the cpus_allowed mask should be the parition
is as follows:
v2: cpus_allowed will not be affected by hotplug. So the new
cpuset_cpus_allowed() will return effective_cpus + offline cpus that should
have been part of effective_cpus if online before masking it with allowable
cpus and then go up the cpuset hierarchy if necessary.
v1: cpus_allowed is equivalent to v2 effective_cpus. It starts at the
current cpuset and move up the hierarchy if necessary to find a cpuset that
have at least one allowable cpu.
First of all, it does not take into account of the v2 partition feature that
may cause it to produce incorrect result if partition is enabled somewhere.
CPUs. The only magical bit about partitions is that any one CPU cannot
belong to two partitions and load-balancing is split.
Secondly, I don't see any benefit other than having some additional offlineThose CPUs can come online again -- you're *again* dismissing the true
cpu available in a task's cpumask which the scheduler will ignore anyway.
bug :/
If you filter out the offline CPUs at sched_setaffinity() time, you
forever lose those CPUs, the task will never again move to those CPUs,
even if they do come online after.
It is really simple to reproduce this:
- boot machine
- offline all CPUs except one
- taskset -p ffffffff $$
- online all CPUs
and observe your shell (and all its decendants) being stuck to the one
CPU. Do the same thing on a CPUSET=n build and note the difference (you
retain the full mask).
v2 is able to recover a previously offlined cpu. So we don't gain anyOnly for !root tasks. Not even v2 will re-set the affinity of root tasks
net benefit other than the going up the cpuset hierarchy part.
afaict.
For v1, I agree we should go up the cpuset hierarchy to find a usableWe will need a small and simple patch for /urgent, or I will need to
cpuset. Instead of introducing such a complexity in cpuset_cpus_allowed(),
my current preference is to do the hierarchy climbing part in an enhanced
cpuset_cpus_allowed_fallback() after an initial failure of
cpuset_cpus_allowed(). That will be easier to understand than having such
complexity and overhead in cpuset_cpus_allowed() alone.
I will work on a patchset to do that as a counter offer.
revert all your patches -- your call.
I also don't tihnk you fully appreciate the ramifications of
task_cpu_possible_mask(), cpuset currently gets that quite wrong.