Re: [PATCH] sched/core: Use empty mask to reset cpumasks in sched_setaffinity()

From: Waiman Long
Date: Mon Jul 03 2023 - 10:55:59 EST



On 7/3/23 06:26, Peter Zijlstra wrote:
On Wed, Jun 28, 2023 at 05:16:37PM -0400, Waiman Long wrote:
Since commit 8f9ea86fdf99 ("sched: Always preserve the user requested
cpumask"), user provided CPU affinity via sched_setaffinity(2) is
preserved even if the task is being moved to a different cpuset. However,
that affinity is also being inherited by any subsequently created child
processes which may not want or be aware of that affinity.

One way to solve this problem is to provide a way to back off from
that user provided CPU affinity. This patch implements such a scheme
by using an empty cpumask to signal a reset of the cpumasks to the
default as allowed by the current cpuset.

Before this patch, passing in an empty cpumask to sched_setaffinity(2)
will return an EINVAL error. With this patch, an error will no longer
be returned. Instead, the user_cpus_ptr that stores the user provided
affinity, if set, will be cleared and the task's CPU affinity will be
reset to that of the current cpuset. This reverts the cpumask change
done by all the previous sched_setaffinity(2) calls.

This is a user visible ABI change -- but with very limited motivation.
Why do we want this? Who will use this?

Yes, this is a user-visible ABI change, but it should be backward compatible: I doubt any application out there depends on sched_setaffinity() returning an error when it is passed an empty cpumask.

Our OpenShift team has actually hit a problem with the recent persistent user-provided CPU affinity change. They rely on the fact that moving a task to a different cpuset resets its CPU affinity to the cpuset default, which is no longer true. That is the main reason behind this patch: to provide a way to reset the CPU affinity to the cpuset default.

Once this patch is merged upstream, I am thinking of requesting a sched_setaffinity(2) manpage update to document the persistent user-provided CPU affinity change and the way to reset it.
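
For illustration, here is a minimal userspace sketch of how a caller might request the reset, assuming the semantics proposed in this patch (an empty cpumask clears the user-provided affinity). The helper names reset_affinity() and classify_reset() and the enum are hypothetical and not part of the patch. On a kernel without this change, the same call is expected to fail with EINVAL, which the sketch reports as "unsupported".

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <sys/types.h>

/* Hypothetical result codes, for illustration only. */
enum reset_result {
	RESET_OK = 0,          /* kernel accepted the empty mask */
	RESET_UNSUPPORTED = 1, /* EINVAL: kernel rejects an empty mask */
	RESET_FAILED = -1      /* any other error (EPERM, ESRCH, ...) */
};

/* Map the return value and errno of sched_setaffinity() to a result code. */
enum reset_result classify_reset(int ret, int err)
{
	if (ret == 0)
		return RESET_OK;
	if (err == EINVAL)
		return RESET_UNSUPPORTED;
	return RESET_FAILED;
}

/*
 * Ask the kernel to drop the user-provided affinity of @pid (0 means the
 * calling thread) by passing an empty cpumask. This only resets anything
 * on a kernel that implements the behavior proposed in this patch.
 */
enum reset_result reset_affinity(pid_t pid)
{
	cpu_set_t empty;
	int ret;

	CPU_ZERO(&empty);
	ret = sched_setaffinity(pid, sizeof(empty), &empty);
	return classify_reset(ret, ret ? errno : 0);
}
```

A container runtime could call reset_affinity(0) in a child before exec and fall back to its existing handling when RESET_UNSUPPORTED is returned.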

Cheers,
Longman