Re: [PATCH 1/2] cgroup/cpuset: Keep current cpus list if cpus affinity was explicitly set

From: Waiman Long
Date: Thu Jul 28 2022 - 17:04:29 EST


On 7/28/22 16:44, Tejun Heo wrote:
> Hello,
>
> On Thu, Jul 28, 2022 at 03:21:26PM -0400, Waiman Long wrote:
>> On 7/28/22 15:02, Tejun Heo wrote:
>>> On Thu, Jul 28, 2022 at 02:57:28PM -0400, Waiman Long wrote:
>>>> There can be a counter argument that if a user found out that there are not
>>>> enough cpus in a cpuset to meet its performance target, one can always
>>>> increase the number of cpus in the cpuset. Generalizing this behavior to all
>>>> tasks, irrespective of whether they have explicitly set cpus affinity before,
>>>> will disallow this use case.
>>> This is nasty.
>> That is a nasty example, I know. There may be users depending on the
>> existing behavior even if they don't know it. So I am a bit hesitant to
>> change the default behavior like that. On the other hand, tasks that have
>> explicitly set their cpu affinity certainly don't want to see unexpected
>> changes to it.
> Yeah, I hear you. I'm on the same page.
>
>>> The real solution here is separating out what the user requested
>>> and the mask that cpuset (or cpu hotplug) needs to apply on top, ie.
>>> remember what the user requested in a separate cpumask and compute the
>>> intersection into p->cpus_mask whenever something changes and apply
>>> fallbacks on that final mask. Multiple parties updating the same variable is
>>> never gonna lead to anything consistent and we're patching up for whatever
>>> the immediate use case seems to need at the moment. That said, I'm not
>>> necessarily against patching it up but if you're interested in delving into
>>> it deeper, that'd be great.
>> I believe the current code is already restricting what cpu affinity a
>> user can request by limiting it to those allowed by the current cpuset. Hotplug
>> is another issue that may need to be addressed. I will update my patch to
>> make it handle hotplug in a more graceful way.
> So, the patch you proposed is making the code remember one special aspect of
> the user requested configuration - whether it was configured or not - and trying
> to preserve that particular state as the cpuset state changes. It addresses the
> immediate problem but it is a very partial approach. Let's say a task wants to
> be affined to one logical thread of each core and sets its mask to 0x5555.
> Now, let's say cpuset got enabled and enforced 0xff and affined the task to
> 0xff. After a while, the cgroup got more cpus allocated and its cpuset now
> has 0xfff. Ideally, what should happen is the task now having the effective
> mask of 0x555. In practice, tho, it would get either 0xf55 or 0x55 depending
> on which way we decide to misbehave.
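
Just to spell your example out for myself (plain user-space arithmetic, nothing kernel-specific here):

/* The masks from the example above, computed explicitly. */
#include <stdio.h>

int main(void)
{
	unsigned int requested = 0x5555;		/* one thread per core */
	unsigned int shrunk = requested & 0xff;		/* cpuset enforces 0xff -> 0x55 */

	/*
	 * When the cpuset later grows to 0xfff and the original request
	 * has been forgotten, we can only keep the stale mask or widen it
	 * with every new cpu; remembering the request gives the ideal mask.
	 */
	printf("stale:   0x%x\n", shrunk);		/* 0x55  */
	printf("widened: 0x%x\n", shrunk | 0xf00);	/* 0xf55 */
	printf("ideal:   0x%x\n", requested & 0xfff);	/* 0x555 */
	return 0;
}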

OK, I see what you want to accomplish. To fully address this issue, we will need a new cpumask variable in the task structure that is allocated only if sched_setaffinity() is ever called. I can rework my patch to use this approach.
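
Roughly what I have in mind, sketched as a stand-alone user-space mock-up (all of the names below are placeholders, not the final kernel fields): the remembered mask is allocated only on the first explicit sched_setaffinity() call, and the effective mask is re-derived from it whenever the cpuset changes.

/*
 * Stand-alone mock-up only; names are placeholders, not actual kernel
 * fields.  The remembered mask is allocated lazily on the first explicit
 * affinity request, so tasks that never call sched_setaffinity() keep
 * today's behaviour and pay no extra memory.
 */
#include <stdio.h>
#include <stdlib.h>

struct task {
	unsigned int cpus_mask;		/* effective mask the scheduler uses */
	unsigned int *requested_mask;	/* NULL until an explicit request */
};

/* Re-derive the effective mask after any cpuset (or hotplug) change. */
static void task_cpuset_changed(struct task *t, unsigned int cpuset_mask)
{
	unsigned int base = t->requested_mask ? *t->requested_mask : cpuset_mask;
	unsigned int eff = base & cpuset_mask;

	t->cpus_mask = eff ? eff : cpuset_mask;	/* fall back if disjoint */
}

/* Stand-in for sched_setaffinity(): remember the request, then apply it. */
static int task_set_affinity(struct task *t, unsigned int mask,
			     unsigned int cpuset_mask)
{
	if (!t->requested_mask) {
		t->requested_mask = malloc(sizeof(*t->requested_mask));
		if (!t->requested_mask)
			return -1;
	}
	*t->requested_mask = mask;
	task_cpuset_changed(t, cpuset_mask);
	return 0;
}

int main(void)
{
	struct task t = { .cpus_mask = 0xff, .requested_mask = NULL };

	task_set_affinity(&t, 0x5555, 0xff);	/* effective becomes 0x55  */
	task_cpuset_changed(&t, 0xfff);		/* effective becomes 0x555 */
	printf("effective mask: 0x%x\n", t.cpus_mask);

	free(t.requested_mask);
	return 0;
}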

Thanks,
Longman