Re: Modifying isolcpus, nohz_full, and rcu_nocb kernel parameters at runtime

From: Waiman Long
Date: Tue Dec 12 2023 - 21:28:13 EST



On 12/12/23 18:57, Frederic Weisbecker wrote:
On Tue, Dec 12, 2023 at 03:18:43PM -0500, Waiman Long wrote:
On 12/12/23 08:27, Frederic Weisbecker wrote:
On Fri, Dec 08, 2023 at 09:18:53AM -0500, Gianfranco Dutka wrote:
The isolcpus, nohz_full and rcu_nocbs are boot-time kernel parameters. I am in the process of improving dynamic CPU isolation at runtime. Right now, we are able to do isolcpus=domain with the isolated cpuset partition functionality. Other aspects of CPU isolation are being looked at with the goal of narrowing the gap between what one can do at boot time and what can be done at run time. It will certainly take time to reach that goal.

Cheers,
Longman
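
For reference, the isolated cpuset partition mentioned above is requested through the cgroup v2 interface files cpuset.cpus and cpuset.cpus.partition. A minimal Python sketch follows, assuming cgroup v2 is mounted at /sys/fs/cgroup, the kernel accepts "isolated" in cpuset.cpus.partition, and the caller has root privileges. The helper name, group name and CPU range are hypothetical, chosen only for illustration.

```python
from pathlib import Path


def make_isolated_partition(cgroup_root, name, cpus):
    """Create child cgroup `name` under `cgroup_root` and request an
    isolated cpuset partition covering `cpus` (a CPU list such as "2-5")."""
    root = Path(cgroup_root)
    # The cpuset controller must be enabled for the root's children.
    (root / "cgroup.subtree_control").write_text("+cpuset")
    child = root / name
    child.mkdir(exist_ok=True)
    # Hand the CPUs to the child, then turn it into an isolated partition.
    (child / "cpuset.cpus").write_text(cpus)
    (child / "cpuset.cpus.partition").write_text("isolated")
    return child


# Hypothetical usage on a real system (needs root):
#   make_isolated_partition("/sys/fs/cgroup", "isolated_grp", "2-5")
```

Reading cpuset.cpus.partition back afterwards should show whether the kernel accepted the request. This only covers the scheduler-domain part of isolation (the isolcpus=domain equivalent), not nohz_full or rcu_nocbs.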

Thank you Waiman for the response. It would seem that getting similar
functionality through cgroups/cpusets is the only option at the moment. Is it
completely out of the question to possibly patch the kernel to modify these
parameters at runtime? Or would that entail a significant change that might
not be so trivial to accomplish? For instance, the solution wouldn’t be as
simple as patching the kernel to make these writeable and then calling the
same functions which run at boot-time when these parameters are originally
written?
As for nohz_full (which implies rcu_nocb), it's certainly possible to make it
tunable at runtime via cpusets. If people really want it, I'm willing to help.
As said by Phil, your help in enabling dynamic rcu_nocb will be greatly
appreciated.
rcu_nocb is already ready for that. The part that is not yet ready is nohz_full and its
several components (tick, remote tick, [hr-]timers affinity, workqueues affinity, kthreads
affinity, vmstat, buffer head, etc.). The last debate at Plumbers suggested that
nohz_full should be dynamically turned on/off only on offline CPUs. That will
indeed simplify the problem.

So rcu_nocb can be changed dynamically without too much additional work. That is good to know as I haven't looked into that myself.

The other pieces will still need additional work. I already have a patch in the cgroup tree that updates the unbound workqueue affinity to exclude isolated cpuset CPUs, though there may still be some further fine tuning that can be done.
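
To illustrate the idea of excluding isolated CPUs from an affinity mask, here is a small Python sketch of the set arithmetic involved. It is not the kernel patch itself, only an illustration under the assumption that both inputs use the kernel's CPU-list text format (e.g. "0-3,8"). The function names are hypothetical.

```python
def parse_cpulist(text):
    """Parse a kernel CPU list such as "0-3,8,10-11" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty list
        lo, _, hi = part.partition("-")
        # A single CPU ("8") has no upper bound, so reuse the lower one.
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def exclude_isolated(requested, isolated):
    """Return the CPUs from `requested` that are not in `isolated`."""
    return parse_cpulist(requested) - parse_cpulist(isolated)


print(sorted(exclude_isolated("0-7", "2-3,6")))  # [0, 1, 4, 5, 7]
```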


My current thought is to have a root level
cpuset.cpus.isolation_control file to enable additional CPU isolation like
rcu_nocb to be applied to CPUs in isolated partitions.
Last time I tried that, Peter Zijlstra was more in favour of an all-or-nothing
isolation switch by default for nohz_full that would include rcu_nocb. And then, if people
are interested in something more fine-grained, introduce such a file to control
individual features (see
https://lore.kernel.org/lkml/YpIwsiaY2IPK96WO@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/ )

But so far I have never heard of a need for such fine-grained isolation. Users of
nohz_full= seem to want to isolate everything out.

Yes, I recall some of the discussion now. I am fine with a single on/off switch. That will likely simplify the process, as we can add additional isolation features over time once the code is ready, maybe as a cpuset.cpus.isolation_full boolean flag.

Cheers,
Longman



Thanks.