[PATCH v5] nohz: set isolcpus when nohz_full is set

From: Chris Metcalf
Date: Thu Apr 09 2015 - 13:00:08 EST


nohz_full is only useful with isolcpus also set, since otherwise the
scheduler has to run periodically to try to determine whether to steal
work from other cores.

Accordingly, when booting with nohz_full=xxx on the command line, we
should act as if isolcpus=xxx was also set, and set (or extend) the
isolcpus set to include the nohz_full cpus.

Signed-off-by: Chris Metcalf <cmetcalf@xxxxxxxxxx>
---
Frederic wrote:
> cpu_isolated_map is allocated and filled early (__setup or sched_init())
> before tick_init() and tick_init() is before sched_init_smp() which first uses
> cpu_isolated_map(). So we can call some sched_isolated_map_add(struct cpumask *cpumask)
> from tick_nohz_init().

I'll re-send a v4 of the patch without your suggestion, just renaming
the methods to tick_nohz_full_cpumask_andnot() etc., since I still think
that model is easier to understand: we tweak isolcpus in exactly the
spot where we first put it to use. We also need those
tick_nohz_full_cpumask_xxx() accessors in other places anyway; see, for
example, my earlier patch for the tilegx network driver, which removes
the nohz_full cores from the set of cores that get interrupted by the
driver.

That said, I'm not opposed to your idea, and we could certainly do it that
way if that's the consensus. For reference, here's what it looks like
when fleshed out; I'm calling it v5 to keep it distinct from the v4
approach, but either v4 or v5 would be fine. I left the
sched_isolated_map_add() function enabled in all kernel configurations,
not just NO_HZ_FULL, since it's trivial and the #ifdefs to disable it
conditionally would add more noise than the kernel-size saving is worth.
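
To illustrate the "set (or extend)" behaviour outside the kernel, here is
a minimal userspace sketch. It assumes a single 64-bit word as a stand-in
for struct cpumask; the toy_* names are hypothetical and are not kernel
API. Only the OR-into-the-isolated-set step mirrors what
sched_isolated_map_add() does in the patch below.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask: one bit per CPU, max 64 CPUs. */
typedef uint64_t toy_cpumask_t;

/* Build a mask covering CPUs first..last inclusive (first <= last <= 63). */
static toy_cpumask_t toy_cpu_range(unsigned int first, unsigned int last)
{
	toy_cpumask_t mask = 0;
	unsigned int cpu;

	for (cpu = first; cpu <= last; cpu++)
		mask |= (toy_cpumask_t)1 << cpu;
	return mask;
}

/* Same shape as sched_isolated_map_add(): isolated = isolated | extra. */
static void toy_isolated_map_add(toy_cpumask_t *isolated, toy_cpumask_t extra)
{
	*isolated |= extra;
}

/* Test whether a given CPU is a member of the mask. */
static bool toy_cpu_isset(toy_cpumask_t mask, unsigned int cpu)
{
	return (mask >> cpu) & 1;
}
```

In this model, booting with isolcpus=1-2 nohz_full=2-5 ends up with CPUs
1-5 isolated, while nohz_full=2-5 on its own isolates exactly CPUs 2-5.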

include/linux/sched.h | 1 +
kernel/sched/core.c | 5 +++++
kernel/time/tick-sched.c | 3 +++
3 files changed, 9 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6d77432e14ff..18a961b9beba 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -323,6 +323,7 @@ struct task_struct;
extern int lockdep_tasklist_lock_is_held(void);
#endif /* #ifdef CONFIG_PROVE_RCU */

+extern void sched_isolated_map_add(const struct cpumask *);
extern void sched_init(void);
extern void sched_init_smp(void);
extern asmlinkage void schedule_tail(struct task_struct *prev);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f0f831e8a345..b055c5e0e65c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5824,6 +5824,11 @@ static int __init isolated_cpu_setup(char *str)

__setup("isolcpus=", isolated_cpu_setup);

+void sched_isolated_map_add(const struct cpumask *cpumask)
+{
+	cpumask_or(cpu_isolated_map, cpu_isolated_map, cpumask);
+}
+
 struct s_data {
 	struct sched_domain ** __percpu sd;
 	struct root_domain *rd;
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index a4c4edac4528..b0092d02ca3f 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -385,6 +385,9 @@ void __init tick_nohz_init(void)
 	for_each_cpu(cpu, tick_nohz_full_mask)
 		context_tracking_cpu_set(cpu);

+	/* It's not meaningful to be nohz without disabling the scheduler. */
+	sched_isolated_map_add(tick_nohz_full_mask);
+
 	cpu_notifier(tick_nohz_cpu_down_callback, 0);
 	pr_info("NO_HZ: Full dynticks CPUs: %*pbl.\n",
 		cpumask_pr_args(tick_nohz_full_mask));
--
2.1.2
