[RFC PATCH] tick/sched: update full_nohz status after SCHED dep is cleared

From: Juri Lelli
Date: Wed May 20 2020 - 10:04:27 EST


After tasks enter or leave a runqueue (wakeup/block) SCHED full_nohz
dependency is checked (via sched_update_tick_dependency()). In case tick
can be stopped on a CPU (see sched_can_stop_tick() for details), SCHED
dependency for such CPU is cleared. However, this new information is not
used right away to actually stop the tick.

In CONFIG_PREEMPT systems booted with threadirqs option, sched clock
tick is serviced by an actual task (ksoftirqd corresponding to the CPU
where tick timer fired). So, in case a CPU was running a single task,
servicing the timer involves exiting full nozh mode. Problem at this
point is that we might lose chances to enter back into full nozh mode,
since info about ksoftirqd thread going back to sleep is not used (as
mentioned above).

Fix it by calling tick_nozh_full_update_tick(_cpu)() right after SCHED
dependency is cleared, so that tick can be promptly stopped.

Signed-off-by: Juri Lelli <juri.lelli@xxxxxxxxxx>
---
Hi,

I noticed what seems to be the problem described in the changelog while
running sysjitter [1] on a PREEMPT system setup for isolation and full
nozh, i.e. ... skew_tick=1 nohz=on nohz_full=4-35 rcu_nocbs=4-35
threadirqs (36 CPUs box).

Starting sysjitter with something like the following

perf stat -C 4-35 -e irq_vectors:local_timer_entry taskset \
--cpu-list 4-35 ./sysjitter/sysjitter --runtime 30 200

with vanilla kernel was returning ~5k local_timer_entry events, while
that number goes down to ~100 or so with the proposed patch applied.

The following trace snippet also highlight the problematic situation:

...
ksoftirqd/19-125 [019] 170.700754: softirq_entry: vec=1 [action=TIMER]
ksoftirqd/19-125 [019] 170.700755: softirq_exit: vec=1 [action=TIMER]
ksoftirqd/19-125 [019] 170.700756: softirq_entry: vec=7 [action=SCHED]
ksoftirqd/19-125 [019] 170.700757: softirq_exit: vec=7 [action=SCHED]
ksoftirqd/19-125 [019] 170.700759: sched_switch: ksoftirqd/19:125 [120] S ==> sysjitter:2459 [120]
sysjitter-2459 [019] 170.701740: local_timer_entry: vector=236
sysjitter-2459 [019] 170.701742: softirq_raise: vec=1 [action=TIMER]
sysjitter-2459 [019] 170.701743: softirq_raise: vec=7 [action=SCHED]
sysjitter-2459 [019] 170.701744: local_timer_exit: vector=236
sysjitter-2459 [019] 170.701747: sched_wakeup: ksoftirqd/19:125 [120] success=1 CPU:019
sysjitter-2459 [019] 170.701748: tick_stop: success=0 dependency=SCHED
sysjitter-2459 [019] 170.701749: irq_work_entry: vector=246
sysjitter-2459 [019] 170.701750: irq_work_exit: vector=246
sysjitter-2459 [019] 170.701751: tick_stop: success=0 dependency=SCHED
sysjitter-2459 [019] 170.701753: sched_switch: sysjitter:2459 [120] R ==> ksoftirqd/19:125 [120]
ksoftirqd/19-125 [019] 170.701754: softirq_entry: vec=1 [action=TIMER]
ksoftirqd/19-125 [019] 170.701756: softirq_exit: vec=1 [action=TIMER]
ksoftirqd/19-125 [019] 170.701756: softirq_entry: vec=7 [action=SCHED]
ksoftirqd/19-125 [019] 170.701758: softirq_exit: vec=7 [action=SCHED]
ksoftirqd/19-125 [019] 170.701759: sched_switch: ksoftirqd/19:125 [120] S ==> sysjitter:2459 [120]
sysjitter-2459 [019] 170.702740: local_timer_entry: vector=236
sysjitter-2459 [019] 170.702742: softirq_raise: vec=1 [action=TIMER]
sysjitter-2459 [019] 170.702743: softirq_raise: vec=7 [action=SCHED]
sysjitter-2459 [019] 170.702744: local_timer_exit: vector=236
sysjitter-2459 [019] 170.702747: sched_wakeup: ksoftirqd/19:125 [120] success=1 CPU:019
sysjitter-2459 [019] 170.702748: tick_stop: success=0 dependency=SCHED
sysjitter-2459 [019] 170.702749: irq_work_entry: vector=246
sysjitter-2459 [019] 170.702750: irq_work_exit: vector=246
sysjitter-2459 [019] 170.702751: tick_stop: success=0 dependency=SCHED
sysjitter-2459 [019] 170.702753: sched_switch: sysjitter:2459 [120] R ==> ksoftirqd/19:125 [120]
ksoftirqd/19-125 [019] 170.702755: softirq_entry: vec=1 [action=TIMER]
ksoftirqd/19-125 [019] 170.702756: softirq_exit: vec=1 [action=TIMER]
ksoftirqd/19-125 [019] 170.702757: softirq_entry: vec=7 [action=SCHED]
ksoftirqd/19-125 [019] 170.702758: softirq_exit: vec=7 [action=SCHED]
ksoftirqd/19-125 [019] 170.702760: sched_switch: ksoftirqd/19:125 [120] S ==> sysjitter:2459 [120]
sysjitter-2459 [019] 170.703740: local_timer_entry: vector=236
sysjitter-2459 [019] 170.703742: softirq_raise: vec=1 [action=TIMER]
sysjitter-2459 [019] 170.703743: softirq_raise: vec=7 [action=SCHED]
sysjitter-2459 [019] 170.703745: local_timer_exit: vector=236
sysjitter-2459 [019] 170.703747: sched_wakeup: ksoftirqd/19:125 [120] success=1 CPU:019
sysjitter-2459 [019] 170.703748: tick_stop: success=0 dependency=SCHED
sysjitter-2459 [019] 170.703749: irq_work_entry: vector=246
sysjitter-2459 [019] 170.703750: irq_work_exit: vector=246
sysjitter-2459 [019] 170.703751: tick_stop: success=0 dependency=SCHED
sysjitter-2459 [019] 170.703753: sched_switch: sysjitter:2459 [120] R ==> ksoftirqd/19:125 [120]
ksoftirqd/19-125 [019] 170.703755: softirq_entry: vec=1 [action=TIMER]
ksoftirqd/19-125 [019] 170.703756: softirq_exit: vec=1 [action=TIMER]
ksoftirqd/19-125 [019] 170.703757: softirq_entry: vec=7 [action=SCHED]
ksoftirqd/19-125 [019] 170.703758: softirq_exit: vec=7 [action=SCHED]
ksoftirqd/19-125 [019] 170.703759: sched_switch: ksoftirqd/19:125 [120] S ==> sysjitter:2459 [120]
sysjitter-2459 [019] 170.704500: call_function_single_entry: vector=251
sysjitter-2459 [019] 170.704502: call_function_single_exit: vector=251
sysjitter-2459 [019] 170.704504: tick_stop: success=1 dependency=NONE
...

Luckily sometimes it happens (like in the above) that an interrupt fires
on a CPU while ksoftirqd is not executing, SCHED dep is seen as cleared
and the CPU enters full nohz. I've however observed pathological cases
where CPUs aren't able to enter back full nohz for a long time.

Hope my understanding is correct. Please find below a naive attempt to
fix the problem. I'm pretty sure I'm missing details, e.g. I'm not sure
it is safe to possibly stop the tick remotely. Consider the following as
not more than another way of pointing the problem out. :-)

Best,

Juri

1 - https://github.com/alexeiz/sysjitter
---
kernel/time/tick-sched.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 3e2dc9b8858c..0d696e55d774 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -342,11 +342,16 @@ void tick_nohz_dep_set_cpu(int cpu, enum tick_dep_bits bit)
}
EXPORT_SYMBOL_GPL(tick_nohz_dep_set_cpu);

+static void tick_nohz_full_update_tick_cpu(struct tick_sched *ts, int cpu);
+
void tick_nohz_dep_clear_cpu(int cpu, enum tick_dep_bits bit)
{
struct tick_sched *ts = per_cpu_ptr(&tick_cpu_sched, cpu);

atomic_andnot(BIT(bit), &ts->tick_dep_mask);
+
+ if (!in_irq())
+ tick_nohz_full_update_tick_cpu(ts, cpu);
}
EXPORT_SYMBOL_GPL(tick_nohz_dep_clear_cpu);

@@ -870,11 +875,9 @@ static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
tick_nohz_restart(ts, now);
}

-static void tick_nohz_full_update_tick(struct tick_sched *ts)
+static void tick_nohz_full_update_tick_cpu(struct tick_sched *ts, int cpu)
{
#ifdef CONFIG_NO_HZ_FULL
- int cpu = smp_processor_id();
-
if (!tick_nohz_full_cpu(cpu))
return;

@@ -888,6 +891,15 @@ static void tick_nohz_full_update_tick(struct tick_sched *ts)
#endif
}

+static void tick_nohz_full_update_tick(struct tick_sched *ts)
+{
+#ifdef CONFIG_NO_HZ_FULL
+ int cpu = smp_processor_id();
+
+ tick_nohz_full_update_tick_cpu(ts, cpu);
+#endif
+}
+
static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
{
/*
--
2.25.3