Re: CONFIG_NO_HZ_FULL + CONFIG_PREEMPT_RT_FULL = nogo

From: Muli Baron
Date: Sat Dec 21 2013 - 12:25:13 EST


On 21/12/2013 11:11, Mike Galbraith wrote:

> Works, modulo noisy workqueues.
>
> rtbox:~ # sleep 35 && killall pert& cgexec -g cpuset:rtcpus taskset -c 3 pert 5
> [1] 5660
> 2400.05 MHZ CPU
> perturbation threshold 0.018 usecs.
> pert/s: 33 >15.75us: 2 min: 0.04 max: 24.14 avg: 2.51 sum/s: 84us overhead: 0.01%
> pert/s: 35 >15.54us: 3 min: 0.04 max: 24.89 avg: 2.39 sum/s: 84us overhead: 0.01%
> pert/s: 30 >15.27us: 2 min: 0.04 max: 23.03 avg: 2.64 sum/s: 80us overhead: 0.01%
> pert/s: 34 >15.12us: 3 min: 0.04 max: 25.03 avg: 2.51 sum/s: 86us overhead: 0.01%
> pert/s: 31 >14.93us: 2 min: 0.04 max: 23.86 avg: 2.60 sum/s: 83us overhead: 0.01%
> Terminated


I can confirm this works for me as well, but I have noticed some strange behavior under certain conditions.

If I run a SCHED_OTHER process pinned to a specific CPU, as Mike did, everything works as expected. If, however, I starve ksoftirqd by running that process as SCHED_FIFO for too long (say, a few seconds) and then free the CPU, the tick is never turned off again, even though the CPU is completely idle. Running the same SCHED_OTHER process as before on that CPU now gets 1K timer interrupts/s. I suspect this has something to do with the timer softirq never running on that CPU again as a result of this patch.
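
For anyone who wants to check the interrupt rate on their own box, here is a rough sketch of one way to measure it. It is not how the numbers above were obtained. It assumes an x86 machine where /proc/interrupts has a "LOC:" row (local timer interrupts) with one column per CPU; the function names are made up for illustration.

```python
import time


def local_timer_count(interrupts_text, cpu):
    """Return the local timer interrupt count for `cpu`.

    Assumes the x86 layout of /proc/interrupts, where the row labelled
    "LOC:" holds one counter per CPU, in CPU order.
    """
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "LOC:":
            return int(fields[1 + cpu])
    raise ValueError("no LOC: row found")


def timer_rate(before_text, after_text, cpu, seconds):
    """Timer interrupts per second on `cpu` between two snapshots."""
    delta = local_timer_count(after_text, cpu) - local_timer_count(before_text, cpu)
    return delta / seconds


def sample(cpu, seconds=1.0, path="/proc/interrupts"):
    """Take two snapshots of /proc/interrupts `seconds` apart."""
    with open(path) as f:
        before = f.read()
    time.sleep(seconds)
    with open(path) as f:
        after = f.read()
    return timer_rate(before, after, cpu, seconds)
```

Calling sample(3) on an idle, isolated CPU 3 should report a rate close to zero when the tick is stopped, and roughly HZ (1000 here) when it is stuck on.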

The following trace output shows two consecutive ticks in this state, with the CPU completely idle (shielded by isolcpus, HZ=1000):

<idle>-0 [003] d..h3.. 355.620760: hrtimer_cancel: hrtimer=ffff88011fd8b720
<idle>-0 [003] d..h2.. 355.620760: hrtimer_expire_entry: hrtimer=ffff88011fd8b720 function=tick_sched_timer now=355487000349
<idle>-0 [003] d..h2.. 355.620761: hrtimer_expire_exit: hrtimer=ffff88011fd8b720
<idle>-0 [003] d..h3.. 355.620761: hrtimer_start: hrtimer=ffff88011fd8b720 function=tick_sched_timer expires=355488000000 softexpires=355488000000
<idle>-0 [003] ....2.. 355.620762: cpu_idle: state=4294967295 cpu_id=3
<idle>-0 [003] d...2.. 355.620762: cpu_idle: state=1 cpu_id=3

<idle>-0 [003] d..h3.. 355.621760: hrtimer_cancel: hrtimer=ffff88011fd8b720
<idle>-0 [003] d..h2.. 355.621761: hrtimer_expire_entry: hrtimer=ffff88011fd8b720 function=tick_sched_timer now=355488000330
<idle>-0 [003] d..h2.. 355.621761: hrtimer_expire_exit: hrtimer=ffff88011fd8b720
<idle>-0 [003] d..h3.. 355.621761: hrtimer_start: hrtimer=ffff88011fd8b720 function=tick_sched_timer expires=355489000000 softexpires=355489000000
<idle>-0 [003] ....2.. 355.621762: cpu_idle: state=4294967295 cpu_id=3
<idle>-0 [003] d...2.. 355.621762: cpu_idle: state=1 cpu_id=3
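
As a quick sanity check on the trace, the tick period can be read off the expires= values of the hrtimer_start events. The sketch below does that for the two hrtimer_start lines shown; the helper name and regex are only illustrative.

```python
import re

# Matches the re-arm of the tick timer and captures its expiry time (ns).
EXPIRES_RE = re.compile(
    r"hrtimer_start: .*function=tick_sched_timer expires=(\d+)"
)


def tick_periods_ns(trace_text):
    """Return the gaps (in ns) between consecutive tick_sched_timer expiries."""
    expires = [int(m.group(1)) for m in EXPIRES_RE.finditer(trace_text)]
    return [later - earlier for earlier, later in zip(expires, expires[1:])]


# The two hrtimer_start events from the trace, copied verbatim.
TRACE = """\
<idle>-0 [003] d..h3.. 355.620761: hrtimer_start: hrtimer=ffff88011fd8b720 function=tick_sched_timer expires=355488000000 softexpires=355488000000
<idle>-0 [003] d..h3.. 355.621761: hrtimer_start: hrtimer=ffff88011fd8b720 function=tick_sched_timer expires=355489000000 softexpires=355489000000
"""

periods = tick_periods_ns(TRACE)
rates_hz = [1_000_000_000 / p for p in periods]
print(periods, rates_hz)
```

The expiries are 1,000,000 ns apart, i.e. the timer is re-armed every 1 ms, which is the full HZ=1000 periodic tick on a CPU that is otherwise idle.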

--Muli

