Re: [PATCH] nohz: don't kick non-idle CPUs in tick_nohz_full_kick_cpu()

From: Thomas Gleixner
Date: Fri Jul 20 2018 - 13:24:14 EST


On Thu, 19 Jul 2018, Yury Norov wrote:
> While here. I just wonder, on my system IRQs are sent to nohz_full CPUs
> at every incoming ssh connection. The trace is like this:
> [ 206.835533] Call trace:
> [ 206.848411] [<ffff00000889f984>] dump_stack+0x84/0xa8
> [ 206.853455] [<ffff0000081ea308>] _task_isolation_remote+0x130/0x140
> [ 206.859714] [<ffff0000081bf5ec>] irq_work_queue_on+0xcc/0xfc
> [ 206.865365] [<ffff0000081478ac>] tick_nohz_full_kick_cpu+0x88/0x94
> [ 206.871536] [<ffff000008147930>] tick_nohz_dep_set_all+0x78/0xa8
> [ 206.877533] [<ffff000008147b58>] tick_nohz_dep_set_signal+0x28/0x34
> [ 206.883792] [<ffff0000081421fc>] set_process_cpu_timer+0xd0/0x128
> [ 206.889876] [<ffff0000081422ac>] update_rlimit_cpu+0x58/0x7c
> [ 206.895528] [<ffff0000083aa3d0>] selinux_bprm_committing_creds+0x180/0x1fc
> [ 206.902394] [<ffff00000839e394>] security_bprm_committing_creds+0x40/0x5c
> [ 206.909173] [<ffff00000828c4a0>] install_exec_creds+0x20/0x6c
> [ 206.914911] [<ffff0000082e15b0>] load_elf_binary+0x368/0xbb8
> [ 206.920561] [<ffff00000828d09c>] search_binary_handler+0xb8/0x224
> [ 206.926645] [<ffff00000828d99c>] do_execveat_common+0x44c/0x5f0
> [ 206.932555] [<ffff00000828db78>] do_execve+0x38/0x44
> [ 206.937510] [<ffff00000828dd74>] SyS_execve+0x34/0x44
>
> I suspect that scp, ssh tunneling and similar network activities will source
> ticks on nohz_full CPUs as well. On high-loaded server it may generate
> significant interrupt traffic on nohz_full CPUs. Is it desirable behavior?

Suspicions and desirability are not really technically interesting aspects.

Just from looking at the stack trace it's obvious that there is a CPU time
rlimit (RLIMIT_CPU) on that newly spawned sshd. That's not something the
kernel imposes. That's what user space sets.
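
For illustration only, a minimal user-space sketch of how such a limit gets
set. The helper name set_cpu_time_soft_limit() is made up for this example;
getrlimit()/setrlimit() and RLIMIT_CPU are the standard POSIX interfaces.
Which component actually sets the limit on the reporter's system (PAM, a
shell ulimit, a service manager) cannot be told from the trace.

```c
#include <sys/resource.h>

/*
 * Hypothetical helper: set the soft CPU-time limit of the calling
 * process to `seconds`, clamped to the current hard limit.
 * Returns 0 on success, -1 on error.
 */
int set_cpu_time_soft_limit(rlim_t seconds)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_CPU, &rl) != 0)
		return -1;

	/* An unprivileged process may not raise the soft limit above the hard one. */
	if (rl.rlim_max != RLIM_INFINITY && seconds > rl.rlim_max)
		seconds = rl.rlim_max;

	rl.rlim_cur = seconds;
	return setrlimit(RLIMIT_CPU, &rl);
}
```

A finite limit set this way is inherited across fork() and exec(), which is
presumably how it ends up on the freshly exec'ed sshd.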

Now the actual mechanism which does that, i.e. set_process_cpu_timer(), ends
up IPI'ing _ALL_ nohz full CPUs for no real good reason. In the exec path
this is really pointless because the new process is not running yet and it
is single threaded. So forcing an IPI to all CPUs achieves nothing.

In fact the state of the task/process for which update_rlimit_cpu() is
called is known, so the IPI can really be either avoided completely or
restricted to the CPUs on which this process can run or actually runs.
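
To make the idea concrete, here is a rough user-space model, not kernel
code. Everything in it (cpu_mask_t, struct proc_model, kick_mask_all(),
kick_mask_restricted()) is invented for illustration; a real patch would
work on struct cpumask and the actual task state instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: bit N set means CPU N (up to 64 CPUs). */
typedef uint64_t cpu_mask_t;

/* Hypothetical summary of what is known about the target process. */
struct proc_model {
	bool in_exec;             /* still in the exec path, single threaded */
	cpu_mask_t cpus_allowed;  /* CPUs the process may run on */
	cpu_mask_t cpus_running;  /* CPUs its threads currently occupy */
};

/* Models the current behaviour: every nohz_full CPU gets kicked. */
cpu_mask_t kick_mask_all(cpu_mask_t nohz_full)
{
	return nohz_full;
}

/* Models the restricted variant: kick only CPUs that can be affected. */
cpu_mask_t kick_mask_restricted(const struct proc_model *p,
				cpu_mask_t nohz_full)
{
	/* The new process is not running anywhere yet: no IPI needed. */
	if (p->in_exec)
		return 0;

	/* Otherwise only nohz_full CPUs that actually run the process. */
	return nohz_full & p->cpus_running & p->cpus_allowed;
}
```

With nohz_full CPUs 4-7 (mask 0xF0), the exec case yields an empty mask,
and a process running on CPUs 4 and 5 yields 0x30 instead of 0xF0. Using
cpus_allowed alone instead of cpus_running would be the more conservative
"can run" variant.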

Frederic?

Thanks,

tglx