Re: NMI watchdog triggering during load_balance

From: David Ahern
Date: Fri Mar 06 2015 - 10:02:27 EST


On 3/5/15 9:52 PM, Mike Galbraith wrote:
>> CPU970 attaching sched-domain:
>>  domain 0: span 968-975 level SIBLING
>>   groups: 8 single CPU groups
>>  domain 1: span 968-975 level MC
>>   groups: 1 group with 8 cpus
>>  domain 2: span 768-1023 level CPU
>>   groups: 4 groups with 256 cpus per group
>
> Wow, that topology is horrid. I'm not surprised that your box is
> writhing in agony. Can you twiddle that?


Twiddle that how?

The system has 4 physical CPUs (sockets). Each socket has 32 cores with 8 threads per core, which gives 256 hardware threads per socket and 1024 logical CPUs in total. Each socket also has 4 memory controllers.
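
As a rough sanity check, the socket/core/thread counts above can be tied back to the spans in the quoted sched-domain output for CPU970. This is only a sketch: it assumes logical CPUs are numbered contiguously (socket first, then core, then hardware thread), which the spans suggest but the thread does not state, and the helper name `span` is made up for illustration.

```python
# Counts taken from the description of the machine.
SOCKETS = 4
CORES_PER_SOCKET = 32
THREADS_PER_CORE = 8


def span(cpu, width):
    """Return the inclusive (first, last) range of the width-aligned block holding cpu.

    Assumption: CPU numbers are contiguous and aligned to the block width.
    """
    first = (cpu // width) * width
    return first, first + width - 1


threads_per_socket = CORES_PER_SOCKET * THREADS_PER_CORE
total_cpus = SOCKETS * threads_per_socket

# Block of hardware threads sharing a core with CPU 970 (SIBLING/MC level).
core_span = span(970, THREADS_PER_CORE)
# Block of hardware threads sharing a socket with CPU 970.
socket_span = span(970, threads_per_socket)

print(total_cpus, core_span, socket_span)
```

Under that numbering assumption, the core-sized block matches the 968-975 span reported for domains 0 and 1, and the socket-sized block matches the 768-1023 span and the 256-CPU group size reported for domain 2.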

If I disable SCHED_MC and CGROUP_SCHED (group scheduling) there is a noticeable improvement -- the watchdog does not trigger and the rq locks are no longer held for 2-3 seconds. But CPU usage is still fairly high for an idle system. Perhaps I should leave SCHED_MC on and disable SCHED_SMT instead; I'll try that today.
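
When comparing builds like these, it can help to confirm which scheduler options a given kernel config actually has set. The sketch below parses config text in the usual `CONFIG_FOO=y` / `# CONFIG_FOO is not set` form. The function name `option_states` and the sample text are hypothetical and not taken from the system in this thread; `CONFIG_CGROUP_SCHED` is the Kconfig symbol for group scheduling.

```python
import re

# Scheduler-related options discussed in this thread.
OPTIONS = ("CONFIG_SCHED_SMT", "CONFIG_SCHED_MC", "CONFIG_CGROUP_SCHED")


def option_states(config_text, options=OPTIONS):
    """Map each option to its configured value, or 'not set' if it is absent."""
    states = {name: "not set" for name in options}
    for line in config_text.splitlines():
        # Only "CONFIG_FOO=value" lines set an option; the commented
        # "# CONFIG_FOO is not set" lines leave the default in place.
        match = re.match(r"^(CONFIG_\w+)=(.*)$", line.strip())
        if match and match.group(1) in states:
            states[match.group(1)] = match.group(2)
    return states


# Illustrative sample only, not the configuration of the machine above.
sample = """\
CONFIG_SCHED_SMT=y
# CONFIG_SCHED_MC is not set
# CONFIG_CGROUP_SCHED is not set
"""

states = option_states(sample)
for name, value in states.items():
    print(name, value)
```

On a real system the text would come from wherever the kernel config is kept, which varies by distribution and architecture, so the path is left to the reader.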

Thanks,
David