NUMA, migrate/N, and tuned-adm

From: David Timothy Strauss
Date: Tue Dec 17 2013 - 13:11:10 EST


Our systems get storms of migrate/N (and sometimes kswapd) kernel
tasks, based on what we've seen in top [1]. This issue is unique to
our bare-metal application servers: we run hundreds of application
servers on Xen virtual hardware with the same kernel and do not see
it there. We also have no issues with identical kernels and hardware
when the servers run databases.

System specs:
* Fedora 19 with the 3.11.10-200.fc19.x86_64 kernel (just the stock RPM)
* Bare-metal servers with 128GB RAM split between two NUMA regions,
each region with one hex-core processor
* More than 700 processes, a couple hundred of which are active
fairly frequently. The systems previously ran 7000 processes, but
we've reduced the count while we investigate this issue.
* Many of the processes are short-lived. The long-lived ones
experience spikes in CPU and memory usage while processing requests.

Here's what we've tried, to no avail:
* tuned-adm with the latency-performance and virtual-host profiles;
this puts the system on the deadline I/O scheduler, but the problem
occurred with the default scheduler too
* kernel.sched_migration_cost_ns=5000000 (which tuned will do for
those profiles in v3.3/Fedora 20)
* numad to balance between regions
* Global use of sched_relax_domain_level=1 and sched_relax_domain_level=2
* Splitting the system with cpuset into management tasks (6 virtual
cores) and workload tasks (18 virtual cores) with
sched_relax_domain_level=2. This is based on recommendations for NUMA
systems in the cpuset man page.
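
To illustrate the cpuset split in the last item, here is a minimal
Python sketch. It is not the configuration used on these servers: the
CPU numbering (0-23), the choice of the first six logical CPUs for
management, the set names "management" and "workload", the memory
node ids 0-1, applying sched_relax_domain_level only to the workload
set, and the cgroup v1 mount point /sys/fs/cgroup/cpuset are all
assumptions made for the example.

```python
import os


def format_cpulist(cpus):
    """Collapse CPU ids into kernel cpulist syntax, e.g. [0, 1, 2, 6] -> '0-2,6'."""
    cpus = sorted(set(cpus))
    ranges = []
    i = 0
    while i < len(cpus):
        start = end = cpus[i]
        # Extend the current range while the next id is consecutive.
        while i + 1 < len(cpus) and cpus[i + 1] == end + 1:
            i += 1
            end = cpus[i]
        ranges.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(ranges)


def plan_cpusets(total_cpus=24, management_cpus=6, relax_level=2):
    """Build the per-cpuset file contents for a management/workload split.

    Assumption: logical CPUs are numbered 0..total_cpus-1 and the first
    `management_cpus` of them go to the management set.
    """
    all_cpus = list(range(total_cpus))
    return {
        "management": {
            "cpuset.cpus": format_cpulist(all_cpus[:management_cpus]),
            "cpuset.mems": "0-1",  # assumption: both NUMA nodes allowed
        },
        "workload": {
            "cpuset.cpus": format_cpulist(all_cpus[management_cpus:]),
            "cpuset.mems": "0-1",  # assumption: both NUMA nodes allowed
            "cpuset.sched_relax_domain_level": str(relax_level),
        },
    }


def apply_plan(plan, root="/sys/fs/cgroup/cpuset"):
    """Write the plan under `root` (assumed cgroup v1 cpuset mount; needs root)."""
    for name, settings in plan.items():
        path = os.path.join(root, name)
        os.makedirs(path, exist_ok=True)
        for filename, value in settings.items():
            with open(os.path.join(path, filename), "w") as f:
                f.write(value)
```

Calling plan_cpusets() only builds the settings; apply_plan() is what
would write them, and pointing its root argument at a scratch
directory is a safe way to inspect the result first.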

Here's what we've used for analysis:
* powertop
* top/htop
* perf record -a -g
* SystemTap, with a script that prints migrations as they occur
* numatop
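
As a lighter-weight complement to the tools above (and not the
SystemTap script mentioned in the list), per-task migration counts
can also be read from procfs. The sketch below assumes the kernel
exposes /proc/<pid>/sched with an se.nr_migrations line, which
depends on scheduler debugging being enabled in the kernel build; the
function names are made up for this example. It only looks at the
per-process file, so it reports one cumulative count per process.

```python
import glob
import os
import re

# Matches a line such as "se.nr_migrations        :        12".
_MIGRATIONS = re.compile(r"^se\.nr_migrations\s*:\s*(\d+)", re.MULTILINE)


def parse_nr_migrations(sched_text):
    """Return the cumulative migration count, or None if the field is absent."""
    match = _MIGRATIONS.search(sched_text)
    return int(match.group(1)) if match else None


def parse_task_name(sched_text):
    """Return the task name from the first line, "name (pid, #threads: n)"."""
    lines = sched_text.splitlines()
    first = lines[0] if lines else ""
    return first.rsplit(" (", 1)[0]


def top_migrators(proc_root="/proc", limit=10):
    """List (migrations, name, pid) tuples, highest migration count first."""
    results = []
    for path in glob.glob(os.path.join(proc_root, "[0-9]*", "sched")):
        try:
            with open(path) as f:
                text = f.read()
        except OSError:
            continue  # the task exited between listing and reading
        count = parse_nr_migrations(text)
        if count is not None:
            pid = os.path.basename(os.path.dirname(path))
            results.append((count, parse_task_name(text), pid))
    results.sort(reverse=True)
    return results[:limit]


if __name__ == "__main__":
    for count, name, pid in top_migrators():
        print(f"{count:>10}  {pid:>7}  {name}")
```

Because the counter is cumulative, sampling it twice and comparing
the values gives a rough migration rate per process during a storm.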

[1] https://gist.github.com/davidstrauss/3ff0b29c4d3766bedd49

David Strauss
Pantheon Systems
Fedora Server Working Group

P.S. Josh Boyer (jwb) referred me here from the Fedora kernel side.