EEVDF and NUMA balancing

From: Julia Lawall
Date: Tue Oct 03 2023 - 16:25:22 EST


Is it expected that commit e8f331bcc270 would have an impact on the
frequency of NUMA balancing?

The NAS benchmark ua.C.x (NPB3.4-OMP,
https://github.com/mbdevpl/nas-parallel-benchmarks.git) on a 4-socket
Intel Xeon 6130 suffers from some NUMA moves that leave some sockets with
too few threads and other sockets with too many. Prior to commit
e8f331bcc270, this was corrected by subsequent load balancing, leading
to run times of 20-40 seconds (around 20 seconds can be achieved by
simply turning NUMA balancing off). After commit e8f331bcc270, the
running time can go up to 150 seconds. In the worst case, I have seen a
core remain idle for 75 seconds. It seems that the load balancer at the
NUMA domain level is not able to do anything, because when a core on the
overloaded socket has multiple threads, they are tasks that were NUMA
balanced to that socket and thus should not leave. So the "busiest" core
chosen by find_busiest_queue doesn't actually contain any stealable
threads. Maybe it would be worth stealing from a core that has only one
task in this case, in the hope that the tasks tied to the socket will
spread out better across it once more space is available?
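
To make that more concrete, below is a minimal user-space sketch of the
fallback I have in mind; it is not kernel code, and struct rq_snapshot,
its fields, and pick_steal_candidate() are invented stand-ins for the
state that find_busiest_queue() actually inspects. The policy is simply:
prefer the most loaded core that still has a stealable task, and
otherwise fall back to a core running a single movable task.

/*
 * Minimal user-space sketch of the fallback heuristic suggested above.
 * Not kernel code: struct rq_snapshot, its fields and
 * pick_steal_candidate() are simplified stand-ins for what
 * find_busiest_queue() actually looks at.
 */
#include <stddef.h>
#include <stdio.h>

struct rq_snapshot {
	int cpu;
	unsigned int nr_running;	/* tasks queued on this cpu */
	unsigned int nr_numa_pinned;	/* tasks NUMA-balanced here, treated as unstealable */
	unsigned long load;
};

/* Tasks that load balancing could actually pull from this cpu. */
static unsigned int nr_stealable(const struct rq_snapshot *rq)
{
	return rq->nr_running - rq->nr_numa_pinned;
}

/*
 * First preference: the most loaded multi-task cpu that still has a
 * stealable task.  Fallback: a cpu running a single movable task, in
 * the hope that freeing it lets the socket-bound tasks spread out.
 */
static const struct rq_snapshot *
pick_steal_candidate(const struct rq_snapshot *rqs, size_t n)
{
	const struct rq_snapshot *busiest = NULL, *single = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct rq_snapshot *rq = &rqs[i];

		if (rq->nr_running > 1 && nr_stealable(rq) > 0 &&
		    (!busiest || rq->load > busiest->load))
			busiest = rq;

		if (rq->nr_running == 1 && nr_stealable(rq) > 0 &&
		    (!single || rq->load > single->load))
			single = rq;
	}

	return busiest ? busiest : single;
}

int main(void)
{
	/*
	 * The situation described above: the multi-task cpus on the
	 * overloaded socket hold only NUMA-pinned tasks, so the usual
	 * choice of "busiest" yields nothing stealable.
	 */
	const struct rq_snapshot socket1[] = {
		{ .cpu = 0, .nr_running = 2, .nr_numa_pinned = 2, .load = 2048 },
		{ .cpu = 1, .nr_running = 2, .nr_numa_pinned = 2, .load = 2048 },
		{ .cpu = 2, .nr_running = 1, .nr_numa_pinned = 0, .load = 1024 },
	};
	const struct rq_snapshot *victim =
		pick_steal_candidate(socket1, sizeof(socket1) / sizeof(socket1[0]));

	if (victim)
		printf("steal from cpu %d\n", victim->cpu);	/* picks cpu 2 */
	return 0;
}

In kernel terms this would presumably have to tie into the existing
can_migrate_task()/migrate_degrades_locality() decisions rather than a
separate per-rq count; the sketch only illustrates the selection policy.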

An example run is attached. The cores are renumbered according to the
sockets, so there is an overload on socket 1 and an underload on socket
2.

julia

Attachment: ua.C.x_yeti-2_ge8f331bcc270_performance_18_socketorder.pdf
Description: Adobe PDF document