[bug report] resctrl high memory consumption

From: Shakeel Butt
Date: Wed Jan 08 2020 - 12:07:55 EST


Hi,

Recently we had a bug in our system software that wrote the same pids to
the tasks file of a resctrl group multiple times. The resctrl code
allocates a "struct task_move_callback" for each such write and calls
task_work_add() for that task so the move is handled on its return to
user-space, without checking whether such a request already exists for
that particular task. The issue arises for long-sleeping tasks, which
can accumulate thousands of such requests queued to be handled. In
production, we noticed thousands of tasks with thousands of such
requests each, taking GiBs of memory for "struct task_move_callback".
I am not familiar enough with the code to judge whether
task_work_cancel() is the right approach, or whether simply checking
closid/rmid before calling task_work_add() would be enough.

==repro==
# mkdir /sys/fs/resctrl/test
# cat /proc/slabinfo | grep kmalloc-32
kmalloc-32 57219 57288 32 124 1 : tunables 120 60 8 : slabdata 462 462 0
# sleep 600&
[1] 17611
# for i in {1..200000}; do echo 17611 > /sys/fs/resctrl/test/tasks ; done
# cat /proc/slabinfo | grep kmalloc-32
kmalloc-32 257466 257548 32 124 1 : tunables 120 60 8 : slabdata 2077 2077 5
# kill 17611
[1]+ Terminated sleep 600
# cat /proc/slabinfo | grep kmalloc-32
kmalloc-32 57924 60636 32 124 1 : tunables 120 60 8 : slabdata 470 489 385
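The slabinfo snapshots above can be turned into a rough per-write cost. The sketch below parses SLAB-format /proc/slabinfo lines (the "name active_objs num_objs objsize objperslab pagesperslab : tunables ... : slabdata ..." layout shown in the repro) and computes the growth between two snapshots. The helper names (slab_line, parse_slab_line, object_growth_bytes, slab_growth_bytes) are hypothetical, and the page-size argument is expected to be 4096, the x86-64 default.

```c
#include <assert.h>
#include <stdio.h>

/* Fields of one SLAB-format /proc/slabinfo line. */
struct slab_line {
	char name[32];
	unsigned long active_objs, num_objs, objsize, objperslab, pagesperslab;
	unsigned long limit, batchcount, sharedfactor;
	unsigned long active_slabs, num_slabs, sharedavail;
};

/* Returns 0 if all 12 fields were parsed, -1 otherwise. */
static int parse_slab_line(const char *s, struct slab_line *out)
{
	int n = sscanf(s,
		"%31s %lu %lu %lu %lu %lu : tunables %lu %lu %lu : slabdata %lu %lu %lu",
		out->name, &out->active_objs, &out->num_objs, &out->objsize,
		&out->objperslab, &out->pagesperslab, &out->limit,
		&out->batchcount, &out->sharedfactor, &out->active_slabs,
		&out->num_slabs, &out->sharedavail);
	return n == 12 ? 0 : -1;
}

/* Bytes held by active objects added between two snapshots (assumes growth). */
static unsigned long object_growth_bytes(const struct slab_line *before,
					 const struct slab_line *after)
{
	assert(after->active_objs >= before->active_objs);
	return (after->active_objs - before->active_objs) * after->objsize;
}

/* Bytes of slab pages added between two snapshots (assumes growth). */
static unsigned long slab_growth_bytes(const struct slab_line *before,
				       const struct slab_line *after,
				       unsigned long page_size)
{
	assert(after->num_slabs >= before->num_slabs);
	return (after->num_slabs - before->num_slabs) * after->pagesperslab * page_size;
}
```

With the first two snapshots, active_objs grows by 200247 for 200000 writes, roughly one 32-byte object per write: about 6.4 MB of objects (object_growth_bytes) and 1615 new one-page slabs, about 6.6 MB (slab_growth_bytes), for a single sleeping task. After the kill, active_objs falls back to 57924, which is consistent with the queued callbacks being released once the task exits.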

thanks,
Shakeel