Re: [regression -next0117] What is kcompactd and why is he eating 100% of my cpu?

From: Mel Gorman
Date: Sun Jan 27 2019 - 09:09:52 EST


Adding Jan Kara to cc because the lockup appears to be within
buffer_migrate_page_norefs, which changed recently.

On Sat, Jan 26, 2019 at 09:56:53PM -0500, valdis.kletnieks@xxxxxx wrote:
> On Sat, 26 Jan 2019 21:00:05 +0100, Pavel Machek said:
>
> > top - 13:38:51 up 1:42, 16 users, load average: 1.41, 1.93, 1.62
> > Tasks: 182 total, 3 running, 138 sleeping, 0 stopped, 0 zombie
> > %Cpu(s): 2.3 us, 57.8 sy, 0.0 ni, 39.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> > KiB Mem: 3020044 total, 2429420 used, 590624 free, 27468 buffers
> > KiB Swap: 2097148 total, 0 used, 2097148 free. 1924268 cached Mem
> >
> > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> > 608 root 20 0 0 0 0 R 99.6 0.0 11:34.38 kcompactd0
> > 9782 root 20 0 0 0 0 I 7.9 0.0 0:59.02 kworker/0:
> > 2971 root 20 0 46624 23076 13576 S 4.3 0.8 2:50.22 Xorg
>
> I've noticed this as well on earlier kernels (next-20181224 to 20190115)
>
> Some more info:
>
> 1) echo 3 > /proc/sys/vm/drop_caches unwedges kcompactd in 1-3 seconds.
>
> 2) Typical kcompactd traceback:
>
> cat /proc/27/stack
> [<0>] retint_kernel+0x1b/0x2d
> [<0>] lock_is_held_type+0x1b/0x50
> [<0>] ___might_sleep+0xad/0x220
> [<0>] __might_sleep+0x113/0x130
> [<0>] on_each_cpu_cond_mask+0x12a/0x140
> [<0>] on_each_cpu_cond+0x18/0x20
> [<0>] invalidate_bh_lrus+0x29/0x30
> [<0>] __buffer_migrate_page+0x154/0x340
> [<0>] buffer_migrate_page_norefs+0x14/0x20
> [<0>] move_to_new_page+0x8e/0x360
> [<0>] migrate_pages+0x3cc/0xfd8
> [<0>] compact_zone+0xb70/0x1380
> [<0>] kcompactd_do_work+0x15b/0x500
> [<0>] kcompactd+0x74/0x340
> [<0>] kthread+0x158/0x170
> [<0>] ret_from_fork+0x3a/0x50
> [<0>] 0xffffffffffffffff
>
> I've also seen khugepaged hung up:
>
> cat /proc/29/stack
> [<0>] ___preempt_schedule+0x16/0x18
> [<0>] page_vma_mapped_walk+0x60/0x840
> [<0>] remove_migration_pte+0x67/0x390
> [<0>] rmap_walk_file+0x186/0x380
> [<0>] rmap_walk+0xa3/0xd0
> [<0>] remove_migration_ptes+0x69/0x70
> [<0>] migrate_pages+0xb6d/0xfd8
> [<0>] compact_zone+0xb70/0x1370
> [<0>] compact_zone_order+0xd8/0x120
> [<0>] try_to_compact_pages+0xe5/0x550
> [<0>] __alloc_pages_direct_compact+0x6d/0x1a0
> [<0>] __alloc_pages_slowpath+0x6c9/0x1640
> [<0>] __alloc_pages_nodemask+0x558/0x5b0
> [<0>] khugepaged+0x499/0x810
> [<0>] kthread+0x158/0x170
> [<0>] ret_from_fork+0x3a/0x50
> [<0>] 0xffffffffffffffff
>
> Looks like something has gone astray with compact_zone.
>
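
For anyone else trying to confirm they are hitting the same thing, a
rough sketch of how to collect the same kind of traces follows. It is
only an illustration: dump_stacks and its two arguments are made-up
names, not an existing tool, and it assumes /proc/<pid>/stack is
available (kernel built with CONFIG_STACKTRACE, read as root).

```shell
# Hypothetical helper, not an existing tool: print the kernel stack of
# every task whose comm starts with the given prefix.
#   $1  comm prefix to match, e.g. "kcompactd" or "khugepaged"
#   $2  proc root to scan (assumption: defaults to /proc)
dump_stacks() {
    prefix=$1
    proc_root=${2:-/proc}
    for comm in "$proc_root"/[0-9]*/comm; do
        # Tasks can exit while we scan, so skip anything unreadable.
        [ -r "$comm" ] || continue
        name=$(cat "$comm" 2>/dev/null) || continue
        case $name in
            "$prefix"*)
                pid_dir=${comm%/comm}
                echo "== ${pid_dir##*/} ($name)"
                # Reading the stack file normally requires root.
                cat "$pid_dir/stack" 2>/dev/null ||
                    echo "(stack not readable)"
                ;;
        esac
    done
}
```

Running "dump_stacks kcompactd" a few times as root while the thread is
spinning should show whether it stays under __buffer_migrate_page as in
the trace above.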

--
Mel Gorman
SUSE Labs