Re: rsync: page allocation stalls in kernel 4.9.10 to a VessRAID NAS

From: Robert Kudyba
Date: Tue Feb 28 2017 - 11:50:55 EST



> On Feb 28, 2017, at 10:15 AM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>
> On Tue 28-02-17 09:59:35, Robert Kudyba wrote:
>>
>>> On Feb 28, 2017, at 9:40 AM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>>>
>>> On Tue 28-02-17 09:33:49, Robert Kudyba wrote:
>>>>
>>>>> On Feb 28, 2017, at 9:15 AM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>>>>> and this one is hitting the min watermark while there is not really
>>>>> much to reclaim. Only the page cache which might be pinned and not
>>>>> reclaimable from this context because this is GFP_NOFS request. It is
>>>>> not all that surprising the reclaim context fights to get some memory.
>>>>> There is a huge amount of the reclaimable slab which probably just makes
>>>>> a slow progress.
>>>>>
>>>>> That is not something completely surprsing on 32b system I am afraid.
>>>>>
>>>>> Btw. is the stall repeating with the increased time or it gets resolved
>>>>> eventually?
>>>>
>>>> Yes and if you mean by repeating itâs not only affecting rsync but
>>>> you can see just now automount and NetworkManager get these page
>>>> allocation stalls and kswapd0 is getting heavy CPU load, are there any
>>>> other settings I can adjust?
>>>
>>> None that I am aware of. You might want to talk to FS guys, maybe they
>>> can figure out who is pinning file pages so that they cannot be
>>> reclaimed. They do not seem to be dirty or under writeback. It would be
>>> also interesting to see whether that is a regression. The warning is
>>> relatively new so you might have had this problem before just haven't
>>> noticed it.
>>
>> We have been getting out of memory errors for a while but those seem
>> to have gone away.
>
> this sounds suspicious. Are you really sure that this is a new problem?
> Btw. is there any reason to use 32b kernel at all? It will always suffer
> from a really small lowmemâ

No this has been a problem for a while. Not sure if this server can handle 64b itâs a bit old.

>
>> We did just replace the controller in the VessRAID
>> as there were some timeouts observed and multiple login/logout
>> attempts.
>>
>> By FS guys do you mean the linux-fsdevel or linux-fsf list?
>
> yeah linux-fsdevel. No idea what linux-fsf is. It would be great if you
> could collect some tracepoints before reporting the issue. At least
> those in events/vmscan/*.

Will do hereâs a perf report:
Children Self Command Shared Object Symbol â
63.54% 0.06% swapper [kernel.kallsyms] [k] cpu_startup_entry
62.07% 0.01% swapper [kernel.kallsyms] [k] default_idle_call
62.06% 0.01% swapper [kernel.kallsyms] [k] arch_cpu_idle
54.73% 54.73% swapper [kernel.kallsyms] [k] mwait_idle
45.22% 0.00% swapper [kernel.kallsyms] [k] start_secondary
18.35% 0.00% swapper [kernel.kallsyms] [k] rest_init
18.35% 0.00% swapper [kernel.kallsyms] [k] start_kernel
18.35% 0.00% swapper [kernel.kallsyms] [k] i386_start_kernel
13.97% 0.01% rsync [vdso] [.] __kernel_vsyscall
13.93% 0.01% rsync [kernel.kallsyms] [k] sysenter_past_esp
13.92% 0.02% rsync [kernel.kallsyms] [k] do_fast_syscall_32
8.58% 0.01% rsync [kernel.kallsyms] [k] sys_write
8.56% 0.01% rsync [kernel.kallsyms] [k] vfs_write
8.55% 0.01% rsync [kernel.kallsyms] [k] __vfs_write
8.55% 0.00% rsync [kernel.kallsyms] [k] ext4_file_write_iter
8.54% 0.00% rsync [kernel.kallsyms] [k] __generic_file_write_iter
8.36% 0.04% rsync [kernel.kallsyms] [k] generic_perform_write
6.96% 0.00% swapper [kernel.kallsyms] [k] irq_exit
6.81% 0.01% swapper [kernel.kallsyms] [k] do_softirq_own_stack
6.80% 0.01% swapper [kernel.kallsyms] [k] __do_softirq
6.30% 0.00% swapper [kernel.kallsyms] [k] do_IRQ
6.30% 0.00% swapper [kernel.kallsyms] [k] common_interrupt
5.76% 0.01% swapper [kernel.kallsyms] [k] net_rx_action
5.75% 0.10% swapper [kernel.kallsyms] [k] e1000_clean
5.33% 0.01% rsync [kernel.kallsyms] [k] sys_read
5.30% 0.00% rsync [kernel.kallsyms] [k] vfs_read
5.28% 0.02% rsync [kernel.kallsyms] [k] __vfs_read
5.23% 0.10% rsync [kernel.kallsyms] [k] generic_file_read_iter
5.22% 0.00% swapper [kernel.kallsyms] [k] netif_receive_skb_internal
5.21% 0.01% swapper [kernel.kallsyms] [k] __netif_receive_skb
5.20% 0.01% swapper [kernel.kallsyms] [k] __netif_receive_skb_core
5.19% 0.02% swapper [kernel.kallsyms] [k] ip_rcv
4.93% 0.01% swapper [kernel.kallsyms] [k] ip_rcv_finish
4.90% 0.01% swapper [kernel.kallsyms] [k] ip_local_deliver
4.86% 0.05% swapper [kernel.kallsyms] [k] e1000_clean_rx_irq
4.80% 0.00% swapper [kernel.kallsyms] [k] ip_local_deliver_finish
4.80% 0.01% swapper [kernel.kallsyms] [k] tcp_v4_rcv
4.73% 0.01% swapper [kernel.kallsyms] [k] tcp_v4_do_rcv