Re: [PATCH] mm,oom_reaper: avoid run queue_oom_reaper if task is not oom

From: Michal Hocko
Date: Thu Nov 23 2023 - 03:51:15 EST


On Wed 22-11-23 12:46:44, gaoxu wrote:
> The function queue_oom_reaper tests and sets tsk->signal->oom_mm->flags.
> However, it is necessary to check if 'tsk' is an OOM victim before
> executing 'queue_oom_reaper' because the variable may be NULL.
>
> We encountered such an issue, and the log is as follows:
> [3701:11_see]Out of memory: Killed process 3154 (system_server)
> total-vm:23662044kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB,
> UID:1000 pgtables:4056kB oom_score_adj:-900

> [3701:11_see][RB/E]rb_sreason_str_set: sreason_str set null_pointer
> [3701:11_see][RB/E]rb_sreason_str_set: sreason_str set unknown_addr

What are these?

> [3701:11_see]Unable to handle kernel NULL pointer dereference at virtual
> address 0000000000000328
> [3701:11_see]user pgtable: 4k pages, 39-bit VAs, pgdp=00000000821de000
> [3701:11_see][0000000000000328] pgd=0000000000000000,
> p4d=0000000000000000,pud=0000000000000000
> [3701:11_see]tracing off
> [3701:11_see]Internal error: Oops: 96000005 [#1] PREEMPT SMP
> [3701:11_see]Call trace:
> [3701:11_see] queue_oom_reaper+0x30/0x170

Could you resolve this offset to a source line, please?

> [3701:11_see] __oom_kill_process+0x590/0x860
> [3701:11_see] oom_kill_process+0x140/0x274
> [3701:11_see] out_of_memory+0x2f4/0x54c
> [3701:11_see] __alloc_pages_slowpath+0x5d8/0xaac
> [3701:11_see] __alloc_pages+0x774/0x800
> [3701:11_see] wp_page_copy+0xc4/0x116c
> [3701:11_see] do_wp_page+0x4bc/0x6fc
> [3701:11_see] handle_pte_fault+0x98/0x2a8
> [3701:11_see] __handle_mm_fault+0x368/0x700
> [3701:11_see] do_handle_mm_fault+0x160/0x2cc
> [3701:11_see] do_page_fault+0x3e0/0x818
> [3701:11_see] do_mem_abort+0x68/0x17c
> [3701:11_see] el0_da+0x3c/0xa0
> [3701:11_see] el0t_64_sync_handler+0xc4/0xec
> [3701:11_see] el0t_64_sync+0x1b4/0x1b8
> [3701:11_see]tracing off
>
> Signed-off-by: Gao Xu <gaoxu2@xxxxxxxxxxx>
> ---
> mm/oom_kill.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 9e6071fde..3754ab4b6 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -984,7 +984,7 @@ static void __oom_kill_process(struct task_struct *victim, const char *message)
> }
> rcu_read_unlock();
>
> - if (can_oom_reap)
> + if (can_oom_reap && tsk_is_oom_victim(victim))
> queue_oom_reaper(victim);

I do not understand. We always send SIGKILL and call
mark_oom_victim(victim) on the victim task before reaching this point.
How can tsk_is_oom_victim() ever be false here?
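
To make the question concrete, here is a rough user-space model of the
ordering I have in mind. The *_model names and the reduced structs are
made up for illustration and are not the definitions in mm/oom_kill.c;
the sketch only assumes that marking the victim records its mm in
signal->oom_mm and that the victim check tests that pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct mm_model { unsigned long flags; };
struct signal_model { struct mm_model *oom_mm; };
struct task_model {
	struct mm_model *mm;
	struct signal_model *signal;
};

/* Models marking the victim: record its mm in signal->oom_mm once. */
static void mark_oom_victim_model(struct task_model *tsk)
{
	if (!tsk->signal->oom_mm)
		tsk->signal->oom_mm = tsk->mm;
}

/* Models the victim check: true once signal->oom_mm has been set. */
static bool tsk_is_oom_victim_model(const struct task_model *tsk)
{
	return tsk->signal->oom_mm != NULL;
}

/*
 * Models the ordering in the kill path: the victim is marked first and
 * the reaper is queued afterwards. Returns what the proposed extra
 * check would evaluate to at the queueing point.
 */
static bool kill_path_model(struct task_model *victim)
{
	/* Assumption: the kill path only runs on a task with a valid mm. */
	assert(victim->mm != NULL);

	mark_oom_victim_model(victim);
	return tsk_is_oom_victim_model(victim);
}
```

Under these assumptions kill_path_model() can only return true, so the
added condition would never change the outcome. If the real code can
reach queue_oom_reaper() with signal->oom_mm still NULL, the changelog
should explain which step above does not hold.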

>
> mmdrop(mm);
> --
> 2.17.1
>
>

--
Michal Hocko
SUSE Labs