Re: [PATCH 1/1] mm: prevent a race between process_mrelease and exit_mmap

From: Suren Baghdasaryan
Date: Fri Oct 22 2021 - 01:23:24 EST


On Thu, Oct 21, 2021 at 7:25 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, 21 Oct 2021 18:46:58 -0700 Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
>
> > Race between process_mrelease and exit_mmap, where free_pgtables is
> > called while __oom_reap_task_mm is in progress, leads to kernel crash
> > during pte_offset_map_lock call. oom-reaper avoids this race by setting
> > MMF_OOM_VICTIM flag and causing exit_mmap to take and release
> > mmap_write_lock, blocking it until oom-reaper releases mmap_read_lock.
> > Reusing MMF_OOM_VICTIM for process_mrelease would be the simplest way to
> > fix this race, however that would be considered a hack. Fix this race
> > by elevating mm->mm_users and preventing exit_mmap from executing until
> > process_mrelease is finished. Patch slightly refactors the code to adapt
> > for a possible mmget_not_zero failure.
> > This fix has considerable negative impact on process_mrelease performance
> > and will likely need later optimization.
>
> Has the impact been quantified?

As a ballpark figure, for a large process (6GB) process_mrelease takes
roughly 4x longer to complete.

>
> And where's the added cost happening? The changes all look quite
> lightweight?

I think it's caused by the fact that exit_mmap and all the other cleanup
routines that run on the last mmput are postponed until
process_mrelease finishes __oom_reap_task_mm and drops mm->mm_users. I
suspect all of that cleanup now happens at the end of process_mrelease,
which might be contributing to the regression. I haven't had time yet
to fully understand all the reasons for the regression; I wanted to fix
the crash first. I will investigate further and hopefully follow up
with a quick fix for the lost performance.
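
To illustrate the suspicion above, here is a minimal userspace sketch of
the reference-counting pattern. It is not kernel code: toy_mm,
toy_mmget_not_zero, toy_mmput, and toy_release_sequence are hypothetical
stand-ins that only model the mm_users count, and the ordering of the
two puts is an assumption about what happens when the victim exits
while process_mrelease still holds its reference.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Which context ended up running the teardown (hypothetical labels). */
enum toy_ctx { TOY_CTX_NONE, TOY_CTX_EXITING_TASK, TOY_CTX_MRELEASE };

/* Toy stand-in for struct mm_struct; only the user count is modeled. */
struct toy_mm {
	atomic_int mm_users;
	enum toy_ctx torn_down_by;	/* who ran the "exit_mmap" stand-in */
};

/* Models mmget_not_zero(): take a reference unless the count is already 0. */
static bool toy_mmget_not_zero(struct toy_mm *mm)
{
	int old = atomic_load(&mm->mm_users);

	while (old != 0) {
		/* On failure 'old' is reloaded and the zero check repeats. */
		if (atomic_compare_exchange_weak(&mm->mm_users, &old, old + 1))
			return true;
	}
	return false;
}

/* Models mmput(): whoever drops the last reference pays for the teardown. */
static void toy_mmput(struct toy_mm *mm, enum toy_ctx who)
{
	if (atomic_fetch_sub(&mm->mm_users, 1) == 1)
		mm->torn_down_by = who;	/* stand-in for exit_mmap and friends */
}

/* Assumed ordering: the victim exits while the releaser holds a reference. */
static enum toy_ctx toy_release_sequence(void)
{
	struct toy_mm mm;

	atomic_init(&mm.mm_users, 1);	/* reference owned by the victim task */
	mm.torn_down_by = TOY_CTX_NONE;

	if (!toy_mmget_not_zero(&mm))	/* releaser pins the mm: 1 -> 2 */
		return TOY_CTX_NONE;

	/* ... reaping would happen here ... */

	toy_mmput(&mm, TOY_CTX_EXITING_TASK);	/* 2 -> 1: no teardown yet */
	toy_mmput(&mm, TOY_CTX_MRELEASE);	/* 1 -> 0: teardown runs here */

	return mm.torn_down_by;
}
```

In this model the teardown is charged to the context that drops the
last reference, so under the assumed ordering toy_release_sequence()
reports the releaser rather than the exiting task. Whether that fully
explains the measured slowdown is still unconfirmed.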
