Re: [RFC PATCH] mm, oom_reaper: skip mm structs with mmu notifiers

From: Andrea Arcangeli
Date: Wed Aug 30 2017 - 13:49:30 EST


Hello Michal,

On Wed, Aug 30, 2017 at 10:46:00AM +0200, Michal Hocko wrote:
> + * TODO: we really want to get rid of this ugly hack and make sure that
> + * notifiers cannot block for unbounded amount of time and add
> + * mmu_notifier_invalidate_range_{start,end} around unmap_page_range

KVM already should be ok in that respect. However the major reason to
prefer mmu_notifier_invalidate_range_start/end is those can block and
schedule waiting for stuff happening behind the PCI bus easily. So I'm
not sure if the TODO is good idea to keep.

> + */
> + if (mm_has_notifiers(mm)) {
> + schedule_timeout_idle(HZ);

Why the schedule_timeout? What's the difference with the OOM
reaper going to sleep again in the main loop instead?

> + goto unlock_oom;
> + }

mm_has_notifiers stops changing after obtaining the mmap_sem for
reading. See the do_mmu_notifier_register. So it's better to put the
mm_has_notifiers check immediately after the below:

> if (!down_read_trylock(&mm->mmap_sem)) {
> ret = false;
> trace_skip_task_reaping(tsk->pid);

If we succeed taking the mmap_sem for reading then we read a stable
value out of mm_has_notifiers and be sure it won't be set from under
us.

Otherwise the patch looks fine including the incremental comment about
why the mmu_notifier_invalidate_range in MMU gather wasn't enough.

Thanks!
Andrea