Re: [PATCH v3] migrate_pages: Avoid blocking for IO in MIGRATE_SYNC_LIGHT

From: Doug Anderson
Date: Tue May 02 2023 - 17:21:26 EST


Hi,

On Sun, Apr 30, 2023 at 1:53 AM Hillf Danton <hdanton@xxxxxxxx> wrote:
>
> On 28 Apr 2023 13:54:38 -0700 Douglas Anderson <dianders@xxxxxxxxxxxx>
> > The MIGRATE_SYNC_LIGHT mode is intended to block for things that will
> > finish quickly but not for things that will take a long time. Exactly
> > how long is too long is not well defined, but waits of tens of
> > milliseconds is likely non-ideal.
> >
> > When putting a Chromebook under memory pressure (opening over 90 tabs
> > on a 4GB machine) it was fairly easy to see delays waiting for some
> > locks in the kcompactd code path of > 100 ms. While the laptop wasn't
> > amazingly usable in this state, it was still limping along and this
> > state isn't something artificial. Sometimes we simply end up with a
> > lot of memory pressure.
>
> Given longer than 100ms stall, this can not be a correct fix if the
> hardware fails to do more than ten IOs a second.
>
> OTOH given some pages reclaimed for compaction to make forward progress
> before kswapd wakes kcompactd up, this can not be a fix without spotting
> the cause of the stall.

Right, the system is in pretty bad shape when this happens. It's not
very effective at doing IO, or much of anything, because it's under
heavy memory pressure.

My first guess is that a process holding the lock gets preempted and
doesn't get scheduled back in for a while. That _should_ be possible,
right? In the case I'm reproducing, all the CPUs are super busy madly
trying to compress / decompress zram, so it doesn't surprise me that a
process could stay context switched out for a while.
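
To make that hypothesis concrete, here is a rough userspace sketch. It
is not kernel code and not what the patch does: the function name
measure_waiter_stall() and the 100 ms hold time are made up for the
demo, and a sleep stands in for "the lock holder got scheduled out".
It only shows that a waiter who blocks on the lock inherits the
holder's delay, while a waiter who only tries the lock gives up right
away.

```python
import threading
import time


def measure_waiter_stall(hold_seconds, blocking):
    """Return (acquired, seconds_waited) for a waiter contending on a held lock.

    hold_seconds is a hypothetical stand-in for how long the lock holder
    stays scheduled out while still holding the lock.
    """
    lock = threading.Lock()
    holder_has_lock = threading.Event()

    def holder():
        with lock:
            holder_has_lock.set()
            # Assumption: sleeping here models the holder being preempted
            # and not getting CPU time again for a while.
            time.sleep(hold_seconds)

    holder_thread = threading.Thread(target=holder)
    holder_thread.start()
    holder_has_lock.wait()  # make sure the holder really owns the lock

    start = time.monotonic()
    acquired = lock.acquire(blocking=blocking)
    waited = time.monotonic() - start
    if acquired:
        lock.release()

    holder_thread.join()
    return acquired, waited


if __name__ == "__main__":
    for blocking in (True, False):
        acquired, waited = measure_waiter_stall(0.1, blocking)
        print(f"blocking={blocking} acquired={acquired} "
              f"waited={waited * 1000:.1f} ms")
```

With blocking=True the waiter stalls for roughly as long as the holder
is "scheduled out"; with blocking=False it returns almost immediately
without the lock.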

-Doug