Re: [PATCH] zsmalloc: move LRU update from zs_map_object() to zs_malloc()

From: Johannes Weiner
Date: Tue May 09 2023 - 13:44:09 EST


On Tue, May 09, 2023 at 12:00:30PM +0900, Sergey Senozhatsky wrote:
> On (23/05/08 09:00), Nhat Pham wrote:
> > > The deeper bug here is that zs_map_object() tries to add the page to
> > > the LRU list while the shrinker has it isolated for reclaim. This is
> > > way too sutble and error prone. Even if it worked now, it'll cause
> > > corruption issues down the line.
> > >
> > > For example, Nhat is adding a secondary entry point to reclaim.
> > > Reclaim expects that a page that's on the LRU is also on the fullness
> > > list, so this would lead to a double remove_zspage() and BUG_ON().
> > >
> > > This patch doesn't just fix the crash, it eliminates the deeper LRU
> > > isolation issue and makes the code more robust and simple.
> >
> > I agree. IMO, less unnecessary concurrent interaction is always a
> > win for developers' and maintainers' cognitive load.
>
> Thanks for all the explanations.
>
> > As a side benefit - this also gets rid of the inelegant check
> > (mm == ZS_MM_WO). The fact that we had to include a
> > a multi-paragraph explanation for a 3-line piece of code
> > should have been a red flag.
>
> Minchan had some strong opinion on that, so we need to hear from him
> before we decide how do we fix it.

I'd be happy if he could validate the fix. But this fixes a crash, so
the clock is ticking.

I will also say, his was a design preference. One we agreed to only
very reluctantly: https://lore.kernel.org/lkml/Y3f6habiVuV9LMcu@xxxxxxxxxx/

Now we have a crash that is a direct result of it, and which cost us
(and apparently is still costing us) time and energy to resolve.

Unless somebody surfaces a real technical problem with the fix, I'd
say let's do it our way this time.