Re: [PATCH v4] mm: swap: async free swap slot cache entries

From: Andrew Morton
Date: Thu Feb 15 2024 - 23:18:08 EST


On Thu, 15 Feb 2024 17:38:38 -0800 Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx> wrote:

> > What this description lacks is any description of why anyone cares.
> >
> > The patch clearly decreases overall throughput (speed-vs-latency is a
> > common tradeoff).

This, please.

> > And the "we don't know how to fix this properly so punt it into a
> > kernel thread" approach remains lame. For example, the risk that the
> > now-liberated allocator can outpace the async freeing, resulting in
> > unlimited object windup.
>
>
> Andrew,
>
> What you are saying about outpacing async free is true for v1 and v2 versions of the patch.
>
> But in this latest version, if another reclaim comes in before the async free has kicked in,
> we would be freeing the whole cache directly, same as original code, without waiting
> for the async free. It is different from the first version
> where you go into the free one at a time mode while waiting for the async free. 
> That was also my objection to the first two versions as you could be in this
> slow free one at a time mode for a long time.
>
> So now we should not have unlimited object windup. And we would be doing free
> in batch of 64, either still in the direct path or in the async path.
>

OK, thanks, I didn't read closely enough.

> If the next swap fault comes in very fast, before the async
> free gets a chance to run, it will directly free all the swap
> cache in the swap fault the same way as previously.

And might it be a win to cancel the async_work in this case?
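
To make the question concrete, here is a rough userspace model of the
behaviour as I understand it from the description above. It is a sketch
only: the struct, the function names and the counters are all made up
for illustration and are not taken from the patch, and a plain flag
stands in for the real work item.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SLOT_CACHE_SIZE 64	/* batch size mentioned in the thread */

/* Hypothetical model; none of these names come from the patch. */
struct slot_cache {
	unsigned long slots[SLOT_CACHE_SIZE];
	size_t count;		/* entries waiting to be freed */
	bool async_pending;	/* async free queued but not yet run */
	size_t direct_frees;	/* batches freed in the caller's context */
	size_t async_frees;	/* batches freed by the worker */
	size_t cancelled;	/* queued frees made redundant and dropped */
};

/* Release the whole batch at once; returns the number of entries freed. */
static size_t free_batch(struct slot_cache *c)
{
	size_t n = c->count;

	c->count = 0;
	return n;
}

/* Stand-in for the worker, which runs some time after being queued. */
static void async_free_worker(struct slot_cache *c)
{
	if (!c->async_pending)
		return;		/* cancelled, nothing to do */
	c->async_pending = false;
	if (free_batch(c))
		c->async_frees++;
}

/* The caller hands over one more entry to be freed later. */
static void cache_free_slot(struct slot_cache *c, unsigned long entry)
{
	if (c->count == SLOT_CACHE_SIZE) {
		/* Worker has not run yet: free directly, as the old code did. */
		free_batch(c);
		c->direct_frees++;
		if (c->async_pending) {
			/* The suggestion: the queued work is now redundant. */
			c->async_pending = false;
			c->cancelled++;
		}
	}
	assert(c->count < SLOT_CACHE_SIZE);	/* the cache stays bounded */
	c->slots[c->count++] = entry;
	if (c->count == SLOT_CACHE_SIZE)
		c->async_pending = true;	/* queue the async free */
}
```

In this model the cache never holds more than one batch, which matches
the "no unlimited windup" claim. Whether cancelling is actually a win in
the kernel depends on how the cost of the cancel compares with letting
the worker run and find an empty cache, which this toy cannot tell us.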


Again, without a clear description of the userspace-visible effects of
this problem I am groping in the dark. My hands blindly landed upon
the question: the overall effect here is to leave worst-case latency
unaltered, but to decrease average latency. Does this satisfy the
yet-to-be-described requirements?


Also, the V4 patch's quoted quantitative testing results are pasted
from the V2 patch's. V2 was a fundamentally different implementation.
I think it is fair to say that V4 is "untested", with regard to
satisfying its runtime objectives.