Re: [PATCH] mm: swap: async free swap slot cache entries

From: David Rientjes
Date: Sun Dec 24 2023 - 17:21:05 EST


On Sun, 24 Dec 2023, Chris Li wrote:

> > > > > > How do you quantify the impact of the delayed swap_entry_free()?
> > > > > >
> > > > > > Since the free and memcg uncharge are now delayed, is there not the
> > > > > > possibility that we stay under memory pressure for longer? (Assuming at
> > > > > > least some users are swapping because of memory pressure.)
> > > > > >
> > > > > > I would assume that since the free and uncharge itself is delayed that in
> > > > > > the pathological case we'd actually be swapping *more* until the async
> > > > > > worker can run.
> > > > >
> > > > > Thanks for raising this interesting question.
> > > > >
> > > > > First of all, the swap_entry_free() does not impact "memory.current".
> > > > > It reduces "memory.swap.current". Technically it is the swap pressure,
> > > > > not the memory pressure, that suffers the extra delay.
> > > > >
> > > > > Secondly, we are talking about delaying up to 64 swap entries for a
> > > > > few microseconds.
> > > >
> > > > What guarantees that the async freeing happens within a few microseconds?
> > >
> > > The Linux kernel typically doesn't provide RT scheduling guarantees.
> > > You can change microseconds to milliseconds; my following reasoning
> > > still holds.
> > >
> >
> > What guarantees that the async freeing happens even within 10s? Your
> > responses are implying that there is some deadline by which this freeing
> > absolutely must happen (us or ms), but I don't know of any strong
> > guarantees.
>
> I think we are in agreement there, there are no such strong guarantees
> in linux scheduling. However, when there are free CPU resources, the
> job will get scheduled to execute in a reasonable time frame. If
> it does not, I consider that a bug if the CPU has idle resources and
> the pending jobs are not able to run for a long time.
> The existing code doesn't have such a guarantee either; see my point
> below. I don't know why you are asking for such a guarantee.
>

I'm simply trying to compare the pros and cons of the approach. As
Andrew pointed out previously, this approach actually results in *more*
total work to do the freeing. On top of that, we could see a significant
delay before the freeing is actually done, in addition to the added code
complexity. And nothing safeguards us against an ever-growing backlog of
deferred freeing that will only be done at some point in the future.

The current implementation provides a strong guarantee: batched freeing
that never accumulates beyond a pre-defined threshold, which has proven
to work well in practice.

My only question was about how we can quantify the impact of the delayed
free. We've come to the conclusion that it hasn't been quantified and
there are no guarantees on when the freeing will happen.

I'll leave it to those more invested in this path in the page fault
handler to provide feedback. Thanks!