Re: [PATCH 0/7] mm: Improve swap path scalability with batched operations

From: Michal Hocko
Date: Wed May 04 2016 - 15:49:10 EST


On Wed 04-05-16 10:13:06, Tim Chen wrote:
> On Wed, 2016-05-04 at 14:45 +0200, Michal Hocko wrote:
> > On Tue 03-05-16 14:00:39, Tim Chen wrote:
> > [...]
> > >
> > >  include/linux/swap.h |  29 ++-
> > >  mm/swap_state.c      | 253 +++++++++++++-----
> > >  mm/swapfile.c        | 215 +++++++++++++--
> > >  mm/vmscan.c          | 725 ++++++++++++++++++++++++++++++++++++++-------------
> > >  4 files changed, 945 insertions(+), 277 deletions(-)
> > This is a rather large change for a normally rare path. We have been
> > trying to preserve the anonymous memory as much as possible and rather
> > push the page cache out. In fact swappiness is ignored most of the
> > time for the vast majority of workloads.
> >
> > So this would help mostly-anonymous workloads, and I am really wondering
> > whether this is worth bothering with without further and deeper
> > rethinking of our current reclaim strategy. I fully realize that
> > swap out sucks and that the new storage technologies might change
> > how we think about anonymous memory being so "special" wrt. disk
> > based caches, but I would like to see a stronger use case than "we have
> > been playing with some artificial use case and it scales better".
>
> With non-volatile RAM based block devices, a swap device could be very
> fast, approaching RAM speed, and could potentially be used as secondary
> memory. Just configuring this NVRAM as swap would be an easy way for
> apps to make use of it without doing any heavy lifting to change the
> apps. But the swap path is so unscalable today that such a use case
> is infeasible, even more so for multi-threaded server machines.

In order for this to work, other quite intrusive changes to the current
reclaim decisions would have to be made, though. This is what I tried to
say. Look at get_scan_count() to see how many steps we take to ignore
swappiness or prefer the page cache. Even if we make swapout scale, it
won't help much if we do not swap out that often. That's why I claim
that we really should think more long term and maybe reconsider these
decisions, which were based on rotating rust as the swap device.

> I understand that the patch set is a little large. Any better
> ideas for achieving similar ends will be appreciated. I put
> out these patches in the hope that they will spur solutions
> to improve swap.
>
> Perhaps the first two patches, which split shrink_page_list into
> smaller components, can be considered first, as a first step
> to make any changes to the reclaim code easier.

I didn't get to review those yet and probably will not get to them
shortly (sorry about that). shrink_page_list is surely one giant
function that is calling for a better layout/split out. I wouldn't be
opposed, but there are some subtle details lurking there which make
clean ups non-trivial. I will not discourage you from trying to get it
into shape, of course.

--
Michal Hocko
SUSE Labs