Re: [PATCH v6 0/4] fadvise(DONTNEED) support

From: Andrea Arcangeli
Date: Mon Feb 21 2011 - 14:09:10 EST

Next message: Andi Kleen: "[PATCH 5/8] Use correct numa policy node for transparent hugepages"
Previous message: Andi Kleen: "[PATCH 7/8] Use GFP_OTHER_NODE for transparent huge pages"
In reply to: Minchan Kim: "[PATCH v6 3/3] Reclaim invalidated page ASAP"
Next in thread: Minchan Kim: "Re: [PATCH v6 0/4] fadvise(DONTNEED) support"

Hello,

On Sun, Feb 20, 2011 at 11:43:35PM +0900, Minchan Kim wrote:
> Recently, there was a reported problem about thrashing.
> (http://marc.info/?l=rsync&m=128885034930933&w=2)
> It happens by backup workloads(ex, nightly rsync).
> That's because the workload makes just use-once pages
> and touches pages twice. It promotes the page into

"recently" and "thrashing horribly" seem to signal a regression. Ok
with trying to keep the backup from messing up the VM working set, but
running rsync in a loop should by no means lead a server into
"thrashing horribly" (other than for the additional disk I/O, just as
if rsync were using O_DIRECT).

This effort to teach rsync to tell the VM that its cache access is
likely use-once is good (tar will need it too), but if this is a
regression, as the words above ("recently" and "thrashing horribly")
suggest, I suspect it's much higher priority to fix a VM regression
than to add fadvise support to rsync/tar. Likely, if the system hadn't
started "thrashing horribly", they wouldn't need the rsync change at
all.

Then fadvise becomes an improvement on top of that.
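
As a rough illustration of the hint discussed above, here is a minimal
sketch (in Python, not rsync's actual code) of a use-once reader that
calls posix_fadvise() with POSIX_FADV_DONTNEED once it has consumed a
file. The function name read_once and the 1 MiB chunk size are
assumptions made for this example. The call is only advisory, so the
kernel may keep some or all of the pages.

```python
import os

CHUNK = 1 << 20  # 1 MiB read size, an arbitrary choice for this sketch


def read_once(path):
    """Read a file sequentially, then hint that its cached pages are
    no longer needed. Returns the number of bytes read."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            buf = os.read(fd, CHUNK)
            if not buf:
                break
            # A real backup tool would send or copy buf here.
            total += len(buf)
        # offset=0 and len=0 cover the whole file. This is a hint only;
        # the kernel is free to keep the pages cached.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return total
```

Something like read_once("/path/to/file") would be called once per file
by the backup loop. Note that this assumes a Unix system where
os.posix_fadvise is available.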

It'd be nice if someone at least tested whether an older kernel also
thrashes horribly after being left inactive overnight. If it still
thrashes horribly with 2.6.18, ok... ignore this; otherwise we need a
real fix.

I'm quite confident that older kernels would do perfectly ok with a
loop of rsync overnight while the system was idle. I've also had people
ask me privately what to do to keep the backup from causing swapout,
which further makes me believe something regressed recently, as older
VM code would never swap out on such a workload, even if the pages are
used twice or three times in a row. If it swaps out, that's the real
bug.

I've had questions about limiting the pagecache size to a certain
amount. That works too, but it's again a band-aid like fadvise, and
it's a really minor issue compared to fixing the VM so that at least
you can tell the kernel "nuke all clean cache first". Being able to
tell the kernel just that (even if some clever VM algorithm thinks
swapping is better and we want to swap by default) will fix it. We
still need a way to make the kernel behave perfectly, with zero
swapping, without fadvise and without limiting the cache. Maybe
setting swappiness to 0 does just that; I suggested it and heard
nothing back.
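
For anyone wanting to try the swappiness suggestion, a minimal sketch
of reading and setting the knob through /proc/sys/vm/swappiness
follows. The helper names are made up for this example and writing the
file needs root. Whether a value of 0 really prevents the swapout is
exactly the open question here, so treat this as a test aid and not as
a fix.

```python
SWAPPINESS_PATH = "/proc/sys/vm/swappiness"


def get_swappiness(path=SWAPPINESS_PATH):
    """Return the current vm.swappiness value as an integer."""
    with open(path) as f:
        return int(f.read().strip())


def set_swappiness(value, path=SWAPPINESS_PATH):
    """Write a new vm.swappiness value (requires root on a real system).

    Same effect as `sysctl vm.swappiness=<value>`; the setting is not
    persistent across reboots.
    """
    with open(path, "w") as f:
        f.write(f"{int(value)}\n")
```

A test run could record get_swappiness(), call set_swappiness(0) before
the backup starts, and restore the recorded value afterwards.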

If you can reproduce it, I suggest at least making sure that it
doesn't swap anything during the overnight workload, as any swapping
would signal a definite problem.
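
One way to check this, sketched under the assumption that the pswpin
and pswpout counters in /proc/vmstat are available: sample them before
and after the overnight run and look at the difference. The helper
names below are hypothetical.

```python
def parse_swap_counters(vmstat_text):
    """Return (pswpin, pswpout) parsed from the text of /proc/vmstat.

    Both counters are cumulative page counts since boot; a missing
    counter is reported as 0.
    """
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters.get("pswpin", 0), counters.get("pswpout", 0)


def read_swap_counters(path="/proc/vmstat"):
    """Sample the current swap-in/swap-out counters."""
    with open(path) as f:
        return parse_swap_counters(f.read())


def swapped_during(before, after):
    """Pages swapped in and out between two samples."""
    return after[0] - before[0], after[1] - before[1]
```

Taking before = read_swap_counters() in the evening and after =
read_swap_counters() in the morning, a non-zero result from
swapped_during(before, after) means the system swapped while the
backup ran.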