Re: [PATCH v4 0/2] fadvise: move active pages to inactive list with POSIX_FADV_DONTNEED

From: Andrea Righi
Date: Wed Jun 29 2011 - 10:05:19 EST


On Wed, Jun 29, 2011 at 12:20:22PM +0100, Pádraig Brady wrote:
> On 29/06/11 00:03, Andrew Morton wrote:
> > On Wed, 29 Jun 2011 00:56:45 +0200
> > Andrea Righi <andrea@xxxxxxxxxxxxxxx> wrote:
> >
> >>>>
> >>>> In this way if the backup was the only user of a page, that page will be
> >>>> immediately removed from the page cache by calling POSIX_FADV_DONTNEED. If the
> >>>> page was also touched by other processes it'll be moved to the inactive list,
> >>>> having another chance of being re-added to the working set, or simply reclaimed
> >>>> when memory is needed.
> >>>
> >>> So if an application touches a page twice and then runs
> >>> POSIX_FADV_DONTNEED, that page will now not be freed.
> >>>
> >>> That's a big behaviour change. For many existing users
> >>> POSIX_FADV_DONTNEED simply doesn't work any more!
> >>
> >> Yes. This is the main concern that was raised by Pádraig.
> >>
> >>>
> >>> I'd have thought that adding a new POSIX_FADV_ANDREA would be safer
> >>> than this.
> >>
> >> Actually Jerry (in cc) proposed
> >> POSIX_FADV_IDONTNEEDTHISBUTIFSOMEBODYELSEDOESTHENDONTTOUCHIT in a
> >> private email. :)
> >
> > Sounds good. Needs more underscores though.
> >
> >>>
> >>>
> >>> The various POSIX_FADV_foo's are so ill-defined that it was a mistake
> >>> to ever use them. We should have done something overtly linux-specific
> >>> and given userspace more explicit and direct pagecache control.
> >>
> >> That would give us the possibility to implement a wide range of
> >> different operations (drop, drop if used once, add to the active list,
> >> add to the inactive list, etc.). Some users keep asking for better
> >> control over the page cache from userspace.
> >
> > Well, I'd listen to proposals ;)
> >
> > One thing we must be careful about is to not expose things like "active
> > list" to userspace. linux-4.5 may not _have_ an active list, and its
> > implementors would hate us and would have to jump through hoops to
> > implement vaguely compatible behaviour in the new scheme.
> >
> > So any primitives which are exposed should be easily implementable and
> > should *make sense* within any future scheme...
>
> Agreed.
>
> In fairness to posix_fadvise(), I think it's designed to
> specify hints for the current process' use of data
> so that it can access it more efficiently, and also to
> allow the system to manage the cache more efficiently.
> I.e., it's not meant for direct control of the cache.
>
> That being said, existing usage has relied on it for this,
> and it would be nice not to change that without careful consideration.
>
> I've mentioned how high level cache control functions
> might map to the existing FADV knobs here:
>
> http://marc.info/?l=linux-kernel&m=130917619416123&w=2
>
> cheers,
> Pádraig.
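For context, the usage pattern being debated looks roughly like the following minimal C sketch (not taken from the patch or from any particular backup tool): a program reads a file sequentially and, after each chunk, calls posix_fadvise() with POSIX_FADV_DONTNEED on the range it has just consumed. The helper name `stream_and_drop` and the chunked loop are illustrative assumptions. In mainline the kernel tries to evict those pages; with the v4 patch, pages also touched by other processes would instead be moved to the inactive list.

```c
#define _POSIX_C_SOURCE 200809L /* exposes posix_fadvise() and POSIX_FADV_* */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read a whole file sequentially, as a backup tool
 * might, and after each chunk hint that the chunk is no longer needed.
 * Returns the number of bytes read, or -1 on error. */
static long long stream_and_drop(const char *path, size_t chunk)
{
    char buf[65536];
    long long total = 0;
    off_t off = 0;
    ssize_t n;
    int fd;

    assert(chunk > 0 && chunk <= sizeof(buf));
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    while ((n = read(fd, buf, chunk)) > 0) {
        /* ... copy buf to the backup destination here ... */

        /* Advisory only: posix_fadvise() returns an error number rather
         * than setting errno, and a failed hint is not fatal here. */
        (void)posix_fadvise(fd, off, n, POSIX_FADV_DONTNEED);
        off += n;
        total += n;
    }
    close(fd);
    return n < 0 ? -1 : total;
}
```

Because the call is only a hint, the sketch ignores its return value; a real tool might log failures instead.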

OK, your proposal seems a good starting point for a better cache control
interface.

Basically you're proposing to provide the following operations:
1. DROP
2. DROP if used once
3. ADD
4. ADD if there's space

I would also add for sure:
5. ADD and will use once

Some of them are already implemented by the available fadvise()
operations, like 1 (POSIX_FADV_DONTNEED) and 3 (POSIX_FADV_WILLNEED).
Option 5 can be mapped to POSIX_FADV_NOREUSE, but it's not yet
implemented.
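One way to picture the mapping described above is a small dispatch from the proposed high-level operations to existing posix_fadvise() advice values. This is a hypothetical sketch: the `cache_op` enum and the `cache_op_to_fadvise()`/`cache_apply()` helpers are not an existing kernel or libc API, and only the mappings stated in this message are filled in. Operations 2 and 4 return -1 because no existing advice value is identified for them here.

```c
#define _POSIX_C_SOURCE 200809L /* exposes posix_fadvise() and POSIX_FADV_* */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>

/* Hypothetical names for the high-level operations listed above. */
enum cache_op {
    CACHE_DROP,         /* 1. DROP */
    CACHE_DROP_IF_ONCE, /* 2. DROP if used once */
    CACHE_ADD,          /* 3. ADD */
    CACHE_ADD_IF_SPACE, /* 4. ADD if there's space */
    CACHE_ADD_USE_ONCE, /* 5. ADD and will use once */
};

/* Map an operation to an existing advice value, or -1 if none applies.
 * (The POSIX_FADV_* constants are all non-negative on Linux.) */
static int cache_op_to_fadvise(enum cache_op op)
{
    switch (op) {
    case CACHE_DROP:
        return POSIX_FADV_DONTNEED;
    case CACHE_ADD:
        return POSIX_FADV_WILLNEED;
    case CACHE_ADD_USE_ONCE:
        return POSIX_FADV_NOREUSE; /* not yet implemented at the time */
    default:
        return -1;
    }
}

/* Apply op to [off, off + len) of fd. Returns 0 on success, an error
 * number from posix_fadvise(), or -1 if op has no fadvise mapping. */
static int cache_apply(int fd, off_t off, off_t len, enum cache_op op)
{
    int advice;

    assert(off >= 0 && len >= 0);
    advice = cache_op_to_fadvise(op);
    if (advice < 0)
        return -1;
    return posix_fadvise(fd, off, len, advice);
}
```

In this sketch, keeping the high-level operation separate from the advice value would let an operation later be backed by a new, Linux-specific call without changing callers, in line with the point above about not exposing implementation details such as the active list.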

I need to think a little bit more about all of this. I'll try to post a
new RFC proposing the list of high-level operations needed to implement
better page cache control from userspace.

Suggestions, comments, ideas are always welcome.

Thanks,
-Andrea