Re: [PATCH 2/2] [RFC] fadvise: Add _VOLATILE,_ISVOLATILE, and_NONVOLATILE flags

From: Dave Chinner
Date: Tue Feb 14 2012 - 00:17:24 EST


On Thu, Feb 09, 2012 at 04:16:33PM -0800, John Stultz wrote:
> This patch provides new fadvise flags that can be used to mark
> file pages as volatile, which will allow it to be discarded if the
> kernel wants to reclaim memory.
>
> This is useful for userspace to allocate things like caches, and lets
> the kernel destructively (but safely) reclaim them when there's memory
> pressure.
.....
> @@ -655,6 +656,8 @@ struct address_space {
> spinlock_t private_lock; /* for use by the address_space */
> struct list_head private_list; /* ditto */
> struct address_space *assoc_mapping; /* ditto */
> + struct range_tree_node *volatile_root; /* volatile range list */
> + struct mutex vlist_mutex; /* protect volatile_list */
> } __attribute__((aligned(sizeof(long))));

So you're adding roughly 32 bytes to every cached inode in the
system? This will increasing the memory footprint of the inode cache
by 2-5% (depending on the filesystem). Almost no-one will be using
this functionality on most inodes that are cached in the system, so
that seems like a pretty bad trade-off to me...

> +static int volatile_shrink(struct shrinker *ignored, struct shrink_control *sc)
> +{
> + struct volatile_range *range, *next;
> + unsigned long nr_to_scan = sc->nr_to_scan;
> + const gfp_t gfp_mask = sc->gfp_mask;
> +
> + /* We might recurse into filesystem code, so bail out if necessary */
> + if (nr_to_scan && !(gfp_mask & __GFP_FS))
> + return -1;
> + if (!nr_to_scan)
> + return lru_count;
> +
> + mutex_lock(&volatile_lru_mutex);
> + list_for_each_entry_safe(range, next, &volatile_lru_list, lru) {
> + struct inode *inode = range->mapping->host;
> + loff_t start, end;
> +
> +
> + start = range->range_node.start * PAGE_SIZE;
> + end = (range->range_node.end + 1) * PAGE_SIZE - 1;
> +
> + /*
> + * XXX - calling vmtruncate_range from a shrinker causes
> + * lockdep warnings. Revisit this!
> + */
> + vmtruncate_range(inode, start, end);

That function vmtruncate_range, I don't think it does what you think
it does.

Firstly, it's only implemented for shmfs/tmpfs, so this can't have
been tested for caching files on any real filesystem. If it's only
for shm/tmpfs, then the applications cwcan just as easily use their
own memory for caching their volatile data...

Secondly, vmtruncate_range() is actually a hole punching function,
not a page cache invalidation function. You should be using
invalidate_inode_pages2_range() to invalidate and tear down the page
cache. If you really want to punch holes in files, then you should
be using the fallocate syscall with direct application control, not
trying to hide it until memory pressure occurs via fadvise because
hole punching requires memory for the transactions necessary to run
extent freeing operations.

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/