Re: [PATCH] swap: send callback when swap slot is freed

From: Nitin Gupta
Date: Wed Aug 12 2009 - 22:30:30 EST


On 08/13/2009 04:18 AM, Hugh Dickins wrote:
On Wed, 12 Aug 2009, Nitin Gupta wrote:

Currently, we have "swap discard" mechanism which sends a discard bio request
when we find a free cluster during scan_swap_map(). This callback can come a
long time after swap slots are actually freed.

This delay in callback is a great problem when (compressed) RAM [1] is used
as a swap device. So, this change adds a callback which is called as
soon as a swap slot becomes free. For above mentioned case of swapping
over compressed RAM device, this is very useful since we can immediately
free memory allocated for this swap page.

This callback does not replace swap discard support. It is called with
swap_lock held, so it is meant to trigger action that finishes quickly.
However, swap discard is an I/O request and can be used for taking longer
actions.

Links:
[1] http://code.google.com/p/compcache/

Please keep this with compcache for the moment (it has no other users).

I don't share Peter's view that it should be using a more general
notifier interface (but I certainly agree with his EXPORT_SYMBOL_GPL).

Considering that the callback is made under swap_lock, we should not have an array of callbacks to do. But what if this callback finds other users too? I think we should leave it in its current state till it finds more users and probably add BUG() to make sure callback is not already set.

I will make it EXPORT_SYMBOL_GPL.

There better not be others hooking in here at the same time (a BUG_ON
could check that): in fact I don't even want you hooking in here where
swap_lock is held. Glancing at compcache, I don't see you violating
lock hierarchy by that, but it is a worry.


I tried an approach that allows releasing swap_lock and 'lazily' make
the callback but this turned out to be pretty messy. So, I think just
adding a note that the callback is done under swap_lock should be better.


The interface to set the notifier, you currently have it by swap type:
that would better be by bdev, wouldn't it? with a search for the right
slot. There's nowhere else in ramzswap.c that you rely on swp_entry_t
and page_private(page), let's keep such details out of compcache.


Use of bdev instead of swap_entry_t looks better. I will make this change.


But fundamentally, though I can see how this cutdown communication
path is useful to compcache, I'd much rather deal with it by the more
general discard route if we can. (I'm one of those still puzzled by
the way swap is mixed up with block device in compcache: probably
because I never found time to pay attention when you explained.)


I tried this too -- make discard bio request as soon as a swap slot becomes free (I can send details if you want). However, I could not get it to work. Also, allocating bio to issue discard I/O request looks like a complete artifact in compcache case.


You're right to question the utility of the current swap discard
placement. That code is almost a year old, written from a position
of great ignorance, yet only now do we appear to be on the threshold
of having an SSD which really supports TRIM (ah, the Linux ATA TRIM
support seems to have gone missing now, but perhaps it's been
waiting for a reality to check against too - Willy?).


I won't be surprised if we find that we need to move swap discard
support much closer to swap_free (though I know from trying before
that it's much messier there): in which case, even if we decided to
keep your hotline to compcache (to avoid allocating bios etc.), it
would be better placed alongside.


This new callback and discard can actually co-exist: Use callback to trigger small actions and discard for longer actions. Depending on use case, you might need both or either one of these.


I am not very sure how willing you are to accept this patch but let me send another revision with all the suggestions from you all.


Thanks for looking into this.
Nitin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/