Re: [PATCH v2] mm: prototype: rid swapoff of quadratic complexity

From: Matthew Wilcox
Date: Mon Nov 26 2018 - 12:22:59 EST


On Mon, Nov 26, 2018 at 04:55:21PM +0000, Vineeth Remanan Pillai wrote:
> + do {
> + XA_STATE(xas, &mapping->i_pages, start);
> + int i;
> + int entries = 0;
> + struct page *page;
> + pgoff_t indices[PAGEVEC_SIZE];
> + unsigned long end = start + PAGEVEC_SIZE;
>
> + rcu_read_lock();
> + xas_for_each(&xas, page, end) {

I think this is a mistake. You should probably specify ULONG_MAX for the
end. Otherwise, if there are no swap entries in the first 60kB of the file,
the loop finds nothing, 'entries' stays zero and you bail out without ever
looking at the rest of the file. That does mean you'll need to check
'entries' for hitting PAGEVEC_SIZE and break out of the loop when it does.
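Something along these lines, perhaps (untested sketch; the rest of the
loop body is kept as in your patch):

	rcu_read_lock();
	xas_for_each(&xas, page, ULONG_MAX) {
		if (xas_retry(&xas, page))
			continue;
		if (!xa_is_value(page))
			continue;

		indices[entries++] = xas.xa_index;
		/* the on-stack array is full; process this batch first */
		if (entries == PAGEVEC_SIZE)
			break;

		if (need_resched()) {
			xas_pause(&xas);
			cond_resched_rcu();
		}
	}
	rcu_read_unlock();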

> + if (xas_retry(&xas, page))
> + continue;
>
> + if (!xa_is_value(page))
> + continue;
>
> + indices[entries++] = xas.xa_index;
> +
> + if (need_resched()) {
> + xas_pause(&xas);
> + cond_resched_rcu();
> + }
> }
> + rcu_read_unlock();
> +
> + if (entries == 0)
> + break;
> +
> + for (i = 0; i < entries; i++) {
> + int err = 0;
> +
> + err = shmem_getpage(inode, indices[i],
> + &page, SGP_CACHE);
> + if (err == 0) {
> + unlock_page(page);
> + put_page(page);
> + }
> + if (err == -ENOMEM)
> + goto out;
> + else
> + error = err;
> + }
> + start = xas.xa_index;
> + } while (true);
> +
> +out:
> return error;
> }

This seems terribly complicated. You run through i_pages, record the
indices of the swap entries, then go back and look them all up again by
calling shmem_getpage(), which calls the incredibly complex, 300-line
shmem_getpage_gfp().

Can we refactor shmem_getpage_gfp() to skip some of the checks that
aren't necessary when called from this path, and turn this into a nice,
simple xas_for_each() loop that works one entry at a time?
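Roughly the shape I have in mind -- shmem_swapin_page() here is a made-up
name for a helper that would do just the swapin + add-to-page-cache part
of shmem_getpage_gfp() and, like shmem_getpage(), return the page locked
with a reference held:

	XA_STATE(xas, &mapping->i_pages, 0);
	struct page *page;
	int error;

	rcu_read_lock();
	xas_for_each(&xas, page, ULONG_MAX) {
		if (xas_retry(&xas, page))
			continue;
		if (!xa_is_value(page))
			continue;

		/* drop the RCU lock around the sleeping swapin */
		xas_pause(&xas);
		rcu_read_unlock();

		/* hypothetical helper: swap this one entry back in */
		error = shmem_swapin_page(inode, xas.xa_index, &page);
		if (error)
			return error;
		unlock_page(page);
		put_page(page);

		rcu_read_lock();
	}
	rcu_read_unlock();

	return 0;

No pagevec of indices, no second lookup of entries we've already found.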

> /*
> - * Search through swapped inodes to find and replace swap by page.
> + * Read all the shared memory data that resides in the swap
> + * device 'type' back into memory, so the swap device can be
> + * unused.
> */
> -int shmem_unuse(swp_entry_t swap, struct page *page)
> +int shmem_unuse(unsigned int type)
> {
> - struct list_head *this, *next;
> struct shmem_inode_info *info;
> - struct mem_cgroup *memcg;
> + struct inode *inode;
> + struct inode *prev_inode = NULL;
> + struct list_head *p;
> + struct list_head *next;
> int error = 0;
>
> - /*
> - * There's a faint possibility that swap page was replaced before
> - * caller locked it: caller will come back later with the right page.
> - */
> - if (unlikely(!PageSwapCache(page) || page_private(page) != swap.val))
> - goto out;
> + if (list_empty(&shmem_swaplist))
> + return 0;
> +
> + mutex_lock(&shmem_swaplist_mutex);
> + p = &shmem_swaplist;
>
> /*
> - * Charge page using GFP_KERNEL while we can wait, before taking
> - * the shmem_swaplist_mutex which might hold up shmem_writepage().
> - * Charged back to the user (not to caller) when swap account is used.
> + * The extra refcount on the inode is necessary to safely dereference
> + * p->next after re-acquiring the lock. New shmem inodes with swap
> + * get added to the end of the list and we will scan them all.
> */
> - error = mem_cgroup_try_charge_delay(page, current->mm, GFP_KERNEL,
> - &memcg, false);
> - if (error)
> - goto out;
> - /* No memory allocation: swap entry occupies the slot for the page */
> - error = -EAGAIN;
> + list_for_each_safe(p, next, &shmem_swaplist) {
> + info = list_entry(p, struct shmem_inode_info, swaplist);

This could use list_for_each_entry_safe(), right?
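i.e. something like this (sketch; 'next' becomes a struct
shmem_inode_info * instead of a struct list_head *):

	struct shmem_inode_info *info, *next;

	list_for_each_entry_safe(info, next, &shmem_swaplist, swaplist) {
		/* 'info' is already the shmem_inode_info; no list_entry() */
		/* ... rest of the loop body as in your patch ... */
	}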