Re: [PATCH 1/2] SHM_UNLOCK: fix long unpreemptible section

From: KOSAKI Motohiro
Date: Sat Jan 07 2012 - 03:28:46 EST


(1/6/12 4:10 PM), Hugh Dickins wrote:
scan_mapping_unevictable_pages() is used to make SysV SHM_LOCKed pages
evictable again once the shared memory is unlocked. It does this with
pagevec_lookup()s across the whole object (which might occupy most of
memory), and takes 300ms to unlock 7GB here. A cond_resched() every
PAGEVEC_SIZE pages would be good.
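
As a rough illustration of the batching idea above, here is a userspace
sketch, not the kernel code itself. It walks an array in fixed-size batches
and yields the CPU after each batch. scan_in_batches() and BATCH_SIZE are
hypothetical names; BATCH_SIZE only stands in for PAGEVEC_SIZE, and
sched_yield() stands in for cond_resched().

```c
#include <sched.h>
#include <stddef.h>

/* Stand-in for PAGEVEC_SIZE; the value chosen here is an assumption. */
#define BATCH_SIZE 14

/*
 * Hypothetical userspace analogue of a batched scan: clear a per-item
 * flag for nr_items entries, yielding the CPU after every batch so the
 * loop never runs for long without a scheduling point.
 * Returns the number of yield points that were hit.
 */
static size_t scan_in_batches(unsigned char *flags, size_t nr_items)
{
    size_t yields = 0;
    size_t start = 0;

    while (start < nr_items) {
        size_t end = start + BATCH_SIZE;

        if (end > nr_items)
            end = nr_items;

        /* "Make evictable again": clear the flag on each item in the batch. */
        for (size_t i = start; i < end; i++)
            flags[i] = 0;

        start = end;
        sched_yield();  /* userspace stand-in for cond_resched() */
        yields++;
    }
    return yields;
}
```

With 30 items and a batch size of 14, the loop runs three batches (14, 14
and 2 items) and so reaches three yield points.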

However, KOSAKI-san points out that this is called under shmem.c's
info->lock, and it's also under shm.c's shm_lock(), both spinlocks.
There is no strong reason for that: we need to take these pages off
the unevictable list soonish, but those locks are not required for it.

So move the call to scan_mapping_unevictable_pages() from shmem.c's
unlock handling up to shm.c's unlock handling. Remove the recently
added barrier, not needed now we have spin_unlock() before the scan.

Use get_file(), with subsequent fput(), to make sure we have a
reference to mapping throughout scan_mapping_unevictable_pages():
that's something that was previously guaranteed by the shm_lock().

Remove shmctl's lru_add_drain_all(): we don't fault in pages at
SHM_LOCK time, and we lazily discover them to be Unevictable later,
so it serves no purpose for SHM_LOCK; and serves no purpose for
SHM_UNLOCK, since pages still on pagevec are not marked Unevictable.

The original code avoided redundant rescans by checking VM_LOCKED
flag at its level: now avoid them by checking shp's SHM_LOCKED.
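
The reference pinning and the locked-flag check described above could be
sketched in userspace roughly as follows. This is only an analogue under
stated assumptions, not the patch: struct object, obj_get()/obj_put(),
long_scan() and unlock_object() are hypothetical names standing in for the
shm segment, get_file()/fput(), scan_mapping_unevictable_pages() and the
SHM_UNLOCK path, and the atomic_flag stands in for the spinlock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a shm segment and its backing file. */
struct object {
    atomic_flag lock;       /* stand-in for a spinlock such as shm_lock() */
    atomic_int refcount;    /* stand-in for the file reference count */
    bool locked_flag;       /* stand-in for SHM_LOCKED */
    int scans;              /* how many times the long scan has run */
    int refs_during_scan;   /* refcount observed while scanning */
};

static void obj_init(struct object *o, bool locked)
{
    atomic_flag_clear(&o->lock);
    atomic_init(&o->refcount, 1);   /* one reference held by the owner */
    o->locked_flag = locked;
    o->scans = 0;
    o->refs_during_scan = 0;
}

static void obj_lock(struct object *o)
{
    while (atomic_flag_test_and_set(&o->lock))
        ;   /* spin until the flag is released */
}

static void obj_unlock(struct object *o) { atomic_flag_clear(&o->lock); }
static void obj_get(struct object *o) { atomic_fetch_add(&o->refcount, 1); }
static void obj_put(struct object *o) { atomic_fetch_sub(&o->refcount, 1); }

/* Stand-in for the long scan; records that it ran with a reference held. */
static void long_scan(struct object *o)
{
    o->scans++;
    o->refs_during_scan = atomic_load(&o->refcount);
}

/* Returns true if a scan was performed, false if it was skipped. */
static bool unlock_object(struct object *o)
{
    obj_lock(o);
    if (!o->locked_flag) {  /* already unlocked: skip the redundant rescan */
        obj_unlock(o);
        return false;
    }
    o->locked_flag = false;
    obj_get(o);             /* pin the object before dropping the lock */
    obj_unlock(o);

    long_scan(o);           /* runs with no spinlock held */
    obj_put(o);             /* drop the temporary reference */
    return true;
}
```

The point of the ordering is that the extra reference is taken while the
lock is still held, so the object cannot go away between the unlock and the
scan; a second unlock_object() call on the same object finds the flag
already clear and returns without scanning.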

The original code called scan_mapping_unevictable_pages() on a
locked area at shm_destroy() time: perhaps we once had accounting
cross-checks which required that, but not now, so skip the overhead
and just let inode eviction deal with them.

Put check_move_unevictable_page() and scan_mapping_unevictable_pages()
under CONFIG_SHMEM (with stub for the TINY case when ramfs is used),
more as comment than to save space; comment them used for SHM_UNLOCK.

Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx [back to 2.6.32 but will need respins]

This makes complete sense.
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx>
