Re: kernel BUG at mm/truncate.c:475!

From: Andrew Morton
Date: Mon Dec 13 2010 - 17:21:25 EST


On Sat, 11 Dec 2010 15:14:47 +0100
Miklos Szeredi <miklos@xxxxxxxxxx> wrote:

> On Mon, 6 Dec 2010, Michael Leun wrote:
> > At the moment I'm trying to create an easy to reproduce scenario.
> >
>
> I've managed to reproduce the BUG. At first I thought it had to do
> with fork() racing with invalidate_inode_pages2_range(), but it turns
> out that just two parallel invocations of
> invalidate_inode_pages2_range(), with some page faults going on, can
> trigger it.
>
> The problem is: unmap_mapping_range() is not prepared for more than
> one concurrent invocation per inode. For example:
>
> thread1: going through a big range, stops in the middle of a vma and
> stores the restart address in vm_truncate_count.
>
> thread2: comes in with a small (e.g. single page) unmap request on
> the same vma, somewhere before restart_address, finds that the

"restart_addr", please.

> vma was already unmapped up to the restart address and happily
> returns without doing anything.
>
> Another scenario would be two big unmap requests, both having to
> restart the unmapping and each one setting vm_truncate_count to its
> own value. This could go on forever without either of them being
> able to finish.
>
> Truncate and hole punching already serialize with i_mutex. Other
> callers of unmap_mapping_range() do not, however, and I see difficulty
> with doing it in the callers. I think the proper solution is to add
> serialization to unmap_mapping_range() itself.
>
> Attached patch attempts to do this without adding more fields to
> struct address_space. It fixes the bug in my testing.
>

That's a pretty old bug, isn't it? 5+ years.

>
>
> ---
> include/linux/pagemap.h |    1 +
> mm/memory.c             |   14 ++++++++++++++
> 2 files changed, 15 insertions(+)
>
> Index: linux.git/include/linux/pagemap.h
> ===================================================================
> --- linux.git.orig/include/linux/pagemap.h 2010-11-26 10:52:17.000000000 +0100
> +++ linux.git/include/linux/pagemap.h 2010-12-11 13:39:32.000000000 +0100
> @@ -24,6 +24,7 @@ enum mapping_flags {
> AS_ENOSPC = __GFP_BITS_SHIFT + 1, /* ENOSPC on async write */
> AS_MM_ALL_LOCKS = __GFP_BITS_SHIFT + 2, /* under mm_take_all_locks() */
> AS_UNEVICTABLE = __GFP_BITS_SHIFT + 3, /* e.g., ramdisk, SHM_LOCK */
> + AS_UNMAPPING = __GFP_BITS_SHIFT + 4, /* for unmap_mapping_range() */
> };
>
> static inline void mapping_set_error(struct address_space *mapping, int error)
> Index: linux.git/mm/memory.c
> ===================================================================
> --- linux.git.orig/mm/memory.c 2010-12-11 13:07:28.000000000 +0100
> +++ linux.git/mm/memory.c 2010-12-11 14:09:42.000000000 +0100
> @@ -2535,6 +2535,12 @@ static inline void unmap_mapping_range_l
> }
> }
>
> +static int mapping_sleep(void *x)
> +{
> + schedule();
> + return 0;
> +}
> +
> /**
> * unmap_mapping_range - unmap the portion of all mmaps in the specified address_space corresponding to the specified page range in the underlying file.
> * @mapping: the address space containing mmaps to be unmapped.
> @@ -2572,6 +2578,9 @@ void unmap_mapping_range(struct address_
> details.last_index = ULONG_MAX;
> details.i_mmap_lock = &mapping->i_mmap_lock;
>
> + wait_on_bit_lock(&mapping->flags, AS_UNMAPPING, mapping_sleep,
> + TASK_UNINTERRUPTIBLE);
> +
> spin_lock(&mapping->i_mmap_lock);
>
> /* Protect against endless unmapping loops */
> @@ -2588,6 +2597,11 @@ void unmap_mapping_range(struct address_
> if (unlikely(!list_empty(&mapping->i_mmap_nonlinear)))
> unmap_mapping_range_list(&mapping->i_mmap_nonlinear, &details);
> spin_unlock(&mapping->i_mmap_lock);
> +
> + clear_bit_unlock(AS_UNMAPPING, &mapping->flags);
> + smp_mb__after_clear_bit();
> + wake_up_bit(&mapping->flags, AS_UNMAPPING);
> +

I do think this was premature optimisation. The open-coded lock is
hidden from lockdep, so we won't find out whether it introduces
potential deadlocks. It would be better to add a new mutex, at least
temporarily, then look at replacing it with a MiklosLock later on,
once the code has bedded in.

At which time, replacing mutexes with MiklosLocks becomes part of a
general "shrink the address_space" exercise in which there's no reason
to exclusively concentrate on that new mutex!


How hard would it be to avoid adding a new lock and instead use an
existing one, presumably i_mutex? Because if we can get i_mutex
coverage over unmap_mapping_range() then I suspect all the
vm_truncate_count/restart_addr stuff can go away?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/