Re: [COUNTERPATCH] mm: avoid overflowing preempt_count() in mmu_take_all_locks()

From: Peter Zijlstra
Date: Thu Apr 01 2010 - 12:32:55 EST


On Thu, 2010-04-01 at 18:07 +0200, Andrea Arcangeli wrote:
> On Thu, Apr 01, 2010 at 05:56:02PM +0200, Peter Zijlstra wrote:
> > Another thing is mm->nr_ptes, that doesn't appear to be properly
> > serialized, __pte_alloc() does ++ under mm->page_table_lock, but
> > free_pte_range() does -- which afaict isn't always with page_table_lock
> > held, it does however always seem to have mmap_sem for writing.
>
> Not saying this is necessarily safe, but how can that be relevant to the
> spinlock->mutex/rwsem conversion?

Not directly, but I keep running into that BUG_ON() at the end of
exit_mmap() with my conversion patch, and I thought that maybe I widened
the race window.

But I guess I simply messed something up.

> The only thing that breaks with that
> conversion would be RCU (the anon_vma RCU usage itself breaks because
> rcu_read_lock() disables preemption and the code then takes the
> anon_vma->lock, which falls apart because taking the anon_vma->lock
> would imply a schedule), but nr_ptes is a write operation so it can't
> be protected by RCU.
>
> > However __pte_alloc() callers do not in fact hold mmap_sem for writing.
>
> As long as the mmap_sem readers always also take the page_table_lock
> we're safe.

Ah, I see, so it's down_read(mmap_sem) + page_table_lock that's exclusive
against down_write(mmap_sem). Nifty; that should be a comment somewhere.
