Re: [patch 2/6] mmu_notifier: Callbacks to invalidate addressranges

From: Andrea Arcangeli
Date: Tue Jan 29 2008 - 19:01:04 EST


On Tue, Jan 29, 2008 at 02:39:00PM -0800, Christoph Lameter wrote:
> If it does not run in write mode then concurrent faults are permissible
> while we remap pages. Weird. Maybe we better handle this like individual
> page operations? Put the invalidate_page back into zap_pte. But then there
> would be no callback w/o lock as required by Robin. Doing the

The Robin requirements and the need to schedule are the source of the
complications indeed.

I posted all the KVM patches using mmu notifiers, today I reposted the
ones to work with your V2 (which crashes my host unlike my last
simpler mmu notifier patch but I also changed a few other variable
besides your mmu notifier changes, so I can't yet be sure it's a bug
in your V2, and the SMP regressions I fixed so far sure can't explain
the crashes because my KVM setup could never run in do_wp_page nor
remap_file_pages so it's something else I need to find ASAP).

Robin, if you don't mind, could you please post or upload somewhere
your GPLv2 code that registers itself in Christoph's V2 notifiers? Or
is it top secret? I wouldn't mind to have a look so I can better
understand what's the exact reason you're sleeping besides attempting
GFP_KERNEL allocations. Thanks!

> invalidate_range after populate allows access to memory for which ptes
> were zapped and the refcount was released.

The last refcount is released by the invalidate_range itself.

> > All pins are gone by the time invalidate_page/range returns. But there
> > is no critical section between invalidate_page and the _later_
> > ptep_clear_flush. So get_user_pages is free to run and take the PT
> > lock before the ptep_clear_flush, find the linux pte still
> > instantiated, and to create a new spte, before ptep_clear_flush runs.
>
> Hmmm... Right. Did not consider get_user_pages. A write to the page that
> is not marked dirty would typically require a fault that will serialize.

The pte is already marked dirty (and this is the case only for
get_user_pages, regular linux writes don't fault unless it's
explicitly writeprotect, which is mandatory in a few archs, x86 not).
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/