Re: page table lock patch V15 [0/7]: overview

From: Andi Kleen
Date: Thu Jan 13 2005 - 15:30:37 EST


On Thu, Jan 13, 2005 at 10:16:58AM -0800, Christoph Lameter wrote:
> On Thu, 13 Jan 2005, Andi Kleen wrote:
>
> > The rule in i386/x86-64 is that you cannot set the PTE in a non atomic way
> > when its present bit is set (because the hardware could asynchronously
> > change bits in the PTE that would get lost). Atomic way means clearing
> > first and then replacing in an atomic operation.
>
> Hmm. I replaced that portion in the swapper with an xchg operation
> and inspect the result later. Clearing a pte and then setting it to
> something would open a window for the page fault handler to set up a new

Yes, it usually assumes the page table lock is held.

> pte there since it does not take the page_table_lock. That xchg must be
> atomic for PAE mode to work then.

You can always use cmpxchg8b for that if you want. But to make
it really atomic you may need a LOCK prefix, and with that the
cost is not much lower than a real spinlock.
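
To make the compare-and-exchange idea concrete, here is a minimal userspace
sketch using C11 atomics. It is not kernel code: pte64_t, PTE_PRESENT,
pte_replace_if_unchanged() and pte_install_if_none() are hypothetical names
made up for illustration, and an all-zero value is assumed to mean "no pte".
On x86 a compiler would typically turn the compare-exchange into a locked
cmpxchg8b (32-bit) or cmpxchg (64-bit).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit PAE pte; not the kernel's pte_t. */
typedef _Atomic uint64_t pte64_t;

#define PTE_PRESENT 0x1ULL  /* bit 0 is the present bit on x86 */

/*
 * Replace *ptep with newval only if it still holds oldval.
 * Returns true if the swap happened, false if somebody else
 * changed the entry in the meantime.
 */
static bool pte_replace_if_unchanged(pte64_t *ptep, uint64_t oldval,
                                     uint64_t newval)
{
    return atomic_compare_exchange_strong(ptep, &oldval, newval);
}

/* Install a pte only if the slot is still empty (assumed to be all zero). */
static bool pte_install_if_none(pte64_t *ptep, uint64_t newval)
{
    return pte_replace_if_unchanged(ptep, 0, newval);
}
```

A fault handler that loses the race sees pte_install_if_none() return false
and can simply back out, so there is no window between "clear" and "set".
The price is the locked instruction on every update.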


>
> > This helps you because you shouldn't be looking at the pte anyways
> > when pte_present is false. When it is not false it is always updated
> > atomically.
>
> so pmd_present, pud_none and pgd_none could be considered atomic even if
> the pm/u/gd_t is a multi-word entity? In that case the current approach

The optimistic read function I posted would do this.

But you have to read multiple entries anyway, and that sequence as a
whole could be non-atomic, no? (E.g. to do something on a PTE you always
need to read the PGD/PUD/PMD first.)

In theory you could do this lazily with retries too, but it would probably
be somewhat costly and complicated.
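
For illustration only, a retry-based read of a single two-word entry could
look roughly like the sketch below. This is not the function I posted; struct
entry2w and entry_read_optimistic() are hypothetical names, and it assumes
writers change the low word (which holds the present bit) whenever they
change the entry. If the low word can return to its old value while the high
word differs, the check below would miss it.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical two-word entry, loosely modelled on a PAE pte. */
struct entry2w {
    _Atomic uint32_t low;   /* assumed to hold the present bit */
    _Atomic uint32_t high;
};

/*
 * Read both words without a lock. Retry whenever the low word
 * changed while the high word was being read, since the two halves
 * may then belong to different versions of the entry.
 */
static uint64_t entry_read_optimistic(struct entry2w *e)
{
    uint32_t low, high;

    do {
        low  = atomic_load_explicit(&e->low, memory_order_acquire);
        high = atomic_load_explicit(&e->high, memory_order_acquire);
    } while (low != atomic_load_explicit(&e->low, memory_order_acquire));

    return ((uint64_t)high << 32) | low;
}
```

Doing the same across a whole PGD/PUD/PMD/PTE walk would mean re-validating
every level after reading the next one, which is where the cost and
complexity come from.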

> would work for higher level entities and in particular S/390 would be in
> the clear.
>
> But then the issues of replacing multi-word ptes on i386 PAE remain. If no
> write lock is held on mmap_sem then all writes to pte's must be atomic in

mmap_sem only protects the VMAs. The page tables themselves are protected
by the page table lock.

> order for the get_pte_atomic operation to work reliably.

-Andi