Re: [PATCH] ppc64: Fix possible race with set_pte on a present PTE

From: Linus Torvalds
Date: Tue May 25 2004 - 15:51:56 EST




On Tue, 25 May 2004, David S. Miller wrote:
>
> Hmmm, do you understand how broken the sparc hardware is? :-)

I seem to always repress that part.

> Seriously, the issue is that the MMU writes back access/dirty bits
> asynchronously, does not do a relookup when it writes these bits back
> into the PTE (like x86 and others do) it actually stores away the PTE
> physical address and writes into the PTE using that, and finally as
> previously mentioned we lack a cmpxchg we only have raw SWAP.

Ok. Still, that doesn't sound too bad. I assume that the pte write has to
be atomic anyway (ie it doesn't walk the page tables, but it clearly _has_
to do an atomic "read-modify-update" or just the _hardware_ updates might
race against each other and one CPU loses the dirty bit when another CPU
writes back the accessed bit).

So I really think it should be safe to just do a regular write when you
update both bits, because you know that no other CPU will at least _clear_
any bits. Hmm?

So it sounds like the SWAP loop basically ends up being just something
like

	val = pte_value(entry);
	for (;;) {
		oldval = SWAP(ptep, val);
		/*
		 * If we wrote a value that had the same or more bits set
		 * than the old value, we're ok...
		 */
		if (!(oldval & ~val))
			break;
		/*
		 * ..otherwise we need to write the "or" of all bits and
		 * try again.
		 */
		val |= oldval;
	}

Right? Note that the reason we can do the "dirty and accessed bit both
set" case with a simple write is exactly because that's already the
"maximal" bits anybody can write to the thing, so we don't need to loop,
we can just write it directly.
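
For anyone who wants to poke at the idea in user space, here is a minimal sketch of the loop described above, assuming C11 atomics as a stand-in for the raw SWAP instruction. The type pte_val_t, the PTE_* bit values, and the helpers swap_pte() and set_pte_swap_loop() are all hypothetical names made up for illustration; they are not the real sparc definitions, and nothing here has been checked against actual hardware behaviour.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical PTE type and bit layout, for illustration only. */
typedef uint64_t pte_val_t;
#define PTE_PRESENT  ((pte_val_t)1 << 0)
#define PTE_ACCESSED ((pte_val_t)1 << 1)
#define PTE_DIRTY    ((pte_val_t)1 << 2)

/* Stand-in for raw SWAP: atomically store val, return the old contents. */
static pte_val_t swap_pte(_Atomic pte_val_t *ptep, pte_val_t val)
{
	return atomic_exchange(ptep, val);
}

/*
 * Write val into *ptep without losing bits that were set concurrently
 * (assumption: other writers only ever set bits, never clear them).
 * Returns the value that finally ended up in the PTE.
 */
static pte_val_t set_pte_swap_loop(_Atomic pte_val_t *ptep, pte_val_t val)
{
	const pte_val_t both = PTE_ACCESSED | PTE_DIRTY;

	/* Both bits already set: nothing more can be added, plain store. */
	if ((val & both) == both) {
		atomic_store(ptep, val);
		return val;
	}

	for (;;) {
		pte_val_t oldval = swap_pte(ptep, val);

		/* Old value had no bits we lack: nothing was lost. */
		if (!(oldval & ~val))
			break;

		/* Otherwise fold the lost bits in and try again. */
		val |= oldval;
	}
	return val;
}
```

If the PTE starts out as PTE_PRESENT | PTE_ACCESSED and the caller writes PTE_PRESENT | PTE_DIRTY, the first swap hands back the accessed bit, the loop ORs it in, and the second swap leaves all three bits set.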

I realize that this isn't as simple as the x86 solution (or most
everything else, for that matter ;), but it's by no means totally
unreasonable. Or are there even _more_ gotchas that I've missed?

Linus