Re: x86 ptep_get_and_clear question

From: Manfred Spraul (
Date: Thu Feb 15 2001 - 16:26:50 EST

Linus Torvalds wrote:
> In article <>,
> Jamie Lokier <> wrote:
> >> > << lock;
> >> > read pte
> >> > if (!present(pte))
> >> > do_page_fault();
> >> > pte |= dirty
> >> > write pte.
> >> > >> end lock;
> >>
> >> No, it is a little more complicated. You also have to include in the
> >> tlb state into this algorithm. Since that is what we are talking about.
> >> Specifically, what does the processor do when it has a tlb entry allowing
> >> RW, the processor has only done reads using the translation, and the
> >> in-memory pte is clear?
> >
> >Yes (no to the no): Manfred's pseudo-code is exactly the question you're
> >asking. Because when the TLB entry is non-dirty and you do a write, we
> >_know_ the processor will do a locked memory cycle to update the dirty
> >bit. A locked memory cycle implies read-modify-write, not "write TLB
> >entry + dirty" (which would be a plain write) or anything like that.
> >
> >Given you know it's a locked cycle, the only sensible design from Intel
> >is going to be one of Manfred's scenarios.
> Not necessarily, and this is NOT guaranteed by the docs I've seen.
> It _could_ be that the TLB data actually also contains the pointer to
> the place where it was fetched, and a "mark dirty" becomes
> read *ptr locked
> val |= D
> write *ptr unlock

Jamie wrote "one of my scenarios", that's the other option ;-)

> Now, I will agree that I suspect most x86 _implementations_ will not do
> this. TLB's are too timing-critical, and nobody tends to want to make
> them bigger than necessary - so saving off the source address is
> unlikely. Also, setting the D bit is not a very common operation, so
> it's easy enough to say that an internal D-bit-fault will just cause a
> TLB re-load, where the TLB re-load just sets the A and D bits as it
> fetches the entry (and then page fault handling is an automatic result
> of the reload).

But then the cpu would support setting the D bit in the page directory,
but it doesn't.

Probably Kanoj is right, the current code is not guaranteed by the

But if we change the interface, could we think about the poor s390

s390 only has a "clear the present bit in the pte and flush the tlb"

>From your other post:
> pte = ptep_get_and_clear(page_table);
> flush_tlb_page(vma, address);
>+ pte = ptep_update_after_flush(page_table, pte);

What about one arch specific

        pte = ptep_get_and_invalidate(vma, address, page_table);

On i386 SMP it would

        pte = *page_table_entry;
                return pte;
        lock; andl 0xfffffffe, *page_table_entry;
        return *page_table_entry | 1;

> The "gather" operation could possibly be improved to make the other
> CPU's do useful work while being shot down (ie schedule away to another
> mm), but that has it's own pitfalls too.
IMHO scheduling away is the best long term solution.
Perhaps try to schedule away, just to improve the probability that
mm->cpu_vm_mask is clear.

I just benchmarked a single flush_tlb_page().

Pentium II 350: ~ 2000 cpu ticks.
Pentium III 850: ~ 3000 cpu ticks.

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at
Please read the FAQ at

This archive was generated by hypermail 2b29 : Thu Feb 15 2001 - 21:00:27 EST