Re: [PATCH]: Handling spurious page fault for hugetlb region for2.6.14-rc4-git5

From: Rohit Seth
Date: Wed Oct 19 2005 - 13:40:40 EST

Next message: Kyle Moffett: "Re: large files unnecessary trashing filesystem cache?"
Previous message: john stultz: "Re: Ktimer / -rt9 (+custom) monotonic_clock going backwards."
In reply to: Hugh Dickins: "Re: [PATCH]: Handling spurious page fault for hugetlb region for2.6.14-rc4-git5"
Next in thread: Linus Torvalds: "Re: [PATCH]: Handling spurious page fault for hugetlb region for2.6.14-rc4-git5"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Wed, 2005-10-19 at 16:23 +0100, Hugh Dickins wrote:

> I thought that the CPU never caches !present entries in the TLB?
> Or is that true of i386 (and x86_64), but untrue of ia64?

IA-64 can prefetch any entry from the VHPT (the last-level page table)
irrespective of its value. You are right that i386 and x86_64 do not
cache !present entries. Still, the OS is supposed to handle those faults
if they happen.
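A toy model may make the difference concrete. This is a minimal simulation, not kernel code: `struct mmu`, `translate()` and `faults_without_flush()` are hypothetical names, and the tiny direct-mapped TLB is an assumption made for brevity. The `caches_not_present` flag switches between the IA-64 behaviour described above (any entry may be loaded, present or not) and the i386/x86_64 behaviour (only present entries are cached).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: a one-level page table and a direct-mapped TLB. */
#define NPAGES 8
#define TLB_SLOTS 4

struct pte { bool present; unsigned long pfn; };
struct tlb_entry { bool valid; size_t vpn; struct pte pte; };

struct mmu {
    struct pte page_table[NPAGES];
    struct tlb_entry tlb[TLB_SLOTS];
    bool caches_not_present;   /* true: IA-64-like; false: i386/x86_64-like */
};

/* Look up vpn in the TLB; on a miss, walk the page table and maybe fill. */
static struct pte translate(struct mmu *m, size_t vpn)
{
    assert(vpn < NPAGES);
    struct tlb_entry *e = &m->tlb[vpn % TLB_SLOTS];
    if (e->valid && e->vpn == vpn)
        return e->pte;                 /* hit: may be a stale !present entry */
    struct pte p = m->page_table[vpn];
    if (p.present || m->caches_not_present)
        *e = (struct tlb_entry){ .valid = true, .vpn = vpn, .pte = p };
    return p;
}

/* True if an access to vpn would raise a page fault. */
static bool access_faults(struct mmu *m, size_t vpn)
{
    return !translate(m, vpn).present;
}

/* Scenario from the thread: touch V while its PTE is !present, then the
 * kernel installs the PTE WITHOUT a TLB flush, then touch V again.
 * Returns the number of faults seen across the two accesses. */
static int faults_without_flush(bool caches_not_present)
{
    struct mmu m = { .caches_not_present = caches_not_present };
    size_t v = 3;
    int faults = 0;
    faults += access_faults(&m, v);                          /* first touch */
    m.page_table[v] = (struct pte){ .present = true, .pfn = 42 };
    faults += access_faults(&m, v);                          /* second touch */
    return faults;
}
```

With `caches_not_present` set, the second access still faults because the stale !present entry is served from the TLB; with it clear, only the first access faults.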

> Or do you have some new model or errata on some CPU where it's true?

No errata here.

> Or, final ghastly possibility ;), am I simply altogether wrong?
>

You are asking the right questions here.

> > Meaning, unless this entry is purged or displaced, for virtual address V
>
> When you say "purged", is that what we elsewhere call "flushed"
> in relation to the TLB, or something else?
>

I should have said "flush" to be consistent.

> > CPU will generate the page fault (as the P bit is not set and assuming
> > this fault has the highest precedence).
> >
> > Kernel updates the *pte so that it now maps the hugepage at virtual
> > address V to physical address P.
> >
> > Later when the user process makes a reference to V, because of the stale
> > TLB entry, the processor gets a PAGE_FAULT.
>
> You seem to be saying that strictly, we ought to flush TLB even when we
> make a page present where none was before, but that the likelihood of it
> being needed is so low, and the overhead of TLB flush so high, and the
> existing code almost everywhere recovering safely from this condition,
> that the most effective thing to do is just fix up the hugetlb case.
> Is that correct?
>

Yes, at least for the architectures that can cache any translation in
their TLB. IA-64 is again a good example here: it flushes the entry only
at fault time, so that the next access picks up the updated entry
(for the cases where the fault happened because of a stale TLB entry).
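The fault-time flush described above can be sketched in the same kind of toy model (hypothetical names throughout; this is not the actual hugetlb or IA-64 fault code). Installing a PTE does no TLB flush at all; instead the fault handler first checks whether the page table already maps the address. If it does, the fault is treated as spurious, only that one TLB entry is flushed, and the access is retried.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: page table plus a direct-mapped TLB that may,
 * like the IA-64 behaviour described above, cache a !present entry. */
#define NPAGES 8
#define TLB_SLOTS 4

struct pte { bool present; unsigned long pfn; };
struct tlb_entry { bool valid; size_t vpn; struct pte pte; };

struct mmu {
    struct pte page_table[NPAGES];
    struct tlb_entry tlb[TLB_SLOTS];
    bool caches_not_present;
    unsigned long next_pfn;
    int faults;
    int allocations;
};

static struct pte translate(struct mmu *m, size_t vpn)
{
    struct tlb_entry *e = &m->tlb[vpn % TLB_SLOTS];
    if (e->valid && e->vpn == vpn)
        return e->pte;
    struct pte p = m->page_table[vpn];
    if (p.present || m->caches_not_present)
        *e = (struct tlb_entry){ .valid = true, .vpn = vpn, .pte = p };
    return p;
}

/* Invalidate the single TLB entry for vpn, if it is cached. */
static void tlb_flush_one(struct mmu *m, size_t vpn)
{
    struct tlb_entry *e = &m->tlb[vpn % TLB_SLOTS];
    if (e->valid && e->vpn == vpn)
        e->valid = false;
}

/* If the PTE is already present, the fault came from a stale TLB entry:
 * flush just that entry. Otherwise install a page, deliberately without
 * flushing the TLB. */
static void handle_fault(struct mmu *m, size_t vpn)
{
    m->faults++;
    if (m->page_table[vpn].present) {
        tlb_flush_one(m, vpn);                 /* spurious fault */
        return;
    }
    m->page_table[vpn] = (struct pte){ .present = true, .pfn = m->next_pfn++ };
    m->allocations++;
}

/* Retry the access until it translates; bounded so a bug cannot spin. */
static bool touch(struct mmu *m, size_t vpn)
{
    assert(vpn < NPAGES);
    for (int tries = 0; tries < 4; tries++) {
        if (translate(m, vpn).present)
            return true;
        handle_fault(m, vpn);
    }
    return false;
}
```

With `caches_not_present` set, touching a fresh page costs two faults (one real, one spurious) but still only one allocation; without it, a single fault suffices. That is the trade-off discussed in this thread: an occasional extra fault is cheaper than flushing the TLB every time a page becomes present.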

> > > Has this problem been observed in testing?
> >
> > Yes. On IA-64.
>
> But not on i386 or x86_64.
>

No.

> Same series of doubts as with !present entries in the TLB; but after
> looking at the ia64 fault handler, that does seem to have stuff about
> speculative loads, so I'm guessing i386 and x86_64 prefetch does not
> cause faults (modulo errata), but ia64 does.
>

Those speculative loads on IA-64 (which are really advanced loads
generated by the compiler in anticipation that they will be useful) are
different from the prefetches that the hardware does for TLBs.

Hardware speculative loads never generate any fault.

Prefetched TLB entries on i386, x86_64 or IA-64, by contrast, can cause
a fault if they are not flushed after the page table is updated.

> Once I started to understand this thread, I thought you were quite
> wrong to be changing hugetlb fault handling, thought I'd find several
> other places which would need fixing too e.g. kmap_atomic, remap_pfn_range.
>
> But no, I've found no others. Either miraculously, or by good design,
> all the kernel misfaults should be seamlessly handled by the lazy vmalloc
> path (on i386 anyway: I don't know what happens for ia64 there), and the
> userspace misfaults by handle_pte_fault's pte_present check. I think.
>

Good OS design :-) Though on IA-64 there was recently a similar issue
in the vmalloc area that was fixed in low-level arch-specific code.
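The lazy vmalloc path Hugh mentions can be illustrated with another toy model, only loosely inspired by the i386 approach: `reference`, `lazy_kernel_fault()`, `map_kernel_slot()` and `kernel_access()` are hypothetical names, and a page table is reduced to an array of present bits. A fault in the lazily synchronised range is resolved by copying the entry from an always-current reference table (the role the kernel's reference page table in `init_mm` plays). If the local entry was already present, as after a stale-TLB misfault, the copy is a harmless no-op, which is why such misfaults are absorbed seamlessly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of lazily synchronised kernel mappings. */
#define NSLOTS 16
#define VMALLOC_START 8   /* slots [8, 16) stand in for the vmalloc area */

struct pgtable { bool present[NSLOTS]; };

/* Always up-to-date reference table (the init_mm role in this model). */
static struct pgtable reference;

/* Returns true if the fault is resolved by syncing from the reference
 * table; false for a genuine fault that must be handled elsewhere. */
static bool lazy_kernel_fault(struct pgtable *local, size_t slot)
{
    assert(slot < NSLOTS);
    if (slot < VMALLOC_START)
        return false;              /* not a lazily synced address */
    if (!reference.present[slot])
        return false;              /* really unmapped */
    local->present[slot] = true;   /* no-op if already present (stale TLB) */
    return true;
}

/* Models vmalloc() mapping a page: only the reference table is updated. */
static void map_kernel_slot(size_t slot)
{
    assert(slot >= VMALLOC_START && slot < NSLOTS);
    reference.present[slot] = true;
}

/* Access from a process whose table may lag behind the reference.
 * Returns 0 (no fault), 1 (fault fixed up lazily) or -1 (genuine fault). */
static int kernel_access(struct pgtable *local, size_t slot)
{
    assert(slot < NSLOTS);
    if (local->present[slot])
        return 0;
    return lazy_kernel_fault(local, slot) ? 1 : -1;
}
```

A process that touches a vmalloc address after it has been mapped takes exactly one fault, which is fixed up from the reference table; later accesses need no fault at all.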

-rohit

