Re: pud_bad vs pmd_bad

From: Ingo Molnar
Date: Thu Feb 05 2009 - 19:50:44 EST



* Jeremy Fitzhardinge <jeremy@xxxxxxxx> wrote:

> Ingo Molnar wrote:
>> just the act of using PAE was measured to cause multi-percent slowdown
>> in fork() and exec() latencies, etc. The pagetables are twice as large
>> so is that really surprising?
>>
>
> Is there a similar slowdown running the CPU in 32 vs 64 bit mode? Or does
> having more/wider registers mitigate it?

Yes, of course there's a slowdown on 64-bit kernels in fork() performance,
mainly related to pte size.

Here are some numbers measured with perfstat. The "fork" binary forks 256 times,
waits for the child tasks and then exits. It is a 32-bit binary, statically
linked - i.e. very similar layout and function on both 32-bit and 64-bit
kernels.
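The source of the "fork" binary is not included in this message. As a rough illustration of the workload described above (fork 256 children, wait for all of them, then exit), here is a minimal sketch. It is written in Python for brevity, whereas the measured program was a statically linked 32-bit binary; the function name `fork_and_reap` and the choice to have each child exit immediately are assumptions, not taken from the original.

```python
import os


def fork_and_reap(count=256):
    """Fork `count` children that exit at once; return how many exited cleanly.

    Hypothetical stand-in for the "fork" test binary: all children are
    forked first, then the parent waits for each of them.
    """
    pids = []
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            # Child: leave immediately without running interpreter cleanup.
            os._exit(0)
        pids.append(pid)

    reaped = 0
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        if os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0:
            reaped += 1
    return reaped


if __name__ == "__main__":
    print(fork_and_reap())
```

Each fork() has to set up page tables for the child, which is why this kind of loop is sensitive to pte size and to the number of page table levels.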

The results (tabulated a bit, average result of 20 runs):

$ perfstat -e -3,-4,-5 ./fork

Performance counter stats for './fork':

      32-bit   32-bit-PAE      64-bit
   ---------   ----------   ---------
   27.367537    30.660090   31.542003   task clock ticks   (msecs)

        5785         5810        5751   pagefaults         (events)
         389          388         388   context switches   (events)
           4            4           4   CPU migrations     (events)
   ---------   ----------   ---------
                   +12.0%      +15.2%   overhead

So PAE is 12.0% slower (the overhead of double the pte size and three page
table levels), and 64-bit is 15.2% slower (the extra overhead of having four
page table levels added to the overhead of double the pte size).
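The overhead figures can be re-derived from the task clock values in the table by taking the 32-bit kernel as the baseline. A minimal sketch follows; the helper name `overhead_percent` is illustrative and is not part of perfstat.

```python
# Task clock times in msecs, copied from the results table in this message.
TASK_CLOCK_MSECS = {
    "32-bit": 27.367537,
    "32-bit-PAE": 30.660090,
    "64-bit": 31.542003,
}


def overhead_percent(baseline, measured):
    """Relative slowdown of `measured` over `baseline`, in percent."""
    return (measured - baseline) / baseline * 100.0


if __name__ == "__main__":
    base = TASK_CLOCK_MSECS["32-bit"]
    for kernel, msecs in TASK_CLOCK_MSECS.items():
        print(f"{kernel:>10}: {msecs:.6f} msecs  {overhead_percent(base, msecs):+.2f}%")
```

The pagefault, context switch and migration counts are nearly identical across the three kernels, so the difference in task clock time is not explained by the workload doing different amounts of work.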

Larger ptes do not come for free and the 64-bit instructions do not mitigate
the cachemiss overhead and memory bandwidth cost.

Ingo