Re: page fault scalability patch V11 [0/7]: overview

From: William Lee Irwin III
Date: Fri Nov 19 2004 - 23:30:37 EST


On Fri, Nov 19, 2004 at 09:33:12PM -0600, Robin Holt wrote:
> Agree, we are currently using atomic ops on a global rss on our 2.4
> kernel with 512cpu systems and not seeing much cacheline contention.
> I don't remember how little it ended up being, but it was very little.
> We had gone to dropping the page_table_lock and only reaquiring it if
> the pte was non-null when we went to insert our new one. I think that
> was how we had it working. I would have to wake up and actually look
> at that code as it was many months ago that Ray Bryant did that work.
> We did make rss atomic. Most of the contention is sorted out by the
> mmap_sem. Processes acquiring themselves off of mmap_sem were found
> to have spaced themselves out enough that they were all approximately
> equal time from doing their atomic_add and therefore had very little
> contention for the cacheline. At least it was not enough that we could
> measure it as significant.

Also, the densely-packed split counter can only get 4-16 cpus to a
cacheline with cachelines <= 128B, so there are definite limitations to
the amount of cacheline contention in such schemes.


-- wli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/