Re: [benchmark] 1% performance overhead of paravirt_ops on native kernels

From: Linus Torvalds
Date: Tue Jun 09 2009 - 12:02:26 EST




On Tue, 9 Jun 2009, Nick Piggin wrote:
>
> If it's such a problem, it could be made a lot faster without too
> much problem. You could just introduce a FIFO of ptes behind it
> and flush them all in one go. 4K worth of ptes per CPU might
> hopefully bring your overhead down to < 1%.

We already have that. The regular kmap() does that. It's just not usable
in atomic context.

We'd need to fix the locking: right now kmap_high() uses non-irq-safe
locks, and it does that whole cross-cpu flushing thing (which is why
those locks _have_ to be non-irq-safe).

The way to fix that, though, would be to never do any cross-cpu calls, and
instead just have a cpumask saying "you need to flush before you do
anything with kmap". So you'd just set that cpumask inside the lock, and
if/when some other CPU does a kmap, they'd flush their local TLB at _that_
point instead of having to have an IPI call.
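
A minimal userspace sketch of the deferred-flush idea described above, under
stated assumptions: a plain bitmask stands in for the cpumask, a counter
stands in for the local TLB flush, and the kmap lock is omitted because the
simulation is single-threaded. Every name here (flush_pending_mask,
recycle_kmap_pool, kmap_sketch, local_tlb_flush) is hypothetical and is not
an existing kernel symbol.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 4

/* Hypothetical state, illustrative only (not kernel symbols). */
static uint32_t flush_pending_mask;      /* bit n set: CPU n must flush before its next kmap */
static unsigned local_flushes[NR_CPUS];  /* number of local flushes each simulated CPU has done */

/* Stand-in for flushing the TLB of one CPU; here it only counts. */
static void local_tlb_flush(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    local_flushes[cpu]++;
}

/*
 * Called when the pool of kmap slots is recycled. In a real design this
 * would run inside the kmap lock. Instead of sending an IPI to every other
 * CPU, it only marks them as stale.
 */
static void recycle_kmap_pool(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    flush_pending_mask = (1u << NR_CPUS) - 1;  /* every CPU is now stale */
    local_tlb_flush(cpu);                      /* the recycling CPU flushes itself right away */
    flush_pending_mask &= ~(1u << cpu);
}

/*
 * Entry point for a mapping request. A CPU that was marked stale flushes
 * its own TLB here, at the point of use, and clears its bit.
 */
static void kmap_sketch(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    if (flush_pending_mask & (1u << cpu)) {
        local_tlb_flush(cpu);
        flush_pending_mask &= ~(1u << cpu);
    }
    /* ... a real implementation would hand out a mapping here ... */
}
```

In this sketch, after recycle_kmap_pool(0), CPU 1 pays for exactly one local
flush on its next kmap_sketch(1) call and none on later calls, while CPUs
that never map anything never flush at all. No cross-cpu call is made at any
point, which is what would let the lock become irq-safe.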

If we can get rid of kmap_atomic(), I'd already like HIGHMEM more. Right
now I absolutely _hate_ all the different "levels" of kmap_atomic() and
having to be careful about crazy nesting rules etc.

Linus