Re: 2.2.18Pre Lan Performance Rocks!

From: Andi Kleen (ak@suse.de)
Date: Mon Oct 30 2000 - 03:16:15 EST


On Mon, Oct 30, 2000 at 01:04:34AM -0700, Jeff V. Merkey wrote:
> > >
> > > When you say it reloads it's VM, you mean it reloads the CR3 register?
> >
> > Yes.
> >
> > No. In 2.4 you could probably use the on demand lazy vm mechanism ingo described for
> > the nfsd processes. In 2.2 it is a bit more tricky, if I remember right lazy mm needed
> > quite a few changes.
> >
> > But before doing too many changes i would first verify if that is really the problem.
>
> We will never beat NetWare on scaling if this is the case, even in 2.4.
> Andre and my first job will be to create an arch port with MANOS that
> disables this and restructures the VM.

I just guess the end result will be as crash prone as Netware when you install any third
party software ;)

lazy mm is probably a better path, as long as you stay in kernel threads and a single user mm
it'll never switch VMs.

>
> > PTEs are read for aging on memory pressure in vmscan.
> >
> > segment register reload happen on interrupt entry, to load the kernel cs/ds (are they that
> > costly?). If it was really costly you could probably check in interrupt entry if you're
> > already running in kernel space and skip it.
>
> They are. segment register reloads will trigger the following:
>
> IDT table atomic fetch to verify (LOCK#) (if triggered by task gate from INTR)
> GDT table atomic fetch to verify (LOCK#)
> LDT table atomic fetch to verify (LOCK#) (if present)
> PDE table atomic fetch to verify (LOCK#)
>
> The process has to verify that the loaded segment descriptor is valid, and
> it will fetch from all these tables to do it, with up to 4 (LOCK#)
> assertions occurring invisibly in the hardware underneath (which will
> generate 4 non-cacheable memory references, in addition to wrecking
> havoc on the affected L1/L2 cache lines). Oink. It only does this
> when you load one, not when you save one, like pushing it on the
> stack. Since you look at MANOS code, you'll note that in
> CONTEXT.386, I do and add esp, 3 * 4 instead of poppping the segment
> registers off the stack if they are in the kernel address space.
>
> Linux should do the same, if possible as an optimization.

Interesting. You could do easily the same in Linux by changing (in 2.2) the SAVE_ALL macro
in arch/i386/kernel/irq.h and doing the same in ret_from_intr's RESTORE_ALL macro (after
changing it to a RESTORE_ALL_INT or you'll break system calls)

Actually I don't even know why irq.h's SAVE_ALL even loads __KERNEL_CS, it should be already
set by the interrupt gate. __KERNEL_DS probably needs to be still loaded, but only when
you came from user space.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Tue Oct 31 2000 - 21:00:25 EST