Re: MMX based IP-checksumming patch, 2.1.105, RFC

Adam J. Richter (adam@yggdrasil.com)
Wed, 17 Jun 1998 16:43:08 -0700


In article <Pine.LNX.3.96.980613043429.569A-100000@hal.cobaltmicro.com> you write:
>
>the attached patch implements MMX-based checksumming for csum_partial().
>It's functional but not completed yet. I'm posting it here because maybe
>someone out there has ideas how to prevent the quite expensive FPU_SAVE /
>RESTORE operations somehow ... the routine itself basically clobbers only
>2 MMX registers.
[...]

I don't know if there are any 387 FPU instructions that would
save just those two registers and I cannot find my 486 manual at this
moment, but here is one approach that would work in conjunction with
the lazy FPU restore:

At the beginning of csum_partial do:

    if (current->flags & PF_USEDFPU) {
        __asm__ __volatile__("fnsave %0":"=m"(current->tss.i387.hard));
        current->flags &= ~PF_USEDFPU;
    } else {
        clts(); /* Allow FPU operations. Is this necessary? */
    }

At the end of csum_partial do:

    stts(); /* Trap next FPU operation if there is one, to
               restore current->tss.i387.hard before it
               executes. */

This way, the FPU restore is only executed if the currently
running process decides to execute a floating point instruction in the
current CPU time slice. Otherwise, even the time taken by the FPU
save is reclaimed, because it avoids the FPU save when the currently
running process is switched out.
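
To make the bookkeeping concrete, here is a minimal user-space model of the sequence above. It is only a sketch and not kernel code: the structs, the MODEL_USEDFPU flag and all function names are hypothetical stand-ins (for the used-FPU task flag, current->tss.i387.hard, CR0.TS, fnsave, clts() and stts()), and an integer stands in for the whole FPU/MMX register file.

```c
#include <stdbool.h>

#define MODEL_USEDFPU 0x1u     /* hypothetical stand-in for the used-FPU task flag */

struct task_model {
    unsigned flags;
    int saved_fpu;             /* stands in for current->tss.i387.hard */
};

struct cpu_model {
    bool ts;                   /* models CR0.TS: true means FPU ops trap */
    int live_fpu;              /* stands in for the live FPU/MMX registers */
    int saves, restores;       /* counters so the cost can be observed */
};

/* Models the code at the beginning of csum_partial. */
static void mmx_section_enter(struct cpu_model *cpu, struct task_model *t)
{
    if (t->flags & MODEL_USEDFPU) {
        t->saved_fpu = cpu->live_fpu;   /* models fnsave */
        cpu->saves++;
        t->flags &= ~MODEL_USEDFPU;
    } else {
        cpu->ts = false;                /* models clts() */
    }
}

/* Models the code at the end of csum_partial. */
static void mmx_section_exit(struct cpu_model *cpu)
{
    cpu->live_fpu = -1;                 /* the MMX work clobbered the registers */
    cpu->ts = true;                     /* models stts() */
}

/* Models the task executing one FPU instruction afterwards. */
static void task_fpu_op(struct cpu_model *cpu, struct task_model *t)
{
    if (cpu->ts) {                      /* models the trap taken because TS is set */
        cpu->ts = false;
        cpu->live_fpu = t->saved_fpu;   /* models the lazy restore */
        cpu->restores++;
        t->flags |= MODEL_USEDFPU;
    }
    cpu->live_fpu++;                    /* the FPU instruction itself */
}
```

In this model, a task that never calls task_fpu_op() after the checksum pays for one save and no restore, which is the case the paragraph above describes.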

In the stock kernels (at least up to 2.1.106), this
optimization must be bracketed in "#ifndef __SMP__". However, I have
patches that implement lazy FPU restore (albeit with non-lazy FPU
save) for SMP, which should make this code work as I described in
all cases.

I am a little surprised that your code works and is not
bracketed by clts() (to allow floating point instructions) and stts()
(to trap floating point instructions). You must be running on a
kernel compiled with SMP enabled (and without the SMP lazy FPU restore
patch that I posted on linux-kernel).

By the way, is your code really 686 specific, or is it
compatible with and beneficial to all MMX processors? If the latter, it
would be better to check for MMX (I think this can be done somehow by
examining current_cpu_data.x86_capability). This would also allow for
a single kernel binary that can run on 386's, but gets the MMX
optimizations when running on an MMX processor.
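
As a rough illustration of the run-time check, the sketch below selects a checksum routine from a feature word. It assumes the feature word has the layout of EDX from CPUID leaf 1, where bit 23 indicates MMX; whether current_cpu_data.x86_capability can be tested in exactly this way is an assumption. FEATURE_MMX, csum_fn, csum_generic, csum_mmx_placeholder and select_csum are hypothetical names, and the "MMX" routine is only a placeholder that calls the portable one.

```c
#include <stddef.h>
#include <stdint.h>

/* CPUID leaf 1 reports MMX support in bit 23 of EDX. */
#define FEATURE_MMX (UINT32_C(1) << 23)

typedef uint32_t (*csum_fn)(const unsigned char *buf, size_t len, uint32_t sum);

/* Portable fallback: add big-endian 16-bit words with end-around carry. */
static uint32_t csum_generic(const unsigned char *buf, size_t len, uint32_t sum)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (i < len)
        sum += (uint32_t)buf[i] << 8;       /* pad an odd trailing byte */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16); /* fold carries back in */
    return sum;
}

/* Placeholder for an MMX routine; here it simply reuses the fallback. */
static uint32_t csum_mmx_placeholder(const unsigned char *buf, size_t len,
                                     uint32_t sum)
{
    return csum_generic(buf, len, sum);
}

/* Pick the routine once, based on the CPU feature word. */
static csum_fn select_csum(uint32_t capability)
{
    return (capability & FEATURE_MMX) ? csum_mmx_placeholder : csum_generic;
}
```

Selecting the routine once, for example at boot, keeps the feature test out of the per-packet path and lets one binary serve both 386-class and MMX processors.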

Adam J. Richter __ ______________ 4880 Stevens Creek Blvd, Suite 205
adam@yggdrasil.com \ / San Jose, California 95129-1034
+1 408 261-6630 | g g d r a s i l United States of America
fax +1 408 261-6631 "Free Software For The Rest Of Us."
