Re: Direct access to hardware

From: Richard B. Johnson (root@chaos.analogic.com)
Date: Wed Jul 26 2000 - 09:58:17 EST


On Wed, 26 Jul 2000, Horst von Brand wrote:

> James Sutherland <jas88@cam.ac.uk> said:
> > On 25 Jul 2000, Krzysztof Halasa wrote:
>
> [...]
>
> > > So what does the kernel (can) do to prevent this problem on defective
> > > pentiums?
>
> > Trap the defective instructions, and implement a replacement instruction
> > in software.
>
> How so? The kernel doesn't look at each instruction before it is
> executed... it does clean up _after_ the fact, or arranges so that the bug
> can be caught when it tries to bite (f00f case). This is _very_ different
> from exhaustively checking beforehand (as is being advocated here), and it
> is done in cases where the fix is cheap (luckily).
> --
 
Doc, it seems that many don't know how it works. I will amplify just a
bit.

If we have a defective Pentium whose floating-point unit gives bad
answers, it just can't be fixed in kernel software, because every
floating-point instruction would have to be checked and, if the bad
sequence was found, replaced with work-around code. However, such a
processor can still be used if the application software makes certain
that the sequence that doesn't work is never executed. This can be
done by modifying the code generation of a compiler and recompiling
the application software. The result has little impact upon software
performance since you just make sure that the bad sequence never
exists.

In the problem cited, the actual FP errors could have been fixed by
scaling both operands by the same constant before doing the division,
which leaves the quotient unchanged. This gets us out of the range of
the missing entries in the internal FP divide lookup table. However,
Intel decided to replace the defective CPUs even though work-arounds
existed. The FP unit error would not allow a rogue user to crash a
machine. It just produced wrong answers under certain circumstances.
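
A minimal sketch of the scaling idea, in C. It assumes the commonly
described variant of the work-around in which both operands are scaled
by 15/16 before dividing. The function names are hypothetical, and a
real compiler work-around would first test whether the divisor is one
of the at-risk bit patterns rather than scaling every division.

```c
/* Hypothetical helper: divide x by y after scaling both operands by the
 * same constant. The constant cancels, so the quotient is mathematically
 * unchanged, but the divisor handed to the hardware is a different bit
 * pattern. 15/16 is an assumption taken from the commonly described
 * work-around; it is exactly representable in binary floating point. */
double scaled_divide(double x, double y)
{
    const double scale = 15.0 / 16.0;
    return (x * scale) / (y * scale);
}

/* The classic self-check: on a correct divider this residual is zero or
 * negligibly small; a flawed divider returns a visibly wrong quotient
 * for certain operand pairs, which makes the residual large. */
double division_residual(double x, double y)
{
    return x - (x / y) * y;
}
```

The operand pair usually quoted for the self-check is 4195835.0 and
3145727.0; on a working FPU division_residual() for that pair is
negligible.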

Then we get to a fatal error such as the F00F bug. This involves
a locked instruction with an illegal operand. Application software
would never generate such an illegal instruction sequence unless
the purpose of the software was to crash the CPU. Once the sequence
was known, every hacker with a shell account on an Intel machine had
the capability of crashing the machine.

Intel either had to replace the chip or come up with a workaround.
They came up with a workaround that has zero impact upon the machine
performance. Of course, the implementation details were left up to
the designers of Linux and BSD. The first fix was applied to BSD.
A more workman-like fix was applied to Linux, then both systems
revised their fixes by sharing information. The result is that, unless
you actually execute the bad code sequence, the CPU runs at full speed.
There is no "filtering" of the instruction code sequence. No code is
"checking" anything.

What was done was to mark the page holding the table entry for the
'illegal opcode' trap as 'page not present'. This was just a bit in
a page-table entry. Then, and only then, if the F00F instruction
sequence occurred, the CPU took a page fault instead of locking up,
and the page-fault handler could inspect the code. In fact, since it
was an illegal opcode anyway, the handler for the illegal opcode was
simply moved into the page-fault handler and no "inspection" was
necessary. You just seg-fault the task if it executes a bad opcode.
You don't have to inspect anything.
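
The dispatch described above can be modelled in user space roughly as
follows. This is a sketch, not the kernel's code: classify_fault() and
the enum names are hypothetical. The only hardware facts it relies on
are that a 32-bit protected-mode interrupt table entry is 8 bytes and
that the invalid-opcode exception is vector 6.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
enum fault_action {
    HANDLE_AS_PAGE_FAULT,      /* ordinary page fault, normal path  */
    HANDLE_AS_INVALID_OPCODE   /* fault came from the F00F sequence */
};

#define IDT_ENTRY_SIZE        8u  /* bytes per 32-bit IDT gate       */
#define INVALID_OPCODE_VECTOR 6u  /* #UD, the illegal-opcode vector  */

/* Decide how to treat a fault from the faulting address alone. If the
 * address lies inside the table slot for the illegal-opcode trap, the
 * CPU was trying to deliver that trap, so the task is treated as having
 * executed a bad opcode. Nothing in the instruction stream is examined. */
enum fault_action classify_fault(uintptr_t fault_addr, uintptr_t idt_base)
{
    if (fault_addr >= idt_base) {
        uintptr_t vector = (fault_addr - idt_base) / IDT_ENTRY_SIZE;
        if (vector == INVALID_OPCODE_VECTOR)
            return HANDLE_AS_INVALID_OPCODE;
    }
    return HANDLE_AS_PAGE_FAULT;
}
```

The cost is one subtraction and one compare, paid only on a page fault,
which is why ordinary code is not slowed down.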

The result was a software fix, actually handled in hardware. It
is a good example of hardware and software working together to
find a good solution to a potentially bad problem.

Cheers,
Dick Johnson

Penguin : Linux version 2.2.15 on an i686 machine (797.90 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.



