Re: [patch, 2.1.124] Re: BogoMIPS

Richard B. Johnson (root@chaos.analogic.com)
Wed, 7 Oct 1998 08:28:27 -0400 (EDT)


On Tue, 6 Oct 1998, MOLNAR Ingo wrote:

>
> On Tue, 6 Oct 1998, Richard B. Johnson wrote:
>
> > > Sorry, it is not. Someone showed timings of an empty loop, and they
> > > were dependent on the placement of the code in the page. (It makes
> > > plenty of sense if the code happens to cross a page boundary.)
> >
> > It does not make any sense. That kind of situation is the primary
> > reason for the instruction-queue (which includes the 32-byte code-
> > queue + the prefetcher + the external cache). When register-to-
> > register looping operations occur, everything will have already been
> > fetched. It is only when memory operands occur, or you run off the
> > cache-line with new code, that any new code fetches occur.
>
> depending on how smart the TLB code is, it _can_ make a lot of sense. It's
> in the L1 cache, but it's still physically indexed, while the CPU needs a
> linear address. So we go through the TLB phase no matter how well it is
> already cached. And it's naive to think that in such a scenario, going
> through page boundaries is a simple matter in the CPU, especially for a
> typically stream-like memory access pattern, like code fetching.
>

It is not that complicated. All we need to do is make sure that no
new fetches occur while the loop is executing. You do this by making
sure that the actual loop starts at the beginning of a cache-line,
that the jump within the loop goes back to that same beginning, that
the loop can fit within 4 bytes of a new cache-line, and that a new
cache-line fetch has occurred immediately before the loop is started.
This will keep the prefetch logic quiet during the loop.
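
As a rough user-mode illustration of the alignment idea only (this is
not either of the earlier patches, which are not reproduced here), a
minimal sketch in C might look like the following. It assumes GCC or
Clang, whose `aligned` function attribute places the function entry on
the requested boundary. The names `CACHE_LINE`, `aligned_delay` and
`entry_is_aligned` are hypothetical, and the 32-byte figure is simply
the queue size mentioned above. Note that this aligns the function
entry, not necessarily the top of the loop itself; pinning the loop
top would need an assembler `.align` directive.

```c
#include <stdint.h>

/* Assumed line size, taken from the 32-byte figure quoted in the thread. */
#define CACHE_LINE 32

/*
 * Hypothetical delay loop. The aligned attribute (GCC/Clang extension)
 * asks the compiler to start the function on a CACHE_LINE boundary.
 */
__attribute__((aligned(CACHE_LINE), noinline))
unsigned long aligned_delay(unsigned long loops)
{
    unsigned long done = 0;

    while (loops--) {
        /* Empty asm keeps the compiler from folding the loop away. */
        __asm__ volatile("" : "+r"(done));
        done++;
    }
    return done;    /* number of iterations actually executed */
}

/* Returns 1 if the function entry sits on a CACHE_LINE boundary. */
int entry_is_aligned(void)
{
    return ((uintptr_t)aligned_delay % CACHE_LINE) == 0;
}
```

Calling `entry_is_aligned()` before timing `aligned_delay()` gives a
quick check that the compiler honoured the alignment request.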

There are lots of ways of doing this. I showed two. I also wanted to
make sure that the code could be tested in user-mode.

> anyway, i'd be curious whether the attached patch shows any effect on
> machines that are affected by BogoMIPS irregularities. We cannot .align
> PAGE_SIZE due to ld limitations, but we can pull another trick to force
> __delay on a page boundary. (the current version wastes about 2000 bytes

[SNIPPED patch]

I have tested your patch and it works. However, so did my two previous
ones. Your patch accomplishes the same thing (quiets the prefetch during
the loop).

Cheers,
Dick Johnson
***** FILE SYSTEM WAS MODIFIED *****
Penguin : Linux version 2.1.123 on an i586 machine (66.15 BogoMips).
Warning : It's hard to remain at the trailing edge of technology.
