Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch

From: Jamie Lokier
Date: Wed Oct 01 2003 - 02:21:29 EST


Andi Kleen wrote:
> Andrew Morton <akpm@xxxxxxxx> writes:
>
> > Looking at Andi's patch, it is also a dead box if the fault happens inside
> > down_write(mmap_sem). That should be fixed, methinks.
>
> The only way to fix all that would be to move the instruction checks early
> into the fast path.
>
> [On a P4 the overhead is 3.7268 vs 3.6594 microseconds for a fault that
> doesn't hit as measured by lmbench2's lat_sig. This was before the latest
> changes which added more checking, so the overhead is probably bigger now]

Hi Andi!

I think the mmap_sem problems are fixed by an appropriate "address >=
TASK_SIZE" check at the beginning do_page_fault, which should jump
straight to bad_area_nosemaphore. As there is already such a check,
there's no penalty for rearranging the logic there, and it will in
fact speed up kernel faults slightly by avoiding the mmap_sem and
find_vma() which are redundant for kernel faults.

I have some ideas for speeding up __is_prefetch(). First, take the
get_segment_eip() stuff from my patch - that should speed up your
latest "more checking" slightly, because it replaces the access_ok()
checks with something slightly tighter.

Second, instead of masking and a switch statement, do test_bit on a
256-bit vector. (I admit I'm not quite sure how fast variable
test_bit is; this is assuming it is quite fast). If it's zero, return
0 from __is_prefetch(). If it's one, it's either a prefix byte to
skip or 0x0f, to check the next byte for a prefetch. That'll probably
make the code smaller too, because the vector is only 32 bytes,
shorter than the logic in the switch().

Third, once you have taken the "limit" variable idea from my patch,
clamp that at "instr + 14", and so remove the small amount of loop
setup and iteration overhead which is there to restrict the
instruction to 15 bytes. Build it in to the other limit check.

Fourth, without a switch statement that's no need for a scan_more
variable. Setting a boolean variable in one place to test it on
another is still surprisingly poor with GCC - it doesn't remove the
variable and turn the code into direct jumps as you'd think, much of
the time. Just test the bit vector and if it's zero, return 0,
otherwise carry on to check if the byte is 0x0f, and if not, loop.

Fifth, the "if (regs->eip == addr)" check - is it helpful on 32-bit?

Thoughts for the day,
-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/