RE: [PATCH] add slice by 8 algorithm to crc32.c

From: Bob Pearson
Date: Fri Aug 05 2011 - 11:51:24 EST

Next message: Christoph Lameter: "Re: kernel BUG at mm/vmscan.c:1114"
Previous message: Stephen Warren: "RE: [RFC PATCH 0/3] If an IRQ is a GPIO, request and configure it"
In reply to: Joakim Tjernlund: "RE: [PATCH] add slice by 8 algorithm to crc32.c"
Next in thread: Bob Pearson: "RE: [PATCH] add slice by 8 algorithm to crc32.c"

> > version. While I haven't done the experiment you suggest there is something
> > to the point that the second q computation in the new version can be moved
> > ahead of the table lookups from the first q computation. My guess is that
> > the unrolled version will be significantly slower.
>
> Ah, didn't see that. Don't understand how this works though.
> Why do you do two 32 bit loads instead of one 64 bit load?
>
> >

The two expression trees can be computed in parallel and combined with the
final xor. If the compiler and the instruction scheduler are smart enough, and
the CPU can process enough instructions per cycle, the two trees overlap well
and you get some speedup. I did try a single 64 bit load on Nehalem but got
about 2 cycles per byte, which is a little worse than doing two 32 bit loads
and better than the 32 bit version. I'm not really sure why.
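
To make the "two expression trees" concrete, here is a rough sketch of a
slice by 8 inner loop. It is not the code from the patch: the names
(crc_tab, load32_le, crc32_slice8 and so on) are made up for illustration,
it assumes the little endian, reflected polynomial 0xEDB88320, and it
assembles each 32 bit word from bytes to stay portable. Only q1 depends on
the previous crc value, so the four lookups for q2 can be issued without
waiting for it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table layout: crc_tab[0] is the usual byte-at-a-time table. */
static uint32_t crc_tab[8][256];

/* Build the eight tables for the reflected polynomial 0xEDB88320. */
static void crc32_init_tables(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
            c = (c >> 1) ^ ((c & 1) ? 0xEDB88320u : 0u);
        crc_tab[0][i] = c;
    }
    for (uint32_t i = 0; i < 256; i++)
        for (int t = 1; t < 8; t++)
            crc_tab[t][i] = (crc_tab[t - 1][i] >> 8) ^
                            crc_tab[0][crc_tab[t - 1][i] & 0xff];
}

/* Portable little endian 32 bit load. */
static uint32_t load32_le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Reference: one byte per step, used here for the tail as well. */
static uint32_t crc32_bytewise(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--)
        crc = (crc >> 8) ^ crc_tab[0][(crc ^ *p++) & 0xff];
    return crc;
}

/* Slice by 8 using two 32 bit loads per 8 bytes of input. */
static uint32_t crc32_slice8(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len >= 8) {
        uint32_t q1 = crc ^ load32_le(p);   /* depends on the previous crc */
        uint32_t q2 = load32_le(p + 4);     /* independent of the previous crc */

        /* First tree: lookups driven by q1. */
        uint32_t a = crc_tab[7][q1 & 0xff] ^ crc_tab[6][(q1 >> 8) & 0xff] ^
                     crc_tab[5][(q1 >> 16) & 0xff] ^ crc_tab[4][q1 >> 24];
        /* Second tree: lookups driven by q2, can overlap with the first. */
        uint32_t b = crc_tab[3][q2 & 0xff] ^ crc_tab[2][(q2 >> 8) & 0xff] ^
                     crc_tab[1][(q2 >> 16) & 0xff] ^ crc_tab[0][q2 >> 24];

        crc = a ^ b;                        /* the final xor combines both */
        p += 8;
        len -= 8;
    }
    return crc32_bytewise(crc, p, len);     /* remaining 0..7 bytes */
}
```

Both functions work on the raw register, so the caller applies the usual
pre and post inversion, e.g. crc32_slice8(0xFFFFFFFF, buf, len) ^ 0xFFFFFFFF
after calling crc32_init_tables() once. Whether the two trees really overlap
depends on the compiler and the CPU, as described above.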

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
