Re: [PATCH] poly1305: generic C can be faster on chips with slow unaligned access

From: Jason A. Donenfeld
Date: Mon Nov 07 2016 - 13:23:25 EST


On Mon, Nov 7, 2016 at 7:08 PM, Jason A. Donenfeld <Jason@xxxxxxxxx> wrote:
> Hmm... The general data flow that strikes me as most pertinent is
> something like:
>
> struct sk_buff *skb = get_it_from_somewhere();
> skb = skb_share_check(skb, GFP_ATOMIC);
> num_frags = skb_cow_data(skb, ..., ...);
> struct scatterlist sg[num_frags];
> sg_init_table(sg, num_frags);
> skb_to_sgvec(skb, sg, ..., ...);
> blkcipher_walk_init(&walk, sg, sg, len);
> blkcipher_walk_virt_block(&desc, &walk, BLOCK_SIZE);
> while (walk.nbytes >= BLOCK_SIZE) {
> size_t chunk_len = rounddown(walk.nbytes, BLOCK_SIZE);
> poly1305_update(&poly1305_state, walk.src.virt.addr, chunk_len);
> blkcipher_walk_done(&desc, &walk, walk.nbytes % BLOCK_SIZE);
> }
> if (walk.nbytes) {
> poly1305_update(&poly1305_state, walk.src.virt.addr, walk.nbytes);
> blkcipher_walk_done(&desc, &walk, 0);
> }
>
> Is your suggestion that that in the final if block, walk.src.virt.addr
> might be unaligned? Like in the case of the last fragment being 67
> bytes long?

In fact, I'm not so sure this happens here. In the while loop, each
new walk.src.virt.addr will be aligned to BLOCK_SIZE or be aligned by
virtue of being at the start of a new page. In the subsequent if
block, walk.src.virt.addr will either be
some_aligned_address+BLOCK_SIZE, which will be aligned, or it will be
a start of a new page, which will be aligned.

So what did you have in mind exactly?

I don't think anybody is running code like:

for (size_t i = 0; i < len; i += 17)
poly1305_update(&poly, &buffer[i], 17);

(And if so, those consumers should be fixed.)