Re: e1000 performance hack for ppc64 (Power4)

From: Anton Blanchard (
Date: Fri Jun 13 2003 - 18:15:01 EST

> 2-byte alignment is bad for ppc64, so what is minimum alignment that is
> good? (I hope it's not 2K!) What could you do at a higher level to
> present properly aligned buffers to the driver?

Best case 128 bytes - ppc64 cacheline size. Worst case 512 bytes - our
pci-pci bridge likes to prefetch in 512 byte increments. From Herman's
data you can see this in action:

Linux sending one 1514 byte packet (777 Mbits sec rate)

      Address PCI command Bytes xfered
      420A8 MRM 344
      42200 MRM 512
      42400 MRM 512
      42600 MRL 128
      42680 MR 18

With the buffer 512 byte aligned this is completed in 3 transactions.
Yes the hardware could be more intelligent about these unaligned
transactions but we cant do much about that now.

We might be able to do the alignment at a higher level but its not
straightforward (see my previous mail).

> If I'm understanding the patch correctly, you're saying unaligned DMA +
> TCE lookup is more expensive than a data copy? If we copy the data, we
> loss some of the benefits of TSO and Zerocopy and h/w checksum
> offloading! What is more expensive, unaligned DMA or TCE?

Lets ignore TCE lookup overhead for the moment. As Herman pointed all
these DMAs should occur on the same page.

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at
Please read the FAQ at

This archive was generated by hypermail 2b29 : Sun Jun 15 2003 - 22:00:38 EST