RE: e1000 performance hack for ppc64 (Power4)

From: Feldman, Scott (
Date: Thu Jun 12 2003 - 20:16:11 EST

David, arch-specific code in the driver concerns me. I would really
like to avoid any arch-specific code in the driver if at all possible.
Can we find a way to move this work to the arch-dependent areas? This
doesn't seem to be an issue unique to e1000, so moving this up one level
should benefit other devices as well. More questions below.

> Peculiarities in the PCI bridges on Power4 based ppc64 machines mean
> that unaligned DMAs are horribly slow. This hits us hard on gigabit
> transfers, since the packets (starting from the MAC header) are
> usually only 2-byte aligned.

2-byte alignment is bad for ppc64, so what is minimum alignment that is
good? (I hope it's not 2K!) What could you do at a higher level to
present properly aligned buffers to the driver?

> The patch below addresses this by copying and realigning packets into
> nicely 2k aligned buffers. As well as fixing the alignment that
> minimises the number of TCE lookups, and because we allocate the
> buffers pci_alloc_consistent(), we avoid setting up and tearing down
> the TCE mappings for each packet.

If I'm understanding the patch correctly, you're saying unaligned DMA +
TCE lookup is more expensive than a data copy? If we copy the data, we
loss some of the benefits of TSO and Zerocopy and h/w checksum
offloading! What is more expensive, unaligned DMA or TCE?

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at
Please read the FAQ at

This archive was generated by hypermail 2b29 : Sun Jun 15 2003 - 22:00:35 EST