Re: readX/writeX semantic and ordering

Gabriel Paubert (paubert@iram.es)
Tue, 14 Dec 1999 12:37:42 +0100 (MET)


On Mon, 13 Dec 1999, Gerard Roudier wrote:

> > Gerard> Can somebody elaborate, especially about readX/writeX
> > Gerard> implementation for PPC. Thanks.
> >
> > This was discussed some months ago here and the consensus is that
> > readl/writel are supposed to guarantee ordering just as they do on the
> > x86. However on the Alpha they didn't use to and this was only updated
> > somewhere during 2.3.x, 2.2.x has the old behavior. The PPC macros on
> > the other hand do include the ordering.
> >
> > With the new interface you also have the option of using
> > __raw_writel/__raw_readl to get non ordered, native byte order access
> > to the PCI shared memory, which allows you to optimize the access
> > better in case it makes sense for your device and you know what you
> > are doing.
>
> The PPC headers seem to me unclear about CPU STOREs to memory being
> carried out from CPU to system BUS prior to the IO/MMIO being performed.
>
> On this example
>
> extern inline void out_le32(volatile unsigned *addr, int val)
> {
>         __asm__ __volatile__("stwbrx %1,0,%2; eieio" : "=m" (*addr) :
>                              "r" (val), "r" (addr));
> }
>
> I must be sure that 'stwbrx', in this context, does carry out buffered CPU
> STOREs to system bus prior to the IO being issued, in order to assume
> IA32-like ordering. You may let me know if this is true. Thanks.

Cachable CPU stores (whether buffered due to a cache miss or not) will not
be carried out to system bus, they will only be put into the cache. The
architecture does not guarantee anything about the ordering in theory,
but in all current implementations the only effect is that loads may be
moved ahead of pending stores and stores may be combined (but not on
guarded memory which happens to be the case of areas returned by ioremap).

Are the previous buffered stores to cachable memory or not? This may
make a difference AFAIK.

> My concern is not to criticize or to change the current writeX/readX
> semantic, but to be aware of how they are supposed to deal with ordering.
> I am able to add any memory barrier that is needed, if I can figure out
> the remaining ordering issues drivers have to be careful about.

Well, I consider the current state not completely satisfactory, and I have
an example in a driver in which I have to write a few PCI device registers
to control a DMA operation and then write to one register to trigger the
operation:

- if I use only writel, one extra eieio instruction is needlessly added
after each writel while I don't care in which order the registers are set.

- if I use __raw_writel I have to byte swap with cpu_to_le32 which
generates more code. I then have to add iobarrier_w() just before the last
writel but this is normal.

So in the end I defined my own macros to access the registers of this
device: sometimes I don't care about reordering/merging for 5 or so
consecutive register writes which are a mixture of big and little endian
accesses. Without considering code bloat, eieio is expensive on some
processors since it goes to the bus to tell the bridge not to perform
reordering.
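
To make the trade-off concrete, here is a rough user-space sketch of the
two ways of programming such a device. It is only a model: every sim_*
name, the register layout and the barrier counter are made up for
illustration and are not the kernel accessors or my driver's macros. The
"barrier" merely counts how many eieio-like operations each variant would
issue, and the byte swap assumes a big-endian CPU talking to a
little-endian device.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register indices of an imaginary DMA-capable device. */
enum { REG_SRC = 0, REG_DST = 1, REG_LEN = 2, REG_GO = 3, NUM_REGS = 4 };

static uint32_t sim_regs[NUM_REGS];  /* stands in for the MMIO window */
static unsigned sim_barrier_count;   /* eieio-like barriers issued so far */

/* Byte-reverse a 32-bit value, as stwbrx does on a big-endian CPU. */
static uint32_t sim_swab32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Stand-in for iobarrier_w()/eieio: it only counts invocations. */
static void sim_iobarrier_w(void)
{
    sim_barrier_count++;
}

/* Unordered store in native byte order, in the spirit of __raw_writel. */
static void sim_raw_write32(size_t reg, uint32_t val)
{
    sim_regs[reg] = val;
}

/* Ordered little-endian store modelled on out_le32: swap, store, barrier. */
static void sim_write_le32(size_t reg, uint32_t val)
{
    sim_raw_write32(reg, sim_swab32(val));
    sim_iobarrier_w();
}

/* Variant 1: every register write carries its own trailing barrier. */
static void sim_start_dma_ordered(uint32_t src, uint32_t dst, uint32_t len)
{
    sim_write_le32(REG_SRC, src);
    sim_write_le32(REG_DST, dst);
    sim_write_le32(REG_LEN, len);
    sim_write_le32(REG_GO, 1);
}

/* Variant 2: unordered setup writes, one barrier, then the trigger. */
static void sim_start_dma_relaxed(uint32_t src, uint32_t dst, uint32_t len)
{
    sim_raw_write32(REG_SRC, sim_swab32(src));
    sim_raw_write32(REG_DST, sim_swab32(dst));
    sim_raw_write32(REG_LEN, sim_swab32(len));
    sim_iobarrier_w();             /* setup must land before the trigger */
    sim_write_le32(REG_GO, 1);     /* the trigger keeps its own barrier */
}
```

In this model both variants leave identical values in the registers, but
the first issues four barriers and the second only two. The saving grows
with the number of setup registers, which is why I ended up with my own
macros.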

> By the way, I tried to find the PPC manuals on the Net and didn't find any
> pointer to such documentations. If some exists, I am very interested in
> knowing of them.

It seems they have disappeared from Motorola's web sites. I can send you
the PDF files (it's called the Programming Environment Manual, or PEM for
short) if you are interested.

Gabriel.
