Re: [PATCH] asm-generic/io.h: Fix io{read,write}{16,32}be for bigendian systems

From: Lars-Peter Clausen
Date: Tue Jan 18 2011 - 17:22:38 EST

Next message: Jan Kara: "Re: [PATCH] ext2: reduce redundant check of '*options'"
Previous message: Nick Piggin: "Re: BUG: spinlock recursion (sys_chdir, user_path_at, do_path_lookup ...)"
In reply to: Arnd Bergmann: "Re: [PATCH] asm-generic/io.h: Fix io{read,write}{16,32}be for big endian systems"
Next in thread: Arnd Bergmann: "Re: [PATCH] asm-generic/io.h: Fix io{read,write}{16,32}be for big endian systems"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 01/18/2011 10:37 PM, Arnd Bergmann wrote:
> On Tuesday 18 January 2011 21:54:59 Lars-Peter Clausen wrote:
>>>
>>> Right, but the header file also serves as a template for new architectures
>>> that cannot directly use it. I would prefer not to give a possibly bad example
>>> here, especially when it's in a rarely used function.
>>
>> Maybe I'm missing something here, but if I have a big-endian architecture isn't
>> ioread{16,32}be what I should use to access iomapped memory?
>
> Most I/O devices are little-endian, even for big-endian machines, and
> should use readl or ioread. If you have big-endian SoC components,
> ioread*be is often the right choice, but that case is rather rare.

The lm32 architecture is a big-endian softcpu architecture. I'm currently working on
the MilkyMist SoC which uses it and all the SoC components have native endianess.

>
> Some architectures also define their own I/O accessors for SoC components,
> since those often have other requirements from PCI MMIO areas.
> E.g. on powerpc, the in_be32/in_le32 accessor only works on directly
> mapped MMIO regions and performs no PCI error handling.

I've seen those and actually the lm32 architecture currently defines (and uses) them
as well. But I wanted to replace them with something more generic and try to reuse as
much as possible from asm-generic.

> On ARM, the
> readl_relaxed() accessor does not synchronize with external buses.
> On x86, readl is different from ioread32 in that it cannot work on
> addresses returned from ioport_map.
> I believe some SoCs are even configurable to have little- or big-endian
> I/O, so the accessor does not do byte swapping.
>
> It might be a good idea to make all this a little more structured, but
> it's also fine if you set your own rules for a new architecture when
> it has non-PCI devices that work in other ways.
>
>>>>> The right solution is probably to use swab16/swab32 for the
>>>>> big-endian functions. This also corrects the iowrite functions
>>>>> which really should be using cpu_to_be32 instead of be32_to_cpu
>>>>> (although they are always defined to be the same afaict.
>>>>
>>>> This would first cause a conversion to little-endian, which is a swap() in the
>>>> generic case and then you would call swap() again on the result. Which is basically a
>>>> noop, but I'm not sure if compilers will detect this.
>>>
>>> The overhead of the swab() is certainly dwarfed by the long time spent in
>>> readl().
>>
>> Well at least the code size overhead is fundamental:
>
> Fair enough. You could of course make it out of line, but then you would
> no longer be able to use the generic implementation of these functions.
>
>> with #define ioread32be(addr) swap32(ioread32(addr)):
>>
>> 4001a694 <get_cycles>:
>> addi sp,sp,-16
>> sw (sp+16),r11
>> sw (sp+12),r12
>> sw (sp+8),r13
>> sw (sp+4),ra
>> mvhi r2,0x4021
>> ori r2,r2,0xa100
>> lw r1,(r2+0)
>> mvi r2,24
>> mvhi r13,0xff
>> lw r12,(r1+0)
>> mv r1,r12
>> calli 400f6f9c <__lshrsi3>
>> mv r11,r1
>> mvi r2,24
>> mv r1,r12
>> calli 400f6f6c <__ashlsi3>
>> or r11,r11,r1
>> mvi r2,8
>> andi r1,r12,0xff00
>> ...
>
> That is indeed huge. Byte swapping is a relatively common operation
> in the kernel, so independent of the solution to this particular
> problem, it will be a good idea to see if you can do a better
> implementation than this, using inline assembly or gcc internal
> helpers.

The reason why it got so huge is that the kernel was compiled without barrel-shifter
support, so we have basically 4 instructions per shift calling a helper function.

But thats not the point anyway. The point I'm trying to make is, that accessing
iomapped memory is exactly a single instruction. And I don't see why - no matter if
swab takes 4 or 20 instructions - we should add any additional overhead.
>
>> So I as someone who implements arch support has two options either redefine
>> ioread32be in the arch io header, or use __raw_readl everywhere to access iomap memory.
>
> __raw_readl is not a good thing to use, because of a number of reasons.
> Please choose one of these four:
>
> * change the common ioread*/iowrite* functions to all be based on
> the __raw_* I/O versions, not just the big-endian ones. The
> space overhead you quoted is enough of a justification for that.

I would prefer this solution.

> * change asm-generic/io.h so you can override the definitions
> with architecture specific implementations.
> * use GENERIC_IOMAP.
> * define your own bus-specific accessors that are big-endian and
> based on __raw_readl/__raw_writel.
>
> Arnd

- Lars
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Jan Kara: "Re: [PATCH] ext2: reduce redundant check of '*options'"
Previous message: Nick Piggin: "Re: BUG: spinlock recursion (sys_chdir, user_path_at, do_path_lookup ...)"
In reply to: Arnd Bergmann: "Re: [PATCH] asm-generic/io.h: Fix io{read,write}{16,32}be for big endian systems"
Next in thread: Arnd Bergmann: "Re: [PATCH] asm-generic/io.h: Fix io{read,write}{16,32}be for big endian systems"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]