Re: rseq/arm32: choosing rseq code signature

From: Mathieu Desnoyers
Date: Wed Apr 17 2019 - 11:30:42 EST


----- On Apr 17, 2019, at 10:43 AM, Mathieu Desnoyers mathieu.desnoyers@xxxxxxxxxxxx wrote:

> ----- On Apr 17, 2019, at 6:37 AM, richard earnshaw Richard.Earnshaw@xxxxxxx wrote:
>
>> On 16/04/2019 14:39, Mathieu Desnoyers wrote:
>>> ----- On Apr 15, 2019, at 9:37 AM, Mathieu Desnoyers mathieu.desnoyers@xxxxxxxxxxxx wrote:
>>>
>>>> ----- On Apr 15, 2019, at 9:30 AM, peter maydell peter.maydell@xxxxxxxxxx wrote:
>>>>
>>>>> On Mon, 15 Apr 2019 at 14:11, Mathieu Desnoyers
>>>>> <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>>>>>>
>>>>>> ----- On Apr 11, 2019, at 3:55 PM, peter maydell peter.maydell@xxxxxxxxxx wrote:
>>>>>>
>>>>>>> On Thu, 11 Apr 2019 at 18:51, Mathieu Desnoyers
>>>>>>> <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>>>>>>>> * This translates to the following instruction pattern in the T16 instruction
>>>>>>>> * set:
>>>>>>>> *
>>>>>>>> * little endian:
>>>>>>>> * def3 udf #243 ; 0xf3
>>>>>>>> * e7f5 b.n <7f5>
>>>>>>>> *
>>>>>>>> * big endian:
>>>>>>>> * e7f5 b.n <7f5>
>>>>>>>> * def3 udf #243 ; 0xf3
>>>>>>>
>>>>>>> Do we really care about big-endian instruction-ordering for Thumb?
>>>>>>> It requires (AIUI) either an ARMv7R CPU which implements and sets
>>>>>>> SCTLR.IE to 1, or a v6-or-earlier CPU using BE32, and it's going to
>>>>>>> be even rarer than normal BE8 big-endian...
>>>>>>
>>>>>> I don't think we care enough about it to look for a trick to
>>>>>> turn the branch into something else (which would not branch away from the
>>>>>> udf instruction), but considering this signature will be ABI, it's good to
>>>>>> be thorough documentation-wise and cover all existing cases.
>>>>>
>>>>> I think if you want to document it it would be helpful to
>>>>> readers to make it clear that this is the ultra-rare
>>>>> big-endian-instruction-order "big endian Thumb", not the only
>>>>> moderately-rare little-endian-instructions-big-endian-data
>>>>> "big endian Thumb".
>>>>
>>>> I'm actually very much concerned about environments with big endian
>>>> data and little endian code. Which gcc compiler flags do I need to
>>>> use to test it ?
>>>>
>>>> I'm concerned about a signature mismatch between what is passed to
>>>> the rseq system call ("data-endian signature") and what is generated
>>>> in the code ("instruction-endian signature").
>>>
>>> Based on this page:
>>> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0360f/CDFBBCHB.html
>>>
>>> My understanding is that the situation is as follows (please confirm):
>>>
>>> - Prior to ARMv6, you could build and run code that is either big or little
>>>   endian, given you had a matching Linux kernel endianness. Code and data
>>>   endianness needed to match,
>>> - Starting from ARMv6, only little endian code is supported. The endianness
>>>   for data access can be changed through bit [9], the E bit, of the Program
>>>   Status Register (mixed endianness).
>>>
>>> Looking at ARM build options for gcc, it seems you can select either big or
>>> little endian (-mbig-endian or -mlittle-endian (default)) which affects both
>>> instruction and data endianness. So I suspect the -mbig-endian option is
>>> really only useful for pre-ARMv6.
>>
>> -mbig-endian is still correct, even on later architectures. The linker
>> gets involved, however, and (using the mapping symbol information) swaps
>> the code segments to little-endian form (this is why you have to use
>> .inst rather than .word when inserting instructions, so that the correct
>> mapping symbols are inserted).
>
> So what you're saying is that if I have:
>
> void main()
> {
>         asm volatile (
>                 ".arm\n\t"
>                 ".inst 0xe7f5def3\n\t"
>                 ".long 0xe7f5def3\n\t");
> }
>
> and compile it with:
>
> arm-linux-gnueabihf-gcc -mbig-endian -march=armv6 -c -o arm-big-endianv6.o arm-test-endian.c
>
> It's expected that the generated .o will have big endian instructions, matching
> the endianness of the data, e.g.:
>
> hexdump arm-big-endianv6.o
>
> [...]
> 0000030 0a00 0900 80b5 00af f5e7 f3de f5e7 f3de
>
> But it's then at the linking stage that the linker will
> reverse the endianness of the ".inst" (but not .long).
>
> Let's see:
>
> arm-linux-gnueabihf-gcc -nodefaultlibs -nostdlib -mbig-endian -march=armv6 -o arm-big-endianv6 arm-big-endianv6.o
> /usr/lib/gcc-cross/arm-linux-gnueabihf/7/../../../../arm-linux-gnueabihf/bin/ld: warning: cannot find entry symbol _start; defaulting to 00000000000001b0
>
> hexdump gives me:
> [...]
> 00001b0 80b5 00af f5e7 f3de f5e7 f3de c046 bd46
>
> So it has not reversed the instruction endianness.
>
> What am I doing wrong ?

The lack of swapping seems to be specific to armv6 and armv7* with gcc 7.
gcc 8 does seem to reverse the code endianness relative to the data.

So we need to figure out whether .inst is the right thing to
do to declare a signature, or if it's better to use ".long",
which would probably generate an invalid instruction on BE...

Thanks,

Mathieu

>
> I'm using:
>
> gcc version 7.3.0 (Ubuntu/Linaro 7.3.0-27ubuntu1~18.04)
> GNU ld (GNU Binutils for Ubuntu) 2.30
>
> Thanks,
>
> Mathieu
>
>>
>>>
>>> For ARMv6+ mixed-endianness, it seems to be a mode that temporarily swaps the
>>> endianness of load/store instructions for specific memory accesses
>>> communicating with DMA devices, so I don't see any scenario where we can
>>> generate a binary that has little endian code and big endian data. If that is
>>> true, then it should be fine to declare the signature with ".arm .inst" and
>>> expect the data endianness to be the same as code endianness.
>>>
>>> Am I missing something ?
>>>
>>> Thanks,
>>>
>>> Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com