Re: [PATCH 5/5] ARM: asm/div64.h: adjust to generic code

From: Måns Rullgård
Date: Thu Nov 19 2015 - 11:46:35 EST


Nicolas Pitre <nicolas.pitre@xxxxxxxxxx> writes:

> On Thu, 19 Nov 2015, Måns Rullgård wrote:
>
>> Nicolas Pitre <nicolas.pitre@xxxxxxxxxx> writes:
>>
>> > +static inline uint64_t __arch_xprod_64(uint64_t m, uint64_t n, bool bias)
>> > +{
>> > +	unsigned long long res;
>> > +	unsigned int tmp = 0;
>> > +
>> > +	if (!bias) {
>> > +		asm (	"umull %Q0, %R0, %Q1, %Q2\n\t"
>> > +			"mov %Q0, #0"
>> > +			: "=&r" (res)
>> > +			: "r" (m), "r" (n)
>> > +			: "cc");
>> > +	} else if (!(m & ((1ULL << 63) | (1ULL << 31)))) {
>> > +		res = m;
>> > +		asm (	"umlal %Q0, %R0, %Q1, %Q2\n\t"
>> > +			"mov %Q0, #0"
>> > +			: "+&r" (res)
>> > +			: "r" (m), "r" (n)
>> > +			: "cc");
>> > +	} else {
>> > +		asm (	"umull %Q0, %R0, %Q2, %Q3\n\t"
>> > +			"cmn %Q0, %Q2\n\t"
>> > +			"adcs %R0, %R0, %R2\n\t"
>> > +			"adc %Q0, %1, #0"
>> > +			: "=&r" (res), "+&r" (tmp)
>> > +			: "r" (m), "r" (n)
>>
>> Why is tmp using a +r constraint here? The register is not written, so
>> using an input-only operand could/should result in better code. That is
>> also what the old code did.
>
> No, it is worse. gcc allocates two registers because, somehow, it
> doesn't think that the first one still holds zero after the first usage.
> This way usage of only one temporary register is forced throughout,
> producing better code.

Makes sense. Thanks for explaining.
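
For anyone following along, here is a rough portable C model of what I
understand the helper to compute. This is only a sketch: it assumes the
function is meant to return the upper 64 bits of m * n, plus m when bias
is set, which is my reading of the quoted fragment rather than anything
stated in the patch. The name xprod_64_ref is made up for illustration
and is not part of the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical reference model (not kernel code): returns the upper
 * 64 bits of the 128-bit value m * n + (bias ? m : 0), built from
 * 32x32->64 partial products, much as the umull/umlal sequence does.
 */
static uint64_t xprod_64_ref(uint64_t m, uint64_t n, bool bias)
{
	uint32_t m_lo = (uint32_t)m, m_hi = (uint32_t)(m >> 32);
	uint32_t n_lo = (uint32_t)n, n_hi = (uint32_t)(n >> 32);

	uint64_t ll = (uint64_t)m_lo * n_lo;	/* weight 2^0  */
	uint64_t lh = (uint64_t)m_lo * n_hi;	/* weight 2^32 */
	uint64_t hl = (uint64_t)m_hi * n_lo;	/* weight 2^32 */
	uint64_t hh = (uint64_t)m_hi * n_hi;	/* weight 2^64 */
	uint64_t carry = 0;
	uint64_t mid;

	if (bias) {
		/* Add m to the low product and keep the carry out of bit 63. */
		ll += m;
		carry = ll < m;
	}

	/* Sum the 2^32-weighted pieces; at most three 32-bit values, no overflow. */
	mid = (ll >> 32) + (uint32_t)lh + (uint32_t)hl;

	/* Everything at weight 2^64 and above forms the result. */
	return hh + (lh >> 32) + (hl >> 32) + (mid >> 32) + carry;
}
```

The explicit carry variable plays the role of the always-zero tmp
register in the adc above: it exists only to turn the carry flag into a
value.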

--
Måns Rullgård
mans@xxxxxxxxx