Re: [PATCH] x86/AMD: Fix ASM constraints in amd_clear_divider()

From: Andrew Cooper
Date: Wed Aug 09 2023 - 19:12:03 EST


On 09/08/2023 10:33 pm, Linus Torvalds wrote:
> On Wed, 9 Aug 2023 at 13:24, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>> DIV writes its results into %eax and %edx, meaning that they need to be output
>> constraints too. It happens to be benign in this case as the registers don't
>> change value, but the compiler should still know.
>>
>> Fixes: 77245f1c3c64 ("x86/CPU/AMD: Do not leak quotient data after a division by 0")
> As mentioned earlier (html, not on list), I think it was intentional
> and this "fix" doesn't really fix anything.
>
> A comment might be good, of course, if this really bothers somebody.
>
> That said, if the code wanted to be *really* clever, it could have done
>
> asm volatile(ALTERNATIVE("", "div %0", X86_BUG_DIV0)
> :: "a" (1), "d" (0));
>
> instead. One less register used, and the same "no change to register
> state" approach.

Yeah, I spotted that as an option, and it does save one whole zeroing
idiom...

But IMO, the risk of someone copy&pasting this as if it were a good
example, and the debugging thereafter ought to be enough of a reason to
avoid klever tricks to save 1 line of C.

> Of course, who knows what early-out algorithm the divider uses, and
> maybe it's cheaper to do 0/1 than it is to do 1/1. Not that I think we
> should care. The main reason to be cute here would be just to be cute.

AMD said "any non-faulting divide".  Which still isn't as helpful as it
could be, because according to Agner Fogh:

             Uops Latency
DIV  r8/m8    1    13-16
DIV  r16/m16  2    14-21
DIV  r32/m32  2    14-30
DIV  r64/m64  2    14-46
IDIV r8/m8    1    13-16
IDIV r16/m16  2    13-21
IDIV r32/m32  2    14-30
IDIV r64/m64  2    14-47

DIV %al looks to be the firm favourite choice.

Assuming the one extra cycle is just for the double-pumped uop, then the
best latency for a divide is 13 cycles across the board.

It doesn't make sense to optimise this as a fastpath.  After all, what
fool would put a real divide-by-1 in their code...

> That said, if you were to use this pattern in *real* code (as opposed
> to in that function that will never be called in reality because
> nobody divides by zero in real code), the register clobbers might be
> useful just to make sure the compiler doesn't re-use a zero register
> content that is then behind the latency of the dummy divide. But
> again, this particular code really doesn't _matter_ in that sense.

Well - that's a different question.

An attacker skilled in the art can easily hide #DE in the transient
shadow of something else, and plenty of people got very skilled in this
particular art trying to make better Meltdown exploits.

So I don't think putting any scrubbing in the #DE handler is going to
stop a real attack.  But I'm just speculating.

~Andrew

P.S. https://www.amd.com/system/files/documents/security-whitepaper.pdf
currently says

"The divide by zero (#DE) fault is signaled[sic] on the integer divide
instructions. No data is forwarded to younger, dependent operations for
speculative execution on this fault."

which needs to be revisited.  Zen1 was the latest-and-greatest when that
whitepaper was written.