KVM vs AMD: Re: [PATCH v3 48/59] x86/retbleed: Add SKL return thunk

From: Andrew Cooper
Date: Thu Nov 03 2022 - 18:54:07 EST


On 21/10/2022 16:21, Nathan Chancellor wrote:
> On Fri, Oct 21, 2022 at 11:53:09AM +0200, Peter Zijlstra wrote:
>> On Thu, Oct 20, 2022 at 04:10:28PM -0700, Nathan Chancellor wrote:
>>> This commit is now in -next as commit 5d8213864ade ("x86/retbleed: Add
>>> SKL return thunk"). I just bisected an immediate reboot on my AMD test
>>> system when starting a virtual machine with QEMU + KVM to it (see the
>>> bisect log below). My Intel test systems do not show this.
>>> Unfortunately, I do not have much more information, as there are no logs
>>> in journalctl, which makes sense as the reboot occurs immediately after
>>> I hit the enter key for the QEMU command.
>>>
>>> If there is any further information I can provide or patches I can test
>>> for further debugging, I am more than happy to do so.
>> Moo :-(
>>
>> you happen to have a .config for me?
> Sure thing, sorry I did not provide it in the first place! Attached. It
> has been run through localmodconfig for the particular machine but I
> assume the core pieces should still be present.

Following up from some debugging on IRC.

The problem is that FILL_RETURN_BUFFER now has a per-cpu variable
access, and AMD SVM has a fun optimisation where the VMRUN instruction
doesn't swap, amongst other things, %gs.

Per-cpu variables only become safe to access again after
vmload(__sme_page_pa(sd->save_area)); in svm_vcpu_enter_exit().
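
To make the ordering problem concrete, here is a rough user-space model
of it. This is not kernel code: every name in it (toy_cpu, toy_vmload,
toy_vmrun and so on) is made up for illustration, and the "save area"
carries only a GS base. The one property it tries to capture is that
the VMRUN analogue leaves the GS base alone, so a per-cpu style read
done before the host VMLOAD analogue resolves against guest state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one CPU's per-cpu data block. */
struct percpu_area {
    int call_depth;
};

/* Hypothetical stand-in for a save area: models only the GS base. */
struct toy_save_area {
    struct percpu_area *gs_base;
};

/* Hypothetical CPU model: gs_base plays the role of the %gs base. */
struct toy_cpu {
    struct percpu_area *gs_base;
};

/* VMSAVE analogue: store the current GS base into a save area. */
static void toy_vmsave(const struct toy_cpu *cpu, struct toy_save_area *sa)
{
    sa->gs_base = cpu->gs_base;
}

/* VMLOAD analogue: load the GS base from a save area. */
static void toy_vmload(struct toy_cpu *cpu, const struct toy_save_area *sa)
{
    cpu->gs_base = sa->gs_base;
}

/* VMRUN analogue: deliberately does not touch the GS base. */
static void toy_vmrun(struct toy_cpu *cpu)
{
    (void)cpu;
}

/* Per-cpu read analogue: resolves through whatever gs_base is now. */
static int toy_this_cpu_read(const struct toy_cpu *cpu)
{
    assert(cpu->gs_base != NULL);
    return cpu->gs_base->call_depth;
}

/*
 * Returns the call_depth value a per-cpu read observes after the
 * VMRUN analogue. With host_vmload_first != 0 the host GS base is
 * restored before the read; with 0 the read happens too early.
 */
static int call_depth_seen_after_vmrun(int host_vmload_first)
{
    struct percpu_area host_data = { .call_depth = 7 };
    struct percpu_area guest_data = { .call_depth = -1 };
    struct toy_save_area host_sa = { NULL };
    struct toy_save_area guest_sa = { &guest_data };
    struct toy_cpu cpu = { &host_data };

    toy_vmsave(&cpu, &host_sa);   /* stash host GS base */
    toy_vmload(&cpu, &guest_sa);  /* load guest GS base */
    toy_vmrun(&cpu);              /* guest runs, then exits */

    if (host_vmload_first) {
        toy_vmsave(&cpu, &guest_sa);  /* save guest GS base */
        toy_vmload(&cpu, &host_sa);   /* restore host GS base */
    }

    return toy_this_cpu_read(&cpu);
}
```

In this model call_depth_seen_after_vmrun(1) reads the host's value (7),
while call_depth_seen_after_vmrun(0) reads the guest's value (-1), which
is the analogue of FILL_RETURN_BUFFER touching a per-cpu variable too
early.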

Given that retbleed=force ought to work on non-Skylake hardware, the
appropriate fix is to move the VMLOAD/VMSAVE instructions down into asm
and put them adjacent to VMRUN.

This also addresses an undocumented dependency where it's only the memory
clobber in vmload() which stops the compiler moving
svm_vcpu_enter_exit()'s calculation of sd into an unsafe position.
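
For anyone unfamiliar with that dependency, a minimal sketch of what a
"memory" clobber does, assuming GCC or Clang extended asm. The names
toy_barrier, toy_slot and read_around_barrier are hypothetical, and the
empty asm statement merely stands in for the real instruction. The
effect is on code generation only: the compiler may not cache or move
memory accesses across the asm statement, so the program's output is
the same with or without it.

```c
/* Hypothetical stand-in for a per-cpu slot. */
static int toy_slot;

/*
 * Empty asm statement with a "memory" clobber. The compiler has to
 * assume it may read or write any memory, so loads and stores cannot
 * be reordered across it or kept in registers over it.
 */
static inline void toy_barrier(void)
{
    __asm__ __volatile__("" ::: "memory");
}

/* Returns before * 10 + after, where both are reads of toy_slot. */
static int read_around_barrier(void)
{
    int before;
    int after;

    toy_slot = 1;
    before = toy_slot;   /* read on the "unsafe" side */
    toy_slot = 2;

    toy_barrier();       /* nothing below may be hoisted above here */

    after = toy_slot;    /* must be a fresh read after the barrier */
    return before * 10 + after;
}
```

Without any such clobber between them, nothing at the C level tells the
compiler that an address calculation has to stay below a particular
instruction, which is why relying on it implicitly is fragile.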

~Andrew