Re: [PATCH] x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h

From: Andy Lutomirski
Date: Fri Aug 16 2019 - 11:20:15 EST


On 8/14/19 2:17 PM, Lendacky, Thomas wrote:
> From: Tom Lendacky <thomas.lendacky@xxxxxxx>
>
> There have been reports of RDRAND issues after resuming from suspend on
> some AMD family 15h and family 16h systems. This issue stems from BIOS
> not performing the proper steps during resume to ensure RDRAND continues
> to function properly.

Can you or someone from AMD document *precisely* what goes wrong here? The APM is crystal clear:

Hardware modifies the CF flag to indicate whether the value returned in the destination register is valid. If CF = 1, the value is valid. If CF = 0, the value is invalid.

If BIOS screws up and somehow RDRAND starts failing and returning CF = 0, then I think it's legitimate to call it a BIOS bug. Some degree of documentation would be nice, as would a way for BIOS to indicate to the OS that it does not have this bug.

But, from the reports, it sounds like RDRAND starts failing, setting CF = 1, and returning 0xFFFF.... in the destination register. If true, then this is, in my book, a severe CPU bug. Software is supposed to be able to trust that, if RDRAND sets CF = 1, the result is a cryptographically secure random number, even if everything else in the system is actively malicious. On a SEV-ES system, this should be considered a security hole -- even if the hypervisor and BIOS collude, RDRAND in the guest should work as defined by the manual.
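
To make the distinction concrete, here is a minimal C sketch of the two cases: a retry loop that honors the CF contract, plus a heuristic check for the reported failure mode (CF = 1 with the same value every time). Everything in it is an assumption for illustration: the names rdrand_step_fn, rdrand_retry, rdrand_looks_sane and the fake_* stand-ins are hypothetical, the retry budget of 10 is arbitrary, and this is not the check used by the patch or the kernel. The hardware step is only compiled when the compiler is told RDRAND is available (for example with -mrdrnd).

```c
#include <stdbool.h>
#include <stdint.h>

/* One RDRAND attempt: returns true when CF = 1 (value valid), false when CF = 0. */
typedef bool (*rdrand_step_fn)(uint64_t *out);

#define RDRAND_RETRIES 10 /* assumed retry budget, not taken from the APM */

#if defined(__RDRND__) && defined(__x86_64__)
#include <immintrin.h>
/* Real hardware step, built only when RDRAND support is enabled. */
bool hw_rdrand_step(uint64_t *out)
{
    unsigned long long v = 0;
    int cf = _rdrand64_step(&v); /* intrinsic returns the CF flag */

    *out = v;
    return cf != 0;
}
#endif

/* Honor the documented contract: retry while the step reports CF = 0. */
bool rdrand_retry(rdrand_step_fn step, uint64_t *out)
{
    for (int i = 0; i < RDRAND_RETRIES; i++) {
        if (step(out))
            return true;
    }
    return false;
}

/*
 * Heuristic sanity check (needs samples >= 2): returns false if the step
 * keeps reporting CF = 0, or if every "valid" sample is the same value,
 * which is the failure mode described in the reports.
 */
bool rdrand_looks_sane(rdrand_step_fn step, int samples)
{
    uint64_t first, v;
    bool changed = false;

    if (!rdrand_retry(step, &first))
        return false;
    for (int i = 1; i < samples; i++) {
        if (!rdrand_retry(step, &v))
            return false;
        if (v != first)
            changed = true;
    }
    return changed;
}

/* Hypothetical stand-ins for hardware, used only to exercise the checks. */
bool fake_stuck_step(uint64_t *out)   { *out = UINT64_MAX; return true; } /* CF = 1, all ones */
bool fake_failing_step(uint64_t *out) { (void)out; return false; }        /* CF = 0 always */
bool fake_counter_step(uint64_t *out) { static uint64_t n; *out = n++; return true; }
```

With these stand-ins, rdrand_looks_sane() rejects both fake_stuck_step and fake_failing_step and accepts fake_counter_step. Note the asymmetry: the CF = 0 case is something software can handle on its own, while the stuck-value case can only be caught by a heuristic like this one, and passing the heuristic proves nothing about the quality of the output.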

So, can you clarify what is actually going on? And, if there is an issue where the CPU does not behave as documented in the APM, can AMD issue an erratum? And ideally also fix it in microcode or in a new stepping and give an indication that the issue is fixed?