Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

From: Tom Lendacky
Date: Wed Oct 06 2021 - 10:02:04 EST


On 10/6/21 8:23 AM, Alex Deucher wrote:
On Wed, Oct 6, 2021 at 5:42 AM Borislav Petkov <bp@xxxxxxxxx> wrote:

On Tue, Oct 05, 2021 at 10:48:15AM -0400, Alex Deucher wrote:
It's not incompatible per se, but SME requires the IOMMU be enabled
because the C bit used for encryption is beyond the dma_mask of most
devices. If the C bit is not set, the en/decryption for DMA doesn't
occur. So you need IOMMU to be enabled in remapping mode to use SME
with most devices. Raven has further requirements in that it requires
IOMMUv2 functionality to support some features which currently use a
direct mapping in the IOMMU and hence the C bit is not properly
handled.

So lemme ask you this: do Raven-containing systems exist out there which
don't have IOMMUv2 functionality and which can cause boot failures when
SME is enabled in the kernel .config?

There could be some OEM systems that disable the IOMMU on the platform
and don't provide a switch in the BIOS to enable it. The GPU driver
will still work in that case; it will just not be able to enable KFD
support for ROCm compute. SME won't work for most devices in that
case however since most devices have a DMA mask too small to handle
the C bit for encryption. SME should be dependent on IOMMU being
enabled.

That's not completely true. If the IOMMU is not enabled (off or in passthrough mode), then the DMA API will check the DMA mask and use SWIOTLB to bounce the DMA if the device doesn't support DMA at the position where the C-bit is located (see force_dma_unencrypted() in arch/x86/mm/mem_encrypt.c).

To avoid bounce buffering, though, commit 2cc13bb4f59f was introduced to disable passthrough mode when SME is active (unless iommu=pt was explicitly specified).
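
As a rough illustration of the two decisions described above, here is a small userspace sketch. It is not the kernel code: bit_mask(), needs_bounce() and use_passthrough() are hypothetical names made up for this example, and the logic is only a simplified model of the behavior as I described it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask with the low n address bits set (n must be 0..64). */
static uint64_t bit_mask(unsigned n)
{
    assert(n <= 64);
    return n == 64 ? UINT64_MAX : ((UINT64_C(1) << n) - 1);
}

/*
 * Hypothetical model of the DMA mask check: if the device's DMA mask
 * does not reach the C-bit, the device cannot issue encrypted DMA and
 * the transfer has to be bounced through unencrypted memory (SWIOTLB).
 */
static bool needs_bounce(uint64_t dev_dma_mask, unsigned c_bit_pos)
{
    uint64_t below_c_bit = bit_mask(c_bit_pos); /* address bits under the C-bit */

    return dev_dma_mask <= below_c_bit;
}

/*
 * Hypothetical model of the default IOMMU mode choice: with SME active,
 * passthrough is kept only when it was explicitly requested (iommu=pt).
 */
static bool use_passthrough(bool config_default_pt, bool sme_active,
                            bool pt_on_cmdline)
{
    if (pt_on_cmdline)
        return true;
    return config_default_pt && !sme_active;
}
```

Assuming, purely for illustration, a C-bit at position 47, needs_bounce(bit_mask(32), 47) is true (a 32-bit device gets bounced) while needs_bounce(bit_mask(48), 47) is false. Likewise use_passthrough(true, true, false) is false, so such a system would fall back to remapping mode rather than bounce buffering.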

Thanks,
Tom



IOW, can we handle this at boot time properly, i.e., disable SME if we
detect Raven or IOMMUv2 support is missing?

If not, then we really will have to change the default.

I'm not an SME expert, but I thought that that was already the case.
We just added the error condition in the GPU driver to prevent the
driver from loading when the user forced SME on. IIRC, there were
users that cared more about SME than graphics support.

Alex


Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette