Re: [PATCH] drm/amdgpu: avoid integer overflow warning in amdgpu_device_resize_fb_bar()

From: Christian König
Date: Tue Jul 04 2023 - 10:51:30 EST


Am 04.07.23 um 16:31 schrieb Arnd Bergmann:
On Tue, Jul 4, 2023, at 14:33, Christian König wrote:
Am 04.07.23 um 14:24 schrieb Arnd Bergmann:
On Tue, Jul 4, 2023, at 08:54, Christian König wrote:
Am 03.07.23 um 14:35 schrieb Arnd Bergmann:
Not sure I understand the specific requirement. Do you mean the entire
amdgpu driver requires 64-bit BAR addressing, or just the bits that
resize the BARs?
Well a bit of both.

Modern AMD GPUs have 16GiB of local memory (VRAM), making those
accessible to a CPU which can only handle 32bit addresses by resizing
the BAR is impossible to begin with.

But going a step further even without resizing it is pretty hard to get
that hardware working on such an architecture.
I'd still like to understand this part better, as we have a lot of
arm64 chips with somewhat flawed PCIe implementations, often with
a tiny 64-bit memory space, but otherwise probably capable of
using a GPU.

Yeah, those are unfortunately very well known to us :(

What exactly do you expect to happen here?

a) Use only part of the VRAM but otherwise work as expected
b) Access all of the VRAM, but at a performance cost for
bank switching?

We have tons of x86 systems where we can't resize the BAR (because of lack of BIOS setup of the root PCIe windows). So bank switching is still perfectly supported.

c) Require kernel changes to make a) or b) work, otherwise
fail to load
d) have no chance of working even with driver changes

Yeah, that is usually what happens on those arm64 system with flawed PCIe implementations.

The problem is not even BAR resize, basically we already had tons of customers which came to us and complained that amdgpu doesn't load or crashes the system after a few seconds.

After investigating (which sometimes even includes involving engineers from ARM) we usually find that those boards doesn't even remotely comply to the PCIe specification, both regarding power as well as functional things like DMA coherency.

Regards,
Christian.


It might be cleaner to just not build the whole driver on such systems
or at least leave out this function.
How about this version? This also addresses the build failure, but
I don't know if this makes any sense:

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1325,6 +1325,9 @@ int amdgpu_device_resize_fb_bar(struct amdgpu_device *adev)
u16 cmd;
int r;
+ if (!IS_ENABLED(CONFIG_PHYS_ADDR_T_64BIT))
+ return 0;
+
Yes, if that suppresses the warning as well then that makes perfect
sense to me.
Ok, I'll send that as a v2 then.

Arnd