Re: [PATCH v3 00/14] Driver of Intel(R) Gaussian & Neural Accelerator

From: Thomas Zimmermann
Date: Mon May 17 2021 - 16:11:25 EST


Hi

On 17.05.21 at 21:32, Daniel Stone wrote:
Hi,

On Mon, 17 May 2021 at 20:12, Thomas Zimmermann <tzimmermann@xxxxxxx> wrote:
On 17.05.21 at 09:40, Daniel Vetter wrote:
We have, it's called drivers/gpu. Feel free to rename to drivers/xpu or
think G as in General, not Graphics.

I hope this was a joke.

Just some thoughts:

AFAICT AI first came as an application of GPUs, but has now
evolved/specialized into something of its own. I can imagine sharing
some code among the various subsystems, say GEM/TTM internals for memory
management. Besides that there's probably little that can be shared in
the userspace interfaces. A GPU is a device that puts an image onto the
screen and an AI accelerator isn't.

But it isn't. A GPU is a device that has a kernel-arbitrated MMU
hosting kernel-managed buffers, executes user-supplied compiled
programs with reference to those buffers and other jobs, and informs
the kernel about progress.

KMS lies under the same third-level directory, but even when GPU and
display are on the same die, they're totally different IP blocks
developed on different schedules which are just periodically glued
together.

I mentioned this elsewhere: it's not about the chip architecture, it's about the UAPI. In the end, the GPU is about displaying things on a screen, even if the rendering and the scanout engines are on different IP blocks (or different devices).

The fact that one can do general purpose computing on a GPU is a byproduct of the evolution of graphics hardware. It never was the goal.



Treating both as the same, even if
they share similar chip architectures, seems like a stretch. They might
evolve in different directions and fit less and less under the same
umbrella.

Why not? All we have in common in GPU land right now is MMU + buffer
references + job scheduling + synchronisation. None of this has common
top-level API, or even a common top-level model. It's not just ISA
differences, but we have very old-school devices where the kernel
needs to fill registers on every job, living next to middle-age devices
where the kernel and userspace co-operate to fill a ring buffer,
living next to modern devices where userspace does some stuff and then
the hardware makes it happen with the bare minimum of kernel
awareness.

I see all this as an example of why AI should not live under gpu/. Many generations of GPUs with different feature sets are already supported there. Why lump more behind the same abstractions if AI can take a fresh start? Why should we care about AI, and why should AI care about all our legacy?

We can still share all the internal code if AI needs any of it. Meanwhile AI drivers can provide their own UAPIs until a common framework emerges.

Again, just my 2 cents.

Best regards
Thomas


Honestly I think there's more difference between lima and amdgpu than
there is between amdgpu and current NN/ML devices.

Cheers,
Daniel


--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer
