Re: New subsystem for acceleration devices

From: Jiho Chu
Date: Thu Aug 04 2022 - 05:27:51 EST


On Thu, 4 Aug 2022 09:46:49 +0300
Oded Gabbay <oded.gabbay@xxxxxxxxx> wrote:

> On Thu, Aug 4, 2022 at 2:32 AM Daniel Stone <daniel@xxxxxxxxxxxxx> wrote:
> >
> > Hi Oded,
> >
> > On Wed, 3 Aug 2022 at 21:21, Oded Gabbay <oded.gabbay@xxxxxxxxx> wrote:
> > > The reason it happened now is because I saw two drivers, which are
> > > doing h/w acceleration for AI, trying to be accepted to the misc
> > > subsystem.
> >
> > Why misc?
> You will need to ask them ;)
> Seriously, I guess they thought they were not gpu drivers and didn't
> find anything else to go to.
> And at least for one of them, I remember Greg and Arnd pointing them to misc.
>

Hi, Daniel.
The Samsung NPU driver is one of those trying to become a misc device. There
are several reasons it chose misc, but put simply, GPU was not a perfect fit
for an NPU.
AI workloads are not limited to graphics jobs; they can be NLP, data analysis
or training jobs. GPU/DRM can work for them, but it was not designed with them
in mind.
e.g. AI workloads need to manage AI model data as well as input data. I guess
this could work with GEM objects, which would need to be expanded to carry
model information. But I question whether DRM would accept such a specialized
GEM object, since it is not related to graphics.
Other subsystems were similar, so misc was the only choice I had.

IMHO, for the same reason, I'm positive about Oded's work, and I expect that
the new subsystem could be more specialized for AI workloads.

thanks,
Jiho

> >
> > > Regarding the open source userspace rules in drm - yes, I think your
> > > rules are too limiting for the relatively young AI scene, and I saw at
> > > the 2021 kernel summit that other people from the kernel community
> > > think that as well.
> > > But that's not the main reason, or even a reason at all for doing
> > > this. After all, at least for habana, we open-sourced our compiler and
> > > a runtime library. And Greg also asked those two drivers if they have
> > > matching open-sourced user-space code.
> > >
> > > And a final reason is that I thought this can also help in somewhat
> > > reducing the workload on Greg. I saw in the last kernel summit there
> > > was a concern about bringing more people to be kernel maintainers so I
> > > thought this is a step in the right direction.
> >
> > Can you please explain what the reason is here?
> >
> > Everything you have described - uniform device enumeration, common job
> > description, memory management helpers, unique job submission format,
> > etc - applies exactly to DRM. If open userspace is not a requirement,
> > and bypassing Greg's manual merging is a requirement, then I don't see
> > what the difference is between DRM and this new bespoke subsystem. It
> > would be great to have these differences enumerated in email as well
> > as in kerneldoc.
> I don't think preparing such a list at this point is relevant, because
> I don't have a full-featured subsystem ready, which I can take and
> list all its features and compare it with drm.
> I have a beginning of a subsystem, with very minimal common code, and
> I planned for it to grow with time and with the relevant participants.
>
> And regarding the userspace issue, I believe it will be less stringent
> than in drm.
> For example, afaik in drm you must upstream your LLVM fork to the
> mainline LLVM tree. This is something that is really a heavy-lifting
> task for most, if not all, companies.
> So this is a requirement I think we can forgo.
>
> Thanks,
> Oded
>
> >
> > Cheers,
> > Daniel
>