Re: [PATCH 16/17] cgroup/drm: Expose memory stats

From: Tejun Heo
Date: Wed Jul 26 2023 - 15:50:05 EST


Hello,

On Wed, Jul 26, 2023 at 05:44:28PM +0100, Tvrtko Ursulin wrote:
...
> > So, yeah, if you want to add memory controls, we better think through how
> > the fd ownership migration should work.
>
> It would be quite easy to make the implicit migration fail - just the matter
> of failing the first ioctl, which is what triggers the migration, after the
> file descriptor access from a new owner.

So, it'd be best if there's no migration involved at all as per the
discussion with Maarten.

> But I don't think I can really add that in the RFC given I have no hard
> controls or anything like that.
>
> With GPU usage throttling it doesn't really apply, at least I don't think it
> does, since even when migrated to a lower budget group it would just get
> immediately de-prioritized.
>
> I don't think hard GPU time limits are feasible in general, and while soft
> might be, again I don't see that any limiting would necessarily have to run
> immediately on implicit migration.

Yeah, I wouldn't worry about hard allocation of GPU time. CPU RT control
does that but it's barely used.

> Second part of the story are hypothetical/future memory controls.
>
> I think first thing to say is that implicit migration is important, but it
> is not really established to use the file descriptor from two places or to
> migrate more than once. It is simply fresh fd which gets sent to clients
> from Xorg, which is one of the legacy ways of doing things.
>
> So we probably can just ignore that given no significant amount of memory
> ownership would be getting migrated.

So, if this is the case, it'd be better to clarify this. ie. if the summary is:

fd gets assigned to the user with a certain action at which point the fd
doesn't have significant resources attached to it and the fd can't be moved
to some other cgroup afterwards.

then, everything is pretty simple. No need to worry about migration at all.
fd just gets assigned once at the beginning and everything gets accounted
towards that afterwards.

> And for drm.memory.stat I think what I have is good enough - both private
> and shared data get accounted, for any clients that have handles to
> particular buffers.
>
> Maarten was working on memory controls so maybe he would have more thoughts
> on memory ownership and implicit migration.
>
> But I don't think there is anything incompatible with that and
> drm.memory.stats as proposed here, given how the categories reported are the
> established ones from the DRM fdinfo spec, and it is fact of the matter that
> we can have multiple memory regions per driver.
>
> The main thing that would change between this RFC and future memory controls
> in the area of drm.memory.stat is the implementation - it would have to get
> changed under the hood from "collect on query" to "account at
> allocation/free/etc". But that is just implementation details.

I'd much prefer to straighten out this before adding a prelimiary stat only
thing. If the previously described ownership model is sufficient, none of
this is complicated, right? We can just add counters to track the resources
and print them out.

Thanks.

--
tejun