Re: [PATCH 2/9] perf/core: Add PERF_SAMPLE_CGROUP feature

From: Namhyung Kim
Date: Mon Sep 02 2019 - 22:13:16 EST


Hi Tejun,

Sorry for the late reply.


On Fri, Aug 30, 2019 at 09:58:15PM -0700, Tejun Heo wrote:
> Hello,
>
> On Sat, Aug 31, 2019 at 12:03:26PM +0900, Namhyung Kim wrote:
> > Hmm.. it looks hard to use fhandle as the identifier since perf
> > sampling is done in NMI context. AFAICS the encode_fh part seems ok
> > but getting dentry/inode from a kernfs_node seems not.
> >
> > I assume kernfs_node_id's ino and gen are the same as its inode's. Then
> > we might use kernfs_node for encoding but not sure you like it ;-)
>
> Oh yeah, the whole cgroup id situation is kinda shitty and it's likely
> that it needs to be cleaned up a bit for this to be used widely. The
> issues are...
>
> * As identifiers, paths suck. They're too big and unwieldy and can be
> rapidly reused for different instances.
>
> * ino is compact but can't be easily mapped to path from userland and
> also not unique.
>
> * The fhandle identifier - currently ino+gen - is better in that it's
> finite sized and compact and can be efficiently mapped to path from
> userspace. It's also mostly unique. However, the way gen is
> currently generated still has some chance of the same ID getting
> reused and it isn't easily accessible from inside the kernel right
> now.
>
> Eventually, where we wanna be at is having a single 64bit identifier
> which can be easily used everywhere. It should be pretty
> straightforward on 64bit machines - we can just use a monotonically
> increasing id and use it for everything - ino, fhandle and internal
> cgroup id.
> On 32bit, it gets a bit complicated because ino is 32bit, so it'll
> need a custom allocator which bumps gen when the lower 32bit wraps and
> skips in-use inos. Once we have that, we can use that for cgrp->id
> and fhandle and derive ino from it.
>
> This is on the to-do list but obviously hasn't happened yet. If you
> wanna take it on, great, but, otherwise, what can be done now is
> either moving gen+ino generation into cgroup and telling kernfs to use
> it or copying gen+ino into cgroup for easier access. The former likely is
> the better approach given that it brings us closer to where we wanna
> be eventually.

So is my understanding below correct?

* currently kernfs ino+gen is different from the inode's ino+gen
* but it'd be better to make them the same
* so move the (generic?) inode ino+gen logic into cgroup
* and have the kernfs node use the same logic (and numbers)
* so the perf sampling code (NMI) can just access the kernfs node
* and userspace can use the file handle for comparison
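
To make the 32bit case concrete, here is a rough userspace sketch of the
allocator described in the quoted text: a 64bit id built as gen:ino, where gen
is bumped when the lower 32 bits wrap and in-use inos are skipped. This is only
an illustration, not kernel code. All names (id_alloc, id_alloc_get, MAX_LIVE,
...) are hypothetical, ino 0 being reserved is an assumption, and the linear
in-use table stands in for whatever the kernel would really use (locking is
ignored entirely).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LIVE 64	/* hypothetical cap on live ids, sketch only */

struct id_alloc {
	uint32_t next_ino;		/* next candidate for the low 32 bits */
	uint32_t gen;			/* high 32 bits, bumped on wrap */
	uint32_t live[MAX_LIVE];	/* inos currently in use */
	size_t nr_live;
};

static void id_alloc_init(struct id_alloc *a)
{
	assert(a);
	a->next_ino = 1;	/* assumption: ino 0 is reserved as invalid */
	a->gen = 1;
	a->nr_live = 0;
}

static uint32_t id_to_ino(uint64_t id) { return (uint32_t)id; }
static uint32_t id_to_gen(uint64_t id) { return (uint32_t)(id >> 32); }

static bool ino_in_use(const struct id_alloc *a, uint32_t ino)
{
	for (size_t i = 0; i < a->nr_live; i++)
		if (a->live[i] == ino)
			return true;
	return false;
}

/* Returns a new 64bit id, or 0 if the (sketch-only) table is full. */
static uint64_t id_alloc_get(struct id_alloc *a)
{
	if (a->nr_live >= MAX_LIVE)
		return 0;

	for (;;) {
		uint32_t ino = a->next_ino;
		uint32_t gen = a->gen;

		/* advance the cursor; on wrap, bump gen and skip ino 0 */
		if (++a->next_ino == 0) {
			a->next_ino = 1;
			a->gen++;
		}

		/* skip inos that are still held by a live object */
		if (ino_in_use(a, ino))
			continue;

		a->live[a->nr_live++] = ino;
		return ((uint64_t)gen << 32) | ino;
	}
}

/* Releases the ino part of @id so it can be reused after a later wrap. */
static void id_alloc_put(struct id_alloc *a, uint64_t id)
{
	uint32_t ino = id_to_ino(id);

	for (size_t i = 0; i < a->nr_live; i++) {
		if (a->live[i] == ino) {
			a->live[i] = a->live[--a->nr_live];
			return;
		}
	}
}
```

With this layout the full 64bit value keeps increasing even across a wrap, so
it could serve as the cgroup id and the fhandle, while the low 32 bits stay
unique among live objects and can be used as ino.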

Thanks,
Namhyung