Re: [PATCH bpf-next v7 0/3] Support storing struct task_struct objects as kptrs

From: John Fastabend
Date: Thu Nov 17 2022 - 17:36:59 EST


David Vernet wrote:
> On Thu, Nov 17, 2022 at 01:03:45PM -0800, John Fastabend wrote:
> > David Vernet wrote:
> > > Now that BPF supports adding new kernel functions with kfuncs, and
> > > storing kernel objects in maps with kptrs, we can add a set of kfuncs
> > > which allow struct task_struct objects to be stored in maps as
> > > referenced kptrs.
> > >
> > > The possible use cases for doing this are plentiful. During tracing,
> > > for example, it would be useful to be able to collect some tasks that
> > > performed a certain operation, and then periodically summarize who they
> > > are, which cgroup they're in, how much CPU time they've utilized, etc.
> > > Doing this now would require storing the tasks' pids along with some
> > > relevant data to be exported to user space, and later associating the
> > > pids to tasks in other event handlers where the data is recorded.
> > > Another useful by-product of this is that it allows a program to pin a
> > > task in a BPF program, and by proxy therefore also e.g. pin its task
> > > local storage.
> >
> > Sorry wasn't obvious to me (late to the party so if it was in some
> > other v* described apologies). Can we say something about the life
> > cycle of this acquired task_structs because they are incrementing
> > the ref cnt on the task struct they have potential to impact system.
>
> We should probably add an entire docs page which describes how kptrs
> work, and I am happy to do that (ideally in a follow-on patch set if
> that's OK with you). In general I think it would be useful to include
> docs for any general-purpose kfuncs like the ones proposed in this set.

Sure, I wouldn't require that for your series though fwiw.

>
> In regards to your specific question about the task lifecycle, nothing
> being proposed in this patch set differs from how kptr lifecycles work
> in general. The idea is that the BPF program:
>
> 1. Gets a "kptr_ref" kptr from an "acquire" kfunc.
> 2. Stores it in a map with bpf_kptr_xchg().
>
> The program can then either later manually extract it from the map
> (again with bpf_kptr_xchg()) and release it, or if the reference is
> never removed from the map, let it be automatically released when the
> map is destroyed. See [0] and [1] for a bit more information.

Yep as long as the ref is decremented on map destroy and elem delete
all good.

>
> [0]: https://docs.kernel.org/bpf/kfuncs.html?highlight=kptr#kf-acquire-flag
> [1]: https://lwn.net/Articles/900749/
>
> > I know at least we've had a few bugs in our task struct tracking
> > that has led to various bugs where we leak references. In our case
> > we didn't pin the kernel object so the leak is just BPF memory and
> > user space memory, still sort of bad because we would hit memory
> > limits and get OOM'd. Leaking kernel task structs is worse though.
>
> I don't think we need to worry about leaks. The verifier should fail to
> load any program that doesn't properly release a kptr, and if it fails
> to identify programs that improperly hold refcounts, that's a bug that
> needs to be fixed. Similarly, if any map implementation (described
> below) fails to properly free references at the appropriate time (when
> an element or the map itself is destroyed), those are just bugs that
> need to be fixed.
>
> I think the relevant tradeoff here is really the possible side effects
> of keeping a task pinned and avoiding it being reaped. I agree that's an
> important consideration, but I think that would arguably apply to any
> kptr (modulo the size of the object being pinned, which is certainly
> relevant as well). We already have kptrs for e.g. struct nf_conn [2].
> Granted, struct task_struct is significantly larger, but bpf_kptr_xchg()
> is only enabled for privileged programs, so it seems like a reasonable
> operation to allow.

No not arguing it shouldn't be possible just didn't see the release
hook.

>
> [2]: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/net/netfilter/nf_conntrack_bpf.c#n253
>
> > quick question. If you put acquired task struct in a map what
> > happens if user side deletes the entry? Presumably this causes the
> > release to happen and the task_struct is good to go. Did I miss
> > the logic? I was thinking you would have something in bpf_map_free_kptrs
> > and a type callback to release() the refcnt?
>
> Someone else can chime in here to correct me if I'm wrong, but AFAIU
> this is handled by the map implementations calling out to
> bpf_obj_free_fields() to invoke the kptr destructor when the element is
> destroyed. See [3] and [4] for examples of where they're called from the
> arraymap and hashmap logic respectively. This is how the destructors are
> similarly invoked when the maps are destroyed.

Yep I found the dtor() gets populated in btf.c and apparently needed
to repull my local tree because I missed it. Thanks for the detailed
response.

And last thing I was checking is because KF_SLEEPABLE is not set
this should be blocked from running on sleepable progs which would
break the call_rcu in the destructor. Maybe small nit, not sure
its worth it but might be nice to annotate the helper description
with a note, "will not work on sleepable progs" or something to
that effect.

Thanks.

>
> [3]: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/kernel/bpf/arraymap.c#n431
> [4]: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/kernel/bpf/hashtab.c#n764
>
> [...]