Re: [PATCH bpf-next v2 3/6] bpf: Introduce process open coded iterator kfuncs

From: Chuyi Zhou
Date: Wed Sep 20 2023 - 08:21:36 EST




在 2023/9/20 07:30, Andrii Nakryiko 写道:
On Sat, Sep 16, 2023 at 7:03 AM Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx> wrote:

Hello.

在 2023/9/16 04:37, Andrii Nakryiko 写道:
On Fri, Sep 15, 2023 at 8:03 AM Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx> wrote:
在 2023/9/15 07:26, Andrii Nakryiko 写道:
On Tue, Sep 12, 2023 at 12:02 AM Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx> wrote:

This patch adds kfuncs bpf_iter_process_{new,next,destroy} which allow
creation and manipulation of struct bpf_iter_process in open-coded iterator
style. BPF programs can use these kfuncs or through bpf_for_each macro to
iterate all processes in the system.

[...cut...]

Few high level thoughts. I think it would be good to follow
SEC("iter/task") naming and approach. Open-coded iterators in many
ways are in-kernel counterpart to iterator programs, so keeping them
close enough within reason is useful for knowledge transfer.

SEC("iter/task") allows to:
a) iterate all threads in the system
b) iterate all threads for a given TGID
c) it also allows to "iterate" a single thread or process, but that's
a bit less relevant for in-kernel iterator, but we can still support
them, why not?

I'm not sure if it supports iterating all processes (as in group
leaders of each task group) in the system, but if it's possible I
think we should support it at least for open-coded iterator, seems
like a very useful functionality.

So to that end, let's design a small set of input arguments for
bpf_iter_process_new() that would allow to specify this as flags +
either (optional) struct task_struct * pointer to represent
task/process or PID/TGID.


Another concern from Alexei was the readability of the API of open-coded
in BPF Program[1].

bpf_for_each(task, curr) is straightforward. Users can easily understand
that this API does the same thing as 'for_each_process' in kernel.

In general, users might have no idea about for_each_process macro in
the kernel, so I don't find this particular argument very convincing.

We can add a separate set of iterator kfuncs for every useful
combination of conditions, of course, but it's a double-edged sword.
Needing to use a different iterator just to specify a different
direction of cgroup iteration (from the example you referred in [1])
also means that it's now harder to write some generic function that
needs to do something for all cgroups matching some criteria where the
order might be coming as an argument.

Similarly for task iterators. It's not hard to imagine some processing
that can be equivalently done per thread or per process in the system,
or on each thread of the process, depending on some conditions or
external configuration. Having to do three different
bpf_for_each(task_xxx, task, ...) for this seems suboptimal. If the
nature of the thing that is iterated over is the same, and it's just a
different set of filters to specify which subset of those items should
be iterated, I think it's better to try to stick to the same iterator
with few simple arguments. IMO, of course, there is no objectively
best approach.


However, if we keep the approach of SEC("iter/task")

enum ITER_ITEM {
ITER_TASK,
ITER_THREAD,
}

__bpf_kfunc int bpf_iter_task_new(struct bpf_iter_process *it, struct
task_struct *group_task, enum ITER_ITEM type)

the API have to chang:


bpf_for_each(task, curr, NULL, ITERATE_TASK) // iterate all process in
the system
bpf_for_each(task, curr, group_leader, ITERATE_THREAD) // iterate all
thread of group_leader
bpf_for_each(task, curr, NULL, ITERATE_THREAD) //iterate all threads of
all the process in the system

Useres may guess what are this API actually doing....

I'd expect users to consult documentation before trying to use an
unfamiliar cutting-edge functionality. So let's try to keep
documentation clear and up to the point. Extra flag argument doesn't
seem to be a big deal.

Thanks for your suggestion!

Before we begin working on the next version, I have outlined a detailed
API design here:

1.task_iter

It will be used to iterate process/threads like SEC("iter/task"). Here
we should better to follow the naming and approach SEC("iter/task"):

enum {
ITERATE_PROCESS,
ITERATE_THREAD,
}

__bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it, struct
task_struct *task, int flag);

If we want to iterate all processes in the system, the iteration will
start from the *task* which is passed from user.(since process in the
system are connected through a linked list)

but will go through all of them anyways, right? it's kind of
surprising from usability standpoint to have to pass some task_struct
to iterate all of them, tbh. I wonder if it's hard to adjust kfunc
validation to allow "nullable" pointers? We can look at that
separately, of course.


Yes, little subtle here. When we want to iterate threads of a task, we may want the verifier to ensure the *task* is a valid pointer, which would require KF_TRUSTED_ARGS. However, when we want to iterate all process/threads in the system, we want to accept a null pointer.

Anyway, I will try to explore a workaround in the verifier here.

Thanks.