Re: [PATCH bpf-next v3 3/7] bpf: Introduce task open coded iterator kfuncs

From: Andrii Nakryiko
Date: Tue Oct 03 2023 - 18:06:06 EST


On Sun, Oct 1, 2023 at 1:21 AM Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx> wrote:
>
> Hello, Andrii
>
> 在 2023/9/30 05:27, Andrii Nakryiko 写道:
> > On Wed, Sep 27, 2023 at 8:29 PM Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx> wrote:
> >>
> >> Hello,
> >>
> >> 在 2023/9/28 07:20, Andrii Nakryiko 写道:
> >>> On Mon, Sep 25, 2023 at 3:56 AM Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx> wrote:
> >>>>
> >>>> This patch adds kfuncs bpf_iter_task_{new,next,destroy} which allow
> >>>> creation and manipulation of struct bpf_iter_task in open-coded iterator
> >>>> style. BPF programs can use these kfuncs or through bpf_for_each macro to
> >>>> iterate all processes in the system.
> >>>>
> >>>> The API design keep consistent with SEC("iter/task"). bpf_iter_task_new()
> >>>> accepts a specific task and iterating type which allows:
> >>>> 1. iterating all process in the system
> >>>>
> >>>> 2. iterating all threads in the system
> >>>>
> >>>> 3. iterating all threads of a specific task
> >>>> Here we also resuse enum bpf_iter_task_type and rename BPF_TASK_ITER_TID
> >>>> to BPF_TASK_ITER_THREAD, rename BPF_TASK_ITER_TGID to BPF_TASK_ITER_PROC.
> >>>>
> >>>> The newly-added struct bpf_iter_task has a name collision with a selftest
> >>>> for the seq_file task iter's bpf skel, so the selftests/bpf/progs file is
> >>>> renamed in order to avoid the collision.
> >>>>
> >>>> Signed-off-by: Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx>
> >>>> ---
> >>>> include/linux/bpf.h | 8 +-
> >>>> kernel/bpf/helpers.c | 3 +
> >>>> kernel/bpf/task_iter.c | 96 ++++++++++++++++---
> >>>> .../testing/selftests/bpf/bpf_experimental.h | 5 +
> >>>> .../selftests/bpf/prog_tests/bpf_iter.c | 18 ++--
> >>>> .../{bpf_iter_task.c => bpf_iter_tasks.c} | 0
> >>>> 6 files changed, 106 insertions(+), 24 deletions(-)
> >>>> rename tools/testing/selftests/bpf/progs/{bpf_iter_task.c => bpf_iter_tasks.c} (100%)
> >>>>
> >>>
>
>
> [...]
>
> >>>> +get_next_task:
> >>>> + kit->pos = next_task(kit->pos);
> >>>> + kit->task = kit->pos;
> >>>> + if (kit->pos == &init_task)
> >>>> + kit->pos = NULL;
> >>>
> >>> I can't say I completely follow the logic (e.g., for
> >>> BPF_TASK_ITER_PROC, why do we do next_task() on first next() call)?
> >>> Can you elabore the expected behavior for various combinations of
> >>> types and starting task argument?
> >>>
> >>
> >> Thanks for the review.
> >>
> >> The expected behavior of current implementation is:
> >>
> >> BPF_TASK_ITER_PROC:
> >>
> >> init_task->first_process->second_process->...->last_process->init_task
> >>
> >> We would exit before visiting init_task again.
> >
> > ah, ok, so in this case it's more like BPF_TASK_ITER_ALL_PROCS, i.e.,
> > we iterate all processes in the system. Input `task` that we provide
> > is ignored/meaningless, right? Maybe we should express it as
> > ALL_PROCS?
> >
> >>
> >> BPF_TASK_ITER_THREAD:
> >>
> >> group_task->first_thread->second_thread->...->last_thread->group_task
> >>
> >> We would exit before visiting group_task again.
> >>
> >
> > And this one is iterating threads of a process specified by given
> > `task`, right? This is where my confusion comes from. ITER_PROC and
> > ITER_THREAD, by their name, seems to be very similar, but in reality
> > ITER_PROC is more like ITER_ALL (except process vs thread iteration),
> > while ITER_THREAD is parameterized by input `task`.
> >
> > I'm not sure what's the least confusing way to name and organize
> > everything, but I think it's quite confusing right now, unfortunately.
> > I wonder if you or someone else have a better suggestion on making
> > this more straightforward?
> >
>
> Maybe here we can introduce new enums and not reuse or rename
> BPF_TASK_ITER_TID/BPF_TASK_ITER_TGID?

Yep, probably it's cleaner

>
> {
> BPF_TASK_ITER_ALL_PROC,

BPF_TASK_ITER_ALL_PROCS

> BPF_TASK_ITER_ALL_THREAD,

BPF_TASK_ITER_ALL_THREADS

> BPF_TASK_ITER_THREAD

BPF_TASK_ITER_PROC_THREADS ?

> }
>
> BPF_TASK_ITER_TID/BPF_TASK_ITER_TGID are inner flags. Looking at the
> example usage of SEC("iter/task"), unlike
> BPF_CGROUP_ITER_DESCENDANTS_PRE/BPF_CGROUP_ITER_DESCENDANTS_POST, we
> actually don't use BPF_TASK_ITER_TID/BPF_TASK_ITER_TGID directly. When
> using SEC("iter/task"), we just set pid/tid for struct
> bpf_iter_link_info. Exposing new enums to users for open coded
> task_iters will not confuse users.
>
> Thanks.
>