Re: [PATCH bpf-next 1/4] bpf: enable task local storage for tracing programs

From: Yonghong Song
Date: Fri Jan 15 2021 - 20:51:32 EST




On 1/15/21 5:12 PM, Song Liu wrote:


On Jan 15, 2021, at 4:55 PM, Yonghong Song <yhs@xxxxxx> wrote:



On 1/15/21 3:34 PM, Song Liu wrote:
On Jan 12, 2021, at 8:53 AM, KP Singh <kpsingh@xxxxxxxxxx> wrote:

On Tue, Jan 12, 2021 at 5:32 PM Yonghong Song <yhs@xxxxxx> wrote:



On 1/11/21 3:45 PM, Song Liu wrote:


On Jan 11, 2021, at 1:58 PM, Martin Lau <kafai@xxxxxx> wrote:

On Mon, Jan 11, 2021 at 10:35:43PM +0100, KP Singh wrote:
On Mon, Jan 11, 2021 at 7:57 PM Martin KaFai Lau <kafai@xxxxxx> wrote:

On Fri, Jan 08, 2021 at 03:19:47PM -0800, Song Liu wrote:

[ ... ]

diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index dd5aedee99e73..9bd47ad2b26f1 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c

[...]

+#include <linux/bpf.h>

#include <asm/pgalloc.h>
#include <linux/uaccess.h>
@@ -734,6 +735,7 @@ void __put_task_struct(struct task_struct *tsk)
cgroup_free(tsk);
task_numa_free(tsk, true);
security_task_free(tsk);
+ bpf_task_storage_free(tsk);
exit_creds(tsk);
If exit_creds() is traced by a bpf and this bpf is doing
bpf_task_storage_get(..., BPF_LOCAL_STORAGE_GET_F_CREATE),
new task storage will be created after bpf_task_storage_free().

I recalled there was an earlier discussion with KP and KP mentioned
BPF_LSM will not be called with a task that is going away.
It seems enabling bpf task storage in bpf tracing will break
this assumption and needs to be addressed?

For tracing programs, I think we will need an allow list where
task local storage can be used.
Instead of whitelist, can refcount_inc_not_zero(&tsk->usage) be used?

I think we can put refcount_inc_not_zero() in bpf_task_storage_get, like:

diff --git i/kernel/bpf/bpf_task_storage.c w/kernel/bpf/bpf_task_storage.c
index f654b56907b69..93d01b0a010e6 100644
--- i/kernel/bpf/bpf_task_storage.c
+++ w/kernel/bpf/bpf_task_storage.c
@@ -216,6 +216,9 @@ BPF_CALL_4(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *,
* by an RCU read-side critical section.
*/
if (flags & BPF_LOCAL_STORAGE_GET_F_CREATE) {
+ if (!refcount_inc_not_zero(&task->usage))
+ return -EBUSY;
+
sdata = bpf_local_storage_update(
task, (struct bpf_local_storage_map *)map, value,
BPF_NOEXIST);

But where shall we add the refcount_dec()? IIUC, we cannot add it to
__put_task_struct().

Maybe put_task_struct()?

Yeah, something like, or if you find a more elegant alternative :)

--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -107,13 +107,20 @@ extern void __put_task_struct(struct task_struct *t);

static inline void put_task_struct(struct task_struct *t)
{
- if (refcount_dec_and_test(&t->usage))
+
+ if (rcu_access_pointer(t->bpf_storage)) {
+ if (refcount_sub_and_test(2, &t->usage))
+ __put_task_struct(t);
+ } else if (refcount_dec_and_test(&t->usage))
__put_task_struct(t);
}

static inline void put_task_struct_many(struct task_struct *t, int nr)
{
- if (refcount_sub_and_test(nr, &t->usage))
+ if (rcu_access_pointer(t->bpf_storage)) {
+ if (refcount_sub_and_test(nr + 1, &t->usage))
+ __put_task_struct(t);
+ } else if (refcount_sub_and_test(nr, &t->usage))
__put_task_struct(t);
}
It is not ideal to leak bpf_storage here. How about we only add the
following:
diff --git i/kernel/bpf/bpf_task_storage.c w/kernel/bpf/bpf_task_storage.c
index f654b56907b69..2811b9fc47233 100644
--- i/kernel/bpf/bpf_task_storage.c
+++ w/kernel/bpf/bpf_task_storage.c
@@ -216,6 +216,10 @@ BPF_CALL_4(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *,
* by an RCU read-side critical section.
*/
if (flags & BPF_LOCAL_STORAGE_GET_F_CREATE) {
+ /* the task_struct is being freed, fail over*/
+ if (!refcount_read(&task->usage))
+ return -EBUSY;

This may not work? Even we check here and task->usage is not 0, it could still become 0 immediately after the above refcount_read, right?

We call bpf_task_storage_get() with "task" that has valid BTF, so "task"
should not go away during the BPF program? Whatever mechanism that

Oh, right. this is true. Otherwise, we cannot use task ptr in the helper.

triggers the BPF program should either hold a reference to task (usage > 0)
or be the only one owning it (usage == 0, in __put_task_struct). Did I miss
anything?

Sorry. I think you are right. Not sure lsm requirement. There are two
more possible ways to check task is exiting which happens before __put_task_struct():
. check task->exit_state
. check task->flags & PF_EXITING (used in bpf_trace.c)

Not sure which condition is the correct one to check.


Thanks,
Song


+
sdata = bpf_local_storage_update(
task, (struct bpf_local_storage_map *)map, value,
BPF_NOEXIST);


I may be missing something but shouldn't bpf_storage be an __rcu
member like we have for sk_bpf_storage?
Good catch! I will fix this in v2.
Thanks,
Song