Re: [PATCH -next] [RFC] block: fix null-deref in percpu_ref_put

From: zhangwensheng (E)
Date: Fri Jul 29 2022 - 22:15:20 EST


Hi, Ming

I don't think this is a generic issue in percpu_ref. I went through several users
of percpu_ref, such as "part->ref", "blkg->refcnt" and "ctx->reqs/ctx->users", and
they all call percpu_ref_exit only after "release" has finished, so they do not hit
this problem. So I don't think the API (percpu_ref_put_many) should be changed;
the user should guarantee this ordering instead.

thanks!
Wensheng

On 2022/7/29 21:58, Ming Lei wrote:
On Fri, Jul 29, 2022 at 06:50:36PM +0800, Zhang Wensheng wrote:
From: Zhang Wensheng <zhangwensheng5@xxxxxxxxxx>

A problem was found in stable 5.10; its root cause is as follows.

For the q_usage_counter of a request_queue, blk_cleanup_queue uses
"wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter))"
to wait for q_usage_counter to become zero. However, if q_usage_counter
drops to zero quickly, percpu_ref_exit may run and free ref->data while
another process is still inside percpu_ref_put, which then causes a
NULL pointer dereference like the one below:

CPU0                                CPU1
blk_cleanup_queue
  blk_freeze_queue
    blk_mq_freeze_queue_wait
                                    scsi_end_request
                                      percpu_ref_get
                                      ...
                                      percpu_ref_put
                                        atomic_long_sub_and_test
percpu_ref_exit
  ref->data -> NULL
                                        ref->data->release(ref) -> null-deref

This looks like a generic issue in percpu_ref; I think the following patch
should address it.


diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
index d73a1c08c3e3..07308bd36d83 100644
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -331,8 +331,12 @@ static inline void percpu_ref_put_many(struct percpu_ref *ref, unsigned long nr)
 
 	if (__ref_is_percpu(ref, &percpu_count))
 		this_cpu_sub(*percpu_count, nr);
-	else if (unlikely(atomic_long_sub_and_test(nr, &ref->data->count)))
-		ref->data->release(ref);
+	else {
+		percpu_ref_func_t *release = ref->data->release;
+
+		if (unlikely(atomic_long_sub_and_test(nr, &ref->data->count)))
+			release(ref);
+	}
 
 	rcu_read_unlock();
 }


Thanks,
Ming