Re: [PATCH -next] [RFC] block: fix null-deref in percpu_ref_put

From: Ming Lei
Date: Fri Jul 29 2022 - 09:58:59 EST


On Fri, Jul 29, 2022 at 06:50:36PM +0800, Zhang Wensheng wrote:
> From: Zhang Wensheng <zhangwensheng5@xxxxxxxxxx>
>
> A problem was found in stable 5.10; its root cause is described below.
>
> blk_cleanup_queue() uses
> "wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter))"
> to wait for the request_queue's q_usage_counter to become zero. However,
> if q_usage_counter drops to zero quickly, percpu_ref_exit() runs and
> ref->data is freed, so another process may hit a null-deref as shown
> below:
>
> CPU0                              CPU1
> blk_cleanup_queue
>  blk_freeze_queue
>   blk_mq_freeze_queue_wait
>                                   scsi_end_request
>                                    percpu_ref_get
>                                    ...
>                                    percpu_ref_put
>                                     atomic_long_sub_and_test
>  percpu_ref_exit
>   ref->data -> NULL
>                                     ref->data->release(ref) -> null-deref
>

This looks like a generic issue in percpu_ref. I think the following patch
should address it.


diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
index d73a1c08c3e3..07308bd36d83 100644
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -331,8 +331,12 @@ static inline void percpu_ref_put_many(struct percpu_ref *ref, unsigned long nr)

 	if (__ref_is_percpu(ref, &percpu_count))
 		this_cpu_sub(*percpu_count, nr);
-	else if (unlikely(atomic_long_sub_and_test(nr, &ref->data->count)))
-		ref->data->release(ref);
+	else {
+		percpu_ref_func_t *release = ref->data->release;
+
+		if (unlikely(atomic_long_sub_and_test(nr, &ref->data->count)))
+			release(ref);
+	}
 
 	rcu_read_unlock();
 }
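
For illustration only, the ordering the patch relies on can be sketched in
userspace with C11 atomics: read the release callback before the final
decrement, so nothing is loaded through ref->data once the count has been
seen to hit zero. Everything named here (struct ref, struct ref_data,
ref_put_many, demo_release, released) is a hypothetical stand-in and not
kernel API, and the sketch assumes data is still valid when the decrement
itself runs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct ref;
typedef void (ref_release_fn)(struct ref *);

/* Hypothetical stand-in for the shared, separately allocated ref data. */
struct ref_data {
	atomic_long count;
	ref_release_fn *release;
};

struct ref {
	struct ref_data *data;
};

/* Counts how many times the demo release callback has run. */
static int released;

static void demo_release(struct ref *ref)
{
	(void)ref;
	released++;
}

/* Drop nr references; call release when the count reaches zero. */
static void ref_put_many(struct ref *ref, long nr)
{
	struct ref_data *data = ref->data;
	ref_release_fn *release;

	assert(data != NULL);

	/* Cache the callback before the decrement, as in the patch above. */
	release = data->release;

	/* atomic_fetch_sub() returns the value held before the subtraction. */
	if (atomic_fetch_sub(&data->count, nr) == nr)
		release(ref);
}
```

With a count of 2, two calls to ref_put_many(&r, 1) would invoke the
callback only on the second call.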


Thanks,
Ming