Re: [PATCH-next] block: fix null-deref in percpu_ref_put

From: Yu Kuai
Date: Wed Dec 07 2022 - 08:10:45 EST


Hi,

On 2022/12/07 9:05, Dennis Zhou wrote:
Hello,

On Tue, Dec 06, 2022 at 05:09:39PM +0800, Zhong Jinghua wrote:
A problem was found in stable 5.10, and its root cause is described below.

For the q_usage_counter of request_queue, blk_cleanup_queue uses
"wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter))"
to wait for q_usage_counter to become zero. However, if q_usage_counter
becomes zero quickly, percpu_ref_exit will run and ref->data will be
freed, and another process may then hit a null-deref like below:

CPU0                             CPU1
blk_mq_destroy_queue
 blk_freeze_queue
  blk_mq_freeze_queue_wait
                                 scsi_end_request
                                  percpu_ref_get
                                  ...
                                  percpu_ref_put
                                   atomic_long_sub_and_test
blk_put_queue
 kobject_put
  kref_put
   blk_release_queue
    percpu_ref_exit
     ref->data -> NULL
                                    ref->data->release(ref) -> null-deref


I remember thinking about this a while ago. I don't think this fix works
as nicely as it may seem. Please correct me if I'm wrong.

q->q_usage_counter has the oddity that the lifetime of the percpu_ref
object isn't managed by the release function. The freeing is handled by
a separate path where it depends on the percpu_ref hitting 0. So here we
have 2 concurrent paths racing to run with 1 destroying the object. We
probably need blk_release_queue() to wait on percpu_ref's release
finishing, not starting.

I think the above works in this specific case because there is a
call_rcu() in blk_release_queue(). If there wasn't a call_rcu(),
then by the same logic we could delay ref->data->release(ref) further
and that could potentially lead to a use after free.

Ideally, I think fixing the race in q->q_usage_counter's pattern is
better than masking it here as I think we're being saved by the
call_rcu() call further down the object release path.
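
To make "wait on release finishing, not starting" concrete, here is a
minimal userspace sketch using pthreads and C11 atomics. It is only a
model under stated assumptions: toy_ref, toy_ref_data, toy_ref_put(),
toy_ref_exit() and run_demo() are hypothetical names, not kernel APIs,
and this is not the fix proposed in either patch. The point it tries to
show is that teardown blocks on a flag set after release() has returned,
rather than on the counter reaching zero.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace model: toy_ref is NOT the kernel's percpu_ref. */
struct toy_ref;

struct toy_ref_data {
	void (*release)(struct toy_ref *ref);
};

struct toy_ref {
	atomic_long count;
	struct toy_ref_data *data;	/* freed by toy_ref_exit() */
	pthread_mutex_t lock;
	pthread_cond_t released;
	bool release_done;		/* set only after release() returned */
	int release_calls;
};

static void toy_release(struct toy_ref *ref)
{
	ref->release_calls++;
}

static int toy_ref_init(struct toy_ref *ref)
{
	ref->data = malloc(sizeof(*ref->data));
	if (!ref->data)
		return -1;
	ref->data->release = toy_release;
	atomic_init(&ref->count, 1);
	pthread_mutex_init(&ref->lock, NULL);
	pthread_cond_init(&ref->released, NULL);
	ref->release_done = false;
	ref->release_calls = 0;
	return 0;
}

static void toy_ref_get(struct toy_ref *ref)
{
	atomic_fetch_add(&ref->count, 1);
}

static void toy_ref_put(struct toy_ref *ref)
{
	/* fetch_sub returns the old value: 1 means this was the last ref. */
	if (atomic_fetch_sub(&ref->count, 1) == 1) {
		ref->data->release(ref);	/* the window the race hits */
		pthread_mutex_lock(&ref->lock);
		ref->release_done = true;
		pthread_cond_broadcast(&ref->released);
		pthread_mutex_unlock(&ref->lock);
	}
}

/* Teardown waits for release() to have FINISHED, not for count == 0. */
static void toy_ref_exit(struct toy_ref *ref)
{
	pthread_mutex_lock(&ref->lock);
	while (!ref->release_done)
		pthread_cond_wait(&ref->released, &ref->lock);
	pthread_mutex_unlock(&ref->lock);
	free(ref->data);
	ref->data = NULL;
}

static void *io_thread(void *arg)
{
	toy_ref_put(arg);	/* drops the in-flight reference */
	return NULL;
}

/* Returns how many times release() ran, or -1 on setup failure. */
static int run_demo(void)
{
	struct toy_ref ref;
	pthread_t t;

	if (toy_ref_init(&ref))
		return -1;
	toy_ref_get(&ref);		/* in-flight request: count is 2 */
	if (pthread_create(&t, NULL, io_thread, &ref))
		return -1;
	toy_ref_put(&ref);		/* teardown drops the initial ref */
	toy_ref_exit(&ref);		/* blocks until release() returned */
	pthread_join(t, NULL);
	assert(ref.data == NULL);
	return ref.release_calls;
}
```

Whichever thread drops the last reference runs release(), and
toy_ref_exit() cannot free the data until that call has completed. A
mutex and condition variable stand in for whatever the block layer would
actually use; that choice is an assumption of the sketch.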

Agree.

BTW, Wensheng previously sent a patch to fix this in the block layer:

https://www.spinics.net/lists/kernel/msg4615696.html.

Thanks,
Kuai