Re: [RFC PATCH] bfq: fix waker_bfqq inconsistency crash

From: Yu Kuai
Date: Wed Nov 02 2022 - 23:53:46 EST


Hi,

On 2022/11/03 11:05, Khazhy Kumykov wrote:
On Wed, Nov 2, 2022 at 7:56 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:

Hi,

On 2022/11/03 9:39, Khazhismel Kumykov wrote:
This fixes crashes in bfq_add_bfqq_busy due to waker_bfqq being NULL,
but woken_list_node still being hashed. This would happen when
bfq_init_rq() expects a brand new allocated queue to be returned from

From what I see, bfqq->waker_bfqq is updated in bfq_init_rq() only if
'new_queue' is false, but if 'new_queue' is false, the returned 'bfqq'
from bfq_get_bfqq_handle_split() will never be oom_bfqq, so I'm confused
here...
There are two calls to bfq_get_bfqq_handle_split() in this function - the
second one is after the check you mentioned, and is the problematic
one.
Yes, thanks for the explanation. Now I understand how the problem
triggers.


bfq_get_bfqq_handle_split() and unconditionally updates waker_bfqq
without resetting woken_list_node. Since we can always return oom_bfqq
when attempting to allocate, we cannot assume waker_bfqq starts as NULL.
We must either reset woken_list_node, or avoid setting woken_list at all
for oom_bfqq - opt to do the former.
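To make the inconsistency concrete, here is a minimal userspace sketch of the "reset woken_list_node" option. It is not the kernel code: struct model_queue and the set_waker_* functions are hypothetical names invented for illustration, and the hlist helpers are small re-implementations modeled on the kernel's hlist_unhashed(), hlist_add_head() and hlist_del_init(). It assumes the invariant in question is that woken_list_node is hashed exactly when waker_bfqq is non-NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace copies of the kernel hlist helpers used here. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

static int hlist_unhashed(const struct hlist_node *n)
{
	return !n->pprev;
}

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

static void hlist_del_init(struct hlist_node *n)
{
	if (hlist_unhashed(n))
		return;
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	n->next = NULL;
	n->pprev = NULL;
}

/* Hypothetical stand-in for struct bfq_queue; only the waker fields. */
struct model_queue {
	struct model_queue *waker_bfqq;
	struct hlist_node woken_list_node;
	struct hlist_head woken_list;
};

/* Assumed invariant: the node is hashed exactly when a waker is set. */
static int waker_state_consistent(const struct model_queue *q)
{
	return hlist_unhashed(&q->woken_list_node) == (q->waker_bfqq == NULL);
}

/* Problematic pattern: overwrite the waker, never unlink the old node. */
static void set_waker_unconditionally(struct model_queue *q,
				      struct model_queue *waker)
{
	q->waker_bfqq = waker;
	if (waker)
		hlist_add_head(&q->woken_list_node, &waker->woken_list);
}

/* "Reset" option: unlink any stale node before installing the new waker. */
static void set_waker_with_reset(struct model_queue *q,
				 struct model_queue *waker)
{
	hlist_del_init(&q->woken_list_node);
	set_waker_unconditionally(q, waker);
}

/* Returns 1 if the reset variant restores the invariant on a reused queue. */
int waker_reset_demo(void)
{
	struct model_queue waker = {0}, oom = {0};

	set_waker_unconditionally(&oom, &waker);	/* earlier request */
	set_waker_unconditionally(&oom, NULL);		/* later request, no waker */
	/* Now the waker is NULL but the node is still hashed. */
	assert(!waker_state_consistent(&oom));

	set_waker_with_reset(&oom, NULL);
	return waker_state_consistent(&oom);
}
```

In this model the shared queue is reused across two calls, which is the situation a fallback queue such as oom_bfqq can end up in; a freshly allocated queue would never have a stale node to unlink.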

Once oom_bfqq is used, I think the I/O is treated as issued from the root
group. Hence I don't think it's necessary to set woken_list or
waker_bfqq for oom_bfqq.
Ack, I was wondering what's right here since, evidently, *someone* had
already set oom_bfqq->waker_bfqq to *something* (although... maybe it
was an earlier init_rq). But maybe it's better to do nothing if we
*know* it's oom_bfqq.
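The "do nothing for oom_bfqq" alternative discussed here could look roughly like the sketch below. This is only a userspace model under assumptions: struct model_queue, struct model_data and inherit_waker() are hypothetical names, and a plain boolean stands in for a hashed woken_list_node. Whether skipping is actually the right behaviour is exactly what is still being asked in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bfq_queue. */
struct model_queue {
	struct model_queue *waker_bfqq;
	bool on_woken_list;	/* stands in for a hashed woken_list_node */
};

/* Hypothetical stand-in for the per-device data holding the fallback queue. */
struct model_data {
	struct model_queue oom_bfqq;	/* shared fallback queue */
};

/*
 * Copy the waker state from 'old' to 'bfqq', unless 'bfqq' is the shared
 * fallback queue. Returns true if the waker state was updated.
 */
static bool inherit_waker(struct model_data *d, struct model_queue *bfqq,
			  const struct model_queue *old)
{
	if (bfqq == &d->oom_bfqq)
		return false;	/* fallback queue: leave waker state alone */

	bfqq->waker_bfqq = old->waker_bfqq;
	bfqq->on_woken_list = (old->waker_bfqq != NULL);
	return true;
}

/* Returns 1 if a fresh queue inherits the waker and oom_bfqq is untouched. */
int skip_oom_demo(void)
{
	struct model_data d = {0};
	struct model_queue waker = {0}, fresh = {0};
	struct model_queue old = { .waker_bfqq = &waker, .on_woken_list = true };

	bool updated_fresh = inherit_waker(&d, &fresh, &old);
	bool updated_oom = inherit_waker(&d, &d.oom_bfqq, &old);

	assert(updated_fresh && !updated_oom);
	return fresh.waker_bfqq == &waker && fresh.on_woken_list &&
	       d.oom_bfqq.waker_bfqq == NULL && !d.oom_bfqq.on_woken_list;
}
```

With this guard the fallback queue never gains waker state in the first place, so there is nothing stale to reset later.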

I need to check how oom_bfqq gets involved with waker_bfqq, and
then see whether it's reasonable.

Probably Jan and Paolo will have a better view on this.

Thanks,
Kuai

Is it a correct interpretation here that setting waker_bfqq won't
accomplish anything anyways on oom_bfqq, so better off not?