Re: [RFC PATCH] bfq: fix waker_bfqq inconsistency crash

From: Khazhy Kumykov
Date: Wed Nov 02 2022 - 23:05:39 EST


On Wed, Nov 2, 2022 at 7:56 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> 在 2022/11/03 9:39, Khazhismel Kumykov 写道:
> > This fixes crashes in bfq_add_bfqq_busy due to waker_bfqq being NULL,
> > but woken_list_node still being hashed. This would happen when
> > bfq_init_rq() expects a brand new allocated queue to be returned from
>
> From what I see, bfqq->waker_bfqq is updated in bfq_init_rq() only if
> 'new_queue' is false, but if 'new_queue' is false, the returned 'bfqq'
> from bfq_get_bfqq_handle_split() will never be oom_bfqq, so I'm confused
> here...
There's two calls for bfq_get_bfqq_handle_split in this function - the
second one is after the check you mentioned, and is the problematic
one.
>
> > bfq_get_bfqq_handle_split() and unconditionally updates waker_bfqq
> > without resetting woken_list_node. Since we can always return oom_bfqq
> > when attempting to allocate, we cannot assume waker_bfqq starts as NULL.
> > We must either reset woken_list_node, or avoid setting woken_list at all
> > for oom_bfqq - opt to do the former.
>
> Once oom_bfqq is used, I think the io is treated as issued from root
> group. Hence I don't think it's necessary to set woken_list or
> waker_bfqq for oom_bfqq.
Ack, I was wondering what's right here since, evidently, *someone* had
already set oom_bfqq->waker_bfqq to *something* (although... maybe it
was an earlier init_rq). But maybe it's better to do nothing if we
*know* it's oom_bfqq.

Is it a correct interpretation here that setting waker_bfqq won't
accomplish anything anyways on oom_bfqq, so better off not?

>
> Thanks,
> Kuai