NULL deref crash in bfq_add_bfqq_busy

From: Khazhy Kumykov
Date: Tue Nov 01 2022 - 19:16:29 EST


Still poking around here, I think my previous emails were leading me
down a dead end, at least w.r.t. unsafe locking. The one code path I
identified below (blk_bio_list_merge() via __blk_mq_sched_bio_merge())
won't actually get hit since we'll instead call bfq_bio_merge().

The crashes I'm seeing have been in the read side (stacks below)

[160595.656560] bfq_add_bfqq_busy+0x110/0x1ec
[160595.661142] bfq_add_request+0x6bc/0x980
[160595.666602] bfq_insert_request+0x8ec/0x1240
[160595.671762] bfq_insert_requests+0x58/0x9c
[160595.676420] blk_mq_sched_insert_request+0x11c/0x198
[160595.682107] blk_mq_submit_bio+0x270/0x62c
[160595.686759] __submit_bio_noacct_mq+0xec/0x178
[160595.691926] submit_bio+0x120/0x184
[160595.695990] ext4_mpage_readpages+0x77c/0x7c8
[160595.701026] ext4_readpage+0x60/0xb0
[160595.705158] filemap_read_page+0x54/0x114
[160595.711961] filemap_fault+0x228/0x5f4
[160595.716272] do_read_fault+0xe0/0x1f0
[160595.720487] do_fault+0x40/0x1c8

or

[28497.344552] bfq_add_bfqq_busy+0x110/0x1ec
[28497.348787] bfq_add_request+0x6bc/0x980
[28497.352845] bfq_insert_request+0x8ec/0x1240
[28497.357265] bfq_insert_requests+0x58/0x9c
[28497.361503] blk_mq_sched_insert_requests+0x8c/0x180
[28497.366647] blk_mq_flush_plug_list+0x15c/0x1e0
[28497.371376] blk_flush_plug_list+0xf0/0x124
[28497.375877] blk_finish_plug+0x34/0x48
[28497.379846] read_pages+0x21c/0x288
[28497.383510] page_cache_ra_unbounded+0x1f0/0x24c
[28497.388346] do_page_cache_ra+0x48/0x54
[28497.392388] do_sync_mmap_readahead+0x190/0x1e0
[28497.397150] filemap_fault+0x150/0x5f4
[28497.401111] do_read_fault+0xe0/0x1f0
[28497.404948] do_fault+0x40/0x1c8

In the crashes I'm looking at, the inconsistent bfq_queue appears to
be oom_bfqq: its waker_bfqq is NULL, but its woken_list_node is
hashed.

I'm looking at bfq_init_rq(), where we set bfqq->waker_bfqq
(potentially to NULL) and only ever hlist_add_head() the
woken_list_node, never hlist_del_init() it. The logic here seems to
assume that bfq_get_bfqq_handle_split() returns a freshly allocated
bfq_queue.

However, it can also return oom_bfqq, which may already be on a woken
list. We then proceed to overwrite waker_bfqq without deleting the
woken_list node, which can produce exactly the inconsistency seen in
the crash. wdyt?

	bfqq = bfq_split_bfqq(bic, bfqq);
	split = true;

	if (!bfqq) {
		bfqq = bfq_get_bfqq_handle_split(bfqd, bic, bio,
						 true, is_sync,
						 NULL); /* this can also return oom_bfqq */
		bfqq->waker_bfqq = old_bfqq->waker_bfqq; /* this can be set to NULL */
		bfqq->tentative_waker_bfqq = NULL;


an aside, which looks at the async case, since I'd like to understand
this code better.
---

The call to bfq_split_bfqq() doesn't make sense to me: we pass the
is_sync parameter to bfq_get_bfqq_handle_split() et al., but
bfq_split_bfqq() only ever clears bfqq for sync == true. My reading of
the intent is: when we decide to split, bfq_split_bfqq() sets
bic->bfqq[is_sync] to NULL, then bfq_get_bfqq_handle_split() sees that
it's NULL and allocates a brand new queue. But for async,
bfq_split_bfqq() returns NULL while bic->bfqq[0] was never cleared, so
bfq_get_bfqq_handle_split() sees the old pointer and returns the
already-existing bfqq. We then modify a bfqq that was not freshly
allocated.

Should bfq_split_bfqq() here have an is_sync param, and clear the
corresponding bic->bfqq?
---




On Mon, Oct 31, 2022 at 10:05 PM Khazhy Kumykov <khazhy@xxxxxxxxxxxx> wrote:
>
> On Mon, Oct 31, 2022 at 9:37 PM Khazhy Kumykov <khazhy@xxxxxxxxxxxx> wrote:
> >
> > I'm investigating a NULL deref crash in bfq_add_bfqq_busy(), wherein
> > bfqq->woken_list_node is hashed, but bfqq->waker_bfqq is NULL - which
> > seems inconsistent per my reading of the code.
