Re: [PATCH -next] block: Warn if page offset is equal to or larger than page size

From: Ming Lei
Date: Thu Apr 18 2019 - 21:00:56 EST


On Fri, Apr 19, 2019 at 12:59 AM Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
>
> Prior to commit 8a96a0e40810 ("block: rewrite blk_bvec_map_sg to avoid a
> nth_page call"), it was valid to call blk_bvec_map_sg() with an offset
> equal to or larger than the page size. blk_bvec_map_sg() would then adjust
> page and offset by calling bvec_nth_page(). This is no longer possible.
> Large offsets are now directly passed without validation to sg_set_page().
> This results in data corruption and crashes such as the following,
> observed when trying to boot arm:vexpress from mmc.

The issue is in scatterlist code, which claims(and is supposed) to support
>=PAGE_SIZE offset and start page. However, there is bug somewhere.

>
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> CPU: 0 PID: 1 Comm: init Tainted: G W
> 5.1.0-rc5-next-20190418-dirty #1
> Hardware name: ARM-Versatile Express
> [<c0313188>] (unwind_backtrace) from [<c030d920>] (show_stack+0x10/0x14)
> [<c030d920>] (show_stack) from [<c10daad8>] (dump_stack+0xb4/0xc8)
> [<c10daad8>] (dump_stack) from [<c03485fc>] (panic+0x110/0x328)
> [<c03485fc>] (panic) from [<c034d550>] (do_exit+0xbfc/0xc08)
> [<c034d550>] (do_exit) from [<c034e608>] (do_group_exit+0x3c/0xbc)
> [<c034e608>] (do_group_exit) from [<c035ace8>] (get_signal+0x13c/0x9c4)
> [<c035ace8>] (get_signal) from [<c030cd3c>] (do_work_pending+0x1a8/0x5e0)
> [<c030cd3c>] (do_work_pending) from [<c030106c>] (slow_work_pending+0xc/0x20)
>
> The above does not help at all when trying to track down the problem.
> Let's at least add a warning to generate a traceback. The crash will
> still be observed, but at least we'll have a hint what is causing it.
> With this patch applied, we'll see something like the following
> prior to the crash.
>
> WARNING: CPU: 0 PID: 84 at block/blk-merge.c:480 blk_rq_map_sg+0x3cc/0x6ac
> page offset 19456 larger than page size 4096
> Modules linked in:
> CPU: 0 PID: 84 Comm: kworker/0:1H Not tainted 5.1.0-rc5-next-20190418-dirty #1
> Hardware name: ARM-Versatile Express
> Workqueue: kblockd blk_mq_run_work_fn
> [<c0313188>] (unwind_backtrace) from [<c030d920>] (show_stack+0x10/0x14)
> [<c030d920>] (show_stack) from [<c10daad8>] (dump_stack+0xb4/0xc8)
> [<c10daad8>] (dump_stack) from [<c03483ac>] (__warn+0xe0/0xf8)
> [<c03483ac>] (__warn) from [<c0348410>] (warn_slowpath_fmt+0x4c/0x70)
> [<c0348410>] (warn_slowpath_fmt) from [<c078fb5c>] (blk_rq_map_sg+0x3cc/0x6ac)
> [<c078fb5c>] (blk_rq_map_sg) from [<c0eadeec>] (mmc_blk_data_prep+0x1b0/0x2c8)
> [<c0eadeec>] (mmc_blk_data_prep) from [<c0eae054>] (mmc_blk_rw_rq_prep+0x50/0x178)
> [<c0eae054>] (mmc_blk_rw_rq_prep) from [<c0eb145c>] (mmc_blk_mq_issue_rq+0x294/0x880)
> [<c0eb145c>] (mmc_blk_mq_issue_rq) from [<c0eb1e78>] (mmc_mq_queue_rq+0x128/0x230)
> [<c0eb1e78>] (mmc_mq_queue_rq) from [<c07957f8>] (blk_mq_dispatch_rq_list+0x3b4/0x5e0)
> [<c07957f8>] (blk_mq_dispatch_rq_list) from [<c079ab3c>] (blk_mq_do_dispatch_sched+0x70/0x10c)
> [<c079ab3c>] (blk_mq_do_dispatch_sched) from [<c079b238>] (blk_mq_sched_dispatch_requests+0x11c/0x198)
> [<c079b238>] (blk_mq_sched_dispatch_requests) from [<c0793e20>] (__blk_mq_run_hw_queue+0xe0/0x170)
> [<c0793e20>] (__blk_mq_run_hw_queue) from [<c03665d8>] (process_one_work+0x284/0x6b4)
> [<c03665d8>] (process_one_work) from [<c0367914>] (worker_thread+0x44/0x528)
> [<c0367914>] (worker_thread) from [<c036cc6c>] (kthread+0x15c/0x164)
> [<c036cc6c>] (kthread) from [<c03010e8>] (ret_from_fork+0x14/0x2c)
> Exception stack(0xc58d5fb0 to 0xc58d5ff8)
> 5fa0: 00000000 00000000 00000000 00000000
> 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> ---[ end trace ef7077b559d20ddc ]---
>
> This patch is needed even if the call from mmc_blk_data_prep() is fixed.
> We'll have to expect further problems from callers who are not aware about
> the API change. Those problems will be difficult to track down without
> extra information.
>
> Use WARN_ONCE to report the problem. If the problem is seen, it may be
> seen many times, and we don't want to flood the kernel log with repeated
> messages.
>
> Cc: Christoph Hellwig <hch@xxxxxx>
> Signed-off-by: Guenter Roeck <linux@xxxxxxxxxxxx>
> ---
> I don't know if there are situations where offset >= PAGE_SIZE
> is acceptable. A quick comparison with current mainline did not report
> any instances where offset >= PAGE_SIZE is passed to sg_set_page().
> I do see a number of warnings triggered by this patch in -next
> (for example with alpha, m68k, ppc64, and riscv64 emulations) which
> do not (or not immediately) result in a crash.
>
> If there is a different / better means to detect a problem, please
> let me know and I'll be happy to adjust the patch accordingly.
>
> block/blk-merge.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 247b17f2a0f6..9432cd88bd7e 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -475,6 +475,10 @@ static unsigned blk_bvec_map_sg(struct request_queue *q,
> unsigned offset = bvec->bv_offset + total;
> unsigned len = min(get_max_segment_size(q, offset), nbytes);
>
> + WARN_ONCE(offset >= PAGE_SIZE,
> + "page offset %u larger than or equal to page size %ld\n",
> + offset, PAGE_SIZE);
> +
> *sg = blk_next_sg(sg, sglist);
> sg_set_page(*sg, bvec->bv_page, len, offset);

No, it shouldn't be difficult to trigger >= PAGE_SIZE offset, and it
is definitely
correct.


Thanks,
Ming Lei