RE: [PATCH RFC] block: fix bio merge checks when virt_boundary is set

From: Dexuan Cui
Date: Thu Dec 15 2016 - 10:37:40 EST


> From: linux-kernel-owner@xxxxxxxxxxxxxxx [mailto:linux-kernel-
> owner@xxxxxxxxxxxxxxx] On Behalf Of Vitaly Kuznetsov
> Sent: Wednesday, April 20, 2016 21:48
> To: Ming Lei <tom.leiming@xxxxxxxxx>
> Cc: Keith Busch <keith.busch@xxxxxxxxx>; linux-block@xxxxxxxxxxxxxxx; Linux
> Kernel Mailing List <linux-kernel@xxxxxxxxxxxxxxx>; Jens Axboe
> <axboe@xxxxxxxxx>; Dan Williams <dan.j.williams@xxxxxxxxx>; Martin K.
> Petersen <martin.petersen@xxxxxxxxxx>; Sagi Grimberg
> <sagig@xxxxxxxxxxxx>; Mike Snitzer <snitzer@xxxxxxxxxx>; KY Srinivasan
> <kys@xxxxxxxxxxxxx>; Cathy Avery <cavery@xxxxxxxxxx>
> Subject: Re: [PATCH RFC] block: fix bio merge checks when virt_boundary is set
>
> Ming Lei <tom.leiming@xxxxxxxxx> writes:
>
> > On Fri, Mar 18, 2016 at 10:59 AM, Ming Lei <tom.leiming@xxxxxxxxx> wrote:
> >> On Fri, Mar 18, 2016 at 12:39 AM, Keith Busch <keith.busch@xxxxxxxxx>
> wrote:
> >>> On Thu, Mar 17, 2016 at 12:20:28PM +0100, Vitaly Kuznetsov wrote:
> >>>> Keith Busch <keith.busch@xxxxxxxxx> writes:
> >>>> > been combined. In any case, I think you can get what you're after just
> >>>> > by moving the gap check after BIOVEC_PHYS_MERGABLE. Does the
> following
> >>>> > look ok to you?
> >>>> >
> >>>>
> >>>> Thanks, it does.
> >>>
> >>> Cool, thanks for confirming.
> >>>
> >>>> Will you send it or would you like me to do that with your Suggested-by?
> >>>
> >>> I'm not confident yet this doesn't break anything, particularly since
> >>> we moved the gap check after the length check. Just wanted to confirm
> >>> the concept addressed your concern, but still need to take a closer look
> >>> and test before submitting.
> >>
> >> IMO, the change on blk_bio_segment_split() is correct, because actually it
> >> is a sg gap and the check should have been done between segments
> >> instead of bvecs. So it is reasonable to move the check just before populating
> >> a new segment.
> >
> > Thinking of the 1st part change further, looks it is just correct in concept,
> > but wrong from current implementation. Because of bios/reqs merge,
> > blk_rq_map_sg() may end one segment in any bvec in theroy, so I guess
> > that is why each non-1st bvec need the check to make sure no sg gap.
> > Looks a very crazy limit, :-)
> >
> >>
> >> But for the 2nd change in bio_will_gap(), which should fix Vitaly's problem, I
> >> am still not sure if it is completely correct. bio_will_gap() is used
> >> to check if two
> >> bios may be merged. Suppose two bios are continues physically, the last bvec
> >> in 1st bio and the first bvec in 2nd bio might not be in one same segment
> >> because of segment size limit.
> >
> > How about the attached patch?
> >
>
> I just wanted to revive the discussion as the issue persists. I
> re-tested your patch against 4.6-rc4 and it efficiently solves the
> issue.
>
> pre-patch:
> # time mkfs.ntfs /dev/sdb1
> Cluster size has been automatically set to 4096 bytes.
> Initializing device with zeroes: 100% - Done.
> Creating NTFS volume structures.
> mkntfs completed successfully. Have a nice day.
>
> real8m10.977s
> user0m0.115s
> sys0m12.672s
>
> post-patch:
> # time mkfs.ntfs /dev/sdb1
> Cluster size has been automatically set to 4096 bytes.
> Initializing device with zeroes: 100% - Done.
> Creating NTFS volume structures.
> mkntfs completed successfully. Have a nice day.
>
> real0m42.430s
> user0m0.171s
> sys0m7.675s
>
> Will you send this patch? Please let me know if I can further
> assist. Thanks!
>
> --
> Vitaly

Hi, I'm reviving the thread because I'm suffering from exactly the same issue.
This is the thread I created today:
"Big I/O requests are split into small ones due to unaligned ext4 partition boundary?"
http://marc.info/?t=148180346100002&r=1&w=2

Ming's patch can fix this issue for me.

Stable 4.4 and later are affected too.
I didn't check 4.3.x kernels, but for Linux guest on Hyper-V, any kernel with the
patch "storvsc: get rid of bounce buffer"
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=81988a0e6b031bc80da15257201810ddcf989e64
should be affected.

Thanks,
-- Dexuan