Re: [PATCH 1/1] block: Check the queue limit before bio submitting

From: Ming Lei
Date: Mon Nov 06 2023 - 22:50:10 EST


On Tue, Nov 07, 2023 at 02:53:20AM +0000, Ed Tsai (蔡宗軒) wrote:
> On Mon, 2023-11-06 at 19:54 +0800, Ming Lei wrote:
> > On Mon, Nov 06, 2023 at 12:53:31PM +0800, Ming Lei wrote:
> > > On Mon, Nov 06, 2023 at 01:40:12AM +0000, Ed Tsai (蔡宗軒) wrote:
> > > > On Mon, 2023-11-06 at 09:33 +0800, Ed Tsai wrote:
> > > > > On Sat, 2023-11-04 at 11:43 +0800, Ming Lei wrote:
> > >
> > > ...
> > >
> > > > Sorry for leaving out my dd command. Here it is:
> > > > dd if=/data/test_file of=/dev/null bs=64m count=1 iflag=direct
> > >
> > > OK, thanks for sharing.
> > >
> > > I understand the issue now, but I am not sure it is a good idea to
> > > check the queue limit in __bio_iov_iter_get_pages():
> > >
> > > 1) bio->bi_bdev may not be set
> > >
> > > 2) what actually matters is the bio's alignment, and the bio size can
> > > still be big enough
> > >
> > > So I cooked up a patch, which should address your issue:
> >
> > The following one fixes several bugs and has been verified to be
> > capable of making big & aligned bios; feel free to run your test
> > against this one:
> >
> > block/bio.c | 28 +++++++++++++++++++++++++++-
> > 1 file changed, 27 insertions(+), 1 deletion(-)
> >
> > diff --git a/block/bio.c b/block/bio.c
> > index 816d412c06e9..80b36ce57510 100644
> > --- a/block/bio.c
> > +++ b/block/bio.c
> > @@ -1211,6 +1211,7 @@ static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page,
> >  }
> >
> >  #define PAGE_PTRS_PER_BVEC     (sizeof(struct bio_vec) / sizeof(struct page *))
> > +#define BIO_CHUNK_SIZE		(256U << 10)
> >
> >  /**
> >   * __bio_iov_iter_get_pages - pin user or kernel pages and add them to a bio
> > @@ -1266,6 +1267,31 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
> >  		size -= trim;
> >  	}
> >
> > +	/*
> > +	 * Try to make the bio aligned to BIO_CHUNK_SIZE (256KB) if it isn't
> > +	 * the last one, so we can avoid small bios caused by bio splitting
> > +	 * and multipage bvecs in big-chunk sequential IO.
> > +	 *
> > +	 * If nothing has been added to this bio yet, simply allow it to be
> > +	 * unaligned, since we still have a chance to add more bytes.
> > +	 */
> > +	if (iov_iter_count(iter) && bio->bi_iter.bi_size) {
> > +		unsigned int aligned_size = (bio->bi_iter.bi_size + size) &
> > +			~(BIO_CHUNK_SIZE - 1);
> > +
> > +		if (aligned_size <= bio->bi_iter.bi_size) {
> > +			/* stop adding pages if this bio can't stay aligned */
> > +			if (!(bio->bi_iter.bi_size & (BIO_CHUNK_SIZE - 1))) {
> > +				ret = left = size;
> > +				goto revert;
> > +			}
> > +		} else {
> > +			aligned_size -= bio->bi_iter.bi_size;
> > +			iov_iter_revert(iter, size - aligned_size);
> > +			size = aligned_size;
> > +		}
> > +	}
> > +
> >  	if (unlikely(!size)) {
> >  		ret = -EFAULT;
> >  		goto out;
> > @@ -1285,7 +1311,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
> >  
> >  		offset = 0;
> >  	}
> > -
> > +revert:
> >  	iov_iter_revert(iter, left);
> >  out:
> >  	while (i < nr_pages)
> > --
> > 2.41.0
> >
> >
> >
> > Thanks,
> > Ming
> >
>
> The latest patch you provided with 256KB alignment does help alleviate
> the severity of fragmentation. However, the actual aligned size may
> vary depending on the device. Using a fixed and universal size of 128
> or 256KB only provides partial relief from fragmentation.
>
> I performed a dd direct I/O read of 64MB with your patch, and although
> most of the bios were aligned, there were still cases of misalignment
> to the device limit (e.g., 512MB for my device), as shown below:

512MB is really big, and you have already reached 3520MB/sec in READ by
limiting the max bio size to 1MB in your original patch.

Just curious: what does the data look like if you change my last patch
to align with max sectors instead? That patch tries to both maximize and
align the bio.
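To make that comparison concrete, here is a small userspace model of the
trimming step in the patch above, with the chunk size turned into a
parameter. It is only a sketch, not kernel code: bio_keep_bytes() is a
hypothetical helper, and it uses division instead of the patch's
power-of-two mask so that a chunk which is not a power of two (for
example, hypothetically, max_sectors << 9) also works.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of the alignment step (not kernel code).
 *   cur       - bytes already in the bio (like bio->bi_iter.bi_size)
 *   extracted - bytes just pinned from the iterator ('size' in the patch)
 *   more_left - whether iov_iter_count(iter) is still non-zero
 *   chunk     - alignment target in bytes (BIO_CHUNK_SIZE in the patch)
 * Returns how many of the extracted bytes to keep; the rest would be
 * handed back with iov_iter_revert(). Returning 0 means "stop adding".
 */
size_t bio_keep_bytes(size_t cur, size_t extracted, bool more_left,
		      size_t chunk)
{
	size_t aligned;

	assert(chunk > 0);

	/* Last piece of the IO, or nothing in the bio yet: keep everything. */
	if (!more_left || cur == 0)
		return extracted;

	/* Round the would-be bio size down to a chunk boundary. */
	aligned = (cur + extracted) / chunk * chunk;

	if (aligned <= cur) {
		/* No new boundary reachable: stop if the bio is already aligned. */
		if (cur % chunk == 0)
			return 0;
		/* Otherwise keep it all; a later round may reach a boundary. */
		return extracted;
	}

	/* Keep only enough bytes to end exactly on the boundary. */
	return aligned - cur;
}
```

For example, with a 256KB chunk, a bio holding 100000 bytes that has just
pinned 500000 more keeps 424288 of them, so it ends exactly at 512KB.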

>
> dd [000] ..... 392.976830: block_bio_queue: 254,52 R 2997760 + 3584
> dd [000] ..... 392.979940: block_bio_queue: 254,52 R 3001344 + 3584
> dd [000] ..... 392.983235: block_bio_queue: 254,52 R 3004928 + 3584
> dd [000] ..... 392.986468: block_bio_queue: 254,52 R 3008512 + 3584

Yeah, I thought 128KB would be fine for usual hardware, but it looks
like it is not good enough.
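For reference, the arithmetic behind the trace: each bio is 3584 sectors,
i.e. 3584 * 512 = 1835008 bytes (1792KB, exactly 7 * 256KB), and
consecutive start sectors differ by exactly 3584. The sketch assumes
512-byte sectors and models splitting purely on max_sectors (the real
split code also honors segment and boundary limits); the helper names are
made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9 /* 512-byte sectors, as in block_bio_queue output */

/* Bytes covered by a bio of nr_sectors sectors. */
uint64_t sectors_to_bytes(uint64_t nr_sectors)
{
	return nr_sectors << SECTOR_SHIFT;
}

/*
 * Number of pieces a bio of nr_sectors is split into when only
 * max_sectors is considered (a simplification of the real splitter).
 */
uint64_t split_count(uint64_t nr_sectors, uint64_t max_sectors)
{
	assert(max_sectors > 0);
	return (nr_sectors + max_sectors - 1) / max_sectors;
}

/* Size in sectors of the last, possibly short, piece after splitting. */
uint64_t tail_sectors(uint64_t nr_sectors, uint64_t max_sectors)
{
	uint64_t rem;

	assert(nr_sectors > 0 && max_sectors > 0);
	rem = nr_sectors % max_sectors;
	return rem ? rem : max_sectors;
}
```

With a hypothetical 1024-sector (512KB) limit, not the reporter's actual
device, split_count(3584, 1024) is 4 and tail_sectors(3584, 1024) is 512,
so every 1792KB bio would end in a half-sized request.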

>
> Comparing the results of the Antutu Sequential test to the previous
> data, it is indeed an improvement, but there is still a slight gap
> compared with limiting the bio size to max_sectors:
>
> Sequential Read (average of 5 rounds):
> Original: 3033.7 MB/sec
> Limited to max_sectors: 3520.9 MB/sec
> Aligned 256KB: 3471.5 MB/sec
>
> Sequential Write (average of 5 rounds):
> Original: 2225.4 MB/sec
> Limited to max_sectors: 2800.3 MB/sec
> Aligned 256KB: 2618.1 MB/sec

Thanks for sharing the data.
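Putting the numbers side by side: the 256KB alignment gives about +14.4%
on read and +17.6% on write over the original, versus about +16.1% and
+25.8% with max_sectors, so it recovers roughly 90% of the read gain but
only about 68% of the write gain. The helper below just reproduces that
arithmetic from the averages quoted above; the struct and function names
are made up for illustration.

```c
#include <assert.h>

/* Average sequential throughput in MB/sec, from the Antutu runs above. */
struct seq_result {
	double original;     /* unpatched kernel */
	double max_sectors;  /* bio size limited to max_sectors */
	double aligned_256k; /* 256KB-aligned patch */
};

const struct seq_result seq_read  = { 3033.7, 3520.9, 3471.5 };
const struct seq_result seq_write = { 2225.4, 2800.3, 2618.1 };

/* Percentage change of value relative to base. */
double pct_gain(double base, double value)
{
	assert(base > 0.0);
	return (value - base) / base * 100.0;
}

/* Fraction (0..1) of the max_sectors improvement recovered by 256KB alignment. */
double fraction_recovered(const struct seq_result *r)
{
	assert(r->max_sectors > r->original);
	return (r->aligned_256k - r->original) / (r->max_sectors - r->original);
}
```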

>
> What if we limit the bio size only for devices that have set
> max_sectors?

I think it may be doable, but it needs a smarter approach to avoid the
extra cost of iov_iter_revert(). One way is to add bio_shrink() (or
bio_revert()) so that the alignment runs just once.
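A rough userspace sketch of the bio_shrink() idea, assuming it runs once
after the bio has been filled: trim the tail so the bio size is a multiple
of the alignment, then hand the trimmed bytes back with a single
iov_iter_revert(). mock_bio, mock_bvec and mock_bio_shrink() are
hypothetical stand-ins, not kernel APIs, and unpinning the dropped pages
is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct bio_vec / struct bio; NOT kernel types. */
struct mock_bvec {
	size_t len;
};

struct mock_bio {
	struct mock_bvec *vecs;
	unsigned int vcnt;
	size_t size; /* like bio->bi_iter.bi_size; equals the sum of vecs[i].len */
};

/*
 * Trim the tail of a filled bio so its size is a multiple of align.
 * Returns the number of bytes trimmed, which the caller would revert on
 * the iterator in one go. A bio smaller than align is left untouched.
 */
size_t mock_bio_shrink(struct mock_bio *bio, size_t align)
{
	size_t target, trim, left;

	assert(align > 0);

	/* Largest multiple of align that fits in the current bio. */
	target = bio->size / align * align;

	/* Too small to align, or already aligned: nothing to do. */
	if (target == 0 || target == bio->size)
		return 0;

	trim = bio->size - target;
	bio->size = target;

	/* Drop or shorten trailing vecs until trim bytes are gone. */
	left = trim;
	while (left > 0) {
		struct mock_bvec *last = &bio->vecs[bio->vcnt - 1];

		if (last->len <= left) {
			left -= last->len;
			bio->vcnt--;
		} else {
			last->len -= left;
			left = 0;
		}
	}
	return trim;
}
```

The point of doing it this way is that the per-call trimming in
__bio_iov_iter_get_pages() goes away; the alignment cost is paid once per
bio instead of once per page-extraction round.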

I will think further and write a new patch if it is doable.



Thanks,
Ming