Re: 2.6.9-rc2 bio sickness with large writes

From: Jens Axboe
Date: Fri Sep 17 2004 - 02:43:49 EST


On Thu, Sep 16 2004, Jeff V. Merkey wrote:
>
> >
> >>static int end_bio_asynch(struct bio *bio, unsigned int bytes_done, int
> >>err)
> >>{
> >> ASYNCH_IO *io = bio->bi_private;
> >>
> >>
> >>
> >
> > if (bio->bi_size)
> > return 1;
> >
> >
>
> I guess the question here is if a group of coalesed bios are submitted,
> shouldn't I get a callback
> on each one? This implies if they merge beneath me, I won't always see
> the callback for each one.
> Is this an acuurate assumption?

No, you are reading it wrong. You will get a callback for each bio that
you issued, regardless of merge status. You never need to care about
merge status. But you might get partial completions from drivers, so
multiple completions on the same bio. Your code doesn't look like it's
handling that - you either need to take bytes_done into account, or just
add the above two lines if you don't care about partial completions.
When bi_size reaches 0, all io is done on this bio.

> >> register struct page *page = virt_to_page(&Sector[i * blocksize]);
> >> register ULONG offset = ((ULONG)(&Sector[i * blocksize])) %
> >> PAGE_SIZE;
> >> register ULONG bytes;
> >>
> >> bytes = bio_add_page(io->bio, page,
> >> PAGE_SIZE - (offset % PAGE_SIZE), 0);
> >>
> >>
> >
> >offset instead of 0?
> >
> >
> >
> blocksize always = PAGE_SIZE in this code. PAGE_SIZE - (offset %
> PAGE_SIZE) determines the length of the
> transfer. You could just as well substitute PAGE_SIZE for blocksize. I
> always set the device (if it supports it) to a
> PAGE_SIZE for the blocksize. 1024 blocksize is a little small.

Ok, that's not visible from the source you sent. You should still change
that to offset in case this changes. Or if not, kill the PAGE_SIZE - xxx
stuff to avoid confusion.

> >Get rid of this if/else, it's not correct. 2.6 always had
> >bio_add_page(), and you _must_ use it.
> >
> >
> >
>
> Linus should then do the same in buffer.c since this example is
> misleading in the code. Linus seems to get away with it
> so why can't I? :-)

Linus didn't write that, I did.
>
> buffer.c line 2776
> /*
> * from here on down, it's all bio -- do the initial mapping,
> * submit_bio -> generic_make_request may further map this bio around
> */
> bio = bio_alloc(GFP_NOIO, 1);
>
> bio->bi_sector = bh->b_blocknr * (bh->b_size >> 9);
> bio->bi_bdev = bh->b_bdev;
> bio->bi_io_vec[0].bv_page = bh->b_page;
> bio->bi_io_vec[0].bv_len = bh->b_size;
> bio->bi_io_vec[0].bv_offset = bh_offset(bh);
>
> bio->bi_vcnt = 1;
> bio->bi_idx = 0;
> bio->bi_size = bh->b_size;
>
> bio->bi_end_io = end_bio_bh_io_sync;
> bio->bi_private = bh;
>
> submit_bio(rw, bio);

And submit_bh() is correct, since it only uses a single page. You are
adding multiple pages to the bio manually, this is highly illegal. If
you aren't using bio_add_page() to do this you are both on your own and
in a world of pain support wise.

> >> // unplug the queue and drain the bathtub
> >> bio_get(io->bio);
> >> submit_bio(WRITE | (1 << BIO_RW_SYNC), io->bio);
> >> bio_put(io->bio);
> >>
> >>
> >
> >You don't get to get/put the bio here, you aren't touching it after
> >submit_bio().
> >
> >
> >
>
> I removed this from my code. You need to correct the example in bio.h
> with something more meaningful. This is what
> the source code says to do.
>
> bio.h line 213
>
> /*
> * get a reference to a bio, so it won't disappear. the intended use is
> * something like: *
> * bio_get(bio);
> * submit_bio(rw, bio);
> * if (bio->bi_flags ...)
> * do_something
> * bio_put(bio);
> *
> * without the bio_get(), it could potentially complete I/O before submit_bio
> * returns. and then bio would be freed memory when if (bio->bi_flags ...)
> * runs
> */

No, you need to read it properly. See how it looks at the bio after
submitting it in this example? That's why it gets a reference to it.
There's even a comment to that effect. You don't need to hold a
reference across the submit_bio() call if you don't look at the bio
afterwards, why would you care? This is, btw, no different than the bh
semantics since 2.4...

> I still see the problem and I suspect it's related to the callback
> mechanism with the bio-bi_size check always returning 1. I guess
> the question here is are bios guaranteed to always return a 1:1 ratio of
> submission vs. callbacks?

Fix your code like I described first, I cannot help you otherwise. Your
question is answered further up (it's yes).

--
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/