Re: [PATCH next] bio: zero inlined bio_vec

From: Jens Axboe
Date: Tue Dec 23 2008 - 05:32:30 EST


On Tue, Dec 23 2008, Jens Axboe wrote:
> On Tue, Dec 23 2008, Hugh Dickins wrote:
> > On Tue, 23 Dec 2008, Jens Axboe wrote:
> > > On Tue, Dec 23 2008, Hugh Dickins wrote:
> > > > bvec_alloc_bs() zeroes its bio_vec, and at least __blk_queue_bounce()
> > > > relies upon that: therefore bio_alloc_bioset() must zero the inlined
> > > > bio_vec - without that, lots of nastiness occurs in bounce_end_io and
> > > > blk_rq_map_sg and other places when booting up my PAE box.
> > >
> > > Hmm, where does it die? Nobody should look at the bio_vec index beyond
> > > bio->bi_vcnt, and 0...bio->bi_vcnt-1 should always be initialized due to
> > > the way we fill the entries.
> >
> > It dies in a great variety of places, different mmotms and different
> > nexts and different configs on that box preferring to collapse in
> > different places all over the blk/bounce/scsi/map_sg stack.
> >
> > bounce_end_io() and blk_rq_map_sg() and nommu_map_sg() stick in my
> > mind as among the locations seen, though perhaps the actual oopses
> > and BUGs were sometimes a level below.
> >
> > __blk_queue_bounce() does one transit of the bio skipping any
> > segments which don't need bouncing, then a second transit of the
> > newly allocated bounce bio assuming
> > if (!to->bv_page) {
> > means that it needs to copy from orig_bio: so if the new bio was
> > not zeroed there, it'll skip that segment and leave junk?
>
> It's because the code in question does this:
>
> __bio_for_each_segment(from, *bio_orig, i, 0) {
> to = bio_iovec_idx(bio, i);
> if (!to->bv_page) {
>
> That is, it iterates *bio_orig but indexes bio since it knows it has the
> same number of segments. So this code is the odd one out, I'd be
> surprised if we had more such cases. And since it would be nice to get
> rid of the need to memset in general, can you try with the below patch?
>
> The reason why it also oopses in other places is probably because this
> very same code can leave junk in in the bio_vec if it happens to find a
> non-NULL page there that isn't valid.
>
> diff --git a/mm/bounce.c b/mm/bounce.c
> index 06722c4..f4907d3 100644
> --- a/mm/bounce.c
> +++ b/mm/bounce.c
> @@ -181,12 +181,22 @@ static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
> struct page *page;
> struct bio *bio = NULL;
> int i, rw = bio_data_dir(*bio_orig);
> - struct bio_vec *to, *from;
> + struct bio_vec *uninitialized_var(to), *from;
>
> bio_for_each_segment(from, *bio_orig, i) {
> page = from->bv_page;
>
> /*
> + * Make sure we initialize the page element even if we
> + * don't need to bounce it, since we'll be checking for
> + * a valid page further down.
> + */
> + if (bio) {
> + to = bio->bi_io_vec + i;
> + to->bv_page = NULL;
> + }
> +
> + /*
> * is destination page below bounce pfn?
> */
> if (page_to_pfn(page) <= q->bounce_pfn)
> @@ -195,10 +205,10 @@ static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
> /*
> * irk, bounce it
> */
> - if (!bio)
> + if (!bio) {
> bio = bio_alloc(GFP_NOIO, (*bio_orig)->bi_vcnt);
> -
> - to = bio->bi_io_vec + i;
> + to = bio->bi_io_vec + i;
> + }
>
> to->bv_page = mempool_alloc(pool, q->bounce_gfp);
> to->bv_len = from->bv_len;

Nope, that still wont be enough, since we leave entries 0..i-1
uninitialized. Lets just do this instead.

diff --git a/mm/bounce.c b/mm/bounce.c
index 06722c4..877be42 100644
--- a/mm/bounce.c
+++ b/mm/bounce.c
@@ -195,11 +195,14 @@ static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
/*
* irk, bounce it
*/
- if (!bio)
- bio = bio_alloc(GFP_NOIO, (*bio_orig)->bi_vcnt);
+ if (!bio) {
+ unsigned int vcnt = (*bio_orig)->bi_vcnt;

- to = bio->bi_io_vec + i;
+ bio = bio_alloc(GFP_NOIO, vcnt);
+ memset(bio->bi_io_vec, 0,vcnt * sizeof(struct bio_vec));
+ }

+ to = bio->bi_io_vec + i;
to->bv_page = mempool_alloc(pool, q->bounce_gfp);
to->bv_len = from->bv_len;
to->bv_offset = from->bv_offset;

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/