Re: [PATCH] block: fix mis-synchronisation inblkdev_issue_zeroout()

From: Lukas Czerner
Date: Fri Mar 04 2011 - 10:04:33 EST


On Fri, 4 Mar 2011, Jeff Moyer wrote:

> Lukas Czerner <lczerner@xxxxxxxxxx> writes:
>
> > BZ29402
> > https://bugzilla.kernel.org/show_bug.cgi?id=29402
> >
> > We can hit serious mis-synchronization in bio completion path of
> > blkdev_issue_zeroout() leading to a panic.
> >
> > The problem is that when we are going to wait_for_completion() in
> > blkdev_issue_zeroout() we check if the bb.done equals issued (number of
> > submitted bios). If it does, we can skip the wait_for_completition()
> > and just out of the function since there is nothing to wait for.
> > However, there is a ordering problem because bio_batch_end_io() is
> > calling atomic_inc(&bb->done) before complete(), hence it might seem to
> > blkdev_issue_zeroout() that all bios has been completed and exit. At
> > this point when bio_batch_end_io() is going to call complete(bb->wait),
> > bb and wait does not longer exist since it was allocated on stack in
> > blkdev_issue_zeroout() ==> panic!
> >
> > (thread 1) (thread 2)
> > bio_batch_end_io() blkdev_issue_zeroout()
> > if(bb) { ...
> > if (bb->end_io) ...
> > bb->end_io(bio, err); ...
> > atomic_inc(&bb->done); ...
> > ... while (issued != atomic_read(&bb.done))
> > ... (let issued == bb.done)
> > ... (do the rest of the function)
> > ... return ret;
> > complete(bb->wait);
> > ^^^^^^^^
> > panic
>
> That's a pretty tight window. The complete is immediately following the
> increment. I'm surprised thread 2 has time to finish up and exit the
> function before the completion is done.

Yeah, maybe that is why it is so hard to hit. Eric was able to hit it
only on 48 core box running ffsb test with 192 threads. Also the panic
backtrace bug "bad spinlock magic" pointed me to this.

>
> > We can fix this easily by simplifying bio_batch and completion counting.
> > We can count completion locally in blkdev_issue_zeroout() without need of
> > locking or atomic operation because we are the only one handling issued
> > variable holding the number of submitted bios. So remove atomic_t done
> > from struct bio_batch.
>
> It seems to me like it might be better to just not complete anything
> until the count is zero. Why issue a wakeup for every bio?
> fs/direct-io does something similar, maybe take a look at the
> dio_bio_end* routines and see if that would fit well here. With your
> scheme, I worry about missing a completion, maybe because the first bio
> completes before you are done submitting bios. Is that possible?

I do not think it is possible. For every bio submitted there is
wait_for_completion called. When bio complete()s completion->done is
incremented (under the wait->lock). In wait_for_completion() we are
waiting for single submitted bio to complete (completion->done > 0),
then completion->done is decremented. It seems like simple
synchronization.

I am not sure what wakeup you have in mind, but thanks for the tip I'll
look in fs/direct-io.

Thanks!
-Lukas

>
> > Also remove bio_end_io_t *end_io since it is not used.
>
> Yeah, no idea why that was in there.
>
> Cheers,
> Jeff
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/