Re: [RFC][WIP] DIO simplification and AIO-DIO stability

From: Zach Brown
Date: Thu Feb 23 2006 - 20:19:00 EST


Suparna Bhattacharya wrote:

> A recent AIO-DIO bug reported by Kenneth Chen came very close
> to being the proverbial last straw for me.

Me too, though I found out about it from a different path. Our QA guys
were pulling drives under load and it got stuck. Trying to fix that bug
(an I/O error setting dio->result to -EIO stops finished_one_bio() from
calling aio_complete()) without introducing other regressions involved
an incredible amount of squinting and head scratching. In wandering
around I found what seem to be additional bugs:

- errors that hit after dio->result is sampled in the buffered fallback
case are lost. dio->result should be checked again after waiting.

- a few paths try to do arithmetic with dio->result assuming it's the
number of bytes transferred when it could be -EIO.

- the AIO path seems to forget to check dio->page_errors, but I didn't
look very hard to see what that means.

- the AIO bio completion paths don't populate dio->bio_list so reaping
doesn't happen in the AIO issuing case.. maybe that's intentional?
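
To make the first two items concrete, here is a rough userspace sketch,
not the real fs/direct-io.c code. struct toy_dio and every toy_* and
advance_* helper are hypothetical names made up for illustration; the
only thing taken from the discussion above is that a "result" field can
hold either a byte count or a negative errno such as -EIO, and that an
error can land after the field was first sampled.

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical stand-in for the one struct dio field discussed above. */
struct toy_dio {
	ssize_t result;	/* bytes transferred so far, or a negative errno */
};

/* Problem pattern: does arithmetic on result as if it were always bytes. */
static long long advance_offset_unchecked(long long offset,
					  const struct toy_dio *dio)
{
	return offset + dio->result;	/* moves backwards if result is -EIO */
}

/* Safer pattern: only advance when result really is a byte count. */
static long long advance_offset_checked(long long offset,
					const struct toy_dio *dio)
{
	if (dio->result < 0)
		return offset;
	return offset + dio->result;
}

/* Hypothetical model of waiting on in-flight I/O; an error may land here. */
static void toy_wait_for_io(struct toy_dio *dio, int late_error)
{
	if (late_error)
		dio->result = late_error;
}

/* Check result again after waiting rather than trusting the early sample. */
static ssize_t toy_finish(struct toy_dio *dio, int late_error)
{
	ssize_t sampled = dio->result;	/* sampled before waiting */

	toy_wait_for_io(dio, late_error);
	if (dio->result < 0)		/* re-check after waiting */
		return dio->result;
	return sampled;
}
```

With result set to -EIO the unchecked helper quietly subtracts the errno
value from the offset, and toy_finish() only reports a late error
because it looks at result a second time. The real paths are far more
tangled than this, so treat it as an illustration of the pattern only.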

> It would be quite pointless (and painful!), if the rewrite ends up becoming
> just as tricky and error prone as before. Such a patch will need a very
> close critical review by many sharp eyes, to avoid disrupting the current
> state of stability.

So, I'm all for wringing the current bugs and confusion out of the
current code. But the words "a patch" and "rewrite" terrify me. It
seems much more prudent to make progress with incremental patches that
can be tested and reviewed. Especially if that is tied to writing tests
as changes are made.

Let me think harder about the specific proposals..

- z