RE: [PATCH 2/2] aio: propogate post-EIOCBQUEUED errors to completion event

From: Ananiev, Leonid I
Date: Wed Feb 21 2007 - 14:16:21 EST


> kjournald submited buffers for IO and waiting for them to finish.

It is means that the patch incorrectly moves internal kernel
synchronization
problem into user space as EIO instead of fixing a root cause or perform
iterative
synchronization.
After patching users will be surprised a lot of EIO
(remember 47% EIO in aio-stress runs after patching) will buy new
disk...
This patch is harmful.

Leonid
>-----Original Message-----
>From: Zach Brown [mailto:zach.brown@xxxxxxxxxx]
>Sent: Wednesday, February 21, 2007 9:25 PM
>To: Ken Chen
>Cc: Ananiev, Leonid I; Chris Mason; linux-aio; Linux Kernel Mailing
List; Benjamin LaHaise; Suparna
>bhattacharya; Andrew Morton; Badari Pulavarty
>Subject: Re: [PATCH 2/2] aio: propogate post-EIOCBQUEUED errors to
completion event
>
>
>On Feb 21, 2007, at 12:35 AM, Ken Chen wrote:
>
>> On 2/20/07, Ananiev, Leonid <leonid.i.ananiev@xxxxxxxxx> wrote:
>>> 1) mem=1G in kernel boot param if you have more
>>> 2) unmount; mk2fs; mount
>>> 3) dd if=/dev/zero of=<test_file> bs=1M count=1200
>>> 4) aiostress -s 1200m -O -o 2 -i 1 -r 16k <test_file>
>>> 5) if i++<50 goto 2).
>>
>> Would you please instrument the call chain of
>> invalidate_complete_page2() and tell us exactly where it returns zero
>> value in your failure case?
>>
>> invalidate_complete_page2
>> try_to_release_page
>> ext3_releasepage
>> journal_try_to_free_buffers
>> ???
>
>For what it's worth, Badari has explained this race in the past in a
>credible way. I'll take the liberty of pasting a mail from him:
>
>"
>kjournald submited buffers for IO and waiting for them to finish.
>Note that it has a ref. against the buffer.
>
>journal_commit_transaction()
> ...
> submited buffers for IO
> /* Waiting for IO to complete */
> while (commit_transaction->t_locked_list) {
> ...
> get_bh(bh);
> if (buffer_locked(bh)) {
> spin_unlock(&journal->j_list_lock);
> wait_on_buffer(bh); <<<<<<
> spin_lock(&journal->j_list_lock);
> }
>
> ..
> put_bh(bh);
> }
>
>Now, DIO process comes to frees the jh through
>journal_try_to_free_buffers()
>but fails to drop_buffers() since kjournald() has a reference against
>it.
>invalidate_inode_pages2_range()
> ..
> ext3_releasepage()
> journal_try_to_free_buffers()
> journal_put_journal_head()
> __journal_try_to_free_buffer()
> <--- freed jh
>
> try_to_free_buffers()
> drop_buffers()
> if (buffer_busy(bh))
> goto failed;
> <<--- returns EIO due to
>b_count
>
>"
>
>I don't mean to say that we shouldn't get traces to confirm the
>theory, just sharing. And now we can point to this in the archives
>next time :).
>
>- z
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/