Re: Bug in kernel 2.6.31, Slow wb_kupdate writeout

From: Wu Fengguang
Date: Wed Jul 29 2009 - 23:19:39 EST


On Thu, Jul 30, 2009 at 10:57:35AM +0800, Martin Bligh wrote:
> > On closer looks I found this line:
> >
> >         if (inode_dirtied_after(inode, start))
> >                 break;
>
> Ah, OK.
>
> > In this case "list_empty(&sb->s_io)" is not a good criterion:
> > here we are breaking away for some other reasons, and shall
> > not touch wbc.more_io.
> >
> > So let's stick with the current code?
>
> Well, I see two problems. One is that we set more_io based on
> whether s_more_io is empty or not before we finish the loop.
> I can't see how this can be correct, especially as there can be
> other concurrent writers. So somehow we need to check when
> we exit the loop, not during it.

It is correct inside the loop, though it adds some overhead.

We put it inside the loop because sometimes the whole filesystem is
skipped, and we should not set more_io for it whether or not s_more_io
is empty.

> The other is that we're saying we are setting more_io when
> nr_to_write is <=0 ... but we only really check it when
> nr_to_write is > 0 ... I can't see how this can be useful?

That's the caller's fault - I guess the logic was changed a bit by
Jens in linux-next. I noticed this just now. It shall be fixed.

> I'll admit there is one corner case when pages_skipped is set
> from one of the branches, but I am really not sure what the
> intended logic is here, given the above?
>
> In the case where we hit the inode_dirtied_after break
> condition, is it bad to set more_io ? There is more to do
> on that inode after all. Is there a definition somewhere for
> exactly what the more_io flag means?

"More dirty pages to be put to io"?

The exact semantics of more_io is determined by the caller,
which used to be (in 2.6.31):

background_writeout():

        if (wbc.nr_to_write > 0 || wbc.pages_skipped > 0) {
                /* Wrote less than expected */
                if (wbc.encountered_congestion || wbc.more_io)
                        congestion_wait(BLK_RW_ASYNC, HZ/10);
                else
                        break;
        }

wb_kupdate() is the same, except that it does not check pages_skipped.
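
To make the difference between the two callers concrete, here is a
minimal userspace sketch of just that decision. It is not kernel code:
struct wbc_model, enum wb_action and the two *_decision() helpers are
hypothetical names made up for illustration, and WB_CONTINUE assumes
that the surrounding loop simply goes around again when the outer
condition is false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few writeback_control fields discussed here. */
struct wbc_model {
        long nr_to_write;            /* remaining page budget after the pass */
        long pages_skipped;          /* pages that could not be written */
        bool encountered_congestion;
        bool more_io;
};

/* Illustrative names only, not kernel identifiers. */
enum wb_action {
        WB_CONTINUE,        /* budget used up: loop again right away (assumed) */
        WB_WAIT_AND_RETRY,  /* corresponds to congestion_wait(), then loop */
        WB_STOP             /* corresponds to break */
};

/* Models the quoted background_writeout() check. */
static enum wb_action background_writeout_decision(const struct wbc_model *wbc)
{
        assert(wbc != NULL);
        if (wbc->nr_to_write > 0 || wbc->pages_skipped > 0) {
                /* Wrote less than expected */
                if (wbc->encountered_congestion || wbc->more_io)
                        return WB_WAIT_AND_RETRY;
                return WB_STOP;
        }
        return WB_CONTINUE;
}

/* Same check, but pages_skipped is ignored, as described for wb_kupdate(). */
static enum wb_action wb_kupdate_decision(const struct wbc_model *wbc)
{
        assert(wbc != NULL);
        if (wbc->nr_to_write > 0) {
                if (wbc->encountered_congestion || wbc->more_io)
                        return WB_WAIT_AND_RETRY;
                return WB_STOP;
        }
        return WB_CONTINUE;
}
```

In this model, more_io only matters once the caller has already decided
that less was written than expected; with the budget fully used and no
skipped pages, both helpers return WB_CONTINUE regardless of more_io.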

Note that in 2.6.31, more_io is not used at all for sync().

Thanks,
Fengguang