Re: I/O and pdflush

From: Wu Fengguang
Date: Sun Jul 12 2009 - 04:04:44 EST


On Sat, Jul 11, 2009 at 02:27:25PM -0300, Fernando Silveira wrote:
> Hi.
>
> I'm having a hard time with an application that writes sequentially
> 250GB of non-stop data directly to a solid state disk (OCZ SSD CORE
> v2) device and I hope you can help me. The command "dd if=/dev/zero
> of=/dev/sdc bs=4M" reproduces the same symptoms I'm having and writes
> exactly as that application does.
>
> The problem is that after writing at 70MB/s for some time, the rate
> eventually drops to about 25MB/s and does not recover for a long
> time (from 1 to 30 minutes). This happens much more often when the
> "vm.dirty_*" settings are at their defaults (30 secs to expire,
> 5 secs for writeback, 10% and 40% for background and normal ratio).
> When I set them to 1 second or even 0, the problem happens much
> less often and the 25MB/s periods are much shorter.
>
> In one of my experiments, I could see that writing some blocks of
> data (approx. 48 blocks of 4MB each time) at a random position of the
> "disk" increases the chances of the writing rate dropping to 25MB/s.
> You can see in this graph[1] that after the 7th random big write (at
> 66 GB) it falls down to 25MB/s. The writes happened at the following
> positions (in GB): 10, 20, 30, 39, 48, 57, 66, 73, 80, 90, 100, 109,
> 118, 128, 137, 147, and 156 GB.
>
> As I don't know much about kernel internals, my guess is that the SSD
> might be "hiccuping" and that some kernel I/O scheduler or pdflush
> mechanism decreases its rate to avoid write errors, but I don't know.
>
> Could somebody tell me how I could debug the kernel and any of its
> modules to understand exactly why the writing is behaving this way?
> Maybe I could do it just by logging write errors or something, I don't
> know. Telling me which part I should start analyzing would be a huge
> hint, seriously.
>
> Thanks.
>
> 1. http://rootshell.be/~swrh/ssd-tests/ssd-no_dirty_buffer_with_random_192mb_writes.png
>
> PS: This is used with two A/D converters which provide 25MB/s of data
> each, leading my writing software to need at least 50MB/s of
> sequential writing rate.

Hi Fernando,

What's your kernel version? Can the following patch help?

Thanks,
Fengguang
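
For anyone collecting the details asked for above, here is a minimal
userspace sketch (not part of the patch) that prints the running kernel
release and the vm.dirty_* values mentioned in the report. The helper
names read_long_from_file() and print_writeback_settings() are
hypothetical; the four paths are the standard procfs names for those
sysctls, and uname(2) supplies the release string.

```c
#include <stdio.h>
#include <sys/utsname.h>

/*
 * Hypothetical helper: read one integer from a procfs-style file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
int read_long_from_file(const char *path, long *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%ld", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * Hypothetical helper: print the kernel release and the vm.dirty_*
 * knobs to the given stream. Returns how many knobs were readable.
 */
int print_writeback_settings(FILE *to)
{
	static const char *const knobs[] = {
		"/proc/sys/vm/dirty_background_ratio",
		"/proc/sys/vm/dirty_ratio",
		"/proc/sys/vm/dirty_expire_centisecs",
		"/proc/sys/vm/dirty_writeback_centisecs",
	};
	struct utsname u;
	size_t i;
	int found = 0;

	if (uname(&u) == 0)
		fprintf(to, "kernel: %s\n", u.release);

	for (i = 0; i < sizeof(knobs) / sizeof(knobs[0]); i++) {
		long v;

		if (read_long_from_file(knobs[i], &v) == 0) {
			fprintf(to, "%s = %ld\n", knobs[i], v);
			found++;
		} else {
			fprintf(to, "%s: unreadable\n", knobs[i]);
		}
	}
	return found;
}
```

Calling print_writeback_settings(stdout) from a small main() before and
after changing the sysctls gives a record of the settings each test run
actually used.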

---
fs/fs-writeback.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- mm.orig/fs/fs-writeback.c
+++ mm/fs/fs-writeback.c
@@ -325,7 +325,8 @@ __sync_single_inode(struct inode *inode,
 				 * soon as the queue becomes uncongested.
 				 */
 				inode->i_state |= I_DIRTY_PAGES;
-				if (wbc->nr_to_write <= 0) {
+				if (wbc->nr_to_write <= 0 ||
+				    wbc->encountered_congestion) {
 					/*
 					 * slice used up: queue for next turn
 					 */
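
To make the effect of the changed condition easier to see, here is a
small userspace model of it. This is only a sketch: struct wbc_model
and the two functions are hypothetical stand-ins that mirror the two
fields used in the hunk (nr_to_write and encountered_congestion), not
the kernel's struct writeback_control. What the kernel does when the
condition is false lies outside the hunk and is not modelled.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two fields the hunk tests. */
struct wbc_model {
	long nr_to_write;             /* pages left in this writeback slice */
	bool encountered_congestion;  /* set when the queue was congested */
};

/* Old condition: take the "queue for next turn" branch only when the slice is used up. */
bool requeue_before_patch(const struct wbc_model *w)
{
	return w->nr_to_write <= 0;
}

/* New condition: also take that branch when congestion was encountered. */
bool requeue_after_patch(const struct wbc_model *w)
{
	return w->nr_to_write <= 0 || w->encountered_congestion;
}
```

The only case where the two functions disagree is a slice with pages
still left to write (nr_to_write > 0) that ran into a congested queue:
the old test returns false there, the new one returns true.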