Re: [PATCH] Fix mapping->writeback_index to point to the lastwritten page

From: Wu Fengguang
Date: Thu Mar 03 2011 - 08:31:28 EST


On Thu, Mar 03, 2011 at 10:26:19AM +0800, Jun'ichi Nomura wrote:
> Hi Jan,
>
> thank you for the comments.
>
> On 03/03/11 07:18, Jan Kara wrote:
> > On Fri 25-02-11 16:55:19, Jun'ichi Nomura wrote:
> >> I think it's intended for sequential writer.
> > Not exactly. The code is meant so that background writeback gets to
> > writing the end of a file which gets continuously dirtied (if we always
> > started at the beginning, nr_to_write could always get to 0 before we reach
> > end of the file).
>
> Ah, ok. Thanks.
> My patch doesn't break the above goal.

Yeah.

> >> Otherwise, the last written page was left dirty until the writeback
> >> wraps around.
> >>
> >> I.e. if the sequential dirtier has written on pagecache as '*'s below:
> >>
> >> |*******|*******|****---|-------|-------| ( |---| is a page )
> >>
> >> then, writeback happens:
> >>
> >> |-------|-------|-------|-------|-------|
> >>
> >> and the dirtier continues:
> >>
> >> |-------|-------|----***|*******|*****--|
> >> A B
> >>
> >> Next writeback should start from page A, not B.
> > Yes, this is downside of our current scheme. Have you actually observed
> > it in practice or is it mostly a theoretic concern?
>
> Half practical, half theoretic.
> It has been observed with a real application, which uses a big ring file.
>
> I took a trace with a test program for example:
>
> [1st writeback session]
> ...
> flush-8:0-2743 4571: block_bio_queue: 8,0 W 94898514 + 8
> flush-8:0-2743 4571: block_bio_queue: 8,0 W 94898522 + 8
> flush-8:0-2743 4571: block_bio_queue: 8,0 W 94898530 + 8
> flush-8:0-2743 4571: block_bio_queue: 8,0 W 94898538 + 8
> flush-8:0-2743 4571: block_bio_queue: 8,0 W 94898546 + 8
> kworker/0:1-11 4571: block_rq_issue: 8,0 W 0 () 94898514 + 40
> >> flush-8:0-2743 4571: block_bio_queue: 8,0 W 94898554 + 8
> >> flush-8:0-2743 4571: block_rq_issue: 8,0 W 0 () 94898554 + 8
>
> [2nd writeback session after 35sec]
> flush-8:0-2743 4606: block_bio_queue: 8,0 W 94898562 + 8
> flush-8:0-2743 4606: block_bio_queue: 8,0 W 94898570 + 8
> flush-8:0-2743 4606: block_bio_queue: 8,0 W 94898578 + 8
> ...
> kworker/0:1-11 4606: block_rq_issue: 8,0 W 0 () 94898562 + 640
> kworker/0:1-11 4606: block_rq_issue: 8,0 W 0 () 94899202 + 72
> ...

> flush-8:0-2743 4606: block_bio_queue: 8,0 W 94899962 + 8
> flush-8:0-2743 4606: block_bio_queue: 8,0 W 94899970 + 8
> flush-8:0-2743 4606: block_bio_queue: 8,0 W 94899978 + 8
> flush-8:0-2743 4606: block_bio_queue: 8,0 W 94899986 + 8
> flush-8:0-2743 4606: block_bio_queue: 8,0 W 94899994 + 8
==> > kworker/0:1-11 4606: block_rq_issue: 8,0 W 0 () 94899962 + 40
> >> flush-8:0-2743 4606: block_bio_queue: 8,0 W 94898554 + 8
==> > >> flush-8:0-2743 4606: block_rq_issue: 8,0 W 0 () 94898554 + 8

I'd expect the wrapped around 94898554+8 to be merged with 94899962+8.
Why kworker/0:1-11 is submitting the request early? And the second
request is submitted by flush-8:0-2743.

> The 1st writeback ended at block 94898562. (94898554+8)
> The 2nd writeback started there.
> However, since the last page at the 1st writeback was just redirtied,
> the 2nd writeback looped back to block 94898554 after sequentially
> submitting blocks from 94898562 to 94900001.
>
> 1 extra seek which could be avoided.
> I haven't seen fatal problem with the latest kernel, though.
>
> With older kernels (before 2.6.29, without commit 31a12666),
> kupdate leaves the dirty pages like spots until the application wraps
> around the ring. (It could take hours to days.)
> That led me to this code.
>
> > But as I'm thinking about it, it wouldn't harm our original aim to do
> > what you propose and it can help this relatively common case. So I think
> > it's a good idea. Fengguang, what do you think?

I see no problem too.

Tested-by: Wu Fengguang <fengguang.wu@xxxxxxxxx>

I compared the writeback_single_inode trace (https://lkml.org/lkml/2011/3/3/73)
and find no differences other than the 1-offset in the index field.

writeback_single_inode: bdi 8:0: ino=12 state=I_DIRTY_PAGES age=292 wrote=8192 to_write=-1 index=8191
writeback_single_inode: bdi 8:0: ino=12 state=I_DIRTY_PAGES age=2820 wrote=8192 to_write=0 index=16383
writeback_single_inode: bdi 8:0: ino=12 state=I_DIRTY_PAGES age=8644 wrote=8192 to_write=0 index=24575
writeback_single_inode: bdi 8:0: ino=12 state=I_DIRTY_PAGES age=15664 wrote=8192 to_write=0 index=32767
writeback_single_inode: bdi 8:0: ino=12 state=I_DIRTY_PAGES age=21660 wrote=8192 to_write=0 index=40959
writeback_single_inode: bdi 8:0: ino=12 state=I_DIRTY_PAGES age=27592 wrote=8192 to_write=0 index=49151
writeback_single_inode: bdi 8:0: ino=12 state=I_DIRTY_PAGES age=34700 wrote=8192 to_write=0 index=57343
writeback_single_inode: bdi 8:0: ino=12 state=I_DIRTY_PAGES age=40936 wrote=8192 to_write=0 index=65535
writeback_single_inode: bdi 8:0: ino=12 state=I_DIRTY_PAGES age=47556 wrote=8192 to_write=0 index=73727
writeback_single_inode: bdi 8:0: ino=12 state=I_DIRTY_PAGES age=53660 wrote=8192 to_write=0 index=81919
writeback_single_inode: bdi 8:0: ino=12 state=I_DIRTY_PAGES age=60220 wrote=8192 to_write=0 index=90111
writeback_single_inode: bdi 8:0: ino=12 state=I_DIRTY_PAGES age=67156 wrote=8192 to_write=0 index=98303
writeback_single_inode: bdi 8:0: ino=12 state=I_DIRTY_PAGES age=73796 wrote=8192 to_write=0 index=106495
writeback_single_inode: bdi 8:0: ino=12 state=I_DIRTY_PAGES age=79948 wrote=8192 to_write=0 index=114687
writeback_single_inode: bdi 8:0: ino=12 state=I_DIRTY_PAGES age=86176 wrote=8192 to_write=0 index=122879

writeback_single_inode: bdi 8:0: ino=12 state=I_DIRTY_PAGES age=480 wrote=8192 to_write=-1 index=8192
writeback_single_inode: bdi 8:0: ino=12 state=I_DIRTY_PAGES age=536 wrote=8192 to_write=0 index=16384
writeback_single_inode: bdi 8:0: ino=12 state=I_DIRTY_PAGES age=828 wrote=8192 to_write=0 index=24576
writeback_single_inode: bdi 8:0: ino=12 state=I_DIRTY_PAGES age=1540 wrote=8192 to_write=0 index=32768
writeback_single_inode: bdi 8:0: ino=12 state=I_DIRTY_PAGES age=2084 wrote=8192 to_write=0 index=40960
writeback_single_inode: bdi 8:0: ino=12 state=I_DIRTY_PAGES age=2620 wrote=8192 to_write=0 index=49152
writeback_single_inode: bdi 8:0: ino=12 state=I_DIRTY_PAGES age=3156 wrote=8192 to_write=0 index=57344
writeback_single_inode: bdi 8:0: ino=12 state=I_DIRTY_PAGES age=4008 wrote=8192 to_write=0 index=65536
writeback_single_inode: bdi 8:0: ino=12 state=I_DIRTY_PAGES age=4964 wrote=8192 to_write=0 index=73728
writeback_single_inode: bdi 8:0: ino=12 state=I_DIRTY_PAGES age=5560 wrote=8192 to_write=0 index=81920
writeback_single_inode: bdi 8:0: ino=12 state=I_DIRTY_PAGES age=6132 wrote=8192 to_write=0 index=90112
writeback_single_inode: bdi 8:0: ino=12 state=I_DIRTY_PAGES age=7436 wrote=8192 to_write=-180 index=98304
writeback_single_inode: bdi 8:0: ino=12 state=I_DIRTY_PAGES age=8200 wrote=8192 to_write=0 index=106496
writeback_single_inode: bdi 8:0: ino=12 state=I_DIRTY_PAGES age=8652 wrote=8192 to_write=0 index=114688
writeback_single_inode: bdi 8:0: ino=12 state=I_DIRTY_PAGES age=9424 wrote=8192 to_write=0 index=122880

Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/