Re: msync() behaviour broken for MS_ASYNC, revert patch?

From: Linus Torvalds
Date: Fri Feb 10 2006 - 12:59:33 EST




On Sat, 11 Feb 2006, Nick Piggin wrote:
>
> It seems very obvious to me that it is a hint. If you were expecting
> to call msync(MS_SYNC) at some point, then you could hope that hinting
> with msync(MS_ASYNC) at some point earlier might improve its efficiency.

And it will. MS_ASYNC tells the system about dirty pages. It _should_
actually initiate writeback if the system decides that it has lots of
dirty pages. Of course, if the system doesn't have a lot of dirty pages,
the kernel will decide that no writeback is necessary.

If you (as an application) know that you will wait for the IO later (which
is _not_ what MS_ASYNC talks about), why don't you just start it?

Ie, what's wrong with Andrew's patch, which is also what I encourage?

I contend that "mmap + MS_ASYNC" should work as "write()". That's just
_sensible_.
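
To make the comparison concrete, here is a minimal userspace sketch of the
two paths being equated: storing bytes with write(), and storing the same
bytes through a shared mapping followed by msync(MS_ASYNC). The function
names and the 0600 mode are made up for illustration. The sketch only shows
that neither call is asked to wait for the disk; what the kernel actually
does with the dirty pages is the policy question discussed here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: put `len` bytes at the start of `path` with
 * write(). The data becomes dirty page cache; the call normally returns
 * without waiting for the disk. Returns 0 on success, -1 on error. */
int store_with_write(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    int rc = (write(fd, buf, len) == (ssize_t)len) ? 0 : -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/* Hypothetical helper: same result through a shared mapping. The memcpy
 * dirties the pages and msync(MS_ASYNC) tells the system about them
 * without asking to wait for the IO. */
int store_with_mmap(const char *path, const void *buf, size_t len)
{
    assert(len > 0);                    /* mmap() rejects a zero length */
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {   /* file must cover the mapping */
        close(fd);
        return -1;
    }
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    memcpy(p, buf, len);
    int rc = msync(p, len, MS_ASYNC);
    if (munmap(p, len) != 0)
        rc = -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Both helpers leave the same bytes in the file; the only difference is
which interface dirtied the pages.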

Btw, you can equally well make the argument that "write()" is a hint that
we should start IO, so that if we do fdatasync() later, it will finish
more quickly. It's _true_. It just isn't the whole truth. It makes things
_slower_ if you don't do fdatasync(), the same way you can do MS_ASYNC
without doing MS_SYNC afterwards.
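
The parallel between the two pairs of calls can be sketched as follows.
This is an illustration only: the helper names are hypothetical, and the
caller is assumed to pass an open descriptor or a MAP_SHARED mapping that
covers at least `len` bytes. In each function the first step dirties the
data without asking to wait, and the second step is the one that waits.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path` and size it to `len` bytes so it
 * can back a mapping. Returns the descriptor, or -1 on error. */
int open_sized(const char *path, size_t len)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd >= 0 && ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* write() side: dirty the data, then wait for it with fdatasync(). */
int dirty_then_wait_fd(int fd, const void *buf, size_t len)
{
    if (pwrite(fd, buf, len, 0) != (ssize_t)len)
        return -1;
    /* ... unrelated work could happen here ... */
    return fdatasync(fd);               /* this is the call that waits */
}

/* mmap side: dirty the pages, announce them with MS_ASYNC, then wait
 * for them with MS_SYNC. */
int dirty_then_wait_map(void *map, const void *buf, size_t len)
{
    assert(map != NULL && map != MAP_FAILED);
    memcpy(map, buf, len);
    if (msync(map, len, MS_ASYNC) != 0) /* does not ask to wait */
        return -1;
    /* ... unrelated work could happen here ... */
    return msync(map, len, MS_SYNC);    /* this is the call that waits */
}
```

Dropping the final fdatasync() or MS_SYNC call from either function
leaves a valid program; it just never waits for the IO.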

Now, if your argument is more general, aka "we should do better at
writeback in general", I actually wouldn't disagree. We probably _should_
do better at write-back. The "sync every five seconds" causes pulses of
(efficient) IO, but it also allows for lots of dirty stuff to have
collected for no good reason, and causes bad IO latency for reads when it
happens.

So if you were to argue _in_general_ for smoother write-back, I wouldn't
actually object at all. I think it would potentially make much sense to
make both "write()" _and_ things like msync(MS_ASYNC) perhaps see if the
IO queue has been idle for a second, and if so, start trickling writes
out.
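
As a toy model of that idea (not kernel code, and not an existing
interface), the decision could look roughly like the sketch below. The
struct, its fields, the function name and the `batch` cap are all
assumptions invented for illustration; only the one-second idle
threshold comes from the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical snapshot of writeback state; not a kernel structure. */
struct wb_snapshot {
    long   now_ms;       /* current time, in milliseconds */
    long   last_io_ms;   /* when the IO queue last had work */
    size_t dirty_pages;  /* pages waiting to be written */
};

/* How many pages to start writing right now: nothing while the queue
 * has been busy within the last second, otherwise a small batch so the
 * writes trickle out instead of arriving in one pulse. */
size_t pages_to_trickle(const struct wb_snapshot *s, size_t batch)
{
    const long idle_threshold_ms = 1000;

    assert(s != NULL);
    if (s->dirty_pages == 0)
        return 0;                       /* nothing to write */
    if (s->now_ms - s->last_io_ms < idle_threshold_ms)
        return 0;                       /* queue was busy recently */
    return s->dirty_pages < batch ? s->dirty_pages : batch;
}
```

Something like this would be consulted from both the write() path and
the msync(MS_ASYNC) path, which is the point: the policy is about
writeback in general, not about one system call.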

I bet that would be lovely. I hate how un-tarring a big tree tends to have
these big hiccups, and "vmstat 1" shows that the disk isn't even writing
all the time until half-way through the "untar".

IOW, I think you could re-phrase your argument in a more generic way, and
I might well _agree_ with it. I just don't think it has anything to do
with MS_ASYNC _in_particular_.

Linus