Re: msync() behaviour broken for MS_ASYNC, revert patch?

From: Linus Torvalds
Date: Fri Feb 10 2006 - 14:42:51 EST

On Sat, 11 Feb 2006, Nick Piggin wrote:
> >
> > Your pattern would actually be
> >
> > .. dirty offset 100-200 ..
> > fadvise(fd, 100, 200, FADV_WRITE_START);
> >
> > .. dirty offset 200-300 ..
> > fadvise(fd, 200, 300, FADV_WRITE_START);
> >
> > .. dirty offset 300-400 ..
> > fadvise(fd, 300, 400, FADV_WRITE_START);
> >
> > .. dirty offset 400-415 .. (for the next transaction)
> >
>
> - IOW if the app or OS crashed here it would be possible to see 400-415 on
> the disk and none of the previous transactions (assuming we don't know
> the page size).

If the app/OS crashed here, nothing would matter. We haven't committed
anything at all yet. We've just started the IO. What is at 400-415 simply
doesn't matter, because nobody would have any reason to look at it.

(Besides, it's not at all clear that 400-415 would or would not be on
disk. It depends entirely on the timing and buffering of the IO system at
that point - the fact that it's dirty in memory doesn't mean that it ever
made it into the IO buffer that was started.)

> > fadvise(fd, 100, 400, FADV_JUST_WAIT); (for the previous one)

This is the one that waits for it to finish, so _now_ we can update the
pointers (elsewhere) to that log (and if the app/OS crashes before that,
nobody will even know about it).

See?
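For anyone wanting to try the quoted start/wait pattern: FADV_WRITE_START and FADV_JUST_WAIT were proposals in this thread, not real posix_fadvise() advice values. Linux later gained sync_file_range() (kernel 2.6.17), which can express roughly the same split between starting writeback on a range and waiting for it. The sketch below assumes Linux with glibc; the helpers write_and_start(), wait_written() and replay_pattern() are hypothetical names, not an existing API. Note that sync_file_range() neither writes file metadata nor flushes the drive's write cache, so it orders IO but is not by itself a durability guarantee.

```c
#define _GNU_SOURCE /* sync_file_range() is a Linux extension */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Dirty [off, off+len) and queue writeback for just that range.
 * Stands in for the hypothetical FADV_WRITE_START: returns once the
 * IO has been started, without waiting for it to finish. */
static int write_and_start(int fd, off_t off, const void *buf, size_t len)
{
    ssize_t n = pwrite(fd, buf, len, off);
    if (n < 0 || (size_t)n != len)
        return -1;
    return sync_file_range(fd, off, (off_t)len, SYNC_FILE_RANGE_WRITE);
}

/* Block until [off, off+len) has been written back.
 * Stands in for the hypothetical FADV_JUST_WAIT. With all three flags
 * the kernel also writes any page re-dirtied since it was started. */
static int wait_written(int fd, off_t off, off_t len)
{
    return sync_file_range(fd, off, len,
                           SYNC_FILE_RANGE_WAIT_BEFORE |
                           SYNC_FILE_RANGE_WRITE |
                           SYNC_FILE_RANGE_WAIT_AFTER);
}

/* Replays the quoted sequence on a scratch file; returns 0 on success. */
static int replay_pattern(void)
{
    char path[] = "/tmp/logXXXXXX";
    char rec[100], back[100];
    int rc = 0;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path); /* scratch file, removed when fd is closed */

    /* .. dirty offset 100-200, 200-300, 300-400, starting IO on each .. */
    for (off_t off = 100; off < 400 && rc == 0; off += 100) {
        memset(rec, 'a' + (int)(off / 100), sizeof rec);
        rc = write_and_start(fd, off, rec, sizeof rec);
    }
    /* .. dirty offset 400-415 (for the next transaction) .. */
    if (rc == 0 && pwrite(fd, "next txn record", 15, 400) != 15)
        rc = -1;
    /* Wait for the previous one; only after this would the app update
     * the pointers (elsewhere) that refer to 100-400. */
    if (rc == 0)
        rc = wait_written(fd, 100, 300);
    /* Sanity check: the 300-400 record reads back as written. */
    if (rc == 0 && (pread(fd, back, sizeof back, 300) != (ssize_t)sizeof back
                    || back[0] != 'd'))
        rc = -1;
    close(fd);
    return rc;
}
```

sync_file_range() works on whole pages, and every offset here falls in the first page, so the final wait also writes bytes 400-415. That is exactly the timing and buffering caveat above: being dirty in memory says nothing definite about what reached the disk.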

> I'm not convinced. Your above example was bogus.

No, your understanding was incomplete. I'm talking about just parts of a
much bigger transaction.

A single write on its own is almost never a transaction unless your system
is _purely_ log-based (which it could be, of course. Not in my example).
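The point that the commit is a later pointer update, not the log write itself, can be sketched as the ordering below. The file layout (an 8-byte committed-length header ahead of the log body) and the name commit_record() are hypothetical. For brevity it calls fdatasync() after each step, which serialises the IO rather than overlapping it the way the start/wait split above would. It also assumes an aligned 8-byte header write is not torn by a crash, something a real system would want to verify, for example with a checksum.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical file layout: bytes 0-7 hold the committed log length in
 * native byte order; the log body follows. Readers ignore anything past
 * the committed length. */
#define HDR_SIZE 8

/* Append one record at the committed tail and publish it.
 * Returns 0 on success, -1 on any write or sync failure. */
static int commit_record(int fd, uint64_t tail, const void *rec, size_t len)
{
    /* 1. Write the record beyond the committed tail. A crash here is
     *    harmless: nothing points at these bytes yet. */
    if (pwrite(fd, rec, len, (off_t)(HDR_SIZE + tail)) != (ssize_t)len)
        return -1;

    /* 2. Wait until the record itself is on stable storage. */
    if (fdatasync(fd) != 0)
        return -1;

    /* 3. Only now update the pointer that makes the record visible,
     *    and make that durable too. */
    uint64_t new_tail = tail + len;
    if (pwrite(fd, &new_tail, sizeof new_tail, 0) != (ssize_t)sizeof new_tail)
        return -1;
    return fdatasync(fd);
}
```

If the process or OS dies between steps 1 and 3, the header still holds the old length, so the half-written record is invisible, which is the "nobody will even know about it" property.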

Linus