Re: replace() system call needed (was Re: EXT4-ish "fixes" in UBIFS)

From: Artem Bityutskiy
Date: Sun Mar 29 2009 - 08:43:24 EST


Pavel Machek wrote:
On Fri 2009-03-27 14:48:10, Artem Bityutskiy wrote:
UBIFS has exactly the same properties as ext4 in case
of power cuts:

1. truncate/write/close leads to empty files
2. create/write/rename leads to empty files

UBIFS is used in hand-held devices, and power cuts are very
common there, because users often just remove the battery.

I understand the "reality is different" argument, and have already
concluded that we need changes similar to the ones Theo has done
for ext4:
http://git.kernel.org/?p=linux/kernel/git/tytso/ext4.git;a=commitdiff;h=bf1b69c0db7f9b9d8f02e94d40b19fca8336b991
http://git.kernel.org/?p=linux/kernel/git/tytso/ext4.git;a=commitdiff;h=f32b730a69bd56c5c9d704d8b75f03e90e290971
http://git.kernel.org/?p=linux/kernel/git/tytso/ext4.git;a=commitdiff;h=8411e347c3306ed36b8ca88611bf5fbf4d27d705

The problem is that user-space people do not want to
use fsync(), even when they are pointed at their own code
which does create/write/rename/close without fsync().

Well... they really don't want to spin the disk up for the
fsync(). I'm not sure fsync() is really a sensible operation to use
there.

I'm personally concerned about hand-helds, and in the case of UBIFS
fsync() is not too expensive: we work on flash, and on fsync() we
write back only the data belonging to the inode in question, and
nothing else.

1. truncate/write/close leads to empty files

this is buggy.

In FS, or in application?

2. create/write/rename leads to empty files

...but this should not happen. If we want to make that explicit, we should
provide a "replace()" operation, where replace is a rename that makes sure
the source file is completely on media before committing the rename.

Well, OK, we can fsync() before rename; we just need clear rules
for this, so that all Linux FSes follow them. It would be nice
to have a final agreement on all this stuff.

It is somewhat similar to fsync()/rename(), but does not force a disk
spin-up immediately; it only inserts a "barrier" between the data blocks
and the rename. (And yes, it should be implemented as fsync()+rename() for
filesystems like xfs. It can be implemented as a plain rename for ext3
and ext4 after the fixes...)

Right. But I guess only a few file systems would really implement
this, because it is complex.

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)