Re: ftruncate-mmap: pages are lost after writing to mmaped file.

From: Aneesh Kumar K.V
Date: Thu Mar 26 2009 - 04:48:00 EST


On Tue, Mar 24, 2009 at 04:29:59PM +0100, Jan Kara wrote:
> On Tue 24-03-09 15:56:03, Peter Zijlstra wrote:
> > On Tue, 2009-03-24 at 15:47 +0100, Jan Kara wrote:
> > >
> > > Or we could implement ext3_mkwrite() to allocate buffers already when we
> > > make page writeable. But it costs some performance (we have to write page
> > > full of zeros when allocating those buffers, where previously we didn't
> > > have to do anything) and it's not trivial to make it work if pagesize >
> > > blocksize (we should not allocate buffers outside of i_size so if i_size
> > > = 1024, we create just one block in ext3_mkwrite() but then we need to
> > > allocate more when we extend the file).
> >
> > I think this is the best option, failing with SIGBUS when we fail to
> > allocate blocks seems consistent with other filesystems as well.
> I agree this looks attractive at the first sight. But there are drawbacks
> as I wrote - the problem with blocksize < pagesize, slight performance
> decrease due to additional write,

It should not cause an additional write. Can you let me know why it
would result in an additional write?


> page faults doing allocation can take a
> *long* time

That is true.

> and overall fragmentation is going to be higher (previously
> writepage wrote pages for us in the right order, now we are going to
> allocate in the first-accessed order). So I'm not sure we really want to
> go this way.


The block allocator should be improved to fix that. For example, ext4
mballoc also looks at the logical file block number when doing block
allocation. So if we do enough reservation, it should handle both
first-accessed-order and sequential-order allocation properly.
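
To illustrate the point about logical block numbers, here is a rough
userspace toy model. It is not ext4 mballoc code, and every name in it
(toy_reservation, toy_alloc) is made up for this sketch. It assumes a
contiguous window has already been reserved for the file; each block is
then placed at window start + logical block, so the on-disk layout is
the same whether pages are touched in order or not.

```c
#include <assert.h>

/* Hypothetical toy model, NOT ext4 mballoc code. */
struct toy_reservation {
	unsigned long start;	/* first physical block of the reserved window */
	unsigned long len;	/* number of blocks in the window */
};

/*
 * Place a logical file block inside the reserved window.
 * Returns 0 and sets *physical on success, or -1 if the logical
 * block falls outside the window (a real allocator would have to
 * extend or replace the reservation in that case).
 */
static int toy_alloc(const struct toy_reservation *r, unsigned long logical,
		     unsigned long *physical)
{
	if (logical >= r->len)
		return -1;
	*physical = r->start + logical;
	return 0;
}
```

With a window starting at block 1000, faulting logical block 5 first
and logical block 0 later still yields physical blocks 1005 and 1000,
i.e. the sequential layout.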

Another reason why I think we need ext3_page_mkwrite: if we really are
out of space, how do we handle it? Currently the patch you posted calls
redirty_page_for_writepage, which implies we can't reclaim the page,
and since get_block gets ENOSPC we can't allocate blocks for it either.
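
A small userspace model of the difference, as I understand it. This is
only a sketch: toy_fs, toy_page, toy_get_block, toy_page_mkwrite and
toy_writepage are hypothetical names, not the ext3 functions. When the
allocation happens at fault time, the failure goes back to the faulting
task and the page never becomes dirty. When it happens at writeback
time, the page has to stay dirty and cannot be cleaned or reclaimed
until space shows up.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical toy model, NOT the ext3 code paths. */
struct toy_fs   { unsigned long free_blocks; };
struct toy_page { int dirty; int mapped; };

static int toy_get_block(struct toy_fs *fs)
{
	if (fs->free_blocks == 0)
		return -ENOSPC;
	fs->free_blocks--;
	return 0;
}

/* Allocate at fault time: on failure the page is never dirtied. */
static int toy_page_mkwrite(struct toy_fs *fs, struct toy_page *pg)
{
	if (!pg->mapped) {
		int err = toy_get_block(fs);

		if (err)
			return err;	/* the caller would raise SIGBUS */
		pg->mapped = 1;
	}
	pg->dirty = 1;
	return 0;
}

/* Allocate at writeback time: on failure the page stays dirty. */
static int toy_writepage(struct toy_fs *fs, struct toy_page *pg)
{
	if (!pg->mapped) {
		int err = toy_get_block(fs);

		if (err) {
			pg->dirty = 1;	/* redirtied, cannot be reclaimed */
			return err;
		}
		pg->mapped = 1;
	}
	pg->dirty = 0;
	return 0;
}
```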

> Hmm, maybe we could play a trick ala delayed allocation - i.e., reserve
> some space in mkwrite() but don't actually allocate it. That would be done
> in writepage(). This would solve all the problems I describe above. We could
> use PG_Checked flag to track that the page has a reservation and behave
> accordingly in writepage() / invalidatepage(). ext3 in data=journal mode
> already uses the flag but the use seems to be compatible with what I want
> to do now... So it may actually work.
> BTW: Note that there are plenty of filesystems that don't implement
> mkwrite() (e.g. ext2, UDF, VFAT...) and thus have the same problem with
> ENOSPC. So I'd not speak too much about consistency ;).
>
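
For reference, the reservation idea quoted above could be modelled
roughly as below. This is a userspace sketch under my own assumptions:
the names (toy_space, toy_rpage, toy_reserve, toy_write_out,
toy_invalidate) are hypothetical, and the has_reservation field only
stands in for the PG_Checked flag mentioned in the quote. Space is
accounted at fault time, converted into a real allocation at writeback,
and handed back if the page is invalidated first.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical toy model of reserve-in-mkwrite, allocate-in-writepage. */
struct toy_space { unsigned long free_blocks; unsigned long reserved; };
struct toy_rpage { int has_reservation; int mapped; };

/* Fault time: only account for the block, do not allocate it. */
static int toy_reserve(struct toy_space *s, struct toy_rpage *pg)
{
	if (pg->mapped || pg->has_reservation)
		return 0;
	if (s->free_blocks == 0)
		return -ENOSPC;
	s->free_blocks--;
	s->reserved++;
	pg->has_reservation = 1;
	return 0;
}

/* Writeback time: turn the reservation into a real allocation. */
static void toy_write_out(struct toy_space *s, struct toy_rpage *pg)
{
	if (pg->has_reservation) {
		s->reserved--;
		pg->has_reservation = 0;
		pg->mapped = 1;
	}
}

/* Invalidation before writeback: give the reserved block back. */
static void toy_invalidate(struct toy_space *s, struct toy_rpage *pg)
{
	if (pg->has_reservation) {
		s->reserved--;
		s->free_blocks++;
		pg->has_reservation = 0;
	}
}
```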

-aneesh