Re: [PATCH v2] ext4: fix race condition between buffer write and page_mkwrite

From: Jan Kara
Date: Mon Jun 05 2023 - 05:17:25 EST


On Mon 05-06-23 02:58:15, Matthew Wilcox wrote:
> On Sun, Jun 04, 2023 at 05:08:21PM -0400, Theodore Ts'o wrote:
> > On Sat, Jun 03, 2023 at 11:04:45PM -0400, Theodore Ts'o wrote:
> > > I tried testing to see if this fixed [1], and it appears to be
> > > triggering a lockdep warning[2] at this line in the patch:
> > >
> > > [1] https://syzkaller.appspot.com/bug?extid=f4582777a19ec422b517
> > > [2] https://syzkaller.appspot.com/x/report.txt?x=17260843280000
> >
> > Looking at this more closely, the fundamental problem is by the time
> > ext4_file_mmap() is called, the mm layer has already taken
> > current->mm->mmap_lock, and when we try to take the inode_lock, this
> > causes locking ordering problems with how buffered write path works,
> > which take the inode_lock first, and then in some cases, may end up
> > taking the mmap_lock if there is a page fault for the buffer used for
> > the buffered write.
> >
> > If we're going to stick with the approach in this patch, I think what
> > we would need is to add a pre_mmap() function to file_operations
> > struct, which would get called by the mmap path *before* taking
> > current->mm->mmap_lock, so we can do the inline conversion before we
> > take the mmap_lock.
> >
> > I'm not sure how the mm folks would react to such a proposal, though.
> > I could be seen as a bit hacky, and it's not clear that any file
> > system other than ext4 would need something like this. Willy, as
> > someone who does a lot of work in both mm and fs worlds --- I'm
> > curious what you think about this idea?
>
> I'm probably missing something here, but why do we need to convert inline
> data in page_mkwrite? mmap() can't change i_size (stores past i_size are
> discarded), so we should be able to simply copy the data from the page
> cache into the inode and write the inode when it comes to writepages()
> time.
>
> Unless somebody does a truncate() or write() that expands i_size, but we
> should be able to do the conversion then without the mmap_lock held. No?
> I'm not too familiar with inline data.

Yeah, I agree, that is also the conclusion I have arrived at when thinking
about this problem now. We should be able to just remove the conversion
from ext4_page_mkwrite() and rely on write(2) or truncate(2) doing it when
growing i_size.

Honza

--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR