Re: [PATCH v2] ext4: fix race condition between buffer write and page_mkwrite

From: Baokun Li
Date: Tue Jun 06 2023 - 08:19:37 EST


On 2023/6/5 23:08, Jan Kara wrote:
On Mon 05-06-23 15:55:35, Matthew Wilcox wrote:
On Mon, Jun 05, 2023 at 02:21:41PM +0200, Jan Kara wrote:
On Mon 05-06-23 11:16:55, Jan Kara wrote:
Yeah, I agree, that is also the conclusion I have arrived at when thinking
about this problem now. We should be able to just remove the conversion
from ext4_page_mkwrite() and rely on write(2) or truncate(2) doing it when
growing i_size.
OK, thinking more about this and searching through the history, I've
realized why the conversion is originally in ext4_page_mkwrite(). The
problem is described in commit 7b4cc9787fe35b ("ext4: evict inline data
when writing to memory map") but essentially it boils down to the fact that
ext4 writeback code does not expect dirty page for a file with inline data
because ext4_write_inline_data_end() should have copied the data into the
inode and cleared the folio's dirty flag.

Indeed messing with xattrs from the writeback path to copy page contents
into inline data xattr would be ... interesting. Hum, out of good ideas for
now :-|.
Is it so bad? Now that we don't have writepage in ext4, only
writepages, it seems like we have a considerably more benign locking
environment to work in.
Well, yes, without ->writepage() it might be *possible*. But still rather
ugly. The problem is that in ->writepages() i_size is not stable. Thus also
whether the inode data is inline or not is not stable. We'd need inode_lock
for that but that is not easily doable in the writeback path - inode lock
would then become fs_reclaim unsafe...

Honza

If we try to add inode_lock to ext4_writepages to complete the
conversion, there may be a deadlock as follows:

      CPU0             CPU1
writeback_single_inode
 spin_lock(&inode->i_lock) ---> LOCK B
                   enable_verity
                    inode_lock(inode)  ---> LOCK A
                    vops->begin_enable_verity
                    ext4_begin_enable_verity
                     ext4_inode_attach_jinode
                      spin_lock(&inode->i_lock)   ---> try LOCK B
 __writeback_single_inode          |
  do_writepages                ABBA deadlock
   ext4_writepages                 |
    inode_lock(inode)  ---> try LOCK A

If we add inode_lock to the write back process to complete the inline conversion,
it seems that we still have to add an ops ...

I've been going over this problem for a long time, but I can't think of a good way
to solve it.

--
With Best Regards,
Baokun Li
.