Re: [PATCH -next] ext4: fix bug_on in ext4_iomap_begin as race between bmap and write

From: Jan Kara
Date: Thu Jun 16 2022 - 06:54:49 EST


On Thu 16-06-22 12:01:00, Ritesh Harjani wrote:
> On 22/06/15 07:26PM, Jan Kara wrote:
> > On Wed 15-06-22 21:01:23, Ritesh Harjani wrote:
> > > > >
> > > > > To solved above issue hold inode lock in ext4_bamp.
> > > > ^^^ ext4_bmap()
> > > >
> > > > I checked the paths where bmap() kernel api can be called i.e. from jbd2/fc and
> > > > generic_swapfile_activate() (apart from ioctl())
> > > > For jbd2, it will be called with j_inode within bmap(), hence taking a inode lock
> > > > of the inode passed within ext4_bmap() (j_inode in this case) should be safe here.
> > > > Same goes with swapfile path as well.
> > > >
> > > > However I feel maybe we should hold inode_lock_shared() since there is no
> > > > block/extent map layout changes that can happen via ext4_bmap().
> > > > Hence read lock is what IMO should be used here.
> > >
> > > On second thoughts, shoudn't we use ext4_iomap_report_ops here? Can't
> > > recollect why we didn't use ext4_iomap_report_ops for iomap_bmap() in the
> > > first place. Should be good to verify it once.
> >
> > Hum, but I guess there's a deeper problem than ext4_bmap(). Generally we
> > have places doing block mapping (such as ext4_writepages(), readahead, or
> > page fault) where we don't hold i_rwsem and racing
> > ext4_create_inline_data() could confuse them? I guess we need to come up
>
> You are right, i_rwsem won't be able to protect against such races which you
> described. So, we actually use EXT4_I(inode)->xattr_sem for inline data
> serialization.

Yes, and that is a problem. Because all the places checking for
ext4_has_inline_data() would have to be protected by xattr_sem (unless they
are already protected by i_rwsem) to make sure we cannot race with inline
data creation which is just unworkable both for performance and I suspect
also lock ordering reasons.

> So for this issue, I think if we should move from ext4_iomap_ops to
> ext4_iomap_report_ops. ext4_iomap_begin_report does takes care of read locking
> xattr_sem to properly report if it's a inline_data and similarly iomap_bmap
> reports 0 (which it should) in case of iomap->type != IOMAP_MAPPED
> (since in this case ext4_iomap_begin_report() will give IOMAP_INLINE)

I don't think the switch to ext4_iomap_report_ops is really correct. We
use ext4_iomap_ops mostly for direct IO and if inline data gets there,
there's indeed a deeper problem. Specifically for ext4_bmap() using i_rwsem
may be acceptable solution but I'm generally against sprinkling locks here
and there without good general locking design how exactly are inline data
operations intended to be synchronized with stuff as writeback, readahead
etc. Because as I wrote above xattr_sem is not really a working answer...

Honza