Re: [PATCH RFC 7/7] libfs: Re-arrange locking in offset_iterate_dir()

From: Liam R. Howlett
Date: Thu Feb 15 2024 - 12:00:54 EST


* Jan Kara <jack@xxxxxxx> [240215 08:16]:
> On Tue 13-02-24 16:38:08, Chuck Lever wrote:
> > From: Chuck Lever <chuck.lever@xxxxxxxxxx>
> >
> > Liam says that, unlike with xarray, once the RCU read lock is
> > released ma_state is not safe to re-use for the next mas_find() call.
> > But the RCU read lock has to be released on each loop iteration so
> > that dput() can be called safely.
> >
> > Thus we are forced to walk the offset tree with fresh state for each
> > directory entry. mt_find() can do this for us, though it might be a
> > little less efficient than maintaining ma_state locally.
> >
> > Since offset_iterate_dir() doesn't build ma_state locally any more,
> > there's no longer a strong need for offset_find_next(). Clean up by
> > rolling these two helpers together.
> >
> > Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
>
> Well, in general I think even xas_next_entry() is not safe to use how
> offset_find_next() was using it. Once you drop rcu_read_lock(),
> xas->xa_node could go stale. But since you're holding inode->i_rwsem when
> using offset_find_next() you should be protected from concurrent
> modifications of the mapping (whatever the underlying data structure is) -
> that's what makes xas_next_entry() safe AFAIU. Isn't that enough for the
> maple tree? Am I missing something?

If you are stopping, you should be pausing the iteration. Although this
works today, it's not how it should be used because if we make changes
(ie: compaction requires movement of data), then you may end up with a
UAF issue. We'd have no way of knowing you are depending on the tree
structure to remain consistent.

IOW the inode->i_rwsem is protecting writes of data but not the
structure holding the data.

This is true for both xarray and maple tree.

Thanks,
Liam