Re: [PATCH v3 0/3] fsnotify: fix softlockups iterating over d_subdirs

From: Stephen Brennan
Date: Fri Nov 04 2022 - 19:33:35 EST


Jan Kara <jack@xxxxxxx> writes:
> On Tue 01-11-22 13:48:54, Stephen Brennan wrote:
>> Jan Kara <jack@xxxxxxx> writes:
>> > Hi Stephen!
>> >
>> > On Thu 27-10-22 17:10:13, Stephen Brennan wrote:
>> >> Here is v3 of the patch series. I've taken all of the feedback,
>> >> thanks Amir, Christian, Hilf, et al. Differences are noted in each
>> >> patch.
>> >>
>> >> I caught an obvious and silly dentry reference leak: d_find_any_alias()
>> >> returns a reference, which I never called dput() on. With that change, I
>> >> no longer see the rpc_pipefs issue, but I do think I need more testing
>> >> and thinking through the third patch. Al, I'd love your feedback on that
>> >> one especially.
>> >>
>> >> Thanks,
>> >> Stephen
>> >>
>> >> Stephen Brennan (3):
>> >> fsnotify: Use d_find_any_alias to get dentry associated with inode
>> >> fsnotify: Protect i_fsnotify_mask and child flags with inode rwsem
>> >> fsnotify: allow sleepable child flag update
>> >
>> > Thanks for the patches Stephen and I'm sorry for replying somewhat late.
>>
>> Absolutely no worries, these things take time. Thanks for taking a look!
>>
>> > The first patch is a nobrainer. The other two patches ... complicate things
>> > somewhat more complicated than I'd like. I guess I can live with them if we
>> > don't find a better solution but I'd like to discuss a bit more about
>> > alternatives.
>>
>> Understood!
>>
>> > So what would happen if we just clear DCACHE_FSNOTIFY_PARENT_WATCHED in
>> > __fsnotify_parent() for the dentry which triggered the event and does not
>> > have watched parent anymore and never bother with full children walk? I
>> > suppose your contention problems will be gone, we'll just pay the price of
>> > dget_parent() + fsnotify_inode_watches_children() for each child that
>> > falsely triggers instead of for only one. Maybe that's not too bad? After
>> > all any event upto this moment triggered this overhead as well...
>>
>> This is an interesting idea. It came across my mind but I don't think I
>> considered it seriously because I assumed that it was too big a change.
>> But I suppose in the process I created an even bigger change :P
>>
>> The false positive dget_parent() + fsnotify_inode_watches_children()
>> shouldn't be too bad. I could see a situation where there's a lot of
>> random accesses within a directory, where the dget_parent() could cause
>> some contention over the parent dentry. But to be fair, the performance
>> would have been the same or worse while fsnotify was active in that
>> case, and the contention would go away as most of the dentries get their
>> flags cleared. So I don't think this is a problem.
>>
>> > Am I missing something?
>>
>> I think there's one thing missed here. I understand you'd like to get
>> rid of the extra flag in the connector. But the advantage of the flag is
>> avoiding duplicate work by saving a bit of state. Suppose that a mark is
>> added to a connector, which causes fsnotify_inode_watches_children() to
>> become true. Then, any subsequent call to fsnotify_recalc_mask() must
>> call __fsnotify_update_child_dentry_flags(), even though the child
>> dentry flags don't need to be updated: they're already set. For (very)
>> large directories, this can take a few seconds, which means that we're
>> doing a few extra seconds of work each time a new mark is added to or
>> removed from a connector in that case. I can't imagine that's a super
>> common workload though, and I don't know if my customers do that (my
>> guess would be no).
>
> I understand. This basically matters for fsnotify_recalc_mask(). As a side
> note I've realized that your changes to fsnotify_recalc_mask() acquiring
> inode->i_rwsem for updating dentry flags in patch 2/3 are problematic for
> dnotify because that calls fsnotify_recalc_mask() under a spinlock.
> Furthermore it is somewhat worrying also for inotify & fanotify because it
> nests inode->i_rwsem inside fsnotify_group->lock however I'm not 100% sure
> something doesn't force the ordering the other way around (e.g. the removal
> of oneshot mark during modify event generation). Did you run tests with
> lockdep enabled?

No I didn't. I'll be sure to get the LTP tests running with lockdep
early next week and try this series out, we'll probably get an error
like you say.

I'll also take a look at the dnotify use case and see if there's
anything to do there. Hopefully there's something we can do to salvage
it :D

Thanks,
Stephen

> Anyway, if the lock ordering issues can be solved, I suppose we can
> optimize fsnotify_recalc_mask() like:
>
> inode_lock(inode);
> spin_lock(&conn->lock);
> oldmask = inode->i_fsnotify_mask;
> __fsnotify_recalc_mask(conn);
> newmask = inode->i_fsnotify_mask;
> spin_unlock(&conn->lock);
> if (watching children changed(oldmask, newmask))
> __fsnotify_update_child_dentry_flags(...)
> inode_unlock(inode);
>
> And because everything is serialized by inode_lock, we don't have to worry
> about inode->i_fsnotify_mask and dentry flags getting out of sync or some
> mark addition returning before all children are marked for reporting
> events. No need for the connector flag AFAICT.
>
> But the locking issue needs to be resolved first in any case. I need to
> think some more...
>
> Honza
> --
> Jan Kara <jack@xxxxxxxx>
> SUSE Labs, CR