Re: [PATCH v3 03/10] eventfs: adding eventfs dir add functions

From: Ajay Kaher
Date: Mon Jul 03 2023 - 14:51:29 EST




> On 03-Jul-2023, at 8:38 PM, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> I just looked at that code and the commit, and I honestly believe that
> is a horrible hack, and very fragile. It's in the smb code, so it was
> unlikely reviewed by anyone outside that subsystem. I really do not
> want to prolificate that solution around the kernel. We need to come up
> with something else.
>
> I also think it's buggy (yes the cifs code is buggy!) because in the
> comment above the down_read_nested() it says:
>
> /*
> * nested locking. NOTE: rwsems are not allowed to recurse
> * (which occurs if the same task tries to acquire the same
> * lock instance multiple times), but multiple locks of the
> * same lock class might be taken, if the order of the locks
> * is always the same. This ordering rule can be expressed
> * to lockdep via the _nested() APIs, but enumerating the
> * subclasses that are used. (If the nesting relationship is
> * static then another method for expressing nested locking is
> * the explicit definition of lock class keys and the use of
> * lockdep_set_class() at lock initialization time.
> * See Documentation/locking/lockdep-design.rst for more details.)
> */
>
> So this is NOT a solution (and the cifs code should be fixed too!)
>
> Can you show me the exact backtrace where the reader lock gets taken
> again? We will have to come up with a way to not take the same lock
> twice.


[ 244.185505] eventfs_root_lookup+0x37/0x1f0 <--- require read lock
[ 244.185509] __lookup_slow+0x72/0x100
[ 244.185511] lookup_one_len+0x6a/0x70
[ 244.185513] eventfs_start_creating+0x58/0xd0
[ 244.185515] ? security_locked_down+0x2e/0x50
[ 244.185518] eventfs_create_file+0x57/0x150
[ 244.185521] dcache_dir_open_wrapper+0x1c6/0x260 <--- require read lock
[ 244.185524] ? __pfx_dcache_dir_open_wrapper+0x10/0x10
[ 244.185526] do_dentry_open+0x1ed/0x420
[ 244.185529] vfs_open+0x2d/0x40


>
> We can also look to see if we can implement this with RCU. What exactly
> is this rwsem protecting?
>

- struct eventfs_file holds the meta-data for file or dir.
https://github.com/intel-lab-lkp/linux/blob/dfe0dc15a73261ed83cdc728e43f4b3d4e315aae/include/linux/tracefs.h#L28
- eventfs_rwsem is supposed to protect the 'link-list which is made of struct eventfs_file
' and elements of struct eventfs_file.

I tried one more solution i.e by checking owner of lock:
static inline struct task_struct *rwsem_owner(struct rw_semaphore *sem)
{
return (struct task_struct *)
(atomic_long_read(&sem->owner) & ~RWSEM_OWNER_FLAGS_MASK);
}

But rwsem_owner() is static.

>
>>
>> Looking further for your input. I will add explanation in v4.
>>
>>
>>>> +}
>>>> +
>
> [..]
>
>>>> + *
>>>> + * This function creates the top of the trace event directory.
>>>> + */
>>>> +struct dentry *eventfs_create_events_dir(const char *name,
>>>> + struct dentry *parent,
>>>> + struct rw_semaphore *eventfs_rwsem)
>>>
>>> OK, I'm going to have to really look at this. Passing in a lock to the
>>> API is just broken. We need to find a way to solve this another way.
>>
>> eventfs_rwsem is a member of struct trace_array, I guess we should
>> pass pointer to trace_array.
>
> No, it should not be part of the trace_array. If we can't do this with
> RCU, then we need to add a descriptor that contains the dentry that is
> returned above, and have the lock held there. The caller of the
> eventfs_create_events_dir() should not care about locking. That's an
> implementation detail that should *not* be part of the API.
>
> That is, if you need a lock:
>
> struct eventfs_dentry {
> struct dentry *dentry;
> struct rwsem *rwsem;
> };
>
> And then get to that lock by using the container_of() macro. All
> created eventfs dentry's could have this structure, where the rwsem
> points to the top one. Again, that's only if we can't do this with RCU.

Ok. Let’s first fix locking issue.

-Ajay