Re: [PATCH v3 03/10] eventfs: adding eventfs dir add functions

From: Steven Rostedt
Date: Mon Jul 03 2023 - 11:09:10 EST


On Mon, 3 Jul 2023 10:13:22 +0000
Ajay Kaher <akaher@xxxxxxxxxx> wrote:

> >> +/**
> >> + * eventfs_down_write - acquire write lock function
> >> + * @eventfs_rwsem: a pointer to rw_semaphore
> >> + *
> >> + * helper function to perform write lock on eventfs_rwsem
> >> + */
> >> +static void eventfs_down_write(struct rw_semaphore *eventfs_rwsem)
> >> +{
> >> +	while (!down_write_trylock(eventfs_rwsem))
> >> +		msleep(10);
> >
> > What's this loop for? Something like that needs a very good explanation
> > in a comment. Loops like these are usually a sign of a workaround for a
> > bug in the design, or worse, simply hides an existing bug.
> >
>
> Yes correct, this logic is to solve deadlock:
>
> Thread 1                                     Thread 2
> down_read_nested() - read lock acquired
>                                              down_write() - waiting for write lock to acquire
> down_read_nested() - deadlock
>
> The deadlock happens because an rwsem won't allow a new read lock to be acquired while
> a write lock is waiting. down_write_trylock() doesn't put the writer on the wait queue,
> which prevents this deadlock scenario.
>
> I was stuck on this deadlock, tried a few methods, and finally borrowed this approach
> from cifs, as it's upstreamed, tested, and working there; please refer to:
> https://elixir.bootlin.com/linux/v6.3.1/source/fs/cifs/file.c#L438

I just looked at that code and the commit, and I honestly believe that
is a horrible hack, and very fragile. It's in the smb code, so it was
unlikely reviewed by anyone outside that subsystem. I really do not
want to proliferate that solution around the kernel. We need to come up
with something else.

I also think it's buggy (yes, the cifs code is buggy!) because the
comment above the down_read_nested() says:

/*
* nested locking. NOTE: rwsems are not allowed to recurse
* (which occurs if the same task tries to acquire the same
* lock instance multiple times), but multiple locks of the
* same lock class might be taken, if the order of the locks
* is always the same. This ordering rule can be expressed
* to lockdep via the _nested() APIs, but enumerating the
* subclasses that are used. (If the nesting relationship is
* static then another method for expressing nested locking is
* the explicit definition of lock class keys and the use of
* lockdep_set_class() at lock initialization time.
* See Documentation/locking/lockdep-design.rst for more details.)
*/

So this is NOT a solution (and the cifs code should be fixed too!)

Can you show me the exact backtrace where the reader lock gets taken
again? We will have to come up with a way to not take the same lock
twice.
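
Just so we're looking at the same failure mode: the scenario you describe
reduces to the pattern below. This is a minimal stand-alone sketch with
made-up names (demo_rwsem, reader_fn, writer_fn), not the actual eventfs
call chain:

#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/rwsem.h>
#include <linux/delay.h>

static DECLARE_RWSEM(demo_rwsem);

static int reader_fn(void *data)
{
	down_read(&demo_rwsem);		/* first read lock: acquired */
	msleep(100);			/* give the writer time to queue up */
	down_read(&demo_rwsem);		/* second read from the same task blocks
					 * behind the queued writer, which in turn
					 * waits on us: deadlock */
	up_read(&demo_rwsem);
	up_read(&demo_rwsem);
	return 0;
}

static int writer_fn(void *data)
{
	msleep(50);			/* let the reader get in first */
	down_write(&demo_rwsem);	/* queues up and never gets the lock */
	up_write(&demo_rwsem);
	return 0;
}

static int __init demo_init(void)
{
	kthread_run(reader_fn, NULL, "demo-reader");
	kthread_run(writer_fn, NULL, "demo-writer");
	return 0;
}
module_init(demo_init);
MODULE_LICENSE("GPL");

If that matches what you are seeing, then the trylock/msleep loop is only
hiding the fact that the same task takes the rwsem for read twice.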

We can also look to see if we can implement this with RCU. What exactly
is this rwsem protecting?
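
For instance, if the rwsem is only there to protect a list of child
entries, the readers could become lockless. The below is just a sketch
under that assumption; struct eventfs_entry and the function names are
made up:

#include <linux/mutex.h>
#include <linux/rculist.h>
#include <linux/slab.h>
#include <linux/string.h>

struct eventfs_entry {
	struct list_head	list;
	const char		*name;
	struct rcu_head		rcu;
};

static LIST_HEAD(eventfs_entries);
static DEFINE_MUTEX(eventfs_mutex);	/* writers still serialize with each other */

/* Readers never block and may nest freely. */
static bool eventfs_entry_exists(const char *name)
{
	struct eventfs_entry *entry;
	bool found = false;

	rcu_read_lock();
	list_for_each_entry_rcu(entry, &eventfs_entries, list) {
		if (!strcmp(entry->name, name)) {
			found = true;
			break;
		}
	}
	rcu_read_unlock();
	return found;
}

/* Writers publish under the mutex and free after a grace period. */
static void eventfs_remove_entry(struct eventfs_entry *entry)
{
	mutex_lock(&eventfs_mutex);
	list_del_rcu(&entry->list);
	mutex_unlock(&eventfs_mutex);
	kfree_rcu(entry, rcu);
}

Whether something like that is enough depends on what the write side has
to do while holding the lock, which is why I'm asking what the rwsem is
actually protecting.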


>
> Looking forward to your further input. I will add an explanation in v4.
>
>
> >> +}
> >> +

[..]

> >> + *
> >> + * This function creates the top of the trace event directory.
> >> + */
> >> +struct dentry *eventfs_create_events_dir(const char *name,
> >> +					 struct dentry *parent,
> >> +					 struct rw_semaphore *eventfs_rwsem)
> >
> > OK, I'm going to have to really look at this. Passing in a lock to the
> > API is just broken. We need to find another way to solve this.
>
> eventfs_rwsem is a member of struct trace_array; I guess we should
> pass a pointer to the trace_array instead.

No, it should not be part of the trace_array. If we can't do this with
RCU, then we need to add a descriptor that contains the dentry that is
returned above, and have the lock held there. The caller of the
eventfs_create_events_dir() should not care about locking. That's an
implementation detail that should *not* be part of the API.

That is, if you need a lock:

struct eventfs_dentry {
	struct dentry		*dentry;
	struct rw_semaphore	*rwsem;
};

And then get to that lock by using the container_of() macro. All
created eventfs dentries could have this structure, where the rwsem
points to the top one. Again, that's only if we can't do this with RCU.
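
Roughly, each created entry could then wrap that descriptor and the
eventfs internals would reach the lock through it. Everything below
other than struct dentry, struct rw_semaphore and container_of() is a
hypothetical name:

#include <linux/container_of.h>
#include <linux/dcache.h>
#include <linux/rwsem.h>

struct eventfs_dentry {
	struct dentry		*dentry;
	struct rw_semaphore	*rwsem;	/* every entry points at the top dir's rwsem */
};

/* Each created entry embeds the descriptor that callers get back. */
struct eventfs_file {
	struct eventfs_dentry	ed;
	const char		*name;
};

static inline struct eventfs_file *eventfs_file_of(struct eventfs_dentry *ed)
{
	return container_of(ed, struct eventfs_file, ed);
}

/* The lock never shows up in the caller-facing API. */
static void eventfs_write_lock(struct eventfs_dentry *ed)
{
	down_write(ed->rwsem);
}

static void eventfs_write_unlock(struct eventfs_dentry *ed)
{
	up_write(ed->rwsem);
}

That keeps the locking entirely inside eventfs, and
eventfs_create_events_dir() can go back to taking just a name and a
parent dentry.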

-- Steve


>
>
> > I'm about to board a plane to JFK shortly; I'm hoping to play with this
> > while flying back.
> >
>
> I have replied to the major concerns. I will take care of all the other minor ones in v4.
>
> Thanks a lot for giving your time to the eventfs patches.
>
> - Ajay
>