Re: [RFC PATCH v2 2/2] regulator: core: Avoid lockdep reports when resolving supplies

From: Doug Anderson
Date: Thu Apr 13 2023 - 20:36:29 EST


Hi,

On Fri, Apr 7, 2023 at 2:46 PM Stephen Boyd <swboyd@xxxxxxxxxxxx> wrote:
>
> Quoting Douglas Anderson (2023-03-29 14:33:54)
> > diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
> > index 9a13240f3084..08726bc0da9d 100644
> > --- a/drivers/regulator/core.c
> > +++ b/drivers/regulator/core.c
> > @@ -207,6 +207,78 @@ static void regulator_unlock(struct regulator_dev *rdev)
> > mutex_unlock(&regulator_nesting_mutex);
> > }
> >
> > +/**
> > + * regulator_lock_two - lock two regulators
> > + * @rdev1: first regulator
> > + * @rdev2: second regulator
> > + * @ww_ctx: w/w mutex acquire context
> > + *
> > + * Locks both rdevs using the regulator_ww_class.
> > + */
> > +static void regulator_lock_two(struct regulator_dev *rdev1,
> > + struct regulator_dev *rdev2,
> > + struct ww_acquire_ctx *ww_ctx)
> > +{
> > + struct regulator_dev *tmp;
> > + int ret;
> > +
> > + ww_acquire_init(ww_ctx, &regulator_ww_class);
> > +
> > + /* Try to just grab both of them */
> > + ret = regulator_lock_nested(rdev1, ww_ctx);
> > + WARN_ON(ret);
> > + ret = regulator_lock_nested(rdev2, ww_ctx);
> > + if (ret != -EDEADLOCK) {
> > + WARN_ON(ret);
> > + goto exit;
> > + }
>
> I think this would be clearer if we had two local variable pointers
>
> struct regulator_dev *held, *contended;
>
> held = rdev1;
> contended = rdev2;
>
> > +
> > + while (true) {
> > + /*
> > + * Start of loop: rdev1 was locked and rdev2 was contended.
> > + * Need to unlock rdev1, slowly lock rdev2, then try rdev1
> > + * again.
> > + */
> > + regulator_unlock(rdev1);
>
> regulator_unlock(held);
>
> > +
> > + ww_mutex_lock_slow(&rdev2->mutex, ww_ctx);
> > + rdev2->ref_cnt++;
> > + rdev2->mutex_owner = current;
> > + ret = regulator_lock_nested(rdev1, ww_ctx);
>
> ww_mutex_lock_slow(&contended->mutex, ww_ctx);
> contended->ref_cnt++;
> contended->mutex_owner = current;
> swap(held, contended);
> ret = regulator_lock_nested(contended, ww_ctx);
> if (ret != -EDEADLOCK) {

Sure, I can do the rename to make it clearer. OK, sent out as
("regulator: core: Make regulator_lock_two() logic easier to follow")
[1]

[1] https://lore.kernel.org/r/20230413173359.1.I1ae92b25689bd6579952e6d458b79f5f8054a0c9@changeid
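For anyone reading along without the tree handy, here is a rough
userspace sketch of the held/contended loop. It is not the kernel code:
struct fake_rdev, fake_lock_two(), fake_unlock_two() and run_demo() are
made-up names, pthread mutexes stand in for ww_mutexes, and a failed
trylock stands in for the -EDEADLOCK return in the quoted patch. It also
assumes the two devices are distinct, which the real code handles through
ref_cnt / mutex_owner.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define ITERATIONS 100000

/* Hypothetical stand-in for struct regulator_dev; only the lock matters. */
struct fake_rdev {
	pthread_mutex_t mutex;
	int ref_cnt;
};

static struct fake_rdev a = { PTHREAD_MUTEX_INITIALIZER, 0 };
static struct fake_rdev b = { PTHREAD_MUTEX_INITIALIZER, 0 };
static long shared_counter;	/* only touched with both locks held */

/*
 * Lock both devices without relying on a global lock order.  A failed
 * trylock plays the role of -EDEADLOCK: drop the lock we hold, sleep on
 * the contended one, swap the roles, then retry the other lock.
 */
static void fake_lock_two(struct fake_rdev *rdev1, struct fake_rdev *rdev2)
{
	struct fake_rdev *held = rdev1, *contended = rdev2, *tmp;
	int ret;

	assert(rdev1 != rdev2);	/* assumption: no nested self-locking here */

	ret = pthread_mutex_lock(&held->mutex);
	assert(ret == 0);

	while (pthread_mutex_trylock(&contended->mutex) != 0) {
		/* "held" is locked, "contended" is not: back off and wait. */
		pthread_mutex_unlock(&held->mutex);
		ret = pthread_mutex_lock(&contended->mutex);
		assert(ret == 0);

		/* What we just acquired is now "held"; retry the other. */
		tmp = held;
		held = contended;
		contended = tmp;
	}

	rdev1->ref_cnt++;
	rdev2->ref_cnt++;
}

static void fake_unlock_two(struct fake_rdev *rdev1, struct fake_rdev *rdev2)
{
	rdev1->ref_cnt--;
	rdev2->ref_cnt--;
	pthread_mutex_unlock(&rdev1->mutex);
	pthread_mutex_unlock(&rdev2->mutex);
}

/* Each worker takes the pair in its own order to provoke contention. */
static void *worker(void *arg)
{
	int reversed = *(int *)arg;
	int i;

	for (i = 0; i < ITERATIONS; i++) {
		if (reversed)
			fake_lock_two(&b, &a);
		else
			fake_lock_two(&a, &b);
		shared_counter++;
		fake_unlock_two(&a, &b);
	}
	return NULL;
}

/* Runs two workers with opposite lock orders; returns the final count. */
static long run_demo(void)
{
	pthread_t t1, t2;
	int forward = 0, reversed = 1;
	int ret;

	shared_counter = 0;
	ret = pthread_create(&t1, NULL, worker, &forward);
	assert(ret == 0);
	ret = pthread_create(&t2, NULL, worker, &reversed);
	assert(ret == 0);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return shared_counter;
}
```

The point of the held/contended naming is that the loop body never has
to care which of rdev1 / rdev2 it is currently looking at; after each
swap the invariant "held is locked, contended is not" is restored.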


> > @@ -2190,7 +2263,9 @@ struct regulator *_regulator_get(struct device *dev, const char *id,
> > return regulator;
> > }
> >
> > + regulator_lock(rdev);
> > regulator = create_regulator(rdev, dev, id);
> > + regulator_unlock(rdev);
>
> I'm sad that we're now locking the entire time create_regulator() is
> called. Can that be avoided? I see that create_regulator() publishes the
> consumer on the consumer_list, but otherwise I don't think it needs to
> hold the regulator lock. It goes on to call debugfs code after
> allocating memory. After this patch, we're going to be holding the lock
> for that regulator across debugfs APIs. I suspect that may lead to more
> problems later on because the time we hold the lock is extremely wide
> now.
>
> Of course, we were already holding the child regulator's lock for the
> supply, because that's what this patch is fixing in
> regulator_resolve_supply(). I'm just nervous that we're holding the lock
> for a much wider time now. Maybe we can have create_regulator() return
> the regulator and add a new function like add_regulator_consumer() that
> does the list modification? Then we can make create_regulator() do
> everything without holding a lock and have a very short time where the
> new function locks two regulator locks and does the linkage.

While we could try to come up with something fancier like this, I'm
not convinced it's worth the complexity. There are already cases where
we hold multiple regulator locks for quite long periods of time.
Specifically you can look at regulator_enable(). There, we'll grab the
lock for the regulator and the locks for all of the regulator's parents
up the chain. Then we'll enable the regulator (and maybe the parents),
which might even include a long delay/sleep while holding the mutexes
for the whole chain.
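
For reference, my reading of the split Stephen describes is roughly the
following userspace sketch. All names here (struct sketch_rdev,
struct sketch_consumer, sketch_create_consumer(), sketch_add_consumer())
are hypothetical, a pthread mutex stands in for the rdev lock, and a
plain singly linked list stands in for consumer_list; none of this is
taken from core.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct regulator (one consumer handle). */
struct sketch_consumer {
	char supply_name[32];
	struct sketch_consumer *next;
};

/* Hypothetical stand-in for struct regulator_dev. */
struct sketch_rdev {
	pthread_mutex_t mutex;
	struct sketch_consumer *consumer_list;
};

/*
 * Slow part: allocation and setup.  Nothing here touches the rdev, so no
 * lock is needed.  In the real code this is where the debugfs work would
 * live.
 */
static struct sketch_consumer *sketch_create_consumer(const char *supply_name)
{
	struct sketch_consumer *c = calloc(1, sizeof(*c));

	if (!c)
		return NULL;
	snprintf(c->supply_name, sizeof(c->supply_name), "%s", supply_name);
	return c;
}

/* Fast part: only publishing the consumer on the list is done locked. */
static void sketch_add_consumer(struct sketch_rdev *rdev,
				struct sketch_consumer *c)
{
	assert(rdev && c);

	pthread_mutex_lock(&rdev->mutex);
	c->next = rdev->consumer_list;
	rdev->consumer_list = c;
	pthread_mutex_unlock(&rdev->mutex);
}
```

That does shrink the locked region to a couple of pointer writes, but it
adds a second function and a window where the consumer exists without
being on the list, which is the extra complexity I'm not sure is worth
it.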

Mark: do you have any opinion / intuition here? Is holding the rdev
lock for this larger scope a problem?

-Doug