Re: [PATCH v3 03/22] perf dso: Make lock error check and add BUG_ONs

From: Ian Rogers
Date: Fri Feb 11 2022 - 14:35:22 EST


On Fri, Feb 11, 2022 at 11:21 AM Arnaldo Carvalho de Melo
<acme@xxxxxxxxxx> wrote:
>
> Em Fri, Feb 11, 2022 at 09:43:19AM -0800, Ian Rogers escreveu:
> > On Fri, Feb 11, 2022 at 9:13 AM Arnaldo Carvalho de Melo
> > <acme@xxxxxxxxxx> wrote:
> > >
> > > Em Fri, Feb 11, 2022 at 02:33:56AM -0800, Ian Rogers escreveu:
> > > > Make the pthread mutex on dso use the error check type. This allows
> > > > deadlock checking via the return type. Assert the returned value from
> > > > mutex lock is always 0.
> > >
> > > I think this is too blunt/pervasive source code wise, perhaps we should
> > > wrap this like its done with rwsem in tools/perf/util/rwsem.h to get
> > > away from pthreads primitives and make the source code look more like
> > > a kernel one and then, taking advantage of the so far ideologic
> > > needless indirection, add this BUG_ON if we build with "DEBUG=1" or
> > > something, wdyt?
> >
>
> > My concern with semaphores is that they are a concurrency primitive
>
> I'm not suggesting we switch over to semaphores, just to use the same
> technique of wrapping pthread_mutex_t with some other API that then
> allows us to add these BUG_ON() calls without polluting the source code
> in many places.

Sounds simple enough and would ensure consistency too. I can add it to
the front of this set of changes. A different approach would be to
take what's here and then refactor and cleanup as a follow on patch
set. I'd prefer that as the size of this set of changes is already
larger than I like - albeit that most of it is just introducing the
use of functions to access struct variables. Perhaps I just remove the
BUG_ON and pthread changes here, we work to get this landed and in a
separate set of patches clean up the pthread mutex code to have better
bug checking.

Thanks,
Ian

> - Arnaldo
>
> > that has more flexibility and power than a mutex. I like a mutex as it
> > is quite obvious what is going on and that is good from a tooling
> > point of view. A deadlock with two mutexes is easy to understand. On a
> > semaphore, were we using it like a condition variable? There's more to
> > figure out. I also like the idea of compiling the perf command with
> > emscripten, we could then generate say perf annotate output in your
> > web browser. Emscripten has implementations of standard posix
> > libraries including pthreads, but we may need to have two approaches
> > in the perf code if we want to compile with emscripten and use
> > semaphores when targeting linux.
> >
> > Where this change comes from is that I worried that extending the
> > locked regions to cover the race that'd been found would then expose
> > the kind of recursive deadlock that pthread mutexes all too willingly
> > allow. With this code we at least see the bug and don't just hang. I
> > don't think we need the change to the mutexes for this change, but we
> > do need to extend the regions to fix the data race.
> >
> > Let me know how you prefer it and I can roll it into a v4 version.
> >
> > Thanks,
> > Ian
> >
> > > - Arnaldo
> > >
> > > > Signed-off-by: Ian Rogers <irogers@xxxxxxxxxx>
> > > > ---
> > > > tools/perf/util/dso.c | 12 +++++++++---
> > > > tools/perf/util/symbol.c | 2 +-
> > > > 2 files changed, 10 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
> > > > index 9cc8a1772b4b..6beccffeef7b 100644
> > > > --- a/tools/perf/util/dso.c
> > > > +++ b/tools/perf/util/dso.c
> > > > @@ -784,7 +784,7 @@ dso_cache__free(struct dso *dso)
> > > > struct rb_root *root = &dso->data.cache;
> > > > struct rb_node *next = rb_first(root);
> > > >
> > > > - pthread_mutex_lock(&dso->lock);
> > > > + BUG_ON(pthread_mutex_lock(&dso->lock) != 0);
> > > > while (next) {
> > > > struct dso_cache *cache;
> > > >
> > > > @@ -830,7 +830,7 @@ dso_cache__insert(struct dso *dso, struct dso_cache *new)
> > > > struct dso_cache *cache;
> > > > u64 offset = new->offset;
> > > >
> > > > - pthread_mutex_lock(&dso->lock);
> > > > + BUG_ON(pthread_mutex_lock(&dso->lock) != 0);
> > > > while (*p != NULL) {
> > > > u64 end;
> > > >
> > > > @@ -1259,6 +1259,8 @@ struct dso *dso__new_id(const char *name, struct dso_id *id)
> > > > struct dso *dso = calloc(1, sizeof(*dso) + strlen(name) + 1);
> > > >
> > > > if (dso != NULL) {
> > > > + pthread_mutexattr_t lock_attr;
> > > > +
> > > > strcpy(dso->name, name);
> > > > if (id)
> > > > dso->id = *id;
> > > > @@ -1286,8 +1288,12 @@ struct dso *dso__new_id(const char *name, struct dso_id *id)
> > > > dso->root = NULL;
> > > > INIT_LIST_HEAD(&dso->node);
> > > > INIT_LIST_HEAD(&dso->data.open_entry);
> > > > - pthread_mutex_init(&dso->lock, NULL);
> > > > + pthread_mutexattr_init(&lock_attr);
> > > > + pthread_mutexattr_settype(&lock_attr, PTHREAD_MUTEX_ERRORCHECK);
> > > > + pthread_mutex_init(&dso->lock, &lock_attr);
> > > > + pthread_mutexattr_destroy(&lock_attr);
> > > > refcount_set(&dso->refcnt, 1);
> > > > +
> > > > }
> > > >
> > > > return dso;
> > > > diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> > > > index b2ed3140a1fa..43f47532696f 100644
> > > > --- a/tools/perf/util/symbol.c
> > > > +++ b/tools/perf/util/symbol.c
> > > > @@ -1783,7 +1783,7 @@ int dso__load(struct dso *dso, struct map *map)
> > > > }
> > > >
> > > > nsinfo__mountns_enter(dso->nsinfo, &nsc);
> > > > - pthread_mutex_lock(&dso->lock);
> > > > + BUG_ON(pthread_mutex_lock(&dso->lock) != 0);
> > > >
> > > > /* check again under the dso->lock */
> > > > if (dso__loaded(dso)) {
> > > > --
> > > > 2.35.1.265.g69c8d7142f-goog
> > >
> > > --
> > >
> > > - Arnaldo
>
> --
>
> - Arnaldo