Re: [PATCH 29/42] perf tools: Protect dso cache fd with a mutex

From: Arnaldo Carvalho de Melo
Date: Thu Jan 29 2015 - 11:23:33 EST


Em Thu, Jan 29, 2015 at 10:19:38PM +0900, Namhyung Kim escreveu:
> On Thu, Jan 29, 2015 at 09:31:07AM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Thu, Jan 29, 2015 at 05:07:10PM +0900, Namhyung Kim escreveu:
> > > When dso cache is accessed in multi-thread environment, it's possible
> > > to close other dso->data.fd during operation due to open file limit.
> > > Protect the file descriptors using a separate mutex.
> > >
> > > Signed-off-by: Namhyung Kim <namhyung@xxxxxxxxxx>
> > > ---
> > > tools/perf/tests/dso-data.c | 5 ++
> > > tools/perf/util/dso.c | 136 +++++++++++++++++++++++++++++---------------
> > > 2 files changed, 94 insertions(+), 47 deletions(-)
> > >
> > > diff --git a/tools/perf/tests/dso-data.c b/tools/perf/tests/dso-data.c
> > > index caaf37f079b1..0276e7d2d41b 100644
> > > --- a/tools/perf/tests/dso-data.c
> > > +++ b/tools/perf/tests/dso-data.c
> > > @@ -111,6 +111,9 @@ int test__dso_data(void)
> > > memset(&machine, 0, sizeof(machine));
> > >
> > > dso = dso__new((const char *)file);
> > > + TEST_ASSERT_VAL("failed to get dso", dso);
> > > +
> > > + dso->binary_type = DSO_BINARY_TYPE__SYSTEM_PATH_DSO;
> > >
> > > /* Basic 10 bytes tests. */
> > > for (i = 0; i < ARRAY_SIZE(offsets); i++) {
> > > @@ -199,6 +202,8 @@ static int dsos__create(int cnt, int size)
> > >
> > > dsos[i] = dso__new(file);
> > > TEST_ASSERT_VAL("failed to get dso", dsos[i]);
> > > +
> > > + dsos[i]->binary_type = DSO_BINARY_TYPE__SYSTEM_PATH_DSO;
> >
> > Those two are unrelated, please put them in a separate patch, one that I
> > can even cherrypick ahead of the other patches.
>
> It's a consequence of changing dso__data_read_offset() not to call
> dso__data_fd() due to a performance reason. The binary_type was
> determined during the dso__data_fd() before, but now it needs to be
> set explicitly for this test.
>
> In the original code, it was called everytime we access to the dso
> cache just to check an error, I guess. But it's enough to check the
> status field.

Are you saying that this test should not rely on some function that is
called somewhere down the functions it uses and should instead do as you
do above?

I.e. if that is the case, then this stands out as a separate patch, if
not, if this is indeed really related to this patch (at first sight it
doesn't look like) then this explanation you give should be in the
patch comment log.

> > > return 0;
> > > diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
> > > index 11ece224ef50..ae92046ae2c8 100644
> > > --- a/tools/perf/util/dso.c
> > > +++ b/tools/perf/util/dso.c
> > > @@ -213,6 +213,7 @@ bool dso__needs_decompress(struct dso *dso)
> > > */
> > > static LIST_HEAD(dso__data_open);
> > > static long dso__data_open_cnt;
> > > +static pthread_mutex_t dso__data_open_lock = PTHREAD_MUTEX_INITIALIZER;
> > >
> > > static void dso__list_add(struct dso *dso)
> > > {
> > > @@ -240,7 +241,7 @@ static int do_open(char *name)
> > > if (fd >= 0)
> > > return fd;
> > >
> > > - pr_debug("dso open failed, mmap: %s\n",
> > > + pr_debug("dso open failed: %s\n",
> > > strerror_r(errno, sbuf, sizeof(sbuf)));
> > > if (!dso__data_open_cnt || errno != EMFILE)
> >
> > Ditto, another unrelated patch, please separate.
>
> Ah, okay. I kept it since it's just a small change. But I'd like to
> separate if it helps reviewing.

Thanks

> > > break;
> > > @@ -382,7 +383,9 @@ static void check_data_close(void)
> > > */
> > > void dso__data_close(struct dso *dso)
> > > {
> > > + pthread_mutex_lock(&dso__data_open_lock);
> > > close_dso(dso);
> > > + pthread_mutex_unlock(&dso__data_open_lock);
> > > }
> > >
> > > /**
> > > @@ -405,6 +408,8 @@ int dso__data_fd(struct dso *dso, struct machine *machine)
> > > if (dso->data.status == DSO_DATA_STATUS_ERROR)
> > > return -1;
> > >
> > > + pthread_mutex_lock(&dso__data_open_lock);
> > > +
> > > if (dso->data.fd >= 0)
> > > goto out;
> > >
> > > @@ -427,6 +432,7 @@ int dso__data_fd(struct dso *dso, struct machine *machine)
> > > else
> > > dso->data.status = DSO_DATA_STATUS_ERROR;
> > >
> > > + pthread_mutex_unlock(&dso__data_open_lock);
> > > return dso->data.fd;
> > > }
> > >
> > > @@ -531,52 +537,66 @@ dso_cache__memcpy(struct dso_cache *cache, u64 offset,
> > > }
> > >
> > > static ssize_t
> > > -dso_cache__read(struct dso *dso, u64 offset, u8 *data, ssize_t size)
> > > +dso_cache__read(struct dso *dso, struct machine *machine,
> > > + u64 offset, u8 *data, ssize_t size)
> > > {
> > > struct dso_cache *cache;
> > > struct dso_cache *old;
> > > - ssize_t ret;
> > > -
> > > - do {
> > > - u64 cache_offset;
> >
> > While I understand that there was no need for this do { } while (0)
> > construct in the first place, removing it in this patch is not
> > interesting, as it is both unrelated to this patch and makes the it
> > harder to review by just looking at the patch :-\ Please refrain from
> > doing this in this patch.
>
> Understood, sorry for bothering! :)

:-)

> > A later patch that does _just_ that could be done, if you feel like
> > doing it.
>
> Okay.
>
>
> >
> > > + ssize_t ret = -EINVAL;
> > > + u64 cache_offset;
> > >
> > > - ret = -ENOMEM;
> > > + cache = zalloc(sizeof(*cache) + DSO__DATA_CACHE_SIZE);
> > > + if (!cache)
> > > + return -ENOMEM;
> > >
> > > - cache = zalloc(sizeof(*cache) + DSO__DATA_CACHE_SIZE);
> > > - if (!cache)
> > > - break;
> > > + cache_offset = offset & DSO__DATA_CACHE_MASK;
> > >
> > > - cache_offset = offset & DSO__DATA_CACHE_MASK;
> > > - ret = -EINVAL;
> > > + pthread_mutex_lock(&dso__data_open_lock);
> > >
> > > - if (-1 == lseek(dso->data.fd, cache_offset, SEEK_SET))
> > > - break;
> > > + /*
> > > + * dso->data.fd might be closed if other thread opened another
> > > + * file (dso) due to open file limit (RLIMIT_NOFILE).
> > > + */
> > > + if (dso->data.fd < 0) {
> > > + dso->data.fd = open_dso(dso, machine);
> >
> > Also please consider adding a backpointer to machine in the dso object,
> > since you need to reopen it, so that we don't have to go on passing
> > machine around to dso_cache__read(), etc.
>
> Yeah, it's a pain to passing a machine pointer.

Hey, so setting of dso->data.fd is protected in this function and we can
be sure that it will not be closed _again_ just before we do that lseek
and other operations, I guess so, just checking...

> >
> > This probably needs to be done in the patch that makes dso->data.fd to
> > be closed due to limit.
>
> I don't know which patch you are refering.. It already closes an fd
> if it reaches the limit - what this patch does is protecting such
> concurrent open and close when multi-thread is used.

Ok then, i.e.:

A) take the lock, close it if over the limit, drop the lock.

B) take the lock, check if it is closed, open if so, use it, drop the
lock.

Is that right?

- Arnaldo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/