Re: [PATCH v2] perf: Synchronously cleanup child events

From: Peter Zijlstra
Date: Wed Jan 20 2016 - 03:32:38 EST


On Tue, Jan 19, 2016 at 01:58:19PM -0800, Alexei Starovoitov wrote:
> On Tue, Jan 19, 2016 at 09:05:58PM +0100, Peter Zijlstra wrote:

> > The most obvious place that generates such magical references would be
> > the bpf arraymap doing perf_event_get() on things. There are a few other
> > places that take temp references (perf_mmap_close), but those are
> > 'short' lived and while ugly will not cause massive grief. The BPF one
> > OTOH is a real problem here.
> >
> > And looking at the BPF stuff, that code seems to assume
> > perf_event_kernel_release() := put_event(), so this patch breaks that
> > too.
> >
> >
> > Alexei, is there a reason the arraymap stuff needs a perf event ref as
> > opposed to a file ref? I'm forever a little confused on how perf<->bpf
> > works.
>
> A file ref will not work, since user space could have closed that
> perf_event file to avoid unnecessary FDs.

So I'm (possibly again) confused on how BPF works.

I thought the reason you handed in perf events from userspace, as
opposed to creating your own with perf_event_create_kernel_counter(),
was that userspace was interested in the output.

Also, BPF should not be a way to get around the filedesc resource limit.

> Program only need the stable pointer to 'struct perf_event' which
> it will use while running.
> At the end it will call perf_event_kernel_release() which
> is == put_event().
> It was the case that 'perf_events' were normal refcnt-ed structures
> and the last guy frees it.

Sort-of, but user events are (or should be, rather) tied to the filedesc
to account the resources used.

There is also the event->owner field: we track the task that created
the event. With your current scheme that is left dangling once userspace
closes the last filedesc while you still hold a ref.
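
To make the lifetime problem concrete, here is a minimal userspace sketch
in C. It is a toy model, not kernel code: struct toy_event, struct
toy_task and all toy_* helpers are hypothetical names made up for
illustration, and a plain int stands in for the real refcount. It only
shows how an extra reference lets the event outlive its filedesc while
still pointing at the creating task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; not the kernel structures. */
struct toy_task {
	int pid;
};

struct toy_event {
	int refcount;            /* plain counter standing in for the event refcount */
	struct toy_task *owner;  /* task that created the event */
	bool file_open;          /* does userspace still hold the filedesc? */
	bool freed;              /* set once the last reference is dropped */
};

/* Creating the event through a filedesc gives the file one reference. */
static void toy_event_init(struct toy_event *e, struct toy_task *creator)
{
	e->refcount = 1;
	e->owner = creator;
	e->file_open = true;
	e->freed = false;
}

/* Extra reference, e.g. a map holding on to the event. */
static void toy_get_event(struct toy_event *e)
{
	assert(!e->freed);
	e->refcount++;
}

/* Pure refcount scheme: whoever drops the last reference frees. */
static void toy_put_event(struct toy_event *e)
{
	assert(e->refcount > 0);
	if (--e->refcount == 0)
		e->freed = true;
}

/* close() under this scheme only drops the file's reference. */
static void toy_close_file(struct toy_event *e)
{
	e->file_open = false;
	toy_put_event(e);
}

/* Alive but no longer backed by any filedesc: nothing accounts for it. */
static bool toy_event_is_orphaned(const struct toy_event *e)
{
	return !e->freed && !e->file_open;
}
```

If a second holder calls toy_get_event() before userspace calls
toy_close_file(), the event stays alive with no filedesc behind it, and
its owner pointer still refers to the creating task until the final
toy_put_event().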

> This put_event_last() logic definitely looks problematic.
> There are no ordering guarantees.
> User space may close FD, while struct perf_event is still alive.
> The loop around perf_event_last() looks buggy.
> I'm obviously missing the main goal of this patch.

Right, so the patch in question tries to synchronously clean up
everything related to the counter when we close the file, such that the
file better reflects the actual resource usage.

Currently we do this async (and with holes).
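
As a rough illustration of what 'synchronous' means here, the following
is a toy userspace sketch in C. It is not the code from the patch: struct
toy_parent, struct toy_child and the toy_* helpers are hypothetical, and
the fixed-size array merely stands in for whatever tracks the child
events. The only point is that every child is torn down before close
returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_CHILDREN 8

/* Toy stand-ins for illustration only; not the kernel structures. */
struct toy_child {
	bool active;
};

struct toy_parent {
	struct toy_child children[TOY_MAX_CHILDREN];
	size_t nr_children;
	bool file_open;
};

static void toy_parent_init(struct toy_parent *p)
{
	p->nr_children = 0;
	p->file_open = true;
}

/* A child task inherits the event; returns false when the table is full. */
static bool toy_inherit(struct toy_parent *p)
{
	if (p->nr_children == TOY_MAX_CHILDREN)
		return false;
	p->children[p->nr_children++].active = true;
	return true;
}

/* Count the child events that still consume resources. */
static size_t toy_active_children(const struct toy_parent *p)
{
	size_t i, n = 0;

	for (i = 0; i < p->nr_children; i++)
		if (p->children[i].active)
			n++;
	return n;
}

/* Synchronous close: all children are gone by the time this returns. */
static void toy_close_sync(struct toy_parent *p)
{
	size_t i;

	for (i = 0; i < p->nr_children; i++)
		p->children[i].active = false;
	p->file_open = false;
}
```

After toy_close_sync() returns, toy_active_children() is zero, so the
closed file and the resources in use agree. In an async scheme the
children would linger for some time after the close.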


In short, user-created events really should be filedesc based. Yes, we
have event references, but those 'should' be short-lived.