Re: [PATCHv2 2/4] coresight: tmc-etf: Fix NULL ptr dereference in tmc_enable_etf_sink_perf()

From: Suzuki Poulose
Date: Fri Oct 23 2020 - 09:30:00 EST


On 10/23/20 2:16 PM, Peter Zijlstra wrote:
On Fri, Oct 23, 2020 at 01:56:47PM +0100, Suzuki Poulose wrote:
On 10/23/20 11:54 AM, Peter Zijlstra wrote:

I think I'm more confused now :-/

Where do we use ->owner after event creation? The moment you create your
eventN you create the link to sink0. That link either succeeds (same
'cookie') or fails.

The event->sink link is established at creation. At event::add(), we
check that the sink is free (i.e., it is inactive) or is in use by an event
of the same session (this is where the owner field *was* required. But
this is not needed anymore, as we cache the pid read from the "owner" in
handle->rb->aux_priv for each event, and this is compared against the
pid from the handle currently driving the hardware).
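
[ For illustration only: a minimal userspace model of that check, with
made-up names (sink_state, session_pid, sink_try_claim); the real driver
keeps the cached pid in the handle's AUX private data and does the
comparison in the sink drivers. ]

#include <stdbool.h>
#include <sys/types.h>

struct sink_state {
	int   refcount;      /* events currently driving the sink */
	pid_t session_pid;   /* pid cached when the first event claimed it */
};

/* Called when an event is scheduled in: claim the sink or reject the event. */
static bool sink_try_claim(struct sink_state *sink, pid_t event_pid)
{
	if (sink->refcount == 0) {
		sink->session_pid = event_pid;  /* first user: record the session */
		sink->refcount = 1;
		return true;
	}
	if (sink->session_pid == event_pid) {   /* same session: share the sink */
		sink->refcount++;
		return true;
	}
	return false;                           /* busy with another session */
}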

*groan*.. that's going to be a mess with sinks that are shared between
CPUs :/

I'm also not seeing why exactly we need ->owner in the first place.

Suppose we make the sink0 device return -EBUSY on open() when it is
active. Then a perf session can open the sink0 device, create perf
events and attach them to the sink0 device using
perf_event_attr::config2. The events will attach to sink0 and increment
its usage count, such that any further open() will fail.
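
[ A rough sketch of that proposal, purely for illustration -- as the reply
below notes, the sink has no such fops today, so sink_dev, sink_open and
sink_release here are all hypothetical. ]

#include <errno.h>
#include <stdatomic.h>

struct sink_dev {
	atomic_int busy;	/* 1 while a session owns the sink */
};

/* open(): succeed only if the sink is not already claimed. */
static int sink_open(struct sink_dev *sink)
{
	int expected = 0;

	if (!atomic_compare_exchange_strong(&sink->busy, &expected, 1))
		return -EBUSY;	/* already active in another session */
	return 0;
}

/* close(): events created against the sink keep their own refcount. */
static void sink_release(struct sink_dev *sink)
{
	atomic_store(&sink->busy, 0);
}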

That's where we are diverging. The sink device doesn't have any fops. It
is all managed by the coresight driver, transparent to the perf tool. All
the perf tool does is specify which sink to use (btw, we now have
automatic sink selection support which gets rid of this and uses
the best possible sink, e.g., in the case of per-CPU sinks).
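
[ For example (illustrative command lines; the sink name depends on the
platform), the sink can either be named explicitly or, with automatic sink
selection, left out so the driver picks the most suitable one:

perf record -e cs_etm/@tmc_etf0/u --per-thread app
perf record -e cs_etm//u --per-thread app ]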

per-CPU sinks sounds a lot better.

I'm really not convinced it makes sense to do what you do with shared
sinks though. You'll lose random parts of the execution trace because
of what the other CPUs do.

The ETM trace protocol has a built-in TraceID to distinguish the packets,
and thus we could decode the trace streams from the shared buffer.
[ But, we don't have buffer overflow interrupts (I am keeping the lid closed on that can, for the sake of keeping sanity ;-) ), and thus
any shared session could easily lose data unless we tune the AUX
buffer size to something really large ].
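
[ To illustrate the idea only -- a deliberately simplified model; a real
decoder (e.g. OpenCSD) first has to unwrap the CoreSight frame format
before it sees (TraceID, data) pairs. ]

#include <stdint.h>
#include <stdio.h>

#define MAX_TRACE_IDS 128		/* TraceID is a 7-bit field */

struct stream {
	FILE *out;			/* one output file per trace source */
};

static struct stream streams[MAX_TRACE_IDS];

/* Route a chunk of trace data to its per-source stream. */
static void route(uint8_t traceid, const uint8_t *data, size_t len)
{
	struct stream *s = &streams[traceid & 0x7f];

	if (!s->out) {			/* lazily open one stream per source */
		char name[32];

		snprintf(name, sizeof(name), "cstrace-id%u.bin",
			 (unsigned int)(traceid & 0x7f));
		s->out = fopen(name, "wb");
	}

	if (s->out)
		fwrite(data, 1, len, s->out);
}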


Full exclusive sink access is far more deterministic.

Once the events are created, the perf tool close()s the sink0 device,
which is now in use by the events. No other events can be attached
to it.

Or are you doing the event->sink mapping every time you do: pmu::add()?
That sounds insane.

The sink is already mapped at event creation. But yes, the refcount on the
sink is managed at start/stop. That's when we need to make sure that the
event being scheduled belongs to the same owner as the one already
driving the sink.

pmu::add() I might hope, because pmu::start() is not allowed to fail.


Right. If we can't get the sink, we simply truncate the buffer.
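
[ Roughly, a condensed sketch of the start path in that failure case -- not
verbatim driver code; sink_busy_with_another_session() is a made-up helper
standing in for the real ownership check. ]

#include <linux/coresight.h>
#include <linux/perf_event.h>

/* Hypothetical helper, standing in for the real ownership check. */
static bool sink_busy_with_another_session(struct coresight_device *sink);

static void event_start_sketch(struct perf_output_handle *handle,
			       struct coresight_device *sink)
{
	if (sink_busy_with_another_session(sink)) {
		/*
		 * pmu::start() is not allowed to fail, so mark the AUX
		 * record as truncated and end the handle with no data.
		 */
		perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
		perf_aux_output_end(handle, 0);
		return;
	}
	/* ...otherwise enable the path and start the trace source... */
}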

That way another session could use the same sink if it is free, i.e.,

perf record -e cs_etm/@sink0/u --per-thread app1

and

perf record -e cs_etm/@sink0/u --per-thread app2

both can work as long as the sink is not used by the other session.

Like said above, if sink is shared between CPUs, that's going to be a
trainwreck :/ Why do you want that?

That ship has sailed. That is how the current generation of systems is
built, unfortunately. But as I said, this is changing and there are guidelines
in place to avoid these kinds of topologies. With future
technologies, this will be gone completely.


And once you have per-CPU sinks like mentioned above, the whole problem
goes away.

True, until then, this is the best we could do.

Suzuki