Re: [PATCH 1/3] perf/core: Flush PMU internal buffers for per-CPU events

From: Madhavan Srinivasan
Date: Tue Nov 24 2020 - 00:43:49 EST



On 11/24/20 10:21 AM, Namhyung Kim wrote:
Hello,

On Mon, Nov 23, 2020 at 8:00 PM Michael Ellerman <mpe@xxxxxxxxxxxxxx> wrote:
Namhyung Kim <namhyung@xxxxxxxxxx> writes:
Hi Peter and Kan,

(Adding PPC folks)

On Tue, Nov 17, 2020 at 2:01 PM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
Hello,

On Thu, Nov 12, 2020 at 4:54 AM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote:


On 11/11/2020 11:25 AM, Peter Zijlstra wrote:
On Mon, Nov 09, 2020 at 09:49:31AM -0500, Liang, Kan wrote:

- When the large PEBS was introduced (9c964efa4330), the sched_task() should
be invoked to flush the PEBS buffer in each context switch. However, The
perf_sched_events in account_event() is not updated accordingly. The
perf_event_task_sched_* never be invoked for a pure per-CPU context. Only
per-task event works.
At that time, the perf_pmu_sched_task() is outside of
perf_event_context_sched_in/out. It means that perf has to double
perf_pmu_disable() for per-task event.
- The patch 1 tries to fix broken per-CPU events. The CPU context cannot be
retrieved from the task->perf_event_ctxp. So it has to be tracked in the
sched_cb_list. Yes, the code is very similar to the original codes, but it
is actually the new code for per-CPU events. The optimization for per-task
events is still kept.
For the case, which has both a CPU context and a task context, yes, the
__perf_pmu_sched_task() in this patch is not invoked. Because the
sched_task() only need to be invoked once in a context switch. The
sched_task() will be eventually invoked in the task context.
The thing is; your first two patches rely on PERF_ATTACH_SCHED_CB and
only set that for large pebs. Are you sure the other users (Intel LBR
and PowerPC BHRB) don't need it?
I didn't set it for LBR, because the perf_sched_events is always enabled
for LBR. But, yes, we should explicitly set the PERF_ATTACH_SCHED_CB
for LBR.

if (has_branch_stack(event))
inc = true;

If they indeed do not require the pmu::sched_task() callback for CPU
events, then I still think the whole perf_sched_cb_{inc,dec}() interface
No, LBR requires the pmu::sched_task() callback for CPU events.

Now, The LBR registers have to be reset in sched in even for CPU events.

To fix the shorter LBR callstack issue for CPU events, we also need to
save/restore LBRs in pmu::sched_task().
https://lore.kernel.org/lkml/1578495789-95006-4-git-send-email-kan.liang@xxxxxxxxxxxxxxx/

is confusing at best.

Can't we do something like this instead?

I think the below patch may have two issues.
- PERF_ATTACH_SCHED_CB is required for LBR (maybe PowerPC BHRB as well) now.
- We may disable the large PEBS later if not all PEBS events support
large PEBS. The PMU need a way to notify the generic code to decrease
the nr_sched_task.
Any updates on this? I've reviewed and tested Kan's patches
and they all look good.

Maybe we can talk to PPC folks to confirm the BHRB case?
Can we move this forward? I saw patch 3/3 also adds PERF_ATTACH_SCHED_CB
for PowerPC too. But it'd be nice if ppc folks can confirm the change.
Sorry I've read the whole thread, but I'm still not entirely sure I
understand the question.
Thanks for your time and sorry about not being clear enough.

We found per-cpu events are not calling pmu::sched_task()
on context switches. So PERF_ATTACH_SCHED_CB was
added to indicate the core logic that it needs to invoke the
callback.

The patch 3/3 added the flag to PPC (for BHRB) with other
changes (I think it should be split like in the patch 2/3) and
want to get ACKs from the PPC folks.

Sorry for delay.

I guess first it will be better to split the ppc change to a separate patch,

secondly, we are missing the changes needed in the power_pmu_bhrb_disable()

where perf_sched_cb_dec() needs the "state" to be included.


Maddy



Thanks,
Namhyung