Re: [PATCH V4 4/7] perf/x86/intel: Support LBR event logging

From: Liang, Kan
Date: Fri Oct 20 2023 - 08:45:12 EST




On 2023-10-19 7:12 a.m., Peter Zijlstra wrote:
> On Wed, Oct 04, 2023 at 11:40:41AM -0700, kan.liang@xxxxxxxxxxxxxxx wrote:
>> +static __always_inline void get_lbr_events(struct cpu_hw_events *cpuc,
>> + int i, u64 info)
>> +{
>> + /*
>> + * The later code will decide what content can be disclosed
>> + * to the perf tool. It's not harmful to unconditionally update
>> + * the cpuc->lbr_events.
>> + * Please see intel_pmu_lbr_event_reorder()
>> + */
>> + cpuc->lbr_events[i] = info & LBR_INFO_EVENTS;
>> +}
>
> You could be forcing an extra cachemiss here.

This code temporarily stores the branch counter information. Maybe we
can leverage the reserved field of cpuc->lbr_entries[i] to avoid the
cache miss:

e->reserved = info & LBR_INFO_COUNTERS;

I tried to add something like a static_assert to check the size of the
reserved field in case the field is shrunk later. But the reserved
field is a bit field, and I have no idea how to get the exact size of a
bit field unless we define a macro. Is something like the below OK? Any
suggestions are appreciated.


diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 1e80a551a4c2..62675593e39a 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -1582,6 +1582,8 @@ static bool is_arch_lbr_xsave_available(void)
return true;
}

+static_assert((64 - PERF_BRANCH_ENTRY_INFO_BITS_MAX) > LBR_INFO_COUNTERS_MAX_NUM * 2);
+
void __init intel_pmu_arch_lbr_init(void)
{
struct pmu *pmu = x86_get_pmu(smp_processor_id());
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index f220c3598d03..e9ff8eba5efd 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -238,6 +238,7 @@
#define LBR_INFO_BR_TYPE (0xfull << LBR_INFO_BR_TYPE_OFFSET)
#define LBR_INFO_EVENTS_OFFSET 32
#define LBR_INFO_EVENTS (0xffull << LBR_INFO_EVENTS_OFFSET)
+#define LBR_INFO_COUNTERS_MAX_NUM 4

#define MSR_ARCH_LBR_CTL 0x000014ce
#define ARCH_LBR_CTL_LBREN BIT(0)
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 4461f380425b..3a64499b0f5d 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1437,6 +1437,9 @@ struct perf_branch_entry {
reserved:31;
};

+/* Size of used info bits in struct perf_branch_entry */
+#define PERF_BRANCH_ENTRY_INFO_BITS_MAX 33
+
union perf_sample_weight {
__u64 full;
#if defined(__LITTLE_ENDIAN_BITFIELD)



> A long time ago I had
> hacks to profile perf with perf, but perhaps PT can be abused for that
> now?

To my understanding, PT can only give the trace information, and may
not tell whether there was a cache miss or something similar.
I will take a deeper look and see if PT can help in this case.

Thanks,
Kan