Re: [PATCH v3 1/8] trace: ras: add ARM processor error information trace event

From: Xie XiuQi
Date: Sun Apr 16 2017 - 23:09:48 EST


Hi Tyler,

Thanks for your comments and testing.

On 2017/4/15 4:36, Baicar, Tyler wrote:
> On 3/30/2017 4:31 AM, Xie XiuQi wrote:
>> Add a new trace event for ARM processor error information, so that
>> the user will know what error occurred. With this information the
>> user may take appropriate action.
>>
>> These trace events are consistent with the ARM processor error
>> information table which is defined in UEFI 2.6 spec section N.2.4.4.1.
>>
>> ---
>> v2: add trace enabled condition per Steven's suggestion.
>> fix a typo.
>> ---
>>
>> Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
>> Cc: Tyler Baicar <tbaicar@xxxxxxxxxxxxxx>
>> Signed-off-by: Xie XiuQi <xiexiuqi@xxxxxxxxxx>
>> ---
> ...
>> +#define ARM_PROC_ERR_TYPE \
>> + EM ( CPER_ARM_INFO_TYPE_CACHE, "cache error" ) \
>> + EM ( CPER_ARM_INFO_TYPE_TLB, "TLB error" ) \
>> + EM ( CPER_ARM_INFO_TYPE_BUS, "bus error" ) \
>> + EMe ( CPER_ARM_INFO_TYPE_UARCH, "micro-architectural error" )
>> +
>> +#define ARM_PROC_ERR_FLAGS \
>> + EM ( CPER_ARM_INFO_FLAGS_FIRST, "First error captured" ) \
>> + EM ( CPER_ARM_INFO_FLAGS_LAST, "Last error captured" ) \
>> + EM ( CPER_ARM_INFO_FLAGS_PROPAGATED, "Propagated" ) \
>> + EMe ( CPER_ARM_INFO_FLAGS_OVERFLOW, "Overflow" )
>> +
> Hello Xie XiuQi,
>
> This isn't compiling for me because of these definitions. Here you are using ARM_*, but below in the TP_printk you are using ARCH_*. The compiler complains the ARCH_* ones are undefined:
>
> ./include/trace/../../include/ras/ras_event.h:278:37: error: 'ARCH_PROC_ERR_TYPE' undeclared (first use in this function)
> __print_symbolic(__entry->type, ARCH_PROC_ERR_TYPE),
> ./include/trace/../../include/ras/ras_event.h:280:38: error: 'ARCH_PROC_ERR_FLAGS' undeclared (first use in this function)
> __print_symbolic(__entry->flags, ARCH_PROC_ERR_FLAGS),

Sorry, it's a typo. The names in TP_printk should be ARM_PROC_ERR_TYPE and ARM_PROC_ERR_FLAGS, matching the definitions above.
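
To illustrate why the names have to match: the list macro is expanded by name at the point where it is handed to __print_symbolic(), so a misspelled name is simply an undeclared identifier. Below is a minimal user-space sketch of the EM()/EMe() pattern. It is not the kernel implementation; every SKETCH_*/sketch_* name and the enum values are hypothetical stand-ins, and the "unknown" fallback is a simplification of my own.

```c
#include <stddef.h>

/* Hypothetical values; the real CPER_ARM_INFO_TYPE_* come from kernel headers. */
enum {
	SKETCH_TYPE_CACHE,
	SKETCH_TYPE_TLB,
	SKETCH_TYPE_BUS,
	SKETCH_TYPE_UARCH
};

/* One list of (value, string) pairs; EMe marks the last entry. */
#define SKETCH_PROC_ERR_TYPE						\
	EM ( SKETCH_TYPE_CACHE, "cache error" )				\
	EM ( SKETCH_TYPE_TLB,   "TLB error" )				\
	EM ( SKETCH_TYPE_BUS,   "bus error" )				\
	EMe ( SKETCH_TYPE_UARCH, "micro-architectural error" )

/* Map each entry to a table initializer; the last one has no trailing comma. */
#define EM(a, b)	{ a, b },
#define EMe(a, b)	{ a, b }

struct sketch_sym {
	int value;
	const char *name;
};

/* The list macro must be spelled exactly as it was defined above. */
static const struct sketch_sym sketch_type_syms[] = { SKETCH_PROC_ERR_TYPE };

/* Simplified lookup standing in for a symbolic print helper. */
static const char *sketch_print_symbolic(int value)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_type_syms) / sizeof(sketch_type_syms[0]); i++)
		if (sketch_type_syms[i].value == value)
			return sketch_type_syms[i].name;
	return "unknown";
}
```

Calling sketch_print_symbolic(SKETCH_TYPE_TLB) returns the "TLB error" string, while a value that is not in the list falls through to "unknown".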

>
>> +/*
>> + * First define the enums in MM_ACTION_RESULT to be exported to userspace
>> + * via TRACE_DEFINE_ENUM().
>> + */
>> +#undef EM
>> +#undef EMe
>> +#define EM(a, b) TRACE_DEFINE_ENUM(a);
>> +#define EMe(a, b) TRACE_DEFINE_ENUM(a);
>> +
>> +ARM_PROC_ERR_TYPE
>> +ARM_PROC_ERR_FLAGS
> Are the above two lines supposed to be here?
>> +
>> +/*
>> + * Now redefine the EM() and EMe() macros to map the enums to the strings
>> + * that will be printed in the output.
>> + */
>> +#undef EM
>> +#undef EMe
>> +#define EM(a, b) { a, b },
>> +#define EMe(a, b) { a, b }
>> +
>> +TRACE_EVENT(arm_proc_err,
> I think it would be better to keep this similar to the naming of the current RAS trace events (right now we have mc_event, arm_event, aer_event, etc.). I would suggest using "arm_err_info_event" since this is handling the error information structures of the arm errors.
>> +
>> + TP_PROTO(const struct cper_arm_err_info *err),
>> +
>> + TP_ARGS(err),
>> +
>> + TP_STRUCT__entry(
>> + __field(u8, type)
>> + __field(u16, multiple_error)
>> + __field(u8, flags)
>> + __field(u64, error_info)
>> + __field(u64, virt_fault_addr)
>> + __field(u64, physical_fault_addr)
> Validation bits should also be a part of this structure that way user space tools will know which of these fields are valid.

Could user space instead rely on the default values assigned in TP_fast_assign (~0 or 0 when the corresponding validation bit is clear) to tell which fields are valid?
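
To make the question concrete, here is a rough user-space sketch of what I mean. It is not kernel code: the struct layouts, the SKETCH_* bit values and the helper names are all hypothetical, and it only covers the two fields that default to ~0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical validation bits; the real CPER_ARM_INFO_VALID_* values differ. */
#define SKETCH_VALID_MULTI_ERR	(1u << 0)
#define SKETCH_VALID_FLAGS	(1u << 1)

/* Sentinels corresponding to the "~0" defaults, truncated to the field width. */
#define SKETCH_MULTI_ERR_UNSET	((uint16_t)~0u)	/* 0xffff */
#define SKETCH_FLAGS_UNSET	((uint8_t)~0u)	/* 0xff */

/* Stand-in for the firmware record. */
struct sketch_err_info {
	uint16_t validation_bits;
	uint16_t multiple_error;
	uint8_t flags;
};

/* Stand-in for the recorded trace entry (no validation bits stored). */
struct sketch_entry {
	uint16_t multiple_error;
	uint8_t flags;
};

/* Copy each field only if its validation bit is set, else store the sentinel. */
static void sketch_fast_assign(struct sketch_entry *e,
			       const struct sketch_err_info *err)
{
	if (err->validation_bits & SKETCH_VALID_MULTI_ERR)
		e->multiple_error = err->multiple_error;
	else
		e->multiple_error = SKETCH_MULTI_ERR_UNSET;

	if (err->validation_bits & SKETCH_VALID_FLAGS)
		e->flags = err->flags;
	else
		e->flags = SKETCH_FLAGS_UNSET;
}

/* Consumer side: a field is treated as valid unless it holds the sentinel. */
static bool sketch_multi_err_valid(const struct sketch_entry *e)
{
	return e->multiple_error != SKETCH_MULTI_ERR_UNSET;
}

static bool sketch_flags_valid(const struct sketch_entry *e)
{
	return e->flags != SKETCH_FLAGS_UNSET;
}
```

One caveat with this approach is that a real value equal to the sentinel cannot be told apart from "not valid". That matters most for the fields that default to 0, since 0 may also be a legitimate reported value.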

>> + ),
>> +
>> + TP_fast_assign(
>> + __entry->type = err->type;
>> +
>> + if (err->validation_bits & CPER_ARM_INFO_VALID_MULTI_ERR)
>> + __entry->multiple_error = err->multiple_error;
>> + else
>> + __entry->multiple_error = ~0;
>> +
>> + if (err->validation_bits & CPER_ARM_INFO_VALID_FLAGS)
>> + __entry->flags = err->flags;
>> + else
>> + __entry->flags = ~0;
>> +
>> + if (err->validation_bits & CPER_ARM_INFO_VALID_ERR_INFO)
>> + __entry->error_info = err->error_info;
>> + else
>> + __entry->error_info = 0ULL;
>> +
>> + if (err->validation_bits & CPER_ARM_INFO_VALID_VIRT_ADDR)
>> + __entry->virt_fault_addr = err->virt_fault_addr;
>> + else
>> + __entry->virt_fault_addr = 0ULL;
>> +
>> + if (err->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR)
>> + __entry->physical_fault_addr = err->physical_fault_addr;
>> + else
>> + __entry->physical_fault_addr = 0ULL;
>> + ),
>> +
>> + TP_printk("ARM Processor Error: type %s; count: %u; flags: %s;"
> I think the "ARM Processor Error:" part of this should just be removed. Here's the output with this removed and the trace event renamed to arm_err_info_event. I think this looks much cleaner and matches the style used with the arm_event.
>
> <idle>-0 [020] .ns. 366.592434: arm_event: affinity level: 2; MPIDR: 0000000000000000; MIDR: 00000000510f8000; running state: 1; PSCI state: 0
> <idle>-0 [020] .ns. 366.592437: arm_err_info_event: type cache error; count: 0; flags: 0x3; error info: 0000000000c20058; virtual address: 0000000000000000; physical address: 0000000000000000

I agree. It looks much better.

>
> Thanks,
> Tyler
>

--
Thanks,
Xie XiuQi