Re: [PATCH] watchdog: Prefer use "ref-cycles" for NMI watchdog

From: Andrew Morton
Date: Fri May 12 2023 - 19:41:03 EST


On Tue, 9 May 2023 15:17:00 -0700 Song Liu <song@xxxxxxxxxx> wrote:

> The NMI watchdog permanently consumes one hardware counter per CPU on the
> system. For systems that use many hardware counters, this causes more
> aggressive time multiplexing of perf events.
>
> OTOH, some CPUs (mostly Intel) support the "ref-cycles" event, which is rarely
> used. Try to use "ref-cycles" for the watchdog if the CPU supports it, so
> that one more hardware counter is available to the user. If the CPU doesn't
> support "ref-cycles", fall back to "cycles".
>
> The downside of this change is that users of "ref-cycles" need to disable
> nmi_watchdog.
>
> ...
>
> @@ -286,6 +286,12 @@ int __init hardlockup_detector_perf_init(void)
> {
> 	int ret = hardlockup_detector_event_create();
>
> +	if (ret) {

If we get here, hardlockup_detector_event_create() has sent a scary
pr_debug message.

> +		/* Failed to create "ref-cycles", try "cycles" instead */
> +		wd_hw_attr.config = PERF_COUNT_HW_CPU_CYCLES;
> +		ret = hardlockup_detector_event_create();

So it would be good to emit a followup message here telling users that
things are OK. Or tell the user we're retrying with a different
counter, etc.
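
Something along these lines, perhaps. This is a userspace mock-up rather
than kernel code, and it is untested against the real watchdog:
mock_event_create(), mock_config, mock_ref_cycles_supported and the
message wording are all invented for illustration. The real patch would
presumably use pr_info() and wd_hw_attr.config instead.

```c
#include <stdio.h>

/*
 * Hypothetical userspace stand-ins for the kernel pieces. None of
 * these names exist in the kernel; they only let the control flow
 * compile and run on its own.
 */
enum { MOCK_REF_CYCLES, MOCK_CPU_CYCLES };

static int mock_config = MOCK_REF_CYCLES;
static int mock_ref_cycles_supported;	/* 0: CPU lacks "ref-cycles" */

/* Stand-in for hardlockup_detector_event_create() */
static int mock_event_create(void)
{
	if (mock_config == MOCK_REF_CYCLES && !mock_ref_cycles_supported) {
		/* This is the "scary" message the user sees first */
		printf("debug: Perf event create failed\n");
		return -1;
	}
	return 0;
}

/* Sketch of the init path with a reassuring follow-up message */
static int watchdog_init_sketch(void)
{
	int ret = mock_event_create();

	if (ret) {
		/* Tell the user the failure above is not fatal (assumed wording) */
		printf("info: \"ref-cycles\" unavailable, retrying with \"cycles\"\n");
		mock_config = MOCK_CPU_CYCLES;
		ret = mock_event_create();
	}

	if (ret)
		printf("info: Perf NMI watchdog permanently disabled\n");

	return ret;
}
```

The point is only that the retry announces itself, so the earlier
failure message is not the last thing the user reads.
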

> +	}
> +
> 	if (ret) {
> 		pr_info("Perf NMI watchdog permanently disabled\n");
> 	} else {
> --
> 2.34.1