Re: [PATCH 2/7] watchdog/hardlockup: Make the config checks more straightforward

From: Petr Mladek
Date: Thu Jun 08 2023 - 07:02:45 EST


On Wed 2023-06-07 16:35:09, Doug Anderson wrote:
> Hi,
>
> On Wed, Jun 7, 2023 at 8:25 AM Petr Mladek <pmladek@xxxxxxxx> wrote:
> >
> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index 422f0ffa269e..13c6e596cf9e 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -404,17 +404,27 @@ config HAVE_NMI_WATCHDOG
> > depends on HAVE_NMI
> > bool
> > help
> > - The arch provides a low level NMI watchdog. It provides
> > - asm/nmi.h, and defines its own watchdog_hardlockup_probe() and
> > - arch_touch_nmi_watchdog().
> > + The arch provides its own hardlockup detector implementation instead
> > + of the generic perf one.
>
> nit: did you mean to have different wording here compared to
> HAVE_HARDLOCKUP_DETECTOR_ARCH? Here you say "the generic perf one" and
> there you say "the generic ones", though it seems like you mean them
> to be the same.

Good point, I'll fix it in v2.

> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > index 3e91fa33c7a0..d201f5d3876b 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -1035,16 +1035,33 @@ config BOOTPARAM_SOFTLOCKUP_PANIC
> >
> > Say N if unsure.
> >
> > +config HAVE_HARDLOCKUP_DETECTOR_BUDDY
> > + bool
> > + depends on SMP
> > + default y
>
> I think you simplify your life if you also add:
>
> depends on !HAVE_NMI_WATCHDOG
>
> The existing config system has always assumed that no architecture
> defines both HAVE_HARDLOCKUP_DETECTOR_PERF and HAVE_NMI_WATCHDOG
> (symbols would have clashed and you would get a link error as two
> watchdogs try to implement the same weak symbol). If you add the extra
> dependency to "buddy" as per above, then a few things below fall out.
> I'll try to point them out below.

My aim is to have two variables with and without HAVE_* prefix
for each variant. They already existed for perf:

+ HAVE_HARDLOCKUP_DETECTOR_PERF means that it is supported by the arch.
+ HARDLOCKUP_DETECTOR_PERF means that it will really be built.

1. It will make it clear what variants are available on the give system.
And it will make it clear what variant is used in the end.

2. It allows to add the dependecy on the global switch HARDLOCKUP_DETECTOR.
It makes it clear that the detector could be disabled by this switch.
Also it actually allows to use both top-bottom logic and C-like
definition ordering.


>
> > +
> > #
> > -# arch/ can define HAVE_HARDLOCKUP_DETECTOR_ARCH to provide their own hard
> > -# lockup detector rather than the perf based detector.
> > +# Global switch whether to build a hardlockup detector at all. It is available
> > +# only when the architecture supports at least one implementation. There are
> > +# two exceptions. The hardlockup detector is newer enabled on:
>
> s/newer/never/

Great catch. Will fix in v2.

> > +#
> > +# s390: it reported many false positives there
> > +#
> > +# sparc64: has a custom implementation which is not using the common
> > +# hardlockup command line options and sysctl interface.
> > +#
> > +# Note that HAVE_NMI_WATCHDOG is used to distinguish the sparc64 specific
> > +# implementaion. It is automatically enabled also for other arch-specific
> > +# variants which set HAVE_HARDLOCKUP_DETECTOR_ARCH. It makes the check
> > +# of avaialable and supported variants quite tricky.
> > #
> > config HARDLOCKUP_DETECTOR
> > bool "Detect Hard Lockups"
> > depends on DEBUG_KERNEL && !S390
> > - depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH || HAVE_HARDLOCKUP_DETECTOR_ARCH
> > + depends on ((HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_BUDDY) && !HAVE_NMI_WATCHDOG) || HAVE_HARDLOCKUP_DETECTOR_ARCH
>
> Adding the dependency to buddy (see ablove) would simplify the above
> to just this:
>
> depends on HAVE_HARDLOCKUP_DETECTOR_PERF ||
> HAVE_HARDLOCKUP_DETECTOR_BUDDY || HAVE_HARDLOCKUP_DETECTOR_ARCH

This is exactly what I do not want. It would just move the check
somewhere else. But it would make the logic harder to understand.
Especially when the related definitions could not be found by cscope.

> As per above, it's simply a responsibility of architectures not to
> define that they have both "perf" if they have the NMI watchdog, so
> it's just buddy to worry about.

Where is this documented, please?
Is it safe to assume this?

I would personally prefer to ensure this by the config check.
It is even better than documentation because nobody reads
documentation ;-)

It took me long time to understand all the dependencies and
possibilities. My motivation is that neither me nor others
would need to absolve the same adventure in the future.

>
> > + imply HARDLOCKUP_DETECTOR_PERF
> > + imply HARDLOCKUP_DETECTOR_BUDDY
> > select LOCKUP_DETECTOR
> > - select HARDLOCKUP_DETECTOR_NON_ARCH if HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
> >
> > help
> > Say Y here to enable the kernel to act as a watchdog to detect
> > @@ -1055,9 +1072,15 @@ config HARDLOCKUP_DETECTOR
> > chance to run. The current stack trace is displayed upon detection
> > and the system will stay locked up.
> >
> > +#
> > +# Note that arch-specific variants are always preferred.
> > +#
> > config HARDLOCKUP_DETECTOR_PREFER_BUDDY
> > bool "Prefer the buddy CPU hardlockup detector"
> > - depends on HAVE_HARDLOCKUP_DETECTOR_PERF && SMP
> > + depends on HARDLOCKUP_DETECTOR
> > + depends on HAVE_HARDLOCKUP_DETECTOR_PERF && HAVE_HARDLOCKUP_DETECTOR_BUDDY
> > + depends on !HAVE_NMI_WATCHDOG
>
> Can get rid of above "!HAVE_NMI_WATCHDOG" if it's added to
> HAVE_HARDLOCKUP_DETECTOR_BUDDY.

I would prefer to keep it hear to make it clear on the first look.

> > + default n
>
> I'm pretty sure "default n" isn't needed.

I'll try and fix in v2.

>
> > help
> > Say Y here to prefer the buddy hardlockup detector over the perf one.
> >
> > @@ -1071,39 +1094,27 @@ config HARDLOCKUP_DETECTOR_PREFER_BUDDY
> >
> > config HARDLOCKUP_DETECTOR_PERF
> > bool
> > - depends on HAVE_HARDLOCKUP_DETECTOR_PERF
> > + depends on HARDLOCKUP_DETECTOR
> > + depends on HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY
> > + depends on !HAVE_NMI_WATCHDOG
>
> We don't really need the "!HAVE_NMI_WATCHDOG". A user wouldn't be able
> to mess this up and it's up to an architecture not to define both
> HAVE_HARDLOCKUP_DETECTOR_PERF and HAVE_NMI_WATCHDOG.

This is an assumption that only few people knows. I would prefer
to enforce this by the check.

> Overall, though, I agree that this improves things! Thanks! :-)

Thanks a lot for the review. I did my best and I got lost many times ;-)

Best Regards,
Petr