Re: [PATCH linux-next][RFC] powerpc: fix HOTPLUG error in rcutorture

From: Michael Ellerman
Date: Mon Oct 10 2022 - 07:21:40 EST


Zhouyi Zhou <zhouzhouyi@xxxxxxxxx> writes:
> I think we should avoid torture offline the cpu who do tick timer
> when nohz full is running.

Can you tell us what the bug you're fixing is?

Did you see a crash/oops/hang etc? Or are you just proposing this as
something that would be a good idea?

> Tested on PPC VM of Open Source Lab of Oregon State University.
> The test results show that after the fix, the success rate of
> rcutorture is improved.
> After:
> Successes: 40 Failures: 9
> Before:
> Successes: 38 Failures: 11
>
> I examined the console.log and Make.out files one by one, no new
> compile error or test error is introduced by above fix.
>
> Signed-off-by: Zhouyi Zhou <zhouzhouyi@xxxxxxxxx>
> ---
> Dear PPC developers
>
> I found this bug when trying to do rcutorture tests in ppc VM of
> Open Source Lab of Oregon State University:
>
> ubuntu@ubuntu:~/linux-next/tools/testing/selftests/rcutorture/res/2022.09.30-01.06.22-torture$ find . -name "console.log.diags"|xargs grep HOTPLUG
> ./results-scftorture/NOPREEMPT/console.log.diags:WARNING: HOTPLUG FAILURES NOPREEMPT
> ./results-rcutorture/TASKS03/console.log.diags:WARNING: HOTPLUG FAILURES TASKS03
> ./results-rcutorture/TREE04/console.log.diags:WARNING: HOTPLUG FAILURES TREE04
> ./results-scftorture-kasan/NOPREEMPT/console.log.diags:WARNING: HOTPLUG FAILURES NOPREEMPT
> ./results-rcutorture-kasan/TASKS03/console.log.diags:WARNING: HOTPLUG FAILURES TASKS03
> ./results-rcutorture-kasan/TREE04/console.log.diags:WARNING: HOTPLUG FAILURES TREE04
>
> I tried to fix this bug.
>
> Thanks for your patience and guidance ;-)
>
> Thanks
> Zhouyi
> --
> arch/powerpc/kernel/sysfs.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
> index ef9a61718940..be9c0e45337e 100644
> --- a/arch/powerpc/kernel/sysfs.c
> +++ b/arch/powerpc/kernel/sysfs.c
> @@ -4,6 +4,7 @@
> #include <linux/smp.h>
> #include <linux/percpu.h>
> #include <linux/init.h>
> +#include <linux/tick.h>
> #include <linux/sched.h>
> #include <linux/export.h>
> #include <linux/nodemask.h>
> @@ -21,6 +22,7 @@
> #include <asm/firmware.h>
> #include <asm/idle.h>
> #include <asm/svm.h>
> +#include "../../../kernel/time/tick-internal.h"

Needing to include this internal header is a sign that we are using the
wrong API or otherwise using time keeping internals we shouldn't be.

> #include "cacheinfo.h"
> #include "setup.h"
> @@ -1151,7 +1153,11 @@ static int __init topology_init(void)
> * CPU. For instance, the boot cpu might never be valid
> * for hotplugging.
> */
> - if (smp_ops && smp_ops->cpu_offline_self)
> + if (smp_ops && smp_ops->cpu_offline_self
> +#ifdef CONFIG_NO_HZ_FULL
> + && !(tick_nohz_full_running && tick_do_timer_cpu == cpu)
> +#endif
> + )

I can't see any other arches doing anything like this. I don't think
it's the arches responsibility.

If the time keeping core needs a CPU to stay online to run the timer
then it needs to organise that itself IMHO :)

cheers

> c->hotpluggable = 1;
> #endif
>
> --
> 2.25.1