Re: [PATCH 32/35] clockevents: Fix cpu down race for hrtimer based broadcasting

From: Preeti U Murthy
Date: Fri Feb 27 2015 - 03:49:22 EST


On 02/26/2015 11:01 AM, Preeti U Murthy wrote:
> On 02/23/2015 11:03 PM, Nicolas Pitre wrote:
>> On Mon, 23 Feb 2015, Nicolas Pitre wrote:
>>
>>> On Mon, 23 Feb 2015, Peter Zijlstra wrote:
>>>
>>>> The reported function that fails: bL_switcher_restore_cpus() is called
>>>> in the error paths of the former and the main path in the latter to make
>>>> the 'stolen' cpus re-appear.
>>>>
>>>> The patch in question somehow makes that go boom.
>>>>
>>>>
>>>> Now what all do you need to do to make it go boom? Just enable/disable
>>>> the switcher once and it'll explode? Or does it need to do actual
>>>> switches while it is enabled?
>>>
>>> It gets automatically enabled during boot. Then several switches are
>>> performed while user space is brought up. If I manually disable it
>>> via /sys then it goes boom.
>>
>> OK. Forget the bL switcher. I configured it out of my kernel and then
>> managed to get the same crash by simply hotplugging out one CPU and
>> plugging it back in.
>>
>> $ echo 0 > /sys/devices/system/cpu/cpu2/online
>> [CPU2 gone]
>> $ echo 1 > /sys/devices/system/cpu/cpu2/online
>> [Boom!]
>>
>>
> I saw an issue with this patch as well. I tried to do an SMT mode switch
> on a POWER machine, i.e. varying the number of hyperthreads on an SMT 8
> system, and the system hangs. Worse, there are no softlockup
> messages/warnings/bug_ons reported. I am digging into this issue.
>
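(For reference, the SMT mode switch is driven through the regular cpu
hotplug path, so it can be reproduced with the plain sysfs knobs or with
the powerpc-utils helper; the thread count here is only an example:

$ ppc64_cpu --smt=2

which simply onlines/offlines the secondary hardware threads of each
core via /sys/devices/system/cpu/cpuN/online.)
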
> A couple of points though. Looking at the patch, I see that we are
> shutting down the tick device of the hotplugged-out cpu *much before*
> migrating the timers and hrtimers off it. Migration of timers is done
> in the CPU_DEAD phase, while we shut down tick devices in the CPU_DYING
> phase. There is quite a bit of a gap between the two. Earlier we would
> do both in a single notification.
>
> Another point is that the tick devices are shut down before the
> hotplugged-out cpu actually dies in __cpu_die(). At first glance,
> neither of these two points should create any issues. But since we are
> noticing problems with this patch, I thought it would be best to put
> them forth.
>
> But why are tick devices being shut down that early? Is there any
> specific advantage to this? Taking over/handing over tick duties should
> be done before __cpu_die(), but the shutdown of tick devices should be
> done after this phase. This seems more natural, doesn't it?
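
To make the ordering concrete, here is a simplified sketch, in the style
of a cpu hotplug notifier, of the two phases being described above. The
helper names are illustrative, not the actual kernel functions:

static int timer_migration_notify(struct notifier_block *nb,
				  unsigned long action, void *hcpu)
{
	int cpu = (long)hcpu;

	/* Strip the _FROZEN encoding used during suspend/resume */
	switch (action & ~CPU_TASKS_FROZEN) {
	case CPU_DYING:
		/*
		 * Runs early in the offline path, while the outgoing
		 * cpu is still running. After this patch, the tick
		 * device of that cpu is shut down here.
		 */
		shutdown_tick_device(cpu);
		break;
	case CPU_DEAD:
		/*
		 * Runs on a surviving cpu, only after __cpu_die() has
		 * completed. The timers and hrtimers still queued on
		 * the dead cpu are only migrated away here.
		 */
		migrate_timers_from(cpu);
		migrate_hrtimers_from(cpu);
		break;
	}
	return NOTIFY_OK;
}

The gap between these two notifications is the window being pointed out
above.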

The problem reported in the changelog of this patch is causing severe
regressions very frequently on our machines for certain use cases. It
would help to put a fix in place first and then follow that up with
these cleanups. A fix along the below lines:

---------------------------------------------------------------------

clockevents: Fix cpu down race for hrtimer based broadcasting

---
 include/linux/tick.h         | 10 +++++++---
 kernel/cpu.c                 |  2 ++
 kernel/time/tick-broadcast.c | 19 +++++++++++--------
 3 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index eda850c..8735776 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -91,14 +91,18 @@ extern void tick_cancel_sched_timer(int cpu);
 static inline void tick_cancel_sched_timer(int cpu) { }
 # endif
 
-# ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
+# if defined CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
 extern struct tick_device *tick_get_broadcast_device(void);
 extern struct cpumask *tick_get_broadcast_mask(void);
 
-# ifdef CONFIG_TICK_ONESHOT
+# if defined CONFIG_TICK_ONESHOT
 extern struct cpumask *tick_get_broadcast_oneshot_mask(void);
+extern void tick_takeover(int deadcpu);
+# else
+static inline void tick_takeover(int deadcpu) { }
 # endif
-
+# else
+static inline void tick_takeover(int deadcpu) { }
 # endif /* BROADCAST */
# ifdef CONFIG_TICK_ONESHOT
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 1972b16..f9ca351 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -20,6 +20,7 @@
 #include <linux/gfp.h>
 #include <linux/suspend.h>
 #include <linux/lockdep.h>
+#include <linux/tick.h>
 #include <trace/events/power.h>
 
 #include "smpboot.h"
@@ -411,6 +412,7 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
 	while (!idle_cpu(cpu))
 		cpu_relax();
 
+	tick_takeover(cpu);
 	/* This actually kills the CPU. */
 	__cpu_die(cpu);
 
diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 066f0ec..0fd6634 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -669,14 +669,19 @@ static void broadcast_shutdown_local(struct clock_event_device *bc,
 	clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
 }
 
-static void broadcast_move_bc(int deadcpu)
+void tick_takeover(int deadcpu)
 {
-	struct clock_event_device *bc = tick_broadcast_device.evtdev;
+	struct clock_event_device *bc;
+	unsigned long flags;
 
-	if (!bc || !broadcast_needs_cpu(bc, deadcpu))
-		return;
-	/* This moves the broadcast assignment to this cpu */
-	clockevents_program_event(bc, bc->next_event, 1);
+	raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
+	bc = tick_broadcast_device.evtdev;
+
+	if (bc && broadcast_needs_cpu(bc, deadcpu)) {
+		/* This moves the broadcast assignment to this cpu */
+		clockevents_program_event(bc, bc->next_event, 1);
+	}
+	raw_spin_unlock_irqrestore(&tick_broadcast_lock, flags);
 }
 
 /*
@@ -913,8 +918,6 @@ void tick_shutdown_broadcast_oneshot(unsigned int *cpup)
 	cpumask_clear_cpu(cpu, tick_broadcast_pending_mask);
 	cpumask_clear_cpu(cpu, tick_broadcast_force_mask);
 
-	broadcast_move_bc(cpu);
-
 	raw_spin_unlock_irqrestore(&tick_broadcast_lock, flags);
 }
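
With the above, the broadcast takeover no longer happens from the dying
cpu's CPU_DYING path in tick_shutdown_broadcast_oneshot(). Instead, the
cpu driving the hotplug operation pulls the broadcast duty over to
itself, under tick_broadcast_lock, once the outgoing cpu has entered
idle and just before __cpu_die(). As a smoke test, the reproducer from
earlier in the thread applies unchanged:

$ echo 0 > /sys/devices/system/cpu/cpu2/online
$ echo 1 > /sys/devices/system/cpu/cpu2/online

This still needs to be verified against both the hotplug and the bL
switcher scenarios reported above, as well as the SMT mode switch hang.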

Regards
Preeti U Murthy

>
>
> Regards
> Preeti U Murthy
>
>> Nicolas
>>
