Re: frequent lockups in 3.18rc4

From: Linus Torvalds
Date: Thu Dec 04 2014 - 11:18:16 EST


On Thu, Dec 4, 2014 at 12:43 AM, Dâniel Fraga <fragabr@xxxxxxxxx> wrote:
>
> Linus, today it's your lucky day, because I think I found the
> real bad commit (if it isn't, then it's some very close to it). I
> managed to narrow the bisect and here's the result:
>
> fd2ac4f4a65a7f34b0bc6433fcca1192d7ba8b8e is the first bad commit
> commit fd2ac4f4a65a7f34b0bc6433fcca1192d7ba8b8e
> Author: Frederic Weisbecker <fweisbec@xxxxxxxxx>
> Date: Tue Mar 18 21:12:53 2014 +0100
>
> nohz: Use nohz own full kick on 2nd task enqueue

Ok, that actually looks very reasonable, I had actually looked at it
because of the whole "changes IPI" thing.

One more thing to try: does a revert fix it on current git?

It doesn't revert entirely cleanly, but close enough - attached a
quick rough patch that may or may not work, but looks like a good
revert.

Dave - this might be worth testing for you too, exactly because of
that whole "it changes how we do IPI's". It was your bug report with
TLB IPI's that made me look at that commit originally.

Linus

---
kernel/sched/core.c | 5 ++++-
kernel/sched/sched.h | 2 +-
2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 89e7283015a6..1b40aed13931 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1577,7 +1577,9 @@ void scheduler_ipi(void)
 	 */
 	preempt_fold_need_resched();

-	if (llist_empty(&this_rq()->wake_list) && !got_nohz_idle_kick())
+	if (llist_empty(&this_rq()->wake_list)
+			&& !tick_nohz_full_cpu(smp_processor_id())
+			&& !got_nohz_idle_kick())
 		return;

 	/*
@@ -1594,6 +1596,7 @@ void scheduler_ipi(void)
 	 * somewhat pessimize the simple resched case.
 	 */
 	irq_enter();
+	tick_nohz_full_check();
 	sched_ttwu_pending();

 	/*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 2df8ef067cc5..e9a73143d318 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1245,7 +1245,7 @@ static inline void add_nr_running(struct rq *rq, unsigned count)
 			 * new value of rq->nr_running is visible on reception
 			 * from the target.
 			 */
-			tick_nohz_full_kick_cpu(rq->cpu);
+			smp_send_reschedule(rq->cpu);
 		}
#endif
}
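
As a rough illustration of what the first kernel/sched/core.c hunk changes, here is a minimal userspace sketch of the early-return test in scheduler_ipi(), before and after the revert. It is not kernel code: struct cpu_state and both helper functions are hypothetical stand-ins invented for this example, and each field only models the kernel call named in its comment.

```c
#include <stdbool.h>

/* Hypothetical model of the per-CPU state the check looks at. */
struct cpu_state {
	bool wake_list_empty;	/* models llist_empty(&this_rq()->wake_list) */
	bool nohz_full;		/* models tick_nohz_full_cpu(smp_processor_id()) */
	bool nohz_idle_kick;	/* models got_nohz_idle_kick() */
};

/* The "-" line of the hunk: nohz_full state is not consulted at all. */
static bool returns_early_before(const struct cpu_state *s)
{
	return s->wake_list_empty && !s->nohz_idle_kick;
}

/* The "+" lines of the hunk: a nohz_full CPU never takes the early return. */
static bool returns_early_after(const struct cpu_state *s)
{
	return s->wake_list_empty
		&& !s->nohz_full
		&& !s->nohz_idle_kick;
}
```

For a nohz_full CPU with an empty wake list and no idle kick pending, returns_early_before() is true and returns_early_after() is false. In the patch, that is the case where the IPI handler goes on to irq_enter() and tick_nohz_full_check() instead of returning straight away.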