[PATCH RFC] panic: Avoid extra noisy messages due to stopped cpus

From: Feng Tang
Date: Thu Oct 11 2018 - 03:21:20 EST


Sometimes when debugging a kernel panic, we see many extra noisy error
messages after the expected end marker:

[ 35.743249] ---[ end Kernel panic - not syncing: Fatal exception
[ 35.749975] ------------[ cut here ]------------

These messages may overflow the screen (framebuffer) and make debugging
much more difficult.

This hack patch just quickly prevents these noisy messages. I would
really like to get some comments and suggestions.

I have also tried other approaches, such as adding a panic notifier
block inside the tick/sched code to cancel the tick_sched timer in the
panic case, which also works.

These extra messages are of 2 kinds:
a)
WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 set_task_cpu+0x183/0x190
Call Trace:
<IRQ>
try_to_wake_up+0x157/0x430
default_wake_function+0xd/0x10
autoremove_wake_function+0x11/0x60
__wake_up_common+0x8a/0x160
__wake_up_common_lock+0x6c/0x90
__wake_up+0xe/0x10
wake_up_klogd_work_func+0x3b/0x60
irq_work_run_list+0x4e/0x80
irq_work_tick+0x40/0x50
update_process_times+0x3d/0x50
tick_sched_timer+0x38/0x80
__hrtimer_run_queues+0xce/0x200
hrtimer_interrupt+0xac/0x1f0
smp_apic_timer_interrupt+0x6e/0x140
apic_timer_interrupt+0x8e/0xa0

b)
sched: Unexpected reschedule of offline CPU#0!
------------[ cut here ]------------
WARNING: CPU: 1 PID: 300 at arch/x86/kernel/smp.c:141 native_smp_send_reschedule+0x3d/0x50
trigger_load_balance+0x125/0x230
scheduler_tick+0xa2/0xd0
update_process_times+0x42/0x50
tick_sched_handle.isra.5+0x21/0x60
tick_sched_timer+0x38/0x80
__hrtimer_run_queues+0xce/0x200
hrtimer_interrupt+0xac/0x1f0
smp_apic_timer_interrupt+0x6e/0x140
apic_timer_interrupt+0x8e/0xa0

Signed-off-by: Feng Tang <feng.tang@xxxxxxxxx>
---
arch/x86/kernel/process.c | 1 +
kernel/sched/fair.c | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
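
Note (illustration only, not part of the patch): the effect of the
fair.c hunk can be sketched with a small userspace model. It assumes
that a stopped CPU may still be the first entry in nohz.idle_cpus_mask
and may still look idle to idle_cpu(). struct cpu_state, NR_CPU_IDS,
first_nohz_idle() and model_find_new_ilb() are hypothetical names for
this sketch, not kernel code.

```c
#include <stdbool.h>

#define NR_CPU_IDS 4	/* hypothetical CPU count for this model */

/* Hypothetical per-CPU flags standing in for the kernel's cpumasks. */
struct cpu_state {
	bool nohz_idle;	/* set in nohz.idle_cpus_mask */
	bool idle;	/* what idle_cpu() would return */
	bool online;	/* what cpu_online() would return */
};

/* Like cpumask_first(): index of the first nohz-idle CPU, or NR_CPU_IDS. */
static int first_nohz_idle(const struct cpu_state *cpus)
{
	for (int i = 0; i < NR_CPU_IDS; i++)
		if (cpus[i].nohz_idle)
			return i;
	return NR_CPU_IDS;
}

/* Model of find_new_ilb(); check_online selects the patched condition. */
static int model_find_new_ilb(const struct cpu_state *cpus, bool check_online)
{
	int ilb = first_nohz_idle(cpus);

	if (ilb < NR_CPU_IDS && cpus[ilb].idle &&
	    (!check_online || cpus[ilb].online))
		return ilb;

	return NR_CPU_IDS;
}
```

Under those assumptions, if CPU 0 is stopped (offline) but still
nohz-idle and idle, the model without the cpu_online() check picks CPU 0
as the idle load balancer, while the model with the check returns
NR_CPU_IDS, so no reschedule would be sent to the offline CPU.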

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index c93fcfd..b703862 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -520,6 +520,7 @@ void stop_this_cpu(void *dummy)
 	 * Remove this CPU:
 	 */
 	set_cpu_online(smp_processor_id(), false);
+	set_cpu_active(smp_processor_id(), false);
 	disable_local_APIC();
 	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7fc4a37..cf41b7b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9034,7 +9034,7 @@ static inline int find_new_ilb(void)
 {
 	int ilb = cpumask_first(nohz.idle_cpus_mask);

-	if (ilb < nr_cpu_ids && idle_cpu(ilb))
+	if (ilb < nr_cpu_ids && idle_cpu(ilb) && cpu_online(ilb))
 		return ilb;

 	return nr_cpu_ids;
--
2.7.4