Can't boot as Xen dom0 due to commit fe055896

From: Juergen Gross
Date: Thu Dec 15 2016 - 11:12:12 EST


Boris,

with today's kernel the system isn't coming up when booted as Xen dom0:

[ 33.575326] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:1]
[ 33.589795] Modules linked in:
[ 33.596015] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.9.0-pv+ #687
[ 33.608844] Hardware name: Dell Inc. Latitude E6440/0159N7, BIOS A07 06/26/2014
[ 33.623590] task: ffff8801fa574dc0 task.stack: ffffc90001e68000
[ 33.635535] RIP: e030:xen_hypercall_sched_op+0xa/0x20
[ 33.645756] RSP: e02b:ffffc90001e6bdc8 EFLAGS: 00000246
[ 33.656331] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff810013aa
[ 33.670718] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 33.685102] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
[ 33.699487] R10: 000000000001b448 R11: 0000000000000246 R12: 000000000000b8b8
[ 33.713872] R13: 0000000000000001 R14: 00000000001ff889 R15: 00000000001193a2
[ 33.728260] FS: 0000000000000000(0000) GS:ffff8801ff800000(0000) knlGS:0000000000000000
[ 33.744562] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 33.756158] CR2: ffffc900010f6000 CR3: 0000000001807000 CR4: 0000000000042660
[ 33.770545] Call Trace:
[ 33.775567] ? xen_cpu_up+0x1da/0x500
[ 33.783020] ? put_online_cpus+0x70/0x70
[ 33.790992] ? bringup_cpu+0x1e/0x80
[ 33.798269] ? cpuhp_invoke_callback+0x7b/0x3e0
[ 33.807459] ? ring_buffer_record_is_on+0x10/0x10
[ 33.816989] ? cpuhp_up_callbacks+0x2b/0xa0
[ 33.825482] ? _cpu_up+0x6d/0xc0
[ 33.832068] ? rest_init+0x70/0x70
[ 33.838998] ? do_cpu_up+0x77/0xa0
[ 33.845931] ? smp_init+0xd7/0xdc
[ 33.852688] ? kernel_init_freeable+0xe3/0x1fd
[ 33.861705] ? rest_init+0x70/0x70
[ 33.868635] ? rest_init+0x70/0x70
[ 33.875568] ? kernel_init+0x5/0x100
[ 33.882848] ? ret_from_fork+0x25/0x30
[ 33.890473] Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc

Looking into the state of CPU 1, I found the following backtrace
(reconstructed manually by looking up the addresses in a stack dump
retrieved from the hypervisor):

find_cpio_data()
find_microcode_in_initrd()
__load_ucode_intel()
load_ucode_intel_ap()
cpu_init()
cpu_bringup()
cpu_bringup_and_idle()

It seems that load_ucode_intel_ap() is looping: commit fe055896
introduced a potentially endless loop in it.

I'm not sure whether it is best to add a maximum loop counter or to
correct the situation in another way.


Juergen