kexec crash on OVMF i386 + x86_64 kernel (Re: [PATCH v4] x86/boot: Use efi_setup_data for searching RSDP on kexec-ed kernel)

From: Junichi Nomura
Date: Tue Apr 16 2019 - 19:11:30 EST


On 4/16/19 6:45 PM, Borislav Petkov wrote:
> On Mon, Apr 15, 2019 at 11:14:34PM +0000, Junichi Nomura wrote:
>> I see kexec is only supported on 64bit kernel. But are we sure
>> we don't need to support kexec on EFI32 + 64bit kernel?
>>
>> I don't have such an environment and as far as I tried with OVMF i386
>> and KVM guest, that combination doesn't work reliably even with v5.0.
>
> What does that mean exactly?
>
> If it can be fixed, we can try to.

When I kexec with OVMF i386 + an x86_64 kernel, the 1st kexec seems to work.
But the 2nd kexec (i.e. kexec from the kexec-booted system) crashes the
kernel during boot like this:

[ 69.907176] kexec_core: Starting new kernel
early console in extract_kernel
input_data: 0x000000003e7a73b1
input_len: 0x00000000004464c8
output: 0x000000003d600000
output_len: 0x00000000015c7248
kernel_total_size: 0x000000000142c000
trampoline_32bit: 0x000000000009d000
booted via startup_64()
Physical KASLR using RDRAND RDTSC...
Virtual KASLR using RDRAND RDTSC...

Decompressing Linux... Parsing ELF... Performing relocations... done.
Booting the kernel.
[ 0.000000] Linux version 5.0.0-dirty (root@vm76) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC)) #2 SMP Mon Apr 8 04:42:45 EDT 2019
[ 0.000000] Command line: root=UUID=6bea2b7b-e6cc-4dba-ac79-be6530d348f5 ro console=tty0 console=ttyS0,115200n8 no_timer_check net.ifnames=0 crashkernel=auto LANG=en_US.UTF-8 earlyprintk=serial,ttyS0,115200 kexec kexec
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000100-0x000000000009ffff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000003ed74fff] usable
[ 0.000000] BIOS-e820: [mem 0x000000003ed75000-0x000000003ee86fff] reserved
[ 0.000000] BIOS-e820: [mem 0x000000003ee87000-0x000000003ff06fff] usable
[ 0.000000] BIOS-e820: [mem 0x000000003ff07000-0x000000003ff5efff] reserved
[ 0.000000] BIOS-e820: [mem 0x000000003ff5f000-0x000000003ff66fff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x000000003ff67000-0x000000003ff6afff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x000000003ff6b000-0x000000003ffcffff] usable
[ 0.000000] BIOS-e820: [mem 0x000000003ffd0000-0x000000003ffeffff] reserved
[ 0.000000] BIOS-e820: [mem 0x000000003fff0000-0x000000003fffffff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000ffe00000-0x00000000ffffffff] reserved
[ 0.000000] printk: bootconsole [earlyser0] enabled
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] DMI not present or invalid.
[ 0.000000] Hypervisor detected: KVM
[ 0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
[ 0.000000] kvm-clock: cpu 0, msr 2238e001, primary cpu clock
[ 0.000001] kvm-clock: using sched offset of 100318497884 cycles
[ 0.001055] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[ 0.004086] tsc: Detected 2399.998 MHz processor
[ 0.005147] last_pfn = 0x40000 max_arch_pfn = 0x400000000
[ 0.006234] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
Memory KASLR using RDRAND RDTSC...
[ 0.008079] x2apic: enabled by BIOS, switching to x2apic ops
[ 0.020284] RAMDISK: [mem 0x3b8da000-0x3d5fffff]
[ 0.021169] ACPI: Early table checksum verification disabled
[ 0.022280] ACPI BIOS Error (bug): A valid RSDP was not found (20181213/tbxfroot-210)
[ 0.023755] No NUMA configuration found
[ 0.024461] Faking a node at [mem 0x0000000000000000-0x000000003fffffff]
[ 0.025746] NODE_DATA(0) allocated [mem 0x3ffa6000-0x3ffcffff]
[ 0.027098] crashkernel: memory value expected
[ 0.027918] Zone ranges:
[ 0.028384] DMA [mem 0x0000000000001000-0x0000000000ffffff]
[ 0.029553] DMA32 [mem 0x0000000001000000-0x000000003fffffff]
[ 0.030688] Normal empty
[ 0.031217] Device empty
[ 0.031741] Movable zone start for each node
[ 0.032525] Early memory node ranges
[ 0.033212] node 0: [mem 0x0000000000001000-0x000000000009ffff]
[ 0.034377] node 0: [mem 0x0000000000100000-0x000000003ed74fff]
[ 0.035520] node 0: [mem 0x000000003ee87000-0x000000003ff06fff]
[ 0.036663] node 0: [mem 0x000000003ff6b000-0x000000003ffcffff]
[ 0.037840] node 0: [mem 0x000000003fff0000-0x000000003fffffff]
[ 0.039012] Zeroed struct page in unavailable ranges: 503 pages
[ 0.039013] Initmem setup node 0 [mem 0x0000000000001000-0x000000003fffffff]
[ 0.044319] BUG: unable to handle kernel paging request at ffffffffff5fd020
[ 0.045637] #PF error: [normal kernel read fault]
[ 0.046501] PGD 2200e067 P4D 2200e067 PUD 22010067 PMD 22011067 PTE 0
[ 0.047682] Oops: 0000 [#1] SMP
[ 0.048258] CPU: 0 PID: 0 Comm: swapper Not tainted 5.0.0-dirty #2
[ 0.049419] RIP: 0010:native_apic_mem_read+0x3/0x10
[ 0.050328] Code: 00 00 e8 20 3a 2b 00 48 89 d8 5b 5d c3 90 90 90 90 90 90 90 90 90 90 55 89 ff 48 89 e5 89 b7 00 d0 5f ff 5d c3 66 90 55 89 ff <8b> 87 00 d0 5f ff 48 89 e5 5d c3 66 90 e8 7b 8a 5b 00 55 b8 01 00
[ 0.053749] RSP: 0000:ffffffff88003e38 EFLAGS: 00010002
[ 0.054703] RAX: ffffffff87248840 RBX: 000000003fe09000 RCX: 0000000000000000
[ 0.056009] RDX: ffffffff88003e30 RSI: 000000000000f800 RDI: 0000000000000020
[ 0.057346] RBP: ffffffff88003e48 R08: 0000000000000000 R09: 0000000000000000
[ 0.058667] R10: 00000000000000ff R11: 0000000000000000 R12: 0000000001d254d6
[ 0.059969] R13: 000000003d600000 R14: 0000000000000000 R15: 0000000000000000
[ 0.061313] FS: 0000000000000000(0000) GS:ffffffff88173000(0000) knlGS:0000000000000000
[ 0.062812] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.063865] CR2: ffffffffff5fd020 CR3: 000000002200d000 CR4: 00000000000406b0
[ 0.065222] Call Trace:
[ 0.065670] ? read_apic_id+0x19/0x30
[ 0.066347] init_apic_mappings+0x7a/0x129
[ 0.067096] setup_arch+0xb67/0xc19
[ 0.067729] start_kernel+0x6b/0x4e3
[ 0.068386] x86_64_start_reservations+0x24/0x26
[ 0.069230] x86_64_start_kernel+0x6f/0x72
[ 0.069974] secondary_startup_64+0xa4/0xb0
[ 0.070739] Modules linked in:
[ 0.071297] CR2: ffffffffff5fd020
[ 0.071901] random: get_random_bytes called from print_oops_end_marker+0x3f/0x60 with crng_init=0
[ 0.073567] ---[ end trace 2cc66932e568af60 ]---
[ 0.074427] RIP: 0010:native_apic_mem_read+0x3/0x10
[ 0.075320] Code: 00 00 e8 20 3a 2b 00 48 89 d8 5b 5d c3 90 90 90 90 90 90 90 90 90 90 55 89 ff 48 89 e5 89 b7 00 d0 5f ff 5d c3 66 90 55 89 ff <8b> 87 00 d0 5f ff 48 89 e5 5d c3 66 90 e8 7b 8a 5b 00 55 b8 01 00
[ 0.078755] RSP: 0000:ffffffff88003e38 EFLAGS: 00010002
[ 0.079741] RAX: ffffffff87248840 RBX: 000000003fe09000 RCX: 0000000000000000
[ 0.081050] RDX: ffffffff88003e30 RSI: 000000000000f800 RDI: 0000000000000020
[ 0.082355] RBP: ffffffff88003e48 R08: 0000000000000000 R09: 0000000000000000
[ 0.083687] R10: 00000000000000ff R11: 0000000000000000 R12: 0000000001d254d6
[ 0.084996] R13: 000000003d600000 R14: 0000000000000000 R15: 0000000000000000
[ 0.086296] FS: 0000000000000000(0000) GS:ffffffff88173000(0000) knlGS:0000000000000000
[ 0.087805] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.088855] CR2: ffffffffff5fd020 CR3: 000000002200d000 CR4: 00000000000406b0
[ 0.090167] Kernel panic - not syncing: Fatal exception
[ 0.091160] BUG: unable to handle kernel paging request at ffffffffff5fd030
[ 0.092438] #PF error: [normal kernel read fault]
[ 0.093301] PGD 2200e067 P4D 2200e067 PUD 22010067 PMD 22011067 PTE 0
[ 0.094480] Oops: 0000 [#2] SMP
[ 0.095094] CPU: 0 PID: 0 Comm: swapper Tainted: G D 5.0.0-dirty #2
[ 0.096478] RIP: 0010:native_apic_mem_read+0x3/0x10
[ 0.097367] Code: 00 00 e8 20 3a 2b 00 48 89 d8 5b 5d c3 90 90 90 90 90 90 90 90 90 90 55 89 ff 48 89 e5 89 b7 00 d0 5f ff 5d c3 66 90 55 89 ff <8b> 87 00 d0 5f ff 48 89 e5 5d c3 66 90 e8 7b 8a 5b 00 55 b8 01 00
[ 0.100833] RSP: 0000:ffffffff88003aa8 EFLAGS: 00010002
[ 0.101792] RAX: ffffffff87248840 RBX: 0000000000000046 RCX: 0000000000000000
[ 0.103130] RDX: 0000000000000080 RSI: 0000000000002000 RDI: 0000000000000030
[ 0.104433] RBP: ffffffff88003ac0 R08: 0000000000000001 R09: 0000000000000080
[ 0.105733] R10: ffffffff88160ca0 R11: ffffffff8818a428 R12: 0000000000000000
[ 0.107070] R13: 0000000000000046 R14: ffffffff88013740 R15: 000000000000000b
[ 0.108382] FS: 0000000000000000(0000) GS:ffffffff88173000(0000) knlGS:0000000000000000
[ 0.109872] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.110926] CR2: ffffffffff5fd030 CR3: 000000002200d000 CR4: 00000000000406b0
[ 0.112265] Call Trace:
[ 0.112712] ? clear_local_APIC+0x37/0x2f0
[ 0.113463] disable_local_APIC+0x22/0x60
[ 0.114200] native_stop_other_cpus+0xc8/0x160
[ 0.115048] panic+0x11a/0x2a8
[ 0.115606] oops_end+0xc1/0xd0
[ 0.116188] no_context+0x1eb/0x550
[ 0.116826] __bad_area_nosemaphore.constprop.30+0x50/0x1d0
[ 0.117852] bad_area_nosemaphore+0x13/0x20
[ 0.118618] do_kern_addr_fault+0x5c/0x90
[ 0.119387] __do_page_fault+0x382/0x440
[ 0.120109] ? memmap_init_zone+0x8f/0x22d
[ 0.120851] do_page_fault+0x32/0x120
[ 0.121521] page_fault+0x1e/0x30
[ 0.122128] RIP: 0010:native_apic_mem_read+0x3/0x10
[ 0.123045] Code: 00 00 e8 20 3a 2b 00 48 89 d8 5b 5d c3 90 90 90 90 90 90 90 90 90 90 55 89 ff 48 89 e5 89 b7 00 d0 5f ff 5d c3 66 90 55 89 ff <8b> 87 00 d0 5f ff 48 89 e5 5d c3 66 90 e8 7b 8a 5b 00 55 b8 01 00
[ 0.126481] RSP: 0000:ffffffff88003e38 EFLAGS: 00010002
[ 0.127502] RAX: ffffffff87248840 RBX: 000000003fe09000 RCX: 0000000000000000
[ 0.128807] RDX: ffffffff88003e30 RSI: 000000000000f800 RDI: 0000000000000020
[ 0.130161] RBP: ffffffff88003e48 R08: 0000000000000000 R09: 0000000000000000
[ 0.131464] R10: 00000000000000ff R11: 0000000000000000 R12: 0000000001d254d6
[ 0.132771] R13: 000000003d600000 R14: 0000000000000000 R15: 0000000000000000
[ 0.134081] ? native_apic_mem_write+0x10/0x10
[ 0.134892] ? read_apic_id+0x19/0x30
[ 0.135564] init_apic_mappings+0x7a/0x129
[ 0.136316] setup_arch+0xb67/0xc19
[ 0.136954] start_kernel+0x6b/0x4e3
[ 0.137656] x86_64_start_reservations+0x24/0x26
[ 0.138576] x86_64_start_kernel+0x6f/0x72
[ 0.139329] secondary_startup_64+0xa4/0xb0
[ 0.140096] Modules linked in:
[ 0.140653] CR2: ffffffffff5fd030
[ 0.141259] ---[ end trace 2cc66932e568af61 ]---
[ 0.142102] RIP: 0010:native_apic_mem_read+0x3/0x10
[ 0.142992] Code: 00 00 e8 20 3a 2b 00 48 89 d8 5b 5d c3 90 90 90 90 90 90 90 90 90 90 55 89 ff 48 89 e5 89 b7 00 d0 5f ff 5d c3 66 90 55 89 ff <8b> 87 00 d0 5f ff 48 89 e5 5d c3 66 90 e8 7b 8a 5b 00 55 b8 01 00
[ 0.146520] RSP: 0000:ffffffff88003e38 EFLAGS: 00010002
[ 0.147533] RAX: ffffffff87248840 RBX: 000000003fe09000 RCX: 0000000000000000
[ 0.148845] RDX: ffffffff88003e30 RSI: 000000000000f800 RDI: 0000000000000020
[ 0.150191] RBP: ffffffff88003e48 R08: 0000000000000000 R09: 0000000000000000
[ 0.151492] R10: 00000000000000ff R11: 0000000000000000 R12: 0000000001d254d6
[ 0.152796] R13: 000000003d600000 R14: 0000000000000000 R15: 0000000000000000
[ 0.154103] FS: 0000000000000000(0000) GS:ffffffff88173000(0000) knlGS:0000000000000000
[ 0.155579] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.156625] CR2: ffffffffff5fd030 CR3: 000000002200d000 CR4: 00000000000406b0
[ 0.157967] Kernel panic - not syncing: Fatal exception
<repeating panic>
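
As a cross-check of the two oopses above, the faulting addresses can be recomputed from the "Code:" bytes and the RDI values in the log. This is a minimal Python sketch, not part of the original test. It assumes that the bytes at the <8b> marker decode as `mov disp32(%rdi),%eax`, and that offsets 0x20 and 0x30 are the local APIC ID and version registers:

```python
import struct

# Bytes at the faulting instruction, copied from the "Code:" line
# (<8b> marks the byte RIP points at).
insn = bytes.fromhex("8b8700d05fff")

# Assumed decode: 0x8b is MOV r32, r/m32; ModRM 0x87 means
# mod=10 (disp32 follows), reg=000 (eax), rm=111 (rdi).
opcode, modrm = insn[0], insn[1]
assert opcode == 0x8B and modrm == 0x87

# The disp32 is little-endian and sign-extended to 64 bits.
disp = struct.unpack("<i", insn[2:6])[0]
base = disp & 0xFFFFFFFFFFFFFFFF


def fault_address(rdi):
    """Effective address of mov disp32(%rdi),%eax, wrapped to 64 bits."""
    return (base + rdi) & 0xFFFFFFFFFFFFFFFF


# RDI values taken from the first and second oops in the log.
for rdi in (0x20, 0x30):
    print(f"RDI={rdi:#x} -> {fault_address(rdi):#018x}")
```

Both results match the CR2 values in the log (ffffffffff5fd020 and ffffffffff5fd030), so the two faults look like plain register reads through the same mapping at ffffffffff5fd000, for which the page table dump shows PTE 0.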


The libvirt configuration of the VM looks like this:

<os>
  <type arch='x86_64' machine='pc'>hvm</type>
  <loader readonly='yes' type='pflash'>/usr/share/edk2.git/ovmf-ia32/OVMF_CODE-pure-efi.fd</loader>
  <nvram template='/usr/share/edk2.git/ovmf-ia32/OVMF_VARS-pure-efi.fd'>/var/lib/libvirt/qemu/nvram/vm76_VARS-32.fd</nvram>
  <kernel>/var/lib/libvirt/boot/vmlinuz-5.0.0-dirty</kernel>
  <initrd>/var/lib/libvirt/boot/initramfs-5.0.0-dirty.img</initrd>
  <cmdline>root=UUID=6bea2b7b-e6cc-4dba-ac79-be6530d348f5 ro console=tty0 console=ttyS0,115200n8 no_timer_check net.ifnames=0 crashkernel=auto LANG=en_US.UTF-8 earlyprintk=serial,ttyS0,115200</cmdline>
  <boot dev='hd'/>
</os>

--
Jun'ichi Nomura, NEC Corporation / NEC Solution Innovators, Ltd.