Re: [PATCH] random: Fix kernel panic due to system_wq use before init

From: Waiman Long
Date: Sun Sep 18 2016 - 23:09:30 EST


On 09/14/2016 03:19 PM, Linus Torvalds wrote:
> On Wed, Sep 14, 2016 at 12:14 PM, Waiman Long <waiman.long@xxxxxxx> wrote:
>> In the stack backtrace above, the kernel hadn't even reached SMP boot after
>> about 50s. That was extremely slow. I tried the 4.7.3 kernel and it booted
>> up fine. So I suspect that there may be too many interrupts going on and it
>> consumes most of the CPU cycles. The prime suspect is the random driver, I
>> think.
> Any chance of bisecting it at least partially? The random driver
> doesn't do interrupts itself, it just gets called by other drivers
> doing interrupts. So if there are too many of them, that would be
> something else.
>
> Linus

I have finally finished bisecting the problem. I was wrong in saying that the 4.7.3 kernel had no problem. It did. There were some slight differences between the 4.8 and 4.7 kernel config files that I used. After further testing, I found that the bootup problem only happened when the following kernel config option was set:

CONFIG_EFI_MIXED=y

Bisecting revealed that the following 4.6 patch was the first one that had this problem:

c9f2a9a65e4855b74d92cdad688f6ee4a1a323ff
[PATCH] x86/efi: Hoist page table switching code into efi_call_virt()

I tested on my test system with three different partition sizes:
1) 16-socket Broadwell-EX with 12TB memory
2) 8-socket Broadwell-EX with 6TB memory
3) 4-socket Broadwell-EX with 3TB memory

Only the 16-socket and 8-socket configurations had this problem. I am not sure whether having more than 4TB of main memory is a factor.

I have attached several slightly different panic messages that happened during my testing. I know little about the EFI code, so I am not sure whether it is a kernel problem, a firmware problem, or a combination of both. Hopefully someone familiar with this code can shed some light on the problem.

Cheers,
Longman
commit 1bb6936473c07b5a7c8daced1000893b7145bb14
Author: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
Date: Mon Feb 1 22:07:00 2016 +0000

efi: Runtime-wrapper: Get rid of the rtc_lock spinlock

The rtc_lock spinlock aims to serialize access to the CMOS RTC
between the UEFI firmware and the kernel drivers that use it
directly. However, x86 is the only arch that performs such
direct accesses, and that never uses the time related UEFI
runtime services. Since no other UEFI-enlightened architectures
have a legacy CMOS RTC anyway, we can remove the rtc_lock
spinlock entirely.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
Signed-off-by: Matt Fleming <matt@xxxxxxxxxxxxxxxxxxx>
Cc: Borislav Petkov <bp@xxxxxxxxx>
Cc: Brian Gerst <brgerst@xxxxxxxxx>
Cc: Denys Vlasenko <dvlasenk@xxxxxxxxxx>
Cc: H. Peter Anvin <hpa@xxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: linux-efi@xxxxxxxxxxxxxxx
Link: http://lkml.kernel.org/r/1454364428-494-7-git-send-email-matt@codeblue
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>

-----------------------------------------------------------------------------
[ 0.000000] ACPI: X2APIC_NMI (uid[0x16b] high level lint[0x1])
[ 0.000000] ACPI: X2APIC_NMI (uid[0x16c] high level lint[0x1])
[ 0.000000] ACPI: X2APIC_NMI (uid[0x16d] high level lint[0x1])
[ 0.000000] ACPI: X2APIC_NMI (uid[0x16e] high level lint[0x1])
[ 0.000000] ACPI: X2APIC_NMI (uid[0x16f] high level lint[0x1])
[ 0.000000] ACPI: X2APIC_NMI (uid[0x170] high level lint[0x1])
[ 0.000000] ACPI: X2APIC_NMI (uid[0x171] high level lint[0x1])
[ 0.000000] ACPI: X2APIC_NMI (uid[0x172] high level lint[0x1])
[ 0.000000] ACPI: X2APIC_NMI (uid[0x173] high level lint[0x1])
[ 0.000000] ACPI: X2APIC_NMI (uid[0x174] high level lint[0x1])
[ 0.000000] ACPI: X2APIC_NMI (uid[0x175] high level lint[0x1])
[ 0.000000] ACPI: X2APIC_NMI (uid[0x176] high level lint[0x1])
[ 0.000000] ACPI: X2APIC_NMI (uid[0x177] high level lint[0x1])
[ 0.000000] ACPI: X2APIC_NMI (uid[0x178] high level lint[0x1])
[ 0.000000] ACPI: X2APIC_NMI (uid[0x179] high level lint[0x1])
[ 0.000000] ACPI: X2APIC_NMI (uid[0x17a] high level lint[0x1])
[ 0.000000] ACPI: X2APIC_NMI (uid[0x17b] high level lint[0x1])
[ 0.000000] ACPI: X2APIC_NMI (uid[0x17c] high level lint[0x1])
[ 0.000000] ACPI: X2APIC_NMI (uid[0x17d] high level lint[0x1])
[ 0.000000] ACPI: X2APIC_NMI (uid[0x17e] high level lint[0x1])
[ 0.000000] ACPI: X2APIC_NMI (uid[0x17f] high level lint[0x1])
[ 0.000000] IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23
[ 0.000000] IOAPIC[1]: apic_id 9, version 32, address 0xfec01000, GSI 24-47
[ 0.000000] IOAPIC[2]: apic_id 10, version 32, address 0xfec04000, GSI 48-71
[ 0.000000] IOAPIC[3]: apic_id 11, version 32, address 0xfec08000, GSI 72-95
[ 0.000000] IOAPIC[4]: apic_id 12, version 32, address 0xfec09000, GSI 96-119
[ 0.000000] IOAPIC[5]: apic_id 13, version 32, address 0xfec0c000, GSI 120-143
[ 0.000000] IOAPIC[6]: apic_id 14, version 32, address 0xfec10000, GSI 144-167
[ 0.000000] IOAPIC[7]: apic_id 15, version 32, address 0xfec11000, GSI 168-191
[ 0.000000] IOAPIC[8]: apic_id 16, version 32, address 0xfec14000, GSI 192-215
[ 0.000000] IOAPIC[9]: apic_id 17, version 32, address 0xfec18000, GSI 216-239
[ 0.000000] IOAPIC[10]: apic_id 18, version 32, address 0xfec19000, GSI 240-263
[ 0.000000] IOAPIC[11]: apic_id 19, version 32, address 0xfec1c000, GSI 264-287
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[ 0.000000] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[ 0.000000] IP: [<ffffffff811ec17d>] kmem_cache_alloc_trace+0xad/0x1c0
[ 0.000000] PGD 0
[ 0.000000] Oops: 0000 [#1] SMP
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.5.0-rc2+ #18
[ 0.000000] Hardware name: HP Superdome2 16s x86, BIOS Bundle: 008.004.020 SFW: 043.011.000 07/04/2016
[ 0.000000] task: ffffffff81c114c0 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[ 0.000000] RIP: 0010:[<ffffffff811ec17d>] [<ffffffff811ec17d>] kmem_cache_alloc_trace+0xad/0x1c0
[ 0.000000] RSP: 0000:ffffffff81c03d90 EFLAGS: 00010046
[ 0.000000] RAX: ffffffff81cd8618 RBX: 0000000000000000 RCX: 0000000000000002
[ 0.000000] RDX: 0000000000000018 RSI: 00000000024080c0 RDI: 0000000000000000
[ 0.000000] RBP: ffffffff81c03dc8 R08: 000000000000000f R09: 00000000fffffffe
[ 0.000000] R10: ffffffff813c55b8 R11: 00000000000003ed R12: 00000000024080c0
[ 0.000000] R13: 0000000000000000 R14: 0000000000000016 R15: 0000000000000000
[ 0.000000] FS: 0000000000000000(0000) GS:ffffffff81d63000(0000) knlGS:0000000000000000
[ 0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.000000] CR2: 0000000000000018 CR3: 0000000001c0a000 CR4: 00000000000406b0
[ 0.000000] Stack:
[ 0.000000] ffffffff81c03dd0 0000000000000018 0000000000000016 0000000001000000
[ 0.000000] ffffffff81cd8620 0000000000000016 0000000000000000 ffffffff81c03df0
[ 0.000000] ffffffff813c55b8 0000000000000016 0000000001000000 0000000000000016
[ 0.000000] Call Trace:
[ 0.000000] [<ffffffff813c55b8>] acpi_irq_set_penalty+0x60/0x8e
[ 0.000000] [<ffffffff813c5607>] acpi_irq_add_penalty+0x21/0x26
[ 0.000000] [<ffffffff813c5b95>] acpi_penalize_sci_irq+0x25/0x28
[ 0.000000] [<ffffffff81d905f7>] acpi_sci_ioapic_setup+0x68/0x78
[ 0.000000] [<ffffffff81d910e6>] acpi_boot_init+0x2cc/0x533
[ 0.000000] [<ffffffff81067978>] ? set_pte_vaddr_pud+0x48/0x50
[ 0.000000] [<ffffffff81d908b9>] ? acpi_parse_x2apic+0x77/0x77
[ 0.000000] [<ffffffff81d90842>] ? dmi_ignore_irq0_timer_override+0x30/0x30
[ 0.000000] [<ffffffff81d85c08>] setup_arch+0xc0e/0xcd3
[ 0.000000] [<ffffffff81d7c120>] ? early_idt_handler_array+0x120/0x120
[ 0.000000] [<ffffffff81d7cd94>] start_kernel+0xfc/0x506
[ 0.000000] [<ffffffff81d7c120>] ? early_idt_handler_array+0x120/0x120
[ 0.000000] [<ffffffff81d7c120>] ? early_idt_handler_array+0x120/0x120
[ 0.000000] [<ffffffff81d7c5ee>] x86_64_start_reservations+0x2a/0x2c
[ 0.000000] [<ffffffff81d7c73c>] x86_64_start_kernel+0x14c/0x16f
[ 0.000000] Code: 89 f0 49 8d 30 e8 54 f4 15 00 84 c0 74 b9 49 63 47 20 41 f7 c4 00 80 00 00 0f 18 0c 03 0f 85 fe 00 00 00 0f 1f 44 00 00 4c 89 f3 <4d> 63 7d 18 4c 8b 75 08 0f 1f 44 00 00 48 83 c4 10 48 89 d8 5b
[ 0.000000] RIP [<ffffffff811ec17d>] kmem_cache_alloc_trace+0xad/0x1c0
[ 0.000000] RSP <ffffffff81c03d90>
[ 0.000000] CR2: 0000000000000018
[ 0.000000] ---[ end trace 8e3ca9eeb1bcd5f8 ]---
[ 0.000000] Kernel panic - not syncing: Fatal exception
[ 0.000000] ---[ end Kernel panic - not syncing: Fatal exception

===========================================================================
commit 98f91276900fa07d6f1c4ae4f120d69962f6433c
Merge: ff3d0a1 50a0cb5
Author: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Date: Sat Dec 19 21:24:52 2015 +0100

Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mflemi

Pull efi changes from Matt Fleming:

* We don't need to carry our own formatting code in the esrt driver
because the kobject API can do that for us - Rasmus Villemoes

* Update the arm64 file paths in Documentation/efi-stub.txt to match
the current tree - Alan Ott

* Consistently preface all print statements with "efi" in arch/x86 so
that it's more obvious to users reporting problems which statements
in the kernel log are relevant for EFI - Matt Fleming

* Fix a boot crash in the ACPI BGRT driver and delete
efi_lookup_mapped_addr() since it's useless now that the EFI
mappings *only* exist in the 'efi_pgd' page table. Instead we
always early_memremap() the BGRT memory - Sai Praneeth Prakhya


-----------------------------------------------------------------------------
[ 0.012059] pid_max: default: 393216 minimum: 3072
[ 0.017536] ACPI: Core revision 20150930
[ 0.058871] ACPI: 5 ACPI AML tables successfully acquired and loaded
[ 8.222110] random: nonblocking pool is initialized
[ 105.569340] BUG: unable to handle kernel NULL pointer dereference at 0000000000000100
[ 105.577965] IP: [<ffffffff8109da52>] __queue_work+0x32/0x300
[ 105.584204] PGD 0
[ 105.586419] Oops: 0000 [#1] SMP
[ 105.589989] Modules linked in:
[ 105.593361] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.0-rc2+ #19
[ 105.600349] Hardware name: HP Superdome2 16s x86, BIOS Bundle: 008.004.020 SFW: 043.011.000 07/04/2016
[ 105.610580] task: ffffffff81c114c0 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[ 105.618808] RIP: 0010:[<ffffffff8109da52>] [<ffffffff8109da52>] __queue_work+0x32/0x300
[ 105.627718] RSP: 0000:ffff88bd7f403dc0 EFLAGS: 00010046
[ 105.633554] RAX: 0000000000000086 RBX: 0000000000000087 RCX: ffffffff81ce44e0
[ 105.641400] RDX: ffffffff81ce4480 RSI: 0000000000000000 RDI: 0000000000002000
[ 105.649246] RBP: ffff88bd7f403df8 R08: 0000000000000000 R09: 0000000000004000
[ 105.657088] R10: ffffffff8221e0ec R11: 0000000000007ffe R12: ffffffff81ce4480
[ 105.664929] R13: 0000000000002000 R14: 0000000000000000 R15: ffffffff81aa335a
[ 105.672770] FS: 0000000000000000(0000) GS:ffff88bd7f400000(0000) knlGS:0000000000000000
[ 105.681662] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 105.687975] CR2: 0000000000000100 CR3: 0000000001c0a000 CR4: 00000000000406b0
[ 105.695824] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 105.703670] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 105.711510] Stack:
[ 105.713722] 0000000000000046 000020007f403dd8 0000000000000087 ffffffff81ce4560
[ 105.721885] 0000000000000570 ffffffff81ce45b0 ffffffff81aa335a ffff88bd7f403e10
[ 105.730045] ffffffff8109dd47 0000000000000381 ffff88bd7f403e60 ffffffff81429ec9
[ 105.738206] Call Trace:
[ 105.740895] <IRQ>
[ 105.743012] [<ffffffff8109dd47>] queue_work_on+0x27/0x40
[ 105.749146] [<ffffffff81429ec9>] credit_entropy_bits+0x1e9/0x350
[ 105.755856] [<ffffffff81064191>] ? __raw_callee_save___native_queued_spin_unlock+0x11/0x20
[ 105.765036] [<ffffffff8142d00f>] ? add_interrupt_randomness+0x18f/0x1e0
[ 105.772401] [<ffffffff8142d00f>] add_interrupt_randomness+0x18f/0x1e0
[ 105.779586] [<ffffffff810e0a42>] handle_irq_event_percpu+0x92/0x180
[ 105.786568] [<ffffffff810e0b6b>] handle_irq_event+0x3b/0x60
[ 105.792792] [<ffffffff810e3f22>] handle_level_irq+0x82/0x100
[ 105.799110] [<ffffffff81019edb>] handle_irq+0xab/0x140
[ 105.804859] [<ffffffff8108b6e1>] ? _local_bh_enable+0x21/0x50
[ 105.811279] [<ffffffff8169e19d>] do_IRQ+0x4d/0xd0
[ 105.816544] [<ffffffff8169c207>] common_interrupt+0x87/0x87
[ 105.822762] <EOI>
[ 105.824883] [<ffffffff8106bd45>] ? __change_page_attr_set_clr+0xa5/0x2c0
[ 105.832543] [<ffffffff8106bd24>] ? __change_page_attr_set_clr+0x84/0x2c0
[ 105.840008] [<ffffffff8120218a>] ? __slab_alloc+0x4d/0x5c
[ 105.846045] [<ffffffff8106d36e>] kernel_map_pages_in_pgd+0x7e/0xc0
[ 105.852944] [<ffffffff81d96f49>] efi_setup_page_tables+0xc9/0x1d3
[ 105.859735] [<ffffffff81d96a2c>] efi_enter_virtual_mode+0x2ea/0x43d
[ 105.866730] [<ffffffff81d760e2>] start_kernel+0x447/0x4f0
[ 105.872762] [<ffffffff81d75a86>] ? set_init_arg+0x55/0x55
[ 105.878789] [<ffffffff81d75120>] ? early_idt_handler_array+0x120/0x120
[ 105.886058] [<ffffffff81d755ee>] x86_64_start_reservations+0x2a/0x2c
[ 105.893137] [<ffffffff81d7573c>] x86_64_start_kernel+0x14c/0x16f
[ 105.899831] Code: 89 e5 41 57 41 56 49 89 f6 41 55 41 89 fd 41 54 49 89 d4 53 48 83 ec 10 89 7d d4 ff 14 25 40 b5 c2 81 f6 c4 02 0f 85 b5 01 00 00 <41> 8b 86 00 01 00 00 a9 00 00 01 00 0f 85 cd 01 00 00 49 c7 c7
[ 105.921142] RIP [<ffffffff8109da52>] __queue_work+0x32/0x300
[ 105.927465] RSP <ffff88bd7f403dc0>
[ 105.931295] CR2: 0000000000000100
[ 105.934953] ---[ end trace c283e1394f7b3e18 ]---
[ 105.940029] Kernel panic - not syncing: Fatal exception in interrupt
[ 105.947025] ---[ end Kernel panic - not syncing: Fatal exception in interrupt


===========================================================================

commit 67a9108ed4313b85a9c53406d80dc1ae3f8c3e36
Author: Matt Fleming <matt@xxxxxxxxxxxxxxxxxxx>
Date: Fri Nov 27 21:09:34 2015 +0000

x86/efi: Build our own page table structures

With commit e1a58320a38d ("x86/mm: Warn on W^X mappings") all
users booting on 64-bit UEFI machines see the following warning,

------------[ cut here ]------------
WARNING: CPU: 7 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x5d
x86/mm: Found insecure W+X mapping at address ffff88000005f000/0xffff88000
...
x86/mm: Checked W+X mappings: FAILED, 165660 W+X pages found.
...

This is caused by mapping EFI regions with RWX permissions.
There isn't much we can do to restrict the permissions for these
regions due to the way the firmware toolchains mix code and
data, but we can at least isolate these mappings so that they do
not appear in the regular kernel page tables.

In commit d2f7cbe7b26a ("x86/efi: Runtime services virtual
mapping") we started using 'trampoline_pgd' to map the EFI
regions because there was an existing identity mapping there
which we use during the SetVirtualAddressMap() call and for
broken firmware that accesses those addresses.

But 'trampoline_pgd' shares some PGD entries with
'swapper_pg_dir' and does not provide the isolation we require.
Notably the virtual address for __START_KERNEL_map and
MODULES_START are mapped by the same PGD entry so we need to be
more careful when copying changes over in
efi_sync_low_kernel_mappings().

This patch doesn't go the full mile, we still want to share some
PGD entries with 'swapper_pg_dir'. Having completely separate
page tables brings its own issues such as synchronising new
mappings after memory hotplug and module loading. Sharing also
keeps memory usage down.

Signed-off-by: Matt Fleming <matt@xxxxxxxxxxxxxxxxxxx>
Reviewed-by: Borislav Petkov <bp@xxxxxxx>
Acked-by: Borislav Petkov <bp@xxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
Cc: Andy Lutomirski <luto@xxxxxxxxxx>
Cc: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
Cc: Borislav Petkov <bp@xxxxxxxxx>
Cc: Brian Gerst <brgerst@xxxxxxxxx>
Cc: Dave Jones <davej@xxxxxxxxxxxxxxxxx>
Cc: Denys Vlasenko <dvlasenk@xxxxxxxxxx>
Cc: H. Peter Anvin <hpa@xxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@xxxxxxxxx>
Cc: Stephen Smalley <sds@xxxxxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Toshi Kani <toshi.kani@xxxxxx>
Cc: linux-efi@xxxxxxxxxxxxxxx
Link: http://lkml.kernel.org/r/1448658575-17029-6-git-send-email-matt@codebl
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>


-----------------------------------------------------------------------------
[ 0.012057] pid_max: default: 393216 minimum: 3072
[ 0.017537] ACPI: Core revision 20150930
[ 0.058986] ACPI: 5 ACPI AML tables successfully acquired and loaded
[ 8.223335] random: nonblocking pool is initialized
[ 105.570225] BUG: unable to handle kernel NULL pointer dereference at 0000000000000100
[ 105.578853] IP: [<ffffffff8109da52>] __queue_work+0x32/0x300
[ 105.585092] PGD 0
[ 105.587308] Oops: 0000 [#1] SMP
[ 105.590879] Modules linked in:
[ 105.594245] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.0-rc2+ #20
[ 105.601232] Hardware name: HP Superdome2 16s x86, BIOS Bundle: 008.004.020 SFW: 043.011.000 07/04/2016
[ 105.611463] task: ffffffff81c114c0 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[ 105.619692] RIP: 0010:[<ffffffff8109da52>] [<ffffffff8109da52>] __queue_work+0x32/0x300
[ 105.628600] RSP: 0000:ffff88bd7f403dc0 EFLAGS: 00010046
[ 105.634435] RAX: 0000000000000086 RBX: 0000000000000087 RCX: ffffffff81ce44e0
[ 105.642282] RDX: ffffffff81ce4480 RSI: 0000000000000000 RDI: 0000000000002000
[ 105.650128] RBP: ffff88bd7f403df8 R08: 0000000000000000 R09: 0000000000004000
[ 105.657971] R10: ffffffff8221e0ec R11: 0000000000007ffe R12: ffffffff81ce4480
[ 105.665813] R13: 0000000000002000 R14: 0000000000000000 R15: ffffffff81aa32c2
[ 105.673655] FS: 0000000000000000(0000) GS:ffff88bd7f400000(0000) knlGS:0000000000000000
[ 105.682548] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 105.688862] CR2: 0000000000000100 CR3: 0000000001c0a000 CR4: 00000000000406b0
[ 105.696707] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 105.704549] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 105.712394] Stack:
[ 105.714606] 0000000000000046 000020007f403dd8 0000000000000087 ffffffff81ce4560
[ 105.722767] 0000000000000570 ffffffff81ce45b0 ffffffff81aa32c2 ffff88bd7f403e10
[ 105.730927] ffffffff8109dd47 0000000000000381 ffff88bd7f403e60 ffffffff81429ec9
[ 105.739089] Call Trace:
[ 105.741777] <IRQ>
[ 105.743894] [<ffffffff8109dd47>] queue_work_on+0x27/0x40
[ 105.750027] [<ffffffff81429ec9>] credit_entropy_bits+0x1e9/0x350
[ 105.756736] [<ffffffff81064191>] ? __raw_callee_save___native_queued_spin_unlock+0x11/0x20
[ 105.765918] [<ffffffff8142d00f>] ? add_interrupt_randomness+0x18f/0x1e0
[ 105.773283] [<ffffffff8142d00f>] add_interrupt_randomness+0x18f/0x1e0
[ 105.780468] [<ffffffff810e0a42>] handle_irq_event_percpu+0x92/0x180
[ 105.787452] [<ffffffff810e0b6b>] handle_irq_event+0x3b/0x60
[ 105.793681] [<ffffffff810e3f22>] handle_level_irq+0x82/0x100
[ 105.799998] [<ffffffff81019edb>] handle_irq+0xab/0x140
[ 105.805740] [<ffffffff8108b6e1>] ? _local_bh_enable+0x21/0x50
[ 105.812155] [<ffffffff8169e25d>] do_IRQ+0x4d/0xd0
[ 105.817420] [<ffffffff8169c2c7>] common_interrupt+0x87/0x87
[ 105.823638] <EOI>
[ 105.825755] [<ffffffff8169b460>] ? _raw_spin_lock+0x10/0x30
[ 105.832184] [<ffffffff8106bd0d>] ? __change_page_attr_set_clr+0x6d/0x2c0
[ 105.839650] [<ffffffff8120218a>] ? __slab_alloc+0x4d/0x5c
[ 105.845687] [<ffffffff8106d36e>] kernel_map_pages_in_pgd+0x7e/0xc0
[ 105.852586] [<ffffffff81d96f49>] efi_setup_page_tables+0xc9/0x1d3
[ 105.859382] [<ffffffff81d96a2c>] efi_enter_virtual_mode+0x2ea/0x43d
[ 105.866377] [<ffffffff81d760e2>] start_kernel+0x447/0x4f0
[ 105.872409] [<ffffffff81d75a86>] ? set_init_arg+0x55/0x55
[ 105.878439] [<ffffffff81d75120>] ? early_idt_handler_array+0x120/0x120
[ 105.885708] [<ffffffff81d755ee>] x86_64_start_reservations+0x2a/0x2c
[ 105.892786] [<ffffffff81d7573c>] x86_64_start_kernel+0x14c/0x16f
[ 105.899483] Code: 89 e5 41 57 41 56 49 89 f6 41 55 41 89 fd 41 54 49 89 d4 53 48 83 ec 10 89 7d d4 ff 14 25 40 b5 c2 81 f6 c4 02 0f 85 b5 01 00 00 <41> 8b 86 00 01 00 00 a9 00 00 01 00 0f 85 cd 01 00 00 49 c7 c7
[ 105.920801] RIP [<ffffffff8109da52>] __queue_work+0x32/0x300
[ 105.927124] RSP <ffff88bd7f403dc0>
[ 105.930960] CR2: 0000000000000100
[ 105.934610] ---[ end trace 57cf74f3fc81ae32 ]---
[ 105.939683] Kernel panic - not syncing: Fatal exception in interrupt
[ 105.946678] ---[ end Kernel panic - not syncing: Fatal exception in interrupt


===========================================================================
From c9f2a9a65e4855b74d92cdad688f6ee4a1a323ff Mon Sep 17 00:00:00 2001
From: Matt Fleming <matt@xxxxxxxxxxxxxxxxxxx>
Date: Fri, 27 Nov 2015 21:09:33 +0000
Subject: [PATCH] x86/efi: Hoist page table switching code into efi_call_virt()

This change is a prerequisite for pending patches that switch to
a dedicated EFI page table, instead of using 'trampoline_pgd'
which shares PGD entries with 'swapper_pg_dir'. The pending
patches make it impossible to dereference the runtime service
function pointer without first switching %cr3.

It's true that we now have duplicated switching code in
efi_call_virt() and efi_call_phys_{prolog,epilog}() but we are
sacrificing code duplication for a little more clarity and the
ease of writing the page table switching code in C instead of
asm.

Signed-off-by: Matt Fleming <matt@xxxxxxxxxxxxxxxxxxx>
Reviewed-by: Borislav Petkov <bp@xxxxxxx>
Acked-by: Borislav Petkov <bp@xxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
Cc: Andy Lutomirski <luto@xxxxxxxxxx>
Cc: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
Cc: Borislav Petkov <bp@xxxxxxxxx>
Cc: Brian Gerst <brgerst@xxxxxxxxx>
Cc: Dave Jones <davej@xxxxxxxxxxxxxxxxx>
Cc: Denys Vlasenko <dvlasenk@xxxxxxxxxx>
Cc: H. Peter Anvin <hpa@xxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@xxxxxxxxx>
Cc: Stephen Smalley <sds@xxxxxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Toshi Kani <toshi.kani@xxxxxx>
Cc: linux-efi@xxxxxxxxxxxxxxx

-----------------------------------------------------------------------------
[ 0.000000] tsc: Fast TSC calibration using PIT
[ 0.000000] tsc: Detected 2194.980 MHz processor
[ 0.000360] Calibrating delay loop (skipped), value calculated using timer frequency.. 4389.96 BogoMIPS (lpj=2194980)
[ 0.012063] pid_max: default: 393216 minimum: 3072
[ 0.017537] ACPI: Core revision 20150930
[ 0.058735] ACPI: 5 ACPI AML tables successfully acquired and loaded
[ 8.224598] random: nonblocking pool is initialized
[ 105.585551] BUG: unable to handle kernel NULL pointer dereference at 0000000000000100
[ 105.594195] IP: [<ffffffff8109d932>] __queue_work+0x32/0x300
[ 105.600438] PGD 0
[ 105.602657] Oops: 0000 [#1] SMP
[ 105.606223] Modules linked in:
[ 105.609595] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.0-rc2+ #22
[ 105.616583] Hardware name: HP Superdome2 16s x86, BIOS Bundle: 008.004.020 SFW: 043.011.000 07/04/2016
[ 105.626819] task: ffffffff81c114c0 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[ 105.635049] RIP: 0010:[<ffffffff8109d932>] [<ffffffff8109d932>] __queue_work+0x32/0x300
[ 105.643960] RSP: 0000:ffff88bd7f403dc0 EFLAGS: 00010046
[ 105.649801] RAX: 0000000000000086 RBX: 0000000000000087 RCX: ffffffff81ce44e0
[ 105.657647] RDX: ffffffff81ce4480 RSI: 0000000000000000 RDI: 0000000000002000
[ 105.665492] RBP: ffff88bd7f403df8 R08: 0000000000000000 R09: 0000000000004000
[ 105.673339] R10: ffffffff8221e0ec R11: 0000000000007ffe R12: ffffffff81ce4480
[ 105.681181] R13: 0000000000002000 R14: 0000000000000000 R15: ffffffff81aa3252
[ 105.689031] FS: 0000000000000000(0000) GS:ffff88bd7f400000(0000) knlGS:0000000000000000
[ 105.697925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 105.704239] CR2: 0000000000000100 CR3: 0000000001c0a000 CR4: 00000000000406b0
[ 105.712086] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 105.719927] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 105.727767] Stack:
[ 105.729979] 0000000000000046 000020007f403dd8 0000000000000087 ffffffff81ce4560
[ 105.738147] 0000000000000570 ffffffff81ce45b0 ffffffff81aa3252 ffff88bd7f403e10
[ 105.746310] ffffffff8109dc27 0000000000000381 ffff88bd7f403e60 ffffffff81429da9
[ 105.754470] Call Trace:
[ 105.757159] <IRQ>
[ 105.759278] [<ffffffff8109dc27>] queue_work_on+0x27/0x40
[ 105.765414] [<ffffffff81429da9>] credit_entropy_bits+0x1e9/0x350
[ 105.772125] [<ffffffff81064191>] ? __raw_callee_save___native_queued_spin_unlock+0x11/0x20
[ 105.781315] [<ffffffff8142ceef>] ? add_interrupt_randomness+0x18f/0x1e0
[ 105.788679] [<ffffffff8142ceef>] add_interrupt_randomness+0x18f/0x1e0
[ 105.795871] [<ffffffff810e0922>] handle_irq_event_percpu+0x92/0x180
[ 105.802859] [<ffffffff810e0a4b>] handle_irq_event+0x3b/0x60
[ 105.809092] [<ffffffff810e3e02>] handle_level_irq+0x82/0x100
[ 105.815422] [<ffffffff81019edb>] handle_irq+0xab/0x140
[ 105.821174] [<ffffffff8108b5c1>] ? _local_bh_enable+0x21/0x50
[ 105.827598] [<ffffffff8169e11d>] do_IRQ+0x4d/0xd0
[ 105.832873] [<ffffffff8169c187>] common_interrupt+0x87/0x87
[ 105.839096] <EOI>
[ 105.841218] [<ffffffff8106aca5>] ? __cpa_process_fault+0x1c5/0x440
[ 105.848305] [<ffffffff8106ba45>] __change_page_attr+0x785/0x9e0
[ 105.854916] [<ffffffff8106bd18>] __change_page_attr_set_clr+0x78/0x2c0
[ 105.862202] [<ffffffff8120206a>] ? __slab_alloc+0x4d/0x5c
[ 105.868236] [<ffffffff8106d36e>] kernel_map_pages_in_pgd+0x7e/0xc0
[ 105.875134] [<ffffffff81d96e5a>] efi_setup_page_tables+0xbc/0x1c6
[ 105.881926] [<ffffffff81d96a1e>] efi_enter_virtual_mode+0x2dc/0x42f
[ 105.888920] [<ffffffff81d760e2>] start_kernel+0x447/0x4f0
[ 105.894957] [<ffffffff81d75a86>] ? set_init_arg+0x55/0x55
[ 105.900985] [<ffffffff81d75120>] ? early_idt_handler_array+0x120/0x120
[ 105.908254] [<ffffffff81d755ee>] x86_64_start_reservations+0x2a/0x2c
[ 105.915337] [<ffffffff81d7573c>] x86_64_start_kernel+0x14c/0x16f
[ 105.922031] Code: 89 e5 41 57 41 56 49 89 f6 41 55 41 89 fd 41 54 49 89 d4 53 48 83 ec 10 89 7d d4 ff 14 25 40 b5 c2 81 f6 c4 02 0f 85 b5 01 00 00 <41> 8b 86 00 01 00 00 a9 00 00 01 00 0f 85 cd 01 00 00 49 c7 c7
[ 105.943345] RIP [<ffffffff8109d932>] __queue_work+0x32/0x300
[ 105.949668] RSP <ffff88bd7f403dc0>
[ 105.953500] CR2: 0000000000000100
[ 105.957153] ---[ end trace c8ed4cb8590a28f7 ]---
[ 105.962229] Kernel panic - not syncing: Fatal exception in interrupt
[ 105.969220] ---[ end Kernel panic - not syncing: Fatal exception in interrupt