[PATCH] KVM: arm64: Prevent kmemleak from accessing pKVM memory

From: Quentin Perret
Date: Thu Jun 16 2022 - 12:11:58 EST


Commit a7259df76702 ("memblock: make memblock_find_in_range method
private") changed the API using which memory is reserved for the pKVM
hypervisor. However, it seems that memblock_phys_alloc() differs
from the original API in terms of kmemleak semantics -- the old one
excluded the reserved regions from kmemleak scans when the new one
doesn't seem to. Unfortunately, when protected KVM is enabled, all
kernel accesses to pKVM-private memory result in a fatal exception,
which can now happen because of kmemleak scans:

$ echo scan > /sys/kernel/debug/kmemleak
[ 34.991354] kvm [304]: nVHE hyp BUG at: [<ffff800008fa3750>] __kvm_nvhe_handle_host_mem_abort+0x270/0x290!
[ 34.991580] kvm [304]: Hyp Offset: 0xfffe8be807e00000
[ 34.991813] Kernel panic - not syncing: HYP panic:
[ 34.991813] PS:600003c9 PC:0000f418011a3750 ESR:00000000f2000800
[ 34.991813] FAR:ffff000439200000 HPFAR:0000000004792000 PAR:0000000000000000
[ 34.991813] VCPU:0000000000000000
[ 34.993660] CPU: 0 PID: 304 Comm: bash Not tainted 5.19.0-rc2 #102
[ 34.994059] Hardware name: linux,dummy-virt (DT)
[ 34.994452] Call trace:
[ 34.994641] dump_backtrace.part.0+0xcc/0xe0
[ 34.994932] show_stack+0x18/0x6c
[ 34.995094] dump_stack_lvl+0x68/0x84
[ 34.995276] dump_stack+0x18/0x34
[ 34.995484] panic+0x16c/0x354
[ 34.995673] __hyp_pgtable_total_pages+0x0/0x60
[ 34.995933] scan_block+0x74/0x12c
[ 34.996129] scan_gray_list+0xd8/0x19c
[ 34.996332] kmemleak_scan+0x2c8/0x580
[ 34.996535] kmemleak_write+0x340/0x4a0
[ 34.996744] full_proxy_write+0x60/0xbc
[ 34.996967] vfs_write+0xc4/0x2b0
[ 34.997136] ksys_write+0x68/0xf4
[ 34.997311] __arm64_sys_write+0x20/0x2c
[ 34.997532] invoke_syscall+0x48/0x114
[ 34.997779] el0_svc_common.constprop.0+0x44/0xec
[ 34.998029] do_el0_svc+0x2c/0xc0
[ 34.998205] el0_svc+0x2c/0x84
[ 34.998421] el0t_64_sync_handler+0xf4/0x100
[ 34.998653] el0t_64_sync+0x18c/0x190
[ 34.999252] SMP: stopping secondary CPUs
[ 35.000034] Kernel Offset: disabled
[ 35.000261] CPU features: 0x800,00007831,00001086
[ 35.000642] Memory Limit: none
[ 35.001329] ---[ end Kernel panic - not syncing: HYP panic:
[ 35.001329] PS:600003c9 PC:0000f418011a3750 ESR:00000000f2000800
[ 35.001329] FAR:ffff000439200000 HPFAR:0000000004792000 PAR:0000000000000000
[ 35.001329] VCPU:0000000000000000 ]---

Fix this by explicitly excluding the hypervisor's memory pool from
kmemleak like we already do for the hyp BSS.

Cc: Mike Rapoport <rppt@xxxxxxxxxx>
Fixes: a7259df76702 ("memblock: make memblock_find_in_range method private")
Signed-off-by: Quentin Perret <qperret@xxxxxxxxxx>
---
An alternative could be to actually exclude memory allocated using
memblock_phys_alloc_range() from kmemleak scans to revert back to the
old behaviour. But nobody else has complained about this AFAIK, so I'd
be inclined to keep this local to pKVM. No strong opinion.
---
arch/arm64/kvm/arm.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 400bb0fe2745..28765bd22efb 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -2110,11 +2110,11 @@ static int finalize_hyp_mode(void)
return 0;

/*
- * Exclude HYP BSS from kmemleak so that it doesn't get peeked
- * at, which would end badly once the section is inaccessible.
- * None of other sections should ever be introspected.
+ * Exclude HYP sections from kmemleak so that they don't get peeked
+ * at, which would end badly once inaccessible.
*/
kmemleak_free_part(__hyp_bss_start, __hyp_bss_end - __hyp_bss_start);
+ kmemleak_free_part(__va(hyp_mem_base), hyp_mem_size);
return pkvm_drop_host_privileges();
}

--
2.36.1.476.g0c4daa206d-goog