Re: BUG: /proc/kcore does not export direct-mapped memory on arm64 (and presumably some other architectures)

From: Laura Abbott
Date: Fri Apr 27 2018 - 20:58:39 EST


On 04/26/2018 02:16 PM, Kees Cook wrote:
On Thu, Apr 26, 2018 at 12:31 PM, Dave Anderson <anderson@xxxxxxxxxx> wrote:

While testing /proc/kcore as the live memory source for the crash utility,
I found that it fails on arm64. It fails because only the vmalloc/module
space regions are exported as PT_LOAD segments; the PT_LOAD segments for the
generic unity-mapped regions of physical memory are missing, as are those
for their associated vmemmap sections.
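
For reference, this can be seen by listing the program headers of /proc/kcore
(as root); on the arm64 test system only the vmalloc/module ranges appear as
PT_LOAD entries, with nothing covering the unity map:

  # readelf --program-headers /proc/kcore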

The mapping of unity-mapped RAM segments in fs/proc/kcore.c is
architecture-neutral, and after debugging it, I found the problem described
below. For each chunk of physical memory, kcore_update_ram() calls
walk_system_ram_range(), passing kclist_add_private() as a callback function
that adds the chunk to the kclist, eventually leading to the creation of a
PT_LOAD segment for it.
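
Condensed, that path looks roughly like this (paraphrased from the
non-CONFIG_HIGHMEM kcore_ram_list() helper in fs/proc/kcore.c; locking,
error handling, and the HIGHMEM variant are omitted, and details may differ
between kernel versions):

  /*
   * Build one KCORE_RAM entry per chunk of System RAM; each entry
   * later becomes a PT_LOAD segment in the /proc/kcore ELF header.
   */
  static int kcore_ram_list(struct list_head *list)
  {
          unsigned long end_pfn = 0;
          int nid, ret;

          /* find the highest pfn across all nodes with memory */
          for_each_node_state(nid, N_MEMORY) {
                  unsigned long node_end = node_end_pfn(nid);
                  if (end_pfn < node_end)
                          end_pfn = node_end;
          }

          /* calls kclist_add_private() for every System RAM range */
          ret = walk_system_ram_range(0, end_pfn, list, kclist_add_private);
          return ret ? -ENOMEM : 0;
  }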

kclist_add_private() does some verification of the memory region,
but this one below is bogus for arm64:

static int
kclist_add_private(unsigned long pfn, unsigned long nr_pages, void *arg)
{
        ... [ cut ] ...
        ent->addr = (unsigned long)__va((pfn << PAGE_SHIFT));
        ... [ cut ] ...

        /* Sanity check: Can happen in 32bit arch...maybe */
        if (ent->addr < (unsigned long) __va(0))
                goto free_out;

And that's because __va(0) is a bogus check on arm64. It is meant to check
whether ent->addr is less than the lowest possible unity-mapped address, but
"0" should not be used as a physical address on arm64; the lowest legitimate
physical address for this __va() check would be the arm64 PHYS_OFFSET, i.e.
memstart_addr.

Here's the arm64 __va() and PHYS_OFFSET:

#define __va(x) ((void *)__phys_to_virt((phys_addr_t)(x)))
#define __phys_to_virt(x) ((unsigned long)((x) - PHYS_OFFSET) | PAGE_OFFSET)

extern s64 memstart_addr;
/* PHYS_OFFSET - the physical address of the start of memory. */
#define PHYS_OFFSET ({ VM_BUG_ON(memstart_addr & 1); memstart_addr; })

If PHYS_OFFSET/memstart_addr is anything other than 0 (it is 0x4000000000 on my
test system), the __va(0) calculation goes negative and wraps to a bogus, very
large virtual address. And since the ent->addr virtual address is less than the
bogus __va(0) address, the test fails and the memory chunk is rejected.
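
To make that concrete, here is the arithmetic for the test system above,
assuming a 48-bit VA configuration where PAGE_OFFSET is 0xffff800000000000
(the exact PAGE_OFFSET depends on VA_BITS, but the wrap-around happens
regardless):

  __va(0)            = (0x0 - 0x4000000000) | PAGE_OFFSET
                     = 0xffffffc000000000        /* wrapped, bogus */

  __va(0x4000000000) = (0x4000000000 - 0x4000000000) | PAGE_OFFSET
                     = 0xffff800000000000        /* start of the linear map */

Every legitimate ent->addr lands below the bogus __va(0) value, so the
"sanity check" rejects all of the unity-mapped RAM chunks.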

Looking at the kernel sources, it seems that this would affect other
architectures as well, i.e., the ones whose __va() is not a simple addition
of the physical address and PAGE_OFFSET.
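
(For contrast, x86_64's __va() really is just an addition, so there __va(0)
degenerates to PAGE_OFFSET, the base of the direct map, and the check happens
to hold:

  #define __va(x)    ((void *)((unsigned long)(x)+PAGE_OFFSET))

On arm64 and other architectures that subtract a physical start offset first,
it does not.)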

Anyway, I don't know what the best approach for an architecture-neutral
fix would be in this case. So I figured I'd throw it out to you guys for
some ideas.

I'm not as familiar with this code, but I've added Ard and Laura to CC
here, as this feels like something they'd be able to comment on. :)

-Kees


It seems backwards that we're converting a physical address to
a virtual address and then validating that. Checking pfn_valid() first
(to ensure there is a valid memmap entry) and then validating the result
of page_to_virt() with virt_addr_valid() to catch the other cases
(e.g. highmem or holes in the address space) seems cleaner to me.
Maybe something like:

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index d1e82761de81..e64ecb9f2720 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -209,25 +209,34 @@ kclist_add_private(unsigned long pfn, unsigned long nr_pages, void *arg)
 {
         struct list_head *head = (struct list_head *)arg;
         struct kcore_list *ent;
+        struct page *p;
+
+        if (!pfn_valid(pfn))
+                return 1;
+
+        p = pfn_to_page(pfn);
+        if (!memmap_valid_within(pfn, p, page_zone(p)))
+                return 1;
 
         ent = kmalloc(sizeof(*ent), GFP_KERNEL);
         if (!ent)
                 return -ENOMEM;
-        ent->addr = (unsigned long)__va((pfn << PAGE_SHIFT));
+        ent->addr = (unsigned long)page_to_virt(p);
         ent->size = nr_pages << PAGE_SHIFT;
 
-        /* Sanity check: Can happen in 32bit arch...maybe */
-        if (ent->addr < (unsigned long) __va(0))
+        if (!virt_addr_valid(ent->addr))
                 goto free_out;
 
         /* cut not-mapped area. ....from ppc-32 code. */
         if (ULONG_MAX - ent->addr < ent->size)
                 ent->size = ULONG_MAX - ent->addr;
 
-        /* cut when vmalloc() area is higher than direct-map area */
-        if (VMALLOC_START > (unsigned long)__va(0)) {
-                if (ent->addr > VMALLOC_START)
-                        goto free_out;
+        /*
+         * We've already checked virt_addr_valid so we know this address
+         * is a valid pointer, therefore we can check against it to determine
+         * if we need to trim
+         */
+        if (VMALLOC_START > ent->addr) {
                 if (VMALLOC_START - ent->addr < ent->size)
                         ent->size = VMALLOC_START - ent->addr;
         }