Re: Bug: Discontigmem virt_to_page() [Alpha,ARM,Mips64?]

From: Andrea Arcangeli (andrea@suse.de)
Date: Fri Apr 26 2002 - 17:46:41 EST


On Fri, Apr 26, 2002 at 07:27:11PM +0100, Russell King wrote:
> Hi,
>
> I've been looking at some of the ARM discontigmem implementations, and
> have come across a nasty bug. To illustrate this, I'm going to take
> part of the generic kernel, and use the Alpha implementation to
> illustrate the problem we're facing on ARM.
>
> I'm going to argue here that virt_to_page() can, in the discontigmem
> case, produce rather a nasty bug when used with non-direct mapped
> kernel memory arguments.
>
> In mm/memory.c:remap_pte_range() we have the following code:
>
> page = virt_to_page(__va(phys_addr));
> if ((!VALID_PAGE(page)) || PageReserved(page))
> set_pte(pte, mk_pte_phys(phys_addr, prot));
>
> Let's look closely at the first line:
>
> page = virt_to_page(__va(phys_addr));
>
> Essentially, what we're trying to do here is convert a physical address
> to a struct page pointer.
>
> __va() is defined, on Alpha, to be:
>
> #define __va(x) ((void *)((unsigned long) (x) + PAGE_OFFSET))
>
> so we produce a unique "va" for any physical address that is passed. No
> problem so far. Now, lets look at virt_to_page() for the Alpha:
>
> #define virt_to_page(kaddr) \
> (ADDR_TO_MAPBASE(kaddr) + LOCAL_MAP_NR(kaddr))
>
> Looks inoccuous enough. However, look closer at ADDR_TO_MAPBASE:
>
> #define ADDR_TO_MAPBASE(kaddr) \
> NODE_MEM_MAP(KVADDR_TO_NID((unsigned long)(kaddr)))
> #define NODE_MEM_MAP(nid) (NODE_DATA(nid)->node_mem_map)
> #define NODE_DATA(n) (&((PLAT_NODE_DATA(n))->gendata))
> #define PLAT_NODE_DATA(n) (plat_node_data[(n)])
>
> Ok, so here we get the map base via:
>
> plat_node_data[KVADDR_TO_NID((unsigned long)(kaddr))]->
> gendata.node_mem_map
>
> plat_node_data is declared as:
>
> plat_pg_data_t *plat_node_data[MAX_NUMNODES];
>
> Lets look closer at KVADDR_TO_NID() and MAX_NUMNODES:
>
> #define KVADDR_TO_NID(kaddr) PHYSADDR_TO_NID(__pa(kaddr))
> #define __pa(x) ((unsigned long) (x) - PAGE_OFFSET)
> #define PHYSADDR_TO_NID(pa) ALPHA_PA_TO_NID(pa)
> #define ALPHA_PA_TO_NID(pa) ((pa) >> 36) /* 16 nodes max due 43bit kseg */
>
> #define MAX_NUMNODES WILDFIRE_MAX_QBB
> #define WILDFIRE_MAX_QBB 8 /* more than 8 requires other mods */
>
> So, we have a maximum of 8 nodes total, and therefore the plat_node_data
> array is 8 entries large.
>
> Now, what happens if 'kaddr' is below PAGE_OFFSET (because the user has
> opened /dev/mem and mapped some random bit of physical memory space)?
>
> __pa returns a large positive number. We shift this large positive
> number left by 36 bits, leaving 28 bits of large positive number, which
> is larger than our total 8 nodes.
>
> We use this 28-bit number to index plat_node_data. Whoops.
>
> And now, for the icing on the cake, take a look at Alpha's pte_page()
> implementation:
>
> unsigned long kvirt; \
> struct page * __xx; \
> \
> kvirt = (unsigned long)__va(pte_val(x) >> (32-PAGE_SHIFT)); \
> __xx = virt_to_page(kvirt); \
> \
> __xx; \
>
>
> Someone *please* tell me where I'm wrong. I really want to be wrong,
> because I can see the same thing happening (in theory, and one report
> in practice from a developer) on a certain ARM platform.
>
> On ARM, however, we have cherry to add here. __va() may alias certain
> physical memory addresses to the same virtual memory address, which
> makes:
>
> VALID_PAGE(virt_to_page(__va(phys)))
>
> completely nonsensical.

correct. This should fix it:

--- 2.4.19pre7aa2/include/asm-alpha/mmzone.h.~1~ Fri Apr 26 10:28:28 2002
+++ 2.4.19pre7aa2/include/asm-alpha/mmzone.h Sat Apr 27 00:30:02 2002
@@ -106,8 +106,8 @@
 #define kern_addr_valid(kaddr) test_bit(LOCAL_MAP_NR(kaddr), \
                                          NODE_DATA(KVADDR_TO_NID(kaddr))->valid_addr_bitmap)
 
-#define virt_to_page(kaddr) (ADDR_TO_MAPBASE(kaddr) + LOCAL_MAP_NR(kaddr))
-#define VALID_PAGE(page) (((page) - mem_map) < max_mapnr)
+#define virt_to_page(kaddr) (KVADDR_TO_NID((unsigned long) kaddr) < MAX_NUMNODES ? ADDR_TO_MAPBASE(kaddr) + LOCAL_MAP_NR(kaddr) : 0)
+#define VALID_PAGE(page) ((page) != NULL)
 
 #ifdef CONFIG_NUMA
 #ifdef CONFIG_NUMA_SCHED

It still doesn't cover the ram between the end of a node and the start
of the next node, but at least on alpha-wildfire there can be nothing
mapped there (it's reserved for "more dimm ram" slots) and it would be
even more costly to check if the address is in those intra-node holes.

The invalid pages now will start at phys addr 64G*8 that is the maximum
ram that linux can handle the the wildfire. if you mmap the intra-node
ram via /dev/mem you risk for troubles anyways because there's no dimm
there and probably the effect is undefined or unpredictable, it's just a
"mustn't do that", /dev/mem is a "root" thing so the above approch looks
fine to me.

Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Tue Apr 30 2002 - 22:00:13 EST