mlockall and mmap of IO devices don't mix

From: Joe Korty
Date: Fri Oct 03 2003 - 16:46:10 EST


2.6.0-test6: the use of mlockall(2) in a process that has mmap(2)ed
the registers of an IO device will hang that process uninterruptibly.
The task runs in an infinite loop in get_user_pages(), invoking
follow_page() forever.

Using binary search I discovered that the problem was introduced
in 2.5.14, specifically in ChangeSetKey

zippel@xxxxxxxxxxxxxx|ChangeSet|20020503210330|37095

The following patch backs out those lines of the above changeset
which are causing the problem:

--- a/mm/memory.c 2003-10-02 15:32:51.000000000 -0400
+++ b/mm/memory.c 2003-10-02 17:02:48.000000000 -0400
@@ -485,9 +485,7 @@
if (pte_present(pte)) {
if (!write ||
(pte_write(pte) && pte_dirty(pte))) {
- pfn = pte_pfn(pte);
- if (pfn_valid(pfn))
- return pfn_to_page(pfn);
+ return pte_page(pte);
}
}

The equivalent backout patch for 2.6.0-test6 is:

--- linux-2.6.0-test6/mm/memory.c.orig 2003-09-27 20:50:19.000000000 -0400
+++ linux-2.6.0-test6/mm/memory.c 2003-10-03 16:17:25.000000000 -0400
@@ -618,7 +618,6 @@
pgd_t *pgd;
pmd_t *pmd;
pte_t *ptep, pte;
- unsigned long pfn;
struct vm_area_struct *vma;

vma = hugepage_vma(mm, address);
@@ -645,13 +644,11 @@
pte_unmap(ptep);
if (pte_present(pte)) {
if (!write || (pte_write(pte) && pte_dirty(pte))) {
- pfn = pte_pfn(pte);
- if (pfn_valid(pfn)) {
- struct page *page = pfn_to_page(pfn);
-
+ struct page *page = pte_page(pte);
+ if (pfn_valid(pte_pfn(pte))) {
mark_page_accessed(page);
- return page;
}
+ return page;
}
}

I do not believe that the above constitutes a correct fix. The
problem is that follow_pages() is fundamentally not able to handle a
mapping which does not have a 'struct page' backing it up, and a
mapping to IO memory by definition has no 'struct page' structure to
back it up.

IMO the original 2.5.14 patch which 'broke' the kernel is correct.
It made follow_page() return zero for a 'struct page' address, which
is entirely reasonable when a mapping has no 'struct page'. Prior to
2.5.14 follow_page() would return some meaningless nonzero value
which, nevertheless, allowed its caller to continue and establish an
apparently correct mapping.

Joe

PS: my test program. Nice and short.

/* test mlockall() in conjunction with mmap(2) of an IO device.
* success -- program exits.
* failure -- program hangs up, unkillable.
*/
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/mman.h>

int main(int argc, char **argv)
{
int stat, fd;
unsigned char *p;
unsigned long off;

if (argc > 1) {
off = strtoul(argv[1], NULL, 16);
} else {
fprintf(stderr, "arg is video card addr in /dev/mem");
exit(1);
}

stat = mlockall(MCL_CURRENT | MCL_FUTURE);
if (stat == -1) {
perror("mlockall");
exit(1);
}

fd = open ("/dev/mem", O_RDWR, 0666);
if (fd == -1) {
perror("open");
exit(1);
}

p = (unsigned char *)mmap(NULL, 0x4096,
PROT_READ | PROT_WRITE, MAP_SHARED, fd, (off_t)off);
if (!p) {
perror("mmap");
exit(1);
}
printf("[%08lx] %02x%02x%02x%02x\n",
off, p[0], p[1], p[2], p[3]);
return 0;
}
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/