Re: Memory leak in 2.4.27 kernel, using mmap raw packet sockets

From: Andrea Arcangeli
Date: Sat Nov 27 2004 - 02:19:11 EST


On Thu, Nov 25, 2004 at 03:12:42PM -0200, Marcelo Tosatti wrote:
> get_user_pages() bails out if ! VM_IO.
>
> Is that what you mean with "VM_IO enforcement" ?

yes. It bails out if VM_IO is set (not ! clear ;)

> I thought about the BUG() to catch potential offenders, but I was
> not sure if it was possible for a PG_reserved page to be part of VMA's
> which was being get_user_pages'd.

Exactly, it's much safer to go with the real fix of fixing it in
get_user_pages. If something we should put a bugcheck there.

> Now you tell me it is possible, and thats only the ZERO page. Fine.

Yes, and the ZERO_PAGE is actually the _only_ reserved page we must
allow to go through. Every other reserved page must be discarded (or
kernel-crash with BUG_ON if Alan feels confortable with the VM_IO
enforcement).

> This is what you suggests plus some extra hopefully useful debugging
>
>
> --- memory.c.orig 2004-11-25 14:51:00.074508952 -0200
> +++ memory.c 2004-11-25 15:08:38.026675776 -0200
> @@ -454,8 +454,9 @@
> int get_user_pages(struct task_struct *tsk, struct mm_struct *mm, unsigned long start,
> int len, int write, int force, struct page **pages, struct vm_area_struct **vmas)
> {
> - int i;
> + int i, s;
> unsigned int flags;
> + struct vm_area_struct *savevma = NULL;
>
> /*
> * Require read or write permissions.
> @@ -463,7 +464,7 @@
> */
> flags = write ? (VM_WRITE | VM_MAYWRITE) : (VM_READ | VM_MAYREAD);
> flags &= force ? (VM_MAYREAD | VM_MAYWRITE) : (VM_READ | VM_WRITE);
> - i = 0;
> + i = s = 0;
>
> do {
> struct vm_area_struct * vma;
> @@ -499,9 +500,13 @@
> /* FIXME: call the correct function,
> * depending on the type of the found page
> */
> - if (!pages[i] || PageReserved(pages[i]))
> - goto bad_page;
> - page_cache_get(pages[i]);
> + if (!pages[i] || PageReserved(pages[i])) {
> + if (pages[i] != ZERO_PAGE(start)) {
> + savevma = vma;
> + goto bad_page;
> + }
> + } else
> + page_cache_get(pages[i]);
> }
> if (vmas)
> vmas[i] = vma;
> @@ -520,9 +525,15 @@
> */
> bad_page:
> spin_unlock(&mm->page_table_lock);
> + s = i;
> while (i--)
> page_cache_release(pages[i]);
> - i = -EFAULT;
> + /* catch bad uses of PG_reserved on !VM_IO vma's */
> + printk(KERN_ERR "get_user_pages PG_reserved page on"
> + "vma:%p flags:%lx page:%d\n", savevma,
> + savevma->flags, s);
> + BUG();
> + i = -EFAULT;
> goto out;
> }

Yes, however I wouldn't turn on the debugging code just in case some
driver forgets to set VM_IO and it doesn't use remap_page_range. There's
nothing fundamentally fatal in having a reserved page in a non VM_IO
vma (I mean, after fixing the above bit ;).
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/