Re: Memory leak in 2.4.27 kernel, using mmap raw packet sockets

From: Marcelo Tosatti
Date: Fri Nov 26 2004 - 21:06:55 EST


On Fri, Nov 26, 2004 at 12:13:14AM +0100, Andrea Arcangeli wrote:
> On Thu, Nov 25, 2004 at 03:12:42PM -0200, Marcelo Tosatti wrote:
> > get_user_pages() bails out if ! VM_IO.
> >
> > Is that what you mean with "VM_IO enforcement" ?
>
> yes. It bails out if VM_IO is set (not ! clear ;)
>
> > I thought about the BUG() to catch potential offenders, but I was
> > not sure if it was possible for a PG_reserved page to be part of VMA's
> > which was being get_user_pages'd.
>
> Exactly, it's much safer to go with the real fix of fixing it in
> get_user_pages. If something we should put a bugcheck there.
>
> > Now you tell me it is possible, and thats only the ZERO page. Fine.
>
> Yes, and the ZERO_PAGE is actually the _only_ reserved page we must
> allow to go through. Every other reserved page must be discarded (or
> kernel-crash with BUG_ON if Alan feels confortable with the VM_IO
> enforcement).

Oh the VM_IO enforcement has been there for ages.

> > This is what you suggests plus some extra hopefully useful debugging
> >
> >
> > --- memory.c.orig 2004-11-25 14:51:00.074508952 -0200
> > +++ memory.c 2004-11-25 15:08:38.026675776 -0200
> > @@ -454,8 +454,9 @@
> > int get_user_pages(struct task_struct *tsk, struct mm_struct *mm, unsigned long start,
> > int len, int write, int force, struct page **pages, struct vm_area_struct **vmas)
> > {
> > - int i;
> > + int i, s;
> > unsigned int flags;
> > + struct vm_area_struct *savevma = NULL;
> >
> > /*
> > * Require read or write permissions.
> > @@ -463,7 +464,7 @@
> > */
> > flags = write ? (VM_WRITE | VM_MAYWRITE) : (VM_READ | VM_MAYREAD);
> > flags &= force ? (VM_MAYREAD | VM_MAYWRITE) : (VM_READ | VM_WRITE);
> > - i = 0;
> > + i = s = 0;
> >
> > do {
> > struct vm_area_struct * vma;
> > @@ -499,9 +500,13 @@
> > /* FIXME: call the correct function,
> > * depending on the type of the found page
> > */
> > - if (!pages[i] || PageReserved(pages[i]))
> > - goto bad_page;
> > - page_cache_get(pages[i]);
> > + if (!pages[i] || PageReserved(pages[i])) {
> > + if (pages[i] != ZERO_PAGE(start)) {
> > + savevma = vma;
> > + goto bad_page;
> > + }
> > + } else
> > + page_cache_get(pages[i]);
> > }
> > if (vmas)
> > vmas[i] = vma;
> > @@ -520,9 +525,15 @@
> > */
> > bad_page:
> > spin_unlock(&mm->page_table_lock);
> > + s = i;
> > while (i--)
> > page_cache_release(pages[i]);
> > - i = -EFAULT;
> > + /* catch bad uses of PG_reserved on !VM_IO vma's */
> > + printk(KERN_ERR "get_user_pages PG_reserved page on"
> > + "vma:%p flags:%lx page:%d\n", savevma,
> > + savevma->flags, s);
> > + BUG();
> > + i = -EFAULT;
> > goto out;
> > }
>
> Yes, however I wouldn't turn on the debugging code just in case some
> driver forgets to set VM_IO and it doesn't use remap_page_range. There's
> nothing fundamentally fatal in having a reserved page in a non VM_IO
> vma (I mean, after fixing the above bit ;).

Sure, I'll comment the BUG() off during 2.4.29-rc.

How does that sound?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/