Re: [PATCH 1/5] freepgt: free_pgtables use vma list

From: David S. Miller
Date: Mon Mar 21 2005 - 18:28:24 EST


On Mon, 21 Mar 2005 14:31:36 -0800
"Luck, Tony" <tony.luck@xxxxxxxxx> wrote:

> Builds clean and boots on ia64.
>
> I haven't tried any hugetlb operations on it though.

It works on ia64 because it doesn't actually do anything
in flush_tlb_pgtables(), I bet.

Hugh, I know the exact trigger case, it's unmapping a VMA
right before the stack segment. So the free_pgtables() call
happens with this state:

prev->vm_end == 0x70186000
next->vm_start == 0xefab8000
vma->vm_start == 0x70186000
vma->vm_end == 0x70188000

(so we're doing munmap(0x70186000, PAGE_SIZE), the sparc64
stack segment for 32-bit tasks grows down from 0xf0000000,
the bottom of it is at 0xefab8000 at this point in time)

So the free_pgtables() call will be with:

floor == 0x70186000
ceiling == 0xefab8000

This should be fairly simple, so let's analyze exactly what
happens:

1) vma == the munmap() call's VMA
next == stack segment VMA, which sits right after "vma"
addr == 0x70186000 (base of munmap() area)

2) VMA optimization loop runs:
next->vm_start is 0xefab8000
vma->vm_end is 0x70188000
on sparc64 PMD_SIZE is 1UL << 23 or 0x800000
therefore vma->vm_end + (2 * PMD_SIZE) is 0x71188000
this is much less than 0xefab8000 so the loop terminates
immediately

Therefore, next is unchanged.

3) free_pgd_range() is invoked with:
addr == 0x70186000
end == 0x70188000
floor == 0x70186000
ceiling == 0xefab8000

4) We mask addr with PMD_MASK (which is 0xffffffffff800000)
This sets addr to 0x70000000, which makes it less
than floor, therefore addr has PMD_SIZE added to it.
Now, addr is 0x70800000, this is the source of the
problems as this value determines the "start" argument
passed to flush_tlb_pgtables(). Note how it is larger
than "end".

5) We also mask ceiling with PMD_MASK.
This sets ceiling to 0xef800000.
Now addr is less than or equal to ceiling - 1 so
we continue.

6) start is set to addr, which as stated is 0x70800000,
the free_pud_range() loop is executed

7) start is 0x70800000 and end is 0x70188000
and here we have the problem that start > end,
flush_tlb_pgtables() is called with the arguments
like this and we trigger the aforementioned BUG().

This adjustment of addr relative to floor is very
strange, it can advance "addr" (and thus "start")
past the end of the VMA we are unmapping.

In fact, it is miraculious that this free_pud_range()
calling loop terminates properly! Actually, it is
no mystery, since the next PGD address is the same
for both the original and adjusted value of "addr".
So the loop terminates after the first iteration.

Anyways, there's the full analysis, what do you make
of this Hugh? :-)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/