Re: 2.6.24-rc8-mm1: powerpc oopses

From: Matt Mackall
Date: Thu Jan 17 2008 - 19:13:25 EST



On Thu, 2008-01-17 at 16:05 -0800, Andrew Morton wrote:
> On Thu, 17 Jan 2008 17:39:54 -0600
> Matt Mackall <mpm@xxxxxxxxxxx> wrote:
>
> >
> > On Thu, 2008-01-17 at 14:51 -0800, Andrew Morton wrote:
> > > > proc_loop: /proc/3731/task/3731/pagemap
> > > > kernel: BUG: sleeping function called from invalid context at fs/proc/task_mmu.c:554
> > > > kernel: in_atomic():1, irqs_disabled():0
> > > > kernel: Call Trace:
> > > > kernel: [cf1cddf0] [c000840c] show_stack+0x3c/0x194 (unreliable)
> > > > kernel: [cf1cde20] [c002b2ec] __might_sleep+0xf4/0x108
> > > > kernel: [cf1cde30] [c00d2d54] add_to_pagemap+0x40/0x11c
> > > > kernel: [cf1cde50] [c00d2f44] pagemap_pte_range+0xa8/0x10c
> > > > kernel: [cf1cde70] [c0081b30] walk_page_range+0x148/0x23c
> > > > kernel: [cf1cdeb0] [c00d3104] pagemap_read+0x15c/0x244
> > > > kernel: [cf1cdef0] [c0092144] vfs_read+0xc4/0x16c
> > > > kernel: [cf1cdf10] [c009261c] sys_read+0x4c/0x90
> > > > kernel: [cf1cdf40] [c001328c] ret_from_syscall+0x0/0x40
> > >
> > > It's not really an oops - it's a warning. add_to_pagemap() is doing a
> > > put_user() inside pagemap_pte_range->pte_offset_map->kmap_atomic.
> > >
> > > A known bug, I'm afraid.
> > >
> > > How to fix?
> > >
> > > - double-buffer the data to be copied to userspace or
> > >
> > > - take a local copy of the pte page then work on that instead or
> > >
> > > - play copy_to_user_inatomic() tricks.
> >
> > Hmm, this fell off my radar. How about something like this as a minimal
> > fix (untested as -mm is a complete doorstop for me at the moment)?
> >
> > diff -r 5595adaea70f fs/proc/task_mmu.c
> > --- a/fs/proc/task_mmu.c Thu Jan 17 13:26:54 2008 -0600
> > +++ b/fs/proc/task_mmu.c Thu Jan 17 17:29:21 2008 -0600
> > @@ -582,20 +583,26 @@
> > {
> > struct pagemapread *pm = private;
> > pte_t *pte;
> > - int err = 0;
> > + int offset = 0, err = 0;
> >
> > pte = pte_offset_map(pmd, addr);
> > - for (; addr != end; pte++, addr += PAGE_SIZE) {
> > + for (; addr != end; offset++, addr += PAGE_SIZE) {
> > u64 pfn = PM_NOT_PRESENT;
> > - if (is_swap_pte(*pte))
> > - pfn = swap_pte_to_pagemap_entry(*pte);
> > - else if (pte_present(*pte))
> > - pfn = pte_pfn(*pte);
> > + if (is_swap_pte(pte[offset]))
> > + pfn = swap_pte_to_pagemap_entry(pte[offset]);
> > + else if (pte_present(pte[offset]))
> > + pfn = pte_pfn(pte[offset]);
> > +#ifdef CONFIG_HIGHPTE
> > + pte_unmap(pte);
> > err = add_to_pagemap(addr, pfn, pm);
> > + pte = pte_offset_map(pmd, addr);
> > +#else
> > + err = add_to_pagemap(addr, pfn, pm);
> > +#endif
> > if (err)
> > return err;
> > }
> > - pte_unmap(pte - 1);
> > + pte_unmap(pte);
> >
> > cond_resched();
> >
>
> Good point, it really can be taht simple.
>
> Do we need the ifdef? pte_offset_map/pte_unmap should be super-cheap on
> !CONFIG_HIGHPTE builds.

In that case, pte_unmap is free, pte_offset_map is just a bit of math.
So yeah, we can simplify this. How about:

diff -r 5595adaea70f fs/proc/task_mmu.c
--- a/fs/proc/task_mmu.c Thu Jan 17 13:26:54 2008 -0600
+++ b/fs/proc/task_mmu.c Thu Jan 17 18:11:13 2008 -0600
@@ -582,20 +583,20 @@
{
struct pagemapread *pm = private;
pte_t *pte;
- int err = 0;
+ int offset = 0, err = 0;

- pte = pte_offset_map(pmd, addr);
- for (; addr != end; pte++, addr += PAGE_SIZE) {
+ for (; addr != end; offset++, addr += PAGE_SIZE) {
u64 pfn = PM_NOT_PRESENT;
- if (is_swap_pte(*pte))
- pfn = swap_pte_to_pagemap_entry(*pte);
- else if (pte_present(*pte))
- pfn = pte_pfn(*pte);
+ pte = pte_offset_map(pmd, addr);
+ if (is_swap_pte(pte[offset]))
+ pfn = swap_pte_to_pagemap_entry(pte[offset]);
+ else if (pte_present(pte[offset]))
+ pfn = pte_pfn(pte[offset]);
+ pte_unmap(pte);
err = add_to_pagemap(addr, pfn, pm);
if (err)
return err;
}
- pte_unmap(pte - 1);

cond_resched();

--
Mathematics is the supreme nostalgia of our time.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/