Re: [PATCH] VM, x86, PAT: Change implementation of is_linear_pfn_mapping

From: Pallipadi, Venkatesh
Date: Wed Mar 11 2009 - 20:33:08 EST


On Wed, 2009-03-11 at 15:09 -0700, Frans Pop wrote:
> Pallipadi, Venkatesh wrote:
> > Use of vma->vm_pgoff to identify the pfnmaps that are fully
> > mapped at mmap time is broken. vm_pgoff is set by generic mmap
> > code even for cases where drivers are setting up the mappings
> > at the fault time.
> >
> > The problem was originally reported here.
> > http://marc.info/?l=linux-kernel&m=123383810628583&w=2
> >
> > Change is_linear_pfn_mapping logic to overload VM_NONLINEAR
> > flag along with VM_PFNMAP to mean full PFNMAP setup at mmap
> > time.
> >
> > Acked-by: Thomas Hellstrom <thellstrom@xxxxxxxxxx>
> > Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@xxxxxxxxx>
> > Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
>
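
For readers following along, the check described above can be sketched
roughly as below. This is a standalone userspace illustration and not the
patch itself: struct vma_sketch and is_linear_pfn_mapping_sketch() are
hypothetical stand-ins, and the flag values are placeholders chosen for
the example and should not be read as the kernel's definitions.

```c
/*
 * Sketch only: placeholder flag values, assumed for illustration.
 */
#define VM_PFNMAP    0x00000400UL
#define VM_NONLINEAR 0x00800000UL

/* Hypothetical stand-in for the kernel's struct vm_area_struct. */
struct vma_sketch {
    unsigned long vm_flags;
};

/*
 * Treat a vma as "fully mapped at mmap time" only when both flags are
 * set together. Either flag on its own keeps its usual meaning.
 */
static int is_linear_pfn_mapping_sketch(const struct vma_sketch *vma)
{
    const unsigned long both = VM_PFNMAP | VM_NONLINEAR;

    return (vma->vm_flags & both) == both;
}
```

The point of testing the combination, rather than vm_pgoff, is that a
driver that sets up its mappings at fault time would carry VM_PFNMAP
alone and so would no longer be mistaken for a linear mapping.
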
> I've applied this patch on top of v2.6.29-rc7-143-g99adcd9 [1] and since
> then I've had my system, or rather X/KDE, hang several times. The last
> time the problem seems to have been KDE's kicker. I was running a kernel
> compile in a konsole window and that just continued and finished, but the
> keyboard was completely dead.
> I could still ssh in from another box. 'ps' would show the top processes,
> but hang as well at some point (in the middle of listing KDE processes).
>
> The hang was with pat enabled. I've now booted with nopat.

Frans,

Thanks for testing this. I can't seem to reproduce this on any of my
test systems with this patch on either tip or latest git. Do you see the
hang on every boot or only once in a while? Are things stable with nopat?

> The log shows (full log attached):
> kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> kernel: IP: [<ffffffff80322504>] prio_tree_remove+0x9c/0xcc
> kernel: PGD 7cab1067 PUD 7d644067 PMD 0
> kernel: Oops: 0000 [#1] SMP
> kernel: last sysfs file: /sys/class/power_supply/C23D/charge_full
> kernel: CPU 1
> kernel: Pid: 5415, comm: kicker Not tainted 2.6.29-rc7 #4 HP Compaq 2510p Notebook PC
> kernel: RIP: 0010:[<ffffffff80322504>] [<ffffffff80322504>] prio_tree_remove+0x9c/0xcc
> [...]
> kernel: Call Trace:
> kernel: [<ffffffff803225df>] prio_tree_insert+0xab/0x22a
> kernel: [<ffffffff8027e90d>] vma_prio_tree_insert+0x23/0xc2
> kernel: [<ffffffff802864af>] __vma_link_file+0x70/0x72
> kernel: [<ffffffff80286c15>] vma_link+0x7d/0xab
> kernel: [<ffffffff802881ea>] mmap_region+0x313/0x479
> kernel: [<ffffffff80288646>] do_mmap_pgoff+0x2f6/0x35c
> kernel: [<ffffffff802ea99a>] do_shmat+0x28a/0x36c
> kernel: [<ffffffff802eaa8d>] sys_shmat+0x11/0x1c
> kernel: [<ffffffff8020c25b>] system_call_fastpath+0x16/0x1b
>
> From the symptoms I strongly suspect this patch to be the culprit.
>
> [1] Together with some other patches (mainly Rafael's latest patchset
> for "Rework disabling of interrupts during suspend-resume"), but I doubt
> any of those are related to this issue.
>

Nothing obvious strikes me about this patch and the above oops. Can you
please try this patch alone on latest git and check whether you still
see the failures?

Thanks,
Venki
