Re: Kernel Oops on enabling CONFIG_LOCK_STAT

From: Shreshtha
Date: Tue Jul 26 2011 - 09:19:32 EST


Hi Tejun Heo,

The problem persists even after applying the patch.
I then applied the debug patch (plus printks for rs, re, etc.); the
related output is below.

The attached log is from a linux-2.6.35.7 kernel.
I tried a newer kernel, 2.6.39.3, and the problem was *not* seen there.
It seems that changes in mm/* solved the issue,
so I will take it from here.

Thanks for the debug patch.

------------8<-------------------------------8<-------------
BOARD - 1
-----------

WARNING: at mm/percpu-vm.c:360 pcpu_alloc+0x360/0x9e4()
Modules linked in:
[<c0ce1148>] (unwind_backtrace+0x0/0xf0) from [<c0cf67d8>] (warn_slowpath_common+0x4c/0x64)
[<c0cf67d8>] (warn_slowpath_common+0x4c/0x64) from [<c0cf6808>] (warn_slowpath_null+0x18/0x1c)
[<c0cf6808>] (warn_slowpath_null+0x18/0x1c) from [<c0d56708>] (pcpu_alloc+0x360/0x9e4)
[<c0d56708>] (pcpu_alloc+0x360/0x9e4) from [<c0d53110>] (kmem_cache_open+0x17c/0x1ec)
[<c0d53110>] (kmem_cache_open+0x17c/0x1ec) from [<c0d54e48>] (kmem_cache_create+0x1e0/0x2b4)
[<c0d54e48>] (kmem_cache_create+0x1e0/0x2b4) from [<c0017b98>] (idr_init_cache+0x20/0x34)
[<c0017b98>] (idr_init_cache+0x20/0x34) from [<c0008d24>] (start_kernel+0x244/0x30c)
[<c0008d24>] (start_kernel+0x244/0x30c) from [<00008080>] (0x8080)
---[ end trace 1b75b31a2719ed1d ]---
XXX pcpu_populate_chunk: cpu0: rs: 0x162, re: 0x166, page start: 0x162 end: 0x163
XXX pcpu_populate_chunk: cpu1: rs: 0x162, re: 0x166, page start: 0x162 end: 0x163
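For context on the rs/re values printed above: in the err_unmap path of the patch quoted below, pcpu_for_each_unpop_region() hands pcpu_unmap_pages() one run [rs, re) of unpopulated pages at a time inside a [page_start, end) window. The following is a hypothetical user-space model, not the kernel code: next_unpop_region() and find_next_eq() are illustrative names, and a bool array stands in for the chunk's populated bitmap. It only shows the property that a walk clamped to the window keeps, namely page_start <= rs < re <= end. Whether the "end" printed by the debug patch is that same bound is not something the log above establishes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: return the first index in [from, end) whose
 * value equals val, or end if there is none. Never returns more than
 * end (assuming the caller passes from <= end). */
static int find_next_eq(const bool *map, int from, int end, bool val)
{
    while (from < end && map[from] != val)
        from++;
    return from;
}

/*
 * Hypothetical model of an "unpopulated region" walk: starting the
 * search at *rs, find the next run [*rs, *re) of unpopulated entries
 * before end. Returns false when no run is left. Both *rs and *re are
 * clamped to end, so every reported run satisfies *rs < *re <= end.
 */
static bool next_unpop_region(const bool *populated, int *rs, int *re,
                              int end)
{
    *rs = find_next_eq(populated, *rs, end, false);
    if (*rs >= end)
        return false;
    *re = find_next_eq(populated, *rs + 1, end, true);
    assert(*rs < *re && *re <= end); /* the range invariant */
    return true;
}
```

A caller would walk every run with a loop such as
for (rs = page_start; next_unpop_region(populated, &rs, &re, end); rs = re) { ... }.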


BOARD (newer) - 2
-----------
config_lock_stat-board2.log attached

-----------------8<---------------------8<----------------------------

Regards,
Shreshtha

On Fri, Jul 22, 2011 at 1:04 PM, Tejun Heo <tj@xxxxxxxxxx> wrote:
> Hello,
>
> On Fri, Jul 22, 2011 at 08:10:54AM +0200, Tejun Heo wrote:
>> Hrmm... so it's pcpu_populate_chunk() failure path.  Can you please
>> attach full kernel log?  Attaching full kernel log and including a bit
>> of hardware details is generally a good idea when reporting a bug.
>>
>> It's most likely there's a bug in the code which rolls back from
>> partial allocation after encountering alloc failure in the middle.
>> I'll take a deeper look there and report what I find.
>
> The code looks correct and seems to behave correctly under induced
> error.  Can you please apply the following patch and see whether the
> problem goes away?
>
> diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c
> index ea53496..53eae44 100644
> --- a/mm/percpu-vm.c
> +++ b/mm/percpu-vm.c
> @@ -347,6 +347,7 @@ clear:
>        return 0;
>
>  err_unmap:
> +       pcpu_post_map_flush(chunk, page_start, unmap_end);
>        pcpu_pre_unmap_flush(chunk, page_start, unmap_end);
>        pcpu_for_each_unpop_region(chunk, rs, re, page_start, unmap_end)
>                pcpu_unmap_pages(chunk, pages, populated, rs, re);
>
> If not, can you please apply the attached debug patch, trigger the
> problem and post the log?
>
> Thank you.
>
> --
> tejun
>
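Tejun's first guess was a bug in the path that rolls back a partial allocation after a failure in the middle. As background only, here is that unwinding pattern in plain user-space C with a goto label, in the same spirit as the err_unmap label in the quoted patch. alloc_all() and its fail_at parameter are hypothetical; the real pcpu_populate_chunk() additionally unmaps pages and performs the flush calls seen in the diff, which this sketch does not model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical sketch of rolling back a partial allocation:
 * allocate n buffers; if buffer i cannot be allocated, free the
 * buffers [0, i) in reverse order and report failure, so the caller
 * never sees a half-populated array.
 * fail_at simulates an allocation failure at that index (-1: never).
 */
static bool alloc_all(void **bufs, int n, size_t size, int fail_at)
{
    int i;

    assert(bufs != NULL && n >= 0);

    for (i = 0; i < n; i++) {
        bufs[i] = (i == fail_at) ? NULL : malloc(size);
        if (!bufs[i])
            goto err_free;
    }
    return true;

err_free:
    /* Undo only what succeeded: indices i-1 down to 0. */
    while (--i >= 0) {
        free(bufs[i]);
        bufs[i] = NULL;
    }
    return false;
}
```

Freeing in reverse order of acquisition, and touching only the entries that actually succeeded, is the property such a rollback path has to get right.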

Attachment: config_lock_stat-board2.log
Attachment: config_2-6-35-7_lock_stat
Attachment: linux-2-6-35-7_lock_stat-debug.log