Re: 3.0rc2 oops in mem_cgroup_from_task

From: KAMEZAWA Hiroyuki
Date: Thu Jun 09 2011 - 22:40:21 EST


On Thu, 9 Jun 2011 18:30:49 -0700 (PDT)
Hugh Dickins <hughd@xxxxxxxxxx> wrote:

> On Fri, 10 Jun 2011, KAMEZAWA Hiroyuki wrote:
> > On Thu, 9 Jun 2011 16:42:09 -0700
> > Ying Han <yinghan@xxxxxxxxxx> wrote:
> >
> > > ++cc Hugh who might have seen similar crashes on his machine.
>
> Yes, I was testing my tmpfs changes, and saw it on i386 yesterday
> morning. Same trace as Dave's (including khugepaged, which may or
> may not be relevant), aside from the i386/x86_64 differences.
>
> BUG: unable to handle kernel paging request at 6b6b6b87
>
> I needed to move forward with other work on that laptop, so just
> jotted down the details to come back to later. It came after one
> hour of building swapping load in memcg, I've not tried again since.
>
> >
> > Thank you for forwarding. Hmm. It seems the panic happens at khugepaged's
> > page collapse_huge_page().
>
> Yes, the inlining in my kernel was different,
> so collapse_huge_page() showed up in my backtrace.
>
> >
> > ==
> > count_vm_event(THP_COLLAPSE_ALLOC);
> > if (unlikely(mem_cgroup_newpage_charge(new_page, mm, GFP_KERNEL))) {
> > ==
> > It passes target mm to memcg and memcg gets a cgroup by
> > ==
> > mem = mem_cgroup_from_task(rcu_dereference(mm->owner));
> > ==
> > Panic here means....mm->owner's task_subsys_state contains bad pointer ?
>
> 781cc621 <mem_cgroup_from_task>:
> 781cc621: 55 push %ebp
> 781cc622: 31 c0 xor %eax,%eax
> 781cc624: 89 e5 mov %esp,%ebp
> 781cc626: 8b 55 08 mov 0x8(%ebp),%edx
> 781cc629: 85 d2 test %edx,%edx
> 781cc62b: 74 09 je 781cc636 <mem_cgroup_from_task+0x15>
> 781cc62d: 8b 82 fc 08 00 00 mov 0x8fc(%edx),%eax
> 781cc633: 8b 40 1c mov 0x1c(%eax),%eax <==========
> 781cc636: c9 leave
> 781cc637: c3 ret
>

So the load of task->cgroups->subsys[?] is what faults on address 6b6b6b87...

That means task->cgroups or task->cgroups->subsys contains a bad pointer.
In the khugepaged case, khugepaged grabs an mm_struct and memcg then accesses
(mm->owner)->cgroups->subsys.
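
For what it's worth, the faulting address has a recognizable shape. 0x6b is
the byte the slab allocator writes over freed objects when poisoning is
enabled (POISON_FREE), and 0x6b6b6b6b + 0x1c = 0x6b6b6b87. That would fit
task->cgroups having been read out of an already-freed task_struct, assuming
Hugh's kernel had slab poisoning on, which is not stated in this thread.
A minimal userspace sketch of that arithmetic follows; the helper names are
made up for illustration and are not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Byte written over freed slab objects when poisoning is enabled
 * (POISON_FREE in include/linux/poison.h). */
#define POISON_FREE_BYTE 0x6bu

/* Offset taken from the disassembly: mov 0x1c(%eax),%eax */
#define SUBSYS_OFFSET 0x1cu

/* Compile-time check of the low byte of the reported address (...87). */
static_assert(POISON_FREE_BYTE + SUBSYS_OFFSET == 0x87,
              "poison byte plus offset gives the low byte of the fault");

/* Hypothetical helper: a 32-bit word made of four copies of one byte. */
static uint32_t repeat_byte(uint8_t b)
{
    return (uint32_t)b * 0x01010101u;
}

/* Hypothetical helper: address touched when base is dereferenced at
 * offset, with 32-bit wraparound as on i386. */
static uint32_t fault_address(uint32_t base, uint32_t offset)
{
    return base + offset;
}

/* Hypothetical helper: true if fault_addr is what you would get by
 * loading at offset through a pointer filled with the poison byte. */
static bool looks_like_freed_poison(uint32_t fault_addr, uint32_t offset)
{
    return fault_addr - offset == repeat_byte(POISON_FREE_BYTE);
}
```

For example, looks_like_freed_poison(0x6b6b6b87, 0x1c) is true, while the
same address checked against the 0x8fc offset is not. So if poisoning is the
explanation, the poisoned value is task->cgroups, not the task pointer itself.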

So, from memcg's point of view, we need to doubt whether mm->owner is still
valid for this kind of task.

Thank you for the input.

-Kame






