Re: [next] unix stream crashes

From: Sedat Dilek
Date: Sat Sep 03 2011 - 01:54:52 EST


On Sat, Sep 3, 2011 at 7:35 AM, <Valdis.Kletnieks@xxxxxx> wrote:
> On Fri, 02 Sep 2011 16:55:03 PDT, Tim Chen said:
>
> >> I'd like to isolate the problem to either the send path or receive
>> path. My suspicion is the error handling portion of the send path is not
>> quite right but I haven't yet found any issues after reviewing the
>> patch.
>
> Took a while, because it took a few tries to get netconsole working,
> and then I was seeing odd results, but here we go:
>
> next-20110831 - crashes 100% consistent.
> next-20110831 + revert 0856a30409 - OK.
> revert + scm_recv.patch - OK.
> revert + scm_send.patch - crashes 100% consistent.
>

YES, I can confirm this with next-20110826.

> Now the odd part - although I was seeing crashes 100% of the time, I saw a
> number of different tracebacks (but I never actually saw the same traceback
> that Jiri had). Also, the system died at different points - most of the time it
> would live long enough for GDM to prompt for a userid/password and then die,
> but sometimes it didn't get as far as the GDM screen. Hopefully the variety of
> crashes will tell you something useful.
>
> I'll be able to test patches for go/nogo over the weekend, but probably won't
> have a second machine to catch netconsole until I'm back in the office Monday.
>
> Example 1:
>
> [  142.316258] Kernel panic - not syncing: CRED: put_cred_rcu() sees ffff88010d1ff300 with usage -41
> [  142.316260]
> [  142.316275] Pid: 2264, comm: gdm-simple-slav Tainted: G        W   3.1.0-rc4-next-20110831-dirty #17
> [  142.316279] Call Trace:
> [  142.316283]  <IRQ>  [<ffffffff81577a6c>] panic+0x96/0x1a2
> [  142.316300]  [<ffffffff8105cb54>] put_cred_rcu+0x32/0x91
> [  142.316306]  [<ffffffff8157a44f>] rcu_do_batch+0xcb/0x1e4
> [  142.316313]  [<ffffffff81092967>] invoke_rcu_callbacks+0x6c/0xc7
> [  142.316319]  [<ffffffff810932f8>] __rcu_process_callbacks+0x118/0x124
> [  142.316325]  [<ffffffff810934f0>] rcu_process_callbacks+0x64/0x72
> [  142.316331]  [<ffffffff8103f8c4>] __do_softirq+0x110/0x278
> [  142.316338]  [<ffffffff815a23ac>] call_softirq+0x1c/0x30
> [  142.316342]  <EOI>  [<ffffffff81003647>] do_softirq+0x44/0xf1
> [  142.316352]  [<ffffffff8103f485>] _local_bh_enable_ip+0x12a/0x178
> [  142.316358]  [<ffffffff8103f4dc>] local_bh_enable_ip+0x9/0xb
> [  142.316364]  [<ffffffff8159a2f3>] _raw_write_unlock_bh+0x36/0x3a
> [  142.316372]  [<ffffffff814c1ac3>] unix_release_sock+0x86/0x1ff
> [  142.316378]  [<ffffffff8105b548>] ? up_read+0x1b/0x32
> [  142.316383]  [<ffffffff814c1c5d>] unix_release+0x21/0x23
> [  142.316390]  [<ffffffff81423d02>] sock_release+0x1a/0x6f
> [  142.316395]  [<ffffffff81424a30>] sock_close+0x22/0x26
> [  142.316401]  [<ffffffff810fcacb>] __fput+0x140/0x1fe
> [  142.316407]  [<ffffffff810f97cb>] ? sys_close+0xe6/0x158
> [  142.316412]  [<ffffffff810fcb9e>] fput+0x15/0x17
> [  142.316417]  [<ffffffff810f8ef2>] filp_close+0x87/0x93
> [  142.316422]  [<ffffffff810f97d6>] sys_close+0xf1/0x158
> [  142.316429]  [<ffffffff815a0ffb>] system_call_fastpath+0x16/0x1b
>

I saw similar call traces with put_cred_rcu(), in addition to ones
with kmem_cache_alloc_trace().
My post-it says:
Kernel panic - not syncing: CRED: put_cred_rcu sees f67ac0c0 with usage -43

BTW, systemd (which uses dbus/sockets) is more sensitive to this
than Debian's standard sysvinit.

- Sedat -