Re: Mysterious CFQ crash and RCU

From: Vivek Goyal
Date: Thu Aug 04 2011 - 11:05:24 EST


On Sat, Jun 04, 2011 at 04:06:33PM -0700, Paul E. McKenney wrote:
> On Sun, Jun 05, 2011 at 12:48:05AM +0200, Paul Bolle wrote:
> > On Sat, 2011-06-04 at 09:03 -0700, Paul E. McKenney wrote:
> > > More like "based on these diagnostics, I see no evidence of the RCU
> > > implementation misbehaving." Which is of course different than "I can
> > > prove that the RCU implementation is not misbehaving". That said, the
> > > fact that you are running on a single CPU makes it hard for me to see
> > > any latitude for RCU-implementation misbehavior.
> > >
> > > Clearly something is wrong somewhere.
> >
> > Yes.
> >
> > > Given the fact that on a single-CPU
> > > system, synchronize_rcu() is a no-op, and given that you weren't able
> > > to reproduce with CONFIG_TREE_PREEMPT_RCU=y, my guess is that there is
> > > a synchronize_rcu() that occasionally (illegally) gets executed within
> > > an RCU read-side critical section.
> >
> > I think I finally found it!
>
> Very cool!!!
>
> > The culprit seems to be io_context.ioc_data (not the most clear of
> > names!). It seems to be a single entry "last-hit cache" of an hlist
> > called cic_list. (There are three, subtly different, cic_lists in the
> > CFQ code!) It is not entirely clear, but that last-hit cache can get out
> > of sync with the hlist it is supposed to cache. My guess it that every
> > now and then a member of the hlist gets deleted while it's still in that
> > (single entry) cache. If it then gets retrieved from that cache it
> > already points to poisoned memory. For some strange reason this only
> > results in an Oops if one or more debugging options are set (as are set
> > in the Fedora Rawhide non-stable kernels that I ran into this). I have
> > no clue whatsoever, why that is ...
>
> Oooh... If I understand you correctly, this is exactly analogous to my
> first-ever RCU bug way back in the early 90s. This was a hashed version
> of a BSD-style routing table, where each hash bucket maintained a pointer
> to the last routing entry used. When I deleted a routing entry, I forgot
> to check the last-entry pointer.
>
> The amazing thing is that the guy who found and fixed this bug generated
> a correct fix despite having absolutely no idea what RCU was or how it
> worked. All he knew was that if he waited for the sun to reach Oregon,
> his users were going to be way beyond upset. ;-)
>
> > Anyhow, after ripping out ioc_data this bug seems to have disappeared!
> > Jens, Vivek, could you please have a look at this? In the mean time I
> > hope to pinpoint this issue and draft a small patch to really solve it
> > (ie, not by simply ripping out ioc_data).
>
> Again, very cool!!!

Hi Paul,

Remember this CFQ issue where we suspected that we are freeing up an
object before rcu period is over. We thought we fixed it. Guess what,
it seems be to be back.

**************************************************************************
[21265.163144] BUG: unable to handle kernel paging request at 6b6b6b43
[21265.164022] IP: [<f7d55fe0>] cic_free_func+0x10/0x6c [cfq_iosched]
[21265.164022] *pde = 00000000
[21265.164022] Oops: 0000 [#1] SMP
[21265.164022] Modules linked in: cfq_iosched tcp_lp fuse arc4 ppdev
snd_intel8x0m snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device
microcode
snd_pcm joydev snd_timer ath5k pcspkr snd_page_alloc thinkpad_acpi
i2c_i801
ath mac80211 snd soundcore cfg80211 rfkill parport_pc parport e1000
iTCO_wd
t iTCO_vendor_support binfmt_misc uinput yenta_socket video radeon ttm
drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
[21265.164022]
[21265.164022] Pid: 18514, comm: mandb Not tainted
3.0.0-1.local2.fc17.i686 #1
IBM /
[21265.164022] EIP: 0060:[<f7d55fe0>] EFLAGS: 00010286 CPU: 0
[21265.164022] EIP is at cic_free_func+0x10/0x6c [cfq_iosched]
[21265.164022] EAX: f242e400 EBX: 6b6b6b43 ECX: 00000246 EDX: 6b6b6b43
[21265.164022] ESI: f242e400 EDI: f7d55fd0 EBP: df92df04 ESP: df92def0
[21265.164022] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[21265.164022] Process mandb (pid: 18514, ti=df92c000 task=ed1554c0
task.ti=df92c000)
[21265.164022] Stack:
[21265.164022] c0468df7 c0af0cb0 6b6b6b6b f242e400 f7d55fd0 df92df18
f7d54510
f242e400
[21265.164022] ed1554c0 ed155898 df92df20 f7d54530 df92df2c c05cc8ba
f242e400
df92df40
[21265.164022] c05cc94a ed1554c0 00000012 df92df74 df92df8c c04423e2
eedf2b5c
eedf2b00
[21265.164022] Call Trace:
[21265.164022] [<c0468df7>] ? lock_is_held+0x73/0x7b
[21265.164022] [<f7d55fd0>] ? cfqq_process_refs+0x18/0x18 [cfq_iosched]
[21265.164022] [<f7d54510>] call_for_each_cic+0x3e/0x4c [cfq_iosched]
[21265.164022] [<f7d54530>] cfq_free_io_context+0x12/0x14 [cfq_iosched]
[21265.164022] [<c05cc8ba>] put_io_context+0x34/0x5c
[21265.164022] [<c05cc94a>] exit_io_context+0x68/0x6d
[21265.164022] [<c04423e2>] do_exit+0x62f/0x66a
[21265.164022] [<c0442634>] do_group_exit+0x63/0x86
[21265.164022] [<c044266f>] sys_exit_group+0x18/0x18
[21265.164022] [<c083b0df>] sysenter_do_call+0x12/0x38
[21265.164022] Code: 5f 5d c3 55 89 e5 3e 8d 74 26 00 8b 50 3c 03 50 40 8b
00
29 d0 79 02 0f 0b 5d c3 55 89 e5 57 56 53 83 ec 08 3e 8d 74 26 00 89 d3 <8
b> 12 89 c6 f6 c2 01 75 02 0f 0b 8d 78 08 89 f8 89 55 ec e8 c8
[21265.164022] EIP: [<f7d55fe0>] cic_free_func+0x10/0x6c [cfq_iosched]
SS:ESP
0068:df92def0
[21265.164022] CR2: 000000006b6b6b43
[21265.363749] ---[ end trace a779435da6d0a152 ]---

**************************************************************************
Again details have been added to old bz.

https://bugzilla.redhat.com/show_bug.cgi?id=577968

Now two users have run into this issue. PaulB, again can reliably
reproduce this issue even with 3.0 kernels. He just needs to run
"mandb -c" on this machine.

He has again run the CFQ debug V2 patch which seems to be telling that
cfq_io_context object was freed while in rcu read section.

**************************************************************************
[ 562.598881] debug: Will exit io context cic=0xf0469230, Next pointer is
(null)
[ 562.598881] debug: cic=0xf0469230 exit io context done. Next pointer is
(null)
[ 562.598881] debug: Will decouple-free cic cic=0xf0469230, Next pointer
is (null)
[ 562.633403] debug: cic=0xf0469230 decouple-free cic done. Next pointer
is (null)
[ 562.644949] debug: freeing up cic=f0469230
[ 562.652971] cic=f0469230 freed while in RCU read lock

*****************************************************************************

I am hoping that I have written the debug patch right. As usual i am
running out of ideas on how to debug it further. Anything debug
information you would like to have a look at?

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/