Re: 2.6.15-mm2

From: Neil Brown
Date: Tue Jan 10 2006 - 23:15:52 EST


On Tuesday January 10, mingo@xxxxxxx wrote:
>
> * Andrew Morton <akpm@xxxxxxxx> wrote:
>
> > Reuben Farrelly <reuben-lkml@xxxxxxxx> wrote:
> > >
> > > Ok here's the latest one, this time with KALLSYMS_ALL, CONFIG_FRAME_POINTER,
> > > CONFIG_DETECT_SOFTLOCKUP and the DEBUG_WARN_ON(current->state != TASK_RUNNING);
> > > patch from Ingo.
> >
> > This is quite ugly. I'd be suspecting a block layer problem: RAID or
> > the underlying device driver (ahci) has lost an IO.
>
> yeah, now it more looks like that to me too. What happens is a raid1
> resync happens in the background - which is one of the more complex
> raid1 workloads - and there've been a good number of md patches
> recently. Reuben, does -git5 show the same symptoms?

There isn't a resync happening - if there was you would see a process
called
  mdX_resync
(for some X).

What I see here is:
pdflush at:
Call Trace:
[<c02a2f72>] md_write_start+0xbc/0x150
[<c029a659>] make_request+0x51/0x432
[<c01e1146>] generic_make_request+0xbe/0x13d
[<c01e120e>] submit_bio+0x49/0xd3

So it is trying to write to a raid1 which was 'clean' and needs to
be marked 'dirty' (or 'active') before the first write.
md_write_start arranges for the array's thread to do this.
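In case the shape of that handshake isn't clear, here is a rough
user-space model of it (only a sketch of the idea, not the md code;
the names below - array_thread, first_write and so on - are invented
for illustration):

/* User-space model of the md_write_start handshake - NOT the kernel
 * code, just the idea: the first writer to a 'clean' array wakes the
 * array's thread, which marks the array 'active' on disk, and the
 * writer waits until that has happened before submitting its write.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t lock    = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  changed = PTHREAD_COND_INITIALIZER;
static bool in_sync = true;       /* array currently marked 'clean'    */
static bool kicked;               /* the array's thread has work to do */

static void *array_thread(void *unused)  /* stands in for mdX_raid1    */
{
    (void)unused;
    pthread_mutex_lock(&lock);
    while (!kicked)
        pthread_cond_wait(&changed, &lock);
    /* ... write the superblock marking the array 'active' ... */
    in_sync = false;
    pthread_cond_broadcast(&changed);
    pthread_mutex_unlock(&lock);
    return NULL;
}

static void first_write(void)            /* what pdflush ends up doing */
{
    pthread_mutex_lock(&lock);
    if (in_sync) {
        kicked = true;                   /* wake the array's thread    */
        pthread_cond_broadcast(&changed);
    }
    while (in_sync)                      /* wait until marked 'active' */
        pthread_cond_wait(&changed, &lock);
    pthread_mutex_unlock(&lock);
    /* only now is the data write submitted */
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, array_thread, NULL);
    first_write();
    printf("array is active, write may proceed\n");
    pthread_join(&t, NULL);
    return 0;
}

So pdflush is stuck in the "wait until the array is marked active"
step, and the array's own thread is the one that is supposed to
unblock it.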
What is that thread doing?

md2_raid1 D F7227200 0 386 11 390 382 (L-TLB)
...
Call Trace:
[<c029d004>] md_super_wait+0xd5/0xea
[<c02a4f93>] bitmap_unplug+0x1d8/0x1df
[<c029b72b>] raid1d+0x7d/0x555
[<c02a211a>] md_thread+0x44/0x14f

It probably hasn't tried to write out the superblock yet; right now
it is writing out some write-intent-bitmap entries and waiting for
those writes to complete.

md_super_wait is waiting for 'pending_writes' to become zero. That
counter is incremented when any superblock or bitmap write starts,
and is decremented when that write completes.
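In user-space terms the pattern is roughly this (a sketch only - the
real code uses kernel primitives rather than pthreads, and these
function names are invented):

/* Sketch of the 'pending_writes' pattern described above - not the
 * md source, just the counter-plus-wait idea.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  zero = PTHREAD_COND_INITIALIZER;
static int pending_writes;

static void write_submitted(void)   /* superblock or bitmap write queued */
{
    pthread_mutex_lock(&lock);
    pending_writes++;
    pthread_mutex_unlock(&lock);
}

static void write_completed(void)   /* the write-completion callback     */
{
    pthread_mutex_lock(&lock);
    if (--pending_writes == 0)
        pthread_cond_broadcast(&zero);
    pthread_mutex_unlock(&lock);
}

static void super_wait(void)        /* what md_super_wait is doing       */
{
    pthread_mutex_lock(&lock);
    while (pending_writes != 0)
        pthread_cond_wait(&zero, &lock);
    pthread_mutex_unlock(&lock);
}

static void *disk(void *unused)     /* stands in for the block layer     */
{
    (void)unused;
    sleep(1);                       /* the I/O takes a while ...         */
    write_completed();              /* ... and eventually completes      */
    write_completed();
    return NULL;
}

int main(void)
{
    pthread_t t;
    write_submitted();              /* say, two bitmap-block writes      */
    write_submitted();
    pthread_create(&t, NULL, disk, NULL);
    super_wait();                   /* blocks until both have completed  */
    printf("all pending writes done\n");
    pthread_join(&t, NULL);
    return 0;
}

The important property is that every decrement depends on the block
layer actually completing the write.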

So a lost write request in one of the components of the array could
cause this, but it is too easy to simply blame it on someone else....

But there is something I don't understand....

If md2_raid1 is in bitmap_unplug, that means there are outstanding
write requests to md2_raid1, so the one that pdflush is currently
generating cannot be the first.

This suggests that pdflush is not writing to md2, but to something
else.
Ahhhh.. md0_raid1 is also blocked:
Call Trace:
[<c029d004>] md_super_wait+0xd5/0xea
[<c029ec29>] md_update_sb+0xc9/0x153
[<c02a3a20>] md_check_recovery+0x182/0x437
[<c029b6cd>] raid1d+0x1f/0x555

It has just updated the superblocks for md0 and is waiting for those
writes to complete. But they don't seem to want to complete.

So it seems that two raid1 arrays are blocked in slightly different
places.

I'm tempted to blame the IO scheduler, if only because there have
been vaguely similar problems in the recent past that could be
avoided by changing the scheduler.

Reuben: could you check what IO scheduler your drives are using, and
try changing it. I suspect they use 'as' by default. Try 'cfq' or
'deadline'.
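(The scheduler in use shows up in /sys/block/<dev>/queue/scheduler
for each drive, and echoing e.g. 'deadline' into that file switches
it at runtime - no reboot needed.)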

NeilBrown