Re: disk/crypto performance regression 2.6.31 -> 2.6.32 (mmapproblem?)

From: Dave Chinner
Date: Tue Feb 23 2010 - 17:41:41 EST

Next message: David Miller: "Re: [PATCH 3/3] vhost: fix get_user_pages_fast error handling"
Previous message: David Rientjes: "Re: [PATCH] failslab: add abitily to filter slab caches"
In reply to: Dave Chinner: "Re: disk/crypto performance regression 2.6.31 -> 2.6.32 (mmapproblem?)"
Next in thread: James Cloos: "Re: disk/crypto performance regression 2.6.31 -> 2.6.32 (mmap problem?)"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Tue, Feb 23, 2010 at 09:42:40PM +0100, Mikael Abrahamsson wrote:
> On Wed, 24 Feb 2010, Dave Chinner wrote:
>
> Thought I should send the dump only to you, would you like me to post to
> the list as well and include a link to the logfile (think attachments
> shouldn't be done too much to lkml, right?). Feel free to post any
> information from this letter or attachment back onto lkml if you feel
> it's appropriate.

Attachments under roughly 100kb (IIRC) are fine for LKML. This is
allowed specifically to let people attach log files and configs.

I've re-added the lkml CC to continue the discussion there. I also
added the DM mailing list, so that they know directly that barriers
are causing _significant_ system slowdowns. This is important,
because there have been several reports of this problem since the
start of the year to XFS forums as people are upgrading kernels.

>> Barriers are only recently supported across DM and MD, so it would be
>> worth checking you logs for the last mount of the filesystems to
>> confirm.
>
> You're right, it doesn't say that anymore in 2.6.31, so I think I'm
> indeed running with barriers on.

And the stack traces confirm that. Every single blocked process
output set has this trace in it:

kdmflush D 0000000000000000 0 3082 2 0x00000000
ffff880208553d20 0000000000000046 ffff880210a5f640 0000000000015880
ffff8802241ac7c0 0000000000015880 0000000000015880 0000000000015880
0000000000015880 ffff8802241ac7c0 0000000000015880 0000000000015880
Call Trace:
[<ffffffff81527cf8>] io_schedule+0x28/0x40
[<ffffffff813e7ca7>] dm_wait_for_completion+0x87/0x150
[<ffffffff81053c50>] ? default_wake_function+0x0/0x10
[<ffffffff813e8617>] ? __split_and_process_bio+0x107/0x170
[<ffffffff813e86a5>] dm_flush+0x25/0x70
[<ffffffff813e8749>] process_barrier+0x59/0xf0
[<ffffffff813e881b>] dm_wq_work+0x3b/0x100
[<ffffffff813e87e0>] ? dm_wq_work+0x0/0x100
[<ffffffff81073695>] run_workqueue+0x95/0x170
[<ffffffff81073814>] worker_thread+0xa4/0x120
[<ffffffff810789c0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff81073770>] ? worker_thread+0x0/0x120
[<ffffffff810785d6>] kthread+0xa6/0xb0
[<ffffffff810130ea>] child_rip+0xa/0x20
[<ffffffff81078530>] ? kthread+0x0/0xb0
[<ffffffff810130e0>] ? child_rip+0x0/0x20

That is, it appears that DM is blocking waiting for a barrier to
complete. Everything else is backed up waiting for IO completion to
occur...

>>> Currently it's "lazy-count=0", so I'll change that setting tonight.
>
> I didn't do this before the test I'm referring to now.
>
>> When IO is really slow so we get a better idea of where things are
>> blocking. Running a few of these 30s apart will give a fair indication
>> of what is blocked and what is making progress....
>
> Attached to this email, logfile from yesterday and today.
>
> Some interesting parts as well, that didn't trigger from the above sysrq
> stuff:
>
> Feb 22 19:56:16 ub kernel: [201245.583915] INFO: task md0_raid5:425 blocked for more than 120 seconds.
> Feb 22 22:36:16 ub kernel: [210846.031875] INFO: task xfssyncd:3167 blocked for more than 120 seconds.
> Feb 23 18:10:18 ub kernel: [281287.492499] INFO: task kdmflush:3082 blocked for more than 120 seconds.
> Feb 23 18:12:18 ub kernel: [281407.491041] INFO: task kdmflush:3082 blocked for more than 120 seconds.
> Feb 23 21:36:18 ub kernel: [293647.665917] INFO: task md0_raid5:425 blocked for more than 120 seconds.
>
> Didn't really think md0_raid5 could be blocked like that...

They are blocking in:

[<ffffffff813dc0e1>] md_super_wait+0xc1/0xe0
[<ffffffff810789c0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff813dc4e0>] md_update_sb+0x230/0x3b0
[<ffffffff813de2f2>] md_check_recovery+0x272/0x590
[<ffffffffa0063621>] raid5d+0x21/0x320 [raid456]
[<ffffffff813dbd94>] md_thread+0x44/0x120

I'm assuming this is just taking a long time because of all the
other IO that is going on slowing this down. The other 3 (xfssyncd,
kdmflush) are all waiting for barrier IO to complete...

So it sounds like the immediate WAR for your extremely slow IO
problem is to mount the XFS filesystems with "-o nobarrier", but
keep in mind this can result in corruption if a crash occurrs.

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: David Miller: "Re: [PATCH 3/3] vhost: fix get_user_pages_fast error handling"
Previous message: David Rientjes: "Re: [PATCH] failslab: add abitily to filter slab caches"
In reply to: Dave Chinner: "Re: disk/crypto performance regression 2.6.31 -> 2.6.32 (mmapproblem?)"
Next in thread: James Cloos: "Re: disk/crypto performance regression 2.6.31 -> 2.6.32 (mmap problem?)"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]