Re: Boot time regression in 2.6.38 after initial wq merge

From: Tejun Heo
Date: Tue Feb 22 2011 - 03:18:01 EST


(cc'ing Dmitry and lkml)

Hello,

On Mon, Feb 21, 2011 at 09:44:12PM +0100, pantherchen@xxxxxxxxxxx wrote:
> I'm experiencing a significant boot time regression (almost +50%)
> starting with 2.6.38-rc1 (see boot charts before [0], and after [1]
> - note the long uninterruptible sleep of kworker/0:1).

Comparing the two boot charts, the uninterruptible sleeps in
kowrker/0:1 seems to be the same one from kseriod. They're about the
same duration. kseriod is replaced with a work item, so that explains
it.

> After doing some bisecting and thorough testing this merge seems to
> be the first one showing the regression: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=23d69b09b78c4876e134f104a3814c30747c53f1
> ("Merge branch 'for-2.6.38' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq").
>
> However, a kernel built from the branch 'for-2.6.38' at
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq doesn't show the
> behavior.
>
> I also tried to cherry-pick those 33 commits, applying them to the
> last good commit from Linus' tree (http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=4ead36407b41eae942c8c9f70ef963cd369c90e2)
> and it also doesn't show the regression.
>
> Do you know what's going on here? I'm just an end user and I've
> definitely run out of ideas.
>
> Thanks in advance for your help,
>
> Hernando Torque
> Thanks
>
> [0]: http://launchpadlibrarian.net/64639927/27-12-intel.png
> [1]: http://launchpadlibrarian.net/64640030/38-4-intel.png

>From the boot chart, two things are noticeable.

1. kworker/0:1's uninterruptible sleeps start later than kserio's.

It could be that cpu 0 was busy running other stuff and thus cmwq
delayed executing serio_event_work; however, if we look at the CPU
usage, that doesn't seem likely. The CPU is not busy at all and if
the CPU isn't busy, cmwq wouldn't introduce any noticeable delay in
work item execution.

Another possibility is the rescuer concurrency depletion bug is
delaying execution of queued work items early during boot. This
was fixed recently. Can you please give a shot at 2.6.38-rc6 and
see whether anything is different?

2. Most of the delay is caused by xorg starting up much later. xorg
seems to start up in parallel with the kseriod sleeps in 2.6.37 but
on 2.6.38 it seems to wait for the serio_event_work to finish.

I have no idea what xorg is waiting for. Dmitry, any clue?

Thanks.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/