Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)

From: Kevin Shanahan
Date: Tue Jan 20 2009 - 10:51:47 EST


On Tue, 2009-01-20 at 15:25 +0100, Ingo Molnar wrote:
> > I could run top, vmstat and cat /proc/sched_debug in a loop until the
> > problem occurs and then trim it. Something like:
> >
> > while true; do
> > date >> $FILE
> > echo "-- top: --" >> $FILE
> > top -H -c -b -d 1 -n 0.5 >> $FILE 2>/dev/null
> > echo "-- vmstat: --" >> $FILE
> > vmstat >> $FILE 2>/dev/null
> > echo "-- sched_debug #$i: --" >> $FILE
> > cat /proc/sched_debug >> $FILE 2>/dev/null
> > done
> >
> > That should take a snapshot every half second or so.
>
> Yeah, that would be lovely. You dont even have to trim it much - just give
> us a timestamp to look at for the delay incident. You might also want to
> start the kvm session while the script is already running - that way we'll
> get fresh statistics and see the whole thing.

I've uploaded the debug info here:
http://disenchant.net/tmp/bug-12465/

Some interesting sections should be around these times:

01:36:04 -> 01:36:27
01:37:30 -> 01:37:42
01:37:52 -> 01:37:56
01:39:37 -> 01:39:40
01:40:01 -> 01:40:14

The output from ping is there too so you can see how the delays usually
show up (e.g. in clusters). The large debug file runs from before I
launched the VMs, right through the ping test. The trimmed file just
cuts out everything before I started ping.

Regards,
Kevin.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/