Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)

From: Avi Kivity
Date: Wed Jan 21 2009 - 09:35:40 EST


Kevin Shanahan wrote:
On Tue, 2009-01-20 at 19:47 +0200, Avi Kivity wrote:
Steven Rostedt wrote:
Note, the wakeup latency tracer only tests realtime threads, since other
threads can have other issues for wakeup. I could rename the wakeup tracer
to wakeup_rt, and make a new "wakeup" that tests all threads, but it may
be difficult to get something accurate.
Kevin, can you retest with kvm at realtime priority?

Running vanilla Linux 2.6.28, kvm-82. First a control test to check that
the problem is still there when running at normal priority:

--- hermes-old.wumi.org.au ping statistics ---
900 packets transmitted, 900 received, 0% packet loss, time 899283ms
rtt min/avg/max/mdev = 0.119/269.773/13739.426/1230.836 ms, pipe 14

Yeah, sure is.

Okay, so now I set the realtime attributes of the processes for the VM
instance being pinged:

flexo:~# ps ax | grep 6284
6284 ? Sl 6:11 /usr/local/kvm/bin/qemu-system-x86_64 -smp 2
-m 2048 -hda kvm-17-1.img -hdb kvm-17-tmp.img -net
nic,vlan=0,macaddr=52:54:00:12:34:67,model=rtl8139 -net
tap,vlan=0,ifname=tap17,script=no -vnc 127.0.0.1:17 -usbdevice tablet
-daemonize
flexo:~# pstree -p 6284
qemu-system-x86(6284)─┬─{qemu-system-x86}(6285)
                      ├─{qemu-system-x86}(6286)
                      └─{qemu-system-x86}(6540)

("info cpus" on the QEMU console shows 6285 and 6286 being the VCPU
threads. Not sure what the third child is for, maybe vnc?)

flexo:~# chrt -r -p 3 6284
flexo:~# chrt -r -p 3 6285
flexo:~# chrt -r -p 3 6286
flexo:~# chrt -p 6284
pid 6284's current scheduling policy: SCHED_RR
pid 6284's current scheduling priority: 3
flexo:~# chrt -p 6285
pid 6285's current scheduling policy: SCHED_RR
pid 6285's current scheduling priority: 3
flexo:~# chrt -p 6286
pid 6286's current scheduling policy: SCHED_RR
pid 6286's current scheduling priority: 3
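
The chrt calls above could also be made from a script. What follows is a
minimal sketch, assuming Linux and Python 3.3 or later (for
os.sched_setscheduler). The helper names thread_ids and set_round_robin are
made up for illustration and were not part of the test above. Changing the
policy needs root or CAP_SYS_NICE.

```python
import os


def thread_ids(pid, proc_root="/proc"):
    """Return the sorted thread IDs listed under <proc_root>/<pid>/task."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    return sorted(int(name) for name in os.listdir(task_dir) if name.isdigit())


def set_round_robin(tids, priority=3):
    """Switch each thread ID in `tids` to SCHED_RR at `priority`.

    Per thread this is the equivalent of `chrt -r -p <priority> <tid>`.
    Raises PermissionError when run without sufficient privileges.
    """
    param = os.sched_param(priority)
    for tid in tids:
        os.sched_setscheduler(tid, os.SCHED_RR, param)
```

Calling set_round_robin([6284, 6285, 6286]) would mirror the commands above.
set_round_robin(thread_ids(6284)) would also cover the third thread, 6540,
which this test left at normal priority.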

And the result of the ping test now:

--- hermes-old.wumi.org.au ping statistics ---
900 packets transmitted, 900 received, 0% packet loss, time 899326ms
rtt min/avg/max/mdev = 0.093/0.157/3.611/0.117 ms
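
To put a number on the difference, the two rtt summary lines above can be
parsed and compared. This is only a sketch: the regular expression assumes
the summary format shown in this mail, and parse_rtt is a hypothetical
helper name.

```python
import re

# Matches the summary line printed by ping, e.g.
# "rtt min/avg/max/mdev = 0.093/0.157/3.611/0.117 ms"
RTT_LINE = re.compile(
    r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)


def parse_rtt(line):
    """Return the min/avg/max/mdev round-trip times (in ms) as a dict."""
    match = RTT_LINE.search(line)
    if match is None:
        raise ValueError("not a ping rtt summary line: %r" % line)
    lo, avg, hi, mdev = (float(field) for field in match.groups())
    return {"min": lo, "avg": avg, "max": hi, "mdev": mdev}


# The two summary lines quoted in this mail.
normal = parse_rtt(
    "rtt min/avg/max/mdev = 0.119/269.773/13739.426/1230.836 ms, pipe 14"
)
realtime = parse_rtt("rtt min/avg/max/mdev = 0.093/0.157/3.611/0.117 ms")

for key in ("avg", "max", "mdev"):
    print("%s: %.0fx lower at realtime priority" % (key, normal[key] / realtime[key]))
```

The minimum round-trip time barely changes between the two runs, so the
stalls show up only in the average, maximum and deviation figures.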

So, a _huge_ difference. But what does it mean?

It means a scheduling problem. Can you run the latency tracer (which only works with realtime priority), so we can tell whether it is (a) kvm failing to wake up the vcpu properly or (b) the scheduler delaying the vcpu from running?
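
For reference, selecting the wakeup tracer comes down to writing to a couple
of files in the ftrace directory under debugfs. The sketch below assumes the
file names current_tracer and tracing_max_latency from the ftrace
documentation. The directory itself depends on where debugfs is mounted
(commonly /sys/kernel/debug/tracing, but that is an assumption here), and
some kernels need tracing switched on separately. The function names are
hypothetical, and the writes need root.

```python
import os


def select_wakeup_tracer(tracing_dir):
    """Select the "wakeup" tracer and reset the recorded maximum latency."""
    def write(name, value):
        with open(os.path.join(tracing_dir, name), "w") as handle:
            handle.write(value)

    write("current_tracer", "wakeup")
    # Writing 0 clears the previous maximum so a new one can be recorded.
    write("tracing_max_latency", "0")


def read_max_latency_us(tracing_dir):
    """Return the largest wakeup latency recorded so far, in microseconds."""
    with open(os.path.join(tracing_dir, "tracing_max_latency")) as handle:
        return int(handle.read().strip())
```

After reproducing the stall, the "trace" file in the same directory should
hold the trace for the worst-case wakeup that was captured.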

P.S. Can someone tell me if I'm doing the CC: to bugme-daemon wrong? I
thought that was supposed to add the emails as comments to the
bugzilla report?

So long as it isn't complaining, you can continue.

--
error compiling committee.c: too many arguments to function
