Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)

From: Kevin Shanahan
Date: Tue Jan 20 2009 - 12:55:04 EST


On Tue, 2009-01-20 at 15:04 +0200, Avi Kivity wrote:
> Kevin Shanahan wrote:
> > On Tue, 2009-01-20 at 12:35 +0100, Ingo Molnar wrote:
> >> This only seems to occur under KVM, right? I.e. you tested it with -no-kvm
> >> and the problem went away, correct?
> >>
> >
> > Well, the I couldn't make the test conditions identical, but it the
> > problem didn't occur with the test I was able to do:
> >
> > http://marc.info/?l=linux-kernel&m=123228728416498&w=2
> >
> >
>
> Can you also try with -no-kvm-irqchip?
>
> You will need to comment out the lines
>
> /* ISA IRQs map to GSI 1-1 except for IRQ0 which maps
> * to GSI 2. GSI maps to ioapic 1-1. This is not
> * the cleanest way of doing it but it should work. */
>
> if (vector == 0)
> vector = 2;
>
> in qemu/hw/apic.c (should also fix -no-kvm smp). This will change kvm
> wakeups to use signals rather than the in-kernel code, which may be buggy.

Okay, I commented out those lines and compiled a new kvm-82 userspace
and kernel modules. Using those on a vanilla 2.6.28 kernel, with all
guests run with -no-kvm-irqchip added.

As before a number of the XP guests wanted to chug away at 100% CPU
usage for a long time. Three of the guests clocked up ~40 minutes CPU
time before I decided to just shut them down. Perhaps coincidentally,
these three guests are the only ones with Office 2003 installed on them.
That could be the difference between those guests and the other XP
guests, but that's probably not important for now.

The two Linux SMP guests booted okay this time, though they seem to only
use one CPU on the host (I guess kvm is not multi-threaded in this
mode?). "hermes-old", the guest I am pinging in all my tests, had a lot
of trouble running the apache2 setup - it was so slow it was difficult
to load a complete page from our RT system. The kvm process for this
guest was taking up 100% cpu on the host constantly and all sorts of
wierd stuff could be seen by watching top in the guest:

top - 03:44:17 up 43 min, 1 user, load average: 3.95, 1.55, 0.80
Tasks: 101 total, 4 running, 97 sleeping, 0 stopped, 0 zombie
Cpu(s): 79.7%us, 10.4%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 9.9%si,
0.0%st
Mem: 2075428k total, 391128k used, 1684300k free, 13044k buffers
Swap: 3502160k total, 0k used, 3502160k free, 118488k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2956 postgres 20 0 19704 11m 10m S 1658 0.6 2:55.99 postmaster
2934 www-data 20 0 60392 40m 5132 R 31 2.0 0:17.28 apache2
2958 postgres 20 0 19700 11m 9.8m R 28 0.6 0:20.41 postmaster
2940 www-data 20 0 58652 38m 5016 S 27 1.9 0:04.87 apache2
2937 www-data 20 0 60124 40m 5112 S 18 2.0 0:11.00 apache2
2959 postgres 20 0 19132 5424 4132 S 10 0.3 0:01.50 postmaster
2072 postgres 20 0 8064 1416 548 S 7 0.1 0:23.71 postmaster
2960 postgres 20 0 19132 5368 4060 R 6 0.3 0:01.55 postmaster
2071 postgres 20 0 8560 1972 488 S 5 0.1 0:08.33 postmaster

Running the ping test with without apache2 running in the guest:

--- hermes-old.wumi.org.au ping statistics ---
900 packets transmitted, 900 received, 0% packet loss, time 902740ms
rtt min/avg/max/mdev = 0.568/3.745/272.558/16.990 ms

And with apache2 running:

--- hermes-old.wumi.org.au ping statistics ---
900 packets transmitted, 900 received, 0% packet loss, time 902758ms
rtt min/avg/max/mdev = 0.625/25.634/852.739/76.586 ms

In both cases it's quite variable, but the max latency is still not as
bad as when running with the irq chip enabled.

Anyway, the test is again not ideal, but I hope we're proving something.
That's all I can do for tonight - should be ready for more again
tomorrow night.

Regards,
Kevin.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/