Re: [Xen-devel] linux 4.4 Regression: 100% cpu usage on idle pv guest under Xen with single vcpu.

From: Sander Eikelenboom
Date: Tue Dec 01 2015 - 17:55:59 EST


On 2015-11-30 23:54, Boris Ostrovsky wrote:
On 11/30/2015 04:46 PM, Sander Eikelenboom wrote:
On 2015-11-30 22:45, Konrad Rzeszutek Wilk wrote:
On Sat, Nov 28, 2015 at 04:47:43PM +0100, Sander Eikelenboom wrote:
Hi all,

I have just tested a 4.4-rc2 kernel (current linus tree) + the tip tree
pulled on top.

Running this kernel under Xen on PV guests with multiple vcpus works well
(< 10% cpu usage when idle), but a guest with only a single vcpu doesn't
idle at all; it seems a kworker thread is stuck:
root 569 98.0 0.0 0 0 ? R 16:02 12:47 [kworker/0:1]

Running a 4.3 kernel works fine with a single vcpu. Bisecting would probably
be quite painful, since there were some breakages this merge window with
respect to Xen PV guests.

There are some differences in the dmesg output between the 4.3, 4.4-single
and 4.4-multi vcpu boots:

Boris has been tracking a bunch of them. I am attaching the latest set of
patches I have to carry on top of v4.4-rc3.

Hi Konrad,

I will test those, see if they fix all my issues and report back.

They shouldn't help you ;-( (and I just saw a message from you confirming this)

The first one fixes a 32-bit bug (on bare metal too). The second fixes
a fatal bug for 32-bit PV guests. The other two are code
improvements/cleanup.



Thanks :)

-- Sander

Between 4.3 and 4.4-single:

-NR_IRQS:4352 nr_irqs:32 16
+Using NULL legacy PIC
+NR_IRQS:4352 nr_irqs:32 0

This is fine, as long as you have b4ff8389ed14b849354b59ce9b360bdefcdbf99c.


-cpu 0 spinlock event irq 17
+cpu 0 spinlock event irq 1

This is strange. I wouldn't expect spinlocks to use legacy irqs.


Could it be .. that with your fixup:
xen/events: Always allocate legacy interrupts on PV guests
(b4ff8389ed14b849354b59ce9b360bdefcdbf99c)
for commit:
x86/irq: Probe for PIC presence before allocating descs for legacy IRQs
(8c058b0b9c34d8c8d7912880956543769323e2d8)

that we now have the situation described in the commit message of 8c058b0b9c,
but now for Xen PV instead of Hyper-V?
(It seems both Xen and Hyper-V want to achieve the same thing but have
different, competing implementations?)

(BTW 8c058b0b9c has a CC for stable ... so could be destined to cause more trouble).

--
Sander



and later on:

-hctosys: unable to open rtc device (rtc0)
+rtc_cmos rtc_cmos: hctosys: unable to read the hardware clock

+genirq: Flags mismatch irq 8. 00000000 (hvc_console) vs. 00000000 (rtc0)
+hvc_open: request_irq failed with rc -16.
+Warning: unable to open an initial console.
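
(Editorial aside: the "rc -16" above is -EBUSY. A simplified Python model of
the sharing check that the genirq core performs when a second handler requests
an already-claimed line may help explain the "Flags mismatch" message. The
flag constants are the values from the kernel's include/linux/interrupt.h; the
function name request_irq_model is made up for illustration, and the real
check in __setup_irq() covers more cases than this sketch.)

```python
import errno

# Flag values as defined in the kernel's include/linux/interrupt.h.
IRQF_TRIGGER_MASK = 0x0000000F
IRQF_SHARED = 0x00000080
IRQF_ONESHOT = 0x00002000


def request_irq_model(existing_flags, new_flags):
    """Return 0 if a second handler may share the line, else -EBUSY.

    Hypothetical, simplified model of the kernel's sharing rules:
    both parties must set IRQF_SHARED and agree on trigger type
    and on IRQF_ONESHOT.
    """
    both_shared = (existing_flags & new_flags) & IRQF_SHARED
    same_trigger = ((existing_flags ^ new_flags) & IRQF_TRIGGER_MASK) == 0
    same_oneshot = ((existing_flags ^ new_flags) & IRQF_ONESHOT) == 0
    if both_shared and same_trigger and same_oneshot:
        return 0
    return -errno.EBUSY


# The case from the log: rtc0 already holds irq 8 with flags 00000000,
# then hvc_console requests it with flags 00000000.
print(request_irq_model(0x00000000, 0x00000000))
```

With both flag words at 00000000, neither requester set IRQF_SHARED, so the
second request (hvc_console) is refused with -EBUSY, which matches the
"request_irq failed with rc -16" line.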


between 4.4-single and 4.4-multi:

Using NULL legacy PIC
-NR_IRQS:4352 nr_irqs:32 0
+NR_IRQS:4352 nr_irqs:48 0

This is probably OK too, since nr_irqs depends on the number of CPUs.

I think something is messed up with IRQs. Last week I saw something
from setup_irq() generating a stack dump (warning) for rtc_cmos, but
it appeared harmless at that time and now I don't see it anymore.

-boris



and later on:

-rtc_cmos rtc_cmos: hctosys: unable to read the hardware clock
+hctosys: unable to open rtc device (rtc0)

-genirq: Flags mismatch irq 8. 00000000 (hvc_console) vs. 00000000 (rtc0)
-hvc_open: request_irq failed with rc -16.
-Warning: unable to open an initial console.

attached:
- dmesg with 4.3 kernel with 1 vcpu
- dmesg with 4.4 kernel with 1 vcpu
- dmesg with 4.4 kernel with 2 vcpus
- .config of the 4.4 kernel
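
(Editorial aside: dmesg comparisons like the ones quoted in this thread can be
produced roughly as sketched below. This is only one possible approach, not
necessarily how the diffs above were made. The function names normalize and
changed_lines are made up for illustration, and the sketch assumes the usual
"[ seconds.micros]" timestamp prefix, which is stripped so that timing jitter
does not show up as a difference.)

```python
import difflib
import re

# Matches a leading dmesg timestamp such as "[    0.123456] ".
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s?")


def normalize(text):
    """Split a dmesg dump into lines with the timestamp prefix removed."""
    return [TIMESTAMP.sub("", line) for line in text.splitlines()]


def changed_lines(old_text, new_text):
    """Return only the '-' and '+' lines of a unified diff of two logs."""
    diff = list(
        difflib.unified_diff(
            normalize(old_text), normalize(new_text), lineterm="", n=0
        )
    )
    # The first two entries are the '---' / '+++' file headers; hunk
    # headers start with '@@' and are dropped by the prefix filter.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


# Usage (file names are placeholders):
#   old = open("dmesg-4.3.txt").read()
#   new = open("dmesg-4.4-single.txt").read()
#   print("\n".join(changed_lines(old, new)))
```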

-- Sander

