Re: [PATCH] kvm: x86: make lapic hrtimer pinned

From: Yang Zhang
Date: Tue Apr 05 2016 - 02:18:17 EST


On 2016/4/5 5:00, Rik van Riel wrote:
On Mon, 2016-04-04 at 16:46 -0400, Luiz Capitulino wrote:
When a vCPU runs on a nohz_full core, the hrtimer used by
the lapic emulation code can be migrated to another core.
When this happens, it's possible to observe millisecond
latency when delivering timer IRQs to KVM guests.

The huge latency is mainly due to the fact that
apic_timer_fn() expects to run during a kvm exit. It
sets KVM_REQ_PENDING_TIMER and lets it be handled on kvm
entry. However, if the timer fires on a different core,
we have to wait until the next kvm exit for the guest
to see KVM_REQ_PENDING_TIMER set.

This problem became visible after commit 9642d18ee. This
commit changed the timer migration code to always attempt
to migrate timers away from nohz_full cores. While it's
debatable whether this is correct/desirable (I don't think
it is), it's clear that the lapic emulation code has
a requirement on firing the hrtimer on the same core
where it was started. This is achieved by making the
hrtimer pinned.
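
To make the latency argument above concrete, here is a toy userspace
model in C. It is not kernel code and not the patch itself: the function
names, the fixed exit period, and the assumption that a timer firing on
the vCPU's own core costs zero extra delay are all hypothetical
simplifications. In the kernel, pinning is requested through the hrtimer
mode (for example HRTIMER_MODE_ABS_PINNED).

```c
/* Toy model only. All times are in microseconds. */

/*
 * Hypothetical assumption: left alone, the guest exits to the host
 * only once every exit_period_us. Returns the first such exit at or
 * after time t_us.
 */
static long next_exit_at_or_after(long t_us, long exit_period_us)
{
    return ((t_us + exit_period_us - 1) / exit_period_us) * exit_period_us;
}

/*
 * Delay between the timer expiring and the vCPU noticing the pending
 * timer request.
 *
 * pinned != 0: the timer fires on the vCPU's own core, so the host
 *              interrupt itself forces an exit and the request is
 *              handled on the following entry (modelled as 0 delay).
 * pinned == 0: the timer fired on another core; the request bit is
 *              set, but the vCPU only sees it at its next exit.
 */
static long delivery_latency_us(long expiry_us, long exit_period_us,
                                int pinned)
{
    if (pinned)
        return 0;
    return next_exit_at_or_after(expiry_us, exit_period_us) - expiry_us;
}
```

With a timer expiring at 250 us and a guest that only exits every
1000 us, delivery_latency_us(250, 1000, 1) gives 0 and
delivery_latency_us(250, 1000, 0) gives 750, which is the kind of gap
the description above attributes to the migrated timer.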

Given that delivering a timer to a guest seems to
involve trapping from the guest to the host, anyway,
I don't see a downside to your patch.

If that is ever changed (e.g. allowing delivery of
a timer interrupt to a VCPU without trapping to the
host), we may want to revisit this.


Posted interrupts would help in this case. Currently, KVM doesn't use PI for the lapic timer because the lapic timer and the VCPU have the same affinity. Now we could change it to use PI for the lapic timer. The only concern is: how frequent is timer migration in upstream Linux? If it happens frequently, will it bring additional cost?

BTW, in what case will the migration of timers during VCPU scheduling fail?

--
best regards
yang